Token Text Chunker
- class TokenTextChunker
This is the simplest splitting method. It splits input text into smaller chunks based on word tokens.
- Parameters:
chunk_size (int, optional) – Size of each chunk. Default is 512.
chunk_overlap (int, optional) – Amount of overlap between chunks. Default is 256.
separator (str, optional) – Separator used to split the text into words. Default is "\n\n".
Example
from pineflow.core.text_chunkers import TokenTextChunker

text_chunker = TokenTextChunker()
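To make the interaction between chunk_size and chunk_overlap concrete, the following is a minimal sketch of a sliding-window splitter. It is not Pineflow's implementation: the helper name sliding_window_chunks is hypothetical, and it assumes that both parameters are counted in whitespace-separated words, which may differ from how TokenTextChunker tokenizes internally.

```python
from typing import List


def sliding_window_chunks(
    text: str, chunk_size: int = 512, chunk_overlap: int = 256
) -> List[str]:
    """Hypothetical helper (not part of Pineflow): overlapping word windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Assumption: a "token" is a whitespace-separated word.
    words = text.split()
    # Each new window starts this many words after the previous one.
    step = chunk_size - chunk_overlap

    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Under these assumptions, each window advances by chunk_size - chunk_overlap words, so with the defaults of 512 and 256 every chunk would share half of its words with the previous one. A larger overlap preserves more context across chunk boundaries at the cost of producing more chunks.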
- from_documents(documents)
Split documents into chunks.
- from_text(text)
Split text into chunks.
- Parameters:
text (str) – Input text to split.
- Returns:
List of text chunks.
- Return type:
List[str]
Example
chunks = text_chunker.from_text(
    "Pineflow is a data framework to load any data in one line of code and connect with AI applications."
)