Token Text Chunker

class TokenTextChunker

This is the simplest splitting method. It splits input text into smaller chunks based on word tokens.

Parameters:

chunk_size (int, optional) – Size of each chunk. Default is 512.
chunk_overlap (int, optional) – Amount of overlap between chunks. Default is 256.
separator (str, optional) – Separators used for splitting into words. Default is \n\n.

Example

from pineflow.core.text_chunkers import TokenTextChunker

text_chunker = TokenTextChunker()
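
The way chunk_size and chunk_overlap interact can be illustrated with a minimal sketch. It is not Pineflow's implementation: the helper name sliding_token_chunks is hypothetical, and whitespace-separated words are assumed to stand in for tokens. Each window holds up to chunk_size tokens, and consecutive windows share chunk_overlap tokens.

```python
from typing import List


def sliding_token_chunks(
    text: str, chunk_size: int = 512, chunk_overlap: int = 256
) -> List[str]:
    """Hypothetical helper showing overlapping token windows (not Pineflow code)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Assumption: whitespace-separated words stand in for tokens.
    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances

    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


chunks = sliding_token_chunks("a b c d e f g h i j", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['a b c d', 'c d e f', 'e f g h', 'g h i j']
```

With the defaults listed above (512 and 256), each window in this sketch would advance by 256 tokens, so every token except those at the very start and end would appear in two chunks.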

from_documents(documents)

Split documents into chunks.

Parameters: documents (List[Document]) – List of Document objects to split.
Returns: List of chunked Document objects.
Return type: List[Document]

from_text(text)

Split text into chunks.

Example

chunks = text_chunker.from_text(
    "Pineflow is a data framework to load any data in one line of code and connect with AI applications."
)