Token Text Chunker

class TokenTextChunker

The simplest splitting method: it splits input text into smaller chunks based on word tokens.

Parameters:
  • chunk_size (int, optional) – Size of each chunk. Default is 512.

  • chunk_overlap (int, optional) – Amount of overlap between chunks. Default is 256.

  • separator (str, optional) – Separators used for splitting into words. Default is \n\n.
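
As a rough illustration of how chunk_size and chunk_overlap interact, the sketch below slides a fixed-size window over a list of words. It is not Pineflow's implementation: the function name sliding_window_chunks is hypothetical, and whitespace-separated words stand in for real tokens.

```python
from typing import List


def sliding_window_chunks(
    text: str, chunk_size: int = 512, chunk_overlap: int = 256
) -> List[str]:
    """Illustrative only: split whitespace-separated words into overlapping windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()  # assumption: words stand in for tokenizer output
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


example = sliding_window_chunks("a b c d e f g h", chunk_size=4, chunk_overlap=2)
print(example)  # ['a b c d', 'c d e f', 'e f g h']
```

With chunk_size=4 and chunk_overlap=2, each chunk repeats the last two words of the previous one. Under the same reading, the defaults of 512 and 256 would make consecutive chunks share half their content.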

Example

from pineflow.core.text_chunkers import TokenTextChunker

text_chunker = TokenTextChunker()

from_documents(documents)

Split documents into chunks.

Parameters:

documents (List[Document]) – List of Document objects to split.

Returns:

List of chunked Document objects.

Return type:

List[Document]
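
The general shape of document-level chunking can be sketched as follows: split each document's text, then return one flat list of chunk documents. This is a minimal sketch under stated assumptions, not Pineflow's code. The Doc class, chunk_documents, and two_words are hypothetical stand-ins, and copying metadata onto each chunk is an assumption about typical behavior rather than something this page confirms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; not Pineflow's class."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def chunk_documents(
    documents: List[Doc], split_text: Callable[[str], List[str]]
) -> List[Doc]:
    """Split each document's text and return one flat list of chunk documents."""
    chunked = []
    for doc in documents:
        for piece in split_text(doc.text):
            # Assumption: each chunk gets its own copy of the parent's metadata.
            chunked.append(Doc(text=piece, metadata=dict(doc.metadata)))
    return chunked


def two_words(text: str) -> List[str]:
    """Toy splitter: two words per chunk, no overlap."""
    words = text.split()
    return [" ".join(words[i:i + 2]) for i in range(0, len(words), 2)]


docs = [
    Doc("alpha beta gamma delta", {"source": "a.txt"}),
    Doc("epsilon", {"source": "b.txt"}),
]
result = chunk_documents(docs, two_words)
print([d.text for d in result])  # ['alpha beta', 'gamma delta', 'epsilon']
```

The output list can be longer than the input list, since one input document may yield several chunks.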

from_text(text)

Split text into chunks.

Parameters:

text (str) – Input text to split.

Returns:

List of text chunks.

Return type:

List[str]

Example

chunks = text_chunker.from_text(
    "Pineflow is a data framework to load any data in one line of code and connect with AI applications."
)