Sentence Chunker

class SentenceChunker

Splits input text into smaller chunks, which is useful when processing large documents. It tries to keep sentences and paragraphs intact within a chunk.

Parameters:
  • chunk_size (int, optional) – Size of each chunk. Default is 512.

  • chunk_overlap (int, optional) – Amount of overlap between chunks. Default is 256.

  • separator (str, optional) – Separator used for splitting text. Default is " " (a single space).
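The sketch below shows one way a sentence-aware chunker could combine these three parameters. It is not Pineflow's implementation. The function name `sketch_sentence_chunks` is hypothetical, sentence boundaries come from a naive regex, and sizes are assumed to be counted in characters (the parameter descriptions above do not state the unit). As a simplification, `separator` is used here only to join sentences back together.

```python
import re
from typing import List


def sketch_sentence_chunks(
    text: str,
    chunk_size: int = 512,
    chunk_overlap: int = 256,
    separator: str = " ",
) -> List[str]:
    """Hypothetical sentence-aware chunker (not Pineflow's code).

    Assumption: sizes are counted in characters of the joined chunk text.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    def length(parts: List[str]) -> int:
        # Character length of the parts once joined with the separator.
        return len(separator.join(parts))

    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        if current and length(current + [sentence]) > chunk_size:
            chunks.append(separator.join(current))
            # Carry trailing sentences forward while they fit in the overlap budget.
            overlap: List[str] = []
            for prev in reversed(current):
                if length([prev] + overlap) > chunk_overlap:
                    break
                overlap.insert(0, prev)
            current = overlap
            # Drop carried sentences from the front if the new one would not fit.
            while current and length(current + [sentence]) > chunk_size:
                current.pop(0)
        # Sentences are never cut: one longer than chunk_size becomes its own chunk.
        current.append(sentence)

    if current:
        chunks.append(separator.join(current))
    return chunks


print(sketch_sentence_chunks("One. Two. Three. Four.", chunk_size=16, chunk_overlap=6))
```

With `chunk_size=16` and `chunk_overlap=6`, this sketch returns `['One. Two. Three.', 'Three. Four.']`: the sentence `Three.` closes the first chunk and is repeated at the start of the second as overlap.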

Example

from pineflow.core.text_chunkers import SentenceChunker

text_chunker = SentenceChunker()

from_documents(documents)

Split documents into chunks.

Parameters:

documents (List[Document]) – List of Document objects to split.

Returns:

List of chunked Document objects.

Return type:

List[Document]
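One plausible reading of `from_documents` is `from_text` applied to each document in turn, with every chunk wrapped in a new document. The sketch below illustrates that reading under stated assumptions: `Doc` is a hypothetical stand-in (Pineflow's `Document` fields are not described on this page), `split_text` stands in for `from_text`, and copying the source metadata onto every chunk is an assumption about how chunks stay linked to their origin.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text plus arbitrary metadata."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def sketch_from_documents(
    documents: List[Doc], split_text: Callable[[str], List[str]]
) -> List[Doc]:
    """Split every document's text and wrap each chunk in a new Doc."""
    chunked: List[Doc] = []
    for doc in documents:
        for chunk in split_text(doc.text):
            # Assumption: each chunk keeps its own copy of the source metadata.
            chunked.append(Doc(text=chunk, metadata=dict(doc.metadata)))
    return chunked


docs = [Doc("A. B", {"source": "x"}), Doc("C", {"source": "y"})]
for d in sketch_from_documents(docs, lambda t: t.split(". ")):
    print(d.text, d.metadata)
```

Copying the metadata dict (rather than sharing it) keeps later edits to one chunk's metadata from leaking into its siblings.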

from_text(text)

Split text into chunks.

Parameters:

text (str) – Input text to split.

Returns:

List of text chunks.

Return type:

List[str]

Example

chunks = text_chunker.from_text(
    "Pineflow is a data framework to load any data in one line of code and connect with AI applications."
)