Sentence Chunker

class SentenceChunker

Splits input text into smaller chunks while trying to keep sentences and paragraphs together. This is particularly useful when processing large documents.

Parameters:

chunk_size (int, optional) – Size of each chunk. Default is 512.
chunk_overlap (int, optional) – Amount of overlap between chunks. Default is 256.
separator (str, optional) – Separator used for splitting text. Default is " " (a single space).
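
To make the interaction between chunk_size and chunk_overlap concrete, the following is a minimal sketch of overlapping windows. It is not Pineflow's implementation: sliding_window_chunks is a hypothetical helper, it assumes sizes are counted in separator-delimited pieces (the unit is not stated here), and it ignores the sentence and paragraph boundaries that SentenceChunker tries to preserve.

```python
from typing import List


def sliding_window_chunks(
    text: str,
    chunk_size: int = 4,
    chunk_overlap: int = 2,
    separator: str = " ",
) -> List[str]:
    """Hypothetical illustration: fixed-size windows that share pieces."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Assumption: size and overlap are measured in separator-delimited pieces.
    pieces = [p for p in text.split(separator) if p]
    step = chunk_size - chunk_overlap  # how far each new window advances

    chunks: List[str] = []
    for start in range(0, len(pieces), step):
        chunks.append(separator.join(pieces[start:start + chunk_size]))
        if start + chunk_size >= len(pieces):
            break  # the last window already reaches the end of the text
    return chunks


example = sliding_window_chunks("a b c d e f g h i j", chunk_size=4, chunk_overlap=2)
print(example)  # ['a b c d', 'c d e f', 'e f g h', 'g h i j']
```

In this sketch each chunk repeats the last chunk_overlap pieces of the previous one, so a larger overlap yields more chunks for the same input.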

Example

from pineflow.core.text_chunkers import SentenceChunker

text_chunker = SentenceChunker()

from_documents(documents)

Split documents into chunks.

Parameters: documents (List[Document]) – List of Document objects to split.
Returns: List of chunked Document objects.
Return type: List[Document]
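
The List[Document] to List[Document] shape described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: Doc is a hypothetical stand-in for Document (its text and metadata fields are assumed), split_on_periods is a hypothetical splitter, and copying metadata onto every chunk is an assumption about behavior that this page does not confirm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; field names are assumptions."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_on_periods(text: str) -> List[str]:
    """Hypothetical splitter used only for this illustration."""
    return [part.strip() + "." for part in text.split(".") if part.strip()]


def chunk_documents(
    documents: List[Doc], split_text: Callable[[str], List[str]]
) -> List[Doc]:
    """Split each document's text and wrap every chunk in a new Doc."""
    chunked: List[Doc] = []
    for doc in documents:
        for piece in split_text(doc.text):
            # Assumption: each chunk carries its own copy of the metadata.
            chunked.append(Doc(text=piece, metadata=dict(doc.metadata)))
    return chunked


docs = [Doc("First sentence. Second sentence.", {"source": "a.txt"})]
result = chunk_documents(docs, split_on_periods)
print([d.text for d in result])  # ['First sentence.', 'Second sentence.']
```

Passing the splitter in as a function keeps the sketch independent of any particular chunking strategy.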

from_text(text)

Split text into chunks.

Parameters: text (str) – Input text to split.
Returns: List of text chunks.
Return type: List[str]

Example

chunks = text_chunker.from_text(
    "Pineflow is a data framework to load any data in one line of code and connect with AI applications."
)