Semantic Chunker
- class SemanticChunker
Splits text into chunks at semantic boundaries, using an embedding model to detect where the meaning of the text shifts.
Credit: Greg Kamradt’s notebook, “5 Levels Of Text Splitting”.
- Parameters:
embed_model (BaseEmbedding) – Embedding model used for semantic chunking.
buffer_size (int, optional) – Size of the buffer for semantic chunking. Default is 1.
breakpoint_threshold_amount (int, optional) – Threshold percentage for detecting breakpoints. Default is 95.
device (str, optional) – Device to use for processing. Currently supports “cpu” and “cuda”. Default is “cpu”.
Example
from pineflow.core.text_chunkers import SemanticChunker
from pineflow.embeddings.huggingface import HuggingFaceEmbedding

embedding = HuggingFaceEmbedding()
text_chunker = SemanticChunker(embed_model=embedding)
- from_documents(documents)
Split documents into chunks.
- from_text(text)
Split text into chunks.
- Parameters:
text (str) – Input text to split.
- Returns:
List of text chunks.
- Return type:
List[str]
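To make the roles of buffer_size and breakpoint_threshold_amount more concrete, here is a minimal, standard-library-only sketch of percentile-based semantic splitting in the spirit of the approach credited above. It is not pineflow's implementation: semantic_split, make_toy_embedder, cosine_distance, and percentile are hypothetical names introduced for illustration, the bag-of-words embedder stands in for a real embedding model, and the exact way the library interprets its parameters is assumed rather than confirmed.

```python
import math
import re
from collections import Counter
from typing import Callable, List

Vector = List[float]


def make_toy_embedder(corpus: str) -> Callable[[str], Vector]:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    vocab = sorted(set(re.findall(r"[a-z']+", corpus.lower())))

    def embed(text: str) -> Vector:
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        return [float(counts[word]) for word in vocab]

    return embed


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values: List[float], pct: float) -> float:
    """Linearly interpolated percentile, with pct in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def semantic_split(
    text: str,
    embed: Callable[[str], Vector],
    buffer_size: int = 1,
    breakpoint_threshold_amount: float = 95,
) -> List[str]:
    # Naive sentence splitting on terminal punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) < 2:
        return sentences

    # Assumption: buffer_size is the number of neighboring sentences on each
    # side that are joined to a sentence before it is embedded.
    windows = [
        " ".join(sentences[max(0, i - buffer_size): i + buffer_size + 1])
        for i in range(len(sentences))
    ]
    vectors = [embed(window) for window in windows]

    # Distance between each pair of consecutive windows.
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]

    # Assumption: a breakpoint is any gap whose distance exceeds the given
    # percentile of all gap distances.
    threshold = percentile(distances, breakpoint_threshold_amount)

    chunks, start = [], 0
    for i, distance in enumerate(distances):
        if distance > threshold:
            chunks.append(" ".join(sentences[start: i + 1]))
            start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks


text = (
    "Cats purr softly. Cats sleep all day. Cats chase mice. "
    "Stocks rose sharply. Stocks fell later. Stocks closed flat."
)
chunks = semantic_split(text, make_toy_embedder(text))
for chunk in chunks:
    print(chunk)
```

Under these assumptions, a higher breakpoint_threshold_amount yields fewer, larger chunks, because fewer gaps exceed the threshold. A larger buffer_size smooths the distances between neighboring windows, which makes the split less sensitive to a single unusual sentence.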