Docling

pip install pineflow-readers-docling
class DoclingReader

A document reader that uses the Docling library to extract and structure content from various file types, including PDF, DOCX, and HTML.

For more information, see Docling.

Parameters:
  • detached_tables (bool) – If True, separates extracted tables from the main document text and treats each table as a separate document. Defaults to False.

  • export_table_format (str) – Format used when exporting tables: either “markdown” or “html”. Applies only when detached_tables is True. Defaults to “markdown”.
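The two parameters interact: export_table_format only matters once detached_tables is True. The toy model below is a hypothetical illustration of that documented behavior, not pineflow's implementation. The `Document` dataclass, the `split_documents` helper, the block representation (strings for body text, lists of rows for tables), the metadata keys, and the choice to keep tables inline as markdown when detached_tables is False are all assumptions made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape


# Hypothetical stand-in for pineflow's Document class; the real fields may differ.
@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)


# Assumption: a table is a list of rows of cell strings, header row first.
Table = list[list[str]]


def table_to_markdown(table: Table) -> str:
    """Render a table as a GitHub-style markdown table."""
    header, *rows = table
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def table_to_html(table: Table) -> str:
    """Render a table as a minimal HTML <table>, escaping cell text."""
    header, *rows = table
    head = "".join(f"<th>{escape(cell)}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def split_documents(blocks: list[str | Table], detached_tables: bool = False,
                    export_table_format: str = "markdown") -> list[Document]:
    """Hypothetical model of how detached_tables shapes the returned list."""
    if not detached_tables:
        # Assumption: tables stay inline, in reading order, rendered as markdown.
        # export_table_format is ignored here, as the parameter docs state.
        parts = [b if isinstance(b, str) else table_to_markdown(b) for b in blocks]
        return [Document("\n\n".join(parts))]

    exporters = {"markdown": table_to_markdown, "html": table_to_html}
    if export_table_format not in exporters:
        raise ValueError(
            f"export_table_format must be 'markdown' or 'html', got {export_table_format!r}"
        )
    export = exporters[export_table_format]

    # Main document: body text only, with the tables removed.
    text = "\n\n".join(b for b in blocks if isinstance(b, str))
    docs = [Document(text, {"type": "text"})]
    # One extra Document per table, in the order the tables appear.
    tables = [b for b in blocks if not isinstance(b, str)]
    docs += [Document(export(t), {"type": "table", "table_index": i})
             for i, t in enumerate(tables)]
    return docs


blocks = ["Quarterly summary.", [["region", "sales"], ["EU", "120"]], "End of report."]
print(len(split_documents(blocks)))                        # 1
print(len(split_documents(blocks, detached_tables=True)))  # 2
```

Under this model, detaching tables changes the length of the returned list, so code that consumes the result should not assume a single document per file.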

load_data(input_file)

Loads data from the given input file.

Parameters:
  • input_file (str) – File path to load.

Returns:

A list of Document objects loaded from the file.

Return type:

List[Document]