Quick start

Either read the Installation instructions or run the commands below for Python 3.7 under Linux:

mkdir libpdf_test
cd libpdf_test
python3.7 -m venv .venv
source .venv/bin/activate
pip install "libpdf[tqdm,colorama]"

Then use the command line interface to extract a PDF into a YAML file:

libpdf -o output.yaml -f yaml <path to your PDF>

Run libpdf --help (or -h) to show the help page with all available options.
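
To convert several PDFs in one go, the same command line call can be driven from a small script. The following is a minimal sketch, not part of libpdf: the helper names build_libpdf_command and extract_all are hypothetical, and only the -o and -f options shown above are used. It assumes the libpdf executable is on the PATH of the active virtual environment.

```python
import subprocess
from pathlib import Path


def build_libpdf_command(pdf_path, output_path):
    """Return the argument list for the libpdf CLI call from the quick start."""
    # Hypothetical helper: mirrors `libpdf -o <output> -f yaml <pdf>`.
    return ["libpdf", "-o", str(output_path), "-f", "yaml", str(pdf_path)]


def extract_all(pdf_dir, out_dir):
    """Run libpdf once per PDF in pdf_dir, writing <name>.yaml into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        output_path = out_dir / (pdf_path.stem + ".yaml")
        # check=True raises CalledProcessError if libpdf exits with an error.
        subprocess.run(build_libpdf_command(pdf_path, output_path), check=True)
```

Calling extract_all("pdfs", "yaml_out") would then produce one YAML file per PDF found in the pdfs directory; both directory names are placeholders.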

To use the API, create a new file test.py with the following content:

import logging

import libpdf

# raise the log levels of pdfminer and PIL to WARNING to avoid log spam
logging.getLogger('pdfminer').setLevel(logging.WARNING)
logging.getLogger('PIL').setLevel(logging.WARNING)

# load a PDF with log level set to INFO
objects = libpdf.load('<path to your PDF>', verbose=2)

Run test.py with a debugger of your choice and inspect the objects variable.