bluesearch.entrypoint.database.convert_pdf module

Implementation of the convert-pdf subcommand.

init_parser(parser: argparse.ArgumentParser) → argparse.ArgumentParser

Initialise the argument parser for the convert-pdf subcommand.

Parameters

parser – The argument parser to initialise.

Returns

The initialised argument parser. The same object as the parser argument.

Return type

argparse.ArgumentParser
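
Because the function returns the parser it was given, the call can be wrapped around the creation of a subparser. The following is a minimal sketch of that pattern; the init_parser defined here is a hypothetical stand-in with a single made-up argument, not the real function, and the "demo" program name is also an assumption.

```python
import argparse


def init_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical stand-in: adds one illustrative argument and returns the parser."""
    parser.add_argument("input_path")
    return parser


# A top-level parser with subcommands; "demo" is an illustrative program name.
top = argparse.ArgumentParser(prog="demo")
subparsers = top.add_subparsers(dest="command")

# The returned object is the subparser that was passed in.
created = subparsers.add_parser("convert-pdf")
sub = init_parser(created)

args = top.parse_args(["convert-pdf", "paper.pdf"])
print(args.command, args.input_path)
```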

run(grobid_host: str, grobid_port: int, input_path: pathlib.Path, output_dir: pathlib.Path | None, num_workers, *, force: bool) → int

Run the convert-pdf subcommand.

The names and types of the parameters should match the parser arguments added in init_parser, so that the two functions can be combined like this:

>>> import argparse
>>> from bluesearch.entrypoint.database import convert_pdf
>>> parser = convert_pdf.init_parser(argparse.ArgumentParser())
>>> # argv holds placeholders; replace them with real values
>>> argv = ["host", "port", "pdf_path", "xml_path"]
>>> # then uncomment the next two lines
>>> # args = parser.parse_args(argv)
>>> # convert_pdf.run(**vars(args))

This will run the convert-pdf subcommand implemented here as a standalone application.
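
The matching requirement can be illustrated without a GROBID service. The sketch below uses hypothetical stand-ins for init_parser and run: the argument layout (four positionals plus --num-workers and --force), the defaults, and the sample values are all assumptions, and the real parser may differ. It shows that run(**vars(args)) only works when every parsed name is also a parameter of run.

```python
import argparse
import inspect
import pathlib
from typing import Optional


def init_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical stand-in; the real convert-pdf arguments may differ."""
    parser.add_argument("grobid_host", type=str)
    parser.add_argument("grobid_port", type=int)
    parser.add_argument("input_path", type=pathlib.Path)
    parser.add_argument("output_dir", type=pathlib.Path, nargs="?", default=None)
    parser.add_argument("--num-workers", type=int, default=1)
    parser.add_argument("--force", action="store_true")
    return parser


def run(
    grobid_host: str,
    grobid_port: int,
    input_path: pathlib.Path,
    output_dir: Optional[pathlib.Path],
    num_workers: int,
    *,
    force: bool,
) -> int:
    """Stand-in that only reports its arguments and returns exit code 0."""
    print(
        f"{grobid_host}:{grobid_port} {input_path} -> {output_dir} "
        f"(workers={num_workers}, force={force})"
    )
    return 0


parser = init_parser(argparse.ArgumentParser())
# Illustrative values only.
args = parser.parse_args(["localhost", "8070", "paper.pdf", "xml_out", "--force"])

# A parsed name that run does not accept would make run(**vars(args)) raise TypeError.
assert set(vars(args)) == set(inspect.signature(run).parameters)

exit_code = run(**vars(args))
```

Note that argparse converts the option --num-workers to the attribute num_workers, which is what lets it line up with the keyword parameter of the same name.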

Parameters
  • grobid_host – The host of the GROBID service.

  • grobid_port – The port of the GROBID service.

  • input_path – The path to the input PDF file or a directory with PDF files.

  • output_dir – The output directory for the XML files.

  • num_workers – The number of parallel workers.

  • force – If true, overwrite an output file that already exists.

Returns

The exit code of the command.

Return type

int
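
How input_path, output_dir and force might interact can be sketched as follows. This is not the module's implementation: the helper plan_conversions is hypothetical, and the ".xml" naming, the "*.pdf" glob, and the fallback to the PDF's own directory when output_dir is None are all assumptions made for illustration.

```python
import pathlib
import tempfile
from typing import List, Optional, Tuple


def plan_conversions(
    input_path: pathlib.Path,
    output_dir: Optional[pathlib.Path],
    *,
    force: bool,
) -> List[Tuple[pathlib.Path, pathlib.Path]]:
    """Hypothetical helper: pair each PDF with the XML file it would produce."""
    if input_path.is_file():
        pdfs = [input_path]
    elif input_path.is_dir():
        pdfs = sorted(input_path.glob("*.pdf"))
    else:
        raise FileNotFoundError(input_path)

    plan = []
    for pdf in pdfs:
        # Assumption: without an output directory, write next to the PDF.
        target_dir = output_dir if output_dir is not None else pdf.parent
        target = target_dir / (pdf.stem + ".xml")
        # Existing outputs are skipped unless force is set.
        if target.exists() and not force:
            continue
        plan.append((pdf, target))
    return plan


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    out = root / "out"
    out.mkdir()
    (root / "a.pdf").touch()
    (root / "b.pdf").touch()
    (out / "a.xml").touch()  # pretend a.pdf was converted earlier

    skipped_existing = plan_conversions(root, out, force=False)
    overwrite_all = plan_conversions(root, out, force=True)
    single_names = [p.name for p, _ in plan_conversions(root / "b.pdf", None, force=False)]
    print(len(skipped_existing), len(overwrite_all))
```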