bluesearch.entrypoint.database.convert_pdf module
Implementation of the convert-pdf subcommand.
- init_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser
Initialise the argument parser for the convert-pdf subcommand.
- Parameters
parser – The argument parser to initialise.
- Returns
The initialised argument parser. This is the same object as the parser argument.
- Return type
argparse.ArgumentParser
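Because init_parser returns the same object it receives, it can be applied directly to a subparser created by a top-level command-line tool. The following is a minimal sketch of that pattern, not the bluesearch implementation: the init_parser body, the `tool` program name, and the single `input_path` argument are hypothetical stand-ins chosen for illustration.

```python
import argparse


def init_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical stand-in: configure the parser in place and return it."""
    parser.description = "Convert PDF files to XML."
    # Assumed argument, for illustration only.
    parser.add_argument("input_path")
    return parser


# Attach the subcommand to a top-level parser.
top = argparse.ArgumentParser(prog="tool")
subparsers = top.add_subparsers(dest="command", required=True)
sub = subparsers.add_parser("convert-pdf")
returned = init_parser(sub)

# The returned parser is the very object that was passed in.
same_object = returned is sub

# The top-level parser now understands the subcommand and its arguments.
args = top.parse_args(["convert-pdf", "paper.pdf"])
```

Returning the parser is a convenience: the caller can either ignore the return value or chain the call, as in `init_parser(argparse.ArgumentParser())`.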
- run(grobid_host: str, grobid_port: int, input_path: pathlib.Path, output_dir: pathlib.Path | None, num_workers, *, force: bool) -> int
Run the convert-pdf subcommand.
Note that the names and types of the parameters should match the parser arguments added in init_parser. The purpose of the matching is to be able to combine the functions in this way:

>>> import argparse
>>> from bluesearch.entrypoint.database import convert_pdf
>>> parser = convert_pdf.init_parser(argparse.ArgumentParser())
>>> # replace with true values and uncomment
>>> argv = ["host", "port", "pdf_path", "xml_path"]
>>> # args = parser.parse_args(argv)
>>> # convert_pdf.run(**vars(args))

This will run the convert-pdf subcommand implemented here as a standalone application.
- Parameters
grobid_host – The host of the GROBID service.
grobid_port – The port of the GROBID service.
input_path – The path to the input PDF file or a directory with PDF files.
output_dir – The output directory for the XML files.
num_workers – The number of parallel workers.
force – If true, overwrite the output file if it already exists.
- Returns
The exit code of the command.
- Return type
int
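The requirement that parameter names match the parser arguments can be illustrated without a GROBID service. The sketch below is an assumption-laden stand-in, not the library code: the positional and optional arguments, their defaults, the example values (`localhost`, `8070`, `paper.pdf`, `out`), and the body of run are all hypothetical. Only the parameter names and the keyword-only force are taken from the signature documented above.

```python
import argparse
import pathlib
from typing import Optional


def init_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical parser whose destinations mirror the parameters of run()."""
    parser.add_argument("grobid_host", type=str)
    parser.add_argument("grobid_port", type=int)
    parser.add_argument("input_path", type=pathlib.Path)
    # Assumed to be an optional positional; None when omitted.
    parser.add_argument("output_dir", type=pathlib.Path, nargs="?", default=None)
    # "--num-workers" is stored under the destination "num_workers".
    parser.add_argument("--num-workers", type=int, default=1)
    parser.add_argument("--force", action="store_true")
    return parser


def run(
    grobid_host: str,
    grobid_port: int,
    input_path: pathlib.Path,
    output_dir: Optional[pathlib.Path],
    num_workers: int,
    *,
    force: bool,
) -> int:
    """Stand-in for the subcommand: report the request and return an exit code."""
    print(f"would convert {input_path} via {grobid_host}:{grobid_port}")
    print(f"output_dir={output_dir}, num_workers={num_workers}, force={force}")
    return 0


parser = init_parser(argparse.ArgumentParser(prog="convert-pdf"))
args = parser.parse_args(["localhost", "8070", "paper.pdf", "out", "--force"])

# vars() turns the namespace into a dict keyed by argument destination,
# so it can be unpacked straight into run() when the names match.
kwargs = vars(args)
exit_code = run(**kwargs)
```

If a parser argument were renamed without renaming the corresponding parameter, the `run(**kwargs)` call would raise a TypeError, which is why the two functions need to be kept in sync. In a standalone script the returned integer would typically be passed to `sys.exit`.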