bluesearch.entrypoint.database.topic_filter module

Filter articles with relevant topics.

filter_topics(topic_infos: list[TopicInfo], topic_rules_accept: list[TopicRule], topic_rules_reject: list[TopicRule]) → pd.DataFrame

Filter topics.

Parameters
  • topic_infos – List of TopicInfo.

  • topic_rules_accept – List of acceptance TopicRule.

  • topic_rules_reject – List of rejection TopicRule.

Returns

DataFrame containing all the topic info, together with whether each entry is accepted or not.

Return type

pd.DataFrame

init_parser(parser: argparse.ArgumentParser) → argparse.ArgumentParser

Initialise the argument parser for the topic-filter subcommand.

Parameters

parser – The argument parser to initialise.

Returns

The initialised argument parser. The same object as the parser argument.

Return type

argparse.ArgumentParser
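
The contract described above (populate the given parser, then return that same object) can be illustrated with a minimal sketch. This is not the package's actual implementation: init_parser_sketch is a hypothetical name, the three argument names are borrowed from the signature of run, and making them positional, the help texts, and the file names are all assumptions.

```python
import argparse
import pathlib


def init_parser_sketch(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Add topic-filter style arguments to ``parser`` and return it."""
    # Argument names mirror the parameters of run(). Declaring them as
    # positional arguments and the help texts are assumptions of this sketch.
    parser.add_argument(
        "extracted_topics", type=pathlib.Path, help="File with extracted topics."
    )
    parser.add_argument(
        "filter_config", type=pathlib.Path, help="File with the topic rules."
    )
    parser.add_argument(
        "output_file", type=pathlib.Path, help="Where to write the result."
    )
    # Return the very same object that was passed in, not a copy.
    return parser


parser = argparse.ArgumentParser(prog="topic-filter")
returned = init_parser_sketch(parser)

# Hypothetical file names, used only to show that values arrive as Paths.
args = parser.parse_args(["topics.jsonl", "rules.yaml", "filtered.csv"])
```

Returning the parser that was passed in lets a caller hand over a freshly created subparser and keep using the returned reference interchangeably with the original.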

parse_filter_config(config: list[dict]) → tuple[list[TopicRule], list[TopicRule]]

Parse filter configuration.

Parameters

config – Topic rules configuration.

Returns

  • topic_rules_accept (list[TopicRule]) – List of acceptance TopicRule

  • topic_rules_reject (list[TopicRule]) – List of rejection TopicRule

Raises

ValueError – If any label value is neither accept nor reject.
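
The behaviour documented above (split the entries by label, raise ValueError for anything else) might look roughly like the sketch below. It is an illustration under stated assumptions, not the library's code: RuleSketch is a hypothetical stand-in for TopicRule, and the "label" and "pattern" keys as well as the example values are assumed for demonstration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RuleSketch:
    """Hypothetical stand-in for TopicRule; its single field is an assumption."""

    pattern: str


def parse_filter_config_sketch(
    config: list[dict],
) -> tuple[list[RuleSketch], list[RuleSketch]]:
    """Split configuration entries into acceptance and rejection rules."""
    accept: list[RuleSketch] = []
    reject: list[RuleSketch] = []
    for entry in config:
        # The "label" and "pattern" key names are assumed for this sketch.
        label = entry.get("label")
        rule = RuleSketch(pattern=entry.get("pattern", ""))
        if label == "accept":
            accept.append(rule)
        elif label == "reject":
            reject.append(rule)
        else:
            # Any other label (including a missing one) is an error.
            raise ValueError(f"Unsupported label: {label!r}")
    return accept, reject


# Made-up configuration entries.
example_config = [
    {"pattern": "neuro", "label": "accept"},
    {"pattern": "covid", "label": "reject"},
]
rules_accept, rules_reject = parse_filter_config_sketch(example_config)
```

Failing on the first unknown label, rather than silently skipping the entry, surfaces a typo in the configuration before any filtering takes place.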

run(extracted_topics: pathlib.Path, filter_config: pathlib.Path, output_file: pathlib.Path) → int

Filter articles containing relevant topics.

Parameter descriptions and potential defaults are documented in the init_parser function.