bluesearch.database.topic module
Utilities for journal and article topics.
- extract_article_topics_for_pubmed_article(xml_article: Element) → list[str] | None
Extract article topics of a PubMed article.
- Parameters
xml_article – Parsed XML element of the article from which to extract journal and article topics.
- Returns
article_topics – Article topics extracted for the given article.
- Return type
list[str] | None
- extract_article_topics_from_medrxiv_article(path: pathlib.Path | str) → tuple[str, str]
Extract the topic of a medRxiv/bioRxiv article.
A .meca file should always have a fixed structure: it contains a folder named content, and inside that folder there should be a single .xml file holding the text and the metadata of the article.
- Parameters
path – Path to a .meca file (which is simply a zip archive) with a fixed structure.
- Returns
topic (str) – The subject area of the article.
journal (str) – The journal the article was published in. Should be either “medRxiv” or “bioRxiv”.
- Raises
ValueError – If the appropriate XML file is not found, or if the journal or the topic is missing.
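The archive layout described above can be illustrated with a minimal sketch. It is not the bluesearch implementation: the helper name read_single_content_xml, the file names, and the toy XML tags are all assumptions made for illustration, and real medRxiv/bioRxiv XML is considerably richer.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_single_content_xml(meca: zipfile.ZipFile) -> ET.Element:
    """Return the parsed root of the only XML file under content/.

    Hypothetical helper, not part of bluesearch.
    """
    xml_names = [
        name
        for name in meca.namelist()
        if name.startswith("content/") and name.endswith(".xml")
    ]
    if len(xml_names) != 1:
        raise ValueError(
            f"Expected exactly one XML file in content/, found {len(xml_names)}"
        )
    with meca.open(xml_names[0]) as f:
        return ET.parse(f).getroot()


# Build a tiny in-memory zip archive that mimics the described layout.
# The tag names below are toy placeholders, not the real article schema.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("manifest.xml", "<manifest/>")
    zf.writestr(
        "content/12345.xml",
        "<article><journal-title>medRxiv</journal-title>"
        "<subject>Neurology</subject></article>",
    )

with zipfile.ZipFile(buffer) as meca:
    root = read_single_content_xml(meca)

journal = root.findtext("journal-title")
topic = root.findtext("subject")
```

Raising ValueError when the archive does not contain exactly one XML file mirrors the documented behaviour for a missing XML file, although the actual checks in the library may differ.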
- extract_journal_topics_for_pubmed_article(xml_article: Element) → list[str] | None
Extract journal topics of a PubMed article.
- Parameters
xml_article – Parsed XML element of the article from which to extract journal and article topics.
- Returns
journal_topics – Journal topics extracted for the given article.
- Return type
list[str] | None
- extract_pubmed_id_from_pmc_file(path: str | pathlib.Path) → str | None
Retrieve the PubMed ID from a PMC XML file.
- Parameters
path – Path to the PMC XML file.
- Returns
pubmed_id – PubMed ID of the given article.
- Return type
str | None
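As a rough sketch of how a PubMed ID can be located in PMC-style XML: PMC articles use the JATS format, where identifiers appear as article-id elements with a pub-id-type attribute. The helper find_pmid and the sample document below are assumptions for illustration. The library function takes a file path and may look up the ID differently.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-written sample in the style of a PMC (JATS) article header.
SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmc">7654321</article-id>
      <article-id pub-id-type="pmid">12345678</article-id>
    </article-meta>
  </front>
</article>"""


def find_pmid(root: ET.Element) -> Optional[str]:
    """Return the text of the first pmid article-id, or None if absent.

    Hypothetical helper, not part of bluesearch.
    """
    node = root.find(".//article-id[@pub-id-type='pmid']")
    return None if node is None else node.text


pmid = find_pmid(ET.fromstring(SAMPLE))
missing = find_pmid(ET.fromstring("<article><front/></article>"))
```

Returning None when no matching element exists is consistent with the str | None annotation in the signature.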
- get_topics_for_arxiv_articles(arxiv_paths: Iterable[pathlib.Path | str], batch_size: int = 400) → dict[pathlib.Path, list[str]]
Extract journal topics of one or more arXiv articles.
- Parameters
arxiv_paths – Full paths to the arXiv articles to consider.
batch_size – Metadata are retrieved using the arXiv API [1] in batches of size batch_size. Large batch sizes may create long request URLs that cause the arXiv API to fail.
- Returns
article_topics – Maps each of the paths to a list of the corresponding arXiv article topics. See [2] for an explanation of the arXiv topic taxonomy.
- Return type
dict[pathlib.Path, list[str]]
- Raises
ValueError – If the arXiv API does not return the expected number of metadata entries.
References
[1] https://arxiv.org/help/api/user-manual
[2] https://arxiv.org/category_taxonomy
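The effect of batch_size on request URLs can be sketched as follows. This is an illustration only: the helpers batched and build_query_url are hypothetical, the article IDs are made up, and the sketch assumes the IDs have already been derived from the paths. It relies on the id_list and max_results query parameters of the arXiv API and sends no request.

```python
from itertools import islice
from typing import Iterable, Iterator
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def batched(items: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def build_query_url(arxiv_ids: list[str]) -> str:
    """Build one metadata request URL for a batch of arXiv IDs (hypothetical helper)."""
    params = {"id_list": ",".join(arxiv_ids), "max_results": len(arxiv_ids)}
    return f"{ARXIV_API}?{urlencode(params)}"


# 1000 made-up IDs split into batches of 400 give 3 requests (400, 400, 200).
ids = [f"2101.{i:05d}" for i in range(1000)]
urls = [build_query_url(batch) for batch in batched(ids, 400)]
longest_url = max(len(url) for url in urls)
```

Because every ID is appended to the query string, the URL length grows linearly with the batch size, which is why very large batches risk failing.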
- get_topics_for_pmc_article(pmc_path: pathlib.Path | str) → list[str] | None
Extract journal topics of a PMC article.
- Parameters
pmc_path – Path to the PMC article to consider.
- Returns
journal_topics – Journal topics for the given article.
- Return type
list[str] | None
- request_mesh_from_nlm_ta(nlm_ta: str) → list[dict] | None
Retrieve the Medical Subject Headings (MeSH) of a journal from its NLM Title Abbreviation.
- Parameters
nlm_ta – NLM Title Abbreviation of the journal.
- Returns
meshs – List containing all Medical Subject Headings of the journal.
- Return type
list[dict] | None
References
https://www.ncbi.nlm.nih.gov/books/NBK3799/#catalog.Title_Abbreviation_ta
- request_mesh_from_pubmed_id(pubmed_ids: Iterable[str]) → dict
Retrieve Medical Subject Headings from PubMed IDs.
- Parameters
pubmed_ids – PubMed IDs to look up.
- Returns
pubmed_to_meshs – Dictionary with PubMed IDs as keys and the corresponding lists of Medical Subject Headings as values.
- Return type
dict
References
https://dataguide.nlm.nih.gov/eutilities/utilities.html#efetch
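A minimal sketch of the kind of mapping described above, assuming the EFetch utility is queried with db=pubmed and retmode=xml. The helpers build_efetch_url and parse_mesh are hypothetical, the response is a hand-written fragment in the PubMed XML style, and the sketch returns plain descriptor names. The value format actually returned by the library may be richer.

```python
import xml.etree.ElementTree as ET
from typing import Iterable
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pubmed_ids: Iterable[str]) -> str:
    """Build a single EFetch URL for several PubMed IDs (hypothetical helper)."""
    params = {"db": "pubmed", "id": ",".join(pubmed_ids), "retmode": "xml"}
    return f"{EFETCH}?{urlencode(params)}"


# Hand-written fragment imitating an EFetch response; not real data.
SAMPLE_RESPONSE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>111</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Neurons</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Brain</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation><PMID>222</PMID></MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_mesh(response_xml: str) -> dict[str, list[str]]:
    """Map each PMID to its MeSH descriptor names (hypothetical helper)."""
    result: dict[str, list[str]] = {}
    for citation in ET.fromstring(response_xml).iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        result[pmid] = [d.text for d in citation.iter("DescriptorName")]
    return result


url = build_efetch_url(["111", "222"])
pubmed_to_meshs = parse_mesh(SAMPLE_RESPONSE)
```

Articles without a MeshHeadingList simply map to an empty list in this sketch.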