bluesearch.widgets package

Submodules

Module contents

Various widgets related to the BBS.

class ArticleSaver(connection)

Bases: object

Keeps track of selected articles.

This class can be used to save a number of articles and paragraphs for later use. A typical use case is to keep track of the items selected in the search widget and to retrieve them later in the mining widget.

In addition, this class can print a summary of all selected items (summary_table), resolve all items into paragraphs with their section names and collect them in a pandas data frame (get_chosen_texts), and export a PDF report of all saved items (make_report).

Parameters: connection (sqlalchemy.engine.Engine) – An SQL database connectable compatible with pandas.read_sql. The database is expected to have paragraphs and articles tables.

connection

An SQL database connectable compatible with pandas.read_sql. The database is expected to have paragraphs and articles tables.

Type: sqlalchemy.engine.Engine

state

The state that keeps track of saved items. It is a set of tuples of the form (article_id, paragraph_id), each representing one saved item. An item with paragraph_id = -1 indicates that the whole article is saved.

Type: set
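The state representation described above can be illustrated with a plain Python set. This is a minimal sketch and not the class's actual implementation; the constant name WHOLE_ARTICLE and the two helper functions are hypothetical and exist only for this illustration.

```python
# Sentinel paragraph ID meaning "the whole article", as described above.
WHOLE_ARTICLE = -1  # hypothetical constant name

state = set()


def add_article(state, article_id):
    """Mark a whole article as saved (illustrative helper)."""
    state.add((article_id, WHOLE_ARTICLE))


def add_paragraph(state, article_id, paragraph_id):
    """Mark a single paragraph as saved (illustrative helper)."""
    state.add((article_id, paragraph_id))


add_article(state, 7)
add_paragraph(state, 3, 12)
add_paragraph(state, 3, 12)  # adding the same item twice has no effect

print(sorted(state))  # [(3, 12), (7, -1)]
```

Because the state is a set, saving the same item repeatedly is harmless, and membership checks such as `(3, 12) in state` are cheap.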

state_hash

A hash uniquely identifying a certain state. This is used to cache df_chosen_texts and avoid recomputing it if the state has not changed.

Type: int or None
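One way such hash-based caching could work is sketched below. The class CachedView, its attribute names, and the use of hash(frozenset(...)) are assumptions made for illustration; the source does not state how ArticleSaver computes its hash.

```python
class CachedView:
    """Toy stand-in showing hash-based caching; not the ArticleSaver code."""

    def __init__(self):
        self.state = set()
        self.state_hash = None   # None until the first computation
        self.cached = None
        self.n_computations = 0  # only here to make the caching visible

    def _expensive(self):
        # Placeholder for the costly step (e.g. querying a database).
        self.n_computations += 1
        return sorted(self.state)

    def get(self):
        # A frozenset is hashable and independent of insertion order.
        new_hash = hash(frozenset(self.state))
        if self.state_hash != new_hash:
            self.cached = self._expensive()
            self.state_hash = new_hash
        return self.cached


view = CachedView()
view.state.add((1, -1))
view.get()
view.get()               # state unchanged: the cached result is reused
view.state.add((2, 5))
view.get()               # state changed: the result is recomputed
print(view.n_computations)  # 2
```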

df_chosen_texts

The rows represent different paragraphs and the columns are ‘article_id’, ‘section_name’, ‘paragraph_id’, ‘text’.

Type: pd.DataFrame

add_article(article_id)

Save an article.

Parameters: article_id (int) – The article ID.

add_paragraph(article_id, paragraph_pos_in_article)

Save a paragraph.

Parameters

article_id (int) – The article ID.
paragraph_pos_in_article (int) – The paragraph ID.

get_chosen_texts()

Retrieve the currently saved items.

Articles that were saved in their entirety are first resolved into their corresponding paragraphs.

Returns: df_chosen_texts
Return type: pandas.DataFrame
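The resolution step can be sketched with the standard-library sqlite3 module standing in for the real database. The table layout, the column name paragraph_pos_in_article, the sample rows, and the function resolve are all assumptions for illustration; the actual method returns a pandas data frame and may query the database differently.

```python
import sqlite3

# In-memory database with a hypothetical "paragraphs" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paragraphs ("
    "article_id INTEGER, paragraph_pos_in_article INTEGER, "
    "section_name TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO paragraphs VALUES (?, ?, ?, ?)",
    [
        (1, 0, "Abstract", "First paragraph of article 1."),
        (1, 1, "Methods", "Second paragraph of article 1."),
        (2, 0, "Abstract", "First paragraph of article 2."),
        (2, 1, "Results", "Second paragraph of article 2."),
    ],
)

SELECT = (
    "SELECT article_id, section_name, paragraph_pos_in_article, text "
    "FROM paragraphs WHERE article_id = ?"
)


def resolve(state, conn):
    """Expand whole-article items (paragraph ID -1) into one row per paragraph."""
    rows = set()  # a set removes paragraphs that were selected twice
    for article_id, paragraph_id in state:
        if paragraph_id == -1:
            cursor = conn.execute(SELECT, (article_id,))
        else:
            cursor = conn.execute(
                SELECT + " AND paragraph_pos_in_article = ?",
                (article_id, paragraph_id),
            )
        rows.update(cursor.fetchall())
    # Order by article, then by position within the article.
    return sorted(rows, key=lambda row: (row[0], row[2]))


chosen = resolve({(1, -1), (2, 1)}, conn)
for row in chosen:
    print(row)
```

Here article 1 was saved as a whole and contributes both of its paragraphs, while article 2 contributes only the single paragraph that was saved.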

get_saved_items()

Retrieve the saved items that summarize the user's choices.

Returns: identifiers – Tuples (article_id, paragraph_pos_in_article) chosen by the user.
Return type: list of tuple

has_article(article_id)

Check if an article has been saved.

Parameters: article_id (int) – The article ID.
Returns: result – Whether or not the given article has been saved.
Return type: bool

has_paragraph(article_id, paragraph_pos_in_article)

Check if a paragraph has been saved.

Parameters

article_id (int) – The article ID.
paragraph_pos_in_article (int) – The paragraph ID.

Returns

result – Whether or not the given paragraph has been saved.

Return type

bool

make_report(output_dir=None)

Create the saved articles report.

Parameters: output_dir (str or pathlib.Path) – The directory for writing the report.
Returns: output_file_path – The file to which the report was written.
Return type: pathlib.Path

remove_all()

Remove all saved items.

remove_article(article_id)

Remove an article from the saved items.

Parameters: article_id (int) – The article ID.

remove_paragraph(article_id, paragraph_pos_in_article)

Remove a paragraph from the saved items.

Parameters

article_id (int) – The article ID.
paragraph_pos_in_article (int) – The paragraph ID.

summary_table()

Create a dataframe table with saved articles.

Returns: table – DataFrame containing all the paragraphs seen and the choice made for each.
Return type: pd.DataFrame

class MiningSchema

Bases: object

The mining schema for the mining widget.

add_entity(entity_type, property_name=None, property_type=None, property_value_type=None, ontology_source=None)

Add a new entity to the schema.

A warning is issued for duplicate entities.

Parameters

entity_type (str) – The entity type, for example “CHEMICAL”.
property_name (str, optional) – The property name, for example “isChiral”.
property_type (str, optional) – The property type, for example “ATTRIBUTE”.
property_value_type (str, optional) – The property value type, for example “BOOLEAN”.
ontology_source (str, optional) – The ontology source, for example “NCIT”.
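The duplicate check can be illustrated with a toy schema built on the standard warnings module. ToySchema is hypothetical and is not the MiningSchema implementation; in particular, skipping the duplicate after warning is an assumption, since the text above only says that a warning is issued.

```python
import warnings


class ToySchema:
    """Illustrative stand-in for a mining schema; not the library class."""

    def __init__(self):
        self.rows = []

    def add_entity(
        self,
        entity_type,
        property_name=None,
        property_type=None,
        property_value_type=None,
        ontology_source=None,
    ):
        row = (
            entity_type,
            property_name,
            property_type,
            property_value_type,
            ontology_source,
        )
        if row in self.rows:
            # Assumption: the duplicate is reported and then skipped.
            warnings.warn(f"Duplicate entity: {row}")
            return
        self.rows.append(row)


schema = ToySchema()
schema.add_entity("CHEMICAL")
schema.add_entity("CHEMICAL", "isChiral", "ATTRIBUTE", "BOOLEAN", "NCIT")

# Record warnings so the duplicate can be observed programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    schema.add_entity("CHEMICAL")  # same as the first entry

print(len(schema.rows), len(caught))  # 2 1
```

Note that the two "CHEMICAL" rows with different properties are distinct entries; only an exact repeat of all five fields counts as a duplicate in this sketch.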

add_from_df(entity_df)

Add entities from a given dataframe.

The data frame has to contain a column named “entity_type”. Any columns matching the schema columns will be processed; all other columns will be ignored.

Parameters: entity_df (pd.DataFrame) – The dataframe with new entities.

property df

Get a dataframe with all entities.

Returns: schema_df – The dataframe with all entities.
Return type: pd.DataFrame

class MiningWidget(**kwargs)

Bases: ipywidgets.widgets.widget_box.VBox

The mining widget.

Parameters

mining_server_url (str) – The URL of the mining server.
mining_schema (bluesearch.widgets.MiningSchema) – The requested mining schema (entity, relation, attribute types).
article_saver (bluesearch.widgets.ArticleSaver) – An instance of the article saver.
default_text (str, optional) – The default text assigned to the text area.
use_cache (bool) – If True the mining server will use cached mining results stored in an SQL database. This should lead to major speedups.
checkpoint_path (str or pathlib.Path, optional) – Path where checkpoints are saved to and loaded from. If None, defaults to ~/.cache/bluesearch/widgets_checkpoints folder.

get_extracted_table()

Retrieve the table with the mining results.

Returns: results_table – The table with the mining results.
Return type: pandas.DataFrame

textmining_pipeline(information, schema_df, debug=False)

Handle text mining server requests depending on the type of information.

Parameters

information (str or list) – Either a raw text string or a list of tuples (article_id, paragraph_id) referring to the database.
schema_df (pd.DataFrame) – A dataframe with the requested mining schema (entity, relation, attribute types).
debug (bool) – If True, the columns do not necessarily match the specification, but they contain debugging information. If False, the columns match the specification exactly.

Returns

table_extractions – The final table. If debug=True then it contains all the metadata. If debug=False then it only contains columns in the official specification.

Return type

pd.DataFrame
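The two accepted forms of the information argument suggest a type-based dispatch. The sketch below shows only that dispatch; the function classify_information and its return labels are hypothetical, and the actual method goes on to send a request to the mining server.

```python
def classify_information(information):
    """Return which kind of request the input would lead to (illustrative)."""
    if isinstance(information, str):
        return "text"  # raw text to be mined directly
    if isinstance(information, list):
        for item in information:
            # Each item is expected to be (article_id, paragraph_id).
            if not (isinstance(item, tuple) and len(item) == 2):
                raise TypeError(f"Expected a 2-tuple, got {item!r}")
        return "database"  # identifiers to be looked up in the database
    raise TypeError("information must be a str or a list of tuples")


print(classify_information("Glucose is a simple sugar."))  # text
print(classify_information([(1, -1), (2, 5)]))             # database
```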

class SearchWidget(**kwargs)

Bases: ipywidgets.widgets.widget_box.VBox

Widget for search engine.

Parameters

bbs_search_url (str) – The URL of the bbs_search server.
bbs_mysql_engine (sqlalchemy.engine.Engine) – Engine for connections to the bbs_mysql server.
article_saver (bluesearch.widgets.ArticleSaver, optional) – If specified, this article saver will keep all the article_id of interest for the user during the different queries.
results_per_page (int, optional) – The number of results to display per results page.
checkpoint_path (str or pathlib.Path, optional) – Path where checkpoints are saved to and loaded from. If None, defaults to ~/.cache/bluesearch/widgets_checkpoints.

static highlight_in_paragraph(paragraph, sentence)

Highlight a given sentence in the paragraph.

Parameters

paragraph (str) – The paragraph in which to highlight the sentence.
sentence (str) – The sentence to highlight.

Returns

formatted_paragraph – The paragraph containing sentence, with the sentence highlighted in color.

Return type

str
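A minimal sketch of sentence highlighting is given below. The function name highlight_sketch, the HTML markup, the default color, and the escaping of the surrounding text are assumptions for illustration; the widget's actual output format is not specified here.

```python
import html


def highlight_sketch(paragraph, sentence, color="#0000FF"):
    """Wrap the first occurrence of `sentence` in colored HTML markup."""
    start = paragraph.find(sentence)
    if start == -1:
        # Sentence not found: return the paragraph without highlighting.
        return html.escape(paragraph)
    end = start + len(sentence)
    return (
        html.escape(paragraph[:start])
        + f'<b style="color:{color}">{html.escape(sentence)}</b>'
        + html.escape(paragraph[end:])
    )


print(highlight_sketch("A cat. A dog.", "A dog."))
# A cat. <b style="color:#0000FF">A dog.</b>
```

Escaping the text before inserting markup keeps characters such as `<` in the paragraph from being interpreted as HTML when the result is rendered.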

print_single_result(result_info, print_whole_paragraph)

Retrieve the metadata and complete the report with an HTML string, given a sentence_id.

Parameters

result_info (dict) – The information for a single result obtained by calling _fetch_result_info.
print_whole_paragraph (bool) – If True, the whole paragraph will be displayed in the results of the widget.

Returns

article_metadata (str) – Formatted string containing the metadata of the article.
formatted_output (str) – Formatted output of the sentence.

saved_results()

Get all search results that were flagged for saving.

Returns: saved_items_df – A data frame with all saved search results.
Return type: pd.DataFrame

set_page(new_page, force=False)

Go to a given page in the results view.

Parameters

new_page (int) – The new page number to go to.
force (bool) – By default, if new_page is the same as the page currently viewed, the page is not reloaded. To reload the page, set this parameter to True. This is useful when new results have been fetched and the view needs to be updated.
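The effect of the force parameter can be sketched with a small pager class. The class Pager, zero-based page numbers, the clamping of out-of-range pages, and the render counter are all assumptions made for this illustration and are not taken from the widget's code.

```python
class Pager:
    """Toy pager illustrating the reload guard; not the SearchWidget code."""

    def __init__(self, n_results, results_per_page):
        # Ceiling division, with at least one (possibly empty) page.
        self.n_pages = max(1, -(-n_results // results_per_page))
        self.current_page = -1  # nothing displayed yet
        self.n_renders = 0      # counts how often the view was rebuilt

    def set_page(self, new_page, force=False):
        # Assumption: out-of-range page numbers are clamped.
        new_page = max(0, min(new_page, self.n_pages - 1))
        if new_page == self.current_page and not force:
            return  # same page and no forced reload: nothing to do
        self.current_page = new_page
        self.n_renders += 1


pager = Pager(n_results=25, results_per_page=10)
pager.set_page(0)              # first display
pager.set_page(0)              # ignored: already on page 0
pager.set_page(0, force=True)  # reloaded, e.g. after fetching new results
pager.set_page(10)             # clamped to the last page (index 2)
print(pager.n_renders, pager.current_page)  # 3 2
```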