bluesearch.widgets package

Submodules

Module contents

Various widgets related to the BBS.

class ArticleSaver(connection)[source]

Bases: object

Keeps track of selected articles.

This class can be used to save a number of articles and paragraphs for later use. A typical use case is to keep track of the items selected in the search widget and to retrieve them later in the mining widget.

Furthermore, this class can print a summary of all selected items using the summary_table method, resolve all items into paragraphs with the corresponding section name and summarize them in a pandas data frame using the get_chosen_texts method, and export a PDF report of all saved items using the make_report method.

Parameters

connection (sqlalchemy.engine.Engine) – An SQL database connectable compatible with pandas.read_sql. The database is supposed to have paragraphs and articles tables.

connection

An SQL database connectable compatible with pandas.read_sql. The database is supposed to have paragraphs and articles tables.

Type

sqlalchemy.engine.Engine

state

The state that keeps track of saved items. It is a set of tuples of the form (article_id, paragraph_id), each representing one saved item. An item with paragraph_id = -1 indicates that the whole article is saved.

Type

set
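The layout of the state described above can be illustrated with a plain Python set. This is a minimal sketch, not the bluesearch implementation: the constant WHOLE_ARTICLE and the two helper functions are hypothetical names introduced only for this example.

```python
# Hypothetical stand-in for the documented state; not the bluesearch code.
WHOLE_ARTICLE = -1  # paragraph_id value that marks an entire article

state = set()
state.add((101, WHOLE_ARTICLE))  # save all of article 101
state.add((202, 3))              # save a single paragraph of article 202
state.add((202, 3))              # adding the same item again changes nothing


def article_is_saved(state, article_id):
    """Return True if the whole article was saved."""
    return (article_id, WHOLE_ARTICLE) in state


def paragraph_is_saved(state, article_id, paragraph_id):
    """Return True if this specific paragraph was saved."""
    return (article_id, paragraph_id) in state
```

Because the state is a set, saving the same item twice leaves a single entry.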

state_hash

A hash uniquely identifying a certain state. This is used to cache df_chosen_texts and avoid recomputing it if the state has not changed.

Type

int or None
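The caching idea behind state_hash can be sketched as follows. The class CachedResolver, its attributes, and the use of hash(frozenset(...)) are assumptions made for illustration; how ArticleSaver actually computes the hash is not stated here.

```python
class CachedResolver:
    """Toy cache keyed on a hash of the state (illustrative only)."""

    def __init__(self):
        self.state = set()
        self.state_hash = None  # None means nothing has been cached yet
        self.cached = None
        self.n_computations = 0

    def _resolve(self):
        # Stand-in for the expensive step, e.g. a database lookup.
        self.n_computations += 1
        return sorted(self.state)

    def get(self):
        # Recompute only if the state differs from the one that was cached.
        new_hash = hash(frozenset(self.state))
        if self.state_hash != new_hash:
            self.cached = self._resolve()
            self.state_hash = new_hash
        return self.cached
```

Calling get twice without modifying the state triggers only one computation; adding an item invalidates the cached result.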

df_chosen_texts

The rows represent different paragraphs and the columns are ‘article_id’, ‘section_name’, ‘paragraph_id’, ‘text’.

Type

pd.DataFrame

add_article(article_id)[source]

Save an article.

Parameters

article_id (int) – The article ID.

add_paragraph(article_id, paragraph_pos_in_article)[source]

Save a paragraph.

Parameters
  • article_id (int) – The article ID.

  • paragraph_pos_in_article (int) – The paragraph ID.

get_chosen_texts()[source]

Retrieve the currently saved items.

For all entire articles that are saved the corresponding paragraphs are resolved first.

Returns

df_chosen_texts

Return type

pandas.DataFrame

get_saved_items()[source]

Retrieve the saved items that summarize the choice of the users.

Returns

identifiers – Tuples of the form (article_id, paragraph_pos_in_article) chosen by the user.

Return type

list of tuple

has_article(article_id)[source]

Check if an article has been saved.

Parameters

article_id (int) – The article ID.

Returns

result – Whether or not the given article has been saved.

Return type

bool

has_paragraph(article_id, paragraph_pos_in_article)[source]

Check if a paragraph has been saved.

Parameters
  • article_id (int) – The article ID.

  • paragraph_pos_in_article (int) – The paragraph ID.

Returns

result – Whether or not the given paragraph has been saved.

Return type

bool

make_report(output_dir=None)[source]

Create the saved articles report.

Parameters

output_dir (str or pathlib.Path) – The directory for writing the report.

Returns

output_file_path – The file to which the report was written.

Return type

pathlib.Path

remove_all()[source]

Remove all saved items.

remove_article(article_id)[source]

Remove an article from the saved items.

Parameters

article_id (int) – The article ID.

remove_paragraph(article_id, paragraph_pos_in_article)[source]

Remove a paragraph from the saved items.

Parameters
  • article_id (int) – The article ID.

  • paragraph_pos_in_article (int) – The paragraph ID.

summary_table()[source]

Create a dataframe table with saved articles.

Returns

table – DataFrame containing all the paragraphs seen and the choice made for each of them.

Return type

pd.DataFrame

class MiningSchema[source]

Bases: object

The mining schema for the mining widget.

add_entity(entity_type, property_name=None, property_type=None, property_value_type=None, ontology_source=None)[source]

Add a new entity to the schema.

A warning is issued for duplicate entities.

Parameters
  • entity_type (str) – The entity type, for example “CHEMICAL”.

  • property_name (str, optional) – The property name, for example “isChiral”.

  • property_type (str, optional) – The property type, for example “ATTRIBUTE”.

  • property_value_type (str, optional) – The property value type, for example “BOOLEAN”.

  • ontology_source (str, optional) – The ontology source, for example “NCIT”.
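The duplicate handling described above can be sketched with the standard warnings module. ToySchema is a hypothetical stand-in that mirrors only the documented parameters. Skipping the duplicate row after warning is an assumption of this sketch; the text above only states that a warning is issued.

```python
import warnings


class ToySchema:
    """Hypothetical stand-in for a mining schema; not the bluesearch class."""

    def __init__(self):
        self.rows = []

    def add_entity(self, entity_type, property_name=None, property_type=None,
                   property_value_type=None, ontology_source=None):
        row = (entity_type, property_name, property_type,
               property_value_type, ontology_source)
        if row in self.rows:
            # Assumption: the duplicate is skipped after the warning.
            warnings.warn(f"Duplicate entity: {row}")
            return
        self.rows.append(row)


schema = ToySchema()
schema.add_entity("CHEMICAL")
schema.add_entity(
    "CHEMICAL",
    property_name="isChiral",
    property_type="ATTRIBUTE",
    property_value_type="BOOLEAN",
    ontology_source="NCIT",
)
```

Adding "CHEMICAL" a second time with no properties would emit a warning in this sketch and leave the two existing rows unchanged.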

add_from_df(entity_df)[source]

Add entities from a given dataframe.

The data frame has to contain a column named “entity_type”. Any columns matching the schema columns will be processed, all other columns will be ignored.

Parameters

entity_df (pd.DataFrame) – The dataframe with new entities.

property df

Get a dataframe with all entities.

Returns

schema_df – The dataframe with all entities.

Return type

pd.DataFrame

class MiningWidget(**kwargs)[source]

Bases: ipywidgets.widgets.widget_box.VBox

The mining widget.

Parameters
  • mining_server_url (str) – The URL of the mining server.

  • mining_schema (bluesearch.widgets.MiningSchema) – The requested mining schema (entity, relation, attribute types).

  • article_saver (bluesearch.widgets.ArticleSaver) – An instance of the article saver.

  • default_text (string, optional) – The default text assigned to the text area.

  • use_cache (bool) – If True, the mining server will use cached mining results stored in an SQL database. This should lead to major speedups.

  • checkpoint_path (str or pathlib.Path, optional) – Path where checkpoints are saved to and loaded from. If None, defaults to ~/.cache/bluesearch/widgets_checkpoints folder.
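The documented default for checkpoint_path can be expressed with pathlib. The helper resolve_checkpoint_path is a hypothetical name used only for this sketch, and expanding "~" in user-supplied paths is an assumption; it shows how a None value could map to the folder under the user's home directory.

```python
import pathlib


def resolve_checkpoint_path(checkpoint_path=None):
    """Return the checkpoint directory, falling back to the documented default."""
    if checkpoint_path is None:
        # Documented default: ~/.cache/bluesearch/widgets_checkpoints
        return pathlib.Path.home() / ".cache" / "bluesearch" / "widgets_checkpoints"
    # Assumption: a leading "~" in a user-supplied path is expanded.
    return pathlib.Path(checkpoint_path).expanduser()
```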

get_extracted_table()[source]

Retrieve the table with the mining results.

Returns

results_table – The table with the mining results.

Return type

pandas.DataFrame

textmining_pipeline(information, schema_df, debug=False)[source]

Handle text mining server requests depending on the type of information.

Parameters
  • information (str or list) – Either a raw text string or a list of tuples (article_id, paragraph_id) referring to the database.

  • schema_df (pd.DataFrame) – A dataframe with the requested mining schema (entity, relation, attribute types).

  • debug (bool) – If True, the columns do not necessarily match the specification, but they contain debugging information. If False, the columns match the specification exactly.

Returns

table_extractions – The final table. If debug=True then it contains all the metadata. If False then it only contains columns in the official specification.

Return type

pd.DataFrame
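The two accepted forms of the information argument can be told apart with a simple type check. The function describe_information below is a hypothetical helper, not part of the widget; it only sketches how a caller might validate the input before sending a request.

```python
def describe_information(information):
    """Classify the input as raw text or as database identifiers."""
    if isinstance(information, str):
        return "text"
    if isinstance(information, list):
        for item in information:
            # Each entry is expected to be (article_id, paragraph_id).
            if not (isinstance(item, tuple) and len(item) == 2):
                raise TypeError(f"Expected a 2-tuple, got {item!r}")
        return "identifiers"
    raise TypeError("information must be a str or a list of tuples")
```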

class SearchWidget(**kwargs)[source]

Bases: ipywidgets.widgets.widget_box.VBox

Widget for search engine.

Parameters
  • bbs_search_url (str) – The URL of the bbs_search server.

  • bbs_mysql_engine (sqlalchemy.engine.Engine) – Engine for connections to the bbs_mysql server.

  • article_saver (bluesearch.widgets.ArticleSaver, optional) – If specified, this article saver will keep track of all article IDs of interest to the user across the different queries.

  • results_per_page (int, optional) – The number of results to display per results page.

  • checkpoint_path (str or pathlib.Path, optional) – Path where checkpoints are saved to and loaded from. If None, defaults to ~/.cache/bluesearch/widgets_checkpoints.

static highlight_in_paragraph(paragraph, sentence)[source]

Highlight a given sentence in the paragraph.

Parameters
  • paragraph (str) – The paragraph in which to highlight the sentence.

  • sentence (str) – The sentence to highlight.

Returns

formatted_paragraph – The paragraph with the given sentence highlighted in color.

Return type

str
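One way to highlight a sentence inside a paragraph is to wrap it in a styled HTML span. The function highlight_sketch, the color value, and the HTML escaping are assumptions made for this example; the markup produced by highlight_in_paragraph itself is not specified here.

```python
import html


def highlight_sketch(paragraph, sentence, color="#ffd966"):
    """Wrap the first occurrence of `sentence` in a colored HTML span."""
    start = paragraph.find(sentence)
    if start == -1:
        # Sentence not found: return the escaped paragraph unchanged.
        return html.escape(paragraph)
    end = start + len(sentence)
    return (
        html.escape(paragraph[:start])
        + f'<span style="background-color: {color}">'
        + html.escape(sentence)
        + "</span>"
        + html.escape(paragraph[end:])
    )
```

The result is an HTML string suitable for display in a notebook output area.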

print_single_result(result_info, print_whole_paragraph)[source]

Retrieve metadata and complete the report with an HTML string for the given result.

Parameters
  • result_info (dict) – The information for a single result obtained by calling _fetch_result_info.

  • print_whole_paragraph (bool) – If True, the whole paragraph will be displayed in the results of the widget.

Returns

  • article_metadata (str) – Formatted string containing the metadata of the article.

  • formatted_output (str) – Formatted output of the sentence.

saved_results()[source]

Get all search results that were flagged for saving.

Returns

saved_items_df – A data frame with all saved search results.

Return type

pd.DataFrame

set_page(new_page, force=False)[source]

Go to a given page in the results view.

Parameters
  • new_page (int) – The new page number to go to.

  • force (bool) – By default, if new_page is the same as the page currently viewed, the page is not reloaded. To reload the page, set this parameter to True. This is useful when new results have been fetched and the view needs to be updated.
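The effect of the force parameter can be sketched with a small stand-in class. Pager and its n_reloads counter are hypothetical names for illustration only; the real widget re-renders its results view instead of counting reloads.

```python
class Pager:
    """Toy model of the documented set_page behaviour (illustrative only)."""

    def __init__(self):
        self.current_page = 0
        self.n_reloads = 0

    def set_page(self, new_page, force=False):
        # Skip the reload when the page is unchanged, unless forced.
        if new_page == self.current_page and not force:
            return
        self.current_page = new_page
        self.n_reloads += 1  # stand-in for re-rendering the results view


pager = Pager()
pager.set_page(1)              # page changes: reload
pager.set_page(1)              # same page: no reload
pager.set_page(1, force=True)  # same page, forced: reload
```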