bluesearch.widgets package
Submodules
Module contents
Various widgets related to the BBS.
- class ArticleSaver(connection)
Bases:
object
Keeps track of selected articles.
This class saves articles and paragraphs for later use. A typical use case is to keep track of the items selected in the search widget and to retrieve them later in the mining widget.
The class can also print a summary of all selected items with the summary_table method, resolve all items into paragraphs with their section names and collect them in a pandas data frame with get_chosen_texts, and export a PDF report of all saved items with make_report.
- Parameters
connection (sqlalchemy.engine.Engine) – An SQL database connectable compatible with pandas.read_sql. The database is expected to contain the paragraphs and articles tables.
- connection
An SQL database connectable compatible with pandas.read_sql. The database is expected to contain the paragraphs and articles tables.
- Type
sqlalchemy.engine.Engine
- state
The state that keeps track of saved items. It is a set of tuples of the form (article_id, paragraph_id), each representing one saved item. An item with paragraph_id = -1 indicates that the whole article should be saved.
- Type
set
- state_hash
A hash uniquely identifying a certain state. This is used to cache df_chosen_texts and avoid recomputing it if the state has not changed.
- Type
int or None
- df_chosen_texts
The rows represent different paragraphs and the columns are ‘article_id’, ‘section_name’, ‘paragraph_id’, ‘text’.
- Type
pd.DataFrame
- add_paragraph(article_id, paragraph_pos_in_article)
Save a paragraph.
- Parameters
article_id (int) – The article ID.
paragraph_pos_in_article (int) – The paragraph ID.
- get_chosen_texts()
Retrieve the currently saved items.
Articles that were saved in their entirety are first resolved into their paragraphs.
- Returns
df_chosen_texts
- Return type
pandas.DataFrame
- get_saved_items()
Retrieve the saved items that summarize the user's choices.
- Returns
identifiers – Tuples (article_id, paragraph_pos_in_article) chosen by the user.
- Return type
list of tuple
- has_article(article_id)
Check if an article has been saved.
- Parameters
article_id (int) – The article ID.
- Returns
result – Whether or not the given article has been saved.
- Return type
bool
- has_paragraph(article_id, paragraph_pos_in_article)
Check if a paragraph has been saved.
- Parameters
article_id (int) – The article ID.
paragraph_pos_in_article (int) – The paragraph ID.
- Returns
result – Whether or not the given paragraph has been saved.
- Return type
bool
- make_report(output_dir=None)
Create the saved articles report.
- Parameters
output_dir (str or pathlib.Path, optional) – The directory for writing the report.
- Returns
output_file_path – The file to which the report was written.
- Return type
pathlib.Path
- remove_article(article_id)
Remove an article from the saved items.
- Parameters
article_id (int) – The article ID.
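The state semantics described above can be sketched without a database. The class below is a hypothetical stand-in, not the bluesearch implementation: ToyArticleSaver, add_article, get_chosen_items and the recomputations counter are names assumed for illustration, and the exact behaviour of remove_article in the library may differ.

```python
# Hypothetical stand-in that mimics only the documented state semantics.
# It is NOT bluesearch's ArticleSaver and needs no database connection.
WHOLE_ARTICLE = -1  # documented marker: paragraph_id = -1 means the whole article


class ToyArticleSaver:
    def __init__(self):
        self.state = set()         # set of (article_id, paragraph_id) tuples
        self._cached_hash = None   # plays the role of state_hash
        self._cached_items = None  # plays the role of df_chosen_texts
        self.recomputations = 0    # counts how often the cache was rebuilt

    def add_article(self, article_id):  # assumed helper, not listed above
        self.state.add((article_id, WHOLE_ARTICLE))

    def add_paragraph(self, article_id, paragraph_pos_in_article):
        self.state.add((article_id, paragraph_pos_in_article))

    def has_article(self, article_id):
        return (article_id, WHOLE_ARTICLE) in self.state

    def has_paragraph(self, article_id, paragraph_pos_in_article):
        return (article_id, paragraph_pos_in_article) in self.state

    def remove_article(self, article_id):
        # Assumption: only the whole-article marker is removed.
        self.state.discard((article_id, WHOLE_ARTICLE))

    def get_saved_items(self):
        return sorted(self.state)

    def get_chosen_items(self):
        # Rebuild the cached result only when the state has changed.
        current_hash = hash(frozenset(self.state))
        if current_hash != self._cached_hash:
            self._cached_items = sorted(self.state)
            self._cached_hash = current_hash
            self.recomputations += 1
        return self._cached_items


saver = ToyArticleSaver()
saver.add_article(7)       # stored as (7, -1)
saver.add_paragraph(7, 2)
saver.add_paragraph(9, 0)
saver.get_chosen_items()
saver.get_chosen_items()   # state unchanged, so the cache is reused
```

In this sketch has_article(9) is False even though one paragraph of article 9 is saved, because only the (9, -1) entry marks a whole article.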
- class MiningSchema
Bases:
object
The mining schema for the mining widget.
- add_entity(entity_type, property_name=None, property_type=None, property_value_type=None, ontology_source=None)
Add a new entity to the schema.
A warning is issued for duplicate entities.
- Parameters
entity_type (str) – The entity type, for example “CHEMICAL”.
property_name (str, optional) – The property name, for example “isChiral”.
property_type (str, optional) – The property type, for example “ATTRIBUTE”.
property_value_type (str, optional) – The property value type, for example “BOOLEAN”.
ontology_source (str, optional) – The ontology source, for example “NCIT”.
- add_from_df(entity_df)
Add entities from a given data frame.
The data frame has to contain a column named “entity_type”. Any columns matching the schema columns will be processed; all other columns will be ignored.
- Parameters
entity_df (pd.DataFrame) – The dataframe with new entities.
- property df
Get a dataframe with all entities.
- Returns
schema_df – The dataframe with all entities.
- Return type
pd.DataFrame
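The schema behaviour described above (a warning on duplicates, a required “entity_type” column, unknown columns ignored) can be illustrated with a small stand-in. ToyMiningSchema and add_from_records are hypothetical names; rows are plain dicts rather than a pandas data frame, and skipping the duplicate after warning is an assumption, since the text above only states that a warning is issued.

```python
import warnings

# Column names taken from the add_entity signature documented above.
SCHEMA_COLUMNS = (
    "entity_type",
    "property_name",
    "property_type",
    "property_value_type",
    "ontology_source",
)


class ToyMiningSchema:
    """Illustrative stand-in; rows are dicts, not a pandas DataFrame."""

    def __init__(self):
        self.rows = []

    def add_entity(self, entity_type, property_name=None, property_type=None,
                   property_value_type=None, ontology_source=None):
        values = (entity_type, property_name, property_type,
                  property_value_type, ontology_source)
        row = dict(zip(SCHEMA_COLUMNS, values))
        if row in self.rows:
            # Assumption: the duplicate is skipped after the warning.
            warnings.warn(f"Duplicate entity: {row}")
            return
        self.rows.append(row)

    def add_from_records(self, records):  # hypothetical analogue of add_from_df
        for record in records:
            if "entity_type" not in record:
                raise ValueError("each record needs an 'entity_type' key")
            # Keep only keys that match the schema columns.
            known = {k: v for k, v in record.items() if k in SCHEMA_COLUMNS}
            self.add_entity(**known)


schema = ToyMiningSchema()
schema.add_entity("CHEMICAL", property_name="isChiral",
                  property_type="ATTRIBUTE", property_value_type="BOOLEAN",
                  ontology_source="NCIT")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    schema.add_from_records([
        {"entity_type": "DISEASE", "comment": "not a schema column"},
        {"entity_type": "DISEASE"},  # duplicate of the previous record
    ])
```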
- class MiningWidget(**kwargs)
Bases:
ipywidgets.widgets.widget_box.VBox
The mining widget.
- Parameters
mining_server_url (str) – The URL of the mining server.
mining_schema (bluesearch.widgets.MiningSchema) – The requested mining schema (entity, relation, attribute types).
article_saver (bluesearch.widgets.ArticleSaver) – An instance of the article saver.
default_text (str, optional) – The default text assigned to the text area.
use_cache (bool) – If True the mining server will use cached mining results stored in an SQL database. This should lead to major speedups.
checkpoint_path (str or pathlib.Path, optional) – Path where checkpoints are saved to and loaded from. If None, defaults to the ~/.cache/bluesearch/widgets_checkpoints folder.
- get_extracted_table()
Retrieve the table with the mining results.
- Returns
results_table – The table with the mining results.
- Return type
pandas.DataFrame
- textmining_pipeline(information, schema_df, debug=False)
Handle text mining server requests depending on the type of information.
- Parameters
information (str or list) – Either a raw text string or a list of tuples (article_id, paragraph_id) referring to the database.
schema_df (pd.DataFrame) – A dataframe with the requested mining schema (entity, relation, attribute types).
debug (bool) – If True, the columns do not necessarily match the specification but contain debugging information. If False, the columns match the specification exactly.
- Returns
table_extractions – The final table. If debug=True then it contains all the metadata. If False then it only contains columns in the official specification.
- Return type
pd.DataFrame
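The two accepted forms of the information argument can be told apart with a simple type check. The helper below is a hypothetical sketch (classify_information is not part of bluesearch) and only shows the kind of dispatch the description of textmining_pipeline implies, not how the widget actually contacts the mining server.

```python
def classify_information(information):
    """Return which documented input form `information` takes."""
    if isinstance(information, str):
        # Raw text to be mined directly.
        return "raw_text"
    if isinstance(information, list) and all(
        isinstance(item, tuple) and len(item) == 2 for item in information
    ):
        # (article_id, paragraph_id) pairs referring to the database.
        return "paragraph_ids"
    raise TypeError(
        "information must be a str or a list of "
        "(article_id, paragraph_id) tuples"
    )


kind_text = classify_information("Glucose is a metabolite.")
kind_ids = classify_information([(7, 2), (9, 0)])
```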
- class SearchWidget(**kwargs)
Bases:
ipywidgets.widgets.widget_box.VBox
Widget for search engine.
- Parameters
bbs_search_url (str) – The URL of the bbs_search server.
bbs_mysql_engine (sqlalchemy.engine.Engine) – Engine for connections to the bbs_mysql server.
article_saver (bluesearch.widgets.ArticleSaver, optional) – If specified, this article saver will keep track of the article IDs of interest to the user across the different queries.
results_per_page (int, optional) – The number of results to display per results page.
checkpoint_path (str or pathlib.Path, optional) – Path where checkpoints are saved to and loaded from. If None, defaults to ~/.cache/bluesearch/widgets_checkpoints.
- static highlight_in_paragraph(paragraph, sentence)
Highlight a given sentence in the paragraph.
- Parameters
paragraph (str) – The paragraph in which to highlight the sentence.
sentence (str) – The sentence to highlight.
- Returns
formatted_paragraph – The paragraph with the given sentence highlighted in color.
- Return type
str
- print_single_result(result_info, print_whole_paragraph)
Retrieve the metadata and build the HTML-formatted output for a single search result.
- Parameters
result_info (dict) – The information for a single result obtained by calling _fetch_result_info.
print_whole_paragraph (bool) – If True, the whole paragraph will be displayed in the results of the widget.
- Returns
article_metadata (str) – Formatted string containing the metadata of the article.
formatted_output (str) – Formatted output of the sentence.
- saved_results()
Get all search results that were flagged for saving.
- Returns
saved_items_df – A data frame with all saved search results.
- Return type
pd.DataFrame
- set_page(new_page, force=False)
Go to a given page in the results view.
- Parameters
new_page (int) – The new page number to go to.
force (bool) – By default, if new_page is the same as the page currently viewed, the page is not reloaded. To reload the page, set this parameter to True. This is useful when new results have been fetched and the view needs to be updated.
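Two of the behaviours described for SearchWidget, sentence highlighting and the force flag of set_page, can be approximated in plain Python. Both toy_highlight and ToyPager are hypothetical names; the HTML markup, the color parameter, the fallback when the sentence is not found, and the reloads counter are assumptions made for this sketch.

```python
def toy_highlight(paragraph, sentence, color="blue"):
    """Wrap the first occurrence of `sentence` in a colored HTML span."""
    start = paragraph.find(sentence)
    if start < 0:
        # Assumed fallback: return the paragraph unchanged.
        return paragraph
    end = start + len(sentence)
    highlighted = f'<span style="color:{color}">{sentence}</span>'
    return paragraph[:start] + highlighted + paragraph[end:]


class ToyPager:
    """Mimics only the documented reload rule of set_page."""

    def __init__(self):
        self.current_page = 0
        self.reloads = 0  # counts how often the view was redrawn

    def set_page(self, new_page, force=False):
        # Skip the reload when the page is unchanged, unless forced.
        if new_page == self.current_page and not force:
            return
        self.current_page = new_page
        self.reloads += 1


formatted = toy_highlight("Glucose is a sugar. It is sweet.", "It is sweet.")

pager = ToyPager()
pager.set_page(0)              # same page, no reload
pager.set_page(0, force=True)  # same page, reload forced
pager.set_page(2)              # different page, reload
```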