bluesearch.widgets.article_saver module

Module for the article_saver.

class ArticleSaver(connection)[source]

Bases: object

Keeps track of selected articles.

This class can be used to save a number of articles and paragraphs for a later use. A typical use case is to keep track of the items selected in the search widget, and to retrieve them later in the mining widget.

In addition, this class can produce a summary of all selected items with the summary_table method, resolve all items into paragraphs with their corresponding section names and collect them in a pandas data frame with the get_chosen_texts method, and export a PDF report of all saved items with the make_report method.

Parameters

connection (sqlalchemy.engine.Engine) – An SQL database connectable compatible with pandas.read_sql. The database is expected to contain the tables paragraphs and articles.

connection

An SQL database connectable compatible with pandas.read_sql. The database is expected to contain the tables paragraphs and articles.

Type

sqlalchemy.engine.Engine

state

The state that keeps track of saved items. It is a set of tuples of the form (article_id, paragraph_id), each representing one saved item. An item with paragraph_id = -1 indicates that the whole article is saved.

Type

set
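The layout of state described above can be illustrated with a small standalone sketch. It does not use bluesearch itself: the WHOLE_ARTICLE constant and the two helper functions are hypothetical stand-ins that only mirror the documented convention, and the actual implementation may differ.

```python
# Hypothetical stand-in for the documented `state` attribute; not bluesearch code.
WHOLE_ARTICLE = -1  # sentinel paragraph_id meaning "the entire article"

state = set()

# Save article 7 as a whole, plus two individual paragraphs of article 3.
state.add((7, WHOLE_ARTICLE))
state.add((3, 0))
state.add((3, 4))

# Adding the same item twice has no effect, because `state` is a set.
state.add((3, 4))


def has_article(state, article_id):
    """Return True if the whole article was saved (assumed semantics)."""
    return (article_id, WHOLE_ARTICLE) in state


def has_paragraph(state, article_id, paragraph_id):
    """Return True if this specific paragraph was saved (assumed semantics)."""
    return (article_id, paragraph_id) in state
```

Storing plain tuples in a set keeps membership checks cheap and makes duplicate saves harmless.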

state_hash

A hash uniquely identifying a certain state. This is used to cache df_chosen_texts and avoid recomputing it if the state has not changed.

Type

int or None
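The caching idea behind state_hash might look roughly like the sketch below. The CachedView class, its method names and the use of hash(frozenset(...)) are assumptions made for illustration; the documentation does not say how the hash is actually computed.

```python
class CachedView:
    """Hypothetical cache keyed on a hash of the saved-item set."""

    def __init__(self):
        self.state = set()
        self.state_hash = None   # None: nothing has been computed yet
        self.cached = None
        self.n_computations = 0  # only here to make the caching visible

    def _expensive_compute(self):
        # Stand-in for the costly step (e.g. querying a database).
        self.n_computations += 1
        return sorted(self.state)

    def get(self):
        # frozenset is hashable, whereas a plain set is not.
        new_hash = hash(frozenset(self.state))
        if self.state_hash is None or new_hash != self.state_hash:
            self.cached = self._expensive_compute()
            self.state_hash = new_hash
        return self.cached


view = CachedView()
view.state.add((1, -1))
view.get()
view.get()              # state unchanged: the cached result is reused
view.state.add((2, 3))
result = view.get()     # state changed: the result is recomputed
```

A hash comparison is much cheaper than rebuilding the result, although hash collisions are possible in principle, so this is a practical shortcut and not a strict guarantee of uniqueness.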

df_chosen_texts

The rows represent different paragraphs and the columns are ‘article_id’, ‘section_name’, ‘paragraph_id’, ‘text’.

Type

pandas.DataFrame

add_article(article_id)[source]

Save an article.

Parameters

article_id (int) – The article ID.

add_paragraph(article_id, paragraph_pos_in_article)[source]

Save a paragraph.

Parameters
  • article_id (int) – The article ID.

  • paragraph_pos_in_article (int) – The paragraph ID, i.e. the position of the paragraph within the article.

get_chosen_texts()[source]

Retrieve the currently saved items.

Articles that were saved in their entirety are first resolved into their corresponding paragraphs.

Returns

df_chosen_texts

Return type

pandas.DataFrame
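How whole-article items could be resolved into paragraphs is sketched below with the standard-library sqlite3 module. The table schema, the column names, the sample rows and the resolve function are all hypothetical; the real method queries the database behind connection and returns a pandas data frame, whereas this sketch returns plain tuples to stay dependency-free.

```python
import sqlite3

# Hypothetical in-memory `paragraphs` table; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paragraphs ("
    "article_id INTEGER, paragraph_pos_in_article INTEGER, "
    "section_name TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO paragraphs VALUES (?, ?, ?, ?)",
    [
        (1, 0, "Abstract", "a1-p0"),
        (1, 1, "Methods", "a1-p1"),
        (2, 0, "Abstract", "a2-p0"),
        (2, 1, "Results", "a2-p1"),
    ],
)

COLUMNS = "article_id, section_name, paragraph_pos_in_article, text"


def resolve(state, conn):
    """Expand (article_id, -1) items into all paragraphs of that article."""
    rows = set()  # a set removes paragraphs that were selected twice
    for article_id, pos in state:
        if pos == -1:
            query = f"SELECT {COLUMNS} FROM paragraphs WHERE article_id = ?"
            params = (article_id,)
        else:
            query = (
                f"SELECT {COLUMNS} FROM paragraphs "
                "WHERE article_id = ? AND paragraph_pos_in_article = ?"
            )
            params = (article_id, pos)
        rows.update(conn.execute(query, params).fetchall())
    # Order by article, then by position within the article.
    return sorted(rows, key=lambda r: (r[0], r[2]))


# Article 1 is saved as a whole and also via one of its paragraphs.
chosen = resolve({(1, -1), (1, 0), (2, 1)}, conn)
```

In this sketch, a paragraph that is saved both individually and through its whole article appears only once in the result.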

get_saved_items()[source]

Retrieve the saved items that summarize the user's choices.

Returns

identifiers – Tuples of the form (article_id, paragraph_pos_in_article), one for each item chosen by the user.

Return type

list of tuple

has_article(article_id)[source]

Check if an article has been saved.

Parameters

article_id (int) – The article ID.

Returns

result – Whether or not the given article has been saved.

Return type

bool

has_paragraph(article_id, paragraph_pos_in_article)[source]

Check if a paragraph has been saved.

Parameters
  • article_id (int) – The article ID.

  • paragraph_pos_in_article (int) – The paragraph ID, i.e. the position of the paragraph within the article.

Returns

result – Whether or not the given paragraph has been saved.

Return type

bool

make_report(output_dir=None)[source]

Create the saved articles report.

Parameters

output_dir (str or pathlib.Path) – The directory for writing the report.

Returns

output_file_path – The file to which the report was written.

Return type

pathlib.Path

remove_all()[source]

Remove all saved items.

remove_article(article_id)[source]

Remove an article from the saved items.

Parameters

article_id (int) – The article ID.

remove_paragraph(article_id, paragraph_pos_in_article)[source]

Remove a paragraph from the saved items.

Parameters
  • article_id (int) – The article ID.

  • paragraph_pos_in_article (int) – The paragraph ID, i.e. the position of the paragraph within the article.

summary_table()[source]

Create a data frame summarizing the saved articles.

Returns

table – Data frame containing all the paragraphs seen and the choice made for each of them.

Return type

pandas.DataFrame