bluesearch.database.mining_cache module

Module for creating the database that caches mining results.

class CreateMiningCache(database_engine, ee_models_paths, target_table_name, workers_per_model=1)

Bases: object

Create an SQL database that caches the results of text mining.

Parameters
  • database_engine (sqlalchemy.engine.Engine) – Engine for connecting to the CORD-19 database.

  • ee_models_paths (dict[str, pathlib.Path]) – Dictionary mapping each entity type to the path of the model that detects it.

  • target_table_name (str) – The target table name for the mining results.

  • workers_per_model (int, optional) – Maximum number of processes to spawn per model for running text mining and table population in parallel. Defaults to 1.
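Because several entity types may be detected by the same model, one plausible way to plan the work is to group the entity types by model path and start workers_per_model processes for each distinct model. The sketch below assumes that grouping; the helper name plan_workers, the entity type names, and the model paths are all hypothetical and not part of bluesearch.

```python
from collections import defaultdict
from pathlib import Path


def plan_workers(ee_models_paths, workers_per_model=1):
    """Group entity types by model path and size the worker pool per model.

    Hypothetical helper: returns {model_path: (entity_types, n_workers)}.
    """
    if workers_per_model < 1:
        raise ValueError("workers_per_model must be at least 1")
    entity_types_by_model = defaultdict(list)
    for entity_type, model_path in ee_models_paths.items():
        entity_types_by_model[Path(model_path)].append(entity_type)
    # One queue (served by workers_per_model workers) per distinct model.
    return {
        model_path: (sorted(entity_types), workers_per_model)
        for model_path, entity_types in entity_types_by_model.items()
    }


plan = plan_workers(
    {
        "CHEMICAL": Path("models/ner_chem"),
        "DISEASE": Path("models/ner_bio"),
        "GENE": Path("models/ner_bio"),
    },
    workers_per_model=2,
)
```

Under this assumption, a model shared by several entity types is loaded only once per worker rather than once per entity type.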

construct()

Construct and populate the cache of mined results.

create_tasks(task_queues, workers_by_queue)

Create tasks for the mining workers.

Parameters
  • task_queues (dict[str or pathlib.Path, multiprocessing.Queue]) – Task queues for different models. The keys are the model paths and the values are the actual queues.

  • workers_by_queue (dict[str]) – For each queue, all the worker processes that consume tasks from it.
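A minimal sketch of the producer side, assuming that a task is a batch of article IDs and that every model must process every article. The function name fill_task_queues, the batch_size parameter, and the task format are assumptions for illustration only. It uses queue.Queue so that it runs in a single process; multiprocessing.Queue exposes the same put and get methods.

```python
import queue


def fill_task_queues(task_queues, article_ids, batch_size=2):
    """Put the same batches of article IDs on every model's queue.

    Hypothetical stand-in for CreateMiningCache.create_tasks; returns the
    number of batches put on each queue.
    """
    batches = [
        article_ids[i : i + batch_size]
        for i in range(0, len(article_ids), batch_size)
    ]
    for task_queue in task_queues.values():
        for batch in batches:
            task_queue.put(batch)
    return len(batches)


task_queues = {"models/ner_bio": queue.Queue(), "models/ner_chem": queue.Queue()}
n_batches = fill_task_queues(task_queues, [1, 2, 3, 4, 5], batch_size=2)
first_task = task_queues["models/ner_bio"].get_nowait()  # [1, 2]
```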

do_mining()

Run the mining in parallel.

class Miner(database_url, model_path, target_table, task_queue, can_finish)

Bases: object

Multiprocessing worker class for mining named entities.

Parameters
  • database_url (str) – URL of a database that already contains the tables articles and sentences. The URL should indicate the database dialect and connection arguments, e.g. database_url = "postgresql://scott:tiger@localhost/test".

  • model_path (str) – Path from which to load the spaCy model that performs the named entity extraction.

  • target_table (str) – The target table name for the mining results.

  • task_queue (multiprocessing.Queue) – The queue with tasks for this worker.

  • can_finish (multiprocessing.Event) – A flag to indicate that the worker can stop waiting for new tasks. Unless this flag is set, the worker will continue polling the task queue for new tasks.
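SQLAlchemy database URLs follow the general form dialect+driver://username:password@host:port/database. As a rough illustration of how the example database_url decomposes, the standard library's urllib.parse.urlsplit can pick it apart. SQLAlchemy itself parses URLs with sqlalchemy.engine.make_url, which this stdlib sketch only approximates.

```python
from urllib.parse import urlsplit

database_url = "postgresql://scott:tiger@localhost/test"
parts = urlsplit(database_url)

dialect = parts.scheme             # "postgresql"
user = parts.username              # "scott"
password = parts.password          # "tiger"
host = parts.hostname              # "localhost"
database = parts.path.lstrip("/")  # "test"
```

No port is given in the example, so the dialect's default port applies.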

clean_up()

Clean up after task processing has been finished.

classmethod create_and_mine(database_url, model_path, target_table, task_queue, can_finish)

Create a miner instance and start the mining loop.

Parameters
  • database_url (str) – URL of a database that already contains the tables articles and sentences. The URL should indicate the database dialect and connection arguments, e.g. database_url = "postgresql://scott:tiger@localhost/test".

  • model_path (str) – Path from which to load the spaCy model that performs the named entity extraction.

  • target_table (str) – The target table name for the mining results.

  • task_queue (multiprocessing.Queue) – The queue with tasks for this worker.

  • can_finish (multiprocessing.Event) – A flag to indicate that the worker can stop waiting for new tasks. Unless this flag is set, the worker will continue polling the task queue for new tasks.

work_loop()

Run the mining work loop.
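The Miner lifecycle documented above can be sketched as follows: create_and_mine builds the worker, work_loop polls the queue until can_finish is set and the queue is drained, and clean_up runs last. This is an illustrative stand-in, not bluesearch's implementation. ToyMiner, its results list, the upper-casing "mining" step, and the marker written by clean_up are hypothetical, and queue.Queue and threading.Event stand in for their multiprocessing counterparts so that the sketch runs in one process. A plausible reason for the classmethod factory is that only cheap, picklable arguments (URL strings, paths) need to cross the process boundary, while heavy objects such as the database engine and the spaCy model can be created inside the worker.

```python
import queue
import threading


class ToyMiner:
    """Hypothetical stand-in for Miner that records results in a list."""

    def __init__(self, model_path, task_queue, can_finish, results):
        # A real miner might load its model and database engine here;
        # this sketch only keeps references.
        self.model_path = model_path
        self.task_queue = task_queue
        self.can_finish = can_finish
        self.results = results

    @classmethod
    def create_and_mine(cls, model_path, task_queue, can_finish, results):
        miner = cls(model_path, task_queue, can_finish, results)
        miner.work_loop()
        miner.clean_up()

    def work_loop(self):
        while True:
            # Read the flag *before* polling: if it was already set, every
            # task was enqueued earlier, so an empty queue means we are done.
            finishing = self.can_finish.is_set()
            try:
                text = self.task_queue.get(timeout=0.05)
            except queue.Empty:
                if finishing:
                    break
                continue  # Keep polling until can_finish is set.
            # Placeholder for named entity extraction.
            self.results.append((self.model_path, text.upper()))

    def clean_up(self):
        # Placeholder: record that the worker finished cleanly.
        self.results.append((self.model_path, "<done>"))


task_queue, can_finish, results = queue.Queue(), threading.Event(), []
worker = threading.Thread(
    target=ToyMiner.create_and_mine,
    args=("models/ner_bio", task_queue, can_finish, results),
)
worker.start()
for text in ["aspirin", "fever"]:
    task_queue.put(text)
can_finish.set()  # No more tasks will be added.
worker.join()
```

Checking the flag before each poll avoids a race in which a worker sees an empty queue, the producer then enqueues the remaining tasks and sets can_finish, and the worker exits without processing them.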