bluesearch.database.mining_cache module
Module for creating the mining cache database.
- class CreateMiningCache(database_engine, ee_models_paths, target_table_name, workers_per_model=1)
Bases: object
Create an SQL database that caches the results of mining.
- Parameters
database_engine (sqlalchemy.engine.Engine) – Connection to the CORD-19 database.
ee_models_paths (dict[str, pathlib.Path]) – Dictionary mapping each entity type to the path of the model that detects it.
target_table_name (str) – The target table name for the mining results.
workers_per_model (int, optional) – Maximum number of processes to spawn per model to run text mining and table population in parallel.
- create_tasks(task_queues, workers_by_queue)
Create tasks for the mining workers.
- Parameters
task_queues (dict[str or pathlib.Path, multiprocessing.Queue]) – Task queues for different models. The keys are the model paths and the values are the actual queues.
workers_by_queue (dict[str]) – All worker processes working on tasks from a given queue.
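The relationship between ee_models_paths (entity type to model path) and task_queues (model path to queue) can be illustrated with a minimal sketch. It assumes one queue per distinct model path, which is an interpretation of the parameter descriptions above and not necessarily how bluesearch builds its queues. The helper build_task_queues, its queue_factory argument, and the example paths are all hypothetical.

```python
import multiprocessing
from pathlib import Path


def build_task_queues(ee_models_paths, queue_factory=multiprocessing.Queue):
    """Return one queue per distinct model path (hypothetical helper)."""
    task_queues = {}
    for model_path in ee_models_paths.values():
        # Several entity types may share one model; create its queue only once.
        if model_path not in task_queues:
            task_queues[model_path] = queue_factory()
    return task_queues


# Illustrative entity types and paths only; they are assumptions for this sketch.
ee_models_paths = {
    "DISEASE": Path("models/model_a"),
    "CHEMICAL": Path("models/model_a"),
    "ORGAN": Path("models/model_b"),
}
```

Calling build_task_queues(ee_models_paths) on the dictionary above would yield two queues, keyed by models/model_a and models/model_b, because two of the three entity types share a model.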
- class Miner(database_url, model_path, target_table, task_queue, can_finish)
Bases: object
Multiprocessing worker class for mining named entities.
- Parameters
database_url (str) – URL of a database that already contains the tables articles and sentences. The URL should indicate the database dialect and the connection arguments, e.g. database_url = "postgresql://scott:tiger@localhost/test".
model_path (str) – The path for loading the spaCy model that will perform the named entity extraction.
target_table (str) – The target table name for the mining results.
task_queue (multiprocessing.Queue) – The queue with tasks for this worker.
can_finish (multiprocessing.Event) – A flag to indicate that the worker can stop waiting for new tasks. Unless this flag is set, the worker will continue polling the task queue for new tasks.
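The example URL given for database_url follows the common dialect://user:password@host/database layout. As a rough illustration of which part is which, the sketch below splits it with the standard library only; it does not show how bluesearch or SQLAlchemy parse the URL internally, and the variable names are chosen for this example.

```python
from urllib.parse import urlsplit

database_url = "postgresql://scott:tiger@localhost/test"
parts = urlsplit(database_url)

dialect = parts.scheme               # "postgresql"
username = parts.username            # "scott"
password = parts.password            # "tiger"
host = parts.hostname                # "localhost"
database = parts.path.lstrip("/")    # "test"
```

SQLAlchemy URLs may also name a driver in the scheme (for example postgresql+psycopg2), in which case the scheme would need to be split on "+" to isolate the dialect.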
- classmethod create_and_mine(database_url, model_path, target_table, task_queue, can_finish)
Create a miner instance and start the mining loop.
- Parameters
database_url (str) – URL of a database that already contains the tables articles and sentences. The URL should indicate the database dialect and the connection arguments, e.g. database_url = "postgresql://scott:tiger@localhost/test".
model_path (str) – The path for loading the spaCy model that will perform the named entity extraction.
target_table (str) – The target table name for the mining results.
task_queue (multiprocessing.Queue) – The queue with tasks for this worker.
can_finish (multiprocessing.Event) – A flag to indicate that the worker can stop waiting for new tasks. Unless this flag is set, the worker will continue polling the task queue for new tasks.
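The combination of a create-and-run classmethod with a can_finish flag can be sketched as follows. This is a toy illustration of the polling behaviour described above, not the bluesearch implementation: ToyMiner, run_demo, and the results list are hypothetical, the "mining" step is a placeholder, and queue.Queue with threading.Event stand in for their multiprocessing counterparts so the example runs within a single process.

```python
import queue
import threading


class ToyMiner:
    """Hypothetical stand-in for a mining worker; not the bluesearch class."""

    def __init__(self, task_queue, can_finish, results):
        self.task_queue = task_queue
        self.can_finish = can_finish
        self.results = results

    @classmethod
    def create_and_mine(cls, task_queue, can_finish, results):
        # Build the worker and immediately enter its loop, so this single
        # callable can serve as the target of a thread or process.
        cls(task_queue, can_finish, results).mine()

    def mine(self):
        while True:
            # Read the flag before polling, so a task queued just before
            # the flag was set cannot be missed.
            finishing = self.can_finish.is_set()
            try:
                task = self.task_queue.get(timeout=0.05)
            except queue.Empty:
                if finishing:
                    break  # queue drained and no further tasks expected
                continue  # keep polling for new tasks
            self.results.append(task.upper())  # placeholder for real mining


def run_demo():
    tasks, can_finish, results = queue.Queue(), threading.Event(), []
    worker = threading.Thread(
        target=ToyMiner.create_and_mine, args=(tasks, can_finish, results)
    )
    worker.start()
    for text in ["alpha", "beta"]:
        tasks.put(text)
    can_finish.set()  # signal that no further tasks will arrive
    worker.join()
    return results
```

In this sketch the worker exits only when the flag was already set and the queue is empty, so tasks queued before the flag is raised are still processed. With real multiprocessing queues, items are handed over by a background feeder thread, so a short timeout on get is a sensible safeguard.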