Changelog¶
This page contains changelogs for Blue Brain Search released versions.
Legend¶
Add denotes new features.
Fix denotes bug fixes.
Change denotes functionality changes.
Deprecate denotes deprecated features that will be removed in the future.
Remove denotes removed features.
Latest¶
Version 0.2.0¶
July 1, 2021
Add metrics files produced by dvc pipelines to git. This now allows using dvc metrics diff.
Change dependencies required to run the code of data_and_models/ are not installed by default and now require pip install .[data_and_models].
Add in dvc, in ner pipelines, scripts to train and evaluate NER models with the huggingface/transformers package. A comparison with spaCy training is also possible.
Change report format of the Search Widget from PDF to HTML.
Remove tqdm, joblib, pdfkit dependencies.
Remove bluesearch.mining.eval.plot_ner_confusion_matrix function to drop joblib from install_requires.
Change requirements.txt refactored into three separate lists of dependencies: requirements.txt, requirements-dev.txt, requirements-data_and_models.txt.
Fix bugs (related to nested entities) in ner_report, ner_errors, ner_confusion_matrix functions from the bluesearch.mining.eval submodule.
Add utility function _check_consistent_iob inside bluesearch.mining.eval.
Change upgrade linting tools in tox.ini.
Change for Transformer-based spaCy pipelines for NER models instead of Tok2Vec-based scispaCy pipelines.
Change for one entity per model instead of several entities per NER model.
Change pipelines/ner/dvc.yaml to simplify and harmonize the definition of the pipeline for training NER models.
Add annotations/ner/analyze.py, code to evaluate the data quality of annotations. It can generate: 1) a detailed report for individual files when used as a script and 2) a summary table for several files when used as a function.
Add pipelines/ner/clean.py, a script to clean annotations. It keeps only valid texts, normalizes labels, keeps only a given label, and then renames the label if necessary.
Remove ee_models_library.csv and change the logic for one model per entity type.
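The _check_consistent_iob entry above refers to checking that a sequence of IOB tags is well formed. The sketch below illustrates one common reading of that idea and is not the bluesearch implementation: the function name check_consistent_iob, its signature, and the exact rules are assumptions made for illustration. Here a sequence counts as consistent when every I-<label> tag continues an entity opened with the same label.

```python
from typing import Optional, Sequence


def check_consistent_iob(tags: Sequence[str]) -> bool:
    """Return True if the IOB tag sequence is consistent.

    Hypothetical helper, not the bluesearch function. Assumed rules:
    tags are "O", "B-<label>" or "I-<label>", and an "I-<label>" tag
    must directly follow a "B-<label>" or "I-<label>" with the same label.
    """
    open_label: Optional[str] = None  # label of the entity currently open
    for tag in tags:
        if tag == "O":
            open_label = None
        elif tag.startswith("B-"):
            open_label = tag[2:]
        elif tag.startswith("I-"):
            if tag[2:] != open_label:
                return False  # continuation without a matching start
        else:
            return False  # not an IOB tag at all
    return True
```

Under these assumptions, ["B-GENE", "I-GENE", "O"] is consistent, while ["O", "I-GENE"] and ["B-GENE", "I-DISEASE"] are not.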
Version 0.1.2¶
Change spaCy version from 2.x to 3.x, including scispaCy and models versions.
Change the training of NER models: use spaCy directly instead of Prodigy, use the default configuration from spaCy 3 instead of from Prodigy, use the binary format (.spacy) from spaCy 3 instead of the .jsonl format from Prodigy.
Remove Prodigy dependency.
Version 0.1.1¶
Change Upgrade to dvc 2.0.
Remove NLTK dependencies.
Change Drop the dedicated SBioBERT class; the SentTransformer interface is now used to support this model.
Version 0.1.0¶
Add in dvc pipelines, the Dockerfile now installs requirements.txt to pin the versions of dependencies.
Add support for Python 3.9.
Add Blue Brain Search as a Zenodo record. This provides a unique DOI, a DOI for each published release, and automatic preservation outside GitHub.
Add the content of the DVC remote for Blue Brain Search v0.1.0 as a Zenodo record. This provides DOIs as for the code of Blue Brain Search above. This is also the first public release of the data and models of Blue Brain Search.
Remove support for Python 3.6.
Remove the external dependency sent2vec and the embedding models depending on it, i.e. BSV and Sent2VecModel.
Remove the embedding model Universal Sentence Encoder (USE) and its dependencies (tensorflow and tensorflow-hub).
Remove BBS_BBG_poc notebook (now hosted on https://github.com/BlueBrain/Search-Graph-Examples) and assets/ directory.
Version 0.0.10¶
Changes¶
Change bluesearch is the new name of the Python package, replacing the former bbsearch.
Change The code is now hosted on GitHub under BlueBrain/Search, eliminating the redundancy of the former BlueBrain/BlueBrainSearch.
Add in README the purpose of Blue Brain Search.
Add in README the common usage of the two widgets (search and mining).
Add in README a complete and step-by-step Getting Started.
Add type checking for third-party libraries (NumPy, Pandas, SQLAlchemy).
Add BioBERT NLI+STS CORD-19 v1 to DVC evaluation pipeline.
Version 0.0.9¶
December 11, 2020
Changes¶
Add saving and loading of the results from the literature search and mining widgets.
Add mining for more than 1,000 articles.
Add BioBERT NLI+STS CORD-19 v1 training scripts and data.
Add CORD-19 version 65 database, embeddings, and entities.
Add tests for all entry points.
Add security checks with bandit.
Fix NER false positive for abstract.
Fix refactoring issue in get_embedding_model.
Change naming of and inside the bluesearch.entrypoints module.
Change how the NER entry points retrieve models: now DVC is used.
Change warnings when generating the documentation into errors.
Remove scibert from setup.py and requirements.txt.
Version 0.0.8¶
November 24, 2020
Changes¶
Add column is_bad in table sentences for quality filtering (too long, too short, LaTeX code).
Add embedding model BioBERT NLI+STS CORD-19 v1.
Change embedding_models.get_embedding_model() to support any model class and checkpoint path without having to modify the source code of BBS.
Fix bug in hyperlinks of SearchWidget. We now take the first URL if there are several, and add Google search if there is none.
Change widgets UIs with tabs to improve usability.
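The SearchWidget hyperlink fix in this release follows a simple rule: use the first URL when an article has several, and fall back to a Google search when it has none. A minimal sketch of that rule is shown below. The function name choose_article_link, the list-of-strings input, and the use of the article title as the search query are assumptions for illustration, not the widget's actual code.

```python
from typing import Sequence
from urllib.parse import quote_plus


def choose_article_link(urls: Sequence[str], title: str) -> str:
    """Pick the hyperlink to display for an article.

    Hypothetical helper. Returns the first non-empty URL, or a Google
    search link built from the title (an assumed choice of query) when
    no URL is available.
    """
    for url in urls:
        cleaned = url.strip()
        if cleaned:
            return cleaned  # first usable URL wins
    # No URL at all: fall back to a web search for the article.
    return "https://www.google.com/search?q=" + quote_plus(title)
```

For example, choose_article_link([], "spike protein") yields https://www.google.com/search?q=spike+protein under these assumptions.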
Version 0.0.7¶
November 16, 2020
Changes¶
Add parallelization of embedding computations.
Change “Saved Articles” summary in the Search Widget.
Fix undesired timeout of MySQL connection in the Search Server.
Version 0.0.6¶
November 3, 2020
Changes¶
Add inter-rater agreement with DVC.
Add Advanced Features section in the Search Widget.
Change mining schema logic.
Change code formatting - run black on everything.
Version 0.0.5¶
October 26, 2020
Changes¶
Change bluesearch.mining.eval.spacy2df can now work with NER pipelines including entity rulers.
Version 0.0.4¶
October 20, 2020
Changes¶
Add language detection with langdetect, which allows filtering out articles that are not in English or have no useful content.
Add widgets now inform the user about the CORD-19 version being used.
Add bluesearch.utils.JSONL for easy interaction with JSONL files.
Add bluesearch.entity.PatternCreator and other functionalities to perform rule-based named entity recognition.
Change module names
Change in bluesearch.embedding_models, SBERT class is now replaced by a more general-purpose SentTransformer which can wrap any object from sentence_transformers.SentenceTransformer.
Add bluesearch.embedding_models.SklearnVectorizer is a new class that can be used to wrap any sklearn vectorizer object (TfidfVectorizer, CountVectorizer, HashingVectorizer).
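The bluesearch.utils.JSONL entry above concerns reading and writing JSONL files, where each line holds one JSON document. The sketch below shows the general technique with the standard library only. The function names dump_jsonl and load_jsonl and their signatures are hypothetical and may differ from the bluesearch.utils.JSONL interface.

```python
import json
from pathlib import Path
from typing import Any, Iterable, List, Union


def dump_jsonl(records: Iterable[Any], path: Union[str, Path]) -> None:
    """Write one JSON document per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_jsonl(path: Union[str, Path]) -> List[Any]:
    """Read a JSONL file back into a list, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Writing a list of records with dump_jsonl and reading the file back with load_jsonl returns an equal list, as long as the records are JSON-serializable.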
Version 0.0.3¶
October 2, 2020
This is the first beta release of Blue Brain Search.
Previous releases were highly experimental and should be considered alpha releases.
Changes¶
Change CORD-19 database version, upgrading from v35 to v47.
Add button to Literature Search widget to let the user choose whether to retrieve top N articles or top N sentences.
Fix bug in database creation where auto-increment was triggered even if insertion failed.
Add automatic creation of a FULLTEXT INDEX on sentences.text when the table is first created, just after data insertion.
Add annotations for NER with DVC.
Add pipelines to train and evaluate NER models with DVC.
Add Sent2VecModel class and option in Literature Search widget to select sent2vec to run the search.
Add Docker ecosystem with .env files and docker-compose.
Change search servers by merging RemoteSearcher and LocalSearcher into the new SearchEngine.