bluesearch.mining.relation module

Classes and functions for relation extraction.

class ChemProt(model_path)

Bases: bluesearch.mining.relation.REModel

Pretrained model extracting 13 relations between chemicals and proteins.

This model supports the following entity types:

“GGP”
“CHEBI”

model_

The actual model in the backend.

Type: allennlp.predictors.text_classifier.TextClassifierPredictor

Notes

This model depends on the scibert package, which is not listed in setup.py because it introduces dependency conflicts. It can be installed manually with the following command:

pip install git+https://github.com/allenai/scibert

Note that importing scibert has the side effect of registering the “text_classifier” model with allennlp. The registration happens when a decorator is applied to a class at import time. For more details, see

https://github.com/allenai/scibert/blob/06793f77d7278898159ed50da30d173cdc8fdea9/scibert/models/text_classifier.py#L14
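
The registration side effect described above can be illustrated with a plain-Python sketch. It does not use allennlp or scibert; MODEL_REGISTRY, register, and TextClassifier are hypothetical names chosen for illustration only, and the real registration mechanism may differ in detail.

```python
# Hypothetical registry standing in for a framework's model registry.
MODEL_REGISTRY = {}


def register(name):
    """Return a class decorator that stores the decorated class under `name`."""

    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged

    return decorator


# In a real package this definition would live in a module, so the
# decorator runs (and the registry is updated) as soon as the module
# is imported, even if the caller never uses the class directly.
@register("text_classifier")
class TextClassifier:
    pass


print(sorted(MODEL_REGISTRY))
```

This is why a seemingly unused import can still be required: removing it would leave the name unregistered.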

property classes: Names of supported relation classes.

predict_probs(annotated_sentence): Predict probabilities for the relation.

property symbols: Symbols for annotation.

class REModel

Bases: abc.ABC

Abstract interface for relation extraction models.

Inspired by SciBERT.

abstract property classes

Names of supported relation classes.

Returns: Names of supported relation classes.
Return type: list of str

predict(annotated_sentence, return_prob=False)

Predict most likely relation between subject and object.

Parameters

annotated_sentence (str) – Sentence in which exactly two entities are annotated with the model's symbols. For example “<< Cytarabine >> inhibits [[ DNA polymerase ]].”
return_prob (bool, optional) – If True, also return the confidence of the predicted relation.

Returns

relation (str) – Relation type.
prob (float, optional) – Confidence of the predicted relation.
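
One plausible way to relate predict to predict_probs is to pick the class with the highest probability. The sketch below assumes that relationship; it is not taken from the library's source. A plain dict stands in for the pd.Series, pick_relation is a hypothetical helper, and the class names and probabilities are made up.

```python
def pick_relation(relation_probs, return_prob=False):
    """Pick the most likely relation from a class -> probability mapping."""
    relation = max(relation_probs, key=relation_probs.get)
    if return_prob:
        return relation, relation_probs[relation]
    return relation


# Made-up class names and probabilities, for illustration only.
probs = {"RELATION_A": 0.7, "RELATION_B": 0.2, "RELATION_C": 0.1}

best = pick_relation(probs)
best_with_prob = pick_relation(probs, return_prob=True)
```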

abstract predict_probs(annotated_sentence)

Relation probabilities between subject and object.

Predict per-class probabilities for the relation between subject and object in an annotated sentence.

Parameters: annotated_sentence (str) – Sentence in which exactly two entities are annotated with the model's symbols. For example “<< Cytarabine >> inhibits [[ DNA polymerase ]].”
Returns: relation_probs – Per-class probability vector. The index contains the class names, the values are the probabilities.
Return type: pd.Series

abstract property symbols

Generate dictionary mapping the two entity types to their annotation symbols.

General structure: {'ENTITY_TYPE': ('SYMBOL_LEFT', 'SYMBOL_RIGHT')}
Specific example: {'GGP': ('[[ ', ' ]]'), 'CHEBI': ('<< ', ' >>')}

Make sure that left and right symbols are not identical.
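
As a small illustration of how such a dictionary might be used, the sketch below wraps entity text in its symbols and checks that the left and right symbols differ. check_symbols and wrap_entity are hypothetical helpers, not part of bluesearch.

```python
SYMBOLS = {"GGP": ("[[ ", " ]]"), "CHEBI": ("<< ", " >>")}


def check_symbols(symbols):
    """Raise ValueError if any entity type has identical left and right symbols."""
    for etype, (left, right) in symbols.items():
        if left == right:
            raise ValueError(f"Identical symbols for entity type {etype!r}")


def wrap_entity(text, etype, symbols):
    """Surround `text` with the symbols registered for `etype`."""
    left, right = symbols[etype]
    return f"{left}{text}{right}"


check_symbols(SYMBOLS)
wrapped = wrap_entity("DNA polymerase", "GGP", SYMBOLS)
```

Distinct left and right symbols make it possible to tell where an annotated entity starts and where it ends when the sentence is parsed again.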

class StartWithTheSameLetter

Bases: bluesearch.mining.relation.REModel

Check whether two entities start with the same letter (case insensitive).

This relation is symmetric and works on any entity type.

property classes: Names of supported relation classes.

predict_probs(annotated_sentence): Predict probabilities for the relation.

property symbols: Symbols for annotation.
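
The check itself can be sketched without the class machinery. The snippet below assumes both entities are wrapped in the same [[ ... ]] symbols; that choice and the function name start_with_same_letter are assumptions for illustration and may differ from the actual implementation.

```python
import re


def start_with_same_letter(annotated_sentence, left="[[ ", right=" ]]"):
    """Return True if the two annotated entities share a first letter.

    The comparison is case insensitive. Exactly two annotated entities
    are expected in the sentence.
    """
    pattern = re.escape(left) + r"(.+?)" + re.escape(right)
    entities = re.findall(pattern, annotated_sentence)
    if len(entities) != 2:
        raise ValueError("Expected exactly 2 annotated entities")
    first, second = entities
    return first[0].lower() == second[0].lower()


same = start_with_same_letter("[[ Cytarabine ]] inhibits [[ cyclin D ]].")
different = start_with_same_letter("[[ Cytarabine ]] inhibits [[ DNA polymerase ]].")
```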

annotate(doc, sent, ent_1, ent_2, etype_symbols)

Annotate sentence given two entities.

Parameters

doc (spacy.tokens.Doc) – The entire document (input text). Note that spaCy expresses the start and end of sent and of the entities as token positions within this document, not within the sentence.
sent (spacy.tokens.Span) – One sentence from the doc where we look for relations.
ent_1 (spacy.tokens.Span) – The first entity in the sentence. One can get its type by using the label_ attribute.
ent_2 (spacy.tokens.Span) – The second entity in the sentence. One can get its type by using the label_ attribute.
etype_symbols (dict or defaultdict) – Keys represent different entity types (“GGP”, “CHEBI”) and the values are tuples of size 2. Each of these tuples represents the starting and ending symbol to wrap the recognized entity with. Each REModel has the symbols property that encodes how its inputs should be annotated.

Returns

result – String representing an annotated sentence created out of the original one.

Return type

str

Notes

The implementation is non-trivial because an entity can span multiple words.
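
To illustrate why multi-word entities need care, the sketch below annotates a plain string using character offsets rather than spaCy objects. It is a simplified stand-in, not the library's implementation; annotate_by_offsets and its arguments are hypothetical.

```python
def annotate_by_offsets(sentence, ent_1, ent_2, etype_symbols):
    """Wrap two entities given as (start, end, etype) character spans.

    Spans are half-open, [start, end), and must not overlap.
    """
    first, second = sorted([ent_1, ent_2])
    if first[1] > second[0]:
        raise ValueError("Entities must not overlap")

    # Insert symbols from right to left so that the offsets of the
    # earlier entity are still valid after the later one is wrapped.
    result = sentence
    for start, end, etype in (second, first):
        left, right = etype_symbols[etype]
        result = result[:start] + left + result[start:end] + right + result[end:]
    return result


symbols = {"GGP": ("[[ ", " ]]"), "CHEBI": ("<< ", " >>")}
text = "Cytarabine inhibits DNA polymerase."

# "Cytarabine" spans characters 0-10, "DNA polymerase" spans 20-34.
annotated = annotate_by_offsets(text, (0, 10, "CHEBI"), (20, 34, "GGP"), symbols)
```

Treating each entity as a span rather than a single word keeps "DNA polymerase" inside one pair of symbols.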