Quick Start¶
If your goal is to use Saber to annotate biomedical text, then you can either use the web-service or a pre-trained model. If you simply want to check Saber out, without installing anything locally, try the Google Colaboratory notebook.
Google Colaboratory¶
The fastest way to check out Saber is by following along with the Google Colaboratory notebook (). In order to be able to run the cells, select "Open in Playground" or, alternatively, save a copy to your own Google Drive account (File > Save a copy in Drive).
Web-service¶
To use Saber as a local web-service, run
(saber) $ python -m saber.cli.app
or, if you prefer, you can pull & run the Saber image from Docker Hub
# Pull Saber image from Docker Hub $ docker pull pathwaycommons/saber # Run docker (use `-dt` instead of `-it` to run container in background) $ docker run -it --rm -p 5000:5000 --name saber pathwaycommons/saber
Tip
Alternatively, you can clone the GitHub repository and build the container from the Dockerfile
with docker build -t saber .
There are currently two endpoints, /annotate/text
and /annotate/pmid
. Both expect a POST
request with a JSON payload, e.g.
{ "text": "The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53." }
or
{ "pmid": 11835401 }
For example, with the web-service running locally
curl -X POST 'http://localhost:5000/annotate/text' \ --data '{"text": 'The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.'}'
import requests # assuming you have requests package installed! url = "http://localhost:5000/annotate/pmid" payload = {"text": "The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53."} response = requests.post(url, json=payload) print(response.text) print(response.status_code, response.reason)
Warning
The first request to the web-service will be slow (~60s). This is because a large language model needs to be loaded into memory.
Documentation for the Saber web-service API can be found here. We hope to provide a live version of the web-service soon!
Pre-trained models¶
First, import Saber
. This class coordinates training, annotation, saving and loading of models and datasets. In short, this is the interface to Saber.
from saber.saber import Saber
To load a pre-trained model, first create a Saber
object
saber = Saber()
and then load the model of our choice
saber.load('PRGE')
Tip
See Resources: Pre-trained models for pre-trained model names and details. You will need an internet connection to download a pre-trained model.
To annotate text with the model, just call the Saber.annotate()
method
saber.annotate("The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.")
Warning
The Saber.annotate()
method will be slow the first time you call it (~60s). This is because a large language model needs to be loaded into memory.
Coreference Resolution¶
Coreference occurs when two or more expressions in a text refer to the same person or thing, that is, they have the same referent. Take the following example:
"IL-6 supports tumour growth and metastasising in terminal patients, and it significantly engages in cancer cachexia (including anorexia) and depression associated with malignancy."
Clearly, "it" referes to "IL-6". If we do not resolve this coreference, then "it" will not be labeled as an entity and any relation or event it is mentioned in will not be extracted. Saber uses NeuralCoref, a state-of-the-art coreference resolution tool based on neural nets and built on top of Spacy. To use it, just supply the argument coref=True
(which is False
by default) to the Saber.annotate()
method
text = "IL-6 supports tumour growth and metastasising in terminal patients, and it significantly engages in cancer cachexia (including anorexia) and depression associated with malignancy." # WITHOUT coreference resolution saber.annotate(text, coref=False) # WITH coreference resolution saber.annotate(text, coref=True)
Note
If you are using the web-service, simply supply "coref": true
in your JSON
payload to resolve coreferences.
Saber currently takes the simplest possible approach: replace all coreference mentions with their referent, and then feed the resolved text to the model that identifies named entities.
Grounding¶
Grounding (sometimes called entity linking or normalization) involves mapping each annotated entity to a unique identifier in an external resource such as a database or ontology. To ground entities in a call to Saber.annotate()
, simply pass the argument ground=True
saber.annotate('The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.', ground=True)
The grounding functionality is implemented by the EXTRACT 2.0 API. Note that you will need an internet connection or grounding will fail. Also note that Saber.annotate()
will take slightly longer to return a response when ground=True
(up to a few seconds).
See Resources: Pre-trained models for a list of the the external resources each entity type (annotated by the pre-trained models) is grounded to.
Note
If you are using the web-service, simply supply "ground": true
in your JSON
payload to ground entities.
Working with annotations¶
The Saber.annotate()
method returns a simple dict
object
ann = saber.annotate("The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.")
which contains the keys title
, text
and ents
title
: contains the title of the article, if providedtext
: contains the text (which is minimally processed) the model was deployed onents
: contains a list of entities present in thetext
that were annotated by the model
For example, to see all entities annotated by the model, call
ann['ents']
Converting annotations to JSON¶
The Saber.annotate()
method returns a dict
object, but can be converted to a JSON
formatted string for ease-of-use in downstream applications
import json # convert to json object json_ann = json.dumps(ann) # convert back to python dictionary ann = json.loads(json_ann)