Guide to the Saber API¶
You can interact with Saber as a web-service (explained in Quick Start: Web-service), command line tool, or as a python package. If you created a virtual environment, remember to activate it first.
Command line tool¶
Currently, the command line tool simply trains the model. To use it, call
(saber) $ python -m saber.cli.train
along with any command line arguments. For example, to train the model on the NCBI Disease corpus
(saber) $ python -m saber.cli.train --dataset_folder NCBI_Disease_BIO
Tip
See Resources: Datasets for help preparing datasets and word embeddings for training.
Run python -m saber.cli.train --help
to see all possible arguments.
Of course, supplying arguments at the command line can quickly become cumbersome. Saber also allows you to provide a configuration file, which can be specified like so
(saber) $ python -m saber.cli.train --config_filepath path/to/config.ini
Copy the contents of the default config file to a new *.ini
file in order to get started.
Note
Arguments supplied at the command line overwrite those found in the configuration file, e.g.,
(saber) $ python -m saber.cli.train --dataset_folder path/to/dataset --k_folds 10
would overwrite the arguments for dataset_folder
and k_folds
found in the configuration file.
Python package¶
You can also import Saber and interact with it as a python package. Saber exposes its functionality through the Saber
class. Here is just about everything Saber does in one script:
from saber.saber import Saber # First, create a Saber object, which exposes Sabers functionality saber = Saber() # Load a dataset and create a model (provide a list of datasets to use multi-task learning!) saber.load_dataset('path/to/datasets/GENIA') saber.build(model_name='MT-LSTM-CRF') # Train and save a model saber.train() saber.save('pretrained_models/GENIA') # Load a model del saber saber = Saber() saber.load('pretrained_models/GENIA') # Perform prediction on raw text, get resulting annotation raw_text = 'The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.' annotation = saber.annotate(raw_text) # Use transfer learning to continue training on a new dataset saber.load_dataset('path/to/datasets/CRAFT') saber.train()
Transfer learning¶
Transfer learning is as easy as training, saving, loading, and then continuing training of a model. Here is an example
# Create and train a model on GENIA corpus saber = Saber() saber.load_dataset('path/to/datasets/GENIA') saber.build(model_name='MT-LSTM-CRF') saber.train() saber.save('pretrained_models/GENIA') # Load that model del saber saber = Saber() saber.load('pretrained_models/GENIA') # Use transfer learning to continue training on a new dataset saber.load_dataset('path/to/datasets/CRAFT') saber.train()
Note
This is currently only supported by the mt-lstm-crf
model.
Multi-task learning¶
Multi-task learning is as easy as specifying multiple dataset paths, either in the config
file, at the command line via the flag --dataset_folder
, or as an argument to load_dataset()
. The number of datasets is arbitrary.
Here is an example using the last method
saber = Saber() # Simply pass multiple dataset paths as a list to load_dataset to use multi-task learning. saber.load_dataset(['path/to/datasets/NCBI_Disease', 'path/to/datasets/Linnaeus']) saber.build(model_name='MT-LSTM-CRF') saber.train()
Note
This is currently only supported by the mt-lstm-crf
model.
Training on GPUs¶
Saber will automatically train on as many GPUs as are available. In order for this to work, you must have CUDA and, optionally, CudDNN installed. If you are using conda to manage your environment, then these are installed for you when you call
(saber) $ conda install tensorflow-gpu
Otherwise, install them yourself and use pip
to install tensorflow-gpu
(saber) $ pip install tensorflow-gpu
To control which GPUs Saber trains on, you can use the CUDA_VISIBLE_DEVICES
environment variable, e.g.,
# To train exclusively on CPU (saber) $ CUDA_VISIBLE_DEVICES="" python -m saber.cli.train # To train on 1 GPU with ID=0 (saber) $ CUDA_VISIBLE_DEVICES="0" python -m saber.cli.train # To train on 2 GPUs with IDs=0,2 (saber) $ CUDA_VISIBLE_DEVICES="0,2" python -m saber.cli.train
Tip
You can get information about your NVIDIA GPUs by typing nvidia-smi
at the command line (assuming the GPUs are setup properly and the nvidia driver is installed).
Saving and loading models¶
In the following sections we introduce the saving and loading of models.
Saving a model¶
Assuming the model has already been created (see above), we can easily save our model like so
save_dir = 'path/to/pretrained_models/mymodel' saber.save(save_dir)
Loading a model¶
Lets illustrate loading a model with a new Saber
object
# Delete our previous Saber object (if it exists) del saber # Create a new Saber object saber = Saber() # Load a previous model saber.load(path_to_saved_model)