Guide to the Saber API
Command line tool
Currently, the command line tool simply trains the model. To use it, call
(saber) $ python -m saber.cli.train
along with any command line arguments. For example, to train the model on the NCBI Disease corpus
(saber) $ python -m saber.cli.train --dataset_folder NCBI_Disease_BIO
See Resources: Datasets for help preparing datasets and word embeddings for training.
Run python -m saber.cli.train --help to see all possible arguments.
Of course, supplying arguments at the command line can quickly become cumbersome. Saber also allows you to provide a configuration file, which can be specified like so
(saber) $ python -m saber.cli.train --config_filepath path/to/config.ini
To get started, copy the contents of the default config file to a new *.ini file.
Arguments supplied at the command line overwrite those found in the configuration file, e.g.,
(saber) $ python -m saber.cli.train --dataset_folder path/to/dataset --k_folds 10
would overwrite the values of dataset_folder and k_folds found in the configuration file.
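For illustration, a minimal configuration file might look something like the following (the section headers and any keys other than dataset_folder and k_folds are assumptions here; the default config file that ships with Saber is the authoritative reference):
; my_config.ini -- a hypothetical minimal configuration
[mode]
model_name = MT-LSTM-CRF

[data]
dataset_folder = path/to/dataset

[training]
k_folds = 5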
Python package
You can also import Saber and interact with it as a Python package. Saber exposes its functionality through the
Saber class. Here is just about everything Saber does in one script:
from saber.saber import Saber

# First, create a Saber object, which exposes Saber's functionality
saber = Saber()

# Load a dataset and create a model (provide a list of datasets to use multi-task learning!)
saber.load_dataset('path/to/datasets/GENIA')
saber.build(model_name='MT-LSTM-CRF')

# Train and save a model
saber.train()
saber.save('pretrained_models/GENIA')

# Load a model
del saber
saber = Saber()
saber.load('pretrained_models/GENIA')

# Perform prediction on raw text, get resulting annotation
raw_text = 'The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.'
annotation = saber.annotate(raw_text)

# Use transfer learning to continue training on a new dataset
saber.load_dataset('path/to/datasets/CRAFT')
saber.train()
Transfer learning
Transfer learning is as easy as training, saving, loading, and then continuing training of a model. Here is an example:
# Create and train a model on the GENIA corpus
saber = Saber()
saber.load_dataset('path/to/datasets/GENIA')
saber.build(model_name='MT-LSTM-CRF')
saber.train()
saber.save('pretrained_models/GENIA')

# Load that model
del saber
saber = Saber()
saber.load('pretrained_models/GENIA')

# Use transfer learning to continue training on a new dataset
saber.load_dataset('path/to/datasets/CRAFT')
saber.train()
This is currently only supported by the MT-LSTM-CRF model.
Multi-task learning
Multi-task learning is as easy as specifying multiple dataset paths, either in the
config file, at the command line via the flag
--dataset_folder, or as an argument to
load_dataset(). The number of datasets is arbitrary.
Here is an example using the last method
saber = Saber()

# Simply pass multiple dataset paths as a list to load_dataset to use multi-task learning
saber.load_dataset(['path/to/datasets/NCBI_Disease', 'path/to/datasets/Linnaeus'])
saber.build(model_name='MT-LSTM-CRF')
saber.train()
This is currently only supported by the MT-LSTM-CRF model.
Training on GPUs
Saber will automatically train on as many GPUs as are available. In order for this to work, you must have CUDA and, optionally, cuDNN installed. If you are using conda to manage your environment, then these are installed for you when you call
(saber) $ conda install tensorflow-gpu
Otherwise, install CUDA and cuDNN yourself and then use pip to install TensorFlow with GPU support:
(saber) $ pip install tensorflow-gpu
To control which GPUs Saber trains on, you can use the
CUDA_VISIBLE_DEVICES environment variable, e.g.,
# To train exclusively on CPU
(saber) $ CUDA_VISIBLE_DEVICES="" python -m saber.cli.train

# To train on 1 GPU with ID=0
(saber) $ CUDA_VISIBLE_DEVICES="0" python -m saber.cli.train

# To train on 2 GPUs with IDs=0,2
(saber) $ CUDA_VISIBLE_DEVICES="0,2" python -m saber.cli.train
You can get information about your NVIDIA GPUs by typing
nvidia-smi at the command line (assuming the GPUs are set up properly and the NVIDIA driver is installed).
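If you want to confirm from within Python that TensorFlow can actually see your GPUs, a quick check along these lines may help (this is plain TensorFlow, not part of Saber's API, and the call shown assumes a TensorFlow 1.x-style install):
# Quick sanity check: list the GPUs TensorFlow can see (TensorFlow 1.x-style API)
from tensorflow.python.client import device_lib

gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU']
print(gpus)  # e.g. ['/device:GPU:0', '/device:GPU:1'] if two GPUs are visible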
Saving and loading models
The following sections describe how to save and load models.
Saving a model
Assuming the model has already been created (see above), we can easily save it like so:
save_dir = 'path/to/pretrained_models/mymodel'
saber.save(save_dir)
Loading a model
Let's illustrate loading a model with a new Saber object:
# Delete our previous Saber object (if it exists)
del saber

# Create a new Saber object
saber = Saber()

# Load a previously saved model (here, the path we saved to above)
path_to_saved_model = 'path/to/pretrained_models/mymodel'
saber.load(path_to_saved_model)
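Once loaded, the model can be used exactly as before, e.g. to annotate raw text:
# Perform prediction on raw text with the loaded model
annotation = saber.annotate('The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.')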