spaCy is a popular “industrial-strength” NLP library: fast, accurate models with a minimum of fuss. As of spaCy v3, W&B can now be used with spacy train to track your spaCy model’s training metrics as well as to save and version your models and datasets. And all it takes is a few added lines in your configuration.
Sign up and create an API key
An API key authenticates your machine to W&B. You can generate an API key from your user profile:
- Click your user profile icon in the upper right corner.
- Select User Settings, then scroll to the API Keys section.
For a more streamlined approach, create an API key by going directly to User Settings. Copy the newly created API key immediately and save it in a secure location such as a password manager.
Install the wandb library and log in
To install the wandb library locally and log in:
- Set the WANDB_API_KEY environment variable to your API key.
- Install the wandb library and log in.
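From a terminal, the two steps above usually amount to exporting WANDB_API_KEY, then running `pip install wandb` and `wandb login`. The same flow can be sketched from Python. The helper names `api_key_configured` and `login_to_wandb` below are hypothetical and exist only for illustration. The sketch assumes the key has already been exported in the current environment.

```python
import os


def api_key_configured(env=None):
    """Return True if WANDB_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())


def login_to_wandb():
    """Log in with the key from the environment (needs `pip install wandb`)."""
    if not api_key_configured():
        raise RuntimeError("Set WANDB_API_KEY before logging in.")
    import wandb  # imported lazily so the check above works without wandb

    # wandb.login() picks up WANDB_API_KEY from the environment.
    return wandb.login()


# Usage (after installing wandb and exporting the key):
# login_to_wandb()
```

Checking the environment variable first gives a clear error message instead of an interactive login prompt, which tends to be more convenient in scripts and CI jobs.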
Add the WandbLogger to your spaCy config file
spaCy config files specify all aspects of training, not just logging: GPU allocation, optimizer choice, dataset paths, and more. At a minimum, under [training.logger] you need to provide the key @loggers with the value "spacy.WandbLogger.v3", plus a project_name.
For more on how spaCy training config files work and on other options you can pass in to customize training, check out spaCy’s documentation.
| Name | Description |
|---|---|
| project_name | str. The name of the W&B Project. The project will be created automatically if it doesn’t exist yet. |
| remove_config_values | List[str]. A list of values to exclude from the config before it is uploaded to W&B. [] by default. |
| model_log_interval | Optional int. If set, enables model versioning with Artifacts. Pass in the number of steps to wait between logging model checkpoints. None by default. |
| log_dataset_dir | Optional str. If passed a path, the dataset will be uploaded as an Artifact at the beginning of training. None by default. |
| entity | Optional str. If passed, the run will be created in the specified entity. |
| run_name | Optional str. If specified, the run will be created with the specified name. |
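As an illustration, a [training.logger] section using the two required keys and a few of the optional ones might look like the fragment embedded in the sketch below. The project name, excluded config values, dataset directory, and checkpoint interval are placeholder values. The `parse_logger_section` helper is hypothetical: it uses Python's standard configparser and json modules as a rough well-formedness check, and it is not spaCy's own config loader.

```python
import configparser
import json

# Example logger section for a spaCy config.cfg. All values are placeholders.
WANDB_LOGGER_SECTION = """
[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "my_spacy_project"
remove_config_values = ["paths.train", "paths.dev"]
log_dataset_dir = "./corpus"
model_log_interval = 1000
"""


def parse_logger_section(text):
    """Parse the fragment and JSON-decode each value as a rough sanity check."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    section = parser["training.logger"]
    return {key: json.loads(value) for key, value in section.items()}


settings = parse_logger_section(WANDB_LOGGER_SECTION)
print(settings["@loggers"], settings["project_name"])
```

Only @loggers and project_name are needed to enable logging. The remaining keys can be dropped, and the section itself is meant to be pasted into your existing config file.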
Start training
Once you have added the WandbLogger to your spaCy training config, you can run spacy train as usual.
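A typical invocation passes the config file, an output directory, and the training and development data paths to spacy train. The sketch below builds that command and runs it as a subprocess. The file names (config.cfg, ./output, ./train.spacy, ./dev.spacy) are assumed placeholders, and the helpers `build_train_command` and `run_training` are hypothetical names used only here.

```python
import subprocess
import sys


def build_train_command(config_path, output_dir, train_path, dev_path):
    """Build the `python -m spacy train` command as an argument list."""
    return [
        sys.executable, "-m", "spacy", "train", str(config_path),
        "--output", str(output_dir),
        "--paths.train", str(train_path),
        "--paths.dev", str(dev_path),
    ]


def run_training(config_path, output_dir, train_path, dev_path):
    """Run training; raises CalledProcessError if spaCy exits with an error."""
    cmd = build_train_command(config_path, output_dir, train_path, dev_path)
    return subprocess.run(cmd, check=True)


# Usage (requires spaCy, wandb, and the referenced files to exist):
# run_training("config.cfg", "./output", "./train.spacy", "./dev.spacy")
```

From a terminal, the same command can be typed directly. In a notebook it can be prefixed with `!`.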