Getting Started with VISSL
This document gives a brief introduction to using the built-in command-line tools provided by VISSL.
Quick Start with VISSL
We provide a quick overview of training a SimCLR self-supervised model on 1 GPU with VISSL.
Install VISSL
For installation, please follow our installation instructions.
Set up the dataset
We will use the ImageNet-1K dataset and assume the downloaded data is laid out as follows:
imagenet_full_size
|_ train
| |_ <n0......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-N-name>.JPEG
| |_ ...
| |_ <n1......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-M-name>.JPEG
| | |_...
| | |_...
|_ val
| |_ <n0......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-N-name>.JPEG
| |_ ...
| |_ <n1......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-M-name>.JPEG
| | |_...
| | |_...
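Before launching a run, it can help to confirm that your copy matches this layout. The sketch below uses only the Python standard library; `summarize_split` is a hypothetical helper (not part of VISSL) that counts the class folders and `.JPEG` files under one split, assuming one sub-folder per class as in the tree above. Matching the extension case-insensitively is an assumption of the sketch.

```python
from pathlib import Path


def summarize_split(root, split):
    """Return (num_class_folders, num_images) for root/split.

    Hypothetical helper: assumes one sub-folder per class, each holding
    that class's images; files ending in .JPEG (any case) are counted.
    """
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split folder: {split_dir}")
    # Every directory directly under the split is treated as one class.
    class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
    # Count image files inside each class folder, ignoring other files.
    num_images = sum(
        1
        for class_dir in class_dirs
        for f in class_dir.iterdir()
        if f.is_file() and f.suffix.lower() == ".jpeg"
    )
    return len(class_dirs), num_images
```

For example, `summarize_split("/path/to/imagenet_full_size", "train")` returns a `(classes, images)` pair; with the full ImageNet-1K dataset organized as above, each split should report 1000 class folders.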
Running SimCLR Pre-training on 1 GPU
If VISSL is built from source
We provide a config to train a ResNet-50 model with the SimCLR pretext task.
Change DATA.TRAIN.DATA_PATHS to point to your ImageNet train dataset folder:
python3 run_distributed_engines.py \
hydra.verbose=true \
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config=test/integration_test/quick_simclr \
config.CHECKPOINT.DIR="./checkpoints" \
config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
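If you prefer to launch the run from a Python script, the command above can be assembled as an argument list. This is a minimal sketch: `build_simclr_args` is a hypothetical helper that only reproduces the overrides shown above and does not validate them against VISSL. Because no shell is involved, it passes the values without the quotes the shell would strip (assumption: your paths contain no spaces or commas).

```python
def build_simclr_args(train_path, config_name,
                      checkpoint_dir="./checkpoints",
                      script="run_distributed_engines.py"):
    """Return the argv for the SimCLR command shown above.

    Hypothetical helper: it just assembles the Hydra-style overrides
    used in this guide; it does not check them against VISSL.
    """
    return [
        "python3", script,
        "hydra.verbose=true",
        "config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder]",
        "config.DATA.TRAIN.DATA_SOURCES=[disk_folder]",
        # Without a shell, pass the list as the shell would deliver it,
        # i.e. with the inner quotes already removed.
        f"config.DATA.TRAIN.DATA_PATHS=[{train_path}]",
        f"config={config_name}",
        f"config.CHECKPOINT.DIR={checkpoint_dir}",
        "config.TENSORBOARD_SETUP.USE_TENSORBOARD=true",
    ]
```

For example, `subprocess.run(build_simclr_args("/path/to/my/imagenet/folder/train", "test/integration_test/quick_simclr"), check=True)` would start the same run from the directory containing `run_distributed_engines.py`.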
If using pre-built conda/pip VISSL packages
Users need to set up the dataset and obtain the built-in training tool. Follow these steps:
Step 1: Set up the ImageNet-1K dataset
If you installed the pre-built VISSL packages, set up the ImageNet-1K dataset as described in our data documentation and tutorial. Note that the dataset must be registered with VISSL.
In your Python interpreter:
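>>> import os
>>> from vissl.utils.io import save_file
>>> # Replace <img_path> and <lbl_path> with the paths for each split.
>>> json_data = {
...     "imagenet1k_folder": {
...         "train": ["<img_path>", "<lbl_path>"],
...         "val": ["<img_path>", "<lbl_path>"]
...     }
... }
>>> # Create the target folder first; Step 2 below reuses it.
>>> os.makedirs("/tmp/configs/config", exist_ok=True)
>>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json")
>>> # Import the catalog after writing the file so it can pick the file up.
>>> from vissl.data.dataset_catalog import VisslDatasetCatalog
>>> print(VisslDatasetCatalog.list())
['imagenet1k_folder']
>>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
{'train': ['<img_path>', '<lbl_path>'], 'val': ['<img_path>', '<lbl_path>']}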
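To catch a malformed catalog before VISSL reads it, you can check the JSON file with the standard library alone. This is a sketch under an assumption: `load_catalog` is a hypothetical helper, and the shape it enforces, `{dataset_name: {split: [img_path, lbl_path]}}`, is taken from the example above; VISSL itself may accept other shapes.

```python
import json


def load_catalog(path):
    """Load a dataset catalog JSON file and check its shape.

    Hypothetical helper. Expected shape, mirroring the example above:
    {dataset_name: {split: [img_path, lbl_path], ...}, ...}
    """
    with open(path) as f:
        catalog = json.load(f)
    if not isinstance(catalog, dict):
        raise ValueError("catalog must be a JSON object keyed by dataset name")
    for name, splits in catalog.items():
        if not isinstance(splits, dict):
            raise ValueError(f"{name}: expected an object mapping split -> paths")
        for split, paths in splits.items():
            # Each split is expected to hold exactly [img_path, lbl_path].
            if not (isinstance(paths, list) and len(paths) == 2):
                raise ValueError(f"{name}/{split}: expected [img_path, lbl_path]")
    return catalog
```

For example, `load_catalog("/tmp/configs/config/dataset_catalog.json")` returns the parsed catalog or raises `ValueError` naming the offending entry.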
Step 2: Get the built-in tool and YAML config file
We will use the pre-built VISSL training tool, run_distributed_engines.py, together with the config file. Run:
cd /tmp/ && mkdir -p /tmp/configs/config
wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
wget -q https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py
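On systems without `wget`, the same three files can be fetched with Python's `urllib`. This sketch mirrors the URLs and destination paths of the commands above; `fetch_tutorial_files` and `DOWNLOADS` are hypothetical names, and the `fetch` parameter exists only so the function can be exercised without network access.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://dl.fbaipublicfiles.com/vissl/tutorials"

# (remote path under BASE_URL, local path relative to root),
# mirroring the three wget commands above.
DOWNLOADS = [
    ("configs/__init__.py", "configs/__init__.py"),
    ("configs/quick_1gpu_resnet50_simclr.yaml",
     "configs/config/quick_1gpu_resnet50_simclr.yaml"),
    ("run_distributed_engines.py", "run_distributed_engines.py"),
]


def fetch_tutorial_files(root="/tmp", fetch=urlretrieve):
    """Download the tutorial files under root, creating folders as needed.

    fetch(url, dest) defaults to urllib's urlretrieve.
    """
    root = Path(root)
    written = []
    for remote, local in DOWNLOADS:
        dest = root / local
        # Equivalent of `mkdir -p` for the destination folder.
        dest.parent.mkdir(parents=True, exist_ok=True)
        fetch(f"{BASE_URL}/{remote}", str(dest))
        written.append(dest)
    return written
```

Calling `fetch_tutorial_files()` with no arguments writes the files under `/tmp`, matching the layout that the training step expects.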
Step 3: Train
cd /tmp/
python3 run_distributed_engines.py \
hydra.verbose=true \
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config=quick_1gpu_resnet50_simclr \
config.CHECKPOINT.DIR="./checkpoints" \
config.TENSORBOARD_SETUP.USE_TENSORBOARD=true