Running a Diversity-Based Acquisition Function on Unlabeled Data
In this tutorial, you will see how to run a clustering-based diversity sampling acquisition function to rank the images.
This tutorial assumes that you have installed encord-active.
1. Inference on the unlabeled data
First, terminate any running Encord-Active app before running any code. Then, get the root directory path of your encord-active project and assign it to the data_dir variable below. You will use the Image Diversity metric to rank the images. This metric clusters the dataset into as many clusters as there are classes in the ontology and selects an equal number of samples from each cluster. Your project may contain both labeled and unlabeled examples, and you may want to run this acquisition function only on the unlabeled data; to do so, set skip_labeled_data to True.
from pathlib import Path
from encord_active.lib.metrics.semantic.image_diversity import ImageDiversity
from encord_active.lib.metrics.execute import execute_metrics
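# Root directory of your encord-active project
data_dir = Path("/path/to/encord-active/project")
# Acquisition function used to rank the images
acquisition_func = ImageDiversity()
# Run the acquisition function on the unlabeled data only
execute_metrics([acquisition_func], data_dir=data_dir, use_cache_only=True, skip_labeled_data=True)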
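To make the description of Image Diversity above more concrete, here is a minimal NumPy sketch of clustering-based diversity ranking. It is not Encord Active's implementation. It assumes you already have one embedding vector per image (the embeddings array is hypothetical), uses plain k-means with k equal to the number of ontology classes, and orders samples inside each cluster by distance to the centroid, which is also an assumption. Visiting the clusters in turn (round-robin) means every prefix of the ranking draws roughly equal numbers of samples from each cluster.

```python
import numpy as np


def _assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return the index of the nearest centroid for every row of x."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    if k > len(x):
        raise ValueError("k cannot exceed the number of samples")
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct samples.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(x, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return _assign(x, centroids), centroids


def diversity_rank(embeddings: np.ndarray, n_classes: int, seed: int = 0) -> list[int]:
    """Rank sample indices by taking one sample from each cluster in turn."""
    labels, centroids = kmeans(embeddings, n_classes, seed=seed)
    queues = []
    for j in range(n_classes):
        members = np.flatnonzero(labels == j)
        # Assumption: within a cluster, samples closest to the centroid come first.
        dist = np.linalg.norm(embeddings[members] - centroids[j], axis=1)
        queues.append([int(i) for i in members[np.argsort(dist)]])
    ranking = []
    # Round-robin over clusters until every cluster is exhausted.
    while any(queues):
        for queue in queues:
            if queue:
                ranking.append(queue.pop(0))
    return ranking
```

For example, diversity_rank(embeddings, n_classes=5)[:20] would return 20 indices, four from each cluster, provided every cluster has at least four members. To mirror skip_labeled_data=True, you would pass in only the embeddings of unlabeled images.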
2. Refresh metric files
After the acquisition function finishes, it should have written two new files to the metrics folder of the project root. Update the metric information in the project so that the changes are reflected in the UI:
from encord_active.lib.metrics.io import get_metric_metadata
from encord_active.lib.metrics.metadata import fetch_metrics_meta, update_metrics_meta
from encord_active.lib.project.project_file_structure import ProjectFileStructure
project_fs = ProjectFileStructure(data_dir)
metrics_meta = fetch_metrics_meta(project_fs)
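# Register the new metric's metadata under its title
metrics_meta[acquisition_func.metadata.title] = get_metric_metadata(acquisition_func)
update_metrics_meta(project_fs, metrics_meta)
# Delete the project's database file, if it exists
project_fs.db.unlink(missing_ok=True)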
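Optionally, you can confirm that step 1 wrote new files by comparing the contents of the metrics folder before and after execute_metrics runs. This sketch assumes the folder is data_dir / "metrics" (the text above only says "the metrics folder of the root project folder"), and the snapshot and new_files helpers are hypothetical, not part of encord-active. The demo runs on a temporary folder so the block is self-contained.

```python
import tempfile
from pathlib import Path


def snapshot(metrics_dir: Path) -> set[str]:
    """Return the names of the files currently in metrics_dir (empty if it does not exist)."""
    if not metrics_dir.is_dir():
        return set()
    return {p.name for p in metrics_dir.iterdir() if p.is_file()}


def new_files(before: set[str], after: set[str]) -> list[str]:
    """Names present in `after` but not in `before`, sorted for stable output."""
    return sorted(after - before)


if __name__ == "__main__":
    # Demo on a temporary folder instead of a real project.
    with tempfile.TemporaryDirectory() as tmp:
        metrics_dir = Path(tmp) / "metrics"
        metrics_dir.mkdir()
        before = snapshot(metrics_dir)
        (metrics_dir / "example_metric.csv").write_text("")  # stand-in for a new metric file
        print(new_files(before, snapshot(metrics_dir)))  # ['example_metric.csv']
```

In a real project, you would call snapshot(data_dir / "metrics") just before the execute_metrics call and again afterwards; new_files should then list the two new files.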
Now, open the encord-active app by running the following CLI command in the project folder or its root folder:
encord-active start
Go to Data Quality -> Explorer and choose Image Diversity from the metric drop-down menu. You will see the examples sorted by the image diversity function. You can then select the first N samples and:
- create a new project based on these samples and label them.
- export the selected samples using the Actions tab and use them in your own annotation pipeline.
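If you already have the ranked sample identifiers outside the app, selecting the first N is just a slice of that ranking. The sketch below writes them to a CSV file that an external annotation pipeline could consume. The export_top_n helper, the identifier strings, and the CSV layout are all hypothetical and are not an encord-active export format.

```python
import csv
from pathlib import Path
from typing import Sequence


def export_top_n(ranked_ids: Sequence[str], n: int, out_path: Path) -> list[str]:
    """Write the first n identifiers of a diversity ranking to a CSV file and return them."""
    if n < 0:
        raise ValueError("n must be non-negative")
    top = list(ranked_ids[:n])
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["rank", "identifier"])
        # Ranks start at 1 so the first row is the most diverse pick.
        for rank, identifier in enumerate(top, start=1):
            writer.writerow([rank, identifier])
    return top
```

Keeping the rank column in the file preserves the diversity order, so a downstream tool can still label the most diverse samples first.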