
Running a Diversity-Based Acquisition Function on Unlabeled Data

In this tutorial, you will see how to run a clustering-based diversity sampling acquisition function to rank the images.

Note: This tutorial assumes that you have installed encord-active.

1. Inference on the unlabeled data

First, stop any running Encord Active app before running the code below. Then get the root directory path of your Encord Active project and assign it to the data_dir variable. You will use the Image Diversity metric to rank the images. This metric clusters the dataset into as many clusters as there are classes in the ontology and then selects an equal number of samples from each cluster. Your project may contain both labeled and unlabeled examples, and you may want to run this acquisition function on the unlabeled data only; therefore, set skip_labeled_data to True.

from pathlib import Path
from encord_active.lib.metrics.semantic.image_diversity import ImageDiversity
from encord_active.lib.metrics.execute import execute_metrics

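# Root directory of your Encord Active project
data_dir = Path("/path/to/encord-active/project")
acquisition_func = ImageDiversity()
# Run only the Image Diversity metric, skipping samples that are already labeled
execute_metrics([acquisition_func], data_dir=data_dir, use_cache_only=True, skip_labeled_data=True)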
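To make the idea behind the metric concrete, here is a minimal sketch of clustering-based diversity sampling in plain NumPy. It is not Encord Active's implementation: it assumes you already have one embedding vector per image, uses k-means (with farthest-first initialization) as the clustering step, and `diversity_sample` is a hypothetical helper. The ranking is built round-robin, one sample per cluster per pass, so every prefix of the ranking draws a nearly equal number of samples from each cluster.

```python
import numpy as np


def diversity_sample(embeddings: np.ndarray, num_clusters: int, n_iter: int = 20, seed: int = 0) -> list[int]:
    """Hypothetical helper: rank samples by round-robin selection over k-means clusters."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    # Farthest-first initialization: start from a random sample, then repeatedly
    # add the sample farthest from all centroids chosen so far
    centroids = [embeddings[rng.integers(n)]]
    for _ in range(1, num_clusters):
        d = np.min([((embeddings - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(embeddings[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance)
        dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        for k in range(num_clusters):
            members = embeddings[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)

    # Final assignment against the converged centroids
    dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)

    # Within each cluster, order members by distance to their centroid (closest first)
    queues = []
    for k in range(num_clusters):
        idx = np.flatnonzero(labels == k)
        queues.append(list(idx[np.argsort(dists[idx, k])]))

    # Round-robin: take one sample from each non-empty cluster per pass
    ranking = []
    while any(queues):
        for q in queues:
            if q:
                ranking.append(int(q.pop(0)))
    return ranking
```

With `num_clusters` set to the number of ontology classes, `ranking[:N]` would give N images spread evenly across the clusters. Ordering each cluster by distance to its centroid is only one choice; a random order within each cluster would also satisfy the equal-per-cluster rule.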

2. Refresh metric files

After the acquisition function finishes, it writes two new files to the metrics folder of the project root. Next, update the project's metric metadata so that the changes appear in the UI:

from encord_active.lib.metrics.io import get_metric_metadata
from encord_active.lib.metrics.metadata import fetch_metrics_meta, update_metrics_meta
from encord_active.lib.project.project_file_structure import ProjectFileStructure

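project_fs = ProjectFileStructure(data_dir)
# Load the existing metric metadata and add an entry for the new metric
metrics_meta = fetch_metrics_meta(project_fs)
metrics_meta[acquisition_func.metadata.title] = get_metric_metadata(acquisition_func)
update_metrics_meta(project_fs, metrics_meta)
# Delete the cached project database file, if it exists, so the app rebuilds it with the new metric
project_fs.db.unlink(missing_ok=True)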
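If you want to confirm which files the acquisition function produced, a small helper can list the files in the metrics folder that changed after a given time. This is a sketch using only `pathlib`; `new_metric_files` is a hypothetical name, and the folder path `data_dir / "metrics"` is an assumption based on the description of the metrics folder above.

```python
from pathlib import Path


def new_metric_files(metrics_dir: Path, since: float) -> list[Path]:
    """Hypothetical helper: files in `metrics_dir` modified at or after POSIX time `since`."""
    if not metrics_dir.is_dir():
        return []
    # Keep regular files whose modification time is not older than `since`
    return sorted(
        p for p in metrics_dir.iterdir()
        if p.is_file() and p.stat().st_mtime >= since
    )
```

For example, record `start = time.time()` before calling `execute_metrics(...)`, then call `new_metric_files(data_dir / "metrics", start)` afterwards. On filesystems with coarse timestamps, subtract a second or two from `start`.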

Now start the Encord Active app by running the following CLI command from the project directory or the directory that contains it:

encord-active start

Go to Data Quality -> Explorer and choose Image Diversity from the metric drop-down menu. The examples are now sorted by the Image Diversity metric. You can then select the first N samples and either:

  1. create a new project from these samples and label them, or
  2. export the selected samples from the Actions tab and use them in your own annotation pipeline.
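If you prefer to pick the first N samples in a script rather than in the UI, a sketch like the following could read the metric's output CSV and return the top-ranked identifiers. It assumes the CSV has `identifier` and `score` columns and that you know which file in the metrics folder belongs to Image Diversity; `top_n_identifiers` is a hypothetical helper, and the sort direction is a parameter because it should match the order you see in the Explorer.

```python
import csv
from pathlib import Path


def top_n_identifiers(metric_csv: Path, n: int, descending: bool = False) -> list[str]:
    """Hypothetical helper: first `n` identifiers of a metric CSV, sorted by score.

    Assumes `identifier` and `score` columns; the sort direction is an assumption.
    """
    with metric_csv.open(newline="") as f:
        rows = list(csv.DictReader(f))
    # Sort numerically by score; ties keep their original file order
    rows.sort(key=lambda r: float(r["score"]), reverse=descending)
    return [r["identifier"] for r in rows[:n]]
```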