DeepProfiler

Important

v0.5.1 is a focused maintenance release. Model training, the plugin system, and CometML integration have been removed. The only supported use case is feature extraction using the Cell Painting CNN v1 checkpoint (EfficientNet B0). This release requires Python 3.10–3.11 and TensorFlow 2.10–2.15. If you need training or used the plugin system, please use the v0.3.0 tag and open an issue to let us know your use case. See ROADMAP.md for the full plan.

Image-based profiling using deep learning

DeepProfiler is a set of tools that use deep learning to analyze imaging data from high-throughput biological experiments. Please see the DeepProfiler Handbook for details on how to use it, and the DeepProfilerExperiments repository for example configuration files and downstream analysis.

Check out our Nature Communications paper.

Cell Painting CNN

Cell Painting CNN weights are available on Zenodo.

We used DeepProfiler to train a feature extraction model for single cells in Cell Painting experiments. The model brings state-of-the-art profiling performance for downstream analysis tasks. This model is an EfficientNet trained to process the 5 channels of the Cell Painting assay and produce single-cell morphology embeddings, which can be aggregated to profile treatments in large-scale experiments. Features obtained with the Cell Painting CNN are more robust and improve performance.

Quick Guide

System requirements

  • Python 3.10 or 3.11
  • TensorFlow 2.10–2.15
  • Linux (Ubuntu 20.04+) recommended
  • For GPU acceleration, a CUDA-compatible GPU is recommended

Install

pip install deepprofiler

Or run directly without any environment setup using uvx — it handles installation automatically in an isolated environment:

uvx deepprofiler --root=/path/to/project --config=config.json profile

For contributing, see CONTRIBUTING.md.

Download example data

This repository contains example data structured as a DeepProfiler project. Unpack it with:

tar -xzf example_data.tar.gz

Profiling with the Cell Painting CNN-1

The only supported use case in v0.5.1 is feature extraction using the Cell Painting CNN v1 checkpoint — an EfficientNet B0 trained on 5-channel Cell Painting images (DNA, ER, RNA, AGP, Mito).

How inference works:

  1. DeepProfiler reads a metadata CSV listing your images and a locations CSV with per-image cell coordinates (e.g. from CellProfiler nucleus segmentation).
  2. For each image, it crops a fixed-size patch around each cell centroid.
  3. The crops are passed through the EfficientNet B0 backbone; the GlobalAveragePooling2D layer (pool5) produces a 1280-dimensional embedding per cell.
  4. Embeddings are written to .npz files (one per image) containing a features array of shape (num_cells, 1280) alongside metadata and crop coordinates.
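Step 2 above can be sketched in a few lines of NumPy. This is an illustration only, not DeepProfiler's own cropping code: the function name crop_cells, the 96-pixel box size, and the zero-padding of border cells are all assumptions made for the example.

```python
import numpy as np

def crop_cells(image, centroids, box_size=96):
    """Cut a box_size x box_size patch centred on each (x, y) centroid.

    image: array of shape (height, width, channels).
    centroids: iterable of (x, y) pixel coordinates.
    Returns an array of shape (num_cells, box_size, box_size, channels).
    """
    half = box_size // 2
    # Zero-pad so patches near the border keep the full size (an assumption;
    # the real tool may treat border cells differently).
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="constant")
    crops = []
    for x, y in centroids:
        # After padding, original pixel (y, x) sits at (y + half, x + half),
        # so the patch starts at row y, column x of the padded image.
        row, col = int(round(y)), int(round(x))
        crops.append(padded[row:row + box_size, col:col + box_size, :])
    return np.stack(crops)

# Synthetic 5-channel image and three hypothetical cell centroids.
image = np.random.default_rng(0).random((520, 696, 5)).astype(np.float32)
crops = crop_cells(image, [(10, 10), (348, 260), (690, 515)])
print(crops.shape)  # (3, 96, 96, 5)
```

Each crop has the same shape regardless of where the cell sits in the field, which is what lets the crops be batched through the network.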

These per-cell .npz files can be aggregated with pycytominer for downstream analysis.
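As a rough illustration of what aggregation means here, the sketch below reads the features array from a few .npz files and collapses all cells into one median profile. It uses plain NumPy rather than pycytominer, the helper names are hypothetical, and the files are synthetic stand-ins created on the fly. Only the features key and the 1280-dimensional shape come from the description above.

```python
import os
import tempfile
import numpy as np

def load_features(npz_path):
    """Return the (num_cells, 1280) features array stored in one .npz file."""
    with np.load(npz_path) as data:
        return data["features"]

def aggregate_well(npz_paths):
    """Pool the cells from several images and take the per-feature median."""
    cells = np.concatenate([load_features(p) for p in npz_paths], axis=0)
    return np.median(cells, axis=0)

# Synthetic stand-ins for two per-image output files from the same well.
rng = np.random.default_rng(0)
tmp_dir = tempfile.mkdtemp()
paths = []
for site, num_cells in enumerate([4, 7]):
    path = os.path.join(tmp_dir, f"site_{site}.npz")
    np.savez(path, features=rng.random((num_cells, 1280)))
    paths.append(path)

profile = aggregate_well(paths)
print(profile.shape)  # (1280,)
```

The median is one common choice for well-level profiles; a mean or another robust statistic could be swapped in without changing the structure.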

Setup:

Initialize your project directory structure:

deepprofiler --root=/path/to/project --config=config.json setup

Place your images, metadata CSV, and cell locations in the created directories (see the handbook for layout details). Download an example configuration file and put it in project/inputs/config/.

Copy the model weights (Cell_Painting_CNN_v1.hdf5, available on Zenodo) into project/outputs/cell_painting/checkpoint/.
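Before launching a long run, it can help to confirm that the files named above are where they are expected. The check below is a hypothetical convenience script, not part of DeepProfiler. It assumes the config file is called cell_painting_cnn.json and the experiment is called cell_painting, matching the command in the next step.

```python
import tempfile
from pathlib import Path

def missing_inputs(root, experiment="cell_painting",
                   config_name="cell_painting_cnn.json"):
    """Return the expected files that are not present under the project root."""
    root = Path(root)
    expected = [
        root / "inputs" / "config" / config_name,
        root / "outputs" / experiment / "checkpoint" / "Cell_Painting_CNN_v1.hdf5",
    ]
    return [path for path in expected if not path.is_file()]

# Demonstration on an empty temporary directory standing in for a project.
demo_root = Path(tempfile.mkdtemp())
for path in missing_inputs(demo_root):
    print("missing:", path.relative_to(demo_root))
```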

Run feature extraction:

deepprofiler --root=/path/to/project --config=cell_painting_cnn.json --exp=cell_painting --gpu=0 profile

Extracted features are written to project/outputs/cell_painting/features/.

Image preparation (optional but recommended)

Raw microscopy images often have uneven illumination — the centre of the field is brighter than the edges due to the optical path. DeepProfiler can correct for this and compress images to 8-bit PNG before profiling. Both steps are optional: you can profile directly from raw TIFFs, but preparation improves feature quality and speeds up repeated runs on the same dataset.

What prepare does:

  1. Illumination statistics — for each plate, DeepProfiler scans every image and builds a per-channel pixel histogram and a mean image. It then fits a smooth illumination correction function (a median-filtered version of the mean image, following Singh et al. 2014) and saves it to project/outputs/intensities/.

  2. Compression — each raw image is divided by the correction function, histogram-stretched to 8-bit, downscaled (optional), and saved as PNG to project/outputs/compressed/images/. The config then points profiling at these PNGs instead of the raw TIFFs.
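The two steps can be approximated in NumPy as follows. This is a simplified sketch under several assumptions, not DeepProfiler's implementation: it uses a square median window instead of a disk, normalises the correction function to a mean of 1, stretches with fixed percentiles (low_pct and high_pct are hypothetical parameters), and requires image dimensions divisible by the downscale factor.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def illumination_function(images, down_scale_factor=4, median_filter_size=5):
    """Mean image -> downscale -> median filter -> upsample -> normalise."""
    mean_image = np.mean(np.stack(images).astype(np.float64), axis=0)
    f = down_scale_factor
    height, width = mean_image.shape  # assumed divisible by f
    # Downscale by averaging f x f blocks.
    small = mean_image.reshape(height // f, f, width // f, f).mean(axis=(1, 3))
    # Square-window median filter with edge padding (output keeps small's shape).
    k = median_filter_size
    padded = np.pad(small, (k // 2, (k - 1) // 2), mode="edge")
    smooth = np.median(sliding_window_view(padded, (k, k)), axis=(-2, -1))
    # Upsample back to full resolution by pixel repetition.
    full = np.repeat(np.repeat(smooth, f, axis=0), f, axis=1)
    return full / full.mean()

def compress(image, correction, low_pct=0.05, high_pct=99.95):
    """Divide by the correction function and stretch the result to 8-bit."""
    corrected = image.astype(np.float64) / correction
    low, high = np.percentile(corrected, [low_pct, high_pct])
    scaled = np.clip((corrected - low) / max(high - low, 1e-12), 0.0, 1.0)
    return (scaled * 255).astype(np.uint8)

# Synthetic single-channel "plate": noise multiplied by a centre-bright vignette.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
vignette = 1.0 - 0.5 * (((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 32 ** 2))
images = [rng.uniform(900, 1100, (64, 64)) * vignette for _ in range(50)]

correction = illumination_function(images)
png_ready = compress(images[0], correction)
print(correction.shape, png_ready.dtype)
```

In this toy example the correction function recovers the vignette shape, so dividing by it flattens the brightness across the field before the 8-bit stretch.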

When to use it:

  • Recommended for large experiments (hundreds of plates) where illumination variation between plates or within plates is substantial, or where disk I/O is a bottleneck.
  • Skip it for small pilot experiments or when your images have already been illumination-corrected upstream (e.g. by CellProfiler's CorrectIlluminationApply module).

Running preparation:

deepprofiler --root=/path/to/project --config=config.json --cores=8 prepare

--cores controls the number of parallel worker processes (default: all CPUs). Preparation is CPU-bound and benefits from parallelism — one worker processes one plate at a time.

In your config, set prepare.compression.implement to true to enable compression and point profiling at the compressed images automatically:

"prepare": {
    "illumination_correction": {
        "down_scale_factor": 4,
        "median_filter_size": 24
    },
    "compression": {
        "implement": true,
        "scaling_factor": 1.0
    }
}

down_scale_factor controls the resolution at which the mean image is computed (4 = quarter resolution, which is sufficient to capture the illumination gradient). median_filter_size is the diameter of the smoothing disk in pixels — larger values produce a smoother correction at the cost of computation time. scaling_factor controls spatial downscaling of the output PNGs (1.0 = no downscaling).
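If the config is generated or patched by a script, the same settings can be applied programmatically. The helper below is a hypothetical sketch that works on an already-parsed config dictionary. The key names and default values are the ones shown in the fragment above; everything else is illustrative.

```python
import json

def enable_compression(config, scaling_factor=1.0):
    """Return a copy of the config with the prepare block set for compression."""
    updated = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    prepare = updated.setdefault("prepare", {})
    # Keep existing illumination settings; otherwise use the values shown above.
    prepare.setdefault(
        "illumination_correction",
        {"down_scale_factor": 4, "median_filter_size": 24},
    )
    prepare["compression"] = {"implement": True, "scaling_factor": scaling_factor}
    return updated

config = {"prepare": {"compression": {"implement": False, "scaling_factor": 1.0}}}
print(json.dumps(enable_compression(config)["prepare"], indent=4))
```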

Large-scale profiling across multiple jobs

For very large datasets, the metadata index can be split into parts and profiled in parallel across multiple machines or jobs:

deepprofiler --root=/path/to/project --config=config.json split --parts=10

This writes index-000.csv through index-009.csv alongside the original index.csv. Each part is then profiled independently:

deepprofiler --root=/path/to/project --config=config.json --exp=cell_painting --gpu=0 profile --part=0
deepprofiler --root=/path/to/project --config=config.json --exp=cell_painting --gpu=1 profile --part=1
...
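With many parts, the command lines are easier to generate than to type. The sketch below only builds the command strings using the flags shown above; it does not run them. The function name and the round-robin assignment of parts to GPUs are assumptions for illustration.

```python
import shlex

def profile_commands(root, config, experiment, num_parts, gpus):
    """Build one `profile --part=N` command per part, cycling through GPUs."""
    commands = []
    for part in range(num_parts):
        gpu = gpus[part % len(gpus)]  # round-robin GPU assignment (assumption)
        args = [
            "deepprofiler",
            f"--root={root}",
            f"--config={config}",
            f"--exp={experiment}",
            f"--gpu={gpu}",
            "profile",
            f"--part={part}",
        ]
        commands.append(shlex.join(args))
    return commands

commands = profile_commands(
    "/path/to/project", "config.json", "cell_painting", num_parts=10, gpus=[0, 1]
)
for command in commands[:2]:
    print(command)
```

The resulting strings can be written to a job script or handed to a scheduler, one per job.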

Parts are split by plate/well, so each job processes a contiguous group of wells. Already-profiled images are skipped automatically (resumable runs), so parts can be restarted without re-processing completed images.
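The idea of splitting by plate/well can be illustrated as follows. This is not the split command's actual code: the column names Metadata_Plate and Metadata_Well, the function name, and the equal-sized chunking are assumptions. The point is that all rows for a well stay together and each part covers a contiguous run of wells.

```python
def split_index(rows, num_parts, keys=("Metadata_Plate", "Metadata_Well")):
    """Split index rows into parts without separating rows of the same well."""
    # Group rows by (plate, well), preserving first-seen order.
    wells = {}
    for row in rows:
        wells.setdefault(tuple(row[k] for k in keys), []).append(row)
    groups = list(wells.values())
    # Assign contiguous runs of wells to each part (ceiling division).
    per_part = -(-len(groups) // num_parts)
    parts = []
    for i in range(num_parts):
        chunk = groups[i * per_part:(i + 1) * per_part]
        parts.append([row for group in chunk for row in group])
    return parts

# Synthetic index: 2 plates x 3 wells x 2 sites = 12 rows.
rows = [
    {"Metadata_Plate": plate, "Metadata_Well": well, "Metadata_Site": site}
    for plate in ("P1", "P2")
    for well in ("A01", "A02", "A03")
    for site in (1, 2)
]
parts = split_index(rows, num_parts=3)
for i, part in enumerate(parts):
    print(f"index-{i:03d}.csv", len(part))
```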

Verifying your installation

After installing, you can verify that the Cell Painting CNN checkpoint loads and produces features by running the integration test suite. This downloads the checkpoint from Zenodo (~80 MB) and runs a full end-to-end profiling pipeline on synthetic data:

uv run pytest -m integration -v

The integration tests check three things:

  1. The Zenodo checkpoint loads into the EfficientNet B0 architecture without error.
  2. The loaded model produces non-trivial feature vectors for random input crops.
  3. The full Profile pipeline (checkpoint load → crop generation → feature extraction) writes a valid .npz output file.
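A similar sanity check can be applied to your own output files. The sketch below is a hypothetical validator, not the test suite's code. It assumes only what is described earlier: each .npz file holds a features array with 1280 columns. Treating constant values as "trivial" is an interpretation made for the example.

```python
import io
import numpy as np

def feature_problems(npz_file, expected_dim=1280):
    """Return a list of problems found in one output file (empty if none)."""
    problems = []
    with np.load(npz_file) as data:
        if "features" not in data.files:
            return ["no 'features' array in file"]
        features = data["features"]
    if features.ndim != 2 or features.shape[1] != expected_dim:
        problems.append(f"unexpected shape {features.shape}")
    if not np.isfinite(features).all():
        problems.append("non-finite values present")
    if features.size > 0 and float(features.std()) == 0.0:
        problems.append("all values identical")
    return problems

def in_memory_npz(**arrays):
    """Write arrays to an in-memory .npz so the demo needs no files on disk."""
    buffer = io.BytesIO()
    np.savez(buffer, **arrays)
    buffer.seek(0)
    return buffer

good = in_memory_npz(features=np.random.default_rng(0).random((6, 1280)))
bad = in_memory_npz(features=np.zeros((6, 10)))
print(feature_problems(good))  # []
print(feature_problems(bad))
```

The function also accepts a path on disk, since np.load takes either a filename or a file-like object.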

Integration tests are excluded from the default test run (uv run pytest) to avoid network access in CI.

Training your own models

🚫 Removed in v0.5.1: Model training (train, traintf2, export-sc commands) has been removed. If you need training, use the v0.3.0 tag. A PyTorch-based training pipeline is planned for v0.6.x.

Plugin system

🚫 Removed in v0.5.1: The plugin system for models, crop generators, and metrics has been removed.

CometML experiment tracking

🚫 Removed in v0.5.1: CometML integration has been removed.

Happy profiling!
