Important
v0.5.1 is a focused maintenance release.
Model training, the plugin system, and CometML integration have been removed.
The only supported use case is feature extraction using the Cell Painting CNN v1 checkpoint (EfficientNet B0).
This release requires Python 3.10–3.11 and TensorFlow 2.10–2.15.
If you need training or rely on the plugin system, please use the v0.3.0 tag and open an issue to let us know your use case.
See ROADMAP.md for the full plan.
DeepProfiler is a set of tools to use deep learning for analyzing imaging data in high-throughput biological experiments. Please see our DeepProfiler Handbook for more details about how to use it, and the DeepProfilerExperiments repository for examples of configuration files and downstream analysis.
Check out our Nature Communications paper.
Cell Painting CNN weights are available on Zenodo.
We used DeepProfiler to train a feature extraction model for single cells in Cell Painting experiments. The model brings state-of-the-art profiling performance for downstream analysis tasks. This model is an EfficientNet trained to process the 5 channels of the Cell Painting assay and produce single-cell morphology embeddings, which can be aggregated to profile treatments in large-scale experiments. Features obtained with the Cell Painting CNN are more robust and improve performance.
- Python 3.10 or 3.11
- TensorFlow 2.10–2.15
- Linux (Ubuntu 20.04+) recommended
- For GPU acceleration, a CUDA-compatible GPU is recommended
pip install deepprofiler
Or run directly without any environment setup using uvx — it handles installation automatically in an isolated environment:
uvx deepprofiler --root=/path/to/project --config=config.json profile
For contributing, see CONTRIBUTING.md.
This repository contains example data structured as a DeepProfiler project. Unpack it with:
tar -xzf example_data.tar.gz
The only supported use case in v0.5.1 is feature extraction using the Cell Painting CNN v1 checkpoint — an EfficientNet B0 trained on 5-channel Cell Painting images (DNA, ER, RNA, AGP, Mito).
How inference works:
- DeepProfiler reads a metadata CSV listing your images and a locations CSV with per-image cell coordinates (e.g. from CellProfiler nucleus segmentation).
- For each image, it crops a fixed-size patch around each cell centroid.
- The crops are passed through the EfficientNet B0 backbone; the GlobalAveragePooling2D layer (pool5) produces a 1280-dimensional embedding per cell.
- Embeddings are written to .npz files (one per image) containing a features array of shape (num_cells, 1280) alongside metadata and crop coordinates.
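The cropping step can be sketched with plain NumPy. This is an illustration only, not DeepProfiler's own crop generator: the function name crop_cells, the box_size of 128 pixels, and the zero-padding used for cells near the image border are all assumptions made for the example.

```python
import numpy as np

def crop_cells(image, centroids, box_size=128):
    """Crop a fixed-size square patch around each (x, y) centroid.

    image: array of shape (height, width, channels)
    centroids: iterable of (x, y) pixel coordinates inside the image
    box_size: hypothetical patch side length in pixels
    """
    half = box_size // 2
    # Assumption: zero-pad so cells near the border still give full-size patches.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="constant")
    crops = []
    for x, y in centroids:
        # After padding, original pixel (y, x) sits at (y + half, x + half),
        # so the window starting at (y, x) is centred on the centroid.
        row, col = int(round(y)), int(round(x))
        crops.append(padded[row:row + box_size, col:col + box_size, :])
    return np.stack(crops)

# Synthetic 5-channel image standing in for one Cell Painting field of view.
image = np.random.rand(540, 540, 5).astype(np.float32)
centroids = [(10, 20), (270, 270), (535, 5)]
crops = crop_cells(image, centroids)
print(crops.shape)  # (3, 128, 128, 5)
```

Each crop keeps all five channels, so the stacked array is the kind of batch that would be passed to the network.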
These per-cell .npz files can be aggregated with pycytominer for downstream analysis.
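As a rough illustration of what aggregation means, the sketch below loads the features array from several .npz files and takes the per-feature median across all cells. It uses NumPy only and does not call pycytominer; the helper name aggregate_features, the file names, and the choice of the median are assumptions for the example. Only the features key and the 1280-column shape come from the description above.

```python
import os
import tempfile
import numpy as np

def aggregate_features(npz_paths, key="features"):
    """Return the per-feature median across all cells in the given files."""
    per_image = []
    for path in npz_paths:
        with np.load(path) as data:
            per_image.append(data[key])  # shape: (num_cells, 1280)
    all_cells = np.concatenate(per_image, axis=0)
    return np.median(all_cells, axis=0)

# Synthetic stand-ins for two images from the same well (hypothetical names).
tmpdir = tempfile.mkdtemp()
paths = []
for i, num_cells in enumerate([4, 6]):
    path = os.path.join(tmpdir, f"site_{i}.npz")
    np.savez(path, features=np.random.rand(num_cells, 1280).astype(np.float32))
    paths.append(path)

profile = aggregate_features(paths)
print(profile.shape)  # (1280,)
```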
Setup:
Initialize your project directory structure:
deepprofiler --root=/path/to/project --config=config.json setup
Place your images, metadata CSV, and cell locations in the created directories (see the handbook for layout details).
Download an example configuration file and put it in project/inputs/config/.
Copy the model weights (Cell_Painting_CNN_v1.hdf5, available on Zenodo) into project/outputs/cell_painting/checkpoint/.
Run feature extraction:
deepprofiler --root=/path/to/project --config=cell_painting_cnn.json --exp=cell_painting --gpu=0 profile
Extracted features are written to project/outputs/cell_painting/features/.
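Before launching a long run it can help to confirm that the config and weights are where the steps above put them. The following is a small sketch, not part of DeepProfiler: the helper names missing_inputs and profile_command are hypothetical, and the paths checked are only the two named in the setup steps.

```python
from pathlib import Path
import tempfile

WEIGHTS_NAME = "Cell_Painting_CNN_v1.hdf5"

def missing_inputs(root, config_name, experiment):
    """List the expected files that are not present under the project root."""
    root = Path(root)
    expected = [
        root / "inputs" / "config" / config_name,
        root / "outputs" / experiment / "checkpoint" / WEIGHTS_NAME,
    ]
    return [str(p) for p in expected if not p.is_file()]

def profile_command(root, config_name, experiment, gpu=0):
    """Build the profile command using only the flags shown above."""
    return [
        "deepprofiler", f"--root={root}", f"--config={config_name}",
        f"--exp={experiment}", f"--gpu={gpu}", "profile",
    ]

# Demonstration on an empty temporary directory: both files are missing at
# first, then placeholder files are created in the expected locations.
root = Path(tempfile.mkdtemp())
before = missing_inputs(root, "cell_painting_cnn.json", "cell_painting")
for path in before:
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).touch()
after = missing_inputs(root, "cell_painting_cnn.json", "cell_painting")
print(len(before), len(after))  # 2 0

command = profile_command(root, "cell_painting_cnn.json", "cell_painting")
print(" ".join(command))
```

The command list can be handed to subprocess.run once missing_inputs returns an empty list.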
Raw microscopy images often have uneven illumination — the centre of the field is brighter than the edges due to the optical path. DeepProfiler can correct for this and compress images to 8-bit PNG before profiling. Both steps are optional: you can profile directly from raw TIFFs, but preparation improves feature quality and speeds up repeated runs on the same dataset.
What prepare does:
- Illumination statistics: for each plate, DeepProfiler scans every image and builds a per-channel pixel histogram and a mean image. It then fits a smooth illumination correction function (a median-filtered version of the mean image, following Singh et al. 2014) and saves it to project/outputs/intensities/.
- Compression: each raw image is divided by the correction function, histogram-stretched to 8-bit, optionally downscaled, and saved as PNG to project/outputs/compressed/images/. The config then points profiling at these PNGs instead of the raw TIFFs.
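The two steps above can be sketched for a single channel with NumPy. This is a simplified illustration under stated assumptions, not DeepProfiler's implementation: it uses a square median window rather than a disk, block averaging for downscaling, pixel repetition for upscaling, and hypothetical percentile bounds for the 8-bit stretch. The function names are also invented for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def illumination_function(images, down_scale_factor=4, filter_size=5):
    """Estimate a smooth illumination function for one channel.

    images: array of shape (num_images, height, width); height and width must
    be divisible by down_scale_factor, and filter_size must be odd.
    """
    mean_image = images.mean(axis=0)
    f = down_scale_factor
    h, w = mean_image.shape
    # Downscale by averaging f x f blocks.
    small = mean_image.reshape(h // f, f, w // f, f).mean(axis=(1, 3))
    # Median filter with a square window (assumption: the real filter is a disk).
    pad = filter_size // 2
    padded = np.pad(small, pad, mode="reflect")
    windows = sliding_window_view(padded, (filter_size, filter_size))
    smooth = np.median(windows, axis=(2, 3))
    # Upscale back to full resolution by repeating pixels.
    full = np.repeat(np.repeat(smooth, f, axis=0), f, axis=1)
    return full / full.mean()  # unit mean keeps the overall intensity scale

def correct_and_compress(image, illum, low_pct=0.05, high_pct=99.95):
    """Divide by the illumination function and stretch to 8-bit."""
    corrected = image / illum
    lo, hi = np.percentile(corrected, [low_pct, high_pct])
    stretched = np.clip((corrected - lo) / max(hi - lo, 1e-12), 0.0, 1.0)
    return (stretched * 255).astype(np.uint8)

# Synthetic plate: random intensities dimmed towards the corners.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:128, 0:128]
vignette = 1.0 - 0.5 * (((yy - 64) ** 2 + (xx - 64) ** 2) / (2 * 64 ** 2))
images = rng.uniform(100, 200, size=(20, 128, 128)) * vignette
illum = illumination_function(images)
png_ready = correct_and_compress(images[0], illum)
print(illum.shape, png_ready.dtype)  # (128, 128) uint8
```

On this synthetic data the estimated function is brighter at the centre than at the corners, which is the gradient the division removes.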
When to use it:
- Recommended for large experiments (hundreds of plates) where illumination variation between plates or within plates is substantial, or where disk I/O is a bottleneck.
- Skip it for small pilot experiments or when your images have already been illumination-corrected upstream (e.g. by CellProfiler's CorrectIlluminationApply module).
Running preparation:
deepprofiler --root=/path/to/project --config=config.json --cores=8 prepare
--cores controls the number of parallel worker processes (default: all CPUs).
Preparation is CPU-bound and benefits from parallelism — one worker processes one plate at a time.
In your config, set prepare.compression.implement to true to enable compression and point profiling at the compressed images automatically:
"prepare": {
"illumination_correction": {
"down_scale_factor": 4,
"median_filter_size": 24
},
"compression": {
"implement": true,
"scaling_factor": 1.0
}
}
down_scale_factor controls the resolution at which the mean image is computed (4 = quarter resolution, which is sufficient to capture the illumination gradient).
median_filter_size is the diameter of the smoothing disk in pixels — larger values produce a smoother correction at the cost of computation time.
scaling_factor controls spatial downscaling of the output PNGs (1.0 = no downscaling).
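If you prefer to edit the config programmatically, a sketch along these lines may be useful. The helper enable_compression is hypothetical; the keys and default values are the ones shown in the block above, and nothing here is validated by DeepProfiler itself.

```python
import json

def enable_compression(config, scaling_factor=1.0):
    """Return a copy of the config with the prepare block filled in."""
    updated = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    prepare = updated.setdefault("prepare", {})
    illum = prepare.setdefault("illumination_correction", {})
    # Keep existing values; fall back to the ones shown in the example config.
    illum.setdefault("down_scale_factor", 4)
    illum.setdefault("median_filter_size", 24)
    compression = prepare.setdefault("compression", {})
    compression["implement"] = True
    compression["scaling_factor"] = scaling_factor
    return updated

config = {"prepare": {"compression": {"implement": False}}}
updated = enable_compression(config)
print(json.dumps(updated["prepare"], indent=2))
```

The original dictionary is left untouched, so the result can be compared against it before being written back with json.dump.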
For very large datasets, the metadata index can be split into parts and profiled in parallel across multiple machines or jobs:
deepprofiler --root=/path/to/project --config=config.json split --parts=10
This writes index-000.csv through index-009.csv alongside the original index.csv.
Each part is then profiled independently:
deepprofiler --root=/path/to/project --config=config.json --exp=cell_painting --gpu=0 profile --part=0
deepprofiler --root=/path/to/project --config=config.json --exp=cell_painting --gpu=1 profile --part=1
...
Parts are split by plate/well, so each job processes a contiguous group of wells. Already-profiled images are skipped automatically (resumable runs), so parts can be restarted without re-processing completed images.
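To make the plate/well grouping concrete, here is a minimal sketch of one way to split index rows into contiguous groups of wells. It is not the split command's actual code: the function split_index and the column names Metadata_Plate, Metadata_Well, and Metadata_Site are assumptions chosen for illustration.

```python
from itertools import groupby

def split_index(rows, parts, plate_key="Metadata_Plate", well_key="Metadata_Well"):
    """Split index rows into `parts` lists without breaking up any well."""
    by_well = lambda r: (r[plate_key], r[well_key])
    ordered = sorted(rows, key=by_well)
    wells = [list(group) for _, group in groupby(ordered, key=by_well)]
    # Hand out contiguous runs of wells as evenly as possible.
    base, extra = divmod(len(wells), parts)
    result, start = [], 0
    for i in range(parts):
        count = base + (1 if i < extra else 0)
        chunk = wells[start:start + count]
        result.append([row for well in chunk for row in well])
        start += count
    return result

# Five wells with two sites each, split into two parts.
rows = [
    {"Metadata_Plate": "P1", "Metadata_Well": well, "Metadata_Site": site}
    for well in ["A01", "A02", "A03", "B01", "B02"]
    for site in (1, 2)
]
chunks = split_index(rows, parts=2)
for i, chunk in enumerate(chunks):
    print(f"index-{i:03d}.csv", sorted({r["Metadata_Well"] for r in chunk}))
# index-000.csv ['A01', 'A02', 'A03']
# index-001.csv ['B01', 'B02']
```

Keeping all sites of a well in the same part means each job can write complete per-well outputs without coordinating with the others.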
After installing, you can verify that the Cell Painting CNN checkpoint loads and produces features by running the integration test suite. This downloads the checkpoint from Zenodo (~80 MB) and runs a full end-to-end profiling pipeline on synthetic data:
uv run pytest -m integration -v
The integration tests check three things:
- The Zenodo checkpoint loads into the EfficientNet B0 architecture without error.
- The loaded model produces non-trivial feature vectors for random input crops.
- The full Profile pipeline (checkpoint load → crop generation → feature extraction) writes a valid .npz output file.
Integration tests are excluded from the default test run (uv run pytest) to avoid network access in CI.
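You can apply a similar sanity check to your own output files. The sketch below is not taken from the test suite: the function check_features is hypothetical, and "non-trivial" is interpreted here as finite values that are not all identical.

```python
import os
import tempfile
import numpy as np

def check_features(path, expected_dim=1280):
    """Return the features shape, or raise ValueError if it looks invalid."""
    with np.load(path) as data:
        if "features" not in data.files:
            raise ValueError("no 'features' array in file")
        features = data["features"]
    if features.ndim != 2 or features.shape[1] != expected_dim:
        raise ValueError(f"unexpected shape {features.shape}")
    if not np.isfinite(features).all():
        raise ValueError("features contain NaN or inf")
    if features.std() == 0:
        raise ValueError("features are constant")
    return features.shape

# Synthetic output file with 7 cells (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(path, features=np.random.rand(7, 1280).astype(np.float32))
shape = check_features(path)
print(shape)  # (7, 1280)
```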
🚫 Removed in v0.5.1: Model training (train, traintf2, export-sc commands) has been removed. If you need training, use the v0.3.0 tag. A PyTorch-based training pipeline is planned for v0.6.x.
🚫 Removed in v0.5.1: The plugin system for models, crop generators, and metrics has been removed.
🚫 Removed in v0.5.1: CometML integration has been removed.
Happy profiling!


