Collection of several classifiers for 3D mesh objects using various vision and language models.
pip install -r requirements.txt

The classifiers require four pre-trained models. Download them to a models/ directory in the parent folder:
.
├── 3D_classification/ (base directory)
│   ├── README.md
│   ├── ImageClassifier.py
│   ├── VLLMClassifier.py
│   ├── LlamaMeshClassifier.py
│   └── ...
└── models/ (create this directory)
    ├── convnextv2-large-22k-384/
    ├── Qwen3-VL-8B-Instruct/
    ├── LLaMA-Mesh-model/
    └── clip-vit-large-patch14/
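
Before running any classifier, it can help to confirm that the four model directories from the layout above are in place. The following is a minimal sketch, not part of the repository: `missing_models` is a hypothetical helper, and the `../models` path assumes the script is run from inside `3D_classification/`.

```python
from pathlib import Path

# Directory names taken from the layout above.
REQUIRED_MODELS = (
    "convnextv2-large-22k-384",
    "Qwen3-VL-8B-Instruct",
    "LLaMA-Mesh-model",
    "clip-vit-large-patch14",
)


def missing_models(models_dir):
    """Return the required model directory names not found under models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_MODELS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: run from inside 3D_classification/, so models/ is one level up.
    missing = missing_models("../models")
    if missing:
        print("Missing model directories:", ", ".join(missing))
    else:
        print("All four model directories found.")
```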
Uses a Qwen vision-language model for multimodal classification.
from VLLMClassifier import VLLMClassifier
classifier = VLLMClassifier(device="cuda")
label = classifier.classify_one("path/to/mesh.obj")
# Command-line usage
python VLLMClassifier.py --file path/to/mesh.obj
python VLLMClassifier.py --dir path/to/meshes/ --output results.json
python VLLMClassifier.py --dir path/to/meshes/ --num-views 8 --limit 100 --output results.json

CLI arguments (python VLLMClassifier.py ...):

- `--file` (str): Path to a single mesh file to classify. Mutually exclusive with `--dir`.
- `--dir` (str): Path to a directory of mesh files to classify. Mutually exclusive with `--file`.
- `--output` (str, optional, default: `None`): Output filename for batch classifications. Saved under `classifications/`.
- `--limit` (int, optional, default: `None`): Limit on the number of files in batch mode.
- `--device` (str, default: `cuda:0`): Device to use for inference.
- `--model` (str, default: `../models/Qwen3-VL-8B-Instruct`): Path to the VLM model directory.
- `--num-views` (int, default: `12`): Number of rendered views per mesh.
- `--resolution` (int, default: `1024`): Rendering resolution in pixels.
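
The flags listed above map naturally onto an `argparse` parser. The sketch below is an illustration of that interface, not the actual source of `VLLMClassifier.py`; `build_parser` is a hypothetical name, and making one of `--file`/`--dir` required is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="VLLMClassifier.py")

    # --file and --dir are documented as mutually exclusive.
    # Assumption: exactly one of them must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", type=str, help="single mesh file to classify")
    source.add_argument("--dir", type=str, help="directory of mesh files to classify")

    parser.add_argument("--output", type=str, default=None,
                        help="output filename for batch classifications")
    parser.add_argument("--limit", type=int, default=None,
                        help="limit on the number of files in batch mode")
    parser.add_argument("--device", type=str, default="cuda:0")
    parser.add_argument("--model", type=str,
                        default="../models/Qwen3-VL-8B-Instruct")
    parser.add_argument("--num-views", type=int, default=12)
    parser.add_argument("--resolution", type=int, default=1024)
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--file", "path/to/mesh.obj"])
print(args.file, args.num_views, args.resolution)
```

Note that `argparse` converts `--num-views` to the attribute name `num_views`.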
Arguments:
- `model_name` (str, optional): Path to the Qwen VL model. Defaults to `../models/Qwen3-VL-8B-Instruct`.
- `device` (str): Device to use.
- `num_views` (int): Number of rendered views.
- `resolution` (int): Resolution of rendered views.
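
This README does not state how the `num_views` viewpoints are chosen. One common arrangement, shown here purely as an assumption, is to place cameras at evenly spaced azimuth angles on a ring around the object. `ring_camera_positions`, `radius`, and `elevation_deg` are hypothetical names introduced for illustration.

```python
import math


def ring_camera_positions(num_views=12, radius=2.0, elevation_deg=20.0):
    """Return num_views (x, y, z) camera positions on a ring around the origin.

    Assumption: the object is centered at the origin with z pointing up.
    """
    elev = math.radians(elevation_deg)
    positions = []
    for i in range(num_views):
        azim = 2.0 * math.pi * i / num_views  # evenly spaced around the object
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.cos(elev) * math.sin(azim)
        z = radius * math.sin(elev)
        positions.append((x, y, z))
    return positions


for position in ring_camera_positions(num_views=4):
    print(position)
```

More views cover the object more densely, at the cost of proportionally more rendering and inference work per mesh.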
Uses ConvNeXt V2 to classify rendered mesh views.
from ImageClassifier import ImageClassifier
classifier = ImageClassifier(device="cuda")
label = classifier.classify_one("path/to/mesh.obj")
# Or batch classification
results = classifier.classify_batch(
    folder_path="path/to/meshes/",
    save_path="results.json"
)

Arguments:

- `model_name` (str, optional): Path to the ConvNeXt model. Defaults to `../models/convnextv2-large-22k-384`.
- `device` (str): Device to use ("cuda", "cpu", etc.).
- `num_views` (int): Number of rendered views per mesh (default: 12).
- `resolution` (int): Resolution of rendered views (default: 1024).
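
Classifying several rendered views yields one prediction per view, which must be combined into a single label. How `ImageClassifier` does this is not documented here; the sketch below shows one plausible approach, averaging per-view class probabilities. `average_view_scores` and `top_label` are hypothetical helpers, and the scores are made-up illustration values.

```python
def average_view_scores(view_scores):
    """Average a list of {label: probability} dicts, one dict per rendered view."""
    if not view_scores:
        raise ValueError("at least one view is required")
    totals = {}
    for scores in view_scores:
        for label, prob in scores.items():
            totals[label] = totals.get(label, 0.0) + prob
    return {label: total / len(view_scores) for label, total in totals.items()}


def top_label(view_scores):
    """Return the label with the highest average probability across views."""
    averaged = average_view_scores(view_scores)
    return max(averaged, key=averaged.get)


# Illustrative per-view scores (not real model output).
views = [
    {"chair": 0.6, "table": 0.4},
    {"chair": 0.3, "table": 0.7},
    {"chair": 0.8, "table": 0.2},
]
print(top_label(views))
```

Averaging lets a confident view outweigh an ambiguous one, which a simple majority vote over per-view labels would not.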
Uses LLaMA-Mesh for direct mesh understanding.
from LlamaMeshClassifier import LlamaMeshClassifier
classifier = LlamaMeshClassifier(device="cuda")
label = classifier.classify_one("path/to/mesh.obj")
results = classifier.classify_batch(
    folder_path="path/to/meshes/",
    save_path="results.json"
)

Arguments:

- `model_name` (str, optional): Path to the LLaMA-Mesh model. Defaults to `../models/LLaMA-Mesh-model`.
- `device` (str): Device to use.
- `max_new_tokens` (int): Maximum number of generated tokens.
- `max_input_tokens` (int): Maximum number of input tokens.
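
LLaMA-Mesh reads a mesh as plain OBJ-style text, so the prompt length grows with the vertex and face count, which is why a cap such as `max_input_tokens` matters. The sketch below shows what such a text encoding might look like. It assumes coordinates are quantized to 64 integer bins; both that value and the helper `mesh_to_obj_text` are assumptions for illustration, not this repository's code.

```python
def mesh_to_obj_text(vertices, faces, bins=64):
    """Serialize a triangle mesh as OBJ-style text with integer-quantized vertices.

    vertices: list of (x, y, z) floats; faces: list of 0-based index triples.
    """
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    # Uniform scale by the largest extent keeps the aspect ratio intact.
    extent = max(h - l for h, l in zip(hi, lo)) or 1.0

    lines = []
    for v in vertices:
        q = [min(bins - 1, int((v[i] - lo[i]) / extent * bins)) for i in range(3)]
        lines.append("v {} {} {}".format(*q))
    for f in faces:
        # OBJ face indices are 1-based.
        lines.append("f {} {} {}".format(*(i + 1 for i in f)))
    return "\n".join(lines)


# A unit tetrahedron as a tiny example.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tris = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(mesh_to_obj_text(verts, tris))
```

Each vertex and face adds a line of text, so dense meshes can exceed the input budget and may need simplification first.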
Computes CLIP embeddings for labels to support evaluation.
from LabelProcessor import LabelProcessor
processor = LabelProcessor()
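# Compute embeddings for two labels, then compare them
embedding1 = processor.compute_embedding("cat")
embedding2 = processor.compute_embedding("kitten")
similarity = processor.compute_similarity(embedding1, embedding2)

Use EvaluationManager to evaluate classifier results: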
from EvaluationManager import EvaluationManager
evaluator = EvaluationManager()
# Overall accuracy
accuracy = evaluator.accuracy("predictions.json", similarity_threshold=0.8)
# Per-class accuracy
evaluator.class_accuracy("predictions.json", similarity_threshold=0.8)

The models do not currently handle unsupported formats gracefully.
These formats work with a minimal trimesh install (trimesh + numpy):
- `glb`, `gltf`
- `stl`
- `ply`
- `obj`
- `off`
- `dxf` (ASCII only, 2D geometry)
- `xyz` (point clouds)
Some formats need additional packages installed:
| Format | Extra Dependencies |
|---|---|
| 3mf | lxml, networkx |
| 3dxml | lxml, networkx, Pillow |
| dae, zae | lxml, Pillow, pycollada |
| step, stp | cascadio |
| xaml | lxml |
| svg | svg.path |
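
Because unsupported formats are not handled gracefully, it may be worth checking a file's extension before passing it to a classifier. The sketch below encodes the two lists above; `required_extras` is a hypothetical helper, not part of the repository, and it only inspects the file name rather than the file contents.

```python
from pathlib import Path

# Formats that load with a minimal trimesh install.
BASE_FORMATS = {"glb", "gltf", "stl", "ply", "obj", "off", "dxf", "xyz"}

# Formats that need extra packages, as listed in the table above.
EXTRA_DEPENDENCIES = {
    "3mf": ["lxml", "networkx"],
    "3dxml": ["lxml", "networkx", "Pillow"],
    "dae": ["lxml", "Pillow", "pycollada"],
    "zae": ["lxml", "Pillow", "pycollada"],
    "step": ["cascadio"],
    "stp": ["cascadio"],
    "xaml": ["lxml"],
    "svg": ["svg.path"],
}


def required_extras(mesh_path):
    """Return the extra packages a mesh file's format needs, or [] if none.

    Raises ValueError when the extension is in neither list.
    """
    ext = Path(mesh_path).suffix.lower().lstrip(".")
    if ext in BASE_FORMATS:
        return []
    if ext in EXTRA_DEPENDENCIES:
        return list(EXTRA_DEPENDENCIES[ext])
    raise ValueError(f"Unsupported mesh format: '{ext}'")


print(required_extras("path/to/mesh.obj"))
print(required_extras("path/to/part.step"))
```

Calling this before `classify_one` turns an obscure loader failure into a clear error naming the offending extension.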