# News

(2019, April 24th) Initial release including 1 million CAD models in the step, parasolid, stl and meta formats.

(2019, September 29th) FeatureScript file format added.

# Dataset

We introduce ABC-Dataset, a collection of one million Computer-Aided Design (CAD) models for research on geometric deep learning methods and applications. Each model is a collection of explicitly parametrized curves and surfaces, providing ground truth for differential quantities, patch segmentation, geometric feature detection, and shape reconstruction. Sampling the parametric descriptions of surfaces and curves allows generating data in different formats and resolutions, enabling fair comparisons for a wide range of geometric learning algorithms. As a use case for our dataset, we perform a large-scale benchmark for the estimation of surface normals, comparing existing data-driven methods and evaluating their performance against both the ground truth and traditional normal estimation methods.

### Authors

Koch, Sebastian and Matveev, Albert and Jiang, Zhongshi and Williams, Francis and Artemov, Alexey and Burnaev, Evgeny and Alexa, Marc and Zorin, Denis and Panozzo, Daniele

### Acknowledgements

We are grateful to Onshape for providing the CAD models and support. This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise. Funding provided by NSF award MRI-1229185. We thank the Skoltech CDISE HPC Zhores cluster staff for computing cluster provision. This work was supported in part by NSF CAREER award 1652515, the NSF grants IIS-1320635, DMS-1436591, and 1835712, the Russian Science Foundation under Grant 19-41-04109, and gifts from Adobe Research, nTopology Inc, and NVIDIA.

### Paper/Citation

Please cite our paper if you use the ABC dataset.

@InProceedings{Koch_2019_CVPR,
author = {Koch, Sebastian and Matveev, Albert and Jiang, Zhongshi and Williams, Francis and Artemov, Alexey and Burnaev, Evgeny and Alexa, Marc and Zorin, Denis and Panozzo, Daniele},
title = {ABC: A Big CAD Model Dataset For Geometric Deep Learning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}


• The authors give no warranties regarding the dataset.

### Versions

The dataset is versioned to accommodate future updates of some of the file formats. Currently, the latest version for all file formats is v00 (marked by the suffix of the data chunks).

### File Formats

Each model in the dataset is provided in the following file formats.

Note: Not all file formats are available for all models, for example because of processing errors. The formats that are always available are meta, step, para, and stl2.

| Format | Description | Name | Example |
| --- | --- | --- | --- |
| Meta | Meta information of the original Onshape documents in the form of a yaml document. It lists the author and various other meta information of the CAD model. | meta | yml |
| Step | Step file format containing the parametric boundary representation (converted from Parasolid), see format description. | step | txt |
| Parasolid | Parasolid format containing the parametric boundary representation, see format specification. | para | zip |
| Stl parts | STL models with parts stored separately in single files zipped together. The triangulation of the stl models is dense but the triangles have arbitrary shapes with possibly extreme interior angles. | stl | 7z |
| Stl complete | STL models with single parts merged together as one complete model. The triangulation of the stl models is dense but the triangles have arbitrary shapes with possibly extreme interior angles. | stl2 | binary |
| Features | Feature description in the form of a list of surface patches and curves according to the description in the supplementary material. Vertex/face indices correspond to the elements in the object file. Note: Vertices are 0-indexed here, whereas in the object file vertices are 1-indexed by default. | feat | yml |
| Object | Obj models with ground truth normals and curvature values at each vertex. The vertex and triangle indices correspond to the indices in the features file. Note: Vertices are 1-indexed here, whereas in the features file vertices are 0-indexed. | obj | txt |
| Images | Renderings of the objects from canonical viewpoints that are produced with the processing pipeline. | img | png |
| Statistics | Statistical information about the CAD model (in parametric boundary representation) as well as the generated object file (triangle mesh). | stats | yml |
| FeatureScript | Original FeatureScript definition of the CAD model from Onshape. Represents the generation process of the CAD model. | ofs | yml |
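The index offset between the features file and the object file is easy to get wrong, so here is a minimal Python sketch of one way to handle it. It reads the standard Wavefront OBJ `v` and `f` records and shifts face indices to 0-based, so that they can be compared directly with indices taken from the features file. The function names `parse_obj` and `to_obj_index` are hypothetical helpers, not part of the dataset tooling, and the sketch assumes positive (absolute) OBJ indices only.

```python
def parse_obj(text):
    """Parse vertex positions and faces from Wavefront OBJ text.

    Face indices are converted from OBJ's 1-based convention to 0-based,
    matching the convention the features file is described as using.
    Assumption: indices are positive (no relative/negative OBJ indices).
    """
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            # Vertex position: "v x y z"
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # Each corner is "v", "v/vt", "v//vn" or "v/vt/vn";
            # only the leading vertex index is kept.
            faces.append(tuple(int(c.split("/")[0]) - 1 for c in parts[1:]))
    return vertices, faces


def to_obj_index(feature_index):
    """Map a 0-based index from the features file to a 1-based OBJ index."""
    return feature_index + 1


example = "v 0 0 0\nv 1 0 0\nv 0 1 0\nvn 0 0 1\nf 1//1 2//1 3//1\n"
vertices, faces = parse_obj(example)
print(faces)            # 0-based vertex indices of the single triangle
print(to_obj_index(0))  # the same first vertex as referenced inside the OBJ
```

Converting once at load time and working in 0-based indices throughout keeps the offset out of the rest of the code.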

### Chunks

The dataset is split into compressed chunks of 10,000 models according to the file formats. The following files list the URLs of all chunks, both for all file formats combined and for each file format separately: all_v00, meta_v00, para_v00, step_v00, stl2_v00, obj_v00, feat_v00, stat_v00, ofs_v00

The chunks can then be downloaded, for example, with wget or curl. The following command downloads all chunks of the meta file format into the folder meta (which must already exist), using at most 8 parallel requests:

cat meta_v00.txt | xargs -n 2 -P 8 sh -c 'wget --no-check-certificate $0 -O meta/$1'


or alternatively with curl:

cat meta_v00.txt | xargs -n 2 -P 8 sh -c 'curl --insecure -o meta/$1 $0'


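To download only a single chunk, select its line from the list with sed (replace NUM with the line number):

sed 'NUMq;d' meta_v00.txt | xargs -n 2 sh -c 'wget --no-check-certificate $0 -O meta/$1'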


This downloads the NUMth chunk of the meta type to the folder meta. Alternatively with curl the command looks like:

sed 'NUMq;d' meta_v00.txt | xargs -n 2 sh -c 'curl --insecure -o meta/$1 $0'
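For scripted workflows, the same chunk selection can be sketched in Python. The sketch below assumes that each line of a list file such as meta_v00.txt holds a URL followed by a target file name, which is inferred from the way the commands above pass two fields per chunk to wget or curl. The helper names and the example URLs are hypothetical, and the actual download step is left out.

```python
def parse_chunk_list(text):
    """Return (url, filename) pairs from the text of a chunk list file.

    Assumption: one chunk per line, with the URL first and the target
    file name second, separated by whitespace.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        entries.append((fields[0], fields[1]))
    return entries


def select_chunk(entries, num):
    """Pick the NUMth chunk, counted from 1 like `sed 'NUMq;d'`."""
    if not 1 <= num <= len(entries):
        raise IndexError("chunk number %d is out of range" % num)
    return entries[num - 1]


# Placeholder list contents; real URLs come from the downloaded list file.
listing = (
    "https://example.org/chunk-a first_chunk.7z\n"
    "https://example.org/chunk-b second_chunk.7z\n"
)
url, filename = select_chunk(parse_chunk_list(listing), 2)
print(url, "->", "meta/" + filename)
```

The resulting pair can then be handed to any HTTP client, or printed and piped back into the wget or curl commands.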


### Checksums/Sizes

We provide md5 checksums and file sizes (in bytes) for all compressed chunks in the following yaml files to check for archive integrity: md5, size
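A downloaded chunk can be checked against these values with a few lines of Python. The following is a minimal sketch using only the standard library. The layout of the yaml files is not described here, so the expected checksum and size are passed in as plain arguments, and the helper names `md5_of_file` and `verify_chunk` are hypothetical.

```python
import hashlib
import os


def md5_of_file(path, block_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_chunk(path, expected_md5, expected_size):
    """Return True if both the file size and the md5 checksum match.

    The size is compared first because it is cheap and catches
    truncated downloads without hashing the whole archive.
    """
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

Reading in blocks keeps memory use constant, which matters for archives that contain 10000 models each.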

# Benchmarks

Based on a subset of the CAD models from the ABC dataset, we provide the following benchmarks for the task of surface normal estimation.

### Normal Estimation

The surface normal estimation benchmark consists of meshed CAD models with fixed numbers of vertices (512, 1024 and 2048) and ground truth vertex normals derived from the parametric boundary representation. There is one benchmark for patches (parts) of CAD models and one for full CAD models. The benchmarks come in different sizes (10k, 50k and 100k) and are split into training and testing data. To obtain the 100k benchmark, you need to download both the 50k and the 100k archives.