Procedurally generated synthetic BIM buildings for AI research and model training

AI-ready synthetic BIM datasets

Buildata creates fully synthetic BIM datasets for graph learning, machine learning and BIM language model workflows. Each package includes structured building data, semantic elements, relationships and training-ready records.

  • Core BIM graph
  • ML-ready tasks
  • LLM-ready instructions
  • Graph learning exports

three layers in one dataset

01 Core BIM graph

Semantic building elements, metadata and relationships.

02 ML layer

Training-ready task records for prediction and benchmarking.

03 LLM layer

Instruction and reasoning samples for BIM AI workflows.

  • 1 sample: 1 complete building
  • 3 export layers: Graph, ML and LLM
  • AI-ready: Structured for training

about the data

Synthetic by design

All Buildata catalog datasets are synthetic. Buildings are generated procedurally from BIM semantic rules that draw on IFC schemas, materials, property sets and spatial relationships.

Procedural generation

Generated from BIM rules, not from real project files.

No real-building data

Designed for public sharing and AI experimentation.

Built for AI

Structured for graph learning, ML pipelines and BIM LLMs.

what makes buildata different

More than synthetic BIM files

Buildata packages each building as structured, AI-ready data, so teams can work directly with semantic graphs, benchmark tasks and training samples instead of starting from raw files.

For Graph Neural Networks

Use nodes, edges and topology directly in graph-based learning workflows.
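
As a rough illustration of what working with nodes and edges directly can look like, the sketch below turns node and edge records into an integer edge list in COO layout (one row of source indices, one row of target indices). The field names (`id`, `ifc_class`, `source`, `target`, `relation`) and the relation labels are assumptions made for this example and may not match the actual Buildata export.

```python
# Hypothetical node and edge records; real Buildata field names may differ.
nodes = [
    {"id": "wall-1", "ifc_class": "IfcWall"},
    {"id": "door-1", "ifc_class": "IfcDoor"},
    {"id": "space-1", "ifc_class": "IfcSpace"},
]
edges = [
    {"source": "wall-1", "target": "space-1", "relation": "bounds"},
    {"source": "door-1", "target": "wall-1", "relation": "hosted_by"},
]


def build_edge_index(nodes, edges):
    """Map string node ids to integer positions and return a 2 x E edge list."""
    index_of = {node["id"]: i for i, node in enumerate(nodes)}
    sources = [index_of[edge["source"]] for edge in edges]
    targets = [index_of[edge["target"]] for edge in edges]
    return index_of, [sources, targets]


index_of, edge_index = build_edge_index(nodes, edges)
print(edge_index)
```

The two-row layout has the same shape as the `edge_index` tensor used by PyTorch Geometric, so the nested list could be wrapped in a tensor before building a graph object.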

For machine learning

Use generated task records for classification, prediction and validation experiments.
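
One common precaution in validation experiments is to split task records by building rather than by element, so that elements from the same building never appear in both the training and test sets. The sketch below shows one way to do that with a deterministic hash. The record fields (`building_id`, `label`) and the helper names are hypothetical and are not taken from the Buildata schema.

```python
import hashlib


def split_of(building_id, test_fraction=0.2):
    """Assign a building to 'train' or 'test' using a stable hash of its id."""
    digest = hashlib.sha256(building_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_records(records, test_fraction=0.2):
    """Group task records so each building lands in exactly one split."""
    splits = {"train": [], "test": []}
    for record in records:
        splits[split_of(record["building_id"], test_fraction)].append(record)
    return splits


# Hypothetical task records: two elements for each of ten buildings.
records = [
    {"building_id": f"building-{i:03d}", "label": label}
    for i in range(10)
    for label in ("IfcWall", "IfcDoor")
]
splits = split_records(records)
print(len(splits["train"]), len(splits["test"]))
```

Because the assignment depends only on the building id, the split stays the same across runs and across machines.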

For BIM LLMs

Train language models on BIM instructions, reasoning and question-answering samples.

dataset layers

What a Buildata package can include

See full dataset structure

Core export

Metadata, nodes, edges, tasks, statistics and dataset card.

AI-ready layers

Graph exports, ML-ready tasks and LLM-oriented training samples.

Per-building packaging

Individual building samples for incremental testing and evaluation.

featured datasets

Explore AI-ready BIM datasets

See full catalog

Example instruction sample

{
  "instruction": "Identify the IFC entity from the BIM element properties.",
  "input": "Height: 3.2 m\nThickness: 0.2 m\nMaterial: Concrete\nObject type: Exterior Wall\nLoad bearing: true",
  "output": "IfcWall"
}
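
A sample like the one above can be turned into a prompt and completion pair for instruction tuning. The sketch below is only one possible formatting: the section markers (`### Instruction:`, `### Input:`, `### Response:`) and the `to_prompt` helper are assumptions chosen for illustration, not a template prescribed by Buildata.

```python
import json

# The instruction sample shown above, kept as raw JSON text.
sample_json = r'''{
  "instruction": "Identify the IFC entity from the BIM element properties.",
  "input": "Height: 3.2 m\nThickness: 0.2 m\nMaterial: Concrete\nObject type: Exterior Wall\nLoad bearing: true",
  "output": "IfcWall"
}'''


def to_prompt(sample):
    """Return (prompt, completion) using an assumed, illustrative template."""
    prompt = (
        "### Instruction:\n"
        f"{sample['instruction']}\n\n"
        "### Input:\n"
        f"{sample['input']}\n\n"
        "### Response:\n"
    )
    return prompt, sample["output"]


sample = json.loads(sample_json)
prompt, completion = to_prompt(sample)
print(prompt + completion)
```

Keeping the prompt and the completion separate makes it easier to compute the training loss on the completion only, if the training setup supports that.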

Compatible workflows

  • PyTorch and TensorFlow pipelines
  • PyTorch Geometric and DGL experiments
  • Hugging Face instruction tuning
  • BIM reasoning and QA workflows

start here

Download a free sample

Start with a compact sample dataset and review the packaging, statistics and dataset card before moving to larger collections.