Procedural generation
Procedurally generated synthetic BIM buildings for AI research and model training
Buildata creates fully synthetic BIM datasets for graph learning, machine learning and BIM language model workflows. Each package includes structured building data, semantic elements, relationships and training-ready records.
three layers in one dataset
Semantic building elements, metadata and relationships.
Training-ready task records for prediction and benchmarking.
Instruction and reasoning samples for BIM AI workflows.
about the data
All Buildata catalog datasets are synthetic. Buildings are generated procedurally using BIM semantic rules based on IFC schemas, materials, property sets and spatial relationships.
Generated from BIM rules, not from real project files.
Designed for public sharing and AI experimentation.
Structured for graph learning, ML pipelines and BIM LLMs.
what makes buildata different
Buildata packages each building as structured AI-ready data, so teams can work directly with semantic graphs, benchmark tasks and training samples instead of starting from raw files.
Use nodes, edges and topology directly in graph-based learning workflows.
Use generated task records for classification, prediction and validation experiments.
Train language models on BIM instructions, reasoning and question-answering samples.
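As a rough illustration of the graph workflow described above, the sketch below builds an adjacency map from node and edge records. The field names (`id`, `ifc_class`, `source`, `target`, `relation`) and the sample values are assumptions made for this example; they are not taken from the Buildata schema, so check the dataset card for the real keys.

```python
# Hypothetical records: field names and values are illustrative assumptions,
# not the actual Buildata export schema.
nodes = [
    {"id": "w1", "ifc_class": "IfcWall"},
    {"id": "d1", "ifc_class": "IfcDoor"},
    {"id": "s1", "ifc_class": "IfcSpace"},
]
edges = [
    {"source": "s1", "target": "w1", "relation": "bounded_by"},
    {"source": "w1", "target": "d1", "relation": "hosts"},
]


def build_adjacency(nodes, edges):
    """Return an undirected adjacency map keyed by node id."""
    adjacency = {node["id"]: set() for node in nodes}
    for edge in edges:
        adjacency[edge["source"]].add(edge["target"])
        adjacency[edge["target"]].add(edge["source"])
    return adjacency


adjacency = build_adjacency(nodes, edges)

# Node degree is a simple topology feature that graph models often start from.
degree = {node_id: len(neighbours) for node_id, neighbours in adjacency.items()}
```

The same map can be handed to a graph library of your choice; plain dictionaries are used here only to keep the sketch dependency-free.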
dataset layers
Metadata, nodes, edges, tasks, statistics and dataset card.
Graph exports, ML-ready tasks and LLM-oriented training samples.
Individual building samples for incremental testing and evaluation.
featured datasets
{
"instruction": "Identify the IFC entity from the BIM element properties.",
"input": "Height: 3.2 m\nThickness: 0.2 m\nMaterial: Concrete\nObject type: Exterior Wall\nLoad bearing: true",
"output": "IfcWall"
}
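A record like the one above can be turned into a prompt and target pair with a few lines of Python. This is a minimal sketch: the prompt template and the helper names `parse_properties` and `to_prompt` are assumptions for illustration, not part of any Buildata tooling.

```python
import json

# The sample record from above, kept as raw JSON text so the \n escapes
# are decoded by the JSON parser rather than by Python.
record_json = r'''
{
"instruction": "Identify the IFC entity from the BIM element properties.",
"input": "Height: 3.2 m\nThickness: 0.2 m\nMaterial: Concrete\nObject type: Exterior Wall\nLoad bearing: true",
"output": "IfcWall"
}
'''

record = json.loads(record_json)


def parse_properties(text):
    """Split 'Key: value' lines into a dictionary (hypothetical helper)."""
    properties = {}
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        properties[key] = value
    return properties


def to_prompt(record):
    """Join instruction and input into one prompt string (assumed template)."""
    return f"{record['instruction']}\n\n{record['input']}\n\nAnswer:"


properties = parse_properties(record["input"])
prompt = to_prompt(record)
target = record["output"]
```

Keeping the parsed properties next to the raw prompt makes it easy to reuse the same record for tabular classification experiments and for language model fine-tuning.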
start here
Start with a compact sample dataset and review the packaging, statistics and dataset card before moving to larger collections.