AI in BIM #02 – From Vision to Form – AI in Design Concepts
In our previous article, we showed how AI can help with architectural visualization. Today, we’ll discuss the application of AI in design concepts.
Author: Oliwia Prochowska
Edited by: Wojciech Jędrosz
The beginning of a project is a moment when ideas are still loose and elusive – jotted down on a napkin, sketched in pencil, merely suggested in a few sentences. More and more artificial intelligence-based tools are attempting to turn these incomplete visions into something more: a preliminary architectural concept, a 3D mass, or even a simplified BIM model. This technology is still maturing, but the direction of development is clear – automation will support the conceptual stage ever more boldly. In recent months, many solutions have appeared on the market. They share a common goal: to transform vague design assumptions into something that can be immediately viewed, tested, and further developed.
Is it possible to create an object model… by just typing a few sentences? Or transform a simple sketch into a preliminary concept view? In this post, we examine precisely such possibilities. This is the second article in our AI in BIM series, in which we explore how artificial intelligence influences subsequent stages of the construction investment lifecycle.
Publications will appear periodically, so we encourage you to subscribe to our newsletter.
Generating visual concepts from text and sketches
Stable Diffusion and MidJourney open up new possibilities for creating concepts in architecture. They enable the generation of conceptual images based on text (so-called prompts) or sketches. Although these tools can be classified as rendering tools, they are included in this post because, in our opinion, their capabilities go beyond classic visualization. Instead of merely reproducing existing models, they allow entirely new concepts to be generated rapidly from text, sketches, or photos.
In the context of BIM, MidJourney can be particularly useful for quickly generating conceptual visualizations, even before full 3D modeling in dedicated software. It can also support the exploration of different design variants – for example, in terms of facade materials – creating atmospheric references for presentations, and even generating material inspirations or facade details. What’s more, the tool allows working from sketches, which can be helpful in the iterative process and the early design phase.
Regarding language, MidJourney works best with prompts in English. You can try entering descriptions in other languages, but the results may be less precise, so for best results it is recommended to use English terms. To obtain the best architectural visualizations in MidJourney, it is worth focusing on three key aspects (a complete example prompt follows the list):
- “medium” (execution technique),
- “subject” (object),
- “environment” (surroundings).
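Putting the three together, a complete prompt might read as follows (an illustrative example of ours, not an official MidJourney template): “photorealistic render, large format camera photograph, a minimalist concrete house with floor-to-ceiling glazing, surrounded by a misty pine forest at dawn.”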
A well-chosen tool is the key to success
A well-chosen “medium” allows you to achieve a specific image style. For example, for photorealistic visualizations, it is worth using phrases such as “photorealistic render,” “large format camera photograph,” or “vintage film camera.” If we want a conceptual sketch, we can use “hand-drawn line sketch” or “watercolor illustration,” etc. It all depends on the effect we want to achieve. Another important element is a precise description of the building itself – its style, materials, form, and architectural details. The last key aspect is the environment – that is, a description of the weather, landscape, or urban context.
MidJourney offers an image editing function that lets users select specific areas and add details, making precise changes in selected places – for example, changing facade materials, adding architectural details, or adjusting lighting. This enables gradual refinement of the concept without regenerating the image from scratch: you can, for example, generate the general form of a building, and then refine the facade details, change the type of windows, or add landscape elements.


An image of a building facade created by Hassan Ragab in MidJourney, subsequently transformed automatically into a 3D mesh using Kaedim – an AI-powered tool in architectural visualization that converts 2D images into 3D models.
Stable Diffusion: greater control over the image creation process
Similar to MidJourney, Stable Diffusion allows generating images based on texts, photos, or sketches. Unlike MidJourney, however, it is an open-source tool, which opens up wide possibilities for customization and integration with other design environments. Its potential in the context of architecture and conceptual design lies primarily in greater control over the image creation process – both in terms of style and compositional structure.
Stable Diffusion operates on a so-called diffusion model – it generates an image from nothing, literally from random “noise,” which is gradually cleaned up and transformed into a realistic vision based on a text description (prompt). The entire process resembles “developing” a photograph from an abstract smudge, with each iteration approaching a complete composition.
Importantly, the user can have a real influence on this process. By selecting the “sampling method” (i.e., the algorithm for generating subsequent image steps) and the number of “sampling steps” (stages of “noise cleansing”), you can adjust the pace, quality, and character of the final effect. The more steps, the more detailed and refined the image – though at the cost of longer generation time. Different sampling methods (e.g., Euler, DPM++, LMS), on the other hand, affect the visual style, sharpness of contours, or the way the form is interpreted. This makes Stable Diffusion particularly useful for designers who want to maintain control over every stage of vision creation.
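To make this concrete, here is a minimal text-to-image sketch using the open-source Hugging Face diffusers library (our own illustration: the model ID is the publicly released Stable Diffusion v1.5, a CUDA GPU is assumed, and the prompt and output file name are made up for the example):

```python
# A minimal text-to-image sketch with the Hugging Face diffusers library.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # publicly released SD 1.5 weights
    torch_dtype=torch.float16,
).to("cuda")

# The "sampling method": swap the default scheduler for DPM++ (DPMSolver).
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="photorealistic render, timber community pavilion with a green roof, misty morning",
    num_inference_steps=30,  # the "sampling steps": more steps = more detail, longer run
    guidance_scale=7.5,      # how strictly the image follows the prompt
).images[0]
image.save("concept.png")
```

Changing the scheduler class or the step count here corresponds directly to the “sampling method” and “sampling steps” controls described above.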
Add-ons extending AI capabilities in architectural visualization
Additionally, thanks to an extensive ecosystem of extensions – such as ControlNet (enabling precise control of the compositional layout), inpainting (i.e., editing selected fragments), and the ability to train custom models – this tool is gaining increasing importance in the context of conceptual architecture. The ability to work based on sketches, reference photos, or technical drawings allows designers to iteratively develop an idea without having to recreate it from scratch.
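As a rough sketch of what composition control looks like in practice – assuming diffusers’ ControlNet pipeline and the publicly available canny-edge ControlNet weights; the input file name is hypothetical:

```python
# Conditioning generation on a line sketch via ControlNet (canny-edge variant).
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical input: a clean line drawing can serve as the edge map directly;
# a photo would first need edge extraction (e.g., cv2.Canny).
sketch = load_image("facade_sketch.png")

image = pipe(
    "modern brick facade, large windows, watercolor illustration",
    image=sketch,            # the composition is locked to the sketch's lines
    num_inference_steps=30,
).images[0]
image.save("facade_variant.png")
```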
Stable Diffusion can support the early design stage. It lets designers use visual references to explore massing variants or test a building’s aesthetics and form before proceeding to actual 3D modeling. Working from your own sketches or technical drawings allows the spatial vision of the project to be refined gradually.
Although the application does not directly generate BIM geometry or parameters, it can significantly accelerate the transition from an abstract idea to a concrete design direction. For many design teams, this can be a valuable tool, especially where quick response, experimentation with form, or the creation of presentation materials at an early stage of work are important.


Stable Diffusion – interface

Improving image elements generated with Stable Diffusion in inpaint mode
Another interesting tool is Leonardo AI – it sits somewhere between a concept generation tool and a visualization tool. Created in 2022, it is constantly evolving and gives designers a helping hand with flexible image editing. Leonardo offers great creative freedom thanks to a range of ready-made styles and the ability to create your own stylistic models, which allows the image aesthetics to be matched precisely to the character of the project.
Leonardo AI: Flexible image editing and its limitations
In practice, Leonardo proves useful at the stage of rapid prototyping of spatial forms, creating mood boards, or architectural variants. Its functions, such as inpainting – precise editing of selected image fragments – enable iterative work on detail and allow correcting concepts without having to regenerate the whole from scratch. Importantly, Leonardo also allows working based on previously loaded graphics, which opens up wide possibilities for styling sketches or testing alternative versions of the same idea.
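Leonardo’s inpainting itself is a hosted feature, but the underlying mechanism can be illustrated with the open-source Stable Diffusion inpaint pipeline from diffusers (our sketch, not Leonardo’s API; file names are hypothetical):

```python
# Inpainting: regenerate only a masked region, keeping the rest of the image.
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

base = load_image("concept.png")      # hypothetical: the concept image to refine
mask = load_image("window_mask.png")  # hypothetical: white = repaint, black = keep

image = pipe(
    prompt="floor-to-ceiling glazing with slim black frames",
    image=base,
    mask_image=mask,
).images[0]
image.save("concept_v2.png")
```

Only the masked windows are regenerated; the rest of the facade stays pixel-for-pixel intact, which is exactly what makes iterative correction possible.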
It is worth emphasizing, however, that despite its advanced options, Leonardo does not always maintain full spatial consistency. Generated images may contain visual artifacts, illegible details, and – in more complex forms – logical errors (e.g., asymmetrical windows, impossible material connections, strange roof coverings). This means that the generated “vision” does not always go hand in hand with architectural “legibility.”
Despite these limitations, we will observe the further development of these tools with curiosity – if they manage to improve spatial consistency and details, they may become a permanent support in the design and conceptual process.

Generating 3D models from text and sketches with AI in architectural visualization
Although tools for generating visual concepts greatly support the early stages of design – allowing quick testing of aesthetics, atmosphere, or compositional layout – they do not replace 3D modeling. Their strength lies in the image: suggestive, inspiring, but still two-dimensional. And it is at this stage that another promising area of AI application in architectural visualization opens up: generating three-dimensional models from text or a sketch. This is where it gets really interesting, because not only the aesthetic image matters, but also space, proportion, and structural logic.
Although translating words into precise geometry is still a challenge, more and more companies are experimenting with AI language models generating 3D models based on text descriptions. The idea is simple – just type a few words, and the algorithm will create a ready-made spatial object.
Lesser-known tools at your fingertips
The market offers many interesting tools that transform text into geometric forms. Interesting examples are Kaedim (mentioned earlier) and Sloyd.ai. Although created mainly for computer games, they can also be used to create conceptual solids and forms in architecture. Kaedim is based on analyzing a 2D image – the user uploads a sketch or render, and the system, using machine learning models, transforms this image into a simplified 3D model.
In practice, the effects vary: from surprisingly accurate interpretations to forms that require further refinement. Interestingly, the generation process is not fully automated – initially generated 3D models first go to the Kaedim graphics team, which manually corrects and refines them before making them available to the user. It is this human stage – somewhat in the background – that gives the models greater consistency and aesthetics, but also makes the tool work more like a hybrid of technology and service than fully standalone AI in architectural visualization. This solution works primarily where there is a need to quickly materialize a general 3D concept.

A photograph of early modernist architecture, automatically converted into a 3D mesh using Kaedim – artificial intelligence that transforms 2D images into 3D models.
Sloyd.ai and the future of 3D model generation
Sloyd.ai works on slightly different principles than Kaedim. Instead of transforming a finished image into a model, it allows spatial forms to be built from scratch, starting from a short text description, a reference image, or a ready-made template. An intuitive interface is important in this process: the user selects a basic shape and can then modify it using sliders, changing proportions, details, or the arrangement of elements. This approach is closer to the rapid conceptual modeling known from games than to classic 3D design.
Generated models are light, simplified, and highly stylized – which is natural, given that Sloyd was designed with games in mind. It’s not about reproducing details or structural precision, but about the ability to quickly test spatial ideas. The tool allows translating an abstract idea into a form that can be easily viewed, rotated, and further modified in a few minutes.
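To give a feel for the parametric idea, here is a deliberately toy illustration of ours – it says nothing about Sloyd’s actual engine, only shows how “slider-driven” massing can reduce to a few parameters written out as a mesh:

```python
# Toy illustration (not Sloyd's engine): a "slider-driven" massing study.
# Three parameters stand in for the sliders; the output is a Wavefront OBJ
# box that any 3D viewer can open.
def massing_obj(width: float, depth: float, height: float, path: str = "mass.obj") -> None:
    w, d, h = width, depth, height
    vertices = [
        (0, 0, 0), (w, 0, 0), (w, d, 0), (0, d, 0),  # base
        (0, 0, h), (w, 0, h), (w, d, h), (0, d, h),  # top
    ]
    faces = [  # OBJ uses 1-based vertex indices
        (1, 2, 3, 4), (5, 6, 7, 8),
        (1, 2, 6, 5), (2, 3, 7, 6),
        (3, 4, 8, 7), (4, 1, 5, 8),
    ]
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for face in faces:
            f.write("f " + " ".join(map(str, face)) + "\n")

massing_obj(width=12.0, depth=9.0, height=15.0)  # tweak the "sliders" and regenerate
```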
Although Sloyd does not generate models ready for direct use in a BIM environment, it can play a valuable role in the design process – especially at its initial stage. It is a tool for quickly examining massing, proportions, and spatial relationships. It acts like a three-dimensional sketchbook – only a precise one. Thanks to this simplicity, it can be an interesting tool for the designer.
Kaedim or Sloyd – new developments in AI in architectural visualization
Tools like Kaedim or Sloyd can significantly accelerate the development of spatial forms based on uploaded sketches, images, or ready-made models, which can then be modified to a certain extent.

Sloyd.ai – interface
As an interesting point at the end, it is worth mentioning the scientific work and solutions appearing on the horizon that push the boundaries even further – allowing the creation of 3D models solely from a text description, without sketches, templates, or preliminary forms. These include experimental but promising systems such as DreamFusion (Google) or CLIP-Forge (Autodesk), or CFD (from independent research teams).
These experimental tools generate 3D models solely from a text description. They combine 2D image generation techniques (including diffusion models) with so-called NeRFs (neural radiance fields) – a neural representation of a 3D scene. In practice, this means that the user types, for example, “futuristic pavilion with a concrete roof and a glazed facade,” and the system creates a three-dimensional mass from it that can be used further. Although these tools still function mainly in research environments and do not yet have public versions, their potential clearly heralds a revolution – the moment when 3D design will begin with a word, not a line.
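For the technically curious: DreamFusion’s core trick, Score Distillation Sampling, renders the NeRF from a random camera, adds noise to the rendering, and asks a frozen 2D diffusion model how to denoise it given the text prompt; the difference becomes a gradient that sculpts the 3D scene. In the paper’s notation (Poole et al., 2022), the gradient is approximately

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} \;=\; \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\, \tfrac{\partial x}{\partial \theta} \right]$$

where x = g(θ) is the image rendered from the NeRF with parameters θ, ε is the noise injected at diffusion timestep t, ε̂_φ is the frozen diffusion model’s noise prediction conditioned on the prompt y, and w(t) is a timestep weighting.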
AI tools in architectural visualization are becoming increasingly useful. They are being used in conceptual design to reflect designers’ boldest visions. Although many of them are still in a dynamic, “move fast and break things” phase of development, their potential already has a chance to influence the workflow of early design stages. Instead of looking for the one “right” form, designers have the opportunity to test different directions – examining proportions and rhythms, and juxtaposing different styles.
Summary
The ability to instantly generate an image or mass from a loose idea, verbal description, or sketch fundamentally changes the creative process. Thanks to AI tools, it is possible to arrive in minutes at solutions that would otherwise take hours – or even days – of design work. In this context, AI also provides space for making mistakes, experimenting, and visual “thinking aloud.”
It is worth emphasizing that using such tools does not have to mean ceding control to the machine – quite the opposite. The most effective use of AI comes from engaging in a dialogue with it: clarifying intentions, defining results, and refining details. The user sets the direction – AI merely expands the scope of what can be quickly seen.
Of course, generative models still have their limitations. The generated forms can be unclear, simplified, and sometimes even incorrect from the point of view of construction logic. But even then, they can help spark imagination, formulate assumptions, or conduct conversations with the team and client. Instead of starting from a blank sheet, we start from an image – even if it requires later correction.
In the next parts of this series, we will look at how AI supports further stages of design. From creating more complex BIM models, through analysis stages, to tools supporting building management. It is precisely where vision meets data, and form meets function, that the role of artificial intelligence becomes particularly intriguing and full of untapped potential. Stay tuned for subsequent posts!