Human intelligence has many facets. One is verbal intelligence, enabling us to communicate and connect with others through language. But perhaps more fundamental is spatial intelligence, allowing us to understand and interact with the world around us. Spatial intelligence also helps us create, bringing the pictures in our mind's eye into the physical world. We use it to reason, move, and invent: to visualize and architect anything from humble sandcastles to towering cities.
We believe that artificial intelligence will help humans build better worlds. Progress has been rapid, but we have only seen the first chapter of the generative AI revolution. Language has thus far catalyzed this electrifying early moment, with text-prompted image and video models rising alongside large language models (LLMs) as harbingers of AI's potential in the visual realm. These models have already empowered people to work and create in new ways, but they only scratch the surface of what is possible. To advance beyond the capabilities of today's models, we need spatially intelligent AI that can model the world and reason about objects, places, and interactions in 3D space and time.
