A Universal AI Interface

The Universal AI App?

Cursor and similar IDEs are close to the ideal universal interface for AI interaction, at least for programmers, but their capabilities are currently limited primarily to text and images. By incorporating video, real-time audio, and mobile-friendly interfaces, we can evolve the Cursor-style interface into a truly universal interface for AI interaction.

The Vision

All personal computing, built on top of a few core media types:

Text processing and generation
Video handling (both stored and real-time)
Image manipulation and generation
Real-time audio processing and voice input
Mobile-first and wearable interaction patterns (glasses, watches)

This vision particularly impacts creative tools and workflows. Traditional design tools like Figma may face disruption as AI enables technical users to directly translate their ideas into designs. The distinction between technical and non-technical creators begins to blur, as AI-powered interfaces make sophisticated design and development capabilities more accessible to those with domain expertise.

Going (Real-Time) Multiplayer

While current AI-powered IDEs like Cursor excel at single-user interactions, essentially a two-party dialogue between programmer and AI, many creative workflows demand richer collaboration. The Git-style collaboration model, though effective for code, doesn’t translate well to more fluid creative processes like design critiques or brainstorming sessions.

The platform needs to support:

Real-time Creative Collaboration:
- Synchronous multi-user workspaces
- Fluid feedback loops for creative processes
- Context-aware collaboration modes for different disciplines
Organic Data Generation:
- User corrections and feedback become training signals
- Natural workflow produces high-quality training data
- Domain experts’ interactions create specialized datasets
Collaborative Knowledge Building:
- Social motivations (community recognition, status)
- Professional growth opportunities
- Optional monetization for specialized expertise
Quality Control Systems:
- Peer review mechanisms
- AI-assisted verification
- Expert validation workflows

The core remains a dialogue between users and AI, but now operating in a truly multiplayer environment where artifacts (code, media, documents) can be collaboratively refined. Modern AI systems have established the foundational patterns—the next step is making them work at scale across teams.

Building a Self-Improving Ecosystem

The key to making this system truly powerful lies in creating a self-reinforcing, self-improving loop. This requires:

A developer and power-user friendly toolset
Deep system-level integration possibilities
Incentive structures that reward continued engagement and improvement
Model Specialization Management: As AI models become increasingly specialized, the platform must:
- Handle varying model capabilities and limitations effectively
- Provide seamless routing to appropriate models based on task requirements
- Manage feature enablement/disablement across different models
Cross-Model Standardization: Creating consistent interfaces across divergent model capabilities
Voice-First Integration: Emphasizing voice input as a primary interaction method for broader consumer adoption
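The routing and feature-management requirements above can be sketched as a capability registry. This is a minimal illustration under stated assumptions: `ModelProfile`, `Router`, the capability names, and the relative `cost` field are all hypothetical, and real routing would likely also weigh latency, quality, and user preference. The router picks the cheapest registered model that supports every capability a task needs, and the same profile tells the interface which features to enable or disable for the active model.

```python
from dataclasses import dataclass, field


# Hypothetical types: ModelProfile and Router are illustrative, not an existing API.
@dataclass(frozen=True)
class ModelProfile:
    name: str
    capabilities: frozenset[str]  # e.g. {"chat", "web", "images", "reasoning"}
    cost: float  # assumed relative cost; lower wins among capable models


@dataclass
class Router:
    models: list[ModelProfile] = field(default_factory=list)

    def register(self, model: ModelProfile) -> None:
        self.models.append(model)

    def enabled_features(self, name: str) -> frozenset[str]:
        """Features the interface should switch on when this model is active."""
        for model in self.models:
            if model.name == name:
                return model.capabilities
        raise KeyError(name)

    def route(self, required: set[str]) -> ModelProfile:
        """Return the cheapest model that supports every required capability."""
        capable = [m for m in self.models if required <= m.capabilities]
        if not capable:
            raise LookupError(f"no registered model supports {sorted(required)}")
        return min(capable, key=lambda m: m.cost)
```

For example, registering a general model with `web` and `images` alongside a reasoning model with `reasoning` sends a web task to the first and a reasoning task to the second, while a task needing both raises `LookupError`. That failure is the cross-model gap a standard interface would have to bridge, for instance by chaining models.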

Data Quality and Trust

To maintain high-quality interactions and trust in the system:

Expert Contributions: Special workflows for domain experts to contribute specialized knowledge
Feedback Loops: Systems for users to improve and correct AI outputs
Knowledge Attribution: Clear tracking of knowledge sources and contributors

The platform’s social components will be crucial:

A marketplace for tools and extensions
Collaborative workspaces
Multiplayer AI interaction capabilities
Real-time collaboration features

Product Architecture

The ideal implementation would combine:

Cursor-style interface mechanics
Real-time collaboration capabilities
Flexible multi-model integration
Marketplace functionality for:
- User-created agents and tools
- Model deployment and feedback
- Preference data collection and comparison across models
- Specialized model deployment with clear capability definitions
Developer-friendly APIs
Cross-platform presence (mobile, wearables, desktop)
Voice-first interaction layer for consumer applications
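The preference-data side of the marketplace can be illustrated with pairwise comparisons: a user sees two models answer the same prompt and picks one. The sketch below is an illustration under assumptions, not a defined part of the product: `Comparison`, `win_rates`, and `head_to_head` are hypothetical names, ties are not modeled, and a plain win rate is used where a production system might fit a rating model such as Elo or Bradley-Terry instead.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: one user judgment between two models' answers to the same prompt.
@dataclass(frozen=True)
class Comparison:
    model_a: str
    model_b: str
    winner: str  # must be model_a or model_b; ties are not modeled here


def win_rates(comparisons: list[Comparison]) -> dict[str, float]:
    """Fraction of all its comparisons that each model won."""
    wins: dict[str, int] = defaultdict(int)
    games: dict[str, int] = defaultdict(int)
    for c in comparisons:
        if c.winner not in (c.model_a, c.model_b):
            raise ValueError(f"winner {c.winner!r} was not one of the compared models")
        games[c.model_a] += 1
        games[c.model_b] += 1
        wins[c.winner] += 1
    return {model: wins[model] / games[model] for model in games}


def head_to_head(comparisons: list[Comparison], model: str, rival: str) -> tuple[int, int]:
    """(wins for `model`, total games), counting only `model` vs `rival` matchups."""
    pair = {model, rival}
    matchups = [c for c in comparisons if {c.model_a, c.model_b} == pair]
    return sum(c.winner == model for c in matchups), len(matchups)
```

A model builder would then see both an overall win rate and head-to-head records against specific popular models.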

Data Storage and Processing

Key considerations for the system:

Flexible “flat file” storage that avoids rigid structure
Ability to restructure and represent data in various modalities
Context-aware output formatting based on user needs
Seamless integration across devices and platforms
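The "flat file" idea can be sketched as an append-only JSON Lines file: each record is a free-form dictionary with no enforced schema, and structure is imposed only when a view is rendered for the user. This is a minimal sketch under assumptions not stated above: the JSON Lines format, the field names `text` and `source`, and the helpers `append_record`, `load_records`, `as_checklist`, and `group_by` are all hypothetical.

```python
import json
from pathlib import Path


# Hypothetical helpers: records are schema-free dicts stored one per line (JSON Lines).
def append_record(path: Path, record: dict) -> None:
    """Append one record as-is; no schema is enforced at write time."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_records(path: Path) -> list[dict]:
    """Read every record back; a missing file simply means no records yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def as_checklist(records: list[dict]) -> str:
    """One view: a to-do list built from any record that has a 'text' field."""
    return "\n".join(f"- [ ] {r['text']}" for r in records if "text" in r)


def group_by(records: list[dict], key: str) -> dict[str, list[dict]]:
    """Another view: structure imposed at read time, on whichever field is asked for."""
    groups: dict[str, list[dict]] = {}
    for r in records:
        groups.setdefault(str(r.get(key, "unknown")), []).append(r)
    return groups
```

The same file can be rendered as a short checklist on a watch or grouped by source on a desktop, and neither view requires migrating a schema.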

User Experience Focus

The platform emphasizes:

Professional-grade tools for power users
Highly polished, intuitive interfaces
Voice-first interaction capabilities
Multi-modal input/output support
Trust through transparency and open-source elements

This creates a universal platform that serves both developers and no-code builders, providing a comprehensive ecosystem for AI-powered creation and collaboration.

Raw

I'm going to try to summarize the startup idea the way I did yesterday. Basically, there's a marketplace where users access models, and the benefit they get is the use of multiple models. It's kind of like a professional-level tool. Part of this is a marketplace where really good users can create agents and deploy them for others to use. On the lab and model-builder side, they receive preference-data feedback, both on their own model and in comparison with other popular models. The open-source element of this would largely be about trust, but the emphasis should really be on a highly polished user experience, kind of like a much better take on Poe. From my experience driving across the country using AI, I've realized that what will make AI really powerful for consumers, in particular, is voice input. Very multi-modal input on whatever device they have, like glasses or a watch, is crucial. It's about trying to be on every platform to be part of the user's life. This is where every kind of input and output becomes relevant, and output should be well-organized information in the right modality for the user to process. Intermediate storage should be almost like a flat file for the user, because when you start imposing structure, it detracts from the point of AI, which is its ability to restructure data in various ways. I think those are the thoughts I have so far, and I can put them together as a PRD or pitch memo for the product after this.

Another observation about this AI marketplace is that as models start to diverge, they become good at different things, and it makes sense to take features out. For example, GPT-4o is OpenAI's general-purpose model, and it doesn't have very good reasoning, but the reasoning model doesn't have access to the web or images. And then there are certain other models like... I forget which one. Anyway, we're getting to the point where even within a single company, they have to enable and disable features because they can't get one model to support everything. And of course, you can extend this problem across companies. What you can offer is a set of really good standard interfaces. We talked about artifact-centric computing, and I think that theme is relevant here. That's all for now.

So, the thing I'm thinking of building in software can best be described as a collaborative pro suite for AI, aimed at the long tail of people using AI. It's different from Perplexity or You.com, which are very much targeted at consumers. I think even O is targeting too low a bar, and we're getting to the point where models like o3 are more appropriate. Basically, the audience is the people who are willing to pay $200 for o3 because these models improve their work that much.

I mean, I said creativity seems to be the way to go here. Why not product design and software design? Figma is probably at risk here because its audience is nontechnical, while "technical" users are now far more enabled.

In addition to the benefits I've laid out for a collaborative creative pro tool where people and AI work together on artifacts, the big thing that none of the single-model AI companies will be able to do is multi-model. So, incorporating something from Claude and something else from GPT to get the best outcome, kind of like Cursor but better, seems to be the way to go here.