
State of the Art of AI for Development · LAB


Artificial intelligence is revolutionizing the way we develop digital products. In this post, we share insights from the Runroom LAB session "State of the Art of AI for Development", held with Rafa Gómez Casas and Javier Ferrer González, founders of Codely, in which we explored the current state of AI applied to development.

Conclusions and Learnings from the Runroom LAB

1. AI is advancing rapidly. Order is needed!

A fundamental conclusion is that the speed of AI advancement is overwhelming: new releases that “blow your mind” appear every day, making it difficult to stay up to date.
Javier and Rafa help us bring order and clarity to the feeling of being overwhelmed and not knowing how to assimilate so many new developments.

  • Role Change: The boom of agents (around March) caused a “mini-crisis” about the developer's role in the industry, forcing an attempt to stay “one step ahead.”
  • Forced Adoption: A radical change is observed in the business environment: previously, AI use was prohibited, and now it is mandatory to adopt it.
  • Accelerated Pace: Time in AI does not pass linearly; what happened in months equates to several years in “AI age.”

2. AI for Programming: Agents and Assistance

AI for programming is divided into three main areas: ask mode (conversational querying, as in ChatGPT), functionality integrated into IDEs (such as contextual suggestions or unit-test generation), and, most advanced, AI agents.

An agent is an autonomous system that determines by itself when to stop executing and concludes that the assigned task has been completed, demonstrating agency.
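That definition can be sketched as a minimal loop, where the key property is that the model itself decides when to stop. All names here (`run_agent`, `fake_model`, the action shapes) are illustrative, not the API of any real agent framework:

```python
# Minimal agent loop sketch: "agency" means the model, not the caller,
# decides when the task is complete. All names are illustrative.

def run_agent(model, task, max_steps=10):
    """Loop until the model declares the task done (or a safety cap is hit)."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = model(history)          # model proposes the next action
        history.append(f"ACTION: {action['name']}")
        if action["name"] == "done":     # the model concludes the task is finished
            return history
        result = action["run"]()         # execute the proposed tool...
        history.append(f"RESULT: {result}")  # ...and feed the result back as context
    return history                       # safety cap reached

# A stand-in "model" that edits a file once, then declares completion.
def fake_model(history):
    if any(line.startswith("RESULT:") for line in history):
        return {"name": "done"}
    return {"name": "edit_file", "run": lambda: "replaced text in copy.json"}

log = run_agent(fake_model, "update the website copy")
```

The `max_steps` cap is a common safeguard: an agent that never signals `done` would otherwise loop indefinitely.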

  • Integration Examples: They are being integrated directly into task management systems (like Linear with GitHub Copilot), where the agent takes the specification, attempts to implement it, creates a Pull Request (PR), and provides feedback in the task manager itself.
  • Low-Value Tasks: Agents are ideal for low-value or “zero romance” tasks (like adjusting copies or texts on a website). By speeding up these processes, they free up developer time.
  • Interaction Context: An agent (like Cursor) can take context from conversation threads (like Slack), understand the task (change a text), identify the correct file (a .json), apply the change, and create a PR, all autonomously.

The choice of agent type depends on the need:

  • Local Agents: Ideal for medium/large functionalities where rapid iteration and solution shaping are required, keeping the AI on a short leash.
  • Remote Agents (Background Agents): Allow the developer to disregard execution once the plan has been specified (uploading it to Linear or Jira), enabling parallelization of other tasks.
  • Parallel Models
    An advanced technique being implemented is executing the same task in parallel with different models. This maximizes response time by allowing the selection of the fastest result or the one closest to the expected outcome. However, this also consumes more resources and tokens.
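The parallel technique above can be sketched with standard-library threading; the model functions and their latencies here are invented stand-ins for real inference calls:

```python
# Sketch: run the same task against several models in parallel and keep
# whichever answer arrives first. Model names and latencies are stand-ins.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_model(name, latency):
    def model(task):
        time.sleep(latency)              # simulate inference time
        return f"{name}: patch for '{task}'"
    return model

models = [make_model("model-a", 0.30), make_model("model-b", 0.05)]

def first_result(task):
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(m, task) for m in models]
        for done in as_completed(futures):   # fastest model wins
            return done.result()

answer = first_result("rename the CTA button")
# Trade-off noted in the talk: every submitted call still runs to completion,
# so this approach spends tokens on all models, not just the winner.
```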

3. The Evolution of Prompting and Rules

  • Improvement of Model Intelligence: Although guides like Anthropic's recommend very specific and verbose prompting, models are becoming smarter over time (e.g., GPT-5 Codex).
  • Model Trends: Models are becoming smaller and, at the same time, more intelligent.
  • Rules: A solution to the problem of constantly repeating coding conventions (like “don't use verbose comments” or “don't use mocks”) are rules. Cursor pioneered this, allowing rules to be defined in a Markdown file that the model applies dynamically if deemed relevant, based on the rule description.
  • Agent Standardization: The use of the agents.md standard is advocated for specifying these rules and conventions.
  • Context: It remains essential to optimize context usage; although models support a large volume of tokens, the attention window is limited, and overloading it can lead to undesirable agent behavior.
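As an illustration of what such a rules file might contain, a minimal `agents.md` could look like the following (the conventions listed are invented examples, not Codely's or Runroom's actual rules):

```markdown
# agents.md — project conventions for coding agents (illustrative example)

## Code style
- Don't use verbose comments; prefer self-explanatory names.
- Don't use mocks in tests; use in-memory fakes instead.

## Workflow
- Open a Pull Request for every change; never push directly to main.
- Keep PRs small and focused on a single task.
```

Because the rules live in a plain Markdown file versioned with the code, every agent (and every teammate) picks up the same conventions without repeating them in each prompt.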

4. AI Applied to Product Features (MCPs and RAG)

The second pillar of the talk focuses on how to use AI to develop features within the application itself.

Model Context Protocol (MCP)

MCP is a protocol proposed by Anthropic that allows adding context (such as calendar access) to LLM models.

  • Primitives: The MCP standard defines not only Tools (tools for performing actions, the most used) but also Resources (data lists for the model to read) and Prompts (shared libraries of prompts).
  • Advanced Tools (Playwright): A Playwright MCP server lets the LLM launch a browser and inspect the DOM tree of a web page. This gives the model “eyes,” which is useful for generating frontend acceptance tests or automating web interactions.
  • Intelligence in Invocation: The model demonstrates intelligence by inferring and enriching the query before invoking the MCP server's tool. This turns the LLM into a new entry point or distribution channel for business logic.

RAG (Retrieval-Augmented Generation) and Vector Search

RAG is the process of providing context to the system. The official guide recommends limiting the context to the most relevant information, because overloading the LLM with irrelevant data (like a large list of courses) makes it hallucinate and invent things.

  • Context Separation: To optimize costs (reductions of 5% to 10%) and enable caching, the static part of the prompt should go in the instructions (which can be cached) and the variable part in the user input.
  • Semantic (Vector) Search: To find the most relevant information (for example, related courses), vector search is used: a database query based on the cosine distance between vectors (embeddings).
  • The Concept of Vectors: Vectors store the semantic meaning of a text as multidimensional coordinates, where proximity between vectors (calculated by distance) indicates conceptual similarity.
  • Tools for Vectors: Extensions like pgvector for PostgreSQL are becoming popular, allowing experimentation with vector search without immediately needing specialized infrastructure.
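A toy illustration of that ranking idea, using hand-made 3-dimensional “embeddings” (real embedding models produce vectors with hundreds or thousands of dimensions, and the course names and numbers here are invented):

```python
# Toy vector search: rank items by cosine distance to a query embedding.
# The 3-dimensional "embeddings" are hand-made for illustration only.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm                # 0 = same direction, 2 = opposite

courses = {
    "DDD in PHP":         [0.9, 0.1, 0.0],
    "Testing with fakes": [0.7, 0.3, 0.1],
    "Cooking basics":     [0.0, 0.1, 0.9],
}

query = [0.8, 0.2, 0.0]                  # embedding of a software-design query
ranked = sorted(courses, key=lambda name: cosine_distance(query, courses[name]))
```

A database extension like pgvector performs essentially this computation inside SQL, with an index so the search stays fast as the table grows.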

5. Challenges and Final Perspective

The non-deterministic nature of LLM models (the same input does not guarantee the same output) forces a change in the testing strategies used until now.

  • Green Ecosystem: The ecosystem of tools for the evaluation and observability of models in production (such as evaluation frameworks, local model servers like Ollama, or Behavior-Driven Development solutions) is in its early stages and “quite green.”
  • Loss of Control: It is normal for developers to feel uncertainty, insecurity, or a “sense of loss of control” in the face of these advancements.

In short, the conclusion is that it is useless to ignore the situation; the best response is to seek order and clarity and keep your knowledge up to date. And what better way to do that than by following these two titans on their YouTube channel!

Nov 12, 2025

Annachiara Sechi

Head of Communications
