Ashis Ghosh is the co-founder and CTO at Peanut Robotics.

Discussions around artificial intelligence (AI) often center on model architectures, scaling laws and benchmark performance. The implicit assumption is that progress is driven primarily by larger datasets and more compute.
In practice, many real-world AI systems are built under very different constraints.
Having worked on systems that operate outside controlled environments, including deploying learning systems in commercial robotics through Peanut Robotics, I have found that training infrastructure often becomes the limiting factor long before model architecture does.
The resulting gap between controlled performance and real-world outcomes is widely observed. Studies have found that a large share of AI initiatives fail to reach production or deliver expected value, often due to challenges in deployment and integration rather than model capability. Similarly, research and industry analyses have shown that systems performing well in pilot or benchmark settings often degrade when exposed to real-world data variability, changing environments and operational constraints.
The challenges are not just about improving accuracy. They are about how data is collected, how feedback loops are structured and how systems are updated under real-world conditions.
These constraints are especially visible in domains like robotics, but they increasingly apply to any AI system that interacts with dynamic environments.
Data Is Not Unlimited, And It Is Rarely Clean
In research settings, datasets are often static, curated and large-scale. Training pipelines assume that data can be sampled repeatedly with minimal cost.
In real-world systems, data is expensive to collect and often noisy. Each data point may require physical interaction, human labeling or system downtime. Distribution shifts are common because environments change over time, sometimes in subtle ways that are difficult to detect through standard validation pipelines.
This changes how training systems must be designed.
Instead of relying purely on scale, teams must prioritize data efficiency. The question becomes less about how much data is available and more about whether the right data is being collected. In practice, this leads to systems that emphasize filtering, prioritization and iterative refinement rather than bulk ingestion. Over time, the ability to identify high-value data becomes a core capability.
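One common way to approximate "high-value" data is to rank candidate samples by how uncertain the current model is about them and to spend the labeling budget on the most uncertain ones. The sketch below is a minimal illustration of that idea, not a description of any particular production system. The sample records, the `probs` field and the `select_high_value` helper are all hypothetical names chosen for the example.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_high_value(samples, budget):
    """Return the ids of the `budget` samples the model is least sure about.

    Assumes each sample is a dict with an "id" and the model's predicted
    class probabilities under "probs" (a hypothetical record layout).
    """
    ranked = sorted(samples, key=lambda s: entropy(s["probs"]), reverse=True)
    return [s["id"] for s in ranked[:budget]]

# Three unlabeled observations with illustrative model outputs.
samples = [
    {"id": "a", "probs": [0.98, 0.01, 0.01]},  # confident prediction
    {"id": "b", "probs": [0.40, 0.35, 0.25]},  # highly uncertain
    {"id": "c", "probs": [0.70, 0.20, 0.10]},  # moderately uncertain
]

selected = select_high_value(samples, budget=2)
print(selected)  # the two most uncertain samples: ['b', 'c']
```

Uncertainty is only one possible signal. Novelty relative to existing data or proximity to a known failure could be used in the same ranking structure.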
Training Is Often Interleaved With Deployment
In many real-world systems, training is not a separate phase. It is tightly coupled with deployment.
Models are updated based on new observations, and those updates must be validated quickly before being pushed back into production. This creates a feedback loop where data collection, training and evaluation are continuously interacting.
In robotics deployments, this loop is often driven by edge cases encountered during operation. A failure during execution becomes a training signal. That signal must be captured, processed and incorporated into the model without introducing instability into the system.
As a result, training infrastructure begins to resemble a continuous system rather than a discrete pipeline. Data flows from operation into training, from training into validation and back into deployment, often within tight time constraints.
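The validation step in this loop can be thought of as a gate: an updated model is pushed back into production only if it improves on average and does not noticeably regress on any scenario the current model already handles. The following is a rough sketch of such a gate under assumed inputs. The scenario names, the scores and the `should_promote` function are invented for illustration, and the 0.01 regression tolerance is an arbitrary placeholder.

```python
def should_promote(candidate, production, max_regression=0.01):
    """Decide whether a candidate model may replace the production model.

    Both arguments map scenario name -> success rate on a fixed validation
    suite (hypothetical structure). The candidate is promoted only if its
    mean score improves and no scenario drops by more than `max_regression`.
    Returns (decision, regressions), where regressions maps each failing
    scenario to the size of its drop.
    """
    regressions = {
        name: production[name] - candidate[name]
        for name in production
        if production[name] - candidate[name] > max_regression
    }
    mean_gain = sum(candidate[n] - production[n] for n in production) / len(production)
    return (mean_gain > 0 and not regressions), regressions

# Illustrative scores: the candidate fixes an edge case but hurts the nominal case.
production = {"nominal": 0.95, "low_light": 0.70, "clutter": 0.80}
candidate = {"nominal": 0.90, "low_light": 0.85, "clutter": 0.82}

ok, regressions = should_promote(candidate, production)
print(ok, sorted(regressions))  # False ['nominal']
```

The candidate here is better on average, yet the gate blocks it because the nominal scenario regresses. That is the kind of instability a fast feedback loop has to guard against.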
Compute Constraints Change Model Design
Large-scale AI research often assumes access to significant compute resources. In production systems, especially those deployed on edge devices, compute is limited.
This constraint affects both training and inference in ways that are often underestimated.
Models must be designed with efficiency in mind from the beginning. Latency, memory footprint and power consumption all become part of the design space. Techniques such as quantization and model compression are not afterthoughts. They are necessary for deployment.
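To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization written in plain Python. It is a simplified illustration of the general technique, not the scheme used by any specific framework or device. The function names and the example weights are assumptions made for this example.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] plus a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [v * scale for v in quantized]

# Illustrative weights; a real tensor would have many more values.
weights = [0.5, -1.27, 0.003, 1.0]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)          # [50, -127, 0, 100]
print(max_error)  # bounded by half the scale (about 0.005 here)
```

Each weight now needs one byte instead of four, at the cost of a rounding error no larger than half the scale. Small weights such as 0.003 collapse to zero, which is one reason quantization has to be considered during design rather than applied blindly afterward.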
In many cases, the most effective systems are not the largest ones. They are the ones that balance capability with deployability, ensuring that performance gains translate into real-world impact.
Evaluation Must Reflect Real-World Performance
Benchmark performance remains a useful proxy for progress, but it does not always correlate with reliability in production environments.
Systems that perform well on curated datasets may fail under distribution shifts, sensor noise or unexpected inputs. This is particularly evident in systems that interact with physical environments, where small variations can cascade into larger failures.
As a result, evaluation must evolve.
Teams begin tracking how systems behave over time rather than in isolated tests. Stability across long horizons, sensitivity to environmental variation and the ability to recover from unexpected states become central concerns. In many cases, the most informative data comes from failures rather than successes.
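Tracking behavior over time can start with something as simple as a rolling success rate over recent task outcomes, with a flag raised when it falls below a threshold. The sketch below assumes each operation produces a single pass or fail outcome. The `RollingSuccessMonitor` class, the window size and the threshold are hypothetical choices for illustration.

```python
from collections import deque

class RollingSuccessMonitor:
    """Track the success rate over the most recent `window` outcomes."""

    def __init__(self, window, threshold):
        self.outcomes = deque(maxlen=window)  # old outcomes drop off automatically
        self.threshold = threshold

    def record(self, success):
        self.outcomes.append(bool(success))

    def rate(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        # Only flag once the window is full, to avoid noisy early alerts.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.rate() < self.threshold

monitor = RollingSuccessMonitor(window=5, threshold=0.8)
flags = []
for outcome in [True, True, True, True, True, False, True, False, False, True]:
    monitor.record(outcome)
    flags.append(monitor.degraded())

print(flags)  # False for the first seven outcomes, True for the last three
```

A single failure does not trip the flag, but a cluster of failures within the window does. The failing episodes in that window are then natural candidates for closer inspection and retraining data.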
Across deployments, there is a growing recognition that evaluation should be tied to operational behavior rather than static benchmarks. The industry is still early in defining consistent standards here, but the direction is becoming clearer.
A Shift Toward Integrated AI Systems
As AI systems move beyond static applications, training infrastructure is becoming more tightly integrated with deployment environments.
The boundary between training and inference is becoming less distinct. Data collection, model updates and system monitoring are part of a single continuous loop. This loop determines whether a system improves over time or gradually degrades as conditions change.
Building effective AI systems now requires designing for this loop explicitly. It is not enough to train a model once and deploy it. The system must be able to adapt, incorporate new information and maintain performance as the environment evolves.
What This Means For AI Leaders
For organizations building AI-driven products, the key challenge is not just model performance. It is how quickly and reliably the system can learn from real-world feedback.
Leaders should pay close attention to how data enters the system, how quickly models can be updated and how the system behaves when conditions deviate from expectations. The answers to these questions often determine whether an AI system can scale beyond initial deployments.
The Next Phase Of AI Development
The next phase of AI will be defined less by isolated model improvements and more by how systems are trained, updated and maintained in real environments.
Teams that understand this will build systems that improve continuously rather than degrade over time. The difference will not be visible in benchmark results alone. It will be visible in how systems perform after months of operation under real-world conditions.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
