AI in Stochastic Environments

Navigating under uncertainty: Adapting AI techniques to stochastic settings

By Pranesh Srinivasan

Abstract: Stochastic environments, with their unpredictable nature, might seem like a daunting problem for AI systems. However, with techniques like problem decomposition, overfitting prevention, and joint losses, these systems can navigate unpredictable settings with increasing adeptness. As the field continues to evolve, we can anticipate even more robust solutions to emerge.

What are stochastic environments?

At its core, a stochastic environment[1] is one in which outcomes cannot be predicted with complete certainty due to random factors. Stochastic settings can vary tremendously in complexity. At one end are simple Bernoulli processes like flipping a coin, where the underlying factors are independent and orthogonal and the process is stationary. As another example, consider masked language modeling, in which a model predicts a probability distribution over hidden tokens; it is used in training several language models, including BERT[2].

At the other end is predicting price movements in the stock market. Predicting stock prices or volumes is notoriously difficult because of the myriad variables at play, many of which can change suddenly and without warning. External factors, such as geopolitical events or natural disasters, can cause sudden changes in demand or supply and thus move prices in unpredictable ways. Other examples of processes that are difficult to model include weather forecasting and self-driving, both of which are complex and involve many factors that change over time.

Challenges for AI & ML techniques in Stochastic Settings

Beyond the inherent uncertainty, several properties of stochastic settings make applying AI & ML techniques challenging:

  • Non-Stationarity: The underlying distribution may not exhibit stationarity. As an example, the past (last week’s stock price in the stock market) does not always predict the future (tomorrow’s stock price)[3].
  • Explore-Exploit conundrum: In many environments, exploration may be required to sample new regimes. In stochastic environments, it’s challenging to determine whether a particular action led to a reward (or lack thereof) because of the inherent randomness. Further, some settings[4] allow only a limited number of actions, which forces a trade-off between exploring new possibilities and exploiting known good paths.
  • Evaluation Challenges:
    • Overfitting: There’s a risk that ML models overfit to random noise rather than consistently capturing signal. Overfitted models perform poorly on new, unseen data.
    • Noisy evaluation: In non-deterministic settings, a model’s measured performance is itself subject to randomness, making consistent evaluation and comparison harder.
  • Capturing State:
    • The state of a system may not be easily definable. The butterfly effect is an example of this in the context of weather: tiny differences in initial conditions can lead to very different outcomes.
    • Even when a well-defined hidden state exists, it can be far-reaching and difficult to represent. For example, in self-driving, cars on the other side of the road become relevant when there is construction.
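
The evaluation challenge above can be made concrete with a small simulation. The sketch below is illustrative only: the strategy, its 52% win probability, and the episode length are assumed values, not taken from any real system. It evaluates the same strategy with few and with many repeated runs and reports the mean along with its standard error.

```python
import random
import statistics

def run_episode(win_prob, n_steps, rng):
    """Total reward of a hypothetical strategy: +1 with probability win_prob, else -1."""
    return sum(1 if rng.random() < win_prob else -1 for _ in range(n_steps))

def evaluate(win_prob, n_runs, n_steps=100, seed=0):
    """Return (mean, standard error) of total reward over n_runs independent episodes."""
    rng = random.Random(seed)
    totals = [run_episode(win_prob, n_steps, rng) for _ in range(n_runs)]
    mean = statistics.mean(totals)
    std_err = statistics.stdev(totals) / n_runs ** 0.5
    return mean, std_err

# The same (assumed) strategy, evaluated with few and with many repeated runs.
mean_few, err_few = evaluate(0.52, n_runs=25, seed=1)
mean_many, err_many = evaluate(0.52, n_runs=400, seed=1)
print(f"25 runs:  {mean_few:.2f} +/- {err_few:.2f}")
print(f"400 runs: {mean_many:.2f} +/- {err_many:.2f}")
```

Under these assumed parameters the expected total reward is 4 per episode, while a single episode has a standard deviation of about 10, so one run says very little. Reporting a mean with its standard error over many seeded runs is one simple way to make comparisons between models more consistent.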

Techniques to alleviate challenges

Even with these challenges, there are several ways to apply Generative AI techniques in stochastic settings. An exhaustive list is impractical, so we list some relevant techniques below:

Problem Decomposition: Perhaps the most important strategy is to decompose the problem. In many cases, the task can be broken down into a stationary (or less stochastic) part and a highly stochastic part. For the stationary sub-problem, existing techniques, including large-data techniques like LLMs, can work well. The stochastic part thus reduces to a smaller sub-problem that can be solved with simpler, explainable models.

As a concrete example, consider predicting the trajectory of a robot in a warehouse in the presence of other agents. The stationary parts include perception and object detection, which allows the stochastic models to focus on path planning. Path planning is a more approachable (albeit still complex) problem when the state of other agents can be represented in a fixed, lower-dimensional form such as grid coordinates and current velocities.
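
A minimal sketch of the stochastic half of this decomposition follows. It assumes an upstream perception stage has already produced each agent's position and velocity. The `AgentState` type, the constant-velocity model with Gaussian noise, and all numeric values are hypothetical choices made for illustration, not a prescribed planner.

```python
import random
from dataclasses import dataclass

@dataclass
class AgentState:
    """Fixed low-dimensional state assumed to come from the perception stage."""
    x: float
    y: float
    vx: float
    vy: float

def sample_future_position(agent, horizon, noise_std, rng):
    """Roll a constant-velocity model forward, adding Gaussian noise at each step."""
    x, y = agent.x, agent.y
    for _ in range(horizon):
        x += agent.vx + rng.gauss(0.0, noise_std)
        y += agent.vy + rng.gauss(0.0, noise_std)
    return x, y

def occupancy_probability(agent, cell, horizon, noise_std=0.2, n_samples=2000, seed=0):
    """Monte Carlo estimate of the chance the agent ends within 0.5 of the cell centre."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = sample_future_position(agent, horizon, noise_std, rng)
        if abs(x - cell[0]) <= 0.5 and abs(y - cell[1]) <= 0.5:
            hits += 1
    return hits / n_samples

# A hypothetical agent moving one cell per step along the x axis.
forklift = AgentState(x=0.0, y=0.0, vx=1.0, vy=0.0)
p_on_path = occupancy_probability(forklift, cell=(3.0, 0.0), horizon=3)
p_off_path = occupancy_probability(forklift, cell=(0.0, 3.0), horizon=3)
```

A path planner could then avoid cells whose estimated occupancy probability exceeds some threshold. The point of the sketch is that the stochastic model stays small and explainable once the heavy perception work is handled separately.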

Learning with limited data: Performance in limited-data regimes can be improved by reducing overfitting, augmenting data, and predicting a range of outcomes:

  1. Preventing overfitting: Regularization is key to preventing overfitting. Techniques such as dropout and early stopping can help.
    1. A common form of early stopping is limited fine-tuning: leveraging models pretrained on vast datasets and then fine-tuning them briefly for specific tasks in stochastic environments.
    2. An extreme case of this is few-shot and zero-shot learning. For situations where data is sparse, training models to make predictions based on very few examples (or even none, in the case of zero-shot) can be beneficial.
    3. Joint-loss objectives can help by training the generative model to minimize a combined loss between its output and the data across different timescales or regimes.
  2. Data Augmentation: Generating synthetic data by introducing variations can help models learn more robust decision boundaries. Techniques like SMOTE[5] can also be used to create a more balanced dataset, which helps models that are sensitive to class imbalance.
  3. Ensemble learning: Going one step further, models can also be used to predict a range of outcomes. By using multiple models (or multiple runs of a model) and aggregating their outputs, one can reduce the variance in predictions, making the model more robust to the randomness of stochastic environments.
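
The variance reduction from ensembling can be sketched in a few lines. Here `noisy_model` is a hypothetical stand-in for one trained model whose error depends on its random seed, and the errors of different ensemble members are assumed to be independent, which real models only approximate.

```python
import random
import statistics

def noisy_model(x, rng):
    """Hypothetical stand-in for one trained model: signal 2*x plus seed-dependent error."""
    return 2.0 * x + rng.gauss(0.0, 1.0)

def ensemble_predict(x, n_models, rng):
    """Aggregate n_models predictions into a mean and a (min, max) range of outcomes."""
    preds = [noisy_model(x, rng) for _ in range(n_models)]
    return statistics.mean(preds), (min(preds), max(preds))

rng = random.Random(0)

# Spread of a single model's prediction versus a 10-member ensemble mean.
single = [noisy_model(3.0, rng) for _ in range(500)]
ensembled = [ensemble_predict(3.0, 10, rng)[0] for _ in range(500)]
single_spread = statistics.stdev(single)
ensemble_spread = statistics.stdev(ensembled)

# One ensemble call also yields a range, not just a point estimate.
point, (low, high) = ensemble_predict(3.0, 10, rng)
```

With independent errors, the spread of the ensemble mean shrinks roughly as one over the square root of the number of members. Correlated members give a smaller benefit, which is one reason to vary seeds, data subsets, or architectures across the ensemble.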

Adapting to changing environments: Adaptability is key to deploying AI models in stochastic environments.

  1. Online Learning: In non-stationary stochastic environments, online learning techniques that continuously update the model as new data comes in can be particularly valuable. These include RL-based techniques[4] like Q-Learning, in which the agent iteratively learns optimal strategies through trial and error. Closely related are meta-learning techniques like REPTILE[6] that allow the model to adapt quickly to new tasks from a similar distribution.
  2. Interpretable models: Confidence in decision making can come from interpretable models[7]. Especially in critical stochastic settings, models that provide interpretable predictions or reasons help build trust and guide further refinement.
  3. Using simpler models for time-varying distributions: Time-varying distributions can be very complex, and it can be difficult for a generative model to learn them accurately. Using simpler models can help reduce overfitting and improve accuracy.
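
As an illustration of the trial-and-error learning mentioned under online learning, here is a minimal tabular Q-Learning sketch. The environment is a made-up five-cell corridor in which the chosen action is reversed 10% of the time; the hyperparameters are assumed values picked for the example, not tuned recommendations.

```python
import random

N_STATES, GOAL = 5, 4          # corridor cells 0..4, reward on reaching cell 4
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)

def step(state, action, rng, slip=0.1):
    """Stochastic transition: with probability `slip` the chosen action is reversed."""
    if rng.random() < slip:
        action = 1 - action
    nxt = min(state + 1, GOAL) if action == RIGHT else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-Learning with a constant learning rate."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:                                     # exploit, breaking ties at random
                action = max(ACTIONS, key=lambda a: (q[state][a], rng.random()))
            nxt, reward, done = step(state, action, rng)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

Despite the random slips, the learned greedy policy should prefer moving right in every non-terminal cell. The constant learning rate `alpha` keeps weighting recent experience, which is one simple way to let the estimates keep adapting if the environment drifts.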

Leveraging Retrieval as hidden state: One approach to navigating such environments is to leverage retrieval mechanisms over large databases. For instance, in finance, databases containing recent news, historical stock prices, trading volumes, and related economic indicators can be invaluable. Retrieval augmentation helps the model understand the context of the current input and make a more accurate decision. Retrieval also helps shrink model size (example: RETRO[8]), which in turn may help avoid overfitting.
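
The retrieval idea can be sketched as a nearest-neighbour lookup whose results are passed to the model as extra context. Everything in the example is a toy assumption: the two features, the four records, and the use of cosine similarity are illustrative and not drawn from any real dataset or from RETRO itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query, database, k=2):
    """Return the k stored records whose feature vectors are most similar to the query."""
    ranked = sorted(database, key=lambda rec: cosine(query, rec["features"]), reverse=True)
    return ranked[:k]

# Made-up toy records: [news sentiment, volume change] -> next-day return.
database = [
    {"features": [0.9, 0.8], "next_return": 0.02},
    {"features": [0.8, 0.9], "next_return": 0.01},
    {"features": [-0.7, 0.6], "next_return": -0.03},
    {"features": [-0.9, -0.2], "next_return": -0.01},
]

# Retrieve the most similar past situations and summarise them as an extra input signal.
neighbours = retrieve([1.0, 0.7], database, k=2)
context_signal = sum(rec["next_return"] for rec in neighbours) / len(neighbours)
```

In this sketch the retrieved records act as an external, updatable form of state: the database can be refreshed with recent data without retraining the model that consumes `context_signal`.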


In conclusion, while challenges abound in applying AI techniques in stochastic environments, the toolkit to combat these challenges is also vast and growing. Avoiding overfitting through problem decomposition and regularization remains a key tool in the arsenal of approaches.


[1]: Durrett, R. (2011). Essentials of Stochastic Processes

[2]: Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[3]: Fama, E. (1965). The Behavior of Stock-Market Prices

[4]: Sutton, R. and Barto, A. (2018). Reinforcement Learning: An Introduction (second edition)

[5]: Chawla, N. V., et al. (2002). SMOTE: Synthetic minority over-sampling technique

[6]: Nichol, A., et al. (2018). REPTILE: A scalable meta-learning algorithm

[7]: Zhuang, H., et al. (2020). Interpretable Learning-to-Rank with Generalized Additive Models

[8]: Borgeaud, S., et al. (2022). Improving language models by retrieving from trillions of tokens


Pranesh Srinivasan is a Senior Staff Software Engineer working at Google. He works on large scale NLP modeling and infrastructure challenges in Featured Snippets and Search Quality. Previously, he was a Quant at Goldman Sachs on Program Trading where he worked on models for pricing portfolios under uncertainty and adversarial risk. Pranesh holds a Bachelor’s and Master’s in Computer Science from IIT Madras.
