8.62M

opeai presentation

1.

The Birth of ChatGPT and DOTA2
A journey to AI mastery.
Caterpillar: Confidential Green
Caterpillar: Confidential

The Significance of Dota 2
Why Dota 2?
Complexity and Challenge
Dota 2 isn't just any game. It's one of the most
complex and popular esports titles globally."
With millions of active players and professional
tournaments with prize pools exceeding $35 million,
it's a rich environment for AI research.
The game involves real-time strategy, long time
horizons, imperfect information, and a vast action
space.
These factors make Dota 2 an ideal tested for
developing and testing advanced AI systems.
Caterpillar: Confidential Green
Number of heroes: 125, each with 4-6 unique skills.

3.

4.

OpenAI's Ambition
OpenAI's Vision with Dota 2
Initial Goals
OpenAI aimed to push the boundaries of AI by
mastering a game that mirrors the complexity
of the real world.
Their goal was not just to play Dota 2 but to
achieve superhuman performance.
By choosing Dota 2, OpenAI could address challenges
like long-term planning and decision-making under
uncertainty, which are crucial in AI development.
150 Purchasable items and 58 Neutral items.
Caterpillar: Confidential Green

5.

The Journey Begins
From Research Group to AI Contender
Humble Beginnings
OpenAI started as a research group focused on opensource machine learning algorithms.
"They began with a small team of nine members, all
distinguished in the AI community.
Collaboration and Support
Early support came from industry leaders like NVIDIA
and Elon Musk, providing resources and funding.
This collaboration was crucial in scaling their
computational capabilities.
Caterpillar: Confidential Green

6.

The First Showcase
The International 2018: Making a Statement
Historic Match
In 2018, at The International—the biggest Dota 2
tournament—OpenAI introduced their AI to the world.
The AI, known as OpenAI Five, played against
professional players and stunned the audience.
Unexpected Victory
Despite skepticism, OpenAI Five defeated top players,
showcasing the potential of AI in complex tasks.
This was a pivotal moment that caught the attention of
investors and the tech community.
Caterpillar: Confidential Green

7.

Scaling Up
From Demonstration to Domination
Increased Investment
The success led to significant investments, including
$1 billion from Microsoft in 2019.
This funding allowed OpenAI to scale their operations
and computational resources.
Continuous Improvement
They continued to refine their AI, eventually defeating
the world champions, Team OG, in 2019.
This achievement marked the first time an AI system
beat human champions in an esports game.
Caterpillar: Confidential Green

8.

Technical Challenges
Overcoming the Technical Hurdles
Complex Environment
"Training an AI for Dota 2 involved dealing with a
massive state and action space.
The AI had to make decisions every 0.133 seconds,
considering thousands of possible actions.
Long Time Horizons
Games could last up to 45 minutes, equating to over
20,000 decision-making steps.
This required the AI to plan and strategize over
extended periods.
At each time step one of our heroes observes ∼ 16, 000 inputs about the game state
Caterpillar: Confidential Green

9.

Technical Challenges
Overcoming the Technical Hurdles part 2
Subtask decomposition: Instead of creating one large model that relates x, y,
z, OpenAI created separate simpler models relating x to y, y to z, x to z, …. and
so on to decompose the complexity of the problem.
Caterpillar: Confidential Green
Timescales and Staleness:
The breakdown of a rollout game.
Rather than collect an entire game
before sending it to the optimizers,
rollout machines send data in shorter segments.

10.

Reinforcement Learning
Understanding RL in Dota 2
Application in Dota 2
At the heart of OpenAI's success is reinforcement
learning—a trial-and-error method where agents learn
by interacting with the environment.
Agents receive rewards or penalties based on their
actions, guiding them toward better strategies.
RL allowed OpenAI Five to learn from millions of
games, improving its performance over time.
Shaped Reward Weights
Caterpillar: Confidential Green

11.

Proximal Policy Optimization (PPO)
PPO: Revolutionizing RL Training
Understanding PPO
Technical Insights
PPO is a policy gradient method for reinforcement learning
that balances ease of implementation with sample efficiency
and robustness.
PPO uses first-order optimization, making it computationally
efficient and scalable.
It simplifies the complexity of previous algorithms while
maintaining strong empirical performance.
Introduces a new clipped surrogate objective function to
constrain policy updates.
Caterpillar: Confidential Green
The algorithm relies on a clipped probability ratio to prevent
destructive updates.
It addresses the trade-off between exploration and exploitation
more effectively than previous methods.

12.

Advantages of PPO over TRPO in OpenAI's Training
Why PPO Outperformed TRPO in OpenAI's Dota 2 AI
Computational Efficiency
Scalability and Flexibility
TRPO requires solving a constrained optimization problem
involving second-order derivatives.
PPO scales better with massive parallel processing, crucial for
OpenAI's large-scale training.
PPO avoids this by using a simple clipping mechanism,
enabling faster computations.
Handles large batches and high-dimensional action spaces
more effectively.
Implementation Simplicity
Empirical Performance
PPO is easier to implement and tune compared to TRPO.
PPO demonstrated robust performance across various
challenging tasks, including Dota 2.
Reduces the need for complex conjugate gradient methods
and Hessian-vector products.
Caterpillar: Confidential Green
OpenAI's experiments showed that PPO could achieve
superhuman performance without the overhead of TRPO.

13.

Scaling Reinforcement Learning
Massive Scale Training
Computational Resources
OpenAI Five learned from batches of approximately 2
million frames every two seconds.
They utilized thousands of GPUs over several months,
a testament to the scale required.
Distributed Training System
They developed Rapid, a distributed training platform,
to handle this immense computational load.
This system coordinated the efforts of numerous
CPUs and GPUs to train the AI effectively.
Caterpillar: Confidential Green

14.

Hierarchical Reinforcement Learning
Breaking Down Complexity: Hierarchical RL
Concept Overview
Hierarchical RL involves decomposing complex tasks
into simpler subtasks.
This approach makes it easier for the AI to learn and
make decisions.
Application in Dota 2
For example, deciding whether to purchase a specific
item depends on multiple factors.
By breaking down the decision into smaller parts, the
AI can evaluate each factor individually before making
a choice.
Caterpillar: Confidential Green
Real time decision making process: Algorithm considers each factor one by
one and reaches a final verdict, being that of “BUY BKB”. Each factor will
“push” or cause the algorithm to one of the two decisions by a certain amount.
This “one-by-one” approach is because of the decomposition of subtasks as
explained in

15.

The Reward System
Shaping Behavior Through Rewards
Customized Rewards
OpenAI designed a reward function that went beyond
just winning or losing.
It included signals like character deaths, resource
collection, and team performance.
Zero-Sum Symmetry
They symmetrized rewards by subtracting the
opponent's rewards, aligning the AI's objectives with
competitive play.
This encouraged strategies that not only advanced its
position but also hindered the opponent.
Caterpillar: Confidential Green

16.

Continuous Learning and Surgery
Adapting and Evolving: The Surgery Technique
Need for Adaptation
Throughout training, changes in game updates and AI
architecture required OpenAI to adapt without losing
progress.
Restarting training from scratch each time would have
been impractical.
Surgery Method
They developed 'surgery,' a method to transfer
knowledge from one model to another despite
changes.
This allowed continuous learning and saved valuable
time and resources.
Caterpillar: Confidential Green

17.

Results and Impact
Achievements and Recognition
Historic Victory
OpenAI Five's victory over Team OG was a landmark
achievement in AI.
It demonstrated the potential of AI in mastering
complex, real-world tasks.
This engagement provided valuable data and
showcased the AI's robustness.
Investment and Growth
The project's success attracted significant investment,
fueling further research and development.
It positioned OpenAI as a leader in the AI industry.
Caterpillar: Confidential Green

18.

Thank You!
Questions & Discussion
References
Dota 2 with Large Scale Deep Reinforcement
Learning by OpenAI, Christopher Berner, Greg
Brockman
OpenAI Five: https://openai.com/index/openaifive/
Caterpillar: Confidential Green

English Русский Rules