What is Reinforcement Learning in AI?
This is the ultimate introduction to reinforcement learning (RL) in artificial intelligence (AI).
RL can find solutions that humans have never tried before. It is inspired by how living creatures, including us, learn to interact with the world.
In this blog post, we will define reinforcement learning and explain how it works, why it is important, and how it compares to other learning methods. We will also explain how reinforcement learning can be used in business today and provide examples of how it benefits the retail, insurance, and health care industries. Let's get started:
What is Reinforcement Learning?
Reinforcement learning is a branch of AI that learns, either through simulation or in real time, how to make decisions that result in a desired outcome. It is the brains of self-learning autonomous systems.
Reinforcement Learning Example
An autonomous racecar is a great example to explain reinforcement learning in action.
The racecar is concerned with trying to find the fastest lap around the track. The software applies reinforcement learning to find the optimal sequence of steering, brake, and gas that gets the fastest lap time.
You cannot learn to drive a car using predictive modeling or supervised learning. You cannot learn from the historical data as it makes no sense to categorize each millisecond (or other time slice) as good or bad. The brains of the vehicle are trained in a computer simulation of the world using the laws of physics as it is not practical to train a car in the real world (for obvious reasons).
The historical data is used to configure the simulation. The brains of the autonomous car make decisions every millisecond (or other time slice) where the decision consists of a steering, braking or acceleration input. The reward or objective for the autonomous system is the lap time, and it is delayed in time from the decisions.
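The decision loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the software of a real racecar: the straight-line track, the three actions, the acceleration values, and the names `run_lap`, `always_gas`, and `make_random_policy` are all assumptions made for the example. What it shows is that the policy is consulted at every time slice, while the reward (the negative lap time) only becomes available once the lap is over.

```python
import random

# Hypothetical, simplified controls; a real car also has steering.
ACTIONS = ("brake", "coast", "gas")


def run_lap(policy, track_length=100.0, dt=0.1, max_steps=100_000):
    """Simulate one lap on a toy straight track.

    Returns (lap_time, reward). If the car has not finished after
    max_steps decisions, the elapsed time so far is returned.
    """
    position, speed, elapsed = 0.0, 0.0, 0.0
    for _ in range(max_steps):
        action = policy(position, speed)        # one decision per time slice
        if action == "gas":
            speed += 2.0 * dt                   # assumed acceleration
        elif action == "brake":
            speed = max(0.0, speed - 4.0 * dt)  # assumed braking
        position += speed * dt
        elapsed += dt
        if position >= track_length:
            break
    reward = -elapsed  # delayed reward: known only when the lap ends
    return elapsed, reward


def always_gas(position, speed):
    """A fixed policy that floors the gas pedal at every time slice."""
    return "gas"


def make_random_policy(seed=0):
    """A policy that picks a random action at every time slice."""
    rng = random.Random(seed)
    return lambda position, speed: rng.choice(ACTIONS)
```

Calling `run_lap(always_gas)` and `run_lap(make_random_policy())` gives two lap times to compare. On this corner-free toy track, flooring the gas is unsurprisingly the faster of the two; a learning algorithm has to discover that from the lap times alone.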
Why is Reinforcement Learning Important?
Reinforcement learning is different from supervised learning because it can learn faster than the pace of time when used in simulation mode (i.e., we can run 100 million hours of simulation time in an hour of real time if we have powerful enough computers). Furthermore, it can learn to make decisions that have never been thought of or tried before by humans because it does not use historical decisions to learn from. It can also learn in real time by interacting with the world, even where the system dynamics (akin to the laws of physics) are unknown.
Reinforcement learning can be applied to real world problems by helping humans make decisions. The AI outputs recommendations to the human, and the human decides how to implement those recommendations within their business.
This can give businesses a significant competitive advantage because the AI technology delivers decisions that are beyond human capability, either because of the complexity behind making those decisions or because the AI system, by simulating more decisions than a human could make in a lifetime, has more experience than humans will ever have.
How Does Reinforcement Learning Work?
To keep things simple, let us use the autonomous racecar example again:
If you know the dynamics of your system (in the case of a racecar, the laws of physics), then you do not have to spend time learning the dynamics. Instead, the racecar software focuses on learning how a sequence of steering wheel, gas pedal, and brake pedal positions results in an optimal lap.
There are two general classes of reinforcement learning algorithms that allow a system to learn which decisions result in an optimal reward:
Reinforcement Learning Can Learn in Real Time
A real car can be put on the road, and it could learn to drive in real time through random trial and error. In this case, the car could figure it out through random guessing of what is the best way to steer and accelerate – but it would crash thousands (probably millions) of times before learning to drive, which would be a concern for everyone’s safety and require a large number of cars.
The learning is part random: the car’s software would try new things it has never done before, and also use what it already knows. That is where the idea of reinforcement comes from. You reinforce your learning through practice, and recall it in new situations for better outcomes.
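This balance between trying new things and using what is already known is commonly called the exploration-exploitation trade-off. One simple and widely used way to strike it is an epsilon-greedy rule: with a small probability the software tries a random action, and otherwise it picks the action that has worked best so far. The sketch below is only an illustration under stated assumptions: the three candidate "driving lines" and their average rewards are made-up numbers, and `epsilon_greedy` and its parameters are names chosen for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best by mixing exploration and exploitation.

    true_means holds the (hidden) average reward of each action; the
    learner only ever sees noisy samples of them.
    Returns (estimates, counts) per action.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try something at random
        else:
            # exploit: use the action with the best estimate so far
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 1.0)  # noisy feedback
        counts[action] += 1
        # Running average: each new experience reinforces the estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Three hypothetical driving lines; the third is the best on average.
estimates, counts = epsilon_greedy([1.0, 2.0, 4.0])
```

After enough trials, the best action should be chosen far more often than the others, while the occasional random choice keeps the remaining estimates from going stale.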
Reinforcement Learning Can Learn by Simulating
The car could instead learn in a simulation that uses a computer model of the system. The simulation requires a model of the world, which the car uses to simulate outcomes and learn the sequence of decisions that results in an optimal outcome.
Simulation is possible when the dynamics of the world are known, as they are with the laws of physics, so that driving the car can be simulated. One of the biggest benefits simulation offers is that you do not have to break millions of cars or put people at risk to obtain optimal performance (the reward).
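When the model is fully known, the best sequence of decisions can be worked out without a single real-world trial. A minimal sketch of that idea follows, using value iteration, a standard dynamic-programming technique, as a stand-in for whatever algorithm a production system would use. Everything about the toy track is an assumption for illustration: ten cells, two actions ("slow" advances one cell, "fast" advances two), and a time penalty for going fast in a corner. The name `plan_lap` is hypothetical.

```python
def plan_lap(n_cells=10, corners=(3, 6), spin_penalty=3.0, sweeps=50):
    """Value iteration on a toy track with a fully known model.

    Returns (time_to_go, policy): the best achievable time from each
    cell, and the action that achieves it for every cell before the finish.
    """
    finish = n_cells - 1
    moves = {"slow": 1, "fast": 2}  # cells advanced per decision

    def model(cell, action):
        """Known dynamics: (next_cell, time_cost) of one decision."""
        cost = 1.0
        if action == "fast" and cell in corners:
            cost += spin_penalty  # going fast in a corner spins the car
        return min(finish, cell + moves[action]), cost

    def total_time(cell, action, time_to_go):
        next_cell, cost = model(cell, action)
        return cost + time_to_go[next_cell]

    # Repeatedly simulate every decision in every cell until the
    # estimated time-to-finish stops changing.
    time_to_go = [0.0] * n_cells
    for _ in range(sweeps):
        for cell in range(finish):
            time_to_go[cell] = min(
                total_time(cell, a, time_to_go) for a in moves
            )

    # The learned policy picks the fastest simulated option in each cell.
    policy = [
        min(moves, key=lambda a: total_time(cell, a, time_to_go))
        for cell in range(finish)
    ]
    return time_to_go, policy


time_to_go, policy = plan_lap()
```

With these made-up settings, the plan slows down in the two corner cells and goes fast where it is safe, finishing in 5 time units from the start. No car was crashed to find that out, which is the point of learning in simulation.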
Why is Reinforcement Learning Better than Predictive Analytics?
Reinforcement learning is better than predictive analytics because, when run in simulation, it can learn faster than the pace of time. It allows you to simulate future decisions rather than only replicate historical ones. As a result, you can do things you have never done before.
For statistical models like predictive analytics, learning happens at the pace of time because the model finds patterns in known historical data. Creating new patterns requires new real-world experiments, which also happen at the pace of time. Humans must build the models, summarize the insights, and decide what to do with the learned patterns. Decision making is not embedded in the math of predictive modeling.
A predictive model is only able to replicate what has been done before and cannot learn new patterns without examples. You would be hard pressed to beat a human because you would only be modeling human patterns.
Reinforcement learning, by contrast, can simulate, which allows it to find completely new alternatives that humans would never have thought of.
Reinforcement learning combined with simulation can evaluate and assess more decisions than humanity could in all human lifetimes combined. That is the most significant difference between the two.
3 Ways Reinforcement Learning is Used in Business
Now that we understand what reinforcement learning is and how it works, let us look at three different ways it can be used in the retail, insurance, and health care industries:
Reinforcement Learning in Retail
In retail, reinforcement learning works much as it does in the case of the autonomous racecar. Driving a lap around a racetrack is like a year of business. In a year of retail, the goal is to maximize measures such as sales, margins, and transactions.
When you drive a car, you make a sequence of steering, brake, and gas pedal decisions to get the fastest lap. To achieve optimal retail performance, category managers make decisions around:
- What are the weekly products to be promoted by promotional channel?
- What are the weekly products that should not be promoted?
- What are the weekly prices I should charge for both promoted and non-promoted products?
- What are the weekly inventory allocations? How much product should I put on the shelf in each store?
There are of course other decisions retailers make, but these are the core decisions which impact performance on a weekly basis. The reinforcement learning problem is to find the optimal sequence of those decisions.
Daisy has its Theory of Retail™, which is like the laws of physics. Daisy does model-based reinforcement learning (i.e., we understand the system dynamics), and our software system can simulate the impact of millions of the aforementioned decision alternatives to get optimal or improved outcomes for retailers.
Reinforcement Learning in Insurance
In insurance, reinforcement learning is used to understand how to ‘drive’ an insurance company. Underwriting (pricing), adjudication (claims processing), and fraud detection decisions are the key inputs that drive an insurance company’s performance. These include:
- Underwriting is the process of pricing a policy or coverage based on the assessment of risk.
- Adjudication determines whether claims are eligible for coverage and which parts of the claims should be paid up to potential coverage limits.
- Fraud detection, when executed as part of the adjudication process, determines whether any parts of the claim are potentially fraudulent and hence should not be paid, or should at least be reviewed prior to payment.
Those are the core decisions that insurance operations staff (underwriters, adjustors and special investigators) make that drive the profitability of an insurance company.
In insurance, Daisy has its Theory of Risk™, which is like the laws of physics. It is model-based reinforcement learning for the insurance industry.
Reinforcement Learning in Health Care
In health care, reinforcement learning can be used to improve people’s wellbeing by diagnosing a patient’s condition so the doctor can recommend a treatment.
For example, Daisy’s client BrainFx is a neuroscience company that supports clinicians with a software-based 360-degree assessment of patients with mild or moderate brain disease or injury. The assessment consists of sixty tests on a tablet app that test people’s memory, math skills, hearing, balance, dexterity, problem solving, and vocabulary.
Daisy’s AI technology collects the raw data from a patient’s clinical test and produces a medical assessment identifying where the patient’s deficits or surpluses are compared to their own baseline or to peer groups of similar individuals. Review of the assessment results by a clinician can aid in determining the best course of treatment for the patient and gauging the progress of the treatments over time.
Daisy has its Theory of Medicine™, which is like the laws of physics. It is model-based reinforcement learning that helps medical professionals improve patient outcomes by delivering a sequence of assessments over time.
These are just some examples of how reinforcement learning can be used to optimize business performance. With Daisy’s AI technology, companies can achieve more efficient processes and significantly improve financial outcomes. By freeing human operators from daily decision making, it also allows those operators to focus on strategy and innovation.
Using reinforcement learning, along with our patent-pending Theory of Retail™ and Theory of Risk™, we are on the cutting edge of AI technology and revolutionizing retail category management and insurance risk management.
Our team of computational scientists, mathematicians, and experts executes billions of simulations that show how companies can better grow their business by increasing profits, improving customer satisfaction, and driving operational efficiencies.
Contact us today to learn how Daisy’s AI can help your business.