Machine learning in the Oil and Gas industry – 2054 years of professional training
Category: Artificial Intelligence Services

Tooploox delivered a reinforcement learning-based model that optimizes and automates a Crude-oil Distillation Unit’s workflow, significantly boosting the profitability and efficiency of the whole installation.

Applying Artificial Intelligence (AI) in the oil and gas industry seems intuitive. Companies collect tremendous amounts of data from IoT-powered infrastructure in a never-ending race toward constant optimization. This was the basis for the machine learning solution in the oil and gas sector delivered jointly by Tooploox and the client.
The client
Our client is one of the world’s leading industry software providers. Founded in the UK as a research facility, the company is one of the pioneers of Computer-Aided Design.
Later, the company was privatized and further developed its portfolio of computer design-centered software by acquiring other industry-specific companies. With strong competence in heavy industry-centric applications, the company has multiple partners in the oil and gas sector.
The challenge
The modern oil and gas industry heavily benefits from automation and the digitalization of its workflows. In this particular case, the Crude-oil Distillation Units (CDUs) were of concern, or rather the charging tanks that feed them with material during the production process. Currently, the full process of filling the tanks from the tanker, feeding the CDU with raw material, and producing fuel from it is done manually.
The client and Tooploox team were in charge of designing software to support the automation of this process, thus enhancing profitability and reducing the pressure put on the operators.
Understanding the workflow
The CDU itself is a highly complex industrial installation, though most of that complexity is irrelevant to this use case. Yet it is crucial to understand a few facts regarding the challenge:
- The CDU takes crude oil and other ingredients of the produced fuel from eight charging tanks. The tanks enable the operator to ensure that the CDU is constantly fed with the proper ingredients to avoid any interruptions in the production process. Tanks are filled from tankers at the berth, so there are multiple (more or less random) events that need to be taken into consideration when managing the oil flow from the tanks to the CDU.
- The oil from the tankers needs to be discharged as fast as possible, so as not to hold up the queue and to ensure a smooth transition to the next ship.
- A situation in which the CDU gets no oil or production stops must be avoided – an interruption not only carries a tremendous cost but can also damage the installation.
- Depending on the type of oil and ingredients, the refinery produces various types of petrol of varying value, so the management of the oil tanks has a direct impact on the whole installation’s business performance.
Currently, all the management work is done manually by the operator. The role is stressful and the specialist works under high pressure, being responsible for complex machinery and a workflow worth millions of dollars.
Aiding this process with machine learning was the point where the client and Tooploox decided to join forces to deliver a practical artificial intelligence application for the oil and gas industry.
The challenge itself
From Tooploox’s point of view, the workflow described above was a complex optimization task. The main issue was making optimal scheduling decisions according to the given objective functions. Moreover, the following factors made the optimization particularly challenging (a simplified formulation follows the list):
- Large space of decision variants.
- Large space of constraints.
- Uncertainty of some events (like ship delays).
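
In abstract terms, this is a stochastic scheduling problem. A simplified formulation – our own notation, not the client’s specification – might read:

$$
\max_{\pi}\ \mathbb{E}\left[\sum_{t=0}^{T} r\big(s_t, \pi(s_t)\big)\right]
\quad \text{s.t.} \quad 0 \le \ell_i(t) \le L_i \ \ \forall i, t,
\qquad f_{\mathrm{CDU}}(t) > 0 \ \ \forall t,
$$

where $\pi$ is the scheduling policy, $r$ the income earned from the products made at time $t$, $\ell_i(t)$ the level of charging tank $i$ with capacity $L_i$, $f_{\mathrm{CDU}}(t)$ the feed rate into the CDU, and the expectation runs over random events such as vessel delays.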

Our solution
Considering multiple aspects of the challenge, our engineering team decided to deliver two policies: a static one, based on fixed rules, and a dynamic one, leveraging the power of reinforcement learning. From the operational point of view, we considered two agents:
- The Berth agent, which oversees the flow of oil from the tankers to the oil tanks.
- The CDU agent, which is responsible for managing the flow of oil from the tanks to the CDU.
The responsibility of these two agents is shown in the image below:


After dividing the workflow between the agents, our team started to build a testing environment.
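
To make this division concrete, below is a minimal sketch of how the two-agent split might look in code. All class and field names are our own illustration, not the production system’s API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class RefineryState:
    """Snapshot of the simulated facility (illustrative fields only)."""
    tank_levels: list[float]   # current fill level of each of the eight tanks
    vessel_at_berth: bool      # is a tanker currently waiting to discharge?
    cdu_feed_rate: float       # current flow from the tanks into the CDU

class Agent(ABC):
    """Common interface shared by both schedulers."""
    @abstractmethod
    def act(self, state: RefineryState) -> dict:
        """Return the valve/flow decisions within this agent's scope."""

class BerthAgent(Agent):
    """Decides which tanks receive the oil discharged from the tanker."""
    def act(self, state: RefineryState) -> dict:
        ...

class CDUAgent(Agent):
    """Decides which tanks feed the CDU, and at what rate."""
    def act(self, state: RefineryState) -> dict:
        ...
```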
The refinery simulator
The first step of the delivery stage was designing a simulated environment of the refinery. Considering the cost of a malfunction and the overall scale of the operations, one would be insane to test the technology on-site. The cost would be tremendous even if only a single installation were used in the testing process, without accounting for the eventual damage done by suboptimal policies. On the other hand, a test environment was needed to validate and train the agents that would be in charge of managing the facility.
That’s why the Tooploox and client engineering teams delivered a refinery simulator – a sandbox environment enabling the team to test multiple policies and ideas.
The core of the simulated environment is a representation of the facility’s current state. It includes the pipe topology, all the tanks, the CDUs, the valves to operate, and information about the oil flow.
The simulator responded in real time to the actions of the agent, making it a fully functional testing environment for any given policy – manual, automated, or fully ML-based. The environment also allowed the agents to be trained in an accelerated time frame using parallelization, so after a few hours of running the simulated environment, an ML-based agent could gather several hundred years’ worth of training.
Last but not least, the simulated environment included information about price differences between multiple types of products and the demand for them, so the agents were able to work on income maximization.
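
What might such a simulator’s interface look like? Below is a heavily simplified, gym-style sketch; the class, the reward constants, and the flow numbers are all invented for illustration:

```python
class RefinerySimulator:
    """Sandbox model of the facility: tanks, berth discharge, and CDU feed.

    Illustrative sketch only; the real simulator models pipe topology,
    product prices, and demand in far more detail.
    """

    def __init__(self, n_tanks: int = 8, tank_capacity: float = 100_000.0):
        self.capacity = tank_capacity
        self.levels = [tank_capacity / 2] * n_tanks  # start half full
        self.time = 0

    def reset(self) -> list[float]:
        self.time = 0
        self.levels = [self.capacity / 2] * len(self.levels)
        return list(self.levels)

    def step(self, action: dict) -> tuple[list[float], float, bool]:
        """Advance one simulated time step.

        `action` holds the valve decisions of both agents, e.g.
        {"discharge_to": 3, "feed_from": [0, 5]}.
        Returns (next_state, reward, done).
        """
        # Tanker discharge into the chosen tank.
        tank = action["discharge_to"]
        self.levels[tank] = min(self.capacity, self.levels[tank] + 1_000)

        # Feed the CDU from the chosen tanks.
        feed = 0.0
        for i in action["feed_from"]:
            drawn = min(self.levels[i], 500)
            self.levels[i] -= drawn
            feed += drawn

        # Reward: income from products, with a heavy penalty if the CDU starves.
        reward = feed * 0.1 if feed > 0 else -10_000.0

        self.time += 1
        done = self.time >= 10_000 or feed == 0.0  # starvation ends the episode
        return list(self.levels), reward, done
```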
The simulator was the first step toward testing and delivering agents to control the facility. Our team prepared two of them.
Static policy agent
In our first approach, we considered only static (fixed) policies for both the Berth and CDU agents. We analyzed the following human-developed policies (sketched in code further below):
- BerthAgent – chooses the emptiest tank to discharge the material into.
- CDUAgent – chooses the tanks (both Low Sulphur and High Sulphur) holding the most material for feeding, and works until the demand is satisfied.
Such a policy is unable to satisfy the maximal target demand (i.e., working at the maximum flow rate) for the CDU units. It is worth mentioning that such policies may not yield a feasible solution at all.
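
A minimal sketch of these fixed rules, reusing the illustrative `Agent` interface from earlier (again, our own names rather than the production code):

```python
class StaticBerthAgent(Agent):
    """Fixed rule: always discharge into the emptiest tank."""
    def act(self, state: RefineryState) -> dict:
        emptiest = min(range(len(state.tank_levels)),
                       key=lambda i: state.tank_levels[i])
        return {"discharge_to": emptiest}

class StaticCDUAgent(Agent):
    """Fixed rule: feed the CDU from the fullest LS and HS tanks."""
    LS_TANKS = [0, 1, 2, 3]   # illustrative split of the eight tanks
    HS_TANKS = [4, 5, 6, 7]

    def act(self, state: RefineryState) -> dict:
        fullest_ls = max(self.LS_TANKS, key=lambda i: state.tank_levels[i])
        fullest_hs = max(self.HS_TANKS, key=lambda i: state.tank_levels[i])
        return {"feed_from": [fullest_ls, fullest_hs]}
```

Rules this rigid look only at the current tank levels; they ignore upcoming vessel arrivals and demand changes, which is precisely why they can fail to produce a feasible schedule.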
Dynamic policy agent
Contrary to the static one, the dynamic policy is based on machine learning agents trained in a simulated environment under the reinforcement learning paradigm, one of the most innovative approaches in machine learning.
What is reinforcement learning
In the most common approach, supervised learning, the machine learning model is fed with data in order to extract patterns and learn how to recognize them. This way, a model can achieve superhuman accuracy in recognizing road signs or cancerous tumors on CT scans.
Although powerful and versatile, this approach suffers from several limitations. First of all, the model is nearly helpless when it encounters something it has never seen before.
In reinforcement learning, the model is trained through its interactions with the environment, being rewarded for certain outcomes. A great example comes from training autonomous cars, where the model gathers points by sticking to the rules, avoiding speeding, and driving safely. At first, the model acts randomly; it then memorizes the actions it is rewarded for and performs them more often to gather bigger rewards. In this way, the reinforcement learning paradigm resembles the natural model of learning often seen among animals and humans.
In this particular case, the model gets points for optimizing the refinery workflow toward higher income and uninterrupted operation. The key challenge lay in delivering the environment from which the model could gather the data – running training sessions in a real industrial facility was out of the question, as the random actions of an untrained model would quickly result in disaster.
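
The core loop of the paradigm fits in a few lines. Here is a generic sketch of the agent-environment interaction, reusing the illustrative simulator from above; the random policy mimics the untrained stage of learning:

```python
import random

def random_policy(state: list[float]) -> dict:
    """An untrained agent effectively acts at random."""
    return {
        "discharge_to": random.randrange(len(state)),
        "feed_from": random.sample(range(len(state)), 2),
    }

env = RefinerySimulator()
state = env.reset()
total_reward, done = 0.0, False

while not done:
    action = random_policy(state)            # act on the current state
    state, reward, done = env.step(action)   # the environment responds
    total_reward += reward                   # the reward signal accumulates
```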
The reinforcement learning-based training
Our main results concern the dynamic policies for the CDU agents developed during the training process. We ran the simulation many times in order to observe more and more (state, action) pairs, improving the policies iteratively.
Highlight: 2054 years of an operator’s work – the equivalent of a single agent’s one-hour training session
In this approach, we generate new behaviors for the agent to apply in order to fulfill the optimization requirements.
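
Schematically, training alternates between collecting simulated experience and updating the policy. The sketch below uses a generic `update` step as a stand-in for the concrete RL algorithm, which we deliberately abstract away here:

```python
def train(env, policy, n_episodes: int = 10_000):
    """Iterative policy improvement over many simulated episodes (sketch)."""
    for _ in range(n_episodes):
        state, done = env.reset(), False
        trajectory = []  # the (state, action, reward) experience of one episode

        while not done:
            action = policy.act(state)
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state

        # Improve the policy from the collected experience; the concrete
        # algorithm (Q-learning, policy gradients, ...) is abstracted away.
        policy.update(trajectory)
```

Because episodes are independent, many simulator instances can run in parallel – this is how a one-hour wall-clock session can translate into millennia of simulated operator experience.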
The effect
We considered several fixed scenarios of vessels approaching the berth with given amounts of materials, LS and HS.
We analyzed three objective functions, i.e., those with minimal, average, and maximal target demand for the production processes across all CDU units.
The main achievement is that our models were able to find dynamic policies providing feasible solutions for the minimum, the average, and the nearly maximum (see the last row in the table) target demand, while the static policies were unable to solve even the average-demand case.
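
An evaluation harness for such a comparison can be as simple as running each policy against each demand level and checking feasibility. In the sketch below, `make_scenario`, `static_policy`, and `rl_policy` are hypothetical stand-ins:

```python
def is_feasible(env, policy) -> bool:
    """A full episode counts as feasible if the CDU never starves."""
    state, done = env.reset(), False
    while not done:
        state, reward, done = env.step(policy.act(state))
        if reward < 0:  # the starvation penalty marks an infeasible schedule
            return False
    return True

for demand in ("minimal", "average", "maximal"):
    env = make_scenario(target_demand=demand)  # hypothetical scenario factory
    for name, policy in (("static", static_policy), ("dynamic", rl_policy)):
        verdict = "feasible" if is_feasible(env, policy) else "infeasible"
        print(f"{name} policy @ {demand} demand: {verdict}")
```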
Summary – AI solutions that digitally transform the energy industry

Undoubtedly, artificial intelligence is on its way to transforming the oil and gas industry, whether through applications as broad as machine learning for predictive maintenance, exploration, and production, or as specific as the process optimization described above.

Let our specialists solve the problems and tackle the challenges that hold you back from conquering the world.
Let’s talk