Modern retail can no longer do without predictive and recommendation systems built on Big Data. But at the data volumes of a company like Auchan, working with big data on local, on-premise infrastructure is ineffective: it is expensive, hard to operate, and can lead to a race for resources between departments.
That is why some companies turn to a cloud Big Data platform as a tool that provides simple scalability and manageability for systems working with Big Data. Moving to such a platform is not easy: it is not enough to lift production systems into the cloud as they are. A global restructuring is required, not only in architecture and technology but also in corporate culture: report users will have to learn SQL, and development, testing, and operations will have to embrace DevOps.
I am Alexander Dorofeev, ex-Head of Big Data at Auchan Retail Russia. In this article, I will tell you how we made this journey.
At Auchan, the Big Data division builds recommender and predictive systems based on Machine Learning (ML) and Artificial Intelligence (AI). Systems of this kind have long been a must-have for large retailers wishing to remain competitive.
We have developed, and plan to build, solutions for a wide range of tasks: forecasting key store indicators (traffic, turnover, sales), determining optimal prices (demand elasticity), customer segmentation, increasing loyalty through personal offers, evaluating the effectiveness of marketing campaigns, and much more.
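To make one of these tasks concrete, here is a minimal sketch of demand elasticity estimation, not our production approach: it reads elasticity off the slope of a log-log regression of units sold on price. The column names and toy numbers are purely illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def price_elasticity(df: pd.DataFrame) -> float:
    """Estimate elasticity as the slope of log(units) ~ log(price)."""
    X = np.log(df[["price"]].to_numpy())
    y = np.log(df["units"].to_numpy())
    return LinearRegression().fit(X, y).coef_[0]

# Toy data: demand falls as price rises.
demo = pd.DataFrame({"price": [90, 95, 100, 105, 110],
                     "units": [120, 111, 100, 93, 85]})
print(price_elasticity(demo))  # ~ -1.7
```

The coefficient reads directly as a percentage: here, a 1% price increase costs roughly 1.7% of demand.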
We developed and piloted our first ML solutions on the technology stack available at the time: our own infrastructure. We had a separate database deployed on Oracle Exadata, a DBMS that was simultaneously used as OLTP storage for the company's other business applications. We loaded historical data on sales, prices, and other indicators from our information systems into this database and cleaned it.
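For illustration, this is roughly what such a load-and-clean step can look like in Python with the python-oracledb driver. The connection details, table, and cleaning rules here are hypothetical, not Auchan's actual schema.

```python
import oracledb          # python-oracledb, the successor to cx_Oracle
import pandas as pd

# Connection details and table/column names are hypothetical.
conn = oracledb.connect(user="ml_etl", password="...", dsn="exadata-host/ANALYTICS")

with conn.cursor() as cur:
    cur.execute("""
        SELECT store_id, product_id, week_start, units_sold, price
        FROM sales_history
        WHERE week_start >= DATE '2018-01-01'
    """)
    columns = [d[0].lower() for d in cur.description]
    df = pd.DataFrame(cur.fetchall(), columns=columns)

# Basic cleaning: de-duplicate, drop returns, impute missing prices.
df = df.drop_duplicates(subset=["store_id", "product_id", "week_start"])
df = df[df["units_sold"] >= 0]
df["price"] = df["price"].fillna(
    df.groupby("product_id")["price"].transform("median"))
```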
We developed the algorithms for our first solution, the Demand Forecasting Engine, which predicts demand three months ahead for goods in specific stores (at the “product – store – week” grain) and thus makes it possible to plan purchases and reduce costs. After testing, the pilot was released.
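To make the “product – store – week” grain concrete, here is a heavily simplified sketch of such a forecaster, assuming a scikit-learn stack; it is not the actual engine, and the column names (units, price, promo_flag) are assumptions. It builds lagged sales features per product-store pair and fits a gradient-boosted model.

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-(store, product) lagged weekly sales; drop warm-up rows."""
    df = df.sort_values(["store_id", "product_id", "week"]).copy()
    grouped = df.groupby(["store_id", "product_id"])["units"]
    for lag in (1, 2, 4, 52):   # recent weeks plus the same week last year
        df[f"lag_{lag}"] = grouped.shift(lag)
    return df.dropna()

def train(history: pd.DataFrame) -> HistGradientBoostingRegressor:
    features = [c for c in history.columns if c.startswith("lag_")]
    features += ["price", "promo_flag"]
    return HistGradientBoostingRegressor(max_iter=300).fit(
        history[features], history["units"])
```

In practice, a three-month horizon usually means recursive forecasting or one model per forecast week, plus calendar and promo features; the sketch omits all of that.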
We were impressed with the results of the pilot. Compared to the forecasts previously produced in Excel Enterprise, the new algorithms were 17.5% more accurate for regular sales and 21% more accurate for promotional sales, an impressive gain by the standards of our industry.
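The article does not say which accuracy metric was used; a common convention in retail forecasting is to define accuracy as 1 − WAPE (weighted absolute percentage error), which could look like this:

```python
import numpy as np

def wape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Weighted absolute percentage error: sum(|actual - forecast|) / sum(actual)."""
    return float(np.abs(actual - forecast).sum() / actual.sum())

def accuracy(actual: np.ndarray, forecast: np.ndarray) -> float:
    return 1.0 - wape(actual, forecast)
```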
Since the pilot was successful, we faced the task of moving it into commercial operation. And for several reasons, we could not use the same technology stack in production as in the pilot:
The technologies used (Oracle plus Python on data analysts' desktops) were extremely slow: training predictive models for the pilot categories (roughly 10% of the entire product matrix) took two weeks.
In a production environment, a full training cycle would therefore take about 20 weeks. Yet model accuracy degrades quickly in today's market, so models must be retrained on fresh data far more often than once every 20 weeks (a sketch of parallelized per-category training appears after this list).
Relational databases like Oracle are excellent at handling OLTP workloads and are suitable as reliable data warehouses for business applications. But they are not designed for complex analytical queries, building data marts, ad hoc reporting (one-off, unique questions that combine different data), or other Big Data processing operations (an example of such a query also appears after this list). We needed modern tools suited to OLAP workloads, with which we could develop and launch analytical products based on ML and big data.
Analytical operations on the on-premise Oracle Exadata database affected DBMS performance and the stability of the other processes running on it. This did not suit the business: it was clear that a separate environment was needed for analytical tasks.
Because parallel data processing was impossible, we could not scale. In addition, we wanted capacity increases to be automated rather than performed manually.
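As promised above, here is one way the 20-week training cycle can be attacked once parallel processing is available: training an independent model per product category across a Spark cluster. This is a hypothetical sketch using PySpark's applyInPandas, not the platform we actually built; the table and column names are assumed.

```python
import pandas as pd
from pyspark.sql import SparkSession
from sklearn.ensemble import HistGradientBoostingRegressor

spark = SparkSession.builder.appName("per-category-training").getOrCreate()

def fit_one_category(pdf: pd.DataFrame) -> pd.DataFrame:
    """Runs on an executor: trains a model for a single product category."""
    features = ["lag_1", "lag_2", "price"]   # assumed precomputed features
    model = HistGradientBoostingRegressor().fit(pdf[features], pdf["units"])
    pdf["fitted"] = model.predict(pdf[features])
    return pdf[["category", "store_id", "product_id", "week", "fitted"]]

# sales_features is a hypothetical table with one row per product-store-week.
fitted = (
    spark.table("sales_features")
         .groupBy("category")                # one group = one independent model
         .applyInPandas(
             fit_one_category,
             schema="category string, store_id long, product_id long, "
                    "week date, fitted double",
         )
)
fitted.write.mode("overwrite").saveAsTable("fitted_demand")
```

Because the categories train concurrently on executors instead of sequentially on a desktop, the wall-clock time of a full cycle scales down with cluster size.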
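And this is the kind of ad hoc analytical question that an OLAP-oriented engine answers comfortably but an OLTP database struggles with: a one-off query joining sales and product data. The table and column names are again hypothetical; in an SQL-literate team, this is exactly the sort of query report users write themselves.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adhoc-report").getOrCreate()

# One-off question: which store/category pairs earn the most promo revenue?
top_promo = spark.sql("""
    SELECT s.store_id,
           p.category,
           SUM(s.units * s.price) AS promo_revenue
    FROM sales s
    JOIN products p ON p.product_id = s.product_id
    WHERE s.promo_flag = 1
    GROUP BY s.store_id, p.category
    ORDER BY promo_revenue DESC
    LIMIT 20
""")
top_promo.show()
```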
To solve these problems, we needed to create a specialized, unified Big Data platform: one with fast, parallel model training, tooling suited to OLAP workloads, an analytics environment separated from the OLTP systems, and automated scaling.
The first step in creating a Big Data platform is choosing the right technology stack. To determine the future architecture, the business leaders and I analyzed our requirements and the available technology options.
Based on this analysis, we arrived at a conceptual architecture for the future Big Data platform.