Pixum Software Engineering Blog
An insight into our development team, what we do and how we work

Conference review: c/o data science Bonn 2022

Hendrik Gruß

We would like to give you a short recap of our visit to the data science barcamp “c/o data science Bonn 2022”.

Patrick and I (Hendrik) attended the conference on 13 September and gained some interesting insights into how data science is used across different fields and companies. You can find out more about the event at https://www.co-datascience.de/bonn/bn22_program/.

In this blog post I will talk about the key insights from the keynotes.

The event was a mix of keynotes, hack sessions and a barcamp. At a barcamp, participants propose their own topics and short talks, which together fill the event's schedule. This is quite similar to the “Open Friday” concept we have at Pixum.

First, Thomas Isele from Deutsche Post DHL talked about the challenges of planning and organizing last-mile delivery and presented an interesting new concept for planning routes for their couriers. Looking only at minimizing delivery time misses many details of a courier's day: they found that optimizing routes for delivery time reduces driving time by only about 8 minutes, out of a total daily delivery time of just 84 minutes. In other words, most of a courier's working day goes into other tasks. They therefore proposed two new KPIs: usage of their routing app by couriers, and adherence to the suggested routes. Guided by these KPIs, they set out to suggest routes that better match a courier's typical working day.

In a first step, they clustered all routes in a district by overall delivery pattern, for example north to south or clockwise. Using machine learning, they then assigned every courier to their preferred pattern. Within that pattern, they solved a traveling salesman problem to optimize the route. With this strategy, they increased adherence to the suggested routes from 40% to above 80%, indicating that their couriers are happier with the suggestions.
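
The talk did not go into implementation details, so the following is only a minimal sketch of the idea “fix a delivery pattern, then optimize inside it”. It assumes planar coordinates and models a hypothetical north-to-south pattern as horizontal bands of the district that are visited in order; inside each band, a small open-path traveling salesman problem is solved by brute force. The band split and all function names are our own illustration, not DHL's method, and the machine learning step that assigns a courier to a pattern is left out.

```python
import itertools
import math

Point = tuple[float, float]  # planar (x, y); y grows towards the north


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def path_length(start: Point, stops: tuple[Point, ...]) -> float:
    # Length of the open path start -> stops[0] -> stops[1] -> ...
    total, current = 0.0, start
    for stop in stops:
        total += dist(current, stop)
        current = stop
    return total


def best_order(start: Point, stops: list[Point]) -> list[Point]:
    # Exact open-path TSP by brute force: only feasible for a handful of stops per band.
    if not stops:
        return []
    return list(min(itertools.permutations(stops), key=lambda p: path_length(start, p)))


def north_to_south_bands(stops: list[Point], n_bands: int) -> list[list[Point]]:
    # Hypothetical "north to south" pattern: cut the district into horizontal bands.
    top = max(s[1] for s in stops)
    bottom = min(s[1] for s in stops)
    height = (top - bottom) / n_bands or 1.0  # avoid division by zero if all y are equal
    bands: list[list[Point]] = [[] for _ in range(n_bands)]
    for s in stops:
        index = min(int((top - s[1]) / height), n_bands - 1)
        bands[index].append(s)
    return bands


def plan_route(depot: Point, stops: list[Point], n_bands: int = 3) -> list[Point]:
    # Visit the bands in the preferred order, optimizing the stop order inside each band.
    if not stops:
        return []
    route: list[Point] = []
    current = depot
    for band in north_to_south_bands(stops, n_bands):
        ordered = best_order(current, band)
        route.extend(ordered)
        if ordered:
            current = ordered[-1]
    return route
```

Constraining the optimization to the courier's pattern gives up a little driving time compared with an unconstrained tour, which, given the numbers above, is a cheap price if it makes couriers actually follow the suggestion.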

Barcamp

In another talk, Sebastian Hätäla from LeanIX explained how Siamese neural networks can make search more useful. LeanIX classifies software tools into categories such as video chat, text editor or communication, and has a lot of metadata from users' reviews of these tools. Using these reviews, they wanted to improve their search so that it matches not only on lexical similarity but also on semantic similarity. To do so, they trained a neural network to produce a similarity score between applications. When a user now searches for “video chat”, the results include not only the applications with a lexical match but also applications with high similarity scores to those matches. Amazon uses similar techniques to show products that match a search query even if the query words don't appear in the product names or descriptions.
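
The talk did not share code, so here is a minimal sketch of the search expansion step only. It assumes every application has already been passed through the same shared-weight encoder of a trained Siamese network, so their embeddings can be compared directly with cosine similarity. The app names, descriptions, embedding values and the 0.9 threshold are made up for illustration; training the network is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class App:
    description: str
    embedding: list[float]  # assumed output of the shared Siamese encoder (precomputed)


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query: str, catalog: dict[str, App], threshold: float = 0.9) -> list[str]:
    tokens = query.lower().split()
    # 1) Lexical matching: every query word appears in the description.
    lexical = [name for name, app in catalog.items()
               if all(t in app.description.lower().split() for t in tokens)]
    # 2) Semantic expansion: add apps whose embedding is close to any lexical hit.
    expanded = [name for name, app in catalog.items()
                if name not in lexical
                and any(cosine(app.embedding, catalog[hit].embedding) >= threshold
                        for hit in lexical)]
    return lexical + expanded


# Hypothetical catalog: names, descriptions and embeddings are purely illustrative.
CATALOG = {
    "MeetNow": App("video chat for teams", [0.9, 0.1, 0.0]),
    "ConfRoom": App("online meetings and webinars", [0.85, 0.2, 0.05]),
    "QuickEdit": App("lightweight text editor", [0.05, 0.1, 0.95]),
}

print(search("video chat", CATALOG))  # ['MeetNow', 'ConfRoom']
```

“ConfRoom” is returned even though neither “video” nor “chat” appears in its description, because its embedding is close to that of the lexical hit “MeetNow”.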

One of the big topics in many talks was MLOps. Muhammed Demircan from REWE Digital GmbH presented their approach to implementing MLOps infrastructure and culture at REWE Digital. MLOps is a relatively new discipline that combines DevOps, data engineering and machine learning. It aims to keep machine learning applications failure-free, automated and monitored, so that the business value they create is maintained. Demircan presented several principles to consider when applying MLOps, and then showed which tools REWE Digital chose to implement them. Since they were already using Google Cloud, they went with Google's Vertex AI. While it offers many useful features (model registry, AutoML, feature store, etc.), it is also quite expensive: he mentioned around half a million euros per year. However, there also seem to be many open-source tools covering these principles, which we will evaluate for use at Pixum.
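
To make the “model registry” idea more concrete, here is a minimal in-memory sketch of our own. It is not the Vertex AI API: the class and method names, the artifact URI and the promotion rule (promote only if a metric improves, higher is better) are hypothetical. A real registry would also persist versions and track lineage, but even this toy version shows how versioning plus rollback supports keeping applications failure-free.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str            # where the trained model file lives (hypothetical)
    metrics: dict[str, float]    # offline evaluation metrics


@dataclass
class ModelRegistry:
    # model name -> all registered versions, oldest first
    versions: dict[str, list[ModelVersion]] = field(default_factory=dict)
    # model name -> version number currently serving traffic
    production: dict[str, int] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str, metrics: dict[str, float]) -> ModelVersion:
        history = self.versions.setdefault(name, [])
        mv = ModelVersion(len(history) + 1, artifact_uri, dict(metrics))
        history.append(mv)
        return mv

    def promote_if_better(self, name: str, version: int, metric: str) -> bool:
        # Promote only if the candidate beats the live model (assumes higher is better).
        candidate = self.versions[name][version - 1]
        live_version = self.production.get(name)
        if live_version is not None:
            live = self.versions[name][live_version - 1]
            if candidate.metrics[metric] <= live.metrics[metric]:
                return False
        self.production[name] = version
        return True

    def rollback(self, name: str, version: int) -> None:
        # Re-point production to an earlier, known-good version.
        if not 1 <= version <= len(self.versions[name]):
            raise ValueError(f"unknown version {version} for {name}")
        self.production[name] = version

    def serving(self, name: str) -> ModelVersion:
        return self.versions[name][self.production[name] - 1]


registry = ModelRegistry()
registry.register("churn", "gs://example-bucket/churn/v1", {"auc": 0.80})
registry.promote_if_better("churn", 1, "auc")  # True: nothing is live yet
registry.register("churn", "gs://example-bucket/churn/v2", {"auc": 0.78})
registry.promote_if_better("churn", 2, "auc")  # False: v1 stays live
```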

Impression

In another talk, “No Projectflops with MLOps”, Matthias Wiciok from Evaco also talked about MLOps. He gave some interesting numbers on machine learning projects: 85% of all machine learning models never go live, 47% of all prototypes never go live, and 55% of companies evaluating machine learning have never brought a model into production. He outlined the problems of running machine learning in production and what needs to be considered beyond just training the models. Evaco sells DataRobot, an MLOps product that offers automatic training and deployment of models as well as monitoring and retraining.
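
How DataRobot monitors models was not covered in detail. One common approach is to compare the distribution of an input feature in production against the training data, for example with the Population Stability Index (PSI), and to trigger retraining when it drifts too far. The sketch below assumes equal-width bins over the reference range and uses 0.25 as the drift threshold, a widely used rule of thumb; both choices are our assumptions, not DataRobot's documented behaviour.

```python
import math


def histogram(values: list[float], edges: list[float]) -> list[int]:
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Linear scan for the bin; values outside the reference range land in the outer bins.
        i = 0
        while i < len(counts) - 1 and v >= edges[i + 1]:
            i += 1
        counts[i] += 1
    return counts


def psi(reference: list[float], current: list[float], n_bins: int = 10, eps: float = 1e-6) -> float:
    # PSI = sum_i (a_i - e_i) * ln(a_i / e_i), with e_i / a_i the reference / current bin shares.
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero-width bins for a constant feature
    edges = [lo + i * width for i in range(n_bins + 1)]
    expected = histogram(reference, edges)
    actual = histogram(current, edges)
    total = 0.0
    for e_count, a_count in zip(expected, actual):
        e = max(e_count / len(reference), eps)  # eps avoids log(0) for empty bins
        a = max(a_count / len(current), eps)
        total += (a - e) * math.log(a / e)
    return total


def needs_retraining(reference: list[float], current: list[float], threshold: float = 0.25) -> bool:
    # 0.25 is a common rule of thumb for "significant" drift, not a universal constant.
    return psi(reference, current) > threshold


training_feature = [i / 100 for i in range(100)]      # spread evenly over [0, 1)
live_feature = [0.5 + i / 200 for i in range(100)]    # shifted into the upper half
print(needs_retraining(training_feature, training_feature))  # False
print(needs_retraining(training_feature, live_feature))      # True
```

In a real pipeline, such a check would run on a schedule for each important feature and, when it fires, kick off the automated training and deployment steps rather than just printing a flag.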

All in all, the conference was very informative, and we gained insights that will help us implement machine learning at Pixum.