Show Notes

(2:00) Alexey studied Information Systems and Technologies at a local university in his hometown in eastern Russia.
(4:54) Alexey commented on his experience working as a Java developer in Russia and Poland during his first three years after college, along with his initial exposure to Machine Learning thanks to Coursera.
(7:55) Alexey talked about his decision to pursue the IT4BI Master Program, specializing in Large-Scale Business Intelligence, in 2013.
(9:42) Alexey discussed his time working as a Research Assistant on Apache Flink at the DIMA Group at TU Berlin.
(12:28) Alexey’s Master’s thesis is titled Semantification of Identifiers in Mathematics for Better Math Information Retrieval, and it was later presented at the SIGIR conference on R&D in Information Retrieval in 2016.
(14:35) Alexey discussed his first job as a Data Scientist at Searchmetrics, working on projects to help content marketers improve SEO ranking for their articles.
(18:54) Alexey’s next role was with the ad-tech company Simplaex. There, he designed, developed, and maintained the ML infrastructure for processing 3+ billion events per day with 100+ million unique daily users, working with tools like Spark for data engineering tasks.
(22:17) Alexey reflected on his journey participating in Kaggle competitions.
(25:35) Alexey also participated in other competitions at academic conferences: winning 2nd place at the Web Search and Data Mining 2017 challenge on Vandalism Detection and winning 1st place at the NIPS 2017 challenge on Ad Placement.
(29:59) Alexey authored his first book, Mastering Java for Data Science, which teaches readers how to create data science applications with Java.
(31:40) Alexey then transitioned to a Data Scientist role at OLX Group, a global marketplace for online classified advertisements.
(33:23) Alexey explained the ML system that detects duplicates of images submitted to the OLX marketplace, which he presented at PyData Berlin 2019. Read his two-part blog series: the first post presents a two-step framework for duplicate detection, and the second post explains how his team served and deployed this framework at scale.
(38:12) Alexey was recently involved in building an infrastructure for serving image models at OLX. Read his two-part blog series on this evolution of image model serving at OLX, including the transition from AWS SageMaker to Kubernetes for model deployment, as well as the utilization of AWS Athena and MXNet for design simplification.
(42:39) Alexey is in the process of writing a technical book called Machine Learning Bookcamp, which encourages readers to learn machine learning by doing projects.
(46:17) Alexey discussed common struggles during data science interviews, referring to his talk on Getting a Data Science Job.
(48:32) Alexey has put together a neat GitHub page that includes both theoretical and technical questions for people who are preparing for interviews.
(52:19) Alexey elaborated on the steps needed to become a better data scientist, in connection with a LinkedIn post he wrote a while back.
(56:40) Alexey gave his advice for software engineers looking to transition into data science.
(58:32) Alexey shared his opinion on the data science community in Berlin.
(01:01:53) Closing segment.

His Contact Info

Website
Twitter
LinkedIn
GitHub
Kaggle
Quora
Google Scholar
Medium

His Recommended Resources

Apache Flink
Kubeflow
Data Science Interviews GitHub Repo
PyData Berlin
Berlin Buzzwords
Andrew Ng
Designing Data-Intensive Applications by Martin Kleppmann

Machine Learning Bookcamp

Permanent 40$ discount code: poddcast19
5 free eBook codes (each good for one sample of the book): mlbdrt-D452, mlbdrt-5922, mlbdrt-2C4D, mlbdrt-3034, mlbdrt-1DD1

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email [email protected].

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:

Listen on Spotify
Listen on Apple Podcasts
Listen on Google Podcasts

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.
