Episode 132: Big Data Engineering, Data Culture from First Principles, and Reimagined Metadata with Suresh Srinivas

Datacast

English - February 09, 2024 20:00 - 1 hour - 56.8 MB

Show Notes

(01:53) Suresh went over his college experience studying Electronics Engineering at the National Institute of Technology Karnataka.
(04:35) Suresh recalled his 9-year engineering career at Sylantro Systems.
(08:47) Suresh talked about the origin of Apache Hadoop at Yahoo.
(11:05) Suresh dissected the high-level design architecture of the Hadoop Distributed File System (HDFS).
(15:36) Suresh reflected on his decision to become a co-founder of Hortonworks, which focused on bringing Hadoop training and support to enterprise customers.
(17:36) Suresh unpacked the evolution of the Hortonworks Data Platform, which includes Hadoop technologies such as HDFS, MapReduce, Pig, Hive, HBase, ZooKeeper, and additional components.
(20:30) Suresh shared his lessons from developing and supporting open-source software designed to manage big data processing.
(23:43) Suresh walked through the evolution of Uber’s Data Platform.
(28:03) Suresh described Uber's journey toward better data culture from first principles.
(34:00) Suresh explained his motivation to start the OpenMetadata Project.
(37:21) Suresh elaborated on OpenMetadata's five design principles: schema-first, extensibility, API-centric, vendor-neutral, and open-source.
(40:17) Suresh highlighted OpenMetadata's built-in features to power multiple applications, such as data collaboration, metadata versioning, and data lineage.
(44:38) Suresh emphasized his priority for the open-source roadmap to adapt to the community's needs.
(47:05) Suresh explained the architecture of OpenMetadata, going deep into the push-based and pull-based characteristics of metadata ingestion and consumption.
(51:47) Suresh shared the long-term vision of his new company Collate, which powers the OpenMetadata initiative.
(53:36) Suresh shared valuable hiring lessons as a startup founder.
(56:30) Suresh shared fundraising advice for founders who want to seek the right investors for their startups.
(57:50) Closing segment.

Suresh's Contact Info

LinkedIn | Twitter | GitHub

OpenMetadata's Resources

Website | Twitter Slack | GitHub | Community Documentation Collate

Mentioned Content

People
Joe Littlejohn (jsonschema2pojo)
Sriharsha Chintalapani

Book
The Innovator's Dilemma (by Clayton Christensen)

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email [email protected].

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:

Listen on Spotify | Listen on Apple Podcasts | Listen on Google Podcasts

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.
