Show Notes

(01:56) Curtis reflected on his upbringing in rural Kentucky and his gift of education.
(07:20) Curtis explained how he cultivated mental focus and intellectual fortitude while growing up in Kentucky.
(10:30) Curtis shared his view regarding online misinformation on social media.
(14:27) Curtis recalled his undergraduate experience at Vanderbilt University in the early 2010s.
(22:39) Curtis explained how he learned best via teaching and mentoring.
(24:04) Curtis walked through the research and industry experiences he obtained throughout college.
(32:45) Curtis recalled his decision to embark on a Ph.D. in Computer Science at MIT.
(38:53) Curtis told the story of how he ended up finding his advisor - Professor Isaac Chuang (the inventor of the first working quantum computer).
(40:36) Curtis mentioned how he invented the CAMEO Detection Algorithm to detect “multiple-account” cheating in massive open online courses.
(44:47) Curtis unpacked his Ph.D. research on dataset uncertainty estimation.
(50:08) Curtis dissected confident learning, a family of theories and algorithms for supervised ML with label errors.
(53:22) Curtis encapsulated how he strategically iterated cleanlab at his various graduate internships.
(01:00:22) Curtis recalled his time founding his first startup ChipBrain, before founding Cleanlab.
(01:06:42) Curtis brought up the creation of the labelerrors.com project.
(01:12:12) Curtis provided lessons learned as a second-time founder.
(01:14:25) Curtis elaborated on the open-source roadmap of cleanlab.
(01:17:08) Curtis highlighted the key capabilities of Cleanlab Studio - the no-code, automatic data correction solution for data and engineering teams with robust enterprise features.
(01:18:50) Curtis touched on Cleanlab Vizzy - an interactive visualization of confident learning.
(01:20:29) Curtis shared valuable hiring lessons to attract the right people who are excited about Cleanlab’s mission.
(01:23:23) Curtis gave his thoughts on shaping Cleanlab’s culture.
(01:26:06) Curtis explained the similarities and differences between being a founder and a researcher.
(01:29:09) Curtis mentioned how he had helped researchers build affordable state-of-the-art deep learning machines.
(01:31:46) Curtis brought up his alter ego PomDP the Ph.D. rapper, and how rapping has been an outlet for him to express emotions and creativity.
(01:40:12) Curtis emphasized how his success had been a function of grit, resourcefulness, and friends made along the way.
(01:44:04) Closing segment.

Curtis' Contact Info

Academic Website
LinkedIn | Twitter | Facebook | Instagram
Google Scholar | GitHub
PhD Rapper (YouTube | Spotify | SoundCloud | Facebook | Twitter | Instagram)
L7 Machine Learning Blog

Cleanlab's Resources

Website | GitHub | Slack | Twitter | LinkedIn
Blog | Research | Doc
About | Careers
Cleanlab Studio
Cleanlab Vizzy
The Cleanlab Culture

Mentioned Content

Papers

Detecting and preventing “multiple-account” cheating in massive open online courses, Curtis G. Northcutt, Andrew Ho, & Isaac L. Chuang, Computers & Education, 2016. [paper | code | arXiv]
Comment Ranking Diversification in Forum Discussions, Curtis G. Northcutt, Kimberly Leon, & Naichun Chen, Learning at Scale, 2017. [paper | code | free-access]
Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels, Curtis G. Northcutt, Tailin Wu, & Isaac L. Chuang, 33rd Conference on Uncertainty in Artificial Intelligence (UAI 2017). [paper | code]
Confident Learning: Estimating Uncertainty for Dataset Labels, Curtis G. Northcutt, Lu Jiang, & Isaac L. Chuang, Journal of Artificial Intelligence Research (JAIR), Vol. 70 (2021). [paper | code | blog]
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks, Curtis Northcutt, Anish Athalye, and Jonas Mueller, 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. [paper | demo | code | blog]

Blog Posts

Founder’s Medal recipient chooses MIT over Microsoft (May 2013)
Build a Pro Deep Learning Workstation... for Half the Price (Feb 2019)
An Introduction to Confident Learning: Finding and Learning with Label Errors in Datasets (Nov 2019)
Announcing cleanlab: a Python Package for ML and Deep Learning on Datasets with Label Errors (Nov 2019)
Double Deep Learning Speed by Changing the Position of your GPUs (Dec 2019)
Benchmarking: Which GPU for Deep Learning? (Dec 2019)
The Best 4-GPU Deep Learning Rig only costs $7000 not $11,000 (April 2020)
Pervasive Label Errors in ML Datasets Destabilize Benchmarks (March 2021)
Cleanlab: The History, Present, and Future (April 2022)
cleanlab 2.0: Automatically Find Errors in ML Datasets (April 2022)
How We Built Cleanlab Vizzy (August 2022)

Talks and Podcasts

TEDx Talk: The MIT Rap Challenge (July 2020)
Talk at NLP Summit (March 2022)
Talk at Data + AI Summit (June 2022)
MLOps Coffee Chat (July 2022)
Talk at Snorkel's Future of Data-Centric AI Conference (July 2022)
Open-Source Startup Podcast (March 2023)

People

Leslie Kaelbling
Geoff Hinton
Jeff Dean

Book

Play Bigger: How Pirates, Dreamers, and Innovators Create and Dominate Markets (by Al Ramadan, Dave Peterson, Christopher Lochhead, and Kevin Maney)

Notes

My conversation with Curtis was recorded back in August 2022. The Cleanlab team has had some important announcements in 2023 that I recommend looking at:

The launches of CleanVision, Datalab, and ActiveLab
This blog post on using Cleanlab to improve LLMs
His new single "Clarity In My Vision"
Cleanlab's partnership with Databricks (Video)

Cleanlab is about to announce its Series A. Stay on the lookout for it!

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email [email protected].

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:

Listen on SpotifyListen on Apple PodcastsListen on Google Podcasts

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

