Episode 60: Algorithms and Data Structures for Massive Datasets with Dzejla Medjedovic

Datacast

English - April 05, 2021 08:00 - 1 hour - 61.7 MB - ★★★★★ - 4 ratings

Show Notes

(01:58) Dzejla described her undergraduate experience studying Computer Science at the Sarajevo School of Science and Technology (SSST) back in the mid-2000s.
(07:59) Dzejla recapped her overall experience getting a Ph.D. in Computer Science at Stony Brook University.
(14:38) Dzejla unpacked the key research problem in her Ph.D. thesis, titled “Upper and Lower Bounds on Sorting and Searching in External Memory.”
(19:13) Dzejla went over the details of her paper “Don’t Thrash: How to Cache Your Hash on Flash,” which describes the Cascade Filter, an approximate-membership-query data structure that scales beyond main memory and serves as an alternative to the well-known Bloom filter.
(24:41) Dzejla elaborated on her work “The batched predecessor problem in external memory,” which studies lower bounds in three external-memory models: the I/O comparison model, the I/O pointer-machine model, and the indexability model.
(29:56) Dzejla shared her learnings from being a Teaching Assistant for the Introduction to Algorithms course at Stony Brook (at both the undergraduate and graduate levels).
(35:08) Dzejla went over her summer internships at Microsoft’s Server and Tools Division during her Ph.D.
(41:06) Dzejla explained her decision to return to the Sarajevo School of Science and Technology as an Assistant Professor of Computer Science.
(47:22) Dzejla dissected the essential concepts and methods covered in the Data Structures, Introductory Algorithms, Advanced Algorithms, and Algorithms for Big Data courses she taught at SSST.
(48:42) Dzejla provided a brief overview of the Computer Science/Software Engineering department at the International University of Sarajevo (IUS), where she has been a professor since 2017.
(50:57) Dzejla briefly talked about the courses that she has taught at IUS, including Intro to Programming, Human-Computer Interaction, and Algorithms/Data Structures.
(52:49) Dzejla shared the challenges of writing Algorithms and Data Structures for Massive Datasets, which introduces data processing and analytics techniques specifically designed for large distributed datasets.
(56:14) Dzejla explained concepts in Part 1 of the book, including Hash Tables, Approximate Membership, Bloom Filters, Frequency/Cardinality Estimation, Count-Min Sketch, and HyperLogLog.
(58:38) Dzejla provided a brief overview of techniques to handle streaming data in Part 2 of the book.
(01:00:14) Dzejla mentioned the data structures for large databases and external-memory algorithms in Part 3 of the book.
(01:02:15) Dzejla shared her thoughts about the tech community in Sarajevo.
(01:04:16) Closing segment.

Dzejla’s Contact Info

LinkedIn, Twitter, Google Scholar

Mentioned Content

Papers

“Upper and Lower Bounds on Sorting and Searching in External Memory” (Dzejla’s Ph.D. Thesis, 2014)
“Don’t Thrash: How to Cache Your Hash on Flash” (2012)
“The batched predecessor problem in external memory” (2014)

People

Erik Demaine (Computer Science Professor at MIT)
Michael Bender (Computer Science Professor at Stony Brook, Dzejla’s Ph.D. Advisor)
Joseph Mitchell (Computational Geometry Professor at Stony Brook)
Steven Skiena (Computer Science Professor at Stony Brook)
Jeff Erickson (Computer Science Professor at UIUC)

Books

“Algorithms and Data Structures for Massive Datasets” (by Dzejla Medjedovic, Emin Tahirovic, and Ines Dedovic)
“The Algorithm Design Manual” (by Steven Skiena)

Here is a permanent 40% discount code (good for all Manning products in all formats) for Datacast listeners: poddcast19. Link at http://mng.bz/4MAR.

Here is one free eBook code good for a copy of Algorithms and Data Structures for Massive Datasets for a lucky listener: algdcsr-7135. Link at http://mng.bz/Q2y6

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email [email protected].

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:

Listen on Spotify Listen on Apple Podcasts Listen on Google Podcasts

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

Twitter Mentions

@dzejla19