Distributed Data Show

154 episodes - English - Latest episode: over 3 years ago - ★★★★★ - 15 ratings

The Distributed Data Podcast is your weekly source for the latest news and technical expertise to help you succeed in building large-scale distributed systems. Brought to you by the Developer Advocate team, we go in-depth with DataStax engineers and special guests from the broader data community. New episodes each Tuesday.

Technology apachecassandra database hybridcloud multicloud nosql opensource

Episodes

Distributed Data Show Episode 56: Multi-Cloud: The What & Why

July 17, 2018 15:00 - 7 minutes - 7.24 MB

Today we’re tackling whatever confusion may exist around the various *-clouds. We have multi-cloud, hybrid-cloud, private-cloud, and many more. After we narrow the multi-cloud description down to its key points, the conversation turns to why we’d want a multi-cloud setup to begin with! This is the first in a series of conversations we’re going to have about the multi-cloud space and all of the complexities, advantages, lock-in, and related topics around this new foray into distributed ...

Distributed Data Show Episode 55: Enterprise Transformation with Chelsea Navo

July 10, 2018 15:00 - 27 minutes - 25 MB

We talk with DataStax Vanguard Lead Chelsea Navo about how the Vanguard team and DataStax help enterprises transform to meet the challenges posed by disruptive technologies and competitors. See omnystudio.com/listener for privacy information.

Distributed Data Show Episode 54: Graph Processing Trends With Jonathan Lacefield

July 03, 2018 15:00 - 35 minutes - 32.8 MB

We sit down with Jonathan Lacefield to discuss the latest trends in graph processing from the DataStax perspective, including how the challenges associated with graph are evolving, current tips and tricks, and the tools we’re seeing in use.

Distributed Data Show Episode 53: Disruptive Innovation with Matthias Brocheler

June 26, 2018 15:00 - 20 minutes - 19 MB

There’s more to the lean product strategy than just building skateboards. Matthias Brocheler joins guest host Kathryn Erickson to discuss Disruptive Innovation. We’ll discuss and provide practical examples of how to explore an idea you have or a problem you want to solve, how to size your potential market, and when to start writing code.

Distributed Data Show Episode 52: Benchmarking with Nitsan Wakart

June 19, 2018 15:00 - 31 minutes - 29 MB

Are all benchmarks lies? Nitsan Wakart joins the show to explain the discipline of performance engineering, the ingredients of an effective benchmark, why you should always create custom benchmarks based on your expected workload, and the benchmarking effort we undertook for DSE 6.

Distributed Data Show Episode 51: Graph Tips and Tricks with Ted Wilmes

June 12, 2018 15:00 - 22 minutes - 21 MB

David and Jeff talk with Ted Wilmes from Expero about best practices regarding DSE Graph and the importance of proper data modeling.

Distributed Data Show Episode 50: Think Like A Support Engineer With Sequoyha Pelletier

June 05, 2018 15:00 - 19 minutes - 17.8 MB

We talk with support engineer Sequoyha Pelletier about the support team and their training gauntlet, and get a bunch of tips and tricks to use when troubleshooting clusters and submitting support tickets. Highlights! 0:16 - David introduces Sequoyha Pelletier to the Distributed Data Show 0:42 - Sequoyha gives an overview of the support team experience 1:35 - Common issues support handles regularly 4:02 - Data modeling challenges and the largest partition contest 5:49 - DSE clients should engage ...

Distributed Data Show Episode 49: Bulk Loading with Brian Hess

May 29, 2018 17:07 - 25 minutes - 23.3 MB

Brian Hess joins the show to explain why the bulk loader is a vital tool for a distributed database, the history of bulk loaders for Apache Cassandra, and the virtues of the new DSBulk.

Distributed Data Show Episode 48: DataStax Drivers with Chris Splinter

May 22, 2018 15:00 - 37 minutes - 34 MB

We talk with Chris Splinter, DataStax product manager for developer solutions, about new DSE 6 driver features and peer into the bright future of driver development.

Distributed Data Show Episode 47: NodeSync with Sylvain Lebresne

May 15, 2018 15:00 - 13 minutes - 12.1 MB

Sylvain Lebresne shares what’s new and awesome with NodeSync in DSE 6, including the simplicity it brings to operations, performance improvements for repair, and how well it is integrated with DSE through CQL and OpsCenter.

Distributed Data Show Episode 46: DSE 6 Analytics with Brian Hess

May 08, 2018 15:00 - 31 minutes - 28.8 MB

Distributed Data Show Episode 46: DSE 6 Analytics with Brian Hess by DataStax Developers

Distributed Data Show Episode 45: Search in DSE 6 with Nick Panahi

May 01, 2018 15:00 - 25 minutes - 22.9 MB

Nick Panahi shares what’s new and awesome with Search in DSE 6, including what you can do with search-enabled CQL queries, the performance enhancements you can expect to see, and why configuring search just got easier.

Distributed Data Show Episode 44: Thread Per Core with Jake Luciani

April 24, 2018 15:00 - 28 minutes - 26.6 MB

Jake Luciani takes us behind the scenes to explain how the principle of mechanical sympathy was applied to DataStax Enterprise 6 in the new Thread Per Core feature. DSE 6 is demonstrating 2x improvements in read/write latency compared to DSE 5.1 / open source Apache Cassandra.

Distributed Data Show Episode 43: Introducing DSE 6 with Robin Schumacher

April 17, 2018 12:00 - 35 minutes - 32.6 MB

Robin Schumacher joins the show to take us behind the scenes of the brand new DataStax Enterprise 6 release, sharing how a focus on customer value, operational simplicity and building a unified platform led to new features like Advanced Performance, NodeSync and many others.

Distributed Data Show Episode 42: Updating KillrVideo for DSE Search and Docker

April 10, 2018 15:00 - 24 minutes - 22 MB

Cedrick Lunven interviews David Gilardi and Jeff Carpenter about their recent additions to KillrVideo, a reference application for Apache Cassandra and DataStax Enterprise. David upgraded the existing search feature based on CQL to use DSE Search, while Jeff configured the desktop deployment of KillrVideo to use the official DSE Docker images.

Distributed Data Show Episode 41: Graph-based Genealogy with Dave Bechberger

April 03, 2018 15:00 - 21 minutes - 19.4 MB

Migrating from a relational application to a graph-based application is an undertaking that takes forethought, planning, and the right use case. The challenges of taking a team used to working in a relational world and transitioning them to a distributed, eventually consistent system based on a graph are many. In this episode we talk to Dave Bechberger, who is the Chief Software for Gene By Gene, a bioinformatics company specializing in genetic genealogy. Dave will share his experienc...

Distributed Data Show Episode 40: Feature flags with Cedrick Lunven

March 27, 2018 07:09 - 23 minutes - 21.8 MB

Feature flags, also known as feature toggles, are a software development pattern for enabling and disabling features within your applications at runtime. Cedrick has been working on an implementation in Java for nearly five years now. He will share his experience successfully implementing this pattern, especially in large architectures and distributed systems. We cover expected use cases, the underlying data model, and architecture concerns.
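As a minimal illustration of the pattern (not Cedrick's Java implementation), the Python sketch below keeps flags in an in-memory dictionary and checks them on every call, so a feature can be switched on or off while the program runs. `FeatureStore`, `checkout`, and the `new-checkout` flag are hypothetical names chosen for the example.

```python
from threading import Lock


class FeatureStore:
    """Hypothetical in-memory feature flag store (illustration only)."""

    def __init__(self):
        self._flags = {}
        self._lock = Lock()  # guards flag updates made while requests are running

    def create(self, name, enabled=False):
        with self._lock:
            self._flags[name] = enabled

    def enable(self, name):
        with self._lock:
            self._flags[name] = True

    def disable(self, name):
        with self._lock:
            self._flags[name] = False

    def is_enabled(self, name):
        # Unknown flags default to off, so new code paths stay dark until enabled.
        with self._lock:
            return self._flags.get(name, False)


def checkout(store):
    # The flag is evaluated at call time, so toggling it changes behavior without a redeploy.
    if store.is_enabled("new-checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


store = FeatureStore()
store.create("new-checkout")
print(checkout(store))  # legacy checkout flow
store.enable("new-checkout")
print(checkout(store))  # new checkout flow
```

In a distributed system the flag state would typically live in a shared store so every application instance sees the same toggle; the in-memory dictionary here stands in for that assumption.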

Distributed Data Show Episode 39: Cassandra on Kubernetes with Aaron Ploetz

March 20, 2018 16:19 - 9 minutes - 8.52 MB

We talk with Aaron Ploetz of Target about how they implemented Cassandra on Kubernetes and celebrate Aaron passing Jonathan Ellis to the top of the Cassandra tag in Stack Overflow. Highlights! 0:15 - David welcomes Cassandra MVP Aaron Ploetz to the show and celebrates Aaron passing Jonathan Ellis to the top of the Cassandra tag in Stack Overflow 1:12 - Aaron explains how Target is moving toward on-demand deployments of Cassandra using Kubernetes, starting on Target’s private cloud running Open...

Distributed Data Show Episode 38: Spark 3.0 and Beyond with Holden Karau

March 13, 2018 15:31 - 12 minutes - 11.2 MB

David Gilardi talks with Holden Karau of Google to mine many wonderful nuggets on the future of Spark and find out what might happen if she had a magic wand of awesomeness. Highlights! 0:15 - Welcoming Holden back to the show 0:30 - So what exactly is going to be in Spark 3? Significant updates to the SQL and Machine Learning (ML) APIs. There are missing pieces in the ML API, and adding them will cause breaking changes to existing models. One example is support for online model serving. 2:25 - The Da...

Distributed Data Show Episode 37: Cassandra at Instagram with Dikang Gu

March 05, 2018 21:16 - 23 minutes - 21.2 MB

We talk with Dikang Gu about Instagram’s experience migrating from Apache Cassandra 2.2 to 3.0 and using RocksDB as a pluggable storage engine for Cassandra.

Distributed Data Show Episode 36: Graph Invariants Galore with Denise Gosnell

February 27, 2018 16:04 - 16 minutes - 14.8 MB

We talk with Dr. Denise Gosnell, graph consultant at DataStax, about creative applications of graph technology, TOTAL DOMINATION, and some surprising use cases. Highlights! 0:15 - David welcomes Denise back to the show 0:40 - Denise recaps how to know if you have a graph problem 1:06 - Introducing some interesting graph invariant techniques that we can apply to a variety of problems 1:55 - A graph’s chromatic number is the minimum number of colors required to color the graph so that adjacent vertic...
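The chromatic number mentioned in the highlights can be computed exactly for small graphs by trying k = 1, 2, ... colors with backtracking until a proper coloring (no two adjacent vertices sharing a color) exists. This is a minimal brute-force sketch, assuming a simple undirected graph with no self-loops given as an adjacency dict; the function names are hypothetical and it runs in exponential time, so it only suits toy graphs.

```python
def is_k_colorable(adj, k):
    """Backtracking search for a proper coloring that uses at most k colors."""
    vertices = list(adj)
    colors = {}

    def assign(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(k):
            # Uncolored neighbors return None from .get(), which never equals c.
            if all(colors.get(u) != c for u in adj[v]):
                colors[v] = c
                if assign(i + 1):
                    return True
                del colors[v]  # undo and try the next color
        return False

    return assign(0)


def chromatic_number(adj):
    # Smallest k that admits a proper coloring; k = len(adj) always works for simple graphs.
    if not adj:
        return 0
    k = 1
    while not is_k_colorable(adj, k):
        k += 1
    return k


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(chromatic_number(triangle), chromatic_number(square))  # 3 2
```

Even cycles need only two colors while odd cycles and triangles need three, which is why the square and the triangle above differ.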

Distributed Data Show Episode 35: Apache Cassandra vs. the Cloud Databases with Jonathan Ellis

February 20, 2018 06:15 - 41 minutes - 37.9 MB

DataStax CTO Jonathan Ellis compares the tradeoffs, strengths, and weaknesses of Apache Cassandra vs. Amazon’s DynamoDB, Microsoft’s Azure Cosmos DB, and Google’s Cloud Spanner.

Distributed Data Show Episode 34: Spark 2.3 with Holden Karau

February 13, 2018 08:28 - 21 minutes - 19.3 MB

Patrick McFadin catches up with Holden Karau of Google to learn about new features of Spark 2.3, including Vectorized UDFs, Microbatch improvements, and Kubernetes support. Along the way, they explore whether API stability is an indicator that it’s time to make a career move.

Distributed Data Show Episode 33: Big Data and Blockchain

February 06, 2018 06:54 - 18 minutes - 16.8 MB

DuyHai Doan leads a discussion with Patrick McFadin and Jeff Carpenter on whether blockchain is a database, how blockchains and distributed databases complement each other, and what we can learn about distributed systems by looking at blockchain technology. Highlights: 1:00 - DuyHai gives a quick introduction to blockchain 3:01 - Patrick explains why blockchain != Bitcoin 3:49 - We discuss whether a blockchain can be considered a type of database 5:38 - Blockchain limitations: limited transac...

Distributed Data Show Episode 32: Search with Nick Panahi

January 30, 2018 15:55 - 23 minutes - 21.5 MB

We talk with Nick Panahi about DSE Search and the direction of the product moving forward.

Distributed Data Show Episode 31: Microservices and Data - Best Practices

January 23, 2018 07:28 - 29 minutes - 26.8 MB

The evangelist team shares some of its internal discussions and debates about the intersection of microservice architecture and the data tier.

Distributed Data Show Episode 30: Orchestration with Kathryn Erickson

January 09, 2018 16:16 - 17 minutes - 16.5 MB

We talk with Kat Erickson about the popularity of orchestration frameworks like Kubernetes and the pros and cons of using orchestration for deployment of distributed databases.

Distributed Data Show Episode 28: 2018 Predictions

January 02, 2018 08:58 - 23 minutes - 21.3 MB

The DataStax Evangelist team offers their predictions on the big technology trends to look for in 2018, including microservices and service meshes, containers and orchestration, and the emergence of higher level managed services for machine learning and AI applications.

Distributed Data Show Episode 27: 2017 In Review

December 26, 2017 19:42 - 16 minutes - 15.2 MB

The DataStax Evangelist team talks about the big tech trends of 2017 for large scale distributed systems including containers, orchestration, and graph databases.

Distributed Data Show Episode 26: Partitioning Techniques with DuyHai Doan and Patrick McFadin

December 19, 2017 07:32 - 16 minutes - 15.2 MB

DuyHai Doan and Patrick McFadin explain the two primary ways of distributing data used in computer science, hash-based partitioning and range-based partitioning, and the implications of each on operations (hint: rebalancing!).
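To make the rebalancing hint concrete, here is a minimal Python sketch that counts how many of 1,000 keys change owner when capacity is added. It assumes naive modulo hashing for the hash-based case and lexicographic split points for the range-based case; both are simplifications for illustration, not how Cassandra or any other specific system is implemented.

```python
import bisect
import hashlib


def stable_hash(key):
    # MD5 is deterministic across processes (Python's built-in hash() is salted per run).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def hash_partition(key, num_nodes):
    # Naive modulo hashing: spreads keys evenly but ties placement to num_nodes.
    return stable_hash(key) % num_nodes


def range_partition(key, split_points):
    # Partition i owns keys k with split_points[i-1] <= k < split_points[i].
    return bisect.bisect_right(split_points, key)


def moved(keys, before, after):
    # Number of keys whose owning partition changes between two layouts.
    return sum(1 for k in keys if before(k) != after(k))


keys = ["user%04d" % i for i in range(1000)]

# Hash-based: growing from 4 to 5 nodes reassigns most keys.
hash_moves = moved(keys,
                   lambda k: hash_partition(k, 4),
                   lambda k: hash_partition(k, 5))

# Range-based: splitting the last range at "user0875" only touches keys in that range.
old_splits = ["user0250", "user0500", "user0750"]
new_splits = old_splits + ["user0875"]
range_moves = moved(keys,
                    lambda k: range_partition(k, old_splits),
                    lambda k: range_partition(k, new_splits))

print(hash_moves, range_moves)
```

With modulo hashing a key stays put only when its hash leaves the same remainder mod 4 and mod 5, so roughly 80% of keys move, while the range split moves exactly the 125 keys from user0875 to user0999. Hash-partitioned systems such as Cassandra use consistent hashing on a token ring instead of modulo hashing, which limits how many keys move when nodes join.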

Distributed Data Show Episode 25: Adding a Graph-Based Recommender to KillrVideo with David Gilardi

December 12, 2017 15:38 - 23 minutes - 21.2 MB

Patrick McFadin talks with David Gilardi about the new recommendation engine recently added to the KillrVideo reference application using DSE Graph and Java DSLs (domain-specific languages).

Distributed Data Show Episode 24: Pre-Aggregation in an Eventually Consistent World with DuyHai Doan

December 05, 2017 08:29 - 20 minutes - 19.2 MB

Luke Tillman and DuyHai Doan talk about why pre-aggregation is difficult on an eventually consistent database like Apache Cassandra, and debate whether the storage engine in Cassandra should be made pluggable.

Distributed Data Show Episode 23: DataStax Managed Cloud with Darla Baker and Kiyu Gabriel

November 28, 2017 09:57 - 13 minutes - 12.4 MB

David Gilardi talks with Kiyu Gabriel and Darla Baker about the DataStax Managed Cloud and gets some details on the advantages of using a managed service for your distributed database.

Distributed Data Show Episode 22: Docker with Kathryn Erickson

November 21, 2017 08:07 - 17 minutes - 16.3 MB

Kat Erickson announces the availability of official Docker images of DataStax Enterprise and explains why you should use them.

Distributed Data Show Episode 21: Debugging Gremlin Queries with DuyHai Doan

November 14, 2017 08:19 - 24 minutes - 22.1 MB

DuyHai Doan shares his advice on debugging graph traversals using the Gremlin query language, including how to identify and fix performance bottlenecks and his thoughts on the “supernode” challenge.

Distributed Data Show Episode 20: Domain Specific Languages for Graph in Java with Stephen Mallette

November 07, 2017 15:34 - 20 minutes - 18.9 MB

David Gilardi talks with Stephen Mallette about domain specific languages for graph databases in Java, when you should use a DSL, and some of the implementation details you’ll want to know to succeed.

Distributed Data Show Episode 19: Data Modeling Horror Stories

October 31, 2017 06:16 - 15 minutes - 14.6 MB

Patrick McFadin, Luke Tillman and Jeff Carpenter sit around the campfire telling Cassandra data modeling horror stories, dad jokes and the occasional spooky noise.

Distributed Data Show Episode 18: Securing Distributed Databases with Wei Deng

October 24, 2017 04:28 - 36 minutes - 33.4 MB

Wei Deng talks about the challenges involved in securing distributed databases, the latest security features in DataStax Enterprise, and recommended techniques to help you and your company stay out of the headlines.

Distributed Data Show Episode 17: Apache Kudu Architecture Analysis with DuyHai Doan

October 17, 2017 06:32 - 18 minutes - 16.6 MB

DuyHai Doan takes us inside Apache Kudu, a data store designed to support fast access for analytics in the Hadoop ecosystem. We compare Kudu’s architecture with Apache Cassandra and discuss why effective design patterns for distributed systems show up again and again.

Distributed Data Show Episode 16: Everything is Not a Graph Problem with Denise Gosnell

October 10, 2017 06:12 - 25 minutes - 23.8 MB

Denise Gosnell talks about working with Graph and why not every problem is a graph problem.

Distributed Data Show Episode 15: Indexing Techniques with DuyHai Doan

October 03, 2017 06:30 - 18 minutes - 16.6 MB

DuyHai Doan talks about secondary indexes in Apache Cassandra, including how they work, how they differ from indexes in relational databases, the various implementations available, and when to use them.

Distributed Data Show Episode 14: Distributed Tracing with Luke Tillman

September 26, 2017 05:52 - 32 minutes - 29.7 MB

Luke Tillman talks about his dogfooding project for DataStax Academy, challenges of developing microservice applications, and how distributed tracing throughout the stack can help.

Distributed Data Show Episode 12: Trends in Data Development

September 12, 2017 16:02 - 31 minutes - 28.6 MB

Patrick McFadin and Jeff Carpenter talk about the latest trends in data development and share their thoughts on which trends you should be following and what tech it’s time to start learning, and why Jeff looks familiar.

Distributed Data Show Episode 11: Catching up on Apache Spark with Russ Spitzer

September 05, 2017 16:46 - 27 minutes - 25.2 MB

DSE Analytics badass Russ Spitzer brings us up to speed on the latest developments in Apache Spark and the implications for DataStax Enterprise Analytics.

Distributed Data Show Episode 10: Mastering Cassandra and DataStax Enterprise with Tanya Gallagher

August 29, 2017 15:05 - 16 minutes - 15.5 MB

Tanya Gallagher explains why you need a learning path for Apache Cassandra and DataStax Enterprise and gives us an inside look at how the DataStax Curriculum Engineering team stays up to speed as they maintain content like our free online courses at DataStax Academy and live, instructor-led training.

Distributed Data Show Episode 9: Avoiding Apache Cassandra Replication Mistakes with DuyHai Doan

August 22, 2017 14:55 - 22 minutes - 20.9 MB

DuyHai Doan (@doanduyhai) shares his experiences supporting DataStax customers in Europe, including some of the most common misunderstandings he sees regarding configuring Apache Cassandra and DataStax Enterprise clusters for high availability.

Distributed Data Show Episode 8: DataStax Studio Power Features with Bob Briody

August 15, 2017 14:49 - 22 minutes - 21 MB

Bob Briody (@bobbriody) explains the origins and capabilities of DataStax Studio, a powerful developer enablement tool for querying and visualizing both graph (Gremlin) and CQL data. Join us to learn about the most underrated feature of Studio and some hints on new features the Studio team has in the works.

Distributed Data Show Episode 7: Top 10 things Apache Cassandra Users Need to Know

August 08, 2017 14:53 - 27 minutes - 25.1 MB

Jeff Carpenter (@jscarp) chats with Patrick McFadin (@PatrickMcFadin) about the joys and challenges of building distributed systems using Apache Cassandra and reveals the “Top Ten” features of DataStax Enterprise that can help address the challenges.

Distributed Data Show Episode 6: Optimizing Gremlin Traversals with Sebastian Estevez

August 01, 2017 06:15 - 17 minutes - 15.8 MB

Solutions Engineer Sebastian Estevez (@syllogistic) shares his secrets for optimizing graph traversals using the Gremlin query language and why he’s excited about the future of DataStax Enterprise Graph.

Distributed Data Show Episode 5: Increasing Developer Productivity using DataStax Enterprise

July 25, 2017 05:17 - 20 minutes - 19.1 MB

After 20-plus years in IT and programming using relational databases, David Gilardi (@SonicDMG) started fresh with Apache Cassandra and DataStax Enterprise. David shares what he discovered about DataStax Enterprise that could have increased his productivity in previous applications. Learn David’s advice for conquering the learning curve so you can be successful with DataStax Enterprise in your applications.