Distributed Data Show
154 episodes - English - Latest episode: almost 4 years ago - ★★★★★ - 15 ratings
The Distributed Data Podcast is your weekly source for the latest news and technical expertise to help you succeed in building large-scale distributed systems. Brought to you by the Developer Advocate team, we go in-depth with DataStax engineers and special guests from the broader data community. New episodes each Tuesday.
Episodes
Distributed Data Show Episode 56: Multi-Cloud: The What & Why
July 17, 2018 15:00 - 7 minutes - 7.24 MB
Today we’re tackling the confusion that may exist around the various *-clouds. We have multi-cloud, hybrid-cloud, private-cloud, and so many more. After we narrow the multi-cloud description down to its key points, the conversation turns to why we’d want something multi-cloud to begin with! It’s the beginning of a series of conversations we’re going to have around the multi-cloud space and all of the complexities, advantages, lock-in, and related topics around this new foray into distributed ...
Distributed Data Show Episode 55: Enterprise Transformation with Chelsea Navo
July 10, 2018 15:00 - 27 minutes - 25 MB
We talk with DataStax Vanguard Lead Chelsea Navo about how the Vanguard team and DataStax help enterprises transform to meet the challenges posed by disruptive technology and competitors. See omnystudio.com/listener for privacy information.
Distributed Data Show Episode 54: Graph Processing Trends With Jonathan Lacefield
July 03, 2018 15:00 - 35 minutes - 32.8 MB
We sit down with Jonathan Lacefield to discuss the latest trends in graph processing from DataStax’s perspective. We talk through how the challenges associated with graph are evolving, current tips and tricks, the tools we’re seeing, and more.
Distributed Data Show Episode 53: Disruptive Innovation with Matthias Brocheler
June 26, 2018 15:00 - 20 minutes - 19 MB
There’s more to the lean product strategy than just building skateboards. Matthias Brocheler joins guest host Kathryn Erickson to discuss Disruptive Innovation. We’ll discuss and provide practical examples of how to explore an idea you have or a problem you want to solve, how to size your potential market, and when to start writing code.
Distributed Data Show Episode 52: Benchmarking with Nitsan Wakart
June 19, 2018 15:00 - 31 minutes - 29 MB
Are all benchmarks lies? Nitsan Wakart joins the show to explain the discipline of performance engineering, the ingredients of an effective benchmark, why you should always create custom benchmarks based on your expected workload, and the benchmarking effort we undertook for DSE 6.
Distributed Data Show Episode 51: Graph Tips and Tricks with Ted Wilmes
June 12, 2018 15:00 - 22 minutes - 21 MB
David and Jeff talk with Ted Wilmes from Expero about best practices regarding DSE Graph and the importance of proper data modeling.
Distributed Data Show Episode 50: Think Like A Support Engineer With Sequoyha Pelletier
June 05, 2018 15:00 - 19 minutes - 17.8 MB
We talk with support engineer Sequoyha Pelletier about the support team and their training gauntlet, and get a bunch of tips and tricks to use when troubleshooting clusters and submitting support tickets. Highlights! 0:16 - David introduces Sequoyha Pelletier to the Distributed Data Show 0:42 - Sequoyha gives an overview of the support team experience 1:35 - Common issues support handles regularly 4:02 - Data modeling challenges and the largest partition contest 5:49 - DSE clients should engage ...
Distributed Data Show Episode 49: Bulk Loading with Brian Hess
May 29, 2018 17:07 - 25 minutes - 23.3 MB
Brian Hess joins the show to explain why the bulk loader is a vital tool for a distributed database, the history of bulk loaders for Apache Cassandra, and the virtues of the new DSBulk.
Distributed Data Show Episode 48: DataStax Drivers with Chris Splinter
May 22, 2018 15:00 - 37 minutes - 34 MB
We talk with Chris Splinter, DataStax product manager for developer solutions, about new DSE 6 driver features and peer into the bright future of driver development.
Distributed Data Show Episode 47: NodeSync with Sylvain Lebresne
May 15, 2018 15:00 - 13 minutes - 12.1 MB
Sylvain Lebresne shares what’s new and awesome with NodeSync in DSE 6, including the simplicity it brings to operations, improvements in repair performance, and how well it is integrated with DSE through CQL and OpsCenter.
Distributed Data Show Episode 46: DSE 6 Analytics with Brian Hess
May 08, 2018 15:00 - 31 minutes - 28.8 MB
Distributed Data Show Episode 46: DSE 6 Analytics with Brian Hess by DataStax Developers
Distributed Data Show Episode 45: Search in DSE 6 with Nick Panahi
May 01, 2018 15:00 - 25 minutes - 22.9 MB
Nick Panahi shares what’s new and awesome with Search in DSE 6, including what you can do with search-enabled CQL queries, the performance enhancements you can expect to see, and why configuring search just got easier.
Distributed Data Show Episode 44: Thread Per Core with Jake Luciani
April 24, 2018 15:00 - 28 minutes - 26.6 MB
Jake Luciani takes us behind the scenes to explain how the principle of mechanical sympathy was applied to DataStax Enterprise 6 in the new Thread Per Core feature. DSE 6 is demonstrating 2x improvements in read/write latency compared to DSE 5.1 / open source Apache Cassandra.
Distributed Data Show Episode 43: Introducing DSE 6 with Robin Schumacher
April 17, 2018 12:00 - 35 minutes - 32.6 MB
Robin Schumacher joins the show to take us behind the scenes of the brand new DataStax Enterprise 6 release, sharing how a focus on customer value, operational simplicity and building a unified platform led to new features like Advanced Performance, NodeSync and many others.
Distributed Data Show Episode 42: Updating KillrVideo for DSE Search and Docker
April 10, 2018 15:00 - 24 minutes - 22 MB
Cedrick Lunven interviews David Gilardi and Jeff Carpenter about their recent additions to KillrVideo, a reference application for Apache Cassandra and DataStax Enterprise. David upgraded the existing search feature based on CQL to use DSE Search, while Jeff configured the desktop deployment of KillrVideo to use the official DSE Docker images.
Distributed Data Show Episode 41: Graph-based Genealogy with Dave Bechberger
April 03, 2018 15:00 - 21 minutes - 19.4 MB
Migrating from a relational application to a graph-based application is an undertaking that takes forethought, planning and the right use case. The challenges of taking a team used to working in a relational world and transitioning them to a distributed, eventually consistent system based on a graph are many. In this episode we talk to Dave Bechberger, who is the Chief Software for Gene By Gene, a bioinformatics company specializing in genetic genealogy. Dave will share his experienc...
Distributed Data Show Episode 40: Feature flags with Cedrick Lunven
March 27, 2018 07:09 - 23 minutes - 21.8 MB
Feature flags, also known as feature toggles, are a software development pattern that lets you enable and disable features within your applications at runtime. Cedrick has been working on an implementation in Java for nearly five years now. He shares his experience successfully implementing this pattern, especially in large architectures and distributed systems. We cover expected use cases, the underlying data model, and architecture concerns.
Distributed Data Show Episode 39: Cassandra on Kubernetes with Aaron Ploetz
March 20, 2018 16:19 - 9 minutes - 8.52 MB
We talk with Aaron Ploetz of Target about how they implemented Cassandra on Kubernetes and celebrate Aaron passing Jonathan Ellis to the top of the Cassandra tag on Stack Overflow. Highlights! 0:15 - David welcomes Cassandra MVP Aaron Ploetz to the show and celebrates Aaron passing Jonathan Ellis to the top of the Cassandra tag on Stack Overflow 1:12 - Aaron explains how Target is moving toward on-demand deployments of Cassandra using Kubernetes, starting on Target’s private cloud running Open...
Distributed Data Show Episode 38: Spark 3.0 and Beyond with Holden Karau
March 13, 2018 15:31 - 12 minutes - 11.2 MB
David Gilardi talks with Holden Karau of Google to mine many wonderful nuggets on the future of Spark and find out what might happen if she had a magic wand of awesomeness. Highlights! 0:15 - Welcoming Holden back to the show 0:30 - So what exactly is going to be in Spark 3? Significant updates to the SQL and Machine Learning (ML) APIs. There are missing pieces in the ML API, and adding them will cause breaking changes to existing models. One example is support for online model serving. 2:25 - The Da...
Distributed Data Show Episode 37: Cassandra at Instagram with Dikang Gu
March 05, 2018 21:16 - 23 minutes - 21.2 MB
We talk with Dikang Gu about Instagram’s experience migrating from Apache Cassandra 2.2 to 3.0 and using RocksDB as a pluggable storage engine for Cassandra.
Distributed Data Show Episode 36: Graph Invariants Galore with Denise Gosnell
February 27, 2018 16:04 - 16 minutes - 14.8 MB
We talk with Dr. Denise Gosnell, graph consultant at DataStax, about creative applications of graph technology, TOTAL DOMINATION, and some surprising use cases. Highlights! 0:15 - David welcomes Denise back to the show 0:40 - Denise recaps how to know if you have a graph problem 1:06 - Introducing some interesting graph invariant techniques that we can apply to a variety of problems 1:55 - The chromatic number refers to the minimum number of colors required to color a graph so that adjacent vertic...
Distributed Data Show Episode 35: Apache Cassandra vs. the Cloud Databases with Jonathan Ellis
February 20, 2018 06:15 - 41 minutes - 37.9 MB
DataStax CTO Jonathan Ellis compares the tradeoffs, strengths, and weaknesses of Apache Cassandra vs. Amazon’s DynamoDB, Microsoft’s Azure Cosmos DB, and Google’s Cloud Spanner.
Distributed Data Show Episode 34: Spark 2.3 with Holden Karau
February 13, 2018 08:28 - 21 minutes - 19.3 MB
Patrick McFadin catches up with Holden Karau of Google to learn about new features of Spark 2.3, including Vectorized UDFs, Microbatch improvements, and Kubernetes support. Along the way, they explore whether API stability is an indicator that it’s time to make a career move.
Distributed Data Show Episode 33: Big Data and Blockchain
February 06, 2018 06:54 - 18 minutes - 16.8 MB
DuyHai Doan leads a discussion with Patrick McFadin and Jeff Carpenter on whether blockchain is a database, how blockchains and distributed databases complement each other, and what we can learn about distributed systems by looking at blockchain technology. Highlights: 1:00 - DuyHai gives a quick introduction to blockchain 3:01 - Patrick explains why blockchain != Bitcoin 3:49 - We discuss whether a blockchain can be considered a type of database 5:38 - Blockchain limitations: limited transac...
Distributed Data Show Episode 32: Search with Nick Panahi
January 30, 2018 15:55 - 23 minutes - 21.5 MB
We talk with Nick Panahi about DSE Search and the direction of the product moving forward.
Distributed Data Show Episode 31: Microservices and Data - Best Practices
January 23, 2018 07:28 - 29 minutes - 26.8 MB
The evangelist team shares some of its internal discussions and debates about the intersection of microservice architecture and the data tier.
Distributed Data Show Episode 30: Orchestration with Kathryn Erickson
January 09, 2018 16:16 - 17 minutes - 16.5 MB
We talk with Kat Erickson about the popularity of orchestration frameworks like Kubernetes and the pros and cons of using orchestration for deployment of distributed databases.
Distributed Data Show Episode 28: 2018 Predictions
January 02, 2018 08:58 - 23 minutes - 21.3 MB
The DataStax Evangelist team offers their predictions on the big technology trends to look for in 2018, including microservices and service meshes, containers and orchestration, and the emergence of higher level managed services for machine learning and AI applications.
Distributed Data Show Episode 27: 2017 In Review
December 26, 2017 19:42 - 16 minutes - 15.2 MB
The DataStax Evangelist team talks about the big tech trends of 2017 for large scale distributed systems including containers, orchestration, and graph databases.
Distributed Data Show Episode 26: Partitioning Techniques with DuyHai Doan and Patrick McFadin
December 19, 2017 07:32 - 16 minutes - 15.2 MB
DuyHai Doan and Patrick McFadin explain the two primary ways of distributing data used in computer science, hash-based partitioning and range-based partitioning, and the implications of each for operations (hint: rebalancing!).
Distributed Data Show Episode 25: Adding a Graph-Based Recommender to KillrVideo with David Gilardi
December 12, 2017 15:38 - 23 minutes - 21.2 MB
Patrick McFadin talks with David Gilardi about the new recommendation engine recently added to the KillrVideo reference application using DSE Graph and Java DSLs (domain-specific languages).
Distributed Data Show Episode 24: Pre-Aggregation in an Eventually Consistent World with DuyHai Doan
December 05, 2017 08:29 - 20 minutes - 19.2 MB
Luke Tillman and DuyHai Doan talk about why pre-aggregation is difficult on an eventually consistent database like Apache Cassandra, and debate whether the storage engine in Cassandra should be made pluggable.
Distributed Data Show Episode 23: DataStax Managed Cloud with Darla Baker and Kiyu Gabriel
November 28, 2017 09:57 - 13 minutes - 12.4 MB
David Gilardi talks with Kiyu Gabriel and Darla Baker about the DataStax Managed Cloud and gets some details on the advantages of using a managed service for your distributed database.
Distributed Data Show Episode 22: Docker with Kathryn Erickson
November 21, 2017 08:07 - 17 minutes - 16.3 MB
Kat Erickson announces the availability of official Docker images of DataStax Enterprise and explains why you should use them.
Distributed Data Show Episode 21: Debugging Gremlin Queries with DuyHai Doan
November 14, 2017 08:19 - 24 minutes - 22.1 MB
DuyHai Doan shares his advice on debugging graph traversals using the Gremlin query language, including how to identify and fix performance bottlenecks and his thoughts on the “supernode” challenge.
Distributed Data Show Episode 20: Domain Specific Languages for Graph in Java with Stephen Mallette
November 07, 2017 15:34 - 20 minutes - 18.9 MB
David Gilardi talks with Stephen Mallette about domain specific languages for graph databases in Java, when you should use a DSL, and some of the implementation details you’ll want to know to succeed.
Distributed Data Show Episode 19: Data Modeling Horror Stories
October 31, 2017 06:16 - 15 minutes - 14.6 MB
Patrick McFadin, Luke Tillman and Jeff Carpenter sit around the campfire telling Cassandra data modeling horror stories, dad jokes and the occasional spooky noise.
Distributed Data Show Episode 18: Securing Distributed Databases with Wei Deng
October 24, 2017 04:28 - 36 minutes - 33.4 MB
Wei Deng talks about the challenges involved in securing distributed databases, the latest security features in DataStax Enterprise, and recommended techniques to help you and your company stay out of the headlines.
Distributed Data Show Episode 17: Apache Kudu Architecture Analysis with DuyHai Doan
October 17, 2017 06:32 - 18 minutes - 16.6 MB
DuyHai Doan takes us inside Apache Kudu, a data store designed to support fast access for analytics in the Hadoop ecosystem. We compare Kudu’s architecture with Apache Cassandra and discuss why effective design patterns for distributed systems show up again and again.
Distributed Data Show Episode 16: Everything is Not a Graph Problem with Denise Gosnell
October 10, 2017 06:12 - 25 minutes - 23.8 MB
Denise Gosnell talks about working with graphs and why not every problem is a graph problem.
Distributed Data Show Episode 15: Indexing Techniques with DuyHai Doan
October 03, 2017 06:30 - 18 minutes - 16.6 MB
DuyHai Doan talks about secondary indexes in Apache Cassandra, including how they work, how they differ from indexes in relational databases, the various implementations available, and when to use them.
Distributed Data Show Episode 14: Distributed Tracing with Luke Tillman
September 26, 2017 05:52 - 32 minutes - 29.7 MB
Luke Tillman talks about his dogfooding project for DataStax Academy, challenges of developing microservice applications, and how distributed tracing throughout the stack can help.
Distributed Data Show Episode 12: Trends in Data Development
September 12, 2017 16:02 - 31 minutes - 28.6 MB
Patrick McFadin and Jeff Carpenter talk about the latest trends in data development and share their thoughts on which trends you should be following and what tech it’s time to start learning, and why Jeff looks familiar.
Distributed Data Show Episode 11: Catching up on Apache Spark with Russ Spitzer
September 05, 2017 16:46 - 27 minutes - 25.2 MB
DSE Analytics badass Russ Spitzer brings us up to speed on the latest developments in Apache Spark and the implications for DataStax Enterprise Analytics.
Distributed Data Show Episode 10: Mastering Cassandra and DataStax Enterprise with Tanya Gallagher
August 29, 2017 15:05 - 16 minutes - 15.5 MB
Tanya Gallagher explains why you need a learning path for Apache Cassandra and DataStax Enterprise and gives us an inside look at how the DataStax Curriculum Engineering team stays up to speed as they maintain content like our free online courses at DataStax Academy and live, instructor-led training.
Distributed Data Show Episode 9: Avoiding Apache Cassandra Replication Mistakes with DuyHai Doan
August 22, 2017 14:55 - 22 minutes - 20.9 MB
DuyHai Doan (@doanduyhai) shares his experiences supporting DataStax customers in Europe, including some of the most common misunderstandings he sees regarding configuring Apache Cassandra and DataStax Enterprise clusters for high availability.
Distributed Data Show Episode 8: DataStax Studio Power Features with Bob Briody
August 15, 2017 14:49 - 22 minutes - 21 MB
Bob Briody (@bobbriody) explains the origins and capabilities of DataStax Studio, a powerful developer enablement tool for querying and visualizing both graph (Gremlin) and CQL data. Join us to learn about the most underrated feature of Studio and some hints on new features the Studio team has in the works.
Distributed Data Show Episode 7: Top 10 things Apache Cassandra Users Need to Know
August 08, 2017 14:53 - 27 minutes - 25.1 MB
Jeff Carpenter (@jscarp) chats with Patrick McFadin (@PatrickMcFadin) about the joys and challenges of building distributed systems using Apache Cassandra and reveals the “Top Ten” features of DataStax Enterprise that can help address the challenges.
Distributed Data Show Episode 6: Optimizing Gremlin Traversals with Sebastian Estevez
August 01, 2017 06:15 - 17 minutes - 15.8 MB
Solutions Engineer Sebastian Estevez (@syllogistic) shares his secrets for optimizing graph traversals using the Gremlin query language and why he’s excited about the future of DataStax Enterprise Graph.
Distributed Data Show Episode 5: Increasing Developer Productivity using DataStax Enterprise
July 25, 2017 05:17 - 20 minutes - 19.1 MB
After 20-plus years in IT and programming using relational databases, David Gilardi (@SonicDMG) started fresh with Apache Cassandra and DataStax Enterprise. David shares what he discovered about DataStax Enterprise that could have increased his productivity in previous applications. Learn David’s advice for conquering the learning curve so you can be successful with DataStax Enterprise in your applications.