Roaring Elephant

Episode 111 – How Public Cloud changed Big Data

October 23, 2018 08:00 - 51 minutes - 35.4 MB

No interview this time, just Dave and Jhon talking about how public cloud changed Big Data. Current news has brought this topic back to the foreground and we thought it was a good idea to give our views on this subject. Along the way, we go over the different deployment strategies for Hadoop across on premise, private and public cloud and of course, hybrid environments. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode top...

Episode 110 – Roaring News

October 16, 2018 08:00 - 38 minutes - 26.6 MB

Another week, another Big Data News episode. After going over all the event ticket giveaways that are currently going on, we have an article that goes over the basics on ETL vs ELT and have some fun with R graphs by the XKCD web comic. We finish with an in-depth article on columnar data stores and a quick shout-out to Apache Nifi. Breaking News Our thanks to our guest from H2O.ai: John Spooner Director of Solution Engineering, h2o.ai Dave: XKCD Curve Fitting in R http...

Episode 109 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 2

October 09, 2018 08:00 - 52 minutes - 36.1 MB

In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this second part, we discuss the ins and outs of good data stewardship and how companies can adopt, implement and contribute. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering htt...

Episode 108 – Roaring News

October 02, 2018 08:00 - 55 minutes - 38.7 MB

Another episode of Big Data News and not just another episode, but an episode packed and packed with items. Before we do our regular article reviews, we are doing raffles for not one, not two but three different events! And as if that was not enough, our friends from Pulsar dropped in with their big Apache top-level project announcement. So not very bite sized this time, but smack full of delicious Big Data news! Breaking News Our thanks to our guests: Solix Empower Sai Gundavelli Fou...

Episode 107 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 1

September 25, 2018 08:00 - 41 minutes - 29 MB

In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this first part, the focus is more on Mandy herself and we lay the groundwork for the second part that will go live in episode 109. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineerin...

Episode 106 – Roaring News

September 18, 2018 08:00 - 39 minutes - 27.2 MB

In this edition of Big Data News, we take the pulse of Machine learning adoption and talk about Big Data Online Learning by IBM on Coursera and by Columbia University on Edx. We round the episode off with a look at MR3 and the evil that are benchmarks. Breaking News Data Science Professional Certificate https://cognitiveclass.ai/blog/data-science-professional-certificate/ Taking the pulse of machine learning adoption https://www.zdnet.com/article/taking-the-pulse-of-machine-lear...

Episode 105 – Big Data at British Telecom with Phillip Radley

September 11, 2018 08:00 - 1 hour - 45.9 MB

In this episode we welcome Phil Radley, Chief Data Architect at BT to talk about the Big Data deployment at BT. Phillip Radley (Linkedin) Chief Data Architect @ BT https://home.bt.com/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 104 – Roaring News

September 04, 2018 08:00 - 36 minutes - 25.6 MB

In this Big Data News episode, we discuss an article with guidelines on how you should arrange your data gathering projects with the customer in mind. Dave brings a matrix of visualization products. Breaking News The five Cs: Five framing guidelines to help you think about building data products. https://www.oreilly.com/ideas/the-five-cs?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content The Chartmaker Directory http://chartmaker.visualising...

Episode 103 – Apache Pulsar version 2.0 with Matteo and Sijie from Streamlio

August 28, 2018 08:00 - 43 minutes - 30.1 MB

Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about so we cut the interview in two parts, the first of which was published in episode 101. Here is the second part with information on version 2.0 and the future of the Apache Pulsar project. Apache Pulsar logo The first subject taken on by Sijie is Pulsar Functions, followed by Matteo talking about the new schema registry and Topic Compaction. Wit...

Episode 102 – Roaring News

August 21, 2018 08:00 - 22 minutes - 15.4 MB

Big Data News at the end of the summer is not easy to find, but we did end up with three topics to discuss: from isolating GPUs in Hadoop 3.x to replicating big data (to the cloud) and quick tips from Adam's blog. Breaking News First Class GPUs support in Apache Hadoop 3.1, YARN & HDP 3.0 https://hortonworks.com/blog/gpus-support-in-apache-hadoop-3-1-yarn-hdp-3/ Replicating big datasets in the cloud https://medium.com/hotels-com-technology/replicating-big-datasets-in-the-cloud-c0...

Episode 101 – Apache Pulsar update with Matteo and Sijie from Streamlio

August 14, 2018 08:00 - 1 hour - 45.4 MB

Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about so we cut the interview in two parts and here is the first part where they introduce Apache Pulsar, go in depth on the correct deployment scaling of a stable Pulsar cluster and clarify Pulsar's "at least once vs exactly once" strategy. Part two will go in more depth on what's new. Stay tuned! Apache Pulsar logo Matteo Merli (https://www.linked...

Episode 100 – Celebrating our Centennial with the history of Hadoop

August 07, 2018 08:00 - 1 hour - 46.5 MB

100 Big Data episodes! We made it, in no small part thanks to our audience: you are who keeps us going! In this episode we celebrate our centennial by going over the history of Hadoop releases, highlighting the most noteworthy events along the way. Join us down the twisty paths of our memory lanes! The blockchain related Linkedin post Jhon liked The sources for this episode: http://hadoop.apache.org/releases.html https://en.wikipedia.org/wiki/Apache_Hadoop Debate over which com...

Episode 99 – The State of Big Data at Codemotion Amsterdam

July 31, 2018 08:00 - 45 minutes - 31.5 MB

The Roaring Elephant podcast was a guest at the Codemotion conference in Amsterdam a little while ago. This episode contains the audio of the talk we did on the State of Big Data. Our talk was definitely light on slideware, but if you want to see the video cast of our presentation, you can find it on the Codemotion YouTube channel: Codemotion Amsterdam 2018: The State of Big Data by Roaring Elephant podcast Please use the Contact Form on this blog or our twitter feed to send us your questions...

Episode 98 – Roaring news

July 24, 2018 08:00 - 22 minutes - 15.5 MB

In this episode of Big Data Roaring News, Dave laments another announcement of Hadoop's demise and exposes A.I. imposters. Jhon has articles comparing Ranger with Sentry and Apache Nifi reaching the ripe age of 1.7 with a Minifi charged practical demo to prove the point. Breaking News Hadoop’s star dims in the era of cloud object data storage and stream computing https://siliconangle.com/blog/2018/07/09/hadoops-star-dims-era-cloud-object-data-storage-stream-computing/ The rise of “p...

Episode 97 – ODPi: A new world for data governance

July 17, 2018 08:00 - 1 hour - 46.9 MB

In this episode, we welcome back John Mertic one more time. It was quite obvious that John had lots more to talk about at the end of our last interview with him. ODPi has recently reinvented itself, moving away from a strict distribution standards body towards data governance and reference specifications. ODPi logo John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ ODPi website links: https://www.odpi.org...

Episode 96 – Roaring news

July 10, 2018 08:00 - 46 minutes - 31.9 MB

In this edition of Roaring news, Ward Bekker returns to discuss what is happening in the world of Big Data. Ward brings news on GPUs in supercomputers and how Big Data could be wrong about you. Dave and Jhon found articles on Big data growth visualizations and GDPR. Breaking News 10 Charts that will change your perspective of Big Data’s Growth https://www.forbes.com/sites/louiscolumbus/2018/05/23/10-charts-that-will-change-your-perspective-of-big-datas-growth/#1ea595702926 New GPU-A...

Episode 95 – DataWorks Summit in San Jose with Ward Bekker

July 03, 2018 08:00 - 1 hour - 77.7 MB

Since neither Dave nor Jhon was able to attend the Dataworks Summit in San Jose a couple of weeks ago, we have a guest, Ward Bekker, who was happy to join and educate us on the subject. DataWorks Summit San Jose 2018 In this episode we discuss the daily keynotes and Ward's selection of sessions at the Summit, ranging from the new things in Yarn 3.0 to Materialized views in Hive and much more. Ward Bekker (Linkedin) Pre-Sales Solutions Engineer II @ Hortonworks Some of the sessions a...

Episode 94 – Roaring news

June 26, 2018 08:00 - 37 minutes - 26.1 MB

In this week's edition of Roaring Big Data News, Dave talks about modernizing Hadoop and a billion Java errors. Jhon has an article on improving your learning data sets. We finish with a discussion about the newly released HDP 2.6.5 with an emphasis on the deprecation notices and Yarn Containers. Breaking News Dave Modernizing Hadoop: Reaching the plateau of productivity https://www.zdnet.com/article/modernizing-hadoop-reaching-the-plateau-of-productivity/ 1 billion Java errors, he...

Episode 93 – Apache Kylin: Extreme OLAP Engine for Big Data

June 19, 2018 08:00 - 46 minutes - 32 MB

In this episode Apache PMC member Dong Li joins us to explain how Apache Kylin can deploy Analytical OLAP cubes in your Big Data environment. http://kylin.apache.org/ Dong Li Technical Partner & Senior Architect of Kyligence (linkedin) PMC Member of Apache Kylin http://en.kyligence.io/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 92 – Roaring news

June 12, 2018 08:00 - 46 minutes - 31.9 MB

Another week, another edition of Roaring Big Data News. This time, Dave talks about driving teens and Jhon takes a detailed look at an Eventbrite data pipeline article. Breaking News Dave Driver monitoring isn't just for teens; adults can benefit, too https://arstechnica.com/cars/2018/05/buicks-smart-driver-explains-why-my-gas-mileage-sucks-and-my-editors-doesnt/ Jhon Looking under the hood of the Eventbrite data pipeline! https://www.eventbrite.com/engineering/looking-und...

Episode 91 – ODPi is back and better than ever!

June 05, 2018 08:00 - 1 hour - 46.9 MB

In this episode, we welcome back John Mertic, director of Program Management for ODPi, R Consortium, and the Open Mainframe Project. It's been almost two years since we checked in with John and the ODPi initiative and as John mentions in the interview, a lot has changed in Hadoop... ODPi logo John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ ODPi website links: https://www.odpi.org/ https://www.odpi.or...

Episode 90 – Roaring news

May 29, 2018 08:00 - 38 minutes - 26.4 MB

In this week's Roaring News episode, Dave brings up the resilience of Apache Community open source projects and plays some Doom. Jhon has some practical Apache NIFI guides and the emergence of multi modal NoSQL databases. Breaking News DataWorks Summit Berlin video recordings are up: https://www.youtube.com/user/HadoopSummit/playlists Find Dave on his Australian road-trip: http://bit.ly/aus-nz-ibm-hwx-tour Dave DataTorrent, Stream Processing Startup, Folds (Apache Apex) h...

Episode 89 – DataWorks Summit San Jose Agenda Review

May 22, 2018 08:00 - 1 hour - 49.9 MB

With the San Jose edition of the DataWorks Summit only a month away, we go over the sessions that are available in the agenda today and offer our top picks. If you're going, or if you will be watching the replays online, we hope to guide you on your selection of sessions. DataWorks Summit San Jose 2018 And here is the dashboard we created with statistics on the San Jose sessions, for your enjoyment: https://aka.ms/DWS2018SJ The agenda is still in flux so we will be updating the dashboard r...

Episode 88 – Roaring News

May 15, 2018 08:00 - 35 minutes - 24.4 MB

Returning to our more regular schedule, we have a Roaring News episode today. Dave has articles on multi-cloud readiness, Big Data being a pariah, and Google Duplex, and Jhon came up with Synthetic data, data engineers and scientists and a Neural Network sharing cake recipes. Breaking News Dave Less than 10% ready for multi cloud http://www.cloudpro.co.uk/cloud-essentials/hybrid-cloud/7451/idc-less-than-10-of-organisations-are-ready-for-multi-cloud Tech companies distancing themse...

Episode 87 – Druid: a high-performance, column-oriented, distributed data store – part 2

May 08, 2018 08:00 - 31 minutes - 22.1 MB

This is the second part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store which has entered the Hadoop environment with the recent integration with Apache, and since Druid has been around for a while, we are grateful to FJ for spending some time with our listeners. Fangjin Yang Cofounder and CEO at Imply (linkedin) Please use the Contact Form on this blo...

Episode 86 – Druid: a high-performance, column-oriented, distributed data store – part 1

May 01, 2018 08:00 - 31 minutes - 22.2 MB

This is the first part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store which has entered the Hadoop environment with the recent integration with Apache, and since Druid has been around for a while, we are grateful to FJ for spending some time with our listeners. Fangjin Yang Cofounder and CEO at Imply (linkedin) Please use the Contact Form on this blog...

Episode 85 – DataWorks Summit Community Showcase Exhibitor Soundbites

April 24, 2018 08:00 - 30 minutes - 21 MB

This is the final part of our coverage of the DataWorks Summit Berlin 2018. Normally we would not have had an episode this week, since we were in Berlin last week, but we had lightning interviews with the vendors in the Community Expo Area and used that coverage to make this episode. So less of "Dave & Jhon" and more "ecosystem tech" snippets this time. Even though this does stray a bit from our usual content, we still hope it is useful. This was recorded in a hotel room and on the expo floo...

Episode 84 – DataWorks Summit Berlin – Day 2 Recap

April 19, 2018 19:38 - 1 hour - 62.3 MB

And with the end of day two of the 2018 DataWorks Summit in Berlin comes the end of this year's Europe Summit. But never fear, we have an extra 90 minutes of DataWorks goodness for you to consume on your way home. No real editing on this one, recording in a hotel room so audio quality may not be up to our usual standards, we hope you'll forgive us! Enjoy! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would lik...

Episode 83 – DataWorks Summit Berlin – Day 1 Recap

April 18, 2018 18:51 - 1 hour - 57.8 MB

Another year, another European Dataworks Summit, and yes, another daily recap show from Jhon and Dave. We walk through the keynotes and sessions we attended and give our thoughts and views. This should be useful for anyone who wasn't able to attend or those seeking to peek into sessions they couldn't make. No real editing on this one, recording in a hotel room so audio quality may not be up to our usual standards, we hope you'll forgive us! Enjoy! Please use the Contact Form on this blog o...

Episode 82 – DataWorks Summit Berlin 2018 Preview

April 10, 2018 08:00 - 47 minutes - 33 MB

Next week is DataWorks Summit Berlin week! Your two hosts will be in attendance and in this episode we go over the agenda and plan which sessions we want to attend and why. Peppered throughout we add further insights and experiences from previous years. Unfortunately, Dave's network was a little unstable and there are a couple audio glitches in this episode. For some session statistics or if you can use some help deciding what sessions you want to attend, you can use the dashboard we create...

Episode 81 – Roaring News

April 03, 2018 08:00 - 26 minutes - 18.3 MB

In this installment of Big Data News, we talk about the recent Facebook leak, how everybody is still doing it wrong (according to some at least) and installing Hadoop "the old-fashioned way". Also briefly covered is Elastic's X-Pack, now even more "open" than before, but still rather closed it would seem. Breaking News Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 80 – Big Data Tracking

March 27, 2018 08:00 - 51 minutes - 35.6 MB

Last June, Wolfie Christl published a 93-page report Corporate Surveillance in Everyday Life using big data tracking. Apart from the massive pdf that can be downloaded on the net, an extensive summary can be found on the Cracked Labs website. In this episode we go over the content and give our views on the subject. If you want to follow along with us while we are discussing the different points in the online article, here is the link: http://crackedlabs.org/en/corporate-surveillance Please...

Episode 79 – Roaring News

March 20, 2018 08:00 - 37 minutes - 25.9 MB

Another Big Data news episode! This time we consider the Big or small nodes conundrum based on an article that after close scrutiny doesn't really seem to test the real issue. Other things that get covered are Linkedin's Dynamometer, Cloudera's full production architecture advice for a recommendation service and a really interesting visualization technique based on blobs. Breaking News Big Data, Small Nodes https://insidebigdata.com/2018/02/22/make-sense-big-data-small-nodes/ Dynamo...

Episode 78 – Apache Trafodion transactional SQL for Hadoop (Part 2)

March 13, 2018 08:00 - 1 hour - 44.6 MB

This episode, a group of people from Esgyn join us to talk about the Apache Trafodion transactional SQL for Hadoop database engine. In this second part Rohit, Ken and Rao talk about the internal workings and best practices of Apache Trafodion. Rohit Jain Chief Technology Officer (linkedin) https://esgyn.com Ken Holt Chief Operating Officer and Co-Founder (linkedin) https://esgyn.com Rao Kakarlamudi VP of Pre-sales & Principal Architect (linkedin) https://esgyn.c...

Episode 77 – Roaring News

March 06, 2018 08:00 - 48 minutes - 33.3 MB

Another Roaring News episode where we cover recent Big Data News items we found interesting. This time we talk about Open Source turning 20 years old, the annoyances that come with Smart Homes and a big data device in Germany. Additionally, we talk about some introductory guides to AI. Breaking News 20 years of open source + who contributes http://www.zdnet.com/article/open-source-turns-20/ https://www.infoworld.com/article/3253948/open-source-tools/who-really-contributes-to-open-so...

Episode 76 – Apache Trafodion transactional SQL for Hadoop (Part 1)

February 27, 2018 08:00 - 45 minutes - 31.3 MB

This episode, a group of people from Esgyn join us to talk about the Apache Trafodion transactional SQL for Hadoop database engine. In this first part Rohit, Ken and Rao talk about the history and goals behind Apache Trafodion. Rohit Jain Chief Technology Officer (linkedin) https://esgyn.com Ken Holt Chief Operating Officer and Co-Founder (linkedin) https://esgyn.com Rao Kakarlamudi VP of Pre-sales & Principal Architect (linkedin) https://esgyn.com ...

Episode 75 – Roaring News

February 20, 2018 08:00 - 32 minutes - 22.8 MB

In this Big Data News episode, we discuss the 5-year anniversary of Hadoop Weekly, now Data Engineering Weekly, the Strava "data leak" and Twitter Wars, may the data be with you! Breaking News Five Years of Hadoop Weekly (Joe Crobak @joecrobak @Medium) https://medium.com/@joecrobak/five-years-of-hadoop-weekly-7aa8994f140b https://dataengweekly.com/ https://www.hadoopweekly.com/ How Strava's "anonymized" fitness tracking data spilled government secrets ([Nathan Ruser @Nrg8000] ...

Episode 74 – Hadoop sizing part 3: Compute sizing

February 13, 2018 08:00 - 49 minutes - 34 MB

As promised, in this final part of our Hadoop Sizing series, we round off the subject with sizing your compute and network resources. Undoubtedly we'll be revisiting this subject in the future, but the three parts of this series should give ample information on the subject for now. Hadoop Node Sizing Hadoop Data Node Density Tradeoff on HCC: https://community.hortonworks.com/content/kbentry/48878/hadoop-data-node-density-tradeoff.html Please use the Contact Form on this blog or our twitte...

Episode 73 – Roaring News

February 06, 2018 08:00 - 34 minutes - 24.3 MB

In this edition of the Roaring News series, we talk about delivering business value and how to build an analytics team. For the Machine learning aficionados, we cover the top ML algorithms and we round off with an article on sizing an Apache Flink cluster, which fits nicely with the previous and next episode! Breaking News Delivering Business Value with Big Data Projects https://www.techrepublic.com/article/4-tips-for-delivering-more-business-value-with-short-term-big-data-projects/ ...

Episode 72 – Hadoop sizing part 2: Storage sizing

January 30, 2018 08:00 - 32 minutes - 22.8 MB

In this continuation of our Hadoop Sizing series we started last September, we move on from sizing your cluster to sizing the individual server chassis or virtual machines in your cluster. We did not finish the entire story just yet, concentrating mainly on the storage component. The final part 3 where we round off the subject with sizing your compute and network resources is planned to be published in the next topic episode. Hadoop Node Sizing Hadoop Data Node Density Tradeoff post on HCC...

Episode 71 – Roaring News

January 23, 2018 08:00 - 51 minutes - 35.9 MB

This time Dave has prepared some articles for us to discuss. First we talk about something new on our radar: Apache Trafodion, which is a transactional SQL on Hadoop. Next we spend some time on Artificial ignorance and we round off with some IoT predictions by IBM. Breaking News Apache Trafodion - http://trafodion.apache.org/ goes TLP after 2.5 years… http://incubator.apache.org/projects/trafodion.html https://www.slideshare.net/mKrishnaKumar1/trafodion-an-enterprise-class-sql-based-on...

Episode 70 – 10 Facts about Hadoop, five years later

January 16, 2018 08:00 - 47 minutes - 32.5 MB

In this trip down memory lane, we go over an article from five years ago and discuss how Hadoop and Big Data have changed since then, or have they...? Time Machine Data tunnel Hadoop is 10 years old. Let's look back at public opinion just five years ago. (https://www.developer.com/db/10-facts-about-hadoop.html) Import/Export Data to and from HDFS Data Compression in HDFS Transformation in Hadoop Achieve Common Task Combining Large Volume Data Ways to Analyze High Volume Data Debugging in...

Episode 69 – Roaring News

January 09, 2018 08:00 - 34 minutes - 24.1 MB

The first news episode of 2018 has landed. We discuss the new Big Data architecture at CERN, a curious case of a broken benchmark and the future plans of the Apache Hadoop project. Breaking News The Architecture of the Next CERN Accelerator Logging Service https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html The Curious Case of the Broken Benchmark: Revisiting Apache Flink® vs. Databricks Runtime https://data-artisans.com/blog/...

Episode 68 – Future Predictions

January 02, 2018 08:00 - 48 minutes - 33.4 MB

Welcome to 2018! And welcome to our 110% fact based prediction show for 2018. As you may expect from your two hosts, everything in this episode is 110% sure to become reality in the next twelve months. And since 110% is not actually possible, our predictions might also be just a little bit off? But we have 365 days to bask in the glory of our predictions before we, as usual, are shot back down to earth. Nancy comic Dave The year of cloud first and hybrid cloud Many organisations w...

Episode 67 – Roaring News

December 26, 2017 08:00 - 43 minutes - 29.9 MB

It's here: the final news episode for 2017! We finish off the year talking about Apache Pulsar, Hadoop Delegation tokens (aka Kerberos), the Hadoop on Container hype (or is it?), Apache Hadoop 3.0 release and all you need to know about Data Prepping (or at least all we can tell you in about 10 minutes, that is). Breaking News Jhon Comparing Pulsar and Kafka: unified queuing and streaming https://streaml.io/blog/pulsar-streaming-queuing/ Hadoop Delegation Tokens Explained http:/...

Episode 66 – Past Predictions

December 19, 2017 08:00 - 37 minutes - 25.9 MB

It's the time of the year again where you can call us out on being totally rubbish at predicting much of anything, or can we..? Listen to the episode and find out! In any case, we unabashedly will be recording a new "future predictions" show in a couple of weeks so if you have any predictions you want us to consider, send them to us by tweet or email! Bart Simpson - Being Right Sucks Predictions: Fragmentation of ecosystem Scale of data-breaches get larger and more IOT focused Chat-bot...

Episode 65 – Roaring news

December 12, 2017 08:00 - 36 minutes - 25.6 MB

It's another Roaring News episode. Today Jhon talks about machine learning projects for beginners, data visualization and the new neural network hotness which is transfer learning. Dave covers the Dataworks Summit call for papers and Apache Impala reaching Top Level Project status. Breaking News Jhon 8 Fun Machine Learning Projects for Beginners https://elitedatascience.com/machine-learning-projects-for-beginners Data is Beautiful https://www.reddit.com/r/dataisbeautiful/ ht...

Episode 64 – Talking Apache Pulsar with Matteo and Sijie from Streamlio

December 05, 2017 08:00 - 1 hour - 57.1 MB

A while ago, the all-knowing oracle that is twitter pointed out that we really did not do justice to the Apache Pulsar project when we covered it in our Roaring News episode. The good people at Streamlio reached out to us and here is the 80+ minutes long discussion we had with Matteo Merli and Sijie Guo, going in depth on the merits and technical details, setting the Roaring Pulsar record straight! Apache Pulsar logo Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder...

Episode 63 – Roaring News

November 28, 2017 08:00 - 31 minutes - 21.7 MB

It's another news episode folks. This time Dave and Jhon talk about extracting telemetry from a PS3 steering wheel and pedal set, IBM sun-setting BigInsights and 6 things a budding Data Scientist should be aware of. Breaking News Dave Taking KSQL for a Spin Using Real-time Device Data https://www.rittmanmead.com/blog/2017/11/taking-ksql-for-a-spin-using-real-time-device-data/ Jhon IBM leads BigInsights for Hadoop out behind barn. Shots heard https://www.theregister.co.uk/2...

Episode 62 – Second Year Anniversary

November 21, 2017 08:00 - 1 hour - 60.9 MB

Are there really two years' worth of Roaring Elephant podcasts out there? Well, since this is our second anniversary party, it must be! Join some of the guests we had on the podcast this year to reminisce about the months gone by. Due to the drop-in drop-out nature, this episode is a little rough but we hope you can enjoy being part of our little party! Discussion topics ranged from what our guests have been up to, Apache Kafka, Dremio, the effects of GDPR on the industry and how our guests s...
