
Roaring Elephant

411 episodes - English - Latest episode: 6 days ago - ★★★★★ - 9 ratings

Bite-Sized Big Tech

Tech News, News, Technology

Episodes

Episode 61 – Roaring News

November 14, 2017 08:00 - 31 minutes - 21.6 MB

In this episode of Roaring News, we talk about the seemingly inevitable blockchain, fraud detection in banking and a celebration of the DevOps engineer. Dave: The continued journey to understand enterprise usage of blockchain http://fortune.com/2017/10/17/blockchain-berners-lee/ https://www.hyperledger.org/blog/2017/10/17/qa-does-blockchain-alleviate-security-concerns-or-create-new-challenges Jhon: StreamING Machine Learning Models: How ING Adds Fraud Detection Models at Ru...

Episode 60 – Big Data Roles: Recruiting and hiring

November 07, 2017 08:00 - 47 minutes - 32.7 MB

In this entry in our "Roles in Big Data" series, we talk to Chuck Waygood, Global Director of Talent Acquisition at Hortonworks. Chuck has been in this space since 2013, and in this episode he talks about his experiences, what recruiters are looking for, how you can attract that perfect candidate and what you can do to improve your chances of landing that great career in Big Data. Chuck Waygood Director, Global Talent Acquisition at Hortonworks https://www.linkedin.com/in/chuckwaygood/ P...

Episode 59 – Roaring News

October 31, 2017 08:00 - 35 minutes - 24.5 MB

It's another installment of Roaring News! This time, we talk about the ensemble recommendation system allegedly used by Spotify, not-so-new kid-on-the-block-after-all Apache Pulsar, the ever so popular "Hadoop is dead" and end with a quick shout-out to the Tokyo Data Platform Conference. Dave Apache Pulsar https://pulsar.apache.org/ https://www.slideshare.net/ydn/october-2016-hug-pulsar-a-highly-scalable-low-latency-pubsub-messaging-system https://streaml.io/blog/apache-pulsar-ge...

Episode 58 – Big Data Roles: The data scientist

October 24, 2017 08:00 - 1 hour - 47.8 MB

In this entry in our long-running "roles in Big Data" series, we talk to Eduardo Barbaro, a Sr. Data Scientist at Mobiquity. To say that the data scientist is a pivotal person in any big data or advanced analytics project is not an exaggeration and we are really grateful to Eduardo for spending some time on the podcast to give us his views and recount his experiences.       Eduardo Barbaro Sr. Data Scientist at Mobiquity, Inc - Europe https://www.linkedin.com/in/edbarbaro/ Please use the ...

Episode 57 – Dataworks Summit Sydney recap by Dave – Part 2

October 17, 2017 08:00 - 57 minutes - 39.8 MB

In this second part of Dave's tale of the Sydney Dataworks Summit, the subjects range from Apache Metron and a talk by Telstra, Australia's leading mobile provider, to YARN 3.0 and Apache Zeppelin. Solving Cyber at Scale - Simon Ball https://www.slideshare.net/Hadoop_Summit/solving-cyber-at-scale-80187657 Implementing greenfield Apache Metron SOC – Telstra - Saad Ayad Slides not available :( Yarn past present future - Rohith Sharma KS - Sunil G https://www.slideshare.net/Hadoop_Su...

Episode 56 – Dataworks Summit Sydney recap by Dave – Part 1

October 10, 2017 08:00 - 1 hour - 42.8 MB

Dave attended the Dataworks Summit in Sydney, and we go over the different sessions he attended there. In this first of two episodes, the focus lies on the new goodness that Hadoop 3.0 will bring us soon. Hadoop 3.0 – Sanjay Radia https://www.slideshare.net/Hadoop_Summit/apache-hadoop-30-community-update-79999467 JDK 8+ Port number changes Class-path isolation HDFS – 3 node Namenode, intra data node balancer for balanced storage within a node, erasure coding 10TB node recov...

Episode 55 – Roaring News

October 03, 2017 08:00 - 46 minutes - 31.8 MB

In this edition of Roaring News, Dave covers the release of the Apache Metron-based HCP 1.3 and an HBase vs Cassandra benchmark battle. Jhon talks about some Spark tuning and scheduler inner workings and finishes with a tale of a compliance kettle... Dave HCP 1.3 release https://hortonworks.com/blog/hortonworks-cybersecurity-platform-big-data-cybersecurity-solution/ https://docs.hortonworks.com/HDPDocuments/HCP1/HCP-1.3.0/bk_release-notes/content/ch01.html Battle of the Apache NoSQL...

Episode 54 – Hadoop sizing part 1: One big cluster, or many small ones

September 26, 2017 08:00 - 52 minutes - 36.4 MB

In this episode, we take an online article by Chris Riccomini and give our take on the debate over having a single big cluster versus many smaller ones. If you are architecting a Hadoop cluster and are faced with this choice, this episode should give you a lot of information on the subject. One big cluster, or many small ones? by Chris Riccomini https://medium.com/@criccomini/one-big-cluster-or-many-small-ones-5f3126ed7045 Please use the Contact Form on this blog or our Twitter feed to s...

Episode 53 – Roaring News

September 19, 2017 08:00 - 26 minutes - 18.2 MB

In this episode of Roaring News, Dave brings up the newly released HDP 2.6.2, which incorporates IBM's move from their proprietary IOP to HDP. Jhon brings an update on the MLeap story for productionizing your Spark model. We finish off discussing the newly released Apache Atlas version 0.8.1. Dave HDP and IBM HDP 2.6.2 https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_release-notes/content/ch_relnotes.html Jhon MLeap: Providing (Near) Real-time Data Science with Apach...

Episode 52 – Big data in travel

September 12, 2017 08:00 - 1 hour - 52.5 MB

Over the summer, when your hosts enjoyed a well-earned vacation (well, we like to think we earned it) we could not stop being Big-Data Nerds and in this episode we talk about the Hadoop opportunities we spotted. During this episode you will hear us talk about how Big data does, could or should improve many aspects of vacationing. We talk about review sites, preventive maintenance on rental cars, IoT tracking beer levels, the social media privacy issues and much, much more. We really tried t...

Episode 51 – Roaring News

September 05, 2017 08:00 - 38 minutes - 26.9 MB

In this news episode (our very first one), Dave is all-out on Artificial Intelligence and its use in naming "stuff"; for some subjects it apparently works very well, for other subjects not so much... Jhon brings a blog on deploying new Kerberos functionality and a tutorial on Kafka Connect for those who have not really looked at it. The ensuing discussion on NiFi vs Kafka is purely coincidental. Dave AI naming Paint (May 2017) http://lewisandquark.tumblr.com/post/160776374467/...

Episode 50 – Alan Gates Wrap Up (Part 4)

August 29, 2017 08:00 - 34 minutes - 24 MB

This is the final part of our long interview with Alan Gates. In this part, Alan talks more about ODPI, Cloud First, Apache Flink, Apache Pig and we finish off with a little bit of Philosophy. A big thank you to Alan for sharing his pearls of wisdom with us! [Image from Linux.com] 00:00 Recent events Our vacation is almost over but this episode too was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about 02:10 Alan Gates Wrap Up (Part 4) 34:37 End P...

Episode 49 – Thomas Henson on IoT architectures

August 15, 2017 08:00 - 52 minutes - 36.5 MB

In this episode we have an interview with Thomas Henson for you. Thomas is an Isilon Data Lake Evangelist at Dell/EMC, but in this episode he talks about IoT architectures, related to his talk at the DataWorks Summit San Jose 2017. 00:00 Recent events Since both Dave and Jhon are still on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 02:14 Thomas Henson on IoT architectures You can find Thomas Henson's blog on Big D...

Episode 48 – Alan Gates on the DataWorks Summit (Part 3)

August 01, 2017 08:00 - 35 minutes - 24.7 MB

In this third part of our interview with Alan Gates, PMC member for various Apache projects including Apache Hive and co-founder of Hortonworks, we talk about his sessions at the DataWorks Summits and about the Summits in general. [Image taken from Linux.com] 00:00 Recent events Since both Dave and Jhon are still on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 02:38 Alan Gates on the DataWorks Summit (Part 3) Since th...

Episode 47 – Deep dive into Kudu

July 18, 2017 08:00 - 1 hour - 49.4 MB

We've been interested in Kudu for a while, but it's something that neither of your hosts has been exposed to very much. Apache Kudu went from incubation to top-level project in record time, and now seemed like the right time to dig into this piece of antelope. Mike Percy, PMC member and committer on the Apache Kudu project and software engineer at Cloudera, was only too glad to come on the podcast and answer all our questions! 00:00 Recent events Since both Dave and Jhon are currently on ...

Episode 46 – San Jose DataWorks Summit 2017 in Review

July 04, 2017 08:00 - 1 hour - 79.2 MB

Dave joined our free ticket raffle winner Pitt at the DataWorks Summit in sunny San Jose last month and they came back with almost two hours' worth of exciting stories! Thanks again to Hortonworks for providing the free ticket to our raffle that Pitt won. San Jose DataWorks Summit 2017 in Review 00:01:20 Keynotes 00:31:20 Day 1 sessions 01:10:00 Day 2&3 sessions 01:54:55 End Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest futu...

Episode 45 – Modern Day Airships

June 20, 2017 08:00 - 1 hour - 47.6 MB

Breaking up our series of insights from Alan Gates, we switch gears to another really interesting topic (and guest!) where we talk about the new visualisation features coming in Apache Zeppelin and we get it straight from the brains behind the new code, Bernhard Walter.   Recent events 03:03 Jhon: Churn Prediction with Apache Spark Machine Learning by Carol McDonald (@caroljmcdonald) @mapr https://mapr.com/blog/churn-prediction-sparkml/ 12:12 Dave: HDFS Maintenance State by Manoj ...

Episode 44 – Suicidal Spark

June 06, 2017 08:00 - 1 hour - 49.2 MB

In this episode we're joined by Youen Chéné and Aurélien Vandel from Saagie who talk to us about their experiences deploying Spark Streaming workloads in production (based on their Dataworks Summit talk), what worked well, what didn't and what they'd recommend you might want to do if you follow in their footsteps.   Enjoy! 00:00 Recent events Dave Big Data Videos http://www.kdnuggets.com/2017/05/top-recent-big-data-videos-youtube.html https://www.youtube.com/watch?v=RQ9czRAdmMs ...

Episode 43 – Alan Gates talks Hive (Part 2)

May 23, 2017 08:00 - 54 minutes - 37.6 MB

In this episode we discuss the maturity of the Hadoop ecosystem and how hard it currently still is to get the value out of data. In the main section, we will have the second part of the interview with Alan Gates, this time talking about the place Hive has in the ecosystem. We still have more from Alan so stay tuned for more Hive goodness in future episodes! 00:00 Recent events Dave PredictionIO 0.11 release https://github.com/apache/incubator-predictionio/blob/v0.11.0-incubating/RE...

Episode 42 – Alan Gates talks Hive (Part 1)

May 09, 2017 08:00 - 1 hour - 44.3 MB

Welcome to the "life, the universe and everything" episode of the Roaring Elephant Podcast. We talk some news and this episode got a little bit ranty... Apologies for that; to balance it out we have a chat with Alan Gates talking about Hive for you. There was so much Alan Gates goodness, we've split it over a few sessions and here's part one... 07:00 Recent events Dave Metron graduates to Apache TLP status https://blogs.apache.org/foundation/entry/apache-software-foundation-announ...

Episode 41 – News, news and some more news

April 25, 2017 08:00 - 33 minutes - 19.3 MB

In this episode, due to us blowing our recording space budget with the Dataworks Summit day by day episodes (39 and 40 if you've not listened yet, go and do so!) we're just bringing you a short episode this time with news, all the news that's new and approved by the Roaring Elephants! 05:10 Recent events Superset: benefits and limitations of the open source data visualization tool by Airbnb https://indatalabs.com/blog/data-strategy/open-source-data-visualization-tool-superset http://...

Episode 40 – Dataworks Summit Europe – Day 2

April 06, 2017 20:42 - 1 hour - 31.3 MB

In this episode of the Roaring Elephant podcast, Dave and I continue to share our Dataworks summit experience, meet yet more listeners, sit in on a few more sessions and give our overall view of the day and the summit as a whole! It will make you wish you were here. 00:00:00 Intro Roaring Elephant Roadshow Day 2 - The night after the party! 00:04:14 Session Discussions Our review of the sessions, what we liked, what we learned, what we'd recommend you go and check out afterwards: Keynote...

Episode 39 – Dataworks Summit Europe – Day 1

April 05, 2017 23:44 - 1 hour - 54.3 MB

In this episode of the Roaring Elephant podcast, Dave and I attend the Dataworks summit, meet listeners, sit in on sessions and give our overall view of the day! It's the next best thing to being here.   If you ARE here, then look out for us, we'll exchange limited edition Roaring Elephant stickers for audio clips. 00:00 Intro Roaring Elephant Roadshow Day 1- Direct from Munich! 03:25 Session Discussions Our review of the sessions, what we liked, what we learned, what we'd recommend you go ...

Episode 38 – Dataworks Summit 2017 – Preview

March 28, 2017 08:00 - 1 hour - 59 MB

This week, your hosts go over our pick of the sessions that will be presented during the Dataworks Summit (formerly the Hadoop Summit) in Munich next week. The Roaring Elephant will be in attendance; look out for the two guys in distinctive yellow fleeces with the Roaring Elephant logo on the back. We hope to see you there! 00:00 Recent events Dave DS Model Lifecycle https://www.svds.com/models-lab-factory/ Stitchfix Algorithm Tour http://algorithms-tour.stitchfix.co...

Episode 37 – Big Data Roles: The starter

March 14, 2017 08:00 - 1 hour - 47.7 MB

In this episode, we start a new series on the different roles in Big Data. Purely by coincidence, it turns out that the winner of our raffle started a new job as a Data Engineer at the beginning of this month, so naturally we decided to invite Marcel-Jan on the show to talk about the how and why of his career move. 00:00 Recent events Dave It’s morphing time: Apache Ranger graduates to a Top Level Project https://hortonworks.com/blog/morphing-time-apache-ranger-graduates-top-level-p...

Episode 36 – Use-case: Single View

February 28, 2017 08:00 - 1 hour - 36.2 MB

No guests today, just Dave and Jhon talking, so brace yourselves! This time we're actually going to explain what we mean by "single view of customer", walk through an example use case and discuss how you might implement such a thing. Enjoy. 00:00 Recent events Dave Faster Spark! http://www.zdnet.com/article/spark-gets-faster-for-streaming-analytics/ If you’re interested in reading/watching more then check out the site for Spark Summit East, the sessions slides and vid...

Episode 35 – What do people get wrong when deploying Hadoop? – Part 2

February 14, 2017 08:00 - 1 hour - 41.6 MB

Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this second part of a two-part episode where they share their experience with what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput for your cluster. 00:00 Recent events Dave TensorKart: self-driving MarioKart with TensorFlow http://kevinhughes.ca/blog/tensor-kart What is Data Engineering? https://www.dataquest.io/blog/what-is-a-data-engineer/ ...

Episode 34 – What do people get wrong when deploying Hadoop? – Part 1

January 31, 2017 08:00 - 1 hour - 35 MB

Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this first part of a two-part episode where they share their experience with what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput for your cluster. 00:00 Recent events Dave Apache Beam becomes a top level project! https://beam.apache.org/ https://beam.apache.org/get-started/beam-overview/ https://github.com/eljefe6a/beamexample/blob/master/BeamTut...

Episode 33 – Roaring News

January 17, 2017 08:00 - 50 minutes - 29.1 MB

This episode, we have an absolutely brilliant topic that we were going to cover after the news section... But the news section has us talking so much that it ran a bit long. Preferring not to give you a two hour episode, we're rescheduling the delivery of the intended topic to next episode and present you with our first (and probably last) "News only" episode. 00:00 Recent events Dave A pair of “trends to watch in 2017” http://www.techrepublic.com/article/6-big-data-trends-to-watch-...

Episode 32 – The sense and non-sense of certifications

January 03, 2017 08:00 - 50 minutes - 29.4 MB

In this episode, we talk about the use and abuse of certifications, both the certifications you can achieve by passing an exam and the industry ISV certifications that should help you make purchasing decisions. 00:00 Recent events Dave 5 enterprise uses of blockchain today http://www.pcworld.com/article/3149504/cloud-computing/5-enterprise-related-things-you-can-do-with-blockchain-technology-today.html Top 7 big data trends for 2017 https://datafloq.com/read/the-top-7-big-data-...

Episode 31 – Bold Predictions, Past and Future

December 20, 2016 08:00 - 1 hour - 38.7 MB

In this episode, we go over the bold predictions for 2016 we made just before the start of the year. Find out how right we were, or indeed how bad we are at predicting the future of Big Data. Undeterred, we then happily put on our Nostradamus hats and proceed to make even more new bold predictions for 2017. Have a listen and let us know if you agree or disagree with our view on the world. 00:03 Bold predictions - reviewing past predictions for 2016 Apache Atlas Apache Nifi Apache Spark...

Episode 30 – Apache Software Foundation

December 06, 2016 08:00 - 1 hour - 35.8 MB

So many of the tools and projects we talk about and use every day are prefaced by 6 letters, A P A C H E... What does it mean to be an Apache project? What does the Apache Software Foundation (ASF) do for software? Are there other options? Let us tell you about the ASF! 00:00 Recent events Dave: How we caught the circle line rogue train with data https://blog.data.gov.sg/how-we-caught-the-circle-line-rogue-train-with-data-79405c86ab6a#.mhqs1mikx Black Friday 2016: Mobile vs Deskt...

Episode 29 – 1 Year anniversary

November 22, 2016 08:00 - 1 hour - 37.1 MB

One year of elephants roaring has come and gone, so we reminisce a little bit about what happened over the last year. And since we could not have done this podcast nearly as well without them, we asked the special guests we have had on the podcast over the previous year to call in on the Skype call and talk about what they have been up to. 00:00 One year of pod-casting... Dave and Jhon reminiscing about how the Podcast got started. 06:55 Fireside chats with guests over the year 07:56 Joe Wit...

Episode 28 – Talking Datameer with Erik Stalpers

November 08, 2016 07:00 - 59 minutes - 34.4 MB

In this episode, Dave is stuck in a hotel basement in the middle of internet nowhere and Erik Stalpers from Datameer joins us to talk about the Datameer exploration and visualization tool. 00:00 Recent events Dave Machine learning vs AI http://www.wired.co.uk/article/machine-learning-ai-explained Machine Learning Data Cleansing https://gcn.com/articles/2016/10/19/activeclean-big-data.aspx https://activeclean.github.io/ Battle of the Data Science Venn Diagrams http://ww...

Episode 27 – Security 3: Encryption at rest and in motion

October 25, 2016 07:00 - 57 minutes - 33.4 MB

Rounding out our series on security in Hadoop, we finish with Encryption at rest and in motion. We go over the different approaches, do's and don'ts and mention some higher-level applications in this space. 00:00 News for the week! Dave: Executives Still Relying on Gut, Not Gigabytes in Planning for Future http://www.datadigestonline.com/2016/10/executives-still-relying-on-gut.html Rewriting SAS Programs for Financial Data Manipulation in R http://blog.revolutionanalytics.com/2...

Episode 26 – Security 2: Authorisation and audit

October 11, 2016 07:00 - 1 hour - 40.6 MB

In this episode, we continue our coverage on Hadoop security. Where episode 24 dealt with the subject of authentication, we now delve deeper into the why and how of authorization and audit, and cover the major players in the arena. 00:00 Recent events Dave Beyond Privacy and Security in a Connected World http://www.svds.com/beyond-privacy-security-connected-world/ The broken promise of open-source Big Data software – and what might fix it http://siliconangle.com/blog/2016/...

Episode 25 – The pro’s and con’s of crafting your own distribution

September 27, 2016 07:00 - 1 hour - 54.6 MB

When we talk about Big Data and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes, a pre-built distro just does not meet the needs. In this episode, we have a guest on the show who explains why they made the choice to forgo the available distributions in favour of building one's own. http://lod-cloud.net/ 00:00 Recent events Dave: Which tool should I use? http://brohrer.github.io/...

Episode 24 – Hadoop Summit Melbourne 2016 Preview

September 13, 2016 07:00 - 1 hour - 38.9 MB

With Hadoop Summit Melbourne 2016 starting the day after we are recording this episode, we go over the published agenda and discuss the current state of the Big Data Technology ecosystem while we pick our favorite sessions. Wish we were there! 00:00 Recent events Dave Cloud Security Alliance release cloud and big data security guidelines http://siliconangle.com/blog/2016/08/28/the-cloud-security-alliance-publishes-its-best-practices-for-big-data-security/ https://cloudsecurityall...

Episode 23 – Security in Hadoop – Authentication

August 30, 2016 07:00 - 1 hour - 39.1 MB

In this episode, we discuss this fortnight's interesting big data news that caught our eye and then go on to discuss the basics around authentication in Hadoop for what is the first in a series of episodes that we'll be doing over the next few months on the broad topic of security. 00:00 Recent events Dave: The new science behind customer loyalty http://insights.principa.co.za/the-new-science-behind-customer-loyalty http://insights.principa.co.za/infographic-creating-a-data-driven...

Episode 22 – Big Data in Small Business

August 16, 2016 07:00 - 1 hour - 53.2 MB

The main subject in this episode is our answer to a listener question we received a couple of months ago: How can big data help small businesses? What ways can small business use big data? At the moment all the talk is about big data helping enterprise firms. And we are introducing a new section which we hope you will enjoy! 00:00 Recent events Working with a new team in sunny Cork, getting them up to speed Workshop with a global SI and a European telco about the upcoming phases of t...

Episode 21 – The Open Data Platform Initiative

August 02, 2016 07:00 - 59 minutes - 34.2 MB

This episode we have an interview with John Mertic about ODPi. There has been plenty of mystery and even some controversy about ODPi which we attempt to resolve for you. Big thanks to John for giving us some of his time for this interview! Sadly, this time the Skype Gods were not with us and we experienced some drops and hitches. We tried to smooth things over as much as possible, but we were not able to achieve our usual level of quality this time. 00:00 Recent events Vacation for Dave ...

Episode 20 – Dave’s Hadoop Summit San Jose 2016 Retrospective – Part 2

July 19, 2016 07:00 - 1 hour - 38.3 MB

In this second part, we discuss the sessions that Dave attended at the San Jose Hadoop Summit and we go in depth on some related topics. Since we ran over an hour with the main topic, and we did not want to make this a three-parter, we decided to forgo the questions from the audience just this one time... 00:00 Recent events Vacation time! edX.org Big Data Courses 04:00 Dave's Hadoop Summit San Jose 2016 Retrospective - Part 2 Session 1: End-to-End Processing of 3.7 Million Teleme...

Episode 19 – Dave’s Hadoop Summit San Jose 2016 Retrospective

July 05, 2016 07:00 - 48 minutes - 27.9 MB

Dave went to the Hadoop Summit 2016 in San Jose last week and came back with a riveting tale to tell. In this first part of the Summit coverage, join me when I ask Dave all about the keynotes and the general event. Join us next episode where Dave will talk about some of the sessions he attended! 00:00 Recent events Lift and shift to IaaS Hybrid Disaster Recovery Spark & ML goodness MOOCs San Jose Hadoop Summit 09:25 Dave went to the Hadoop Summit in San Jose! Record attendanc...

Episode 18 – MLeap interview: Productionising Data Science – Part 2

June 21, 2016 07:00 - 43 minutes - 25 MB

In this episode, we have the second part of the interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project where they go into more technical details and give tips on deploying MLeap in your environment. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Yet more telco security, again. RFI for euro...

Episode 17 – MLeap interview: Productionising Data Science

June 07, 2016 07:00 - 54 minutes - 31.2 MB

In this episode, we have an interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Machine Learning Hackathon on Azure Strata Europe Fighting with Kafka 09:30 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Meet Hollin and M...

Episode 16 – Interview part two with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo!

May 24, 2016 07:00 - 46 minutes - 26.9 MB

Hopefully you enjoyed the first part of our interview with Sumeet, here is part two where we go into more detail about Yahoo's use of Hadoop, with lots of interesting topics coming up including the splintering of the ecosystem, governance and much much more.   00:00 Recent events Customer and partner adventures with Apache Nifi Jhon is settling in at Microsoft but is unfortunately quite jet-lagged. 08:15 Part two of our interview with Sumeet Singh - Senior Director, Cloud and Big Data ...

Episode 15 – Interview with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo!

May 10, 2016 07:00 - 1 hour - 35.1 MB

Having met Sumeet at the Hadoop Summit we thought he'd make a great guest for the podcast, so here he is for your listening pleasure!   00:00 Recent events Louder! iTunes and the missing episode 12 Jhon's new role at Microsoft Hadoop as a Service A fortnight of SAS + Hadoop Metron teething troubles https://issues.apache.org/jira/browse/METRON-136 17:50 Interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo!   42:50 Questions from our Listeners On...

Episode 14 – Hadoop Summit – Retrospective

April 26, 2016 07:00 - 51 minutes - 29.9 MB

After the last two special edition episodes where we quickly covered each Summit day in a "same-day" episode, we go over the full event in this episode, highlighting the sessions we enjoyed the most and sharing our general feelings about the 2016 Hadoop Summit in Dublin.   00:00 Recent events Summit! Sessions on youtube Meetings and planning, Apache Metron https://cwiki.apache.org/confluence/display/METRON/Metron+Wiki https://community.hortonworks.com/articles/26047/apche-metron-...

Episode 13 – Hadoop Summit Dublin 2016 – Day 2

April 14, 2016 21:37 - 37 minutes - 21.7 MB

Welcome to our second special edition podcast brought to you from day 2 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the second day of keynotes and yet more sessions that we enjoyed. 00:00 Recent events Introduction to the Hadoop Summit Dublin 2016 from day 2 01:45 Hadoop Summit 2016 Dublin Day 2 Review Keynote/Session - Yahoo! - Sumeet Singh Key...

Episode 12 – Hadoop Summit Dublin 2016 – Day 1

April 13, 2016 20:57 - 29 minutes - 17 MB

Welcome to our special edition podcast brought to you from day 1 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the keynotes and some of the sessions we enjoyed during day 1. 00:00 Recent events Introduction to the Hadoop Summit episode for day 1 01:40 Main Topic Some comments from attendees as to what they're looking forward to at the event Conversa...