Software Daily

1,633 episodes - English - Latest episode: over 2 years ago - ★★★★★ - 1 rating

Technical interviews about software topics.

Tech News News

Episodes

Internet Archive Book Scanning with Davide Semenzin

September 15, 2020 09:00 - 1000 Bytes

The Internet Archive collects historical records of the Internet. The Wayback Machine is one tool from the Internet Archive which you may be familiar with. One project you may be unfamiliar with is book scanning. Internet Archive scans high volumes of books in order to digitize them. In today’s episode, Davide Semenzin joins the show to talk through the history of the Internet Archive and the engineering behind book digitization. We talk through OCR, storage, architecture, and scalability. ...

UnifyID: Biometric Authentication with John Whaley

September 14, 2020 09:00 - 1000 Bytes

Biometric authentication uses signals from a human’s unique biology to verify identity. Forms of biometric authentication include fingerprints, eye patterns, and the way a person walks, otherwise known as gait. UnifyID is a company that builds systems for biometric authentication. John Whaley is the CEO of UnifyID, and he joins the show to talk through techniques for biometrics, and the implementation details that UnifyID has built to turn these into a reality.

Robotic Process Automation with Antti Karjalainen

September 04, 2020 09:00 - 1000 Bytes

Robotic process automation involves the scripting and automation of highly repeatable tasks. RPA tools such as UIPath paved the way for a newer wave of automation, including the Robot Framework, an open source system for RPA. Antti Karjalainen is the CEO of Robocorp, a company that provides an RPA tool suite for developers. Antti joins the show to talk through the definition of RPA, common RPA tasks, and what he is building with Robocorp.

Modern Venture with Jerry Chen

September 03, 2020 09:00 - 1000 Bytes

After working at VMware for 10 years, Jerry Chen developed an expertise in technology companies. Today, he works at Greylock, where he looks at deals in the infrastructure and developer tooling space. Jerry is an expert in go-to-market strategy and makes investments in technologies that have a good chance at becoming large and profitable businesses. In today’s episode, Jerry and I talk through the dynamics of modern infrastructure investing, including examples of deals such as Chronosphere ...

API Change Management with Aidan Cunniffe

September 02, 2020 09:00 - 1000 Bytes

APIs within a company change all the time. Every service owner has an API to manage, and those APIs have upstream and downstream connections. APIs need to be tested for integration points as well as for their “contract”, the agreement between an API owner and the consumers of that API. Aidan Cunniffe is the founder of Optic, a product built for API change management. He joins the show to explain why there is an opportunity for such a product, and the market dynamics of the space of API testi...

WebAssembly Migration with Nicolo Davis

September 01, 2020 09:00 - 1000 Bytes

WebAssembly allows for the execution of languages other than JavaScript in a browser-based environment. But WebAssembly is still not widely used outside of a few notable adopters such as Dropbox and Figma. Nicolo Davis works on an application called Boardgame Lab, and he joins the show to explain why WebAssembly can be useful even for a simple application. Nicolo also shares his reflections on TypeScript, Rust, and the future of web development. He talks through the client/server interacti...

Hyperparameter Tuning with Richard Liaw

August 28, 2020 09:00 - 1000 Bytes

Hyperparameters are the settings that control how a machine learning model is trained, such as the learning rate or the number of layers. Whereas the parameters of a machine learning model are the values learned from data during training, the hyperparameters are chosen beforehand and define how the training process builds a model to be used by an end consumer. A different set of hyperparameters will yield a different model. Thus, it is important to try different hyperparameter configurations to see which models e...

Anduril Engineering with Gokul Subramanian

August 27, 2020 09:00 - 1000 Bytes

Anduril is a technology defense company with a focus on drones, computer vision, and other problems related to national security. It is a full-stack company that builds its own hardware and software, which leads to a great many interesting questions about cloud services, engineering workflows, and management. Gokul Subramanian is an engineer at Anduril, and he joins the show to share his knowledge of how Anduril operates and what the company has built.

Machine Learning Labeling and Tooling with Lukas Biewald

August 26, 2020 09:00 - 1000 Bytes

CrowdFlower was a company started in 2007 by Lukas Biewald, an entrepreneur and computer scientist. CrowdFlower solved some of the data labeling problems that were not being solved by Amazon Mechanical Turk. A decade after starting CrowdFlower, the company was sold for several hundred million dollars. Today, data labeling has only grown in volume and scope. But Lukas has moved on to a different part of the machine learning stack: tooling for hyperparameter search and machine learning monito...

Software and the Law with Mark Radcliffe

August 25, 2020 09:00 - 1000 Bytes

As software permeates our lives, there are an increased number of situations where the legal system must be designed to account for that software. Whether the issues are open source licensing, cryptocurrencies, or worker classifications, software overlaps heavily with the law. Just as software is crafted by engineers, the legal structure around software is crafted by lawyers. There are large law firms that have built their business by knowing how to navigate these software and business ques...

Data Version Control with Dmitry Petrov

August 24, 2020 09:00 - 1000 Bytes

Code is version controlled through git, the version control system originally built to manage the Linux codebase. For decades, software has been developed using git for version control. More recently, data engineering has become an unavoidable facet of software development. It is reasonable to ask–why are we not version controlling our data? Dmitry Petrov is the founder of Iterative.ai, a company for collaborating and version controlling data sets. Dmitry joins the show to talk about how da...

Release Apps with Tommy McClung

August 21, 2020 09:00 - 1000 Bytes

Every software company works off of several different development environments–at the very least there is staging, testing, and production. Every push to staging can be spun up as an application to be explored, tinkered with, and tested. These ad hoc spin-ups are known as release apps. A release app is an environment for engineers to play with, and potentially throw away or promote to production. Release apps have been made easier due to technologies such as infrastructure-as-code, continuo...

ParlAI: Facebook Dialogue Platform with Stephen Roller

August 20, 2020 09:00 - 1000 Bytes

Chatbots are useful for developing well-defined applications such as first-contact customer support, sales, and troubleshooting. But the potential for chatbots is so much greater. Over the last five years, there have been numerous platforms that have arisen to allow for better, more streamlined chatbot creation. Dialogue software enables the creation of sophisticated chatbots. ParlAI is a dialogue platform built inside of Facebook. It allows for the development of dialogue models within Fac...

SuperAnnotate: Image Annotation Platform with Vahan and Tigran Petrosyan

August 19, 2020 09:00 - 1000 Bytes

Image annotation is necessary for building supervised learning models for computer vision. An image annotation platform streamlines the annotation of these images. Well-known annotation platforms include Scale AI, Amazon Mechanical Turk, and CrowdFlower. There are also large consulting-like companies that will annotate images in bulk for you. If you have an application that requires lots of annotation, such as self-driving cars, then you might be compelled to outsource this annotation to su...

Metabase: Business Intelligence Open Source with Sameer Al-Sakran

August 18, 2020 09:00 - 1000 Bytes

Business intelligence tooling allows analysts to see large quantities of data presented to them in a flexible interface including charts, graphs, and other visualizations. BI tools have been around for decades, and as the world moves towards increased open source software, the business intelligence tools are following that trend. Metabase is an open source business intelligence system that has been widely adopted by enterprises. It includes all the common tools that are expected from a busi...

Gitlab Courseware as Code with Ben Allison

August 17, 2020 09:00 - 1000 Bytes

The US Army Cyber School is a training program which trains cyber soldiers and leaders to be adept in cyber military strategy and tactics. In order to teach these skills, the cyber school uses a system they call “courseware as code”, a workflow that allows updates to the curriculum in a reversion-friendly fashion similar to infrastructure-as-code. Ben Allison teaches at the US Army Cyber School and has put work into developing the training program and ongoing lesson plans. Ben joins the sho...

Security Monitoring with Marc Tremsal

July 31, 2020 09:00 - 1000 Bytes

Logs are the source of truth. If a company is sufficiently instrumented, the logging data that streams off of the internal infrastructure can be refined to tell a comprehensive story for what is changing across that infrastructure in real time. This includes logins, permissions changes, and other events that could signal a potential security compromise. Datadog is a company that was built around log management, metrics storage, and distributed tracing. More recently, they have also built tools ...

DEV and Forem with Ben Halpern

July 30, 2020 09:00 - 1000 Bytes

Dev.to has become one of the most popular places for developers to write about engineering, programming languages, and everyday life. For those who have not seen it, DEV is like a cross between Twitter and Medium, but targeted at developers. The content on DEV ranges from serious to humorous to technically useful. DEV contains a set of features which appeal to a developer community, such as the ability to embed code snippets in a post, but for the most part the entire app is generalizable t...

Drug Simulations with Bryan Vicknair and Jason Walsh

July 29, 2020 09:00 - 1000 Bytes

Drug trials can lead to new therapeutics and preventative medications being discovered and placed on the market. Unfortunately, these drug trials typically require animal testing. This means animals are killed or harmed as a result of needing to verify that a drug will not kill humans. Animal testing is unavoidable, but the extent to which testing needs to occur can be reduced by inserting machine learning models which simulate the effects of a drug on the human body. If the simulated effec...

Access Control Management with Fouad Matin and Dan Gillespie

July 28, 2020 09:00 - 1000 Bytes

Across a company, there is a wide range of resources that employees need access to: documents, S3 buckets, git repositories, and many others. As access to resources changes across the organization, a history of the changes to permissions can be useful for compliance and monitoring. Indent is a system for simplifying access management across infrastructure. Indent allows users within an organization to request access to resources, and keeps logs of the changes to who can access those resourc...

Acquired Podcasting with David Rosenthal and Ben Gilbert

July 27, 2020 09:00 - 1000 Bytes

Acquisitions are part of the technology industry. A successful corporation will often have an “exit”, either going public or being acquired. And with each of these corporations, there is a set of stories that narrate the company from beginning to end. Acquired is a podcast that tells the stories of companies such as YouTube, Instagram, and PayPal. During each episode, the life of a company is explored from its beginning until the end. Media companies, chip companies, and software companie...

Ray Applications with Richard Liaw

July 24, 2020 09:00 - 1000 Bytes

Ray is a general purpose distributed computing framework. At a low level, Ray provides fault-tolerant primitives that support applications running across multiple processors. At a higher level, Ray supports scalable reinforcement learning, including the common problem of hyperparameter tuning. In a previous episode, we explored the primitives of Ray as well as Anyscale, the business built around Ray and reinforcement learning. In today’s episode, Richard Liaw explores some of the libraries ...

Modin: Pandas Scalability with Devin Petersohn

July 23, 2020 09:00 - 1000 Bytes

Pandas is a Python data analysis library, and an essential tool in data science. Pandas allows users to load large quantities of data into a data structure called a dataframe, over which the user can call mathematical operations. When the data fits entirely into memory this works well, but sometimes there is too much data for a single box. The Modin project scales Pandas workflows to multiple machines by utilizing Dask or Ray, which are distributed computing primitives for Python programs. ...

SourceGraph: Code Search and Intelligence with Beyang Liu

July 22, 2020 09:00 - 1000 Bytes

A large codebase cannot be searched with naive indexing algorithms. In order to search through a codebase the size of Uber’s, it is necessary to build a much more sophisticated indexing system than simple pure text search. SourceGraph is a system for universal code search. It allows developers to more easily onboard to a new codebase, make large refactors, and perform other tasks. SourceGraph can integrate with source control systems, IDEs, and other tools to fit comfortably into an engineer...

Session Replay with Michael Morrissey

July 21, 2020 09:00 - 1000 Bytes

Users do not use web applications in the way that you might expect. And it is not easy to get the data that is necessary to get a full picture. But a newer API within browsers does make this more possible by capturing DOM mutations.  The change capture of these DOM mutations can be stored for replay in the future. After being stored, this change capture can be retrieved and replayed. That allows for comprehensive frontend monitoring, which has been built into a product called FullStory. Mi...

Cortex: Microservices Management with Anish Dhar and Ganesh Datta

July 20, 2020 09:00 - 1000 Bytes

Managing microservices becomes a challenge as the number of services within the organization grows. With that many services come more interdependencies–downstream and upstream services that may be impacted by an update to your service. One solution to this problem: a dashboard and newsfeed system that lets you see into the health and changes across your services. With this kind of system, you can avoid accidentally shipping code that will impact other service owners. It can also help with...

ADP Engineering with Tim Halbur

July 17, 2020 09:00 - 1000 Bytes

ADP has been around for more than 70 years, fulfilling payroll and other human resources services. Payroll processing is a complex business, involving the movement of money in accordance with regulatory and legal strictures. From an engineering point of view, ADP has decades of software behind it, and a bright future as a platform company used by thousands of companies. Balancing the maintenance of old code while charting a course with the new projects is not a simple task. Tim Halbur is...

Capital Allocation with Blair Silverberg and Chris Olivares

July 16, 2020 09:00 - 1000 Bytes

Software companies can be funded in a variety of ways: venture capital, self-funding, and debt, among others. In order to receive financing, a company is evaluated on its ability to generate future cash flows. After all, a valuation is a number that summarizes the present value of future cash flows. Determining that valuation number is a complicated, subjective process. If the valuation can be determined more intelligently and objectively, then smarter financing decisions can be made. This ...

GitHub Mobile with Brian Lovin and Ryan Nystrom

July 15, 2020 09:00 - 1000 Bytes

GitHub has been a social network for developers for many years. Most social networks are centered around mobile applications, but GitHub sits squarely in a developer’s browser-based desktop workflow. As a result, the design of a mobile app for GitHub is less straightforward. GitHub did acquire a popular mobile client called GitHawk, which was developed by Ryan Nystrom. Since joining GitHub, Ryan has worked on a new mobile app for GitHub, along with a team of engineers including Brian Lovin....

Multimesh with Luke Kysow

July 14, 2020 09:00 - 1000 Bytes

A service mesh provides routing, load balancing, policy management, and other features to a set of services that need to communicate with each other. The mesh can simplify operations across these different services by providing an interface to configure them.  There are lots of different vendors who offer service mesh technology: AWS has AppMesh, Google has Istio (which is open source), Buoyant has Linkerd (which is also open source), and HashiCorp has Consul Connect. Unfortunately, these s...

Metaflow: Netflix Machine Learning Platform with Savin Goyal

July 13, 2020 09:00 - 1000 Bytes

Netflix runs all of its infrastructure on Amazon Web Services. This includes business logic, data infrastructure, and machine learning. By tightly coupling itself to AWS, Netflix has been able to move faster and have strong defaults about engineering decisions. And today, AWS has such an expanse of services that it can be used as a platform to build custom tools. Metaflow is an open source machine learning platform built on top of AWS that allows engineers at Netflix to build directed acycl...

Strapi: Headless CMS with Pierre Burgy

July 10, 2020 09:00 - 1000 Bytes

WordPress has been a dominant force in the world of online publishing for many years because of how battle-tested it is. WordPress is the definitive leader in CMS technology. But there have always been alternatives: Drupal, Ghost, and other open source CMSes. More recently, there has been an emergence of the headless CMS, such as Contentful, which decouples the CMS backend from the frontend presentation layer. Strapi is a popular open source headless CMS. Pierre Burgy is the founder of St...

Chronosphere: Scalable Metrics Database with Rob Skillington

July 09, 2020 09:00 - 1000 Bytes

M3 is a scalable metrics database originally built to host Uber’s rapidly growing data storage from Prometheus. When Rob Skillington was at Uber, he helped design, implement, and deploy M3. Since leaving Uber, he has co-founded a company around a hosted version of M3 called Chronosphere. If you have access to a scalable metrics database, you might as well start accumulating as much data as possible, right? Not exactly. If your company generates enough data, you probably want to turn down th...

Determined AI: Machine Learning Ops with Neil Conway

July 08, 2020 09:00 - 1000 Bytes

Developing machine learning models is not easy. From the perspective of the machine learning researcher, there is the iterative process of tuning hyperparameters and selecting relevant features. From the perspective of the operations engineer, there is a handoff from development to production, and the management of GPU clusters to parallelize model training. In the last five years, machine learning has become easier to use thanks to point solutions. TensorFlow, cloud provider tools, Spark, ...

The Good Parts of AWS with Daniel Vassallo

July 07, 2020 09:00 - 1000 Bytes

AWS has over 150 different services: databases, log management, edge computing, and lots of others. Instead of being overwhelmed by all of these products, an engineering team can simplify their workflow by focusing on a small subset of AWS services–the defaults. Daniel Vassallo is the author of The Good Parts of AWS. An excerpt from the book: “The cost of acquiring new information is high and the consequence of deviating from a default choice is low, so sticking with the default will likely ...

Pull Request Environments with Eric Silverman

July 06, 2020 09:00 - 1000 Bytes

The modern release workflow involves multiple stakeholders: engineers, management, designers, and product managers. It is a collaborative process that is often held together with brittle workflows. A developer deploys a new build to an ad hoc staging environment and pastes a link to that environment in Slack. Other stakeholders click on that link, then send messages to each other in Slack, or make comments on the pull request in GitHub. This workflow is far from ideal. Collaborating around ...

Deepgram: End-to-End Speech Recognition with Scott Stephenson

July 03, 2020 09:00 - 1000 Bytes

Deepgram is an end-to-end deep learning platform for speech recognition. Unlike the general purpose APIs from Google or Amazon, Deepgram models are custom-trained for each customer. Whether the customer is a call center, a podcasting company, or a sales department, Deepgram can work with them to build something specific to their use case. Sound data is incredibly rich. Consider all the features in a voice recording: volume, intonation, inflection. And once the speech is transcribed, there a...

DynamoDB with Alex DeBrie

July 02, 2020 09:00 - 1000 Bytes

DynamoDB is a managed NoSQL database service from AWS. It is widely used as a transactional database to fulfill key-value and wide-column data models. In a previous show with Rick Houlihan, we explored how to build a data model and optimize the query patterns for a NoSQL database.  Today’s show is about DynamoDB specifically: partitioning, indexing, query semantics, normalization, table design, and other subjects. We talk through how to be cost conscious, and how to integrate with event-bas...

Snowplow Analytics: Data Collection Platform with Alex Dean

July 01, 2020 09:00 - 1000 Bytes

As a user browses a webpage, that browser session generates events that need to be recorded, validated, enriched, and stored. This data is sometimes called customer data infrastructure, or CDI. This data requires a full stack of different tools: a system on the frontend to collect the data, middleware to transport the data, and backend systems for storing and loading that data into data warehouses and other analytical systems. Snowplow Analytics is a data collection platform for storing eve...

Postman: API Development with Abhinav Asthana

June 30, 2020 09:00 - 1000 Bytes

A software company manages and interacts with hundreds of APIs. These APIs require testing, performance analysis, authorization management, and release management. In a word, APIs require collaboration. Postman is a system for API collaboration. It allows users to test APIs with collections of requests, monitor the API responses, and visualize the query results. Users of Postman can collaborate with their team through Team Workspaces, sharing collections, environments, history, and more. A...

Cresta: Speech ML for Calls with Zayd Enam

June 29, 2020 09:00 - 1000 Bytes

At a customer service center, thousands of hours of audio are generated. This audio provides a wealth of information to transcribe and analyze. With the additional data of the most successful customer service representatives, machine learning models can be trained to identify which speech patterns are associated with a successful worker. By identifying these speaking patterns, a customer service center can continuously improve, with the different representatives learning the different patte...

React Native Ecosystem with Nader Dabit (Summer Break Repeat)

June 26, 2020 09:00 - 1000 Bytes

Originally published July 6, 2017. We are taking a few weeks off. We’ll be back soon with new episodes. React Native allows developers to reuse components from one user interface on multiple platforms. React Native was introduced by Facebook to reduce the pain of teams who were rewriting their user interfaces for web, iOS, and Android. Nader Dabit hosts React Native Radio, a podcast about React Native. Nader also trains companies to use React Native through his company React Native Traini...

Traces: Video Recognition with Veronica Yurchuk and Kostyantyn Shysh (Summer Break Repeat)

June 25, 2020 09:00 - 1000 Bytes

Originally published October 8, 2019. We are taking a few weeks off. We’ll be back soon with new episodes. Video surveillance impacts human lives every day.  On most days, we do not feel the impact of video surveillance. But the effects of video surveillance have tremendous potential. It can be used to solve crimes and find missing children. It can be used to intimidate journalists and empower dictators. Like any piece of technology, video surveillance can be used for good or evil. Video ...

Envoy Mobile with Matt Klein (Summer Break Repeat)

June 24, 2020 09:00 - 1000 Bytes

Originally published July 25, 2019. We are taking a few weeks off. We’ll be back soon with new episodes. Envoy is an open source edge and service proxy that was originally developed at Lyft.  Envoy is often deployed as a sidecar application that runs alongside a service and helps that service by providing features such as routing, rate limiting, telemetry, and security policy. Envoy has gained significant traction in the open source community, and has formed the backbone of popular service...

Data Intensive Applications with Martin Kleppmann (Summer Break Repeat)

June 23, 2020 09:00 - 1000 Bytes

Originally published May 2, 2017. We are taking a few weeks off. We’ll be back soon with new episodes. A new programmer learns to build applications using data structures like a queue, a cache, or a database. Modern cloud applications are built using more sophisticated tools like Redis, Kafka, or Amazon S3. These tools do multiple things well, and often have overlapping functionality. Application architecture becomes less straightforward. The applications we are building today are data-int...

freeCodeCamp with Quincy Larson (Summer Break Repeat)

June 22, 2020 09:00 - 1000 Bytes

Originally published December 20, 2019. We are taking a few weeks off. We’ll be back soon with new episodes. freeCodeCamp was started five years ago with the goal of providing free coding education to anyone on the Internet. freeCodeCamp has become the best place to begin learning how to write software. There are many other places that a software engineer should visit on their educational journey, but freeCodeCamp is the best place to start, because it is free, and there are no advertiseme...

Facebook Open Source with Tom Occhino (Summer Break Repeat)

June 19, 2020 09:00 - 1000 Bytes

Originally published April 14, 2017. We are taking a few weeks off. We’ll be back soon with new episodes. Facebook’s open source projects include React, GraphQL, and Cassandra. These projects are key pieces of infrastructure used by thousands of developers–including engineers at Facebook itself. These projects are able to gain traction because Facebook takes time to decouple the projects from their internal infrastructure and clean up the code before releasing them into the wild. Facebook...

Redis with Alvin Richards (Summer Break Repeat)

June 18, 2020 09:00 - 1000 Bytes

Originally published October 24, 2019. We are taking a few weeks off. We’ll be back soon with new episodes. Redis is an in-memory database that persists to disk. Redis is commonly used as an object cache for web applications. Applications are composed of caches and databases. A cache typically stores the data in memory, and a database typically stores the data on disk. Memory has significantly faster access times, but is more expensive and is volatile, meaning that if the computer that is ...

HTTP with Julia Evans (Summer Break Repeat)

June 17, 2020 09:00 - 1000 Bytes

Originally published November 21, 2019. We are taking a few weeks off. We’ll be back soon with new episodes. HTTP is a protocol that allows browsers and web applications to communicate across the Internet. Everyone knows that HTTP is doing some important work, because “HTTP” is at the beginning of most URLs that you enter into your browser. You might be familiar with the request/response model, and HTTP request methods such as GET, PUT, and POST. But unless you have had a reason to learn m...

Stripe Machine Learning Infrastructure with Rob Story and Kelley Rivoire (Summer Break Repeat)

June 16, 2020 09:00 - 1000 Bytes

Originally published June 13, 2019. We are taking a few weeks off. We’ll be back soon with new episodes. Machine learning allows software to improve as that software consumes more data. Machine learning is a tool that every software engineer wants to be able to use. Because machine learning is so broadly applicable, software companies want to make the tools more accessible to the developers across the organization. There are many steps that an engineer must go through to use machine lear...
