Vector Podcast

21 episodes - English - Latest episode: 15 days ago - ★★★★★ - 2 ratings

Vector Podcast is here to bring you the depth and breadth of Search Engine Technology, Product, Marketing and Business. In the podcast we talk with engineers, entrepreneurs, thinkers and tinkerers who put their souls into search.

Depending on your interests, you should find a topic that matches them -- whether it is the deep algorithmic aspects of search engines and the information retrieval field, or examples of products offering deep tech to their users.

"Vector" -- because it aims to cover an emerging field of vector similarity search, giving you the ability to search content beyond text: audio, video, images and more.

"Vector" also because it is all about the vector (direction) in your profession, product, marketing and business.

Science Education Self-Improvement

Episodes

Saurabh Rai - Growing Resume Matcher

April 12, 2024 19:17 - 26 minutes - 24 MB

Topics: 00:00 Intro - how do you like our new design? 00:52 Greets 01:55 Saurabh's background 03:04 Resume Matcher: 4.5K stars, 800 community members, 1.5K forks 04:11 How did you grow the project? 05:42 Target audience and how to use Resume Matcher 09:00 How did you attract so many contributors? 12:47 Architecture aspects 15:10 Cloud or not 16:12 Challenges in maintaining OS projects 17:56 Developer marketing with Swirl AI Connect 21:13 What you (listener) can help with 22:52 W...

Sid Probstein - Creator of SWIRL - Search in siloed data with LLMs

July 22, 2023 05:03 - 1 hour - 84.6 MB

Topics: 00:00 Intro 00:22 Quick demo of SWIRL on the summary transcript of this episode 01:29 Sid’s background 08:50 Enterprise vs Federated search 17:48 How vector search covers for missing folksonomy in enterprise data 26:07 Relevancy from vector search standpoint 31:58 How ChatGPT improves programmer’s productivity 32:57 Demo! 45:23 Google PSE 53:10 Ideal user of SWIRL 57:22 Where SWIRL sits architecturally 1:01:46 How to evolve SWIRL with domain expertise 1:04:59 Reasons to...

Atita Arora - Search Relevance Consultant - Revolutionizing E-commerce with Vector Search

May 17, 2023 08:12 - 1 hour - 84.5 MB

Topics: 00:00 Intro 02:20 Atita’s path into search engineering 09:00 When it’s time to contribute to open source 12:08 Taking management role vs software development 14:36 Knowing what you like (and coming up with a Solr course) 19:16 Read the source code (and cook) 23:32 Open Bistro Innovations Lab and moving to Germany 26:04 Affinity to Search world and working as a Search Relevance Consultant 28:39 Bringing vector search to Chorus and Querqy 34:09 What Atita learnt from Eric Pug...

Connor Shorten - Research Scientist, Weaviate - ChatGPT, LLMs, Form vs Meaning

March 11, 2023 19:38 - 1 hour - 85.3 MB

Topics: 00:00 Intro 01:54 Things Connor learnt in the past year that changed his perception of Vector Search 02:42 Is search becoming conversational? 05:46 Connor asks Dmitry: How Large Language Models will change Search? 08:39 Vector Search Pyramid 09:53 Large models, data, Form vs Meaning and octopus underneath the ocean 13:25 Examples of getting help from ChatGPT and how it compares to web search today 18:32 Classical search engines with URLs for verification vs ChatGPT-style answ...

Evgeniya Sukhodolskaya - Data Advocate, Toloka - Data at the core of all the cool ML

January 28, 2023 10:19 - 1 hour - 79.4 MB

Toloka’s support for Academia: grants and educator partnerships https://toloka.ai/collaboration-with-educators-form https://toloka.ai/research-grants-form These are pages leading to them: https://toloka.ai/academy/education-partnerships https://toloka.ai/grants Topics: 00:00 Intro 01:25 Jenny’s path from graduating in ML to a Data Advocate role 07:50 What goes into the labeling process with Toloka 11:27 How to prepare data for labeling and design tasks 16:01 Jenny’s take on why Re...

Yaniv Vaknin - Director of Product, Searchium - Hardware accelerated vector search

December 21, 2022 20:35 - 1 hour - 67.3 MB

00:00 Introduction 01:11 Yaniv’s background and intro to Searchium & GSI 04:12 Ways to consume the APU acceleration for vector search 05:39 Power consumption dimension in vector search 07:40 Place of the platform in terms of applications, use cases and developer experience 12:06 Advantages of APU Vector Search Plugins for Elasticsearch and OpenSearch compared to their own implementations 17:54 Everyone needs to save: the economic profile of the APU solution 20:51 Features and ANN algo...

Doug Turnbull - Staff Relevance Engineer, Shopify - Search as a constant experimentation cycle

October 01, 2022 07:32 - 1 hour - 85.5 MB

Topics: 00:00 Intro 01:30 Doug’s story in Search 04:55 How Quepid came about 10:57 Relevance as product at Shopify: challenge, process, tools, evaluation 15:36 Search abandonment in Ecommerce 21:30 Rigor in A/B testing 23:53 Turn user intent and content meaning into tokens, not words into tokens 32:11 Use case for vector search in Maps. What about search in other domains? 38:05 Expanding on dense approaches 40:52 Sparse, dense, hybrid anyone? 48:18 Role of HNSW, scalability and ne...

Malte Pietsch - CTO, Deepset - Passion in NLP and bridging the academia-industry gap with Haystack

August 30, 2022 07:27 - 1 hour - 78.9 MB

Topics: 00:00 Introduction 01:12 Malte’s background 07:58 NLP crossing paths with Search 11:20 Product discovery: early stage repetitive use cases pre-dating Haystack 16:25 Acyclic directed graph for modeling a complex search pipeline 18:22 Early integrations with Vector Databases 20:09 Aha!-use case in Haystack 23:23 Capabilities of Haystack today 30:11 Deepset Cloud: end-to-end deployment, experiment tracking, observability, evaluation, debugging and communicating with stakeholder...

Max Irwin - Founder, MAX.IO - On economics of scale in embedding computation with Mighty

June 16, 2022 18:27 - 1 hour - 102 MB

00:00 Introduction 01:10 Max's deep experience in search and how he transitioned from structured data 08:28 Query-term dependence problem and Max's perception of the Vector Search field 12:46 Is vector search a solution looking for a problem? 20:16 How to move embeddings computation from GPU to CPU and retain GPU latency? 27:51 Plug-in neural model into Java? Example with a Hugging Face model 33:02 Web-server Mighty and its philosophy 35:33 How Mighty compares to in-DB embedding layer...

Grant Ingersoll - Fractional CTO, Leading Search Consultant - Engineering Better Search

June 09, 2022 14:51 - 1 hour - 66.6 MB

Vector Podcast Live Topics: 00:00 Kick-off introducing co:rise study platform 03:03 Grant’s background 04:58 Principle of 3 C’s in the life of a CTO: Code, Conferences and Customers 07:16 Principle of 3 C’s in the Search Engine development: Content, Collaboration and Context 11:51 Balance between manual tuning in pursuit to learn and Machine Learning 15:42 How to nurture intuition in building search engine algorithms 18:51 How to change the approach of organizations to true experimen...

Daniel Tunkelang - Leading Search Consultant - Leveraging ML for query and content understanding

May 23, 2022 13:00 - 1 hour - 57.2 MB

Topics: 00:00 Kick-off by Judy Zhu 01:33 Introduction by Dmitry Kan and his bio! 03:03 Daniel’s background 04:46 “Science is the difference between instinct and strategy” 07:41 Search as a personal learning experience 11:53 Why do we need Machine Learning in Search, or can we use manually curated features? 16:47 Swimming up-stream from relevancy: query / content understanding and where to start? 23:49 Rule-based vs Machine Learning approaches to Query Understanding: Pareto principle...

Yusuf Sarıgöz - AI Research Engineer, Qdrant - Getting to know your data with metric learning

May 07, 2022 20:37 - 1 hour - 64.1 MB

Topics: 00:00 Intro 01:03 Yusuf’s background 03:00 Multimodal search in tech and humans 08:53 CLIP: discovering hidden semantics 13:02 Where to start to apply metric learning in practice. AutoEncoder architecture included! 19:00 Unpacking it further: what is metric learning and the difference with deep metric learning? 28:50 How Deep Learning allowed us to transition from pixels to meaning in the images 32:05 Increasing efficiency: vector compression and quantization aspects 34:25 Y...

Jo Bergum - Distinguished Engineer, Yahoo! Vespa - Journey of Vespa from Sparse into Neural Search

April 12, 2022 12:29 - 1 hour - 79.3 MB

Topics: 00:00 Introduction 01:21 Jo Kristian’s background in Search / Recommendations since 2001 in Fast Search & Transfer (FAST) 03:16 Nice words about Trondheim 04:37 Role of NTNU in supplying search talent and having roots in FAST 05:33 History of Vespa from keyword search 09:00 Architecture of Vespa and programming language choice: C++ (content layer), Java (HTTP requests and search plugins) and Python (pyvespa) 13:45 How Python API enables evaluation of the latest ML models with...

Amin Ahmad - CTO, Vectara - Algolia / Elasticsearch-like search product on neural search principles

February 16, 2022 16:14 - 1 hour - 65.1 MB

Update: ZIR.AI has relaunched as Vectara: https://vectara.com/ Topics: 00:00 Intro 00:54 Amin’s background at Google Research and affinity to NLP and vector search field 05:28 Main focus areas of ZIR.AI in neural search 07:26 Does the company offer neural network training to clients? Other support provided with ranking and document format conversions 08:51 Usage of open source vs developing own tech 10:17 The core of ZIR.AI product 14:36 API support, communication protocols and P95/P...

Yury Malkov - Staff Engineer, Twitter - Author of the most adopted ANN algorithm HNSW

January 31, 2022 09:41 - 1 hour - 82.5 MB

Topics: 00:00 Introduction 01:04 Yury’s background in laser physics, computer vision and startups 05:14 How Yury entered the field of nearest neighbor search and his impression of it 09:03 “Not all Small Worlds are Navigable” 10:10 Gentle introduction into the theory of Small World Navigable Graphs and related concepts 13:55 Further clarification on the input constraints for the NN search algorithm design 15:03 What did not work in NSW algorithm and how did Yury set up to invent new a...

Joan Fontanals - Principal Engineer - Jina AI

January 19, 2022 21:02 - 56 minutes - 52 MB

Topics: 00:00 Intro 00:42 Joan's background 01:46 What attracted Joan's attention in Jina as a company and product? 04:39 Main area of focus for Joan in the product 05:46 How Open Source model works for Jina? 08:38 Deeper dive into Jina.AI as a product and technology stack 11:57 Does Jina fit the use cases of smaller / mid-size players with smaller amount of data? 13:45 KNN/ANN algorithms available in Jina 16:05 BigANN competition and BuddyPQ, increasing 12% in recall over FAISS 17...

Tom Lackner - VP Engineering - Classic.com - on Qdrant, NFT, challenges and joys of ML engineering

December 23, 2021 16:01 - 47 minutes - 43.4 MB

Show notes: - The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction https://research.google/pubs/pub46555/ - IEEE MLOps Standard for Ethical AI https://docs.google.com/document/d/1x... - Qdrant: https://qdrant.tech/ - Elixir connector for Qdrant by Tom: https://github.com/tlack/exqdr - Other 6 vector databases: https://towardsdatascience.com/milvus... - ByT5: Towards a token-free future with pre-trained byte-to-byte models https://arxiv.org/abs/2105.13626...

Connor Shorten - PhD Researcher - Florida Atlantic University & Founder at Henry AI Labs

December 23, 2021 13:32 - 59 minutes - 54.1 MB

Show notes: - On the Measure of Intelligence by François Chollet - Part 1: Foundations (Paper Explained) [YouTube](https://www.youtube.com/watch?v=3_qGr...) - [2108.07258 On the Opportunities and Risks of Foundation Models](https://arxiv.org/abs/2108.07258) - [2005.11401 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) - Negative Data Augmentation: https://arxiv.org/abs/2102.05113 - Beyond Accuracy: Behavioral Testing of NLP models wit...

Filip Haltmayer (Data Engineer, Zilliz) on Milvus vector database and working with clients

December 23, 2021 13:28 - 1 hour - 66.5 MB

Order your Milvus t-shirt / hoodie! https://milvus.typeform.com/to/IrnLAgui Thanks Filip for arranging. Show notes: - Milvus DB: https://milvus.io/ - Not All Vector Databases Are Made Equal: https://towardsdatascience.com/milvus... - Milvus talk at Haystack: https://www.youtube.com/watch?v=MLSMs... - BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models https://arxiv.org/abs/2104.08663 - End-to-End Environmental Sound Classification using a 1D Con...

Bob van Luijt (CEO, SeMI) on the Weaviate vector search engine

December 23, 2021 13:17 - 1 hour - 82.8 MB

1. Layering problem: www.edge.org/conversation/sean_…-layers-of-reality 2. Podcast with Etienne Dilocker (SeMI Technologies Co-Founder & CTO): www.youtube.com/watch?v=6lkanzOqhDs 3. SOC2: linfordco.com/blog/soc-1-vs-soc-2-audit-reports/ 4. Dmitry's post on 7 Vector Databases: towardsdatascience.com/milvus-pineco…-9c65a3bd0696 5. Billion-Scale ANN Challenge: big-ann-benchmarks.com/index.html 6. Weaviate Introduction: www.semi.technology/developers/weaviate/current/ Newsletter: www.semi.t...

Greg Kogan - Pinecone - Vector Podcast with Dmitry Kan

December 06, 2021 18:00 - 43 minutes - 40.3 MB

Show notes: 1. Pinecone 2.0: https://www.pinecone.io/learn/pinecon... It is GA and free: https://www.pinecone.io/learn/v2-pric... 2. Get your “Love Thy Nearest Neighbour” t-shirt :) shoot an email to [email protected] 3. Billion-Scale Approximate Nearest Neighbour Search Challenge: https://big-ann-benchmarks.com/index.... 4. ANNOY: https://github.com/spotify/annoy 5. FAISS: https://github.com/facebookresearch/f... 6. HNSW: https://github.com/nmslib/hnswlib 7. “How Zero Results Are K...

Twitter Mentions

@dmitrykan 5 Episodes
@srvbhr 5 Episodes
@chelseabfinn 1 Episode
@pinecone_io 1 Episode
@psh_lewis 1 Episode
@cshorten30 1 Episode
@mmbronstein 1 Episode
@_srbhr_ 1 Episode
@srbhr_ 1 Episode
@grigoriy_kogan 1 Episode
@bobvanluijt 1 Episode
@jeffclune 1 Episode
@jobergum 1 Episode