New York Tech Media
Dr. Ram Sriharsha, VP of Engineering at Pinecone – Interview Series

by New York Tech Editorial Team
February 4, 2023
in AI & Robotics

Dr. Ram Sriharsha is the VP of Engineering and R&D at Pinecone.

Before joining Pinecone, Ram held roles at Yahoo, Databricks, and Splunk. At Yahoo, he was a principal software engineer and then a research scientist; at Databricks, he was the product and engineering lead for the unified analytics platform for genomics; and in his three years at Splunk, he held multiple roles, including Sr. Principal Scientist, VP of Engineering, and Distinguished Engineer.

Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines vector search libraries, capabilities such as filtering, and distributed infrastructure to provide high performance and reliability at any scale.

What initially attracted you to machine learning?

High-dimensional statistics, learning theory, and topics like that were what attracted me to machine learning. They are mathematically well defined, can be reasoned about, and offer fundamental insights into what learning means and how to design algorithms that can learn efficiently.

Previously you were Vice President of Engineering at Splunk, a data platform that helps turn data into action for Observability, IT, Security and more. What were some of your key takeaways from this experience?

I hadn’t realized until I got to Splunk how diverse the use cases in enterprise search are: people use Splunk for log analytics, observability, and security analytics, among myriad other use cases. What many of these use cases have in common is the idea of detecting similar events or highly dissimilar (anomalous) events in unstructured data. This turns out to be a hard problem, and traditional means of searching through such data aren’t very scalable. During my time at Splunk, I initiated research on how we could use machine learning (and deep learning) for log mining, security analytics, and so on. Through that work, I came to realize that vector embeddings and vector search would end up being a fundamental primitive for new approaches to these domains.

Could you describe for us what vector search is?

In traditional search (otherwise known as keyword search), you are looking for keyword matches between a query and documents (these could be tweets, web documents, legal documents, what have you). To do this, you split your query into tokens, retrieve the documents that contain each token, and then merge and rank the results to determine the most relevant documents for the query.

The main problem, of course, is that to get relevant results, your query has to have keyword matches in the document. A classic example: if you search for “pop”, you will match “pop music” but not “soda”, because there is no keyword overlap between “pop” and documents containing “soda”, even though colloquially, in many areas of the US, “pop” means the same as “soda”.
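
The tokenize, retrieve, merge, and rank steps described above can be sketched in a few lines of Python. This is a minimal illustration, not how any production search engine is built: the toy corpus, the whitespace tokenizer, and the function names are all hypothetical, and ranking is simply the number of matching tokens.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration.
DOCS = {
    "d1": "pop music charts this week",
    "d2": "best soda brands ranked",
    "d3": "pop stars and their music videos",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def keyword_search(query, index):
    """Retrieve documents per query token, merge them, and rank by match count."""
    hits = Counter()
    for token in tokenize(query):
        for doc_id in index.get(token, ()):
            hits[doc_id] += 1
    # Highest match count first; ties broken by document id for determinism.
    return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))

INDEX = build_inverted_index(DOCS)
print(keyword_search("pop", INDEX))   # d1 and d3 match; d2 ("soda") is never retrieved
print(keyword_search("soda", INDEX))  # only d2 matches
```

The query “pop” can never reach the “soda” document here, because retrieval depends entirely on shared tokens.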

In vector search, you start by converting both queries and documents into vectors in some high-dimensional space. This is usually done by passing the text through a deep learning model, such as OpenAI’s LLMs or other language models. The result is an array of floating-point numbers that can be thought of as a vector in that space.

The core idea is that nearby vectors in this high dimensional space are also semantically similar. Going back to our example of “soda” and “pop”, if the model is trained on the right corpus, it is likely to consider “pop” and “soda” semantically similar and thereby the corresponding embeddings will be close to each other in the embedding space. If that is the case, then retrieving nearby documents for a given query becomes the problem of searching for the nearest neighbors of the corresponding query vector in this high dimensional space.
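
As a rough sketch of the nearest-neighbor idea, the snippet below ranks a handful of documents by cosine similarity to a query vector. The three-dimensional vectors are hand-made stand-ins, not the output of any real model, and the exhaustive scan is only for illustration; cosine similarity is one common choice of closeness measure, assumed here.

```python
import math

# Hand-made 3-dimensional "embeddings", invented for illustration only.
# A real embedding model would produce hundreds or thousands of dimensions.
EMBEDDINGS = {
    "soda": [0.9, 0.1, 0.0],
    "pop music": [0.1, 0.9, 0.1],
    "sparkling water": [0.8, 0.0, 0.3],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query_vec, embeddings, k=2):
    """Exhaustively score every document and return the k most similar names."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Assumption: the model places the query "pop" (in the soft-drink sense) near "soda".
query_pop = [0.85, 0.2, 0.05]
print(nearest_neighbors(query_pop, EMBEDDINGS))
```

With these made-up vectors, “soda” comes back as the closest match even though it shares no keyword with the query.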

Could you describe what a vector database is and how it enables the building of high-performance vector search applications?

A vector database stores, indexes and manages these embeddings (or vectors). The main challenges a vector database solves are:

  • Building an efficient search index over vectors to answer nearest neighbor queries
  • Building efficient auxiliary indices and data structures to support query filtering. For example, if you want to search over only a subset of the corpus, you should be able to leverage the existing search index without having to rebuild it
  • Supporting efficient updates and keeping both the data and the search index fresh, consistent, durable, etc.
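
To make these three responsibilities concrete, here is a toy in-memory store. It is a sketch under heavy assumptions: the class name `TinyVectorStore` and its methods are hypothetical and are not Pinecone's API, a brute-force scan stands in for a real search index, and the metadata filter is a simple equality check applied during the scan.

```python
import math

class TinyVectorStore:
    """Hypothetical in-memory store; a brute-force scan stands in for a real index."""

    def __init__(self):
        self._items = {}  # item id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        """Insert or overwrite an item, so later queries see the fresh value."""
        self._items[item_id] = (list(vector), dict(metadata or {}))

    def delete(self, item_id):
        """Remove an item if it exists."""
        self._items.pop(item_id, None)

    def query(self, vector, top_k=3, where=None):
        """Return ids of the top_k most similar items whose metadata matches `where`."""
        where = where or {}
        results = []
        for item_id, (vec, meta) in self._items.items():
            if all(meta.get(key) == value for key, value in where.items()):
                results.append((self._cosine(vector, vec), item_id))
        results.sort(reverse=True)
        return [item_id for _, item_id in results[:top_k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

store = TinyVectorStore()
store.upsert("a", [1.0, 0.0], {"lang": "en"})
store.upsert("b", [0.9, 0.1], {"lang": "de"})
store.upsert("c", [0.0, 1.0], {"lang": "en"})

print(store.query([1.0, 0.0], top_k=2))                        # nearest neighbors overall
print(store.query([1.0, 0.0], top_k=2, where={"lang": "en"}))  # search only a subset

store.upsert("c", [0.95, 0.05], {"lang": "en"})                # update an existing item
print(store.query([1.0, 0.0], top_k=2))                        # results reflect the update
```

A real vector database has to provide the same behavior without scanning every vector, which is where the indexing and filtering challenges listed above come from.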

What are the different types of machine learning algorithms that are used at Pinecone?

We generally work on approximate nearest neighbor search algorithms, and we develop new algorithms for efficiently updating, querying, and otherwise dealing with large amounts of data in as cost-effective a manner as possible.

We also work on algorithms that combine dense and sparse retrieval for improved search relevance.
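
One simple way to combine dense and sparse signals is a weighted sum of the two similarity scores. The sketch below assumes that approach purely for illustration; the interview does not say how Pinecone combines them. The weight `alpha`, the function names, and the toy vectors are all hypothetical. Dense vectors are plain lists, and sparse vectors are token-to-weight dictionaries.

```python
def dense_dot(a, b):
    """Dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {token: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())

def hybrid_score(query, doc, alpha=0.5):
    """Weighted sum: alpha=1.0 is purely dense, alpha=0.0 is purely sparse."""
    dense = dense_dot(query["dense"], doc["dense"])
    sparse = sparse_dot(query["sparse"], doc["sparse"])
    return alpha * dense + (1.0 - alpha) * sparse

def rank(query, docs, alpha=0.5):
    """Return document ids ordered from highest to lowest hybrid score."""
    return sorted(docs, key=lambda doc_id: -hybrid_score(query, docs[doc_id], alpha))

# Toy data, invented for illustration.
DOCS = {
    "soda_doc": {"dense": [0.9, 0.1], "sparse": {"soda": 1.0}},
    "music_doc": {"dense": [0.2, 0.8], "sparse": {"pop": 1.0, "music": 1.0}},
}
QUERY = {"dense": [1.0, 0.0], "sparse": {"pop": 1.0}}

print(rank(QUERY, DOCS, alpha=1.0))  # dense only: semantic closeness decides
print(rank(QUERY, DOCS, alpha=0.0))  # sparse only: keyword overlap decides
```

Varying `alpha` shifts the ranking between semantic closeness and exact keyword overlap, which is the trade-off a hybrid approach is meant to balance.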

What are some of the challenges behind building scalable search?

While approximate nearest neighbor search has been researched for decades, we believe there is a lot left to be uncovered.

In particular, designing large-scale nearest neighbor search that is cost effective, performing efficient filtering at scale, and designing algorithms that support high-volume updates and keep indexes fresh are all challenging problems today.

What are some of the different types of use cases that this technology can be used for?

The spectrum of use cases for vector databases is growing by the day. Apart from semantic search, we also see them being used in image search, image retrieval, generative AI, security analytics, etc.

What is your vision for the future of search?

I think the future of search will be AI driven, and I don’t think this is very far off. In that future, I expect vector databases to be a core primitive. We like to think of vector databases as the long term memory (or the external knowledge base) of AI.

Thank you for the great interview. Readers who wish to learn more should visit Pinecone.
