New York Tech Media

Dr. Ram Sriharsha, VP of Engineering at Pinecone – Interview Series

By New York Tech Editorial Team
February 4, 2023

Dr. Ram Sriharsha is the VP of Engineering and R&D at Pinecone.

Before joining Pinecone, Ram had VP roles at Yahoo, Databricks, and Splunk. At Yahoo, he was first a principal software engineer and then a research scientist; at Databricks, he was the product and engineering lead for the unified analytics platform for genomics; and in his three years at Splunk, he played multiple roles, including Sr. Principal Scientist, VP of Engineering, and Distinguished Engineer.

Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines vector search libraries, capabilities such as filtering, and distributed infrastructure to provide high performance and reliability at any scale.

What initially attracted you to machine learning?

High-dimensional statistics, learning theory, and topics like that were what attracted me to machine learning. They are mathematically well defined, can be reasoned about, and offer fundamental insights into what learning means and how to design algorithms that can learn efficiently.

Previously you were Vice President of Engineering at Splunk, a data platform that helps turn data into action for Observability, IT, Security and more. What were some of your key takeaways from this experience?

I hadn’t realized until I got to Splunk how diverse the use cases in enterprise search are: people use Splunk for log analytics, observability, and security analytics, among a myriad of other use cases. What many of these use cases have in common is the need to detect similar events, or highly dissimilar (anomalous) events, in unstructured data. This turns out to be a hard problem, and traditional means of searching through such data aren’t very scalable. During my time at Splunk, I initiated research on how we could use machine learning (and deep learning) for log mining, security analytics, and so on. Through that work, I came to realize that vector embeddings and vector search would end up being a fundamental primitive for new approaches to these domains.

Could you describe for us what vector search is?

In traditional search (otherwise known as keyword search), you are looking for keyword matches between a query and documents (these could be tweets, web documents, legal documents, what have you). To do this, you split your query into tokens, retrieve the documents that contain each token, and then merge and rank them to determine the most relevant documents for the query.

The main problem, of course, is that to get relevant results, your query has to have keyword matches in the document. A classic example: if you search for “pop” you will match “pop music”, but you will not match “soda”, because there is no keyword overlap between “pop” and documents containing “soda”, even though in many areas of the US “pop” colloquially means the same thing as “soda”.
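The keyword matching described above can be illustrated with a minimal sketch. The tokenizer, the scoring rule (count of matching tokens), and the two toy documents are assumptions made for illustration; production keyword search uses far more sophisticated tokenization and ranking.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Rank documents by how many query tokens they contain."""
    scores = defaultdict(int)
    for token in tokenize(query):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Toy corpus (hypothetical documents).
docs = {
    "d1": "best pop music of the year",
    "d2": "soda sales rise in the midwest",
}
index = build_inverted_index(docs)
pop_hits = keyword_search(index, "pop")
```

Here `pop_hits` contains only `d1`: the document about soda is never retrieved, because it shares no token with the query.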

In vector search, you start by converting both queries and documents to vectors in some high-dimensional space. This is usually done by passing the text through a deep learning model such as OpenAI’s LLMs or other language models. What you get as a result is an array of floating-point numbers that can be thought of as a vector in that space.

The core idea is that nearby vectors in this high dimensional space are also semantically similar. Going back to our example of “soda” and “pop”, if the model is trained on the right corpus, it is likely to consider “pop” and “soda” semantically similar and thereby the corresponding embeddings will be close to each other in the embedding space. If that is the case, then retrieving nearby documents for a given query becomes the problem of searching for the nearest neighbors of the corresponding query vector in this high dimensional space.
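As a rough sketch of that nearest-neighbor step, the snippet below ranks documents by cosine similarity to a query vector using an exhaustive scan. The three-dimensional vectors are hand-made stand-ins, not the output of any real model, and cosine similarity is just one common choice of closeness measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query_vec, doc_vecs, k=2):
    """Return the k document keys most similar to the query (exhaustive scan)."""
    ranked = sorted(
        doc_vecs,
        key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
        reverse=True,
    )
    return ranked[:k]

# Hand-made toy embeddings (assumption); axes loosely mean
# [beverage, music, finance]. A real model would produce these.
doc_vecs = {
    "soda sales rise": [0.9, 0.1, 0.3],
    "pop music charts": [0.1, 0.9, 0.1],
    "bank earnings report": [0.0, 0.1, 0.9],
}
# Pretend embedding of the query "pop" used in the beverage sense.
query_vec = [0.8, 0.3, 0.1]
top = nearest_neighbors(query_vec, doc_vecs, k=1)
```

With these made-up vectors, the closest document is the one about soda, even though the query and that document share no keyword.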

Could you describe what the vector database is and how it enables the building of high-performance vector search applications?

A vector database stores, indexes and manages these embeddings (or vectors). The main challenges a vector database solves are:

  • Building an efficient search index over vectors to answer nearest neighbor queries
  • Building efficient auxiliary indices and data structures to support query filtering. For example, if you wanted to search over only a subset of the corpus, you should be able to leverage the existing search index without having to rebuild it
  • Supporting efficient updates and keeping both the data and the search index fresh, consistent, durable, etc.
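To make these three responsibilities concrete, here is a toy in-memory sketch. `TinyVectorStore` and its methods are hypothetical names invented for this example and are not Pinecone's API; an exhaustive scan stands in for a real search index, and durability and consistency are not modeled at all.

```python
import math

class TinyVectorStore:
    """Hypothetical in-memory store; a full scan stands in for a real index."""

    def __init__(self):
        self._items = {}  # id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        """Insert a vector, or overwrite it if the id already exists."""
        self._items[item_id] = (list(vector), dict(metadata or {}))

    def delete(self, item_id):
        """Remove a vector; deleting a missing id is a no-op."""
        self._items.pop(item_id, None)

    def query(self, vector, top_k=3, where=None):
        """Return up to top_k ids whose metadata matches every `where` pair."""
        where = where or {}
        scored = []
        for item_id, (vec, meta) in self._items.items():
            if any(meta.get(key) != value for key, value in where.items()):
                continue  # filtered out before scoring
            scored.append((self._cosine(vector, vec), item_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [item_id for _, item_id in scored[:top_k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

store = TinyVectorStore()
store.upsert("a", [1.0, 0.0], {"lang": "en"})
store.upsert("b", [0.9, 0.1], {"lang": "de"})
store.upsert("c", [0.0, 1.0], {"lang": "en"})

unfiltered = store.query([1.0, 0.0], top_k=2)
english_only = store.query([1.0, 0.0], top_k=2, where={"lang": "en"})

# Updates are visible to the very next query: no rebuild step.
store.upsert("c", [1.0, 0.05], {"lang": "en"})
store.delete("a")
after_update = store.query([1.0, 0.0], top_k=1, where={"lang": "en"})
```

The scan makes filtering and freshness trivial here. The hard part a real vector database has to solve is keeping those same guarantees when the scan is replaced by an index that only looks at a small fraction of the data.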

What are the different types of machine learning algorithms that are used at Pinecone?

We generally work on approximate nearest neighbor search algorithms and develop new algorithms for efficiently updating, querying and otherwise dealing with large amounts of data in as cost effective a manner as possible.
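For readers unfamiliar with approximate nearest neighbor search, the sketch below shows one classic textbook technique, random-hyperplane locality-sensitive hashing. It is offered only as an illustration of the general idea of trading a little accuracy for much less work; it is not a description of the algorithms Pinecone uses. All names, the random data, and the parameter values are assumptions.

```python
import math
import random

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals; the seed keeps the sketch repeatable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes)

def build_buckets(vectors, planes):
    """Group vector ids by signature so a query scans only one bucket."""
    buckets = {}
    for vec_id, vec in vectors.items():
        buckets.setdefault(signature(vec, planes), []).append(vec_id)
    return buckets

def approximate_search(query, vectors, planes, buckets, k=1):
    """Rank only the query's bucket; true neighbors in other buckets are missed."""
    candidates = buckets.get(signature(query, planes), [])

    def similarity(vec_id):
        # Dot product equals cosine similarity for unit-length vectors.
        return sum(q * v for q, v in zip(query, vectors[vec_id]))

    return sorted(candidates, key=similarity, reverse=True)[:k]

def random_unit_vector(rng, dim):
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

# Synthetic data (assumption): 200 random unit vectors in 8 dimensions.
rng = random.Random(42)
dim = 8
vectors = {f"v{i}": random_unit_vector(rng, dim) for i in range(200)}
planes = make_hyperplanes(num_planes=4, dim=dim)
buckets = build_buckets(vectors, planes)
hits = approximate_search(vectors["v0"], vectors, planes, buckets, k=3)
```

With 4 hyperplanes there are at most 16 buckets, so a query compares against only a fraction of the 200 vectors. The price is recall: a close neighbor that falls just across a hyperplane is never considered, which is why practical systems use several hash tables or other index structures.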

We also work on algorithms that combine dense and sparse retrieval for improved search relevance.
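One simple way to combine the two signals, shown here purely as an assumption for illustration, is a weighted sum of a dense (embedding) score and a sparse (keyword-weight) score. The `alpha` parameter, the document layout, and the toy numbers are all hypothetical, and the sketch assumes the two scores are on comparable scales.

```python
def dot_dense(a, b):
    """Dot product of two dense vectors (lists of floats)."""
    return sum(x * y for x, y in zip(a, b))

def dot_sparse(a, b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(weight * b[term] for term, weight in a.items() if term in b)

def hybrid_score(query, doc, alpha=0.5):
    """alpha=1.0 is pure dense scoring; alpha=0.0 is pure sparse scoring."""
    dense = dot_dense(query["dense"], doc["dense"])
    sparse = dot_sparse(query["sparse"], doc["sparse"])
    return alpha * dense + (1 - alpha) * sparse

def rank(query, docs, alpha):
    """Return document ids ordered by descending hybrid score."""
    return sorted(docs, key=lambda d: hybrid_score(query, docs[d], alpha), reverse=True)

# Toy data (assumption): the query "pop" meant in the beverage sense.
query = {"dense": [1.0, 0.0], "sparse": {"pop": 1.0}}
docs = {
    "soda_doc": {"dense": [0.9, 0.1], "sparse": {"soda": 1.0}},
    "pop_music_doc": {"dense": [0.2, 0.9], "sparse": {"pop": 1.0, "music": 1.0}},
}

dense_only = rank(query, docs, alpha=1.0)
sparse_only = rank(query, docs, alpha=0.0)
```

With these toy numbers, dense-only ranking puts the soda document first while sparse-only ranking puts the pop music document first; intermediate values of `alpha` trade one signal off against the other.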

What are some of the challenges behind building scalable search?

While approximate nearest neighbor search has been researched for decades, we believe there is a lot left to be uncovered.

In particular, designing large-scale nearest neighbor search that is cost effective, performing efficient filtering at scale, and designing algorithms that support high-volume updates and generally fresh indexes are all challenging problems today.

What are some of the different types of use cases that this technology can be used for?

The spectrum of use cases for vector databases is growing by the day. Apart from semantic search, we also see them being used in image search, image retrieval, generative AI, security analytics, etc.

What is your vision for the future of search?

I think the future of search will be AI driven, and I don’t think this is very far off. In that future, I expect vector databases to be a core primitive. We like to think of vector databases as the long term memory (or the external knowledge base) of AI.

Thank you for the great interview. Readers who wish to learn more should visit Pinecone.


© 2024 All Rights Reserved - New York Tech Media
