
Google Research Identifies a Bottleneck in Hyperscale Approaches to AI

By New York Tech Editorial Team
October 7, 2021
in AI & Robotics

A new paper from Google Research indicates that the current trend towards curating very high-volume datasets may be counterproductive to developing effective artificial intelligence systems. In fact, the research suggests that better machine learning products may emerge from training on less accurate (i.e. technically ‘worse’) datasets.

If the principles obtained by the researchers hold, it means that ‘hyperscale’ datasets such as the recently released LAION-400M (which contains 400 million text/image pairs), and the data behind the GPT-3 language model (which itself contains 175 billion parameters), are potentially subject to a kind of ‘thermal limit’ in traditional and popular machine learning architectures and methodologies, whereby the sheer volume of data ‘saturates’ downstream applications and prevents them from generalizing in a useful way.

The researchers also propose alternative methods for rethinking hyperscale dataset architecture, in order to redress the imbalance.

The paper states:

‘Delving deeper to understand the reasons that give rise to these phenomena, we show that the saturation behavior we observe is closely related to the way that representations evolve through the layers of the models. We showcase an even more extreme scenario where performance on upstream and downstream are at odds with each other. That is, to have a better downstream performance, we need to hurt upstream accuracy.’

The study is titled Exploring the Limits of Large Scale Pre-training, and comes from four authors at Google Research.

Investigating ‘Saturation’

The authors challenge the prevailing assumptions about the relationship between machine learning and data in the hyperscale era: that scaling up models and data size notably improves performance (a belief cemented in the hype over GPT-3 since its launch); and that this improved performance ‘passes through’ to downstream tasks in a linear (i.e. desirable) way, so that the on-device algorithms eventually launched to market, derived from the otherwise ungovernably huge datasets and undistilled trained models, benefit fully from the insights of the full-sized, upstream architectures.

‘These views,’ the researchers note, ‘suggest that spending compute and research effort on improving the performance on one massive corpus would pay off because that would enable us to solve many downstream tasks almost for free.’

But the paper contends that a lack of computing resources and the subsequent ‘economical’ methods of model evaluation are contributing to a false impression of the relationship dynamics between data volume and useful AI systems. The authors identify this habit as ‘a major shortcoming’, since the research community typically assumes that local (positive) results will translate into useful later implementations:

‘[Due] to compute limitations, performance for different choices of hyper-parameter values is not reported. Scaling plots seem more favorable if the hyper-parameter chosen for each scale is fixed or determined by a simple scaling function.’

The researchers further state that many scaling studies are measured not against absolute scales, but as incremental improvements against the state-of-the-art (SotA), observing that ‘there is no reason, a priori, for the scaling to hold outside of the studied range’.
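
This caveat about extrapolation is straightforward to picture. Below is a minimal, hypothetical sketch (using made-up numbers, not the paper’s data) that fits a simple power-law scaling curve to a few synthetic measurements and then extends it well past the measured range, where nothing guarantees the trend continues to hold:

```python
# Hypothetical illustration: fit a power-law scaling curve to synthetic
# points, then extrapolate beyond the range that was actually measured.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # error = a * n^(-b) + c, a common functional form in scaling studies
    return a * np.power(n, -b) + c

# Synthetic "measured" points: dataset sizes vs. upstream error rates
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
errors = np.array([0.42, 0.36, 0.31, 0.27, 0.24])

params, _ = curve_fit(power_law, sizes, errors, p0=[1.0, 0.1, 0.1], maxfev=10000)

# Extrapolating two orders of magnitude past the studied range: the fitted
# curve keeps falling smoothly, but nothing guarantees reality follows it.
for n in [1e8, 1e9, 1e10]:
    print(f"n={n:.0e}  predicted error={power_law(n, *params):.3f}")
```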

Pre-Training

The paper addresses the practice of ‘pre-training’, a measure designed to save compute resources and cut down on the often horrendous timescales needed to train a model from scratch on large-scale data. Pre-training snapshots handle the ‘ABCs’ of the way that data within one domain becomes generalized during training, and are commonly used across a range of machine learning sectors and specialties, from Natural Language Processing (NLP) through to deepfakes.
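
As a rough illustration of the workflow described above (placeholder data and hyper-parameters, not the configuration used in the paper), the sketch below loads a pre-trained image classifier, replaces its classification head, and fine-tunes it on a downstream task; the checkpoint supplies the generic ‘ABCs’, and only the task-specific layers are learned from scratch:

```python
# Rough sketch of pre-train-then-fine-tune; placeholder dataset and
# hyper-parameters, not those used in the Google Research study.
import torch
import torch.nn as nn
from torchvision import models

# Start from an upstream checkpoint rather than training from scratch
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Swap the upstream classification head for one sized to the downstream task
num_downstream_classes = 10  # placeholder
model.fc = nn.Linear(model.fc.in_features, num_downstream_classes)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def fine_tune(downstream_loader, epochs=5):
    # `downstream_loader` is assumed to yield (images, labels) for the target task
    model.train()
    for _ in range(epochs):
        for images, labels in downstream_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```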

Previous academic research has found that pre-training can notably improve model robustness and accuracy, but the new paper suggests that the complexity of features, even in relatively short-trained pre-training templates, might be of more benefit if shunted down the line to later processes in the pipeline.

However, this cannot happen if researchers continue to depend on pre-trained models that follow current best practice for learning rates, which, the research concludes, can notably affect the ultimate accuracy of the final applications. In this respect, the authors note that ‘one cannot hope to find one pre-trained checkpoint that performs well on all possible downstream tasks’.
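
One practical reading of that conclusion is that hyper-parameters such as the fine-tuning learning rate need to be chosen per downstream task rather than inherited from a single upstream recipe. The hypothetical sketch below (not drawn from the paper) simply sweeps a handful of learning rates for each task and keeps whichever scores best on that task’s validation set; `fine_tune_and_evaluate` is an assumed helper, not a real library call:

```python
# Hypothetical sketch: pick a fine-tuning learning rate per downstream task
# instead of reusing one upstream recipe everywhere.
LEARNING_RATES = [3e-2, 1e-2, 3e-3, 1e-3, 3e-4]
DOWNSTREAM_TASKS = ["cifar10", "flowers102", "pets"]  # placeholder task names

def select_learning_rates(fine_tune_and_evaluate):
    # `fine_tune_and_evaluate(task, lr)` is an assumed helper that fine-tunes
    # a copy of the pre-trained checkpoint and returns validation accuracy.
    best = {}
    for task in DOWNSTREAM_TASKS:
        scores = {lr: fine_tune_and_evaluate(task, lr) for lr in LEARNING_RATES}
        best[task] = max(scores, key=scores.get)
        print(f"{task}: best lr={best[task]}, val acc={scores[best[task]]:.3f}")
    return best
```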

The Study

To establish the saturation effect, the authors conducted 4800 experiments on Vision Transformers, ResNets and MLP-Mixers, each with a varying number of parameters, from 10 million to 10 billion, all trained on the highest-volume datasets available in the respective sectors, including ImageNet21K and Google’s own JFT-300M.

The results, the paper claims, show that data diversity should be considered as an additional axis when attempting to ‘scale up’ data, model parameters and compute time. As it stands, the heavy concentration of training resources (and researcher attention) on the upstream section of an AI pipeline is effectively blasting downstream applications with an avalanche of parameters up to a point of ‘saturation’, lowering the capability of deployed algorithms to navigate through features and perform inference or effect transformations.
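
The ‘saturation’ the authors describe can be pictured as downstream accuracy approaching a ceiling even while upstream accuracy keeps climbing. As a purely illustrative sketch (made-up numbers and an invented functional form, not the paper’s measurements), the snippet below fits such a saturating curve to synthetic upstream/downstream accuracy pairs and reports the implied ceiling:

```python
# Purely illustrative: fit a saturating curve to synthetic
# (upstream accuracy, downstream accuracy) pairs.
import numpy as np
from scipy.optimize import curve_fit

def saturating(x, ceiling, rate, offset):
    # Downstream accuracy rises with upstream accuracy but flattens out
    # toward `ceiling` instead of improving without bound.
    return ceiling * (1.0 - np.exp(-rate * (x - offset)))

upstream = np.array([0.55, 0.62, 0.70, 0.76, 0.81, 0.85, 0.88])     # synthetic
downstream = np.array([0.48, 0.58, 0.66, 0.70, 0.72, 0.73, 0.735])  # synthetic

params, _ = curve_fit(saturating, upstream, downstream, p0=[0.75, 5.0, 0.4])
ceiling, rate, offset = params
print(f"estimated downstream ceiling ~ {ceiling:.3f}")
```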

The paper concludes:

‘Through an extensive study, we establish that as we improve the performance of the upstream task either by scaling up or hyper-parameter and architectural choices, the performance of downstream tasks shows a saturating behaviour. In addition, we provide strong empirical evidence that, contrary to the common narrative, scaling does not lead to a one-model-fits-all solution.’

 
