New York Tech Media

Identifying Sponsored Content in News Sites With Machine Learning

By New York Tech Editorial Team
November 11, 2021
in AI & Robotics

Researchers from the Netherlands have developed a machine learning method that can distinguish sponsored or otherwise paid content on news platforms with an accuracy of more than 90%. The work responds to growing interest from advertisers in ‘native’ advertising formats that are difficult to distinguish from ‘real’ journalistic output.

The new paper, titled Distinguishing Commercial from Editorial Content in News, comes from researchers at Leiden University.

Commercial (red) and editorial (blue) sub-graphs emerging from analysis of the data. Source: https://arxiv.org/pdf/2111.03916.pdf


The authors observe that more serious publications, which can more easily dictate terms to advertisers, make a reasonable effort to distinguish ‘partner content’ from the general run of news and analysis. Even so, standards are slowly but inexorably shifting towards closer integration between the editorial and commercial teams at an outlet, which the authors consider an alarming and negative trend.

‘The ability to disguise content, willingly or unwillingly, and the probability that advertorials are not recognized as such even if properly labelled is significant. Marketers call it native [advertising] for a reason.’

Some current examples of native advertising, variously called ‘partner content’, ‘brand content’, and many other appellations designed to subtly obscure the distinction between editorial and commercially placed content on journalistic platforms.

The work was carried out as part of a broader investigation into networked news culture at the ACED Reverb Channel, based in Amsterdam, which concentrates on data-driven analysis of evolving journalistic trends.

Acquiring Data

To develop source data for the project, the authors used 1,000 articles and 1,000 advertorials from four Dutch news outlets and classified them based on their textual features. Since the dataset was relatively modest in size, the authors avoided high-scale approaches such as BERT, and instead evaluated the effectiveness of more classical machine learning frameworks, including Support Vector Machine (SVM), LinearSVC, Decision Tree, Random Forest, K-Nearest Neighbor (K-NN), Stochastic Gradient Descent (SGD) and Naïve Bayes.
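To make the idea of a ‘classical’ text classifier concrete, the following is a minimal sketch of one of the listed approaches, Naïve Bayes, written in plain Python over word counts. It is not the authors’ code: the class name, the tokenizer and the four English toy sentences are assumptions made purely for illustration, and the paper’s own features and Dutch-language data are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is every word seen in any class during training.
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            # Log prior plus summed log likelihoods of the known words.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.total_words[c] + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Toy data, invented for this sketch (not taken from the paper's corpus).
train_texts = [
    "The city council voted on the new housing budget on Tuesday",
    "Police investigate a burglary in the city centre",
    "Discover the best deals on our new energy contract today",
    "Order now and save with this exclusive offer from our partner",
]
train_labels = ["editorial", "editorial", "commercial", "commercial"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("Save today with an exclusive energy offer"))
print(model.predict("The council voted on the police budget"))
```

With only a few thousand documents, a count-based model like this can be trained in seconds, which is in keeping with the authors’ stated reason for avoiding larger approaches such as BERT.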

The Reverb Channel corpus was able to furnish the 1,000 necessary ‘straight’ articles, but the authors had to scrape advertorials directly from the four Dutch websites featured. The obtained data is available in limited form (due to copyright concerns) on GitHub, together with some of the Python code used to obtain and evaluate the data.

The four publications studied were the politically conservative Nu.nl, the more progressive Telegraaf, NRC, and the business journal De Ondernemer. Each publication was equally represented in the data.

It was necessary to identify and discount potential ‘leakers’ in the lexicon formed by the research – words which might appear in both types of content with little distinction between their frequency and usage, in order to establish clear patterns for genuinely native and sponsored content.
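One way to read the description above is as a frequency comparison: a word counts as shared when it occurs in both kinds of content at a similar relative rate. The sketch below follows that reading only. The function names, the log-ratio threshold and the toy sentences are all assumptions for illustration, since the criterion the researchers actually applied is not given here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def find_shared_words(editorial_texts, commercial_texts,
                      max_log_ratio=math.log(2), min_count=1):
    """Return words whose relative frequency is similar in both classes.

    A word qualifies when it occurs at least `min_count` times in each
    class and its relative frequencies differ by at most a factor of
    exp(max_log_ratio). Both thresholds are hypothetical defaults.
    """
    ed = Counter(t for text in editorial_texts for t in tokenize(text))
    co = Counter(t for text in commercial_texts for t in tokenize(text))
    ed_total = sum(ed.values())
    co_total = sum(co.values())
    shared = set()
    for word in ed.keys() & co.keys():
        if ed[word] < min_count or co[word] < min_count:
            continue
        ratio = (ed[word] / ed_total) / (co[word] / co_total)
        if abs(math.log(ratio)) <= max_log_ratio:
            shared.add(word)
    return shared


def drop_words(text, words):
    """Return the tokens of `text` with the given words removed."""
    return [t for t in tokenize(text) if t not in words]


# Toy data, invented for this sketch.
editorial = [
    "the minister said the plan will start this year",
    "the report said prices rose this year",
]
commercial = [
    "our plan helps you save this year",
    "discover the offer that helps you save",
]

shared = find_shared_words(editorial, commercial)
print(sorted(shared))
print(drop_words("our plan helps you save this year", shared))
```

Removing such words before training leaves the classifier with vocabulary that actually differs between the two classes, at the cost of a threshold that has to be tuned for each corpus.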

Results

Across the methods tested, the best results were obtained by SVM, LinearSVC, Random Forest and SGD, and the researchers proceeded with SVM for further analysis.

The best-performing model exceeded 90% accuracy across the corpus. The researchers note, however, that clear classification becomes more difficult for B2B-oriented publications, where the lexical overlap between perceived ‘real’ and ‘sponsored’ content is much greater, perhaps because the native style of business language is already more subjective than general reporting and analysis conventions, and can more easily conceal an agenda.

t-Distributed Stochastic Neighbor Embedding (t-SNE) plots for separation of real and sponsored content across the four publications.


Is Sponsored Content ‘Fake News’?

The authors’ research suggests that their project is novel in the field of news content analysis. Frameworks capable of identifying sponsored content could pave the way to developing year-on-year monitoring of the balance between objective journalism and the growing tranche of ‘native advertising’ which sits in almost the same context in most publications, using the same visual cues (CSS stylesheets and other formatting) as general content.

In a certain sense, the study of poorly signposted sponsored content is emerging as a sub-field of research into ‘fake news’. Though most publishers recognize the need for separation of ‘church and state’, and the obligation to provide readers with clear divisions between paid and organically-generated content, the realities of the post-print journalistic scene, and increased dependence on advertisers, have turned the de-emphasis of sponsored indicators into a fine art in UI psychology. Sometimes the rewards of running sponsored content are tempting enough to risk a major public relations disaster.

In 2015 the social media and competitive benchmarking platform Quintly offered an AI-based detection method to determine if a post on Facebook is sponsored, claiming an accuracy rate of 96%. The following year, a study from the University of Georgia contended that the way publishers handle the declaration of sponsored content could be ‘complicit with deception’.

In 2017 MediaShift, an organization that examines the intersection between media and technology, observed the growing extent to which the New York Times monetizes its operations through its branded content studio, T Brand Studio, claiming diminishing levels of transparency around sponsored content, with the tacitly intentional result that readers cannot easily tell whether or not content is organically generated.

In 2020, another research initiative from the Netherlands developed machine learning classifiers to automatically identify Russian state-funded news appearing in Serbian news platforms. Further, it was estimated in 2019 that Forbes’ ‘media content solutions’ account for 40% of its total revenue through BrandVoice, the content studio launched by the publisher in 2010.

 


© 2024 All Rights Reserved - New York Tech Media
