The ‘Invisible’, Often Unhappy Workforce That’s Deciding the Future of AI

by New York Tech Editorial Team
December 13, 2021
in AI & Robotics

Two new reports, including a paper led by Google Research, express concern that the current trend of relying on a cheap, often disempowered pool of casually recruited global gig workers to create ground truth for machine learning systems could have major downstream implications for AI.

Among a range of conclusions, the Google study finds that crowdworkers’ own biases are likely to become embedded in the AI systems whose ground truths are based on their responses; that widespread unfair work practices on crowdworking platforms (including in the US) are likely to degrade the quality of those responses; and that the ‘consensus’ system (effectively a ‘mini-election’ over some piece of ground truth that will influence downstream AI systems) which currently resolves disputes can actually throw away the best and/or most informed responses.

That’s the bad news; the worse news is that pretty much all the remedies are expensive, time-consuming, or both.

Insecurity, Random Rejection, and Rancor

The first paper, from five Google researchers, is called Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation; the second, from two researchers at Syracuse University in New York, is called The Origin and Value of Disagreement Among Data Labelers: A Case Study of Individual Differences in Hate Speech Annotation.

The Google paper notes that crowdworkers – whose evaluations often form the defining basis of machine learning systems that may eventually affect our lives – frequently operate under a range of constraints that may affect how they respond to experimental assignments.

For instance, the current policies of Amazon Mechanical Turk allow requesters (those that give out the assignments) to reject an annotator’s work without accountability*:

‘[A] large majority of crowdworkers (94%) have had work that was rejected or for which they were not paid. Yet, requesters retain full rights over the data they receive regardless of whether they accept or reject it; Roberts (2016) describes this system as one that “enables wage theft”.

‘Moreover, rejecting work and withholding pay is painful because rejections are often caused by unclear instructions and the lack of meaningful feedback channels; many crowdworkers report that poor communication negatively affects their work.’

The authors recommend that researchers who use outsourced services to develop datasets should consider how a crowdworking platform treats its workers. They further note that in the United States, crowdworkers are classified as ‘independent contractors’, with the work therefore unregulated, and not covered by the minimum wage mandated by the Fair Labor Standards Act.

Context Matters

The paper also criticizes the use of ad hoc global labor for annotation tasks, without consideration of the annotator’s background.

Where budget allows, it’s common for researchers using AMT and similar crowdwork platforms to give the same task to four annotators, and abide by ‘majority rule’ on the results.

Contextual experience, the paper argues, is notably under-regarded. For instance, if a task question related to sexism is randomly distributed among three agreeing males aged 18-57 and one dissenting female aged 29, the males’ verdict wins, except in the relatively rare cases where researchers pay attention to the qualifications of their annotators.

Likewise, if a question on gang behavior in Chicago is distributed among a rural US female aged 36, a male Chicago resident aged 42, and two annotators from Bangalore and Denmark respectively, then in a standard outsourcing configuration the person likely most affected by the issue (the Chicago male) holds only a quarter share in the outcome.
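For concreteness, here is a minimal Python sketch of a standard majority-vote aggregation step; the annotator IDs and labels are hypothetical, not taken from either paper. It shows how the annotator arguably best placed to judge the Chicago question is simply outvoted:

    from collections import Counter

    def majority_label(annotations):
        # Return the most common label; any minority response,
        # however well-informed, is discarded.
        votes = Counter(label for _, label in annotations)
        return votes.most_common(1)[0][0]

    # Hypothetical responses to: 'Does this post describe gang activity?'
    annotations = [
        ('rural_US_F36',  'no'),
        ('chicago_M42',   'yes'),  # the annotator closest to the context
        ('bangalore_M27', 'no'),
        ('denmark_F31',   'no'),
    ]

    print(majority_label(annotations))  # 'no': the contextual expert is outvoted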

The researchers state:

‘[The] notion of “one truth” in crowdsourcing responses is a myth; disagreement between annotators, which is often viewed as negative, can actually provide a valuable signal. Secondly, since many crowdsourced annotator pools are socio-demographically skewed, there are implications for which populations are represented in datasets as well as which populations face the challenges of [crowdwork].

‘Accounting for skews in annotator demographics is critical for contextualizing datasets and ensuring responsible downstream use. In short, there is value in acknowledging, and accounting for, worker’s socio-cultural background — both from the perspective of data quality and societal impact.’

No ‘Neutral’ Opinions on Hot Topics

Even where the opinions of four annotators are not skewed, either demographically or by some other metric, the Google paper expresses concern that researchers are not accounting for the life experiences or philosophical disposition of annotators:

‘While some tasks tend to pose objective questions with a correct answer (is there a human face in an image?), oftentimes datasets aim to capture judgement on relatively subjective tasks with no universally correct answer (is this piece of text offensive?). It is important to be intentional about whether to lean on annotators’ subjective judgements.’

Regarding its specific focus on problems in labeling hate speech, the Syracuse paper notes that more categorical questions, such as Is there a cat in this photograph?, are notably different from asking a crowdworker whether a phrase is ‘toxic’:

‘Taking into account the messiness of social reality, people’s perceptions of toxicity vary substantially. Their labels of toxic content are based on their own perceptions.’

Finding that personality and age have a ‘substantial influence’ on the dimensional labeling of hate speech, the Syracuse researchers conclude:

‘These findings suggest that efforts to obtain annotation consistency among labelers with different backgrounds and personalities for hate speech may never fully succeed.’

The Judge May Be Biased Too

This lack of objectivity is likely to propagate upwards as well, according to the Syracuse paper, which argues that the manual intervention (or automated policy, itself decided by a human) that determines the ‘winner’ of consensus votes should also be subject to scrutiny.

Likening the process to forum moderation, the authors state*:

‘[A] community’s moderators can decide the destiny of both posts and users in their community by promoting or hiding posts, as well as honoring, shaming, or banning the users. Moderators’ decisions influence the content delivered to community members and audiences and by extension also influence the community’s experience of the discussion.

‘Assuming that a human moderator is a community member who has demographic homogeneity with other community members, it seems possible that the mental schema they use to evaluate content will match those of other community members.’

This gives some clue as to why the Syracuse researchers have come to such a despondent conclusion about the future of hate speech annotation: the implication is that policies and judgement-calls on dissenting crowdwork opinions cannot simply be applied at random, according to ‘acceptable’ principles that are not enshrined anywhere (or that are not reducible to an applicable schema, even if they do exist).

The people who make the decisions (the crowdworkers) are biased, and would be useless for such tasks if they were not biased, since the task is to provide a value judgement; the people who adjudicate on disputes in crowdwork results are also making value judgements in setting policies for disputes.

There may be hundreds of policies in just one hate speech detection framework, and unless each and every one is taken all the way back to the Supreme Court, where can ‘authoritative’ consensus originate?

The Google researchers suggest that ‘[the] disagreements between annotators may embed valuable nuances about the task’. The paper proposes the use of metadata in datasets that reflects and contextualizes disputes.
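As an illustration only, a record in such a dataset might keep the full vote distribution and some annotator context alongside the text, rather than a single collapsed label. The following is a minimal sketch; all field names are hypothetical rather than drawn from the paper:

    # One labeled example that preserves disagreement instead of collapsing
    # it to a single 'ground truth'. All field names are hypothetical.
    record = {
        'text': 'example post under review',
        'votes': {'toxic': 1, 'not_toxic': 3},  # raw annotator responses
        'annotators': [  # context that lets downstream users weigh each vote
            {'id': 'a1', 'region': 'US-IL', 'age_band': '40-49', 'label': 'toxic'},
            {'id': 'a2', 'region': 'US-MT', 'age_band': '30-39', 'label': 'not_toxic'},
            {'id': 'a3', 'region': 'IN-KA', 'age_band': '20-29', 'label': 'not_toxic'},
            {'id': 'a4', 'region': 'DK', 'age_band': '30-39', 'label': 'not_toxic'},
        ],
    }

    # A downstream consumer can then derive a soft label instead of a hard one:
    total = sum(record['votes'].values())
    print({k: v / total for k, v in record['votes'].items()})
    # {'toxic': 0.25, 'not_toxic': 0.75}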

However, it is difficult to see how such a context-specific layer of data could ever yield like-for-like metrics, adapt to the demands of established standard tests, or support any definitive results – except in the unrealistic scenario of retaining the same group of researchers across subsequent work.

Curating the Annotator Pool

All of this assumes that there is even budget in a research project for multiple annotations that would lead to a consensus vote. In many cases, researchers attempt to ‘curate’ the outsourced annotation pool more cheaply by specifying traits that the workers should have, such as geographical location, gender, or other cultural factors, trading plurality for specificity.
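A minimal sketch of that kind of curation, assuming a pool of workers with a handful of self-reported traits (all data here is hypothetical):

    # Pre-screening a worker pool on self-reported traits: specificity
    # is bought at the cost of plurality. All data is hypothetical.
    pool = [
        {'id': 'w1', 'locale': 'US', 'gender': 'F', 'age': 29},
        {'id': 'w2', 'locale': 'IN', 'gender': 'M', 'age': 41},
        {'id': 'w3', 'locale': 'US', 'gender': 'M', 'age': 52},
    ]

    def curate(pool, **required):
        return [w for w in pool if all(w.get(k) == v for k, v in required.items())]

    print(curate(pool, locale='US'))  # a narrower, cheaper pool, but less diverse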

The Google paper contends that a way forward from these challenges could lie in establishing extended communications frameworks with annotators, similar to the minimal communications that the Uber app facilitates between a driver and a rider.

Such careful consideration of annotators would, naturally, be an obstacle to hyperscale annotation outsourcing, resulting either in more limited, low-volume datasets with a better rationale for their results, or in a ‘rushed’ evaluation of the annotators involved that obtains limited details about them and characterizes them as ‘fit for task’ on the basis of too little information.

That’s if the annotators are being honest.

The ‘People Pleasers’ in Outsourced Dataset Labeling

With a workforce that’s underpaid, in severe competition for available assignments, and depressed by scant career prospects, annotators are motivated to provide the ‘right’ answer quickly and move on to the next mini-assignment.

If the ‘right answer’ is anything more complicated than Has cat/No cat, the Syracuse paper contends that the worker is likely to attempt to deduce an ‘acceptable’ answer based on the content and context of the question*:

‘Both the proliferation of alternative conceptualizations and the widespread use of simplistic annotation methods are arguably hindering the progress of research on online hate speech. For example, Ross, et al. found that showing Twitter’s definition of hateful conduct to annotators caused them to partially align their own opinions with the definition. This realignment resulted in very low interrater reliability of the annotations.’
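Interrater reliability is commonly quantified with chance-corrected agreement statistics such as Cohen’s kappa. The sketch below (an illustration with hypothetical labels, not drawn from the cited study) shows how two raters can agree on 60% of items yet score near zero once chance agreement is discounted:

    def cohen_kappa(r1, r2):
        # kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
        # and p_e is the agreement expected by chance from each rater's label rates.
        n = len(r1)
        p_o = sum(a == b for a, b in zip(r1, r2)) / n
        labels = set(r1) | set(r2)
        p_e = sum((r1.count(l) / n) * (r2.count(l) / n) for l in labels)
        return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

    # Two hypothetical annotators labeling the same ten posts:
    rater1 = ['toxic', 'ok', 'ok', 'toxic', 'ok', 'ok', 'toxic', 'ok', 'ok', 'ok']
    rater2 = ['toxic', 'ok', 'toxic', 'ok', 'ok', 'ok', 'ok', 'ok', 'toxic', 'ok']
    print(round(cohen_kappa(rater1, rater2), 2))  # 0.05 despite 60% raw agreement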

 

* My conversion of the paper’s inline citations to hyperlinks.

 

