TrygFonden

Verbal Attacks in the Public Debate

Intro

How harsh is the tone of the public debate?

The tone on social media is a recurring theme in the public debate, and it flares up whenever yet another politician, journalist, or reality TV contestant announces they are withdrawing from the debate. But how harsh is the tone really? Is it equally harsh for everyone? And is it equally harsh everywhere? With millions of comments, it is almost impossible to form a comprehensive overview of the state of the debate, unless you are an artificial intelligence.

Together with TrygFonden, we set out to challenge Danish language technology and train three algorithms to recognize, respectively, verbal attacks, hate speech, and linguistic recognition in Facebook comments. The purpose was to raise the level of knowledge in the debate about the debate. Fortunately, it went beyond all expectations.

Methods

Using supervised and unsupervised machine learning, we have trained three artificial intelligences that make it possible to analyze the whole public debate on the Facebook pages of Danish media and politicians over two years. In this first study using the algorithms, we analyzed 63 million comments, all told.

The process included collecting all posts and comments on the Facebook pages of 199 politicians and 477 media outlets registered with the Danish Press Complaints Commission; drawing up definitions of attacks, hate speech, and linguistic recognition, respectively; manually annotating a training data set of 70,000 comments; applying the language model Ælæctra; and training the algorithms, including ongoing tests and active learning. The resulting algorithms, A&ttack, Ha&te, and Rec&nition, ended up being the best in their field in Denmark.
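The active learning step mentioned above can be illustrated with a minimal sketch. It assumes uncertainty sampling, a common active learning strategy in which the comments the model is least sure about are sent to the human annotators next. The source does not say which strategy was used, and the function name `select_for_annotation` and the scores below are hypothetical.

```python
from typing import List, Sequence


def select_for_annotation(probabilities: Sequence[float], batch_size: int) -> List[int]:
    """Return the indices of the comments whose predicted attack
    probability is closest to 0.5, i.e. where the model is least certain."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical model scores for five unlabeled comments (not real data).
scores = [0.02, 0.48, 0.91, 0.55, 0.30]
picked = select_for_annotation(scores, batch_size=2)
print(picked)  # indices of the two most uncertain comments
```

In a loop of this kind, the selected comments would be annotated by hand, added to the training set, and the model retrained before the next round of selection.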

To analyze the output of the algorithms, we used quantitative analysis, linguistic keyword analysis, and qualitative ethnography to learn more about the harshest and most appreciative forums and topics on Danish Facebook.

Results

Our research shows that approximately 5 percent of the comments, or slightly more than one in every 20, can be classified as a verbal attack, i.e., a stigmatizing, derogatory, offensive, stereotyping, exclusionary, harassing, or threatening expression. Nearly a quarter of the attacks can be described as hate speech because the attack is based on a protected characteristic (e.g., ethnicity, gender, or religion). In the public debate on Facebook, it is the ethnic minorities, especially Muslims, who are under fire. Quite a few attacks also target women and people based on their political beliefs. On the other hand, a full 14 percent of the comments, roughly one in seven, contain linguistic recognition.
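To give a sense of scale, the percentages above can be turned into rough comment counts. The sketch below is only a back-of-the-envelope calculation: it assumes the rounded shares apply to all 63 million analyzed comments and treats "nearly a quarter" as exactly one quarter. The exact counts are not given in the source, and the variable names are illustrative.

```python
# Rounded figures quoted in the text; exact counts are not in the source.
TOTAL_COMMENTS = 63_000_000
ATTACK_PERCENT = 5        # "approximately 5 percent"
RECOGNITION_PERCENT = 14  # "a full 14 percent"


def share_of(total: int, percent: int) -> int:
    """Integer share of `total` corresponding to a whole-number percentage."""
    return total * percent // 100


attacks = share_of(TOTAL_COMMENTS, ATTACK_PERCENT)
recognition = share_of(TOTAL_COMMENTS, RECOGNITION_PERCENT)
# "Nearly a quarter" of attacks are hate speech; a quarter is used as a rough ceiling.
hate_speech_ceiling = attacks // 4

print(f"Verbal attacks:        about {attacks:,}")
print(f"Of which hate speech:  up to roughly {hate_speech_ceiling:,}")
print(f"Linguistic recognition: about {recognition:,}")
```

On these assumptions, comments containing recognition outnumber verbal attacks by almost three to one.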

The harsh comments are hard to shake off, and the harsh tone on Facebook means that many Danes—especially women and minorities—may refrain from participating in the debate.

It is important to focus on the hate. But at the same time, it is also important to remember that there is far more recognition than hate on Facebook. 14 percent of the comments on the politician and media pages contain linguistic recognition.

The comment threads on politicians' pages, which attract the most dedicated supporters and opponents, contain both more attacks (8 percent) and more recognition (24 percent). The politicians on the extreme right have the highest shares of attacks in their comment threads. Among the media, the biased media in particular host the fiercest debate, while the local media host the most appreciative debate.