AI and dialect

The training of Large Language Models (LLMs) uses large data sets to learn conventions about which words are combined with each other and which ones rarely appear together. It therefore comes as no surprise that training on standardised American English may be less valid for applications that receive input from minority languages or dialects. A study forthcoming in the field of Computer Science and Language by Hofmann et al. (Link) provides evidence of systematic bias against African American dialects in these models. Dialect prejudice remains a major concern in AI, just as it does in the day-to-day experiences of many people who speak a dialect. The study highlights that dialect speakers are more likely to be assigned less prestigious jobs when AI is used to sort applicants. Similarly, criminal sentences for speakers of African American English turn out harsher; the models even attributed death sentences more frequently to dialect speakers.
If we translate this evidence to widespread applications of AI in the workplace, we realise that there are severe issues to resolve. The European Trade Union Congress (ETUC) has flagged the issue for some time (Link) and made recommendations on how to address these shortcomings. Human control and co-determination by employees are crucial when these applications enter the world of work and employment. The obligation to justify decision-making on hiring and firing limits discrimination in the workplace, and it needs to be preserved in twenty-first-century collaboration with AI. Language barriers such as dialects or multiple official languages in a country call for a reconsideration of AI to avoid discrimination. Legal systems have to clarify the responsibilities attached to AI applications before too much harm is caused.
AI also holds great potential for preserving dialects and for interacting in a dialect. Cultural diversity may thus be preserved more easily, but discriminatory practices have to be eliminated from the basis of these models; otherwise they become a severe legal risk for the people, companies, and public services that apply these large language models without careful scrutiny.
(Image AI BING Designer: 3 robots are in an office. 2 wear suits. 1 wears folklore dress. All speak to each other in a meeting. Cartoon-like style in futuristic setting)