AI Sorting

Algorithms do the work behind AI systems, so a basic understanding of how algorithms work helps to gauge the potential, risks and performance of such systems. The speed of computers determines, for example, how much data you can sort in a reasonable time; the efficiency of the algorithm is another factor. Here we go, we are already a bit absorbed in sorting as a purely intellectual exercise. The website of Darryl Nester shows a playful programming exercise that sorts the numbers from 1 to 15 quickly (Link to play sorting). If you watch the sorting as it runs, you realize that programs are much faster than we are at such simple numeric tasks. Now think of applying this sorting routine, or algorithm, to a process of social sorting. The machine will sort social desirability scores of people’s behavior in the same simple fashion, even for thousands of people. Whether proposed AI systems in human interaction or in human resource departments make use of such sorting algorithms we do not know. Sorting applicants is a computational task, but the input data on personal characteristics come from other, more or less reliable sources. Hence, the use of existing and newly available databases can create or eliminate bias. Watching sorting algorithms perform is a useful learning experience if you want to assess critically what is likely to happen behind the curtains of AI.
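To make the intellectual exercise concrete, here is a minimal Python sketch, not taken from Nester’s site: it sorts the numbers 1 to 15 with a simple insertion sort and then applies the very same routine to made-up “social desirability” scores. The applicant names and scores are purely illustrative.

```python
import random

def insertion_sort(items, key=lambda x: x):
    """Sort a list in place with insertion sort (O(n^2), fine for small inputs)."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger elements one slot to the right until current fits.
        while j >= 0 and key(items[j]) > key(current):
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items

# The toy exercise: the numbers 1 to 15 in random order.
numbers = list(range(1, 16))
random.shuffle(numbers)
print(insertion_sort(numbers))

# The same routine, unchanged, applied to people: hypothetical scores.
applicants = [("Applicant A", 0.72), ("Applicant B", 0.41), ("Applicant C", 0.93)]
print(insertion_sort(applicants, key=lambda person: person[1]))
```

The point is that the routine is indifferent to what it sorts: numbers, scores, or people ranked by scores all pass through the same few lines.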

AI and S/he

There was hope that artificial intelligence (AI) would be a better version of us. So far, that hope seems to have been disappointed. Take gender bias, a pervasive feature even of modern societies, let alone of medieval or industrial ones. AI tends to uphold gender biases and might even reinforce them. Why? A recent paper by Kotek, Dockum and Sun (2023) explains the sources of this bias in straightforward terms. AI is based on large language models (LLMs), which are trained on big, detailed data sets. Through training on true, observed data, such as detailed occupation-by-gender data as observed in the U.S. in 2023, the models tend to acquire a status quo bias.
This means they abstract from the dynamic evolution of occupations and from the potential evolution of gender stereotypes over the years. Even when deriving growing or shrinking trends of gender dominance in a specific occupation, the models have little ground for a reasonable or adequate assessment of these trends, just like thousands of social scientists before them. Projections into the future, or the assumption of a legal obligation of equal gender representation, might still not be in line with human perception of such trends.
Equal shares of women among soldiers, or 50% of men as secretaries in offices, appear rather utopian in 2024, but any share in between is probably arbitrary and differs widely between countries. Even bigger data sets may account for this some day. For the time being, models based on “true” data sets will have a bias towards the status quo, however unsatisfactory that might be.
Now let us build on this research finding. Gender bias is only one source of bias among many other forms of bias or discriminatory practice. Ethnicity, age or varying abilities complicate the underlying “ground truth” (the term used in the paper) represented in occupation data sets. The authors identify four major shortcomings concerning gender bias in LLM-based AI: (1) gendered pronouns (s/he) were picked in line with occupational stereotypes even more often than Bureau of Labor Statistics occupational gender shares would suggest; (2) female stereotypes were amplified more than male ones; (3) ambiguity of gender attribution was not flagged as an issue; (4) when confronted with their inaccuracy, the LLMs returned “authoritative” responses, which were “often inaccurate”.
These findings have the merit of providing a testing framework for gender bias in AI. Many other actual or potential biases have to be investigated in a similarly rigorous fashion before AI can give us the authoritative answer: no, I am free of any bias in responding to your request. Full stop.
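To illustrate what such a testing framework might look like in practice, here is a minimal sketch, not the authors’ actual code: it feeds a model sentences with an ambiguous pronoun and two occupations, tallies which occupation the model attributes the pronoun to, and leaves the comparison with Bureau of Labor Statistics shares to the reader. The ask_model function is a placeholder for whatever LLM you want to probe; the canned answer only keeps the sketch runnable.

```python
from collections import Counter

OCCUPATION_PAIRS = [("doctor", "nurse"), ("mechanic", "secretary")]
TEMPLATE = "The {a} spoke to the {b} because she was running late. Who was running late?"

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned answer so the sketch runs."""
    return "the nurse" if "nurse" in prompt else "the secretary"

def run_probe() -> Counter:
    picks = Counter()
    for a, b in OCCUPATION_PAIRS:
        answer = ask_model(TEMPLATE.format(a=a, b=b)).lower()
        # Record which of the two occupations the model attributed the pronoun to.
        picks[a if a in answer else b] += 1
    return picks

if __name__ == "__main__":
    # Compare these counts with BLS occupational gender shares to judge
    # whether the model merely mirrors the data or amplifies the stereotype.
    print(run_probe())
```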

Throne

In continental Europe some people say they are going to sit on their throne for a while when they go to the toilet. The furniture once built for that purpose did almost resemble a throne. Nowadays the comfort has increased: most households have an excellent WLAN connection in that tiny room, automatic ventilation and comfortable heating. Linked to the throne, the issue of the smart home comes back to mind. Besides the increased comfort, we generate abundant amounts of data. Sensors of all kinds can track lots of information that may reveal quite intimate details we never intended to share. The movement pattern of mounting your throne might be easily identifiable by your smartphone, and such data could be added to the health status in your dedicated health app. Anonymized data could give early warnings about a local outbreak of diarrhea. Do we want this? Probably not. In countries where thousands die from diarrhea, probably yes. It is a matter of balancing the pros and cons. Health data are considered particularly sensitive information about us, and it certainly is the new luxury to keep your health data private. Even if you measure and capture a lot of information in your smart home, make sure your smart home is sufficiently secure. For the benefit of all of us.

APP Circus

We all seem keen to have as many apps as possible on our smartphones. Instead of collecting postage stamps, some of us collect apps on our mobile devices as well as on desktop computers. As with almost all so-called free software, the apps are not free at all; we pay with our personal information, which is used for other, usually undisclosed purposes. On webpages we are at least used to confirming that we agree to such use, or we should have the option to decline the transmission of personal information, user analytics or tracking. The organisation “Netzpolitik.org” has published a short overview article on the results of research by Konrad Kollnig published in Internet Policy Review. The main message: even five years after the adoption of the GDPR (General Data Protection Regulation), little has changed in the tracking and data collection practices of the app world. Both Apple’s App Store (iOS) and Google Play (Android) are affected.
In contrast to webpages, a vast majority of apps, which we perhaps thought would provide tracking-free access to services, do in fact track us “secretly”. Additionally, many apps transmit personal information and credit card details without encryption. A so-called man-in-the-middle attack can “listen” to the transmission and potentially abuse the intercepted information. The burden of proving abuse of financial details falls on the consumer, who is usually completely unaware of the potential threat from all sorts of apps. This market is evolving at a rapid pace, and what was at the top in 2022 is no longer at the top in 2023.
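To see why unencrypted transmission matters, here is a minimal Python sketch; the endpoint example.com, the form fields and the card number are all made up for illustration. It simply prints the raw bytes of a plain-HTTP request, which is exactly what anyone on the network path (open Wi-Fi, a compromised router) would see if an app skips TLS; with HTTPS, only ciphertext would be visible on the wire.

```python
from urllib.parse import urlencode

# Dummy form data; never use real card details in a test like this.
payload = urlencode({"card_number": "4111111111111111", "cvv": "123"})

# The raw bytes of a plain-HTTP POST as they travel over the network.
raw_http_request = (
    "POST /checkout HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(payload)}\r\n"
    "\r\n"
    f"{payload}"
)

# Without encryption, an eavesdropper sees exactly this, card number included.
print(raw_http_request)
```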
Based on an online query for France on 2023-11-28 via www.appfigures.com, the data reveal interesting market insights. Gaming is making a new push on the app market. TikTok still remains fairly high up in the ranking. Conversion from free downloads to paid versions seems to work in several instances as well. It remains to be checked whether the paid versions use less tracking and provide better overall security. Using apps frugally, and regularly checking whether you really need all those apps currently installed on your device, should reduce your own cybersecurity risk. It seems that “less can be more”, fewer apps and more security, is also valid in this respect.

Smart watch

The wrist still has plenty of room for electronic devices. From inexpensive step counters to smart watches, intelligent bracelets and even rings, there is a lot of potential for innovation there. Beyond counting steps, a wealth of data is captured by a variety of sensors. Data protection advocates positively shudder. The potential for medical and sociological analyses of these data is immense. Comparable to an electronic scale that measures bone mass, water and muscle, smart watches allow cardiological values and sleep rhythms to be recorded, still with some inaccuracies but steadily improving.
With data from thousands, soon millions, of data suppliers, important studies on the early detection of health risks become possible. Sudden cardiac arrest is one of the phenomena that is still poorly understood. The study in LANCET digital health analysed such data and concludes that dyspnoea is one of the biggest risk factors for both women and men. For women, the all-clear is given for diaphoresis, excessive sweating, but not for men; the latter should also take chest pain seriously. Collecting data can indeed save lives.
Nevertheless, the health care system may be facing a wave of false-positive self-referrals to hospital emergency rooms. We are still poorly prepared for this, including the possible legal and financial consequences. “Big brothers are smart watching you”. Do doctors have to take the smart evidence into account, much like a living will? What if the emergency doctor suddenly rings at the door uninvited? Who hacked my watch, … , Paulchen Panther now sings.