Encoding biases in the algorithmic modeling process: forms of opacity and obscuring from the case study of the Boston Housing dataset

André Pecini; Denise Tsunoda

Authors

André Pecini UFPR
Denise Tsunoda Universidade Federal do Paraná https://orcid.org/0000-0002-5663-4534

Abstract

Datasets are the raw material of the machine learning process. Data scientists use toy datasets to test algorithms, and students use them to learn how those algorithms work. The Boston Housing dataset is one such toy dataset. One of its attributes, called “B”, caught the attention of the researcher Michael Carlisle (2019). It is derived from the proportion of Black residents in each neighborhood. The attribute does not contain absolute numbers or percentages, but the result of a non-invertible function that generates a “ghetto effect”, in which certain levels of racial segregation have a positive effect on property values. This article presents a systematic literature review of a relevant sample of recent publications that cite this dataset, in order to determine whether the authors properly identified the dataset, who the authors were, which models were developed, and whether the variable “B” influenced the results. These questions aim to contribute to research on algorithmic, or coded, biases. Such biases become hidden because mathematical models are often black boxes, and they are usually investigated indirectly, through the models' results. By identifying the presence and role of the “B” attribute in these publications, it becomes possible to estimate how invisible the dataset used to develop or propose models is. The fact that the attribute received relatively little attention until recently shows how an explicitly racist attribute can go unnoticed or be included in calculations. Investigating it may help indicate ways to identify other biased datasets.
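
The non-invertibility mentioned in the abstract can be illustrated with a minimal sketch. It assumes the transformation given in the dataset's standard documentation, B = 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town; that formula is not stated in the abstract itself. The function names `encode_b` and `candidate_proportions` are hypothetical and chosen only for this illustration.

```python
import math


def encode_b(bk: float) -> float:
    """Apply the documented transformation B = 1000 * (Bk - 0.63)^2.

    Assumption: this is the formula from the dataset's standard
    documentation; bk is a proportion between 0 and 1.
    """
    return 1000.0 * (bk - 0.63) ** 2


def candidate_proportions(b: float) -> list[float]:
    """Return every proportion in [0, 1] that would produce the value b.

    Because the transformation squares the distance from 0.63, a single
    value of B can correspond to two different proportions.
    """
    root = math.sqrt(b / 1000.0)
    candidates = {0.63 - root, 0.63 + root}
    return sorted(p for p in candidates if 0.0 <= p <= 1.0)


if __name__ == "__main__":
    # Two different proportions, equally far from 0.63, give the same B.
    print(encode_b(0.53), encode_b(0.73))
    # Recovering the proportion from that B is therefore ambiguous.
    print(candidate_proportions(encode_b(0.53)))
    # A proportion far below 0.63 has no mirror image inside [0, 1].
    print(candidate_proportions(encode_b(0.10)))
```

Under this assumption, a proportion of 0.53 and a proportion of 0.73 are indistinguishable once encoded, while only values whose mirror image would exceed 1 can be recovered uniquely. This is one concrete sense in which the attribute obscures the underlying data.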

Downloads

PDF (Portuguese (Brazil))

Published

2023-02-28

Issue

Vol. 24 No. 3 (2022): September/December

Section

Dossier

License

I grant the journal Fronteiras - estudos midiáticos the right of first publication of my article, licensed under a Creative Commons Attribution license (which allows the work to be shared with acknowledgment of its authorship and of its initial publication in this journal).

I confirm that my article is not being submitted to another publication and has not been published in its entirety in another journal. I take full responsibility for its originality, and I also accept responsibility for any claims by third parties concerning the authorship of the article.

I also agree that the manuscript will be submitted according to the journal’s publication rules described above.