Author: Cathy O'Neil
Summary: This was interesting and the author made some good points, but it was repetitive and not very technical.
More and more aspects of our lives are dictated by hidden algorithms. In theory, this should eliminate human biases and lead to fairer decisions. In reality, models are built by biased human beings, and they are trained on data describing the world as it is now, so they may codify existing inequalities. This is particularly dangerous because algorithms often escape the regulation that human decisions would have to follow, and the reasons for their decisions may never be revealed to the people whose lives they affect.
As a data scientist, I think the author makes some great points about how models are developed. In particular, a model should not be used when too little data is available or when its predictions cannot be checked. For example, an individual teacher teaches so few students that predictions of teacher quality vary enough from year to year that they're no better than random guesses. Worse still, teachers the model rates as poor can be fired, after which it becomes impossible to learn whether the model was right about their performance.
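The small-sample point can be illustrated with a toy simulation (my own sketch, not from the book, with made-up numbers): if each teacher's rating is just the average outcome of one classroom's worth of students, and student-level noise dwarfs the true teacher effect, the ratings barely agree from one year to the next.

```python
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 500
students_per_teacher = 25        # roughly one classroom of data per year
teacher_effect = rng.normal(0, 1, n_teachers)  # persistent true quality
noise_sd = 10                    # student-level variability dominates

def yearly_rating():
    # Each teacher's rating: true effect plus the mean of noisy
    # student outcomes for that year's classroom.
    noise = rng.normal(0, noise_sd, (n_teachers, students_per_teacher))
    return teacher_effect + noise.mean(axis=1)

year1, year2 = yearly_rating(), yearly_rating()
r = np.corrcoef(year1, year2)[0, 1]
print(f"year-to-year correlation of teacher ratings: {r:.2f}")
```

With these assumed parameters the expected correlation is only about 0.2 (variance of the true effect divided by total variance, 1 / (1 + 100/25)), so a teacher's rating this year tells you very little about their rating next year.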
I wish the author had spent more time making points like this about model quality. I also wish she had made her case in a way any data scientist could get on board with. Instead, her audience seems to be people who are already socially liberal: most of her arguments hinge on the assumption that the reader shares her sense of what is fair. Since I am socially liberal, these arguments worked for me, but I don't think they would convince a conservative data scientist as written.
There were also aspects of her presentation that didn't work for me. The first model involved a lot of financial detail that bored me. The points she makes about each of the models she discusses are very similar and get repetitive. Finally, she often digresses into discussions of human behavior, describing people's bad decision making as analogous to bad algorithms. This was interesting but not really relevant, and these sections struck me as the most likely to alienate conservative data scientists. Generally, I wouldn't be bothered by a book that doesn't take into account the opinions of people I would consider racist, sexist, or classist. In this case, though, I think the book could have done a lot more good by speaking to all data scientists about the best way to practice their craft.