NPR: Computer scientists are making the troubling observation that sexual and racial biases are creeping into the output of programs designed to process human language. Adam Kalai of Microsoft and his colleagues, who have been developing a process called word embedding, discovered the problem when they used their algorithm to solve analogies. Based on its study of hundreds of thousands of articles on such sites as Wikipedia and Google News, the word-embedding algorithm suggested, for example, that “he” is to “she” as “brilliant” is to “lovely,” or as “computer programmer” is to “homemaker.” Such biases could prove problematic, as when a computer is used to sort through piles of résumés in search of ideal job candidates. But bias isn’t necessarily bad; pharmaceutical companies, for example, may well want to market certain products specifically to men or to women. Therefore, rather than offer de-biased word embeddings themselves, the researchers have developed a technique that leaves it up to individual users to determine “what is a good bias and what is a bad bias,” Kalai says.
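The analogy test is simple vector arithmetic: each word is a point in space, and the algorithm ranks words near b − a + c to complete “a is to b as c is to ?”. The sketch below uses invented toy vectors (not the researchers’ data) to show both the biased analogy and one way to project out a bias direction, roughly in the spirit of the published approach; the neutralize step is applied only to words a user flags, mirroring the “good bias / bad bias” choice Kalai describes.

```python
import numpy as np

# Toy 4-dimensional vectors invented for illustration; real embeddings are
# learned from large corpora and have hundreds of dimensions.
vecs = {
    "he":         np.array([ 1.0, 0.2, 0.1, 0.0]),
    "she":        np.array([-1.0, 0.2, 0.1, 0.0]),
    "brilliant":  np.array([ 0.6, 0.9, 0.0, 0.1]),
    "lovely":     np.array([-0.6, 0.9, 0.0, 0.1]),
    "programmer": np.array([ 0.5, 0.1, 0.8, 0.2]),
    "homemaker":  np.array([-0.5, 0.1, 0.8, 0.2]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vocab):
    """Solve 'a is to b as c is to ?' by ranking words near b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    rest = {w: cosine(target, v) for w, v in vocab.items() if w not in (a, b, c)}
    return max(rest, key=rest.get)

print(analogy("he", "she", "brilliant", vecs))   # -> 'lovely' (the biased answer)
print(analogy("he", "she", "programmer", vecs))  # -> 'homemaker'

# Debiasing, in the spirit of the published approach: estimate a gender
# direction from a definitional pair and strip that component from words
# the user deems gender-neutral.
g = vecs["he"] - vecs["she"]
g /= np.linalg.norm(g)

def neutralize(v, direction):
    """Remove the component of v along the bias direction."""
    return v - (v @ direction) * direction

# The user decides which biases to remove: occupations here, while a drug
# marketer might deliberately leave other gendered associations in place.
debiased = dict(vecs)
for word in ("programmer", "homemaker"):
    debiased[word] = neutralize(vecs[word], g)

# After neutralizing, each occupation is equally similar to "he" and "she".
for word in ("programmer", "homemaker"):
    print(word,
          "he:",  round(cosine(debiased["he"],  debiased[word]), 3),
          "she:", round(cosine(debiased["she"], debiased[word]), 3))
```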
Modeling the shapes of tree branches, neurons, and blood vessels is a thorny problem, but researchers have just discovered that much of the math has already been done.