Targeted socio-economic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years which is costly and labor intensive, data-driven, machine learning driven approaches are cheaper and faster--with the potential ability to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to determine demographic attributes using the detect cars. To facilitate our work, we used a graph based algorithm to collect a challenging fine-grained dataset consisting of over 2600 classes of cars comprised of images from Google Street View and other web sources. Our prediction results correlate well with ground truth income (r=0.82), race, education, voting, sources investigating crime rates, income segregation, per capita carbon emission, and other market research. Data mining based works such as this one can be used for many types of applications--some ethical and others not. I will finally discuss work (inspired by my experiences while working on this project), on auditing and exposing biases found in computer vision systems. Using recent work on exposing the gender and skin type bias found in commercial gender classification systems as a case study, I will discuss how the lack of standardization and documentation in AI is leading to biased systems used in high stakes scenarios. I will end with the concept of AI datasheets for datasets, and model cards for model reporting to standardize information for datasets and pre-trained models, to push the field as a whole towards transparency and accountability.
Timnit Gebru is a research scientist in the Ethical AI team at Google and just finished her postdoc in the Fairness Accountability Transparency and Ethics (FATE) group at Microsoft Research, New York. Prior to that, she was a PhD student in the Stanford Artificial Intelligence Laboratory, studying computer vision under Fei-Fei Li. Her main research interest is in data mining large-scale, publicly available images to gain sociological insight, and working on computer vision problems that arise as a result, including fine-grained image recognition, scalable annotation of images, and domain adaptation. She is currently studying the ethical considerations underlying any data mining project, and methods of auditing and mitigating bias in sociotechnical systems. The New York Times, MIT Tech Review and others have recently covered her work. As a cofounder of the group Black in AI, she works to both increase diversity in the field and reduce the negative impacts of racial bias in training data used for human-centric machine learning models.