Can crowdsourcing create the missing crash data?


Guadalupe Bedoya (World Bank)
Arianna Legovini (World Bank)
Robert Marty (World Bank)
Sveta Milusheva (World Bank)
Elizabeth Resor (UC Berkeley)
Sarah Williams (MIT)


Session: 3.4. AI and social impact

Abstract: Road traffic crashes (RTCs) are the leading cause of death among children and young adults. Yet data on RTCs are incomplete, hindering effective road safety policymaking in many developing countries where mortality is purportedly highest. We web-scrape 850,000 tweets to create crash data and develop a machine learning algorithm to geolocate RTCs. Our algorithm is nearly twice as precise as a standard geoparsing algorithm in identifying the set of locations that includes the crash location. In addition, in a majority of cases it identifies the unique location of a crash from the set of possible locations. To verify the validity of the crowdsourced data and document the performance of the algorithm, we dispatch motorcycle drivers to the site of the presumed crash in real time. The study can serve as a proof of concept for countries interested in improving RTC data at low cost through a machine learning approach, substantially increasing the data available to analyze RTCs and prioritize road safety policies.
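
The two geolocation steps named in the abstract (finding the set of candidate locations mentioned in a tweet, then selecting a single crash location from that set) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the gazetteer entries, coordinates, preposition list, and function names are all hypothetical, and a simple rule-based heuristic stands in for the machine learning model.

```python
import re

# Hypothetical gazetteer: place name -> (latitude, longitude).
# Names and coordinates are made up for illustration only.
GAZETTEER = {
    "central market": (0.10, 0.20),
    "river road": (0.11, 0.21),
    "airport roundabout": (0.30, 0.40),
}

# Assumed cue words that often precede the place where an event happened.
PREPOSITIONS = ("at", "near", "on", "along")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_locations(text, gazetteer, max_len=3):
    """Step 1: return (token_index, place_name) for every gazetteer match.

    At each token position the longest matching phrase wins.
    """
    tokens = tokenize(text)
    found = []
    for start in range(len(tokens)):
        for length in range(max_len, 0, -1):
            window = tokens[start:start + length]
            if len(window) == length and " ".join(window) in gazetteer:
                found.append((start, " ".join(window)))
                break
    return found


def pick_location(text, gazetteer):
    """Step 2: choose one location from the candidate set.

    Heuristic stand-in for a learned model: prefer a candidate that
    directly follows a location preposition, else take the first match.
    """
    tokens = tokenize(text)
    candidates = candidate_locations(text, gazetteer)
    if not candidates:
        return None
    for start, name in candidates:
        if start > 0 and tokens[start - 1] in PREPOSITIONS:
            return name
    return candidates[0][1]


tweet = "Traffic from river road, crash at central market"
place = pick_location(tweet, GAZETTEER)
print(place, GAZETTEER.get(place))
```

In this toy example the tweet mentions two known places, and the preposition cue selects one of them as the presumed crash site. A learned model would replace the hand-written rule in `pick_location` with features estimated from labeled tweets.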