Imagine a database that holds information on all world events and historic records reported in the global news media over the last 30 years, along with the narratives, emotions, and images that defined those events. What you’re envisioning is the real-life GDELT project.
GDELT, which stands for Global Database of Events, Language, and Tone, is a free, open data platform that applies machine learning to gather news from all over the world and curate what GDELT creator Kalev Leetaru calls “a catalogue of society.”
“Today, we have sensors and satellites blanketing the earth, we know what the weather is, when an earthquake happens, and how many people are affected,” Leetaru said. “We have so much data about the natural Earth, but when it comes to the human Earth, to cataloging human ‘earthquakes’ like mass protests or coups, we were in the stone ages. Before GDELT we never had a database that could give you a list of all the protests happening right now around the world. That’s the goal of GDELT—to let you see the human world just as well as you can the natural world, letting you map global protests as easily as you can map global earthquakes.”
Leetaru began working with supercomputing and web mining in 1995 when he launched his first Internet startup. In 2013, he developed GDELT, and it has been his main focus ever since. Leetaru is also a senior fellow with George Washington University’s Center for Cyber & Homeland Security.
GDELT has evolved beyond its original scope and now collects broadcast, print, and web news and images from around the world, updating every 15 minutes. Together, its data sets hold more than 400 million event records in 300 categories, more than a trillion emotional measures, two billion location mentions, and more than 175 million images, covering world events from 1979 to the present.
GDELT captures the emotion and tone of the articles and images. The project brings together a number of algorithms to detect the author’s emotion in an article, ranging from traditional positive/negative scoring to more complex emotions such as anxiety and motivation. The database also characterizes the emotion of an image: for example, whether it is violent or whether the people in it are looking away in horror.
GDELT identifies and disambiguates every location mentioned in each article, which can be used to map the geography of specific topics such as wildlife crime or civil unrest.
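To make the idea of mapping a topic's geography concrete, here is a minimal sketch that counts how often each place is mentioned for a given topic. The records, topic labels, and the `hotspots` helper are all hypothetical, invented for illustration. They are not real GDELT output and do not reflect its actual schema.

```python
from collections import Counter

# Hypothetical, hand-made records: (topic, place name, latitude, longitude).
# Illustrative values only; not real GDELT data or its actual field layout.
MENTIONS = [
    ("wildlife_crime", "Nairobi, Kenya", -1.29, 36.82),
    ("wildlife_crime", "Nairobi, Kenya", -1.29, 36.82),
    ("civil_unrest", "Cairo, Egypt", 30.04, 31.24),
    ("wildlife_crime", "Kruger National Park, South Africa", -23.99, 31.55),
    ("civil_unrest", "Cairo, Egypt", 30.04, 31.24),
    ("civil_unrest", "Nairobi, Kenya", -1.29, 36.82),
]


def hotspots(mentions, topic):
    """Count mentions per place for one topic, most frequent first."""
    counts = Counter(
        (place, lat, lon)
        for t, place, lat, lon in mentions
        if t == topic
    )
    return counts.most_common()


# Each result pairs a (place, lat, lon) key with its mention count,
# which is the kind of table a mapping tool could plot as sized dots.
for (place, lat, lon), n in hotspots(MENTIONS, "wildlife_crime"):
    print(f"{place} ({lat}, {lon}): {n} mention(s)")
```

In practice the coordinates would come from the location fields GDELT attaches to each article, and the counts could be fed to any mapping library.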
“Wildlife crimes are fragmented and groups are doing their own thing with little communication, never being able to put it all together to see the big picture,” Leetaru said. “Being able to use GDELT and see the patterns and what’s happening around the world puts the dots on the map and the context behind it in order to see where poachers will strike next. That’s the power of GDELT.”
GDELT is available for anyone to use for free. The GDELT cloud-based analysis website offers a number of built-in visualizations users can leverage to explore the data. Users can also download the raw files on the GDELT website or explore any of the GDELT data sets via Google BigQuery.
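For readers who download the raw files, the sketch below shows one way to summarize tone by event type once the data is on disk. It assumes tab-delimited text with no header row and a simplified three-column layout (date, event code, average tone) invented for this example. The real files contain many more columns, so the positions, the sample values, and the `mean_tone_by_code` helper should all be treated as hypothetical.

```python
import csv
import io

# Assumption: a simplified, made-up layout of three tab-separated columns:
# date (YYYYMMDD), event code, average tone. Not the real GDELT schema.
SAMPLE = (
    "20160101\t141\t-3.5\n"
    "20160101\t042\t1.2\n"
    "20160102\t141\t-6.0\n"
)


def mean_tone_by_code(text):
    """Return the mean tone for each event code in tab-delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    totals = {}  # event code -> (running tone sum, row count)
    for _date, code, tone in reader:
        tone_sum, n = totals.get(code, (0.0, 0))
        totals[code] = (tone_sum + float(tone), n + 1)
    return {code: tone_sum / n for code, (tone_sum, n) in totals.items()}


print(mean_tone_by_code(SAMPLE))
```

To run this against a downloaded file, the same function could be given the file's contents instead of `SAMPLE`, after adjusting the column unpacking to match the actual layout described in the GDELT documentation.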
Photo Credit: GDELT