Science in the age of big data
Just over 10 years ago, a tsunami of data began to advance. That wave has since grown and inundated all areas of science. Astronomy, physics and the life sciences have long required the most intensive computing capacity. “But other sectors, such as social sciences, are rapidly catching up. Researchers now use smart technologies to observe behaviour rather than hand individuals questionnaires,” says Professor Wil van der Aalst, head of the Data Science Centre at Eindhoven University of Technology (TU/e). Scientists in most areas now use big data to push the limits of knowledge. “The theory-based approach has been replaced by a data-based one,” van der Aalst adds.
Big data techniques let scientists compile and process enormous data sets. But costs can rise quickly. This has driven researchers to work together, even forming interdisciplinary alliances. Nowadays, biologists commonly team up with statisticians, and sociologists with mathematicians. Institutions share their infrastructure, giving rise to multidisciplinary research centres. Political initiatives have helped too. For example, the European Commission has recently made open research data the default for all projects linked to its science programme Horizon 2020.
“Today laboratories don’t have the capacity for all the expertise needed to conduct their research,” says Professor Sune Lehmann of the Technical University of Denmark (DTU). For more than two years, he studied the social interaction of his students by analysing gigabytes of data from smartphones (see inset).
1. Zeroing in on human behaviour
The SensibleDTU project examines students’ social interactions.
Modern man communicates through several channels, including direct speech, telephone, e-mail, instant messaging and social media. This observation formed the basis of the SensibleDTU project launched by Sune Lehmann of DTU to gain insight into our social interactions. “To decode social systems, you have to understand how people communicate across all existing channels.”
A thousand smartphones, equipped with software designed to collect information about social interactions, were handed out to students. For two and a half years, until February 2016, the phones logged data from Bluetooth, text messages, conversations, e-mails and social media. Each device collected up to 100 megabytes of information per day. That’s a mind-boggling amount of data to process.
“With 1,000 students, that totalled 100 gigabytes a day, for 1,000 days,” says Lehmann. It took several years of technical preparation to figure out how to interpret all that data, because “smartphones don’t measure social interactions directly.” So how did the team go about it? The strength of a Bluetooth signal, for example, varies with the distance between two phones, and that can be used to determine when a social interaction took place, Lehmann explains. GPS is valuable for studying the social context. “It doesn’t mean the same thing if a meeting takes place in a café or in a bedroom.”
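To make the Bluetooth idea concrete, here is a minimal sketch in Python. It uses the log-distance path-loss model, a common textbook approximation for turning signal strength into a rough distance. The article does not say which method the SensibleDTU team actually used, and the calibration values, the 2-metre threshold and the sample readings below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Assumed calibration values, not taken from the SensibleDTU project.
RSSI_AT_1M_DBM = -59.0      # hypothetical signal strength measured at 1 metre
PATH_LOSS_EXPONENT = 2.0    # 2.0 approximates free space; indoors it is usually higher
NEARBY_THRESHOLD_M = 2.0    # hypothetical cut-off for "close enough to interact"


@dataclass
class Scan:
    timestamp: int   # seconds since the start of the observation (illustrative)
    rssi_dbm: float  # received signal strength of the other phone, in dBm


def estimate_distance_m(rssi_dbm: float) -> float:
    """Invert the log-distance path-loss model to get a rough distance in metres."""
    return 10 ** ((RSSI_AT_1M_DBM - rssi_dbm) / (10 * PATH_LOSS_EXPONENT))


def interaction_times(scans: list[Scan]) -> list[int]:
    """Return the timestamps of scans where the two phones appear to be nearby."""
    return [s.timestamp for s in scans
            if estimate_distance_m(s.rssi_dbm) <= NEARBY_THRESHOLD_M]


# Invented readings: weak, strong, strong, weak.
scans = [Scan(0, -85.0), Scan(300, -62.0), Scan(600, -59.0), Scan(900, -90.0)]
print(interaction_times(scans))  # [300, 600]
```

In practice, signal strength is noisy and depends on walls, pockets and phone models, so a real pipeline would probably smooth readings over time rather than trust a single scan.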
Data analysis is already producing results that offer unprecedented resolution and density. Many fundamental aspects of social sciences are being factored in, such as confidentiality, academic success, gender differences, social dynamics and mobility. The most surprising application has been in epidemiology. “The networks of contacts between individuals may shed some light on the way infectious diseases are transmitted.” Lehmann hopes to use Facebook’s social network to stop viruses by advising groups of people identified as being at risk to get vaccinated.
2. Making driving safer
Vehicles chock full of electronics collect valuable information that can be used in driverless cars.
Cooperative driving is a system that lets vehicles communicate with each other and their environment to improve road traffic and the information used by driverless cars. Researchers from TU/e and its Smart Mobility Strategic Area are working to develop cooperative driving systems that pack electronics into everything from passenger cars with drivers to robot football players.
Data are collected from a wide range of systems, including GPS, ABS, gyroscopes, wheel rotation sensors and Wi-Fi. A whopping 100 terabytes of data (the storage space on 400 iPads) are generated every hour, says Carlo van de Weijer, director of the Smart Mobility programme. Cooperative driving offers numerous advantages. Cars optimise space and distances, consume less fuel and report any event useful to other vehicles, which makes roads safer and improves traffic flow. But, says van de Weijer, the technology still needs development. “Safety is close to 100%, but the tiny percentage left would cause several accidents a day if all vehicles were autonomous.”
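The quoted data volume is easier to grasp with a quick back-of-the-envelope check. The short Python sketch below only reworks the two figures given above (100 terabytes per hour, 400 iPads). It assumes decimal units (1 terabyte = 1,000 gigabytes), which the article does not specify.

```python
# Decimal units are an assumption; the article does not say which convention applies.
TB = 10**12
GB = 10**9

data_per_hour = 100 * TB   # figure quoted by van de Weijer
ipads = 400                # comparison used in the article

capacity_per_ipad_gb = data_per_hour / ipads / GB
rate_gb_per_second = data_per_hour / 3600 / GB

print(f"Implied capacity per iPad: {capacity_per_ipad_gb:.0f} GB")  # 250 GB
print(f"Average data rate: {rate_gb_per_second:.1f} GB/s")          # 27.8 GB/s
```

Under that assumption, the comparison implies tablets of about 250 gigabytes each, and an average flow of nearly 28 gigabytes every second.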
3. Exploring a millennium
The Venice Time Machine will need 10 years to scan 1,000 years of history.
The basic algorithms that made massive data collection and processing possible date to 2004. These new tools, however, cannot use much of the information pre-dating that period. Yet “the past urgently needs to become as easy to access as the present,” says Frédéric Kaplan, who leads the Venice Time Machine project at the École Polytechnique Fédérale de Lausanne (EPFL). He aims to discover the floating city’s secrets by scanning its archives and cultural works.
A daunting task, to say the least. It will take 10 years to scan the 1,000 years of history carefully guarded in 327 rooms full of birth, marriage and death certificates, wills, business records, tax returns and the addresses of Venetian residents. But that’s not all. The archives also contain diplomatic documents. “These logs offer such a wealth of information that they alone could be used to retrace a good chunk of European history,” Kaplan says. The biggest challenge is not so much the volume of data as finding a way to scan the billions of pages without damaging them. “We’ve developed a semi-automatic scanner that can process 1,000 sheets per hour.” The team has even considered using medical imaging techniques to scan books without opening them. “It works – but these processes are still under development.”
Another challenge is recognising characters in manuscripts. “We’ve teamed up with no fewer than 15 universities to come up with solutions.” EPFL has focused on writing algorithms that can transform the scanned images into words and sentences. The end goal is to design a Google-like tool for searching the database. Scientists have linked keywords to documents and organised information into huge graphs of interconnected data.
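Linking keywords to documents is the basic idea behind an inverted index, the data structure at the heart of most search tools. The Python sketch below shows the principle on three invented records. It is an illustration only, not a description of the Venice Time Machine's actual software, and the document identifiers, texts and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transcribed records; the identifiers and texts are invented.
documents = {
    "doc1": "marriage certificate of a merchant in Cannaregio",
    "doc2": "tax return of a merchant near Rialto",
    "doc3": "will of a glassmaker in Murano",
}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


index = build_index(documents)
print(search(index, "merchant"))         # ['doc1', 'doc2']
print(search(index, "merchant Rialto"))  # ['doc2']
```

Documents that share a keyword can then be treated as connected nodes, which is one simple way to arrive at a graph of interlinked records.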
The system used by Venetian archivists is helping researchers do that, as it was the precursor to modern indexing systems. EPFL has also been working with the Fondazione Giorgio Cini since March 2016 to scan and digitise paintings. The foundation’s archives include works by Piero della Francesca, Fra Angelico and Sandro Botticelli.
The impact of big data on society
Winning elections
In his successful 2012 re-election campaign, Barack Obama used digital information to target messages to specific voter groups.
Decoding the human genome
At the beginning of the 2000s, it took up to 10 years to decode a human genome, which is composed of 3 billion nucleotides. Thanks to big data, it now takes only one day.
Avoiding crime
In Modesto, California, the police use big data to track criminals. Using information on every crime committed in the city since 2004, they have reduced the number of burglaries by 27%.