CSIRO Data61 researchers develop publicly available cybersecurity dataset

Researchers from CSIRO’s Data61 and Macquarie University, in collaboration with Nokia Bell Labs and the University of Sydney, have developed a comprehensive dataset of the global cybersecurity threat landscape that will enable cybersecurity specialists to derive new insights and predict future malicious online activity.

FinalBlacklist, announced at D61+ LIVE in Sydney, is reportedly the first and largest publicly available dataset of its kind.

The researchers collected a total of 51.6 million mal-activity reports, dating back to 2007 and involving 662,000 unique IP addresses worldwide. Using machine learning techniques, they categorised the reports into six classes of mal-activity: Malware, Phishing, Fraudulent Services, Potentially Unwanted Programs, Exploits and Spamming.

Professor Dali Kaafar, Information Security and Privacy research leader at CSIRO’s Data61 and Scientific Director of Optus Macquarie University Cyber Security Hub, said that malicious software (or malware) has consistently been the weapon of choice for cyber-criminals over the past decade.

“Ransomware remains a persistent threat as evidenced by the recent attacks against hospitals across Victoria,” Professor Kaafar said.

Reports of phishing activity have also risen steadily, with a spike in 2009 that coincided with the increased adoption of smartphones.

A second spike, in 2013, can be linked to the growing popularity of digital payment systems, which attracted unwanted attention from cybercriminals.

Analysis of the retrospective dataset will allow researchers to identify how the sources, types and scale of different mal-activity have changed over time, so that organisations can be better prepared to defend against them.

“We’ve made this dataset available to the wider research community so it can be used to train algorithms to predict future instances of mal-activity before they happen,” Professor Kaafar said.

The dataset shows that mal-activity has consistently increased in volume over the last decade. Meanwhile, the annual cost of cybercrime damages is expected to hit $6 trillion by 2021.

Dr Liming Zhu, Software and Computational Systems Research Director at CSIRO’s Data61, said researchers and organisations are locked in a perpetual arms race to combat widespread malicious activity on the internet.

“The insights that can be drawn from the FinalBlacklist dataset represent a significant contribution to cybersecurity research.

“A retrospective analysis of historical mal-activity trends could help reduce the impact of cybercrime on the economy,” Dr Zhu said.

Although other longitudinal datasets exist, they are predominantly proprietary: the organisations that hold them are unable to share them because of privacy concerns, or keep them private to maintain a competitive advantage.

The FinalBlacklist dataset has been made publicly available to drive further research.

“Our analysis revealed a consistent minority of repeat offenders that contributed a majority of the mal-activity reports. Detecting and quickly reacting to the emergence of these mal-activity contributors could significantly reduce the damage inflicted,” Professor Kaafar said.