A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence

Paris Koloveas, Thanasis Chantzios, Christos Tryfonopoulos, Spiros Skiadopoulos
2019 IEEE World Congress on Services (SERVICES)
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled, and subsequently leveraged into actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in…
On Strengthening SMEs and MEs Threat Intelligence and Awareness by Identifying Data Breaches, Stolen Credentials and Illegal Activities on the Dark Web
Machine Learning and specialised Information Retrieval techniques are devised to extract insights and investigate how the Dark Web enables cybercrime and maintains marketplaces with breached enterprise data collections and pawned email accounts.
Social Media Monitoring for IoT Cyber-Threats
This work proposes a novel social media monitoring system tailored to the IoT domain that allows users to identify recent/trending vulnerabilities and exploits on IoT devices and publicly releases all annotated datasets created during this process.
Blockchain-Based Cyber Threat Intelligence System Architecture for Sustainable Computing
Experimental results of evaluation using the IP of 10 open source intelligence (OSINT) CTI feeds show that the proposed model saves about 15% of storage space compared to total network resources in a limited test environment.
IoVT: Internet of Vulnerable Things? Threat Architecture, Attack Surfaces, and Vulnerabilities in Internet of Things and Its Applications towards Smart Grids
This paper proposes a threat architecture for IoT, addressing threats in the context of a three-layer IoT reference architecture, and covers the applications of Internet of Vulnerable Things (IoVT) in Smart energy Grid solutions, as there will be tremendous use of IoT in future Smart Grids to save energy and improve overall distribution.
Data Elimination on Repetition using a Blockchain based Cyber Threat Intelligence
A CTI system using blockchain to tackle the issues of sustainability, scalability, privacy and reliability is introduced, capable of measuring organizations' contributions, reducing network load, creating a reliable dataset and collecting CTI data with multiple feeds.
Black Widow Crawler for TOR network to search for criminal patterns
This work aims to develop the Black Widow crawler focused on the Tor network, which searches, analyzes, and indexes websites containing criminal patterns. Also, a comparison of the results of the…
Robotics cyber security: vulnerabilities, attacks, countermeasures, and recommendations
Different approaches and recommendations are presented in order to enhance and improve the security level of robotic systems, such as multi-factor device/user authentication schemes, in addition to multi-factor cryptographic algorithms.


Learning to crawl deep web
Experimental results show that the novel deep web crawling framework based on reinforcement learning outperforms the state-of-the-art methods in terms of crawling capability and relaxes the assumption of full-text search implied by existing methods.
CSCE: A Crawler Engine for Cloud Services Discovery on the World Wide Web
The Cloud Service Crawler Engine is presented, which is used to collect metadata of 5,883 valid cloud services through search engines after parsing more than half a million possible links, and which offers an overall view on the current status of cloud services.
Crawling Ranked Deep Web Data Sources
This paper proposes a document frequency (df) based algorithm that exploits the queries whose document frequencies are within the specified range and demonstrates that this method outperforms the two algorithms by 58% and 90% on average, respectively.
Web Crawler Architecture
  • Marc Najork
  • Computer Science
  • Encyclopedia of Database Systems
  • 2009
In order to crawl a substantial fraction of the “surface web” in a reasonable amount of time, web crawlers must download thousands of pages per second, and are typically distributed over tens or hundreds of computers.
Efficient Deep Web Crawling Using Reinforcement Learning
A novel deep web crawling framework based on reinforcement learning is proposed, in which the crawler is regarded as an agent and the deep web database as the environment; it outperforms the state-of-the-art methods in terms of crawling capability and breaks through the assumption of full-text search implied by existing methods.
OXPath: A language for scalable data extraction, automation, and crawling on the deep web
This work introduces OXPath as an extension of XPath for interacting with web applications and extracting the data thus revealed, matching all the above requirements for web data extraction, automation, and (focused) web crawling.
Rank-Aware Crawling of Hidden Web sites
This paper presents algorithms for crawling a Hidden Web site by taking the ranking of the results into account, provides a framework for performing ranking-aware Hidden Web crawling, and shows experimental results on a real Web site demonstrating the performance of the methods.
A New Architecture of an Intelligent Agent-Based Crawler for Domain-Specific Deep Web Databases
  • Yanni Li, Yuping Wang, Erfeng Tian
  • Computer Science
  • 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology
  • 2012
The iCrawler, based on intelligent learning agents and domain ontology, and a series of novel and effective strategies, can improve the performance of the existing methods of domain-specific Deep Web Form-Focused Crawlers.
SmartCrawler: A Two-Stage Crawler for Efficiently Harvesting Deep-Web Interfaces
The experimental results show the agility and accuracy of the proposed crawler framework, SmartCrawler, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
The Architecture and Implementation of an Extensible Web Crawler
It is argued that the low-latency, high-selectivity, and scalable nature of the extensible crawler system makes it a promising platform for taking advantage of emerging real-time streams of data, such as Facebook or Twitter feeds.