This paper describes a cluster-based high-performance web spider architecture. The architecture is designed to handle a very large number of web pages, with compression of both URLs and page contents. The URL-fetching method is designed to achieve maximum performance with respect to well-known spider design considerations. In experiments, our…
Intrusion detection is performed at the network and host levels to detect various attacks. Port scanning can be classified as a network intrusion. This paper presents a method for detecting port scanning attacks using a rule-based state diagram technique. A set of rules with appropriate thresholds was designed for intrusion…
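The general idea of threshold-based scan detection can be sketched as a small per-source state machine: track how many distinct destination ports a source contacts inside a time window and flag it once a threshold is crossed. This is a minimal illustration only; the rule set, thresholds, and state diagram in the paper are not shown here, and `SCAN_THRESHOLD` and `WINDOW` are assumed values.

```python
from collections import defaultdict

# Assumed illustrative parameters, not the paper's actual thresholds.
SCAN_THRESHOLD = 20   # distinct destination ports before flagging a source
WINDOW = 5.0          # observation window in seconds

class PortScanDetector:
    def __init__(self):
        # per-source state: [window start time, set of destination ports seen]
        self.state = defaultdict(lambda: [None, set()])

    def observe(self, src_ip, dst_port, ts):
        """Record one packet; return True when the source crosses the threshold."""
        start, ports = self.state[src_ip]
        if start is None or ts - start > WINDOW:
            # window expired (or first packet): reset state for this source
            self.state[src_ip] = [ts, {dst_port}]
            return False
        ports.add(dst_port)
        return len(ports) >= SCAN_THRESHOLD
```

A source probing many distinct ports in one window trips the rule, while ordinary repeated traffic to a few ports does not.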
A common problem for large-scale search engines and web spiders is how to handle the huge number of URLs they encounter. Traditional search engines and web spiders store URLs on disk without any compression, which results in slow performance and greater space requirements. This paper describes a simple URL compression algorithm allowing efficient compression…
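Because sorted URLs share long common prefixes (scheme, host, path segments), front-coding is a standard way to compress a URL list: store only the length of the prefix shared with the previous entry plus the differing suffix. This sketch illustrates that general idea; it is not necessarily the algorithm the paper proposes.

```python
import os

def compress(urls):
    """Front-code a URL list: (shared-prefix length, suffix) per entry."""
    urls = sorted(urls)
    out, prev = [], ""
    for u in urls:
        k = len(os.path.commonprefix([prev, u]))
        out.append((k, u[k:]))   # store prefix length + remaining suffix
        prev = u
    return out

def decompress(records):
    """Rebuild the sorted URL list from front-coded records."""
    urls, prev = [], ""
    for k, suffix in records:
        u = prev[:k] + suffix
        urls.append(u)
        prev = u
    return urls
```

Decompression is a single sequential pass, which suits spiders that mostly append and scan URL lists rather than access them randomly.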
restoration largely reduces the number of state-synchronization transactions when the number of firewall nodes fluctuates. High scalability and load balancing are therefore achieved with minimal state replication.
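One standard way to keep most flow-to-node assignments stable when nodes join or leave, and thereby limit state-synchronization traffic, is consistent hashing. The sketch below is illustrative of that general technique and is not claimed to be the paper's actual restoration mechanism; the node names and replica count are assumptions.

```python
import bisect
import hashlib

def _h(key):
    # Stable hash of a string onto a large integer ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring: adding one node remaps only a small
    fraction of flows, so little state needs to be re-replicated."""

    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` virtual points on the ring.
        self.ring = sorted((_h(f"{n}#{i}"), n)
                           for n in nodes for i in range(replicas))
        self.keys = [h for h, _ in self.ring]

    def node_for(self, flow_key):
        i = bisect.bisect(self.keys, _h(flow_key)) % len(self.ring)
        return self.ring[i][1]
```

Going from four nodes to five remaps roughly one flow in five, instead of nearly all of them as naive modulo hashing would.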
In a very high-speed network environment such as gigabit Ethernet, firewalls that must inspect and filter all passing packets are reaching their limits. A firewall running on a single machine is a potential bottleneck and cannot scale beyond certain thresholds, even with dedicated built-in hardware. Hence, a parallel system appears as an…
With the speed and bandwidth offered by next-generation Internet technology, there is a need for a large, scalable Internet server that can provide adequate computing power and storage for the new generation of Internet applications. This traditionally requires a huge investment in a very large and expensive commercial server system. Recently, the emergence of…
This paper presents the latest status of Thai web servers. Quantitative measurements are based on a database crawl conducted in July 2000. Our experiments show that Heaps' and Zipf's laws apply strongly to documents on the Thai web. A visualization tool was developed to show server connectivity.
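Zipf's law says the frequency of the r-th most common word falls off roughly as 1/r^s with s near 1. A common way to test this on a corpus is a least-squares fit of log-frequency against log-rank; the sketch below shows that generic check on tokenized text. It is illustrative only and does not reproduce the paper's measurements over crawled Thai documents.

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Estimate s in f(r) ~ C / r^s by fitting log f against log r."""
    freqs = [f for _, f in Counter(tokens).most_common()]
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope of the log-log regression line; Zipf exponent is its negation
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

An exponent close to 1 over a wide rank range is the usual signature of Zipfian word frequencies.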
In an IPv4/IPv6 dual-stack environment, enterprises critically need a captive-portal-based authentication system that can bind a user account to both the IPv4 and IPv6 addresses of the machine on which the user logs in, and release the binding when the user logs out. Aggravating users by requiring multiple log-ins, one per address, is out of the question. In…
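The single log-in, single log-out behavior amounts to a binding table keyed both by user and by address, updated atomically for the address pair. This minimal sketch shows that data structure under assumed names; it is not the paper's implementation, and real portals would also handle address changes and session timeouts.

```python
class BindingTable:
    """Bind one user account to both its IPv4 and IPv6 addresses with a
    single log-in, and release both bindings on a single log-out."""

    def __init__(self):
        self.by_user = {}   # user -> set of bound addresses
        self.by_addr = {}   # address -> user

    def login(self, user, ipv4, ipv6):
        for addr in (ipv4, ipv6):
            self.by_addr[addr] = user
        self.by_user.setdefault(user, set()).update({ipv4, ipv6})

    def logout(self, user):
        for addr in self.by_user.pop(user, set()):
            self.by_addr.pop(addr, None)

    def is_authorized(self, addr):
        # The enforcement point (e.g. the gateway) checks packets per address.
        return addr in self.by_addr
```

One `login` call authorizes traffic from both address families at once, and one `logout` revokes both, which is exactly the usability requirement stated above.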
Search engines rely primarily on web spiders to collect large amounts of data for indexing and analysis. Data collection can be performed by several spider agents running in a parallel or distributed manner over a cluster of workstations. This parallelization is often necessary to cope with a large number of pages in a reasonable amount of time.…
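Running several spider agents in parallel requires partitioning the URL space so each page is fetched by exactly one agent; a common scheme hashes the URL's host so all pages of a site land on the same agent, which also keeps per-site politeness delays local. The function below is an assumed illustration of that scheme, not the paper's actual assignment method.

```python
import hashlib
from urllib.parse import urlparse

def assign_agent(url, n_agents):
    """Map a URL to one of n_agents by hashing its host, so that every
    page on the same site is handled by the same spider agent."""
    host = urlparse(url).netloc
    digest = hashlib.md5(host.encode()).hexdigest()
    return int(digest, 16) % n_agents
```

Because the mapping is deterministic, agents can route newly discovered links to their owners without any central coordinator.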