Separating Interleaved User Sessions from Web Log
When analyzing patterns in server-side data, it quickly becomes apparent that some of the data originating from the client is lost, mainly because web pages are cached. Missing data is an important issue when server-side data is used to analyze a user’s browsing behavior, since the quality of the browsing patterns that can be identified depends on the quality of the data. In this paper, we present a series of experiments that demonstrate the extent of the data loss in different browsing environments and illustrate the difference it makes to the resulting browsing patterns when they are visualized as footstep graphs. We propose an algorithm, called the Pattern Restore Method (PRM), for restoring some of the lost data, and we evaluate its efficiency and accuracy.