Data Cleaning Methods for Client and Proxy Logs

H. Weinreich, H. Obendorf, E. Herder

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review



    In this paper we present our experiences with cleaning Web client and proxy usage logs, based on a long-term browsing study with 25 participants. A detailed clickstream log, recorded using a Web intermediary, was combined with a second log of user interface actions, which was captured by a modified Firefox browser for a subset of the participants. The consolidated data from both logs revealed many page requests that were not directly related to user actions. For participants who had no ad-filtering system installed, these artifacts made up one third of all transferred Web pages. We identified three major causes: HTML frames and iframes, advertisements, and automatic page reloads. The experience gained during the data cleaning process may help other researchers choose adequate filtering methods for their data.
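
The three artifact sources named in the abstract can be illustrated with a small filtering sketch. This is not the authors' procedure: the record fields, the ad-host list, the 2-second frame window, and the rule order are all assumptions made for illustration, and the `user_action` flag stands in for whatever the browser-side log would provide.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical ad hosts; a real study would rely on a maintained filter list.
AD_HOSTS = {"ads.example.com", "banner.example.net"}


@dataclass
class Request:
    time: float        # seconds since the start of the session
    url: str
    referrer: str      # empty string if no referrer was sent
    user_action: bool  # True if the browser log recorded a matching user action


def classify(requests, frame_window=2.0):
    """Label each page request as 'user', 'ad', 'frame', 'reload', or 'unknown'."""
    labels = []
    last_user_page = None  # most recent request attributed to a user action
    seen = set()           # URLs already requested in this session
    for r in sorted(requests, key=lambda req: req.time):
        host = urlparse(r.url).hostname or ""
        if host in AD_HOSTS:
            label = "ad"
        elif r.user_action:
            label = "user"
            last_user_page = r
        elif r.url in seen:
            # Same URL requested again without a user action: treat as auto-reload.
            label = "reload"
        elif (last_user_page is not None
              and r.referrer == last_user_page.url
              and r.time - last_user_page.time <= frame_window):
            # Requested by the current page shortly after it loaded: likely a frame.
            label = "frame"
        else:
            label = "unknown"
        seen.add(r.url)
        labels.append((r, label))
    return labels


# Made-up sample log, for illustration only.
sample_log = [
    Request(0.0, "http://news.example.org/", "", True),
    Request(0.3, "http://news.example.org/nav.html", "http://news.example.org/", False),
    Request(0.4, "http://ads.example.com/banner", "http://news.example.org/", False),
    Request(60.0, "http://news.example.org/", "", False),
]
sample_labels = [label for _, label in classify(sample_log)]
```

Only requests labelled `user` would be kept for analysis under this sketch; the heuristics for frames and reloads are deliberately crude and would need tuning against real data.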
    Original language: Undefined
    Title of host publication: Workshop on Logging Traces of Web Activity: The Mechanics of Data Collection
    Editors: A. Edmonds, K. Hawkey, M. Kellar, D. Turnbull
    Place of Publication: Halifax, Canada
    Publisher: Dalhousie University
    Number of pages: 4
    ISBN (Print): not assigned
    Publication status: Published - 23 May 2006



    • EWI-9155
    • HMI-IE: Information Engineering
    • IR-66893
    • METIS-237939
    • HMI-HF: Human Factors
