nealrichter’s posterous

 
« Back to blog

References for mining from streaming data

While reading a lower quality paper on the subject I found these references worth tracking down.  Essentially the idea is that you can make one-pass through the data and must produce statistics of the data that are approximate in nature, ideally with bounded approximation error.

Gibbons et al 1997: Fast Incremental Maintenance of Approximate Histograms
   Incremental maintenance of histograms primarily for database query planners

Manku and Motwani 2002 Approximate frequency counts over data streams.
  They show algorithms called sticky sampling and lossy counting with proven error bounds.

Zhu et al. 2002 StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time
  Basic stats and high correlations between all pairs of data streams

Cormode and Muthukrishnan's 2003 What's Hot and What's Not: Tracking Most Frequent Items Dynamically
   Introduced groupTest probabilistic Monte Carlo algorithm for frequent item tracking

Metwally et al 2005 Efficient Computation of Frequent and Top-k Elements in Data Streams
   Uses counters to monitor data streams with a new stream-summary data structure

Cheng et al 2005 Time-Decaying Bloom Filters for Data Streams with Skewed Distributions
   Dampened Bloom-filters for frequent items

Three papers on frequent itemsets (different than frequent items):

Jiang and Gruenwald have a pair of 2006 papers
Research Issues in Data Stream Association Rule Mining
  Survey paper of issues and previous results

CFI-Stream: Mining Closed Frequent Itemsets in Data Streams
   New stream based itemset miner

Tsai and Chen 2009 Mining Frequent Itemsets for data streams over Weighted Sliding Windows
   Apriori like itemset miner on windows of data with differential weighting

(9/19) Note: corrected the StatStream link - hat tip @dataspora

Loading mentions Retweet

Comments (0)

Leave a comment...

 
To leave a comment on this posterous, please login by clicking one of the following.
Posterous-login     Connect     twitter