Abstract
Sampling techniques are becoming increasingly important for very large databases. However, the problem of obtaining a random sample from index structures has received little attention. In this paper, we examine sampling techniques for B trees. Because the fanout varies from node to node, a random walk through the index structure does not produce a representative sample of the data set. We propose a new technique, called B Tree based Weighted Random Sampling (BTWRS), that adjusts the inclusion probabilities of records so that more records are extracted from leaves reached along paths with higher fanouts. We evaluated our method extensively, and the results show that BTWRS improves on existing schemes both in the quality of the samples obtained and in the efficiency of the sampling process. The proposed method can be readily adopted in existing commercial systems.
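The fanout bias can be made concrete on a toy tree. The sketch below is not the authors' BTWRS algorithm. It assumes a hypothetical in-memory tree (Python lists for internal nodes, tuples of keys for leaves) and shows two things: the exact inclusion probability of each record under a naive root-to-leaf random walk, and one standard correction. That correction weights each drawn record by the inverse of its inclusion probability, which is the product of the fanouts on its path times the leaf size, and plugs the weights into the Hansen-Hurwitz estimator. The names `TREE`, `walk_probabilities`, `random_walk` and `estimate_total` are illustrative only.

```python
import random
from fractions import Fraction

# Hypothetical toy index, NOT from the paper: an internal node is a list of
# children, a leaf is a tuple of record keys. Fanouts vary from node to node.
TREE = [
    [(1, 2, 3, 4)],              # fanout 1 below the root, one leaf of 4 records
    [(5, 6), (7, 8), (9, 10)],   # fanout 3, three leaves of 2 records each
    [(11, 12, 13, 14, 15, 16)],  # fanout 1, one leaf of 6 records
]


def walk_probabilities(node, p=Fraction(1)):
    """Exact probability that a naive random walk returns each record:
    pick a child uniformly at every internal node, then pick a record
    uniformly from the leaf that is reached."""
    if isinstance(node, tuple):  # leaf
        return {rec: p / len(node) for rec in node}
    probs = {}
    for child in node:
        probs.update(walk_probabilities(child, p / len(node)))
    return probs


def random_walk(node, rng):
    """Draw one record by a naive walk and return it with its
    inverse-probability weight (product of path fanouts times leaf size)."""
    weight = 1
    while not isinstance(node, tuple):
        weight *= len(node)
        node = rng.choice(node)
    weight *= len(node)
    return rng.choice(node), weight


def estimate_total(tree, value, n, seed=0):
    """Hansen-Hurwitz estimate of sum(value(r)) over all records from n
    independent walks: each draw is scaled by its weight (1 / probability)."""
    rng = random.Random(seed)
    draws = [random_walk(tree, rng) for _ in range(n)]
    return sum(w * value(rec) for rec, w in draws) / n


probs = walk_probabilities(TREE)
print(probs[1], probs[5], probs[11])  # 1/12 1/18 1/18; uniform would be 1/16
print(estimate_total(TREE, lambda r: 1, 10_000))  # close to the true count 16
```

With 16 records, a uniform sample would give each record probability 1/16. The naive walk instead over-samples the four records in the low-fanout subtree (1/12 each) and under-samples the rest (1/18 each). Reweighting removes this bias from estimates, but it does not by itself make the sample uniform. The paper's BTWRS takes the different route of changing how many records are extracted per leaf.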
Original language | English (US) |
---|---|
Pages (from-to) | 359-377 |
Number of pages | 19 |
Journal | Intelligent Data Analysis |
Volume | 6 |
Issue number | 4 |
DOIs | |
State | Published - 2002 |
Keywords
- B Tree
- quality of samples
- weighted random sampling
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Vision and Pattern Recognition
- Artificial Intelligence