Math Help Forum

Math Help Forum Feed Site Feed

Go Back   Math Help Forum > Pre-University Math Help > Basic Statistics and Probability
Reply
 
Thread Tools Display Modes
  #1  
Old October 27th, 2009, 02:52 PM
Newbie
 
Join Date: Oct 2009
Posts: 3
Thanks: 0
Thanked 0 Times in 0 Posts
Translate is on a distinguished road
Default sample selection based on unequal probability

Anyone know of a routine that can be used in Excel to randomly select n elements based on the value of a parameter for the elements? So a higher parameter value would create a higher probability that element would be selected than another element that had a lower value.
Reply With Quote
Advertisement
 
  #2  
Old October 27th, 2009, 04:13 PM
Senior Member
 
Join Date: Mar 2008
Posts: 448
Country:
Thanks: 20
Thanked 202 Times in 169 Posts
awkward is just really niceawkward is just really niceawkward is just really niceawkward is just really nice
Default

Quote:
Originally Posted by Translate View Post
Anyone know of a routine that can be used in Excel to randomly select n elements based on the value of a parameter for the elements? So a higher parameter value would create a higher probability that element would be selected than another element that had a lower value.
Here is a way that I think will work.

Let's say the item identifiers are in cells A1:Am and their associated weights are in cells B1:Bm.

1. Put the following formula in cell C1: =B1*rand()

2. Copy cell C1 and paste into cells C2:Cm.

3. Copy cells C1:Cm and Paste Special-Values into C1:Cm, over the same cells. This fixes the random numbers in these cells so they now will not change.

4. Sort the data on column C and select the top n items.
Reply With Quote
  #3  
Old October 27th, 2009, 06:59 PM
Newbie
 
Join Date: Oct 2009
Posts: 3
Thanks: 0
Thanked 0 Times in 0 Posts
Translate is on a distinguished road
Thumbs up Works

Not sure of the theoretical underpinnings, but empirically works great. Thanks.
Reply With Quote
  #4  
Old November 3rd, 2009, 06:48 PM
Senior Member
 
Join Date: Mar 2008
Posts: 448
Country:
Thanks: 20
Thanked 202 Times in 169 Posts
awkward is just really niceawkward is just really niceawkward is just really niceawkward is just really nice
Default

I have continued to mull over this problem, and I have decided the solution I gave was wrong. What is more, I don't have a correct solution at this time.

Although the OP did not state the requirement explicitly, I assume the goal of the algorithm is to select a subset of size n, with the probability of selection of each subset being proportional to the sum of the weights of the items in the subset. The algorithm I gave does not do that.

In particular, suppose there are only 2 elements and we want to select a subset of size 1. If the weights are a and b for items 1 and 2, respectively, then we want item 1 (i.e., the subset consisting of item 1) to be selected with probability a/(a+b) and item 2 to be selected with probability b/(a+b). Instead, it can be shown that the method I proposed will select item 1 with probability a/(2b) if a < b and probability (2a-b)/(2a) if a > b. This isn't what we want, unless a=b, which is the unweighted case.

I will continue to think about this problem, but so far I haven't come up with a workable algorithm.
Reply With Quote
  #5  
Old November 3rd, 2009, 07:34 PM
Newbie
 
Join Date: Oct 2009
Posts: 3
Thanks: 0
Thanked 0 Times in 0 Posts
Translate is on a distinguished road
Default

Uh oh. But your initial procedure was so elegant! I would appreciate any other suggestions you or others might devise.
Reply With Quote
  #6  
Old November 10th, 2009, 06:58 PM
Senior Member
 
Join Date: Mar 2008
Posts: 448
Country:
Thanks: 20
Thanked 202 Times in 169 Posts
awkward is just really niceawkward is just really niceawkward is just really niceawkward is just really nice
Default

I continued to think about this problem, and I finally came up with a solution.

As before, let's suppose the item identifiers are in cells A1:Am, and their associated weights are in cells B1:Bm.

1. Put the following formula into cell C1: = -LN(RAND())/B1

2. Copy cell C1 and paste into cells C2:Cm.

3. Copy cells C1:Cm and Paste Special-Values over the original cells C1:Cm. This fixes the values of those cells so they will not change during the next step.

4. Sort the data in cells A1:Cm in ascending order based on column C.
Your sample is in the first n rows (which have the lowest values in column C).

This process is based on two properties of the Exponential distribution.
First, if X has an Exponential(\lambda) distribution and Y has an Exponential(\mu) distribution, then
\Pr(X < Y) = \frac{\lambda}{\lambda + \mu}.
Second, you can simulate a draw from an Exponential(\lambda) distribution on a computer by computing
X = -\frac{1}{\lambda} \ln U
where U is a (pseudo-) random number drawn from a Uniform(0,1) distribution.
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off
Forum Jump


All times are GMT -7. The time now is 10:08 PM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0 ©2008, Crawlability, Inc.
©2005 - 2009 Math Help Forum


Math Help Forum is a community of maths forums with an emphasis on maths help in all levels of mathematics.
Register to post your math questions or just hang out and try some of our math games or visit the arcade.