Rejection sampling is a popular method for generating random variates. You draw candidate numbers from a simpler distribution that is easy to sample from, then discard (reject) some of them at random, so that the numbers you keep follow the distribution you actually want. You keep trying until you have as many accepted numbers as you need.

In this article, we will discuss the advantages and disadvantages of rejection sampling in research, along with when and how rejection sampling can be used in a study.

What is Rejection Sampling? 

Rejection sampling (also called acceptance-rejection sampling) is a Monte Carlo method for generating random samples from a probability distribution. You draw candidate samples and reject those that fail a random acceptance test, until you reach the number of samples you need.

It is a method for creating samples from one distribution by using another distribution that is easier to sample from. For instance, imagine you have a fair coin, which is easy to flip, but you need random outcomes that come up heads 60% of the time.

You could use the fair coin to produce those outcomes. Flip it and always keep a heads. Keep a tails only with probability 2/3; otherwise, discard that flip and flip again. Among the flips you keep, heads appear with probability 0.5 / (0.5 + 0.5 × 2/3) = 0.6, exactly the ratio you wanted.

The price is that some flips are thrown away: here, one flip in six is rejected on average. The more closely the easy distribution matches the one you want, the fewer flips you waste.

So rejection sampling uses an "easy-to-sample" distribution that covers the target one. It then rejects just enough of its draws that the ones you keep follow the target distribution exactly.
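Here is the coin idea as a short Python sketch. It produces heads 60% of the time from a fair coin: every heads is kept, and a tails is kept only with probability 2/3. Python's standard `random` module stands in for both the fair coin and the uniform number that decides whether to keep a tails. The function name `biased_flip` is hypothetical.

```python
import random

def biased_flip(rng: random.Random) -> str:
    """Return "H" with probability 0.6, using a fair coin as the proposal."""
    while True:
        flip = rng.choice("HT")      # proposal: fair coin, heads and tails 50/50
        if flip == "H":
            return "H"               # heads is always accepted
        if rng.random() < 2 / 3:     # tails is accepted with probability 2/3
            return "T"
        # otherwise this tails is rejected and the loop flips again

rng = random.Random(42)
flips = [biased_flip(rng) for _ in range(100_000)]
print(flips.count("H") / len(flips))  # should be close to 0.6
```

The acceptance probabilities come from dividing each target probability by M times the proposal probability, with M = 1.2. For heads, 0.6 / (1.2 × 0.5) = 1. For tails, 0.4 / (1.2 × 0.5) = 2/3. The overall acceptance rate is 1/M = 5/6.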


Advantages of Rejection Sampling

Rejection sampling has several advantages over other sampling methods, including:

  • It can be used with almost any distribution whose density you can evaluate, provided you can find a suitable envelope.
  • It's easy to adapt to different target distributions.
  • The accepted samples are independent draws from the exact target distribution, so you can generate as many as you need.
  • It is easy to implement.
  • The target distribution can have any support, as long as the proposal distribution covers it.
  • You can use rejection sampling even if you don't know the normalizing constant of the probability distribution you're trying to sample from.
  • For many common targets, the bounding constant M (sometimes called the rejection constant) is not difficult to find.

Disadvantages of Rejection Sampling

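Rejection sampling also has some disadvantages:

  • It can take many candidates before one is accepted, which makes it inefficient when the envelope fits the target poorly. On average, it takes M candidates per accepted sample, where M is the constant used to scale the proposal density above the target density.
  • It can be slow if the probability density function (pdf) of your target distribution is close to zero over most of the region covered by the proposal. In that case, most candidates land where they are almost certain to be rejected.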
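This Python sketch shows how quickly efficiency drops as the target gets more sharply peaked. The triangular target, the half-widths, and the function names are illustrative assumptions. The target is a triangle on [0, 1] peaked at 0.5, and the proposal is uniform on [0, 1]. The peak height 1/w is used as M, so the expected acceptance rate is 1/M = w.

```python
import random

def triangle_pdf(x: float, w: float) -> float:
    """Triangular density on [0, 1], peaked at 0.5 with half-width w (assumes w <= 0.5)."""
    return max(0.0, 1.0 - abs(x - 0.5) / w) / w

def acceptance_rate(w: float, n: int, rng: random.Random) -> float:
    """Fraction of n uniform proposals accepted when sampling triangle_pdf with M = 1/w."""
    m = 1.0 / w                       # peak height, so triangle_pdf(x) <= m * 1 on [0, 1]
    accepted = 0
    for _ in range(n):
        x = rng.random()              # proposal: uniform on [0, 1], q(x) = 1
        u = rng.random()
        if u < triangle_pdf(x, w) / m:
            accepted += 1
    return accepted / n

rng = random.Random(0)
rates = {w: acceptance_rate(w, 200_000, rng) for w in (0.5, 0.1, 0.01)}
for w, rate in rates.items():
    print(f"half-width {w}: acceptance rate {rate:.4f} (theory 1/M = {w})")
```

Narrowing the peak by a factor of 50 cuts the acceptance rate by roughly the same factor.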

When Should Rejection Sampling Be Used?

Rejection sampling is often used in Monte Carlo estimation, but it's not always the best option. It works well in the following cases:

  • If you can evaluate the density of the target distribution, at least up to a constant factor.
  • If that density can be tightly bounded by a constant multiple of some other distribution's density.
  • If you have an easy way to sample from that other (bounding) distribution.

You can also make use of rejection sampling when you want to:

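  • Generate uniformly distributed random points inside an arbitrary polygon or polyhedron. You sample uniformly from a bounding box and reject points that land outside the shape.
  • Simulate outcomes conditioned on an event. You simulate unconditionally and reject runs where the event did not occur. This becomes inefficient when the event is rare, because almost every run is rejected.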
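Here is a Python sketch of the polygon case. It samples uniformly in the polygon's bounding box and keeps the points that pass a ray-casting (even-odd) point-in-polygon test. The function names and the demo triangle are illustrative assumptions, and the test assumes a simple (non-self-intersecting) polygon given as a list of vertices.

```python
import random

Point = tuple[float, float]

def inside_polygon(p: Point, poly: list[Point]) -> bool:
    """Even-odd test: count crossings of a horizontal ray going right from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):                  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def uniform_in_polygon(poly: list[Point], n: int, rng: random.Random) -> list[Point]:
    """Rejection sampling: uniform points in the bounding box, keep those inside."""
    xs = [px for px, _ in poly]
    ys = [py for _, py in poly]
    points: list[Point] = []
    while len(points) < n:
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if inside_polygon(p, poly):
            points.append(p)
    return points

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
pts = uniform_in_polygon(triangle, 10_000, random.Random(1))
```

The acceptance rate equals the polygon's area divided by the box's area, 0.5 for this triangle. A thin or oddly shaped polygon in a large box wastes more draws.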

How to Conduct Rejection Sampling

The rejection sampling process consists of two steps. In the first step, a candidate sample is drawn from a proposal distribution. This distribution is easy to sample from and has a known probability density function (pdf).

In the second step, the candidate is accepted or rejected at random. The acceptance probability equals the target pdf divided by a scaled-up copy of the proposal pdf at that point. If the sample is accepted, it's returned to the calling routine. If it's rejected, you go back to the first step and draw another candidate.

The process works because the acceptance test removes candidates in proportion to how far the scaled proposal density exceeds the target density. As a result, the accepted samples follow the target distribution rather than the proposal distribution. To understand how this works, consider the following example.

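The goal of this sampling process is to generate random values that follow a normal distribution with mean 0 and standard deviation 1. We start with samples from an exponential distribution with a mean of 1 and then accept or reject them. The exponential distribution only produces positive values, so the usual approach first samples the absolute value of the normal variate. Draw x from the exponential distribution and accept it with probability exp(−(x − 1)²/2), which is largest for values near 1. Finally, attach a random plus or minus sign.

The exponential distribution has a closed-form inverse cumulative distribution function, x = −ln(1 − u), so it can be sampled from a single uniform random number u. The normal distribution's inverse has no closed form. The same idea applies to any target whose density can be bounded by a scaled-up proposal density, and the accepted samples follow the target exactly rather than approximately.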
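Here is a Python sketch of this exponential-to-normal example, using only the standard library. It uses the standard half-normal construction:

  1. Draw x from Exp(1) by inverse-CDF sampling.
  2. Accept x with probability exp(−(x − 1)²/2).
  3. Attach a random sign.

The function name is hypothetical. In everyday code, Python's built-in `random.gauss` would be the practical choice; the sketch only illustrates the technique.

```python
import math
import random

def standard_normal(rng: random.Random) -> float:
    """Draw one N(0, 1) variate by rejection from an Exp(1) proposal."""
    while True:
        # Inverse-CDF sampling for Exp(1): x = -ln(1 - u); 1 - u lies in (0, 1], so log is safe
        x = -math.log(1.0 - rng.random())
        # Accept with probability f(x) / (M g(x)) = exp(-(x - 1)^2 / 2), where f is the
        # half-normal density, g the Exp(1) density, and M = sqrt(2e / pi)
        if rng.random() < math.exp(-0.5 * (x - 1.0) ** 2):
            # x is half-normal; a random sign turns it into a standard normal value
            return x if rng.random() < 0.5 else -x

rng = random.Random(123)
zs = [standard_normal(rng) for _ in range(100_000)]
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / (len(zs) - 1)
print(round(mean, 3), round(var, 3))  # close to 0 and 1
```

Because M = sqrt(2e/π) ≈ 1.32, about 76% of exponential draws are accepted.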

In summary, each round of rejection sampling involves three steps:

  1. Generate a candidate sample from the proposal distribution.
  2. Evaluate the target probability density function (PDF) and the scaled proposal density at this point.
  3. Accept the sample with probability equal to the ratio of the two densities; otherwise, reject it.

Here, the PDF describes the relative likelihood of different values under the target distribution, and the acceptance test compares it against the proposal. If the candidate is accepted, you have a sample from your target distribution; otherwise, you go back and start over.
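A short derivation shows why the accepted samples follow the target distribution. The notation is introduced here for illustration: f is the target density, q is the proposal density, and M is a constant with f(x) ≤ M q(x) for all x. A candidate x drawn from q is accepted with probability f(x)/(M q(x)), so:

$$
P(\text{accept}) = \int q(x)\,\frac{f(x)}{M\,q(x)}\,dx = \frac{1}{M},
\qquad
p(x \mid \text{accept}) = \frac{q(x)\,\dfrac{f(x)}{M\,q(x)}}{1/M} = f(x).
$$

The accepted values therefore have density exactly f. The number of candidates needed per accepted sample follows a geometric distribution with mean M. Suppose f is known only up to a constant factor, with total integral Z. The same argument then gives an acceptance rate of Z/M, and the accepted samples follow the normalized density f/Z.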


Rejection Sampling Techniques

The rejection sampling algorithm is relatively simple. To draw samples from a target density f(x), the process is as follows:

  1. Choose a proposal distribution with a known density, q(x), that is easy to sample from, and a constant M such that f(x) ≤ M q(x) for all x.
  2. Draw a candidate x from the proposal distribution q(x).
  3. Draw a random number u uniformly distributed between 0 and 1.
  4. If u < f(x) / (M q(x)), accept x as a sample from the distribution f(x); otherwise, reject it and return to step 2. When q(x) is the uniform density on [0, 1], q(x) = 1 and the test reduces to u < f(x)/M.
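These steps can be sketched as a general-purpose Python function. It is an illustrative sketch, not a library API: the name `rejection_sample` and its parameters are assumptions. The caller must supply a valid constant M with f(x) ≤ M q(x). A candidate x is accepted when u < f(x) / (M q(x)), which reduces to u < f(x)/M when q is uniform on [0, 1].

```python
import random
from typing import Callable

def rejection_sample(
    f: Callable[[float], float],                 # target density (may be unnormalized)
    sample_q: Callable[[random.Random], float],  # draws one candidate from the proposal
    q: Callable[[float], float],                 # proposal density
    m: float,                                    # constant with f(x) <= m * q(x) everywhere
    n: int,
    rng: random.Random,
) -> list[float]:
    samples: list[float] = []
    while len(samples) < n:
        x = sample_q(rng)              # draw a candidate from the proposal
        u = rng.random()               # uniform number in [0, 1)
        if u < f(x) / (m * q(x)):      # accept with probability f(x) / (M q(x))
            samples.append(x)
    return samples

# Usage: target f(x) = 2x on [0, 1], uniform proposal (q(x) = 1), and M = 2.
rng = random.Random(7)
xs = rejection_sample(lambda x: 2 * x, lambda r: r.random(), lambda x: 1.0, 2.0, 50_000, rng)
print(sum(xs) / len(xs))  # close to the true mean, 2/3
```

If M is set too small, the bound f(x) ≤ M q(x) fails somewhere, and the samples are silently biased. Setting M too large keeps the samples correct but wastes candidates.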

Select the target density, f(x). This is the density of the distribution you want to draw samples from. For example, to draw a random sample that is uniformly distributed between 1 and 10, you would use f(x) = 1/9 for 1 ≤ x ≤ 10 and f(x) = 0 elsewhere.

Select a proposal probability density function (p.d.f.), q(x), from which you can sample easily. The proposal p.d.f. must be greater than 0 wherever f(x) is greater than 0. Its tails must also not approach zero faster than those of f(x); otherwise, no constant M satisfies f(x) ≤ M q(x). On a bounded interval, a constant p.d.f. (e.g., q(x) = 1 on [0, 1]) is often the simplest choice.

A constant p.d.f. works well for many bounded targets. However, any p.d.f. will work as long as it meets these conditions, and the more closely M q(x) follows f(x), the fewer candidates are rejected. The process of rejection sampling can be illustrated with an example.
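Once f and q are chosen, you still need M. This Python sketch estimates M numerically as the largest ratio f(x)/q(x) on a grid. The Beta(2, 2) target and the function name are illustrative assumptions. A grid maximum can miss a very sharp peak and underestimate the true bound, which would bias the samples. Where possible, use an analytical bound, or multiply the grid estimate by a safety margin at the cost of a lower acceptance rate.

```python
from typing import Callable

def estimate_m(
    f: Callable[[float], float],
    q: Callable[[float], float],
    lo: float,
    hi: float,
    steps: int = 10_001,
) -> float:
    """Largest ratio f(x) / q(x) over an evenly spaced grid on [lo, hi]."""
    best = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        if q(x) > 0:
            best = max(best, f(x) / q(x))
    return best

# Assumed example: Beta(2, 2) target f(x) = 6x(1 - x) with a uniform proposal q(x) = 1 on [0, 1].
m = estimate_m(lambda x: 6 * x * (1 - x), lambda x: 1.0, 0.0, 1.0)
print(m)  # 1.5, reached at x = 0.5
```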


Suppose we wish to sample from a uniform distribution on the interval [0, 1], using a normal distribution with a mean of 0 and a standard deviation of 1 as the proposal. The example is contrived, since the acceptance test itself needs a uniform random number. It still shows the mechanics clearly: we accept or reject samples from the normal distribution until we have generated as many uniform random numbers as we need.

The procedure is as follows:

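Sample x from the normal distribution.

Compute the acceptance probability f(x) / (M φ(x)), where:

  • f is the density function of the target distribution; here, the uniform density, f(x) = 1 on [0, 1] and 0 elsewhere.
  • φ is the standard normal density.
  • M is the smallest constant with f(x) ≤ M φ(x).

On [0, 1], φ(x) is smallest at x = 1, so M = 1/φ(1) ≈ 4.13. The acceptance probability is φ(1)/φ(x) = exp((x² − 1)/2) for x in [0, 1], and 0 otherwise.

Generate a uniformly distributed random number u between 0 and 1. If u is less than the acceptance probability, accept x as a sample; otherwise, reject it. Repeat until the desired number of samples is obtained. Because 1/M ≈ 0.24, only about one normal draw in four is accepted.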
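This Python sketch implements the uniform-from-normal example, with `random.Random.gauss` standing in for "we only have a normal distribution". The function name is hypothetical. It also counts proposals, so the acceptance rate can be compared with 1/M = φ(1) ≈ 0.242.

```python
import math
import random

def uniform_via_normal(rng: random.Random) -> tuple[float, int]:
    """Return one U(0, 1) sample drawn by rejection from N(0, 1), plus the number of tries."""
    tries = 0
    while True:
        tries += 1
        x = rng.gauss(0.0, 1.0)              # proposal: standard normal
        if not 0.0 <= x <= 1.0:
            continue                         # target density is 0 here: always reject
        # Acceptance probability f(x) / (M phi(x)) = phi(1) / phi(x) = exp((x^2 - 1) / 2)
        if rng.random() < math.exp((x * x - 1.0) / 2.0):
            return x, tries

rng = random.Random(2024)
results = [uniform_via_normal(rng) for _ in range(50_000)]
us = [x for x, _ in results]
acceptance = len(results) / sum(t for _, t in results)
print(sum(us) / len(us), acceptance)  # about 0.5 and about 1/M ≈ 0.242
```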


Examples of Rejection Sampling

Example 1

Imagine that you want to generate samples from the distribution shown in the graph below. The distribution has a sharp peak inside the interval (0, 1) and falls quickly to zero outside this range.

This distribution is difficult to sample directly because of its narrow peak. To sample it using rejection sampling, we first need to choose an envelope distribution that is nonzero everywhere the target distribution is nonzero.

In this example, we will use a uniform distribution that spans the entire range of the target distribution (in this case, from 0 to 1). The envelope and target distributions are shown in the graph below.

Notice that the envelope distribution never drops to zero within the region where the target distribution is nonzero. Scale the uniform density by a constant M at least as large as the height of the target's peak. The envelope then lies above the target everywhere, so this pair of distributions satisfies the requirements for rejection sampling.
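The article does not give a formula for the target in this example, so this Python sketch assumes a stand-in. It uses an unnormalized Beta(30, 30) density, which has a sharp peak at 0.5 inside (0, 1) and drops to zero at both ends. The envelope is the uniform density on [0, 1], scaled by the kernel's maximum. The sketch also shows that the normalizing constant is never needed, because the acceptance test only uses the ratio of the kernel to its maximum. The function names are hypothetical.

```python
import random

def peaked_kernel(x: float) -> float:
    """Unnormalized Beta(30, 30) density: sharply peaked at 0.5, zero at both ends."""
    return (x * (1.0 - x)) ** 29

def sample_peaked(n: int, rng: random.Random) -> list[float]:
    m = peaked_kernel(0.5)             # kernel's maximum, so kernel(x) <= m * 1 on [0, 1]
    out: list[float] = []
    while len(out) < n:
        x = rng.random()               # uniform envelope on [0, 1]
        if rng.random() < peaked_kernel(x) / m:
            out.append(x)
    return out

samples = sample_peaked(20_000, random.Random(5))
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(round(mean, 3), round(var, 4))  # close to Beta(30, 30): mean 0.5, variance 1/244 ≈ 0.0041
```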

Example 2

As a loose everyday analogy, say you applied for three different jobs at the same company but were offered only one of them. You could say that two of your applications were rejected and the third was accepted.

Or say you ask your boss for a promotion and the answer is no, but you then learn about an opening at another company that could help you advance. You can think of the first option as rejected and the second as a new candidate to consider. The analogy only goes so far, though. In rejection sampling, each candidate is accepted or rejected at random, with a probability set by the ratio of the target density to the scaled proposal density. The main cost of the method is the computing time spent on rejected candidates.

Conclusion

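Researchers should note that rejection sampling works by taking samples from an “envelope” distribution. Each sample is accepted with a probability that depends on the ratio of the target density to the scaled envelope density. If a sample is rejected, another is drawn until one is accepted. On average, this takes M draws per accepted sample, where M is the scaling constant, so a tight-fitting envelope is the key to efficiency.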



  • Formplus Blog

Formplus
