A probability proportional to size (PPS) sampling procedure is a variation on multi-stage sampling in which the probability of selecting a primary sampling unit (PSU) is proportional to its size, and an equal number of elements is sampled within each PSU. If one PSU has twice as large a population as another, it is given twice the chance of being selected.

If the same number of persons is then selected from each of the selected PSUs, the overall probability of selection of any person will be the same. Exact PPS sampling of PSUs thus achieves complete control over sample size.
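
The equal-probability claim can be checked with a short calculation. As a sketch, assume (this notation is not in the original) that $a$ PSUs are selected, that PSU $i$ contains $M_i$ persons out of $M$ in the whole population, that $m$ persons are sampled in each selected PSU, and that $a M_i / M \le 1$ for every PSU:

$$
P(\text{a given person in PSU } i \text{ is selected}) = \frac{a\,M_i}{M} \times \frac{m}{M_i} = \frac{a\,m}{M}
$$

The size $M_i$ cancels, so under these assumptions the result is the same for every person, and the total sample size is fixed at $a \times m$.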

The PPS method of selection is useful when the PSUs vary greatly in size.

The inherent difference between simple random sampling and PPS sampling is that in the former, the probability of drawing any specified unit at any given draw is the same, whereas in the latter, the probability differs from draw to draw. As a result, the theory of PPS sampling is more complex than that of simple random sampling.

How does it work in practice? We will illustrate the method (called the cumulative total method) with an example.

## Example of PPS Sampling

A population consists of 10 villages with a total of 212 households. The second column of the accompanying table shows the number of households in each village. A sample of 6 villages is to be selected by the PPS method. To do this, the following steps are followed:

- Prepare a cumulative total column from the household counts in column 2. These totals appear in column 3.
- Make a column displaying the range implied by the cumulative totals.
- Read off the random numbers from the Appendix. These random numbers are 173, 95, 210, ..., 32. (Ignore all random numbers lying outside the range 001-212.)
- The villages whose ranges contain the selected random numbers are our sampled villages.
- Table 5.9 shows the selected villages under sampling with and without replacement.

The procedure ensures that, at each draw, the probability of selecting a village is proportional to its size (number of households). If the number of households is not known, some other auxiliary variable that is highly correlated with it (such as population size) could be used instead as a measure of size.
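
The lookup step of the cumulative total method can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the household counts come from the example, while the names `households`, `cumulative`, and `village_for` are hypothetical and chosen here for illustration only.

```python
from bisect import bisect_left
from itertools import accumulate

# Household counts for villages 1..10, taken from the example.
households = [35, 28, 20, 25, 30, 19, 10, 12, 18, 15]

# Cumulative totals: 35, 63, 83, ..., 212.
cumulative = list(accumulate(households))


def village_for(random_number):
    """Return the 1-based village whose range contains random_number.

    Hypothetical helper: village i covers the numbers from
    cumulative[i-2] + 1 up to cumulative[i-1].
    """
    if not 1 <= random_number <= cumulative[-1]:
        raise ValueError("random number lies outside the range 1..total")
    # Index of the first cumulative total that is >= random_number.
    return bisect_left(cumulative, random_number) + 1


print([village_for(r) for r in (173, 95, 210)])  # [8, 4, 10]
```

Because a village with more households covers a longer range of numbers, a uniformly chosen random number lands in it with probability proportional to its size.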

Table: Selection of PPS Sample

| Village | No. of households | Cumulative total | Range | Probability of selection |
|---|---|---|---|---|
| 1 | 35 | 35 | 001-035 | 35/212 |
| 2 | 28 | 63 | 036-063 | 28/212 |
| 3 | 20 | 83 | 064-083 | 20/212 |
| 4 | 25 | 108 | 084-108 | 25/212 |
| 5 | 30 | 138 | 109-138 | 30/212 |
| 6 | 19 | 157 | 139-157 | 19/212 |
| 7 | 10 | 167 | 158-167 | 10/212 |
| 8 | 12 | 179 | 168-179 | 12/212 |
| 9 | 18 | 197 | 180-197 | 18/212 |
| 10 | 15 | 212 | 198-212 | 15/212 |
| Total | 212 | – | – | 1.000 |

Table: Results of PPS Sampling

| Random number | 173 | 95 | 210 | 119 | 140 | 152 | 32 |
|---|---|---|---|---|---|---|---|
| Village number | 8 | 4 | 10 | 5 | 6 | 6 | 1 |
| Draw order under SWR* | 1 | 2 | 3 | 4 | 5 | 6 | – |
| Draw order under SWOR* | 1 | 2 | 3 | 4 | 5 | – | 6 |

\* SWR: sampling with replacement; SWOR: sampling without replacement.
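
The two rows of draw orders can be reproduced with a short Python sketch. It assumes, as the results table suggests, that sampling without replacement simply skips a random number that points to an already selected village and moves on to the next one. The function name `pps_select` and its parameters are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Household counts for villages 1..10, taken from the example.
households = [35, 28, 20, 25, 30, 19, 10, 12, 18, 15]
cumulative = list(accumulate(households))


def pps_select(random_numbers, n, replacement):
    """Select n villages by the cumulative total method (illustrative sketch)."""
    selected = []
    for r in random_numbers:
        if not 1 <= r <= cumulative[-1]:
            continue  # ignore numbers outside the range 001-212
        village = bisect_left(cumulative, r) + 1
        if not replacement and village in selected:
            continue  # assumed SWOR rule: skip a village already drawn
        selected.append(village)
        if len(selected) == n:
            break
    return selected


draws = [173, 95, 210, 119, 140, 152, 32]
print(pps_select(draws, 6, replacement=True))   # [8, 4, 10, 5, 6, 6]
print(pps_select(draws, 6, replacement=False))  # [8, 4, 10, 5, 6, 1]
```

With replacement, village 6 enters the sample twice and the seventh random number is never needed. Without replacement, the repeat is skipped and the number 32 supplies the sixth village.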

## PPS Systematic Sampling

You are already familiar with the concept of PPS sampling. This section illustrates how this method can be employed in systematic sampling too.

We illustrate this approach using the previous example, for sampling without replacement. To fit the problem into the context of linear systematic sampling, we select 4 villages, so that the total of 212 is exactly divisible by the sample size.

Refer to the first four columns of Table 5.7. Now to select 4 villages, follow the steps as detailed below:

- Divide the total number of households (here 212) by 4, the sample size. This gives the sampling interval *k* = 53.
- Choose a random number between 1 and 53 inclusive. Say this number is 20. It lies in the range 001-035, which identifies the village bearing serial number 1 as our first selection.
- Add *k* (= 53) to the number 20 chosen in step 2. This gives 20 + 53 = 73, which falls in the range 064-083. This leads us to select the village bearing serial number 3.
- To select the third unit, add 53 to 73, giving 126, which falls in the range 109-138. This leads us to select the village bearing serial number 5.
- Finally, add 53 to 126, resulting in a total of 179, which falls in the range 168-179. This selects village 8.
- This completes the sample selection procedure. We have selected the villages with serial numbers 1, 3, 5, and 8.
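
The steps above can be sketched in Python as follows. This is an illustrative sketch under the assumption that the total is exactly divisible by the sample size and that the random start lies between 1 and the sampling interval. The function name `pps_systematic_linear` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Household counts for villages 1..10, taken from the example.
households = [35, 28, 20, 25, 30, 19, 10, 12, 18, 15]
cumulative = list(accumulate(households))


def pps_systematic_linear(n, start):
    """Linear PPS systematic selection of n villages (illustrative sketch)."""
    total = cumulative[-1]
    if total % n != 0:
        raise ValueError("total must be exactly divisible by the sample size")
    k = total // n  # sampling interval, 212 / 4 = 53 in the example
    if not 1 <= start <= k:
        raise ValueError("random start must lie between 1 and k")
    # Selection numbers: start, start + k, start + 2k, ...
    points = [start + j * k for j in range(n)]
    # Map each selection number to the village whose range contains it.
    return [bisect_left(cumulative, p) + 1 for p in points]


print(pps_systematic_linear(4, 20))  # [1, 3, 5, 8]
```

The selection numbers generated for a start of 20 are 20, 73, 126, and 179, matching the worked steps.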

Had *n* been 6, *k* would have been 212/6 = 35.33, which is not an integer, leading the selection procedure to circular systematic sampling.

To accomplish the task under this procedure, we round the sampling interval up to the next higher integer, which is 36. As the method dictates, we choose our random number between 1 and 212 inclusive to ensure equal probability selection.

It is easy to verify that choosing any random number in the range 1-32 causes no problem in the selection of 6 villages. If the random number lies beyond that range, you will have to follow the circular systematic sampling strategy to obtain 6 villages.
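
The bound of 32 follows from requiring the sixth selection number to stay within the total of 212. Writing $r$ for the random start (a symbol introduced here for illustration):

$$
r + 5 \times 36 \le 212 \iff r \le 32
$$

For any larger start, the sixth selection number exceeds 212 and has to wrap around to the beginning of the list.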

Suppose your chosen random number is 40. This falls in the range 036-063, thus giving us village 2 as our first selection. Now add 36 to 40, which results in 76. This falls in the range 064-083, identifying village 3 as our second selection.

Continuing the process, the remaining 4 selected villages are those that bear the serial numbers 5, 6, 9, and 1. The accompanying table shows the random numbers chosen and the associated selected villages.

| Random number | Range | Selected village |
|---|---|---|
| 40 | 036-063 | 2 |
| 76 | 064-083 | 3 |
| 112 | 109-138 | 5 |
| 148 | 139-157 | 6 |
| 184 | 180-197 | 9 |
| 220* | 001-035 | 1 |

\* 220 - 212 = 8, which falls in the first range, identifying the first village.
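
The circular variant can be sketched in Python as below. The sketch assumes the interval is rounded up to the next integer and that a selection number larger than the total wraps around by subtracting the total, as in the footnote. The function name `pps_systematic_circular` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from math import ceil

# Household counts for villages 1..10, taken from the example.
households = [35, 28, 20, 25, 30, 19, 10, 12, 18, 15]
cumulative = list(accumulate(households))


def pps_systematic_circular(n, start):
    """Circular PPS systematic selection of n villages (illustrative sketch)."""
    total = cumulative[-1]
    k = ceil(total / n)  # 212 / 6 = 35.33, rounded up to 36
    if not 1 <= start <= total:
        raise ValueError("random start must lie between 1 and the total")
    villages = []
    for j in range(n):
        point = start + j * k
        # Wrap around past the total, e.g. 220 becomes 220 - 212 = 8.
        point = (point - 1) % total + 1
        villages.append(bisect_left(cumulative, point) + 1)
    return villages


print(pps_systematic_circular(6, 40))  # [2, 3, 5, 6, 9, 1]
```

This sketch does not guard against the same village being reached twice, which could happen in other populations if one village is larger than the sampling interval.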