Probability Proportional to Size (PPS) sampling is a variation on multi-stage sampling in which the probability of selecting a primary sampling unit (PSU) is proportional to its size, and an equal number of elements is sampled within each selected PSU.
If one PSU has twice as large a population as another, it is given twice the chance of being selected.
If the same number of persons is selected from each of the selected PSUs, any person’s overall probability of selection will be the same. Exact PPS sampling of PSUs thus achieves complete control over sample size.
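The equal-probability claim can be checked with a short calculation. The symbols are assumptions introduced here for illustration: $M_i$ is the size of PSU $i$, $M = \sum_i M_i$ is the population total, $n$ is the number of PSUs selected, and $m$ is the fixed number of persons taken from each selected PSU. The sketch also assumes that no PSU is so large that $n M_i / M$ exceeds 1.

```latex
\[
\Pr(\text{person } j \text{ in PSU } i \text{ is selected})
= \underbrace{\frac{n M_i}{M}}_{\text{PSU } i \text{ is selected}}
\times
\underbrace{\frac{m}{M_i}}_{\text{person is selected within PSU } i}
= \frac{n m}{M}.
\]
```

The PSU size $M_i$ cancels, so under these assumptions every person has the same overall chance of selection, and the total sample size is fixed at $nm$.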
The PPS selection method is useful when the PSUs vary greatly in size.
The inherent difference between simple random sampling and PPS sampling is that, in the former method, the probability of drawing any specified unit at any given draw is the same.
In the latter method, this probability differs from draw to draw. As a result, the theory of PPS sampling is more complex than that of simple random sampling.
How does it work in practice? We will illustrate the method (called the cumulative total method) with an example.
Example of PPS Sampling
A population consists of 10 villages with a total of 212 households. The second column of the accompanying table shows the number of households corresponding to each village. A sample of 6 villages is to be selected by the PPS method.
To do this, follow these steps:
- Prepare a cumulative total column with the households in column 2. These totals appear in column 3.
- Make a column displaying the range implied by the cumulated totals.
- Read off the random numbers from the Appendix. These random numbers are 173, 95, 210, ..., 32. (Ignore all random numbers lying outside the range 001-212.)
- The villages whose ranges contain the selected random numbers will be our sampled villages.
- Table 5.9 shows the selected villages under sampling with and without replacement.
The procedure has ensured that the probabilities of inclusion are proportional to the villages’ size (number of households) at each draw.
If the numbers of households are not known, some other auxiliary variable that is highly correlated with the number of households (such as population size) could be used instead as a measure of size.
Table: Selection of PPS Sample
Village | No. of households | Cumulative total | Range | Probability of selection |
1 | 35 | 35 | 001-035 | 35/212 |
2 | 28 | 63 | 036-063 | 28/212 |
3 | 20 | 83 | 064-083 | 20/212 |
4 | 25 | 108 | 084-108 | 25/212 |
5 | 30 | 138 | 109-138 | 30/212 |
6 | 19 | 157 | 139-157 | 19/212 |
7 | 10 | 167 | 158-167 | 10/212 |
8 | 12 | 179 | 168-179 | 12/212 |
9 | 18 | 197 | 180-197 | 18/212 |
10 | 15 | 212 | 198-212 | 15/212 |
Total | 212 | – | – | 1.000 |
Random number | 173 | 95 | 210 | 119 | 140 | 152 | 32 |
Village selected | 8 | 4 | 10 | 5 | 6 | 6 | 1 |
Order of selection, SWR | 1 | 2 | 3 | 4 | 5 | 6 | – |
Order of selection, SWOR | 1 | 2 | 3 | 4 | 5 | – | 6 |
SWR: sampling with replacement; SWOR: sampling without replacement.
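The cumulative total method above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names `village_for` and `pps_sample` are hypothetical, the household counts and random numbers are taken from the tables above, and sampling without replacement is assumed to mean that a repeated village is skipped and the next random number is used.

```python
from bisect import bisect_left
from itertools import accumulate

# Household counts for villages 1..10, taken from the table above.
households = [35, 28, 20, 25, 30, 19, 10, 12, 18, 15]
cumulative = list(accumulate(households))  # 35, 63, 83, ..., 212


def village_for(number):
    """Return the 1-based village whose range contains `number`."""
    if not 1 <= number <= cumulative[-1]:
        raise ValueError("random number outside the range 001-212")
    # Index of the first cumulative total that is >= number.
    return bisect_left(cumulative, number) + 1


def pps_sample(random_numbers, n, replace=True):
    """Collect n villages; without replacement, repeated villages are skipped."""
    chosen = []
    for r in random_numbers:
        v = village_for(r)
        if not replace and v in chosen:
            continue  # assumed rule: ignore a village that was already drawn
        chosen.append(v)
        if len(chosen) == n:
            break
    return chosen


draws = [173, 95, 210, 119, 140, 152, 32]  # random numbers from the table
swr = pps_sample(draws, 6, replace=True)
swor = pps_sample(draws, 6, replace=False)
print(swr)   # [8, 4, 10, 5, 6, 6]
print(swor)  # [8, 4, 10, 5, 6, 1]
```

Under sampling with replacement, village 6 appears twice and the seventh random number is not needed. Under sampling without replacement, the second draw of village 6 is discarded and village 1 becomes the sixth selection.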
PPS Systematic Sampling
You are already familiar with the concept of PPS sampling. This section illustrates how this method can be employed in systematic sampling too.
We illustrate this approach by the previous example for sampling without replacement. To fit the problem in the context of systematic linear sampling, we select 4 villages so that the total of 212 is divisible by the sample size.
Refer to the first four columns of Table 5.7. Now to select 4 villages, follow the steps as detailed below:
- Divide the total number of households (here 212) by 4, the sample size. This gives the sampling interval k=53.
- Choose a random number between 1 and 53 (the sampling interval k) inclusive. Say this number is 20. This is found to be located in the range 001-035. This identifies the village bearing serial number 1 as our first selection.
- Add k (=53) to the number 20 chosen in step 2. This results in 53+20=73, which falls in the range of 64-83. This leads us to select the village bearing serial number 3.
- To select the third unit, add 53 to 73, giving 126, which falls in the range 109-138. This leads us to select the village bearing serial number 5.
- Finally, add 53 to 126, resulting in a total of 179. This selects village 8.
- This completes the sample selection procedure. We have selected the villages with serial numbers 1, 3, 5, and 8.
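The linear systematic steps above can be reproduced with a short Python sketch. The function name `pps_systematic_linear` is hypothetical, and the code assumes that the random start lies between 1 and the sampling interval k and that the total is exactly divisible by the sample size.

```python
from bisect import bisect_left
from itertools import accumulate

# Household counts for villages 1..10, as in the example.
households = [35, 28, 20, 25, 30, 19, 10, 12, 18, 15]
cumulative = list(accumulate(households))


def pps_systematic_linear(cum_totals, n, start):
    """Select n units at the points start, start + k, start + 2k, ..."""
    total = cum_totals[-1]
    if total % n != 0:
        raise ValueError("total must be divisible by n for linear selection")
    k = total // n  # sampling interval
    if not 1 <= start <= k:
        raise ValueError("random start must lie between 1 and k")
    points = [start + j * k for j in range(n)]
    # Map each point to the village whose cumulative range contains it.
    villages = [bisect_left(cum_totals, p) + 1 for p in points]
    return points, villages


points, villages = pps_systematic_linear(cumulative, n=4, start=20)
print(points)    # [20, 73, 126, 179]
print(villages)  # [1, 3, 5, 8]
```

A village larger than the interval k could be hit more than once by this procedure. That does not occur here because the largest village has 35 households and k is 53.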
Had this been a case of n=6, k would have been 212/6 ≈ 35.33, which is not an integer, leading the selection procedure to circular systematic sampling.
To accomplish the task under this procedure, we round the sampling interval up to the next higher integer, 36. As the method dictates, we choose our random number between 1 and 212 inclusive to ensure equal probability selection.
It is easy to verify that choosing any random number in the range 1-32 will not cause any problem in selecting 6 villages. If you go beyond that, you must follow the circular systematic sampling strategy to ensure 6 villages.
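The range of random starts that need no wrap-around can be checked directly. This is a small sketch: a start r avoids wrapping when the last selection point, r + 5 × 36, does not exceed 212.

```python
total, n, k = 212, 6, 36  # households, sample size, rounded interval

# Starts for which the sixth point, r + (n - 1) * k, stays within the total.
no_wrap = [r for r in range(1, total + 1) if r + (n - 1) * k <= total]
print(no_wrap[0], no_wrap[-1])  # 1 32
```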
Suppose your chosen random number is 40. This falls in the range 36-63, thus giving us village 2 as our selection. Add now 36 to 40, which results in 76. This falls in the range 64-83, identifying village 3 as our second selection.
Continuing the process, the remaining 4 selected villages are those that bear serial numbers 5, 6, 9, and 1. The accompanying table shows the random number chosen and the associated selected villages.
Random Number | Range | Selected Villages |
---|---|---|
40 | 036-063 | 2 |
76 | 064-083 | 3 |
112 | 109-138 | 5 |
148 | 139-157 | 6 |
184 | 180-197 | 9 |
220 | 001-035 | 1 |
Note: 220-212=8, which falls in the first range (001-035), identifying the first village.
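The circular selection traced above can be sketched in Python as follows. The function name `pps_systematic_circular` is hypothetical, and the wrap-around rule is assumed to be the one used in the example: when a selection point exceeds the total of 212, subtract 212 and continue from the top of the list.

```python
from bisect import bisect_left
from itertools import accumulate

# Household counts for villages 1..10, as in the example.
households = [35, 28, 20, 25, 30, 19, 10, 12, 18, 15]
cumulative = list(accumulate(households))


def pps_systematic_circular(cum_totals, n, k, start):
    """Select n units at start, start + k, ..., wrapping past the total."""
    total = cum_totals[-1]
    if not 1 <= start <= total:
        raise ValueError("random start must lie between 1 and the total")
    rows = []
    for j in range(n):
        point = start + j * k
        wrapped = (point - 1) % total + 1  # e.g. 220 becomes 8
        village = bisect_left(cum_totals, wrapped) + 1
        rows.append((point, wrapped, village))
    return rows


rows = pps_systematic_circular(cumulative, n=6, k=36, start=40)
villages = [village for _, _, village in rows]
print(villages)  # [2, 3, 5, 6, 9, 1]
```

With a start of 40 and an interval of 36, the selection points are 40, 76, 112, 148, 184, and 220. The last point wraps to 8, which gives villages 2, 3, 5, 6, 9, and 1, in agreement with the selections listed in the text.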