# Data Collection, Sampling, Analysis

Data collection, sampling, and analysis are essential parts of any research process, legal or otherwise, and data collection is among its earliest stages.

## Collection of Data

Empirical or quantitative research requires data to be collected and analyzed. Quantitative research involves measurement, usually of variables, and quantitative methods attach particular importance to generalizing findings from a subset to the larger set from which that subset was selected.

**There are two main methods of data collection:** the census method and the sampling method. The census method is used when the whole area or population is surveyed; collecting information from every unit of a population is called a census. If the number of units under study is small, the census method is generally used to collect data.

However, in most cases the census method is impracticable for an exhaustive and intensive study, given the constraints of time and of human and financial resources.

The sampling method, on the other hand, is less expensive and less time-consuming. Data collected from a questionnaire survey of a sample then form the basis for analyzing issues and problems and for drawing practical and theoretical conclusions.

## Sampling Method

Most empirical research based on surveys or interviews uses the sampling method. The sampling method involves selecting a small group from a larger group of individuals to represent the whole. Sampling is the process of selecting a subset of people or social phenomena to be studied from the larger universe.

The main objective of sampling is to draw inferences about the larger group based on information obtained from the small group. The main way to achieve this is to select a representative sample. A sound representative sample should reflect all variables that exist in the population.

The term ‘population’ refers to all those who could be included in the survey. A variable is any characteristic on which people or groups differ.

A variable is a set of mutually exclusive attributes of a sample unit, such as sex, age, or employment status. The elements of a given population may be described in terms of their individual attributes on a given variable. Closely associated with the notion of a variable is the sampling frame, which is a list of all units in the population from which the sample will be selected.

The sampling method is less expensive and less time-consuming than the census method. A sample is also convenient to administer, because its smaller number of units is easy to manage. The sampling method is also useful for the intensive and elaborate study of selected units.

The main assumption behind the sampling technique is that, although socio-legal phenomena are complex, there is an underlying unity in their diversity, so it is possible to draw a representative sample. However, the choice of sample unit should be clear, unambiguous, and definite, and the sample must be large enough to be reliable.

In addition, for the results to be reliable, sample units should be chosen with due care, and the subject matter under survey should be homogeneous.

The main advantage of the sampling method is that it allows the characteristics of the population to be estimated in a much shorter time than would otherwise be possible. It is also less expensive, as fewer people need to be interviewed.

However, the sampling method also has disadvantages, such as the possibility of bias in selecting units, which can lead to false conclusions. Bias occurs when the researcher’s decisions about whom to sample are influenced too much by personal judgment, by prospective respondents’ availability, or by the researcher’s implicit criteria for inclusion. A biased sample does not represent the population from which it was selected.

Using sampling methods also requires knowledge of sampling and of how to select appropriate samples. Moreover, if the units being sampled are liable to change, it is difficult to maintain homogeneity.

**There are two main types of sampling:**

- Probability sampling.
- Non-probability sampling.

## 1. Probability Sampling

Probability sampling refers to a sample selected by a random process, so that each unit in the population has a known chance of being selected.

In other words, individual units are chosen from the whole group not deliberately but by a mechanical process. For this reason, probability sampling is also known as ‘random sampling.’

Probability sampling is particularly useful when researchers want precise statistical descriptions of large populations, for example, the percentage of the population that is unemployed or that plans to vote for candidate X.

Probability sampling is therefore used in large-scale surveys. It has the advantage of eliminating human bias in the selection of units, and its sampling error can be estimated and kept to a minimum.
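To illustrate how sampling error behaves under probability sampling, here is a minimal Python sketch. It assumes a hypothetical population of 100,000 people coded 1 (unemployed) or 0, draws simple random samples of increasing size, and reports the estimated proportion with the usual approximate standard error `sqrt(p_hat * (1 - p_hat) / n)`, ignoring the finite population correction. The function name `estimate_proportion` and all figures are illustrative, not taken from any real survey.

```python
import math
import random


def estimate_proportion(sample):
    """Estimate a population proportion from a list of 0/1 values.

    Returns (p_hat, standard_error), where the standard error uses the
    approximation sqrt(p_hat * (1 - p_hat) / n) and ignores the finite
    population correction.
    """
    n = len(sample)
    p_hat = sum(sample) / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, standard_error


# Hypothetical population: 100,000 people, 8% unemployed (1 = unemployed).
population = [1] * 8_000 + [0] * 92_000

rng = random.Random(42)  # fixed seed so the sketch is reproducible
for n in (100, 1_000, 10_000):
    sample = rng.sample(population, n)  # simple random sample, no replacement
    p_hat, se = estimate_proportion(sample)
    print(f"n = {n:>6}: estimate = {p_hat:.3f}, standard error = {se:.4f}")
```

Because the standard error shrinks with the square root of the sample size, quadrupling the sample roughly halves it.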

Probability sampling enhances the representativeness of the sample and allows findings to be generalized from the sample to the population.

**Three common types of probability sampling are:**

- Simple Random Sampling.
- Stratified Random Sampling.
- Systematic Sampling.

### a. Simple Random Sampling

This is the basic form of probability sampling. In a simple random sample, each unit of the population has an equal probability of being included. The key steps in devising a simple random sample are defining the population, deciding on the sample size, and choosing the mechanical selection process.

Generally, in this type of sampling, each unit in the population is assigned a number. A set of random numbers is then generated, and the units bearing those numbers are included in the sample. Because selection is left to chance, simple random sampling is free from the researcher’s selection bias and generally yields a representative sample.
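The numbering-and-random-numbers procedure described here can be sketched in Python as follows. This is an illustration under stated assumptions: the sampling frame is a hypothetical list of 500 case files, the function name is hypothetical, and Python's `random` module stands in for a table of random numbers.

```python
import random


def simple_random_sample(units, n, seed=None):
    """Draw a simple random sample of n units without replacement.

    Each unit's number is its position in the list (0 to N - 1); n distinct
    random numbers are drawn and the units with those numbers are kept.
    """
    rng = random.Random(seed)
    numbers = rng.sample(range(len(units)), n)  # n distinct random numbers
    return [units[i] for i in sorted(numbers)]


# Hypothetical sampling frame of 500 case files.
frame = [f"case-{i:03d}" for i in range(1, 501)]
print(simple_random_sample(frame, 10, seed=7))
```

Passing a seed makes the draw reproducible, which helps when the selection has to be documented or checked later.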

### b. Stratified Random Sampling

Simple random sampling may, by chance, include a higher proportion of one group of people than a truly representative sample would contain.

To avoid this problem, stratified random sampling is employed when the population from which a sample is drawn is not a homogeneous group.

Stratification thus means grouping the units that make up a population into homogeneous groups, or strata, before sampling. The population is divided according to some criterion, and units are then selected from each resulting stratum by simple random sampling.

In other words, under this method the population is divided into several subpopulations, each more homogeneous than the population as a whole, and a selection is made from each stratum to constitute a representative sample.

Stratified random sampling ensures that the resulting sample is distributed in the same way as the population in terms of the stratifying criterion. It can ensure greater representativeness if the stratification is based on objective criteria.
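A minimal sketch of stratified random sampling with proportional allocation, in which each stratum contributes to the sample in proportion to its share of the population. Proportional allocation is one common choice, assumed here for illustration; the frame of 600 urban and 400 rural respondents and the function name `stratified_random_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_random_sample(units, stratum_of, n, seed=None):
    """Stratified random sample with proportional allocation.

    units      -- the sampling frame (a list)
    stratum_of -- function returning the stratum of a unit
    n          -- desired total sample size
    Each stratum contributes round(n * stratum_size / N) units, chosen by
    simple random sampling within that stratum; because of rounding, the
    total can differ slightly from n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units into strata

    total = len(units)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)    # proportional share of the sample
        sample.extend(rng.sample(members, k))  # simple random sample within the stratum
    return sample


# Hypothetical frame: 600 urban and 400 rural respondents.
frame = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = stratified_random_sample(frame, lambda unit: unit[0], 50, seed=3)
urban = sum(1 for unit in sample if unit[0] == "urban")
print(urban, len(sample) - urban)  # 30 20
```

Whatever the seed, the sample keeps the population's 60:40 urban-rural split; only which individuals are chosen within each stratum is left to chance.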

### c. Systematic Sampling

In systematic sampling, the population is listed in an order that uniquely identifies each of its elements. The list is usually ordered randomly with respect to the trait to be measured, and in this sense systematic sampling is approximately equivalent to simple random sampling.

The difference is that the sample is selected by taking one element at each fixed sampling interval along the list.

Simple random sampling typically requires a list of elements, and when such a list is available, researchers often employ systematic sampling instead. For instance, if the list contains 10,000 elements and the researcher wants a sample of 1,000, the sampling interval is 10,000 / 1,000 = 10: the researcher selects a random starting point among the first ten elements and then every tenth element after it.
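The procedure in the example above can be sketched in Python as follows; as is usual for systematic sampling, the sketch picks a random starting point within the first interval. The function name is hypothetical, and the sketch assumes the list is already in a suitable (for example, random) order and that the sample size does not exceed the list length.

```python
import random


def systematic_sample(units, n, seed=None):
    """Systematic sample: a random start, then every k-th unit.

    The sampling interval is k = N // n (10,000 // 1,000 = 10 in the example);
    the start is chosen at random within the first interval. Assumes 1 <= n <= N.
    """
    k = len(units) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start in 0 .. k - 1
    return units[start::k][:n]                # every k-th unit, capped at n


frame = list(range(1, 10_001))  # hypothetical list of 10,000 numbered elements
sample = systematic_sample(frame, 1_000, seed=11)
print(len(sample), sample[:3])
```

Only one random number is needed (the start), which is why systematic sampling is convenient when a long list is at hand.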

## 2. Non-Probability Sampling

Non-probability sampling means a sample that has not been selected using a random selection method. In this method, units for the sample are selected deliberately by the researcher.

Thus, in non-probability sampling, the researcher purposely chooses population units with certain characteristics to constitute the sample, on the expectation that such units will represent the entire population.

The two main types of non-probability sampling are:

- Judgment or Purposive Sampling.
- Quota Sampling.

### a. Judgment or Purposive Sampling

In judgment or purposive sampling, the researcher selects the units for the sample on the basis of his own judgment. The essence of this method is that the researcher, presumably having sufficient knowledge of the population and its elements, uses his experience to select the sample that will be most useful or representative.

This technique is useful where the population is homogeneous and the researcher has full knowledge of the various aspects of the problem.

### b. Quota Sampling

This method combines judgment and probability procedures. Here the population is classified into several categories, such as gender, age, class, and locality, on the basis of judgment, assumption, or previous knowledge.

For instance, the researcher may need to know what proportion of the population is male and what proportion female, and what proportion of each gender falls into various age categories, educational levels, ethnic groups, and so on.

Quota sampling aims to produce a sample that reflects a population in terms of the relative proportions of people in different categories. Quota sampling is much quicker and cheaper than proper probability sampling.
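The two stages of quota sampling, setting quotas from known population proportions and then filling them, can be sketched in Python. The category shares below are hypothetical, the function names are illustrative, and the interviewer's choice of whom to approach is modeled, as an assumption, by accepting respondents in the order they are met. That non-random filling of each category is what separates quota sampling from stratified random sampling.

```python
def quota_targets(shares, n):
    """Convert population shares per category into interview quotas for n people."""
    return {category: round(share * n) for category, share in shares.items()}


def fill_quotas(respondents, category_of, quotas):
    """Accept respondents in the order they are met until each quota is full.

    Who is approached is not random (here, simply order of arrival), which is
    what makes quota sampling a non-probability method.
    Returns the accepted respondents and the places still unfilled per category.
    """
    remaining = dict(quotas)
    accepted = []
    for person in respondents:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
    return accepted, remaining


# Hypothetical population shares by gender and age group.
shares = {
    ("male", "under 40"): 0.25,
    ("male", "40 and over"): 0.23,
    ("female", "under 40"): 0.27,
    ("female", "40 and over"): 0.25,
}
quotas = quota_targets(shares, 200)
print(quotas)
```

Fieldwork would then call `fill_quotas` on the stream of people actually met, stopping once every remaining count reaches zero.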

## Analysis and Interpretation of Data

Analysis and interpretation of data form the next stage after data have been collected by empirical methods. “The dividing line between analysis of data and interpretation is difficult to draw as the two processes are symbiotic and merge imperceptibly. Interpretation is inextricably interwoven with analysis.” Analysis is a critical examination of the assembled data, and it leads to generalization.

A generalization draws a conclusion about a whole group or category of things from information about particular instances or examples. Interpretation refers to the analysis of generalizations and results; it is a search for the broader meaning of research findings. Data should be analyzed with reference to the purpose of the study.

Data should be analyzed in light of the hypotheses or research questions and organized to yield answers to those questions. The results of analysis can be presented descriptively as well as graphically, in the form of charts, diagrams, and tables.

Data analysis includes several processes: classification, coding, tabulation, statistical analysis, and inference about causal relations among variables. Proper analysis helps to classify and organize raw data and gives it a scientific shape. It also helps in studying the trends and changes that take place over a particular period.

**The following are the steps in processing data for interpretation:**

Firstly, data should be edited. Since not all the data collected is relevant to the study, irrelevant data should be separated from relevant data. Careful editing is essential to avoid errors that may distort the analysis and interpretation of data. However, data should be excluded objectively, free from bias and prejudice.

Secondly, data should be coded, that is, converted into numerical form and presented in a coding matrix. Coding reduces a large quantity of data to manageable proportions.
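As an illustration of coding, the sketch below converts raw questionnaire answers into a numeric coding matrix with one row per respondent and one column per variable. The codebook, variable names, and code values (including 9 for "don't know") are hypothetical conventions chosen for the example.

```python
# Hypothetical codebook: every answer category gets a numeric code.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "employment": {"employed": 1, "unemployed": 2, "student": 3},
    "used_legal_aid": {"yes": 1, "no": 2, "don't know": 9},
}


def code_responses(responses, codebook):
    """Turn raw answers into a coding matrix: one row per respondent,
    one column per variable, in codebook order."""
    variables = list(codebook)
    matrix = []
    for answers in responses:
        # Normalise spacing and case before looking up the numeric code.
        row = [codebook[var][answers[var].strip().lower()] for var in variables]
        matrix.append(row)
    return variables, matrix


raw = [
    {"gender": "Female", "employment": "student", "used_legal_aid": "no"},
    {"gender": "male", "employment": "Unemployed ", "used_legal_aid": "yes"},
]
columns, matrix = code_responses(raw, CODEBOOK)
print(columns)
for row in matrix:
    print(row)
```

An answer that is missing from the codebook raises a `KeyError`, which conveniently flags it for the editing step before analysis continues.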

Thirdly, all data should be arranged according to characteristics and attributes. The data should then be properly classified so that it becomes simple and clear for use.

Fourthly, data should be presented in tables or graphs. Any table, however, should be accompanied by comments explaining why the particular finding is important. Finally, the researcher should direct the reader to those components of the data that are especially striking from the point of view of the research questions.
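Once data are coded, a simple frequency table is often the first step in tabulation. This minimal sketch, using Python's standard `collections.Counter`, tabulates hypothetical coded answers to a single question as counts and percentages; the function name `frequency_table` is illustrative.

```python
from collections import Counter


def frequency_table(values):
    """Tabulate one coded variable: code -> (count, percentage of all answers)."""
    counts = Counter(values)
    total = len(values)
    return {code: (count, 100 * count / total) for code, count in sorted(counts.items())}


# Hypothetical coded answers to one question (1 = yes, 2 = no, 9 = don't know).
answers = [1, 2, 1, 1, 9, 2, 1, 2, 1, 1]
print("code\tcount\tpercent")
for code, (count, percent) in frequency_table(answers).items():
    print(f"{code}\t{count}\t{percent:.0f}%")
```

Such a table still needs the accompanying comment explaining why the distribution matters for the research question.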

**There are three key concepts regarding the analysis and interpretation of data:**

### Reliability

Reliability refers to consistency. If a method of collecting evidence is reliable, anybody else using the method, or the same person using it at another time, would come up with the same results. In other words, reliability is concerned with the extent to which an experiment can be repeated, or how far a given measurement will provide the same results on different occasions.
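One simple way to check this kind of consistency is to code the same material twice (by two people, or by one person on two occasions) and compute the percentage of cases given the same code. The sketch below assumes hypothetical codings of eight transcripts and a hypothetical function name; percentage agreement is only a basic measure, and chance-corrected statistics such as Cohen's kappa are often preferred.

```python
def percent_agreement(first, second):
    """Percentage of cases given the same code in two codings of the same material."""
    if len(first) != len(second) or not first:
        raise ValueError("both codings must cover the same, non-empty set of cases")
    matches = sum(a == b for a, b in zip(first, second))
    return 100 * matches / len(first)


# Hypothetical: the same 8 interview transcripts coded twice, a month apart.
first_coding = [1, 2, 2, 1, 3, 1, 2, 3]
second_coding = [1, 2, 1, 1, 3, 1, 2, 3]
print(percent_agreement(first_coding, second_coding))  # 87.5
```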

### Validity

Validity refers to whether the data collected give a true picture of what is being studied. It means that the data should be a product of what is being studied rather than of the research method used.

### Representativeness

This refers to whether the group of people or the situation being studied is typical of others.

To draw reliable and valid inferences from the data, the following conditions should be considered.

- Reliable inferences can only be drawn when the statistics are strictly comparable and the data are complete and consistent. Thus, to ensure comparability across different situations, the data should be homogeneous, complete and adequate, and appropriate.
- An ideal sample must adequately represent the whole population. Thus, when the number of units is very large, the researcher should choose a sample with the same set of qualities and features as the whole population.