Within the context of a research investigation, concepts are generally referred to as variables. A variable is, as the name implies, something that varies. Age, sex, export, income and expenses, family size, country of birth, capital expenditure, class grades, blood pressure readings, preoperative anxiety levels, eye color, and vehicle type are all examples of variables because each of these properties varies or differs from one individual to another.
Variable Definition in Research
A variable is any property, characteristic, number, or quantity that increases or decreases over time or can take on different values in different situations (as opposed to a constant, which does not vary).
When conducting research, experiments often manipulate variables. For example, an experimenter might compare the effectiveness of four types of fertilizers.
In this case, the variable is the ‘type of fertilizer.’ A social scientist may examine the possible effect of early marriage on divorce.
Here early marriage is the variable. A business researcher may find it useful to include the dividend in determining the share prices. Here dividend is the variable.
Effectiveness, divorce and share prices are also variables because they also vary as a result of manipulating fertilizers, early marriage, and dividends.
Types of Variable
- Qualitative Variables.
- Quantitative Variables.
- Discrete Variable.
- Continuous Variable.
- Dependent Variables.
- Independent Variables.
- Background Variable.
- Moderating Variable.
- Extraneous Variable.
- Intervening Variable.
- Suppressor Variable.
An important distinction is between qualitative variables and quantitative variables.
Qualitative variables are those that express a qualitative attribute such as hair color, religion, race, gender, social status, method of payment, and so on. The values of a qualitative variable do not imply a meaningful numerical ordering.
The values of the variable ‘religion’ (Muslim, Hindu, etc.) differ qualitatively; no ordering of religions is implied. Qualitative variables are sometimes referred to as categorical variables.
For example, the variable sex has two distinct categories: ‘male’ and ‘female.’ Since the values of this variable are expressed in categories, we refer to this as a categorical variable.
Similarly, place of residence may be categorized as being urban and rural and thus is a categorical variable.
Categorical variables may again be described as nominal and ordinal.
Ordinal variables are those which can be logically ordered or ranked higher or lower than one another but do not necessarily establish a numeric difference between categories, such as examination grades (A+, A, B+, etc.) or clothing sizes (extra large, large, medium, small).
Nominal variables are those that can neither be ranked nor logically ordered, such as religion, sex, etc.
A qualitative variable is a characteristic that is not capable of being measured numerically but can be categorized as possessing or not possessing some characteristic.
Quantitative variables, also called numeric variables, are those variables that are measured in terms of numbers. A simple example of a quantitative variable is a person’s age.
Age can take on different values because a person can be 20 years old, 35 years old, and so on. Likewise, family size is a quantitative variable, because a family might consist of one, two, or three members, and so on.
That is, each of these properties or characteristics referred to above varies or differs from one individual to another. Note that these variables are expressed in numbers, for which we call them quantitative or sometimes numeric variables.
A quantitative variable is one for which the resulting observations are numeric and thus possesses a natural ordering or ranking.
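The distinction above can be sketched in code. The following is a minimal illustration with made-up data (the variable names and the `classify_variable` helper are hypothetical, not part of the text): a variable whose observed values are all numeric is treated as quantitative; one whose values are category labels is qualitative.

```python
def classify_variable(values):
    """Return 'quantitative' if every observed value is numeric, else 'qualitative'."""
    # bool is excluded: True/False are category labels, not measurements
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "qualitative"

# illustrative observations only
observations = {
    "age":         [20, 35, 41, 28],                    # numeric -> quantitative
    "family_size": [1, 2, 3, 5],                        # numeric -> quantitative
    "religion":    ["Muslim", "Hindu", "Christian"],    # labels  -> qualitative
    "sex":         ["male", "female", "female"],        # labels  -> qualitative
}

for name, values in observations.items():
    print(f"{name}: {classify_variable(values)}")
```

Note that this sketch only checks how the values are recorded; deciding whether a numeric ordering is actually meaningful (ordinal vs. nominal) still requires judgment about the variable itself.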
Discrete and Continuous Variables
Quantitative variables are again of two types: discrete and continuous.
Variables such as the number of children in a household or the number of defective items in a box are discrete variables, since the possible values are distinct points on the scale.
For example, a household could have three or five children, but not 4.52 children.
Other variables, such as ‘time required to complete an MCQ test’ and ‘waiting time in a queue in front of a bank counter,’ are examples of a continuous variable.
The time required in the above examples is a continuous variable, which could be, for example, 1.65 minutes, or it could be 1.6584795214 minutes.
Of course, the practicalities of measurement preclude most measured variables from being continuous.
Definition 2.6: A discrete variable is restricted to certain values and usually (but not necessarily) consists of whole numbers, such as family size or the number of defective items in a box. Discrete values are often the result of enumeration or counting.
A few more examples are:
- The number of accidents in the last twelve months.
- The number of mobile cards sold in a store within seven days.
- The number of patients admitted to a hospital over a specified period.
- The number of new branches of a bank opened annually during 2001-2007.
- The number of weekly visits made by health personnel in the last 12 months.
A continuous variable is one that may take on an infinite number of intermediate values along a specified interval. Examples are:
- The sugar level in the human body;
- Blood pressure reading;
- Height or weight of the human body;
- Rate of bank interest;
- Internal rate of return (IRR);
- Earning ratio (ER);
- Current ratio (CR).
No matter how close two observations might be, if the instrument of measurement is precise enough, a third observation can be found, which will fall between the first two.
A continuous variable generally results from measurement and can assume countless values in the specified range.
Dependent and Independent Variables
In many research settings, there are two specific classes of variables that need to be distinguished from one another: the independent variable and the dependent variable.
Many research studies are aimed at unraveling and understanding the causes of underlying phenomena or problems, with the ultimate goal of establishing a causal relationship between them.
Look at the following statements:
- Low intake of food causes underweight.
- Smoking enhances the risk of lung cancer.
- Level of education influences job satisfaction.
- Advertisement helps in sales promotion.
- The drug causes the improvement of a health problem.
- Nursing intervention causes more rapid recovery.
- Previous job experiences determine the initial salary.
- Blueberries slow down aging.
- The dividend per share determines share prices.
In each of the above statements, we have two variables: one independent and one dependent. In the first example, ‘low intake of food’ is believed to have caused the ‘problem of underweight.’
It is thus the so-called independent variable. Underweight is the dependent variable because we believe that this ‘problem’ (the problem of underweight) has been caused by ‘the low intake of food’ (the factor).
Similarly, smoking, dividend, and advertisement all are independent variables, and lung cancer, job satisfaction, and sales are dependent variables.
In general, an independent variable is manipulated by the experimenter or researcher, and its effects on the dependent variable are measured.
The variable that is used to describe or measure the factor that is assumed to cause or at least to influence the problem or outcome is called an independent variable.
The definition implies that the experimenter uses the independent variable to describe or explain its influence or effect on the dependent variable.
Variability in the dependent variable is presumed to depend on variability in the independent variable.
Depending on the context, an independent variable is sometimes called a predictor variable, regressor, controlled variable, manipulated variable, explanatory variable, exposure variable (as used in reliability theory), risk factor (as used in medical statistics), feature (as used in machine learning and pattern recognition) or input variable.
The explanatory variable is preferred by some authors over the independent variable when the quantities treated as independent variables may not be statistically independent or independently manipulable by the researcher.
If the independent variable is referred to as an explanatory variable, then the term response variable is preferred by some authors for the dependent variable.
The variable that is used to describe or measure the problem or outcome under study is called a dependent variable.
In a causal relationship, the cause is the independent variable, and the effect is the dependent variable. If we hypothesize that smoking causes lung cancer, ‘smoking’ is the independent variable and cancer the dependent variable.
A business researcher may find it useful to include the dividend in determining the share prices. Here dividend is the independent variable, while the share price is the dependent variable.
The dependent variable usually is the variable the researcher is interested in understanding, explaining, or predicting.
In lung cancer research, it is the carcinoma that is of real interest to the researcher, not smoking behavior per se. The independent variable is the presumed cause of, antecedent to, or influence on the dependent variable.
Depending on the context, a dependent variable is sometimes called a response variable, regressand, predicted variable, measured variable, explained variable, experimental variable, responding variable, outcome variable, output variable, or label.
An explained variable is preferred by some authors over the dependent variable when the quantities treated as dependent variables may not be statistically dependent.
If the dependent variable is referred to as an explained variable, then the term predictor variable is preferred by some authors for the independent variable.
Levels of an Independent Variable
If an experimenter compares an experimental treatment with a control treatment, then the independent variable (a type of treatment) has two levels: experimental and control.
If an experiment were to compare five types of diets, then the independent variable (type of diet) would have five levels.
In general, the number of levels of an independent variable is the number of experimental conditions.
Background Variable
In almost every study, we collect information such as age, sex, educational attainment, socioeconomic status, marital status, religion, place of birth, and the like. These variables are referred to as background variables.
These variables are often related to many independent variables so that they influence the problem indirectly. Hence they are called background variables.
If the background variables are important to the study, they should be measured. However, we should try to keep the number of background variables as small as possible in the interest of economy.
Moderating Variable
In any statement of a relationship between variables, it is normally hypothesized that in some way the independent variable ’causes’ the dependent variable to occur. In simple relationships, all other variables are extraneous and are ignored. In actual study situations, such a simple one-to-one relationship needs to be revised to take other variables into account to better explain the relationship.
This emphasizes the need to consider a second independent variable that is expected to have a significant contributory or contingent effect on the originally stated dependent-independent relationship. Such a variable is termed a moderating variable.
Suppose you are studying the impact of field-based and classroom-based training on the work performance of health and family planning workers; here, you would consider the type of training as the independent variable.
If, instead, you are focusing on the relationship between the age of the trainees and work performance, you might use ‘type of training’ as a moderating variable.
Extraneous Variable
Most studies concern the identification of a single independent variable and the measurement of its effect on the dependent variable.
But still, several variables might conceivably affect our hypothesized independent-dependent variable relationship, thereby distorting the study. These variables are referred to as extraneous variables.
Extraneous variables are not necessarily part of the study. They exert a confounding effect on the dependent-independent relationship and thus need to be eliminated or controlled for.
An example may illustrate the concept of extraneous variables. Suppose we are interested in examining the relationship between the work-status of mothers and breastfeeding duration.
It is not unreasonable in this instance to presume that the level of education of mothers as it influences work-status might have an impact on breastfeeding duration too.
Education is treated here as an extraneous variable. In any attempt to eliminate or control the effect of this variable, we may consider this variable as a confounding variable.
An appropriate way of dealing with confounding variables is to follow the stratification procedure, which involves a separate analysis for each level of the confounding variable.
For this purpose, one can construct two cross-tables: one for illiterate mothers and the other for literate mothers. If we find a similar association between work status and duration of breastfeeding in both groups of mothers, then we conclude that the educational level of mothers is not a confounding variable.
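The stratification procedure can be sketched as follows. The records below are entirely hypothetical, invented for illustration: we split the data by the suspected confounder (mother's literacy) and compare the work-status / breastfeeding-duration association within each stratum.

```python
from collections import defaultdict

# (work_status, literacy, breastfeeding duration in months) -- illustrative data only
records = [
    ("working", "literate", 6), ("working", "literate", 7),
    ("not_working", "literate", 11), ("not_working", "literate", 12),
    ("working", "illiterate", 8), ("working", "illiterate", 7),
    ("not_working", "illiterate", 13), ("not_working", "illiterate", 12),
]

# group durations by literacy stratum, then by work status
strata = defaultdict(lambda: defaultdict(list))
for work, literacy, months in records:
    strata[literacy][work].append(months)

# within each stratum, compare mean duration across work-status groups
for literacy, groups in strata.items():
    means = {w: sum(v) / len(v) for w, v in groups.items()}
    gap = means["not_working"] - means["working"]
    print(f"{literacy}: non-working mothers breastfeed {gap:.1f} months longer")
```

In this toy data the gap is the same in both strata, which is the pattern that would lead us to conclude that literacy is not confounding the work-status/breastfeeding association.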
Intervening Variable
Often an apparent relationship between two variables is caused by a third variable.
For example, variables X and Y may be highly correlated, but only because X causes the third variable, Z, which in turn causes Y. In this case, Z is the intervening variable.
An intervening variable theoretically affects the observed phenomena but cannot be seen, measured, or manipulated directly; its effects can only be inferred from the effects of the independent and moderating variables on the observed phenomena.
In the work-status and breastfeeding relationship, we might view motivation or counseling as the intervening variable.
Thus, motivation, job satisfaction, responsibility, behavior, and justice are some examples of intervening variables.
Suppressor Variable
In many cases, we have good reason to believe that the variables of interest are related, but our data fail to establish any such relationship. Some hidden factor may be suppressing the true relationship between the two original variables.
Such a factor is referred to as a suppressor variable because it suppresses the actual relationship between the other two variables.
The suppressor variable suppresses the relationship by being positively correlated with one of the variables in the relationship and negatively correlated with the other. The true relationship between the two variables will reappear when the suppressor variable is controlled for.
Thus, for example, low age may pull education up but income down. In contrast, a high age may pull income up but education down, effectively canceling out the relationship between education and income unless age is controlled for.
Concept
A concept is a name given to a category that organizes observations and ideas by their possession of common features. As Bulmer succinctly puts it, concepts are categories for the organization of ideas and observations (Bulmer, 1984:43).
If a concept is to be employed in quantitative research, it will have to be measured. Once measured, concepts can take the form of independent or dependent variables.
In other words, concepts may explain a certain aspect of the social world (as explanatory variables), or they may stand for things we want to explain (as dependent variables).
Examples of concepts are social mobility, religious orthodoxy, social class, culture, lifestyle, academic achievement, and the like.
Indicator
An indicator is a measure that is employed to refer to a concept when no direct measure is available. We use indicators to tap concepts that are less directly quantifiable.
To understand what an indicator is, it is worth making a distinction between a measure and an indicator. A measure can be taken to refer to things that can be counted relatively unambiguously, such as income, age, number of children, etc.
Measures, in other words, are quantities. If we are interested in some of the causes of variation in income, the latter can be quantified in a reasonably direct way.
We use indicators to tap concepts that are less directly quantifiable. If we are interested in the causes of variation in job satisfaction, we will need indicators that will stand for the concept.
These indicators allow job satisfaction to be measured, and we can treat the resulting quantitative information as if it were a measure.
An indicator, then, is something that is devised or already exists, and that is employed as though it were a measure of a concept.
It is viewed as an indirect measure of a concept, like job satisfaction. An IQ test is a further example, in that it is a battery of indicators of the concept of intelligence.
Construct
A construct is an abstraction or concept that is deliberately invented or constructed by a researcher for a scientific purpose.
In a scientific theory, particularly within psychology, a hypothetical construct is an explanatory variable that is not directly observable.
For example, the concepts of intelligence and motivation are used to explain phenomena in psychology, but neither is directly observable.
A hypothetical construct differs from an intervening variable in that a construct has properties and implications that have not been demonstrated in empirical research; these serve as a guide for further research. An intervening variable, on the other hand, is a summary of observed empirical findings.
Cronbach and Meehl (1955) define a hypothetical construct as a concept for which there is not a single observable referent, which cannot be directly observed, and for which there exist multiple referents, but none all-inclusive.
For example, according to Cronbach and Meehl, a fish is not a hypothetical construct because, despite variation in species and varieties of fish, there is an agreed-upon definition for a fish with specific characteristics that distinguish a fish from a bird.
Furthermore, fish can be directly observed.
On the other hand, a hypothetical construct has no single referent; rather, hypothetical constructs consist of groups of functionally related behaviors, attitudes, processes, and experiences.
Instead of seeing intelligence, love, or fear, we see indicators or manifestations of what we have agreed to call intelligence, love, or fear.
Other examples of constructs:
- In biology: genes, evolution, illness, taxonomy, immunity
- In physics/astrophysics: black holes, the Big Bang, dark matter, string theory, molecular physics, atoms, gravity, the center of mass
- In psychology: intelligence, knowledge, emotions, personality, moods
Properties of Relationships between Variables
In dealing with relationships between variables in research, we observe a variety of dimensions in these relationships. We discuss a few of them below.
Positive and Negative Relationship
Two or more variables may have positive, negative, or no relationship at all. In the case of two variables, a positive relationship is one in which both variables vary in the same direction.
However, when they vary in opposite directions, they are said to have a negative relationship. When a change in one variable is not accompanied by a change in the other, we say that the variables in question are unrelated.
For example, if an increase in the wage rate accompanies an increase in job experience, the relationship between job experience and the wage rate is positive.
If an increase in the level of education of an individual decreases his desire for additional children, the relationship is negative or inverse. If the level of education does not have any bearing on that desire, we say that the variables ‘desire for additional children’ and ‘education’ are unrelated.
Strength of Relationship
Once it has been established that two variables are indeed related, we want to ascertain how strongly they are related.
A common statistic to measure the strength of a relationship is the so-called correlation coefficient, symbolized by r. It is a unit-free measure, lying between -1 and +1 inclusive, with zero signifying no linear relationship.
So far as the prediction of one variable from the knowledge of the other variable is concerned, a value of r= +1 means a 100% accuracy in predicting a positive relationship between the two variables and a value of r = -1 means a 100% accuracy in predicting a negative relationship between the two variables.
Symmetrical and Asymmetrical Relationship
So far, we have been discussing only symmetrical relationships, in which a change in either variable is accompanied by a change in the other. Such a relationship does not indicate which variable is the independent variable and which is the dependent variable.
In other words, you can label either of the variables as the independent variable.
Such a relationship is a symmetrical relationship. In an asymmetrical relationship, change in variable X (say) is accompanied by a change in variable Y, but not vice versa.
The amount of rainfall, for example, will increase productivity, but productivity will not affect the rainfall. This is an asymmetrical relationship.
Similarly, the relationship between smoking and lung cancer would be asymmetrical because smoking could cause cancer, but lung cancer could not cause smoking.
Causal Relationship
An indication of a relationship between two variables does not automatically ensure that changes in one variable cause changes in the other.
It is, however, very difficult to establish the existence of causality between variables. While no one can ever be certain that variable A causes variable B to occur, nevertheless, one can gather some evidence that increases our belief that A leads to B.
In an attempt to do so, we seek the following evidence:
- Is there a relationship between A and B? When such evidence exists, it is an indication of a possible causal link between the variables.
- Is the relationship asymmetrical, so that a change in A results in a change in B but not vice versa? In other words, does A occur before B? If we find that B occurs before A, we can have little confidence that A causes B.
- Does a change in A result in a change in B regardless of the actions of other factors? Or in other words, is it possible to eliminate other possible causes of B? Can one determine that C, D, and E (say) do not co-vary with B in a way that suggests possible causal connections?
Linear and Non-linear Relationship
A linear relationship is a straight-line relationship between two variables, where the variables vary at the same rate regardless of whether the values are low, high, or intermediate.
This is in contrast with non-linear (or curvilinear) relationships, where the rate at which one variable changes in value may be different for different values of the second variable.
Whether a variable is linearly related to another variable can be ascertained simply by plotting the Y values against the X values. If the plotted values appear to lie on a straight line, the existence of a linear relationship between X and Y is suggested.
Height and weight almost always have an approximately linear relationship, while age and fertility rates have a non-linear relationship.