In the previous note on the association of variables, we introduced the need to study association using different tools. This note starts with graphical tools for assessing the degree of association between variables.
A very useful graph for visualizing the relationship between two quantitative variables is a scatterplot. It provides first-hand visual information about the nature and degree of the relationship between two variables. The relationship may be linear or non-linear, or there may be no relationship at all.
Bivariate Scatterplot
A scatterplot shows the relationship between two quantitative variables measured for the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point on the graph.
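As a minimal sketch of the description above, the following Python snippet draws a scatterplot with matplotlib. The data (hours studied and exam scores for eight students), the axis labels, and the output file name are all hypothetical and chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: two quantitative variables measured on the same eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 72, 75, 80]

fig, ax = plt.subplots()
points = ax.scatter(hours, scores)   # each student becomes one point
ax.set_xlabel("Hours studied")       # one variable on the horizontal axis
ax.set_ylabel("Exam score")          # the other variable on the vertical axis
ax.set_title("Exam score vs. hours studied")
fig.savefig("scatterplot.png")       # hypothetical output file name
```

Which variable goes on which axis is a choice made here for illustration; the plot itself treats the two variables symmetrically.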
A scatterplot reveals the nature and trend of a possible relationship, mainly its direction, form, and strength. To analyze and understand the relationship between two variables, we need a basic understanding of the various directions, forms, and degrees of strength. We cover these in the following sections.
Direction of relationship
Direction is one of the key components in understanding the relationship between two variables.
- Two variables have a positive association when above-average values of one tend to accompany above-average values of the other, and when below-average values also tend to occur together.
- Two variables have a negative association when above-average values of one tend to accompany below-average values of the other.
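The two definitions above can be checked numerically. The sketch below multiplies each individual's deviation from the mean of x by its deviation from the mean of y and looks at the sign of the sum: pairs that are both above or both below average contribute positive products, and mixed pairs contribute negative ones. The function name `association_direction` and the sample data are hypothetical.

```python
def association_direction(x, y):
    """Classify direction from the sign of the summed products of deviations
    from the means (this sum is the numerator of the covariance)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"

# Hypothetical data for illustration only.
height = [150, 160, 170, 180, 190]
weight = [52, 58, 66, 75, 84]
speed = [30, 40, 50, 60, 70]
travel_time = [60, 45, 36, 30, 26]

print(association_direction(height, weight))      # above-average heights go with above-average weights
print(association_direction(speed, travel_time))  # above-average speeds go with below-average times
```

The sign says nothing about form or strength; those are discussed separately.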
Form of relationship
Another important component of a scatterplot is the form of the relationship between the two variables. The form can be linear, non-linear, or random.
If we look at the pattern on the left side of the above graph, we can observe that y increases as x increases, and vice versa. This is a linear relationship. In the graph on the right, on the other hand, we cannot find a single consistent pattern: y first increases, then decreases, then increases again, and so on. The points do not follow a straight line, so there is no clear linear relationship between x and y. This type of relationship is called a non-linear relationship.
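One rough way to tell the two forms apart numerically is to fit a straight line by least squares and ask how much of the variation in y it explains (the R-squared value). The sketch below does this for a straight-line pattern and for a pattern that rises and falls repeatedly. The helper `linear_r_squared` and both data sets are hypothetical, and a low value only says that a straight line fits poorly; it does not by itself identify the form.

```python
import math

def linear_r_squared(x, y):
    """Fraction of the variation in y explained by a least-squares straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

x = [0.5 * i for i in range(40)]          # 0.0, 0.5, ..., 19.5
y_linear = [2 * xi + 1 for xi in x]       # rises steadily with x
y_wave = [math.sin(xi) for xi in x]       # increases, decreases, increases, ...

print(round(linear_r_squared(x, y_linear), 3))  # close to 1: a line describes the data well
print(round(linear_r_squared(x, y_wave), 3))    # close to 0: a line describes the data poorly
```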
Strength and trend of linear relationship
The strength and trend of a linear relationship describe how closely, and in which direction, two variables are associated. The relationship can be strong positive, strong negative, moderate positive, or moderate negative.
Strong positive linear relationship
As we can see in fig-1 of the above diagram, the individual data points are concentrated in a narrow band around a straight line, and we can observe a trend: as x increases, y increases too. This shows a strong positive linear relationship.
Strong negative linear relationship
As we can see in fig-2 of the above diagram, the individual data points are concentrated in a narrow band around a straight line, but the relationship between x and y is inverse: when x increases, y decreases, and vice versa. This shows a strong negative linear relationship.
Moderate positive linear relationship
As we can see in fig-3 of the above diagram, the individual data points lie around a straight line in a band with a wider spread than in fig-1. The positive trend is still there: as x increases, y increases. This is why it is called a moderate positive linear relationship.
Moderate negative linear relationship
As we can see in fig-4 of the above diagram, the individual data points lie around a straight line in a band with a wider spread, and the relationship between x and y is inverse: when x increases, y decreases, and vice versa. This shows a moderate negative linear relationship.
No clear relationship
It is entirely possible that the individual data points do not show any relationship, and all the data points are spread randomly over the 2D plane.
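The five cases above can also be summarized with a single number. The sketch below uses the Pearson correlation coefficient r, which is an assumption of this example rather than something introduced in the note: values near +1 or -1 correspond to a narrow band around a line, values of smaller magnitude to a wider band, and values near 0 to no linear relationship. The data are synthetic, with a deterministic alternating "wiggle" standing in for noise so that the results are reproducible.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = list(range(20))
wiggle = [1 if i % 2 == 0 else -1 for i in x]  # deterministic stand-in for noise

cases = {
    "strong positive":   [xi + 0.5 * w for xi, w in zip(x, wiggle)],   # narrow band, rising
    "strong negative":   [-xi + 0.5 * w for xi, w in zip(x, wiggle)],  # narrow band, falling
    "moderate positive": [xi + 6 * w for xi, w in zip(x, wiggle)],     # wider band, rising
    "moderate negative": [-xi + 6 * w for xi, w in zip(x, wiggle)],    # wider band, falling
    "no relationship":   [6 * w for w in wiggle],                      # no trend in x at all
}

for name, y in cases.items():
    print(f"{name:18s} r = {pearson_r(x, y):+.2f}")
```

Where "strong" ends and "moderate" begins is a matter of convention, so the code reports r and leaves the labelling to the reader.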
Scatter plots with a smooth curve
The smooth curve is based on LOESS, a locally weighted scatterplot smoothing method. It uses local polynomial regression fitting: around each point, a low-degree polynomial is fitted to the nearby data, with closer points given more weight. More generally, the method fits a polynomial surface determined by one or more numerical predictors, using local fitting. A scatterplot shown together with a fitted curve provides information on the trend or relationship between the variables. Curve fitting can be done using various methods; one of them is Gaussian fitting by the least-squares method.
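To make the idea of local fitting concrete, here is a minimal, dependency-free sketch of a LOESS-style smoother. It is a simplified illustration, not the implementation used by any particular library: it fits a weighted straight line (degree 1) around each data point, uses tricube weights, and omits the robustness iterations that full implementations often add. The function names, the `frac` parameter (share of points in each local window), and the data are assumptions of this example.

```python
def loess_at(x0, xs, ys, frac):
    """Value at x0 of a weighted straight-line fit to the points nearest x0."""
    k = min(len(xs), max(2, int(round(frac * len(xs)))))    # points in the local window
    h = sorted(abs(xi - x0) for xi in xs)[k - 1] or 1e-12   # distance to the k-th nearest point
    # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
    weights = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in xs]
    sw = sum(weights)
    mx = sum(w * xi for w, xi in zip(weights, xs)) / sw
    my = sum(w * yi for w, yi in zip(weights, ys)) / sw
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, xs))
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def loess(xs, ys, frac=0.4):
    """Smoothed value at every observed x (evaluated at the data points only)."""
    return [loess_at(x0, xs, ys, frac) for x0 in xs]

# Synthetic data: a U-shaped trend plus a deterministic alternating wiggle.
xs = list(range(20))
ys = [(xi - 10) ** 2 + (5 if xi % 2 == 0 else -5) for xi in xs]
smoothed = loess(xs, ys, frac=0.4)

for xi, yi, si in zip(xs, ys, smoothed):
    print(xi, yi, round(si, 1))
```

A smaller `frac` follows the data more closely, and a larger one gives a smoother curve. Plotting `smoothed` against `xs` on top of the scatterplot gives the kind of smooth curve described above.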
In an earlier note, we saw the frequency density plot, in which kernel functions, which have nice statistical properties, are used to estimate a density curve for a single variable. Similarly, when we have more than one variable, we can define a joint probability density function, and the kernel functions can be defined in the multivariate setup. With two variables, the resulting estimates are called two-dimensional (2D) kernel density estimates.
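As a rough illustration of a 2D kernel density estimate, the sketch below places a Gaussian kernel on each observed (x, y) pair and averages them. The product Gaussian kernel, the hand-picked bandwidths `hx` and `hy`, the function name `kde2d`, and the data are all assumptions of this example; in practice the bandwidths are usually chosen by a data-driven rule.

```python
import math

def kde2d(x, y, xs, ys, hx, hy):
    """Estimated joint density at (x, y) using a product Gaussian kernel:
    the average over data points of K((x - xi) / hx) * K((y - yi) / hy) / (hx * hy),
    where K is the standard normal density."""
    total = 0.0
    for xi, yi in zip(xs, ys):
        u = (x - xi) / hx
        v = (y - yi) / hy
        total += math.exp(-0.5 * (u * u + v * v))
    return total / (len(xs) * 2 * math.pi * hx * hy)

# Hypothetical observations clustered around the point (1, 1).
xs = [0.8, 1.0, 1.2, 0.9, 1.1, 1.0]
ys = [1.1, 0.9, 1.0, 1.2, 0.8, 1.0]

near = kde2d(1.0, 1.0, xs, ys, hx=0.5, hy=0.5)  # inside the cluster
far = kde2d(4.0, 4.0, xs, ys, hx=0.5, hy=0.5)   # far from every observation
print(near, far)
```

Evaluating `kde2d` on a grid of (x, y) values gives a surface that can be drawn as contours or shading over the scatterplot.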