Analysis of Z-Scores for the mtcars Dataset

statistics analytics hypothesis testing

May 8, 2022

For this project, I use R to analyze the mtcars dataset. The dataset information was called first, after which a string_vector was used to select eight of the eleven variables for analysis. The sapply function then calculated the mean, standard deviation, and maximum value for each of those variables. The z-score of each maximum value was calculated separately using the formula (X − mean) / standard deviation (Sturdivant et al., 2016). A z-score expresses how many standard deviations a value lies from the mean of its variable, which helps analysts identify outliers, or unusually extreme values. The R output for these calculations and the z-score interpretation are included below.
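The steps described above can be sketched in R roughly as follows. This is a minimal illustration, not the original script: the eight column names placed in string_vector are an assumption, since the variables analyzed are not listed here, and the names summary_stats and z_max are hypothetical.

```r
# mtcars ships with base R (datasets package), so nothing needs downloading.
data(mtcars)

# ASSUMPTION: the eight variables are not listed in the text;
# this selection is for illustration only.
string_vector <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "carb")

# Mean, standard deviation, and maximum for each selected column.
# sapply returns a 3 x 8 matrix with rows "mean", "sd", and "max".
summary_stats <- sapply(mtcars[string_vector], function(x) {
  c(mean = mean(x), sd = sd(x), max = max(x))
})

# z-score of each maximum. The parentheses matter: without them,
# max - mean / sd would divide before subtracting.
z_max <- (summary_stats["max", ] - summary_stats["mean", ]) /
  summary_stats["sd", ]

print(round(summary_stats, 2))
print(round(z_max, 2))
```

Keeping the three statistics in one matrix means each z-score is computed from exactly the values that are printed, which makes the result easy to check by hand.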

Figure 1

Dataset with Calculated Mean, Standard Deviation, and Max Values

Figure 2

Z-Scores of Max Values in mtcars Dataset

Z-Score Interpretation

The mean, standard deviation, and maximum of each variable are needed to calculate the z-score of that variable's maximum value. The z-score can help the analyst identify outliers when comparing data from unimodal and symmetric distributions (Sturdivant et al., 2016). The z-scores calculated in R show that several of the maximum values have a z-score greater than 2, which marks them as unusual (Sturdivant et al., 2016). These unusual values occur in the mpg, hp, drat, wt, qsec, and carb variables: in each case the maximum lies more than 2 standard deviations above the mean of that variable.
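The check against the cutoff of 2 can be sketched in a few lines of R. This is an illustration only: it runs over all eleven columns of mtcars rather than the eight selected for the project, and the names z_of_max, threshold, and unusual are hypothetical.

```r
data(mtcars)

# z-score of the maximum value of a numeric vector.
z_of_max <- function(x) (max(x) - mean(x)) / sd(x)

# Apply it to every column of mtcars (all columns are numeric).
z_max <- sapply(mtcars, z_of_max)

# The cutoff of 2 follows the text; it is a rule of thumb, not a formal test.
threshold <- 2
unusual <- names(z_max)[z_max > threshold]

print(round(z_max, 2))
print(unusual)
```

Because the maximum can never fall below the mean, every z-score here is non-negative, so a one-sided comparison against the threshold is enough.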

Reference

Sturdivant, R., Pardoe, I., Berrier, I., & Watts, K. (2016). Statistics for Data Analytics. zyBook [online].