Rachael Herman

Regression Model Analysis and Age Prediction (Presentation)

Welcome to the regression model presentation.

For this project, the Telco_Extra_Global.xls dataset was used to evaluate the age and income of 1,000 individuals. Included in the presentation is a scatter plot with a linear regression model and a 95% confidence interval, used to evaluate the best fit for predictions (Sturdivant et al., 2016). An ANOVA test was performed in SAS to evaluate the coefficient of determination (R²) and the F statistic. Finally, an income prediction is performed for a 27-year-old individual.
Scatter plots can tell an analyst a lot about the relationship between quantitative variables. They display a set of predictor and response variables using the values of those variables (Sturdivant et al., 2016). For this project, income is the dependent, or response, variable being predicted; therefore, it is located on the Y axis. Age is the independent, or predictor, variable used to predict the response and is located on the X axis. Most of the points are clustered between 20 and 50 years of age and under $500,000 in annual income.
A linear regression model can be fitted to a scatter plot to help an analyst see whether there is a relationship between two variables. According to Sturdivant et al. (2016), it “is a linear function used to predict the value of one variable using the values of one or more variables” (sect. 2.5). Simply put, a linear regression model describes the relationship between the predictor and response variables, which here are age and income, respectively. In this model, shown with a 95% confidence interval, the regression line suggests a positive linear relationship between the two variables.
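To make the idea of fitting a line concrete, here is a minimal Python sketch of an ordinary least-squares fit. It is not the SAS code used for the project; the function name `fit_line` and the sample (age, income) values are hypothetical and chosen only for illustration.

```python
# Hypothetical (age, income) pairs for illustration only;
# these values are NOT taken from the Telco_Extra_Global data.
ages = [22, 25, 30, 35, 40, 45, 50]
incomes = [30.0, 36.0, 45.0, 52.0, 63.0, 70.0, 82.0]  # thousands of dollars


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_line(ages, incomes)
print(f"income = {b0:.2f} + {b1:.2f} * age")
```

A positive slope in the printed equation corresponds to a regression line that rises from left to right on the scatter plot.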
The ANOVA procedure gives us further insight into the linear regression model. The code shown uses PROC SGPLOT to draw the scatter plot, with the REG statement to fit a regression line to the plot. Following this procedure, PROC ANOVA is run using age as the classification variable and income = age as the model (SAS, n.d.). The results of this test are found on the following slide.
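As a rough illustration of what an ANOVA with a classification variable computes, the Python sketch below groups observations by age and forms the F statistic as the ratio of the between-group mean square to the within-group mean square. It is not the SAS code from the slide; the function name `one_way_anova_f` and the sample values are hypothetical.

```python
from collections import defaultdict


def one_way_anova_f(groups, values):
    """Return (F, R-squared) for a one-way ANOVA of values grouped by label.

    Assumes at least two groups and more observations than groups.
    """
    by_group = defaultdict(list)
    for label, value in zip(groups, values):
        by_group[label].append(value)

    n = len(values)
    k = len(by_group)
    grand_mean = sum(values) / n

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their group mean
    for vals in by_group.values():
        group_mean = sum(vals) / len(vals)
        ss_between += len(vals) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in vals)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared


# Hypothetical data: each age is treated as its own group.
sample_ages = [25, 25, 25, 40, 40, 40, 55, 55, 55]
sample_incomes = [32.0, 38.0, 35.0, 58.0, 61.0, 66.0, 70.0, 85.0, 79.0]
f_value, r2 = one_way_anova_f(sample_ages, sample_incomes)
print(f"F = {f_value:.2f}, R-squared = {r2:.4f}")
```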
In a linear regression model, the R² and F statistic values tell us how well the model fits the data (Sturdivant et al., 2016). R² expresses, as a percentage, how closely the data fit the regression line, and the F test tells us whether the fit is statistically significant (Minitab Blog Editor, 2013; Sturdivant et al., 2016). R² is measured on a scale of 0-100%, with 0% indicating that the model explains none of the variability in the response and 100% indicating that it explains all of it (Minitab Blog Editor, 2013; Sturdivant et al., 2016). In this scenario, R² is equal to 23.2984%, the F value is 4.84, and the p-value is less than 0.0001. The R² value indicates a weak linear relationship, since it is closer to 0% than to 100%. Because the p-value is less than 0.05, we reject the null hypothesis H0: β1 = 0 in favor of the alternative hypothesis Ha: β1 ≠ 0. It can be concluded that β1 is significantly different from 0, so a statistically significant linear relationship exists between income and age.
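For readers who want to see where these two statistics come from, the following Python sketch computes R² and the F statistic for a simple (one-predictor) linear regression from the usual sums of squares. The function name `r_squared_and_f` and the data are hypothetical, and the sketch does not compute a p-value, which would require the F distribution.

```python
def r_squared_and_f(x, y):
    """Return (R-squared, F) for the least-squares line fitted to x and y.

    Assumes more than two observations and a fit that is not perfect.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    fitted = [intercept + slope * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    ssr = sst - sse                                         # explained variation

    r_squared = ssr / sst
    # One model degree of freedom, n - 2 error degrees of freedom.
    f_stat = (ssr / 1) / (sse / (n - 2))
    return r_squared, f_stat


# Hypothetical data, not the project data.
demo_ages = [22, 28, 33, 41, 47, 52, 60]
demo_incomes = [31.0, 44.0, 40.0, 65.0, 58.0, 80.0, 77.0]
r2_demo, f_demo = r_squared_and_f(demo_ages, demo_incomes)
print(f"R-squared = {r2_demo:.2%}, F = {f_demo:.2f}")
```

A larger F value, relative to its degrees of freedom, corresponds to a smaller p-value for the test of H0: β1 = 0.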
Regression models and the ANOVA test can help an analyst make predictions. Using SAS, the average income for an individual who is 27 years of age is predicted. As can be seen in the image, the predicted average income for a 27-year-old is $91,000 per year. It is important to note that additional factors, such as education, location, and experience, likely play into a person’s income. However, it is worthwhile to note that there is a positive linear relationship between age and income, so income tends to go up as one ages.
A linear regression model uses predictor and response variables to determine relationships between variables. Using SAS, a scatter plot with a regression line provided a visualization showing a positive linear relationship between the two variables, age and income. Following this, the ANOVA test was performed to look at the R-squared, F statistic, and p-value. Given the results of this test, the null hypothesis is rejected in favor of the alternative hypothesis, which states that there is a linear relationship between age and income, indicating that income tends to go up as one ages.