### Instructions

You have 3 hours to complete the final exam. Grades are based on showing your R code and its output, explaining the output, using visuals and tables where applicable, and justifying the inputs you supply to each R function you use. Try to explain your answers in non-statistical language. For example, instead of saying that R^2 is 0.8, it is better to tell the reader that 80% of the variance in Y is explained by X. Other items to remember (I might be repeating myself, but they are important):

- You will have to decide and explain any observations (rows) you are taking out to answer each question. These might be nulls, zeros, outliers, etc.
- Use charts and graphs and explain what you are seeing. Focus on the insights.
- When you run a function and see an output, explain what the output is telling you. Also mention what input you supplied to the function and why.
- Include your R code in your submission.

Points will be deducted for not following these instructions.
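
The reporting style described above (state the inputs you chose, show the output, then translate it into plain language) can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, and the names `hours`, `bmi`, `fit`, `r2`, and `msg` are hypothetical and do not come from the exam data set.

```r
# Simulated data for illustration only; this is NOT the survey data.
set.seed(1)
hours <- runif(50, min = 0, max = 10)           # hypothetical weekly exercise hours
bmi   <- 30 - 0.8 * hours + rnorm(50, sd = 1.5) # hypothetical outcome

# Input choice: bmi is the outcome (Y) and hours is the predictor (X),
# because the question being asked is whether exercise helps explain BMI.
fit <- lm(bmi ~ hours)

# Pull R^2 out of the model summary instead of copying it by hand.
r2 <- summary(fit)$r.squared

# Translate the statistic into plain language for a non-statistical reader.
msg <- sprintf(
  "About %.0f%% of the variation in BMI is explained by weekly exercise hours.",
  100 * r2
)
cat(msg, "\n")
```

The same pattern applies to any function you run: name the inputs, say why you chose them, and restate the key number in everyday terms.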

### Data Dictionary

- __ID__ = Unique identifier of the survey respondent
- __Swim__ = Hours in a week the person swims
- __Run__ = Hours in a week the person runs
- __Bike__ = Hours in a week the person rides their bicycle
- __BMI__ = Body Mass Index. A measure that relates body weight to height. BMI is sometimes used to measure total body fat and whether a person is a healthy weight. Excess body fat is linked to an increased risk of some diseases, including heart disease and some cancers.
- __Exercise to Improve Health__ = Answer to the survey question "Do you exercise to improve your health?" Y = Yes, I do exercise to improve my health. N = No, I do not exercise to improve my health.

### Questions

- Show the mean, median, and IQR (interquartile range) for each column. Show the results both as a table and as a visual. Explain what you are seeing.
- What is the mean, variance, and standard deviation of each column? Explain what you are seeing.
- Predict the population mean and its range for each exercise type. Use 90% as the confidence level.
- If a person swims for 3 hours a week, how do they compare to this sample? If the same person runs for 1 hour a week, how do they compare to this sample? If the same person bikes 8 hours per week, how do they compare to this sample?
- Create a new column called Total Exercise Hours per week. Answer the same 3 questions above for this new column.
- Predict the population mean and the range of total exercise. Use 90% as the confidence level.
- Compute correlations between the columns. Show the correlations as a table and as a visual (heatmap). Explain what you are seeing. Remember to include every correlation you think is useful for seeing relationships in the survey responses.
- For the two questions in the bullets below, explain what the null and alternative hypotheses are, what the critical value is, and what the values of the test statistic and p-value are. Explain your findings in a non-statistical way.
  - Looking at the respondents who consider exercise important or not, is the probability of doing more than 4 hours a week the same regardless of the type of exercise?
  - Looking at the respondents who consider exercise important or not, is the probability of doing more than 5 hours a week the same for total exercise hours?
- Doctors think that people who are serious about their health exercise for 8 hours on average. You want to check whether the doctors' thinking is valid. Based on this survey, what is your conclusion? State the null and alternative hypotheses, list all the known sample and population parameters, and say which test you are going to use and why. Explain your outcome.
- You want to see if there is a difference in average exercise time between the types of exercise based on this survey data. What does the data show? State your null and alternative hypotheses. What test are you going to run? What is the critical value? What are the test statistic and p-value? Explain your findings in a non-statistical way if possible.
- You want to see if you can predict BMI based on how much a person exercises. What would you pick as your independent variable, and why? Explain how you arrived at your choice of independent variable. Explain what your model output is showing you in a non-statistical way. Perform residual analysis. What does the residual analysis indicate?
- Based on all twelve questions above, what is your understanding and takeaway from this survey data?