\documentclass[11pt]{article}
\usepackage{hyperref}
\usepackage{Sweave}
\begin{document}
\title{Assignment 2: Computer exercises}

These exercises are meant to show you computer techniques. Print simple output to confirm that each technique worked for you, and include short write-ups. Detailed descriptions are not needed, but a sentence here and there is nice. (Include your R script.) Here are some useful R commands: \href{homework_2_help.R}{.R}.

\begin{enumerate}
\item Dogleg construction and analysis:
  \begin{enumerate}
  \item Spend 20 minutes trying to find a new historical temperature series. (If you can't find one, use \href{nh-temp-series.txt}{New Hampshire}, which was discussed in class.) Fit it using a dogleg at 1900 and print it. (Model 1)
  \item Play around with \texttt{loess.smooth} to get a variety of different possible smooths of your series. The idea is to find one that looks like a straight line, one that looks like a ``connect the dots'' picture, and one that you actually like.
  \item Create doglegs for 1800 and 1700 also (and 1600 if you have it). Fit a multiple regression to these. Find a smooth that looks close to your fit. Print both predictions on one graph. (Model 2)
  \item Fit simple regressions to each of your doglegs and save the predictions. Compute an average prediction (pred1 + pred2 + pred3 + pred4)/4. Find a smooth that looks close to the average prediction. Print both predictions on the same graph. (Model 3)
  \item Fit a 3rd degree polynomial to your data. As usual, find a smooth that looks similar to your 3rd degree polynomial. Print both predictions. (Model 4)
  \end{enumerate}
\item Predictions: You can predict a point by adding a fake row to your data set (see the \href{homework_2_help.R}{samples}) or by a command similar to:
\begin{Schunk}
\begin{Sinput}
> predict(model, data.frame(year = 2100), se.fit = TRUE)
\end{Sinput}
\end{Schunk}
For each of the 4 models you created above, predict the temperature at 2100.
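
As a minimal sketch of the \texttt{predict} approach, assuming that a dogleg at 1900 is the hinge term $\max(\mathrm{year} - 1900, 0)$ and using a simulated series (the data frame \texttt{temps}, its columns, and every number below are made up for illustration and are not part of the assignment data), a prediction interval can be requested with \texttt{interval = "prediction"}:
\begin{verbatim}
# Simulated series: hypothetical data, for illustration only.
set.seed(1)
temps <- data.frame(year = 1850:2000)
temps$temp <- 7 + 0.01 * pmax(temps$year - 1900, 0) +
  rnorm(nrow(temps), sd = 0.5)

# Assumed dogleg at 1900: zero before 1900, years since 1900 afterwards.
# Writing it inside the formula lets predict() rebuild it from year.
model1 <- lm(temp ~ pmax(year - 1900, 0), data = temps)

# 95% prediction interval for a single new observation at year 2100.
pred <- predict(model1, data.frame(year = 2100),
                interval = "prediction", level = 0.95)
print(pred)  # a matrix with columns fit, lwr, upr
\end{verbatim}
The same call pattern should carry over to any model fitted with \texttt{lm}, provided every term in the formula can be recomputed from the columns supplied in the new data frame.
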
For the first, second, and fourth models, you should be able to create a prediction interval. Which of these 3 intervals makes the most dramatic claim about temperature?
\item Residuals: Create residuals from your favorite model of temperature above.
  \begin{enumerate}
  \item Do the following checks (use regression):
    \begin{itemize}
    \item residuals vs.\ time squared
    \item residuals squared vs.\ time
    \item residuals vs.\ previous residuals
    \item residuals squared vs.\ previous residuals squared
    \item residuals vs.\ any other columns you might have in your data table
    \item etc.
    \end{itemize}
  \item Using the $1/21^2, 1/22^2, 1/23^2, \ldots$ rule, which of the above tests (if any) show problems with the data? (See the class \href{class_residuals.tex.html}{notes} on residuals for a discussion of this methodology.)
  \item If you have a problem, do we know how to fix it yet?
  \end{enumerate}
\item Heteroskedasticity: Do a regression of income vs.\ ``number of runs'' (aka X3) from the \href{http://www4.stat.ncsu.edu/~boos/var.select/baseball.html}{baseball data}.
\begin{verbatim}
http://www4.stat.ncsu.edu/~boos/var.select/baseball.txt
\end{verbatim}
  \begin{enumerate}
  \item Simple regression:
    \begin{itemize}
    \item By eye, you can see the problem of heteroskedasticity. Make the appropriate plot and confirm that it is significant. Clearly, if it were the first test you were going to do, it would be significant. If it were the second one, it would also be significant. How many OTHER tests would you have to do before this test so that it would no longer be considered significant?
    \item For a player who has RUNS of 100, how many more dollars would he earn if he increased it to 101? Give a confidence interval based on your simple regression.
    \end{itemize}
  \item log-log: Do a model of log(income) vs.\ log(RUNS):
    \begin{itemize}
    \item Save the residuals. Check whether they are heteroskedastic.
    \item For a player who has RUNS of 100, how many more dollars would he earn if he increased it to 101?
Give a confidence interval. (HINT: this isn't easy; it will take a bit of calculation. Ask in class.)
    \end{itemize}
  \item Weighted least squares: Do a model of income/RUNS vs.\ 1/RUNS:
    \begin{itemize}
    \item Save the residuals. Check whether they are heteroskedastic.
    \item Use the \texttt{weights =} argument of R's \texttt{lm()} to run a weighted regression. You should get exactly the same coefficients and standard errors as you got in the income/RUNS model. If not, try it again using $1/\mathrm{RUNS}^2$ as your weighting instead of $1/\mathrm{RUNS}$.
    \item For a player who has RUNS of 100, how many more dollars would he earn if he increased it to 101? Give a confidence interval.
    \end{itemize}
  \item Bootstrap a simple regression.
    \begin{itemize}
    \item YEA! R DOES THIS WELL!!!!
    \item Use the bootstrap command to estimate the accuracy of the slope in the linear regression model for income vs.\ RUNS.
    \item For a player who has RUNS of 100, how many more dollars would he earn if he increased it to 101? Give a confidence interval based on your bootstrap standard deviation. (This is a very hard problem!)
    \end{itemize}
  \end{enumerate}
\end{enumerate}
\end{document}