\documentclass[11pt]{article}
\usepackage{hyperref}
\usepackage{Sweave}
\begin{document}
\title{Assignment 2: Computer exercises}
\maketitle

These exercises are meant to teach you some computing techniques.  For
each one, print enough output to confirm that it worked and add a short
write-up.  Detailed descriptions are not needed, but a sentence here and
there is nice.  Include your R script.  Some useful R commands are in
\href{homework_2_help.R}{homework\_2\_help.R}.

\begin{enumerate}
\item Dogleg construction and analysis:
\begin{enumerate}
\item Spend 20 minutes trying to find a new historical temperature
series.  (If you can't find one, use the \href{nh-temp-series.txt}{New
Hampshire} series discussed in class.)  Fit it using a dogleg at 1900
and print the result.  (Model 1)
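
A minimal sketch of Model 1 in R, assuming that a dogleg at 1900 means the hinge variable \verb|pmax(year - 1900, 0)| (zero up to 1900, years since 1900 afterwards) used as the single regressor.  The data are a synthetic stand-in, and the names \verb|temps|, \verb|year|, \verb|temp|, and \verb|d1900| are hypothetical; substitute your own series.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in for a temperature series (hypothetical names;
# replace with your own data).
set.seed(1)
year <- 1750:2005
temp <- 45 + 0.02 * pmax(year - 1900, 0) + rnorm(length(year), sd = 0.8)
temps <- data.frame(year = year, temp = temp)

# Assumed dogleg: 0 up to 1900, then the number of years since 1900.
temps$d1900 <- pmax(temps$year - 1900, 0)

# Model 1: simple regression on the dogleg.
model1 <- lm(temp ~ d1900, data = temps)
print(summary(model1))

# Plot the data with the fitted dogleg on top.
plot(temps$year, temps$temp, xlab = "Year", ylab = "Temperature")
lines(temps$year, fitted(model1), col = "red", lwd = 2)
\end{Sinput}
\end{Schunk}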
\item Play around with loess.smooth to get a variety of possible
smooths of your series.  The idea is to find one that looks like a
straight line, one that looks like a ``connect-the-dots'' picture, and
one that you actually like.
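
One way to explore smooths, sketched on synthetic data: the \verb|span| argument of \verb|loess.smooth| sets how much of the data each local fit uses, so a large span gives a nearly straight line and a small span comes close to connecting the dots (R's loess documentation allows spans above 1, in which case all points are used).  The three span values are only starting points to adjust by eye.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in series (replace with your own data).
set.seed(1)
year <- 1750:2005
temp <- 45 + 0.02 * pmax(year - 1900, 0) + rnorm(length(year), sd = 0.8)

plot(year, temp, xlab = "Year", ylab = "Temperature")

# Larger span = more points per local fit = smoother curve.
spans <- c(dots = 0.05, liked = 0.3, straight = 5)
cols <- c("blue", "darkgreen", "red")
smooths <- list()
for (i in seq_along(spans)) {
  s <- loess.smooth(year, temp, span = spans[[i]])
  smooths[[names(spans)[i]]] <- s
  lines(s, col = cols[i], lwd = 2)
}
legend("topleft", legend = names(spans), col = cols, lwd = 2)
\end{Sinput}
\end{Schunk}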
\item Create doglegs at 1800 and 1700 as well (and at 1600 if your
series goes back that far).  Fit a multiple regression using all of your
doglegs.  Find a smooth that looks close to your fit.  Print both
predictions on one graph.  (Model 2)
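
A sketch of Model 2 under the same assumptions (hinge variables, synthetic data, hypothetical names).  Add a \verb|d1600| column if your series reaches back that far; the smooth's span of 0.5 is an arbitrary starting value.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in series (replace with your own data).
set.seed(1)
year <- 1750:2005
temp <- 45 + 0.02 * pmax(year - 1900, 0) + rnorm(length(year), sd = 0.8)
temps <- data.frame(year = year, temp = temp)

# Hinge variables d1700, d1800, d1900 (add 1600 if your data reach back).
for (knot in c(1700, 1800, 1900)) {
  temps[[paste0("d", knot)]] <- pmax(temps$year - knot, 0)
}

# Model 2: multiple regression on all the doglegs.
model2 <- lm(temp ~ d1700 + d1800 + d1900, data = temps)
print(summary(model2))

# Overlay a loess smooth; adjust span until it resembles the fit.
sm <- loess.smooth(temps$year, temps$temp, span = 0.5)
plot(temps$year, temps$temp, xlab = "Year", ylab = "Temperature")
lines(temps$year, fitted(model2), col = "red", lwd = 2)
lines(sm, col = "blue", lwd = 2)
legend("topleft", legend = c("Model 2", "loess"),
       col = c("red", "blue"), lwd = 2)
\end{Sinput}
\end{Schunk}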
\item Fit a simple regression to each of your doglegs and save the
predictions.  Compute the average prediction, (pred1 + pred2 + pred3 +
pred4)/4 (divide by 3 if you have only three doglegs).  Find a smooth
that looks close to the average prediction.  Print both predictions on
the same graph.  (Model 3)
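
A sketch of Model 3, again with synthetic data and hypothetical names: each dogleg gets its own simple regression, and the fitted values are averaged across doglegs.  With the three knots used here, the average divides by 3.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in series (replace with your own data).
set.seed(1)
year <- 1750:2005
temp <- 45 + 0.02 * pmax(year - 1900, 0) + rnorm(length(year), sd = 0.8)
temps <- data.frame(year = year, temp = temp)
knots <- c(1700, 1800, 1900)   # add 1600 if available
for (knot in knots) temps[[paste0("d", knot)]] <- pmax(temps$year - knot, 0)

# One simple regression per dogleg; keep each set of fitted values.
fits <- lapply(paste0("d", knots), function(v)
  lm(reformulate(v, response = "temp"), data = temps))
preds <- sapply(fits, fitted)     # one column per dogleg
avg_pred <- rowMeans(preds)       # (pred1 + pred2 + pred3) / 3

# Overlay a smooth; tune span by eye until it resembles the average.
sm <- loess.smooth(temps$year, temps$temp, span = 0.5)
plot(temps$year, temps$temp, xlab = "Year", ylab = "Temperature")
lines(temps$year, avg_pred, col = "red", lwd = 2)
lines(sm, col = "blue", lwd = 2)
\end{Sinput}
\end{Schunk}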
\item Fit a 3rd degree polynomial to your data.  As usual, find a
smooth that looks similar to your 3rd degree polynomial.  Print both
predictions. (Model 4)
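
A sketch of Model 4 on synthetic data.  Using \verb|poly(year, 3)| (orthogonal polynomials) rather than raw powers of year is a choice made here to avoid numerical trouble with values on the order of $2000^3$; it spans the same cubic curves as \verb|year + I(year^2) + I(year^3)|.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in series (replace with your own data).
set.seed(1)
year <- 1750:2005
temp <- 45 + 0.02 * pmax(year - 1900, 0) + rnorm(length(year), sd = 0.8)
temps <- data.frame(year = year, temp = temp)

# Model 4: cubic in year via orthogonal polynomials.
model4 <- lm(temp ~ poly(year, 3), data = temps)
print(summary(model4))

# Overlay a smooth; tune span by eye until it resembles the cubic.
sm <- loess.smooth(temps$year, temps$temp, span = 0.75)
plot(temps$year, temps$temp, xlab = "Year", ylab = "Temperature")
lines(temps$year, fitted(model4), col = "red", lwd = 2)
lines(sm, col = "blue", lwd = 2)
\end{Sinput}
\end{Schunk}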
\end{enumerate}

\item Predictions: You can predict a point by adding a fake row to
your data set (see the \href{homework_2_help.R}{samples}) or with a command similar to:
\begin{Schunk}
\begin{Sinput}
> predict(model, data.frame(year = 2100), se.fit = TRUE)
\end{Sinput}
\end{Schunk}
For each of the four models you created above, predict the temperature in
2100.  For the first, second, and fourth models, you should be able to
create a prediction interval.  Which of these three intervals makes the
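
A sketch of the 2100 predictions on synthetic data, assuming the hinges are written inside each model formula so that \verb|predict()| can rebuild them from \verb|year| alone (with precomputed dogleg columns, the new data frame would need those columns too).  The argument \verb|interval = "prediction"| makes \verb|predict()| return the fit with lower and upper limits.  Model 3 is only an average of point predictions, so it has no built-in interval.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in series (replace with your own data).
set.seed(1)
year <- 1750:2005
temp <- 45 + 0.02 * pmax(year - 1900, 0) + rnorm(length(year), sd = 0.8)
temps <- data.frame(year = year, temp = temp)

# Hinges written inside the formulas, so predict() needs only `year`.
model1 <- lm(temp ~ pmax(year - 1900, 0), data = temps)
model2 <- lm(temp ~ pmax(year - 1700, 0) + pmax(year - 1800, 0) +
               pmax(year - 1900, 0), data = temps)
model4 <- lm(temp ~ poly(year, 3), data = temps)

new <- data.frame(year = 2100)
intervals <- t(sapply(list(model1 = model1, model2 = model2, model4 = model4),
                      function(m) predict(m, new, interval = "prediction")[1, ]))
print(intervals)   # columns: fit, lwr, upr

# Model 3: average of the single-dogleg point predictions.
singles <- lapply(c(1700, 1800, 1900), function(k)
  lm(temp ~ pmax(year - k, 0), data = temps))
pred3 <- mean(sapply(singles, predict, newdata = new))
print(pred3)
\end{Sinput}
\end{Schunk}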

\item Residuals: Compute the residuals from your favorite temperature
model above.
\begin{enumerate}
\item Do the following checks (use regression):
\begin{itemize}
\item residuals vs time squared
\item residuals squared vs time
\item residuals vs previous residuals
\item residuals squared vs previous residuals squared
\item residuals vs any other columns you might have in your data table
\item etc.
\end{itemize}
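
A sketch of these checks on synthetic data, assuming the residuals are in time order: each check is a simple regression, and the slope's p-value is what you compare with the cutoff rule in the next item.  The helper \verb|slope_p| is hypothetical, and Model 1 stands in for your favorite model.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in series and model (replace with your favorite model).
set.seed(1)
year <- 1750:2005
temp <- 45 + 0.02 * pmax(year - 1900, 0) + rnorm(length(year), sd = 0.8)
fit <- lm(temp ~ pmax(year - 1900, 0))
res <- resid(fit)          # assumes rows are sorted by year
n <- length(res)

# p-value for the slope in a simple regression of y on x.
slope_p <- function(y, x) summary(lm(y ~ x))$coefficients[2, 4]

pvals <- c(
  res_vs_time_sq    = slope_p(res, year^2),
  res_sq_vs_time    = slope_p(res^2, year),
  res_vs_prev       = slope_p(res[-1], res[-n]),
  res_sq_vs_prev_sq = slope_p(res[-1]^2, res[-n]^2)
)
print(round(pvals, 4))
\end{Sinput}
\end{Schunk}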
\item Using the $1/21^2, 1/22^2, 1/23^2$... rule, which of the
above tests (if any) show problems with the data?  (see the class
\href{class_residuals.tex.html}{notes} on residuals for discussion of
this methodology.)
\item If you have a problem, do we know how to fix it yet?
\end{enumerate}

\item Heteroskedasticity: Do a regression of income vs the number of
``Runs'' (aka X3) from the \href{http://www4.stat.ncsu.edu/~boos/var.select/baseball.html}{baseball data}.
\begin{verbatim}
http://www4.stat.ncsu.edu/~boos/var.select/baseball.txt
\end{verbatim}
\begin{enumerate}
\item Simple regression:
\begin{itemize}
\item By eye, you can see the problem of heteroskedasticity.  Make the
appropriate plot and confirm that it is significant.  Clearly, if this
were the first test you were going to do, it would be significant.  If
it were the second one, it would also be significant.  How many OTHER
tests would you have to do before this test so that it would no
longer be considered significant?
\item For a player who has RUNS of 100, how many more dollars would
he earn if he increased it to 101?  Give a confidence interval based
\end{itemize}
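
A sketch of the simple-regression steps.  The layout of the baseball file is not described here, so the data below are synthetic and the column names \verb|runs| and \verb|income| are hypothetical; read in the real file and rename accordingly.  Squared residuals are regressed on runs, in the same spirit as the residual checks above.  Because the model is a straight line, going from 100 to 101 runs changes predicted income by exactly the slope, so the slope's confidence interval answers the second question.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in for the baseball data (hypothetical column names).
set.seed(2)
n <- 300
runs <- round(runif(n, 5, 130))
income <- (100 + 20 * runs) * exp(rnorm(n, sd = 0.4))
baseball <- data.frame(runs = runs, income = income)

fit <- lm(income ~ runs, data = baseball)
res <- resid(fit)

# A fan-shaped residual plot suggests heteroskedasticity.
plot(baseball$runs, res, xlab = "Runs", ylab = "Residual")
abline(h = 0, lty = 2)

# Regress squared residuals on runs; a small p-value confirms the pattern.
het <- lm(I(res^2) ~ runs, data = baseball)
het_p <- summary(het)$coefficients["runs", "Pr(>|t|)"]
print(het_p)

# 100 -> 101 runs changes predicted income by the slope.
ci_slope <- confint(fit, "runs", level = 0.95)
print(coef(fit)[["runs"]])
print(ci_slope)
\end{Sinput}
\end{Schunk}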
\item log-log:  Do a model of log(income) vs log(RUNS):
\begin{itemize}
\item Save the residuals.  Check whether they are heteroskedastic.
\item For a player who has RUNS of 100, how many more dollars would
he earn if he increased it to 101?  Give a confidence interval. (HINT:
this isn't easy--it will take a bit of calculation.  Ask in class.)
\end{itemize}
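
A sketch of one way to do the log-log calculation, assuming the delta method is acceptable (the approach discussed in class may differ).  The fitted curve is $\widehat{\mathrm{income}} = e^{a}\,\mathrm{RUNS}^{b}$, so the gain is $g = e^{a}(101^{b} - 100^{b})$, with gradient $\partial g/\partial a = g$ and $\partial g/\partial b = e^{a}(101^{b}\log 101 - 100^{b}\log 100)$.  Back-transforming this way estimates median rather than mean income.  The data are synthetic and the names hypothetical.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in for the baseball data (hypothetical column names).
set.seed(2)
n <- 300
runs <- round(runif(n, 5, 130))
income <- (100 + 20 * runs) * exp(rnorm(n, sd = 0.4))
baseball <- data.frame(runs = runs, income = income)

fit_log <- lm(log(income) ~ log(runs), data = baseball)

# Heteroskedasticity check on the log scale.
res_log <- resid(fit_log)
print(summary(lm(I(res_log^2) ~ log(runs), data = baseball))$coefficients)

# Gain from 100 to 101 runs on the back-transformed (median) curve.
a <- coef(fit_log)[[1]]
b <- coef(fit_log)[[2]]
gain <- exp(a) * (101^b - 100^b)

# Delta method: gradient of gain with respect to (a, b), then its SE.
grad <- c(gain, exp(a) * (101^b * log(101) - 100^b * log(100)))
se_gain <- sqrt(drop(t(grad) %*% vcov(fit_log) %*% grad))
ci_gain <- gain + c(-1.96, 1.96) * se_gain
print(c(estimate = gain, lower = ci_gain[1], upper = ci_gain[2]))
\end{Sinput}
\end{Schunk}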
\item Weighted least squares:  Do a model of income/RUNS vs 1/RUNS:
\begin{itemize}
\item Save the residuals.  Check whether they are heteroskedastic.
\item Use the \verb|weights =| argument of lm() in R to run a
weighted regression of income on RUNS.  You should get exactly the same
coefficients and standard errors as you got in the income/RUNS model
(with the roles of the intercept and slope swapped).  If not, try again
using $1/\mathrm{RUNS}^2$ as your weight instead of $1/\mathrm{RUNS}$.
\item For a player who has RUNS of 100, how many more dollars would
he earn if he increased it to 101?  Give a confidence interval.
\end{itemize}
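
A sketch of the weighted-least-squares equivalence on synthetic data (names hypothetical).  Dividing $\mathrm{income} = a + b\,\mathrm{RUNS} + \epsilon$ through by RUNS gives $\mathrm{income}/\mathrm{RUNS} = b + a/\mathrm{RUNS} + \epsilon/\mathrm{RUNS}$, so ordinary least squares on the ratio model minimizes $\sum (\mathrm{income} - a - b\,\mathrm{RUNS})^2/\mathrm{RUNS}^2$.  That is weighted least squares with weights $1/\mathrm{RUNS}^2$, and the two fits report the same numbers with intercept and slope swapped.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in for the baseball data (hypothetical column names).
set.seed(2)
n <- 300
runs <- round(runif(n, 5, 130))
income <- (100 + 20 * runs) * exp(rnorm(n, sd = 0.4))
baseball <- data.frame(runs = runs, income = income)

# Transformed model: income/runs = b + a * (1/runs) + error/runs.
fit_ratio <- lm(I(income / runs) ~ I(1 / runs), data = baseball)

# The same fit as weighted least squares on the original scale.
fit_wls <- lm(income ~ runs, data = baseball, weights = 1 / runs^2)

print(summary(fit_ratio)$coefficients)
print(summary(fit_wls)$coefficients)

# The 100 -> 101 gain is the slope on runs (the ratio model's intercept).
print(confint(fit_wls, "runs"))
\end{Sinput}
\end{Schunk}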
\item Bootstrap a simple regression.
\begin{itemize}
\item YEA!   R DOES THIS WELL!!!!
\item Use the bootstrap command to estimate the accuracy of the slope
on the linear regression model for income vs runs.
\item For a player who has RUNS of 100, how many more dollars would
he earn if he increased it to 101?  Give a confidence interval based
on your bootstrap standard deviation.  (This is a very hard problem!)
\end{itemize}
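
A sketch of a case (pairs) bootstrap in base R, on synthetic data with hypothetical names: players are resampled with replacement, the regression is refit, and the standard deviation of the refitted slopes estimates the slope's accuracy.  The \verb|boot()| function in the \verb|boot| package is a packaged alternative; which bootstrap command the class has in mind is left open here.  The interval uses $\pm 2$ bootstrap SDs as a rough 95\% rule.
\begin{Schunk}
\begin{Sinput}
# Synthetic stand-in for the baseball data (hypothetical column names).
set.seed(2)
n <- 300
runs <- round(runif(n, 5, 130))
income <- (100 + 20 * runs) * exp(rnorm(n, sd = 0.4))
baseball <- data.frame(runs = runs, income = income)

fit <- lm(income ~ runs, data = baseball)

# Case bootstrap: resample players with replacement and refit the slope.
B <- 1000
boot_slopes <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)
  coef(lm(income ~ runs, data = baseball[idx, ]))[["runs"]]
})
boot_sd <- sd(boot_slopes)

# Compare with the usual OLS standard error (assumes constant variance).
ols_se <- summary(fit)$coefficients["runs", "Std. Error"]
print(c(boot_sd = boot_sd, ols_se = ols_se))

# 100 -> 101 runs changes predicted income by the slope; +/- 2 boot SDs.
slope <- coef(fit)[["runs"]]
print(slope + c(-2, 2) * boot_sd)
\end{Sinput}
\end{Schunk}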
\end{enumerate}
\end{enumerate}
\end{document}