
Homework assignment 3

Due: Tuesday March 27

In this homework, you'll work with the basic statistical methods covered in Chapter 4 and the regression modeling techniques discussed in Chapter 6 of Baayen's book.

Please start by downloading the file lastname_firstname_hw3.R and renaming it with your own last and first names (e.g., ''wittgenstein_ludwig_hw3.R''). Edit this file: put in the R code you use to complete the assignment, and write comments as requested in the problems. When you are done, please submit this file on Blackboard.

A perfect solution to this homework will be worth 100 points.


Part 1

Problem 1.1 (25 pts). Distributional sleuthing

Given below are six sets of 100 numbers, labeled ''a'' through ''f'', that have been generated in various ways from underlying distributions, for example like this: foo = rnorm(100, 42, 3.1415).

a = c(47, 47, 47, 48, 49, 52, 53, 53, 53, 54, 55, 55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59, 60, 60, 61, 61, 61, 61, 62, 62, 62, 62, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 66, 66, 67, 67, 67, 67, 67, 67, 68, 69, 69, 69, 70, 70, 70, 70, 71, 71, 71, 73, 73, 74, 74, 74, 76, 76, 76, 76, 76, 76, 77, 77, 77, 78, 79, 79, 79, 79, 79, 79, 79, 80, 81, 82, 83, 84, 84, 85, 86, 90)
b = c(40, 40, 41, 41, 43, 44, 44, 45, 45, 45, 45, 45, 46, 46, 46, 46, 47, 48, 48, 48, 49, 49, 49, 49, 50, 50, 51, 51, 51, 52, 53, 53, 54, 54, 55, 55, 55, 55, 56, 56, 56, 57, 57, 58, 58, 58, 58, 58, 58, 58, 59, 60, 60, 61, 62, 63, 63, 64, 64, 65, 66, 66, 66, 68, 69, 69, 70, 71, 71, 71, 72, 73, 74, 75, 75, 76, 77, 77, 78, 78, 78, 79, 79, 80, 82, 82, 82, 83, 83, 83, 84, 85, 85, 87, 87, 87, 88, 88, 88, 89)
c = c(43, 43, 43, 43, 43, 43, 43, 43, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 52, 52, 52, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 58, 59, 59, 59, 61, 61, 61, 61, 62, 62, 62, 62, 62, 63, 64, 65, 66, 67, 72, 72, 73, 73, 80, 80)
d = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 14, 15, 15, 16, 17, 19, 24, 24, 25, 27, 28, 31, 33, 34, 52, 55, 64)
e = c(56, 56, 58, 58, 59, 59, 59, 60, 60, 61, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 63, 63, 63, 64, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 66, 66, 66, 66, 66, 66, 66, 67, 67, 67, 67, 67, 67, 68, 68, 68, 68, 68, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 73, 73, 73, 73, 73, 74, 74, 74, 75, 75, 75, 76, 76, 76, 77, 78, 79)
f = c(15, 15, 19, 23, 26, 26, 28, 28, 30, 30, 31, 31, 31, 32, 34, 35, 36, 37, 38, 38, 39, 40, 41, 41, 42, 42, 42, 42, 43, 44, 44, 44, 44, 44, 45, 45, 46, 47, 47, 48, 50, 51, 54, 54, 55, 57, 58, 63, 68, 71, 75, 75, 76, 77, 78, 79, 79, 80, 81, 81, 81, 81, 82, 82, 82, 83, 83, 83, 83, 83, 84, 84, 85, 85, 85, 86, 87, 87, 87, 87, 88, 88, 88, 88, 89, 89, 90, 90, 90, 90, 90, 90, 91, 91, 92, 92, 92, 93, 95, 97)


You can cut-and-paste the above R code directly into R.

Your job is to determine, for each set, whether it was generated from a normal distribution, from a uniform distribution, or from neither.

(a) [5 pts] Graphically inspect the sets by plotting their densities and histograms to form initial hypotheses about their distributions before applying the tests requested below. (It is recommended that you set your plotting area to show all six graphs at the same time so you can compare more easily.) Write down your hypotheses, together with your motivation for suggesting them.
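One way to lay out the plots is sketched below. It uses simulated stand-in vectors (the names norm.demo, unif.demo, and skew.demo are hypothetical, not the assignment data) and draws each histogram on the density scale with a kernel density estimate overlaid, so each set takes one panel of a 2 x 3 grid. Substitute a through f to fill all six panels.

```r
# Simulated stand-ins (hypothetical); replace with the sets a-f from above.
set.seed(1)
sets <- list(norm.demo = rnorm(100, 50, 10),
             unif.demo = runif(100, 40, 90),
             skew.demo = rexp(100, 1 / 10))

# 2 x 3 grid: room for six panels, one per set.
op <- par(mfrow = c(2, 3))
for (nm in names(sets)) {
  x <- sets[[nm]]
  # freq = FALSE puts the histogram on the density scale,
  # so the kernel density curve can be drawn on top of it.
  hist(x, freq = FALSE, main = nm, xlab = nm)
  lines(density(x))
}
par(op)  # restore the previous plotting layout
```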

(b) [10 pts] Make quantile-quantile plots (see Baayen section 4.1.1 for a refresher) against the normal and uniform distributions (you can use the min and max of the values in the set as the lower and upper bounds of the uniform distribution). The uniform distribution has the same four functions as other distributions: dunif(), punif(), qunif(), runif(). Write down your refined hypotheses based on these plots.
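A sketch of both kinds of Q-Q plot for a single vector, using a simulated stand-in rather than one of the sets a through f. For the uniform case it assumes theoretical quantiles from qunif() at the plotting positions given by ppoints(), with min(x) and max(x) as the bounds as suggested above; points close to the reference line suggest a good fit.

```r
set.seed(2)
x <- runif(100, 40, 90)  # simulated stand-in; replace with one of a-f

op <- par(mfrow = c(1, 2))

# Normal Q-Q plot with a reference line through the quartiles.
qqnorm(x, main = "Normal Q-Q")
qqline(x)

# Uniform Q-Q plot: theoretical uniform quantiles on [min(x), max(x)]
# at the plotting positions ppoints(n), against the sorted sample.
unif.q <- qunif(ppoints(length(x)), min = min(x), max = max(x))
qqplot(unif.q, x, main = "Uniform Q-Q",
       xlab = "Theoretical uniform quantiles", ylab = "Sample quantiles")
abline(0, 1)  # identity line: perfect agreement would fall on it

par(op)
```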

(c) [10 pts] Based on your hypotheses from (a) and (b), apply the relevant tests for distributions discussed in section 4.1.1 of Baayen's book to confirm those hypotheses. If the outcomes differ from your predictions in (a), explain why you think this is.
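As an illustration, base R's stats package provides shapiro.test() (null hypothesis: the data are normal) and ks.test(), which can compare a sample against a named cumulative distribution function such as "punif". Check these against the tests Baayen presents in section 4.1.1. The vector below is simulated. Two caveats: the sets a through f contain many ties, for which ks.test() warns that its p-value is approximate, and taking the uniform bounds from min(x) and max(x) means the parameters are estimated from the same data, so the Kolmogorov-Smirnov p-value is only approximate too.

```r
set.seed(3)
x <- rnorm(100, 50, 10)  # simulated stand-in; replace with one of a-f

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(x)

# Kolmogorov-Smirnov test against a uniform distribution on [min(x), max(x)];
# the extra arguments are passed on to punif() as its min and max.
ks <- ks.test(x, "punif", min(x), max(x))

sw$p.value
ks$p.value
```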


Problem 1.2 (30 pts). Measuring heights

The heights of a group of men in Austin were measured in 1999, and the mean of these measurements was 70.1 inches. Today you measure the heights of 100 men in Austin and get the following measurements:

heights.2009 = c(75.1, 70.7, 71.3, 70.6, 70.7, 71.4, 72.7, 70.5, 68.7, 68.4, 68.8, 74.7, 65.5, 70.5, 70.6, 71.7, 74, 72.7, 68.6, 66.2, 66, 68.8, 69.8, 72.5, 71, 71.1, 62.4, 69.4, 69.5, 68.6, 74.9, 68.2, 75.6, 74.8, 69.3, 66.6, 73.1, 70.3, 67.6, 67, 68.8, 69, 69.6, 68.9, 77.1, 71.1, 70.3, 69.7, 74.9, 67.1, 73.4, 73.6, 74.8, 72.3, 74.4, 72.5, 74.9, 71.1, 68.3, 69.7, 77, 71.3, 73.8, 69.4, 71.9, 67, 73.4, 69.1, 66.8, 69.5, 74.1, 70.2, 76, 69.6, 72.6, 67.9, 68.8, 73.3, 73.7, 73.2, 72.1, 71.7, 71.7, 66.2, 70.5, 71.4, 71.3, 70.6, 75.6, 65.9, 70.5, 73, 66, 68.9, 67.2, 74.3, 73.4, 73.6, 73.6, 73.1)


Based on this sample, are the heights of Austinites in 2009 significantly different (at the 1% significance level) from what they were in 1999? Use the one-sample t-test to answer this. R has a function ''t.test()'' that does this, of course, but for this problem you need to do the calculations yourself from the raw data: compute the t statistic, then use the probability functions for the t distribution (''dt'', ''pt'', ''qt'', and ''rt'') as appropriate to determine significance and confidence intervals. You may use ''t.test'' to check your answers.

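As a reminder, here is the formula for computing the t statistic (see also Hinton, page 66, for the formula and the preceding pages for discussion):

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, mean(x); $\mu$ is the hypothesized population mean, mu; $s$ is the sample standard deviation, sd(x); and $n$ is the sample size, length(x).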
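To see the formula term by term, here is a worked sketch on a small made-up sample with a hypothesized mean of 70 (both hypothetical, not the 2009 heights). Note that sd() uses the n - 1 denominator, which is the s in the formula; the result is cross-checked against the statistic reported by t.test().

```r
# Hypothetical toy sample and hypothesized mean (not the assignment data).
x <- c(68.2, 71.5, 69.9, 72.3, 70.8)
mu <- 70

n <- length(x)
# t = (sample mean - mu) / (s / sqrt(n)), with s = sd(x).
t.manual <- (mean(x) - mu) / (sd(x) / sqrt(n))

# Cross-check against R's built-in one-sample t-test.
t.builtin <- unname(t.test(x, mu = mu)$statistic)
all.equal(t.manual, t.builtin)
```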


(a) [15 pts] Write a function ''t.value'' that computes this, given a vector of values and mu. Recall that you define a function t.value as follows: "t.value <- function(ARGUMENTS){ CODE }", where ARGUMENTS is what you pass to the function (e.g., the data and mu) and CODE is the R code that performs the calculations. The function returns the value of the last expression evaluated in CODE.
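The function syntax is illustrated below with a different, hypothetical function, std.error(), which computes the standard error of the mean. It shows an argument being passed in, an intermediate variable, and the last expression becoming the return value.

```r
# Hypothetical example function: standard error of the mean.
std.error <- function(x) {
  n <- length(x)       # intermediate value, not returned
  sd(x) / sqrt(n)      # last expression: this is the return value
}

std.error(c(2, 4, 4, 4, 5, 5, 7, 9))
```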

(b) [5 pts] Use ''t.value'' to calculate t for this problem, and determine its level of significance (p-value) using the appropriate function for the t distribution.
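A sketch of turning a t value into a two-sided p-value with pt(), and into a decision at the 1% level with the critical value from qt(). The t value and sample size here are made up for illustration. A two-sided test fits the question of whether heights are "different"; the two approaches always agree, since p < 0.01 exactly when |t| exceeds the critical value.

```r
# Hypothetical values for illustration: an observed t and its sample size.
t.obs <- 2.1
n <- 25
df <- n - 1  # degrees of freedom for a one-sample t-test

# Two-sided p-value: probability of a t at least as extreme as t.obs
# in either tail of the t distribution with df degrees of freedom.
p.two.sided <- 2 * pt(-abs(t.obs), df = df)

# Critical value for a two-sided test at the 1% level.
t.crit <- qt(1 - 0.01 / 2, df = df)

p.two.sided
abs(t.obs) > t.crit  # TRUE would mean significant at the 1% level
```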

(c) [10 pts] Calculate the 99% confidence interval for the mean of the 2009 heights. See Hinton pages 69-71 for a discussion of confidence intervals and how to compute them.
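A sketch of a 99% confidence interval for a mean, computed as the sample mean plus or minus the 0.995 quantile of the t distribution (n - 1 degrees of freedom) times the standard error. The sample is hypothetical; the result is compared with the interval t.test() reports at conf.level = 0.99.

```r
x <- c(68.2, 71.5, 69.9, 72.3, 70.8)  # hypothetical toy sample
n <- length(x)
se <- sd(x) / sqrt(n)                 # standard error of the mean

# 99% CI: mean +/- t_{0.995, n-1} * standard error.
t.crit <- qt(0.995, df = n - 1)
ci <- mean(x) + c(-1, 1) * t.crit * se
ci

# Cross-check with t.test().
t.test(x, conf.level = 0.99)$conf.int
```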


Part 2

(Warning: Bogus made-up experiment description follows.)

During several technical scientific talks, each attendee's expertise in the subject matter of the talk and the attention they paid to the talk were scored, both on real-valued scales. Values were collected for 1000 professors, 1000 graduate students, and 1000 undergraduates. The recorded values are available in the file talks.txt; the ''Class'' column encodes whether a participant is a professor, a graduate student, or an undergraduate student. You'll use this data to answer the questions in this part of the homework.

Read in ''talks.txt'' as a dataframe called ''talks''.
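The exact layout of talks.txt is not described here, so the sketch below assumes a whitespace-separated file with a header line naming the columns Class, Expertise, and Attention. To keep it runnable it writes a tiny hypothetical file to a temporary location first; with the real file you would call read.table("talks.txt", header = TRUE) directly, after checking its first lines with readLines("talks.txt", n = 5).

```r
# Hypothetical file contents; the real talks.txt may be laid out differently.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Class Expertise Attention",
             "professor 8.2 6.1",
             "grad 5.4 7.3",
             "undergrad 2.1 4.8"), tmp)

# header = TRUE takes column names from the first line;
# by default read.table() splits fields on whitespace.
talks <- read.table(tmp, header = TRUE)
talks$Class <- factor(talks$Class)  # treat Class as a grouping factor
str(talks)
```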


Problem 2.1 (20 pts). Attentional differences

(a) [10 pts] Use ''bwplot'' to visualize the spread of the ''Attention'' values for each ''Class'', all in the same plot.
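bwplot() comes from the lattice package, which ships with standard R installations. The sketch below uses a simulated data frame (talks.demo, with hypothetical class labels and values) in place of talks. The formula Attention ~ Class draws one box per class in a single panel. Lattice plots are only drawn when printed, so print() is needed when the code runs from a script.

```r
library(lattice)  # provides bwplot()

# Simulated stand-in for the talks data frame (hypothetical labels and values).
set.seed(4)
talks.demo <- data.frame(
  Class = factor(rep(c("professor", "grad", "undergrad"), each = 50)),
  Attention = c(rnorm(50, 6, 1), rnorm(50, 7, 1), rnorm(50, 5, 1))
)

# One box-and-whisker plot per level of Class, all in one panel.
p <- bwplot(Attention ~ Class, data = talks.demo, ylab = "Attention")
print(p)
```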

(b) [10 pts] Use the t-test to determine whether the ''Attention'' values of the graduate students and the undergraduates differ at a significance level of 0.01. Now compare the graduate students and the professors in the same way.
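A sketch of a two-group comparison with t.test(), on simulated data with hypothetical class labels ("grad", "undergrad"). The actual level names in talks.txt may differ, so check them with levels() or unique() first. By default R runs Welch's t-test, which does not assume equal variances; pass var.equal = TRUE if your course uses the pooled-variance version. The conf.level argument only changes the reported confidence interval, not the p-value.

```r
# Simulated stand-in data (hypothetical labels and values).
set.seed(5)
talks.demo <- data.frame(
  Class = rep(c("professor", "grad", "undergrad"), each = 50),
  Attention = c(rnorm(50, 6, 1), rnorm(50, 7, 1), rnorm(50, 5, 1))
)

# Extract the Attention values for the two groups being compared.
grad <- talks.demo$Attention[talks.demo$Class == "grad"]
undergrad <- talks.demo$Attention[talks.demo$Class == "undergrad"]

# Welch two-sample t-test (R's default); 99% CI to match the 0.01 level.
res <- t.test(grad, undergrad, conf.level = 0.99)
res$p.value < 0.01  # TRUE means the difference is significant at the 1% level
```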


Problem 2.2 (25 pts). Regression

The above problem should have convinced you that the Attention values for professors, graduate students, and undergraduate students are not all the same. However, perhaps Expertise also influences Attention, so let's explore that with regression.

(a) [10 pts] Plot Expertise versus Attention using black circles (the default), with Expertise on the x-axis and Attention on the y-axis. The x-axis should be labeled ''Expertise'' and the y-axis should be labeled ''Attention''.

(b) [15 pts] Perform a simple linear regression of Attention on Expertise (a single predictor, with a linear term only). Add the resulting regression line to the plot.
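A sketch covering both steps on simulated data (talks.demo and its values are hypothetical): a scatterplot with Expertise on the x-axis and Attention on the y-axis using the default open black circles, a simple linear regression fitted with lm(), and the fitted line added with abline(), which accepts an lm object directly.

```r
# Simulated stand-in data (hypothetical values).
set.seed(6)
Expertise <- runif(200, 0, 10)
Attention <- 3 + 0.4 * Expertise + rnorm(200, 0, 1)
talks.demo <- data.frame(Expertise, Attention)

# Scatterplot: default plotting symbol (pch = 1) is an open black circle.
plot(Attention ~ Expertise, data = talks.demo,
     xlab = "Expertise", ylab = "Attention")

# Simple linear regression: Attention as a linear function of Expertise.
fit <- lm(Attention ~ Expertise, data = talks.demo)
abline(fit, col = "red")  # draw the fitted regression line
coef(fit)                 # intercept and slope
```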



Attachments:
lastname_firstname_hw3.r (1k), Katrin Erk, Mar 19, 2012, 5:31 PM
talks.txt (93k), Katrin Erk, Mar 19, 2012, 3:40 PM