2.3 Lab Activity

2.3.1 Part 1: Understanding R Documentation

Answer the following questions using the documentation for mean():

  1. What package is mean() in?
  2. What named arguments does mean() accept?
  3. Which of the named arguments are required?
  4. How are these three function calls different?
    1. mean(1:10)
    2. mean(1:10, trim = 0)
    3. mean(1:10, 0, FALSE)
  5. What classes of object are acceptable values for x?
  6. How does mean() treat missing values in x?
  7. What are the class and length of the output of mean(x) if:
    1. x is a numeric vector of length 50?
    2. x is a logical vector of length 100?
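
The questions above rely on reading a help page and on knowing how R matches arguments. The following is a minimal sketch of both, using round() in place of mean() so that it does not answer the questions for you; the choice of function and values is an illustrative assumption, not part of the lab.

```r
# Open the help page for a function (two equivalent forms).
# These open a help viewer, so run them interactively:
# ?round
# help("round")

# Show a function's arguments without opening the help page.
args(round)

# The same call written with a named and with a positional argument.
a <- round(3.14159, digits = 2)  # second argument matched by name
b <- round(3.14159, 2)           # second argument matched by position
identical(a, b)

# class() and length() describe the object a function returns.
class(a)
length(a)
```

The same pattern (help page, args(), then class() and length() on the result) can be applied to mean().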

Why are there four different calls to as.data.frame() in the Usage section of its documentation?

Answer the following questions using the documentation for read.csv():

  1. Why does ?read.csv() open the documentation for read.table()?
  2. What is the difference between read.table(), read.csv(), and read.csv2()?

2.3.2 Part 2: Data Import

Researchers at the Vancouver and Okanagan campuses of The University of British Columbia have conducted a survey of child sleep. The results of this survey are found in the file child-sleep-data.csv. The table below provides names and descriptions for each variable in the data set.

Name            Description
id              A randomly generated participant ID, unique to each participant.
age             The age of the child in the study.
campus          The campus from which the participant was recruited (“Point Grey” or “Okanagan”).
sleep_location  Where does the child sleep? 1 = co-sleeps with parent(s); 2 = sleeps in own room; 3 = shares a room but does not co-sleep.
sleep           The parent’s estimate of the average hours of sleep the child receives in a 24-hour period.

Download child-sleep-data.csv and then complete the following steps:

2.3.3 Import and Inspect the Data

  • Import child-sleep-data.csv into R; assign it a meaningful name.
  • Inspect the data frame using head(), str(), and View().
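
The import and inspection steps above might look like the sketch below. So that it runs without the real file, it first writes a few made-up rows to a temporary CSV; the values, the path toy_path, and the name toy_sleep are all hypothetical.

```r
# Hypothetical stand-in file: made-up rows written to a temporary CSV
# so the example runs without the real data.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,age,campus,sleep_location,sleep",
  "101,3,Point Grey,1,11.5",
  "102,5,Okanagan,2,NA",
  "103,4,Okanagan,3,10",
  "104,2,Point Grey,2,12.25"
), toy_path)

# Import the file and give the result a descriptive name.
toy_sleep <- read.csv(toy_path)

head(toy_sleep)  # first rows of the data frame
str(toy_sleep)   # column types and a preview of the values

# View() opens a spreadsheet-style viewer, so only call it interactively.
if (interactive()) View(toy_sleep)
```

For the real data, the call would presumably be read.csv("child-sleep-data.csv"), assuming the file sits in your working directory.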

2.3.3.1 Convert Categorical Variables to Factors

  • Use as.factor() and factor() to convert any categorical variables to factors with the appropriate levels and labels.
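
As a rough illustration of the difference between the two functions named above, the sketch below converts two small made-up vectors. The values and the label text are hypothetical choices, not the lab's required answer.

```r
# Made-up values for illustration only.
campus <- c("Point Grey", "Okanagan", "Okanagan", "Point Grey")
sleep_location <- c(1, 2, 3, 2)

# as.factor() uses the sorted unique values as the levels.
campus_f <- as.factor(campus)
levels(campus_f)

# factor() lets you state the levels and attach readable labels.
# The label text below is a hypothetical choice.
sleep_location_f <- factor(
  sleep_location,
  levels = c(1, 2, 3),
  labels = c("co-sleeps", "own room", "shares room")
)
levels(sleep_location_f)
table(sleep_location_f)
```

factor() is the better fit when the stored codes (here 1, 2, 3) are not meaningful on their own and need labels.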

2.3.3.2 Subsetting and Logical Expressions

Use [] to subset the data frame to answer the questions and complete the tasks below. You will also need is.na(), max(), min(), mean(), and logical expressions.

  • At which campus was the 48th participant recruited?
  • Return the data for participants who are missing responses on the sleep variable.
  • What proportion of participants are missing data for the sleep variable?
  • On average, how many hours do the children in this sample sleep per day?
  • What is the average daily sleep of children in each of the three sleep locations?
  • What is the participant ID of the participant(s) who reported the most sleep? The least sleep?
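
The kinds of operations these questions call for can be sketched on a toy data frame. Everything below (the data frame toy, its columns, and its values) is made up for illustration and deliberately differs from the lab data.

```r
# Made-up data frame for illustration only.
toy <- data.frame(
  id    = c(11, 12, 13, 14, 15),
  group = c("a", "b", "a", "b", "a"),
  hours = c(10, NA, 12, 9, 11)
)

toy[3, "group"]                # one cell: row 3, column "group"
toy[is.na(toy$hours), ]        # rows with a missing value in hours
mean(is.na(toy$hours))         # proportion of missing values
mean(toy$hours, na.rm = TRUE)  # mean, ignoring missing values

# Mean within one group, using a logical expression to pick the rows.
mean(toy[toy$group == "a", "hours"], na.rm = TRUE)

# IDs of the rows holding the largest value.
# which() keeps only TRUE positions, so the NA comparison is dropped.
toy[which(toy$hours == max(toy$hours, na.rm = TRUE)), "id"]
```

Taking mean() of a logical vector works because TRUE counts as 1 and FALSE as 0, so the result is the proportion of TRUE values.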

2.3.3.3 Participant Information

Use table() to identify how many participants, and what proportion of participants, belong to each level of the three categorical variables.
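
One possible way to get both counts and proportions is sketched below on a made-up factor. Using prop.table() is an assumption about approach; dividing the table by its sum would work equally well.

```r
# Made-up factor for illustration only.
colour <- factor(c("red", "blue", "red", "green", "red", "blue"))

counts <- table(colour)            # number of observations per level
proportions <- prop.table(counts)  # each count divided by the total

counts
proportions
```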