TDM 10100: R Project 8 — 2024
Motivation: We will learn how user-defined functions work in R.
Context: Although R has lots of built-in functions, we can design our own functions too!
Scope: We start with some basic one-line functions to demonstrate how powerful they can be.
Dataset(s)
This project will use the following dataset(s):
- /anvil/projects/tdm/data/death_records/DeathRecords.csv
- /anvil/projects/tdm/data/beer/reviews_sample.csv
- /anvil/projects/tdm/data/election/itcont1980.txt
- /anvil/projects/tdm/data/flights/subset/1990.csv
- /anvil/projects/tdm/data/olympics/athlete_events.csv
Example 1:
Finding the average weight of Olympic athletes from a given country, identified by its NOC (National Olympic Committee) code.
avgweights <- function(x) {mean(myDF$Weight[myDF$NOC == x], na.rm = TRUE)}
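To see what Example 1 does without loading the full Olympics file, here is a small sketch on a hypothetical toy data frame (the NOC codes and weights are invented). Note that avgweights looks up whatever myDF exists at the moment it is called, so it only works after myDF has been loaded.

```r
# Hypothetical toy data standing in for athlete_events.csv
myDF <- data.frame(
  NOC    = c("USA", "USA", "USA", "CAN"),
  Weight = c(70, 80, NA, 65)
)

# Example 1: average weight for one NOC code, ignoring missing weights
avgweights <- function(x) {mean(myDF$Weight[myDF$NOC == x], na.rm = TRUE)}

avgweights("USA")  # 75: the mean of 70 and 80, with the NA dropped
avgweights("CAN")  # 65
```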
Example 2:
Finding the proportions of school metro types in a given state.
myschoolpercentages <- function(x) {prop.table(table(myDF$"School Metro Type"[myDF$"School State" == x]))}
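Column names that contain spaces, such as "School Metro Type", must be quoted (or wrapped in backticks) after $. The sketch below builds a hypothetical toy data frame with such names; check.names = FALSE stops data.frame() from replacing the spaces with dots. As with prop.table in general, the result is a set of proportions that sum to 1.

```r
# Hypothetical toy data; check.names = FALSE keeps the spaces in the column names
myDF <- data.frame(
  "School State"      = c("IN", "IN", "IN", "IN", "OH"),
  "School Metro Type" = c("urban", "urban", "rural", "suburban", "urban"),
  check.names = FALSE
)

myschoolpercentages <- function(x) {prop.table(table(myDF$"School Metro Type"[myDF$"School State" == x]))}

myschoolpercentages("IN")  # rural 0.25, suburban 0.25, urban 0.50

# Backticks select the same column as the quoted name
identical(myDF$"School Metro Type", myDF$`School Metro Type`)  # TRUE
```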
Example 3:
In the 1980 election data, finding the sum of the donations in a given state.
mystatesum <- function(x) {sum(myDF$TRANSACTION_AMT[myDF$STATE == x])}
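Unlike Example 1, Examples 3 and 4 do not pass na.rm = TRUE. The toy sketch below (hypothetical states and amounts) shows what that means: a single missing amount makes the whole sum NA. The variant mystatesum_narm is a hypothetical name introduced only for comparison.

```r
# Hypothetical toy data standing in for the 1980 election data
myDF <- data.frame(
  STATE           = c("IN", "IN", "OH", "OH"),
  TRANSACTION_AMT = c(100, 50, 200, NA)
)

# Example 3 as written: no na.rm argument
mystatesum <- function(x) {sum(myDF$TRANSACTION_AMT[myDF$STATE == x])}

mystatesum("IN")  # 150
mystatesum("OH")  # NA, because one OH amount is missing

# Hypothetical variant that skips missing amounts
mystatesum_narm <- function(x) {sum(myDF$TRANSACTION_AMT[myDF$STATE == x], na.rm = TRUE)}
mystatesum_narm("OH")  # 200
```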
Example 4:
Finding the average number of stars for a given author of reviews.
myauthoravgstars <- function(x) {mean(myDF$stars[myDF$author == x])}
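Each example function takes one value at a time. A sketch on a hypothetical toy version of the beer reviews data (authors and star ratings are invented) shows how sapply() can apply myauthoravgstars to several authors at once, keeping the authors as names.

```r
# Hypothetical toy data standing in for the beer reviews data
myDF <- data.frame(
  author = c("ann", "ann", "bob"),
  stars  = c(4, 5, 3)
)

# Example 4: average stars for one author
myauthoravgstars <- function(x) {mean(myDF$stars[myDF$author == x])}

# sapply() calls the function once per author; the result is a named vector
sapply(c("ann", "bob"), myauthoravgstars)  # ann 4.5, bob 3.0
```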
Questions
As before, please use the
Question 1 (2 pts)
Consider this user-defined function, which makes a table that shows the proportion of values in each category:
makeatable <- function(x) {prop.table(table(x, useNA="always"))}
If we do something like this, with a column from a data frame:
makeatable(myDF$mycolumn)
Then it is the same as running this:
prop.table(table(myDF$mycolumn, useNA="always"))
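In other words, makeatable is a user-defined function that makes a table of counts, including NA values (useNA="always" adds an NA entry even when there are none), and then uses prop.table to convert the counts into proportions that sum to 1. Multiply the result by 100 to express it as percentages.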
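A quick check on a small hypothetical vector shows what makeatable returns: one proportion per value, plus an NA entry.

```r
makeatable <- function(x) {prop.table(table(x, useNA="always"))}

# Hypothetical vector with one missing value
v <- c("F", "M", "M", NA)

makeatable(v)        # F 0.25, M 0.50, <NA> 0.25
100 * makeatable(v)  # the same table expressed as percentages

# useNA = "always" keeps an NA entry (with proportion 0) even when nothing is missing
makeatable(c("a", "b"))
```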
Now consider the DeathRecords data set:
/anvil/projects/tdm/data/death_records/DeathRecords.csv
- Try the function makeatable on the Sex column of the DeathRecords.
- Also try the function makeatable on the MaritalStatus column of the DeathRecords.

- Use the makeatable function to display a table of values from the Sex column of the DeathRecords.
- Use the makeatable function to display a table of values from the MaritalStatus column of the DeathRecords.
Question 2 (2 pts)
Define a function called teenagecount as follows:
teenagecount <- function(x) {length(x[(x >= 13) & (x <= 19) & (!is.na(x))])}
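The !is.na(x) condition matters. In the sketch below (hypothetical ages), dropping it lets the NA age produce an NA index, which keeps an NA element in the subset and inflates the count. The last line shows an equivalent way to count with sum(), offered only as an alternative.

```r
teenagecount <- function(x) {length(x[(x >= 13) & (x <= 19) & (!is.na(x))])}

ages <- c(12, 13, 15, 19, 20, NA)  # hypothetical ages
teenagecount(ages)                 # 3 (the ages 13, 15 and 19)

# Without !is.na(x), the NA age gives an NA index, so an NA element
# stays in the subset and the count becomes 4
length(ages[(ages >= 13) & (ages <= 19)])

# Equivalent alternative: count the TRUE values directly
sum(ages >= 13 & ages <= 19, na.rm = TRUE)  # 3
```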
- Try this function on the Age column of the DeathRecords.
- Also try this function on the Age column of the file /anvil/projects/tdm/data/olympics/athlete_events.csv.

- Display the number of teenagers in the DeathRecords data.
- Display the number of teenagers in the Olympics Athlete Events data.
Question 3 (2 pts)
The nchar function gives the number of characters in a string (applied to a vector, it returns one count per element). The which.max function finds the position of the first maximum value. Define the function:
longesttest <- function(x) {x[which.max(nchar(x))]}
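A small sketch on hypothetical review text shows how nchar and which.max combine. If several strings tie for the longest, which.max returns the first one.

```r
longesttest <- function(x) {x[which.max(nchar(x))]}

# Hypothetical review text
reviews <- c("Nice.", "Hoppy, bright, and very bitter.", "Too sweet")

nchar(reviews)        # 5 31 9
longesttest(reviews)  # "Hoppy, bright, and very bitter."

# Tie: both strings have 2 characters, so the first one is returned
longesttest(c("ab", "cd"))  # "ab"
```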
- Use the function longesttest to find the longest review in the text column of the beer reviews data set /anvil/projects/tdm/data/beer/reviews_sample.csv.
- Also use the function longesttest to find the longest name in the NAME column of the 1980 election data:
library(data.table)
# fread() reads the large file quickly; quote="" turns off special handling of quote characters
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
# give the 21 columns their FEC names
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")

- Print the longest review in the text column of the beer reviews data set /anvil/projects/tdm/data/beer/reviews_sample.csv.
- Print the longest name in the NAME column of the 1980 election data.
Question 4 (2 pts)
- Create your own function called mostpopulardate that finds the most popular date in a column of dates, as well as the number of times that date occurs.
- Test your function mostpopulardate on the date column of the beer reviews data /anvil/projects/tdm/data/beer/reviews_sample.csv.
- Also test your function mostpopulardate on the TRANSACTION_DT column of the 1980 election data.

a. Define your function called mostpopulardate.
b. Use your function mostpopulardate to find the most popular date in the beer reviews data /anvil/projects/tdm/data/beer/reviews_sample.csv.
c. Also use your function mostpopulardate to find the most popular transaction date from the 1980 election data.
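Here is one possible sketch of mostpopulardate, not the only correct approach: it counts each date with table() and keeps the entry with the largest count, so the result carries both the date (as its name) and the number of occurrences. It assumes the column holds one date per value, and the toy dates are hypothetical. table() drops NA values by default, and ties go to the first date in table() order.

```r
# One possible sketch: count each distinct date, then keep the largest count
mostpopulardate <- function(x) {
  counts <- table(x)          # NA values are dropped by default
  counts[which.max(counts)]   # the name is the date, the value is its count
}

# Hypothetical dates
dates <- c("2010-01-01", "2010-01-02", "2010-01-02", NA)
mostpopulardate(dates)  # 2010-01-02 occurs 2 times
```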
Question 5 (2 pts)
Define a function called myaveragedelay that takes a 3-letter string (corresponding to an airport code) and finds the average departure delay (after removing the NA values) in the DepDelay column of the 1990 flight data /anvil/projects/tdm/data/flights/subset/1990.csv for flights departing from that airport.
Try your function on the Indianapolis "IND" flights. In other words, myaveragedelay("IND") should print 5.96977225672878, because the flights with Origin airport "IND" have an average departure delay of about 5.97 minutes.
Try your function on the New York City "JFK" flights. In other words, myaveragedelay("JFK") should print 11.8572741063607, because the flights with Origin airport "JFK" have an average departure delay of about 11.86 minutes.
a. Define your function called myaveragedelay.
b. Use myaveragedelay("IND") to print the average departure delay for flights with Origin airport "IND".
c. Use myaveragedelay("JFK") to print the average departure delay for flights with Origin airport "JFK".
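One possible sketch of myaveragedelay follows the same pattern as Example 1. It assumes myDF holds the 1990 flight data with Origin and DepDelay columns; the toy data frame below is hypothetical and only checks the logic, so it will not reproduce the IND and JFK values above.

```r
# Hypothetical toy data; in the project, myDF would hold the 1990 flight data
myDF <- data.frame(
  Origin   = c("IND", "IND", "IND", "JFK"),
  DepDelay = c(4, 8, NA, 12)
)

# Average DepDelay for flights whose Origin matches x, dropping NA delays
myaveragedelay <- function(x) {mean(myDF$DepDelay[myDF$Origin == x], na.rm = TRUE)}

myaveragedelay("IND")  # 6 on this toy data
myaveragedelay("JFK")  # 12 on this toy data
```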
Submitting your Work
Now you know how to write your own functions! Please let us know if you need assistance with this project.
- firstname_lastname_project8.ipynb
You must double check your
You will not receive full credit if your