Apply Functions
tapply
The documentation for tapply is a bit more specific than for the other apply functions. Its main arguments are (X, INDEX, FUN): X is an object to which the split function can be applied (typically a vector), INDEX is a factor, or a list of factors, of the same length as X by which X is grouped, and FUN is the function to apply, as before.
Put more simply, tapply splits X into groups according to INDEX and applies FUN to each group.
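As a minimal sketch of that grouping behavior, the toy example below sums a small vector by group. The vectors `sales` and `store` are hypothetical and are not taken from the Iowa data; `store` is a character vector, which tapply coerces to a factor.

```r
# Hypothetical toy data: five sale amounts and the store each one belongs to
sales <- c(10, 20, 5, 15, 30)
store <- c("A", "B", "A", "C", "B")

# Split sales by store, then apply sum to each group
totals <- tapply(sales, store, sum)
totals
# A = 15 (10 + 5), B = 50 (20 + 30), C = 15
```

The result has one named element per distinct value of the grouping vector, which is why the solutions below can be sorted and indexed by name.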
Examples
Using the Iowa liquor sales file, use fread to read all 27 million rows of the data set again, but this time read in only the columns "Zip Code", "Category Name", and "Sale (Dollars)". Find the 10 "Zip Code" values with the largest total "Sale (Dollars)", and give those "Zip Code" values along with their totals.
Click to see solution
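# load data.table, which provides fread
library(data.table)
# read in only the three columns we need
iowa2 <- fread("/anvil/projects/tdm/data/iowa_liquor_sales/iowa_liquor_sales.csv", select=c("Zip Code", "Category Name", "Sale (Dollars)"))
# total sales (in dollars) for each zip code
zip_sales <- tapply(iowa2$`Sale (Dollars)`, iowa2$`Zip Code`, sum)
# the 10 zip codes with the largest totals
head(sort(zip_sales, decreasing=TRUE), 10)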
50320    132861227.43
52402    108460935.17
52240    106827908.74
50266    95956448.74
51501    84485599.04
52241    80224356.18
50613    70716357.28
50311    65407916.64
52722    63447651.28
50021    61328202.38
Using the Iowa liquor sales file, find the 10 "Category Name" values with the largest total "Sale (Dollars)", and give those "Category Name" values along with their totals.
Click to see solution
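# load data.table, which provides fread
library(data.table)
# read in only the three columns we need
iowa2 <- fread("/anvil/projects/tdm/data/iowa_liquor_sales/iowa_liquor_sales.csv", select=c("Zip Code", "Category Name", "Sale (Dollars)"))
# total sales (in dollars) for each category
category_sales <- tapply(iowa2$`Sale (Dollars)`, iowa2$`Category Name`, sum)
# the 10 categories with the largest totals
head(sort(category_sales, decreasing=TRUE), 10)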
CANADIAN WHISKIES            457612891.06
AMERICAN VODKAS              380307151.309999
STRAIGHT BOURBON WHISKIES    257794861.83
SPICED RUM                   254362805.42
WHISKEY LIQUEUR              199736754.69
IMPORTED VODKAS              183082358.92
TENNESSEE WHISKIES           162676709.12
100% AGAVE TEQUILA           124223944.31
BLENDED WHISKIES             109152590.55
IMPORTED BRANDIES            88413645.9
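Both solutions follow the same pattern: group totals with tapply, sort them in decreasing order, and keep the first few with head. The sketch below repeats that pattern on hypothetical toy data (the `amount` and `zip` vectors are made up for illustration). It also assumes one amount is missing, to show that extra arguments given after FUN, here `na.rm = TRUE`, are passed on to FUN.

```r
# Hypothetical toy data (not from the Iowa file); one sale amount is missing
amount <- c(120, 80, NA, 200, 50, 75)
zip    <- c("50320", "52402", "50320", "52240", "52402", "52240")

# Arguments after FUN are passed on to it, so na.rm = TRUE reaches sum()
totals <- tapply(amount, zip, sum, na.rm = TRUE)

# Sort the group totals from largest to smallest and keep the top 2
top2 <- head(sort(totals, decreasing = TRUE), 2)
top2
# 52240 = 275 (200 + 75), 52402 = 130 (80 + 50)
```

Without `na.rm = TRUE`, the total for "50320" in this toy data would be NA, because sum returns NA when any value in the group is missing.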