Data Transformations with the WHIN Dataset
This example is from TDM 102 Project 9 Spring 2024.
This example depends on the following data file:
- /anvil/projects/tdm/data/whin/weather.parquet
Learn more about the dataset here.
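The snippets in this example use a DataFrame named myDF without showing how it was created. A minimal sketch of one way to load it, assuming pandas with a parquet engine (such as pyarrow) is installed and that myDF is simply the whole file read into memory; the helper name `load_weather` is hypothetical and not part of the original project:

```python
import os

import pandas as pd

# Path taken from the data file listed above.
WEATHER_PATH = "/anvil/projects/tdm/data/whin/weather.parquet"


def load_weather(path=WEATHER_PATH):
    """Hypothetical helper: read the parquet file into a pandas DataFrame."""
    return pd.read_parquet(path)


# Only attempt the read where the file is actually available (e.g. on Anvil).
if os.path.exists(WEATHER_PATH):
    myDF = load_weather()
    print(myDF.shape)  # (rows, columns) of the raw data
```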
2a. Find out how many null records exist in myDF, within each individual column. (Your answer should specify, for each column, how many null records are in that column.)
myDF.isnull().sum()
| Column Name | Null Count |
|---|---|
| station_id | 0 |
| latitude | 0 |
| longitude | 0 |
| name | 0 |
| observation_time | 0 |
| temperature | 4176 |
| temperature_high | 4123 |
| temperature_low | 4123 |
| humidity | 4162 |
| solar_radiation | 48104 |
| solar_radiation_high | 12828 |
| rain | 0 |
| rain_inches_last_hour | 0 |
| wind_speed_mph | 4032 |
| wind_direction_degrees | 95545 |
| wind_gust_speed_mph | 0 |
| wind_gust_direction_degrees | 94216 |
| pressure | 0 |
| soil_temp_1 | 114043 |
| soil_temp_2 | 114041 |
| soil_temp_3 | 114037 |
| soil_temp_4 | 114036 |
| soil_moist_1 | 114124 |
| soil_moist_2 | 114170 |
| soil_moist_3 | 114174 |
| soil_moist_4 | 114150 |
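To see why `isnull().sum()` yields one count per column, it can help to try it on a tiny frame. The sketch below uses a made-up DataFrame called `toy` (its values are illustrative only and are not taken from the WHIN data): `isnull()` produces a True/False mask of the same shape, and `sum()` adds up the True values down each column.

```python
import pandas as pd

# Made-up toy data, not real WHIN observations.
toy = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [70.5, None, 68.0, None],
    "humidity": [0.41, 0.39, None, 0.44],
})

# Boolean mask -> column-wise count of True (missing) entries.
null_counts = toy.isnull().sum()
print(null_counts)

# The mean of the same mask gives the fraction of missing values per column.
null_share = toy.isnull().mean()
print(null_share)
```

On the real data, the same `mean()` variant would show which columns (for example the soil columns) are mostly empty.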
2b. Now count the total number of null values in the entire data frame myDF. (In other words, add up the values from all of the counts in part 2a.)
myDF.isnull().sum().sum()
1184084
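As a sanity check, the total can be reproduced by hand from the part 2a table. The sketch below copies the non-zero counts from that table into a plain dictionary (the name `counts_2a` is just for illustration) and adds them up; columns with zero nulls are omitted because they do not change the sum.

```python
# Non-zero null counts copied from the part 2a table.
counts_2a = {
    "temperature": 4176,
    "temperature_high": 4123,
    "temperature_low": 4123,
    "humidity": 4162,
    "solar_radiation": 48104,
    "solar_radiation_high": 12828,
    "wind_speed_mph": 4032,
    "wind_direction_degrees": 95545,
    "wind_gust_direction_degrees": 94216,
    "soil_temp_1": 114043,
    "soil_temp_2": 114041,
    "soil_temp_3": 114037,
    "soil_temp_4": 114036,
    "soil_moist_1": 114124,
    "soil_moist_2": 114170,
    "soil_moist_3": 114174,
    "soil_moist_4": 114150,
}

# The first .sum() in part 2b gives these per-column counts;
# the second .sum() adds them together, as done here.
total = sum(counts_2a.values())
print(total)  # 1184084
```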
2c. Drop rows with any null values in myDF. Save the resulting cleaned data set into a new DataFrame called myDF_cleaned.
myDF_cleaned = myDF.dropna()
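Called with no arguments, `dropna()` removes every row that has at least one null in any column, and it returns a new DataFrame rather than modifying the original. Since soil_moist_3 alone has 114174 nulls, at least that many rows of myDF are discarded. A small sketch on a made-up frame (`toy` and its values are hypothetical) shows the default behavior next to the `subset` option, which is one possible alternative when only some columns matter:

```python
import pandas as pd

# Made-up toy data, not real WHIN observations.
toy = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [70.5, None, 68.0, None],
    "humidity": [0.41, 0.39, None, 0.44],
})

# Default: drop a row if ANY column is null (how="any").
cleaned = toy.dropna()

# Alternative: only require selected columns to be present.
temp_only = toy.dropna(subset=["temperature"])

print(len(toy), len(cleaned), len(temp_only))  # 4 1 2
```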
2d. Just to make sure that you did this properly, check myDF_cleaned carefully: Are there any null values remaining in myDF_cleaned? (There should not be.) How many rows and columns are in myDF_cleaned?
myDF_cleaned.isnull().sum().sum()
0
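The question also asks for the number of rows and columns, which the `.shape` attribute reports as a `(rows, columns)` tuple, so on the real data this would be `myDF_cleaned.shape`. The sketch below shows both checks together on a made-up frame (`toy` is hypothetical, so its numbers say nothing about the real result):

```python
import pandas as pd

# Made-up toy data, not real WHIN observations.
toy = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [70.5, None, 68.0, None],
    "humidity": [0.41, 0.39, None, 0.44],
})
toy_cleaned = toy.dropna()

# Check 1: no nulls should remain after dropna().
remaining_nulls = int(toy_cleaned.isnull().sum().sum())

# Check 2: .shape is a (rows, columns) tuple.
n_rows, n_cols = toy_cleaned.shape

print(remaining_nulls, n_rows, n_cols)  # 0 1 3
```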