How to download precipitation data using rnoaa

I am new to the 'rnoaa' R package and am wondering how to find station IDs to identify stations. I am interested in downloading hourly or daily precipitation data from 2011 to 2020 for the Prince William Sound, Alaska area. I looked here: https://www.ncdc.noaa.gov/cdo-web/search but it seems to have data only up to 2014. Could someone give me a hint on which rnoaa function to use to download the desired rainfall data?
I found the following rnoaa function:
cpc_prcp(date = "1998-04-23", drop_undefined = TRUE)
However, I don't know what to pass to the function to get the data that I am looking for, or how to cover the range of dates (2011 to 2020).
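Here is a sketch of the kind of loop I imagine would be needed (assuming cpc_prcp() returns one day of global gridded values per call; the bounding box below is a guess, and the CPC grid may report longitude on a 0-360 scale):
library(rnoaa)
library(dplyr)
# one call = one day, so loop over the date range and crop to the area;
# note this fetches ~3,650 daily grids, so it is slow
dates <- seq(as.Date("2011-01-01"), as.Date("2020-12-31"), by = "day")
pws_prcp <- lapply(dates, function(d) {
  cpc_prcp(date = d, drop_undefined = TRUE) %>%
    filter(lat > 60, lat < 61.5, lon > 212, lon < 215) %>%  # ~Prince William Sound
    mutate(date = d)
}) %>%
  bind_rows()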

You could try this workflow:
An internet search gives the latitude and longitude for the Prince William Sound, Alaska area.
library(rnoaa)
# create a data frame for Prince William latitude and longitude
lat_lon_df <- data.frame(id = "pw",
                         lat = 60.690545,
                         lon = -147.097055)
# find the 10 closest monitors to Prince William Sound
mon_near_pw <-
  meteo_nearby_stations(
    lat_lon_df = lat_lon_df,
    lat_colname = "lat",
    lon_colname = "lon",
    var = "PRCP",
    year_min = 2011,
    year_max = 2020,
    limit = 10
  )
mon_near_pw
#> $pw
#> # A tibble: 10 x 5
#> id name latitude longitude distance
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 USC00501240 CANNERY CREEK 61.0 -148. 42.9
#> 2 USC00509747 WALLY NOERENBERG HATCHERY 60.8 -148. 55.1
#> 3 USS0048L06S Esther Island 60.8 -148. 55.3
#> 4 USC00505604 MAIN BAY 60.5 -148. 57.6
#> 5 USS0046M04S Sugarloaf Mtn 61.1 -146. 61.1
#> 6 USC00509687 VALDEZ 61.1 -146. 62.4
#> 7 USW00026442 VALDEZ WSO 61.1 -146. 63.4
#> 8 US1AKVC0005 VALDEZ 3.6 ENE 61.1 -146. 66.3
#> 9 USC00509685 VALDEZ AIRPORT 61.1 -146. 67.3
#> 10 USC00502179 CORDOVA WWTP 60.5 -146. 74.0
# extract precipitation data for the first location
pw_prcp_dat <-
  meteo_pull_monitors(
    monitors = mon_near_pw$pw$id[1],
    date_min = "2011-01-01",
    date_max = "2020-12-31",
    var = "PRCP"
  )
head(pw_prcp_dat)
#> # A tibble: 6 x 3
#> id date prcp
#> <chr> <date> <dbl>
#> 1 USC00501240 2011-01-01 704
#> 2 USC00501240 2011-01-02 742
#> 3 USC00501240 2011-01-03 211
#> 4 USC00501240 2011-01-04 307
#> 5 USC00501240 2011-01-05 104
#> 6 USC00501240 2011-01-06 0
# out of curiosity, plot a monthly summary of precipitation.
# For metadata see: https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt
# PRCP = Precipitation (tenths of mm)
library(dplyr)
library(lubridate)
library(ggplot2)
pw_prcp_dat %>%
  mutate(year = year(date),
         month = month(date)) %>%
  group_by(year, month) %>%
  summarise(prcp = sum(prcp, na.rm = TRUE) / 10) %>%
  ggplot(aes(factor(month), prcp)) +
  geom_col() +
  facet_wrap(~year) +
  labs(y = "Precipitation [mm]",
       x = "Month") +
  theme_bw()
Created on 2021-08-22 by the reprex package (v2.0.0)
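If you want data for all ten nearby stations rather than just the first, meteo_pull_monitors() accepts a character vector of monitor IDs, so you can pass the whole column (a small extension of the code above):
# pull daily PRCP for all ten nearby stations in one call
all_prcp_dat <- meteo_pull_monitors(
  monitors = mon_near_pw$pw$id,
  date_min = "2011-01-01",
  date_max = "2020-12-31",
  var = "PRCP"
)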

Related

how to extract geographic latitude and longitude from the variable Store Location

How can I extract the geographic latitude and longitude from the variable Store Location and check the variable types? And how can I use the lubridate package to convert the Date variable to a date, then extract year, month and day from it? I am having a hard time figuring out how to extract the geographic latitude. I used the code below to load the data into R:
if (!file.exists("ames-liquor.rds")) {
  url <- "https://github.com/ds202-at-ISU/materials/blob/master/03_tidyverse/data/ames-liquor.rds?raw=TRUE"
  download.file(url, "ames-liquor.rds", mode = "wb")
}
data <- readRDS("ames-liquor.rds")
One option would be to use tidyr::extract to extract the longitude and latitude. For the dates convert to a proper date using e.g. as.Date. Afterwards you could get the year, month and day using the respective functions from lubridate:
library(dplyr)
library(tidyr)
library(lubridate)
data |>
  tidyr::extract(`Store Location`, into = c("lon", "lat"),
                 regex = "\\((\\-?\\d+\\.\\d+) (\\-?\\d+\\.\\d+)\\)",
                 remove = FALSE,
                 convert = TRUE) |>
  mutate(Date = as.Date(Date, "%m/%d/%Y"),
         year = lubridate::year(Date),
         month = lubridate::month(Date),
         day = lubridate::day(Date)) |>
  select(`Store Location`, lon, lat, Date, year, month, day)
#> # A tibble: 661,945 × 7
#> `Store Location` lon lat Date year month day
#> <chr> <dbl> <dbl> <date> <dbl> <dbl> <int>
#> 1 POINT (-93.619455 42.022848) -93.6 42.0 2020-11-02 2020 11 2
#> 2 POINT (-93.669896 42.02160500000001) -93.7 42.0 2020-07-01 2020 7 1
#> 3 POINT (-93.669896 42.02160500000001) -93.7 42.0 2019-07-31 2019 7 31
#> 4 <NA> NA NA 2019-07-25 2019 7 25
#> 5 <NA> NA NA 2019-07-05 2019 7 5
#> 6 POINT (-93.618911 42.022854) -93.6 42.0 2020-07-02 2020 7 2
#> 7 POINT (-93.669896 42.02160500000001) -93.7 42.0 2021-03-03 2021 3 3
#> 8 POINT (-93.619455 42.022848) -93.6 42.0 2021-03-03 2021 3 3
#> 9 POINT (-93.669896 42.02160500000001) -93.7 42.0 2019-07-17 2019 7 17
#> 10 <NA> NA NA 2022-08-03 2022 8 3
#> # … with 661,935 more rows
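For what it's worth, here is a sketch of an equivalent extraction with stringr::str_match, in case tidyr::extract is unfamiliar (assuming the same POINT (lon lat) text format):
library(dplyr)
library(stringr)
# capture the two numbers inside "POINT (lon lat)"; rows without a match get NA
coords <- str_match(data$`Store Location`,
                    "\\((-?\\d+\\.?\\d*) (-?\\d+\\.?\\d*)\\)")
data <- data %>%
  mutate(lon = as.numeric(coords[, 2]),
         lat = as.numeric(coords[, 3]))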

R - reduce multilevel list and add id variable at each step

Let's say that you have a list in R named Weather which is structured like this:
- Winter
  - Europe
    - Rainfall
    - Temperature
  - Asia
    - Rainfall
    - Temperature
  - Africa
    - Rainfall
    - Temperature
- Summer
  - Europe
    - Rainfall
    - Temperature
  - Asia
    - Rainfall
    - Temperature
  - Africa
    - Rainfall
    - Temperature
Each of the objects stored in Rainfall or Temperature is a data frame with two variables: Date and Temp in the case of Temperature, and Date and Rain in the case of Rainfall.
I want to turn this list of lists into a data frame with the following variables: Date, Temp, Rain, Continent, Season.
I don't understand how I can automatically add the ID variables to the lower levels of the list, pulling the names from the levels of the list itself. For now I have to do it manually, like this:
Weather$Winter$Europe$Rainfall$Continent <- "Europe"
Weather$Winter$Europe$Temperature$Continent <- "Europe"
Weather$Winter$Asia$Rainfall$Continent <- "Asia"
Weather$Winter$Asia$Temperature$Continent <- "Asia"
Weather$Winter$Africa$Rainfall$Continent <- "Africa"
Weather$Winter$Africa$Temperature$Continent <- "Africa"
Weather$Summer$Europe$Rainfall$Continent <- "Europe"
Weather$Summer$Europe$Temperature$Continent <- "Europe"
Weather$Summer$Asia$Rainfall$Continent <- "Asia"
Weather$Summer$Asia$Temperature$Continent <- "Asia"
Weather$Summer$Africa$Rainfall$Continent <- "Africa"
Weather$Summer$Africa$Temperature$Continent <- "Africa"
Then I use map_depth and reduce to join the lists. After that, I have to repeat this process to manually add the Season variable, and only then am I able to reduce all the data frames into a single one.
I am looking for a method that would allow me to automatically create the ID variables (Continent, Season), which are actually pulled from list names, while reducing the list.
You can use this as a starting point:
library(tidyverse)
data <- list(
  Winter = list(
    Europe = list(
      Rainfall = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Rain = c(20, 10, 15)),
      Temperature = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Temp = c(0, 0.5, 0.8))
    ),
    Asia = list(
      Rainfall = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Rain = c(30, 35, 34)),
      Temperature = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Temp = c(0, 0, 0.1))
    )
  )
)
data
#> $Winter
#> $Winter$Europe
#> $Winter$Europe$Rainfall
#> Date Rain
#> 1 2022-06-01 20
#> 2 2022-06-02 10
#> 3 2022-06-03 15
#>
#> $Winter$Europe$Temperature
#> Date Temp
#> 1 2022-06-01 0.0
#> 2 2022-06-02 0.5
#> 3 2022-06-03 0.8
#>
#>
#> $Winter$Asia
#> $Winter$Asia$Rainfall
#> Date Rain
#> 1 2022-06-01 30
#> 2 2022-06-02 35
#> 3 2022-06-03 34
#>
#> $Winter$Asia$Temperature
#> Date Temp
#> 1 2022-06-01 0.0
#> 2 2022-06-02 0.0
#> 3 2022-06-03 0.1
data %>%
  rapply(as.list) %>%
  enframe() %>%
  separate(name, into = c("Season", "Continent", "table", "name"), sep = "[.]") %>%
  mutate(name = name %>% str_remove("[0-9]+$")) %>%
  select(-table) %>%
  pivot_wider(values_fn = list) %>%
  unnest(Date) %>%
  unnest(Rain) %>%
  unnest(Temp)
#> # A tibble: 108 × 5
#> Season Continent Date Rain Temp
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Winter Europe 2022-06-01 20 0
#> 2 Winter Europe 2022-06-01 20 0.5
#> 3 Winter Europe 2022-06-01 20 0.8
#> 4 Winter Europe 2022-06-01 10 0
#> 5 Winter Europe 2022-06-01 10 0.5
#> 6 Winter Europe 2022-06-01 10 0.8
#> 7 Winter Europe 2022-06-01 15 0
#> 8 Winter Europe 2022-06-01 15 0.5
#> 9 Winter Europe 2022-06-01 15 0.8
#> 10 Winter Europe 2022-06-02 20 0
#> # … with 98 more rows
Created on 2022-06-28 by the reprex package (v2.0.0)
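Note that the triple unnest() crosses every Rain value with every Temp value within each group (rows 1-3 above repeat Date 2022-06-01 with all three Temp values). If you instead want Rain and Temp paired by day, here is a sketch with purrr that joins the two tables on Date and pulls the ID variables from the list names, assuming the structure shown above:
library(purrr)
library(dplyr)
weather_df <- imap_dfr(data, function(season, season_name) {
  imap_dfr(season, function(continent, continent_name) {
    # join the two measurement tables on Date so rows stay aligned by day
    full_join(continent$Rainfall, continent$Temperature, by = "Date") %>%
      mutate(Continent = continent_name)
  }) %>%
    mutate(Season = season_name)
})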

Is there any function that gives the changes between columns?

I have a df that looks like this.
head(dfhigh)
rownames 2015Y 2016Y 2017Y 2018Y 2019Y 2020Y 2021Y
1 Australia 29583.7403 48397.383 45220.323 68461.941 39218.044 20140.351 29773.188
2 Austria* 1294.5092 -8400.973 14926.164 5511.625 2912.795 -14962.963 5855.014
3 Belgium* -24013.3111 68177.596 -3057.153 27119.084 -9208.553 13881.481 22955.298
4 Canada 43852.7732 36061.859 22764.156 37653.521 50141.784 23174.006 59693.992
5 Chile* 20507.8407 12249.294 6128.716 7735.778 12499.238 8385.907 15251.538
6 Czech Republic 465.2137 9814.496 9517.948 11010.423 10108.914 9410.576 5805.084
I want to calculate the changes between years, so that instead of the raw values the table holds the percentage change from one year to the next (dropping 2015Y, which has no prior year).
Try this, using (current - previous) / previous * 100:
lst <- list()
nm <- names(dfhigh)[-1]
for (i in 1:(length(nm) - 1)) {
  lst[[i]] <- (dfhigh[[nm[i + 1]]] - dfhigh[[nm[i]]]) / dfhigh[[nm[i]]] * 100
}
ans <- do.call(cbind, lst)
colnames(ans) <- paste("ch_of", nm[-1])
ans
You can change the formula to calculate the percentage however you want.
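The same computation can also be vectorised by treating the year columns as a matrix, a sketch assuming every column except rownames is numeric:
m <- as.matrix(dfhigh[-1])
# percent change of each column relative to the previous one
pct <- 100 * (m[, -1] - m[, -ncol(m)]) / m[, -ncol(m)]
colnames(pct) <- paste("ch_of", colnames(m)[-1])
pct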
You could also use a tidyverse solution.
library(tidyverse)
dfhigh %>%
  pivot_longer(!rownames) %>%
  group_by(rownames) %>%
  mutate(value = 100 * value / lag(value) - 100) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value)
# # A tibble: 6 × 8
# rownames `2015Y` `2016Y` `2017Y` `2018Y` `2019Y` `2020Y` `2021Y`
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Australia NA 63.6 -6.56 51.4 -42.7 -48.6 47.8
# 2 Austria* NA -749. -278. -63.1 -47.2 -614. -139.
# 3 Belgium* NA -384. -104. -987. -134. -251. 65.4
# 4 Canada NA -17.8 -36.9 65.4 33.2 -53.8 158.
# 5 Chile* NA -40.3 -50.0 26.2 61.6 -32.9 81.9
# 6 CzechRepublic NA 2010. -3.02 15.7 -8.19 -6.91 -38.3
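Since the question asks to delete 2015Y (which is all NA after the transformation), one more select() finishes the job:
dfhigh %>%
  pivot_longer(!rownames) %>%
  group_by(rownames) %>%
  mutate(value = 100 * value / lag(value) - 100) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-`2015Y`)  # 2015Y has no previous year, so it is all NA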

How do I use dplyr::across with a multi-argument function on a grouped data frame?

I want to compute a weighted moving average across multiple columns, using the same weights for each column. The weighted moving average should be computed per group; that per-group requirement is what distinguishes this from simply using `dplyr::across` with a multi-argument function on an ungrouped data frame.
In the example below, the grouping should make the weighted moving average "reset" every year, yielding missing values for the first two observations of each year.
How do I make this work?
library(tidyverse)
weighted.filter <- function(x, wt, filter, ...) {
  filter <- filter / sum(filter)
  stats::filter(x * wt, filter, ...) / stats::filter(wt, filter, ...)
}

economics %>%
  group_by(year = lubridate::year(date)) %>%
  arrange(date) %>%
  mutate(across(
    c(pce, psavert, uempmed),
    list("moving_average_weighted" = weighted.filter),
    wt = pop, filter = rep(1, 3), sides = 1
  ))
#> Error: Problem with `mutate()` input `..1`.
#> x Input `..1` can't be recycled to size 12.
#> ℹ Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
#> ℹ Input `..1` must be size 12 or 1, not 6.
#> ℹ The error occurred in group 2: year = 1968.
Created on 2021-03-31 by the reprex package (v1.0.0)
Try
economics %>%
  group_by(year = lubridate::year(date)) %>%
  arrange(date) %>%
  mutate(across(
    c(pce, psavert, uempmed),
    list("moving_average_weighted" =
           ~ weighted.filter(., wt = pop, filter = rep(1, 3), sides = 1))
  ))
# # A tibble: 574 x 10
# # Groups: year [49]
# date pce pop psavert uempmed unemploy year pce_moving_average_w~ psavert_moving_avera~ uempmed_moving_avera~
# <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1967-07-01 507. 198712 12.6 4.5 2944 1967 NA NA NA
# 2 1967-08-01 510. 198911 12.6 4.7 2945 1967 NA NA NA
# 3 1967-09-01 516. 199113 11.9 4.6 2958 1967 511. 12.4 4.60
# 4 1967-10-01 512. 199311 12.9 4.9 3143 1967 513. 12.5 4.73
# 5 1967-11-01 517. 199498 12.8 4.7 3066 1967 515. 12.5 4.73
# 6 1967-12-01 525. 199657 11.8 4.8 3018 1967 518. 12.5 4.80
# 7 1968-01-01 531. 199808 11.7 5.1 2878 1968 NA NA NA
# 8 1968-02-01 534. 199920 12.3 4.5 3001 1968 NA NA NA
# 9 1968-03-01 544. 200056 11.7 4.1 2877 1968 536. 11.9 4.57
# 10 1968-04-01 544 200208 12.3 4.6 2709 1968 541. 12.1 4.40
# # ... with 564 more rows
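The difference seems to be that inside the formula, pop is evaluated in each group's data mask when the function runs, whereas the extra arguments passed through across()'s ... in the original call are not recycled per group, hence the size error. The same fix written with base R's anonymous-function shorthand (R >= 4.1), with weighted.filter() as defined in the question:
library(tidyverse)
economics %>%
  group_by(year = lubridate::year(date)) %>%
  arrange(date) %>%
  mutate(across(
    c(pce, psavert, uempmed),
    # \(x) is equivalent to the ~ formula; `pop` resolves per group
    list(moving_average_weighted =
           \(x) weighted.filter(x, wt = pop, filter = rep(1, 3), sides = 1))
  ))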

Why does R throw an error on iterative calculation

I'm looking at COVID-19 data to calculate estimates of the reproduction number R0.
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)
library(TTR)
# Get COVID cases, available from:
url <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
DoubleCOV <- read.csv(url, stringsAsFactors = FALSE)
names(DoubleCOV)[1] <- "countyFIPS"
DoubleCovid <- pivot_longer(DoubleCOV, cols = starts_with("X"),
                            values_to = "cases",
                            names_to = c("X", "date_infected"),
                            names_sep = "X") %>%
  mutate(infected = as.Date(date_infected, format = "%m.%d.%y"),
         countyFIPS = str_pad(as.character(countyFIPS), 5, pad = "0"))
# data is by county; summarise for the state of interest
stateData <- DoubleCovid %>% filter(State == "AL") %>% filter(cases != 0) %>%
  group_by(infected) %>% summarise(sum(cases)) %>%
  mutate(DaysSince = infected - min(infected))
names(stateData)[2] <- "cumCases"
# 3-day moving average to smooth a little
stateData <- stateData %>% mutate(MA = runMean(cumCases, 3))
# calculate doubling rate (DR) and then R0 = infectious period / doubling rate
for (j in 4:nrow(stateData)) {
  stateData$DR[j] <- log(2) / log(stateData$MA[j] / stateData$MA[j - 1])
  stateData$R0[j] <- 14 / stateData$DR[j]
}
CDplot <- stateData %>%
  ggplot(mapping = aes(x = as.numeric(DaysSince), y = R0)) +
  geom_line(color = "firebrick")
print(CDplot)
So in the above the state of interest is Alabama, hence filter(State == "AL") and this works.
But if I change the state to "NY" I get
Error in `$<-.data.frame`(`*tmp*`, "DR", value = c(NA, NA, NA, 0.733907206043719 :
replacement has 4 rows, data has 39
head(stateData) yields
infected cumCases DaysSince MA
<date> <int> <drtn> <dbl>
1 2020-03-02 1 0 days NA
2 2020-03-03 2 1 days NA
3 2020-03-04 11 2 days 4.67
4 2020-03-05 23 3 days 12
5 2020-03-06 25 4 days 19.7
6 2020-03-07 77 5 days 41.7
The moving average values in rows 3 and 4 (4.67 and 12) would yield a doubling rate of 0.734, which aligns with the value in the error message, value = c(NA, NA, NA, 0.733907206043719. But why does it throw an error after that?
Bonus question: I know loops are frowned upon in R... is there a way to get the moving average and R0 calculation without one?
You have to initialise the new variables before you can assign to them using the j index. Due to recycling, Alabama, which has 28 rows (divisible by 4), does not return an error, only warnings about uninitialised columns. New York, however, has 39 rows, which is not divisible by 4, so recycling fails and R returns an error. You shouldn't ignore the warnings; sometimes you can get away with it, but it's not a good idea.
Try this to see what R (you) is trying to do:
stateData[4]
You should get all rows of the 4th column, not the 4th row.
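A minimal demonstration of the recycling rule behind the error, using toy data frames (plain data.frame, not the covid data):
# assigning a length-4 column into a 28-row data frame recycles silently,
# because 28 is a multiple of 4 ...
df28 <- data.frame(x = 1:28)
df28$y <- c(NA, NA, NA, 0.73)  # recycled 7 times, no error
# ... but 39 is not a multiple of 4, so the same assignment fails:
df39 <- data.frame(x = 1:39)
# df39$y <- c(NA, NA, NA, 0.73)
# Error: replacement has 4 rows, data has 39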
Solution: initialise your DR and R0 columns first.
stateData$DR <- NA
stateData$R0 <- NA
for (j in 4:nrow(stateData)) {
  stateData$DR[j] <- log(2) / log(stateData$MA[j] / stateData$MA[j - 1])
  stateData$R0[j] <- 14 / stateData$DR[j]
}
For the bonus question, you can use lag in the same mutate as MA:
stateData <- stateData %>% mutate(MA = runMean(cumCases, 3),
                                  DR = log(2) / log(MA / lag(MA)),
                                  R0 = 14 / DR)
stateData
# A tibble: 28 x 6
infected cumCases DaysSince MA DR R0
<date> <int> <drtn> <dbl> <dbl> <dbl>
1 2020-03-13 5 0 days NA NA NA
2 2020-03-14 11 1 days NA NA NA
3 2020-03-15 22 2 days 12.7 NA NA
4 2020-03-16 29 3 days 20.7 1.42 9.89
5 2020-03-17 39 4 days 30 1.86 7.53
6 2020-03-18 51 5 days 39.7 2.48 5.64
7 2020-03-19 78 6 days 56 2.01 6.96
8 2020-03-20 106 7 days 78.3 2.07 6.78
9 2020-03-21 131 8 days 105 2.37 5.92
10 2020-03-22 167 9 days 135. 2.79 5.03
# ... with 18 more rows
I'm using Alabama's data.
