In package "openair",i want to use 'importAURN' find all 2021 AURN site data in one dataset.
i.e. merge all site data,or have other method find all 2021 AURN site data?
How can i do it.
This code can know all aurn site
importMeta(source = "aurn", all = FALSE)
Each site is identified by a code, e.g. site = "kc1":
kc1 <- importAURN(site = "kc1", year = 2021)
date pm2.5 site code
1 2021-01-01 00:00:00 30.4 London N. Kensington KC1
2 2021-01-01 01:00:00 55.8 London N. Kensington KC1
3 2021-01-01 02:00:00 28.3 London N. Kensington KC1
4 2021-01-01 03:00:00 15.6 London N. Kensington KC1
5 2021-01-01 04:00:00 19.8 London N. Kensington KC1
site = "AH":
AH <- importAURN(site = "AH", year = 2021)
date pm2.5 site code
1 2021-01-01 00:00:00 5.33 Aberdeen ABD
2 2021-01-01 01:00:00 3.07 Aberdeen ABD
3 2021-01-01 02:00:00 2.64 Aberdeen ABD
4 2021-01-01 03:00:00 2.43 Aberdeen ABD
5 2021-01-01 04:00:00 2.38 Aberdeen ABD
Maybe this serves your purpose:
dat <- importMeta(source = "aurn", all = FALSE)
imported <- lapply(dat$code, importAURN, year = 2021)
This code stores the metadata for all AURN sites, then applies importAURN() to each site code for the year 2021 and stores the results in a list named imported. Each element of the list holds the data for one site.
If you want to merge the data from all elements of the imported list, you can use rbind like this:
merged <- do.call(rbind, imported)
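The same list-then-bind pattern works with any function that returns one data frame per input. Here is a minimal, self-contained sketch with a toy importer standing in for importAURN (toy_import and its site codes are invented for illustration, so this runs without the openair package):

```r
# Toy stand-in for importAURN(): returns a small data frame per site code.
# (toy_import and the site codes are invented for this illustration.)
toy_import <- function(site, year) {
  data.frame(site = site, year = year, pm2.5 = c(10, 20))
}

codes <- c("KC1", "ABD")
imported <- lapply(codes, toy_import, year = 2021)  # list of data frames
merged <- do.call(rbind, imported)                  # one stacked data frame
nrow(merged)  # -> 4 (two rows per site)
```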
For example, to import and then merge the data from the first two sites:
first2sites <- lapply(dat$code[1:2], importAURN, year = 2021)
merged2 <- do.call(rbind, first2sites)
head(merged2)
# # A tibble: 6 x 12
# site code date nox no2 no o3 pm10 pm2.5 ws wd air_temp
# <chr> <fct> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Aberdeen ABD 2021-01-01 00:00:00 4.51 3.41 0.718 58.5 8.7 5.33 6 338. 3.9
# 2 Aberdeen ABD 2021-01-01 01:00:00 3.76 2.79 0.628 57.3 6 3.07 6.1 339. 3.9
# 3 Aberdeen ABD 2021-01-01 02:00:00 3.69 2.66 0.673 54.9 5.08 2.64 5.9 341. 4.4
# 4 Aberdeen ABD 2021-01-01 03:00:00 1.54 0.815 0.471 55.2 4.78 2.43 6.9 345. 4.1
# 5 Aberdeen ABD 2021-01-01 04:00:00 3.07 2.15 0.605 52.2 5.03 2.38 7 347. 3.7
# 6 Aberdeen ABD 2021-01-01 05:00:00 3.94 3.02 0.605 49.9 6.32 2.81 6.9 352. 3.5
tail(merged2)
# # A tibble: 6 x 12
# site code date nox no2 no o3 pm10 pm2.5 ws wd air_temp
# <chr> <fct> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Aberdeen Erroll Park ABD9 2021-12-31 18:00:00 175. 57.8 76.2 2.39 21.1 16.2 3.4 127. 5.4
# 2 Aberdeen Erroll Park ABD9 2021-12-31 19:00:00 143. 53.7 58.4 2.00 30.4 25.4 3.6 156. 5.9
# 3 Aberdeen Erroll Park ABD9 2021-12-31 20:00:00 175. 53.9 79.1 2.39 45.2 27.2 3.8 167. 6.4
# 4 Aberdeen Erroll Park ABD9 2021-12-31 21:00:00 177. 53.0 81.1 2.79 61.5 42.4 3.9 189. 6.9
# 5 Aberdeen Erroll Park ABD9 2021-12-31 22:00:00 215. 56.2 104. 2.79 41.0 29.6 4.4 194. 7.4
# 6 Aberdeen Erroll Park ABD9 2021-12-31 23:00:00 160. 43.7 75.9 8.98 25.3 20.6 5.9 200. 7.6
I am trying to find a way to shorten my code by dynamically passing variable names and functions for ascending and descending order. I can use desc() for descending, but couldn't find anything equivalent for ascending. Below is a reproducible example demonstrating my problem.
Here is the sample dataset:
library(dplyr)
set.seed(100)
data <- tibble(a = runif(20, min = 0, max = 100),
               b = runif(20, min = 0, max = 100),
               c = runif(20, min = 0, max = 100))
Dynamically passing a variable to percent_rank() in ascending order:
current_var <- "a" # dynamic variable name
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(!!sym(current_var)))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.263
#> 2 25.8 71.1 86.5 0.158
#> 3 55.2 53.8 77.8 0.684
#> 4 5.64 74.9 82.7 0
#> 5 46.9 42.0 60.3 0.526
#> 6 48.4 17.1 49.1 0.579
#> 7 81.2 77.0 78.0 0.947
#> 8 37.0 88.2 88.4 0.421
#> 9 54.7 54.9 20.8 0.632
#> 10 17.0 27.8 30.7 0.0526
#> 11 62.5 48.8 33.1 0.737
#> 12 88.2 92.9 19.9 1
#> 13 28.0 34.9 23.6 0.211
#> 14 39.8 95.4 27.5 0.474
#> 15 76.3 69.5 59.1 0.895
#> 16 66.9 88.9 25.3 0.789
#> 17 20.5 18.0 12.3 0.105
#> 18 35.8 62.9 23.0 0.316
#> 19 35.9 99.0 59.8 0.368
#> 20 69.0 13.0 21.1 0.842
Dynamically passing a variable to percent_rank() in descending order:
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(desc(!!sym(current_var))))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.737
#> 2 25.8 71.1 86.5 0.842
#> 3 55.2 53.8 77.8 0.316
#> 4 5.64 74.9 82.7 1
#> 5 46.9 42.0 60.3 0.474
#> 6 48.4 17.1 49.1 0.421
#> 7 81.2 77.0 78.0 0.0526
#> 8 37.0 88.2 88.4 0.579
#> 9 54.7 54.9 20.8 0.368
#> 10 17.0 27.8 30.7 0.947
#> 11 62.5 48.8 33.1 0.263
#> 12 88.2 92.9 19.9 0
#> 13 28.0 34.9 23.6 0.789
#> 14 39.8 95.4 27.5 0.526
#> 15 76.3 69.5 59.1 0.105
#> 16 66.9 88.9 25.3 0.211
#> 17 20.5 18.0 12.3 0.895
#> 18 35.8 62.9 23.0 0.684
#> 19 35.9 99.0 59.8 0.632
#> 20 69.0 13.0 21.1 0.158
How can I combine both into one statement? I can pass desc() for descending order, but couldn't find an explicit function for ascending.
rank_function <- desc # dynamic function for ranking
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(rank_function(!!sym(current_var))))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.737
#> 2 25.8 71.1 86.5 0.842
#> 3 55.2 53.8 77.8 0.316
#> 4 5.64 74.9 82.7 1
#> 5 46.9 42.0 60.3 0.474
#> 6 48.4 17.1 49.1 0.421
#> 7 81.2 77.0 78.0 0.0526
#> 8 37.0 88.2 88.4 0.579
#> 9 54.7 54.9 20.8 0.368
#> 10 17.0 27.8 30.7 0.947
#> 11 62.5 48.8 33.1 0.263
#> 12 88.2 92.9 19.9 0
#> 13 28.0 34.9 23.6 0.789
#> 14 39.8 95.4 27.5 0.526
#> 15 76.3 69.5 59.1 0.105
#> 16 66.9 88.9 25.3 0.211
#> 17 20.5 18.0 12.3 0.895
#> 18 35.8 62.9 23.0 0.684
#> 19 35.9 99.0 59.8 0.632
#> 20 69.0 13.0 21.1 0.158
Created on 2022-08-17 by the reprex package (v2.0.1)
You could write a function that returns its input:
rank_function <- function(x) x
Actually, this function is already defined in base R: identity.
rank_function <- identity
Also, you can explore the source code of desc:
desc
function (x) -xtfrm(x)
Apparently desc() is just the negation of xtfrm(), so you can use xtfrm() itself for ascending ordering.
rank_function <- xtfrm
From the help page for xtfrm(x):
A generic auxiliary function that produces a numeric vector which will sort in the same order as x.
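A quick base-R check (no dplyr required) that xtfrm() orders like its input while its negation orders in reverse, mirroring desc():

```r
x <- c(30, 10, 20)

# xtfrm() produces a numeric vector that sorts in the same order as x:
order(xtfrm(x))   # -> 2 3 1  (ascending, same as order(x))

# Negating it reverses the order, which is exactly what dplyr's desc() does:
order(-xtfrm(x))  # -> 1 3 2  (descending)
```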
I have weather data with NAs scattered throughout, and I want to calculate rolling means. I have been using the rollapplyr() function from zoo, but even though I include partial = TRUE, it still produces an NA whenever, for example, there is an NA among the 30 values to be averaged.
Here is the formula:
weather_rolled <- weather %>%
  mutate(maxt30 = rollapplyr(max_temp, 30, mean, partial = TRUE))
Here's my data:
A tibble: 7,160 x 11
station_name date max_temp avg_temp min_temp rainfall rh avg_wind_speed dew_point avg_bare_soil_temp total_solar_rad
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 VEGREVILLE 2019-01-01 0.9 -7.9 -16.6 1 81.7 20.2 -7.67 NA NA
2 VEGREVILLE 2019-01-02 5.5 1.5 -2.5 0 74.9 13.5 -1.57 NA NA
3 VEGREVILLE 2019-01-03 3.3 -0.9 -5 0.5 80.6 10.1 -3.18 NA NA
4 VEGREVILLE 2019-01-04 -1.1 -4.7 -8.2 5.2 92.1 8.67 -4.76 NA NA
5 VEGREVILLE 2019-01-05 -3.8 -6.5 -9.2 0.2 92.6 14.3 -6.81 NA NA
6 VEGREVILLE 2019-01-06 -3 -4.4 -5.9 0 91.1 16.2 -5.72 NA NA
7 VEGREVILLE 2019-01-07 -5.8 -12.2 -18.5 0 75.5 30.6 -16.9 NA NA
8 VEGREVILLE 2019-01-08 -17.4 -21.6 -25.7 1.2 67.8 16.1 -26.1 NA NA
9 VEGREVILLE 2019-01-09 -12.9 -15.1 -17.4 0.2 71.5 14.3 -17.7 NA NA
10 VEGREVILLE 2019-01-10 -13.2 -17.9 -22.5 0.4 80.2 3.38 -21.8 NA NA
# ... with 7,150 more rows
Essentially, whenever an NA appears midway through, it produces a run of NAs in the rolling mean. I want to still calculate the rolling mean within that window, ignoring the NAs. Does anyone know a way around this? I have been searching online for hours to no avail.
Thanks!
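One common workaround (a sketch, not from the original post) is to pass na.rm = TRUE through rollapplyr() to mean(), so each window averages only its non-missing values:

```r
library(zoo)

max_temp <- c(1, 2, NA, 4, 5)

# partial = TRUE allows short windows at the start;
# na.rm = TRUE is forwarded to mean(), which then skips NAs inside each window.
rollapplyr(max_temp, 3, mean, na.rm = TRUE, partial = TRUE)
# -> 1.0 1.5 1.5 3.0 4.5
```

In the pipeline above this would read mutate(maxt30 = rollapplyr(max_temp, 30, mean, na.rm = TRUE, partial = TRUE)). One caveat: a window consisting entirely of NAs yields NaN.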
I am looking to calculate the TDO3 value at every date during the year 2020. I have interpolated data sets of both temperature and dissolved oxygen in 0.25 meter increments from 1m - 22m below the surface between the dates of Jan-1-2020 and Dec-31-2020.
TDO3 is the temperature at the depth where dissolved oxygen is 3 mg/L. Below is a snippet of the merged data set.
> print(do_temp, n=85)
# A tibble: 31,110 x 4
date depth mean_temp mean_do
<date> <dbl> <dbl> <dbl>
1 2020-01-01 1 2.12 11.6
2 2020-01-01 1.25 2.19 11.5
3 2020-01-01 1.5 2.27 11.4
4 2020-01-01 1.75 2.34 11.3
5 2020-01-01 2 2.42 11.2
6 2020-01-01 2.25 2.40 11.2
7 2020-01-01 2.5 2.39 11.1
8 2020-01-01 2.75 2.38 11.1
9 2020-01-01 3 2.37 11.0
10 2020-01-01 3.25 2.41 11.0
11 2020-01-01 3.5 2.46 11.0
12 2020-01-01 3.75 2.50 10.9
13 2020-01-01 4 2.55 10.9
14 2020-01-01 4.25 2.54 10.9
15 2020-01-01 4.5 2.53 10.9
16 2020-01-01 4.75 2.52 11.0
17 2020-01-01 5 2.51 11.0
18 2020-01-01 5.25 2.50 11.0
19 2020-01-01 5.5 2.49 11.0
20 2020-01-01 5.75 2.49 11.1
21 2020-01-01 6 2.48 11.1
22 2020-01-01 6.25 2.49 10.9
23 2020-01-01 6.5 2.51 10.8
24 2020-01-01 6.75 2.52 10.7
25 2020-01-01 7 2.54 10.5
26 2020-01-01 7.25 2.55 10.4
27 2020-01-01 7.5 2.57 10.2
28 2020-01-01 7.75 2.58 10.1
29 2020-01-01 8 2.60 9.95
30 2020-01-01 8.25 2.63 10.1
31 2020-01-01 8.5 2.65 10.2
32 2020-01-01 8.75 2.68 10.3
33 2020-01-01 9 2.71 10.5
34 2020-01-01 9.25 2.69 10.6
35 2020-01-01 9.5 2.67 10.7
36 2020-01-01 9.75 2.65 10.9
37 2020-01-01 10 2.63 11.0
38 2020-01-01 10.2 2.65 10.8
39 2020-01-01 10.5 2.67 10.6
40 2020-01-01 10.8 2.69 10.3
41 2020-01-01 11 2.72 10.1
42 2020-01-01 11.2 2.75 9.89
43 2020-01-01 11.5 2.78 9.67
44 2020-01-01 11.8 2.81 9.44
45 2020-01-01 12 2.84 9.22
46 2020-01-01 12.2 2.83 9.39
47 2020-01-01 12.5 2.81 9.56
48 2020-01-01 12.8 2.80 9.74
49 2020-01-01 13 2.79 9.91
50 2020-01-01 13.2 2.80 10.1
51 2020-01-01 13.5 2.81 10.3
52 2020-01-01 13.8 2.82 10.4
53 2020-01-01 14 2.83 10.6
54 2020-01-01 14.2 2.86 10.5
55 2020-01-01 14.5 2.88 10.4
56 2020-01-01 14.8 2.91 10.2
57 2020-01-01 15 2.94 10.1
58 2020-01-01 15.2 2.95 10.0
59 2020-01-01 15.5 2.96 9.88
60 2020-01-01 15.8 2.97 9.76
61 2020-01-01 16 2.98 9.65
62 2020-01-01 16.2 2.99 9.53
63 2020-01-01 16.5 3.00 9.41
64 2020-01-01 16.8 3.01 9.30
65 2020-01-01 17 3.03 9.18
66 2020-01-01 17.2 3.05 9.06
67 2020-01-01 17.5 3.07 8.95
68 2020-01-01 17.8 3.09 8.83
69 2020-01-01 18 3.11 8.71
70 2020-01-01 18.2 3.13 8.47
71 2020-01-01 18.5 3.14 8.23
72 2020-01-01 18.8 3.16 7.98
73 2020-01-01 19 3.18 7.74
74 2020-01-01 19.2 3.18 7.50
75 2020-01-01 19.5 3.18 7.25
76 2020-01-01 19.8 3.18 7.01
77 2020-01-01 20 3.18 6.77
78 2020-01-01 20.2 3.18 5.94
79 2020-01-01 20.5 3.18 5.10
80 2020-01-01 20.8 3.18 4.27
81 2020-01-01 21 3.18 3.43
82 2020-01-01 21.2 3.22 2.60
83 2020-01-01 21.5 3.25 1.77
84 2020-01-01 21.8 3.29 0.934
85 2020-01-01 22 3.32 0.100
# ... with 31,025 more rows
https://github.com/TRobin82/WaterQuality
The above link will get you to the raw data.
What I am looking for is a data frame like the one below, but with 366 rows, one for each date during the year.
> TDO3
dates tdo3
1 2020-1-1 3.183500
2 2020-2-1 3.341188
3 2020-3-1 3.338625
4 2020-4-1 3.437000
5 2020-5-1 4.453310
6 2020-6-1 5.887560
7 2020-7-1 6.673700
8 2020-8-1 7.825672
9 2020-9-1 8.861190
10 2020-10-1 11.007972
11 2020-11-1 7.136880
12 2020-12-1 2.752500
However, a DO value of exactly 3 mg/L is not found in the interpolated DO data frame, so the function would need to find the closest value to 3 without going below it, then match the depth of that value with the temperature data frame to assign the proper temperature at that depth.
I am assuming the best route is a for-loop, but I'm not sold on the proper way to go about this.
Here's one way of doing it with tidyverse-style functions. Note that this code is reproducible: anyone can run it and should get the same answer. It's great that you showed us your data, but it's even better to post the output of dput(), because then people can load the data and start helping you immediately.
This code does the following:
Loads the data from the link you provided. (Since there were several data files, I had to guess which one you meant.)
Groups the observations by date.
Puts the observations in increasing order of mean_do.
Removes rows with values of mean_do that are strictly less than 3.
Takes the first ordered observation for each date (this will be the one with the lowest value of mean_do that is greater than or equal to 3).
Renames the column mean_temp to tdo3, since it is the temperature for that date at which the dissolved oxygen level was closest to 3 mg/L.
library(tidyverse)
do_temp <- read_csv("https://raw.githubusercontent.com/TRobin82/WaterQuality/main/DateDepthTempDo.csv") %>%
  select(-X1)
do_temp %>%
  group_by(date) %>%
  arrange(mean_do) %>%
  filter(mean_do >= 3) %>%
  slice_head(n = 1) %>%
  rename(tdo3 = mean_temp) %>%
  select(date, tdo3)
Here are the results. They're a bit different from the ones you posted, so I'm not sure if I've misunderstood you or if those were just illustrative and not real results.
# A tibble: 366 x 2
# Groups: date [366]
date tdo3
<date> <dbl>
1 2020-01-01 3.18
2 2020-01-02 3.18
3 2020-01-03 3.19
4 2020-01-04 3.21
5 2020-01-05 3.21
6 2020-01-06 3.21
7 2020-01-07 3.24
8 2020-01-08 3.28
9 2020-01-09 3.27
10 2020-01-10 3.28
# ... with 356 more rows
Let me know if you were looking for something else.
I have a large dataset ("bsa", drawn from a 23-year period) which includes a variable ("leftrigh") for "left-right" views (political orientation). I'd like to summarise how the cohorts change over time. For example, in 1994 the average value of this scale for people aged 45 was (say) 2.6; in 1995 the average value of this scale for people aged 46 was (say) 2.7 -- etc etc. I've created a year-of-birth variable ("yrbrn") to facilitate this.
I've successfully created the means:
bsa <- bsa %>% group_by(yrbrn, syear) %>% mutate(meanlr = mean(leftrigh))
Where I'm struggling is to summarise the means by year (of the survey) and age (at the time of the survey). If I could create an array (containing these means) organised by age x survey-year, I could see the change over time by inspecting the diagonals. But I have no clue how to do this -- my skills are very limited...
A tibble: 66,744 x 10
Groups: yrbrn [104]
Rsex Rage leftrigh OldWt syear yrbrn coh per agecat meanlr
1 1 [Male] 40 1 [left] 1.12 2017 1977 17 2017 [37,47) 2.61
2 2 [Female] 79 1.8 0.562 2017 1938 9 2017 [77,87) 2.50
3 2 [Female] 50 1.5 1.69 2017 1967 15 2017 [47,57) 2.59
4 1 [Male] 73 2 0.562 2017 1944 10 2017 [67,77) 2.57
5 2 [Female] 31 3 0.562 2017 1986 19 2017 [27,37) 2.56
6 1 [Male] 74 2.2 0.562 2017 1943 10 2017 [67,77) 2.50
7 2 [Female] 58 2 0.562 2017 1959 13 2017 [57,67) 2.56
8 1 [Male] 59 1.2 0.562 2017 1958 13 2017 [57,67) 2.53
9 2 [Female] 19 4 1.69 2017 1998 21 2017 [17,27) 2.46
Possible format for presenting this information to see change over time:
1994 1995 1996 1997 1998 1999 2000
18
19
20
21
22
23
24
25
etc.
You can group_by() both year of birth and survey year at the same time:
# Setup (make reproducible data)
n <- 10000
df1 <- data.frame(
  yrbrn    = sample(1920:1995, size = n, replace = TRUE),
  Syear    = sample(2005:2015, size = n, replace = TRUE),
  leftrigh = sample(seq(0, 5, 0.1), size = n, replace = TRUE))
# Solution
df1 %>%
  group_by(yrbrn, Syear) %>%
  summarise(meanLR = mean(leftrigh)) %>%
  spread(Syear, meanLR)
Produces the following:
# A tibble: 76 x 12
# Groups: yrbrn [76]
yrbrn `2005` `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015`
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1920 3.41 1.68 2.26 2.66 3.21 2.59 2.24 2.39 2.41 2.55 3.28
2 1921 2.43 2.71 2.74 2.32 2.24 1.89 2.85 3.27 2.53 1.82 2.65
3 1922 2.28 3.02 1.39 2.33 3.25 2.09 2.35 1.83 2.09 2.57 1.95
4 1923 3.53 3.72 2.87 2.05 2.94 1.99 2.8 2.88 2.62 3.14 2.28
5 1924 1.77 2.17 2.71 2.18 2.71 2.34 2.29 1.94 2.7 2.1 1.87
6 1925 1.83 3.01 2.48 2.54 2.74 2.11 2.35 2.65 2.57 1.82 2.39
7 1926 2.43 3.2 2.53 2.64 2.12 2.71 1.49 2.28 2.4 2.73 2.18
8 1927 1.33 2.83 2.26 2.82 2.34 2.09 2.3 2.66 3.09 2.2 2.27
9 1928 2.34 2.02 2.1 2.88 2.14 2.44 2.58 1.67 2.57 3.11 2.93
10 1929 2.31 2.29 2.93 2.08 2.11 2.47 2.39 1.76 3.09 3 2.9
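As an aside, spread() has been superseded in tidyr by pivot_wider(); an equivalent pipeline (regenerating toy data so the snippet is self-contained) would be:

```r
library(dplyr)
library(tidyr)

# Toy data in the same shape as the question's df1
set.seed(1)
n <- 100
df1 <- data.frame(
  yrbrn    = sample(1920:1925, size = n, replace = TRUE),
  Syear    = sample(2005:2007, size = n, replace = TRUE),
  leftrigh = sample(seq(0, 5, 0.1), size = n, replace = TRUE))

df1 %>%
  group_by(yrbrn, Syear) %>%
  summarise(meanLR = mean(leftrigh), .groups = "drop") %>%
  pivot_wider(names_from = Syear, values_from = meanLR)
# one row per yrbrn, one column per Syear
```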
I have the dataset below:
> head(GLM_df)
# A tibble: 6 x 7
# Groups: hour [6]
hour Feeding Foraging Standing ID Area Feeding_Foraging
<int> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
1 0 3.61 23.2 1 41361 Seronera 26.8
2 1 2.85 24.2 1 41361 Seronera 27.0
3 2 2.5 24.3 2 41361 Seronera 26.8
4 3 6.92 18.6 3.89 41361 Seronera 25.6
5 4 7.5 17.6 3.78 41361 Seronera 25.1
6 5 7.26 19.6 2.45 41361 Seronera 26.8
And I would like to round off the numbers in the columns Standing and Feeding_Foraging. I'm using as.integer() as follows:
> GLM_df$Standing<-as.integer(GLM_df$Standing)
> GLM_df$Feeding_Foraging<-as.integer(GLM_df$Feeding_Foraging)
> head(GLM_df)
# A tibble: 6 x 7
# Groups: hour [6]
hour Feeding Foraging Standing ID Area Feeding_Foraging
<int> <dbl> <dbl> <int> <chr> <chr> <int>
1 0 3.61 23.2 1 41361 Seronera 26
2 1 2.85 24.2 1 41361 Seronera 27
3 2 2.5 24.3 2 41361 Seronera 26
4 3 6.92 18.6 3 41361 Seronera 25
5 4 7.5 17.6 3 41361 Seronera 25
6 5 7.26 19.6 2 41361 Seronera 26
However, this rounds 26.8 down to 26, and I would like 26.8 to be rounded up to 27. On the other hand, I would like values with a decimal component below 0.5 (e.g. 25.1) to be rounded down, as as.integer() already does (e.g. to 25).
Is there a function I can use for that, or do I need to write my own code?
Any input is appreciated!
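For what it's worth (a sketch from outside the thread), base R's round() behaves as described for these values; wrap it in as.integer() to keep the integer type. Note that round() uses round-half-to-even on exact .5 ties, so for strict half-up rounding you can use floor(x + 0.5):

```r
x <- c(26.8, 25.1, 26.5)

as.integer(round(x))        # -> 27 25 26  (26.5 rounds to the even integer)
as.integer(floor(x + 0.5))  # -> 27 25 27  (strict round-half-up)
```

Applied to the question's data, that would be e.g. GLM_df$Feeding_Foraging <- as.integer(round(GLM_df$Feeding_Foraging)).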