dplyr - programming dynamic variable & function name - ascending & descending - r

I am trying to find a way to shorten my code by dynamically choosing the variable and the ranking function for ascending and descending order. I can do this for descending order with desc, but I couldn't find an equivalent function for ascending order. Below is a reproducible example that demonstrates the problem.
Here is the sample dataset:
library(dplyr)
set.seed(100)
data <- tibble(a = runif(20, min = 0, max = 100),
               b = runif(20, min = 0, max = 100),
               c = runif(20, min = 0, max = 100))
Dynamically passing a variable and computing its percent rank in ascending order:
current_var <- "a" # dynamic variable name
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(!!sym(current_var)))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.263
#> 2 25.8 71.1 86.5 0.158
#> 3 55.2 53.8 77.8 0.684
#> 4 5.64 74.9 82.7 0
#> 5 46.9 42.0 60.3 0.526
#> 6 48.4 17.1 49.1 0.579
#> 7 81.2 77.0 78.0 0.947
#> 8 37.0 88.2 88.4 0.421
#> 9 54.7 54.9 20.8 0.632
#> 10 17.0 27.8 30.7 0.0526
#> 11 62.5 48.8 33.1 0.737
#> 12 88.2 92.9 19.9 1
#> 13 28.0 34.9 23.6 0.211
#> 14 39.8 95.4 27.5 0.474
#> 15 76.3 69.5 59.1 0.895
#> 16 66.9 88.9 25.3 0.789
#> 17 20.5 18.0 12.3 0.105
#> 18 35.8 62.9 23.0 0.316
#> 19 35.9 99.0 59.8 0.368
#> 20 69.0 13.0 21.1 0.842
Dynamically passing a variable and computing its percent rank in descending order:
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(desc(!!sym(current_var))))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.737
#> 2 25.8 71.1 86.5 0.842
#> 3 55.2 53.8 77.8 0.316
#> 4 5.64 74.9 82.7 1
#> 5 46.9 42.0 60.3 0.474
#> 6 48.4 17.1 49.1 0.421
#> 7 81.2 77.0 78.0 0.0526
#> 8 37.0 88.2 88.4 0.579
#> 9 54.7 54.9 20.8 0.368
#> 10 17.0 27.8 30.7 0.947
#> 11 62.5 48.8 33.1 0.263
#> 12 88.2 92.9 19.9 0
#> 13 28.0 34.9 23.6 0.789
#> 14 39.8 95.4 27.5 0.526
#> 15 76.3 69.5 59.1 0.105
#> 16 66.9 88.9 25.3 0.211
#> 17 20.5 18.0 12.3 0.895
#> 18 35.8 62.9 23.0 0.684
#> 19 35.9 99.0 59.8 0.632
#> 20 69.0 13.0 21.1 0.158
How can I combine both into one statement? I can do it for desc, but I couldn't find an explicit function for ascending order.
rank_function <- desc # dynamic function for ranking
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(rank_function(!!sym(current_var))))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.737
#> 2 25.8 71.1 86.5 0.842
#> 3 55.2 53.8 77.8 0.316
#> 4 5.64 74.9 82.7 1
#> 5 46.9 42.0 60.3 0.474
#> 6 48.4 17.1 49.1 0.421
#> 7 81.2 77.0 78.0 0.0526
#> 8 37.0 88.2 88.4 0.579
#> 9 54.7 54.9 20.8 0.368
#> 10 17.0 27.8 30.7 0.947
#> 11 62.5 48.8 33.1 0.263
#> 12 88.2 92.9 19.9 0
#> 13 28.0 34.9 23.6 0.789
#> 14 39.8 95.4 27.5 0.526
#> 15 76.3 69.5 59.1 0.105
#> 16 66.9 88.9 25.3 0.211
#> 17 20.5 18.0 12.3 0.895
#> 18 35.8 62.9 23.0 0.684
#> 19 35.9 99.0 59.8 0.632
#> 20 69.0 13.0 21.1 0.158
Created on 2022-08-17 by the reprex package (v2.0.1)

You could define a function that returns its input unchanged:
rank_function <- function(x) x
In fact, base R already provides this function as identity:
rank_function <- identity
Also, you can explore the source code of desc:
desc
function (x) -xtfrm(x)
So desc simply negates the result of xtfrm, which means you can use xtfrm itself for ascending order:
rank_function <- xtfrm
From the help page of xtfrm(x):
A generic auxiliary function that produces a numeric vector which will sort in the same order as x.
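As a minimal sketch of how the two directions could be wrapped in one call, the helper below picks desc or identity from a flag. The function name add_percent_rank and its descending argument are hypothetical and not part of dplyr; .data[[var]] is used here in place of !!sym(var).

```r
library(dplyr)

# Hypothetical helper: add a percent-rank column for `var`,
# ascending by default, descending when descending = TRUE.
add_percent_rank <- function(data, var, descending = FALSE) {
  # identity leaves the values untouched (ascending); desc negates them
  rank_function <- if (descending) desc else identity
  data %>%
    mutate("percent_rank_{var}" := percent_rank(rank_function(.data[[var]])))
}

toy <- tibble(a = c(3, 1, 2))
asc <- add_percent_rank(toy, "a")
dsc <- add_percent_rank(toy, "a", descending = TRUE)
```

Without ties, the ascending and descending percent ranks of a value add up to 1, which matches the two outputs in the question (for example 0.263 and 0.737 in row 1).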

Related

Pivot longer is showing transformed dataframe

I tried to convert a dataset to long format, but I am getting a transposed-looking data frame: x and y end up as values of set, and the set names become the columns. I don't know how to fix this. I want x and y to be the column names. Any help?
library(pacman)
p_load(tidyverse, purrr, datasauRus)
datasaurus_dozen_wide %>%
  pivot_longer(everything(),
               names_to = c(".value", "set"),
               names_pattern = "(.*)_(.)")
#> # A tibble: 284 × 14
#> set away bullseye circle dino dots h_lines high_…¹ slant…² slant…³ star
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 x 32.3 51.2 56.0 55.4 51.1 53.4 57.6 52.9 47.7 58.2
#> 2 y 61.4 83.3 79.3 97.2 90.9 90.2 83.9 97.3 95.2 91.9
#> 3 x 53.4 59.0 50.0 51.5 50.5 52.8 51.3 59.0 44.6 58.2
#> 4 y 26.2 85.5 79.0 96.0 89.1 90.1 82.8 93.6 93.1 92.2
#> 5 x 63.9 51.9 51.3 46.2 50.2 47.1 50.8 56.4 43.9 58.7
#> 6 y 30.8 85.8 82.4 94.5 85.5 90.5 76.8 96.3 94.1 90.3
#> 7 x 70.3 48.2 51.2 42.8 50.1 42.4 37.0 37.8 41.6 57.3
#> 8 y 82.5 85.0 79.2 91.4 83.1 89.5 82.0 94.4 90.3 89.9
#> 9 x 34.1 41.7 44.4 40.8 50.6 42.7 42.9 39.9 49.2 58.1
#> 10 y 45.7 84.0 78.2 88.3 82.9 90.4 80.2 90.6 96.6 92.0
#> # … with 274 more rows, 3 more variables: v_lines <dbl>, wide_lines <dbl>,
#> # x_shape <dbl>, and abbreviated variable names ¹​high_lines, ²​slant_down,
#> # ³​slant_up
Created on 2022-10-10 with reprex v2.0.2
You could achieve your desired result by simply switching ".value" and "set" in the names_to argument:
library(tidyr)
library(datasauRus)
datasaurus_dozen_long <- datasaurus_dozen_wide %>%
  pivot_longer(everything(),
               names_to = c("set", ".value"),
               names_pattern = "(.*)_(.)")
head(datasaurus_dozen_long)
#> # A tibble: 6 × 3
#> set x y
#> <chr> <dbl> <dbl>
#> 1 away 32.3 61.4
#> 2 bullseye 51.2 83.3
#> 3 circle 56.0 79.3
#> 4 dino 55.4 97.2
#> 5 dots 51.1 90.9
#> 6 h_lines 53.4 90.2
library(ggplot2)
ggplot(datasaurus_dozen_long, aes(x, y)) +
  geom_point() +
  facet_wrap(~set)
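To see why the order in names_to matters, here is a small sketch on a made-up wide table with the same naming scheme (set name, underscore, x or y). The tibble toy_wide and its values are illustrative assumptions, not the datasauRus data. The first capture group of names_pattern is assigned to the first entry of names_to, so ".value" has to sit in the position of the group that captures x or y.

```r
library(tidyr)
library(tibble)

# Made-up wide data: columns are named <set>_<axis>
toy_wide <- tibble(away_x = c(1, 2), away_y = c(3, 4),
                   dino_x = c(5, 6), dino_y = c(7, 8))

# Group 1 (set name) goes to "set", group 2 (x or y) becomes the column names
toy_long <- toy_wide %>%
  pivot_longer(everything(),
               names_to = c("set", ".value"),
               names_pattern = "(.*)_(.)")

# Swapped order: the set names become columns and x/y become values of "set"
toy_swapped <- toy_wide %>%
  pivot_longer(everything(),
               names_to = c(".value", "set"),
               names_pattern = "(.*)_(.)")

names(toy_long)
names(toy_swapped)
```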

Taylor Diagrams by Group in R (openair)

I'm trying to create a Taylor diagram to show agreement between observations and model output. The openair package lets you differentiate by a group, which I would like to do for each site.
This is the code that I'm using:
TaylorDiagram(month_join, obs = "temp", mod = "temp_surf", group = "dataset_id", normalise = TRUE, cex = 1)
The observation variable is temp, the model variable is temp_surf, and the site variable that I want to use as the group is dataset_id.
When I do this, though there are 17 different datasets, they are binned into four groups. I can't find any help online about this. The function documentation says that for the group argument, "The total number of models compared will be equal to the number of unique values of group". I have 17 unique values in the group but they are automatically binned into 4.
[Image: Taylor diagram with 4 groups instead of 17]
[Edit: first 20 rows of data from month_join]
# A tibble: 20 × 14
# Groups: dataset_id [2]
dataset_id month temp_surf temp_mid temp_bot ph_surf ph_mid ph_bot do_surf do_mid do_bot temp ph do
<dbl> <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 1 13.4 13.3 13.2 8.01 7.99 7.97 244. 232. 220. 13.3 8.00 NaN
2 3 2 13.3 13.2 13.0 8.01 7.98 7.96 245. 232. 218. 12.5 7.99 NaN
3 3 3 12.9 12.7 12.5 7.97 7.94 7.91 233. 216. 199. 12.7 8.04 NaN
4 3 4 12.6 12.4 12.2 7.93 7.91 7.89 223. 207. 190. NaN NaN NaN
5 3 5 12.9 12.7 12.4 7.93 7.91 7.89 223. 208. 193. NaN NaN NaN
6 3 6 13.5 13.2 12.9 7.94 7.92 7.90 226. 212. 198. 15.1 8.04 NaN
7 3 7 14.3 13.9 13.5 7.97 7.95 7.94 236. 224. 212. 16.0 8.09 NaN
8 3 8 14.4 14.1 13.8 7.98 7.97 7.95 238. 228. 217. 16.6 8.06 NaN
9 3 9 14.8 14.5 14.1 8.00 7.99 7.97 244. 235. 227. 16.7 8.05 NaN
10 3 10 14.8 14.4 14.1 8.00 7.98 7.96 243. 233. 222. 16.2 8.05 NaN
11 3 11 14.3 14.0 13.7 7.99 7.96 7.94 237. 224. 211. 15.5 8.05 NaN
12 3 12 13.6 13.4 13.3 7.99 7.97 7.94 237. 225. 213. 14.4 8.05 NaN
13 6 1 14.3 9.48 4.70 8.07 7.84 7.62 261. 143. 24.7 13.6 NaN NaN
14 6 2 14.2 9.42 4.68 8.07 7.84 7.62 264. 144. 24.4 13.5 NaN NaN
15 6 3 14.5 9.61 4.67 8.07 7.84 7.61 266. 145. 24.2 14.0 NaN NaN
16 6 4 15.0 9.86 4.68 8.06 7.84 7.61 264. 144. 24.0 14.3 NaN NaN
17 6 5 16.0 10.4 4.68 8.05 7.83 7.61 262. 143. 24.0 16.4 NaN NaN
18 6 6 17.3 11.0 4.68 8.04 7.83 7.61 257. 141. 23.9 17.6 NaN NaN
19 6 7 18.8 11.7 4.71 8.03 7.82 7.61 251. 138. 24.2 19.3 NaN NaN
20 6 8 19.2 12.0 4.76 8.03 7.82 7.61 248. 136. 24.7 NA NA NA
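One possible explanation, which is an assumption here and not confirmed in this thread: dataset_id is stored as a number (<dbl>), and openair may treat a numeric grouping variable as continuous and split it into four quantile bins instead of using each unique value. The base R sketch below only illustrates that effect with cut and quantile on made-up IDs; it does not call openair.

```r
# Made-up stand-in for month_join$dataset_id: 17 numeric site IDs, 12 months each
dataset_id <- rep(1:17, each = 12)

# Quantile binning of a numeric variable yields four intervals
binned <- cut(dataset_id,
              breaks = quantile(dataset_id, probs = seq(0, 1, 0.25)),
              include.lowest = TRUE)
nlevels(binned)

# Treating the IDs as categories keeps all 17 groups
as_category <- factor(dataset_id)
nlevels(as_category)

# If the assumption holds, converting the column before plotting may help (not run):
# month_join <- dplyr::mutate(dplyr::ungroup(month_join), dataset_id = factor(dataset_id))
```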

Add -0.5 to a value below 0 and add 0.5 to value above 0 in r

This may be a strange question. I have a data frame like this:
Station Mean_length Diff
1 AMEL 28.1 -2.91
2 AMRU 21.1 -9.90
3 BALG 31.0 0
4 BORK 30.1 -0.921
5 BUSU 22.6 -8.38
6 CADZ 28.5 2.46
7 DOLL 27.9 -3.07
8 EGMO 28.3 -2.69
9 EIER 30.8 0.233
10 FANO 23.1 -7.89
From the column "Diff" I want to create a new column that adds -0.5 to values below 0 and adds 0.5 to values above 0.
The new data frame should look like this:
Station Mean_length Diff Diff05
1 AMEL 28.1 -2.91 -3.41 (-0.5)
2 AMRU 21.1 -9.90 -10.40 (-0.5)
3 BALG 31.0 0 0.5 (+0.5)
4 BORK 30.1 -0.921 -1.421 (-0.5)
5 BUSU 22.6 -8.38 -8.88 (-0.5)
6 CADZ 28.5 2.46 2.96 (+0.5)
7 DOLL 27.9 -3.07 -3.57 (-0.5)
8 EGMO 28.3 -2.69 -3.19 (-0.5)
9 EIER 30.8 0.233 0.733 (+0.5)
10 FANO 23.1 -7.89 -8.39 (-0.5)
How can I tackle this? Is this possible in dplyr, perhaps with the 'ifelse' function, by recognizing values that have a '-' in front of them?
Thank you in advance!
Another way:
df$Diff05 <- df$Diff + 0.5 * sign(df$Diff)
Station Mean_length Diff Diff05
1 AMEL 28.1 -2.910 -3.410
2 AMRU 21.1 -9.900 -10.400
3 BALG 31.0 0.000 0.000
4 BORK 30.1 -0.921 -1.421
5 BUSU 22.6 -8.380 -8.880
6 CADZ 28.5 2.460 2.960
7 DOLL 27.9 -3.070 -3.570
8 EGMO 28.3 -2.690 -3.190
9 EIER 30.8 0.233 0.733
10 FANO 23.1 -7.890 -8.390
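You could also use df$Diff + (df$Diff > 0) - 0.5. Note that this maps a Diff of exactly 0 to -0.5; use >= instead of > if zero should become 0.5 as in the desired output.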
Does this work:
library(dplyr)
df %>% mutate(Diff05 = if_else(Diff < 0, Diff - 0.5, Diff + 0.5))
# A tibble: 10 x 4
station Mean_length Diff Diff05
<chr> <dbl> <dbl> <dbl>
1 AMEL 28.1 -2.91 -3.41
2 AMRU 21.1 -9.9 -10.4
3 BALG 31 0 0.5
4 BORK 30.1 -0.921 -1.42
5 BUSU 22.6 -8.38 -8.88
6 CADZ 28.5 2.46 2.96
7 DOLL 27.9 -3.07 -3.57
8 EGMO 28.3 -2.69 -3.19
9 EIER 30.8 0.233 0.733
10 FANO 23.1 -7.89 -8.39
The logical way
df$Diff05 <- ifelse(test = df$Diff < 0, yes = df$Diff - 0.5, no = df$Diff + 0.5)
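The approaches above agree for nonzero values but differ at exactly 0, which matters here because the desired output maps 0 to 0.5. A small base R sketch on three made-up values (one negative, zero, one positive) shows the difference; the variable names are illustrative.

```r
# Three made-up values covering the negative, zero, and positive cases
df <- data.frame(Diff = c(-2.91, 0, 2.46))

# sign() returns 0 for 0, so zero is left unchanged
by_sign <- df$Diff + 0.5 * sign(df$Diff)

# Everything that is not below 0 gets +0.5, so zero becomes 0.5
by_ifelse <- ifelse(df$Diff < 0, df$Diff - 0.5, df$Diff + 0.5)

# (Diff > 0) is FALSE for 0, so zero becomes -0.5
by_logical <- df$Diff + (df$Diff > 0) - 0.5

data.frame(Diff = df$Diff, by_sign, by_ifelse, by_logical)
```

Only the ifelse / if_else version reproduces the 0 to 0.5 row from the question.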

Tidyverse: Error in as.matrix : attempt to apply non-function

I am trying to calculate SPEI values using the SPEI package and the Hargreaves method. I want to automate the process so that I can calculate SPEI for all 6 stations in one go and save the results to a new variable, spei.3.
SPEI is calculated in three steps. First, we calculate PET values (spei_pet). PET is then combined with the precipitation value to give the climatic water balance (spei_cwbal). Finally, the water balance is passed to the spei function from the package of the same name, together with a scale, to calculate the SPEI values.
I am new to R and very new to the tidyverse, which I have read is easier to work with. I wrote the code below for this task, but I am surely missing something (or maybe many things) because it throws an error. Please help me identify the error in my code and find a solution.
library(tidyverse)
library(SPEI)
file_path = "I:/Proj/Excel sheets - climate/SPI/heatmap/spei_forecast_data.xlsx"
file_forecast = openxlsx::read.xlsx(file_path)
##spei calculation
spei.scale = c(3, 6, 9, 12, 15, 24)
stations = c(1:3, 5:7)
lat = c(23.29, 23.08, 22.95, 22.62, 22.43, 22.40)
lat.fn = function(i) {
  if (i <= 3)
    lat.fn = lat[i]
  else if (i == 5)
    lat.fn = lat[4]
  else if (i == 6)
    lat.fn = lat[5]
  else if (i == 7)
    lat.fn = lat[6]
}
for (i in stations) {
  file_forecast %>%
    mutate(spei_pet[i] <- hargreaves(Tmin = file_forecast$paste("tmin", i),
                                     Tmax = file_forecast$paste("tmax", i),
                                     Pre = file_forecast$paste("p", i),
                                     lat = lat.fn[i])) %>%
    mutate(spei_cwbal[i] <- spei_pet[[i]] - file_forecast$paste("p", i)) %>%
    mutate(spei.3[i] <- spei(spei_cwbal[[i]], scale = 3))
}
It throws this error:
Error in as.matrix(Tmin) : attempt to apply non-function
lat.fn[i] also throws an error, which goes away if I leave out i. But I need some kind of function so that lat.fn takes a different value depending on i.
Error in lat.fn[i] : object of type 'closure' is not subsettable
Thanks.
Edit: The data is in the form of a data.frame. I converted it into a tibble to give an idea of what it looks like.
> file_forecast
# A tibble: 960 x 20
Month p7 p6 p5 p3 p2 p1 tmax7 tmax6 tmax5 tmax3 tmax2 tmax1 tmin7 tmin6 tmin5 tmin3 tmin2 tmin1
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Jan 0.162 0.185 0.293 0.436 0.529 0.658 26.4 26.5 26.2 25.9 25.7 24.9 9.57 9.75 10.0 10.4 9.94 9.77
2 Feb 0.207 0.305 0.250 0.260 0.240 0.186 32.2 32.2 32.1 31.9 31.8 30.9 12.4 12.7 12.7 13.0 12.2 11.9
3 Mar 0.511 0.650 0.602 0.636 0.625 0.501 37.3 37.1 37.1 37.0 36.9 36.1 18.7 19.3 18.3 18.0 17.3 16.9
4 Apr 0.976 1.12 1.05 1.12 1.17 1.16 39.5 39.2 39.6 39.5 39.5 38.8 22.8 23.2 22.5 22.2 21.7 20.8
5 May 3.86 4.12 3.76 4.29 4.15 3.84 38.2 37.9 38.3 38.1 38.2 37.6 25.1 25.4 24.9 24.7 24.5 23.8
6 Jun 7.31 8.27 7.20 8.51 9.14 8.76 38.0 37.6 38.1 38.0 38.0 37.7 27.2 27.3 26.9 26.7 26.6 26.1
7 Jul 13.9 15.6 13.2 17.0 19.1 17.8 33.9 33.6 34.0 33.9 33.8 33.5 26.8 26.9 26.6 26.5 26.4 26.0
8 Aug 15.2 17.2 14.4 18.6 20.1 18.4 32.6 32.4 32.7 32.4 32.3 32.0 26.2 26.4 26.1 25.9 25.9 25.4
9 Sep 11.4 11.9 10.5 12.9 13.2 13.1 31.9 31.9 31.8 31.5 31.5 30.9 24.4 24.6 24.3 24.3 24.3 23.7
10 Oct 5.19 5.76 4.81 5.40 5.44 5.04 29.8 30.0 29.6 29.3 29.3 28.6 20.9 21.1 20.8 20.9 20.8 20.2
# ... with 950 more rows, and 1 more variable: year <dbl>
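Both error messages can be traced to syntax, and a minimal base R sketch of the two lookups may help. file_forecast$paste("tmin", i) is parsed as "take the column named paste, then call it"; that column does not exist, so R tries to call NULL, hence "attempt to apply non-function". lat.fn[i] subsets a function with square brackets, whereas a function is called with parentheses, lat.fn(i). The toy data frame and the name lat_by_station below are illustrative assumptions; the SPEI functions are not called here.

```r
# Toy stand-in for file_forecast with the same column-naming scheme (made-up values)
file_forecast <- data.frame(
  tmin1 = c(9.8, 11.9),  tmax1 = c(24.9, 30.9), p1 = c(0.66, 0.19),
  tmin5 = c(10.0, 12.7), tmax5 = c(26.2, 32.1), p5 = c(0.29, 0.25)
)

# There is no column called "paste", so this is NULL and cannot be called
is.null(file_forecast$paste)

# Named vector instead of lat.fn: station number -> latitude
stations <- c(1:3, 5:7)
lat <- c(23.29, 23.08, 22.95, 22.62, 22.43, 22.40)
lat_by_station <- setNames(lat, stations)

i <- 5
# Build the column name as a string (paste0 adds no space) and index with [[ ]]
tmin_i <- file_forecast[[paste0("tmin", i)]]
tmax_i <- file_forecast[[paste0("tmax", i)]]
lat_i  <- lat_by_station[[as.character(i)]]
```

Inside the loop, vectors obtained this way could then be passed on to the PET and SPEI calls for each station.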

Removed rows containing missing values ggmap

So I'm trying to make a 2d latitude-longitude map with the package ggmap and I'm encountering a problem:
dataset:
slddataset
# A tibble: 382 x 17
station year jd sl_pa sst sss ssf depth sbt sbs sbf gravel sand silt clay lat long
<int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4101 2014 142 0 11.7 25.0 0.419 39.9 4.95 31.9 0.320 2.36 97.5 0.110 0.0300 42.2 70.3
2 4102 2014 142 0 11.3 37.8 0.509 27.6 5.03 31.9 0.372 0.390 99.5 0.0700 0.0200 42.2 70.3
3 4104 2014 142 0 11.3 41.2 0.803 24.9 5.50 31.7 0.556 0.700 99.2 0.0800 0.0700 42.2 70.3
4 4105 2014 142 0 10.6 30.8 0.808 28.3 5.14 31.9 0.596 6.83 93.1 0.0700 0.0300 42.2 70.2
5 4106 2014 142 0 10.5 30.7 0.693 35.6 4.93 32.1 0.887 10.8 89.1 0.0500 0.0700 42.2 70.2
6 4107 2014 142 0 11.0 30.7 0.724 41.3 4.44 32.3 0.684 11.3 88.5 0.110 0.120 42.2 70.2
7 4108 2014 142 0 10.3 30.8 0.741 44.4 4.28 32.5 0.340 4.77 95.0 0.110 0.100 42.2 70.1
8 4109 2014 142 0 9.97 30.9 0.980 44.3 4.32 32.4 0.398 7.80 92.0 0.110 0.110 42.2 70.1
9 4110 2014 142 0 10.9 30.7 0.794 41.2 4.60 32.3 0.592 10.3 89.5 0.100 0.0900 42.2 70.2
10 4113 2014 143 0 12.0 30.5 0.684 32.2 4.98 31.9 0.336 0.320 99.6 0.0600 0.0300 42.2 70.3
# ... with 372 more rows
error:
library(ggmap)
stellwagen<-ggmap(get_googlemap(center="stellwagen bank",zoom=7,maptype = "satellite"))
stellwagen + geom_point(aes(x=long, y=lat, color=sl_pa),data=slddataset)
Warning message: Removed 382 rows containing missing values (geom_point).
Anyone have any ideas?
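I think the longitudes in slddataset have the wrong sign. Stellwagen Bank lies in the western hemisphere, so they should all be negative. With positive values every point falls outside the map extent, which is why all 382 rows are removed. After correcting the sign, I can plot the points on the map.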
library(dplyr)
library(ggmap)
slddataset <- slddataset %>% mutate(long = long * -1)
stellwagen<-ggmap(get_googlemap(center="stellwagen bank",zoom=7,maptype = "satellite"))
stellwagen +
  geom_point(aes(x=long, y=lat),data=slddataset)
DATA
slddataset <- read.table(text = "station year jd sl_pa sst sss ssf depth sbt sbs sbf gravel sand silt clay lat long
1 4101 2014 142 0 11.7 25.0 0.419 39.9 4.95 31.9 0.320 2.36 97.5 0.110 0.0300 42.2 70.3
2 4102 2014 142 0 11.3 37.8 0.509 27.6 5.03 31.9 0.372 0.390 99.5 0.0700 0.0200 42.2 70.3
3 4104 2014 142 0 11.3 41.2 0.803 24.9 5.50 31.7 0.556 0.700 99.2 0.0800 0.0700 42.2 70.3
4 4105 2014 142 0 10.6 30.8 0.808 28.3 5.14 31.9 0.596 6.83 93.1 0.0700 0.0300 42.2 70.2
5 4106 2014 142 0 10.5 30.7 0.693 35.6 4.93 32.1 0.887 10.8 89.1 0.0500 0.0700 42.2 70.2
6 4107 2014 142 0 11.0 30.7 0.724 41.3 4.44 32.3 0.684 11.3 88.5 0.110 0.120 42.2 70.2
7 4108 2014 142 0 10.3 30.8 0.741 44.4 4.28 32.5 0.340 4.77 95.0 0.110 0.100 42.2 70.1
8 4109 2014 142 0 9.97 30.9 0.980 44.3 4.32 32.4 0.398 7.80 92.0 0.110 0.110 42.2 70.1
9 4110 2014 142 0 10.9 30.7 0.794 41.2 4.60 32.3 0.592 10.3 89.5 0.100 0.0900 42.2 70.2
10 4113 2014 143 0 12.0 30.5 0.684 32.2 4.98 31.9 0.336 0.320 99.6 0.0600 0.0300 42.2 70.3",
header = TRUE, stringsAsFactors = FALSE)
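As a small sketch of the longitude fix above, the sign can be corrected in base R and the result checked against a rough bounding box before plotting. The two-row data frame and the box limits are illustrative assumptions, not values taken from the map object.

```r
# Made-up subset of slddataset with positive longitudes as in the question
slddataset <- data.frame(lat = c(42.2, 42.2), long = c(70.3, 70.1))

# Force longitudes to be negative; values that are already negative stay unchanged,
# so running this twice does not flip the sign back
slddataset$long <- -abs(slddataset$long)

# Rough, assumed bounding box around the area of interest
in_box <- slddataset$long > -75 & slddataset$long < -65 &
  slddataset$lat > 38 & slddataset$lat < 46
all(in_box)
```

Using -abs() instead of multiplying by -1 is a design choice for safety when the script is rerun on data that has already been corrected.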
