How to create a for loop in R

I'd like to create a for loop through norm1~norm31.
norm1 = norm1 %>% group_by(ID_Pair) %>% summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T)
, Norm_avg_BT=mean(ID_avg_BT, na.rm=T))
norm2 = norm2 %>% group_by(ID_Pair) %>% summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T)
, Norm_avg_BT=mean(ID_avg_BT, na.rm=T))
I'd like to create norm1~norm31 as above.
I've tried this code but keep getting the error message.
for (i in 1:31){
nam=paste("norm",i,sep="")
assign(nam,nam %>% group_by(ID_Pair) %>% summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T)
, Norm_avg_BT=mean(ID_avg_BT, na.rm=T)))}
[Error]
Error in UseMethod("group_by_"): no applicable method for "group_by_" applied to an object of class "character"

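There may be better ways to organize your data. But if you like, using get() may solve your problem: nam is only a character string, so the data frame it names has to be fetched with get() before piping, while assign() still needs the name itself.
for (i in 1:31){
nam=paste("norm",i,sep="")
assign(nam,get(nam) %>% group_by(ID_Pair) %>% summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T)
, Norm_avg_BT=mean(ID_avg_BT, na.rm=T)))
}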

First, let's put all your norms into a single list:
normlist<-lapply(paste0("norm",1:31),get)
Now, we can use lapply to do your thing for each of the norms:
thing<-function(x) {x %>% group_by(ID_Pair) %>%
summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T),
Norm_avg_BT=mean(ID_avg_BT, na.rm=T))}
lapply(normlist,thing)
Example with some fake data:
a1<-data.frame(id=rep(letters[1:5],3),nums=1:15)
a2<-data.frame(id=rep(letters[6:10],3),nums=16:30)
alist<-lapply(paste0("a",1:2),get)
thing<-function(x) {x %>% group_by(id) %>% summarize(means=mean(nums))}
lapply(alist,thing)
[[1]]
# A tibble: 5 x 2
id means
<fct> <dbl>
1 a 6.
2 b 7.
3 c 8.
4 d 9.
5 e 10.
[[2]]
# A tibble: 5 x 2
id means
<fct> <dbl>
1 f 21.
2 g 22.
3 h 23.
4 i 24.
5 j 25.
If you want to preserve the names, you can use sapply with simplify=FALSE instead of lapply.
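To make the name-preserving variant concrete, here is a minimal base R sketch. It reuses the fake data frames a1 and a2 from the example above. As an alternative to sapply(..., simplify = FALSE), it uses mget(), which fetches several objects by name and returns a named list. aggregate() stands in for the dplyr summary so that the snippet has no package dependencies.

```r
# Fake data, same shape as the example above
a1 <- data.frame(id = rep(letters[1:5], 3), nums = 1:15)
a2 <- data.frame(id = rep(letters[6:10], 3), nums = 16:30)

# mget() looks the objects up by name and returns a list named "a1", "a2"
alist <- mget(paste0("a", 1:2))

# Per-id mean for every data frame; lapply() keeps the list names
means <- lapply(alist, function(x) aggregate(nums ~ id, data = x, FUN = mean))

names(means)
means$a1
```

With the names kept, each result can be pulled out as means$a1 or means[["a2"]] instead of by position.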

Related

dplyr arrange data frame based on column position with new R pipe |>

I want to sort my data frame based on a column that I pass to dplyr's arrange function with its position. This works as long as I'm using the "old" tidyverse/magrittr pipe operator. However, changing it to the new R pipe returns an error:
df <- data.frame(x = c(3, 4, 1, 5),
y = 1:4)
# Works
df %>%
arrange(.[1])
x y
1 1 3
2 3 1
3 4 2
4 5 4
# Throws error
df |>
arrange(.[1])
Error:
! arrange() failed at implicit mutate() step.
Problem with `mutate()` column `..1`.
i `..1 = .[1]`.
x object '.' not found
Run `rlang::last_error()` to see where the error occurred.
How can I still arrange by column position when using the new R pipe?
I realize that the |> operator does not accept the "." as an argument, but I still don't know how else I could address the data then.
Update:
This seems to work, but wondering if there is something more straightforward:
df |>
arrange(cur_data() |> select(1))
You can pass a lambda function (suggested by @Martin Morgan in the comments) to specify the column's position instead of its name:
df <- data.frame(x = c(3, 4, 1, 5),
y = 1:4)
df |>
(\(z) arrange(z, z[[1]]))()
# x y
# 1 1 3
# 2 3 1
# 3 4 2
# 4 5 4
With order, this looks okay:
df |>
(\(z) z[order(z[,1]), ])()
x y
3 1 3
1 3 1
2 4 2
4 5 4
|> does not support the dot placeholder, but dplyr verbs do support cur_data() (deprecated in favor of pick() since dplyr 1.1.0).
# 1
df |> arrange(cur_data()[1])
Another possibility is the Bizarro pipe which is not really a pipe but does look like one and uses only base R.
# 2
df ->.; arrange(., .[1])
or any of these work-arounds
# 3
arrange1 <- function(.) arrange(., .[1])
df |> arrange1()
# 4
df |> (function(.) arrange(., .[1]))()
# 5
df |> list() |> setNames(".") |> with(arrange(., .[1]))
# 6
with. <- function(data, expr, ...) {
eval(substitute(expr), list(. = data), enclos = parent.frame())
}
df |> with.(arrange(., .[1]))
# these hard code variable names so are not directly comparable
# but can be used if that is ok
# 7
df |> arrange(x)
# 8
df |> with(arrange(data.frame(x, y), x))
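One way to see why `.` is not found: the parser rewrites the native pipe into an ordinary nested call, so there is never an object called `.` to look up. The base R sketch below (R 4.1 or later assumed; the helper name by_position is hypothetical) shows the rewrite and a position-based sort that does not need dplyr.

```r
df <- data.frame(x = c(3, 4, 1, 5), y = 1:4)

# The parser turns `lhs |> f(args)` into `f(lhs, args)` before evaluation
piped <- quote(df |> head(2))
print(piped)   # shows the rewritten call

# Hypothetical helper: sort a data frame by the column at position `pos`
by_position <- function(data, pos = 1) data[order(data[[pos]]), , drop = FALSE]

sorted <- df |> by_position(1)
sorted
```

Wrapping the position logic in a named function is the same idea as work-around # 3 above, only with the position exposed as an argument.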

Dplyr to calculate mean, SD, and graph multiple variables

I have a table with columns
[Time, var1, var2, var3, var4...varN]
I need to calculate the mean/SE per Time for each of var1, var2...varN, and I want to do this programmatically for all variables rather than one at a time, which would involve a lot of copy-pasting.
Section 8.2.3 here https://tidyeval.tidyverse.org/dplyr.html is close to what I want but my below code:
x <- as.data.frame(matrix(nrow = 2, ncol = 3))
x[1,1] = 1
x[1,2] = 2
x[1,3] = 3
x[2,1] =4
x[2,2] = 5
x[2,3] = 6
names(x)[1] <- "time"
names(x)[2] <- "var1"
names(x)[3] <- "var2"
grouped_mean3 <- function(.data, ...) {
print(.data)
summary_vars <- enquos(...)
print(summary_vars)
summary_vars <- purrr::map(summary_vars, function(var) {
expr(mean(!!var, na.rm = TRUE))
})
print(summary_vars)
.data %>%
group_by(time)
summarise(!!!summary_vars) # Unquote-splice the list
}
grouped_mean3(x, var("var1"), var("var2"))
Yields
Error in !summary_vars : invalid argument type
And the original cause is "Must group by variables found in .data." and it finds a column that isn't in the dummy "x" that I generated for the purposes of testing. I have no idea what's happening, sadly.
How do I actually extract the mean from the new summary_vars and add it to the .data table? summary_vars becomes something like
[[1]]
mean(~var1, na.rm = TRUE)
[[2]]
mean(~var2, na.rm = TRUE)
Which seems close, but needs evaluation. How do I evaluate this? !!! wasn't working.
For what it's worth, I tried plugging the example in dplyr into this R engine https://rdrr.io/cran/dplyr/man/starwars.html and it didn't work either.
Help?
End goal would be a table along the lines of
[Time, var1mean, var2mean, var3mean, var4mean...]
Try this:
library(dplyr)
grouped_mean3 <- function(.data, ...) {
vars <- c(...)
.data %>%
group_by(time) %>%
summarise(across(all_of(vars), mean, .names = "{.col}mean"))
}
grouped_mean3(x, 'var1')
# time var1mean
# <dbl> <dbl>
#1 1 2
#2 4 5
grouped_mean3(x, 'var1', 'var2')
# time var1mean var2mean
# <dbl> <dbl> <dbl>
#1 1 2 3
#2 4 5 6
Perhaps this is what you are looking for?
x %>%
group_by(time) %>%
summarise_at(vars(starts_with('var')), ~mean(.,na.rm=T)) %>%
rename_at(vars(starts_with('var')), ~paste0(., "mean")) %>%
merge(x)
With your data (from your question) following is the output:
time var1mean var2mean var1 var2
1 1 2 3 2 3
2 4 5 6 5 6
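Since the question asks for both the mean and the SE, here is a hedged sketch of how across() can apply several functions at once. The data and the helper name se are made up for illustration (the question's x has only one row per time, so its SE would be NA), and dplyr 1.0.0 or later is assumed.

```r
library(dplyr)

# Hypothetical data: two time points, three replicates each
x <- data.frame(
  time = rep(c(1, 4), each = 3),
  var1 = c(2, 4, 6, 5, 5, 5),
  var2 = c(3, 3, 3, 6, 8, 10)
)

# Standard error of the mean, ignoring missing values (hypothetical helper)
se <- function(v) {
  v <- v[!is.na(v)]
  sd(v) / sqrt(length(v))
}

# One mean column and one SE column per variable, named e.g. var1mean, var1se
result <- x %>%
  group_by(time) %>%
  summarise(across(starts_with("var"),
                   list(mean = ~mean(.x, na.rm = TRUE), se = se),
                   .names = "{.col}{.fn}"))
result
```

The .names pattern controls the output column names, so switching to "{.col}_{.fn}" would give var1_mean, var1_se and so on.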

R: rowwise nth element ordered_by row values

I have this input:
t <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
And want to have the rowwise nth-lowest element of the dataframe ordered by the rowwise values, so that the output is something like this (example for nth_element = 2):
[1] 2 3 5 4
I tried a function like this:
apply(t, 1, nth, n=1, order_by = .)
But this does not work. Two questions:
What should I type in the order_by argument to make this function work?
Which is the best way to summarise rows with an own summary function if I don't want to mention the column names in the rowwise summary function?
Sidenote:
I don't want to mention the column names specifically, I want the function to use all rows in the dataset.
I tried the rownth function from the Rfast package but it only provides one result. Does anybody know what I am doing wrong?
We can use apply and sort to do this.
d <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
nth_lowest <- 2
apply(d, 1, FUN = function(x) sort(x)[nth_lowest])
# [1] 2 3 5 4
Note that I am calling the data d instead of t. t is not a reserved word, but it is already the name of a base R function (matrix transpose), so reusing it for data is confusing.
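The apply/sort idea can be wrapped in a small function so that no column names are ever mentioned. This is only a sketch; the helper name row_nth_lowest is hypothetical.

```r
d <- data.frame(x = c(1, 2, 8, 4), y = c(2, 3, 4, 5), k = c(3, 4, 5, 1))

# Hypothetical helper: n-th lowest value in every row, using all columns
row_nth_lowest <- function(data, n) {
  m <- as.matrix(data)               # apply() works on matrices
  stopifnot(n >= 1, n <= ncol(m))    # n must be a valid position within a row
  unname(apply(m, 1, function(row) sort(row)[n]))
}

row_nth_lowest(d, 2)
```

Because the function takes the whole data frame, it also answers the second question: any row-wise summary can be written as apply(data, 1, some_function) without naming columns.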
Not as elegant as @bouncyball's answer, but using dplyr (and tidyr), one possibility is to do:
library(dplyr)
library(tidyr)
t %>% mutate(Row = row_number()) %>%
pivot_longer(-Row, names_to = "Col", values_to = "Val") %>%
group_by(Row) %>%
arrange(Val) %>%
slice(2) %>%
select(Val)
Adding missing grouping variables: `Row`
# A tibble: 4 x 2
# Groups: Row [4]
Row Val
<int> <dbl>
1 1 2
2 2 3
3 3 5
4 4 4
With Rfast you can reduce run time on large inputs, but it works on matrices only, so convert the data frame first.
d <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
d <- Rfast::data.frame.to_matrix(d)
# one requested position per row
nth_lowests <- rep(2,nrow(d))
Rfast::rownth(d,nth_lowests)
# [1] 2 3 5 4
You could also use the parallel version of Rfast::rownth

Can you use a data.frame twice in a dplyr chain? dplyr says " Error: cannot handle "

I am trying to use a data.frame twice in a dplyr chain. Here is a simple example that gives an error
df <- data.frame(Value=1:10,Type=rep(c("A","B"),5))
df %>%
group_by(Type) %>%
summarize(X=n()) %>%
mutate(df %>%filter(Value>2) %>%
group_by(Type) %>%
summarize(Y=sum(Value)))
Error: cannot handle
So the idea is that first a data.frame is created with two columns Value which is just some data and Type which indicates which group the value is from.
I then try to use summarize to get the number of objects in each group, and then mutate, using the object again to get the sum of the values, after the data has been filtered. However I get the Error: cannot handle. Any ideas what is happening here?
Desired Output:
Type X Y
A 5 24
B 5 28
You could try the following
df %>%
group_by(Type) %>%
summarise(X = n(), Y = sum(Value[Value > 2]))
# Source: local data frame [2 x 3]
#
# Type X Y
# 1 A 5 24
# 2 B 5 28
The idea is to filter only Value by the desired condition, instead of the whole data set.
And a bonus solution
library(data.table)
setDT(df)[, .(X = .N, Y = sum(Value[Value > 2])), by = Type]
# Type X Y
# 1: A 5 24
# 2: B 5 28
Was going to suggest that to @nongkrong, but he deleted it. With base R we could also do
aggregate(Value ~ Type, df, function(x) c(length(x), sum(x[x>2])))
# Type Value.1 Value.2
# 1 A 5 24
# 2 B 5 28
This is also pretty easy to do with ifelse()
df %>% group_by(Type) %>% summarize(X=n(),y=sum( ifelse(Value>2, Value, 0 )))
outputs:
Source: local data frame [2 x 3]
Type X y
1 A 5 24
2 B 5 28
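If the two summaries really do need different filters, another option is to compute them separately and join them on Type, which is closer to what the original chain was trying to do. A minimal sketch, assuming dplyr is installed; the intermediate names counts and sums are made up here.

```r
library(dplyr)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

# Summary 1: number of rows per Type
counts <- df %>%
  group_by(Type) %>%
  summarise(X = n())

# Summary 2: sum of the values above 2 per Type
sums <- df %>%
  filter(Value > 2) %>%
  group_by(Type) %>%
  summarise(Y = sum(Value))

# Combine the two one-row-per-Type tables on their shared key
result <- left_join(counts, sums, by = "Type")
result
```

One difference from the sum(Value[Value > 2]) approach: a Type with no values above 2 would get Y = NA from the join rather than 0.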

Using select_if with variable name and type conditions

There are plenty of posts on using dplyr's select_if for multiple conditions. However, no matter what I try, selecting on both is.factor and variable names has not worked for me so far.
Ultimately, I would like to select all factors in a df/tibble and exclude certain variables by name.
Example:
df <- tibble(A = factor(c(0,1,0,1)),
B = factor(c("Yes","No","Yes","No")),
C = c(1,2,3,4))
Various attempts:
Attempt 1
df %>%
select_if(function(col) is.factor(col) & !str_detect(names(col), "A"))
Error in selected[[i]] <- .p(.tbl[[tibble_vars[[i]]]], ...) : replacement has length zero
Attempt 2
df %>%
select_if(function(col) is.factor(col) & negate(str_detect(names(col)), "A"))
Error: Can't convert a logical vector to function Call `rlang::last_error()` to see a backtrace
Attempt 3
df %>%
select_if(function(col) is.factor(col) && !str_detect(names(col), "A"))
Error: Only strings can be converted to symbols Call `rlang::last_error()` to see a backtrace
Attempt 4
df %>%
select_if(is.factor(.) && !str_detect(names(.), "A"))
Error in tbl_if_vars(.tbl, .predicate, caller_env(), .include_group_vars = TRUE) : length(.p) == length(tibble_vars) is not TRUE
Meanwhile, the individual conditions work without any problem:
> df %>%
+ select_if(is.factor)
# A tibble: 4 x 2
A B
<fct> <fct>
1 0 Yes
2 1 No
3 0 Yes
4 1 No
> df %>%
+ select_if(!str_detect(names(.), "A"))
# A tibble: 4 x 2
B C
<fct> <dbl>
1 Yes 1
2 No 2
3 Yes 3
4 No 4
The problem probably lies here:
df %>%
select_if(function(col) !str_detect(names(col), "A"))
Error in selected[[i]] <- .p(.tbl[[tibble_vars[[i]]]], ...) : replacement has length zero
However, I have little clue how to fix this.
Perhaps I'm missing something, but is there any reason you couldn't do the following:
df <- tibble(A = factor(c(0,1,0,1)),
B = factor(c("Yes","No","Yes","No")),
C = c(1,2,3,4))
df %>% select_if(function(col) is.factor(col)) %>% select_if(!str_detect(names(.), "A"))
# A tibble: 4 x 1
B
<fct>
1 Yes
2 No
3 Yes
4 No
Just for completeness, not sure if it is acceptable for you, but base R may save you some pain here (a first, very quick shot):
df[, sapply(names(df),
function(coln, df) !grepl("A", coln) && is.factor(df[[coln]]), df = df),
drop = FALSE]
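A likely reason the combined predicate fails is visible in the error message: select_if() passes each column's values (.tbl[[...]]) to the function, not its name, so names(col) is NULL and str_detect() returns a zero-length result. With dplyr 1.0.0 or later (an assumption about the installed version), select() can combine a content test and a name test directly. A minimal sketch:

```r
library(dplyr)

df <- tibble(A = factor(c(0, 1, 0, 1)),
             B = factor(c("Yes", "No", "Yes", "No")),
             C = c(1, 2, 3, 4))

# where() tests column contents; contains() tests column names.
# ignore.case = FALSE mirrors the case-sensitive str_detect() in the question.
result <- df %>%
  select(where(is.factor) & !contains("A", ignore.case = FALSE))

names(result)
```

Here only B is kept: A is a factor but is excluded by name, and C is not a factor.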
