dplyr: apply a single function with a changing argument to the same column

I have a simple df with one column, and I want to create multiple new columns by applying a single function (sum_x in this case) with only one argument changing. Is there a more efficient way to do this in dplyr than the approach shown below? Ideally I could use sum_vec and create all 100 new columns in a single line. This seems like a very simple problem, but I don't know how to solve it efficiently with dplyr.
df <- data.frame(x = 1:20)
sum_x <- function(x, y) {
  x + y
}
sum_vec <- 1:100
df %>% mutate(x_1 = sum_x(x, 1)) %>% mutate(x_2 = sum_x(x, 2)) %>% mutate(x_3 = sum_x(x, 3))

Try it this way:
library(tidyverse)
bind_cols(df, map_dfc(1:3, ~ df %>% transmute(!!paste0("x_", .x) := x + .x)))
x x_1 x_2 x_3
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
6 6 7 8 9
7 7 8 9 10
8 8 9 10 11
9 9 10 11 12
10 10 11 12 13
11 11 12 13 14
12 12 13 14 15
13 13 14 15 16
14 14 15 16 17
15 15 16 17 18
16 16 17 18 19
17 17 18 19 20
18 18 19 20 21
19 19 20 21 22
20 20 21 22 23
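The same map_dfc() pattern extends directly to the full sum_vec from the question. A sketch (my addition, building on the answer above) that creates all 100 columns in one call, reusing sum_x:
library(tidyverse)

df <- data.frame(x = 1:20)
sum_x <- function(x, y) x + y
sum_vec <- 1:100

# one column per offset in sum_vec, all bound on in a single call
result <- bind_cols(
  df,
  map_dfc(sum_vec, ~ df %>% transmute(!!paste0("x_", .x) := sum_x(x, .x)))
)
dim(result)
# [1]  20 101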

If you don't like for loops, I'm with you, so I'm not sure whether this is good or efficient. I'd be interested in a better solution myself.
library(dplyr)
for (i in 1:100) {
  x_header <- paste("x", i, sep = "_")
  df <- df %>%
    mutate(!!x_header := sum_x(x, i))
}
> df[1:11, 1:11]
x x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
1 1 2 3 4 5 6 7 8 9 10 11
2 2 3 4 5 6 7 8 9 10 11 12
3 3 4 5 6 7 8 9 10 11 12 13
4 4 5 6 7 8 9 10 11 12 13 14
5 5 6 7 8 9 10 11 12 13 14 15
6 6 7 8 9 10 11 12 13 14 15 16
7 7 8 9 10 11 12 13 14 15 16 17
8 8 9 10 11 12 13 14 15 16 17 18
9 9 10 11 12 13 14 15 16 17 18 19
10 10 11 12 13 14 15 16 17 18 19 20
11 11 12 13 14 15 16 17 18 19 20 21

Related

Convert dataframe from vertical to horizontal

I have already checked many questions, but I can't seem to find a suitable answer.
I have this df:
df = data.frame(x = 1:10,y=11:20)
The output is:
x y
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
I would like the output to be:
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
thanks
Try t() like below
> data.frame(t(df), check.names = FALSE)
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
A transpose should do it
setNames(data.frame(t(df)), df[,"x"])
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
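One caveat worth noting (my addition, not from the answers): t() coerces the data frame to a matrix first, so if the columns had mixed types everything would come back as character. With the all-numeric df here that is not an issue. A minimal illustration with a hypothetical mixed data frame:
mixed <- data.frame(x = 1:3, y = c("a", "b", "c"))
t(mixed)         # both rows are now character: "1" "2" "3" and "a" "b" "c"
class(t(mixed))  # "matrix" "array" -- wrap in data.frame() to get a data frame back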

Error in R function rmcorr: Error in psych::r.con(rmcorrvalue, errordf, p = CI.level) : number of subjects must be greater than 3

I'm trying to do a repeated measures correlation in R using rmcorr, but received the above error, even though I have more than 3 subjects.
> scores$SUBJECT
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
[36] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[71] 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5
[106] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
[141] 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8
[176] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[211] 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
[246] 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[281] 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14
[316] 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15
[351] 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 17
[386] 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 18 18
[421] 18 18 18 18 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
[456] 19 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 21 21 21 21 21 21 21 21 21 21
[491] 21 21 21 21 21 21 21 21 21 21 21 21 21 21
# Convert data types
scores$SUBJECT<-factor(scores$SUBJECT)
scores$FACTOR1<-factor(scores$FACTOR1)
scores$FACTOR2<-factor(scores$FACTOR2)
Interestingly, I was able to perform the correlation on some subsets of the data but not others.
# SUBSETS
subset1 <- subset(scores, FACTOR1 == "m1")
subset1a <- subset(subset1, FACTOR2 == "a")
subset1b <- subset(subset1, FACTOR2 == "b")
subset1c <- subset(subset1, FACTOR2 == "c")
subset2 <- subset(scores, FACTOR1 == "mp")
subset2a <- subset(subset2, FACTOR2 == "a")
subset2b <- subset(subset2, FACTOR2 == "b")
subset2c <- subset(subset2, FACTOR2 == "c")
rmcorr(participant = subset1$SUBJECT, measure1 = subset1$SCORE, measure2 = subset2$SCORE, dataset = scores)
rmcorr(participant = subset1a$SUBJECT, measure1 = subset1a$SCORE, measure2 = subset2a$SCORE, dataset = scores)
rmcorr(participant = subset1b$SUBJECT, measure1 = subset1b$SCORE, measure2 = subset2b$SCORE, dataset = scores)
rmcorr(participant = subset1c$SUBJECT, measure1 = subset1c$SCORE, measure2 = subset2c$SCORE, dataset = scores)
Specifically,
rmcorr(participant = subset1$SUBJECT, measure1 = subset1$SCORE, measure2 = subset2$SCORE, dataset = scores)
worked, but all of the other calls to rmcorr generated the error. Does anyone know where I went wrong?

Split into groups based on (multiple) conditions?

I have a set of marbles of different colors and weights, and I want to split them into groups based on their weight and color.
The conditions are:
A group cannot weigh more than 100 units
A group cannot have more than 5 different-colored marbles.
A reproducible example:
marbles <- data.frame(color=sample(1:20, 20), weight=sample(1:40, 20, replace=T))
color weight
1 1 22
2 15 33
3 13 35
4 11 13
5 6 26
6 8 15
7 10 3
8 16 22
9 14 21
10 3 16
11 4 26
12 20 30
13 9 31
14 2 16
15 7 12
16 17 13
17 19 19
18 5 17
19 12 12
20 18 40
And what I want is this group column:
color weight group
1 1 22 1
2 15 33 1
3 13 35 1
4 11 13 2
5 6 26 2
6 8 15 2
7 10 3 2
8 16 22 2
9 14 21 3
10 3 16 3
11 4 26 3
12 20 30 3
13 9 31 4
14 2 16 4
15 7 12 4
16 17 13 4
17 19 19 4
18 5 17 5
19 12 12 5
20 18 40 5
TIA.
The below isn't an optimal assignment to the groups; it just works sequentially through the data frame. It uses rowwise() and might not be the most efficient approach, since it isn't vectorized.
library(dplyr)
marbles <- data.frame(color=sample(1:20, 20), weight=sample(1:40, 20, replace=T))
Below I create a rowwise function which we can apply using dplyr
assign_group <- function(color, weight) {
  # Conditions: tentatively add this marble to the current group
  clists <- append(color_list, color)
  sum_val <- group_sum + weight
  num_colors <- length(unique(clists))
  assign_condition <- (sum_val <= 100 & num_colors <= 5)
  # Assign globals: extend the current group if it fits, otherwise start a new one
  cval <- if (assign_condition) clists else c(color)
  sval <- ifelse(assign_condition, sum_val, weight)
  gval <- ifelse(assign_condition, group_number, group_number + 1)
  assign("color_list", cval, envir = .GlobalEnv)
  assign("group_sum", sval, envir = .GlobalEnv)
  assign("group_number", gval, envir = .GlobalEnv)
  res <- group_number
  return(res)
}
I then set up a few global variables to track the allocation of the marbles to each group.
# globals
color_list <<- c()
group_sum <<- 0
group_number <<- 1
Finally, run this function using mutate:
test <- marbles %>% rowwise() %>% mutate(group = assign_group(color,weight)) %>% data.frame()
Which results in the following:
color weight group
1 6 27 1
2 12 16 1
3 15 32 1
4 20 25 1
5 19 5 2
6 2 21 2
7 16 39 2
8 17 4 2
9 11 16 2
10 7 7 3
11 10 5 3
12 1 30 3
13 13 7 3
14 9 39 3
15 14 7 4
16 8 17 4
17 18 9 4
18 4 36 4
19 3 1 4
20 5 3 5
And it seems to meet the constraints:
test %>% group_by(group) %>% summarise(tot_w = sum(weight), n_c = length(unique(color)) )
group tot_w n_c
<dbl> <int> <int>
1 1 100 4
2 2 85 5
3 3 88 5
4 4 70 5
5 5 3 1
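One practical note (my addition, not part of the answer): because assign_group() keeps its state in global variables, the three trackers have to be reset before the rowwise mutate is run again; otherwise the next run keeps counting from the previous groups.
# reset the trackers before re-running the rowwise mutate
color_list <- c()
group_sum <- 0
group_number <- 1
test <- marbles %>% rowwise() %>% mutate(group = assign_group(color, weight)) %>% data.frame()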
In base R you could write a recursive function, as shown below:
create_group <- function(df, a) {
  # initial guess: start a new group each time the cumulative weight passes another 100 units
  if (missing(a)) a <- cumsum(df$weight) %/% 100
  # flag rows that would be the 6th marble (here, the 6th color) within a provisional group
  b <- !ave(df$color, a, FUN = seq_along) %% 6
  # flag rows whose within-group cumulative weight exceeds 100 after that adjustment
  d <- ave(df$weight, a + b, FUN = cumsum) > 100
  a <- a + b + d
  # bump flagged rows into later groups and recurse until nothing is flagged; groups are 1-based
  if (any(b | d)) create_group(df, a) else cbind(df, group = a + 1)
}
create_group(marbles)
color weight group
1 1 22 1
2 15 33 1
3 13 35 1
4 11 13 2
5 6 26 2
6 8 15 2
7 10 3 2
8 16 22 2
9 14 21 3
10 3 16 3
11 4 26 3
12 20 30 3
13 9 31 4
14 2 16 4
15 7 12 4
16 17 13 4
17 19 19 4
18 5 17 5
19 12 12 5
20 18 40 5
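As a quick check (my addition, not part of the answer), the recursive result can be summarised in base R to confirm that both constraints hold:
res <- create_group(marbles)
tapply(res$weight, res$group, sum)                            # total weight per group, should all be <= 100
tapply(res$color, res$group, function(x) length(unique(x)))   # distinct colors per group, should all be <= 5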

R Generate a vector with increasing and then decreasing elements

How do I generate a vector in the form
1 2 ... 19 20 19 ... 2 1
Is it possible using the c() function?
You can use seq() as well as the rev() function for the desired purpose.
Using seq:
> c(1:20, seq(19,1,-1))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
As suggested by @jimbou:
> c(1:20, 19:1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
Or using rev:
> c(1:20, rev(1:19))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
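If the same shape is needed for other lengths, the c(1:n, (n - 1):1) idea can be wrapped in a small helper (a hypothetical function, my addition):
up_down <- function(n) c(1:n, (n - 1):1)  # assumes n >= 2
up_down(5)
# [1] 1 2 3 4 5 4 3 2 1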

Changing every set of 5 rows in R

I have a dataframe that looks like this:
df <- list()
df$a <- 1:20
df$b <- 2:21
df$c <- 3:22
df <- as.data.frame(df)
> df
a b c
1 1 2 3
2 2 3 4
3 3 4 5
4 4 5 6
5 5 6 7
6 6 7 8
7 7 8 9
8 8 9 10
9 9 10 11
10 10 11 12
11 11 12 13
12 12 13 14
13 13 14 15
14 14 15 16
15 15 16 17
16 16 17 18
17 17 18 19
18 18 19 20
19 19 20 21
20 20 21 22
I would like to add another column to the data frame, df$d, so that every block of 5 rows takes the value that the first column, df$a, has at the start of that block (I was trying something like df$d[seq(1, nrow(df), 4)]).
I have tried the manual way, but I was wondering if there is a for loop or a shorter way that can do this easily. I'm new to R, so I apologize if this seems trivial to some people.
"Manual" way:
df$d[1:5] <- df$a[1]
df$d[6:10] <- df$a[6]
df$d[11:15] <- df$a[11]
df$d[16:20] <- df$a[16]
>df
a b c d
1 1 2 3 1
2 2 3 4 1
3 3 4 5 1
4 4 5 6 1
5 5 6 7 1
6 6 7 8 6
7 7 8 9 6
8 8 9 10 6
9 9 10 11 6
10 10 11 12 6
11 11 12 13 11
12 12 13 14 11
13 13 14 15 11
14 14 15 16 11
15 15 16 17 11
16 16 17 18 16
17 17 18 19 16
18 18 19 20 16
19 19 20 21 16
20 20 21 22 16
I have tried:
for (i in 1:nrow(df)) {
  df$d[i:(i + 4)] <- df$a[seq(1, nrow(df), 4)]
}
But this is not going the way I want it to. What am I doing wrong?
This should work:
df$d <- rep(df$a[seq(1,nrow(df),5)],each=5)
And here's a data.table solution:
library(data.table)
dt = data.table(df)
dt[, d := a[1], by = (seq_len(nrow(dt))-1) %/% 5]
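The integer-division trick in the data.table by clause also works directly in base R (a sketch of my own, not from the answers): compute each row's block index and pull df$a from the first row of that block.
block <- (seq_len(nrow(df)) - 1) %/% 5  # 0 0 0 0 0 1 1 1 1 1 ...
df$d <- df$a[block * 5 + 1]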
I'd use logical indexing after initializing to NA
df$d <- NA
df$d <- rep(df$a[ c(TRUE, rep(FALSE,4)) ], each=5)
df
#--------
a b c d
1 1 2 3 1
2 2 3 4 1
3 3 4 5 1
4 4 5 6 1
5 5 6 7 1
6 6 7 8 6
7 7 8 9 6
8 8 9 10 6
9 9 10 11 6
10 10 11 12 6
11 11 12 13 11
12 12 13 14 11
13 13 14 15 11
14 14 15 16 11
15 15 16 17 11
16 16 17 18 16
17 17 18 19 16
18 18 19 20 16
19 19 20 21 16
20 20 21 22 16
