sapply for each group using dplyr - r

df <- data.frame(group = rep(1:4, each = 10),
x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40),
X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))
sapply(df[, 4:ncol(df)], function(x) sd(x)/mean(x))
I want to apply this function for each group. How do I correct the below command?
df %>% dplyr::group_by(group) %>% do.call(sapply(.[, 4:ncol(.)] function(x) sd(x)/mean(x)))

If I understood your objective correctly, the following will give the results you're seeking. It uses the plyr package rather than dplyr. You're likely also running into issues combining %>% with do.call: %>% simply passes the preceding object as the first argument to the subsequent function, whereas do.call expects a function (or its name) as its first argument and a list of arguments as its second.
library(plyr)
df <- data.frame(group = rep(1:4, each = 10),
x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40),
X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))
ddply(df,.(group),function(x)
{
sapply(x[,4:ncol(x)],function(y) sd(y)/mean(y))
})
This gives the following results (the data are not seeded, so your numbers will differ):
group x3 x4 X5 x6 x7
1 1 1.650401 -1.591829 1.509770 6.464991 3.520367
2 2 11.491301 -2.326737 -1.725810 -11.712510 2.293093
3 3 -3.623159 -1.416755 2.958689 1.629667 -4.318230
4 4 9.169641 -4.219095 2.083300 1.985500 -1.678107

Consider base R's by (object-oriented wrapper to tapply):
Data (seeded for reproducibility)
set.seed(3219)
df <- data.frame(group = rep(1:4, each = 10),
x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40),
X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))
by
by_list <- by(df, df$group, function(sub)
sapply(sub[, 4:ncol(sub)], function(x) sd(x)/mean(x))
)
# LIST
by_list
# df$group: 1
# x3 x4 X5 x6 x7
# -1.077354 2.252270 -2.256086 -1.716327 -5.273771
# ------------------------------------------------------------
# df$group: 2
# x3 x4 X5 x6 x7
# 2.580065 5.054094 -10.985927 32.716116 6.732901
# ------------------------------------------------------------
# df$group: 3
# x3 x4 X5 x6 x7
# -3.523565 -1.670539 -5.042595 -7.787303 -15.486737
# ------------------------------------------------------------
# df$group: 4
# x3 x4 X5 x6 x7
# -5.597470 -9.842997 1.985010 33.657188 2.629724
# MATRIX
do.call(rbind, by_list)
# x3 x4 X5 x6 x7
# 1 -1.077354 2.252270 -2.256086 -1.716327 -5.273771
# 2 2.580065 5.054094 -10.985927 32.716116 6.732901
# 3 -3.523565 -1.670539 -5.042595 -7.787303 -15.486737
# 4 -5.597470 -9.842997 1.985010 33.657188 2.629724
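
Since the question asked for dplyr specifically, here is a minimal sketch of a dplyr version. It assumes dplyr 1.0.0 or later (for across), and the helper name cv is introduced here only for illustration. The column range x3:x7 is positional and corresponds to df[, 4:ncol(df)] in the question.

```r
library(dplyr)

set.seed(3219)
df <- data.frame(group = rep(1:4, each = 10),
                 x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40),
                 X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))

# coefficient of variation; hypothetical helper name
cv <- function(x) sd(x) / mean(x)

# one row per group, one column per selected variable
res <- df %>%
  group_by(group) %>%
  summarise(across(x3:x7, cv))

res
```

Unlike the by approach, this returns a single tibble with the group column included, so no do.call(rbind, ...) step is needed afterwards.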

Related

Creating new column in data frame based on value matched to participant ID

I know there is a simple solution to this problem, as I solved it a couple of months ago, but have since lost the relevant file, and cannot for the life of me work out how I did it.
My data is in a long form, where each row represents a participant's answer to one question, with all rows for one participant sharing a common participant ID - e.g.
ParticipantID Question Resp
1 Age x1
1 Gender x2
1 Education x3
1 Q1 x4
1 Q2 x5
...
2 Age y1
2 Gender y2
...
etc
I want to add new columns to the data to associate the various demographic values with each answer provided by a given participant. So in the example above, I would have a new column "Age" which would take the value x1 for all rows where ParticipantID = 1, y1 for all rows where ParticipantID = 2, etc., like so:
ParticipantID Question Resp Age Gender ...
1 Age x1 x1 x2
1 Gender x2 x1 x2
1 Education x3 x1 x2
1 Q1 x4 x1 x2
1 Q2 x5 x1 x2
...
2 Age y1 y1 y2
2 Gender y2 y1 y2
...
etc
Importantly, I can't just rotate the table from long to wide, because I need the study questions (represented as Q1, Q2, ... above) to remain in long form.
Any help that can be offered is greatly appreciated!
As long as each participant has the same questions in the same order, you can do
cbind(df, do.call(rbind, lapply(split(df, df$ParticipantID), function(x) {
setNames(as.data.frame(t(x[-1])[rep(2, nrow(x)),]), x[[2]])
})), row.names = NULL)
#> ParticipantID Question Resp Age Gender Education Q1 Q2
#> 1 1 Age x1 x1 x2 x3 x4 x5
#> 2 1 Gender x2 x1 x2 x3 x4 x5
#> 3 1 Education x3 x1 x2 x3 x4 x5
#> 4 1 Q1 x4 x1 x2 x3 x4 x5
#> 5 1 Q2 x5 x1 x2 x3 x4 x5
#> 6 2 Age y1 y1 y2 y3 y4 y5
#> 7 2 Gender y2 y1 y2 y3 y4 y5
#> 8 2 Education y3 y1 y2 y3 y4 y5
#> 9 2 Q1 y4 y1 y2 y3 y4 y5
#> 10 2 Q2 y5 y1 y2 y3 y4 y5
Data used
df <- structure(list(ParticipantID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), Question = c("Age", "Gender", "Education", "Q1",
"Q2", "Age", "Gender", "Education", "Q1", "Q2"), Resp = c("x1",
"x2", "x3", "x4", "x5", "y1", "y2", "y3", "y4", "y5")), class = "data.frame",
row.names = c(NA, -10L))
df
#> ParticipantID Question Resp
#> 1 1 Age x1
#> 2 1 Gender x2
#> 3 1 Education x3
#> 4 1 Q1 x4
#> 5 1 Q2 x5
#> 6 2 Age y1
#> 7 2 Gender y2
#> 8 2 Education y3
#> 9 2 Q1 y4
#> 10 2 Q2 y5
Created on 2022-09-19 with reprex v2.0.2
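
If the questions are not guaranteed to appear in the same order for every participant, a lookup by ParticipantID may be safer. The following is a sketch under that assumption, in base R only; the names demographics, lookup and out are introduced here for illustration, and it assumes each participant has at most one row per demographic question.

```r
df <- data.frame(
  ParticipantID = rep(1:2, each = 5),
  Question = rep(c("Age", "Gender", "Education", "Q1", "Q2"), times = 2),
  Resp = c("x1", "x2", "x3", "x4", "x5", "y1", "y2", "y3", "y4", "y5"),
  stringsAsFactors = FALSE
)

# which questions should become per-participant columns
demographics <- c("Age", "Gender", "Education")

out <- df
for (q in demographics) {
  # one row per participant holding the answer to question q
  lookup <- df[df$Question == q, c("ParticipantID", "Resp")]
  # copy that answer onto every row of the same participant
  out[[q]] <- lookup$Resp[match(out$ParticipantID, lookup$ParticipantID)]
}

out
```

Participants who did not answer a given demographic question get NA in that column, and the study questions (Q1, Q2, ...) stay in long form.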

How to stack 6 specific columns in 2 columns with melt or gather function?

Here is an example of my question:
I would like to go from
A B C D E F G H
x1 x2 x3 x4 x5 x6 x7 x8
y1 y2 y3 y4 y5 y6 y7 y8
z1 z2 z3 z4 z5 z6 z7 z8
to
A B CDE FGH
x1 x2 x3 x6
x1 x2 x4 x7
x1 x2 x5 x8
y1 y2 y3 y7
y1 y2 y4 y6
y1 y2 y5 y8
I can manage to only stack 3 columns into one with this code
NewData= melt(setDT(Data),measure = list(c(6,7,8)), value.name = "FGH ")
We can use patterns
library(data.table)
melt(setDT(Data), measure = patterns("^[CDE]", "^[FGH]"),
value.name = c("CDE", "FGH"))[, variable := NULL][]
Or another option with unite
library(dplyr)
library(tidyr)
Data %>%
unite(CDE, C, D, E) %>%
unite(FGH, F, G, H) %>%
separate_rows(CDE, FGH)
data
Data <- structure(list(A = c("x1", "y1", "z1"), B = c("x2", "y2", "z2"
), C = c("x3", "y3", "z3"), D = c("x4", "y4", "z4"), E = c("x5",
"y5", "z5"), F = c("x6", "y6", "z6"), G = c("x7", "y7", "z7"),
H = c("x8", "y8", "z8")), class = "data.frame", row.names = c(NA,
-3L))
We can reshape the data to long format, relabel the columns C, D, E as 'CDE' and the remaining ones as 'FGH', and then pivot back to wide format.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -(A:B)) %>%
mutate(name = ifelse(name %in% c('C', 'D', 'E'), 'CDE', 'FGH')) %>%
group_by(name) %>%
mutate(row = row_number()) %>%
pivot_wider() %>%
select(-row)
# A tibble: 9 x 4
# A B CDE FGH
# <chr> <chr> <chr> <chr>
#1 x1 x2 x3 x6
#2 x1 x2 x4 x7
#3 x1 x2 x5 x8
#4 y1 y2 y3 y6
#5 y1 y2 y4 y7
#6 y1 y2 y5 y8
#7 z1 z2 z3 z6
#8 z1 z2 z4 z7
#9 z1 z2 z5 z8
data
df <- structure(list(A = c("x1", "y1", "z1"), B = c("x2", "y2", "z2"
), C = c("x3", "y3", "z3"), D = c("x4", "y4", "z4"), E = c("x5",
"y5", "z5"), F = c("x6", "y6", "z6"), G = c("x7", "y7", "z7"),
H = c("x8", "y8", "z8")), class = "data.frame", row.names = c(NA, -3L))
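
For comparison, the same stacking can be sketched in base R without any packages. This assumes the two column groups have the same size (three columns each here); the name out is introduced only for illustration.

```r
Data <- data.frame(
  A = c("x1", "y1", "z1"), B = c("x2", "y2", "z2"),
  C = c("x3", "y3", "z3"), D = c("x4", "y4", "z4"), E = c("x5", "y5", "z5"),
  F = c("x6", "y6", "z6"), G = c("x7", "y7", "z7"), H = c("x8", "y8", "z8"),
  stringsAsFactors = FALSE
)

k <- 3  # number of columns stacked into each new column

out <- data.frame(
  # repeat the id columns once per stacked value
  A = rep(Data$A, each = k),
  B = rep(Data$B, each = k),
  # t() puts each original row into a column, so as.vector() reads row by row
  CDE = as.vector(t(Data[c("C", "D", "E")])),
  FGH = as.vector(t(Data[c("F", "G", "H")])),
  stringsAsFactors = FALSE
)

out
```

The rows come out grouped by the original row (all x values, then y, then z), matching the layout asked for in the question.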

Paste 2 data frames side by side without any key

I have two data frames
A B E H
x1 x2 x3 x6
x1 x2 x4 x7
x1 x2 x5 x8
and
A B
y1 y2
y1 y2
and this is what I would like to achieve with dplyr or reshape2
A B E H A B
x1 x2 x3 x6 y1 y2
x1 x2 x4 x7 y1 y2
x1 x2 x5 x8
Thanks
If the number of rows is the same, use
cbind(df1, df2)
# A B E H A B
#1 x1 x2 x3 x6 y1 y2
#2 x1 x2 x4 x7 y1 y2
#3 x1 x2 x5 x8 y1 y2
Or in dplyr
library(dplyr)
library(stringr)
df2 %>%
rename_all(~ str_c(., ".1")) %>%
bind_cols(df1, .)
In some versions of dplyr (0.8.5), bind_cols renames duplicated column names automatically:
bind_cols(df1, df2)
NOTE: Having identical column names in a data.frame is not recommended, so we could make the column names unique with make.unique.
If we have two datasets with unequal number of rows
library(rowr)
cbind.fill(df1, df2new, fill = NA)
# A B E H A B
#1 x1 x2 x3 x6 y1 y2
#2 x1 x2 x4 x7 y1 y2
#3 x1 x2 x5 x8 <NA> <NA>
Or with base R
mxn <- max(nrow(df1), nrow(df2new))
df2new[(nrow(df2new)+1):mxn,] <- NA
cbind(df1, df2new)
# A B E H A B
#1 x1 x2 x3 x6 y1 y2
#2 x1 x2 x4 x7 y1 y2
#3 x1 x2 x5 x8 <NA> <NA>
data
df1 <- structure(list(A = c("x1", "x1", "x1"), B = c("x2", "x2", "x2"
), E = c("x3", "x4", "x5"), H = c("x6", "x7", "x8")),
class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(A = c("y1", "y1", "y1"), B = c("y2", "y2", "y2"
)), class = "data.frame", row.names = c(NA, -3L))
df2new <- structure(list(A = c("y1", "y1"), B = c("y2", "y2")), class = "data.frame", row.names = c(NA,
-2L))
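
The base R padding above only extends df2new, and the index (nrow(df2new)+1):mxn counts backwards when df2new is already the longer frame. A sketch that pads whichever frame is shorter and also applies make.unique to the names follows; pad_rows is a hypothetical helper, not part of any package.

```r
df1 <- data.frame(A = c("x1", "x1", "x1"), B = c("x2", "x2", "x2"),
                  E = c("x3", "x4", "x5"), H = c("x6", "x7", "x8"),
                  stringsAsFactors = FALSE)
df2new <- data.frame(A = c("y1", "y1"), B = c("y2", "y2"),
                     stringsAsFactors = FALSE)

# hypothetical helper: extend a data frame to n rows, filling with NA
pad_rows <- function(d, n) {
  d <- d[seq_len(n), , drop = FALSE]  # row indices past nrow(d) give NA rows
  rownames(d) <- NULL
  d
}

n <- max(nrow(df1), nrow(df2new))
out <- cbind(pad_rows(df1, n), pad_rows(df2new, n))

# avoid duplicated column names: A, B, E, H, A.1, B.1
names(out) <- make.unique(names(out))

out
```

Because both inputs go through pad_rows, the order of df1 and df2new does not matter, and nothing changes when they already have the same number of rows.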

Merging several df according to one column organized by another column

I am trying to achieve the following: I have several dataframes over several years.
df1
Name Ch1 Val1 Val2 ..
A a x1 x2
B a x3 x4
...
df2
Name Ch1 Val1 Val2 ..
A b x5 x6
B b x7 x8
...
df3
Name Ch1 Val1 Val2 ..
A c x9 x10
C c x11 x12
...
Here a, b, c are years, so let's say 2002, 2003, 2004.
Now I want to merge these dataframes so that each value of Name is listed for all years (i.e. Ch1) like the following:
df_final
Name Ch1 Val1 Val2 ..
A a x1 x2
b x5 x6
c x9 x10
B a x3 x4
b x7 x8
C c x11 x12
...
The problem is also that the values for "Name" are not always the same for all 3 dataframes (e.g. C).
Using dplyr:
library(dplyr)
bind_rows(df1,df2,df3) %>%
arrange(Name, Ch1) %>%
mutate(Name = replace(Name, duplicated(Name), ""))
#> Name Ch1 Val1 Val2
#> 1 A a x1 x2
#> 2 b x5 x6
#> 3 c x9 x10
#> 4 B a x3 x4
#> 5 b x7 x8
#> 6 C c x11 x12
Data:
df1 <- read.table(text="
Name Ch1 Val1 Val2
A a x1 x2
B a x3 x4", header=TRUE, stringsAsFactors=FALSE)
df2 <- read.table(text="
Name Ch1 Val1 Val2
A b x5 x6
B b x7 x8", header=TRUE, stringsAsFactors=FALSE)
df3 <- read.table(text="
Name Ch1 Val1 Val2
A c x9 x10
C c x11 x12", header=TRUE, stringsAsFactors=FALSE)
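
The same stack, sort and blank-out steps can also be sketched in base R, in case dplyr is not available. The name all_years is introduced here only for illustration, and the sketch assumes all data frames share the same columns.

```r
df1 <- data.frame(Name = c("A", "B"), Ch1 = "a",
                  Val1 = c("x1", "x3"), Val2 = c("x2", "x4"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("A", "B"), Ch1 = "b",
                  Val1 = c("x5", "x7"), Val2 = c("x6", "x8"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Name = c("A", "C"), Ch1 = "c",
                  Val1 = c("x9", "x11"), Val2 = c("x10", "x12"),
                  stringsAsFactors = FALSE)

# stack the yearly data frames, then sort by Name and year
all_years <- rbind(df1, df2, df3)
all_years <- all_years[order(all_years$Name, all_years$Ch1), ]
rownames(all_years) <- NULL

# blank out repeated names for display only
all_years$Name[duplicated(all_years$Name)] <- ""

all_years
```

Blanking the repeated names is purely cosmetic; for further analysis it is usually better to keep Name filled in on every row.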

joining data.frames from within two lists based on regular expression

This is what my workspace looks like:
list.u = list(list.1 = replicate(n = 10,
expr = {data.frame(Var1 = as.factor(paste0("X", c(1:10))),
Var2 = as.factor(paste0("X", c(11:20))),
value=rnorm(10))},
simplify = F),
list.2 = replicate(n = 10,
expr = {data.frame(Var1 = as.factor(paste0("X", c(1:10))),
Var2 = as.factor(paste0("X", c(11:20))),
value=rnorm(10))},
simplify = F))
list2env(list.u , .GlobalEnv )
names(list.1) <- paste0(LETTERS[1:10],"_NTI")
names(list.2) <- sample(paste0(LETTERS[1:10],"_RC")) # not the same order
###if meaningful can again be possibly converted to
###list.u <- list(list.1, list.2)
What I want to achieve is the joining of two corresponding data.frames based on the string found before _NTI and _RC, respectively:
library(dplyr)
df.A <- list.1$A_NTI %>% right_join(list.2$A_RC, by=c("Var1","Var2"))
df.B <- list.1$B_NTI %>% right_join(list.2$B_RC, by=c("Var1","Var2"))
df.C <- list.1$C_NTI %>% right_join(list.2$C_RC, by=c("Var1","Var2"))
and so on for every pair of matching elements of list.1 and list.2
How can I do this?
You can first match the names using a simple regex, rearrange the data frames in list.1 so that they follow the order of list.2, and then merge them pair by pair, i.e.
# for each element of list.2, find the element of list.1 with the same prefix
list.1 <- list.1[match(sub('_.*', '', names(list.2)), sub('_.*', '', names(list.1)))]
Map(function(i, j) merge(i, j, by = c('Var1', 'Var2'), all.y = TRUE), list.1, list.2)
which gives,
$A_NTI
Var1 Var2 value.x value.y
1 X1 X11 1.111072143 0.9893348
2 X10 X20 0.205016698 -1.0370611
3 X2 X12 -1.153484350 -0.1581219
4 X3 X13 -0.136188465 -0.8258913
5 X4 X14 0.845438616 1.0676754
6 X5 X15 -0.090040790 -0.6626899
7 X6 X16 -0.003032729 0.4220376
8 X7 X17 0.132374562 -0.5993826
9 X8 X18 -0.049654084 0.1161918
10 X9 X19 0.408352891 -0.4193510
$B_NTI
Var1 Var2 value.x value.y
1 X1 X11 -1.54096443 1.6954890
2 X10 X20 0.08418433 -1.1082467
3 X2 X12 0.77535586 0.9035127
4 X3 X13 -1.82040060 0.1870822
5 X4 X14 -1.00129026 -1.6371800
6 X5 X15 0.32455294 0.4544704
7 X6 X16 0.25704291 -0.1451332
8 X7 X17 0.61232730 2.1936744
9 X8 X18 0.43594609 -2.3836932
10 X9 X19 -0.23466536 1.3418739
$C_NTI
Var1 Var2 value.x value.y
1 X1 X11 -0.02400835 0.03265689
2 X10 X20 -1.78936480 1.55964999
....
...
NOTE: The merge(..., all.y = TRUE) is the base R equivalent of dplyr::right_join
stopifnot(length(list.1) == length(list.2))
stopifnot(length(setdiff(substr(names(list.1), 1, 1), substr(names(list.2), 1, 1))) == 0)
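It looks like it will do here to just order each list alphabetically before merging. The by columns have to be passed explicitly (through MoreArgs), because merge otherwise joins on all shared columns, including value.
Map(merge, list.1[order(names(list.1))], list.2[order(names(list.2))], MoreArgs = list(by = c("Var1", "Var2"), all.y = TRUE))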
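
To make the pairing explicit, here is a small self-contained sketch that aligns the two lists by the prefix before the underscore and checks the alignment before merging. The names make_df, key1, key2 and joined are assumptions made for this illustration, and the data are reduced to three small data frames per list.

```r
set.seed(1)

# hypothetical helper producing a small data frame like those in the question
make_df <- function() {
  data.frame(Var1 = paste0("X", 1:3), Var2 = paste0("X", 4:6),
             value = rnorm(3), stringsAsFactors = FALSE)
}

list.1 <- setNames(replicate(3, make_df(), simplify = FALSE),
                   paste0(c("A", "B", "C"), "_NTI"))
list.2 <- setNames(replicate(3, make_df(), simplify = FALSE),
                   paste0(c("C", "A", "B"), "_RC"))  # deliberately shuffled

# the part of each name before the underscore is the pairing key
key1 <- sub("_.*", "", names(list.1))
key2 <- sub("_.*", "", names(list.2))
stopifnot(setequal(key1, key2))

# reorder list.2 to follow list.1, then merge pair by pair
joined <- Map(function(x, y) merge(x, y, by = c("Var1", "Var2"), all.y = TRUE),
              list.1, list.2[match(key1, key2)])
names(joined) <- key1

names(joined)
```

Keying on the extracted prefix rather than on alphabetical order keeps the pairing correct even when the names in the two lists would not sort identically.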
