I apologize if this question is elementary, but I've been scouring the internet and I can't seem to find a simple solution.
I currently have a list of R objects (named vectors or data frames with one variable; I can work with either), and I want to join them into one large data frame, with one row for each unique name/rowname and one column for each element in the original list.
My starting list looks something like:
l1 <- list(df1 = data.frame(c(1,2,3), row.names = c("A", "B", "C")),
df2 = data.frame(c(2,6), row.names = c("B", "D")),
df3 = data.frame(c(3,6,9), row.names = c("C", "D", "A")),
df4 = data.frame(c(4,12), row.names = c("A", "E")))
And I want the output to look like:
data.frame("df1" = c(1,2,3,NA,NA),
+ "df2" = c(NA,2,NA,6,NA),
+ "df3" = c(9,NA,3,6,NA),
+ "df4" = c(4,NA,NA,NA,12), row.names = c("A", "B", "C", "D", "E"))
df1 df2 df3 df4
A 1 NA 9 4
B 2 2 NA NA
C 3 NA 3 NA
D NA 6 6 NA
E NA NA NA 12
I don't mind if the fill values are NA or 0 (ultimately I want 0 but that's an easy fix).
I'm almost positive that plyr::cbind.fill does exactly this, but I have been using dplyr in the rest of my script and I don't think using both is a good idea. dplyr::bind_cols does not seem to work with vectors of different lengths. I'm aware a very similar question has been asked here: R: Is there a good replacement for plyr::rbind.fill in dplyr?
but as I mentioned, this solution doesn't actually seem to work. Neither does dplyr::full_join, even wrapped in a do.call. Is there a straightforward solution to this, or is the only solution to write a custom function?
We can convert the rownames to a column with rownames_to_column, rename the second column, row-bind the list elements with map_dfr (using .id to keep the list names), and reshape to 'wide' with pivot_wider.
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
map_dfr(l1, ~ rownames_to_column(.x, 'rn') %>%
rename_at(2, ~'v1'), .id = 'grp') %>%
pivot_wider(names_from = grp, values_from = v1) %>%
column_to_rownames('rn')
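Since the question ultimately wants 0 rather than NA, here is a hedged variant of the same chain (assuming tidyr >= 1.1, where values_fill accepts a single value; older tidyr expects a named list such as values_fill = list(v1 = 0)):
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
map_dfr(l1, ~ rownames_to_column(.x, 'rn') %>%
    rename_at(2, ~'v1'), .id = 'grp') %>%
  # values_fill replaces the missing combinations with 0 instead of NA
  pivot_wider(names_from = grp, values_from = v1, values_fill = 0) %>%
  column_to_rownames('rn')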
Here's a way with some purrr and dplyr functions. Create column names to represent each data frame—since each has only one column, this is easy with setNames, but with more columns you could use dplyr::rename. Do a full-join across the whole list based on the original row names, and fill NAs with 0.
library(dplyr)
library(purrr)
l1 %>%
imap(~setNames(.x, .y)) %>%
map(tibble::rownames_to_column) %>%
reduce(full_join, by = "rowname") %>%
mutate_all(tidyr::replace_na, 0)
#> rowname df1 df2 df3 df4
#> 1 A 1 0 9 4
#> 2 B 2 2 0 0
#> 3 C 3 0 3 0
#> 4 D 0 6 6 0
#> 5 E 0 0 0 12
Yet another purrr and dplyr option could be:
l1 %>%
map2_dfr(.x = ., .y = names(.), ~ setNames(.x, .y) %>%
rownames_to_column()) %>%
group_by(rowname) %>%
summarise_all(~ ifelse(all(is.na(.)), NA, first(na.omit(.))))
rowname df1 df2 df3 df4
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 1 NA 9 4
2 B 2 2 NA NA
3 C 3 NA 3 NA
4 D NA 6 6 NA
5 E NA NA NA 12
library(purrr)
library(tibble)
library(dplyr)
Starting list of dataframes
lst <- list(df1 = data.frame(X.1 = as.character(1:2),
heading = letters[1:2]),
df2 = data.frame(X.32 = as.character(3:4),
another.topic = paste("Line ", 1:2)))
lst
#> $df1
#> X.1 heading
#> 1 1 a
#> 2 2 b
#>
#> $df2
#> X.32 another.topic
#> 1 3 Line 1
#> 2 4 Line 2
Expected "combined" dataframe, with new consistent variable names, and old variable names in the first row of each constituent dataframe.
#> id h1 h2
#> 1 df1 X.1 heading
#> 2 df1 1 a
#> 3 df1 2 b
#> 4 df2 X.32 another.topic
#> 5 df2 3 Line 1
#> 6 df2 4 Line 2
add_row requires "Name-value pairs, passed on to tibble(). Values can be defined only for columns that already exist in .data and unset columns will get an NA value."
Which is what I think I have achieved with this:
df_nms <-
map(lst, names) %>%
map(set_names)
#> $df1
#> X.1 heading
#> "X.1" "heading"
#>
#> $df2
#> X.32 another.topic
#> "X.32" "another.topic"
But I cannot tie up the last bit: using a purrr function to add the names to the head of each dataframe. I've tried numerous variations with map2 and pmap; the closest I can get at present is a new first row populated with NAs (if I treat add_row as a formula, prefixing it with ~ and removing the .y). I think I'm missing how to pass the name-value pairs to the add_row function.
map2(lst, df_nms, add_row(.x, .y, .before = 1)) %>%
map(set_names, c("h1", "h2")) %>%
map_dfr(bind_rows, .id = "id")
#> Error in add_row(.x, .y, .before = 1): object '.x' not found
A pointer to resolve this last step would be most appreciated.
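For what it's worth, one hedged pointer for that last step (a sketch, assuming the columns are character, e.g. created with stringsAsFactors = FALSE as shown further down, and that add_row() accepts name-value pairs spliced with !!!, since its dots are passed on to tibble()): wrap add_row in a formula and splice the named vector.
library(purrr)
library(tibble)
library(dplyr)
map2(lst, df_nms, ~ add_row(.x, !!!as.list(.y), .before = 1)) %>%  # splice the name-value pairs
  map(set_names, c("h1", "h2")) %>%
  bind_rows(.id = "id")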
Not quite sure how to do this via purrr map functions, but here is an alternative,
library(dplyr)
bind_rows(lapply(lst, function(i) {
  d1 <- as.data.frame(matrix(names(i), ncol = ncol(i)))
  rbind(d1, setNames(i, names(d1)))
}), .id = 'id')
# id V1 V2
#1 df1 X.1 heading
#2 df1 1 a
#3 df1 2 b
#4 df2 X.32 another.topic
#5 df2 3 Line 1
#6 df2 4 Line 2
Here's an approach using map, rbindlist from data.table and some base R functions:
library(purrr)
library(dplyr)
library(data.table)
map(lst, ~ as.data.frame(unname(rbind(colnames(.x),as.matrix(.x))))) %>%
rbindlist(idcol = "id")
# id V1 V2
#1: df1 X.1 heading
#2: df1 1 a
#3: df1 2 b
#4: df2 X.32 another.topic
#5: df2 3 Line 1
#6: df2 4 Line 2
Alternatively we could use map_df if we use colnames<-:
map_df(lst, ~ as.data.frame(rbind(colnames(.x),as.matrix(.x))) %>%
`colnames<-`(.,paste0("h",seq(1,dim(.)[2]))), .id = "id")
# id h1 h2
#1 df1 X.1 heading
#2 df1 1 a
#3 df1 2 b
#4 df2 X.32 another.topic
#5 df2 3 Line 1
#6 df2 4 Line 2
Key things here are:
Use as.matrix to get rid of the factor / character incompatibility.
Remove names with unname or set them with colnames<-.
Use the idcol = (rbindlist) or .id = (map_df) argument to get the names of the list as a column.
Here is a solution using data.table::rbindlist(). I altered your sample data a bit, setting stringsAsFactors to FALSE when creating the data.frames in lst.
#sample data
lst <- list(df1 = data.frame(X.1 = as.character(1:2),
heading = letters[1:2],
stringsAsFactors = FALSE), # !! <--
df2 = data.frame(X.32 = as.character(3:4),
another.topic = paste("Line ", 1:2),
stringsAsFactors = FALSE) # !! <--
)
DT <- data.table::rbindlist( lapply( lst, function(x) rbind( names(x), x ) ),
use.names = FALSE, idcol = "id" )
setnames(DT, names( lst[[1]] ), c("h1", "h2") )
# id h1 h2
# 1: df1 X.1 heading
# 2: df1 1 a
# 3: df1 2 b
# 4: df2 X.32 another.topic
# 5: df2 3 Line 1
# 6: df2 4 Line 2
I have a data frame with similar colnames.
I want to calculate rowMeans of columns A and B.
How can I do rowMeans between all A and B columns?
df <- data.frame(A1=c(1,2),A2=c(3,4),A3=c(5,6),A4=c(7,7),A5=c(8,8),A6=c(9,9))
colnames(df)<- c("A","A","B","B","B","C")
An option would be to split the selected columns by their (duplicated) names into a list with split.default and then get the rowMeans:
i1 <- grep("^(A|B)", names(df))
sapply(split.default(df[i1], names(df)[i1]), rowMeans)
# A B
#[1,] 2 6.666667
#[2,] 3 7.000000
We can iterate over the unique names, subset them from the original dataframe, and take rowMeans.
sapply(c("A", "B"), function(x) rowMeans(df[,colnames(df) == x]))
# A B
#[1,] 2 6.67
#[2,] 3 7.00
Another option using the tidyverse:
library(tidyverse)
df[, "rn"] <- 1:nrow(df)
df %>%
gather(letter, value, -rn) %>%
mutate(letter = str_extract(letter, "[:alpha:]")) %>%
group_by(letter, rn) %>%
summarize(sum = mean(value)) %>%
filter(letter %in% c("A", "B"))
#> # A tibble: 4 x 3
#> # Groups: letter [2]
#> letter rn sum
#> <chr> <int> <dbl>
#> 1 A 1 2
#> 2 A 2 3
#> 3 B 1 6.67
#> 4 B 2 7
You would simply need to subset the dataframe to the columns you want, and then apply the rowMeans() function.
df <- data.frame(A1=c(1,2),A2=c(3,4),A3=c(5,6),A4=c(7,7),A5=c(8,8),A6=c(9,9))
colnames(df)<- c("A","A","B","B","B","C")
rowMeans(df[,which(colnames(df) %in% c("A","B"))])
#[1] 4.8 5.4
However, as r2evans pointed out in the comment, you should avoid columns with the same names. You would just want to get the positions of the columns that mark the start and end of the range you need, and then subset that range.
colnames(df) <- c(paste0("A",1:2), paste0("B", 1:3), "C1")
strt <- which(colnames(df) == "A1")
end <- which(colnames(df) == "B3")
columrange <- strt:end
rowMeans(df[,columrange])
#[1] 4.8 5.4
There are many ways to subset by column names. If you didn't rename your columns in your example, you could use grepl() to find them:
df[,grepl("A",colnames(df)) | grepl("B",colnames(df))]
# A1 A2 B1 B2 B3
#1 1 3 5 7 8
#2 2 4 6 7 8
Here is my problem, which I can't solve:
Data:
df <- data.frame(f1=c("a", "a", "b", "b", "c", "c", "c"),
v1=c(10, 11, 4, 5, 0, 1, 2))
In this data.frame, f1 is a factor:
f1 v1
a 10
a 11
b 4
b 5
c 0
c 1
c 2
# What I want is: fetch the data for the levels that have exactly 2 elements (here a and b), then convert to a data.frame, for example:
a b
10 4
11 5
Thanks in advance!
I might be missing something simple here, but the below approach using dplyr (plus tidyr for spread) works.
library(dplyr)
library(tidyr)
nlevels = 2
df1 <- df %>%
add_count(f1) %>%
filter(n == nlevels) %>%
select(-n) %>%
mutate(rn = row_number()) %>%
spread(f1, v1) %>%
select(-rn)
This gives
# a b
# <int> <int>
#1 10 NA
#2 11 NA
#3 NA 4
#4 NA 5
Now, if you want to remove the NAs, we can do:
do.call("cbind.data.frame", lapply(df1, function(x) x[!is.na(x)]))
# a b
#1 10 4
#2 11 5
As we have filtered the dataframe to levels that have exactly nlevels observations, we have the same number of rows for each column in the final dataframe.
split might be useful here to split df$v1 into parts corresponding to df$f1. Since you are always extracting equal length chunks, it can then simply be combined back to a data.frame:
spl <- split(df$v1, df$f1)
data.frame(spl[lengths(spl)==2])
# a b
#1 10 4
#2 11 5
Or do it all in one call by combining this with Filter:
data.frame(Filter(function(x) length(x)==2, split(df$v1, df$f1)))
# a b
#1 10 4
#2 11 5
Here is a solution using unstack:
unstack(
droplevels(df[ave(df$v1, df$f1, FUN = function(x) length(x) == 2)==1,]),
v1 ~ f1)
# a b
# 1 10 4
# 2 11 5
A variant, similar to #thelatemail's solution:
data.frame(Filter(function(x) length(x) == 2, unstack(df,v1 ~ f1)))
My tidyverse solution would be:
library(tidyverse)
df %>%
group_by(f1) %>%
filter(n() == 2) %>%
mutate(i = row_number()) %>%
spread(f1, v1) %>%
select(-i)
# # A tibble: 2 x 2
# a b
# * <dbl> <dbl>
# 1 10 4
# 2 11 5
or mixing approaches:
as_tibble(keep(unstack(df,v1 ~ f1), ~length(.x) == 2))
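A hedged modernization of the spread() chain above, assuming tidyr >= 1.0 so that pivot_wider() is available:
library(dplyr)
library(tidyr)
df %>%
  group_by(f1) %>%
  filter(n() == 2) %>%
  mutate(i = row_number()) %>%   # within-group row index to pivot on
  ungroup() %>%
  pivot_wider(names_from = f1, values_from = v1) %>%
  select(-i)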
Using all base functions (but you should use the tidyverse):
x <- df  # work on a copy of the question's data
# Add count of instances
x$len <- ave(x$v1, x$f1, FUN = length)
# Filter, drop the count
x <- x[x$len==2, c('f1','v1')]
# Hacky pivot
result <- data.frame(
lapply(unique(x$f1), FUN = function(y) x$v1[x$f1==y])
)
colnames(result) <- unique(x$f1)
> result
a b
1 10 4
2 11 5
I'd code it like this; maybe it helps you:
library(reshape2)
library(dplyr)
aa = data.frame(v1=c('a','a','b','b','c','c','c'),f1=c(10,11,4,5,0,1,2))
cc = aa %>% group_by(v1) %>% summarise(id = length((v1)))
dd= merge(aa,cc) #get the level
ee = dd[dd$id==2,] #select the levels whose count equals 2
ee$id = rep(c(1,2),nrow(ee)/2) # reset index like (1,2,1,2)
dcast(ee, id~v1,value.var = 'f1')
all done!
I am struggling with reordering a dataFrame in R.
My dataFrame has data coming from two different sensors. So in the beginning every column has a name with the syntax "sensor number.sample number". The rowname is a coordinate of each sample.
Sadly the columns are not ordered with an ascending sample number.
How can I make an automatic ordering where number 2 comes after number 1, and not number 10?
With correctly ordered columns I would like to cut all columns of the second sensor and append them under the rows of the first sensor. This is also tricky, as the number of columns per sensor varies in reality.
To distinguish between the two sensors I would add a suffix "a" or "b" to the new rownames.
My problem here is that I know rbind, but it requires identical column names, which I cannot provide. I would also need to select the columns manually, as I have no clue how to automatically select all columns of the second sensor.
My idea for the moment is to make subsets for each sensor, rename the columns and then use rbind on both subsets (a sketch of this idea follows the example data below). Is this a good idea?
I could then modify the rownames with paste().
I now present simplified frames as the original is quite big. So the numbers (c(1:3)) are just exemplary.
This is how my dataFrame looks at the beginning:
myDf = data.frame(a.10= c(1:3),a.11= c(1:3),a.12= c(1:3),a.13= c(1:3),a.2= c(1:3),a.3= c(1:3),a.4= c(1:3),a.5= c(1:3),a.6= c(1:3),a.7= c(1:3),a.8= c(1:3),a.9= c(1:3),
b.1= c(1:3),b.10= c(1:3),b.11= c(1:3),b.2= c(1:3),b.3= c(1:3),b.4= c(1:3),b.5= c(1:3),b.6= c(1:3),b.7= c(1:3),b.8= c(1:3),b.9= c(1:3))
My goal is to transform the dataFrame that is looks like that:
desiredDf =data.frame(n9=rep(c(1:3),2), n10=rep(c(1:3),2), n11=rep(c(1:3),2), n12=c(c(1:3),NA, NA, NA), n13=c(c(1:3), NA, NA, NA))
rownames(desiredDf)<-(c("1a","2a","3a","1b","2b","3b"))
Thank you very much!
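A minimal base-R sketch of that subset/rename/rbind idea (an illustration under the assumption that every column name follows the letter.number pattern of myDf; samples missing from one sensor are padded with NA, and the object names are mine):
sensor <- sub("\\..*", "", names(myDf))              # sensor letter: "a" or "b"
samp   <- as.integer(sub(".*\\.", "", names(myDf)))  # numeric sample index
blocks <- lapply(split(seq_along(myDf), sensor), function(idx) {
  block <- myDf[idx[order(samp[idx])]]               # this sensor's columns, ascending sample index
  names(block) <- paste0("n", sort(samp[idx]))
  block
})
all_cols <- paste0("n", sort(unique(samp)))          # n1 ... n13
blocks <- lapply(blocks, function(b) {
  miss <- setdiff(all_cols, names(b))
  if (length(miss)) b[miss] <- NA                    # pad the samples this sensor lacks
  b[all_cols]
})
out <- do.call(rbind, blocks)
rownames(out) <- unlist(lapply(names(blocks), function(s) paste0(rownames(myDf), s)))
out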
Here is an option.
library(tidyverse)
myDF2 <- myDf %>% gather(measure, result, a.10:b.9) %>%
separate(measure, into = c("letter", "number"), sep = "\\.") %>%
group_by(letter, number)%>%
mutate(n = row_number()) %>%
unite(col, n, letter, sep = "") %>%
ungroup() %>%
arrange(as.numeric(number))%>%
mutate(number = paste0("n", number))%>%
mutate(number = factor(number, levels = unique(number)))%>%
spread(number, result)%>%
arrange(col)
row.names(myDF2) <- myDF2$col
myDF2$col <- NULL
Convert the row names to a column, reshape into long form and separate the key (i.e. the original column names) into columns group and no, converting the latter to numeric. Sort, reshape back to wide form, sort again, combine the rowname and group, and preface each column name with n.
library(dplyr)
library(tibble)
library(tidyr)
myDf %>%
rownames_to_column %>%
gather(key, value, -rowname) %>%
separate(key, c("group", "no"), convert = TRUE) %>%
arrange(group, no) %>%
spread(no, value) %>%
arrange(group, rowname) %>%
unite(rowname, rowname, group, sep = "") %>%
column_to_rownames %>%
rename_all(~ paste0("n", .))
giving:
n1 n2 n3 n4 n5 n6 n7 n8 n9 n10 n11 n12 n13
1a NA 1 1 1 1 1 1 1 1 1 1 1 1
2a NA 2 2 2 2 2 2 2 2 2 2 2 2
3a NA 3 3 3 3 3 3 3 3 3 3 3 3
1b 1 1 1 1 1 1 1 1 1 1 1 NA NA
2b 2 2 2 2 2 2 2 2 2 2 2 NA NA
3b 3 3 3 3 3 3 3 3 3 3 3 NA NA
Note
Above we used this for myDf, the input.
myDf <-
structure(list(a.10 = 1:3, a.11 = 1:3, a.12 = 1:3, a.13 = 1:3,
a.2 = 1:3, a.3 = 1:3, a.4 = 1:3, a.5 = 1:3, a.6 = 1:3, a.7 = 1:3,
a.8 = 1:3, a.9 = 1:3, b.1 = 1:3, b.10 = 1:3, b.11 = 1:3,
b.2 = 1:3, b.3 = 1:3, b.4 = 1:3, b.5 = 1:3, b.6 = 1:3, b.7 = 1:3,
b.8 = 1:3, b.9 = 1:3), class = "data.frame", row.names = c(NA,
-3L))
I often find myself in a situation where I have a table that contains multiple groups of wide columns, like so:
replicate groupA VA1 VA2 groupB VB1 VB2
1 1 a 0.3429166 -2.30336406 f 0.05363582 1.6454078
2 2 b -1.3183732 -0.13516849 g -0.42586417 0.1541541
3 3 c -0.7908358 -0.10746447 h 1.05134242 1.4297350
4 4 d -0.9963677 -1.82557058 i -1.14532536 1.0815733
5 5 e -1.3634609 0.04385812 j -0.65643595 -0.1452877
And I'd like to turn the columns into one long table, like so:
replicate group key value
1 1 a V1 0.34291665
2 2 b V1 -1.31837322
3 3 c V1 -0.79083580
4 4 d V1 -0.99636772
5 5 e V1 -1.36346088
6 1 a V2 -2.30336406
7 2 b V2 -0.13516849
8 3 c V2 -0.10746447
9 4 d V2 -1.82557058
10 5 e V2 0.04385812
11 1 f V1 0.05363582
12 2 g V1 -0.42586417
13 3 h V1 1.05134242
14 4 i V1 -1.14532536
15 5 j V1 -0.65643595
16 1 f V2 1.64540784
17 2 g V2 0.15415408
18 3 h V2 1.42973499
19 4 i V2 1.08157329
20 5 j V2 -0.14528774
I can do this by selecting the two groups of columns individually, tidying, and then rbinding together (code below). However, this approach doesn't seem particularly elegant, and it becomes cumbersome if there are more than two groups of columns. I'm wondering whether there's a more elegant approach, using a single pipe chain of data transformations.
The fundamental question here is: How do we automate the process of breaking the table into groups of columns, tidying those, and then combining back together.
My current code:
library(dplyr)
library(tidyr)
# generate example code
df_wide <- data.frame(replicate = 1:5,
groupA = letters[1:5],
VA1 = rnorm(5),
VA2 = rnorm(5),
groupB = letters[6:10],
VB1 = rnorm(5),
VB2 = rnorm(5))
# tidy columns with A in the name
dfA <- select(df_wide, replicate, groupA, VA1, VA2) %>%
gather(key, value, VA1, VA2) %>%
mutate(key = case_when(key == "VA1" ~ "V1",
key == "VA2" ~ "V2")) %>%
select(replicate, group = groupA, key, value)
# tidy columns with B in the name
dfB <- select(df_wide, replicate, groupB, VB1, VB2) %>%
gather(key, value, VB1, VB2) %>%
mutate(key = case_when(key == "VB1" ~ "V1",
key == "VB2" ~ "V2")) %>%
select(replicate, group = groupB, key, value)
# combine
df_long <- rbind(dfA, dfB)
Note: Similar questions have been asked here and here, but I think the accepted answer shows that this here is a subtly different problem.
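For reference, here is one hedged sketch of such a single chain, using tidyr's pivot_longer (tidyr >= 1.0) and dplyr's rename_with (dplyr >= 1.0); the regex that retags the column names is an assumption based on the naming pattern above, and the row order differs from the output shown:
library(dplyr)
library(tidyr)
df_wide %>%
  # retag names so one separator splits them all: groupA -> A_group, VA1 -> A_V1, VB2 -> B_V2
  rename_with(~ sub("^(group|V)([AB])([0-9]*)$", "\\2_\\1\\3", .x), -replicate) %>%
  # ".value" rebuilds the group/V1/V2 columns, "set" holds the A/B tag
  pivot_longer(-replicate, names_to = c("set", ".value"), names_sep = "_") %>%
  pivot_longer(starts_with("V"), names_to = "key", values_to = "value") %>%
  select(replicate, group, key, value)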
1.
Although the question asked for a tidyverse solution, there is a convenient option with melt from data.table, which also can take multiple patterns in the measure argument.
library(data.table)
setnames(melt(melt(setDT(df1), measure = patterns('group', 'VA', 'VB')),
id.var = 1:3)[, -4, with = FALSE], 2:3, c('key', 'group'))[]
2.a
With tidyverse, we can subset the dataset into a list, then loop through the list with map_df, converting each element to 'long' format with gather, to get a single data.frame:
library(tidyverse)
list(df1[1:4], df1[c(1,5:7)]) %>%
map_df(~gather(., key, value, 3:4) %>%
{names(.)[2] <- 'group';.}) %>%
mutate(key = sub('(.).(.)', '\\1\\2', key))
# replicate group key value
#1 1 a V1 0.34291660
#2 2 b V1 -1.31837320
#3 3 c V1 -0.79083580
#4 4 d V1 -0.99636770
#5 5 e V1 -1.36346090
#6 1 a V2 -2.30336406
#7 2 b V2 -0.13516849
#8 3 c V2 -0.10746447
#9 4 d V2 -1.82557058
#10 5 e V2 0.04385812
#11 1 f V1 0.05363582
#12 2 g V1 -0.42586417
#13 3 h V1 1.05134242
#14 4 i V1 -1.14532536
#15 5 j V1 -0.65643595
#16 1 f V2 1.64540780
#17 2 g V2 0.15415410
#18 3 h V2 1.42973500
#19 4 i V2 1.08157330
#20 5 j V2 -0.14528770
2.b
If we need to split based on the occurrence of 'group':
split.default(df1[-1], cumsum(grepl('group', names(df1)[-1]))) %>%
map(~bind_cols(df1[1], .)) %>%
map_df(~gather(., key, value, 3:4) %>%
{names(.)[2] <- 'group';.}) %>%
mutate(key = sub('(.).(.)', '\\1\\2', key))
2.c
This uses rename_at instead of a names assignment, in the spirit of tidyverse options:
df1[-1] %>%
split.default(cumsum(grepl('group', names(df1)[-1]))) %>%
map_df(~bind_cols(df1[1], .) %>%
gather(., key, value, 3:4) %>%
rename_at(2, funs(substring(.,1, 5))))
NOTE:
1) All of 2.a, 2.b and 2.c use tidyverse functions.
2) They don't depend on the substring 'A' or 'B' in the column names.
3) They assume the pattern in the OP's dataset is a 'group' column followed by its value columns.
1) This solution consists of:
a gather which generates the desired number of rows,
a mutate which combines the groupA and groupB columns and changes the key column to that requested, and
a select which picks out the columns wanted.
First gather the columns whose names start with V and then create a new group column from groupA and groupB, choosing groupA if the key has an A in it and groupB if the key has a B in it. (We used mapply(switch, ...) here for easy extension to the 3+ group case, but we could have used an ifelse, viz. ifelse(grepl("A", key), as.character(groupA), as.character(groupB)), given that we have only two groups.) The mutate also reduces the key names from VA1 to V1, etc., and finally we select out the columns desired.
DF %>%
gather(key, value, starts_with("V")) %>%
mutate(group = mapply(switch, gsub("[^AB]", "", key), A = groupA, B = groupB),
key = sub("[AB]", "", key)) %>%
select(replicate, group, key, value)
giving:
replicate group key value
1 1 a V1 0.34291660
2 2 b V1 -1.31837320
3 3 c V1 -0.79083580
4 4 d V1 -0.99636770
5 5 e V1 -1.36346090
6 1 a V2 -2.30336406
7 2 b V2 -0.13516849
8 3 c V2 -0.10746447
9 4 d V2 -1.82557058
10 5 e V2 0.04385812
11 1 f V1 0.05363582
12 2 g V1 -0.42586417
13 3 h V1 1.05134242
14 4 i V1 -1.14532536
15 5 j V1 -0.65643595
16 1 f V2 1.64540780
17 2 g V2 0.15415410
18 3 h V2 1.42973500
19 4 i V2 1.08157330
20 5 j V2 -0.14528770
2) Another approach would be to split the columns into groups such that all columns in a group have the same name after removing A and B from their names. Perform unlist on each such group to reduce the list to a list of plain vectors and convert that list to a data.frame. Finally gather the V columns and rearrange. Note that rownames_to_column is from the tibble package.
DF %>%
as.list %>%
split(sub("[AB]", "", names(.))) %>%
lapply(unlist) %>%
as.data.frame %>%
rownames_to_column %>%
gather(key, value, starts_with("V")) %>%
arrange(gsub("[^AB]", "", rowname), key) %>%
select(replicate, group, key, value)
2a) If the row order is not important then the rownames_to_column, arrange and select lines could be omitted shortening it to this:
DF %>%
as.list %>%
split(sub("[AB]", "", names(.))) %>%
lapply(unlist) %>%
as.data.frame %>%
gather(key, value, starts_with("V"))
Solutions (2) and (2a) could easily be converted to base-only solutions by replacing the gather with the appropriate reshape from base as in the second reshape, i.e. the one producing d2, in (3).
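For example, a hedged base-only rewrite of (2a) along those lines (it borrows the reshape pattern shown in (3) below; the object names d0 and d2_base are mine, and the helper id column and row order are ignored, as in (2a)):
d0 <- as.data.frame(lapply(split(as.list(DF), sub("[AB]", "", names(DF))), unlist))  # (2a)'s front end
d2_base <- reshape(d0, dir = "long",
                   varying = list(grep("^V", names(d0))), v.names = "value",
                   times = grep("^V", names(d0), value = TRUE), timevar = "key")
d2_base[c("replicate", "group", "key", "value")]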
3) Although the question asked for a tidyverse solution, there is a fairly convenient base solution consisting of two reshape calls. The varying produced by the split is list(group = c("groupA", "groupB"), V1 = c("VA1", "VB1"), V2 = c("VA2", "VB2")), that is, it matches up the ith column in each set of columns.
varying <- split(names(DF)[-1], gsub("[AB]", "", names(DF))[-1])
d <- reshape(DF, dir = "long", varying = varying, v.names = names(varying))
d <- subset(d, select = -c(time, id))
d2 <- reshape(d, dir = "long", varying = list(grep("V", names(d))), v.names = "value",
timevar = "key")
d2 <- subset(d2, select = c(replicate, group, key, value))
d2
Note: The input in reproducible form is:
DF <- structure(list(replicate = 1:5, groupA = structure(1:5, .Label = c("a",
"b", "c", "d", "e"), class = "factor"), VA1 = c(0.3429166, -1.3183732,
-0.7908358, -0.9963677, -1.3634609), VA2 = c(-2.30336406, -0.13516849,
-0.10746447, -1.82557058, 0.04385812), groupB = structure(1:5, .Label = c("f",
"g", "h", "i", "j"), class = "factor"), VB1 = c(0.05363582, -0.42586417,
1.05134242, -1.14532536, -0.65643595), VB2 = c(1.6454078, 0.1541541,
1.429735, 1.0815733, -0.1452877)), .Names = c("replicate", "groupA",
"VA1", "VA2", "groupB", "VB1", "VB2"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))