There is a classic approach to merging multiple data.frames in a list simultaneously.
The output, however, is somewhat disordered.
Example
> L
[[1]]
a b c d e
1 5 2 4 10 1
[[2]]
a b c d e
1 6 7 4 6 1
[[3]]
a b c d
1 7 3 5 5
[[4]]
a b c d
1 5 2 6 5
[[5]]
a b c d
1 4 4 2 8
The rows of the output of Reduce() come out in the order of list elements 5, 1, 4, 2, 3, which could suggest that the reduction somehow works from the outside in.
> Reduce(function(...) merge(..., all=TRUE), L)
> Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L) # same
a b c d e
1 4 4 2 8 NA
2 5 2 4 10 1
3 5 2 6 5 NA
4 6 7 4 6 1
5 7 3 5 5 NA
Is there a way to slightly change the code so that the rows keep the order of the list, like this?
# a b c d e
# 1 5 2 4 10 1
# 2 6 7 4 6 1
# 3 7 3 5 5 NA
# 4 5 2 6 5 NA
# 5 4 4 2 8 NA
Data
L <- list(structure(list(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 7L, b = 3L, c = 5L, d = 5L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 5L, b = 2L, c = 6L, d = 5L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 4L, b = 4L, c = 2L, d = 8L), class = "data.frame", row.names = c(NA,
-1L)))
This happens because of the sort argument of merge:
sort - logical. Should the result be sorted on the by columns?
So you can set sort = FALSE instead:
Reduce(function(...) merge(..., all = TRUE, sort = FALSE), L)
# a b c d e
# 1 5 2 4 10 1
# 2 6 7 4 6 1
# 3 7 3 5 5 NA
# 4 5 2 6 5 NA
# 5 4 4 2 8 NA
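To see the effect of sort in isolation, the sketch below runs the same reduction with both settings on a compact rebuild of L and compares column a; the expected orders are taken from the two printed outputs above. One caveat: the merge help page describes the row order under sort = FALSE as unspecified, so this relies on current behaviour rather than a documented guarantee.

```r
# Compact rebuild of the list L from the question
L <- list(
  data.frame(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L),
  data.frame(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L),
  data.frame(a = 7L, b = 3L, c = 5L, d = 5L),
  data.frame(a = 5L, b = 2L, c = 6L, d = 5L),
  data.frame(a = 4L, b = 4L, c = 2L, d = 8L)
)

# Default: merge() sorts the result on the common (by) columns
sorted <- Reduce(function(x, y) merge(x, y, all = TRUE), L)

# sort = FALSE: rows are left in the order merge() produced them
unsorted <- Reduce(function(x, y) merge(x, y, all = TRUE, sort = FALSE), L)

sorted$a    # 4 5 5 6 7
unsorted$a  # 5 6 7 5 4
```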
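Here, I use bind_rows from the dplyr package instead of merge. Note that bind_rows stacks the rows rather than joining them on common columns; it gives the same result here because no two rows share the same values in those columns.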
L <- list(structure(list(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 7L, b = 3L, c = 5L, d = 5L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 5L, b = 2L, c = 6L, d = 5L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 4L, b = 4L, c = 2L, d = 8L), class = "data.frame", row.names = c(NA,
-1L)))
library(dplyr)
Reduce(bind_rows, L)
#> a b c d e
#> 1 5 2 4 10 1
#> 2 6 7 4 6 1
#> 3 7 3 5 5 NA
#> 4 5 2 6 5 NA
#> 5 4 4 2 8 NA
Created on 2019-02-09 by the reprex package (v0.2.1.9000)
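If you'd rather avoid the dplyr dependency, a base-R stand-in for bind_rows is to pad each data frame with NA for any columns it lacks and then rbind them. This is a sketch of that idea, not dplyr's own implementation; like bind_rows it stacks rows and does not combine rows that merge would match.

```r
# Compact rebuild of the list L from the question
L <- list(
  data.frame(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L),
  data.frame(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L),
  data.frame(a = 7L, b = 3L, c = 5L, d = 5L),
  data.frame(a = 5L, b = 2L, c = 6L, d = 5L),
  data.frame(a = 4L, b = 4L, c = 2L, d = 8L)
)

# Union of all column names, in order of first appearance
all_cols <- unique(unlist(lapply(L, names)))

# Add any missing columns as NA and put the columns in a common order
padded <- lapply(L, function(d) {
  missing_cols <- setdiff(all_cols, names(d))
  if (length(missing_cols)) d[missing_cols] <- NA
  d[all_cols]
})

# rbind keeps the list order, so no sorting happens
res <- do.call(rbind, padded)
res
```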
I want R to read in an Excel file containing formulas and produce their computed output.
For example, if I provide the following as input:
a b c
2 5 =a+b
3 2 =a+b
3 3 =a+b
6 4 =a+b
4 2 =a+b
I should get this output:
a b c
2 5 7
3 2 5
3 3 6
6 4 10
4 2 6
One option is to remove the = with sub and evaluate only the first element (since the formula is the same for all rows):
df1$c <- with(df1, eval(parse(text= sub("=", "", c[1]))))
df1$c
#[1] 7 5 6 10 6
data
df1 <- structure(list(a = c(2L, 3L, 3L, 6L, 4L), b = c(5L, 2L, 3L, 4L,
2L), c = c("=a+b", "=a+b", "=a+b", "=a+b", "=a+b")), .Names = c("a",
"b", "c"), class = "data.frame", row.names = c(NA, -5L))
I have a data frame and need to remove every row that contains the maximum value of any column.
Example
A B C
1 2 3 5
2 4 1 1
3 1 4 3
4 2 1 1
So the output is:
A B C
4 2 1 1
Is there any quick way to do this?
We can do this with %in%
df[!seq_len(nrow(df)) %in% sapply(df, which.max),]
# A B C
#4 2 1 1
If a column's maximum is tied across several rows, which.max only finds the first of them; to drop every such row, use
df[!Reduce(`|`, lapply(df, function(x) x == max(x))),]
Another option is negative indexing:
df[-sapply(df, which.max),]
# A B C
#4 2 1 1
DATA
df = structure(list(A = c(2L, 4L, 1L, 2L), B = c(3L, 1L, 4L, 1L),
C = c(5L, 1L, 3L, 1L)), .Names = c("A", "B", "C"),
class = "data.frame", row.names = c(NA,-4L))
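To see why the tie-aware version matters, here is a sketch on hypothetical data where column A reaches its maximum (4) in two rows: which.max() drops only the first of those rows, while the Reduce() version drops both.

```r
# Hypothetical data: column A has its maximum (4) in rows 2 and 4
df_ties <- data.frame(A = c(2L, 4L, 1L, 4L, 1L),
                      B = c(3L, 1L, 4L, 1L, 2L),
                      C = c(5L, 1L, 3L, 1L, 2L))

# which.max() reports only the first maximum per column, so row 4 survives
first_max <- df_ties[-sapply(df_ties, which.max), ]

# Flag every cell equal to its column maximum, then OR the flags across columns
is_max <- Reduce(`|`, lapply(df_ties, function(x) x == max(x)))
all_max <- df_ties[!is_max, ]

rownames(first_max)  # "4" "5"
rownames(all_max)    # "5"
```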
File 1:
Ele A B C D
Es 1 2 3 4
Ep 2 4 3 4
Ek 1 9 3 8
File 2:
A 1
B 2
C 3
D 5
I need each element in column A of file 1 to be multiplied by the value assigned to A in file 2, and likewise for B, C, and D. I know how to do matrix multiplication in R, but I don't think this is a matrix multiplication problem. Any help would be greatly appreciated. Thanks
You could try
indx <- df2$Col1
df1[indx]*df2$Col2[col(df1[indx])]
# A B C D
#1 1 4 9 20
#2 2 8 9 20
#3 1 18 9 40
Or you could use sweep
sweep(df1[indx], 2, df2$Col2, '*')
# A B C D
#1 1 4 9 20
#2 2 8 9 20
#3 1 18 9 40
data
df1 <- structure(list(Ele = c("Es", "Ep", "Ek"), A = c(1L, 2L, 1L),
B = c(2L, 4L, 9L), C = c(3L, 3L, 3L), D = c(4L, 4L, 8L)),
.Names = c("Ele", "A", "B", "C", "D"), class = "data.frame",
row.names = c(NA, -3L))
df2 <- structure(list(Col1 = c("A", "B", "C", "D"), Col2 = c(1L, 2L,
3L, 5L)), .Names = c("Col1", "Col2"), class = "data.frame",
row.names = c(NA, -4L))
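Both answers above return only the multiplied columns. If you want to update file 1 in place and keep the Ele column, one sketch is to build a named lookup from df2 and use Map(); matching by name also means the rows of df2 don't have to be in the same order as the columns of df1. The as.character() call is there in case Col1 was read in as a factor.

```r
# Rebuild of the question's data
df1 <- data.frame(Ele = c("Es", "Ep", "Ek"), A = c(1L, 2L, 1L),
                  B = c(2L, 4L, 9L), C = c(3L, 3L, 3L), D = c(4L, 4L, 8L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Col1 = c("A", "B", "C", "D"), Col2 = c(1L, 2L, 3L, 5L),
                  stringsAsFactors = FALSE)

# Named lookup: column name -> multiplier
mult <- setNames(df2$Col2, as.character(df2$Col1))

# Multiply each named column by its own factor and write it back,
# leaving the Ele column untouched
df1[names(mult)] <- Map(`*`, df1[names(mult)], mult)
df1
```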
I have been stuck on this issue for a while now and need some help.
I am reading the following files (there can be more than 3 of them) into data frames.
My input files look like the following:
file1:
someName someMOD someID
A T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P) 1
B S495(P) 2
C S162(P),Q159(D) 3
D S45(P),C47(C),S48(P),S26(P) 4
E S18(P) 5
file2:
someName someMOD someID
C S162(P),Q159(D) 3
D S45(P),C47(C),S48(P),S26(P) 4
F S182(P) 6
E S18(P) 5
Z Q100(P) 9
A T754(P),M691(O),S694(P),S739(P),S740(P) 1
file3:
someName someMOD someID
A T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P) 1
B S495(P) 2
D S45(P),C47(C),S48(P),S26(P) 4
E S18(P) 5
F S182(P) 6
L Z182(P) 8
C S162(P),Q159(D) 3
My Code:
library(gdata)  # provides read.xls()
fileList <- dir(pattern = "\\.xls$")
a <- list()
i <- 1
for (f in seq_along(fileList)) {
  fileName <- fileList[f]
  X <- read.xls(fileName)
  if (grepl("Drug_Rep", fileName)) {
    a[[i]] <- X
    i <- i + 1
  }
  # otherwise, don't do anything
}
# Now I want to merge my data frames
mymerge <- function(x, y)
  merge(x, y, by = c("someName", "someID"), all = TRUE)
Reduce(mymerge, a)  # passing my list of data frames 'a'
I did dput() on my 'a' list:
list(structure(list(someName = structure(c(1L, 2L, 4L, 5L, 6L,
7L, 3L), .Label = c("A", "B", "C", "D", "E", "F", "L"), class = "factor"),
someMOD = structure(c(6L, 5L, 4L, 2L, 3L, 7L, 1L), .Label = c("S162(P),Q159(D)",
"S18(P)", "S182(P)", "S45(P),C47(C),S48(P),S26(P)", "S495(P)",
"T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P)",
"Z182(P)"), class = "factor"), someID = c(1L, 2L, 4L, 5L,
6L, 8L, 3L)), .Names = c("someName", "someMOD", "someID"), class = "data.frame", row.names = c(NA,
-7L)), structure(list(someName = structure(1:5, .Label = c("A",
"B", "C", "D", "E"), class = "factor"), someMOD = structure(c(5L,
4L, 1L, 3L, 2L), .Label = c("S162(P),Q159(D)", "S18(P)", "S45(P),C47(C),S48(P),S26(P)",
"S495(P)", "T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P)"
), class = "factor"), someID = 1:5), .Names = c("someName", "someMOD",
"someID"), class = "data.frame", row.names = c(NA, -5L)), structure(list(
someName = structure(c(2L, 3L, 5L, 4L, 6L, 1L), .Label = c("A",
"C", "D", "E", "F", "Z"), class = "factor"), someMOD = structure(c(2L,
5L, 4L, 3L, 1L, 6L), .Label = c("Q100(P)", "S162(P),Q159(D)",
"S18(P)", "S182(P)", "S45(P),C47(C),S48(P),S26(P)", "T754(P),M691(O),S694(P),S739(P),S740(P)"
), class = "factor"), someID = c(3L, 4L, 6L, 5L, 9L, 1L)), .Names = c("someName",
"someMOD", "someID"), class = "data.frame", row.names = c(NA,
-6L)))
What is my mistake in populating a list? Any help is really appreciated.
I am just trying to get output like the following:
The problem with the code I gave you before is that merge gets confused when non-key columns share a name and you're merging more than 3 datasets: the .x/.y suffixes it adds to the duplicated someMOD columns eventually clash with each other. You'll have to rename your someMOD columns so they don't clash. A for loop works as well as anything for this purpose.
dupvars <- which(!names(a[[1]]) %in% c("someName", "someID"))
for(i in seq_along(a))
names(a[[i]])[dupvars] <- paste0(names(a[[i]])[dupvars], i)
# and then merge
Reduce(mymerge, a)
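Here is a self-contained check of the renaming step, using four small hypothetical data frames in place of the Excel files; four is enough to reach the point where the suffixed names would otherwise clash.

```r
# Four small hypothetical data frames sharing the key columns someName/someID
a <- list(
  data.frame(someName = c("A", "B"), someMOD = c("S1", "S2"), someID = 1:2,
             stringsAsFactors = FALSE),
  data.frame(someName = c("A", "C"), someMOD = c("S3", "S4"), someID = c(1L, 3L),
             stringsAsFactors = FALSE),
  data.frame(someName = "B", someMOD = "S5", someID = 2L, stringsAsFactors = FALSE),
  data.frame(someName = "C", someMOD = "S6", someID = 3L, stringsAsFactors = FALSE)
)
mymerge <- function(x, y) merge(x, y, by = c("someName", "someID"), all = TRUE)

# Suffix every non-key column with the index of its data frame
dupvars <- which(!names(a[[1]]) %in% c("someName", "someID"))
for (i in seq_along(a))
  names(a[[i]])[dupvars] <- paste0(names(a[[i]])[dupvars], i)

res <- Reduce(mymerge, a)
names(res)  # someName someID someMOD1 someMOD2 someMOD3 someMOD4
```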
Perhaps the problem is that you're actually not trying to merge in the standard sense, but to reshape. In this case, you can rbind all the data.frames together after adding a "time" variable, and use dcast from "reshape2" to get what you're after:
Add a "time" variable and rbind the data.frames together
temp <- do.call(rbind,
lapply(seq_along(a),
function(x) data.frame(a[[x]], time = x)))
head(temp)
# someName someMOD someID time
# 1 A T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P) 1 1
# 2 B S495(P) 2 1
# 3 D S45(P),C47(C),S48(P),S26(P) 4 1
# 4 E S18(P) 5 1
# 5 F S182(P) 6 1
# 6 L Z182(P) 8 1
Transform the data.frame from a "long" format to a "wide" format
library(reshape2)
dcast(temp, someName + someID ~ time, value.var="someMOD")
# someName someID 1
# 1 A 1 T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P)
# 2 B 2 S495(P)
# 3 C 3 S162(P),Q159(D)
# 4 D 4 S45(P),C47(C),S48(P),S26(P)
# 5 E 5 S18(P)
# 6 F 6 S182(P)
# 7 L 8 Z182(P)
# 8 Z 9 <NA>
# 2
# 1 T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P)
# 2 S495(P)
# 3 S162(P),Q159(D)
# 4 S45(P),C47(C),S48(P),S26(P)
# 5 S18(P)
# 6 <NA>
# 7 <NA>
# 8 <NA>
# 3
# 1 T754(P),M691(O),S694(P),S739(P),S740(P)
# 2 <NA>
# 3 S162(P),Q159(D)
# 4 S45(P),C47(C),S48(P),S26(P)
# 5 S18(P)
# 6 S182(P)
# 7 <NA>
# 8 Q100(P)
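If you'd rather not depend on reshape2, base R's stats::reshape() can do the same long-to-wide step. This is a swapped-in alternative, shown on a small hypothetical version of temp; note that the wide columns come out named someMOD.1, someMOD.2, and so on, rather than 1, 2.

```r
# Small hypothetical long-format data: one row per (someName, someID, time)
temp <- data.frame(
  someName = c("A", "B", "A", "C"),
  someMOD  = c("S1", "S2", "S3", "S4"),
  someID   = c(1L, 2L, 1L, 3L),
  time     = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# One row per someName/someID pair; someMOD is spread into one column per time
wide <- reshape(temp, idvar = c("someName", "someID"), timevar = "time",
                direction = "wide")
wide
```

Combinations that are missing for a given time (here B at time 2 and C at time 1) come out as NA, just as in the dcast output.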