Match and replace in R

I would like to match row names from table 1 with column names from table 2 and then replace them with corresponding names from column n in table 1.
table1
x y n
CAAGCCAAGCTAGATA 5 6 um
AATCCCAAGTGACACC 4 1 cs
AATCTCAAGTCACACC 4 1 cs
table2
CAAGCCAAGCTAGATA AATCCCAAGTGACACC AATCTCAAGTCACACC
a 1 3 5
b 2 3 4
c 6 3 6
d 8 3 5
result
um cs cs
a 1 3 5
b 2 3 4
c 6 3 6
d 8 3 5

One option is to build a named vector from table 1 and use it as a lookup table for the column names of table 2:
names(df2) <- setNames(df1$n, row.names(df1))[colnames(df2)]
df2
# um cs cs
#a 1 3 5
#b 2 3 4
#c 6 3 6
#d 8 3 5
data
df1 <- structure(list(x = c(5L, 4L, 4L), y = c(6L, 1L, 1L), n = c("um",
"cs", "cs")), class = "data.frame", row.names = c("CAAGCCAAGCTAGATA",
"AATCCCAAGTGACACC", "AATCTCAAGTCACACC"))
df2 <- structure(list(CAAGCCAAGCTAGATA = c(1L, 2L, 6L, 8L), AATCCCAAGTGACACC = c(3L,
3L, 3L, 3L), AATCTCAAGTCACACC = c(5L, 4L, 6L, 5L)),
class = "data.frame", row.names = c("a",
"b", "c", "d"))
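
As a variation on the named-vector lookup, here is a minimal sketch using match(). It keeps the original column name whenever a column of df2 has no matching row name in df1. That fallback, the extra column called unmatched, and the helper names idx and new_names are assumptions added for illustration; they are not part of the original answer.

```r
# Same data as above, rebuilt so the example is self-contained
df1 <- data.frame(
  x = c(5L, 4L, 4L), y = c(6L, 1L, 1L), n = c("um", "cs", "cs"),
  row.names = c("CAAGCCAAGCTAGATA", "AATCCCAAGTGACACC", "AATCTCAAGTCACACC"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  CAAGCCAAGCTAGATA = c(1L, 2L, 6L, 8L),
  AATCCCAAGTGACACC = c(3L, 3L, 3L, 3L),
  AATCTCAAGTCACACC = c(5L, 4L, 6L, 5L),
  unmatched = c(0L, 0L, 0L, 0L),  # hypothetical column with no entry in df1
  row.names = c("a", "b", "c", "d")
)

# Position of each df2 column name among the row names of df1 (NA if absent)
idx <- match(colnames(df2), rownames(df1))

# Use the label from df1$n where a match exists, otherwise keep the old name
new_names <- ifelse(is.na(idx), colnames(df2), df1$n[idx])
names(df2) <- new_names
names(df2)
```

Note that the result contains the duplicated name cs, just like the expected output in the question, so later selection by name would only reach the first of those columns.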

Related

How to merge multiple data.frames with Reduce and get an ordered output?

There is a classic approach for merging multiple data.frames stored in a list in one step.
The output, however, is somewhat disordered.
Example
> L
[[1]]
a b c d e
1 5 2 4 10 1
[[2]]
a b c d e
1 6 7 4 6 1
[[3]]
a b c d
1 7 3 5 5
[[4]]
a b c d
1 5 2 6 5
[[5]]
a b c d
1 4 4 2 8
The rows of the output of Reduce(.) are ordered by 5, 1, 4, 2, 3, which could imply that the reduction works somehow from the outside to the inside.
> Reduce(function(...) merge(..., all=TRUE), L)
> Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L) # same
a b c d e
1 4 4 2 8 NA
2 5 2 4 10 1
3 5 2 6 5 NA
4 6 7 4 6 1
5 7 3 5 5 NA
Anyway, is there a way to slightly change the code to get an ordered output like that below?
# a b c d e
# 1 5 2 4 10 1
# 2 6 7 4 6 1
# 3 7 3 5 5 NA
# 4 5 2 6 5 NA
# 5 4 4 2 8 NA
Data
L <- list(structure(list(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 7L, b = 3L, c = 5L, d = 5L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 5L, b = 2L, c = 6L, d = 5L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 4L, b = 4L, c = 2L, d = 8L), class = "data.frame", row.names = c(NA,
-1L)))

This happens because of the sort argument of merge, which defaults to TRUE:
sort - logical. Should the result be sorted on the by columns?
So, instead you may use
Reduce(function(...) merge(..., all = TRUE, sort = FALSE), L)
# a b c d e
# 1 5 2 4 10 1
# 2 6 7 4 6 1
# 3 7 3 5 5 NA
# 4 5 2 6 5 NA
# 5 4 4 2 8 NA
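
The documentation of merge describes the row order under sort = FALSE as unspecified, so it may be safer not to depend on it. The following is a sketch of one alternative, assuming it is acceptable to add a temporary helper column (called src here, a hypothetical name) that records each data.frame's position in the list, and to order by it after merging.

```r
L <- list(
  data.frame(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L),
  data.frame(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L),
  data.frame(a = 7L, b = 3L, c = 5L, d = 5L),
  data.frame(a = 5L, b = 2L, c = 6L, d = 5L),
  data.frame(a = 4L, b = 4L, c = 2L, d = 8L)
)

# Tag every data.frame with its position in the list
tagged <- Map(function(d, i) cbind(d, src = i), L, seq_along(L))

# Merge as before; the result may come back in any row order
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), tagged)

# Restore the list order explicitly, then drop the helper column
merged <- merged[order(merged$src), setdiff(names(merged), "src")]
rownames(merged) <- NULL
merged
```

Because src differs in every data.frame, rows are never matched against each other in this sketch; that is fine for the data in the question, where no two rows coincide, but it would change the result if identical rows were meant to be collapsed.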

Here, I use bind_rows from the dplyr package instead of merge.
L <- list(structure(list(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 7L, b = 3L, c = 5L, d = 5L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 5L, b = 2L, c = 6L, d = 5L), class = "data.frame", row.names = c(NA,
-1L)), structure(list(a = 4L, b = 4L, c = 2L, d = 8L), class = "data.frame", row.names = c(NA,
-1L)))
library(dplyr)
Reduce(bind_rows, L)
#> a b c d e
#> 1 5 2 4 10 1
#> 2 6 7 4 6 1
#> 3 7 3 5 5 NA
#> 4 5 2 6 5 NA
#> 5 4 4 2 8 NA
Created on 2019-02-09 by the reprex package (v0.2.1.9000)
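
If adding a dplyr dependency is not wanted, roughly the same stacking can be sketched in base R by padding every data.frame with the columns it lacks and then calling rbind. The helper names all_cols and pad are made up for this illustration.

```r
L <- list(
  data.frame(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L),
  data.frame(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L),
  data.frame(a = 7L, b = 3L, c = 5L, d = 5L),
  data.frame(a = 5L, b = 2L, c = 6L, d = 5L),
  data.frame(a = 4L, b = 4L, c = 2L, d = 8L)
)

# Union of all column names, in order of first appearance
all_cols <- unique(unlist(lapply(L, names)))

# Add each missing column as NA and put the columns in a common order
pad <- function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
}

stacked <- do.call(rbind, lapply(L, pad))
stacked
```

Like bind_rows, this stacks rows instead of joining on shared values, so the list order is preserved by construction.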

How to calculate field in R based on the formula specified in excel

I want R to read in an Excel file that contains formulas and compute their results.
Say for example if I provide the following as input:
a b c
2 5 =a+b
3 2 =a+b
3 3 =a+b
6 4 =a+b
4 2 =a+b
I should get this output:
a b c
2 5 7
3 2 5
3 3 6
6 4 10
4 2 6

An option would be to remove the = using sub and evaluate the first element (as it is the same for all the rows)
df1$c <- with(df1, eval(parse(text= sub("=", "", c[1]))))
df1$c
#[1] 7 5 6 10 6
data
df1 <- structure(list(a = c(2L, 3L, 3L, 6L, 4L), b = c(5L, 2L, 3L, 4L,
2L), c = c("=a+b", "=a+b", "=a+b", "=a+b", "=a+b")), .Names = c("a",
"b", "c"), class = "data.frame", row.names = c(NA, -5L))
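
The answer above evaluates only the first formula because all rows share it. As a sketch for the case where the formula differs from row to row, each one can be evaluated against its own row. This assumes, as in the question, that the formulas refer to column names (not to Excel cell references such as A2) and return a single number; the function name eval_row and the varied formulas are invented for the example.

```r
df1 <- data.frame(
  a = c(2L, 3L, 3L),
  b = c(5L, 2L, 3L),
  c = c("=a+b", "=a*b", "=a-b"),  # hypothetical: a different formula per row
  stringsAsFactors = FALSE
)

# Strip the leading "=" and evaluate the expression using row i as the environment
eval_row <- function(i) {
  expr <- parse(text = sub("^=", "", df1$c[i]))
  as.numeric(eval(expr, envir = df1[i, , drop = FALSE]))
}

df1$c <- vapply(seq_len(nrow(df1)), eval_row, numeric(1))
df1$c
```

Since eval(parse()) runs arbitrary R code, this is only appropriate for files from a trusted source.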

R Data Frame remove rows with max values from all columns

Hello, I have a data frame and I need to remove every row that contains the maximum value of any column.
Example
A B C
1 2 3 5
2 4 1 1
3 1 4 3
4 2 1 1
So the output is:
A B C
4 2 1 1
Is there any quick way to do this?

We can do this with %in%
df1[!seq_len(nrow(df1)) %in% sapply(df1, which.max),]
# A B C
#4 2 1 1
If the maximum value occurs more than once within a column (ties), then do
df1[!Reduce(`|`, lapply(df1, function(x) x== max(x))),]

df[-sapply(df, which.max),]
# A B C
#4 2 1 1
DATA
df = structure(list(A = c(2L, 4L, 1L, 2L), B = c(3L, 1L, 4L, 1L),
C = c(5L, 1L, 3L, 1L)), .Names = c("A", "B", "C"),
class = "data.frame", row.names = c(NA,-4L))
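
To make the difference between the which.max approach and the tie-aware approach concrete, here is a small sketch on made-up data in which column A reaches its maximum twice. The data and the names first_only and with_ties are assumptions for illustration.

```r
# Hypothetical data: the maximum of A (4) occurs in rows 2 and 3
df <- data.frame(
  A = c(2L, 4L, 4L, 2L, 1L),
  B = c(3L, 1L, 1L, 4L, 2L),
  C = c(5L, 1L, 3L, 1L, 2L)
)

# which.max returns only the first maximum per column, so row 3 survives
first_only <- df[-unique(sapply(df, which.max)), ]

# Comparing against max(x) flags every tied row, so row 3 is removed too
with_ties <- df[!Reduce(`|`, lapply(df, function(x) x == max(x))), ]

rownames(first_only)
rownames(with_ties)
```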

Multiply a table(file1) with individual cells of a column(file2) using R

File 1:
Ele A B C D
Es 1 2 3 4
Ep 2 4 3 4
Ek 1 9 3 8
File 2:
A 1
B 2
C 3
D 5
I need each element under column A in file 1 to be multiplied by the value assigned to A in file 2, and likewise for the other columns. I know matrix multiplication in R, but I don't think this is a case of matrix multiplication. Help would be greatly appreciated. Thanks

You could try
indx <- df2$Col1
df1[indx]*df2$Col2[col(df1[indx])]
# A B C D
#1 1 4 9 20
#2 2 8 9 20
#3 1 18 9 40
Or you could use sweep
sweep(df1[indx], 2, df2$Col2, '*')
# A B C D
#1 1 4 9 20
#2 2 8 9 20
#3 1 18 9 40
data
df1 <- structure(list(Ele = c("Es", "Ep", "Ek"), A = c(1L, 2L, 1L),
B = c(2L, 4L, 9L), C = c(3L, 3L, 3L), D = c(4L, 4L, 8L)),
.Names = c("Ele", "A", "B", "C", "D"), class = "data.frame",
row.names = c(NA, -3L))
df2 <- structure(list(Col1 = c("A", "B", "C", "D"), Col2 = c(1L, 2L,
3L, 5L)), .Names = c("Col1", "Col2"), class = "data.frame",
row.names = c(NA, -4L))
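
Both approaches above rely on the multipliers in df2 being selected in the same order as the columns. As a sketch of a variant that pairs columns and multipliers by name, assuming the labels in df2$Col1 are unique, a named vector can be combined with Map. The names mult, cols and out are hypothetical, and the rows of df2 are shuffled on purpose.

```r
df1 <- data.frame(
  Ele = c("Es", "Ep", "Ek"),
  A = c(1L, 2L, 1L), B = c(2L, 4L, 9L), C = c(3L, 3L, 3L), D = c(4L, 4L, 8L),
  stringsAsFactors = FALSE
)
# Rows deliberately shuffled to show that their order does not matter here
df2 <- data.frame(
  Col1 = c("D", "B", "A", "C"), Col2 = c(5L, 2L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Named vector of multipliers: A = 1, B = 2, C = 3, D = 5
mult <- setNames(df2$Col2, df2$Col1)

# Only touch columns that have a multiplier; Ele is left as it is
cols <- intersect(names(df1), names(mult))
out <- df1
out[cols] <- Map(`*`, df1[cols], mult[cols])
out
```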

R- How to merge multiple dataframes of different lengths?

I have been stuck on this issue for a while now and need some help.
I am reading the following files (there can be more than 3 files) into data frames.
My input files look like the following:
file1:
someName someMOD someID
A T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P) 1
B S495(P) 2
C S162(P),Q159(D) 3
D S45(P),C47(C),S48(P),S26(P) 4
E S18(P) 5
file2:
someName someMOD someID
C S162(P),Q159(D) 3
D S45(P),C47(C),S48(P),S26(P) 4
F S182(P) 6
E S18(P) 5
Z Q100(P) 9
A T754(P),M691(O),S694(P),S739(P),S740(P) 1
file3:
someName someMOD someID
A T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P) 1
B S495(P) 2
D S45(P),C47(C),S48(P),S26(P) 4
E S18(P) 5
F S182(P) 6
L Z182(P) 8
C S162(P),Q159(D) 3
My Code:
library(gdata) # provides read.xls
fileList <- dir(pattern="*.xls")
i <- 1
a <- list()
for (f in seq_along(fileList)) {
  fileName <- fileList[f]
  X <- read.xls(fileName)
  # keep only the files whose name contains "Drug_Rep"
  if (regexpr("Drug_Rep", fileName)[1] > 0) {
    a[[i]] <- X
    i <- i + 1
  }
}
# Now I want to merge my dataframes
mymerge <- function(x, y)
  merge(x, y, by=c("someName", "someID"), all=TRUE)
Reduce(mymerge, a) # passing my list of dataframes 'a'
I did dput() on my 'a' list:
list(structure(list(someName = structure(c(1L, 2L, 4L, 5L, 6L,
7L, 3L), .Label = c("A", "B", "C", "D", "E", "F", "L"), class = "factor"),
someMOD = structure(c(6L, 5L, 4L, 2L, 3L, 7L, 1L), .Label = c("S162(P),Q159(D)",
"S18(P)", "S182(P)", "S45(P),C47(C),S48(P),S26(P)", "S495(P)",
"T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P)",
"Z182(P)"), class = "factor"), someID = c(1L, 2L, 4L, 5L,
6L, 8L, 3L)), .Names = c("someName", "someMOD", "someID"), class = "data.frame", row.names = c(NA,
-7L)), structure(list(someName = structure(1:5, .Label = c("A",
"B", "C", "D", "E"), class = "factor"), someMOD = structure(c(5L,
4L, 1L, 3L, 2L), .Label = c("S162(P),Q159(D)", "S18(P)", "S45(P),C47(C),S48(P),S26(P)",
"S495(P)", "T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P)"
), class = "factor"), someID = 1:5), .Names = c("someName", "someMOD",
"someID"), class = "data.frame", row.names = c(NA, -5L)), structure(list(
someName = structure(c(2L, 3L, 5L, 4L, 6L, 1L), .Label = c("A",
"C", "D", "E", "F", "Z"), class = "factor"), someMOD = structure(c(2L,
5L, 4L, 3L, 1L, 6L), .Label = c("Q100(P)", "S162(P),Q159(D)",
"S18(P)", "S182(P)", "S45(P),C47(C),S48(P),S26(P)", "T754(P),M691(O),S694(P),S739(P),S740(P)"
), class = "factor"), someID = c(3L, 4L, 6L, 5L, 9L, 1L)), .Names = c("someName",
"someMOD", "someID"), class = "data.frame", row.names = c(NA,
-6L)))
What is my mistake in populating the list? Any help is really appreciated.
I am just trying to get an output like the following:

The problem with the code I gave you before is that merge gets confused if there are any duplicate column names, and you're merging more than 3 datasets. You'll have to rename your someMOD columns so they don't clash. A for loop works as well as anything for this purpose.
dupvars <- which(!names(a[[1]]) %in% c("someName", "someID"))
for(i in seq_along(a))
names(a[[i]])[dupvars] <- paste0(names(a[[i]])[dupvars], i)
# and then merge
Reduce(mymerge, a)
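
For a self-contained illustration of the renaming step, here is a sketch on a toy list of four small data.frames. The values m1 to m7 are placeholders, not the data from the question, and keys, renamed and merged are names chosen for the example.

```r
# Toy stand-in for the list 'a': same column layout, made-up values
a <- list(
  data.frame(someName = c("A", "B"), someMOD = c("m1", "m2"),
             someID = c(1L, 2L), stringsAsFactors = FALSE),
  data.frame(someName = c("A", "C"), someMOD = c("m3", "m4"),
             someID = c(1L, 3L), stringsAsFactors = FALSE),
  data.frame(someName = c("B", "C"), someMOD = c("m5", "m6"),
             someID = c(2L, 3L), stringsAsFactors = FALSE),
  data.frame(someName = "A", someMOD = "m7",
             someID = 1L, stringsAsFactors = FALSE)
)
keys <- c("someName", "someID")

# Give every non-key column a suffix taken from the list position
renamed <- lapply(seq_along(a), function(i) {
  d <- a[[i]]
  nonkey <- !names(d) %in% keys
  names(d)[nonkey] <- paste0(names(d)[nonkey], i)
  d
})

# With unique column names, the repeated merge no longer produces clashes
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), renamed)
merged
```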

Perhaps the problem is that you're actually not trying to merge in the standard sense, but reshape. In this case, you can rbind all the data.frames together after adding a "time" variable, and use dcast from "reshape2" to get what you're after:
Add a "time" variable and rbind the data.frames together
temp <- do.call(rbind,
lapply(seq_along(a),
function(x) data.frame(a[[x]], time = x)))
head(temp)
# someName someMOD someID time
# 1 A T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P) 1 1
# 2 B S495(P) 2 1
# 3 D S45(P),C47(C),S48(P),S26(P) 4 1
# 4 E S18(P) 5 1
# 5 F S182(P) 6 1
# 6 L Z182(P) 8 1
Transform the data.frame from a "long" format to a "wide" format
library(reshape2)
dcast(temp, someName + someID ~ time, value.var="someMOD")
# someName someID 1
# 1 A 1 T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P)
# 2 B 2 S495(P)
# 3 C 3 S162(P),Q159(D)
# 4 D 4 S45(P),C47(C),S48(P),S26(P)
# 5 E 5 S18(P)
# 6 F 6 S182(P)
# 7 L 8 Z182(P)
# 8 Z 9 <NA>
# 2
# 1 T754(P),M691(O),S692(P),S694(P),S739(P),S740(P),S759(P),S762(P)
# 2 S495(P)
# 3 S162(P),Q159(D)
# 4 S45(P),C47(C),S48(P),S26(P)
# 5 S18(P)
# 6 <NA>
# 7 <NA>
# 8 <NA>
# 3
# 1 T754(P),M691(O),S694(P),S739(P),S740(P)
# 2 <NA>
# 3 S162(P),Q159(D)
# 4 S45(P),C47(C),S48(P),S26(P)
# 5 S18(P)
# 6 S182(P)
# 7 <NA>
# 8 Q100(P)
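
The same long-to-wide step can also be sketched with reshape() from base R, which avoids the reshape2 dependency. The toy data below uses placeholder values (m1 to m4) and is only meant to show the shape of the call; the resulting columns are named someMOD.1, someMOD.2 and so on instead of 1, 2.

```r
# Toy stand-in for the list 'a' with made-up values
a <- list(
  data.frame(someName = c("A", "B"), someMOD = c("m1", "m2"),
             someID = c(1L, 2L), stringsAsFactors = FALSE),
  data.frame(someName = c("A", "C"), someMOD = c("m3", "m4"),
             someID = c(1L, 3L), stringsAsFactors = FALSE)
)

# Long format: stack the data.frames and record which one each row came from
temp <- do.call(rbind,
                lapply(seq_along(a),
                       function(x) data.frame(a[[x]], time = x)))

# Wide format: one someMOD column per value of 'time'
wide <- reshape(temp,
                idvar = c("someName", "someID"),
                timevar = "time",
                direction = "wide")
wide
```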
