Starting with the data frame y:
x <- c(2,NA,6,8,9,10)
y <- data.frame(letters[1:6], 1:6, NA, 3:8, NA, x, NA)
colnames(y) <- c("Patient", "C1", "First_C1", "C2", "First_C2", "C3", "First_C3")
I want R to look at each element of C1, find the first patient (first row) that contains that element, identify which column it is in, and write the "coordinates" as "Patient_Column" to First_C1. Then do the same for C2 and C3.
So, the result should be this:
y$First_C1 <- c("a_C1", "a_C3", "a_C2", "b_C2", "c_C2", "c_C3")
y$First_C2 <- c("a_C2", "b_C2", "c_C2", "c_C3", "e_C2", "d_C3")
y$First_C3 <- c("a_C3", NA, "c_C3", "d_C3", "e_C3", "f_C3")
I don't know how to write the code, or even how to search for it. Could someone help me here?
We start from the y without the output columns:
y<-structure(list(Patient = structure(1:6, .Label = c("a", "b",
"c", "d", "e", "f"), class = "factor"), C1 = 1:6, C2 = 3:8, C3 = c(2,
NA, 6, 8, 9, 10)), .Names = c("Patient", "C1", "C2", "C3"), row.names = c(NA,
-6L), class = "data.frame")
Then, we can try:
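y[paste0("First_C",1:3)] <- lapply(y[,2:4],
  function(x) {
    # match() scans the transposed values, i.e. y row by row,
    # so the first hit is the first patient holding each value
    d <- arrayInd(match(x, t(y[,2:4])), dim(t(y[,2:4])))[,2:1]
    # d[,1] is the row (patient), d[,2] the column
    paste(y$Patient[d[,1]], colnames(y[,2:4])[d[,2]], sep="_")
  })
# match() also pairs NA with NA, so blank the cells whose input was NA
y[,5:7][is.na(y[,2:4])] <- NA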
# Patient C1 C2 C3 First_C1 First_C2 First_C3
#1 a 1 3 2 a_C1 a_C2 a_C3
#2 b 2 4 NA a_C3 b_C2 <NA>
#3 c 3 5 6 a_C2 c_C2 c_C3
#4 d 4 6 8 b_C2 c_C3 d_C3
#5 e 5 7 9 c_C2 e_C2 e_C3
#6 f 6 8 10 c_C3 d_C3 f_C3
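For readers who find the arrayInd step hard to follow, here is a rough alternative sketch of the same idea. It flattens the value columns row by row, builds a parallel vector of "Patient_Column" labels, and looks each value up with match(). The names cols, vals, labels and first_of are made up for illustration and are not part of the answer above.

```r
# Same input as above, without the output columns
y <- data.frame(Patient = letters[1:6], C1 = 1:6, C2 = 3:8,
                C3 = c(2, NA, 6, 8, 9, 10), stringsAsFactors = FALSE)
cols <- c("C1", "C2", "C3")  # hypothetical helper: the columns to search

# Flatten the values row by row: a's C1, C2, C3, then b's C1, C2, C3, ...
vals <- as.vector(t(as.matrix(y[cols])))
# One "Patient_Column" label per flattened value, in the same order
labels <- paste(rep(y$Patient, each = length(cols)),
                rep(cols, times = nrow(y)), sep = "_")

# match() returns the first position of each value; NA inputs stay NA
first_of <- function(x) ifelse(is.na(x), NA_character_, labels[match(x, vals)])
y[paste0("First_", cols)] <- lapply(y[cols], first_of)
y
```

Because the labels are built once, the lookup for each column is a single match() call.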
Related
My df:
1 2 3 4
AASCAVDITK A D 1
ADEAATDTINR D S 2
AIADGSLLDFLR L P 8
I want to change the letters of column 2 for the letters of column 3, the position of the letter is given by column 4.
I have tried to do a For statement with no success
Maybe you can try substr like below
df$X5 <- `substr<-`(df$X1,df$X4,df$X4,df$X3)
or
df$X5 <- apply(df,1,function(x) {substr(x[1],x[4],x[4])<-x[3];x[1]})
which gives
> df
X1 X2 X3 X4 X5
1 AASCAVDITK A D 1 DASCAVDITK
2 ADEAATDTINR D S 2 ASEAATDTINR
3 AIADGSLLDFLR L P 8 AIADGSLPDFLR
Data
df <- structure(list(X1 = c("AASCAVDITK", "ADEAATDTINR", "AIADGSLLDFLR"
), X2 = c("A", "D", "L"), X3 = c("D", "S", "P"), X4 = c(1L, 2L,
8L)), class = "data.frame", row.names = c(NA, -3L))
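If you also want to make sure that the letter sitting at position X4 really is the one listed in X2 before overwriting it, something like the following might work. This is only a sketch; the names current_letter and ok are introduced here for illustration.

```r
df <- data.frame(X1 = c("AASCAVDITK", "ADEAATDTINR", "AIADGSLLDFLR"),
                 X2 = c("A", "D", "L"),
                 X3 = c("D", "S", "P"),
                 X4 = c(1L, 2L, 8L),
                 stringsAsFactors = FALSE)

# Letter currently found at position X4 of each string
current_letter <- substr(df$X1, df$X4, df$X4)
# Only replace where that letter agrees with column X2
ok <- current_letter == df$X2

df$X5 <- df$X1
substr(df$X5[ok], df$X4[ok], df$X4[ok]) <- df$X3[ok]
df
```

Rows where the check fails keep the original string in X5, which makes mismatches in the input easy to spot.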
I have two datasets:
df1:
structure(list(v1 = c(1, 4, 3, 7, 8, 1, 2, 4)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
df2:
structure(list(val = c(1, 2, 3, 4, 5, 6, 7, 8, 9), lab = c("a",
"b", "c", "d", "e", "f", "g", "h", "i")), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))
I want to recode v1 in df1 according to the values (val) and labels (lab) in df2.
Following this, my output should look like this:
df3:
structure(list(v1 = c("a", "d", "c", "g", "h", "a", "b", "d")), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
Is there a package or function I am missing that could easily solve this problem? The problem itself looks quite easy to me, but I found no simple solution. Of course, writing a for loop would always be possible, but it would probably make this operation too complicated, as I want to do this many times with big datasets.
An option using dplyr which will keep the original order
library(dplyr)
new_df <- df1 %>%
transmute(v1 = left_join(df1, df2, by = c("v1" = "val"))$lab)
# v1
# <chr>
#1 a
#2 d
#3 c
#4 g
#5 h
#6 a
#7 b
#8 d
identical(new_df, df3)
#[1] TRUE
Another option, in base R, is merge. Note that merge sorts the result by the key, so this will not keep the original row order:
df1$v1 <- merge(df1, df2, all.x = TRUE, by.x = "v1", by.y = "val")$lab
# v1
# <chr>
#1 a
#2 a
#3 b
#4 c
#5 d
#6 d
#7 g
#8 h
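If the original order matters and you would like to stay in base R, a plain match() lookup is one possibility. This is a minimal sketch using ordinary data frames instead of tibbles; recoded is just an illustrative name.

```r
df1 <- data.frame(v1 = c(1, 4, 3, 7, 8, 1, 2, 4))
df2 <- data.frame(val = 1:9, lab = letters[1:9], stringsAsFactors = FALSE)

# For every v1, find its position in df2$val and pick the label at that position
recoded <- df2$lab[match(df1$v1, df2$val)]
recoded
# [1] "a" "d" "c" "g" "h" "a" "b" "d"
```

Values of v1 that do not appear in df2$val would come back as NA.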
Below is a simple solution:
X<-as.data.frame(df1)
Y<-as.data.frame(df2)
final_df <- merge(X, Y, all.x = TRUE, by.x = "v1", by.y = "val")
print(final_df)
output
v1 lab
1 1 a
2 1 a
3 2 b
4 3 c
5 4 d
6 4 d
7 7 g
8 8 h
This will not keep the original order, but the approach below, using dplyr, does:
library(dplyr)
X<-as.data.frame(df1)
Y<-as.data.frame(df2)
final_df <- X %>%
transmute(v1 = left_join(X, Y, by = c("v1" = "val"))$lab)
print(final_df)
output
v1
1 a
2 d
3 c
4 g
5 h
6 a
7 b
8 d
I hope this helps
I am trying to match the values in two lists, but only where the variable names are the same in both lists. I would like the result to be a list the length of the longer list, filled with the count of total matches.
jac <- structure(list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5),
.Names = c("s1", "s2", "s3"))
larger <- structure(list(s1 = structure(c(1L, 1L, 1L), .Label = "a", class = "factor"),
s2 = structure(c(2L, 1L, 3L), .Label = c("b", "c", "d"), class = "factor"),
s3 = c(1, 2, 7)), .Names = c("s1", "s2", "s3"), row.names = c(NA, -3L), class = "data.frame")
I am using mapply(FUN = pmatch, jac, larger) which gives me a correct total but not in the format that I would like below:
s1 s2 s3 s1result s2result s3result
a c 1 1 2 NA
a b 2 1 1 NA
a d 7 1 3 NA
However, I don't think pmatch will ensure the name matching in every situation so I wrote a function that I am still having issues with:
prodMatch <- function(jac,larger){
for(i in 1:nrow(larger)){
if(names(jac)[i] %in% names(larger[i])){
r[i] <- jac %in% larger[i]
r
}
}
}
Can anyone help out?
Another dataset that causes one to not be a multiple of the other:
larger2 <-
structure(list(s1 = structure(c(1L, 1L, 1L), class = "factor", .Label = "a"),
s2 = structure(c(1L, 1L, 1L), class = "factor", .Label = "c"),
s3 = c(1, 2, 7), s4 = c(8, 9, 10)), .Names = c("s1", "s2",
"s3", "s4"), row.names = c(NA, -3L), class = "data.frame")
mapply returns a list of matching indices, which you can convert to a data frame with as.data.frame:
as.data.frame(mapply(match, jac, larger))
# s1 s2 s3
# 1 1 2 NA
# 2 1 1 NA
# 3 1 3 NA
Then cbind the result with larger to get what you expected:
cbind(larger,
setNames(as.data.frame(mapply(match, jac, larger)),
paste(names(jac), "result", sep = "")))
# s1 s2 s3 s1result s2result s3result
#1 a c 1 1 2 NA
#2 a b 2 1 1 NA
#3 a d 7 1 3 NA
Update: To handle cases where the names of the two lists don't match, we can loop through larger and its names simultaneously and extract the corresponding elements from jac as follows:
as.data.frame(
mapply(function(col, name) {
m <- match(jac[[name]], col)
if(length(m) == 0) NA else m # if the name doesn't exist in jac return NA as well
}, larger, names(larger)))
# s1 s2 s3
#1 1 2 NA
#2 1 1 NA
#3 1 3 NA
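To see how the name-based lookup behaves on the second dataset from the question, where larger2 has a column s4 that jac lacks, here is a small sketch. The wrapper match_by_name is a hypothetical helper written for this illustration; it follows the same idea as the update above.

```r
jac <- list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5)
larger2 <- data.frame(
  s1 = factor(c("a", "a", "a")),
  s2 = factor(c("c", "c", "c")),
  s3 = c(1, 2, 7),
  s4 = c(8, 9, 10)
)

# Hypothetical helper: match by column name, NA when the name is missing
match_by_name <- function(small, big) {
  out <- lapply(names(big), function(nm) {
    if (!(nm %in% names(small))) return(NA_integer_)
    match(small[[nm]], big[[nm]])  # factors are compared as characters
  })
  # Results of length 1 are recycled to the longest result
  as.data.frame(setNames(out, names(big)))
}

res <- match_by_name(jac, larger2)
res
#   s1 s2 s3 s4
# 1  1 NA NA NA
# 2  1  1 NA NA
# 3  1 NA NA NA
```

Note that the recycling only works while every result length divides the longest one; otherwise as.data.frame will raise an error.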
I have a matrix called m. I want to replace the column names of m that match values in the current column of the data frame mydf with the corresponding values in replacement. Names that don't match should stay unchanged, so the none column keeps its name in the result. If every column name had a match, I could have tried something like colnames(m) = mydf$replacement[which(mydf$current %in% colnames(m))], but that is not the case here because there is no replacement for the none column of m.
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE,
dimnames = list(c("s1", "s2", "s3"),c("tom", "dick","none")))
# tom dick none
#s1 1 2 3
#s2 4 5 6
#s3 7 8 9
current<-c("tom", "dick","harry","bob")
replacement<-c("x","y","z","b")
mydf<-data.frame(current,replacement)
mydf
# current replacement
#1 tom x
#2 dick y
#3 harry z
#4 bob b
result
# x y none
#s1 1 2 3
#s2 4 5 6
#s3 7 8 9
Attn: akrun--here is the actual data:
m<-structure(c("chr5:11823", "chr5:11823", "9920035", "9920036",
"chr5", "chr5", "11823", "11823", "11824", "11824", "sub", "snp",
"G", "G", "CTAACCCCT", "T", NA, "dbsnp.129:rs55765826", "NN",
"NN", "NN", "NN", "NN", "NN", "NN", "NN", "NN", "NN", "NN", "NN"
), .Dim = c(2L, 15L), .Dimnames = list(c("1", "2"), c("key",
"variantId", "chromosome", "begin", "end", "varType", "reference",
"alleleSeq", "xRef", "GS000038035-ASM", "GS000038036-ASM", "GS000038037-ASM",
"GS000038038-ASM", "GS000038041-ASM", "GS000038042-ASM")))
mydf <-structure(list(assembly_id = c("GS000038042-ASM", "GS000038041-ASM",
"GS000038037-ASM", "GS000038038-ASM", "GS000038103-ASM", "GS000038096-ASM",
"GS000038064-ASM", "GS000038057-ASM", "GS000038062-ASM", "GS000038072-ASM"
), sample_id = c("GS02589-DNA_E06", "GS02589-DNA_F01", "GS02589-DNA_G01",
"GS02926-DNA_B01", "GS02589-DNA_E08", "GS02589-DNA_F07", "GS02589-DNA_B05",
"GS02589-DNA_B04", "GS02589-DNA_H04", "GS02589-DNA_H01"), customer_sample_id = c("AMLM12001KP",
"1114002", "1121501", "1231401", "AMLM12019S-P", "AMLM12014N-R",
"AMLM12012CA", "1321801", "AMLM12033MD", "1123801"), exomes.ids = c("AMLM12001KP",
"AMAS-11.3-Diagnostic", "AMAS-12.3-Diagnostic", "AMAS-18.3-Diagnostic",
"AMLM12019S-P", "AMLM12014N-R", "AMLM12012CA", "AMAS-4.3-Diagnostic",
"AMLM12033MD", "AMAS-13.3-Diagnostic")), .Names = c("current",
"customer_sample_id", "assembly_id", "replacement"), row.names = c(NA,
10L), class = "data.frame")
v <- colnames(m) %in% current
w <- current %in% colnames(m)
colnames(m)[v] <- replacement[w]
> m
x y none
s1 1 2 3
s2 4 5 6
s3 7 8 9
We could also use match
i1 <- match(colnames(m), mydf$current)
hit <- !is.na(i1)  # columns of m that have an entry in mydf$current
colnames(m)[hit] <- as.character(mydf$replacement[i1[hit]])
m
# x y none
#s1 1 2 3
#s2 4 5 6
#s3 7 8 9
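Another way to think about it is as a named lookup vector, which does not depend on the columns of m appearing in the same order as the rows of mydf. The sketch below deliberately shuffles the column order to show this; lookup and new_names are illustrative names only.

```r
# Same values as the example, but with the columns in a different order
m <- matrix(1:9, nrow = 3, byrow = TRUE,
            dimnames = list(c("s1", "s2", "s3"), c("none", "dick", "tom")))
mydf <- data.frame(current = c("tom", "dick", "harry", "bob"),
                   replacement = c("x", "y", "z", "b"),
                   stringsAsFactors = FALSE)

# Named vector: names are the current labels, values the replacements
lookup <- setNames(mydf$replacement, mydf$current)
# Look up every column name; names without an entry give NA
new_names <- unname(lookup[colnames(m)])
# Keep the old name wherever no replacement was found
colnames(m) <- ifelse(is.na(new_names), colnames(m), new_names)
m
#    none y x
# s1    1 2 3
# s2    4 5 6
# s3    7 8 9
```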
Update
Based on the updated dataset
i2 <- match(mydf$current, colnames(m), nomatch=0)
# i2 > 0 keeps each replacement aligned with the column it was matched to
colnames(m)[i2] <- as.character(mydf$replacement)[i2 > 0]
I'd be very grateful if you could help me with the following, as after a few tests I still haven't been able to get the right outcome.
I've got this data:
dd_1 <- data.frame(ID = c("1","2", "3", "4", "5"),
Class_a = c("a",NA, "a", NA, NA),
Class_b = c(NA, "b", "b", "b", "b"))
And I'd like to produce a new column 'CLASS':
dd_2 <- data.frame(ID = c("1","2", "3", "4", "5"),
Class_a = c("a",NA, "a", NA, NA),
Class_b = c(NA, "b", "b", "b", "b"),
CLASS = c("a", "b", "a-b", "b", "b"))
Thanks a lot!
Here it is:
tmp <- paste(dd_1$Class_a, dd_1$Class_b, sep='-')
tmp <- gsub('NA-|-NA', '', tmp)
(dd_2 <- cbind(dd_1, tmp))
First we concatenate (join as strings) the two columns. paste treats NAs as ordinary strings, i.e. "NA", so we get either a-NA, NA-b, or a-b. Then we substitute NA- or -NA with an empty string.
Which results in:
## ID Class_a Class_b tmp
## 1 1 a <NA> a
## 2 2 <NA> b b
## 3 3 a b a-b
## 4 4 <NA> b b
## 5 5 <NA> b b
Another option:
dd_1$CLASS <- with(dd_1, ifelse(is.na(Class_a), as.character(Class_b),
ifelse(is.na(Class_b), as.character(Class_a),
paste(Class_a, Class_b, sep="-"))))
This way you would check if any of the classes is NA and return the other, or, if none is NA, return both separated by "-".
Here's a short solution with apply:
dd_2 <- cbind(dd_1, CLASS = apply(dd_1[2:3], 1,
function(x) paste(na.omit(x), collapse = "-")))
The result
ID Class_a Class_b CLASS
1 1 a <NA> a
2 2 <NA> b b
3 3 a b a-b
4 4 <NA> b b
5 5 <NA> b b
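One case the example data does not contain is a row where both classes are NA. With the apply version such a row would end up as an empty string. If NA is preferred there, a small variation could look like this. It is a sketch only, and row 6 is a made-up row added to show the behaviour.

```r
# dd_1 from the question plus a hypothetical sixth row with both classes missing
dd_1 <- data.frame(ID = c("1", "2", "3", "4", "5", "6"),
                   Class_a = c("a", NA, "a", NA, NA, NA),
                   Class_b = c(NA, "b", "b", "b", "b", NA),
                   stringsAsFactors = FALSE)

dd_1$CLASS <- unname(apply(dd_1[c("Class_a", "Class_b")], 1, function(x) {
  x <- x[!is.na(x)]                       # drop the missing classes
  if (length(x) == 0) NA_character_       # nothing left: keep NA
  else paste(x, collapse = "-")           # otherwise join what remains
}))
dd_1
```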