Pasting several values from a vector into a dataframe column - r

My dataframe "test" is like this:
a b c
d e f
I want to append strings to the first column so as to get this:
a__3 b c
a__23 b c
a__45 b c
...
sb <- c(3, 23, 45)
datalist <- ""
for (i in 1:length(sb)) {
new <- apply(test[,1],1,paste0,collapse=("__" sb[i]))
datalist[i] <- new
}
I want to add rows to the test data frame, one for each sb[i].
I have tried rbind, but it does not give the correct result.

One idea is to replicate the rows based on the length of your sb vector, do the paste, and filter to keep only the rows you are interested in (here d2 stands in for your test data frame, see DATA below), i.e.
d3 <- d2[rep(rownames(d2), length(sb)),]
d3$V1[d3$V1 == 'a'] <- paste0(d3$V1[d3$V1 == 'a'], '__', sb)
d3[grepl('a', d3$V1),]
# V1 V2 V3
#1 a__3 b c
#1.1 a__23 b c
#1.2 a__45 b c
DATA
dput(d2)
structure(list(V1 = c("a", "d"), V2 = c("b", "e"), V3 = c("c",
"f")), row.names = c(NA, -2L), class = "data.frame")
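As a minimal sketch of a variation on the same idea, one could select the matching row first and replicate only that row, which avoids filtering with grepl afterwards. The names base_row and out are hypothetical, and the sketch assumes exactly one row has V1 equal to "a".

```r
# Same data as in the answer above
d2 <- data.frame(V1 = c("a", "d"), V2 = c("b", "e"), V3 = c("c", "f"),
                 stringsAsFactors = FALSE)
sb <- c(3, 23, 45)

# Assumption: exactly one row has V1 == "a"
base_row <- d2[d2$V1 == "a", ]

# Repeat that single row once per element of sb
out <- base_row[rep(1, length(sb)), ]

# Append each value of sb to the first column
out$V1 <- paste0(out$V1, "__", sb)

# Replace the 1, 1.1, 1.2 row names with 1, 2, 3
rownames(out) <- NULL
out
```

If more than one row can match, the replicate-and-filter version above is the safer starting point.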

From long to wide formats just based on two columns Rstudio

I have a data frame with six columns; the last column contains the values. The column 'code' contains s and d, and the column 'Sex' contains M and F. There are two thousand offspring in the column 'offspring'.
This is my data frame:
seq parent code Sex offspring Value
1 49032 s M J44010_CCG7YANXX_2_661_X4 -0.38455056
2 48741 s M J44010_CCG7YANXX_2_661_X4 0.10574340
3 48757 s M J44010_CCG7YANXX_2_661_X4 0.39572906
4 48465 d f J44010_CCG7YANXX_2_661_X4 0.43409006
5 48521 d f J44010_CCG7YANXX_2_661_X4 0.40337447
6 48703 d f J44010_CCG7YANXX_2_661_X4 -0.38148980
The column parent includes ids for both males and females.
I want to put the female/dam id, code and sex in columns next to those of the male/sire, and also keep the sire value and dam value separately. So the 'Value' column will be split into two parts.
The data frame will look like the below:
seq parent1 sirecode Sex parent2 damcode Sex offspring sireValue damValue
1 49032 s M 48465 d f J44010 -0.38455056 0.43409006
2 48741 s M 48521 d f J44010 0.10574340 0.40337447
3 48757 s M 48703 d f J44010 0.39572906 -0.38148980
So each offspring will have 3 or 4 pairs of parents.
I tried to use the dcast function on it.
We could use dcast after creating a sequence column:
library(data.table)
setDT(df1)[, n := seq_len(.N), .(code, Sex)]
dcast(df1, n + offspring ~ rowid(n), value.var = c('parent', 'code', 'Sex', 'Value'), sep = "")
# n offspring parent1 parent2 code1 code2 Sex1 Sex2 Value1 Value2
#1: 1 J44010_CCG7YANXX_2_661_X4 49032 48465 s d M f -0.3845506 0.4340901
#2: 2 J44010_CCG7YANXX_2_661_X4 48741 48521 s d M f 0.1057434 0.4033745
#3: 3 J44010_CCG7YANXX_2_661_X4 48757 48703 s d M f 0.3957291 -0.3814898
In base R, we can use reshape
df1$n <- with(df1, ave(seq_along(Sex), Sex, FUN = seq_along))
df1$n1 <- with(df1, ave(n, n, FUN = seq_along))
reshape(df1[-1], idvar = c('n', 'offspring'), timevar = 'n1', direction = 'wide' )
data
df1 <- structure(list(seq = 1:6, parent = c(49032L, 48741L, 48757L,
48465L, 48521L, 48703L), code = c("s", "s", "s", "d", "d", "d"
), Sex = c("M", "M", "M", "f", "f", "f"),
offspring = c("J44010_CCG7YANXX_2_661_X4",
"J44010_CCG7YANXX_2_661_X4", "J44010_CCG7YANXX_2_661_X4",
"J44010_CCG7YANXX_2_661_X4",
"J44010_CCG7YANXX_2_661_X4", "J44010_CCG7YANXX_2_661_X4"),
Value = c(-0.38455056,
0.1057434, 0.39572906, 0.43409006, 0.40337447, -0.3814898)),
class = "data.frame", row.names = c(NA, -6L))
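Another way to look at the same reshaping, as a rough sketch only: split the rows into sires and dams and merge them on a pair index. The column pair and the suffixes .sire and .dam are hypothetical names, and the sketch assumes the i-th sire row of an offspring belongs with the i-th dam row of that offspring.

```r
df1 <- data.frame(
  seq = 1:6,
  parent = c(49032L, 48741L, 48757L, 48465L, 48521L, 48703L),
  code = c("s", "s", "s", "d", "d", "d"),
  Sex = c("M", "M", "M", "f", "f", "f"),
  offspring = "J44010_CCG7YANXX_2_661_X4",
  Value = c(-0.38455056, 0.1057434, 0.39572906,
            0.43409006, 0.40337447, -0.3814898),
  stringsAsFactors = FALSE
)

# Pair index: position of each row within its offspring/code group
# (assumption: sires and dams are listed in matching order)
df1$pair <- ave(seq_along(df1$code), df1$offspring, df1$code,
                FUN = seq_along)

keep <- c("offspring", "pair", "parent", "code", "Sex", "Value")
sires <- df1[df1$code == "s", keep]
dams  <- df1[df1$code == "d", keep]

# One row per offspring/pair, sire columns next to dam columns
wide <- merge(sires, dams, by = c("offspring", "pair"),
              suffixes = c(".sire", ".dam"))
wide
```

With merge's default inner join, an offspring with unequal numbers of sires and dams would lose the unmatched rows; all = TRUE keeps them with NA filled in.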

Combining values Boolean columns to one with Priority in R

I have gone through the links below, but they solved my problem only partially.
merge multiple TRUE/FALSE columns into one
Combining a matrix of TRUE/FALSE into one
R: Converting multiple boolean columns to single factor column
I have a dataframe which looks like:
dat <- data.frame(Id = c(1,2,3,4,5,6,7,8),
A = c('Y','N','N','N','N','N','N','N'),
B = c('N','Y','N','N','N','N','Y','N'),
C = c('N','N','Y','N','N','Y','N','N'),
D = c('N','N','N','Y','N','Y','N','N'),
E = c('N','N','N','N','Y','N','Y','N')
)
I want to reshape my df into a single column, but it has to apply a priority when there are two "Y" values in a row.
The priority is A>B>C>D>E, which means that if there is a "Y" in A then the resulting value should be A. Similarly, in the example df above, both C and D have "Y" for Id 6, so the result should be "C".
Hence output should look like:
resultant_dat <- data.frame(Id = c(1,2,3,4,5,6,7,8),
Result = c('A','B','C','D','E','C','B','NA')
)
I have tried this:
library(reshape2)
new_df <- melt(dat, "Id", variable.name = "Result")
new_df <-new_df[new_df$value == "Y", c("Id", "Result")]
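But the problem is that it doesn't handle the priority; it creates 2 rows for the same Id.
Define the priority order, then for each row take the first column in that order that holds "Y":
# columns listed from highest to lowest priority
col_order = c("A", "B", "C", "D", "E")
tmp = data.frame(ID = dat[,1],
Result = col_order[apply(
X = dat[col_order],
MARGIN = 1,
# index of the first "Y" in priority order, NA if the row has none
FUN = function(x) which(x == "Y")[1])],
stringsAsFactors = FALSE)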
tmp$Result[is.na(tmp$Result)] = "Not Present"
tmp
# ID Result
#1 1 A
#2 2 B
#3 3 C
#4 4 D
#5 5 E
#6 6 C
#7 7 B
#8 8 Not Present
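A vectorised variant of the same priority rule can be sketched with max.col, which returns the column index of the row-wise maximum. This is only an illustration under the same A>B>C>D>E ordering; is_y, first and result are hypothetical names, and rows without any "Y" are set to NA explicitly because max.col would otherwise report column 1 for them.

```r
dat <- data.frame(Id = c(1, 2, 3, 4, 5, 6, 7, 8),
                  A = c('Y','N','N','N','N','N','N','N'),
                  B = c('N','Y','N','N','N','N','Y','N'),
                  C = c('N','N','Y','N','N','Y','N','N'),
                  D = c('N','N','N','Y','N','Y','N','N'),
                  E = c('N','N','N','N','Y','N','Y','N'),
                  stringsAsFactors = FALSE)

# Columns listed from highest to lowest priority
col_order <- c("A", "B", "C", "D", "E")

# Logical matrix: TRUE where a cell holds "Y"
is_y <- dat[col_order] == "Y"

# Index of the first TRUE per row; multiplying by 1 gives a numeric matrix
first <- max.col(is_y * 1, ties.method = "first")

result <- col_order[first]
# Rows with no "Y" at all get NA instead of the tie-broken first column
result[rowSums(is_y) == 0] <- NA

data.frame(Id = dat$Id, Result = result, stringsAsFactors = FALSE)
```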

R - compare columns with rows of two different data frames

I have two data frames. I would like to take a subset of the first data frame, considering only the columns for which the first value is equal to the first value of the rows of the second data frame.
Example
Data Frame 1:
columns_df1 : a b c d e
Data Frame 2:
rows_df2 : a c e
Subset I would like to obtain:
final_columns_df1 = a c e
I am stuck on how to compare columns with rows belonging to two different data frames.
Thanks for your help!
OK. It's a little unclear what you want from your question, as you don't provide a full reproducible example. But I think this is what you're looking for.
df1 <- data.frame(a = c(1, 2),
b = c(3, 4),
c = c(5, 6),
d = c(7, 8),
e = c(9, 10))
df2 <- data.frame(f = c("a", "b"),
g = c("c", "d"),
h = c("e", "f"))
# unlist() turns the one-row data frame df2[1, ] into a plain vector of labels
final_columns_df1 <- df1[ , names(df1) %in% unlist(df2[1, ])]
final_columns_df1
a c e
1 1 5 9
2 2 6 10
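The same selection can also be sketched by name rather than by logical mask, which makes it explicit what happens when a label in df2 has no matching column in df1. The names wanted, keep and subset_df1 are hypothetical, and df2 is assumed to hold the column labels in its first row, as in the example above.

```r
df1 <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6),
                  d = c(7, 8), e = c(9, 10))
df2 <- data.frame(f = c("a", "b"), g = c("c", "d"), h = c("e", "f"),
                  stringsAsFactors = FALSE)

# Labels in the first row of df2, as a plain character vector
wanted <- as.character(unlist(df2[1, ], use.names = FALSE))

# Keep only labels that really are column names of df1
keep <- intersect(names(df1), wanted)

# Single-bracket indexing by name always returns a data frame
subset_df1 <- df1[keep]
subset_df1
```

Indexing with df1[keep] rather than df1[, keep] keeps the result a data frame even when only one column matches.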

How to combine columns in a data frame so that they overlap in R?

Basically, I have data from a between subjects study design that looks like this:
> head(have, 3)
  A B C
1   b
2 a
3     c
Here, A, B, C are various conditions of the study, so each subject (indicated by each row), only has a value for one of the conditions. I would like to combine these so that it looks like this:
> head(want, 3)
all
1 b
2 a
3 c
How can I combine the columns so that they "overlap" like this?
So far, I have tried using some of dplyr's join functions, but they haven't worked out for me. I appreciate any guidance in combining my columns in this way.
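We can use pmax, which takes the element-wise maximum across the columns; because an empty string sorts before any non-empty string, this picks the one filled-in value in each row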
want <- data.frame(all= do.call(pmax, have))
Or using dplyr
library(dplyr)
transmute(have, all= pmax(A, B, C))
# all
#1 b
#2 a
#3 c
data
have <- structure(list(A = c("", "a", ""), B = c("b", "", ""),
C = c("",
"", "c")), .Names = c("A", "B", "C"), class = "data.frame",
row.names = c("1", "2", "3"))
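If relying on string ordering feels too implicit, a more explicit sketch is to treat the empty strings as missing and take the first non-missing value per row. This assumes empty cells are exactly "" (not NA or whitespace); m is a hypothetical intermediate name.

```r
have <- data.frame(A = c("", "a", ""), B = c("b", "", ""),
                   C = c("", "", "c"), stringsAsFactors = FALSE)

# Work on a character matrix and mark empty cells as missing
m <- as.matrix(have)
m[m == ""] <- NA

# For each row, take the first non-missing value (NA if the row is empty)
want <- data.frame(all = apply(m, 1, function(r) r[!is.na(r)][1]),
                   stringsAsFactors = FALSE)
want
```

Unlike pmax, this returns the leftmost filled-in value if a row happens to contain more than one.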

Computing correlation of vectors by factor label

I have two data frames. The first one, df1, is a matrix of vectors with labeled columns, like the following:
df1 <- data.frame(A=rnorm(10), B=rnorm(10), C=rnorm(10), D=rnorm(10), E=rnorm(10))
> df1
A B C D E
-0.3200306 0.4370963 -0.9146660 1.03219577 0.5215359
-0.3193144 0.8900656 -1.1720264 -0.42591761 0.1936993
0.4897262 -1.3970806 0.6054637 0.12487936 1.0149530
0.3772420 0.8726322 0.3250020 -0.36952560 -0.5447512
-0.6921561 -0.6734468 0.3500812 -0.53373720 -0.6129472
0.2540649 -1.1911106 -0.3266428 0.14013437 1.0830148
0.6606825 -0.8942715 1.1099637 -1.52416540 -0.2383048
1.4767074 -2.1492360 0.2441242 -0.36136344 0.5589114
-0.5338117 -0.2357821 0.7694879 -0.21652356 0.3185631
3.4215916 -0.3157938 0.8895597 0.09946069 -1.0961730
The second data frame, df2, contains items that match the colnames of df1. Example:
group <- c("1", "1", "2", "2", "3", "3")
S1 <- c("A", "D", "E", "C", "B", "D")
S2 <- c("D", "B", "A", "C", "B", "A")
S3 <- c("B", "C", "A", "E", "E", "A")
df2 <- data.frame(group,S1, S2, S3)
> df2
group S1 S2 S3
1 A D B
1 D B C
2 E A A
2 C C E
3 B B E
3 D A A
I would like to compute the correlations between the column vectors in df1 that correspond to the labeled items in df2. Specifically, the vectors that match cor(df2$S1, df2$S2) and cor(df2$S1, df2$S3).
The output should be something like this:
group S1 S2 S3 cor.S1.S2 cor.S1.S3
1 A D B 0.003825055 -0.2817946
1 D B C -0.2817946 -0.4928023
2 E A A -0.3856809 -0.3856809
2 C C E 1 -0.3862433
3 B B E 1 -0.3888541
3 D A A 0.003825055 0.003825055
I've been trying to resolve this with cbind[] but keep running into problems such as the 'x' must be numeric error with cor. Thanks in advance for any help!
You can do this with mapply().
my.cor <- function(x,y) {
# x and y are column names of df1
cor(df1[,x],df1[,y])
}
df2$cor.S1.S2 <- mapply(my.cor,df2$S1,df2$S2)
# the question asks for S1 vs S3, so pair S1 with S3 here
df2$cor.S1.S3 <- mapply(my.cor,df2$S1,df2$S3)
Another approach would be to get the correlation between the matrix/data.frame after subsetting the columns of 'df1' with the columns of 'df2', get the diag and assign the output as new columns in 'df2'. Here, I am using lapply as we have to do both 'S1 vs S2' and 'S1 vs S3'.
df2[c('cor.S1.S2', 'cor.S1.S3')] <- lapply(c('S2', 'S3'),
function(x) diag(cor(df1[, df2[,x]], df1[,df2$S1])))
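A third option, sketched here under the assumption that the S columns of df2 are character vectors naming columns of df1, is to compute the full correlation matrix once and look up each pair by name. The name cm and the seed are hypothetical choices for the illustration.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
df1 <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10),
                  D = rnorm(10), E = rnorm(10))
df2 <- data.frame(group = c("1", "1", "2", "2", "3", "3"),
                  S1 = c("A", "D", "E", "C", "B", "D"),
                  S2 = c("D", "B", "A", "C", "B", "A"),
                  S3 = c("B", "C", "A", "E", "E", "A"),
                  stringsAsFactors = FALSE)

# 5 x 5 correlation matrix with row and column names A..E
cm <- cor(df1)

# A two-column character matrix indexes cm by (row name, column name) pairs
df2$cor.S1.S2 <- cm[cbind(df2$S1, df2$S2)]
df2$cor.S1.S3 <- cm[cbind(df2$S1, df2$S3)]
df2
```

This avoids recomputing the same correlation for repeated pairs, which may matter when df2 has many rows.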
