change data frame in R - r

I have a data frame that is generated inside a for loop and has this structure:
V1 V2 V3
1 a a 1
2 a b 3
3 a c 2
4 a d 1
5 a e 3
6 b a 3
7 b b 1
8 b c 8
9 b d 1
10 b e 1
11 c a 2
12 c b 8
The real data is longer than this, but it shows the idea: I want to transform it into a wide table (V1 by V2), where V3 is the value for each (V1, V2) pair.
I want to rearrange the data like this, with the unique values of V1 as the first column, the unique values of V2 as the first row, and the V3 values in between:
a b c d e
a 1 3 2 1 3
b 3 1 8 1 1
c 2 8 2 8 2
d 1 1 5 7 2
e 3 5 9 5 3
Thanks in advance.

Here is your data as a reproducible example:
df <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), V2 = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L), .Label = c("a", "b", "c", "d", "e"), class = "factor"), V3 = c(1L, 3L, 2L, 1L, 3L, 3L, 1L, 8L, 1L, 1L, 2L, 8L)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
Then compute a basic cross-tabulation from your variables:
> xtabs(V3~V1+V2, df)
V2
V1 a b c d e
a 1 3 2 1 3
b 3 1 8 1 1
c 2 8 0 0 0
I hope you meant this :)

If df is your data frame and each (V1, V2) combination maps to a single V3 value, you can do it with:
with(df, tapply(V3, list(V1,V2), identity))
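The xtabs and tapply approaches differ in how they treat (V1, V2) pairs that never occur: xtabs sums V3 and fills absent cells with 0, while tapply leaves them as NA. A minimal sketch on a made-up subset of the data (the small df below is illustrative, not the asker's full data):

```r
# Illustrative subset: the pair (b, b) is deliberately missing
df <- data.frame(V1 = c("a", "a", "b"),
                 V2 = c("a", "b", "a"),
                 V3 = c(1L, 3L, 3L))

# xtabs() sums V3 per cell, so an absent pair shows up as 0
tab <- xtabs(V3 ~ V1 + V2, data = df)

# tapply() has nothing to apply the function to, so an absent pair is NA
wide <- with(df, tapply(V3, list(V1, V2), identity))

tab["b", "b"]
wide["b", "b"]
```

Which behaviour is preferable depends on whether a true 0 in V3 must stay distinguishable from a missing combination.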

Another method, perhaps slightly more baroque, for widening a data frame from a third column on the basis of the first two. (I agree with Chase that the OP has not given an unambiguous problem description.)
df2 <- expand.grid(A=LETTERS[1:5], B=LETTERS[1:5])
df2$N <- 1:25
# outer() passes whole vectors to FUN, so wrap the scalar lookup in Vectorize()
mtx <- outer(X=LETTERS[1:5], Y=LETTERS[1:5], FUN=Vectorize(function(x, y) {
  df2[intersect(which(df2$A==x), which(df2$B==y)), "N"] }))
colnames(mtx)<-LETTERS[1:5]; rownames(mtx)<-LETTERS[1:5]
mtx
A B C D E
A 1 6 11 16 21
B 2 7 12 17 22
C 3 8 13 18 23
D 4 9 14 19 24
E 5 10 15 20 25
I'm sure there are many other strategies using reshape in base or dcast in reshape2.
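As a rough sketch of the base reshape route (the tiny df and the renaming step are illustrative assumptions, not part of the original answer):

```r
# Made-up data with every (V1, V2) pair present exactly once
df <- data.frame(V1 = c("a", "a", "b", "b"),
                 V2 = c("a", "b", "a", "b"),
                 V3 = c(1L, 3L, 3L, 1L))

# One row per V1, one column per V2 value; columns come back as V3.a, V3.b
wide <- reshape(df, idvar = "V1", timevar = "V2", direction = "wide")

# Strip the "V3." prefix so the columns are named after the V2 values
names(wide) <- sub("^V3\\.", "", names(wide))
wide
```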

Related

New data frame, if specific value(s) is contained AND other values aren't included in a range of columns in r

So, I have a large data frame with monthly observations of n individuals.
ind y_0101 y_0102 y_0103 y_0104_ .... y_0311 y_0312
A 33 6 1 2 1 5
B 36 5 0 2 1 5
C 22 4 1 NA 1 5
D 2 2 0 2 1 5
E 5 2 1 2 1 6
F 7 1 0 2 1 5
G 8 6 1 2 1 5
H 2 8 0 2 2 5
I 1 3 1 2 1 5
J 3 2 0 2 1 5
I want to create a new data frame that includes only the individuals who meet certain conditions.
E.g. individual i should be included in the new data frame if its values in columns y_0101:y_0312 do NOT include 3, 6, or NA, AND do include 1 or 2. This produces the following data frame:
ind y_0101 y_0102 y_0103 y_0104_ .... y_0311 y_0312
B 36 5 0 2 1 5
D 2 2 0 2 1 5
F 7 1 0 2 1 5
H 2 8 0 2 2 5
I tried different ways, but I can't figure out how to get multiple conditions included.
df <- df %>% filter(vars(starts_with("y_"))!=3 | !=6 | != NA)
or
df <- df %>% filter_at(vars(starts_with("y_")), all_vars(!=3 | !=6 | != NA)
I've tried some other things as well, like !%in%, but that doesn't seem to work. Any ideas?
I think you're almost there, but might need a slight shift in the logic:
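library(dplyr)
df <- data.frame(A1 = 1:10,
                 A2 = 10:1,
                 A3 = 1:10,
                 B1 = 1:10)
df %>%
  # keep rows where none of the A columns is 3, 6, or NA
  filter_at(vars(starts_with("A")), ~!(.x %in% c(3, 6, NA))) %>%
  # then keep rows where at least one A column is 1 or 2
  filter(if_any(starts_with("A"), ~ .x %in% c(1, 2)))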
In the first step, I filter out all rows where any of the columns is 3, 6, or NA. In the second step, I filter down to only the rows where at least one of the columns is 1 or 2. Does this help with your case?
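One detail behind the NA condition: a comparison such as x != NA always returns NA, so it cannot be used to filter, whereas %in% matches NA like any other value. A small base R sketch with a made-up vector:

```r
x <- c(1, 3, NA, 5)           # made-up values for illustration

x != NA                       # every element is NA, so this cannot drive a filter
x %in% c(3, 6, NA)            # TRUE exactly where x is 3, 6, or NA

ok <- !(x %in% c(3, 6, NA))   # logical mask of the values to keep
x[ok]                         # keeps only the values 1 and 5
```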
Here is a base R option using rowSums:
cols <- grep('y_', names(df))
include <- c(1, 2)
not_include <- c(3, 6, NA)
result <- subset(df, rowSums(sapply(df[cols], `%in%`, include)) > 0 &
rowSums(sapply(df[cols], `%in%`, not_include)) == 0)
result
# ind y_0101 y_0102 y_0103 y_0104 y_0311 y_0312
#2 B 36 5 0 2 1 5
#4 D 2 2 0 2 1 5
#6 F 7 1 0 2 1 5
#8 H 2 8 0 2 2 5
data
df <- structure(list(ind = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"), y_0101 = c(33L, 36L, 22L, 2L, 5L, 7L, 8L, 2L, 1L,
3L), y_0102 = c(6L, 5L, 4L, 2L, 2L, 1L, 6L, 8L, 3L, 2L), y_0103 = c(1L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L), y_0104 = c(2L, 2L, NA, 2L,
2L, 2L, 2L, 2L, 2L, 2L), y_0311 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L), y_0312 = c(5L, 5L, 5L, 5L, 6L, 5L, 5L, 5L, 5L, 5L
)), class = "data.frame", row.names = c(NA, -10L))

Merge two matrix by column names and row names

I want to merge two matrices according to their column names and row names.
The values in both matrices are numeric, and the merge should average the values of the cells that appear in both matrices.
matrix1:
A B C
x 1 4 3
z 5 2 4
k 1 2 3
and matrix2:
A B C D
x 6 4 1 2
y 2 3 1 3
z 1 4 1 4
k 7 5 3 1
so the output will be:
A B C D
x 3.5 4 2 2
y 2 3 1 3
z 3 3 2.5 4
k 4 3.5 3 1
My idea is to use a for loop or an apply function, but if the matrices are big, this will run for a long time. Any advice? Thank you!
You can use rownames and colnames to subset matrix2 and update only part of it.
matrix2[rownames(matrix1), colnames(matrix1)] <- (matrix1 + matrix2[rownames(matrix1), colnames(matrix1)])/2
matrix2
# A B C D
#x 3.5 4.0 2.0 2
#y 2.0 3.0 1.0 3
#z 3.0 3.0 2.5 4
#k 4.0 3.5 3.0 1
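The subsetting approach relies on every row and column name of matrix1 also being present in matrix2. If that is not guaranteed, one possible generalisation (a sketch, not part of the original answer; the names total, count and avg are made up here) accumulates sums and counts over the union of names:

```r
matrix1 <- matrix(c(1, 5, 1, 4, 2, 2, 3, 4, 3), nrow = 3,
                  dimnames = list(c("x", "z", "k"), c("A", "B", "C")))
matrix2 <- matrix(c(6, 2, 1, 7, 4, 3, 4, 5, 1, 1, 1, 3, 2, 3, 4, 1), nrow = 4,
                  dimnames = list(c("x", "y", "z", "k"), c("A", "B", "C", "D")))

# Result grid covers every row and column name seen in either matrix
rows <- union(rownames(matrix1), rownames(matrix2))
cols <- union(colnames(matrix1), colnames(matrix2))
total <- matrix(0, length(rows), length(cols), dimnames = list(rows, cols))
count <- total

# Add each matrix into its own cells and count how many matrices hit each cell
for (m in list(matrix1, matrix2)) {
  r <- rownames(m)
  k <- colnames(m)
  total[r, k] <- total[r, k] + m
  count[r, k] <- count[r, k] + 1
}

# Cells present in neither matrix would be 0/0, i.e. NaN
avg <- total / count
avg
```

The loop runs once per matrix rather than once per cell, so the work inside it stays vectorised.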
data
matrix1 <- structure(c(1L, 5L, 1L, 4L, 2L, 2L, 3L, 4L, 3L), .Dim = c(3L,
3L), .Dimnames = list(c("x", "z", "k"), c("A", "B", "C")))
matrix2 <- structure(c(6L, 2L, 1L, 7L, 4L, 3L, 4L, 5L, 1L, 1L, 1L, 3L, 2L,
3L, 4L, 1L), .Dim = c(4L, 4L), .Dimnames = list(c("x", "y", "z",
"k"), c("A", "B", "C", "D")))

Match and replace in R

I would like to match row names from table 1 with column names from table 2 and then replace them with corresponding names from column n in table 1.
table1
x y n
CAAGCCAAGCTAGATA 5 6 um
AATCCCAAGTGACACC 4 1 cs
AATCTCAAGTCACACC 4 1 cs
table2
CAAGCCAAGCTAGATA AATCCCAAGTGACACC AATCTCAAGTCACACC
a 1 3 5
b 2 3 4
c 6 3 6
d 8 3 5
result
um cs cs
a 1 3 5
b 2 3 4
c 6 3 6
d 8 3 5
One option is also to pass a named vector to do the matching
names(df2) <- setNames(df1$n, row.names(df1))[colnames(df2)]
df2
# um cs cs
#a 1 3 5
#b 2 3 4
#c 6 3 6
#d 8 3 5
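An equivalent lookup can be written with match(), which also makes it easy to leave unmatched columns alone. This is a sketch under assumptions: the shortened IDs and the extra column named other are hypothetical and only serve to show the unmatched case.

```r
# Hypothetical, shortened IDs; "other" has no counterpart in df1
df1 <- data.frame(n = c("um", "cs", "cs"),
                  row.names = c("CAAG", "AATC", "AATT"))
df2 <- data.frame(CAAG = 1:2, AATC = 3:4, AATT = 5:6, other = 7:8)

# Position of each df2 column name among the row names of df1 (NA if absent)
idx <- match(names(df2), rownames(df1))

# Use the mapped name where a match exists, otherwise keep the old name
names(df2) <- ifelse(is.na(idx), names(df2), df1$n[idx])
names(df2)
```

Note that the result contains duplicated column names (cs twice), as in the desired output of the question.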
data
df1 <- structure(list(x = c(5L, 4L, 4L), y = c(6L, 1L, 1L), n = c("um",
"cs", "cs")), class = "data.frame", row.names = c("CAAGCCAAGCTAGATA",
"AATCCCAAGTGACACC", "AATCTCAAGTCACACC"))
df2 <- structure(list(CAAGCCAAGCTAGATA = c(1L, 2L, 6L, 8L), AATCCCAAGTGACACC = c(3L,
3L, 3L, 3L), AATCTCAAGTCACACC = c(5L, 4L, 6L, 5L)),
class = "data.frame", row.names = c("a",
"b", "c", "d"))

R Data Frame remove rows with max values from all columns

Hello, I have a data frame and I need to remove every row that contains the maximum value of any column.
Example
A B C
1 2 3 5
2 4 1 1
3 1 4 3
4 2 1 1
So the output is:
A B C
4 2 1 1
Is there any quick way to do this?
We can do this with %in%:
df[!seq_len(nrow(df)) %in% sapply(df, which.max),]
# A B C
#4 2 1 1
If there are ties for the maximum value within a column, then do
df[!Reduce(`|`, lapply(df, function(x) x == max(x))),]
Another option is negative indexing with which.max:
df[-sapply(df, which.max),]
# A B C
#4 2 1 1
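The difference between the which.max and the x == max(x) variants only shows up with ties, because which.max reports just the first maximum of a column. A small sketch with made-up data where column A has a tie (df_ties is a hypothetical name):

```r
# Made-up data: the maximum of A (4) occurs in rows 1 and 2
df_ties <- data.frame(A = c(4, 4, 1, 2),
                      B = c(3, 1, 4, 1),
                      C = c(5, 1, 3, 1))

# which.max() returns only the first maximum per column, so row 2 survives
by_index <- df_ties[!seq_len(nrow(df_ties)) %in% sapply(df_ties, which.max), ]

# Comparing against max() flags every tied row
is_max <- sapply(df_ties, function(x) x == max(x))   # logical matrix, one column per variable
by_value <- df_ties[rowSums(is_max) == 0, ]

rownames(by_index)
rownames(by_value)
```

If the columns may contain NA, max(x) returns NA unless na.rm = TRUE is passed, so that case needs separate handling.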
DATA
df = structure(list(A = c(2L, 4L, 1L, 2L), B = c(3L, 1L, 4L, 1L),
C = c(5L, 1L, 3L, 1L)), .Names = c("A", "B", "C"),
class = "data.frame", row.names = c(NA,-4L))

Reshape: “Error: index out of bounds”

I have a quite big dataframe with the following structure:
image coef v3 v4 v5 v6 ... v20
1 A 0 1 2 3
1 B 2 4 6 5
1 C 1 2 4 7
1 D 4 5 6 4
2 A 2 3 4 5
2 B 2 3 4 5
2 C 2 3 4 5
2 D 2 3 4 5
I need to end up with a "flattened" structure on the coef variable for each image index. Right now each image has the variables with the shape [4:20], but I need it to be [1:80] with the pattern [A,B,C,D,A',B',C',D'...], like this:
image v3 v4 v5 v6 v7 v8 v9 v10 ... v80
1 0 2 1 4 1 4 2 5
2 2 2 2 2 3 3 3 3
I tried to do:
reshape(df, timevar = "coef", idvar = "image", direction = "wide")
But it gives me this error:
Error in data[, timevar] : subindex out of bounds
I also tried the reshape2 library with:
dcast(df, image~coef, value.var= )
but since I have more than one value.var column I cannot figure out how to do it.
We can melt and then do the dcast
library(data.table)
dM <- melt(setDT(df1), id.var=c("image", "coef"))
dcast(dM, image~variable+coef, value.var="value")
Or use recast (which is a wrapper for melt/dcast) from reshape2
library(reshape2)
recast(df1, id.var=c("image", "coef"),image~variable+coef, value.var="value")
# image v3_A v3_B v3_C v3_D v4_A v4_B v4_C v4_D v5_A v5_B v5_C v5_D v6_A v6_B v6_C v6_D
#1 1 0 2 1 4 1 4 2 5 2 6 4 6 3 5 7 4
#2 2 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5
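For comparison, base reshape can produce the same layout, assuming the input really is a data frame that contains the image and coef columns. The sketch below uses a cut-down copy of the data (only v3 and v4), and the reordering step is an illustrative addition so that the columns follow the v3.A, v3.B, ... pattern:

```r
# Cut-down copy of the example data with two value columns
df1 <- data.frame(image = rep(1:2, each = 4),
                  coef  = rep(c("A", "B", "C", "D"), times = 2),
                  v3 = c(0L, 2L, 1L, 4L, 2L, 2L, 2L, 2L),
                  v4 = c(1L, 4L, 2L, 5L, 3L, 3L, 3L, 3L))

# One row per image; columns come back grouped by coef (v3.A, v4.A, v3.B, ...)
wide <- reshape(df1, idvar = "image", timevar = "coef", direction = "wide")

# Build the desired order: all coefs for v3 first, then all coefs for v4
value_cols <- setdiff(names(df1), c("image", "coef"))
ordered <- as.vector(t(outer(value_cols, unique(df1$coef), paste, sep = ".")))
wide <- wide[c("image", ordered)]
wide
```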
data
df1 <- structure(list(image = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
coef = c("A",
"B", "C", "D", "A", "B", "C", "D"), v3 = c(0L, 2L, 1L, 4L, 2L,
2L, 2L, 2L), v4 = c(1L, 4L, 2L, 5L, 3L, 3L, 3L, 3L), v5 = c(2L,
6L, 4L, 6L, 4L, 4L, 4L, 4L), v6 = c(3L, 5L, 7L, 4L, 5L, 5L, 5L,
5L)), .Names = c("image", "coef", "v3", "v4", "v5", "v6"),
class = "data.frame", row.names = c(NA, -8L))

Resources