Extract data from column aggregate function in R

I have a large database from which I have extracted a data value (x) using the aggregate function:
library(plotrix)
aggregate(mydataNC[,c(52)],by=list(patientNC, siteNC, supNC),max)
OUTPUT:
Each x value has a corresponding distance value, located in a column named dist in this database.
What is the easiest way to extract the matching dist value and add it to the table?

I'd start with merge(). Here's a small reproducible example you can run to see what's going on and then modify to use your data:
# generate bogus data and view it
x1 <- rep(c("A", "B", "C"), each = 4)
x2 <- rep(c("E", "E", "F", "F"), times = 3)
y1 <- rnorm(12)
y2 <- rnorm(12)
md <- data.frame(x1, x2, y1, y2)
> head(md)
x1 x2 y1 y2
1 A E -1.4603164 -0.9662473
2 A E -0.5247227 1.7970341
3 A F 0.8990502 1.7596285
4 A F -0.6791145 2.2900357
5 B E 1.2894863 0.1152571
6 B E -0.1981511 0.6388998
# aggregate by taking maximum of each unique (x1, x2) combination
md.agg <- with(md, aggregate(y1, by = list(x1, x2), FUN = max))
names(md.agg) <- c("x1", "x2", "y1")
> md.agg
x1 x2 y1
1 A E -0.5247227
2 B E 1.2894863
3 C E 0.9982510
4 A F 0.8990502
5 B F 2.5125956
6 C F -0.5916491
# merge() joins on the shared columns x1, x2 and y1, so only each group's max row (with its y2) is kept
md.final <- merge(md, md.agg)
> md.final
x1 x2 y1 y2
1 A E -0.5247227 1.7970341
2 A F 0.8990502 1.7596285
3 B E 1.2894863 0.1152571
4 B F 2.5125956 -0.2217510
5 C E 0.9982510 0.6813261
6 C F -0.5916491 1.0348518
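One caveat with the merge() approach: it matches on the y1 values themselves, so if two rows in a group tie for the maximum, both are returned. A minimal sketch of a merge-free alternative uses base R's ave() to compute each row's group maximum and then subsets the original data, so every other column (y2 here) comes along automatically. The toy values below are made up for illustration and are not the question's data:

```r
# Toy data (hypothetical values chosen so the result is easy to check)
md <- data.frame(
  x1 = c("A", "A", "B", "B"),
  x2 = c("E", "E", "E", "E"),
  y1 = c(1, 3, 5, 2),
  y2 = c(10, 30, 50, 20),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum of y1 within that row's (x1, x2) group
grp_max <- ave(md$y1, md$x1, md$x2, FUN = max)

# Keep the rows whose y1 equals its group maximum; y2 is carried along unchanged
md.final <- md[md$y1 == grp_max, ]
md.final
```

For the question's data the same idea would look something like mydataNC[mydataNC[, 52] == ave(mydataNC[, 52], patientNC, siteNC, supNC, FUN = max), ], assuming patientNC, siteNC and supNC have one entry per row of mydataNC; the dist column is then already part of the result. Ties are still kept; if the column contains NA, max() returns NA for that group, so you may need to handle missing values first.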

Related

R: compare two groups of vectors

I have built two recommendation systems and would like to compare the products they recommend, to see how many products they have in common. I joined the two results into a data frame: one recommendation system's columns start with "z", the other's with "b".
Example data:
df <- data.frame(z1 = c("a", "s", "d"), z2 = c("z", "x", "c"), z3 = c("q", "w", "e"),
b1 = c("w", "a", "e"), b2 = c("a", "i", "r"), b3 = c("z", "w", "y"))
ID z1 z2 z3 b1 b2 b3
1 a z q q a z
2 s x w a i r
3 d c e r e y
Desired results:
ID z1 z2 z3 b1 b2 b3 mutual_recommendation
1 a z q q a z 3
2 s x w a i r 0
3 d c e e r y 1
The problem is that the order might not be the same, and comparing all the combinations with case or ifelse would mean a lot of combinations, especially when the number of top-N recommendations changes to 10.
We can use apply() to loop over the rows of the dataset with the 'ID' column removed (df[-1] assumes ID is the first column, as in the table shown), and take the length of the intersect() of the first 3 and the next 3 elements:
df$mutual_recommendation <- apply(df[-1], 1, FUN = function(x)
length(intersect(x[1:3], x[4:6])))
df$mutual_recommendation
#[1] 3 0 1
Here is another solution (note: I changed the data.frame code to produce the data frame that is actually shown under it in the question - they do not match):
> library(dplyr)
> df %>% mutate(mutual_recommendation=apply(df,1,function(x) sum(x[1:3] %in% x[4:6]) ))
z1 z2 z3 b1 b2 b3 mutual_recommendation
1 a z q q a z 3
2 s x w a i r 0
3 d c e r e y 1
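Both answers hard-code the positions 1:3 and 4:6. Since the question mentions moving to a top-10 list, here is a sketch that instead selects each system's columns by name prefix. It assumes, as in the question, that one system's columns start with "z" and the other's with "b", and that no other column names start with those letters. It uses the data.frame code from the question, which has no ID column, so its counts differ from the table shown above:

```r
# Data exactly as constructed by the question's data.frame code (no ID column)
df <- data.frame(z1 = c("a", "s", "d"), z2 = c("z", "x", "c"), z3 = c("q", "w", "e"),
                 b1 = c("w", "a", "e"), b2 = c("a", "i", "r"), b3 = c("z", "w", "y"),
                 stringsAsFactors = FALSE)

# Pick each system's columns by name prefix, so this works for any top-N
z <- as.matrix(df[grep("^z", names(df))])
b <- as.matrix(df[grep("^b", names(df))])

# For each row, count the distinct "z" items that also appear among the "b" items
df$mutual_recommendation <- vapply(seq_len(nrow(df)), function(i)
  length(intersect(z[i, ], b[i, ])), integer(1))
df$mutual_recommendation
# [1] 2 1 1
```

intersect() counts each shared item once, whereas the %in% version counts an item twice if it appears twice among the z columns.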

R - Link row values to create an ID column

I have the following data frame:
X1 X2 X3 X4 X5
a 1 4 d e
f 2 5 i j
k 3 6 n o
I would like to create an ID column based on row values such that:
X1 X2 X3 X4 X5 ID
a 1 4 d e a14de
f 2 5 i j f25ij
k 3 6 n o k36no
Is there a way to do so?
Some variables are character and some numeric.
We can use paste0() through do.call() to create the 'ID':
df1$ID <- do.call(paste0, df1)
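This works because a data frame is a list of columns: do.call() passes each column as a separate argument, and paste0() pastes them element-wise, i.e. row by row, converting numbers to character along the way. A small runnable check, rebuilding df1 from the table in the question (the column types are an assumption: X2 and X3 numeric, the rest character):

```r
df1 <- data.frame(X1 = c("a", "f", "k"), X2 = 1:3, X3 = 4:6,
                  X4 = c("d", "i", "n"), X5 = c("e", "j", "o"),
                  stringsAsFactors = FALSE)

# Each column becomes one argument to paste0(), which combines them row by row
df1$ID <- do.call(paste0, df1)
df1$ID
# [1] "a14de" "f25ij" "k36no"
```

To add a separator, do.call(paste, c(df1, sep = "_")) works the same way. Run it before the ID column exists; otherwise a second call would also paste in the old ID.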

R replacing a column from a data frame with a row from another data frame

I want to replace the first column of A with the first row of B. For example:
A <- data.frame(matrix("a", 4, 4), stringsAsFactors = FALSE)
> A
X1 X2 X3 X4
1 a a a a
2 a a a a
3 a a a a
4 a a a a
B <- data.frame(matrix("b", 4, 4), stringsAsFactors = FALSE)
> B
X1 X2 X3 X4
1 b b b b < Take this row
2 b b b b
3 b b b b
4 b b b b
I want A to become:
> A
X1 X2 X3 X4
1 b a a a
2 b a a a
3 b a a a
4 b a a a
^
replace it with this column
I tried:
A[, 1] = B[1, ]
But I get the following warning message:
In `[<-.data.frame`(`*tmp*`, , 1, value = list(X1 = "b", X2 = "b", :
provided 4 variables to replace 1 variables
By default, R does not drop the dimension when there is just one row left (while it does when there is just one column).
From ?Extract.data.frame:
drop: logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left, but not to drop if only one row is left.
You can see this by comparing the two sides:
A[, 1]
# [1] "a" "a" "a" "a"
The result is a vector, whereas with
B[1, ]
# X1 X2 X3 X4
#1 b b b b
the result is still a data.frame.
You need to unlist the result:
A[, 1] = unlist(B[1, ])
A
# X1 X2 X3 X4
#1 b a a a
#2 b a a a
#3 b a a a
#4 b a a a
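A minimal sketch that checks the drop behaviour described above and then applies the unlist() fix. It only lines up because B has as many columns as A has rows (4 here); with other shapes R would recycle the values or throw an error. use.names = FALSE is an optional extra that discards the X1 to X4 names unlist() would otherwise attach:

```r
A <- data.frame(matrix("a", 4, 4), stringsAsFactors = FALSE)
B <- data.frame(matrix("b", 4, 4), stringsAsFactors = FALSE)

# A single column is dropped to a plain vector by default
is.vector(A[, 1])        # TRUE
# A single row is NOT dropped: it is still a one-row data.frame
is.data.frame(B[1, ])    # TRUE

# unlist() flattens the one-row data.frame into a character vector of length 4
A[[1]] <- unlist(B[1, ], use.names = FALSE)
A
```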
This should also work, without changing row / col names:
A[, 1] = t(B)[,1]
This should do it
A[, 1] = t(B[1, ])

Print method for multiple matrices

Could someone please suggest a method to print several matrices side by side in the terminal window.
For the matrices m1 and m2, I would like the desired output below.
m1 <- m2 <- matrix(1:4, nrow=2, dimnames=list(c("a", "b"), c("d", "e")))
Desired output
m1 m2
d e d e
a 1 3 a 1 3
b 2 4 b 2 4
The reason is that I have several 2x2 matrices that I am using in calculations and want to show in an R Markdown document. Printing them one below another takes up too much of the page. Thanks.
EDIT
My attempt at a solution
fn <- function(x) setNames(data.frame(.=paste(" ", rownames(x)), x,
check.names=F, row.names=NULL),c(paste(substitute(x)), colnames(x)))
cbind(fn(m1), fn(m2))
# m1 d e m2 f g
#1 a 1 3 v 1 3
#2 b 2 4 w 2 4
But of course this doesn't look very good.
A little hack-ish, but I believe it is what you want:
m1 <- m2 <- m3 <- m4 <- matrix(1:4, nrow=2, dimnames=list(c("a", "b"), c("d", "e")))
fn <- function(x) setNames(data.frame(.=paste("", rownames(x)), x, check.names=F, row.names=NULL),c(" ", colnames(x)))
matrix.names <- Filter( function(x) 'matrix' %in% class( get(x) ), ls(pattern = "m") )
matrix.list <- lapply(matrix.names, get)
matrix.chain <- do.call(cbind, lapply(matrix.list, fn))
cat(" ", paste0(matrix.names, collapse = " "), "\n"); print(matrix.chain, row.names = FALSE)
m1 m2 m3 m4
d e d e d e d e
a 1 3 a 1 3 a 1 3 a 1 3
b 2 4 b 2 4 b 2 4 b 2 4
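A less hack-ish alternative, sketched under the assumption that you are happy to pass the matrices explicitly as named arguments rather than finding them with ls(): capture the text print() would produce for each matrix, pad the blocks to the same height and width, and paste them line by line. print_side_by_side is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: print named matrices side by side
print_side_by_side <- function(..., gap = 4) {
  mats <- list(...)  # arguments must be named, e.g. m1 = m1
  # capture.output() returns the lines print() would write for each matrix
  blocks <- lapply(mats, function(m) capture.output(print(m)))
  # put each matrix's name on a title line above its printout
  blocks <- Map(function(name, lines) c(name, lines), names(mats), blocks)
  n <- max(lengths(blocks))
  # pad each block to n lines; format() then pads every line to a common width
  padded <- lapply(blocks, function(lines) format(c(lines, rep("", n - length(lines)))))
  # join the i-th line of every block, separated by `gap` spaces
  rows <- do.call(paste, c(unname(padded), sep = strrep(" ", gap)))
  cat(rows, sep = "\n")
  invisible(rows)
}

m1 <- m2 <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("d", "e")))
print_side_by_side(m1 = m1, m2 = m2)
```

Because the matrices are passed explicitly, the result does not depend on which objects happen to match a pattern in ls(), and matrices of different sizes still line up.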

remove duplicate rows based on conditions from multiple columns in r

I have a data set from which I would like to remove the rows that have duplicate information across several columns (y1 to y6).
foo<- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"), v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"), y2= c("y","y","y","f","c"), y3 = c("y","c","c","f","w"), y4= c("y","y","f","f","c"), y5=c("y","w","f","f","w"), y6=c("y","c","f","f","w"))
foo then looks like:
g1 v1 v2 y1 y2 y3 y4 y5 y6
1 1 7 a y y y y y y
2 0 5 b c y c y w c
3 0 4 x f y c f f f
4 1 4 x f f f f f f
5 1 3 e w c w c w w
Now, I want to remove any row that has duplicated data across the y1-y6 columns. So, if done properly, only rows 1 and 4 would be removed, because all of their y variables are exactly the same. It's a multiple-column condition.
I believe I am close, but it's just not working correctly.
I have tried: new = foo[!(duplicated(foo[,1:6]))]
My thinking was that duplicated() would search and only find the rows that matched exactly.
I thought about using a conditional statement with &, but can't figure out how to do that either.
new = foo[foo$y1==foo$y2|foo$y3|foo$y4|foo$y5|foo$y6]
I thought about using which(), but I'm now overwhelmed and lost. I would expect foo to look like:
g1 v1 v2 y1 y2 y3 y4 y5 y6
2 0 5 b c y c y w c
3 0 4 x f y c f f f
5 1 3 e w c w c w w
> foo[apply(foo[ , paste("y", 1:6, sep = "")], 1,
FUN = function(x) length(unique(x)) > 1 ), ]
g1 v1 v2 y1 y2 y3 y4 y5 y6
2 0 5 b c y c y w c
3 0 4 x f y c f f f
5 1 3 e w c w c w w
foo[apply(foo[paste0("y", 1:6)], 1, function(x) any(x != x[1])), ]
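The next two answers were run on foo reduced to its y columns (for example foo <- foo[paste0("y", 1:6)]), which is why their output shows only y1 to y6:
> foo[ !rowSums( apply( foo[2:6], 2, "!=", foo[1] ) )==0, ]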
y1 y2 y3 y4 y5 y6
2 c y c y w c
3 f y c f f f
5 w c w c w w
> foo[ ! colSums( apply( foo, 1, duplicated, foo[1] ) ) == 5, ]
y1 y2 y3 y4 y5 y6
2 c y c y w c
3 f y c f f f
5 w c w c w w
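A vectorised sketch of the same idea without apply(): compare every y column with y1 in one step and drop the rows where nothing differs. It rebuilds foo from the question with stringsAsFactors = FALSE (an assumption, so that older R versions, which default to factors, compare plain strings):

```r
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

y <- foo[paste0("y", 1:6)]
# y != y$y1 compares each y column with y1 row by row, giving a logical matrix;
# a row summing to 0 has no column that differs from y1, i.e. all six are equal
all_same <- rowSums(y != y$y1) == 0
new <- foo[!all_same, ]
new
```

If the y columns can contain NA, rowSums() returns NA for those rows, so decide how missing values should count (for example with na.rm = TRUE) before subsetting.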
