I have a matrix named dx:
a b c d e f g h
cat 0 0 0 0 0 0 0 0
dog 1 0 1 0 0 0 0 1
fish 1 1 1 0 0 0 0 0
egg 0 0 0 0 0 0 0 0
How do I delete the rows that are all zeros, like cat and egg, so that I end up with only this:
a b c d e f g h
dog 1 0 1 0 0 0 0 1
fish 1 1 1 0 0 0 0 0
You can try something like this:
m<-matrix(c(1,1,1,0,
0,0,0,0,
1,0,1,0,
0,0,0,0,
1,1,1,1),ncol=4,byrow=T)
m[rowSums(abs(m))!=0,]
Alternatively, using apply() on dx:
# keep the rows that contain at least one non-zero value
keep_rows <- apply(dx, 1, function(row) any(row != 0))
dx[keep_rows, ]
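To tie this back to the matrix in the question, here is a minimal sketch that rebuilds dx by hand (the matrix() call is my reconstruction of the printed data, not the asker's code) and filters it with rowSums. The drop = FALSE argument is an addition that keeps the result a matrix even when only one row survives.

```r
# Rebuild the example matrix from the question (reconstructed by hand)
dx <- matrix(
  c(0, 0, 0, 0, 0, 0, 0, 0,
    1, 0, 1, 0, 0, 0, 0, 1,
    1, 1, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("cat", "dog", "fish", "egg"), letters[1:8])
)

# Count the non-zero entries per row and keep rows with at least one
dx_nonzero <- dx[rowSums(dx != 0) > 0, , drop = FALSE]
dx_nonzero
```

Comparing with dx != 0 rather than summing the values directly avoids dropping a row whose positive and negative entries happen to cancel out.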
If I have a data frame like this:
dt <- data.frame(cols = letters[1:6])
dt
#> cols
#> 1 a
#> 2 b
#> 3 c
#> 4 d
#> 5 e
#> 6 f
How can I create new columns from the values in the cols column, with 1s on the diagonal?
a b c d e f
a 1 0 0 0 0 0
b 0 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 0
f 0 0 0 0 0 1
In base R, we can use table:
out <- table(dt$cols, dt$cols)
Output:
out
a b c d e f
a 1 0 0 0 0 0
b 0 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 0
f 0 0 0 0 0 1
Or use diag:
`dimnames<-`(diag(nrow(dt)), list(dt$cols, dt$cols))
Another possible solution:
m <- matrix(0, 6, 6, dimnames = list(dt$cols, dt$cols))
diag(m) <- 1
m
#> a b c d e f
#> a 1 0 0 0 0 0
#> b 0 1 0 0 0 0
#> c 0 0 1 0 0 0
#> d 0 0 0 1 0 0
#> e 0 0 0 0 1 0
#> f 0 0 0 0 0 1
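The matrix above hard-codes the size as 6. A small variation, sketched here under the assumption that the labels in cols are unique, takes the size from the data instead. The as.character() call is a precaution in case cols is stored as a factor (the default in R versions before 4.0).

```r
dt <- data.frame(cols = letters[1:6])

# Identity matrix sized from the data, labelled with the column values
labels <- as.character(dt$cols)
m <- diag(length(labels))
dimnames(m) <- list(labels, labels)
m
```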
I have created a prediction matrix for a large dataset as follows:
library(mice)
dfpredm <- quickpred(df, mincor=.3)
A B C D E F G H I J
A 0 1 1 1 0 1 0 1 1 0
B 1 0 0 0 1 0 1 0 0 1
C 0 0 0 1 1 0 0 0 0 0
D 1 0 1 0 0 1 0 1 0 1
E 0 1 0 1 0 1 1 0 1 0
F 0 0 1 0 0 0 1 0 0 0
G 0 1 0 1 0 0 0 0 0 0
H 1 0 1 0 0 1 0 0 0 1
I 0 1 0 1 1 0 1 0 0 0
J 1 0 1 0 0 1 0 1 0 0
I would like to create a subset of the original df on the basis of dfpredm.
More specifically I would like to do the following:
Let's assume that my dependent variable is F.
According to the prediction matrix F is correlated with C and G.
In addition, C and G are best predicted by D,E and B,D respectively.
The idea is now to create a subset of df based on the dependent variable F, keeping the columns whose value in the F row is 1.
Fpredictors <- df[,(dfpredm["F",]) == 1]
I also want to do the same for each variable that has a 1 in the F row. I am thinking of first getting the column names like this:
Fpredcol <- colnames(dfpredm[, dfpredm["F", ] == 1])
And then doing a for loop with these column names?
For the specific example I would like to end up with the subset.
dfsub <- df[,c("F","C","G","B","E","D")]
I would however like to automate this process. Could anyone show me how to do this?
Here is one strategy that seems like it would work for you:
# names of the columns flagged with 1 in the row of `predictor`
first_preds <- function(dat, predictor) {
  cols <- which(dat[predictor, ] == 1)
  names(dat)[cols]
}
# wrap first_preds() for getting best and second best predictors
first_and_second_preds <- function(dat, predictor) {
  matches <- first_preds(dat, predictor)
  matches <- c(matches, unlist(lapply(matches, function(x) first_preds(dat, x))))
  unique(c(predictor, matches))
}
dat[first_and_second_preds(dat, "F")] # order is not exactly the same as your output
F C G D E B
A 1 1 0 1 0 1
B 0 0 1 0 1 0
C 0 0 0 1 1 0
D 1 1 0 0 0 0
E 1 0 1 1 0 1
F 0 1 1 0 0 0
G 0 0 0 1 0 1
H 1 1 0 0 0 0
I 0 0 1 1 1 1
J 1 1 0 0 0 0
Not sure if the ordering in the result is important, but you could add the logic if it is.
Using dat from here (a kinder way to share small R data on SO):
dat <- read.table(
text = "A B C D E F G H I J
A 0 1 1 1 0 1 0 1 1 0
B 1 0 0 0 1 0 1 0 0 1
C 0 0 0 1 1 0 0 0 0 0
D 1 0 1 0 0 1 0 1 0 1
E 0 1 0 1 0 1 1 0 1 0
F 0 0 1 0 0 0 1 0 0 0
G 0 1 0 1 0 0 0 0 0 0
H 1 0 1 0 0 1 0 0 0 1
I 0 1 0 1 1 0 1 0 0 0
J 1 0 1 0 0 1 0 1 0 0",
header = TRUE
)
Something a little more general that lets you pass any set of self-selected predictors directly:
all_preds <- function(dat, predictors) {
unlist(lapply(predictors, function(x) names(dat)[which(dat[x, ] == 1 )]))
}
dat[all_preds(dat, c("A", "B"))]
B C D F H I A E G J
A 1 1 1 1 1 1 0 0 0 0
B 0 0 0 0 0 0 1 1 1 1
C 0 0 1 0 0 0 0 1 0 0
D 0 1 0 1 1 0 1 0 0 1
E 1 0 1 1 0 1 0 0 1 0
F 0 1 0 0 0 0 0 0 1 0
G 1 0 1 0 0 0 0 0 0 0
H 0 1 0 1 0 0 1 0 0 1
I 1 0 1 0 0 0 0 1 1 0
J 0 1 0 1 1 0 1 0 0 0
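If more than two levels of predictors are ever needed, the same lookup can be repeated in a loop. The following is only a sketch: preds_to_depth() and its depth argument are hypothetical names of mine, not part of mice or of the answer above. It collects predictors level by level and stops early once no new variables turn up.

```r
dat <- read.table(
  text = "A B C D E F G H I J
A 0 1 1 1 0 1 0 1 1 0
B 1 0 0 0 1 0 1 0 0 1
C 0 0 0 1 1 0 0 0 0 0
D 1 0 1 0 0 1 0 1 0 1
E 0 1 0 1 0 1 1 0 1 0
F 0 0 1 0 0 0 1 0 0 0
G 0 1 0 1 0 0 0 0 0 0
H 1 0 1 0 0 1 0 0 0 1
I 0 1 0 1 1 0 1 0 0 0
J 1 0 1 0 0 1 0 1 0 0",
  header = TRUE
)

# Hypothetical helper: follow predictors of predictors up to `depth` levels
preds_to_depth <- function(dat, predictor, depth = 2) {
  found <- predictor      # variables collected so far, in discovery order
  frontier <- predictor   # variables whose predictors are looked up next
  for (i in seq_len(depth)) {
    nxt <- unlist(lapply(frontier, function(x) names(dat)[which(dat[x, ] == 1)]))
    frontier <- setdiff(unique(nxt), found)
    if (length(frontier) == 0) break
    found <- c(found, frontier)
  }
  found
}

preds_to_depth(dat, "F", depth = 2)
#> [1] "F" "C" "G" "D" "E" "B"
```

With depth = 2 this reproduces the column selection from first_and_second_preds(); the subset itself is then dat[preds_to_depth(dat, "F")], or the same index applied to the original df.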
I am stuck here. I tried using spread twice from tidyr, and I tried joining, but neither method gives the right result without some hard coding.
Is there any way to transform this data:
cat1 cat2 title
1 A G AB
2 B G BC
3 C B CD
4 D G DE
5 E H EF
6 F A FG
into this:
A B C D E F G H
AB 1 0 0 0 0 0 1 0
BC 0 1 0 0 0 0 1 0
CD 0 1 1 0 0 0 0 0
DE 0 0 0 1 0 0 1 0
EF 0 0 0 0 1 0 0 1
FG 1 0 0 0 0 1 0 0
Sample data:
df<-data.frame(cat1=LETTERS[1:6],
cat2=c('G','G','B','G','H','A'),
title=paste0(LETTERS[1:6],LETTERS[2:7]))
Since I usually get dplyr answers faster: base R or tidyr-only solutions are also very welcome.
I don't know if this qualifies as not hard coding for the OP:
library(dplyr) # provides %>%
df %>%
tidyr::gather(key = vars, value = values, cat1, cat2) %>%
dplyr::mutate(vars = 1) %>%
tidyr::spread(key = values, value = vars, fill = 0)
# title A B C D E F G H
# 1 AB 1 0 0 0 0 0 1 0
# 2 BC 0 1 0 0 0 0 1 0
# 3 CD 0 1 1 0 0 0 0 0
# 4 DE 0 0 0 1 0 0 1 0
# 5 EF 0 0 0 0 1 0 0 1
# 6 FG 1 0 0 0 0 1 0 0
Just melt first, then cast:
library(reshape2)
dcast(melt(df, id = "title"), title ~ value, length)
title A B C D E F G H
1 AB 1 0 0 0 0 0 1 0
2 BC 0 1 0 0 0 0 1 0
3 CD 0 1 1 0 0 0 0 0
4 DE 0 0 0 1 0 0 1 0
5 EF 0 0 0 0 1 0 0 1
6 FG 1 0 0 0 0 1 0 0
melt stacks cat1 and cat2 into a single value column, and dcast then counts how often each value occurs per title.
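Since base R solutions were also welcome, here is a sketch of the same stack-then-count idea using only table(). It is a swapped-in alternative, not what either answer above does, and the intermediate names long and wide are mine. Note that table() sorts the titles alphabetically, which happens to match the original order here.

```r
df <- data.frame(cat1 = LETTERS[1:6],
                 cat2 = c("G", "G", "B", "G", "H", "A"),
                 title = paste0(LETTERS[1:6], LETTERS[2:7]),
                 stringsAsFactors = FALSE)

# Stack cat1 and cat2 into one long column, repeating the titles alongside
long <- data.frame(title = rep(df$title, times = 2),
                   value = c(df$cat1, df$cat2))

# Cross-tabulate title against value; unclass() leaves a plain integer matrix
wide <- unclass(table(long$title, long$value))
wide
```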
I have a matrix that was built using cbind():
! ? c e i k l t
dif 0 0 1 0 0 0
dor 1 0 0 0 0 0
dor 0 0 0 0 0 1
same 0 0 0 1 0 0
same 0 1 0 0 0 0
Could you suggest R code that merges the rows sharing the same name, as below?
! ? c e i k l t
same 1 1 0 1 0 0
dif 0 0 1 0 0 0
dor 1 0 0 0 0 1
Thank you.
df<-read.table(header=T,text="ID c e i k l t
dif 0 0 1 0 0 0
dor 1 0 0 0 0 0
dor 0 0 0 0 0 1
same 0 0 0 1 0 0
same 0 1 0 0 0 0")
require(plyr)
ddply(df,.(ID),function(x)colSums(x[,-1]))
ID c e i k l t
1 dif 0 0 1 0 0 0
2 dor 1 0 0 0 0 1
3 same 0 1 0 1 0 0
A base R alternative using aggregate():
aggregate(df[, -1], list(df[, 1]), function(x) {
Reduce("|", x)
})
# Group.1 c e i k l t
# 1 dif 0 0 1 0 0 0
# 2 dor 1 0 0 0 0 1
# 3 same 0 1 0 1 0 0
Do you want the sum, or do you want the logical OR?
Logical OR:
require(functional)
aggregate(. ~ ID, data=df, FUN=Compose(any, as.numeric))
ID c e i k l t
1 dif 0 0 1 0 0 0
2 dor 1 0 0 0 0 1
3 same 0 1 0 1 0 0
Sum:
aggregate(. ~ ID, data=df, FUN=sum)
The result here is the same.
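The answers above first read the data into a data frame with an ID column. If the object really is a matrix with repeated row names, as the question describes, base R's rowsum() can group on the row names directly. This is only a sketch: the matrix below is my reconstruction of the six columns used in the answers, and the final step turning sums into 0/1 flags is the logical-OR variant.

```r
# Reconstructed example matrix with duplicated row names
m <- matrix(
  c(0, 0, 1, 0, 0, 0,
    1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 1,
    0, 0, 0, 1, 0, 0,
    0, 1, 0, 0, 0, 0),
  ncol = 6, byrow = TRUE,
  dimnames = list(c("dif", "dor", "dor", "same", "same"),
                  c("c", "e", "i", "k", "l", "t"))
)

# Sum the rows within each row-name group (groups come back sorted)
summed <- rowsum(m, group = rownames(m))

# Logical OR instead of a sum: any positive count becomes 1
merged <- (summed > 0) * 1
merged
```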
Suppose I have a long table like this:
A <- rep(c("a","b","c","d"),each=4)
B <- rep(c("e","f","g","h"),4)
C <- rep(c("i","j"),8)
D <- rnorm(16)
df <- data.frame(A,B,C,D)
head(df)
A B C D
1 a e i -0.18984508
2 a f j -1.82703822
3 a g i -0.17307580
4 a h j -1.38104238
5 b e i 0.08699983
6 b f j -0.36442461
I would like to change the long table to a wide format so that each distinct value in columns A and B becomes a column title. Each cell should be 1 or 0, indicating whether that value occurs in the row. Columns C and D remain the same. The desired table is something like this:
C D a b e f g h
i -0.18984508 1 0 1 0 0 0
j -1.82703822 1 0 0 1 0 0
i -0.17307580 1 0 0 0 1 0
j -1.38104238 1 0 0 0 0 1
i 0.08699983 0 1 1 0 0 0
j -0.36442461 0 1 0 1 0 0
This is a form of reshaping which can be done with the reshape2 package.
library("reshape2")
dcast(melt(df, id.vars=c("C", "D")), C+D~value, fun.aggregate=length)
which gives
C D a b c d e f g h
1 i -1.44485242 0 1 0 0 0 0 1 0
2 i -0.80834639 0 0 0 1 0 0 1 0
3 i -0.15202085 0 0 0 1 1 0 0 0
4 i -0.05626233 1 0 0 0 1 0 0 0
5 i 0.12031754 1 0 0 0 0 0 1 0
6 i 0.62206658 0 0 1 0 0 0 1 0
7 i 0.77101891 0 1 0 0 1 0 0 0
8 i 1.38752097 0 0 1 0 1 0 0 0
9 j -2.52137154 0 0 0 1 0 0 0 1
10 j -0.53231537 0 1 0 0 0 0 0 1
11 j -0.30178539 1 0 0 0 0 0 0 1
12 j -0.29823112 1 0 0 0 0 1 0 0
13 j -0.12988540 0 1 0 0 0 1 0 0
14 j 0.00517754 0 0 1 0 0 1 0 0
15 j 0.51452289 0 0 1 0 0 0 0 1
16 j 0.53260223 0 0 0 1 0 1 0 0
The order is not the same as in the original data set. If that is important, put an order column in, carry it through, and then sort on it at the end.
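If the original row order matters and reshape2 is not required, one base R sketch builds the indicator columns directly, so the rows are never reordered. This is a different technique from the melt/dcast call above; vals, ind, and wide are names introduced here, and set.seed() is added only to make the random D column reproducible.

```r
set.seed(1)  # assumption: fixed seed so the example is repeatable
A <- rep(c("a", "b", "c", "d"), each = 4)
B <- rep(c("e", "f", "g", "h"), 4)
C <- rep(c("i", "j"), 8)
D <- rnorm(16)
df <- data.frame(A, B, C, D, stringsAsFactors = FALSE)

# Every distinct value in A or B becomes one indicator column
vals <- sort(unique(c(df$A, df$B)))
ind <- sapply(vals, function(v) as.integer(df$A == v | df$B == v))

# Keep C and D untouched and append the indicators in the original row order
wide <- cbind(df[c("C", "D")], ind)
head(wide)
```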