I have vectors:
a <- c(1,2,4,5,6)
b <- c(2,4,5)
I want to extract the values from 'a' that are not in 'b', so the desired output is:
1,6
How could I do that?
We can use setdiff
setdiff(a, b)
#[1] 1 6
Or, if there are duplicates that should be kept, use vsetdiff from the vecsets package
library(vecsets)
vsetdiff(a, b)
Or using %in% and !
a[! a %in% b]
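To make the duplicates point concrete, here is a small sketch with made-up vectors (a2 and b2 are hypothetical names, not from the question). In base R, setdiff returns unique values, while the %in% filter keeps every occurrence:

```r
# Hypothetical vectors; a2 contains a repeated value.
a2 <- c(1, 1, 2, 6)
b2 <- c(2)

dedup <- setdiff(a2, b2)   # set semantics: duplicates collapse
kept  <- a2[!a2 %in% b2]   # positional filter: duplicates survive

print(dedup)  # 1 6
print(kept)   # 1 1 6
```

Which one is right depends on whether repeated values in 'a' carry meaning for you.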
Related
I have two numeric vectors:
a <- c(1,2,3,4,5,6,7,8)
b <- c(4,2,2,3,9,10,7,7,10,14)
I want to set any number in b that does not appear in a to zero.
My desired result is:
c <- c(4,2,2,3,0,0,7,7,0,0)
How can I do this in an elegant way?
(I was thinking of using left_join, but I think there must be a more elegant approach.)
You can do this by subsetting b with the %in% operator:
b[! b %in% a] <- 0
Use the negation of the %in% condition:
b[!b %in% a] <- 0
ifelse(b %in% a, b, 0) seems to do it.
We can use replace
replace(b, !b %in% a, 0)
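For comparison, a minimal sketch running the three suggestions side by side on the vectors from the question. The result names (via_replace, via_ifelse, b_inplace) are made up for illustration. The practical difference is that replace and ifelse return a new vector, while the assignment form modifies the vector it is applied to:

```r
a <- c(1, 2, 3, 4, 5, 6, 7, 8)
b <- c(4, 2, 2, 3, 9, 10, 7, 7, 10, 14)

via_replace <- replace(b, !b %in% a, 0)  # returns a modified copy, b is untouched
via_ifelse  <- ifelse(b %in% a, b, 0)    # also returns a new vector

b_inplace <- b                           # work on a copy so b stays intact here
b_inplace[!b_inplace %in% a] <- 0        # assignment form changes the vector itself

print(via_replace)  # 4 2 2 3 0 0 7 7 0 0
```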
Say one has a data frame as follows:
data <- data.frame('obs' = c('a','c','b'), 'top1' = c('a','b','c'), 'top2' = c('b', 'c', 'f'), 'top3' = c('g', 'h', 'd'))
I want to compute a new column, topn, conditionally: if the value of obs appears in any of the top columns, topn should equal obs; otherwise topn can be assigned any value, say top1. I know I can do this with ifelse and a chain of or conditions, but I'm looking for a shorter way to write it, because my table can have up to 10 top columns.
obs top1 top2 top3 topn
a a b g a
c b c h c
b c f d c
If we are looking for a vectorized approach, we can use rowSums on a logical matrix to find whether there are any matches, then use ifelse to pick the column values based on that logical vector
i1 <- data[-1] == as.character(data$obs)[row(data[-1])]  # compare each top value with obs from the same row
data$topn <- ifelse(rowSums(i1) != 0, as.character(data$obs), as.character(data$top1))
data$topn
#[1] "a" "c" "c"
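Since the question mentions up to 10 top columns, here is a sketch of the same rowSums idea that picks the columns by name instead of by position. It assumes the columns are called top1, top2, ... as in the example; top_cols and hit are names made up for illustration:

```r
data <- data.frame(obs  = c("a", "c", "b"),
                   top1 = c("a", "b", "c"),
                   top2 = c("b", "c", "f"),
                   top3 = c("g", "h", "d"),
                   stringsAsFactors = FALSE)

# Assumption: every column to search is named "top" followed by digits.
top_cols <- grep("^top[0-9]+$", names(data), value = TRUE)

# A character matrix compared with a vector of length nrow recycles down
# each column, so every cell is compared with obs from its own row.
hit <- rowSums(as.matrix(data[top_cols]) == data$obs) > 0

data$topn <- ifelse(hit, data$obs, data$top1)
print(data$topn)  # "a" "c" "c"
```

Adding top4 through top10 to the table should then need no change to the code.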
This might be helpful and quick.
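f <- function(a) {
  # a[1] is obs; a[-1] holds the top columns of the same row
  if (a[1] %in% a[-1]) {
    a[1]
  } else {
    sample(a[-1], 1)  # no match: pick one of the top values at random
  }
}
data$topn <- apply(data, 1, f)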
I have the following df and use case: I'd like to find, and flag, all rows for which there exists another row satisfying a condition, e.g.
df <- data.frame(X=c('a','b','c'), Y=c('a','c','d'))
> df
X Y
1 a a
2 b c
3 c d
I'd like to find those rows whose Y value is the same as the X value in another row. In the example above, row #2 qualifies because Y = c and row #3 has X = c. Note that row #1 does not satisfy the condition.
Something like:
df$Flag <- find(df, Y == X_in_another_row(df))
1
For each Y, we check if any value in X (other than in the same row) matches.
sapply(1:NROW(df), function(i) df$Y[i] %in% df$X[-i])
#[1] FALSE TRUE FALSE
If indices are necessary, wrap the whole thing in which
which(sapply(1:NROW(df), function(i) df$Y[i] %in% df$X[-i]))
#[1] 2
2 (not tested well)
df <- data.frame(X=c('a','b','c'), Y=c('a','c','d'), stringsAsFactors = FALSE)
temp = outer(df$X, df$Y, "==") #Check equality among values of X and Y
diag(temp) = FALSE #Set diagonal values as FALSE (for same row)
colSums(temp) > 0
#[1] FALSE TRUE FALSE
which(match(df$Y,df$X)!=1:nrow(df))
I think this should work.
df <- data.frame(X= c(1,2,3,4,5,3,2,1), Y = c(1,2,3,4,5,6,7,8))
which(with(df, (Y %in% X) & (Y != X)))
Works on the original data.frame, if we set stringsAsFactors = FALSE
df <- data.frame(X=c('a','b','c'), Y=c('a','c','d'), stringsAsFactors = FALSE)
which(with(df, (Y %in% X) & (Y != X)))
Quite convoluted but I'll put it here anyway. This should work even if there are repeated values in X.
For example with the following dataframe df2:
df2 = data.frame(X=c('a','b','c','a','d'), Y=c('a','c','d','e','b'))
X Y
1 a a
2 b c
3 c d
4 a e
5 d b
## Specifying the same factor levels allows us to get a square matrix
df2$X = factor(df2$X,levels=union(df2$X,df2$Y))
df2$Y = factor(df2$Y,levels=union(df2$X,df2$Y))
m = as.matrix(table(df2))
valY = rowSums(m)*colSums(m)-diag(m)
which(df2$Y %in% names(valY)[as.logical(valY)])
#[1] 1 2 3 5
Essentially you want to know whether Y is in X but you want the condition to be FALSE when X == Y:
df$Z <- with(df, (Y != X) & (Y %in% X))
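One caveat worth checking before relying on this shortcut: it can disagree with a row-by-row check when X contains repeated values. A small sketch using the five-row example from the previous answer (shortcut and per_row are made-up names):

```r
df2 <- data.frame(X = c("a", "b", "c", "a", "d"),
                  Y = c("a", "c", "d", "e", "b"),
                  stringsAsFactors = FALSE)

# Shortcut: Y occurs somewhere in X and differs from X in its own row.
shortcut <- with(df2, (Y != X) & (Y %in% X))

# Row-by-row check: Y[i] occurs in X once row i itself is left out.
per_row <- vapply(seq_len(nrow(df2)),
                  function(i) df2$Y[i] %in% df2$X[-i],
                  logical(1))

print(which(shortcut))  # 2 3 5
print(which(per_row))   # 1 2 3 5
```

Row 1 has X == Y == "a", but row 4 also has X == "a", so only the row-by-row version flags it. If X is known to be unique, the two should agree.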
# Assume you want to use the X value at position 4 ('c') to find all rows where Y is 'c'
df <- data.frame(X = c('a', 'b', 'd', 'c'),
Y = c('a', 'c', 'c', 'd'))
row <- 4 # assume the desired row is position 4
val <- as.character(df[row, 'X']) # extract the value as a character string (X may be a factor)
df[df$Y == val,]
# Result
# X Y
# 2 b c
# 3 d c
Newbie in R and I've been trying to find a neat (not using a loop) way to do the following:
x <- c(0, 4)
y <- c(1, 2)
df <- data.frame(x,y)
therefore if I want to output all x for which y=1:
df$x[df$y == 1]
but what if I have a vector such as a <- c(1,1,1)?
I can't just do:
df$x[df$y == a]
because it subsets just once:
[1] 0
but I want the output to be the vector c(0,0,0)
Obviously this isn't the way to go about it, but any clues as to which is?
Thanks!
I think what you're after is %in%. Try:
df$x[df$y %in% a]
I think you are looking for %in%:
df$x[df$y %in% a]
%in% returns TRUE for each value in df$y when it is in a.
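A tiny sketch of what that looks like on the data from the question; mask is a made-up name for the intermediate logical vector:

```r
x <- c(0, 4)
y <- c(1, 2)
df <- data.frame(x, y)
a <- c(1, 1, 1)

# One TRUE/FALSE per element of df$y, regardless of how long a is.
mask <- df$y %in% a
print(mask)        # TRUE FALSE
print(df$x[mask])  # 0
```

Note that the result has one entry per matching row of df, not one per element of a.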
Proper way to do this is
df[df$y %in% a,]$x
or
df[df$y %in% a,'x']
According to your question, the desired result is the vector c(0,0,0). One way you could achieve that is:
rep(df$x[df$y %in% a], length(a))
#[1] 0 0 0
But you need to be aware of the implications, for example if you change a so that it contains different numbers. Here's another example:
a <- c(3,1,2)
rep(df$x[df$y %in% a], length(a))
#[1] 0 4 0 4 0 4
So in this case, the output has length 2*length(a), because two different values of a match an entry in df$y. It is not clear from your question what behavior you want in such a case. So here's a third example, in case you want the matched x values repeated only as many times as there are elements of a that are also present in df$y:
a <- c(3,1,2)
rep(df$x[df$y %in% a], length(a[a %in% df$y]))
#[1] 0 4 0 4
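If what you actually want is one x per element of a, in the order of a, a lookup with match may be closer to the intent than rep. This is a different technique from the answers above, and it assumes the values in df$y are unique. lookup_x is a hypothetical helper name:

```r
x <- c(0, 4)
y <- c(1, 2)
df <- data.frame(x, y)

# For each element of a, find its position in df$y and take the x there.
# Elements of a that do not occur in df$y come back as NA.
lookup_x <- function(a, df) df$x[match(a, df$y)]

print(lookup_x(c(1, 1, 1), df))  # 0 0 0
print(lookup_x(c(3, 1, 2), df))  # NA 0 4
```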
Let's consider two matrices A and B, where the rows of A are a subset of the rows of B. How can I find the index of each row of A in matrix B?
Here is a reproducible example:
set.seed(30)
B <- matrix(rnorm(n = 30, mean = 0), ncol = 3)
A <- subset(B, B[,1] > 1)
The goal is to find the indices idx, which in this case are rows 4 and 5.
Nested apply loops should do it.
apply(A, 1, function(a)
which(apply(B, 1, function(b) all(b==a)))
)
# [1] 4 5
Or alternatively, using colSums
apply(A, 1, function(a)
which(colSums(t(B) == a) == ncol(B)))
# [1] 4 5
Alternatively, you could do this:
which(duplicated(rbind(A, B))[-seq_len(nrow(A))])  # rows of B that repeat a row of A
A nice solution that avoids nested apply calls, originally by @Arun.
match(apply(A, 1, paste, collapse="\b"), apply(B, 1, paste, collapse="\b"))
# [1] 4 5
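The same idea on a small deterministic matrix, in case you want to check it without depending on the random draw. B_small, A_small and row_key are names made up for this sketch:

```r
# 4 x 3 matrix, filled column-wise; the rows are (1,5,9), (2,6,10), ...
B_small <- matrix(c(1, 2, 3, 4,
                    5, 6, 7, 8,
                    9, 10, 11, 12), ncol = 3)
A_small <- B_small[c(2, 4), , drop = FALSE]  # rows 2 and 4 of B_small

# Collapse each row into one string so whole rows can be compared by match.
row_key <- function(m) apply(m, 1, paste, collapse = "\b")

idx <- match(row_key(A_small), row_key(B_small))
print(idx)  # 2 4
```

Because the rows are compared as strings, this is best suited to rows that are exact copies, as they are when A is taken directly from B.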
This takes a slightly different approach and relies on the fact that a matrix is a vector with dimensions; it won't work if you have data.frames:
which( B %in% A , arr.ind=TRUE )[1:nrow(A)]
#[1] 4 5
And if you had really big matrices and wanted to be a bit more efficient, you could use %in% on just the first column, like so:
which( B[1:nrow(B)] %in% A[1:nrow(A)] , arr.ind=TRUE )
But I don't expect this would make too much of a difference except in really big matrices.
If you had your data as data.frames you could do the same thing by passing just the first column to which:
A <- data.frame(A)
B <- data.frame(B)
which( B$X1 %in% A$X1 )
#[1] 4 5
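A word of caution on matching by the first column only: it gives the right rows only when the first-column values identify a row. A small sketch with made-up matrices (B_dup and A_dup are hypothetical) where the first column repeats:

```r
# Rows 1 and 3 share the same first-column value.
B_dup <- cbind(c(1, 2, 1), c(10, 20, 30))
A_dup <- B_dup[3, , drop = FALSE]  # only row 3

first_col_only <- which(B_dup[, 1] %in% A_dup[, 1])
full_row <- which(apply(B_dup, 1, function(b) all(b == A_dup[1, ])))

print(first_col_only)  # 1 3
print(full_row)        # 3
```

With continuous random data, as in the question, repeated first-column values are unlikely, so the shortcut holds there.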