I want to extract all the shared elements and unshared elements between multiple vectors.
Say I have these vectors:
set.seed(9)
a <- sample(LETTERS,10,replace=F)
b <- sample(LETTERS,10,replace=F)
c <- sample(LETTERS,10,replace=F)
I first explore their overlap with a Venn diagram:
library(VennDiagram)
venn.diagram(list('a'=a,'b'=b,'c'=c), filename="test.png", height=1000, width=1000, imagetype="png", units="px")
I know how to obtain the elements shared by all the vectors (the central 3), this way:
shared <- Reduce(intersect, list(a,b,c))
length(shared)#3, correct
However, how can I obtain the unshared elements across the groups (5+7+5=17)?
My attempt is the following:
outersect <- function(a,b) unique(c(setdiff(a,b), setdiff(b,a)))
unshared <- Reduce(outersect, list(a,b,c))
length(unshared)#20, I expect 17 (5+7+5)
But the number is incorrect: Reduce applies outersect pairwise, so the three elements shared by all vectors are dropped when comparing a and b but re-introduced from c, giving 17 + 3 = 20. Any idea how to do this easily?
My approach would be to combine all those vectors first, then count the frequency of each element with the table function, and finally take the length:
temp = c(a,b,c)
temp_table = table(temp)
length(temp_table[temp_table == 1])
and use names if you want to show the unique element
names(temp_table[temp_table == 1])
How about this:
sets <- list(a, b, c)
lapply(1:3, function(i){
  sets[[i]][!sets[[i]] %in% Reduce(union, sets[i != c(1,2,3)], init = NULL)]
})
This makes a union of the vectors not currently considered and keeps the elements of the current vector that are not in that union.
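Putting the same idea together as a self-contained sketch (the helper name unshared_elements is mine, not from the answers):
unshared_elements <- function(sets) {
  unlist(lapply(seq_along(sets), function(i) {
    # elements of set i that appear in none of the other sets
    setdiff(sets[[i]], Reduce(union, sets[-i]))
  }))
}
length(unshared_elements(list(a, b, c)))  # should be 17 (5+7+5), as in the Venn diagram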
I have a tibble with a variable that contains lists. Each list has a different length. I would like to create two new variables, say "lon" and "lat": variable "lon" should hold the first half of each list and variable "lat" the second half.
data:
file_url <- "https://github.com/slawomirmatuszak/Covid.UA/raw/master/sample.Rda?raw=true"
load(url(file_url))
I can achieve that by filtering the lists, but I'd like more general code, based on the lengths rather than a specific cut-off value:
sample.data$lon <- lapply(sample.data$geometry, function(x) unlist(x)[x<40])
sample.data$lat <- lapply(sample.data$geometry, function(x) unlist(x)[x>40])
You can use the length of each element to take the first and second half of the geometry column:
sample.data$lon <- sapply(sample.data$geometry, function(x) {
  tmp <- unlist(x)
  tmp[1:(length(tmp)/2)]                   # first half
})
sample.data$lat <- sapply(sample.data$geometry, function(x) {
  tmp <- unlist(x)
  tmp[((length(tmp)/2) + 1):length(tmp)]   # second half
})
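A quick hedged sanity check (assuming sample.data loaded as above): the two halves of each row should together account for every value in that row's geometry.
all(lengths(sample.data$lon) + lengths(sample.data$lat) ==
      lengths(lapply(sample.data$geometry, unlist)))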
df<- data.frame(a=c(1:10), b=c(21:30),c=c(1:10), d=c(14:23),e=c(11:20),f=c(-6:-15),g=c(11:20),h=c(-14:-23),i=c(4:13),j=c(1:10))
In this data frame I have three diagonal blocks, spanning rows/columns 1:2, 3:5 and 6:10 (originally shown in an image). I want to apply two functions: sine to the block-diagonal elements and cosine to all the other elements, keeping the same structure of the data frame:
sin(df[1:2,1:2])
sin(df[3:5,3:5])
sin(df[6:10,6:10])
cos(the rest of the elements)
1) outer/arithmetic Create a logical block-diagonal matrix indicating whether each cell is on the block diagonal, then use it to take a convex combination of the sin and cos values, giving a data.frame as follows:
v <- rep(1:3, c(2, 3, 5))
ind <- outer(v, v, `==`)
ind * sin(df) + (!ind) * cos(df)
2) ifelse Alternatively, this gives a matrix result (or use as.matrix on the result of (1)). ind is from (1).
m <- as.matrix(df)
ifelse(ind, sin(m), cos(m))
3) Matrix::bdiag Another approach is to use bdiag in the Matrix package (which comes with R -- no need to install it).
library(Matrix)
ones <- function(n) matrix(1, n, n)
ind <- bdiag(ones(2), ones(3), ones(5)) == 1
Now proceed as in the last line of (1) or as in (2).
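For completeness, a small sketch of that last step (coercing the sparse indicator to a base matrix first so that it combines cleanly with the data frame, exactly as in (1)):
ind <- as.matrix(ind)        # sparse logical Matrix -> base logical matrix
ind * sin(df) + (!ind) * cos(df)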
If it's okay for you that the result is stored in a new data frame, you could reverse the order of the operations and do it like this:
ndf <- cos(df)
ndf[1:2,1:2] <- sin(df[1:2,1:2])
ndf[3:5,3:5] <- sin(df[3:5,3:5])
ndf[6:10,6:10] <- sin(df[6:10,6:10])
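A quick hedged check that this agrees with the outer/arithmetic approach in (1) above:
v <- rep(1:3, c(2, 3, 5))
ind <- outer(v, v, `==`)
all.equal(ndf, ind * sin(df) + (!ind) * cos(df))  # should be TRUE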
I have this vector
data<-c(3,1,1,3,1,1,1,1,2,1,1,3,3,3,1,3,1,1,3,2,1,3,3,3,3)
I need to find the number of times I can pick a 1, then a 2, then a 3 (in this particular order; the elements need not be adjacent).
So the expected answer for the above vector is 98 times (all possible ways).
Is there any efficient way to do this, as my actual problem will be a vector with many unique values (not simply 1, 2, 3)?
Here is my code that gives the answer:
data<-c(3,1,1,3,1,1,1,1,2,1,1,3,3,3,1,3,1,1,3,2,1,3,3,3,3)
yind<-which(data==2)
y1<-yind[1]
y2<-yind[2]
sum(data[1:y1] < data[y1]) * sum(data[y1:length(data)] > data[y1]) +
  sum(data[1:y2] < data[y2]) * sum(data[y2:length(data)] > data[y2])
But it is not suitable for a vector with many unique values. For example:
set.seed(3)
data2 <- sample(1:5,100,replace = TRUE)
and then count how many times I can have 1, then 2, then 3, then 4, then 5 (all possible ways).
Thank you
Here is an option using non-equi joins from data.table:
library(data.table)
v <- data2
tofind <- 1L:5L
dat <- data.table(rn = seq_along(v), v)
# start with every position of the first target value; each is one partial path
paths <- dat[v == tofind[1L]][, npaths := as.double(1)]
for (k in tofind[-1L]) {
  # non-equi join: extend each partial path to every later position holding k,
  # summing the number of incoming paths per new position
  paths <- paths[dat[v == k], on = .(rn < rn), allow.cartesian = TRUE, nomatch = 0L,
                 by = .EACHI, .(npaths = sum(npaths))]
}
paths[, sum(npaths)]
Output for your data is 98.
Output for your data2 is 20873.
Explanation:
Picture an n-nomial tree where each layer corresponds to the next number in the sequence you are looking for and each vertex is a position of that number in the data vector. For example, for data = c(1,2,1,2,3), the first layer holds the positions of 1 (1 and 3), the second layer the positions of 2 (2 and 4), and the third layer the position of 3 (5); edges run from each vertex to every later position in the next layer.
The code goes through the layers in order and finds the number of paths arriving at each vertex of the current layer, using a non-equi inner join to identify the paths entering each vertex.
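The same layer-by-layer counting can be sketched in base R without data.table (a hedged sketch; the function name count_sequences is mine):
count_sequences <- function(v, tofind) {
  # npaths[i] = number of partial sequences matching tofind so far that end at i
  npaths <- as.numeric(v == tofind[1])
  for (k in tofind[-1]) {
    # a path ends at i with value k if it extends any partial path at j < i
    npaths <- (v == k) * c(0, cumsum(npaths)[-length(npaths)])
  }
  sum(npaths)
}
count_sequences(data, 1:3)   # 98, as above
count_sequences(data2, 1:5)  # should reproduce 20873, as above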
Here's an approach with expand.grid.
FindComb <- function(vector, variables){
  # all combinations of positions at which the target values occur
  grid <- do.call(expand.grid, lapply(variables, function(x) which(vector == x)))
  # count the combinations whose positions are strictly increasing
  sum(Reduce(`&`, lapply(seq(2, ncol(grid)), function(x) grid[, x - 1] < grid[, x])))
}
FindComb(data,c(1,2,3))
#[1] 98
I expect it will not scale well with longer vectors or more numbers, but it works OK for smaller scales:
set.seed(3)
data2 <- sample(1:9,1000,replace = TRUE)
FindComb(data2,c(8,2,3))
#[1] 220139
MC is a very large matrix, 1E6 rows (or more) and 500 columns. I am trying to get the number of occurrences of the values 1 through 13 for each of the columns. Sometimes the number of occurrences of one of these values will be zero. I would like my final output to be a 500 x 13 matrix (or data frame) with these count values. I am wondering if anyone can suggest a more efficient approach than what I currently have, which is the following:
MCct <- matrix(0, 500, 13)
for (j in 1:500) {
  for (i in 1:13) {
    MCct[j, i] <- length(which(MC[, j] == i))
  }
}
I don't think that table works, because I also need to know when a value occurs zero times; I couldn't figure out how to do that, if it is possible. And I am only somewhat familiar with apply, so maybe there is a method using that; I haven't been successful in figuring it out yet.
Thanks for the help,
Vivien
You could do this with sapply (to iterate over the values 1 to 13) and colSums (to count the matches in each column of MC):
MCct <- sapply(1:13, function(i) {
colSums(MC == i)
})
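A quick hedged sanity check of the shape on a small mock matrix (MC_small is illustrative, not from the question):
MC_small <- matrix(sample(0:14, 50, replace = TRUE), 10, 5)
res <- sapply(1:13, function(i) colSums(MC_small == i))
dim(res)  # 5 x 13: one row per column of MC_small, one column per value 1..13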
Suppose you have a set of values you're interested in
set <- 1:4
n = length(set)
and you have a matrix that includes those values, and others
m <- matrix(sample(10, 120, TRUE), 12, 10)
Create a vector indicating the index in the set of each matching value
idx <- match(m, set)
then make the index unique to each column
idx <- idx + (col(m) - 1) * n
idx ranges from 1 (occurrences of the first set element in the first column) to n * ncol(m) (occurrences of the nth set element in the last column of m). Tabulate the values of idx:
v <- tabulate(idx, nbins = n * ncol(m))
The first n elements of v summarize the number of times set elements 1..n appear in the first column of m. The second n elements of v summarize the number of times set elements 1..n appear in the second column of m, etc. Reshape as the desired matrix, where each row represents the corresponding member of the set.
matrix(v, ncol=ncol(m))
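An optional hedged check, tabulating each column of m directly and comparing:
ref <- sapply(seq_len(ncol(m)), function(j) tabulate(match(m[, j], set), nbins = n))
all(matrix(v, ncol = ncol(m)) == ref)  # should be TRUE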
table can count zero occurrences; you just need to create a factor that has the whole range of levels, e.g.
apply(MC, 2, function(x) table(factor(x, levels=1:13)))
This is not as efficient as @Patronus' solution though.
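Note that this returns the 13 values in rows and the columns of MC in columns; transpose it if you want one row per column of MC, as in the sapply/colSums answer above (a small usage note, not part of the original answer):
t(apply(MC, 2, function(x) table(factor(x, levels = 1:13))))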
(Please feel free to change the title to something more appropriate)
I would like to extract all reciprocal pairs from an asymmetric square matrix.
Some dummy data to clarify:
m <- matrix(c(NA,0,1,0,0,-1,NA,1,-1,0,1,1,NA,-1,-1,-1,1,0,NA,0,-1,1,0,0,NA), ncol=5, nrow=5)
colnames(m) <- letters[seq(ncol(m))]
rownames(m) <- letters[seq(nrow(m))]
require(reshape2)
m.m <- melt(m) # get all pairs
m.m <- m.m[complete.cases(m.m),] # remove NAs
How would I now extract all "reciprocal duplicates" from m.m (or directly from m)?
This is what I mean by a reciprocal duplicate:
Var1 Var2 value
b a 0
a b -1
And I would like to store each value combination, i.e. {1,1},{-1,-1},{1,0},{-1,0},{0,0} in a list with its Var combination {a,b},{a,c},{a,d},{a,e},{b,c},{b,d},{b,e},{c,d},{c,e},{d,e} pointing to it, something like
$`a,b`
[1] 0,-1
I haven't managed to solve this. I feel like it could be possible with merge() or inner_join(). Also, I apologize for not providing the best example.
Any pointers would be highly appreciated.
Here's an approach based on the object m.m:
# extract the unique combinations
levs <- apply(m.m[-3], 1, function(x) paste(sort(x), collapse = ","))
# create a list of values for these combinations
split(m.m$value, levs)
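For the dummy matrix above, the "a,b" element of the resulting list should contain both directions of that pair (values in the order they appear in m.m):
$`a,b`
[1]  0 -1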
Using the matrix representation, you can get vectors of each triangle of the matrix (which align as you wish) using:
m[upper.tri(m)]
t(m)[upper.tri(m)]
To name them:
nm <- matrix(paste("(",rep(rownames(m),times=nrow(m)), ",",rep(rownames(m),each=nrow(m)),")",sep=""), nrow=nrow(m))
nm[as.vector(upper.tri(m))]
Finally, convert to a list as you wish. First I put them in a new 10 x 2 matrix, then I used lapply to create the list structure.
pairs<- cbind(m[upper.tri(m)], t(m)[upper.tri(m)] )
rownames(pairs) <- nm[as.vector(upper.tri(m))]
pairs
m.list <- lapply(seq_len(nrow(pairs)),function(i) pairs[i,])
names(m.list) <- rownames(pairs)
m.list
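For example, with the dummy m above, the a/b pair should come out as:
m.list[["(a,b)"]]
# [1] -1  0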