R - Repetitions of an array in another array

From a data frame I get a new vector, sliced from that data frame.
I want to get the number of times a certain sequence appears in it.
For example:
main <- c(A,B,C,A,B,V,A,B,C,D,E)
p <- c(A,B,C)
q <- c(A,B)
someFunction(main,p)
2
someFunction(main,q)
3
I've been messing around with rle, but it also counts every sub-repetition, which is undesirable.
Is there a quick solution I'm missing?

You can use one of the regular expression tools in R since this is really a pattern matching exercise, specifically gregexpr for this question. The p and q vectors represent the search pattern and main is where we want to search for those patterns. From the help page for gregexpr:
gregexpr returns a list of the same length as text each element of which is of
the same form as the return value for regexpr, except that the starting positions
of every (disjoint) match are given.
So we can take the length of the first element of the list returned by gregexpr, which holds the starting positions of the matches. We'll first collapse the vectors into single strings and then do the searching:
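someFunction <- function(haystack, needle) {
  # collapse both vectors into single strings
  haystack <- paste(haystack, collapse = "")
  needle <- paste(needle, collapse = "")
  # fixed = TRUE matches the needle literally instead of as a regex
  out <- gregexpr(needle, haystack, fixed = TRUE)
  # gregexpr returns -1 when there is no match at all
  if (out[[1]][1] == -1) return(0L)
  length(out[[1]])
}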
> someFunction(main, p)
[1] 2
> someFunction(main, q)
[1] 3
Note: you also need to put quotes around the elements of your main, p, and q vectors unless you have variables A, B, C, etc. defined:
main <- c("A","B","C","A","B","V","A","B","C","D","E")
p <- c("A","B","C")
q <- c("A","B")
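As a caveat, gregexpr only reports disjoint matches, and pasting the elements together can create matches that straddle element boundaries (c("AB","C") and c("A","BC") both collapse to "ABC"). A minimal sketch that compares whole elements instead and counts overlapping occurrences; the name count_overlapping is hypothetical:

```r
# Hypothetical helper: count (possibly overlapping) occurrences of the
# vector `needle` in `haystack`, comparing whole elements, not pasted strings.
count_overlapping <- function(needle, haystack) {
  m <- length(needle)
  n_starts <- length(haystack) - m + 1
  if (m == 0 || n_starts < 1) return(0L)
  starts <- seq_len(n_starts)
  # hits[s] stays TRUE only if haystack[s + k - 1] == needle[k] for every k;
  # the loop runs over the (short) pattern, vectorised over all start positions
  hits <- rep(TRUE, n_starts)
  for (k in seq_len(m)) {
    hits <- hits & (haystack[starts + k - 1] == needle[k])
  }
  sum(hits)
}

main <- c("A","B","C","A","B","V","A","B","C","D","E")
count_overlapping(c("A","B","C"), main)         # 2
count_overlapping(c("A","B"), main)             # 3
count_overlapping(c("A","A"), c("A","A","A"))   # 2 (overlapping matches)
count_overlapping(c("A","BC"), c("AB","C"))     # 0 (no boundary confusion)
```

Missing values in haystack would propagate NA into the count, so they may need handling first.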

I'm not sure if this is the best way, but you can do it with a simple recursive function:
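f <- function(a, b) {
  # a: the pattern, b: the vector to search (call as f(p, main));
  # count a match at the head of b, then recurse on b minus its first element
  if (length(a) > length(b)) 0
  else all(head(b, length(a)) == a) + Recall(a, tail(b, -1))
}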
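There may well be a built-in function for this. Note that the recursion goes one level deeper per element of b, so very long vectors can exceed R's nesting limit.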

Using sapply:
find_x_in_y <- function(x, y){
  # number of possible starting positions of x inside y
  n <- length(y) - length(x) + 1
  if (n < 1) return(0)
  sum(sapply(
    seq_len(n),
    function(i) all(y[i:(i+length(x)-1)] == x)
  ))
}
find_x_in_y(c("A", "B", "C"), main)
[1] 2
find_x_in_y(c("A", "B"), main)
[1] 3

Here's a way to do it using embed(v, n), which returns a matrix whose rows are all the length-n sub-sequences of v, each stored in reverse order (hence the rev(x) below):
find_x_in_y <- function(x, y)
sum( apply( embed( y, length(x)), 1,
identical, rev(x)))
> find_x_in_y(p, main)
[1] 2
> find_x_in_y(q, main)
[1] 3
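To see why rev(x) is needed, it helps to look at what embed returns for a short character vector: each row holds one window, but with its elements in reverse order.

```r
v <- c("A", "B", "C", "D")
# each row is one length-2 window of v, stored back to front
w <- embed(v, 2)
w
# rows are c("B","A"), c("C","B"), c("D","C"): the window c("A","B")
# shows up as the row c("B","A"), so each row is compared with rev(x)
```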

Related

Sum specified elements of sub-lists

The idea:
Suppose I have a list with two vectors. I would like to take the first element of the first vector and divide it by the sum of itself and the first element of the second vector. Then I want to do that for all elements of the first vector, and after that do the same thing for the second vector of the list.
The list:
tau1 <- list(c(0.43742669 , 0.64024429, 0.39660069, 0.11849773), c(0.5060767, 0.4857891, 0.4553237, 0.5045598))
My working code, which only handles two vectors:
Tau1 <- vector('list', 2)
for(i in seq_along(tau1)){
for(j in 1:length(tau1[[1]])){
Tau1[[i]][[j]] <- tau1[[i]][[j]] / Reduce('+', tau1[[1]][[j]], tau1[[2]][[j]])
}
}
Example:
First element of the list:
TT1 <- tau1[[1]][[1]]/(tau1[[1]][[1]]+tau1[[2]][[1]])
[1] 0.4636196
Then for the second element of the list:
TT2 <- tau1[[2]][[1]]/(tau1[[1]][[1]]+tau1[[2]][[1]])
[1] 0.5363804
The problem:
I would like to do that for arbitrary number of vectors. For example,
Reduce('+', tau1[[1]][[j]], tau1[[2]][[j]], tau1[[3]][[j]], tau1[[4]][[j]])
How can I do that automatically? Any help is appreciated.
If we use Reduce, we need to apply it to the whole list tau1 (dropping the [[i]] and [[j]] indexing); this sums the corresponding elements of all the vectors and returns a vector. Then subset that vector by the jth index to divide the jth element of tau1[[i]]:
Tau1 <- vector('list', length(tau1))
# element-wise sum of all vectors in the list, computed once
total <- Reduce(`+`, tau1)
for(i in seq_along(tau1)){
  for(j in seq_along(tau1[[i]])){
    Tau1[[i]][[j]] <- tau1[[i]][[j]] / total[j]
  }
}
The error mentioned in the comments can happen if there are non-numeric elements. The OP mentioned NULL elements, but NULL cannot be an element of an atomic vector (it can only appear as an element of a list), so the vectors most likely contain the character string "NULL", which turns the whole vector into a character vector. For example:
tau1 <- list(c(0.43742669 , 0.64024429, "NULL", 0.11849773),
c(0.5060767, 0.4857891, 0.4553237, 0.5045598))
Upon running the code above
Error in f(init, x[[i]]) : non-numeric argument to binary operator
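One way around this, assuming the "NULL" strings should simply be treated as missing values, is to coerce each vector back to numeric first. as.numeric turns "NULL" into NA (with a warning, silenced here), so the corresponding results become NA while the rest are computed as before:

```r
tau1 <- list(c(0.43742669 , 0.64024429, "NULL", 0.11849773),
             c(0.5060767, 0.4857891, 0.4553237, 0.5045598))
# coerce every vector to numeric; strings such as "NULL" become NA
tau1_num <- lapply(tau1, function(v) suppressWarnings(as.numeric(v)))
total <- Reduce(`+`, tau1_num)
res <- lapply(tau1_num, "/", total)
res
# position 3 is NA in both vectors; the other positions match the earlier output
```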
To work with any number of list elements, use the apply family of functions with [[.
To extract the jth element of each sub-list, use sapply(tau1, "[[", j). To sum those elements, use sum(sapply(tau1, "[[", j)).
P.S.: Instead of for(j in 1:length(tau1[[1]])) you should use for(j in 1:length(tau1[[i]])), just in case the vectors have different lengths.
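Putting those pieces together, the original double loop can be generalized to any number of vectors by replacing the two-term sum with sum(sapply(tau1, "[[", j)). A sketch, assuming all vectors in tau1 have the same length:

```r
tau1 <- list(c(0.43742669, 0.64024429, 0.39660069, 0.11849773),
             c(0.5060767, 0.4857891, 0.4553237, 0.5045598))
Tau1 <- vector("list", length(tau1))
for (i in seq_along(tau1)) {
  for (j in seq_along(tau1[[i]])) {
    # sum of the jth element across all sub-lists, however many there are
    Tau1[[i]][j] <- tau1[[i]][j] / sum(sapply(tau1, "[[", j))
  }
}
Tau1
```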
Here is a base R one-liner:
lapply(tau1, "/", do.call(mapply, c(FUN = sum, tau1)))
# [[1]]
# [1] 0.4636196 0.5685838 0.4655351 0.1901875
#
# [[2]]
# [1] 0.5363804 0.4314162 0.5344649 0.8098125
Or alternatively (from @lmo's comment):
lapply(tau1, "/", Reduce("+", tau1))
Here is a purrr equivalent:
library(purrr)
tau1 %>% map(`/`, pmap_dbl(., sum))
# [[1]]
# [1] 0.4636196 0.5685838 0.4655351 0.1901875
#
# [[2]]
# [1] 0.5363804 0.4314162 0.5344649 0.8098125

How to find out the best combination of a given vector whose sum is closest to a given number

My question is quite similar to this one: Find a subset from a set of integer whose sum is closest to a value
It discussed the algorithm only, but I want to solve it with R. I'm quite new to R and tried to work out a solution, but I wonder whether there is a more efficient way.
Here is my example:
# Define a vector, to find a subset whose sum is closest to the reference number 20.
A <- c(2,5,6,3,7)
# display all the possible combinations
y1 <- combn(A,1)
y2 <- combn(A,2)
y3 <- combn(A,3)
y4 <- combn(A,4)
y5 <- combn(A,5)
Y <- list(y1,y2,y3,y4,y5)
# calculate the distance to the reference number of each combination
s1 <- abs(apply(y1,2,sum)-20)
s2 <- abs(apply(y2,2,sum)-20)
s3 <- abs(apply(y3,2,sum)-20)
s4 <- abs(apply(y4,2,sum)-20)
s5 <- abs(apply(y5,2,sum)-20)
S <- list(s1,s2,s3,s4,s5)
# find the minimum difference
M <- sapply(S,FUN=function(x) list(which.min(x),min(x)))
Mm <- which.min(as.numeric(M[2,]))
# return the right combination: column M[[1, Mm]] of the Mm-th combination matrix
data.frame(Y[Mm])[M[[1, Mm]]]
So the answer is 2, 5, 6, 7.
How can I refine this program? In particular, is there a way to replace the five combn() calls and the five apply() calls with a single step? When A has more items in it, I would like to use length(A) to cover all subset sizes.
Here is another way to do it:
# lapply (rather than sapply) always returns a list, even if all results happen to have equal length
l1 <- lapply(seq_along(A), function(i) combn(A, i))
l2 <- lapply(l1, function(i) abs(colSums(i) - 20))
Filter(length, Map(function(x, y)x[,y], l1, lapply(l2, function(i) i == Reduce(min, l2))))
#[[1]]
#[1] 2 5 6 7
The last line uses Map to index l1 based on a logical list created by finding the minimum value from list l2.
The combiter package has an isubsetv iterator, which goes through all subsets of a vector. Combining it with foreach simplifies the code:
library(combiter)
library(foreach)
A <- c(2,5,6,3,7)
res <- foreach(x = isubsetv(A), .combine = c) %do% sum(x)
absdif <- abs(res-20)
ind <- which(absdif==min(absdif))
as.list(isubsetv(A))[ind]
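A base-R sketch of the same idea that lets length(A) drive the subset sizes, as the question asks. It uses combn(A, k, simplify = FALSE) to get each size-k subset as a list element; the helper name closest_subset is hypothetical, and unlike the answers above it returns only the first best subset rather than all ties. The number of subsets grows as 2^length(A), so this is only practical for short vectors.

```r
# Hypothetical helper: return one subset of A whose sum is closest to target.
closest_subset <- function(A, target) {
  # combn(n, k) treats a single number n as seq_len(n), so handle length 1 here
  if (length(A) == 1) return(A)
  best <- NULL
  best_diff <- Inf
  for (k in seq_along(A)) {
    # all subsets of size k, as a list of vectors
    subsets <- combn(A, k, simplify = FALSE)
    diffs <- abs(vapply(subsets, sum, numeric(1)) - target)
    i <- which.min(diffs)
    # keep the first subset found with the smallest distance so far
    if (diffs[i] < best_diff) {
      best_diff <- diffs[i]
      best <- subsets[[i]]
    }
  }
  best
}

closest_subset(c(2, 5, 6, 3, 7), 20)   # 2 5 6 7
```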

How to insert elements into multiple positions in a vector without iteration

Given a vector u of elements and a vector i of indices into vector x, how can we insert the elements of u into x after the elements corresponding to the indices in i, without iteration?
For example
x <- c('a','b','c','d','e')
u <- c('X','X')
i <- c(2,3)
# now we want to get c('a','b','X','c','X','d','e')
I want to do this in one step (i.e. avoid loops) because each step requires the creation of a new vector, and in practice these are long vectors.
I'm hoping for some index magic.
I think this should work as long as i does not contain duplicate indices.
idx <- sort(c(seq_along(x), i))
y <- x[idx]
y[duplicated(idx)] <- u
y
#[1] "a" "b" "X" "c" "X" "d" "e"
Edit
As @MartinMorgan suggested in the comments, a much better way of doing this is:
c(x, u)[order(c(seq_along(x), i))]
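The order() trick works because order() leaves ties in their original order, so an inserted element is placed right after the original element with the same index. As a side effect it also copes with repeated indices and with index 0 (insert at the front). A small sketch; the wrapper name insert_after is hypothetical:

```r
x <- c('a','b','c','d','e')
# Hypothetical wrapper: insert u[k] after x[i[k]] using the order() trick
insert_after <- function(x, u, i) c(x, u)[order(c(seq_along(x), i))]

insert_after(x, c('X','X'), c(2, 3))  # "a" "b" "X" "c" "X" "d" "e"
insert_after(x, c('X','Y'), c(2, 2))  # "a" "b" "X" "Y" "c" "d" "e" (repeated index)
insert_after(x, c('X','Y'), c(0, 2))  # "X" "a" "b" "Y" "c" "d" "e" (index 0 = front)
```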
In the meantime, I've come up with something that does the job (provided the indices in i are unique):
xn <- rep(NA,length(x))
xn[i] <- u
y <- c(rbind(x,xn))
y <- y[!is.na(y)]
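Here is my function:
# vec: source vector, val: values to insert,
# at: positions the inserted values should occupy in the OUTPUT vector
# (for the example above, at = c(3, 5) rather than i = c(2, 3))
func_insert_vector_at <- function(vec, val, at){
  n <- length(vec) + length(val)
  # every slot not listed in 'at' receives the original elements, in order
  keep <- !(seq_len(n) %in% at)
  out <- vector(typeof(vec), n)
  out[keep] <- vec
  out[at] <- val
  return(out)
}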

Is it possible to modify list elements?

I have a list of records:
z <- list(list(a=1),list(a=4),list(a=2))
and I try to add fields to each of them.
Alas, neither
lapply(z,function(l) l$b <- 1+l$a)
nor
for(l in z) l$b <- 1+l$a
modifies z.
In this simple case I can, of course, do
z <- lapply(z,function(l) c(list(b= 1+l$a),l))
but this quickly gets out of hand when the lists have more nesting:
z <- list(list(a=list(b=1)),list(a=list(b=4)),list(a=list(b=2)))
How do I turn it into
list(list(a=list(b=1,c=2)),list(a=list(b=4,c=5)),list(a=list(b=2,c=3)))
without repeating the definition of the whole structure?
Each element of z has many fields, not just a; and z[[10]]$a has many subfields, not just b.
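Your first code example doesn't modify the list for two reasons: the anonymous function returns the value of the assignment (1+l$a) rather than the modified l, and lapply returns a new list instead of changing z in place. Return the list explicitly at the end of the function and keep the result: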
z <- list(list(a=1),list(a=4),list(a=2))
expected <- list(list(a=1, b=2), list(a=4, b=5), list(a=2, b=3))
outcome <- lapply(z,function(l) {l$b <- 1+l$a ; l})
all.equal(expected, outcome)
# [1] TRUE
In the doubly nested example, you could use lapply within lapply, again making sure to return the list in the inner lapply:
z <- list(list(a=list(b=1)),list(a=list(b=4)),list(a=list(b=2)))
expected <- list(list(a=list(b=1, c=2)), list(a=list(b=4, c=5)), list(a=list(b=2, c=3)))
obtained <- lapply(z, function(l1) { lapply(l1, function(l2) {l2$c = l2$b+1 ; l2 } )})
all.equal(expected, obtained)
# [1] TRUE
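Since the real elements have many fields and subfields, it may be simpler not to rebuild anything at all: either assign directly into the nested path, or let utils::modifyList merge a small patch recursively. Both leave every other field and subfield untouched. A sketch on the doubly nested example:

```r
z <- list(list(a = list(b = 1)), list(a = list(b = 4)), list(a = list(b = 2)))
expected <- list(list(a = list(b = 1, c = 2)),
                 list(a = list(b = 4, c = 5)),
                 list(a = list(b = 2, c = 3)))
# Option 1: assign into the nested path, then return the modified element
out1 <- lapply(z, function(l) { l$a$c <- l$a$b + 1; l })
# Option 2: modifyList merges list(a = list(c = ...)) into l recursively
out2 <- lapply(z, function(l) modifyList(l, list(a = list(c = l$a$b + 1))))
all.equal(expected, out1)  # TRUE
all.equal(expected, out2)  # TRUE
```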
Another, somewhat convoluted, option:
z <- list(list(a=1),list(a=4),list(a=2))
res <- list(list(a=list(b=1,c=2)),list(a=list(b=4,c=5)),list(a=list(b=2,c=3)))
res1 <- rapply(z,function(x) list(b = x,c = x+1),how = "replace")
> all.equal(res,res1)
[1] TRUE
I only say convoluted because rapply can be tricky to use at times (for me at least).

compare one list item against the rest in R

If you have a list of files, and you want to compare 1 against a set of the others, how do you do it?
my.test <- list[1]
my.reference.set <- list[-1]
This works, of course, but I want to have this in a loop, with my.test varying each time (so that each file in the list is my.test for one iteration), i.e. I have a list of 250 files, and I want to do this for every subset of 12 files within it.
> num <- (1:2)
> sdasd<- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
> splitlist<- split(sdasd, num)
> splitlist
$`1`
[1] "asds" "nasd" "nadsd"
$`2`
[1] "ksad" "ksasd" "kasdih"
> for (i in splitlist) {my.test <- splitlist[i] # "asds"
+ my.reference.set <- splitlist[-i] # "nasd" and "nadsd"
+ combined <- data.frame (my.test, my.reference.set)
+ combined}
Error in -i : invalid argument to unary operator
>
Then I want the next iteration to be:
my.test <- splitlist[i] # my.test to be "nasd"
my.reference.set <- splitlist[-i] # "asds" and "nadsd"
and finally, for splitlist[1]:
my.test <- splitlist[i] # "nadsd"
my.reference.set <- splitlist[-i] # "asds" and "ksad"
Then the same for splitlist[2].
Does this do what you want? The key point here is to loop over the indices of the list rather than over its elements: for (i in splitlist) sets i to each character vector in turn, and x[-n] indexing only works when n is a positive whole number (with some obscure exceptions). Also, I wasn't sure if you wanted the results as a data frame or a list; the latter allows the components to be different lengths.
num <- 1:2
sdasd <- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
splitlist<- split(sdasd, num)
L <- vector("list",length(splitlist))
for (i in seq_along(splitlist)) {
my.test <- splitlist[[i]] # the i-th group, e.g. "asds" "nasd" "nadsd" for i = 1
my.reference.set <- splitlist[-i] # all the other groups
L[[i]] <- list(test=my.test, ref.set=my.reference.set)
}
Edit: I'm still a little confused by your example above, but I think this is what you want:
refs <- lapply(splitlist,
function(S) {
lapply(seq_along(S),
function(i) {
list(test=S[i], ref.set=S[-i])
})
})
refs is a nested list; the top level has length 2 (the length of splitlist), each of the next levels has length 3 (the lengths of the elements of splitlist), and each of the bottom levels has length 2 (test and reference set).
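The question also mentions doing this for every subset of 12 files within a list of 250. Assuming "subset" means consecutive, non-overlapping groups of 12, the same refs construction can be applied after chunking with split(). A sketch; the file names below are hypothetical placeholders:

```r
# hypothetical file names standing in for the 250 real files
files <- sprintf("file%03d.txt", 1:250)
# consecutive groups of (at most) 12 files: 21 groups, the last with 10 files
groups <- split(files, ceiling(seq_along(files) / 12))
# within each group, pair every file with the remaining files of that group
refs <- lapply(groups, function(S) {
  lapply(seq_along(S), function(i) list(test = S[i], ref.set = S[-i]))
})
length(groups)                    # 21
refs[[1]][[1]]$test               # "file001.txt"
length(refs[[1]][[1]]$ref.set)    # 11
```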
