seq and seq_along, best of both worlds? - r

If I want to number all elements in two vectors, so that vector 1 gets all the odd numbers and vector 2 gets all the even numbers, I can do this, assuming the vectors are of length 10:
seq(1, 10, by=2)
[1] 1 3 5 7 9
seq(2, 11, by=2)
[1] 2 4 6 8 10
but if my vector has only one element I will run into problems:
seq(2)
[1] 1 2
so I use:
seq_along(2)
[1] 1
BUT I can't use by= in seq_along(). How do I get the reliability of seq_along() with the functionality of seq()?
This example might clear things up.
Imagine I have two lists:
list1 <- list(4)
list2 <- list(4)
list1 must get even names along the element of the list.
list2 must get odd names along the element of the list.
I don't know how long the list elements will be.
seq_along(list1[[1]]) # this knows to give only one name, but I can't make it even
seq(list2[[1]]) # this knows to give 1 name
# and
seq(2, list1[[1]], by=2) # this gives me even numbers, but too many names

Here's a function that adds a 'by' argument to seq_along:
seq_along_by = function(x, by=1L, from = 1L) (seq_along(x) - 1L) * by + from
and some test cases
> seq_along_by(integer(), 2L)
integer(0)
> seq_along_by(1, 2L)
[1] 1
> seq_along_by(1:4, 2L)
[1] 1 3 5 7
> seq_along_by(1:4, 2.2)
[1] 1.0 3.2 5.4 7.6
> seq_along_by(1:4, -2.2)
[1] 1.0 -1.2 -3.4 -5.6
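Applied to the question's odd/even naming, the same helper could be used roughly as follows. This is a minimal sketch: the list contents and the names even_names and odd_names are made up for illustration.

```r
# Helper from the answer above: seq_along() with 'by' and 'from' arguments
seq_along_by <- function(x, by = 1L, from = 1L) (seq_along(x) - 1L) * by + from

# Hypothetical inputs: one element of length 3, one of length 1
list1 <- list(c("a", "b", "c"))
list2 <- list("z")

# Even numbers along list1[[1]], odd numbers along list2[[1]]
even_names <- seq_along_by(list1[[1]], by = 2L, from = 2L)
odd_names  <- seq_along_by(list2[[1]], by = 2L, from = 1L)

even_names
# [1] 2 4 6
odd_names
# [1] 1
```

Because the length comes from seq_along(), a one-element or empty input never produces too many names.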

One way I just found is:
y <- seq_along(1:20)
y[y %% 2 == 0 ]
[1] 2 4 6 8 10 12 14 16 18 20
y[ !y %% 2 == 0 ]
[1] 1 3 5 7 9 11 13 15 17 19
But this will only work when my vectors are even. Must be able to do better.

I'm not sure what you are trying to do, but if you want to split odd and even elements in a vector, you can do just that:
x <- 1:19
split(x,x%%2)
$`0`
[1] 2 4 6 8 10 12 14 16 18
$`1`
[1] 1 3 5 7 9 11 13 15 17 19
To extract the odd and even numbered elements, use lapply on this list using seq_along to enumerate the element numbers:
x <- rep(c("odd","even"),times=4)
lapply(split(seq_along(x),seq_along(x)%%2),function(y) "["(x,y))
$`0`
[1] "even" "even" "even" "even"
$`1`
[1] "odd" "odd" "odd" "odd"
This can of course be made into a function:
split_oe <- function(x) lapply(split(seq_along(x),seq_along(x)%%2),function(y) "["(x,y))
split_oe(1:10)
$`0`
[1] 2 4 6 8 10
$`1`
[1] 1 3 5 7 9
> split_oe(2)
$`1`
[1] 2

I'm adding another answer to address what may be your intent of the question rather than the question as you've stated it.
Let's assume you have a couple of arrays, A1 and A2, with values, and you want to link an index to those values, so you can say index[n] and get the corresponding value from A1[(n + 1)/2] if n is odd and from A2[n/2] if n is even.
We would build a new vector, index, like so:
# Sample arrays
A1 <- sample(LETTERS, 5, rep=TRUE)
A2 <- sample(LETTERS, 5, rep=TRUE)
n_Max <- length(c(A1,A2))
index <- integer(n_Max)
index[seq(1,n_Max,by=2)] <- A1
index[seq(2,n_Max,by=2)] <- A2
Now, index[n] returns A1 values when n is odd, and returns A2 values when n is even. This breaks if length(A2) is not equal to or one less than length(A1).
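With fixed values instead of sample(), the interleaving is easier to follow. A small sketch, with A1 and A2 chosen arbitrarily so that A2 is one element shorter than A1:

```r
# Fixed example arrays (arbitrary values); A2 is one shorter than A1
A1 <- c("a", "c", "e")
A2 <- c("b", "d")

n_Max <- length(c(A1, A2))
index <- character(n_Max)

# Odd positions take A1, even positions take A2
index[seq(1, n_Max, by = 2)] <- A1
index[seq(2, n_Max, by = 2)] <- A2

index
# [1] "a" "b" "c" "d" "e"
```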

If I understand correctly, what you really want is to get the seq function to return only the odd or even numbers in 1..max or 2..max, respectively. You would write that like so:
seq(1, max, by=2) # Odd numbers
seq(2, max, by=2) # Even numbers
Where max is the top number in your series. The only time this will break is if max is less than 2.
Update 1: There seems to be a bit of discussion about what the OP is requesting. If we assume there are two existing vectors to be numbered, we can obtain the total number of vector items using max <- length(c(vector1, vector2)) to obtain the maximum number being used. Then, the indices would be assigned like so:
vector1 <- seq(1, max, by=2)
vector2 <- seq(2, max, by=2)
And this will work for any set EXCEPT when one vector does not have any elements at all.
Update 2: There is one final approach, which you can take if your vectors do not represent all values between 1 and max. This is how it would work:
vector1 <- seq(1, length(vector1) * 2, by=2)
vector2 <- seq(2, length(vector2) * 2, by=2)
This independently assigns the values of vector1 and vector2 according to their own lengths.
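If zero-length vectors are possible, one option (a sketch, not part of the answer above) is to build the numbers from seq_along() with plain arithmetic, which returns an empty result instead of failing. The names v1, v2, odd_idx and even_idx are hypothetical.

```r
# Hypothetical inputs: v1 is empty, v2 has three elements
v1 <- character(0)
v2 <- c("x", "y", "z")

# Odd numbers for v1, even numbers for v2, each based on its own length
odd_idx  <- 2L * seq_along(v1) - 1L
even_idx <- 2L * seq_along(v2)

odd_idx
# integer(0)
even_idx
# [1] 2 4 6
```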

Related

Subtract a single vector element from List of vectors

I have a list r containing n vectors of different length.
And a separate vector a also of length n
x <- 1:100
r <- slider(x,.size=5)
a <- 1:length(r)
From every element in each vector of the list r I want to subtract an element of a.
So the first element of a shall be subtracted from every element of the first vector of r.
Something like this, but on a larger scale and keeping the vectors in the list r
r[1]-a[1]
r[2]-a[2]
r[3]-a[3]
This gives me Error in r[1] - n[1] : non-numeric argument to binary operator
Disclaimer: The vectors of r in the example do NOT have different lengths. I do not know how to do this when generating the example.
You can use Map:
Map(`-`, r, a)
The same result as @RonakShah's can be obtained with:
mapply(`-`,r,a)
Output:
[[1]]
[1] 0 1 2 3 4 5 6 7
[[2]]
[1] -1 0 1 2 3 4 5 6 7
[[3]]
[1] -2 -1 0 1 2 3 4 5 6 7
We could use a for loop
out <- vector('list', length(r))
for(i in seq_along(r)) {
out[[i]] <- r[[i]] - a[i]
}
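Since the question could not provide an example with vectors of different lengths, here is a small self-contained one. The values of r and a are made up for illustration.

```r
# A list of vectors with different lengths, and one value per list element
r <- list(1:3, 4:5, 6:9)
a <- c(1, 10, 100)

# Subtract a[i] from every element of r[[i]], keeping the list structure
out <- Map(`-`, r, a)

out[[1]]
# [1] 0 1 2
out[[2]]
# [1] -6 -5
out[[3]]
# [1] -94 -93 -92 -91
```

Note that r[[i]] (double brackets) extracts the vector itself, whereas r[i] returns a one-element list, which is why r[1] - a[1] fails in the question.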

Matching all elements in nested lists (irrespective of position) and returning matches with indexes

I have two lists x and y created from
x1 = list(c(1,2,3,4))
x2 = list(c(seq(1, 10, by = 2)))
x<- list(x1,x2)
x
[[1]]
[[1]][[1]]
[1] 1 2 3 4
[[2]]
[[2]][[1]]
[1] 1 3 5 7 9
and y,
y1 = list(c(5, 6, 7, 8))
y2 = list(c(9, 7, 5, 3, 1))
y <- list(y1, y2)
y
[[1]]
[[1]][[1]]
[1] 5 6 7 8
[[2]]
[[2]][[1]]
[1] 9 7 5 3 1
So basically, I want to get the matches of x in y, so only '1 3 5 7 9' should come back as a match. I also need the indexes.
I want to match the values of each x[[ ]] against each y[[ ]], irrespective of position. I have tried:
Matches <- x[x %in% y]
IDX <- which(x %in% y)
This does not work....
I would like something that can return matches of the same elements irrespective of positions per each list. This would be a rough idea of what I need...
matches
[1] False
[1] 1 3 5 7 9
Thanks in advance, appreciate all the help.
Here is what you can do:
You have made a list of lists, which is quite confusing to work with. You could have avoided that by using c, i.e. x <- c(x1, x2), to get a list of vectors, which is much easier to work with.
But since you provided a list of lists, I will work with that.
Now, back to solving your question:
flags <- lapply(Map(`%in%`, unlist(x, recursive = F), unlist(y, recursive=F)),all)
k <- lapply(1:length(x), function(i)ifelse(unlist(flags)[i] == TRUE,
list(unlist(x, recursive=F)[[i]]),
unlist(flags[i])))
unlist(k, recursive = F) #Final Output
Logic:
Map applies %in% to each pair of elements of the two lists, to test whether every item of an x element is present in the corresponding y element; all then reduces each result to a single TRUE or FALSE. In your case this gives FALSE and TRUE respectively.
Next we iterate over the elements of x, using flags as the filter criterion, to build another list k: where the flag from the previous step is TRUE, the contents of x are copied over; where it is FALSE, the entry stays FALSE.
As the final step, unlist k with recursive = F to convert it into a list of vectors.
Output:
# [[1]]
# [1] FALSE
# [[2]]
# [1] 1 3 5 7 9
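If the indexes are needed as well, the same idea can be written a little more directly. This is a sketch under the same assumption as the answer above (each x element is compared with the y element at the same position); the names xv, yv, is_match, idx and matches are hypothetical.

```r
x <- list(list(c(1, 2, 3, 4)), list(seq(1, 10, by = 2)))
y <- list(list(c(5, 6, 7, 8)), list(c(9, 7, 5, 3, 1)))

# Flatten one level so we have plain lists of vectors
xv <- unlist(x, recursive = FALSE)
yv <- unlist(y, recursive = FALSE)

# TRUE where every value of an x element occurs in the paired y element
is_match <- mapply(function(a, b) all(a %in% b), xv, yv)

idx     <- which(is_match)  # positions of the matching elements
matches <- xv[is_match]     # the matching elements themselves

idx
# [1] 2
matches[[1]]
# [1] 1 3 5 7 9
```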

find indices of values within tolerance range in R

say I have vector x
x <- c(1, 1, 1.1, 2, 1, 2.1, 2.6)
tol <- 0.4
how do I get the indices of the groups of elements that are 'unique' within the tolerance range (tol) as in the list below. I don't know how many of these groups there are beforehand.
[[1]]
[1] 1 2 3 5
[[2]]
[1] 4 6
[[3]]
[1] 7
thanks
Not 100% reliable, since it uses unique on lists, but you can try:
unique(apply(outer(x,x,function(a,b) abs(a-b)<tol),1,which))
#[[1]]
#[1] 1 2 3 5
#
#[[2]]
#[1] 4 6
#
#[[3]]
#[1] 7
The point @Roland raised in the comments showed that there is some ambiguity in your requirements. For instance, if x <- c(1, 1.3, 1.6), my line gives three groups: 1-2, 2-3 and 1-2-3. This is because, from the point of view of 1, it is similar only to 1.3, but from the point of view of 1.3, it is similar to both 1 and 1.6.
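The ambiguous case can be reproduced directly. A short sketch; the names near and groups are chosen here for illustration.

```r
x <- c(1, 1.3, 1.6)
tol <- 0.4

# Pairwise "within tolerance" matrix: near[i, j] is TRUE when |x[i] - x[j]| < tol
near <- outer(x, x, function(a, b) abs(a - b) < tol)

# One index set per row; the rows differ in length, so apply() returns a list
groups <- unique(apply(near, 1, which))

length(groups)
# [1] 3
groups[[2]]
# [1] 1 2 3
```

Element 2 is within tolerance of both neighbours, while elements 1 and 3 are not within tolerance of each other, so no single partition satisfies every element.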
An alternative using nn2 from RANN to find nearest neighbors within radius for clustering:
library(RANN)
x <- c(1, 1, 1.1, 2, 1, 2.1, 2.6)
tol=0.4
nn <- nn2(x,x,k=length(x),searchtype="radius",radius=tol)
m <- unique(apply(nn$nn.idx,1,sort), MARGIN=2)
sapply(seq_len(ncol(m)), function(i) m[which(m[,i] > 0),i])
##[[1]]
##[1] 1 2 3 5
##
##[[2]]
##[1] 4 6
##
##[[3]]
##[1] 7
x <- c(1, 1.3, 1.6)
nn <- nn2(x,x,k=length(x),searchtype="radius",radius=tol)
m <- unique(apply(nn$nn.idx,1,sort), MARGIN=2)
sapply(seq_len(ncol(m)), function(i) m[which(m[,i] > 0),i])
##[[1]]
##[1] 1 2
##
##[[2]]
##[1] 1 2 3
##
##[[3]]
##[1] 2 3
Notes:
The call to nn2 finds all nearest neighbors for each element of x with respect to all elements of x within a radius equalling the tol. The result nn$nn.idx is a matrix whose rows contain the indices that are nearest neighbors for each element in x. The matrix is dense and filled with zeroes as needed.
Clustering is performed by sorting each row so that unique rows can be extracted. The output m is a matrix where each column contains the indices in a cluster. Again, this matrix is dense and filled with zeroes as needed.
The resulting list is extracted by subsetting each column to remove the zero entries.
This is likely more efficient for large x because nn2 uses a KD-Tree, but it suffers from the same issue for elements that overlap (with respect to the tolerance) as pointed out by nicola.
Maybe it's a hammer to kill a mosquito, but I thought of univariate density clustering. The dbscan library enables you to do exactly that:
library(dbscan)
groups <- dbscan(as.matrix(x), eps=tol, minPts=1)$cluster
#### [1] 1 1 1 2 1 2 3
You don't need to know the number of groups in advance.
The output is the cluster number of each element; if you prefer, you can instead take the group means and round them to the closest integer. Once you have the groups, you can generate the list, for instance like this:
split(seq_along(x), groups)
#### $`1`
#### [1] 1 2 3 5
#### ...
Edit: Behaviour with overlapping:
This algo attributes the same group to all elements that are within the range of tolerance of one other (works by proximity). So you might end up with fewer groups than expected if there is overlapping.
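The "grouping by proximity" behaviour can be sketched in base R by sorting the values and starting a new group wherever the gap to the previous value exceeds the tolerance. This is an illustration of the chaining idea only, not dbscan's implementation, and its handling of gaps exactly equal to tol may differ; the names ord, grp_sorted, groups and res are hypothetical.

```r
x <- c(1, 1, 1.1, 2, 1, 2.1, 2.6)
tol <- 0.4

# Sort, then open a new group whenever the gap to the previous value exceeds tol
ord <- order(x)
grp_sorted <- cumsum(c(TRUE, diff(x[ord]) > tol))

# Map the group numbers back to the original positions
groups <- integer(length(x))
groups[ord] <- grp_sorted

res <- split(seq_along(x), groups)
res
# $`1`
# [1] 1 2 3 5
#
# $`2`
# [1] 4 6
#
# $`3`
# [1] 7
```

With x <- c(1, 1.3, 1.6) this puts all three elements in one group, which is the "fewer groups than expected" effect described above.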
Here is another attempt, using the cut function from base R. We first create the vector of range boundaries, named sq, and then find the elements of x that fall within each range.
sq <- seq(min(x)-tol,max(x)+tol*2,tol*2)
# [1] 0.6 1.4 2.2 3.0
sapply(1:(length(sq)-1), function(i) which(!is.na(cut(x, breaks =c(sq[i], sq[i+1])))))
# [[1]]
# [1] 1 2 3 5
# [[2]]
# [1] 4 6
# [[3]]
# [1] 7
It does not produce any duplicates (so there is no need to use unique, as is the case in @nicola's answer).
It works as follows: in sapply, we first search for elements within the range (0.6, 1.4], then (1.4, 2.2] and finally (2.2, 3.0]. cut uses intervals that are open on the left and closed on the right by default.
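The same binning can also be written with a single call to cut over all the breaks, followed by split. A sketch equivalent to the sapply loop for this data; the name res is hypothetical.

```r
x <- c(1, 1, 1.1, 2, 1, 2.1, 2.6)
tol <- 0.4

# Bin boundaries of width 2 * tol, starting tol below the minimum
sq <- seq(min(x) - tol, max(x) + tol * 2, tol * 2)

# cut() assigns each value to its bin; split() groups the indices by bin
res <- split(seq_along(x), cut(x, breaks = sq))

unname(res)
# [[1]]
# [1] 1 2 3 5
#
# [[2]]
# [1] 4 6
#
# [[3]]
# [1] 7
```

Note that the bins are fixed by min(x) and tol, so two values closer than tol can still land in different bins if they sit on either side of a boundary.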

How do you determine which element in a list contains a value matching some other value?

If I have the following list:
a <- list(1:3, 4:5, 6:9)
a
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5
[[3]]
[1] 6 7 8 9
I want to determine which element of the list a specific value is in. For example, I might want to find which element the number 5 falls under. In this case it would be [[2]].
My goal is to have something like
match(5,a)
return the value 2.
However, this code only checks whether the selected number exists as a complete element of a given element
match(5,a)
[1] NA
Further, unlist only tells me where in the entire length of all values my number of interest falls:
match(5,unlist(a))
[1] 5
Thoughts?
You can use grep function
grep(5, a)
# [1] 2
grep(9, a)
# [1] 3
Updated Answer
After reading @nicola's comment, I realised that the grep command works only for the numbers at the start and end of each list element, and not for the numbers in between.
You can try the code below for a complete solution:
a <- list(1:3, 4:5, 6:9)
# one row per value, plus a column for the list element it came from
df <- data.frame(num = unlist(a))
df$group <- 0
k <- 1
for (i in seq_along(a)) {
  # number of values in the i-th list element
  n_i <- length(a[[i]])
  for (j in seq_len(n_i)) {
    df$group[k] <- i
    k <- k + 1
  }
}
df[df$num == 5, ]$group
# [1] 2
df[df$num == 9, ]$group
# [1] 3
df[df$num == 8, ]$group
# [1] 3
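A shorter alternative is to test each list element with %in% and ask which ones contain the value. A minimal sketch; which_element is a hypothetical helper name, not a base R function.

```r
a <- list(1:3, 4:5, 6:9)

# Hypothetical helper: positions of the list elements that contain 'value'
which_element <- function(value, lst) {
  which(vapply(lst, function(v) value %in% v, logical(1)))
}

which_element(5, a)
# [1] 2
which_element(8, a)
# [1] 3
```

If the value occurs in several elements, all of their positions are returned; if it occurs in none, the result is integer(0).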

"replace" function examples

I don't find the help page for the replace function from the base package to be very helpful. Worst part, it has no examples which could help understand how it works.
Could you please explain how to use it? An example or two would be great.
If you look at the function (by typing its name at the console) you will see that it is just a simple functionalized version of the [<- function, which is described at ?"[". [ is a rather basic function in R, so you would be well advised to look at that page for further details. Especially important is learning that the index argument (the second argument of replace) can be logical, numeric or character valued. Recycling will occur when the second and third arguments have differing lengths.
You should "read" the function call as: "within the first argument, use the second argument as an index for placing the values of the third argument into the first":
> replace( 1:20, 10:15, 1:2)
[1] 1 2 3 4 5 6 7 8 9 1 2 1 2 1 2 16 17 18 19 20
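To make the connection with [<- concrete, the call above can be compared with the equivalent indexed assignment on a copy. A small sketch; x and y are names chosen here.

```r
x <- 1:20

# The same replacement done with indexed assignment on a copy of x
y <- x
y[10:15] <- 1:2

# replace() returns the modified copy and leaves x itself untouched
identical(replace(x, 10:15, 1:2), y)
# [1] TRUE
identical(x, 1:20)
# [1] TRUE
```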
Character indexing for a named vector:
> replace(c(a=1, b=2, c=3, d=4), "b", 10)
a b c d
1 10 3 4
Logical indexing:
> replace(x <- c(a=1, b=2, c=3, d=4), x>2, 10)
a b c d
1 2 10 10
You can also use logical tests
x <- data.frame(a = c(0,1,2,NA), b = c(0,NA,1,2), c = c(NA, 0, 1, 2))
x
x$a <- replace(x$a, is.na(x$a), 0)
x
x$b <- replace(x$b, x$b==2, 333)
Here are two simple examples:
> x <- letters[1:4]
> replace(x, 3, 'Z') #replacing 'c' by 'Z'
[1] "a" "b" "Z" "d"
>
> y <- 1:10
> replace(y, c(4,5), c(20,30)) # replacing 4th and 5th elements by 20 and 30
[1] 1 2 3 20 30 6 7 8 9 10
Be aware that in the examples given above, the third parameter (values) is a constant (e.g. 'Z' or c(20,30)).
Defining the third parameter using values from the data frame itself can lead to confusion.
E.g. with a simple data frame such as this (using dplyr::data_frame):
tmp <- data_frame(a=1:10, b=sample(LETTERS[24:26], 10, replace=T))
This will create something like this:
a b
(int) (chr)
1 1 X
2 2 Y
3 3 Y
4 4 X
5 5 Z
..etc
Now suppose you wanted to multiply the values in column 'a' by 2, but only where column 'b' is "X". My immediate thought would be something like this:
with(tmp, replace(a, b=="X", a*2))
That will not provide the desired outcome, however. The a*2 is evaluated once, as a fixed vector, rather than row by row as a reference to the 'a' column. The vector 'a*2' will thus be
[1] 2 4 6 8 10 12 14 16 18 20
at the start of the 'replace' operation. Thus, in the first row where 'b' equals "X", the value in 'a' will be replaced by 2. In the second such row, it will be replaced by 4, etc. It will not be replaced by two times the value of 'a' in that particular row.
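One way around this pitfall is to subset the replacement values with the same condition, so that values and positions line up. A sketch using a small base data frame with fixed values instead of the random one above; the names wrong and right are chosen here.

```r
# Small fixed example (base data.frame, no dplyr needed)
tmp <- data.frame(a = 1:5, b = c("X", "Y", "X", "Y", "Z"))

# Pitfall: a * 2 is recycled from its start, so rows 1 and 3 get 2 and 4
# (R also warns that the lengths do not match; suppressed here)
wrong <- suppressWarnings(with(tmp, replace(a, b == "X", a * 2)))

# Fix: take the replacement values from the same rows that are replaced
right <- with(tmp, replace(a, b == "X", a[b == "X"] * 2))

wrong
# [1] 2 2 4 4 5
right
# [1] 2 2 6 4 5
```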
Here's an example where I found the replace( ) function helpful for giving me insight. The problem required a long integer vector be changed into a character vector and with its integers replaced by given character values.
## figuring out replace( )
(test <- c(rep(1,3),rep(2,2),rep(3,1)))
which looks like
[1] 1 1 1 2 2 3
and I want to replace every 1 with an A and 2 with a B and 3 with a C
letts <- c("A","B","C")
so in my own secret little "dirty-verse" I used a loop
for(i in 1:3)
{test <- replace(test,test==i,letts[i])}
which did what I wanted
test
[1] "A" "A" "A" "B" "B" "C"
In the first sentence I purposefully left out that the real objective was to make the big vector of integers a factor vector and assign the integer values (levels) some names (labels).
So another way of doing the replace( ) application here would be
(test <- factor(test,labels=letts))
[1] A A A B B C
Levels: A B C
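When the integers are exactly 1, 2, 3, ..., a third option is to use them directly as an index into the vector of labels. A sketch assuming that condition holds; the name out is chosen here.

```r
test  <- c(rep(1, 3), rep(2, 2), rep(3, 1))
letts <- c("A", "B", "C")

# Each integer picks the label at that position
out <- letts[test]

out
# [1] "A" "A" "A" "B" "B" "C"
```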
