How to condense non-sequential integers? - r

I'm trying to condense non-sequential numbers so I can subset haplotype data. I could do it manually, but given that I've got hundreds to do, I'd rather not if there's an alternative.
class(haplotype1[[1]])
#[1] "integer"
haplotype1[[1]]
#[1] 1 2 3 4 5 7 8 9 10 11
I want to get [1:5, 7:11], which seems simple, but I haven't found a solution that exactly matches my problem.
Thanks!

Using cumsum to create the sequential groups,
tapply(x, cumsum(c(TRUE, diff(x) != 1)), FUN = function(i)paste(i[1], i[length(i)], sep = ':'))
# 1 2
#"1:5" "7:11"
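To see why this works, it may help to look at the intermediate vectors. A minimal sketch, assuming x holds the values from the question (the names new_run and group are only for illustration):

```r
x <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)

# TRUE wherever a value does not directly follow its predecessor;
# the first element always starts a run.
new_run <- c(TRUE, diff(x) != 1)

# Each TRUE bumps the counter, so every run gets its own group id.
group <- cumsum(new_run)
group
# [1] 1 1 1 1 1 2 2 2 2 2
```

tapply then applies the function to each group separately, which is where the first and last value of every run come from.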

It's unclear what type of object you want to create. I would just store the start and end values.
x <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
starts <- x[!c(FALSE, diff(x) == 1L)]
#[1] 1 7
ends <- x[!c(diff(x) == 1L, FALSE)]
#[1] 5 11
paste(starts, ends, sep = ":")
#[1] "1:5" "7:11"
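If the goal is to use these ranges for subsetting, the start/end pairs can be expanded back into an index vector. This is only a sketch: dat is a made-up stand-in, since the structure of the real haplotype data isn't shown in the question.

```r
x <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
starts <- x[!c(FALSE, diff(x) == 1)]
ends   <- x[!c(diff(x) == 1, FALSE)]

# Expand each start/end pair back into its full sequence.
ranges <- Map(seq, starts, ends)
idx <- unlist(ranges)

# Hypothetical data to subset (not the asker's real object).
dat <- letters[1:12]
dat[idx]
# [1] "a" "b" "c" "d" "e" "g" "h" "i" "j" "k"
```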

Maybe you want something like this?
vec <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
split(vec, cumsum(c(1,diff(vec)>1)))
# $`1`
# [1] 1 2 3 4 5
#
# $`2`
# [1] 7 8 9 10 11

Related

For each value in a vector get the corresponding next smallest value

For each element in a vector, I want the corresponding next smaller value in the vector, without changing the original order of the elements.
For example, suppose the given vector is:
c(4, 5, 5, 10, 3, 7)
Then the result would be:
c(3, 4, 4, 7, 0, 5)
Note that since 3 does not have any smaller value, I want it to be replaced with 0.
Any help will be much appreciated. Thank you.
We may use
sapply(v1, function(x) sort(v1)[match(x, sort(v1))-1][1])
[1] 3 4 4 7 NA 5
Or use a vectorized option
v2 <- unique(v1)
v3 <- sort(v2)
v4 <- v3[-length(v3)]
i1 <- match(v1, v3) - 1
i1[i1 == 0] <- NA
v4[i1]
[1] 3 4 4 7 NA 5
data
v1 <- c(4, 5, 5, 10, 3, 7)
We can try the code below using outer + max.col (v1 is the input vector from the data block above)
u <- sort(unique(v1))
m <- outer(v1, u, `>`)
replace(u[max.col(m, ties.method = "last")], rowSums(m) == 0, NA)
[1] 3 4 4 7 NA 5
Using findInterval (here x is the input vector, i.e. v1 above):
sx = sort(x)
i = findInterval(x, sx, left.open = TRUE)
sx[replace(i, i == 0, NA)]
# [1] 3 4 4 7 NA 5
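All of the answers above return NA for the smallest element, while the question asked for 0. One way to get that, sketched here with the same v1 (the names u and res are just for illustration):

```r
v1 <- c(4, 5, 5, 10, 3, 7)

# Sorted distinct values; the predecessor of u[k] is u[k - 1].
u <- sort(unique(v1))

# Shift by one position so each value looks up its predecessor;
# the smallest value has none and picks up the leading NA.
res <- c(NA, u)[match(v1, u)]

# Replace the "no smaller value" marker with 0, as the question asks.
res[is.na(res)] <- 0
res
# [1] 3 4 4 7 0 5
```

Keeping NA until the last step avoids confusing a real 0 in the data with "nothing smaller exists".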

Is there a simple way to separate a vector of repeating values in R by the chunks of values and get indices?

I have a dataframe of transect data, such that for each transect, there are species codes and for each species, there is a count associated. I'm trying to calculate a proportion of transects that ID a particular species by separating the data frame into each transect. How can I take a vector of repeating chunks of numbers, separate it by chunks of same values, and get the indices?
Example:
x <- c(1, 2, 1, 2, 3, 1)
y <- c(3, 2, 3, 3, 2, 3)
Transects <- rep(x, y)
I want it to output chunks like these
c(1, 1, 1)
c(2, 2)
c(1, 1, 1)
c(2, 2, 2)
c(3, 3)
c(1, 1, 1)
or more importantly, the associated indices, which would give me
c(1, 2, 3)
c(4, 5)
c(6, 7, 8)
c(9, 10, 11)
c(12, 13)
c(14, 15, 16)
I don't even know what functions to try, because I don't know what indices to separate the vector at, nor can I separate by simple value because there are chunks of the same values and I don't want those mixed together since they're different transects. Any help is appreciated, I wouldn't even know how to go about building a function that could do this.
You can do:
split(Transects, with(rle(Transects), rep(seq_along(values), lengths)))
$`1`
[1] 1 1 1
$`2`
[1] 2 2
$`3`
[1] 1 1 1
$`4`
[1] 2 2 2
$`5`
[1] 3 3
$`6`
[1] 1 1 1
Or if interested in indices:
split(seq_along(Transects), with(rle(Transects), rep(seq_along(values), lengths)))
$`1`
[1] 1 2 3
$`2`
[1] 4 5
$`3`
[1] 6 7 8
$`4`
[1] 9 10 11
$`5`
[1] 12 13
$`6`
[1] 14 15 16
Alternatively, you can do:
split(Transects, cumsum(c(0, diff(Transects)) != 0))
If you still have x and y, you can rebuild the chunks directly with the map2 function from the purrr package:
purrr::map2(x, y, rep)
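For the original goal (the proportion of transects that record a given species), the run ids from rle can be used as a grouping variable directly, without splitting first. A rough sketch; the species vector below is invented for illustration and stands in for the species-code column of the real data frame:

```r
Transects <- rep(c(1, 2, 1, 2, 3, 1), c(3, 2, 3, 3, 2, 3))

# Hypothetical species codes, one per row of the transect data.
species <- c("A", "B", "C",  "A", "D",  "B", "C", "D",
             "A", "B", "C",  "D", "E",  "A", "C", "E")

# One id per run of identical transect numbers, so repeated
# transect numbers are kept apart.
r <- rle(Transects)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Does each transect contain species "A" at least once?
has_A <- tapply(species == "A", run_id, any)

# Proportion of transects that recorded it.
prop_A <- mean(has_A)
prop_A
# [1] 0.6666667
```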

Consecutive Sum of a Vector

This is a question following a previous one. In that question, it is suggested to use rollapply to calculate sum of the 1st, 2nd, 3rd entry of a vector; then 2nd, 3rd, 4th; and so on.
My question is how to calculate the sum of the 1st, 2nd and 3rd; then the 4th, 5th and 6th. That is, rolling without overlapping. Can this be easily done, please?
Same idea. You just need to specify the by argument. Default is 1.
x <-c(1, 5, 4, 5, 7, 8, 9, 2, 1)
zoo::rollapply(x, 3, by = 3, sum)
#[1] 10 20 12
#or another Base R option
sapply(split(x, ceiling(seq_along(x)/3)), sum)
# 1 2 3
#10 20 12
Using tapply in base R:
set.seed(1)
vec <- sample(10, 20, replace = TRUE)
#[1] 3 4 6 10 3 9 10 7 7 1 3 2 7 4 8 5 8 10 4 8
# (drawn with R < 3.6.0; newer versions use a different default sampler and give other values)
unname(tapply(vec, (seq_along(vec)-1) %/% 3, sum))
# [1] 13 22 24 6 19 23 12
Alternatively,
colSums(matrix(vec[1:(ceiling(length(vec)/3)*3)], nrow = 3), na.rm = TRUE)
#[1] 13 22 24 6 19 23 12
vec[1:(ceiling(length(vec)/3)*3)] pads the vector with NA up to the next multiple of 3 when its length is not divisible by 3, because indexing past the end of a vector returns NA. colSums then skips those NAs via na.rm = TRUE.
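The padding is easier to see on a short vector. A small sketch (the vector v is made up for illustration):

```r
v <- c(1, 2, 3, 4)

# Indexing past the end returns NA, which pads v to length 6.
padded <- v[1:6]
padded
# [1]  1  2  3  4 NA NA

# Each column is one block of 3; the NAs are dropped when summing.
m <- matrix(padded, nrow = 3)
sums <- colSums(m, na.rm = TRUE)
sums
# [1] 6 4
```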
Yet another one using cut and aggregate:
x <- ceiling(length(vec)/3)*3
df <- data.frame(vec=vec[1:x], col=cut(1:x, breaks = seq(0,x,3)))
aggregate(vec~col, df, sum, na.rm = TRUE)[[2]]
#[1] 13 22 24 6 19 23 12
We can use roll_sum from RcppRoll, which computes every overlapping window in compiled code; keeping every third result then gives the non-overlapping sums
library(RcppRoll)
roll_sum(x, n=3)[c(TRUE, FALSE, FALSE)]
#[1] 10 20 12
data
x <-c(1, 5, 4, 5, 7, 8, 9, 2, 1)
You can define the window size n and do:
x <-c(1, 5, 4, 5, 7, 8, 9, 2, 1)
n <- 3
diff(c(0, cumsum(x)[slice.index(x, 1)%%n == 0]))
P.S. This uses the input from the answer by @Sotos.
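The base R approaches above can be wrapped in a small helper if this comes up often. A minimal sketch; block_sum is a hypothetical name, not a function from any package:

```r
# Sum x in consecutive, non-overlapping blocks of n elements.
# A final shorter block is summed as-is.
block_sum <- function(x, n) {
  # 0-based position divided by n gives the block each element falls in.
  grp <- (seq_along(x) - 1) %/% n
  # as.vector drops the names and dim that tapply attaches.
  as.vector(tapply(x, grp, sum))
}

block_sum(c(1, 5, 4, 5, 7, 8, 9, 2, 1), 3)
# [1] 10 20 12

block_sum(c(1, 2, 3, 4, 5, 6, 7), 3)
# [1]  6 15  7
```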

Exchange two elements of a vector in one call

I have a vector c(9,6,3,4,2,1,5,7,8), and I want to switch the elements at index 2 and at index 5 in the vector. However, I don't want to have to create a temporary variable and would like to make the switch in one call. How would I do that?
How about just x[c(i,j)] <- x[c(j,i)]? Similar to replace(...), but perhaps a bit simpler.
swtch <- function(x,i,j) {x[c(i,j)] <- x[c(j,i)]; x}
swtch(c(9,6,3,4,2,1,5,7,8) , 2,5)
# [1] 9 2 3 4 6 1 5 7 8
You could use replace().
x <- c(9, 6, 3, 4, 2, 1, 5, 7, 8)
replace(x, c(2, 5), x[c(5, 2)])
# [1] 9 2 3 4 6 1 5 7 8
And if you don't even want to assign x, you can use
replace(
  c(9, 6, 3, 4, 2, 1, 5, 7, 8),
  c(2, 5),
  c(9, 6, 3, 4, 2, 1, 5, 7, 8)[c(5, 2)]
)
# [1] 9 2 3 4 6 1 5 7 8
but that's a bit silly. You will probably want x assigned to begin with.
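One thing worth keeping in mind: replace() returns a modified copy and leaves x itself untouched, whereas the indexed assignment changes x. A quick sketch of the difference (y is just an illustrative name):

```r
x <- c(9, 6, 3, 4, 2, 1, 5, 7, 8)

# replace() returns a new vector; x keeps its original order.
y <- replace(x, c(2, 5), x[c(5, 2)])
x[2]
# [1] 6
y[2]
# [1] 2

# Indexed assignment swaps the elements in x itself.
x[c(2, 5)] <- x[c(5, 2)]
identical(x, y)
# [1] TRUE
```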
If you actually want to do it without creating a temporary copy of the vector, you would need to write a short C function.
library(inline)
swap <- cfunction(c(i = "integer", j = "integer", vec="integer"),"
int *v = INTEGER(vec);
int ii = INTEGER(i)[0]-1, jj = INTEGER(j)[0]-1;
int tmp = v[ii];
v[ii] = v[jj];
v[jj] = tmp;
return R_NilValue;
")
vec <- as.integer(c(9,6,3,4,2,1,5,7,8))
swap(2L, 5L, vec)
vec
# [1] 9 2 3 4 6 1 5 7 8

How to get index of sorted array elements

Say I have an array in R: c(10, 7, 4, 3, 8, 2)
Upon sorting, this would be: c(2, 3, 4, 7, 8, 10)
What is the best way in R to return the indices of the sorted array elements in the original array? I'm looking for an output like:
6(index of 2), 4(index of 3), 3(index of 4), 2(index of 7), 5(index of 8), 1(index of 10)
The function you're looking for is order:
> x
[1] 10 7 4 3 8 2
> order(x)
[1] 6 4 3 2 5 1
sort has an index.return argument, which is FALSE by default
x <- c(10,7,4,3,8,2)
sort(x, index.return=TRUE) # returns a list with the sorted values as `$x`
# and the indices as `$ix`
#$x
#[1] 2 3 4 7 8 10
#$ix
#[1] 6 4 3 2 5 1
You can extract the index by
sort(x, index.return=TRUE)$ix
#[1] 6 4 3 2 5 1
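As a quick sanity check, indexing the original vector with the result of order reproduces sort, and decreasing = TRUE gives the indices for a descending sort. A short sketch using the same x (idx and idx_desc are illustrative names):

```r
x <- c(10, 7, 4, 3, 8, 2)

# Positions in x of the smallest, second smallest, ... element.
idx <- order(x)

# Indexing with them gives the sorted vector.
identical(x[idx], sort(x))
# [1] TRUE

# Same idea for descending order.
idx_desc <- order(x, decreasing = TRUE)
idx_desc
# [1] 1 5 2 3 4 6
```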

Resources