First occurrence of each value in a vector depending on a condition - r

From a vector:
v <- c(2,2,2,2,5,7,7,5,5,7,3,3,3)
and according to the condition v[i] != v[i+1], how can I obtain:
[1] 2 5 7 5 7 3

The rle function will do this. rle stands for run length encoding.
v <- c(2,2,2,2,5,7,7,5,5,7,3,3,3)
rle(v)$values
## [1] 2 5 7 5 7 3
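As a small illustration of what rle returns besides the values (a sketch using only base R; the variable name r is just for illustration), the result also stores the length of each run, and inverse.rle rebuilds the original vector from the encoding:

```r
v <- c(2,2,2,2,5,7,7,5,5,7,3,3,3)
r <- rle(v)

r$values   # first element of each run
#[1] 2 5 7 5 7 3
r$lengths  # number of consecutive repeats of each of those values
#[1] 4 1 2 2 1 3

# inverse.rle() expands the run-length encoding back into the original vector
identical(inverse.rle(r), v)
#[1] TRUE
```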

This can also be done using diff:
v[c(TRUE,diff(v)!=0)]
#[1] 2 5 7 5 7 3
Or using rleid from the data.table package:
library(data.table)
setDT(list(v))[,V1[1L] ,rleid(V1)]$V1
#[1] 2 5 7 5 7 3
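The condition v[i] != v[i+1] from the question can also be written out directly, which works for character vectors too (diff only handles numeric input). This is a minimal sketch; first_of_runs is a hypothetical helper name, and it assumes the vector contains no NA values:

```r
# Hypothetical helper: keep the first element of every run of equal values.
# Assumes x has no NAs (a comparison with NA would yield NA in the index).
first_of_runs <- function(x) {
  if (length(x) == 0L) return(x)
  # Position 1 is always kept; every later position is kept only if it
  # differs from the element immediately before it.
  keep <- c(TRUE, x[-1L] != x[-length(x)])
  x[keep]
}

v <- c(2,2,2,2,5,7,7,5,5,7,3,3,3)
first_of_runs(v)
#[1] 2 5 7 5 7 3
first_of_runs(c("a", "a", "b", "a"))
#[1] "a" "b" "a"
```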

Related

Vector recycling concept in R

I am trying to understand the working of vector recycling in R. I have 2 vectors
c(2,4,6)
and
c(1,2)
And I want to use the rep() to produce an output as follows:
[1] 2 4 6 4 8 12
From what I understand of ?rep, there are times and each arguments that control the repetition, which I tried:
> rep(c(2,4,6), times=2)
[1] 2 4 6 2 4 6
In the desired output, however, the first vector is multiplied by the first element of the second vector and then by its second element. I am not sure how to proceed.
You can use:
rep(c(2,4,6), 2) * rep(c(1,2), each=3)
#[1] 2 4 6 4 8 12
or with auto recycling:
c(2,4,6) * rep(c(1,2), each=3)
#[1] 2 4 6 4 8 12
Alternatively, outer could be used:
c(outer(c(2,4,6), c(1,2)))
#[1] 2 4 6 4 8 12
Also crossprod could be used:
c(crossprod(t(c(2,4,6)), c(1,2)))
#[1] 2 4 6 4 8 12
Or %*%:
c(c(2,4,6) %*% t(c(1,2)))
#[1] 2 4 6 4 8 12
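To make the recycling step explicit, here is a sketch that spells out the intermediate vectors (the names a, b, b_expanded, res and m are only for illustration). The shorter operand of an element-wise operation is repeated until it matches the length of the longer one, and outer produces the same products arranged as a matrix that c() or as.vector() flattens column by column:

```r
a <- c(2, 4, 6)
b <- c(1, 2)

# Each element of b is repeated length(a) times: 1 1 1 2 2 2
b_expanded <- rep(b, each = length(a))

# a has length 3 and b_expanded has length 6, so a is recycled twice:
# (2 4 6 2 4 6) * (1 1 1 2 2 2)
res <- a * b_expanded
res
#[1]  2  4  6  4  8 12

# outer() builds the 3 x 2 matrix of all products a[i] * b[j];
# flattening it column-major gives the same vector
m <- outer(a, b)
identical(as.vector(m), res)
#[1] TRUE
```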

Replicate certain values in vector determined by other vector

I have a vector of values (say 1:10), and want to repeat certain values in it 2 or more times, determined by another vector (say c(3,4,6,8)). In this example, the result would be c(1,2,3,3,4,4,5,6,6,7,8,8,9,10) when repeating 2 times.
This should work for an arbitrary length range vector (like 200:600), with a second vector which is contained by the first. Is there a handy way to achieve this?
Akrun's method is more compact, but this will also work:
# get rep vector
reps <- rep(1L, 10L)
reps[c(3,4,6,8)] <- 2L
rep(1:10, reps)
#[1] 1 2 3 3 4 4 5 6 6 7 8 8 9 10
The insight here is that rep will take an integer vector in the second argument the same length as the first argument that indicates the number of repetitions for each element of the first argument.
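A tiny sketch of that behaviour of rep, using a made-up character vector x so the per-element counts are easy to follow:

```r
x <- c("a", "b", "c")

# One count per element of x: "a" once, "b" three times, "c" twice
rep(x, times = c(1, 3, 2))
#[1] "a" "b" "b" "b" "c" "c"

# For comparison, a single count repeats the whole vector ...
rep(x, times = 2)
#[1] "a" "b" "c" "a" "b" "c"

# ... while each repeats every element in place
rep(x, each = 2)
#[1] "a" "a" "b" "b" "c" "c"
```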
Note that this solution assumes that c(3,4,6,8) gives the positions (indices) of the elements to be repeated. Under that assumption, d-b's comment offers a one-liner:
rep(x, (seq_along(x) %in% c(3,4,6,8)) + 1)
If instead c(3,4,6,8) gives the values that are to be repeated, docendo-discimus's super-compact code works:
rep(x, (x %in% c(3,4,6,8)) * (n-1) +1)
where n may be adjusted to change the number of repetitions. If you need to call this a couple times, this could be rolled up into a function like
myReps <- function(x, y, n) rep(x, (x %in% y) * (n-1) +1)
and called as
myReps(1:10, c(3,4,6,8), 2)
in the current scenario.
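Both interpretations (positions versus values) can be folded into one function. The following is only a sketch: rep_selected and its by argument are hypothetical names, not part of any package, and it is shown on a range that does not start at 1 so that the two interpretations give visibly different results:

```r
# Hypothetical helper: repeat selected elements of x n times, all others once.
# by = "value": sel lists values of x; by = "index": sel lists positions in x.
rep_selected <- function(x, sel, n = 2L, by = c("value", "index")) {
  by <- match.arg(by)
  hit <- if (by == "value") x %in% sel else seq_along(x) %in% sel
  # One repetition count per element: n where selected, 1 elsewhere
  rep(x, times = ifelse(hit, n, 1L))
}

x <- 200:205
rep_selected(x, c(201, 204), n = 3)       # select by value
#[1] 200 201 201 201 202 203 204 204 204 205
rep_selected(x, c(1, 6), by = "index")    # select by position
#[1] 200 200 201 202 203 204 205 205
```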
We can try
i1 <- v1 %in% v2
sort(c(v1[!i1], rep(v1[i1], each = 2)))
#[1] 1 2 3 3 4 4 5 6 6 7 8 8 9 10
Update
For the arbitrary vector,
f1 <- function(vec1, vec2, n){
i1 <- vec1 %in% vec2
vec3 <- seq_along(vec1)
c(vec1[!i1], rep(vec1[i1], each = n))[order(c(vec3[!i1],
rep(vec3[i1], each=n)))]
}
set.seed(24)
v1N <- sample(10)
v2 <- c(3,4,6,8)
v1N
#[1] 3 10 6 4 7 5 2 9 8 1
f1(v1N, v2, 2)
#[1] 3 3 10 6 6 4 4 7 5 2 9 8 8 1
f1(v1N, v2, 3)
#[1] 3 3 3 10 6 6 6 4 4 4 7 5 2 9 8 8 8 1
Here's another approach using sapply
#DATA
x = 1:10
r = c(3,4,6,8)
n = 2 #Two repetitions of selected values
#Assuming 'r' is the index of values in x to be repeated
unlist(sapply(seq_along(x), function(i) if(i %in% r){rep(x[i], n)}else{rep(x[i],1)}))
#[1] 1 2 3 3 4 4 5 6 6 7 8 8 9 10
#Assuming 'r' is the values in 'x' to be repeated
unlist(sapply(x, function(i) if(i %in% r){rep(i, n)}else{rep(i, 1)}))
#[1] 1 2 3 3 4 4 5 6 6 7 8 8 9 10
I haven't tested these thoroughly, but they are possible alternatives. Note that because of sort(), the output is sorted rather than kept in the original order of x, so it can differ considerably when x is unsorted.
sort(c(x, rep(x[x %in% r], n-1))) #assuming 'r' is values
#[1] 1 2 3 3 4 4 5 6 6 7 8 8 9 10
sort(c(x, rep(x[r], n-1))) #assuming 'r' is index
#[1] 1 2 3 3 4 4 5 6 6 7 8 8 9 10
I suggest this solution just to highlight a neat use of the append function in base R:
ff <- function(vec, v, n) {
for(i in seq_along(v)) vec <- append(vec, rep(v[i], n-1), after = which(vec==v[i]))
vec
}
Examples:
set.seed(1)
ff(vec = sample(10), v = c(3,4,6,8), n = 2)
#[1] 3 3 4 4 5 7 2 8 8 9 6 6 10 1
ff(vec = sample(10), v = c(2,5,9), n = 4)
#[1] 3 2 2 2 2 6 10 5 5 5 5 7 8 4 1 9 9 9 9
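For readers unfamiliar with append, here is a short sketch of how its after argument positions the inserted values (x is a made-up vector). Note that ff above passes which(vec == v[i]) as after, so it presumably relies on each value occurring only once in vec:

```r
x <- c(10, 20, 30)

# Insert after the first element
append(x, c(1, 2), after = 1)
#[1] 10  1  2 20 30

# after = 0 puts the new values at the front
append(x, 99, after = 0)
#[1] 99 10 20 30

# The default is after = length(x), i.e. at the end
append(x, 99)
#[1] 10 20 30 99
```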

Generate sequence between each element of 2 vectors

I have a for loop that generates two vectors of the same length on each iteration (the length can vary between iterations), such as:
>aa
[1] 3 5
>bb
[1] 4 8
I want to create a sequence using each element of these vectors to obtain this:
>zz
[1] 3 4 5 6 7 8
Is there a function in R to create that?
We can use Map to get the sequence of corresponding elements of 'aa' and 'bb'. The output is a list, so we unlist to get a vector.
unlist(Map(`:`, aa, bb))
#[1] 3 4 5 6 7 8
data
aa <- c(3,5)
bb <- c(4, 8)
One can obtain a sequence by using the colon operator : that separates the beginning of a sequence from its end. We can define such sequences for each vector, aa and bb, and concatenate the results with c() into a single series of numbers.
To avoid double entries in overlapping ranges we can use the unique() function:
zz <- unique(c(aa[1]:aa[length(aa)],bb[1]:bb[length(bb)]))
#> zz
#[1] 3 4 5 6 7 8
with
aa <- c(3,5)
bb <- c(4,8)
Depending on your desired output, here are a few more alternatives:
> do.call("seq",as.list(range(aa,bb)))
[1] 3 4 5 6 7 8
> Reduce(seq,range(aa,bb)) #all credit due to @BrodieG
[1] 3 4 5 6 7 8
> min(aa,bb):max(aa,bb)
[1] 3 4 5 6 7 8
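The approaches above agree on the example from the question because its ranges touch, but they answer different questions once there is a gap between the ranges. A sketch with made-up input (aa and bb are redefined here purely for illustration):

```r
aa <- c(1, 10)
bb <- c(3, 12)

# Pairwise sequences aa[i]:bb[i], concatenated; the gap 4..9 is preserved
per_pair <- unlist(Map(`:`, aa, bb))
per_pair
#[1]  1  2  3 10 11 12

# One sequence from the overall minimum to the overall maximum; the gap is filled
full_range <- min(aa, bb):max(aa, bb)
full_range
# [1]  1  2  3  4  5  6  7  8  9 10 11 12
```

Which one is appropriate depends on whether the gaps between the pairs should be kept.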

cumsum the opposite of diff in r

I have a question and I'm not sure if I'm being totally stupid here or if this is a genuine problem, or if I've misunderstood what these functions do.
Is the opposite of diff the same as cumsum? I thought it was. However, using this example:
dd <- c(17.32571,17.02498,16.71613,16.40615,
16.10242,15.78516,15.47813,15.19073,
14.95551,14.77397)
par(mfrow = c(1,2))
plot(dd)
plot(cumsum(diff(dd)))
> dd
[1] 17.32571 17.02498 16.71613 16.40615 16.10242 15.78516 15.47813 15.19073 14.95551
[10] 14.77397
> cumsum(diff(dd))
[1] -0.30073 -0.60958 -0.91956 -1.22329 -1.54055 -1.84758 -2.13498 -2.37020 -2.55174
These aren't the same. Where have I gone wrong?
AHHH! Fridays.
Obviously
The functions are quite different: diff(x) returns a vector of length (length(x)-1) containing the differences between consecutive elements of x, while cumsum(x) returns a vector of the same length as x containing the cumulative sums of its elements.
Example:
x <- c(1:10)
#[1] 1 2 3 4 5 6 7 8 9 10
> diff(x)
#[1] 1 1 1 1 1 1 1 1 1
v <- cumsum(x)
> v
#[1] 1 3 6 10 15 21 28 36 45 55
The function cumsum() computes the cumulative sum, so each entry v[i] of the vector it returns is the sum of all elements from x[1] to x[i]. In contrast, diff(x) only takes the difference between one element x[i] and the next, x[i+1].
The combination of cumsum and diff leads to different results, depending on the order in which the functions are executed:
> cumsum(diff(x))
# 1 2 3 4 5 6 7 8 9
Here the result is the cumulative sum of a sequence of nine 1s. Note that, compared with the original vector x, the last entry, 10, is missing.
On the other hand, by calculating
> diff(cumsum(x))
# 2 3 4 5 6 7 8 9 10
one obtains a vector that again resembles the original vector x, but now the first entry, 1, is missing.
In neither case is the original vector restored, so cumsum() cannot be called the opposite or inverse function of diff().
You forgot to account for the first element, which diff() discards:
all.equal(dd, c(dd[1], dd[1] + cumsum(diff(dd)))) # all.equal tolerates floating-point rounding, unlike ==
#[1] TRUE
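The reason this works is that the differences telescope: x[i] equals x[1] plus the sum of the first i-1 differences. A minimal sketch of that idea, where undiff is a hypothetical helper name rather than a base R function:

```r
# Hypothetical helper: rebuild a series from its first value and its
# successive differences. Prepending the first value makes cumsum()
# return x[1], x[1] + d[1], x[1] + d[1] + d[2], ...
undiff <- function(first, d) cumsum(c(first, d))

dd <- c(17.32571, 17.02498, 16.71613, 16.40615,
        16.10242, 15.78516, 15.47813, 15.19073,
        14.95551, 14.77397)

rebuilt <- undiff(dd[1], diff(dd))

# Compare with a tolerance, since floating-point sums may differ in the last bits
isTRUE(all.equal(rebuilt, dd))
#[1] TRUE
```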
@RHertel answered it well, stating that diff() returns a vector with length(x)-1.
Therefore, another simple workaround would be to add 0 to the beginning of the original vector so that diff() computes the difference between x[1] and 0.
> x <- 5:10
> x
#[1] 5 6 7 8 9 10
> diff(x)
#[1] 1 1 1 1 1
> diff(c(0,x))
#[1] 5 1 1 1 1 1
This way, diff() applied to c(0, ...) acts as the inverse of cumsum(), in either order of composition:
> cumsum(diff(c(0,x)))
#[1] 5 6 7 8 9 10
> diff(c(0,cumsum(x)))
#[1] 5 6 7 8 9 10
If you know the lag and the order of differencing, diffinv() can undo diff() directly:
x <- 5:10
y <- diff(x, lag = 1, differences = 1)
z <- diffinv(y, lag = 1, differences = 1, xi = 5) # xi is the first value of x
k <- as.data.frame(cbind(x, z))
k
   x  z
1  5  5
2  6  6
3  7  7
4  8  8
5  9  9
6 10 10
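The same idea extends to higher-order differences, where diffinv needs lag * differences initial values instead of one. The following sketch uses a made-up series x of squares and, to my understanding of diffinv, passes the first two values of the original series as xi:

```r
x <- c(1, 4, 9, 16, 25)

# Second-order differences: diff(diff(x))
d2 <- diff(x, differences = 2)
d2
#[1] 2 2 2

# With lag = 1 and differences = 2, xi must have length 2;
# here it holds the first two values of the original series
back <- diffinv(d2, differences = 2, xi = x[1:2])

isTRUE(all.equal(back, x))
```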

Splitting a vector into two

How can I split a vector into two random halves, so that each new vector is a random sample of the original? I always want to split it in half. For instance:
x <- 1:10
obj <- splitMyVector(x)
obj$a
> 5 3 9 7 10
obj$b
> 8 4 1 6 2
Note: the purpose for this is to do a split half reliability.
split(sample(x), letters[seq_along(x) %% 2 + 1])
$a
[1] 9 7 10 4 2
$b
[1] 6 1 8 3 5
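To get the obj$a / obj$b interface from the question, the one-liner can be wrapped in a function. This is a sketch under a few assumptions: the name splitMyVector is taken from the question, positions are shuffled with sample.int so that a length-one numeric x is not misread by sample() as a range, and for odd lengths the extra element goes to b:

```r
# Sketch of the function the question asks for: split x into two random halves
splitMyVector <- function(x) {
  idx  <- sample.int(length(x))      # random permutation of the positions
  half <- floor(length(x) / 2)       # size of the first half
  list(
    a = x[idx[seq_len(half)]],
    b = x[idx[half + seq_len(length(x) - half)]]
  )
}

set.seed(42)                         # only to make the example reproducible
obj <- splitMyVector(1:10)
lengths(obj)
# a b 
# 5 5 

# Every element ends up in exactly one of the two halves
identical(sort(c(obj$a, obj$b)), 1:10)
#[1] TRUE
```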
