Remove continuously repeating values [duplicate] - r

This question already has answers here:
Remove/collapse consecutive duplicate values in sequence
(5 answers)
Closed 4 years ago.
Does anyone know how to remove consecutively repeating values, rather than every repeated value as the unique() function does?
So for example, I want:
0,0,0,0,1,1,1,2,2,2,3,3,3,3,2,2,1,2
to become
0,1,2,3,2,1,2
and not just
0,1,2,3
Is there a word to describe this? I'm sure that the solution is out there somewhere and I just can't find it because I don't know the word for it.

Keep a value when its difference from the previous value is not zero (and keep the first one):
x <- c(0,0,0,0,1,1,1,2,2,2,3,3,3,3,2,2,1,2)
x[c(1, diff(x)) != 0]
# [1] 0 1 2 3 2 1 2

v <- c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 2, 2, 1, 2)
rle(v)$values
Output:
[1] 0 1 2 3 2 1 2
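The diff() answer above relies on subtraction, so it only applies to numeric input. As a minimal sketch for other atomic vectors (character, logical), each element can be compared with its predecessor instead. The helper name collapse_runs is hypothetical, and the sketch assumes the input contains no NA values.

```r
# Hypothetical helper: drop consecutive repeats from any atomic vector.
# Assumption: x contains no NA values.
collapse_runs <- function(x) {
  if (length(x) < 2) return(x)          # nothing to collapse
  # keep the first element, then every element that differs from the one before it
  x[c(TRUE, x[-1] != x[-length(x)])]
}

collapse_runs(c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 2, 2, 1, 2))
# [1] 0 1 2 3 2 1 2
collapse_runs(c("a", "a", "b", "a"))
# [1] "a" "b" "a"
```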

Related

Is there a way to tell case_when something like "otherwise, leave values as they are"? [duplicate]

This question already has answers here:
Keep value if not in case_when statement
(2 answers)
Closed 17 days ago.
In a survey I have two vectors, one containing respondents' answers to a question (which includes NAs), and one that is a dummy for a specific NA code (i.e. it's 1 for all respondents with a specific NA value, such as "don't know" or "don't wish to say").
It could look something like this.
a <- c(0, 1, 2, 3, 4, NA, NA, 7)
b <- c(0, 0, 0, 0, 0, 0, 1, 0)
Now I want to modify a in such a way that it maintains all the observations, but gets assigned a different value (let's say 99) if b=1.
The end result should look something like this.
> a
[1] 0 1 2 3 4 NA 99 7
I can get to that outcome with work-around solutions, but it'd be great to know if there's a way to get there in a straightforward manner.
One way using dplyr:
library(dplyr)
a <- c(0, 1, 2, 3, 4, NA, NA, 7)
b <- c(0, 0, 0, 0, 0, 0, 1, 0)
dat <- tibble(A = a, B = b)
dat2 <- dat %>%
  mutate(A = if_else(B == 1, 99, A))
Or, very simply and directly: a[b == 1] = 99
Assuming both vectors have the same length, you can create a logical "index" vector from b and use it to select the elements of a that receive the new value:
b.index <- b == 1
a[b.index] = 99
# or in one line
a[b == 1] = 99
a
[1] 0 1 2 3 4 NA 99 7
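Since the title asks about case_when specifically, here is a minimal sketch of the usual "otherwise, leave values as they are" pattern: a final catch-all condition that returns the original vector. It assumes dplyr is installed; the name a_new is only for illustration.

```r
library(dplyr)

a <- c(0, 1, 2, 3, 4, NA, NA, 7)
b <- c(0, 0, 0, 0, 0, 0, 1, 0)

a_new <- case_when(
  b == 1 ~ 99,  # replace where the dummy is set
  TRUE   ~ a    # otherwise, leave values as they are
)
a_new
# [1]  0  1  2  3  4 NA 99  7
```

Both right-hand sides are doubles here, which matters because case_when expects all branches to share a type.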

How many times does the sign change in my vector? [duplicate]

This question already has answers here:
Counting the number of times a value change signs with R
(2 answers)
Closed 1 year ago.
I would like help creating a function that counts how many times the sign of the numbers in a vector changes, for example:
1,2,-5,-6,-7,5,1,-8
How could my function identify that there are 3 sign changes?
Try the code below
> sum(diff(sign(v))!=0)
[1] 3
or
> sum(rowSums(embed(sign(v), 2)) == 0)
[1] 3
Data
v <- c(1, 2, -5, -6, -7, 5, 1, -8)
Using rle:
x <- c(1, 2, -5, -6, -7, 5, 1, -8)
length(rle(sign(x))$lengths) - 1
#[1] 3
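The question asks for a function, so the diff(sign(v)) idea can be wrapped as below. This is only a sketch: the name count_sign_changes is hypothetical, and ignoring zeros (so that 1, 0, -1 counts as one change) is an assumption the original answers do not address.

```r
# Hypothetical wrapper around the diff(sign(x)) approach.
# Assumption: zeros carry no sign and are ignored.
count_sign_changes <- function(x) {
  s <- sign(x)
  s <- s[s != 0]                  # drop zeros before comparing neighbours
  if (length(s) < 2) return(0L)   # fewer than two signed values: no change possible
  sum(diff(s) != 0)               # count positions where the sign flips
}

count_sign_changes(c(1, 2, -5, -6, -7, 5, 1, -8))
# [1] 3
```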

How to return the count of occurrences of an integer in a vector, in a new vector, using R [duplicate]

This question already has answers here:
Count the occurrence of one vector's values in another vector
(2 answers)
Comparing Vectors Values: 1 element with all other
(2 answers)
Closed 4 years ago.
New to R. I have seen a lot of similar questions where tables are used to count the number of occurrences, but I want to create a new vector where, for each integer in vector_1 (e.g. 1 through 10), the number of occurrences of that integer in vector_2 is counted and returned in a third vector, vector_3.
Desired Result:
vector_1 <- c(1:10)
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
vector_3 <- c(0, 0, 1, 2, 1, 0, 1, 0, 1, 1)
I have tried using for loops such as:
for (i in 1:10) {
for (j in vector_2) {
print(i) <- vector_3
}
}
Obviously this code doesn't work, but I am just not finding a good way to do a summation of the occurrences between the vectors. Any guidance or alternate approaches would be welcomed.
*Edit: almost all answers that I have seen to similar questions use tables to count the occurrences within vector_2; I haven't come across questions that compare the two vectors and then output the result.
Your code doesn't make sense to me. Anyway, you can easily compare each value in vector 1 with each value in vector 2 using outer. rowSums then can give you the required counts.
vector_1 <- c(1:10)
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
rowSums(outer(vector_1, vector_2, "=="))
#[1] 0 0 1 2 1 0 1 0 1 1
Also you can create a factor variable:
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
vector_2 <- factor(vector_2,levels = 1:10)
table(vector_2)
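To end up with a plain vector_3 rather than a table object, either of the following should work. This is a sketch, not the answerers' code: the vapply loop counts matches one value at a time, and the second line strips the table from the factor approach down to its counts. The name vector_3_tab is only for illustration.

```r
vector_1 <- 1:10
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)

# Count, for each value in vector_1, how often it appears in vector_2
vector_3 <- vapply(vector_1, function(v) sum(vector_2 == v), integer(1))

# Same counts via the factor/table approach, converted to a plain integer vector
vector_3_tab <- as.vector(table(factor(vector_2, levels = vector_1)))

vector_3
# [1] 0 0 1 2 1 0 1 0 1 1
```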

How to fill in values in a vector?

I have vectors in R containing a lot of 0's and a few non-zero numbers. Each vector starts with a non-zero number.
For example <1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0>
I would like to set all of the zeros equal to the most recent non-zero number.
I.e. this vector would become <1,1,1,1,1,1,2,2,2,2,2,2,4,4,4,4>
I need to do this for about 100 vectors containing around 6 million entries each. Currently I am using a for loop:
for (k in 1:length(vector)) {
  if (vector[k] == 0) {
    vector[k] <- vector[k - 1]
  }
}
Is there a more efficient way to do this?
Thanks!
One option would be to replace those 0s with NA, then use zoo::na.locf:
x <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
x[x == 0] <- NA
zoo::na.locf(x) ## you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
Thanks to Richard for showing me how to use replace:
zoo::na.locf(replace(x, x == 0, NA))
You could try this:
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]
or, for another case where cummax would not be appropriate:
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]
Logic:
I keep track of the indices of the non-zero elements with which(k != 0); call this vector x, so x = c(1, 7, 13).
Next I "sample" from this vector. From k I build a vector that increments every time a non-zero element appears, cumsum(k != 0); call it y, so y = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3).
I "sample" from x with x[y], i.e. I take the first element of x 6 times, the second element 6 times and the third element 4 times; call the result z, so z = c(1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7, 13, 13, 13, 13).
Finally I "sample" from k with k[z], i.e. I take the 1st element 6 times, then the 7th element 6 times, then the 13th element 4 times.
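The steps in this explanation can be run one at a time to inspect the intermediate vectors. The names nz_idx, run_id, src_idx and filled are illustrative stand-ins for x, y, z and the final result.

```r
k <- c(1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 4, 0, 0, 0)

nz_idx  <- which(k != 0)    # positions of the non-zero elements: 1 7 13
run_id  <- cumsum(k != 0)   # increments at each non-zero element: 1 ... 2 ... 3 ...
src_idx <- nz_idx[run_id]   # for each position, the index of the most recent non-zero
filled  <- k[src_idx]       # pull the values from those indices

filled
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
```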
Adding to @李哲源's answer:
If the leading NAs must be replaced with the nearest non-NA value, and the other NAs with the last non-NA value, the code can be:
x <- c(0,0,1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
zoo::na.locf(zoo::na.locf(replace(x, x == 0, NA),na.rm=FALSE),fromLast=TRUE)
# you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
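If adding zoo is not an option, the which/cumsum indexing idea from the earlier answer can probably be adapted to leading zeros as well. The sketch below clamps the run counter to at least 1, so leading zeros borrow the first non-zero value. It assumes the vector contains at least one non-zero element; the names nz_idx and run_id are illustrative.

```r
x <- c(0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 4, 0, 0, 0)

nz_idx <- which(x != 0)             # positions of the non-zero elements
run_id <- pmax(cumsum(x != 0), 1)   # leading zeros get 0 from cumsum; clamp them to 1
x[nz_idx[run_id]]
# [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
```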

Number Duplicated Cases

I want to identify duplicate cases and number them as a vector (such as with an ID variable). Any case without any direct matches should be labeled as a fixed value (such as zero). Any case with a corresponding duplicate should be labeled 1, with each subsequent case being labeled n+1. So, if I have an ID variable like this 1, 2, 2, 2, 3, 4, 4, 5, I'd want the corresponding vector to produce: 0, 1, 2, 3, 0, 1, 2, 0.
How can I do this?
duplicated() identifies the first case as a non-duplicate, so that doesn't work.
Base R, ave with seq_along
x<-c(1,2,2,2,3,4,4,5)
ave(seq_along(x),x,FUN=function(g) if(length(g)>1) seq_along(g) else 0)
#> 0 1 2 3 0 1 2 0
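The same rule can be spelled out in two steps, which may be easier to read: compute each case's group size and its position within the group, then zero out the groups of size one. This is a sketch only; the function name number_duplicates is hypothetical.

```r
# Hypothetical helper: 0 for unique cases, 1, 2, 3, ... within duplicated groups
number_duplicates <- function(id) {
  group_size   <- ave(seq_along(id), id, FUN = length)     # how many cases share this id
  within_group <- ave(seq_along(id), id, FUN = seq_along)  # running number inside each id
  ifelse(group_size > 1, within_group, 0L)
}

number_duplicates(c(1, 2, 2, 2, 3, 4, 4, 5))
# [1] 0 1 2 3 0 1 2 0
```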
