I have a vector of 1s and 0s. I would like to replace each 1 with the "spot" of its run in the vector (1 for the first run of 1s, 2 for the second, and so on).
For example I would like to change
x = c(1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1)
to
1 1 1 0 2 2 2 0 0 3 0 4
The number of consecutive 1s and 0s can vary.
Here is one way to do it...
x = c(1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1)
x[x==1] <- cumsum(c(x[1], diff(x)) == 1)[x==1]
x
[1] 1 1 1 0 2 2 2 0 0 3 0 4
Another way with rle
x * with(rle(x), rep(cumsum(values), lengths))
#[1] 1 1 1 0 2 2 2 0 0 3 0 4
We compute the run-length encoding of x, repeat the cumulative sum of values lengths times, and multiply the result by x, so that the 0s stay 0 and only the 1s are changed.
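To see why this works, it can help to look at the intermediate pieces. The sketch below only unpacks the one-liner above; the names r, run_id and expanded are made up for illustration.

```r
x <- c(1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1)

# run-length encoding: one entry per run of equal values
r <- rle(x)
r$values   # 1 0 1 0 1 0 1
r$lengths  # 3 1 3 2 1 1 1

# cumulative sum of the run values: increases by one at every run of 1s
run_id <- cumsum(r$values)          # 1 1 2 2 3 3 4

# stretch each run id back out to the length of its run
expanded <- rep(run_id, r$lengths)  # 1 1 1 1 2 2 2 2 2 3 3 4

# multiplying by x zeroes out the positions that were 0 in the input
result <- x * expanded
result
# [1] 1 1 1 0 2 2 2 0 0 3 0 4
```

Note that this relies on x containing only 0s and 1s; with other values the cumulative sum would no longer count runs.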
Here is an option with rle and inverse.rle
inverse.rle(within.list(rle(x), values[values==1] <- seq_along(values[values==1])))
#[1] 1 1 1 0 2 2 2 0 0 3 0 4
Or an option using rleid from data.table
library(data.table)
x1 <- rleid(x)
x1[x!= 0] <- rleid(x1[x!=0])
x1 * x
#[1] 1 1 1 0 2 2 2 0 0 3 0 4
Related
I have data that looks like this:
d <- data.frame(Item = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0))
I would like to create a column whose value is based on how long ago a 0 last appeared in the column d$Item. I don't really know how to get started with something like this in R.
Expected outcome is this:
d$recent <- c(NA, 0, 0, 1, 2, 3, 4, 5, 6, 0, 0, 1, 0, 1, 2, 3, 0)
Where each row is the distance to the most recent observation of 0 (0 = on the same row, 1 = previous row, etc.)
edit: Changed row to column, was posting before coffee. Also added expected result.
You can try rle + sequence
transform(
d,
recent = with(rle(Item), sequence(lengths)) * (Item != 0)
)
which gives
   Item recent
1     1      1
2     0      0
3     0      0
4     1      1
5     1      2
6     1      3
7     1      4
8     1      5
9     1      6
10    0      0
11    0      0
12    1      1
13    0      0
14    1      1
15    1      2
16    1      3
17    0      0
You can do this with sequence. It calculates the distance to the latest 1.
dif <- diff(c(which(d$Item == 1), length(d$Item) + 1))
sequence(dif, 0)
#[1] 0 1 2 0 0 0 0 0 0 1 2 0 1 0 0 0 1
Edit: to get the distance to the latest 0 instead:
dif <- diff(c(1, which(d$Item != 1), length(d$Item) + 1))
sequence(dif, 0)
#[1] 0 0 0 1 2 3 4 5 6 0 0 1 0 1 2 3 0
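As far as I know, the second argument of sequence (from) is only available from R 4.0.0 on. A rough base R alternative that should also work on older versions, and that reproduces the NA requested for the first row, might look like the sketch below. The names g and recent are just for illustration.

```r
d <- data.frame(Item = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0))

# g increases by one at every 0, so each group starts at a 0
g <- cumsum(d$Item == 0)

# position within the group minus one = number of rows since the latest 0
recent <- ave(d$Item, g, FUN = seq_along) - 1

# rows before the first 0 have no earlier 0 to refer to
recent[g == 0] <- NA

d$recent <- recent
d$recent
# [1] NA  0  0  1  2  3  4  5  6  0  0  1  0  1  2  3  0
```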
If I have a vector like
"a": 0 0 1 1 1 0 0 0 0 1 1 0 0 0
How can I generate a vector of the same length containing the count of consecutive elements, like so:
"b": 2 2 3 3 3 4 4 4 4 2 2 3 3 3
I tried rle, but I did not manage to stretch it out this way.
Another option using rle and rep
with(rle(a), rep(lengths, times = lengths))
# [1] 2 2 3 3 3 4 4 4 4 2 2 3 3 3
data
a <- c(0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0)
Create a grouping variable using diff and use it in ave to calculate length of each group.
ave(x, cumsum(c(0, diff(x) != 0)), FUN = length)
# [1] 2 2 3 3 3 4 4 4 4 2 2 3 3 3
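For anyone wondering what the grouping variable looks like, here is the same idea split into steps. This is only a sketch; changed, grp and counts are illustrative names.

```r
x <- c(0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0)

# 1 wherever the value differs from the previous element, 0 otherwise
changed <- c(0, diff(x) != 0)

# running total of the changes: one id per run of equal values
grp <- cumsum(changed)
grp
# [1] 0 0 1 1 1 2 2 2 2 3 3 4 4 4

# ave applies length within each run and returns a vector as long as x
counts <- ave(x, grp, FUN = length)
counts
# [1] 2 2 3 3 3 4 4 4 4 2 2 3 3 3
```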
You can do the same with dplyr lag
library(dplyr)
ave(x,cumsum(x != lag(x, default = FALSE)), FUN = length)
#[1] 2 2 3 3 3 4 4 4 4 2 2 3 3 3
And for completeness with data.table rleid
library(data.table)
ave(x, rleid(x), FUN = length)
#[1] 2 2 3 3 3 4 4 4 4 2 2 3 3 3
data
x <- c(0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0)
Here is another solution using vapply
count_consec <- function (a) {
  # creating output vector out
  out <- integer(length(a))
  # consecutive differences
  diffs <- which(diff(a) != 0)
  # returning 0 just in order to have a return statement in vapply - you can return anything else
  vapply(1:(length(diffs)+1), function (k) {
    if (k == 1) {
      out[1:diffs[1]] <<- diffs[1]
      return (0L)
    }
    if (k == length(diffs)+1) {
      out[(diffs[k-1]+1):length(out)] <<- length(out) - diffs[k-1]
      return (0L)
    }
    out[(diffs[k-1]+1):diffs[k]] <<- diffs[k] - diffs[k-1]
    return (0L)
  }, integer(1))
  out
}
count_consec(a)
# [1] 2 2 3 3 3 4 4 4 4 2 2 3 3 3
with the data a <- as.integer(unlist(strsplit('0 0 1 1 1 0 0 0 0 1 1 0 0 0', ' ')))
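One caveat: count_consec appears to fail when the input has no change at all (for example a constant vector), because diffs is then empty and diffs[1] is NA. A possible variant that uses the same run boundaries but avoids the special cases is sketched below; count_consec2 is a hypothetical name, not part of the answer above.

```r
count_consec2 <- function(a) {
  # last position of every run; the final element always ends a run
  ends <- c(which(diff(a) != 0), length(a))
  # run lengths are the gaps between consecutive run ends
  lens <- diff(c(0, ends))
  # repeat each run length once per element of that run
  rep(lens, times = lens)
}

a <- as.integer(unlist(strsplit('0 0 1 1 1 0 0 0 0 1 1 0 0 0', ' ')))
count_consec2(a)
# [1] 2 2 3 3 3 4 4 4 4 2 2 3 3 3

count_consec2(c(5, 5, 5))
# [1] 3 3 3
```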
I want to find a way to replace the run of identical consecutive values at the beginning of each trial with 0, but once the value has changed, the replacement should stop and the remaining values should be kept. This should happen for every trial of every subject.
For example, the first subject has multiple trials (1, 2, etc.). At the beginning of each trial, there may be some consecutive rows with the same value (e.g., 1, 1, 1). I would like to replace these values with 0. However, once the value has changed from 1 to 0, I want to keep the values in the rest of the trial (e.g., 0, 0, 1).
subject <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
trial <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
value <- c(1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1)
df <- data.frame(subject, trial, value)
Thus, from the original data frame, I would like to have a new variable (value_new) like below.
   subject trial value value_new
1        1     1     1         0
2        1     1     1         0
3        1     1     1         0
4        1     1     0         0
5        1     1     0         0
6        1     1     1         1
7        1     2     1         0
8        1     2     1         0
9        1     2     0         0
10       1     2     1         1
11       1     2     1         1
12       1     2     1         1
I was thinking of using dplyr with group_by(subject, trial) and mutate to create a new variable with a conditional statement, but I have no idea how to do that. I guess I need rle(), but again, I have no clue how to replace the consecutive values with 0, stop replacing once the value has changed, and keep the rest of the values.
Any suggestions or advice would be really appreciated!
You can use rleid from data.table :
library(data.table)
setDT(df)[, new_value := value * +(rleid(value) > 1), .(subject, trial)]
df
# subject trial value new_value
# 1: 1 1 1 0
# 2: 1 1 1 0
# 3: 1 1 1 0
# 4: 1 1 0 0
# 5: 1 1 0 0
# 6: 1 1 1 1
# 7: 1 2 1 0
# 8: 1 2 1 0
# 9: 1 2 0 0
#10: 1 2 1 1
#11: 1 2 1 1
#12: 1 2 1 1
You can also do this with dplyr (rleid still comes from data.table):
library(dplyr)
df %>%
  group_by(subject, trial) %>%
  mutate(new_value = value * +(data.table::rleid(value) > 1))
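If neither package is available, the same logic can probably be written in base R with ave. This is a sketch under that assumption; zero_first_run is a hypothetical helper name, and it stands in for rleid(value) > 1 by counting value changes within each trial.

```r
subject <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
trial   <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
value   <- c(1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1)
df <- data.frame(subject, trial, value)

zero_first_run <- function(v) {
  # number of value changes seen so far; 0 means we are still in the first run
  n_changes <- cumsum(c(FALSE, diff(v) != 0))
  # keep the value only after the first change, otherwise set it to 0
  v * (n_changes > 0)
}

# apply the helper separately within each subject/trial combination
df$value_new <- ave(df$value, df$subject, df$trial, FUN = zero_first_run)
df$value_new
# [1] 0 0 0 0 0 1 0 0 0 1 1 1
```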
Hi, I would really appreciate some help with this; I couldn't find a solution in previous questions.
I have a tibble in long format (rows grouped by id and arranged by time).
I want to create a variable "eleg" based on "varx". The condition is that "eleg" = 1 if "varx" == 0 in each of the previous 3 rows and varx == 1 in the current row, and 0 otherwise, for each ID. If possible, using dplyr.
id <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3)
time <- c(1,2,3,4,5,6,7,1,2,3,4,5,6,1,2,3,4)
varx <- c(0,0,0,0,1,1,0,0,1,1,1,1,1,0,0,0,1)
eleg <- c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1)
table <- data.frame(id, time, varx, eleg)
In my real dataset the condition is "in the previous 24 rows" and the same ID could have eleg == 1 more than one time if it suits the condition.
Thank you.
One approach could be
library(dplyr)
m <- 3 # number of previous rows to look back
df %>%
  group_by(id) %>%
  mutate(eleg = ifelse(rowSums(sapply(1:m, function(k) lag(varx, n = k, order_by = id, default = 1) == 0)) == m & varx == 1,
                       1,
                       0)) %>%
  data.frame()
which gives
   id time varx eleg
1   1    1    0    0
2   1    2    0    0
3   1    3    0    0
4   1    4    0    0
5   1    5    1    1
6   1    6    1    0
7   1    7    0    0
8   2    1    0    0
9   2    2    1    0
10  2    3    1    0
11  2    4    1    0
12  2    5    1    0
13  2    6    1    0
14  3    1    0    0
15  3    2    0    0
16  3    3    0    0
17  3    4    1    1
Sample data:
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3), time = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6,
1, 2, 3, 4), varx = c(0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1,
0, 0, 0, 1)), .Names = c("id", "time", "varx"), row.names = c(NA,
-17L), class = "data.frame")
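library(dplyr)      # for %>% and mutate
library(data.table) # for shift
# note: this looks back over the previous 3 rows of the whole data frame, not per id
df %>%
  mutate(elegnew = ifelse(Reduce("+", shift(df$varx, 1:3)) == 0 & df$varx == 1, 1, 0))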
   id time varx eleg elegnew
1   1    1    0    0       0
2   1    2    0    0       0
3   1    3    0    0       0
4   1    4    0    0       0
5   1    5    1    1       1
6   1    6    1    0       0
7   1    7    0    0       0
8   2    1    0    0       0
9   2    2    1    0       0
10  2    3    1    0       0
11  2    4    1    0       0
12  2    5    1    0       0
13  2    6    1    0       0
14  3    1    0    0       0
15  3    2    0    0       0
16  3    3    0    0       0
17  3    4    1    1       1
Here's another approach, using dplyr and zoo:
library(dplyr)
library(zoo)
df %>%
group_by(id) %>%
mutate(elegnew = as.integer(varx == 1 &
rollsum(varx == 1, k = 4, align = "right", fill = 0) == 1))
# # A tibble: 17 x 5
# # Groups: id [3]
# id time varx eleg elegnew
# <dbl> <dbl> <dbl> <dbl> <int>
# 1 1. 1. 0. 0. 0
# 2 1. 2. 0. 0. 0
# 3 1. 3. 0. 0. 0
# 4 1. 4. 0. 0. 0
# 5 1. 5. 1. 1. 1
# 6 1. 6. 1. 0. 0
# 7 1. 7. 0. 0. 0
# 8 2. 1. 0. 0. 0
# 9 2. 2. 1. 0. 0
# 10 2. 3. 1. 0. 0
# 11 2. 4. 1. 0. 0
# 12 2. 5. 1. 0. 0
# 13 2. 6. 1. 0. 0
# 14 3. 1. 0. 0. 0
# 15 3. 2. 0. 0. 0
# 16 3. 3. 0. 0. 0
# 17 3. 4. 1. 1. 1
The idea is to group by id and then check a) whether varx is 1 and b) whether the sum of varx=1 events in the previous 3 plus current row (k=4) is 1 (which means all previous 3 must be 0). I assume that varx is either 0 or 1.
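The same window idea can be sketched in base R without zoo, using differences of a cumulative sum as the rolling count. This is only an illustration under the same assumption (varx is 0 or 1); flag_after_zeros, cs, prev and window are hypothetical names.

```r
id   <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3)
varx <- c(0,0,0,0,1,1,0,0,1,1,1,1,1,0,0,0,1)

flag_after_zeros <- function(v, n = 3) {
  # running count of 1s up to and including each row
  cs <- cumsum(v == 1)
  # running count as of n + 1 rows earlier; NA where fewer than n rows precede
  prev <- c(rep(NA, n), 0, cs)[seq_along(cs)]
  # number of 1s among the previous n rows plus the current row
  window <- cs - prev
  # flag rows where the only 1 in the window is the current row
  as.integer(!is.na(window) & v == 1 & window == 1)
}

# apply the helper separately within each id
elegnew <- ave(varx, id, FUN = flag_after_zeros)
elegnew
# [1] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
```

For the "previous 24 rows" case, something like ave(varx, id, FUN = function(v) flag_after_zeros(v, n = 24)) should do.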
You have asked for a dplyr solution, preferably.
The following is a base R one, with a function that you can adapt to "in the previous 24 rows", just pass n = 24 to the function.
fun <- function(DF, crit = "varx", new = "eleg", n = 3){
  DF[[new]] <- 0
  for(i in seq_len(nrow(DF))[-seq_len(n)]){
    if(all(DF[[crit]][(i - n):(i - 1)] == 0) && DF[[crit]][i] == 1)
      DF[[new]][i] <- 1
  }
  DF
}
sp <- split(table[-4], table[-4]$id)
new_df <- do.call(rbind, lapply(sp, fun))
row.names(new_df) <- NULL
identical(table, new_df)
#[1] TRUE
Note that if you are creating a new column, eleg, you would probably not need to split table[-4], just table since the 4th column wouldn't exist yet.
You could do do.call(rbind, lapply(sp, fun, n = 24)) and the rest would be the same.