Recode a value in a vector based on surrounding values - r

I'm trying to programmatically change a variable from a 0 to a 1 if there are three 1s before and after a 0.
For example, if the numbers in a vector were 1, 1, 1, 0, 1, 1, 1, then I want to change the 0 to a 1.
Here is the data, in the vector dummy_code of the data.frame original_df:
original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))
Here is how I'm trying to have the values be recoded:
desired_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1))
I tried to use the function fill in the package tidyr, but this fills in missing values, so it won't work. If I were to recode the 0 values to be missing, then that would not work either, because it would simply code every NA as 1, when I would only want to code every NA surrounded by three 1s as 1.
Is there an efficient way to do this programmatically?

An rle alternative, using the x from @G. Grothendieck's answer:
r <- rle(x)
Find the indexes of runs of three 1s:
i1 <- which(r$lengths == 3 & r$values == 1)
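Check which of these runs of 1s surround a run of 0s, and get the indexes of the 0 runs to be replaced. The second line keeps only runs of length 1, so that a lone 0 is recoded but a longer run of 0s is not:
i2 <- i1[which(diff(i1) == 2)] + 1
i2 <- i2[r$lengths[i2] == 1]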
Replace relevant 0 with 1:
r$values[i2] <- 1
Reverse the rle operation on the updated runs:
inverse.rle(r)
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
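To make the run-length steps above easier to follow, here is a small sketch of what rle returns for the question's vector and how inverse.rle rebuilds the original. It only uses base R; the variable names are illustrative.

```r
# Input vector from the question
x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)

# rle() stores each run as a (length, value) pair
r <- rle(x)
r$lengths  # 1 2 3 1 3 2 1
r$values   # 1 0 1 0 1 0 1

# The lone 0 at position 7 is the 4th run: length 1, value 0,
# sitting between two runs of three 1s (runs 3 and 5)

# inverse.rle() expands the runs back into the original vector
roundtrip <- inverse.rle(r)
```

Because each run is one element of r$values, changing a single entry there recodes the whole run at once.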
A similar solution based on data.table::rleid, slightly more compact and perhaps easier to read:
library(data.table)
d <- data.table(x)
Calculate length of each run:
d[ , n := .N, by = rleid(x)]
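For "x" values that are a single zero (run length 1) and whose preceding and subsequent runs of 1 are of length 3, set "x" to 1. The n == 1 condition keeps the middle of a run of three 0s from matching:
d[x == 0 & n == 1 & shift(n) == 3 & shift(n, type = "lead") == 3, x := 1]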
d$x
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1

Here is a one-liner using rollapply from zoo:
library(zoo)
rollapply(c(0, 0, 0, x, 0, 0, 0), 7, function(x) if (all(x[-4] == 1)) 1 else x[4])
## [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
Note: Input used was:
x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
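For readers who prefer no package dependency, the windowing idea can also be sketched as a plain loop in base R. This is only an illustration, not one of the original answers: fill_surrounded and its argument k are hypothetical names, and it treats "surrounded" as "the k values on each side are all 1", like the rollapply version.

```r
# Hypothetical helper: recode a 0 to 1 when the k values on each side are all 1.
# Decisions are made on the original x, so earlier replacements do not cascade.
fill_surrounded <- function(x, k = 3) {
  n <- length(x)
  out <- x
  if (n < 2 * k + 1) return(out)  # too short to have a full window anywhere
  for (i in seq(k + 1, n - k)) {
    left  <- x[(i - k):(i - 1)]
    right <- x[(i + 1):(i + k)]
    if (x[i] == 0 && all(left == 1) && all(right == 1)) out[i] <- 1
  }
  out
}

x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
fill_surrounded(x)
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
```

Applied to the question's data frame, this would be original_df$dummy_code <- fill_surrounded(original_df$dummy_code).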

Related

R Lookback few days and assign new value if old value exists

I have two timeseries vectors as follows -
a <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)
b <- c(1, 0, 1, 0)
I want to look back 7 days and replace only 1's in vectors a and b with 2. It is important to check if there were any values 7 days before replacing.
The expected result is -
a = c(1, 0, 0, 0, 1, 0, 2, 1, 1, 0, 2, 0)
b = c(1, 0, 1, 0) - Since no value existed 7 days ago, nothing changes here.
Thanks!
We can create the condition with lag from dplyr:
library(dplyr)
f1 <- function(vec) replace(vec, lag(vec, 6) == 1, 2)
Output:
f1(a)
#[1] 1 0 0 0 1 0 2 1 1 0 2 0
f1(b)
#[1] 1 0 1 0
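To see what the lag condition is doing, here is a base R sketch of the same idea. The helper names lag_k and f2 are hypothetical. Note that the expected output in the question corresponds to an offset of 6 positions (the value on day 1 affects day 7), which is why 6 is used rather than 7.

```r
# Hypothetical helper: shift v right by k positions, padding the front with NA
lag_k <- function(v, k) {
  n <- length(v)
  c(rep(NA, min(k, n)), head(v, max(n - k, 0)))
}

# Replace positions whose value k steps earlier was 1; which() drops the NAs,
# so positions with no value k steps back are left untouched
f2 <- function(v, k = 6) {
  v[which(lag_k(v, k) == 1)] <- 2
  v
}

a <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)
b <- c(1, 0, 1, 0)
f2(a)
# [1] 1 0 0 0 1 0 2 1 1 0 2 0
f2(b)
# [1] 1 0 1 0
```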
A base R option is to define a function f
f <- function(v) replace(v, (ind <- which(v == 1) + 6)[ind <= length(v)], 2)
such that
> f(a)
[1] 1 0 0 0 1 0 2 1 1 0 2 0
> f(b)
[1] 1 0 1 0

Creating multiple summary tables with one function in R

I couldn't find an answer to this specific question; sorry if it's been asked before:
library(tidyverse)
#sampledata
df <- data.frame(group=c(1, 1, 1, 1, 0, 0, 0, 0),
v1=c(1, 0, 0, 1, 0, 1, 1, 1),
v2=c(0, 0, 0, 0, 1, 0, 0, 1),
v3=c(0, 1, 0, 1, 1, 0, 1, 1))
I want to find the number of "1"s and "0"s in each v1, v2, v3 for each level of "group".
Currently I have been using
table(df$group, df$v1)
table(df$group, df$v2)
table(df$group, df$v3)
ad nauseam to get the number of "1"s in each variable, but I can't figure out how to create many such tables with one function. Any help would be greatly appreciated.
We can use lapply to apply the same function to multiple columns.
lapply(df[-1], function(x) table(df$group, x))
#$v1
#    x
#     0 1
#   0 1 3
#   1 2 2
#$v2
#    x
#     0 1
#   0 2 2
#   1 4 0
#$v3
#    x
#     0 1
#   0 1 3
#   1 2 2
Or, with dplyr, we can use count:
purrr::map(names(df)[-1], ~count(df, group, !!sym(.x)))
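If a single table of counts per group is enough, base R's aggregate is another option. This is a sketch rather than part of the original answer; it assumes every column other than group is a 0/1 indicator, as in the sample data.

```r
df <- data.frame(group = c(1, 1, 1, 1, 0, 0, 0, 0),
                 v1 = c(1, 0, 0, 1, 0, 1, 1, 1),
                 v2 = c(0, 0, 0, 0, 1, 0, 0, 1),
                 v3 = c(0, 1, 0, 1, 1, 0, 1, 1))

# Number of 1s in each variable, one row per group
ones <- aggregate(. ~ group, data = df, FUN = sum)

# Number of 0s in each variable, one row per group
zeros <- aggregate(. ~ group, data = df, FUN = function(v) sum(v == 0))

ones
#   group v1 v2 v3
# 1     0  3  2  3
# 2     1  2  0  2
```

The numbers agree with the second column of each table produced by the lapply call.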

how to remove one data in r

In R, I have a vector:
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0)
I want to remove only one "0" from x, but my attempt removes every 0 in the vector.
Example:
x = x[!x %in% 0]
After this, all zeros have been removed from x.
For comparison, this is the behavior I want, shown in Python:
x = [0,1,0,1,0,0,0,1]
x.remove(0)
x
[1, 0, 1, 0, 0, 0, 1]
x.remove(0)
x
[1, 1, 0, 0, 0, 1]
We can use match to remove the first occurrence of a particular number
x <- c(1, 0, 1, 0, 0, 0, 1)
x[-match(1, x)]
#[1] 0 1 0 0 0 1
The same works for any other number you want to remove from the vector, for example 5 in the case below:
x <- c(1, 0, 5, 5, 0, 0, 1)
x[-match(5, x)]
#[1] 1 0 5 0 0 1
You may need which.min(), which returns the index of the first minimum of a vector:
x <- c(0,1,0,1,0,0,0,1)
x <- x[-which.min(x)]
x
# [1] 1 0 1 0 0 0 1
If your vector contains elements other than 0 and 1, which.min(x) may no longer point at the first 0. Use the first FALSE of x != 0 instead: x <- x[-which.min(x != 0)]
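Both approaches above assume the value is actually present. When it is not, match returns NA (and which.min(x != 0) returns 1), so the indexing no longer does what is intended. A minimal guarded sketch, with remove_first as a hypothetical helper name:

```r
# Hypothetical helper: drop only the first occurrence of value, if there is one
remove_first <- function(x, value) {
  i <- match(value, x)       # index of the first match, or NA if absent
  if (is.na(i)) x else x[-i]
}

x <- c(0, 1, 0, 1, 0, 0, 0, 1)
x <- remove_first(x, 0)
x
# [1] 1 0 1 0 0 0 1
x <- remove_first(x, 0)
x
# [1] 1 1 0 0 0 1

# Value not present: the vector comes back unchanged
remove_first(c(1, 1), 0)
# [1] 1 1
```

This mirrors the Python list.remove example in the question, except that it returns the vector unchanged instead of raising an error when the value is missing.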

string index matching alternate 1 and -1

I am trying to write a function that takes in a vector of integers and returns the indexes where 1 and -1 alternate. I thought this would be a simple function but it is proving devilishly hard to write!
Example:
index:  1 2 3 4 5  6  7 8 9 10
string: 1 0 0 0 1 -1 -1 0 1 -1  #this is input to function
The function should return the indexes of each -1 and 1 pair, with the -1 index listed first:
[6,1] #from 1 to -1
[6,9] #from -1 to 1
[10,9] #from 1 to -1
My (non-working) attempt:
x<-c(1,0,0,0,1,-1,-1,0,1)
matchVals(x)
matchVals<-function(x){
  current.index<-getStart(x)
  #next VALUE to search for should be:
  next.val<-x[current.index]*-1
  next.index<-getNextVal(x[current.index:length(x)],next.val) + current.index #adding in offset
  return(current.index,next.index)
}
#gets index of first value that isnt a 0
getStart<-function(x){
  lapply(1:length(x),function(i){if(x[i]!=0)return(i)})
  return(NA)
}
#gets FIRST index of the value specified (so dont feed it the entire string). Must add offset for truncated portion of string
getNextVal<-function(x,v){
  lapply(1:length(x),function(i){if(x[i]==v)return(i)})
  return(NA)
}
One way would be to iterate, alternately, over the indices of 1 and -1 and save the indices progressively.
Having:
x = c(1, 0, 0, 0, 1, -1, -1, 0, 1, -1)
and computing the indices:
i1 = which(x == 1)
i2 = which(x == -1)
find the progressive sequence recursively:
ff = function(x, y, acc = integer())
{
    if(!length(x)) return(acc)
    if(!length(y)) return(c(acc, x[[1L]]))
    Recall(y[(findInterval(x[[1L]], y) + 1L):length(y)], x[-1L], c(acc, x[[1L]]))
}
ans = if(i1[[1]] < i2[[1]]) ff(i1, i2) else ff(i2, i1)
ans
#[1] 1 6 9 10
To get the exact desired output (among alternative ways):
tmp = embed(ans, 2)
i = (seq_len(nrow(tmp)) %% 2) == (if(i1[[1]] < i2[[1]]) 0 else 1)
tmp[i, ] = t(apply(tmp[i, , drop = FALSE], 1, rev))
tmp
#     [,1] [,2]
#[1,]    6    1
#[2,]    6    9
#[3,]   10    9
Testing on other data:
X = c(0, 0, -1, 0, 0, -1, 1, 1, 1, -1, -1, 0, 0, -1, 0, 0, 1, -1,
-1, -1, 0, 1, 0, 1, 1, 1, -1, 0, 0, 1, 0, 1, 0, -1, 1, 1, 1,
-1, 0, 0, 1, 0, 1, 0, -1, 1, 1, 1)
i1 = which(X == 1)
i2 = which(X == -1)
if(i1[[1]] < i2[[1]]) ff(i1, i2) else ff(i2, i1)
# [1] 3 7 10 17 18 22 27 30 34 35 38 41 45 46
#..and proceed as necessary
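The same sequence of switch points can also be sketched without recursion: drop the zeros, then keep the first index of every run of equal non-zero values. This is an alternative illustration rather than the answer's method; alternating_idx is a hypothetical name, and it gives the same result as ff on the two test vectors used here.

```r
# Hypothetical helper: indices where the non-zero values switch between 1 and -1
alternating_idx <- function(x) {
  nz <- which(x != 0)                 # positions of the non-zero entries
  if (!length(nz)) return(integer())
  vals <- x[nz]
  # keep the first non-zero entry and every entry that differs from its predecessor
  keep <- c(TRUE, vals[-1] != vals[-length(vals)])
  nz[keep]
}

x <- c(1, 0, 0, 0, 1, -1, -1, 0, 1, -1)
alternating_idx(x)
# [1]  1  6  9 10
```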
This is an incomplete answer, but I think it's in the right direction.
Test case:
x <- c(1,0,0,0,1,-1,-1,0,1)
We basically want to ignore zero values, so let's (1) replace them with NAs and (2) use zoo::na.locf ("last observation carried forward") to fill each NA with the preceding non-missing value:
x2 <- x
x2[x2==0] <- NA
x2 <- zoo::na.locf(x2)
Now use rle() to identify runs/breakpoints:
(r <- rle(x2))
## Run Length Encoding
## lengths: int [1:3] 5 3 1
## values : num [1:3] 1 -1 1
The following statement gives (6,9,10), the locations you were interested in (the final value, 10, is one past the end of this 9-element test vector). r$values gives the corresponding information about the direction of each switch.
cumsum(r$lengths)+1

generate increasing sequence of varying length in R

Given n, generate a sequence like this:
0, 0, 1, 0, 1, 2, ........, 0, 1, 2, 3, 4, 5, 6, ....n
Let's say n=3, then the sequence should be:
0, 0, 1, 0, 1, 2, 0, 1, 2, 3
I've tried using rep, but it only generates a fixed length, whereas I need the sequence length to increase each time.
You can use a simple Map with unlist to get the result you want:
n <- 3
unlist(Map(seq, from=0, to=0:n))
# [1] 0 0 1 0 1 2 0 1 2 3
Alternatively, using sequence (adapted from another answer):
n <- 3
sequence(0:(n+1))-1
# [1] 0 0 1 0 1 2 0 1 2 3
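The same construction can be written out a little more explicitly as a sketch: build the block 0:k for each k from 0 to n and concatenate the blocks. The function name grow_seq is hypothetical.

```r
# Hypothetical helper: concatenate the blocks 0:0, 0:1, ..., 0:n
grow_seq <- function(n) unlist(lapply(0:n, function(k) 0:k))

grow_seq(3)
# [1] 0 0 1 0 1 2 0 1 2 3

# The result has 1 + 2 + ... + (n + 1) = (n + 1) * (n + 2) / 2 elements
length(grow_seq(3))
# [1] 10
```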
