Conditional formula referring to previous row in DF not working - r

I'm trying these two methods to solve a problem, but they don't work.
Column 3 should stay at 1, according to the final (else) condition, after column 1 changes from 1 back to 0.
Method 1:
a <- as.data.frame(c(0,0,0,1,1,1,1,1,0,0,0,0))
b <- as.data.frame(c(0,0,0,0,0,0,0,0,0,0,0,0))
df <- cbind(a,b)
df[1,3] <- 0
df[-1,3] <- ifelse(df[-1,1] == 1 & df[-1,2] == 0, 1, ifelse(df[-1,1] == 1 &
df[-1,2] == 1, 0, df[sum(!is.na(df[,3])),3]))
Method 2:
a <- as.data.frame(c(0,0,0,1,1,1,1,1,0,0,0,0))
b <- as.data.frame(c(0,0,0,0,0,0,0,0,0,0,0,0))
df <- cbind(a,b)
df[1,3] <- 0
ndates <- as.numeric(length(df[,1]))
x <- 1
while (ndates > x - 1){
df[-1,3] <- ifelse(df[-1,1] == 1 & df[-1,2] == 0, 1, ifelse(df[-1,1] == 1
& df[-1,2] == 1, 0, df[sum(!is.na(df[,3])),3]))
x <- x + 1
}
Any help would be appreciated; it seems like I'm missing something that is probably quite basic.

Updated answer:
Okay, with a better understanding of what you're trying to accomplish, here's a for loop version that should be right. Let me know if I'm still missing your intention.
a <- as.data.frame(c(0,0,0,1,1,1,1,1,0,0,0,0))
b <- as.data.frame(c(0,0,0,0,0,0,0,0,0,0,0,0))
df <- cbind(a,b)
# Seed row 1 of column 3 (this also creates column V3, NA in the other rows)
df[1,3] <- 0
# Go row by row so each row can read the value just written to the row above
for (i in 2:nrow(df)) {
  df[i,3] <- ifelse(df[i,1] == 1 & df[i,2] == 0, 1,
             ifelse(df[i,1] == 1 & df[i,2] == 1, 0, df[i-1,3]))
}
This code loops once through each row and updates based on the rules you gave ([1,0] = 1, [1,1] = 0, otherwise previous row). And this is the resulting output:
> df
c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0) c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) V3
1 0 0 0
2 0 0 0
3 0 0 0
4 1 0 1
5 1 0 1
6 1 0 1
7 1 0 1
8 1 0 1
9 0 0 1
10 0 0 1
11 0 0 1
12 0 0 1
>
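If the row-by-row loop gets slow on a large data frame, the same rule can be sketched without a loop: write 1 or 0 where a rule applies, NA where the row should inherit, then carry the last non-NA value forward. This is a minimal base R sketch; the helper name carry_state and its init argument are hypothetical, and it assumes row 1 is seeded with a fixed value, as df[1,3] <- 0 does above.

```r
# Hypothetical helper: vectorized version of the loop's rule
# [1,0] -> 1, [1,1] -> 0, otherwise keep the previous row's value.
carry_state <- function(c1, c2, init = 0) {
  # Value set by a rule, or NA where the row should inherit
  state <- ifelse(c1 == 1 & c2 == 0, 1,
                  ifelse(c1 == 1 & c2 == 1, 0, NA))
  # Seed the first row, mirroring df[1,3] <- 0
  state[1] <- init
  # Position of the most recent non-NA value at each row
  last <- cummax(ifelse(is.na(state), 0L, seq_along(state)))
  state[last]
}

c1 <- c(0,0,0,1,1,1,1,1,0,0,0,0)
c2 <- c(0,0,0,0,0,0,0,0,0,0,0,0)
carry_state(c1, c2)
# [1] 0 0 0 1 1 1 1 1 1 1 1 1
```

zoo::na.locf() is a common alternative for the carry-forward step; the cummax() trick just avoids the extra dependency.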
Initial answer:
It might be helpful if you could clarify what you're trying to accomplish. I worked through your first method and it seems to give the expected results. This is my understanding in pseudocode:
if C1,C2 == [1,0]:
    set C3 to 1
else:
    if C1,C2 == [1,1]:
        set C3 to 0
    else:
        set C3 to the value in column 3, row = number of non-NA values in column 3
Since column 3 has only one non-NA value when the expression is evaluated, the final branch always refers to cell [1,3], which is 0. The whole right-hand side is computed once, before anything is assigned, so every [1,0] row is set to 1 and every other row to 0.
What exactly are you trying to accomplish with df[sum(!is.na(df[,3])),3]? This might be a logic error, but it's hard to tell without understanding what outcome you're hoping for.
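To see why Method 1 never carries a 1 forward, it can help to evaluate the fallback expression on its own. This sketch rebuilds the question's data and shows that the expression is a single value computed before any assignment happens; the name fallback is only for illustration.

```r
a <- as.data.frame(c(0,0,0,1,1,1,1,1,0,0,0,0))
b <- as.data.frame(c(0,0,0,0,0,0,0,0,0,0,0,0))
df <- cbind(a, b)
df[1, 3] <- 0   # creates column 3: 0 in row 1, NA elsewhere

# The "previous row" part of Method 1, evaluated on its own:
fallback <- df[sum(!is.na(df[, 3])), 3]
fallback        # a single value, df[1, 3], i.e. 0

# Method 1 therefore behaves as if every row shared this one fallback
df[-1, 3] <- ifelse(df[-1, 1] == 1 & df[-1, 2] == 0, 1,
                    ifelse(df[-1, 1] == 1 & df[-1, 2] == 1, 0, fallback))
df[, 3]
# [1] 0 0 0 1 1 1 1 1 0 0 0 0
```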

Related

How to change values of R cells (dataframe) based on a condition for specific rows?

I have the following dataframe:
C1 C2 C3
 0  0  0
 1  1  0
 0  0  0
 1  1  0
 0  0  0
I now want to apply the following conditions to specific row indexes only:
C1 should be equal to 0
A random number should be less than 0.5
If both conditions hold, I want to set C1 and C2 in that row to 1; otherwise, do nothing.
I am trying the following (rowsIndex holds the specific row indexes to which I want to apply the conditions):
apply(DF[rowsIndex,], 2, fun)
where fun is:
fun<- function(x) {
ifelse(x==0,ifelse(runif(n=1)<0.5,x <- 1,x),x )
print(x)
}
My questions are:
In my function, how do I apply the conditions to a certain column only, i.e. C1? (I have tried using DF[rowsIndex,c(1)], but it gives an error.)
Is there another approach I can take? This approach doesn't give me any results; the same DF is printed.
Thanks
If you want to stay in base R:
#your dataframe
DF <- data.frame(C1 = c(0, 1, 0, 1, 0),
C2 = c(0, 1, 0, 1, 0),
C3 = c(0, 0, 0, 0, 0))
fun<- function(x) {
if(x[1]==0 & runif(n=1)<0.5) {
x[1:2] <- 1
}
return(x)
}
#your selection of rows you want to process
rowsIndex <- c(1, 2, 3, 4)
#Using MARGIN = 1 applies the function to each row of the dataframe
#apply() returns the processed rows as columns, so t() transposes them back; the result is a matrix
DF_processed <- t(apply(DF[rowsIndex,], 1, fun))
#replace the selected rows in the original DF by the processed rows
DF[rowsIndex, ] <- DF_processed
print(DF)
Something like this?
library(dplyr)
df %>%
mutate(across(c(C1, C2), ~ifelse(C1 == 0 & runif(1) < 0.5, 1, .)))
C1 C2 C3
1 1 0 0
2 1 1 0
3 1 0 0
4 1 1 0
5 1 0 0
Applying it to your function:
fun<- function(df, x, y) {
df %>%
mutate(across(c({{x}}, {{y}}), ~ifelse({{x}} == 0 & runif(1) < 0.5, 1, .)))
}
fun(df, C1, C2)
C1 C2 C3
1 0 0 0
2 1 1 0
3 0 0 0
4 1 1 0
5 0 0 0
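Note that runif(1) draws a single number per column, so in the dplyr versions every row of a column shares one draw, C1 and C2 can get different draws, and rowsIndex is not used. If the intent is an independent draw for each selected row, with C1 and C2 updated together, here is a minimal base R sketch; the seed and the names draws and hit are illustrative assumptions.

```r
set.seed(42)  # illustrative seed, only for reproducibility
DF <- data.frame(C1 = c(0, 1, 0, 1, 0),
                 C2 = c(0, 1, 0, 1, 0),
                 C3 = c(0, 0, 0, 0, 0))
rowsIndex <- c(1, 2, 3, 4)

# One independent draw per selected row
draws <- runif(length(rowsIndex))
# Selected rows where C1 is 0 and that row's draw is below 0.5
hit <- rowsIndex[DF$C1[rowsIndex] == 0 & draws < 0.5]
# Update C1 and C2 together, only in those rows
DF[hit, c("C1", "C2")] <- 1
DF
```

Rows outside rowsIndex (row 5 here) and rows where C1 is already 1 are never touched.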

Merging 2 Vectors to 1 Vector that satisfies certain criteria

I have two vectors that can be written as follows:
aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
I want to merge these vectors so that every element of bb after the first 1 in aa is set to zero. In this example, the result should look like:
cc <- c(0, 2, 0, 0, 3, 0, 0, 0)
What is the fastest and most efficient way to do this in R?
We may do
library(dplyr)
ifelse(lag(cummax(aa), default = 0) == 0, bb, aa)
[1] 0 2 0 0 3 0 0 0
Or another way is
bb * !c(0, head(cummax(aa), -1))
[1] 0 2 0 0 3 0 0 0
Or another option
ind <- (which.max(aa) + 1):length(aa)
bb[ind] <- aa[ind]
> bb
[1] 0 2 0 0 3 0 0 0
This may be overkill for this task, but at least for me it is easier to follow:
library(dplyr)
cc <- tibble(aa,bb) %>%
group_by(id_group=lag(cumsum(aa==1), default = 0)) %>%
mutate(cc = ifelse(id_group == 0, coalesce(bb,aa), coalesce(aa,bb))) %>%
pull(cc)
output:
[1] 0 2 0 0 3 0 0 0
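The which.max() option assumes aa contains at least one 1; if it contains none, which.max(aa) returns 1 and bb[-1] would be overwritten with zeros from aa. A minimal guarded sketch using match(), with a hypothetical function name:

```r
# Hypothetical helper: zero out every element of bb after the first 1 in aa
zero_after_first_one <- function(aa, bb) {
  first <- match(1, aa)          # position of the first 1, or NA if there is none
  if (is.na(first)) return(bb)   # no 1 in aa: leave bb unchanged
  bb[seq_along(bb) > first] <- 0
  bb
}

aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
zero_after_first_one(aa, bb)
# [1] 0 2 0 0 3 0 0 0
```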

R: Look back a few days and assign a new value if an old value exists

I have two time-series vectors, as follows:
a <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)
b <- c(1, 0, 1, 0)
I want to look back 7 days and, wherever the value 7 days back (6 positions earlier, counting both ends) was 1, replace the current value with 2. It is important to check whether any value existed 7 days earlier before replacing.
The expected result is -
a = c(1, 0, 0, 0, 1, 0, 2, 1, 1, 0, 2, 0)
b = c(1, 0, 1, 0) - Since no value existed 7 days ago, nothing changes here.
Thanks!
We can create a condition with lag
library(dplyr)
f1 <- function(vec) replace(vec, lag(vec, 6) == 1, 2)
Output:
f1(a)
#[1] 1 0 0 0 1 0 2 1 1 0 2 0
f1(b)
#[1] 1 0 1 0
A base R option, defining a user function f:
f <- function(v) replace(v, (ind <- which(v == 1) + 6)[ind <= length(v)], 2)
such that
> f(a)
[1] 1 0 0 0 1 0 2 1 1 0 2 0
> f(b)
[1] 1 0 1 0
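Both answers compare each position with the value 6 positions earlier (a 7-day window counting both ends), and positions with nothing that far back are left alone. A base R sketch that makes the offset an explicit parameter; the name lookback_flag and its k and new_value arguments are assumptions for illustration.

```r
# Hypothetical helper: set v[i] to new_value when v[i - k] == 1
lookback_flag <- function(v, k = 6, new_value = 2) {
  n <- length(v)
  if (n <= k) return(v)                    # nothing exists k positions back
  prev <- c(rep(NA, k), v[seq_len(n - k)]) # value k positions earlier (NA if none)
  v[which(prev == 1)] <- new_value
  v
}

a <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)
b <- c(1, 0, 1, 0)
lookback_flag(a)
# [1] 1 0 0 0 1 0 2 1 1 0 2 0
lookback_flag(b)
# [1] 1 0 1 0
```

prev is built from the original v, so a newly written 2 never triggers another replacement.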

Recode a value in a vector based on surrounding values

I'm trying to programmatically change a variable from a 0 to a 1 if there are three 1s before and after a 0.
For example, if the numbers in a vector were 1, 1, 1, 0, 1, 1, and 1, then I want to change the 0 to a 1.
Here is the data, in the column dummy_code of the data.frame original_df:
original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))
Here is how I'm trying to have the values be recoded:
desired_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1))
I tried to use the function fill from the tidyr package, but it fills in missing values, so it won't work. Recoding the 0 values as missing would not work either, because it would simply code every NA as 1, when I only want to code an NA surrounded by three 1s as 1.
Is there an efficient way to do this programmatically?
An rle alternative, using the x from @G. Grothendieck's answer:
r <- rle(x)
Find the indexes of runs of three 1s:
i1 <- which(r$lengths == 3 & r$values == 1)
Check which of these runs are separated by exactly one other run (a run of 0s), and get the index of that 0 run to be replaced:
i2 <- i1[which(diff(i1) == 2)] + 1
Replace relevant 0 with 1:
r$values[i2] <- 1
Reverse the rle operation on the updated runs:
inverse.rle(r)
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
A similar solution based on data.table::rleid, slightly more compact and perhaps easier to read:
library(data.table)
d <- data.table(x)
Calculate the length of each run:
d[ , n := .N, by = rleid(x)]
For values of x that are 0 and whose preceding and following runs (of 1s) both have length 3, set x to 1:
d[x == 0 & shift(n) == 3 & shift(n, type = "lead") == 3, x := 1]
d$x
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
Here is a one-liner using rollapply from zoo:
library(zoo)
rollapply(c(0, 0, 0, x, 0, 0, 0), 7, function(x) if (all(x[-4] == 1)) 1 else x[4])
## [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
Note: Input used was:
x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
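For comparison, a plain base R sketch that checks each 0 directly against the k values on either side, reading only the original vector so one replacement cannot trigger another. The function name and its k argument are hypothetical; like the rollapply version, it requires at least k 1s on each side rather than runs of exactly k.

```r
# Hypothetical helper: set a 0 to 1 when the k values before and after it are all 1
fill_isolated_zero <- function(x, k = 3) {
  n <- length(x)
  out <- x                                   # write to a copy; always read the original x
  for (i in seq_len(n)) {
    if (x[i] == 0 && i > k && i + k <= n &&
        all(x[(i - k):(i - 1)] == 1) &&
        all(x[(i + 1):(i + k)] == 1)) {
      out[i] <- 1
    }
  }
  out
}

original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))
original_df$dummy_code <- fill_isolated_zero(original_df$dummy_code)
original_df$dummy_code
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
```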

How to remove one element from a vector in R

In R, I have a vector:
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0)
I want to remove only one 0 from x, but my attempt removes every 0 in the vector.
For example:
x <- x[!x %in% 0]
removes all the zeros from x.
For example, in Python:
x = [0,1,0,1,0,0,0,1]
x.remove(0)
x
[1, 0, 1, 0, 0, 0, 1]
x.remove(0)
x
[1, 1, 0, 0, 0, 1]
We can use match to remove the first occurrence of a particular number
x <- c(1, 0, 1, 0, 0, 0, 1)
x[-match(1, x)]
#[1] 0 1 0 0 0 1
The same works for any other number you want to remove from the vector, for example 5:
x <- c(1, 0, 5, 5, 0, 0, 1)
x[-match(5, x)]
#[1] 1 0 5 0 0 1
You could also use which.min(), which returns the index of the first minimum of a vector:
x <- c(0,1,0,1,0,0,0,1)
x <- x[-which.min(x)]
x
# [1] 1 0 1 0 0 0 1
If your vector contains elements other than 0 or 1: x <- x[-which.min(x != 0)]
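Unlike Python's list.remove(), which raises a ValueError when the value is missing, both R approaches above assume the value is present: match() returns NA, and which.min(x != 0) returns 1 when there is no 0, which would drop the first element. A small guarded sketch; the name remove_first is hypothetical, and it returns the vector unchanged when the value is absent.

```r
# Hypothetical helper: drop the first occurrence of value from x, if any
remove_first <- function(x, value) {
  i <- match(value, x)       # index of the first match, NA if value is absent
  if (is.na(i)) return(x)    # nothing to remove
  x[-i]
}

x <- c(0, 1, 0, 1, 0, 0, 0, 1)
x <- remove_first(x, 0)
x
# [1] 1 0 1 0 0 0 1
x <- remove_first(x, 0)
x
# [1] 1 1 0 0 0 1
remove_first(c(1, 2, 3), 0)
# [1] 1 2 3
```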
