Merging 2 Vectors to 1 Vector that satisfies certain criteria - r

I have two vectors that can be written as follows:
aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
I want to merge these vectors so that every element of bb after the position where aa takes the value 1 is set to zero. In this example the result should look like:
cc <- c(0, 2, 0, 0, 3, 0, 0, 0)
What is the fastest and most efficient way to do this in R?

One option combines cummax() with dplyr's lag():
library(dplyr)
ifelse(lag(cummax(aa), default = 0) == 0, bb, aa)
[1] 0 2 0 0 3 0 0 0
Or another way is
bb * !c(0, head(cummax(aa), -1))
[1] 0 2 0 0 3 0 0 0
Or another option
ind <- (which.max(aa) + 1):length(aa)
bb[ind] <- aa[ind]
> bb
[1] 0 2 0 0 3 0 0 0
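The which.max() variant above relies on aa containing a 1 somewhere before its last position: with no 1 at all, which.max(aa) returns 1 and everything after the first element is overwritten, and with the 1 in the last position the index range runs backwards past the end of the vector. A minimal sketch of the cummax() idea wrapped in a function that covers both cases (the name zero_after_first_one is made up for illustration, it is not from the answers above):

```r
aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)

# Hypothetical helper: zero out bb strictly after the first non-zero in aa.
zero_after_first_one <- function(aa, bb) {
  stopifnot(length(aa) == length(bb))
  # cummax(aa) switches to 1 at the first 1 and stays there;
  # shifting it right by one keeps the element at that position untouched.
  after <- c(FALSE, head(cummax(aa), -1) > 0)
  bb[after] <- 0
  bb
}

zero_after_first_one(aa, bb)
# [1] 0 2 0 0 3 0 0 0

# Edge cases: no 1 in aa, and the 1 in the last position, both leave bb as is.
zero_after_first_one(rep(0, 8), bb)
zero_after_first_one(c(rep(0, 7), 1), bb)
```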

This may be overkill for the task, but I find it easier to follow:
library(dplyr)
cc <- tibble(aa, bb) %>%
  group_by(id_group = lag(cumsum(aa == 1), default = 0)) %>%
  mutate(cc = ifelse(id_group == 0, coalesce(bb, aa), coalesce(aa, bb))) %>%
  pull(cc)
output:
[1] 0 2 0 0 3 0 0 0

Related

R Lookback few days and assign new value if old value exists

I have two timeseries vectors as follows -
a <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)
b <- c(1, 0, 1, 0)
I want to look back 7 days and replace only 1's in vectors a and b with 2. It is important to check if there were any values 7 days before replacing.
The expected result is -
a = c(1, 0, 0, 0, 1, 0, 2, 1, 1, 0, 2, 0)
b = c(1, 0, 1, 0) - Since no value existed 7 days ago, nothing changes here.
Thanks!
We can build the condition with dplyr's lag(). A lag of 6 reproduces the expected output (position 7 looks back at position 1):
library(dplyr)
f1 <- function(vec) replace(vec, lag(vec, 6) == 1, 2)
Output:
f1(a)
#[1] 1 0 0 0 1 0 2 1 1 0 2 0
f1(b)
#[1] 1 0 1 0
A base R option is to define a user function f
f <- function(v) replace(v, (ind <- which(v == 1) + 6)[ind <= length(v)], 2)
such that
> f(a)
[1] 1 0 0 0 1 0 2 1 1 0 2 0
> f(b)
[1] 1 0 1 0
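Both answers above pack the index arithmetic into one line. A step-by-step base R sketch of the same idea, assuming the same 6-position offset the answers use (the function name lookback_replace and its arguments k and new are invented for illustration):

```r
a <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)
b <- c(1, 0, 1, 0)

# Hypothetical helper: set v[i] to `new` whenever v[i - k] == 1.
lookback_replace <- function(v, k = 6, new = 2) {
  n <- length(v)
  # Vectors no longer than the lookback have nothing to look back at.
  if (n <= k) return(v)
  # Positions whose value k steps earlier is 1 (computed on the original v).
  hit <- which(v[seq_len(n - k)] == 1) + k
  v[hit] <- new
  v
}

lookback_replace(a)
# [1] 1 0 0 0 1 0 2 1 1 0 2 0
lookback_replace(b)
# [1] 1 0 1 0
```

Returning early for short vectors avoids relying on how lag() or negative-length index ranges behave when the offset exceeds the vector length.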

Conditional formula referring to previous row in DF not working

I'm trying these two methods to solve a problem, but neither works.
Column 3 should continue to be "1" according to the final condition after column 1 changes to 0 from 1.
Method 1:
a <- as.data.frame(c(0,0,0,1,1,1,1,1,0,0,0,0))
b <- as.data.frame(c(0,0,0,0,0,0,0,0,0,0,0,0))
df <- cbind(a,b)
df[1,3] <- 0
df[-1,3] <- ifelse(df[-1,1] == 1 & df[-1,2] == 0, 1, ifelse(df[-1,1] == 1 &
df[-1,2] == 1, 0, df[sum(!is.na(df[,3])),3]))
Method 2:
a <- as.data.frame(c(0,0,0,1,1,1,1,1,0,0,0,0))
b <- as.data.frame(c(0,0,0,0,0,0,0,0,0,0,0,0))
df <- cbind(a,b)
df[1,3] <- 0
ndates <- as.numeric(length(df[,1]))
x <- 1
while (ndates > x - 1){
df[-1,3] <- ifelse(df[-1,1] == 1 & df[-1,2] == 0, 1, ifelse(df[-1,1] == 1
& df[-1,2] == 1, 0, df[sum(!is.na(df[,3])),3]))
x <- x + 1
}
Any help would be appreciated... seems like I'm missing something that is probably quite basic.
Updated answer:
Okay, with a better understanding of what you're trying to accomplish, here's a for-loop version that should be right. Let me know if I'm still missing your intention.
a <- as.data.frame(c(0,0,0,1,1,1,1,1,0,0,0,0))
b <- as.data.frame(c(0,0,0,0,0,0,0,0,0,0,0,0))
df <- cbind(a,b)
df[1,3] <- 0
for (i in 2:nrow(df)) {
  df[i,3] <- ifelse(df[i,1] == 1 & df[i,2] == 0, 1,
                    ifelse(df[i,1] == 1 & df[i,2] == 1, 0, df[i-1,3]))
}
This code loops once through each row and updates based on the rules you gave ([1,0] = 1, [1,1] = 0, otherwise previous row). And this is the resulting output:
> df
   c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0) c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) V3
1                                      0                                     0  0
2                                      0                                     0  0
3                                      0                                     0  0
4                                      1                                     0  1
5                                      1                                     0  1
6                                      1                                     0  1
7                                      1                                     0  1
8                                      1                                     0  1
9                                      0                                     0  1
10                                     0                                     0  1
11                                     0                                     0  1
12                                     0                                     0  1
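The "otherwise take the previous row" rule in the loop is a carry-forward of the last explicitly set value, so it can also be written without a loop. A rough sketch on plain vectors, assuming the same rules as the loop (c1, c2, set and last_set are names chosen here for illustration):

```r
c1 <- c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0)
c2 <- rep(0, 12)

# Value the rules assign directly; NA where the previous row must be carried.
set <- ifelse(c1 == 1, ifelse(c2 == 0, 1, 0), NA)
set[1] <- 0  # the first row is fixed at 0, as in the loop version

# For every row, the index of the most recent row with an explicit value.
last_set <- cummax(ifelse(is.na(set), 0L, seq_along(set)))

c3 <- set[last_set]
c3
# [1] 0 0 0 1 1 1 1 1 1 1 1 1
```

Because set[1] is never NA, last_set is at least 1 everywhere, so the final lookup always finds a value.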
Initial answer:
It might be helpful if you could clarify what you're trying to accomplish. I worked through your first method and it seems to give the expected results. This is my understanding in pseudocode:
if C1,C2 == [1,0]:
    set C3 to 1
else:
    if C1,C2 == [1,1]:
        set C3 to 0
    else:
        set C3 to the val in col=3,row=number of non NA vals in col3
Since column 3 has exactly one non-NA value, the final expression always evaluates to cell [1,3], which is 0. The result is therefore 1 for all the [1,0] cases and 0 otherwise.
What exactly are you trying to accomplish with df[sum(!is.na(df[,3])),3]? This might be a logic error, but it's hard to tell without understanding what you're hoping the outcome will be.

Take value from list and use it as index

I have a matrix (A) like this (the row and column names are identification codes (IDs)):
     1    3   10   38   46
1    0  0.4    0    0    0
3    0    0    0    0    0
10   0    0  0.9  0.8    0
38   0    0    0    0    0
46   0  0.1    0    0    0
And another matrix (B) like this:
              a            b            c
1  2.676651e-04 4.404911e-06 9.604227e-06
3  6.073389e-10 3.273222e-05 3.360321e-04
10 4.156392e-08 1.269607e-06 7.509217e-06
38 4.200699e-08 3.227431e-02 8.286920e-11
46 9.352353e-05 3.318948e-20 8.694981e-06
I would like to get the indices of the elements of A that are > 0, so I used this command:
temp <- apply(A,1, FUN=function(x) which(x>0))
It returned a list with the correct indices of the elements > 0.
Next, I would like to multiply elements of B selected by those indices. In particular, for each row I would like to do something like this:
1: 6.073389e-10*3.273222e-05*3.360321e-04
Here I used the information in A (the second column of the first row holds a value > 0) as the index of the row of B to use for the first row.
For the second row the result is 0, because no element of A[2,] is > 0.
For the third row I would like the same as for the first row, but summing the two products:
10: 4.156392e-08*1.269607e-06*7.509217e-06 +4.200699e-08*3.227431e-02*8.286920e-11
I tried to unlist the list, but that gives a plain vector and loses the correspondence with the IDs.
A <-
  matrix(
    c(0, 0.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.9, 0.8, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 0),
    nrow = 5,
    ncol = 5,
    byrow = TRUE
  )
B <-
  matrix(
    c(
      2.676651e-04, 4.404911e-06, 9.604227e-06,
      6.073389e-10, 3.273222e-05, 3.360321e-04,
      4.156392e-08, 1.269607e-06, 7.509217e-06,
      4.200699e-08, 3.227431e-02, 8.286920e-11,
      9.352353e-05, 3.318948e-20, 8.694981e-06
    ),
    nrow = 5,
    ncol = 3,
    byrow = TRUE
  )
# Row/column positions of every positive entry of A
idx <- which(A > 0, arr.ind = TRUE)
result <- 0
for (i in 1:nrow(idx)) {
  cat(A[idx[i,1],idx[i,2]], sep="\n")
  cat(B[idx[i,2], ], sep="\n")
  result <- result + sum(A[idx[i,1],idx[i,2]] * B[idx[i,2],])
}
cat("result=")
cat(result)
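The loop above accumulates a single overall total. If the goal is instead the per-ID quantity described in the question (for each row of A, the sum over positive columns of the product of the matching row of B), one possible sketch keeps the IDs as dimnames so the correspondence is not lost. The names ids, row_prod and per_id are chosen here for illustration:

```r
ids <- c("1", "3", "10", "38", "46")
A <- matrix(
  c(0, 0.4, 0,   0,   0,
    0, 0,   0,   0,   0,
    0, 0,   0.9, 0.8, 0,
    0, 0,   0,   0,   0,
    0, 0.1, 0,   0,   0),
  nrow = 5, byrow = TRUE, dimnames = list(ids, ids)
)
B <- matrix(
  c(2.676651e-04, 4.404911e-06, 9.604227e-06,
    6.073389e-10, 3.273222e-05, 3.360321e-04,
    4.156392e-08, 1.269607e-06, 7.509217e-06,
    4.200699e-08, 3.227431e-02, 8.286920e-11,
    9.352353e-05, 3.318948e-20, 8.694981e-06),
  nrow = 5, byrow = TRUE, dimnames = list(ids, c("a", "b", "c"))
)

# Product a * b * c for every ID (one value per row of B)
row_prod <- apply(B, 1, prod)

# (A > 0) * 1 is a 0/1 indicator matrix; the matrix product adds up, for each
# row of A, the row products of B at the columns where A is positive.
per_id <- drop(((A > 0) * 1) %*% row_prod)
per_id
```

The result is a vector named by ID: the entry for "3" is 0, and the entry for "10" is the sum of the row products for IDs 10 and 38.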

How to remove only one element in R

In R I have a vector:
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0)
I want to remove only one 0 from x, but my attempt removes every 0 in the vector.
Example:
x <- x[!x %in% 0]
After this, all zeros have been removed from x.
For comparison, in Python:
x = [0,1,0,1,0,0,0,1]
x.remove(0)
x
[1, 0, 1, 0, 0, 0, 1]
x.remove(0)
x
[1, 1, 0, 0, 0, 1]
We can use match to remove the first occurrence of a particular number
x <- c(1, 0, 1, 0, 0, 0, 1)
x[-match(1, x)]
#[1] 0 1 0 0 0 1
The same works for any other value you want to remove, for example 5 in the case below:
x <- c(1, 0, 5, 5, 0, 0, 1)
x[-match(5, x)]
#[1] 1 0 5 0 0 1
You may need which.min(),
which determines the index of the first minimum of a vector:
x <- c(0,1,0,1,0,0,0,1)
x <- x[-which.min(x)]
x
# [1] 1 0 1 0 0 0 1
If your vector contains elements other than 0 or 1, use x <- x[-which.min(x != 0)] instead: x != 0 is FALSE only at the zeros, so which.min() returns the index of the first 0.
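Both approaches assume the value is actually present: match() returns NA when it is not, and which.min(x != 0) returns 1 when there is no zero, so the wrong thing gets removed. A small sketch of a guard, mimicking Python's list.remove() except that it returns the vector unchanged instead of raising an error (remove_first is a hypothetical name):

```r
# Hypothetical helper: drop the first occurrence of `value`, if there is one.
remove_first <- function(x, value) {
  i <- match(value, x)      # index of the first match, NA if absent
  if (is.na(i)) x else x[-i]
}

x <- c(0, 1, 0, 1, 0, 0, 0, 1)
x <- remove_first(x, 0)
x
# [1] 1 0 1 0 0 0 1
x <- remove_first(x, 0)
x
# [1] 1 1 0 0 0 1
remove_first(x, 7)          # value not present: x is returned as is
# [1] 1 1 0 0 0 1
```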

selecting only the 0s and the first 1 from a sequence of many 0s and few 1s in R?

I have a sequence of 0s and 1s in this manner:
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)
I want to keep all the 0s and only the first 1 of each run of 1s.
The results should be:
ans <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1)
What's the fastest way to do this in R?
Use rle() to extract the run lengths and values, do some minor surgery, and then put the run-length encoded vector "back together" using inverse.rle().
rr <- rle(xx)
rr$lengths[rr$values==1] <- 1
inverse.rle(rr)
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
Here's one way:
idx <- which(xx == 1)
pos <- which(diff(c(xx[1], idx)) == 1)
xx[-idx[pos]] # following Frank's suggestion
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
Without rle:
xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
#[1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
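The one-liner above is compact; spelled out, it keeps an element when it is not a 1, or when the element before it is not a 1. A step-by-step sketch of the same mask (prev_is_one and keep are names introduced here):

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

# TRUE where the previous element is a 1; the first element has no predecessor.
prev_is_one <- c(FALSE, head(xx, -1) == 1)

# Keep every non-1, plus each 1 that is not preceded by another 1.
keep <- xx != 1 | !prev_is_one

res <- xx[keep]
res
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
```

Logical indexing also behaves sensibly when nothing needs dropping, which negative indexing with an empty index vector does not.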
Since OP mentioned speed, here's a benchmark:
josh = function(xx) {
rr <- rle(xx)
rr$lengths[rr$values==1] <- 1
inverse.rle(rr)
}
arun = function(xx) {
idx <- which(xx == 1)
pos <- which(diff(c(xx[1], idx)) == 1)
xx[setdiff(seq_along(xx), idx[pos])]
}
eddi = function(xx) {
xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
}
simon = function(xx) {
# The body of the function is supplied in @SimonO101's answer
first1(xx)
}
set.seed(1)
N = 1e6
xx = sample(c(0,1), N, T)
library(microbenchmark)
bm <- microbenchmark(josh(xx), arun(xx), eddi(xx), simon(xx) , times = 25)
print( bm , digits = 2 , order = "median" )
#Unit: milliseconds
# expr min lq median uq max neval
# simon(xx) 20 21 23 26 72 25
# eddi(xx) 97 102 104 118 149 25
# arun(xx) 205 245 253 258 332 25
# josh(xx) 228 268 275 287 365 25
Here's a quick Rcpp solution. Should be fastish (but I've no idea how it will stack up against the others here)...
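Rcpp::cppFunction( 'std::vector<int> first1( IntegerVector x ){
  std::vector<int> out;
  for( IntegerVector::iterator it = x.begin(); it != x.end(); ++it ){
    // Keep every 0, and a 1 only when it starts a run.
    // The begin() check avoids reading before the first element.
    if( *it == 0 || ( *it == 1 && ( it == x.begin() || *(it-1) != 1 ) ) )
      out.push_back(*it);
  }
  return out;
}')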
first1(xx)
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
Even though I'm a staunch supporter of rle, since it's Friday here's an alternative method. I did it for fun, so YMMV.
yy<-paste(xx,collapse='')
zz<-gsub('[1]{1,}','1',yy) #I probably screwed up the regex here
aa<- as.numeric(strsplit(zz,'')[[1]])

Resources