I would like to find the pattern of a 0 or 1 followed by a 2 that occurs at least three times in a row, and then flag the 2s in this pattern as 1s (everything else becomes 0). For example:
Input:
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)
A function findPattern that finds the pattern:
findPattern(Y)
And Outputs the following:
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0
I have tried the following:
as.numeric(Y == 2 & lead(Y) %in% 1:2)
1. Find 0/1 followed by 2s
findPattern<-function(Y){
as.numeric(Y==2 & (c(NA,Y[-length(Y)])==0 |c(NA,Y[-length(Y)])==1 ))
}
I add an NA at the start and remove the last item, which "shifts" the vector by one position while keeping the same length. This way you avoid for loops.
If you want to use %in%, which avoids a second pass over the shifted vector:
findPattern<-function(Y){
as.numeric(Y==2 & (c(NA,Y[-length(Y)]) %in% c(0,1)))
}
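To make the shift concrete, here is a minimal sketch on a toy vector (the input and the names prev, flag_eq and flag_in are made up for illustration). It also shows one practical difference between the two versions: with ==, the NA at position 1 can leak into the result, whereas %in% treats NA as "no match".

```r
# Toy input, chosen only for illustration (it starts with a 2 on purpose)
Y <- c(2, 1, 2, 3, 2)

# Previous element of Y; the first position has no predecessor, hence NA
prev <- c(NA, Y[-length(Y)])

# With ==, the NA propagates into the result when Y[1] is 2
flag_eq <- as.numeric(Y == 2 & (prev == 0 | prev == 1))

# With %in%, NA simply does not match, so the result contains no NA
flag_in <- as.numeric(Y == 2 & prev %in% c(0, 1))

print(prev)     # NA 2 1 2 3
print(flag_eq)  # NA 0 1 0 0
print(flag_in)  # 0 0 1 0 0
```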
2. Select only those that have at least three 1s every other position
findPattern<-function(Y){
w <- which(Y==2 & (c(NA,Y[-length(Y)]) %in% c(0,1)))
centers<- w[((w - 2) %in% w) & ((w+2) %in% w)]
result<-rep(0, times = length(Y))
result[c(centers,centers-2,centers+2)]<-1
return(result)
}
Testing:
findPattern(c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1))
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0
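If the minimum number of repetitions needs to be a parameter, one possible variation is sketched below. This is not the code above but a different technique: it groups flagged positions that are exactly two apart and keeps groups of at least k members. The name findPatternK and the argument k are hypothetical.

```r
# Sketch: flag 2s preceded by 0/1 that repeat every other position at least k times
findPatternK <- function(Y, k = 3) {
  result <- rep(0, times = length(Y))
  # Positions of 2s whose predecessor is 0 or 1
  w <- which(Y == 2 & c(NA, Y[-length(Y)]) %in% c(0, 1))
  if (length(w) == 0) return(result)
  # A new group starts whenever the gap to the previous hit is not exactly 2
  grp <- cumsum(c(TRUE, diff(w) != 2))
  # Keep only hits that belong to a group with at least k members
  keep <- w[ave(seq_along(w), grp, FUN = length) >= k]
  result[keep] <- 1
  result
}

Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)
print(findPatternK(Y))         # same flags as findPattern(Y) for k = 3
print(which(findPatternK(Y, k = 1) == 1))
```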
Here is a possible approach that combines a vectorised comparison with a regular expression to find the pattern.
Starting vector:
> Y
[1] 0 2 0 3 2 5 2 1 2 0 2 1 2 0 1
1) Find out all the 2s preceded by 0 or 1;
> ind <- as.integer(lag(Y %in% c(0, 1)) & (Y == 2) )
> ind
[1] 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0
2) Paste the resulting vector into a string and use a regular expression to find the location and length of the required pattern, i.e., 0 and 1 alternating three or more times;
> id <- gregexpr("(01){3,}", paste0(ind, collapse = ""))
> id
[[1]]
[1] 8
attr(,"match.length")
[1] 6
attr(,"useBytes")
[1] TRUE
3) Extract the location and length from the regular expression result and convert them into an index vector;
> start <- as.numeric(id[[1]])
> end <- start + attr(id[[1]], "match.length") - 1
> indArray <- unlist(Map(`:`, start, end))
> indArray
[1] 8 9 10 11 12 13
4) Set to 0 every value outside a 01 pattern that repeats at least three times
> ind[-indArray] <- 0
> ind
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0
Wrap them into a function:
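library(dplyr)
findPattern <- function(Y) {
  # Flag each 2 preceded by a 0 or 1; default = FALSE avoids an NA at position 1
  ind <- as.integer(lag(Y %in% c(0, 1), default = FALSE) & (Y == 2))
  id <- gregexpr("(01){3,}", paste0(ind, collapse = ""))
  start <- as.numeric(id[[1]])
  # gregexpr returns -1 when nothing matches, so no position qualifies
  if (start[1] == -1) return(rep(0L, length(Y)))
  end <- start + attr(id[[1]], "match.length") - 1
  indArray <- unlist(Map(`:`, start, end))
  ind[-indArray] <- 0
  ind
}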
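The regular-expression step can be checked on its own in base R. The sketch below reuses the indicator vector from step 1 and also shows what gregexpr returns when nothing matches, which is the case any wrapper has to guard against (the names s, m, len and no_match are illustrative).

```r
# Indicator vector from step 1 above
ind <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0)

# Collapse to a single string so a regex can see runs of "01"
s <- paste0(ind, collapse = "")

# Find "01" repeated three or more times
m <- gregexpr("(01){3,}", s)[[1]]
start <- as.integer(m)
len <- attr(m, "match.length")
print(start)  # 8
print(len)    # 6

# With no qualifying run, gregexpr reports -1 instead of a position
no_match <- as.integer(gregexpr("(01){3,}", "0100")[[1]])
print(no_match)  # -1
```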
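Using the stringi package:
library(stringi)
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1)
# 1 where the element is a 2, 0 elsewhere
matchVec = stri_count(Y,fixed=2)
# Flags every 2 from the third 2 onward; the preceding value is not checked
remapVec = as.integer(matchVec & (cumsum(matchVec)>=3))
remapVec
#[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0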
Related
I am tasked to create the vector
0 1 0 1 0 1 0 1 0 1
using two approaches without using c() or rep() in R.
I have tried a bunch of methods, but none of them seem to work.
Here are some of my attempts (all of which have failed) -
vector(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
a<-seq(from = 0, to = 1 , by = 1)
a
replicate(5, a)
b<-1*(0:1)
do.call(cbind, replicate(5, b, simplify=FALSE))
Any help on this would be appreciated! Thank you.
We can use bitwAnd
> bitwAnd(0:9, 1)
[1] 0 1 0 1 0 1 0 1 0 1
or kronecker
> kronecker(as.vector(matrix(1, 5)), 0:1)
[1] 0 1 0 1 0 1 0 1 0 1
> kronecker((1:5)^0, 0:1)
[1] 0 1 0 1 0 1 0 1 0 1
or outer
> as.vector(outer(0:1, (1:5)^0))
[1] 0 1 0 1 0 1 0 1 0 1
Solution 1: Generalized Function my_rep()
A generalized solution my_rep() for any vector x you wish repeated n times
my_rep <- function(x, n) {
return(
# Use modulo '%%' to subscript the original vector (whose length I'll call "m"), by
# cycling 'n' times through its indices.
x[0:(length(x) * n - 1) %% length(x) + 1]
# 1 2 ... m 1 2 ... m 1 2 ... m
# | 1st cycle | | 2nd cycle | ... | nth cycle |
)
}
which can solve this case
my_rep(x = 0:1, n = 5)
# [1] 0 1 0 1 0 1 0 1 0 1
and many others
# Getting cute, to make a vector of strings without using 'c()'.
str_vec <- strsplit("a b ", split = " ")[[1]]
str_vec
# [1] "a" "b"
my_rep(x = str_vec, n = 3)
# [1] "a" "b" "a" "b" "a" "b"
Solution 2: Binary Vector of Arbitrary Length
Another quick solution, for a 0 1 0 1 ... 0 1 vector of arbitrary length l
# Whatever length you desire.
l <- 10
# Generate a vector of alternating 0s and 1s, of length 'l'.
(1:l - 1) %% 2
which yields the output:
[1] 0 1 0 1 0 1 0 1 0 1
Note
Special thanks to @Adam, who figured out 0:9 %% 2 on their own, shortly after my comment with that same solution; and who gracefully retracted their initial answer in favor of mine. :)
Exploiting boolean coercion.
+(1:10*c(-1, 1) > 0)
# [1] 0 1 0 1 0 1 0 1 0 1
Or without c().
+(1:10*(0:1*2) - 1 > 0)
# [1] 0 1 0 1 0 1 0 1 0 1
Here is a way using the apply functions.
unlist(lapply(1:5, function(x) 0:1))
# [1] 0 1 0 1 0 1 0 1 0 1
Similar but with replicate.
as.vector(replicate(5, 0:1))
# [1] 0 1 0 1 0 1 0 1 0 1
And just in case you love trig.
abs(as.integer(cos((1:10 * pi) / 2)))
# [1] 0 1 0 1 0 1 0 1 0 1
And here is one last one that I consider cheating just because. This one generalizes to any vector you want!
unlist(unname(read.table(textConnection("0 1 0 1 0 1 0 1 0 1"))))
We can use purrr::accumulate, and a simple negate(!) operation.
accumulate will perform the same operation recursively over its data argument and output all intermediate results.
In this case, it can be broken down into:
output[1] <-0
output[2] <-!output[1]
output[3] <-!output[2]
...
the output would then be c(0, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE), which is coerced to numeric.
purrr::accumulate(0:9, ~!.x)
[1] 0 1 0 1 0 1 0 1 0 1
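For comparison, the same recursion can be sketched in base R with Reduce and accumulate = TRUE. This is an alternative to the purrr call above, not part of it; the names alt and prev are illustrative, and the argument i is ignored because only the previous result matters.

```r
# Start from 0 and negate the previous result nine times, keeping every step
alt <- Reduce(function(prev, i) !prev, x = 1:9, init = 0, accumulate = TRUE)

# The steps are 0, TRUE, FALSE, ...; make the numeric coercion explicit
alt <- as.numeric(alt)
print(alt)  # 0 1 0 1 0 1 0 1 0 1
```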
First, make a list of the given numbers, then apply unlist() to the list to convert it into a vector, as shown in the code below:
my_list = list(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
v = unlist(my_list)
print(v)
[1] 0 1 0 1 0 1 0 1 0 1
I have a vector in which I want to randomly replace every 2 by 0 or 1, with probability 0.4 of drawing a 1. I used the code below. I expected each 2 to get its own random value (0 or 1), but instead all the 2s are replaced by the same value.
vec<-c(rep(2,18),1,0)
ifelse(vec==2,rbinom(1,1,0.40),vec)
here is one output
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
and another output
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
When you go into the source code of ifelse via typing View(ifelse), you will see a piece of code showing that
len <- length(ans)
ypos <- which(test)
npos <- which(!test)
if (length(ypos) > 0L)
ans[ypos] <- rep(yes, length.out = len)[ypos]
if (length(npos) > 0L)
ans[npos] <- rep(no, length.out = len)[npos]
ans
That means that when you supply a single value for yes or no in ifelse, that single value is repeated len times and placed at the corresponding logical positions.
In your case, rbinom(1,1,0.40) is evaluated once and yields a single value for yes, so that one realization is repeated for every 2.
One workaround is like below
> ifelse(vec == 2, rbinom(sum(vec == 2), 1, 0.40), vec)
[1] 1 1 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 0
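A small sketch makes the recycling visible. It assumes nothing beyond base R; the names single and many are illustrative, and drawing length(vec) values is one way (not the only one) to give every position its own draw regardless of where the 2s sit.

```r
vec <- c(rep(2, 18), 1, 0)

# One draw, recycled: every 2 receives the same value
single <- ifelse(vec == 2, rbinom(1, 1, 0.40), vec)
print(length(unique(single[vec == 2])))  # 1

# One draw per position: each 2 gets its own 0 or 1
many <- ifelse(vec == 2, rbinom(length(vec), 1, 0.40), vec)
print(many)

# Elements that were not 2 are left untouched in both cases
print(many[19:20])  # 1 0
```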
This replaces all 2 values with either 0 or 1
vec[vec == 2] <- rbinom(sum(vec == 2), 1, prob = .4)
If you draw a 0 and want the value to remain 2 then you could use sample, which would be equivalent to a binomial draw:
vec[vec == 2] <- sample(c(1, 2), sum(vec == 2), prob = c(0.4, 0.6), replace = TRUE)
Try the following code. Note that it replaces every second position of vec, not the elements equal to 2:
#Code
vec<-c(rep(2,18),1,0)
vec2 <- unlist(lapply(seq(2,length(vec),by=2), function(x) {vec[x] <- rbinom(1,1,0.40)}))
vec[seq(2,length(vec),by=2)] <-vec2
Output:
vec
[1] 2 0 2 0 2 1 2 0 2 0 2 0 2 1 2 0 2 0 1 1
I have a data table and one of the columns is a bunch of 0's and 1's, just like vec below.
vec = c(rep(1, times = 6), rep(0, times = 10), rep(1, times = 11), rep(0, times = 4))
> vec
[1] 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
What I want to do is to split the data every time there's a change in that column from 0 to 1 or vice versa. Here is what I have done so far:
b = c(vec[1],diff(vec))
rowby = numeric(0)
for (i in 2:(length(b))) {
if (b[i] != 0) {
rowby <- c(rowby, i-1)
}
}
splitted_data <- split(vec, cumsum(c(TRUE,(1:length(vec) %in% rowby)[-length(vec)])))
There must be something right under my nose that I can't see. What is a correct way to do this? This works for the example above, but not in general.
Try
split(vec,cumsum(c(1, abs(diff(vec)))))
#$`1`
#[1] 1 1 1 1 1 1
#$`2`
#[1] 0 0 0 0 0 0 0 0 0 0
#$`3`
#[1] 1 1 1 1 1 1 1 1 1 1 1
#$`4`
#[1] 0 0 0 0
Or use rle
split(vec,inverse.rle(within.list(rle(vec), values <- seq_along(values))))
With current versions of data.table, rleid is one function which can be used for this job:
library(data.table)#v1.9.5+
split(vec,rleid(vec))
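The idea shared by these answers is to build a run identifier that increases by one at every change. A base-R sketch of that identifier, written with a comparison instead of diff so that the group labels stay consecutive whatever the values are, might look like this (the toy vector and the names run_id and parts are illustrative):

```r
# Toy column, not restricted to 0/1
vec <- c(1, 1, 0, 0, 0, 1, 5, 5)

# TRUE wherever an element differs from its predecessor; the first element always starts a run
run_id <- cumsum(c(TRUE, vec[-1] != vec[-length(vec)]))
print(run_id)  # 1 1 2 2 2 3 4 4

# Split on the run identifier
parts <- split(vec, run_id)
print(lengths(parts))
```

The same run_id can be passed to split() on a whole data frame to split its rows rather than just the column.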
So I have a list of coordinates that I perform a chull on.
X <- matrix(stats::rnorm(100), ncol = 2)
hpts <- chull(X)
chull would return something like "[1] 1 3 44 16 43 9 31 41". I then want to multiply X by another vector so that only the values of X that are in the result set of chull are kept. So for example [-2.1582511,-2.1761699,-0.5796294]*[1,0,1,...] = [-2.1582511,0,-0.5796294,...] would be the result. I just don't know how to populate the second vector correctly.
Y <- matrix(0, ncol = 1,nrow=50) #create a vector with nothing
# how do I fill vector y with a 1 or 0 based on the results from chull what do I do next?
X[,1] * Y
X[,2] * Y
Thanks,
To return only the values of X that are in the result set of hpts, use
> X[hpts]
## [1] 2.1186262 0.5038656 -0.4360200 -0.8511972 -2.6542077 -0.3451074 1.0771153
## [8] 2.2306497
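Read it as "the elements of X at the positions given by hpts". Because X is a matrix, X[hpts] indexes it as one long vector, so here it returns first-column values; X[hpts, ] returns both coordinates of each hull point.
Of course, these values of X differ from yours, because rnorm gave me different random numbers.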
To get a vector of 1s and 0s signifying results use
> Y <- ifelse(X[,1] %in% X[hpts], 1, 0)
> Y
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0
## [44] 0 1 0 0 1 0 1
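An alternative sketch builds the indicator from row positions instead of matching values, which sidesteps any accidental match between equal coordinates. It assumes only base R (chull lives in grDevices); the seed is arbitrary and the results will differ from the numbers shown above.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
X <- matrix(stats::rnorm(100), ncol = 2)
hpts <- grDevices::chull(X)

# 1 for rows on the convex hull, 0 for all other rows
Y <- as.integer(seq_len(nrow(X)) %in% hpts)

# Zero out every row that is not a hull point (Y is recycled down each column)
masked <- X * Y
print(sum(Y))  # number of hull points
```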
I am attempting to reformat the data set my.data to obtain the output shown below the my.data2 statement. Specifically, I want to put the last 4 columns of my.data on one line per record.id, where the last four columns of my.data will occupy columns 2-5 of the new data matrix if group=1 and columns 6-9 if group=2.
I wrote the cumbersome code below, but the double for-loop is causing an error that I simply cannot locate. Even if the double for-loop worked, I suspect there is a much more efficient way of accomplishing the same thing - (maybe reshape?)
Thank you for any help correcting the double for-loop or with more efficient code.
my.data <- "record.id group s1 s2 s3 s4
1 1 2 0 1 3
1 2 0 0 0 12
2 1 0 0 0 0
3 1 10 0 0 0
4 1 1 0 0 0
4 2 0 0 0 0
8 2 0 2 2 0
9 1 0 0 0 0
9 2 0 0 0 0"
my.data2 <- read.table(textConnection(my.data), header=T)
# desired output
#
# 1 2 0 1 3 0 0 0 12
# 2 0 0 0 0 0 0 0 0
# 3 10 0 0 0 0 0 0 0
# 4 1 0 0 0 0 0 0 0
# 8 0 0 0 0 0 2 2 0
# 9 0 0 0 0 0 0 0 0
Code:
dat_sorted <- sort(unique(my.data2[,1]))
my.seq <- match(my.data2[,1],dat_sorted)
my.data3 <- cbind(my.seq, my.data2)
group.min <- tapply(my.data3$group, my.data3$my.seq, min)
group.max <- tapply(my.data3$group, my.data3$my.seq, max)
# my.min <- group.min[my.data3[,1]]
# my.max <- group.max[my.data3[,1]]
my.records <- matrix(0, nrow=length(unique(my.data3$record.id)), ncol=9)
x <- 1
for(i in 1:max(my.data3$my.seq)) {
for(j in group.min[i]:group.max[i]) {
if(my.data3[x,1] == i) my.records[i,1] = i
# the two lines below seem to be causing an error
if((my.data3[x,1] == i) & (my.data3[x,3] == 1)) (my.records[i,2:5] = my.data3[x,4:7])
if((my.data3[x,1] == i) & (my.data3[x,3] == 2)) (my.records[i,6:9] = my.data3[x,4:7])
x <- x + 1
}
}
You are right, reshape helps here.
library(reshape2)
m <- melt(my.data2, id.var = c("record.id", "group"))
dcast(m, record.id ~ group + variable, fill = 0)
record.id 1_s1 1_s2 1_s3 1_s4 2_s1 2_s2 2_s3 2_s4
1 1 2 0 1 3 0 0 0 12
2 2 0 0 0 0 0 0 0 0
3 3 10 0 0 0 0 0 0 0
4 4 1 0 0 0 0 0 0 0
5 8 0 0 0 0 0 2 2 0
6 9 0 0 0 0 0 0 0 0
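If adding a package is not an option, a similar result can be sketched with base R's reshape(). This is an alternative to the reshape2 call above, not a description of it; the column names it produces (s1.1, ..., s4.2) differ from the dcast output, and missing groups come back as NA, so they are filled with 0 afterwards.

```r
my.data <- "record.id group s1 s2 s3 s4
1 1 2 0 1 3
1 2 0 0 0 12
2 1 0 0 0 0
3 1 10 0 0 0
4 1 1 0 0 0
4 2 0 0 0 0
8 2 0 2 2 0
9 1 0 0 0 0
9 2 0 0 0 0"
my.data2 <- read.table(text = my.data, header = TRUE)

# One row per record.id, one block of s1..s4 columns per group
wide <- reshape(my.data2, idvar = "record.id", timevar = "group",
                direction = "wide")

# record.id/group combinations absent from the input are NA; use 0 instead
wide[is.na(wide)] <- 0
print(wide)
```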
Comparison:
dfTest <- data.frame(record.id = rep(1:10e5, each = 2), group = 1:2,
s1 = sample(1:10, 10e5 * 2, replace = TRUE),
s2 = sample(1:10, 10e5 * 2, replace = TRUE),
s3 = sample(1:10, 10e5 * 2, replace = TRUE),
s4 = sample(1:10, 10e5 * 2, replace = TRUE))
system.time({
...# Your code
})
Error in my.records[i, 1] = i : incorrect number of subscripts on matrix
Timing stopped at: 41.61 0.36 42.56
system.time({m <- melt(dfTest, id.var = c("record.id", "group"))
dcast(m, record.id ~ group + variable, fill = 0)})
user system elapsed
25.04 2.78 28.72
Julius' answer is better, but for completeness, I think I managed to get the following for-loop to work:
dat_x <- (unique(my.data2[,1]))
my.seq <- match(my.data2[,1],dat_x)
my.data3 <- as.data.frame(cbind(my.seq, my.data2))
my.records <- matrix(0, nrow=length(unique(my.data3$record.id)), ncol=9)
my.records <- as.data.frame(my.records)
my.records[,1] = unique(my.data3[,2])
for(i in seq_len(nrow(my.data3))) {
if(my.data3[i,3] == 1) (my.records[my.data3[i,1],c(2:5)] = my.data3[i,c(4:7)])
if(my.data3[i,3] == 2) (my.records[my.data3[i,1],c(6:9)] = my.data3[i,c(4:7)])
}