For example, I have data like this:
x <- c(0,0,1,1,1,1,0,0,1,1,0,1,1,1)
I want to find the longest run of 1s and get its start and end positions; in this case the result should be (3, 6).
How can I do this in R?
Thanks, all.
Here's an approach that uses seqle from the "cgwtools" package:
library(cgwtools)
y <- seqle(which(x == 1))
z <- which.max(y$lengths)
y$values[z] + (sequence(y$lengths[z]) - 1)
# [1] 3 4 5 6
You can use range if you just want the "3" and "6".
seqle "extends rle to find and encode linear sequences".
Here's the answer as a function:
longSeq <- function(invec, range = TRUE) {
  require(cgwtools)
  y <- seqle(which(invec == 1))
  z <- which.max(y$lengths)
  out <- y$values[z] + (sequence(y$lengths[z]) - 1)
  if (isTRUE(range)) range(out) else out
}
Usage would be:
x <- c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1)
longSeq(x)
# [1] 3 6
longSeq(x, range = FALSE)
# [1] 3 4 5 6
And, with KFB's example input:
y <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
longSeq(y)
# [1] 9 11
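You can also do this easily in base R, using a combination of rle and inverse.rle.
Creating the function
longSeq2 <- function(x, range = TRUE){
  temp <- rle(x == 1)
  ## keep only runs of 1s whose length equals the longest run of 1s
  ## (without `temp$values &`, an equally long run of 0s would also be marked)
  temp$values <- temp$values & temp$lengths == max(temp$lengths[temp$values])
  temp <- which(inverse.rle(temp))
  if (isTRUE(range)) range(temp) else temp
}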
Testing
x <- c(0,0,1,1,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1)
longSeq2(x)
## [1] 3 6
longSeq2(x, range = FALSE)
## [1] 3 4 5 6
y <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
longSeq2(y)
## [1] 9 11
longSeq2(y, range = FALSE)
## [1] 9 10 11
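For comparison, here is a minimal base R sketch that reads the start and end positions straight from the rle output through cumsum, instead of rebuilding the full vector with inverse.rle. The function name longest_run and its value argument are made up for illustration. It assumes the input has no NA values, and if several runs tie for the longest it returns only the first one.

```r
## Hypothetical helper (not from the answers above): start and end of the
## first longest run of `value` in `x`. Assumes `x` contains no NAs.
longest_run <- function(x, value = 1) {
  r <- rle(x == value)
  ends <- cumsum(r$lengths)        # last position of every run
  starts <- ends - r$lengths + 1   # first position of every run
  is_run <- r$values               # TRUE for runs made of `value`
  if (!any(is_run)) return(integer(0))
  ## index (among all runs) of the first longest run of `value`
  k <- which(is_run)[which.max(r$lengths[is_run])]
  c(start = starts[k], end = ends[k])
}

x <- c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1)
longest_run(x)
```

For the x above this gives start 3 and end 6, matching the result asked for in the question.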
The sample data is as follows
ID <- c(1, 2, 3)
O1D1 <- c(0, 0, 0)
O1D2 <- c(0, 0, 0)
O1D3 <- c(0, 10, 0)
O2D1 <- c(0, 0, 0)
O2D2 <- c(0, 0, 0)
O2D3 <- c(18, 0, 17)
O3D1 <- c(0, 9, 0)
O3D2 <- c(20, 1, 22)
O3D3 <- c(0, 0, 0)
x <- data.frame(ID, O1D1, O1D2, O1D3, O2D1, O2D2, O2D3, O3D1, O3D2, O3D3)
I created a new column with some conditional logic.
Say, the new column is n
x$n <- (x$O1D3 > 0 & x$O2D3 == 0)
> x$n
[1] FALSE TRUE FALSE
What I am looking to get instead is a column with values such as
> x$n
[1] 0 10 0
Or, in other words, the values of O1D3 should replace TRUE values in the n column and the FALSE values can be replaced with 0.
Thanks for your time and help.
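One possible way to get that column, sketched here with base R's ifelse: where the condition holds, take the value from O1D3, otherwise use 0. Only the two columns involved in the condition are rebuilt below to keep the sketch self-contained.

```r
## Minimal sketch: only the columns used by the condition are recreated here
x <- data.frame(ID = c(1, 2, 3),
                O1D3 = c(0, 10, 0),
                O2D3 = c(18, 0, 17))

## take O1D3 where the condition is TRUE, 0 elsewhere
x$n <- ifelse(x$O1D3 > 0 & x$O2D3 == 0, x$O1D3, 0)
x$n
# [1]  0 10  0
```

Because the condition is a logical vector, multiplying it by the column, as in x$O1D3 * (x$O1D3 > 0 & x$O2D3 == 0), gives the same result when O1D3 has no missing values.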
I have made a matrix with values 1 and 0, and I want to check if there is one or more rows identical to (0, 0, 0, 0, 0, 0, 0, 0, 0, 0).
How can I do this?
Here's my code so far for making the matrix:
moeda <- c(0, 1)
n <- 100
casosTotais <- 0
casosFav <- 0
caras <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0) ## the vector to compare with
matriz <- matrix(nrow = n, ncol = 10)
i <- 1
lin <- 1
col <- 1
while (i <= n * 10) {
  matriz[lin, col] <- sample(moeda, 1)
  if (col == 10) {
    lin <- lin + 1
    col <- col - 10
  }
  i <- i + 1
  col <- col + 1
}
matriz
I will first assume a general caras with zeros and ones:
## a vector of TRUE/FALSE; TRUE means a row of `matriz` is identical to `caras`
comp <- colSums(abs(t(matriz) - caras)) == 0
Then, if caras is simply a vector of zeros:
## a vector of TRUE/FALSE; TRUE means a row of `matriz` only contains zeros
comp <- rowSums(matriz) == 0
If you want to summarize the comparison:
To know which rows of matriz are identical to caras, do which(comp).
To know if any row of matriz is identical to caras, do any(comp).
To know how many rows of matriz are identical to caras, do sum(comp).
Note: You can generate this random matrix using:
## an n x 10 random matrix of zeros and ones
matriz <- matrix(rbinom(n * 10, size = 1, prob = 0.5), ncol = 10)
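To see the comparison at work without randomness, here is a small sketch on a hand-built matrix (the three rows are made up for illustration) in which rows 1 and 3 are all zeros:

```r
caras <- rep(0, 10)

## illustrative 3 x 10 matrix: rows 1 and 3 match `caras`, row 2 does not
matriz <- rbind(rep(0, 10),
                c(1, rep(0, 9)),
                rep(0, 10))

## t(matriz) puts each row in a column, so `caras` is recycled down each one
comp <- colSums(abs(t(matriz) - caras)) == 0

which(comp)  # [1] 1 3
any(comp)    # [1] TRUE
sum(comp)    # [1] 2
```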
I would like to determine whether or not the ranges of min and max values cross zero (0 = crossing zero, 1 = not crossing zero).
min <- c(0, -1, -1, 1, 1)
max <- c(1, 1, -0.1, 3, 1.5)
answer <- c(0, 0, 1, 1, 1)
data <- cbind(min,max, answer)
You can use the between function from dplyr:
library(dplyr)
min <- c(0, -1, -1, 1, 1)
max <- c(1, 1, -0.1, 3, 1.5)
df1 = data.frame(min,max) %>%
  rowwise() %>%
  mutate(answer = as.numeric(!between(0,min,max)))
Or using base R:
df1 = data.frame(min,max)
df1$answer = apply(df1, 1, function(x) as.numeric(!(x[1]<= 0 & x[2] >=0)))
Base R vectorised answer -
transform(data, answer = as.integer(!(min <= 0 & max >= 0)))
#   min  max answer
# 1   0  1.0      0
# 2  -1  1.0      0
# 3  -1 -0.1      1
# 4   1  3.0      1
# 5   1  1.5      1
If you prefer dplyr the same can be written as -
library(dplyr)
data %>% mutate(answer = as.integer(!(min <= 0 & max >= 0)))
data
min <- c(0, -1, -1, 1, 1)
max <- c(1, 1, -0.1, 3, 1.5)
data <- data.frame(min,max)
You can simply multiply them: given min <= max, the range crosses zero exactly when the product is negative or one of the two bounds is zero.
answer <- ifelse(min * max <= 0, 0, 1)
or
answer <- as.integer(min * max > 0)
# [1] 0 0 1 1 1
If the 0/1 coding is not a requirement, a logical vector is even shorter (note that TRUE here means the range crosses zero):
answer <- min * max <= 0
# [1] TRUE TRUE FALSE FALSE FALSE
Say I have some sequence consisting of 2 numbers:
seq <- c(0, 1, 1, 1, 0, 0)
Assume I'd want to plot this into a graph in the following way:
My graph (x, y) starts in (0, 0) and has one straight line to (1, 0).
Then, the sequence comes in action:
If the number is a 0, I turn left and move 1 unit; if the number is a 1, I turn right and move 1 unit.
So for the example sequence, I start with:
(0, 0) -> (1, 0) -> (1, 1) -> (1, 2) -> (1, 1) -> (1, 0) etc.
It's better to draw this if you want a good idea of what I mean with turning left and right.
How would I get these points into a plot? Any tips?
Plot example of the sequence:
x = c(0, 1, 1, 1, 0, 0)
m = cbind(x = c(0, 1),
          y = c(0, 0))
flag_xy = 1 #Track whether to add to x- or y- coordinate
for (i in x){
  flag_direction = diff(tail(m, 2)) #Track which way the line is facing
  if (i == 0){
    if (flag_xy == 1){
      m = rbind(m, tail(m, 1) + c(0, flag_direction[,1] * 1))
    } else{
      m = rbind(m, tail(m, 1) + c(flag_direction[,2] * -1, 0))
    }
    flag_xy = flag_xy * -1
  } else{
    if (flag_xy == 1){
      m = rbind(m, tail(m, 1) + c(0, flag_direction[,1] * -1))
    } else{
      m = rbind(m, tail(m, 1) + c(flag_direction[,2]* 1, 0))
    }
    flag_xy = flag_xy * -1
  }
}
graphics.off()
plot(m, asp = 1)
lines(m)
m
#      x  y
#      0  0
#      1  0
# [2,] 1  1
# [2,] 2  1
# [2,] 2  0
# [2,] 1  0
# [2,] 1 -1
# [2,] 2 -1
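The same path can also be sketched without a loop by treating each heading as a complex number. This swaps in a different technique from the loop above: a left turn multiplies the heading by 1i, a right turn by -1i, and the points are the running sum of the headings. The variable names (turns, rot, headings, pos, path) are illustrative.

```r
turns <- c(0, 1, 1, 1, 0, 0)

## 0 = turn left (multiply heading by i), 1 = turn right (multiply by -i)
rot <- ifelse(turns == 0, 1i, -1i)

## the first segment heads east (1 + 0i); each turn rotates the current heading
headings <- c(1 + 0i, cumprod(rot))

## start at the origin and accumulate unit steps along each heading
pos <- cumsum(c(0 + 0i, headings))
path <- cbind(x = Re(pos), y = Im(pos))

plot(path, asp = 1, type = "b")
```

For the sequence above, path holds the same eight points as the matrix m printed earlier.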
Given two vectors of integers:
X <- c(0, 201, 0, 0, 160, 0, 0, 0, 15, 80)
Y <- c(0, 0, 0, 0, 1, 4, 42, 10, 19, 0)
I want to calculate the probability p1 = P(X10 > X11), where X10 is a variable with a conditional distribution of X given that Y = 0, and X11 is a variable with a conditional distribution of X given that Y > 0. (This problem is motivated by a desire to implement equation 8 from RS Pimentel et al. 2015, Stat Prob Lett 96:61-67.)
For a single pair of vectors, I can simply calculate:
N <- length(X)
X10 <- X
X10[Y > 0] <- 0
X11 <- X
X11[Y == 0] <- 0
p1 <- sum(X10 > X11) / N
However, I now want to calculate p1 for all pairs of columns in an integer matrix:
Z <- c(0, 0, 0, 0, 0, 1, 0, 1, 8, 0)
matrix(c(X, Y, Z), ncol = 3)
I am not interested in the diagonal.
The desired output is therefore:
     [,1] [,2] [,3]
[1,]       0.2  0.3
[2,]            0.2
[3,]
How can I write a function that will calculate p1 for all pairs of columns in the matrix?
You can create a custom function to compute your probability, then apply it to each combination of columns:
p1 <- function(x, y) {
  x10 <- x
  x10[y > 0] <- 0
  x11 <- x
  x11[y == 0] <- 0
  mean(x10 > x11)
}
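# M is the matrix from the question, with X, Y and Z as its columns
M <- matrix(c(X, Y, Z), ncol = 3)
# each row of `combinations` is one pair of column indices (i, j) with i < j
combinations <- t(combn(ncol(M), 2))
# create a matrix of NAs, fill the appropriate values
result <- matrix(NA, nrow = ncol(M), ncol = ncol(M))
result[combinations] <- apply(combinations, 1, function(r) p1(M[, r[1]], M[, r[2]]))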