Suppose I have a vector, say:
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0,1,1,1,1,0,1,1,0,1,0,0,0,0,0,1,0,1,0,1,0)
and I would like to obtain a vector of the sums of the values that fall between two zeros, i.e. the output should look like:
y = c(1,2,4,1,1,1)
Note that a run of ones must have a zero at the beginning and a zero at the end, otherwise it is not counted; a zero that closes one run does not also open the next, so the string 01010 produces only 1.
I tried using run-length encoding (rle) with an index of the zeros.
Thanks in advance
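sum.between.zeroes <- function(x) {
  library(stringr)
  # collapse the 0/1 vector into a single string such as "1010001100..."
  x.str <- paste(x, collapse = "")
  # each match is 0, a run of ones, 0; drop the two zeros from its length
  nchar(str_extract_all(x.str, "01+0")[[1]]) - 2L
}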
sum.between.zeroes(c(1,0,1,0,0,0,1,1,0,0,1,1,1,1,0,1,1,0,1,0,0,0,0,0,1,0,1,0,1,0))
# [1] 1 2 4 1 1 1
sum.between.zeroes(c(0,1,0,1,0))
# [1] 1
sum.between.zeroes(c(1,1))
# integer(0)
If you want to stay within base R, you can use gregexpr and regmatches instead:
sum.between.zeroes <- function(x) {
  x.str <- paste(x, collapse = "")
  nchar(regmatches(x.str, gregexpr("01+0", x.str))[[1]]) - 2L
}
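Since the question mentions trying run lengths, the same result can be sketched with rle() alone. This assumes x contains only 0s and 1s, and it reproduces the regex behaviour in which the zero that closes one match cannot open the next (which is why 01010 gives a single 1). sum_between_zeroes_rle is a hypothetical name, not from the answers above:

```r
# Hypothetical rle-based equivalent of sum.between.zeroes(); assumes x holds only 0s and 1s
sum_between_zeroes_rle <- function(x) {
  r <- rle(x)
  n_runs <- length(r$values)
  out <- integer(0)
  prev_matched <- FALSE  # did the previous run of ones use the following zero as its closer?
  for (k in seq_len(n_runs)) {
    if (r$values[k] != 1) next
    # zeros on both sides: the run is neither the first nor the last run
    bounded <- k > 1 && k < n_runs
    # the zero run before it still has a free zero, unless it is a single zero
    # that the previous match already consumed
    free_opener <- k > 1 && (r$lengths[k - 1] >= 2 || !prev_matched)
    prev_matched <- bounded && free_opener
    if (prev_matched) out <- c(out, r$lengths[k])
  }
  out
}

sum_between_zeroes_rle(c(1,0,1,0,0,0,1,1,0,0,1,1,1,1,0,1,1,0,1,0,0,0,0,0,1,0,1,0,1,0))
# [1] 1 2 4 1 1 1
sum_between_zeroes_rle(c(0,1,0,1,0))
# [1] 1
```

Walking the runs keeps the cost linear in the number of runs and avoids building a string, at the price of encoding the "consumed zero" rule explicitly.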
Suppose x is a real number or a vector, and i is FALSE. Then x[i] returns numeric(0). I would like to treat this as a real number 0 (or integer 0), so that it can be used in addition.
Adding numeric(0) to any real number returns numeric(0), whereas I want the result to be the number being added. How can I convert the numeric(0) value? Thanks in advance!
The problem only arises with +. It can be avoided by using sum, since the sum of an empty vector is 0:
sum(numeric(0), 5)
#[1] 5
sum(numeric(0), 5, 10)
#[1] 15
Or, if we need to use +, an easy option is to concatenate with 0 and select the first element. If the element is numeric(0), it gets replaced by 0; otherwise the first element remains intact:
c(numeric(0), 0)[1]
#[1] 0
A small example:
lst1 <- list(1, 3, numeric(0), 4, numeric(0))
out <- 0
for(i in seq_along(lst1)) {
  out <- out + c(lst1[[i]], 0)[1]
}
out
#[1] 8
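You can use max/min with 0 to get 0 as output when the input is numeric(0). Note that max(x, 0) is only safe for non-negative values and min(x, 0) only for non-positive ones; otherwise they also change non-empty inputs (e.g. min(5, 0) is 0).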
x <- 1:10
max(x[FALSE], 0)
#[1] 0
min(x[FALSE], 0)
#[1] 0
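For a check that leaves non-empty values untouched regardless of sign, a small helper can test the length directly. This is a sketch; zero_if_empty is a hypothetical name, not a base R function:

```r
# Hypothetical helper (not part of base R): replace a zero-length result with 0
zero_if_empty <- function(v) if (length(v) == 0L) 0 else v

x <- 1:10
zero_if_empty(x[FALSE]) + 5  # empty subset treated as 0: returns 5
zero_if_empty(x[3]) + 5      # non-empty values pass through: returns 8
zero_if_empty(-2) + 5        # unlike max(x, 0), negatives are kept: returns 3
```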
I would like to know the starting index of a vector in another vector. For example, for c(1, 1) and c(1, 0, 0, 1, 1, 0, 1) it would be 4.
Importantly, I want to match exactly the same run: for c(1, 1) inside c(1, 0, 1, 1, 1, 0) the answer is FALSE, because c(1, 1) != c(1, 1, 1).
For now I am checking whether the short vector is contained in the long one like this:
any(with(rle(longVec), lengths[as.logical(values)]) == length(shortVec))
But I don't know how to determine its index...
This function should work:
my_function <- function(x, find) {
  # build two matrices from rle(): row 1 = run lengths, row 2 = run values
  m <- matrix(unlist(rle(x)), nrow = 2, byrow = TRUE)
  n <- matrix(unlist(rle(find)), nrow = 2, byrow = TRUE)
  # compare each run (column of m) with the single run in find: a matrix of TRUE/FALSE
  temp_bool <- apply(m, 2, function(col) col == n)
  # sum by columns: a 2 means both length and value match, i.e. we found (1,1)
  temp_bool <- apply(temp_bool, 2, sum)
  # starting index in x of every run (the column number is not the position in x)
  starts <- cumsum(c(1, head(m[1, ], -1)))
  if (any(temp_bool == 2)) {
    return(starts[temp_bool == 2])
  } else {
    return(FALSE)
  }
}
my_function(x, find)
#[1] 4
my_function(y, find)
#[1] FALSE
To make this clearer, here are the results of those two apply calls for x:
apply(m, 2, function(x) x == n)
# [,1] [,2] [,3] [,4] [,5]
# [1,] FALSE TRUE TRUE FALSE FALSE
# [2,] TRUE FALSE TRUE FALSE TRUE # TRUE-TRUE on column 3 is what we seek
apply(temp_bool, 2, sum)
#[1] 1 1 2 0 1
Example data:
x <- c(1,0,0,1,1,0,1)
y <- c(1,0,1,1,1,0)
find <- c(1,1) # as noted, find must be a single run of one repeated value
Assuming that shortVec contains only ones and longVec contains only zeros and ones, use rle and rep to create a vector lens, the same length as longVec, in which each element of a run is replaced by that run's length. Then multiply it by longVec to zero out the elements corresponding to 0 in longVec. Finally, return the indices of the elements equal to length(shortVec) and take the first.
lookup <- function(shortVec, longVec) {
  lens <- with(rle(longVec), rep(lengths, lengths))
  which(lens * longVec == length(shortVec))[1]
}
lookup(c(1,1), c(1, 0, 0, 1, 1, 0, 1))
## [1] 4
lookup(c(1,1), c(1, 0, 0, 1, 1, 1, 0, 1))
## [1] NA
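The same run-length idea can return every matching position, and work for any repeated value rather than only 1, by comparing whole runs instead of individual elements. This is a sketch; run_starts is a hypothetical name, and it assumes shortVec is a single run (one value repeated), as in the question:

```r
# Hypothetical helper: starting indices of every run in longVec equal to shortVec
run_starts <- function(shortVec, longVec) {
  r <- rle(longVec)
  s <- rle(shortVec)
  stopifnot(length(s$lengths) == 1L)  # shortVec must be one value repeated
  # position in longVec where each run begins
  starts <- cumsum(c(1L, head(r$lengths, -1L)))
  # a run matches when both its value and its length agree
  starts[r$values == s$values & r$lengths == s$lengths]
}

run_starts(c(1, 1), c(1, 0, 0, 1, 1, 0, 1, 1))  # returns c(4, 7)
run_starts(c(1, 1), c(1, 0, 1, 1, 1, 0))        # returns integer(0)
```

Taking [1] of the result gives the first match, as lookup does; an empty result plays the role of NA or FALSE.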
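This works for the examples below; the bounds checks also handle a match at either end of y:
a <- c(1,1)
b <- c(1,0,1,1,0,0)
c <- c(1,0,1,1,1,0)
f <- function(x, y) {
  len.x <- length(x)
  len.y <- length(y)
  if (len.x > len.y) return(FALSE)
  for(i in seq_len(len.y - len.x + 1)) {
    # does the window of y starting at i equal x?
    if(all(y[i:(i + len.x - 1)] == x)) {
      # the run must not continue past either end of the window
      after_ok <- i + len.x > len.y || y[i + len.x] != x[len.x]
      before_ok <- i == 1 || y[i - 1] != x[1]
      if(after_ok && before_ok) return(TRUE)
    }
  }
  return(FALSE)
}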
f(a, b)
# TRUE
f(a, c)
# FALSE
In R, you can define an arbitrary integer sequence using :, e.g.
a = c(1:3, 12:14)
print(a)
## 1 2 3 12 13 14
I'm looking for a way to do the inverse operation, e.g. given a vector of integers I want to produce a character (or character vector) that collapses the integer sequence(s) to the equivalent expressions using :, e.g.
some_function (a)
## "1:3" "12:14"
Bonus if the stride can be detected, e.g. c(1, 3, 5) becomes "1:2:5" or something like that.
Motivation: I generate an integer sequence in R through some data manipulation to identify a selection of database rows, and I want to pass the most concise representation of that sequence to an external program in the proper format.
We can take the rle of the differences and paste each range together, taking the step size into account.
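fun=function(s){
  m=c(0,diff(s))                        # step from the previous element (0 for the first)
  b=rle(m)                              # runs of identical steps
  b$values[b$lengths==1&b$values!=1]=0  # an isolated step other than 1 marks a break
  l=cumsum(!inverse.rle(b))             # group id: a new group starts at each 0 step
  d=function(x)paste0(range(x[,1]),     # first and last value of the group,
    collapse = paste0(":",unique(x[-1,-1]),":"))  # joined by ":step:"
  f=c(by(cbind(s,m),l,d))               # apply d to each group
  sub("::.*","",sub(":1:",":",f))       # drop a step of 1; "v::v" (single value) becomes "v"
}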
fun(c(1,1:3,12:14,c(1,3,5)))
1 2 3 4
"1" "1:3" "12:14" "1:2:5"
fun(c(1, 3, 5, 8:10, 14, 17, 20))
1 2 3
"1:2:5" "8:10" "14:3:20"
fun(1)
1
"1"
Ah, nerd heaven. Here's a first shot. You could even use this for encoding within R.
Needs testing; code always prints the stride out.
encode_ranges <- function (x) {
  rle_diff <- list(
    start = x[1],
    rled = rle(diff(x))
  )
  class(rle_diff) <- "rle_diff"
  rle_diff
}
decode_ranges <- function (x) {
  stopifnot(inherits(x, "rle_diff"))
  cumsum(c(x$start, inverse.rle(x$rled)))
}
format.rle_diff <- function (x, ...) {
  stopifnot(inherits(x, "rle_diff"))
  output <- character(length(x$rled$values))
  start <- x$start  # current position in the decoded sequence
  for (j in seq_along(x$rled$values)) {
    stride <- x$rled$values[j]
    len <- x$rled$lengths[j]
    end <- start + stride * len
    # a single step is a jump between ranges, not a range: skip it
    # (so an isolated value before such a jump is not printed)
    if (len > 1L) output[j] <- paste(start, end, stride, sep = ":")
    start <- end  # advance to the last element covered so far
  }
  output <- output[nchar(output) > 0]
  paste(output, collapse = ", ")
}
print.rle_diff <- function (x, ...) cat(format(x, ...), "\n", sep = "")
encode_ranges(c(1:3, 12:14))
## 1:3:1, 12:14:1
encode_ranges(c(1, 3, 5, 8:10, 14, 17, 20))
## 1:5:2, 8:10:1, 14:20:3
We create a grouping variable with diff and cumsum, then use a group-by function (tapply) to paste the range of each group:
f1 <- function(vec) {
unname(tapply(vec, cumsum(c(TRUE, diff(vec) != 1)),
FUN = function(x) paste(range(x), collapse=":")))
}
f1(a)
#[1] "1:3" "12:14"
For the second case (a single constant stride):
b <- c(1, 3, 5)
un1 <- unique(diff(c(1, 3, 5)))
paste(b[1], un1, b[length(b)], sep=":")
#[1] "1:2:5"
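The answers above either detect strides for every group (fun, encode_ranges) or handle the stride only for a whole vector (the last one). As a sketch of doing both in one pass, the hypothetical helper compress_seq below greedily extends each arithmetic run and formats it as from:to for a step of 1, or from:step:to otherwise. As a design choice not specified in the question, a two-value run with a step other than 1 is emitted as separate values:

```r
# Hypothetical helper: compress an integer vector into "from:to" / "from:step:to" pieces
compress_seq <- function(v) {
  n <- length(v)
  out <- character(0)
  i <- 1L
  while (i <= n) {
    j <- i
    if (i < n && v[i + 1L] != v[i]) {
      step <- v[i + 1L] - v[i]
      j <- i + 1L
      # extend the run while consecutive differences stay equal to step
      while (j < n && v[j + 1L] - v[j] == step) j <- j + 1L
      # keep a two-element run only for step 1; otherwise emit v[i] alone
      if (j - i == 1L && step != 1) j <- i
    }
    out <- c(out,
             if (j == i) as.character(v[i])
             else if (step == 1) paste(v[i], v[j], sep = ":")
             else paste(v[i], step, v[j], sep = ":"))
    i <- j + 1L
  }
  out
}

compress_seq(c(1:3, 12:14))                  # returns c("1:3", "12:14")
compress_seq(c(1, 3, 5, 8:10, 14, 17, 20))   # returns c("1:2:5", "8:10", "14:3:20")
```

Note that "from:step:to" is just a text format for the external program; R's own : operator does not accept a step.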
I have created a function that essentially creates a vector of 1000 binary values. I have been able to count the longest streak of consecutive 1s by using rle.
I was wondering how to find a specific pattern (say c(1,0,0,1)) in this larger vector. I would like it to return the number of occurrences of that pattern, so c(1,0,0,1,1,0,0,1) should return 2, while c(1,0,0,0,1) should return 0.
Most solutions that I have found only check whether a sequence occurs at all and return TRUE or FALSE, or they give results for the individual values rather than for the specified vector.
Here's my code so far:
# a function in which 1000 people each choose either up or down
updown <- function(){
  n = 1000
  X = rep(0,n)
  Y = rbinom(n, 1, 1 / 2)
  X[Y == 1] = "up"
  X[Y == 0] = "down"
  # calculate the length of the longest streak of ups:
  Y1 <- rle(Y)
  streaks <- Y1$lengths[Y1$values == 1]
  max(streaks, na.rm=TRUE)
}
# repeat this process 1000 times to find the average outcome
longeststring <- replicate(1000, updown())
mean(longeststring)
This will also work:
library(stringr)
x <- c(1,0,0,1)
y <- c(1,0,0,1,1,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), '1001')))
[1] 2
y <- c(1,0,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), '1001')))
[1] 0
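If you want to count overlapping matches, use a zero-width lookahead and count the match positions (gregexpr returns -1 when there is no match, so count positive positions rather than taking the length):
y <- c(1,0,0,1,0,0,1) # overlapped
sum(gregexpr("(?=1001)", paste(y, collapse=''), perl=TRUE)[[1]] > 0)
[1] 2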
Since Y is only 0s and 1s, we can paste it into a string and use regex, specifically gregexpr. Simplified a bit:
set.seed(47) # for reproducibility
Y <- rbinom(1000, 1, 1 / 2)
count_pattern <- function(pattern, x){
  sum(gregexpr(paste(pattern, collapse = ''),
               paste(x, collapse = ''))[[1]] > 0)
}
count_pattern(c(1, 0, 0, 1), Y)
## [1] 59
paste reduces the pattern and Y to strings, e.g. "1001" for the pattern here, and a 1000-character string for Y. gregexpr searches for all occurrences of the pattern in Y and returns the starting indices of the matches (together with a little more information so they can be extracted, if one wanted). Because gregexpr returns -1 when there is no match, testing for numbers greater than 0 lets us simply sum the TRUE values to get the number of matches; in this case, 59. Note that gregexpr finds non-overlapping matches, so "1001001" counts as 1 here.
The other sample cases mentioned:
count_pattern(c(1,0,0,1), c(1,0,0,1,1,0,0,1))
## [1] 2
count_pattern(c(1,0,0,1), c(1,0,0,0,1))
## [1] 0
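The regex answers rely on each element pasting to a single character; with values such as 10, the pasted string becomes ambiguous (c(10, 1) and c(1, 0, 1) both give "101"). As an alternative sketch that compares the numbers directly and counts overlapping occurrences, embed() can build every window of the pattern's length. count_overlapping is a hypothetical name, not from the answers above:

```r
# Hypothetical helper: count (possibly overlapping) occurrences of pattern in x
count_overlapping <- function(pattern, x) {
  k <- length(pattern)
  if (k > length(x)) return(0L)
  # each row of embed() is a window of length k in reverse order; flip the columns
  windows <- embed(x, k)[, k:1, drop = FALSE]
  # a window counts when every element equals the pattern
  sum(apply(windows, 1, function(w) all(w == pattern)))
}

count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 1, 0, 0, 1))  # returns 2
count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 0, 1))           # returns 0
count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 0, 0, 1))     # overlapping: returns 2
```

For a 1000-element Y this builds a 997 x 4 matrix, which is cheap; the regex route stays faster for very long 0/1 vectors.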
I have a vector like this:
x <- c(0, 0, 0, 0, 4, 5, 0, 0, 3, 2, 7, 0, 0, 0)
I want to keep only the elements from position 5 to 11, i.e. delete the zeroes at the start and the end. For this vector it is easy since it is small.
I have very large data and need a general solution that works for any vector.
Try this:
x[min(which(x != 0)):max(which(x != 0))]
Find the indices of all nonzero values, then subset x from the first (min) to the last (max).
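If x contains only zeros, which(x != 0) is empty, and min() and max() of an empty vector return Inf and -Inf with warnings, so the subset above fails. A guarded sketch (trim_zeros is a hypothetical helper name) that computes which() once and returns an empty vector in that case:

```r
# Hypothetical helper: drop leading and trailing occurrences of `value`
trim_zeros <- function(x, value = 0) {
  nz <- which(x != value)              # positions of elements different from `value`
  if (length(nz) == 0L) return(x[0])   # nothing but `value`: return an empty vector
  x[nz[1]:nz[length(nz)]]              # from the first to the last such element
}

trim_zeros(c(0, 0, 0, 0, 4, 5, 0, 0, 3, 2, 7, 0, 0, 0))  # returns c(4, 5, 0, 0, 3, 2, 7)
trim_zeros(c(0, 0))                                      # returns numeric(0)
```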
You can try something like:
x=c(0,0,0,0,4,5,0,0,3,2,7,0,0,0)
rl <- rle(x)
if(rl$values[1] == 0)
  x <- tail(x, -rl$lengths[1])
if(tail(rl$values,1) == 0)
  x <- head(x, -tail(rl$lengths,1))
x
## 4 5 0 0 3 2 7
Hope it helps,
alex
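This would also work (using x != 0 rather than x itself, so that negative values cannot bring a cumulative sum back to zero):
x[cumsum(x != 0) & rev(cumsum(rev(x != 0)))]
# [1] 4 5 0 0 3 2 7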
I would probably define two functions, and compose them:
trim_leading <- function(x, value=0) {
  w <- which.max(cummax(x != value))
  x[seq.int(w, length(x))]
}
trim_trailing <- function(x, value=0) {
  w <- which.max(cumsum(x != value))
  x[seq.int(w)]
}
And then pipe your data through (%>% comes from the magrittr package):
library(magrittr)
x %>% trim_leading %>% trim_trailing
## [1] 4 5 0 0 3 2 7