na.pad not working in diff() function - r

For some reason the diff() function's na.pad parameter does not seem to work. Is anyone else having this problem, or does anyone have a workaround?
yo <- c(5,3,3,4,5,6,5,8,9)
diff(yo, na.pad = TRUE)
[1] -2 0 1 1 1 -1 3 1
The resulting vector should be:
[1] NA -2 0 1 1 1 -1 3 1

The na.pad argument belongs to the diff() methods for zoo and xts objects; base R's diff() silently ignores it on a plain vector. Convert the vector to a time series first:
library(xts)
library(zoo)
yy = zoo(yo)
diff(yy, na.pad=TRUE)
# 1 2 3 4 5 6 7 8 9
#NA -2 0 1 1 1 -1 3 1
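If loading zoo only for the padding feels heavy, a base R sketch is to prepend the NA yourself. This assumes a lag of 1 on a plain numeric vector; the name padded is only illustrative.

```r
yo <- c(5, 3, 3, 4, 5, 6, 5, 8, 9)

# base diff() returns length(yo) - 1 values, so pad the front by hand
padded <- c(NA, diff(yo))
padded
# [1] NA -2  0  1  1  1 -1  3  1
```

For a lag of k you would need k leading NAs, for example c(rep(NA, k), diff(yo, lag = k)).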

Related

Replace Value with the number of times a group of numbers has appeared in the vector up to that point

I have a vector like this of 1s and 0s next to each other.
vec <- c(1,1,1,0,0,1,1,1,1,0,0,0,1,1,1)
I want to replace each run of consecutive 1s with that run's ordinal number (first run, second run, and so on), so the final vector would look like the one below.
vec1 <- c(1,1,1,0,0,2,2,2,2,0,0,0,3,3,3)
I am not sure how to even start with this problem. Any help is appreciated.
One way using rle :
with(rle(vec), rep(values * cumsum(values), lengths))
#[1] 1 1 1 0 0 2 2 2 2 0 0 0 3 3 3
One base R option could be:
cumsum(vec & c(0, head(vec, -1)) == 0) * vec
[1] 1 1 1 0 0 2 2 2 2 0 0 0 3 3 3
We can use rleid from data.table
library(data.table)
vec[as.logical(vec)] <- as.integer(factor(rleid(vec)[as.logical(vec)]))
vec
#[1] 1 1 1 0 0 2 2 2 2 0 0 0 3 3 3
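To see why the rle one-liner works, it may help to unpack it step by step. This is only a walk-through of the same idea, assuming vec contains nothing but 0s and 1s; the names r, ids and result are illustrative.

```r
vec <- c(1,1,1,0,0,1,1,1,1,0,0,0,1,1,1)

r <- rle(vec)
r$lengths   # run lengths: 3 2 4 3 3
r$values    # run values:  1 0 1 0 1

# cumsum(r$values) counts the runs of 1s seen so far (1 1 2 2 3);
# multiplying by r$values zeroes out the runs of 0s (1 0 2 0 3)
ids <- r$values * cumsum(r$values)

# expand each run id back to the run's original length
result <- rep(ids, r$lengths)
result
# [1] 1 1 1 0 0 2 2 2 2 0 0 0 3 3 3
```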

Estimate how many consecutives true elements there is in a vector in R

I have a really large boolean vector (i.e. T or F) and I want to simply be able to estimate how many "blocks" of consecutive T there are in my vector contained between the F elements.
A simple example of a vector with 3 of these consecutive "blocks" of T elements:
x <- c(T,T,T,T,F,F,F,F,T,T,T,T,F,T,T)
Output:
1,1,1,1,0,0,0,0,2,2,2,2,0,3,3
You can do:
rle <- rle(x)
rle$values <- with(rle, cumsum(values) * values)
inverse.rle(rle)
[1] 1 1 1 1 0 0 0 0 2 2 2 2 0 3 3
And a simplified and more elegant version of the basic idea (proposed by @Lyngbakr):
with(rle(x), rep(cumsum(values) * values, lengths))
Another solution with rle/inverse.rle:
x <- c(T,T,T,T,F,F,F,F,T,T,T,T,F,T,T)
rle_x <- rle(x)
rle_x$values[rle_x$values] <- seq_len(sum(rle_x$values))
inverse.rle(rle_x)
# [1] 1 1 1 1 0 0 0 0 2 2 2 2 0 3 3
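If all you need is the number of blocks rather than the labelled vector, a short sketch is to count the TRUE runs directly. It assumes x has no NA values; n_blocks and n_starts are illustrative names.

```r
x <- c(TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE)

# each TRUE entry in the run values is one block of consecutive TRUEs
n_blocks <- sum(rle(x)$values)
n_blocks
# [1] 3

# cross-check without rle: count the FALSE -> TRUE transitions,
# treating the position before the first element as FALSE
n_starts <- sum(diff(c(FALSE, x)) == 1)
n_starts
# [1] 3
```

The second form avoids building the run-length object, which may matter for a very large vector, though I have not benchmarked it.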

Issue in replace function in R

I have a vector:
> a <- c(0,1,2,3,4)
I am trying to replace every value with that value incremented by 1, so that a becomes c(1,2,3,4,5). Replacing 4 with 5 works as expected:
> replace(a,a==4,5)
[1] 0 1 2 3 5
But when I try to replace 3 with 4, something goes wrong:
replace(a,a==3,4)
[1] 0 1 2 4 4
Both 3 and 5 appear to be converted to 4.
The same happens again when I try to replace 2 with 3:
> replace(a,a==2,3)
[1] 0 1 3 3 4
Can someone point out what I am doing wrong here?
replace doesn't change its argument.
> a = c(0,1,2,3,4)
> replace(a,a==2,99)
[1] 0 1 99 3 4
But a is still the same:
> a
[1] 0 1 2 3 4
So when you thought you had converted the 4 to a 5 in a, you hadn't. Use the return value if you want to change a:
> a
[1] 0 1 2 3 4
> a = replace(a,a==2,99)
> a
[1] 0 1 99 3 4
[As pointed out in comments, there are better ways to add 1 to all values of a vector, a=a+1 being the best]
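For completeness, here is a small sketch of the vectorised increment, plus what would happen if you did chain replace() calls while keeping the result. The loop is only an illustration of the pitfall, not something from the question; b is an illustrative name.

```r
a <- c(0, 1, 2, 3, 4)

# arithmetic is vectorised, so this increments every element at once
a <- a + 1
a
# [1] 1 2 3 4 5

# chaining replace() from the smallest value upwards collapses everything,
# because each freshly written value is matched again by the next step
b <- c(0, 1, 2, 3, 4)
for (v in 0:4) b <- replace(b, b == v, v + 1)
b
# [1] 5 5 5 5 5
```

Running the same loop from the largest value down (4:0) would avoid the collapse, but a + 1 is simpler.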

Splitting one Column to Multiple R and Giving logical value if true

I am trying to split one column in a data frame into multiple columns that use the values from the original column as new column names. If a value occurred in the original column for a given row, the new column should hold a 1 for that row, or a 0 if there was no match. I realize this is not the best way to explain it, so here is an example:
df <- data.frame(subject = c(1:4), Location = c('A', 'A/B', 'B/C/D', 'A/B/C/D'))
# subject Location
# 1 1 A
# 2 2 A/B
# 3 3 B/C/D
# 4 4 A/B/C/D
and would like to expand it to wide format, something such as, with 1's and 0's (or T and F):
# subject A B C D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
I have looked into tidyr and its separate function, and reshape2 and its cast function, but I seem to be getting hung up on producing the logical values. Any help on the issue would be greatly appreciated. Thank you.
You may try cSplit_e from package splitstackshape:
library(splitstackshape)
cSplit_e(data = df, split.col = "Location", sep = "/",
type = "character", drop = TRUE, fill = 0)
# subject Location_A Location_B Location_C Location_D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
You could take the following step-by-step approach.
## get the unique values after splitting
u <- unique(unlist(strsplit(as.character(df$Location), "/")))
## compare 'u' with 'Location'
m <- vapply(u, grepl, logical(nrow(df)), x = df$Location)
## coerce to integer representation
m[] <- as.integer(m)
## bind 'm' to 'subject'
cbind(df["subject"], m)
# subject A B C D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
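One caveat with grepl is that it matches substrings, so a label such as "A" would also match a hypothetical location "AB". A sketch that compares whole labels instead, assuming "/" is always the separator (parts, u, m and wide are illustrative names):

```r
df <- data.frame(subject = 1:4,
                 Location = c("A", "A/B", "B/C/D", "A/B/C/D"))

# split each row into its labels, treating "/" literally
parts <- strsplit(as.character(df$Location), "/", fixed = TRUE)

# all distinct labels become the new column names
u <- sort(unique(unlist(parts)))

# one row per subject: 1 if the label is present in that row, else 0
m <- do.call(rbind, lapply(parts, function(p) as.integer(u %in% p)))
colnames(m) <- u

wide <- cbind(df["subject"], m)
wide
#   subject A B C D
# 1       1 1 0 0 0
# 2       2 1 1 0 0
# 3       3 0 1 1 1
# 4       4 1 1 1 1
```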

Cumulative sum for positive numbers only [duplicate]

This question already has answers here:
Create counter within consecutive runs of certain values
(6 answers)
Closed 1 year ago.
I have this vector :
x = c(1,1,1,1,1,0,1,0,0,0,1,1)
And I want to do a cumulative sum for the positive numbers only. I should have the following vector in return:
xc = c(1,2,3,4,5,0,1,0,0,0,1,2)
How could I do it?
I've tried cumsum(x), but that computes the cumulative sum over all values and gives:
cumsum(x)
[1] 1 2 3 4 5 5 6 6 6 6 7 8
One option is
x1 <- inverse.rle(within.list(rle(x), values[!!values] <-
(cumsum(values))[!!values]))
x[x1!=0] <- ave(x[x1!=0], x1[x1!=0], FUN=seq_along)
x
#[1] 1 2 3 4 5 0 1 0 0 0 1 2
Or a one-line code would be
x[x>0] <- with(rle(x), sequence(lengths[!!values]))
x
#[1] 1 2 3 4 5 0 1 0 0 0 1 2
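The one-liner above leans on sequence(), which may be easier to follow when unpacked. Note that it builds a position counter within each non-zero run, so it matches a cumulative sum only under the assumption that every positive entry is 1, as in the question. The names r, run_lengths, counter and xc are illustrative.

```r
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)

r <- rle(x)
# keep only the lengths of the non-zero runs: 5 1 2
run_lengths <- r$lengths[r$values != 0]

# sequence() concatenates 1:5, 1:1 and 1:2
counter <- sequence(run_lengths)

# write the counter back into the non-zero positions of a copy
xc <- x
xc[x > 0] <- counter
xc
# [1] 1 2 3 4 5 0 1 0 0 0 1 2
```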
Here's a possible solution using data.table v >= 1.9.5 and its new rleid function
library(data.table)
as.data.table(x)[, cumsum(x), rleid(x)]$V1
## [1] 1 2 3 4 5 0 1 0 0 0 1 2
Base R, one line solution with Map Reduce :
> Reduce('c', Map(function(u,v) if(v==0) rep(0,u) else 1:u, rle(x)$lengths, rle(x)$values))
[1] 1 2 3 4 5 0 1 0 0 0 1 2
Or:
unlist(Map(function(u,v) if(v==0) rep(0,u) else 1:u, rle(x)$lengths, rle(x)$values))
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
cumsum_ <- function(x) {
  r <- rle(x)
  # label each run, then take the cumulative sum within each run
  s <- split(x, rep(seq_along(r$values), r$lengths))
  unlist(lapply(s, cumsum), use.names = FALSE)
}
(xc <- cumsum_(x))
# [1] 1 2 3 4 5 0 1 0 0 0 1 2
I don't know much R, but I have written a small snippet in Python. The logic is the same in any language. I hope this helps.
x = [1,1,1,1,1,0,1,0,0,0,1,1]
tot = 0
for i in range(len(x)):
    if x[i] != 0:
        # extend the running total within a run of non-zero values
        tot = tot + x[i]
        x[i] = tot
    else:
        # a zero resets the running total
        tot = 0
print(x)
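x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
skumulowana <- function(x) {
  dl <- length(x)
  xx <- numeric(dl + 1)   # xx[1] is a leading 0, so xx[i] is the previous total
  for (i in seq_len(dl)) {
    # reset on zero, otherwise add to the running total
    xx[i + 1] <- if (x[i] == 0) 0 else xx[i] + x[i]
  }
  xx[-1]                  # drop the leading 0
}
skumulowana(x)
## [1] 1 2 3 4 5 0 1 0 0 0 1 2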
Try this one-liner...
Reduce(function(x,y) (x+y)*(y!=0), x, accumulate=TRUE)
split and lapply version:
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
unlist(lapply(split(x, cumsum(x==0)), cumsum))
step by step:
a <- split(x, cumsum(x==0)) # divides x into pieces where each 0 starts a new piece
b <- lapply(a, cumsum) # calculates cumsum in each piece
unlist(b) # rejoins the pieces
Result has useless names but is otherwise what you wanted:
# 01 02 03 04 05 11 12 2 3 41 42 43
# 1 2 3 4 5 0 1 0 0 0 1 2
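If the names are a nuisance, a variant of the same grouping idea is to let ave() apply cumsum within each group and hand back a vector aligned with x. This is a sketch that assumes, as above, that every 0 should start a new group; xc is an illustrative name.

```r
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)

# cumsum(x == 0) gives a group id that increases at every 0;
# ave() applies cumsum within each group and keeps the original order
xc <- ave(x, cumsum(x == 0), FUN = cumsum)
xc
# [1] 1 2 3 4 5 0 1 0 0 0 1 2
```

Each group begins with its 0, so the cumulative sum restarts from zero there.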
Here is another base R solution using aggregate. The idea is to make a data frame with x and a new column named x.1 by which we can apply aggregate functions (cumsum in this case):
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
r <- rle(x)
df <- data.frame(x,
x.1=unlist(sapply(1:length(r$lengths), function(i) rep(i, r$lengths[i]))))
# df
# x x.1
# 1 1 1
# 2 1 1
# 3 1 1
# 4 1 1
# 5 1 1
# 6 0 2
# 7 1 3
# 8 0 4
# 9 0 4
# 10 0 4
# 11 1 5
# 12 1 5
agg <- aggregate(df$x~df$x.1, df, cumsum)
as.vector(unlist(agg$`df$x`))
# [1] 1 2 3 4 5 0 1 0 0 0 1 2
