I have a simple question that I can't seem to find the answer to on Google, Stack Overflow or Stack Exchange. I am currently working with the rollapply example to find the sum of some values that contain NAs. For example:
z <- zoo(c(NA, NA, NA, NA,2, 3, 4, 5, NA))
rollapply(z, 3, sum, na.rm = TRUE, align = "right")
This outputs:
3 4 5 6 7 8 9
0 0 2 5 9 12 9
This looks good; however, the first two windows contain three NAs each, and sum with na.rm = TRUE returns 0 for them. Unfortunately, that won't work with the data I'm going to work with, since 0 is a meaningful value there. Is there a way to get NA instead of 0 for those windows?
I'm looking for an output as below:
3 4 5 6 7 8 9
NA NA 2 5 9 12 9
Thank you in advance!
You can do the following (even though it is not very nice):
require(zoo)
z <- zoo(c(NA, NA, NA, NA,2, 3, 4, 5, NA))
tmp <- rollapply(z, 3, sum, na.rm = TRUE, align = "right")
tmp[is.na(z)[-2:-1] & tmp == 0] <- NA
tmp
This assigns NA wherever the current value of z is NA and rollapply produced a 0, which gives you:
> tmp
3 4 5 6 7 8 9
NA NA 2 5 9 12 9
Here are two approaches:
1) Note that it is not that rollapply is giving 0 but that sum(x, na.rm = TRUE) is giving 0. The function sum(x, na.rm = TRUE) is not really what is desired here.
Instead, provide a version of sum that works in the way needed, i.e. it returns NA when the input is entirely NA values and returns sum(x, na.rm = TRUE) otherwise.
sum_na <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
rollapplyr(z, 3, sum_na)
2) Alternatively, use your code and fix it up afterwards by replacing any position whose input window was all NA with an NA:
zz <- rollapplyr(z, 3, sum, na.rm = TRUE)
# build the mask with rollapplyr too, so both results share the same (right) alignment
zz[rollapplyr(is.na(z), 3, all)] <- NA
giving:
> zz
3 4 5 6 7 8 9
NA NA 2 5 9 12 9
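To check the sum_na helper from approach 1 without zoo, here is a minimal base R sketch that applies it to right-aligned windows by hand. The names v, width, ends and res are only for illustration; with zoo loaded, rollapplyr(z, 3, sum_na) does the same job.

```r
# Returns NA for an all-NA window, otherwise the sum of the non-NA values
sum_na <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

v <- c(NA, NA, NA, NA, 2, 3, 4, 5, NA)
width <- 3

# Right-aligned windows end at positions width, width + 1, ..., length(v)
ends <- seq(width, length(v))
res <- sapply(ends, function(e) sum_na(v[(e - width + 1):e]))
names(res) <- ends
res
#  3  4  5  6  7  8  9
# NA NA  2  5  9 12  9
```

A window that contains a genuine 0 next to NAs still returns 0, which is the distinction the question asks for.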
I want to add empty rows at specific positions of a dataframe. Let's say we have this dataframe:
df <- data.frame(var1 = c(1,2,3,4,5,6,7,8,9),
var2 = c(9,8,7,6,5,4,3,2,1))
I want to add an empty row after rows 1, 3 and 5 (I know that this is not best practice in most cases; ultimately I want to create a table using flextable here). These row numbers are saved in a vector:
rows <- c(1,3,5)
Now I want to use a for loop that loops through the rows vector to add an empty row after each row using add_row():
for (i in rows) {
df <- add_row(df, .after = i)
}
The problem is that while the first iteration works flawlessly, the other empty rows get misplaced, since the data frame obviously gets longer. To fix this I tried adding 1 to the vector after each iteration:
for (i in rows) {
df <- add_row(df, .after = i)
rows <- rows+1
}
This does not work. I assume the rows vector only gets evaluated once. Does anyone have any ideas?
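The guess about single evaluation is right: R evaluates the sequence of a for loop once, before the first iteration, so later changes to rows do not affect the values i takes. A small illustration (the variable seen exists only for this demonstration):

```r
rows <- c(1, 3, 5)
seen <- c()

for (i in rows) {
  seen <- c(seen, i)  # record the value i actually takes
  rows <- rows + 1    # changes the variable, not the sequence being iterated
}

seen
# [1] 1 3 5
rows
# [1] 4 6 8
```

So the loop still inserts after rows 1, 3 and 5 of a data frame that grows by one row per iteration.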
Do it all at once, no need for looping. Make a sequence of row numbers, add the new rows in, sort, then replace the duplicated row numbers with NA:
s <- sort(c(seq_len(nrow(df)), rows))
out <- df[s,]
out[duplicated(s),] <- NA
# var1 var2
#1 1 9
#1.1 NA NA
#2 2 8
#3 3 7
#3.1 NA NA
#4 4 6
#5 5 5
#5.1 NA NA
#6 6 4
#7 7 3
#8 8 2
#9 9 1
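If this is needed more than once, the three steps could be wrapped in a small helper. This is only a sketch; the name insert_blank_rows and the choice to reset the row names are assumptions, not part of the answer above.

```r
# Hypothetical helper: insert an all-NA row after each row number in `after`
insert_blank_rows <- function(df, after) {
  s <- sort(c(seq_len(nrow(df)), after))  # each target row appears twice
  out <- df[s, , drop = FALSE]
  out[duplicated(s), ] <- NA              # blank out the second copy
  rownames(out) <- NULL                   # replace names like "1.1" with 1..n
  out
}

df <- data.frame(var1 = 1:9, var2 = 9:1)
res <- insert_blank_rows(df, c(1, 3, 5))
which(is.na(res$var1))
# [1] 2 5 8
```

Resetting the row names is optional; it just avoids labels such as 1.1 and 3.1 showing up in the final table.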
This will be much more efficient than looping or loop-like code, for even moderately sized data:
df <- df[rep(1:9,1e4),]
rows <- seq(1,9e4,100)
system.time({
s <- sort(c(seq_len(nrow(df)), rows))
out <- df[s,]
out[duplicated(s),] <- NA
})
# user system elapsed
# 0.01 0.00 0.02
df <- df[rep(1:9,1e4),]
rows <- seq(1,9e4,100)
system.time({
Reduce(function(x, y) tibble::add_row(x, .after = y), rev(rows), init = df)
})
# user system elapsed
# 26.03 0.00 26.03
df <- df[rep(1:9,1e4),]
rows <- seq(1,9e4,100)
system.time({
for (i in rev(rows)) {
df <- tibble::add_row(df, .after = i)
}
})
# user system elapsed
# 25.05 0.00 25.04
You could achieve your result by looping in the reverse direction:
df <- data.frame(
var1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
var2 = c(9, 8, 7, 6, 5, 4, 3, 2, 1)
)
rows <- c(1, 3, 5)
for (i in rev(rows)) {
df <- tibble::add_row(df, .after = i)
}
df
#> var1 var2
#> 1 1 9
#> 2 NA NA
#> 3 2 8
#> 4 3 7
#> 5 NA NA
#> 6 4 6
#> 7 5 5
#> 8 NA NA
#> 9 6 4
#> 10 7 3
#> 11 8 2
#> 12 9 1
I built this custom "winsorize" function that does what it should, unless there are NA's in the data.
How it works:
winsor1 <- function(x, probability){
numWin <- ceiling(length(x)*probability)
# Replace first lower, then upper
x <- pmax(x, sort(x)[numWin+1])
x <- pmin(x, sort(x)[length(x)-numWin])
return(x)
}
x <- 0:10
winsor1(x, probability=0.01)
[1] 1 1 2 3 4 5 6 7 8 9 9
So it replaces the top (and bottom) 1% of the data (rounded up to the next value, since there are only 11 values in the example). If there are, e.g., 250 values then the bottom 3 and top 3 values would be replaced by the bottom 4th and top 4th respectively.
The whole thing breaks down when there are NA's in the data, causing an error. However, if I set na.rm = TRUE in the pmax() and pmin() then the NA's themselves are replaced by the bottom value.
x[5] <- NA
winsor1(x, probability=0.01)
[1] 1 1 2 3 1 5 6 7 8 9 9
What can I do so that the NA's are preserved but do not cause an error? This is the output I want for the last line:
winsor1(x, probability=0.01)
[1] 1 1 2 3 NA 5 6 7 8 9 9
The issue is with sort: it removes the NAs by default, and the alternative, na.last = TRUE, keeps them at the end, which may not be what we need either. One option is to use order:
winsor1 <- function(x, probability){
numWin <- ceiling(length(x)*probability)
# Replace first lower, then upper
x1 <- x[order(x)]
x <- pmax(x, x1[numWin+1])
x1 <- x1[order(x1)]
x <- pmin(x, x1[length(x)-numWin], na.rm = TRUE)
return(x)
}
Testing:
x <- 0:10
winsor1(x, probability=0.01)
#[1] 1 1 2 3 4 5 6 7 8 9 9
x[5] <- NA
winsor1(x, probability=0.01)
#[1] 1 1 2 3 NA 5 6 7 8 9 10
Or with na.last in sort:
winsor1 <- function(x, probability){
numWin <- ceiling(length(x)*probability)
# Replace first lower, then upper
x <- pmax(x, sort(x, na.last = TRUE)[numWin+1])
x <- pmin(x, sort(x, na.last = TRUE)[length(x)-numWin], na.rm = TRUE)
return(x)
}
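A possible variation, offered as a sketch rather than as part of the answer above: both versions index the sorted vector by length(x), which also counts the NAs, and the question notes that na.rm = TRUE in pmin() replaces NAs with the bound. Assuming the cutoffs should be taken from the non-missing values only, one could compute them from sort(x), which drops NAs by default, and leave na.rm at its default so the NAs pass through. The name winsor_na is hypothetical.

```r
# Hypothetical NA-aware variant: cutoffs come from the non-missing values only
winsor_na <- function(x, probability) {
  xs <- sort(x)                 # sort() drops NAs by default
  n <- length(xs)               # number of non-missing values
  numWin <- ceiling(n * probability)
  # default na.rm = FALSE in pmax()/pmin() keeps NAs where they are
  x <- pmax(x, xs[numWin + 1])
  x <- pmin(x, xs[n - numWin])
  x
}

x <- 0:10
x[5] <- NA
winsor_na(x, probability = 0.01)
# [1]  1  1  2  3 NA  5  6  7  8  9  9
```

Note that numWin is now based on the number of non-missing values, which is a design choice and may differ from what the original function intended for data with many NAs.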
Returning values after last NA in a vector
I can remove all NA values from a vector
v1 <- c(1,2,3,NA,5,6,NA,7,8,9,10,11,12)
v2 <- na.omit(v1)
v2
but how do I return a vector with values only after the last NA
c( 7,8,9,10,11,12)
Thank you for your help.
You could detect the last NA with which and add 1 to get the index past the last NA and index until the length(v1):
v1[(max(which(is.na(v1)))+1):length(v1)]
[1] 7 8 9 10 11 12
Here’s an alternative solution that does not use indices and uses only vectorised operations:
after_last_na = as.logical(rev(cumprod(rev(! is.na(v1)))))
v1[after_last_na]
The idea is to use cumprod on the reversed non-NA mask, so that only the elements from just after the last NA to the end stay flagged. It’s not a terribly useful solution in its own right (I urge you to use the more obvious, index range based solution from other answers) but it shows some interesting techniques.
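Breaking the one-liner into its intermediate steps may make the trick easier to follow. The names not_na, from_end and keep are only for illustration:

```r
v1 <- c(1, 2, 3, NA, 5, 6, NA, 7, 8, 9, 10, 11, 12)

not_na <- !is.na(v1)              # TRUE where a value is present
from_end <- cumprod(rev(not_na))  # stays 1 until the first NA seen from the end, then 0
keep <- as.logical(rev(from_end)) # flip back to the original order

v1[keep]
# [1]  7  8  9 10 11 12
```

Because the cumulative product can never return to 1 once it hits a 0, everything at or before the last NA is dropped.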
You could detect the last NA with which
v1[(tail(which(is.na(v1)), 1) + 1):length(v1)]
# [1] 7 8 9 10 11 12
However, the most general solution, as @MrFlick pointed out, seems to be this:
tail(v1, -tail(which(is.na(v1)), 1))
# [1] 7 8 9 10 11 12
which also handles the following case correctly:
v1[13] <- NA
tail(v1, -tail(which(is.na(v1)), 1))
# numeric(0)
To also handle the case where there are no NAs at all,
v1 <- 1:13
we can do
if (any(is.na(v1))) tail(v1, -tail(which(is.na(v1)), 1)) else v1
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
Data
v1 <- c(1, 2, 3, NA, 5, 6, NA, 7, 8, 9, 10, 11, 12)
v1 <- c(1,2,3,NA,5,6,NA,7,8,9,10,11,12)
v1[seq_along(v1) > max(0, tail(which(is.na(v1)), 1))]
#[1] 7 8 9 10 11 12
v1 = 1:5
v1[seq_along(v1) > max(0, tail(which(is.na(v1)), 1))]
#[1] 1 2 3 4 5
v1 = c(1:5, NA)
v1[seq_along(v1) > max(0, tail(which(is.na(v1)), 1))]
#integer(0)
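The max(0, ...) guard is what makes the no-NA case work: tail(which(is.na(v1)), 1) is empty when there is no NA, and max(0, integer(0)) is 0, so every position passes the comparison. Wrapped in a function (the name after_last_na is an assumption for this sketch):

```r
# Hypothetical wrapper around the expression above
after_last_na <- function(v) {
  last_na <- max(0, tail(which(is.na(v)), 1))  # 0 when v has no NA
  v[seq_along(v) > last_na]
}

after_last_na(c(1, 2, 3, NA, 5, 6, NA, 7, 8, 9, 10, 11, 12))
# [1]  7  8  9 10 11 12
after_last_na(1:5)
# [1] 1 2 3 4 5
after_last_na(c(1:5, NA))
# integer(0)
```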
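The following will do what you want, provided v1 contains at least one NA (otherwise i is empty and the if condition fails with an error).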
i <- which(is.na(v1))
if(i[length(i)] < length(v1)){
v1[(i[length(i)] + 1):length(v1)]
}else{
NULL
}
#[1] 7 8 9 10 11 12
Can you explain to me why, when I fill a vector in R with a sequence, I get this result:
sekv <- seq(from = 1, to = 20, by = 2)
test <- c()
for (j in sekv) {
test[j] = j
}
test
[1] 1 NA 3 NA 5 NA 7 NA 9 NA 11 NA 13 NA 15 NA 17 NA 19
I want to make a vector which I can fill with some sequence and use in a loop, but only with values, not with NA values. Can somebody help me?
The actual issue is that whenever you assign a value to test[j], R uses the value of j as the index and grows the vector to that length if needed. The last value of j is 19, hence the length of test becomes 19. But you have assigned values only to a few indexes, i.e. 1, 3, 5, etc. The rest of the values are set to NA.
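The growth behaviour is easy to see in isolation. A minimal illustration:

```r
test <- c()
test[3] <- 3   # assigning past the end grows the vector

test
# [1] NA NA  3
length(test)
# [1] 3
```

Positions 1 and 2 were never assigned, so R fills them with NA, which is exactly what happens at the even positions in the loop from the question.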
You can solve it by executing for loop only for number of items available in sekv.
sekv <- seq(from = 1, to = 20, by = 2)
test <- c()
for (j in 1:length(sekv)) {
test[j] = sekv[j]
}
print(test)
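In the loop, j takes the values of sekv (1, 3, 5, ...), so only those positions of test are filled. Use seq_along to loop over the positions instead:
sekv <- seq(from = 1, to = 20, by = 2)
test <- c()
for (j in seq_along(sekv)) {
test[j] = sekv[j]
}
test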
The reason why you have NAs in the test vector is that you assign the first, third, fifth, seventh, etc element in the test vector to be 1, 3, 5, 7 etc...
I am still not very clear what you are trying to do, but one solution is that you can remove all the NAs after the for loop.
sekv <- seq(from = 1, to = 20, by = 2)
test <- c()
for (j in sekv) {
test[j] = j
}
test
# [1] 1 NA 3 NA 5 NA 7 NA 9 NA 11 NA 13 NA 15 NA 17 NA 19
test <- test[!is.na(test)]
test
# [1] 1 3 5 7 9 11 13 15 17 19
As @PoGibas suggested, this also works if you want to remove NAs:
test <- na.omit(test)
test
# [1] 1 3 5 7 9 11 13 15 17 19
I was wondering if anyone had a quick and dirty solution to the following problem: I have a matrix that has rows of NAs, and I would like to replace each row of NAs with the previous row (if it is not also a row of NAs).
Assume that the first row is not a row of NAs
Thanks!
Adapted from an answer to this question: Idiomatic way to copy cell values "down" in an R vector
f <- function(x) {
idx <- !apply(is.na(x), 1, all)
x[idx,][cumsum(idx),]
}
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
> x
a b
1 1 4
2 2 5
3 NA NA
4 3 6
5 NA NA
6 NA 7
> f(x)
a b
1 1 4
2 2 5
2.1 2 5
4 3 6
4.1 3 6
6 NA 7
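The key step is cumsum(idx): for every row it gives the position, among the kept rows, of the most recent row that is not all NA. A sketch on a plain matrix (the names m, pos and filled are only for illustration):

```r
m <- rbind(c(1, 4), c(2, 5), c(NA, NA), c(3, 6), c(NA, NA), c(NA, 7))

idx <- !apply(is.na(m), 1, all)  # TRUE for rows that are not entirely NA
pos <- cumsum(idx)               # index into the kept rows for each original row
pos
# [1] 1 2 2 3 3 4

filled <- m[idx, , drop = FALSE][pos, , drop = FALSE]
filled[3, ]
# [1] 2 5
```

Since pos simply repeats the last kept position, several all-NA rows in a row are all filled from the same preceding row. A first row that is all NA gives pos 0 and is dropped, which matches the assumption in the question.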
I tried to account for cases where two all-NA rows occur in a row.
#create a data set like you discuss (in the future please do this yourself)
set.seed(14)
x <- matrix(rnorm(10), nrow=2)
y <- rep(NA, 5)
v <- do.call(rbind.data.frame, sample(list(x, x, y), 10, TRUE))
One approach:
NArows <- which(apply(v, 1, function(x) all(is.na(x)))) #find all NAs
notNA <- which(!seq_len(nrow(v)) %in% NArows) #find non NA rows
rep.row <- sapply(NArows, function(x) tail(notNA[x > notNA], 1)) #replacement rows
v[NArows, ] <- v[rep.row, ] #assign
v #view
This would not work if your first row is all NAs.
You can always use a loop, here assuming that the first row is not NA, as indicated:
fill = data.frame(x=c(1,NA,3,4,5))
for (i in 2:nrow(fill)){  # nrow(), not length(): length() of a data frame is its number of columns
if(is.na(fill[i,1])){ fill[i,1] = fill[(i-1),1]}
}
If m is your matrix, this is your quick and dirty solution:
sapply(2:nrow(m),function(i){ if(is.na(m[i,1])) {m[i,] <<- m[(i-1),] } })
Note it uses the ugly (and dangerous) <<- operator.
Matthew's example:
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
na.rows <- which( apply( x , 1, function(z) (all(is.na(z)) ) ) )
x[na.rows , ] <- x[na.rows-1, ]
x
#---
a b
1 1 4
2 2 5
3 2 5
4 3 6
5 3 6
6 NA 7
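Obviously a first row with all NAs would present problems, as would two consecutive all-NA rows, since the second would be filled from the first one's original NA values.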
Here is a straightforward and conceptually perhaps the simplest one-liner:
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
a b
1 1 4
2 2 5
3 NA NA
4 3 6
5 NA NA
6 NA 7
x1<-t(sapply(1:nrow(x),function(y) ifelse(is.na(x[y,]),x[y-1,],x[y,])))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 2 5
[4,] 3 6
[5,] 3 6
[6,] NA 7
To put the column names back, just use colnames(x1)<-colnames(x)