I was wondering if anyone had a quick and dirty solution to the following problem. I have a matrix that contains rows of NAs, and I would like to replace each row of NAs with the previous row (if it is not also a row of NAs).
Assume that the first row is not a row of NAs.
Thanks!
Adapted from an answer to this question: Idiomatic way to copy cell values "down" in an R vector
f <- function(x) {
idx <- !apply(is.na(x), 1, all)
x[idx,][cumsum(idx),]
}
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
> x
a b
1 1 4
2 2 5
3 NA NA
4 3 6
5 NA NA
6 NA 7
> f(x)
a b
1 1 4
2 2 5
2.1 2 5
4 3 6
4.1 3 6
6 NA 7
This tries to account for cases where there are two all-NA rows in a row.
#create a data set like you discuss (in the future please do this yourself)
set.seed(14)
x <- matrix(rnorm(10), nrow=2)
y <- rep(NA, 5)
v <- do.call(rbind.data.frame, sample(list(x, x, y), 10, TRUE))
One approach:
NArows <- which(apply(v, 1, function(x) all(is.na(x)))) #find all NAs
notNA <- which(!seq_len(nrow(v)) %in% NArows) #find non NA rows
rep.row <- sapply(NArows, function(x) tail(notNA[x > notNA], 1)) #replacement rows
v[NArows, ] <- v[rep.row, ] #assign
v #view
This would not work if your first row is all NAs.
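If leading all-NA rows are possible, one option is to leave them as NA instead of failing. The following is a minimal base R sketch, not part of the answer above; the function name fill_na_rows and the example matrix are made up for illustration, and it assumes the input is a matrix.

```r
# Hypothetical helper: carry the last non-all-NA row forward,
# leaving any leading all-NA rows untouched (they stay NA).
fill_na_rows <- function(m) {
  keep <- rowSums(!is.na(m)) > 0   # TRUE for rows with at least one value
  src <- cumsum(keep)              # index of the kept row each row should copy
  src[src == 0] <- NA              # leading all-NA rows have nothing to copy
  m[which(keep)[src], , drop = FALSE]
}

# Made-up example: row 1 is all NA, rows 3 and 4 are consecutive all-NA rows
m <- rbind(c(NA, NA), c(1, 4), c(NA, NA), c(NA, NA), c(NA, 7))
res <- fill_na_rows(m)
res
```

Rows that are only partly NA, such as the last row here, are kept as they are, which matches the behaviour of the other answers.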
You can always use a loop, here assuming that the first value is not NA as indicated:
fill = data.frame(x=c(1,NA,3,4,5))
# use nrow(), not length(): length() of a data frame is its number of columns
for (i in 2:nrow(fill)){
if(is.na(fill[i,1])){ fill[i,1] = fill[(i-1),1]}
}
If m is your matrix, this is your quick and dirty solution:
sapply(2:nrow(m),function(i){ if(is.na(m[i,1])) {m[i,] <<- m[(i-1),] } })
Note it uses the ugly (and dangerous) <<- operator.
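A plain for loop gives the same effect without <<-. The sketch below is not from the answer above: it checks the whole row rather than only the first column, and the matrix m is made-up example data. It assumes the first row is not all NA.

```r
# Hypothetical example matrix; rows 2, 4 and 5 are entirely NA
m <- matrix(c(1, NA, 3, NA, NA,
              4, NA, 6, NA, NA), ncol = 2)

# Walk down the rows and copy the previous row into any all-NA row.
# Because rows are filled in order, consecutive all-NA rows are handled too.
for (i in seq_len(nrow(m))[-1]) {
  if (all(is.na(m[i, ]))) m[i, ] <- m[i - 1, ]
}
m
```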
Matthew's example:
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
na.rows <- which( apply( x , 1, function(z) (all(is.na(z)) ) ) )
x[na.rows , ] <- x[na.rows-1, ]
x
#---
a b
1 1 4
2 2 5
3 2 5
4 3 6
5 3 6
6 NA 7
Obviously a first row with all NAs would present problems. So would two consecutive all-NA rows, since na.rows-1 would then point at another all-NA row.
Here is a straightforward and conceptually perhaps the simplest one-liner:
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
a b
1 1 4
2 2 5
3 NA NA
4 3 6
5 NA NA
6 NA 7
x1<-t(sapply(1:nrow(x),function(y) ifelse(is.na(x[y,]),x[y-1,],x[y,])))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 2 5
[4,] 3 6
[5,] 3 6
[6,] NA 7
To put the column names back, just use colnames(x1)<-colnames(x)
There are multiple ways to fill missing values in R. However, I can't find a solution for filling just the last n NAs.
Available options:
na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)
library(zoo)
na.locf(na_vector)
# Outputs: [1] 1 1 1 1 2 3 3 3
na.locf0(na_vector, maxgap = 2)
# Outputs: [1] 1 NA NA NA 2 3 3 3
How I would like it to be:
na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)
fill_na <- function(vector, n){
...
}
fill_na(na_vector, n = 1)
# Outputs: [1] 1 1 NA NA 2 3 3 NA
fill_na(na_vector, n = 2)
# Outputs: [1] 1 1 1 NA 2 3 3 3
Here is an option to get those outputs using dplyr and recursion:
na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)
fill_na <- function(vector, n){
if (n == 0) {
vector
} else {
fill_na(
vector = dplyr::coalesce(vector, dplyr::lag(vector)),
n = n - 1
)
}
}
fill_na(na_vector, n = 1)
# [1] 1 1 NA NA 2 3 3 NA
fill_na(na_vector, n = 2)
# [1] 1 1 1 NA 2 3 3 3
Number the elements within each consecutive run (of NAs or of non-NAs), giving a, and then fill in only those NAs whose number is less than or equal to n. This uses only vector operations internally and no iteration or recursion.
library(collapse)
library(zoo)
fill_na <- function(x, n) {
a <- ave(x, groupid(is.na(x)), FUN = seq_along)
ifelse(a <= n, na.locf0(x), x)
}
fill_na(na_vector, 1)
## [1] 1 1 NA NA 2 3 3 NA
fill_na(na_vector, 2)
## [1] 1 1 1 NA 2 3 3 3
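The same run-numbering idea can also be sketched in base R alone. This is an illustration under assumptions rather than a drop-in replacement: the name fill_na_base is made up, rle and sequence stand in for groupid and ave, and a cummax index stands in for na.locf0.

```r
# Hypothetical base R variant: fill at most the first n NAs of each run
fill_na_base <- function(x, n) {
  na <- is.na(x)
  runs <- rle(na)
  pos_in_run <- sequence(runs$lengths)    # 1, 2, 3, ... within each run
  last_obs <- cummax(seq_along(x) * !na)  # index of last non-NA value, 0 if none
  fill <- na & pos_in_run <= n & last_obs > 0
  x[fill] <- x[last_obs[fill]]
  x
}

na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)
r1 <- fill_na_base(na_vector, 1)
r2 <- fill_na_base(na_vector, 2)
r1
r2
```

Leading NAs have no previous observation, so this sketch leaves them as NA.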
Here is a solution to impute everything except the last n NA's based on base R + imputeTS.
library(imputeTS)
na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)
# The function that allows imputing everything except the last n NAs
fill_except_last_n_na <- function(x, n) {
index <- which(rev(cumsum(rev(as.numeric(is.na(x))))) == n+1)
x[1:tail(index,1)] <- na_locf(x[1:tail(index,1)])
return(x)
}
# Call the new function
fill_except_last_n_na(na_vector,2)
## Result
[1] 1 1 1 1 2 3 NA NA
If you want an imputation method other than last observation carried forward, replace na_locf with na_ma (moving average), na_interpolation (interpolation), na_kalman (Kalman smoothing on state space models) or another imputation function provided by the imputeTS package (see the imputeTS documentation for a list of functions).
I've got a dataset
>view(interval)
# V1 V2 V3 ID
# 1 NA 1 2 1
# 2 2 2 3 2
# 3 3 NA 1 3
# 4 4 2 2 4
# 5 NA 5 1 5
>dput(interval)
structure(list(V1 = c(NA, 2, 3, 4, NA),
V2 = c(1, 2, NA, 2, 5),
V3 = c(2, 3, 1, 2, 1), ID = 1:5), row.names = c(NA, -5L), class = "data.frame")
I would like to extract the previous non-NA value (or the next one, if the NA is in the first row) for every row and store it as a local variable in a custom function, because I have to perform other operations on every row based on this value (which should change for every row I'm applying the function to).
I've written this function to print the local variables, but when I apply it the output is not what I want.
myFunction<- function(x){
position <- as.data.frame(which(is.na(interval), arr.ind=TRUE))
tempVar <- ifelse(interval$ID == 1, interval[position$row+1,
position$col], interval[position$row-1, position$col])
return(tempVar)
}
I was expecting to get something like this
# [1] 2
# [2] 2
# [3] 4
But I get something pretty messed up instead.
Here's attempt number 1:
dat <- read.table(header=TRUE, text='
V1 V2 V3 ID
NA 1 2 1
2 2 3 2
3 NA 1 3
4 2 2 4
NA 5 1 5')
myfunc1 <- function(x) {
ind <- which(is.na(x), arr.ind=TRUE)
# since it appears you want them in row-first sorted order
ind <- ind[order(ind[,1], ind[,2]),]
# catch first-row NA
ind[,1] <- ifelse(ind[,1] == 1L, 2L, ind[,1] - 1L)
x[ind]
}
myfunc1(dat)
# [1] 2 2 4
The problem with this is when there is a second "stacked" NA:
dat2 <- dat
dat2[2,1] <- NA
dat2
# V1 V2 V3 ID
# 1 NA 1 2 1
# 2 NA 2 3 2
# 3 3 NA 1 3
# 4 4 2 2 4
# 5 NA 5 1 5
myfunc1(dat2)
# [1] NA NA 2 4
One fix/safeguard against this is to use zoo::na.locf, which takes the "last observation carried forward". Since the top row is a special case, we do it twice, the second time in reverse. This gives us the nearest non-NA value in the column (up or down, depending).
library(zoo)
myfunc2 <- function(x) {
ind <- which(is.na(x), arr.ind=TRUE)
# since it appears you want them in row-first sorted order
ind <- ind[order(ind[,1], ind[,2]),]
# this is to guard against stacked NA
x <- apply(x, 2, zoo::na.locf, na.rm = FALSE)
# this special-case is when there are one or more NAs at the top of a column
x <- apply(x, 2, zoo::na.locf, fromLast = TRUE, na.rm = FALSE)
x[ind]
}
myfunc2(dat2)
# [1] 3 3 2 4
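To see what the forward-then-backward fill does to a single column, here is a base R sketch of the same idea without zoo. The helper names locf and fill_both are made up for illustration, and the vector is column V1 of dat2.

```r
# Hypothetical base R stand-in for "last observation carried forward"
locf <- function(v) {
  idx <- cummax(seq_along(v) * !is.na(v))  # index of the last non-NA so far
  idx[idx == 0] <- NA                      # nothing to carry for leading NAs
  v[idx]
}

# Forward pass first, then a backward pass to fill leading NAs
fill_both <- function(v) rev(locf(rev(locf(v))))

v1 <- c(NA, NA, 3, 4, NA)   # column V1 of dat2
forward <- locf(v1)
both <- fill_both(v1)
forward
both
```

The forward pass leaves the two leading NAs in place, and the backward pass then fills them from the first observed value below.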
Say I have the following data.frame:
t<-c(1,1,2,4,5,4)
u<-c(1,3,4,5,4,2)
v<-c(2,3,4,5,NA,2)
w<-c(NA,3,4,5,2,3)
x<-c(2,3,4,5,6,NA)
df<-data.frame(t,u,v,w,x)
I would like to replace the NAs with values that represent the average of the case before and after the NA, unless a row starts (row 4) or ends (row 5) with an NA. When the row begins with NA, I would like to substitute the NA with the following case. When the row ends with NA, I would like to substitute the NA with the previous case.
Thus, I would like my output to look like:
t<-c(1,1,2,4,5,4)
u<-c(1,3,4,5,4,2)
v<-c(2,3,4,5,3.5,2)
w<-c(3,3,4,5,2,3)
x<-c(2,3,4,5,6,6)
df<-data.frame(t,u,v,w,x)
The question refers to row 4 starting with NA and row 5 ending in NA. In fact, column 4 of the input df starts with an NA and column 5 ends with an NA, while neither row 4 nor row 5 starts or ends with an NA, so we will assume that column was meant, not row. Also, the question contains two data frames named df. Evidently one is the input and the other is the output, so for clarity the definition of the df used here is repeated in the Note at the end.
na.approx in zoo pretty much does this. (If a matrix result is OK then omit the data.frame() part.)
library(zoo)
data.frame(na.approx(df, rule = 2))
giving:
t u v w x
1 1 1 2.0 3 2
2 1 3 3.0 3 3
3 2 4 4.0 4 4
4 4 5 5.0 5 5
5 5 4 3.5 2 6
6 4 2 2.0 3 6
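For a single column, the same interpolation can be sketched with stats::approx from base R. The helper name interp_col is made up for illustration, and the sketch assumes each column has at least two non-NA values. With rule = 2, positions outside the observed range take the nearest observed value.

```r
# Hypothetical helper: linear interpolation of interior NAs,
# nearest observed value for leading or trailing NAs (rule = 2)
interp_col <- function(v) {
  ok <- !is.na(v)
  approx(x = which(ok), y = v[ok], xout = seq_along(v), rule = 2)$y
}

v <- c(2, 3, 4, 5, NA, 2)   # interior NA
w <- c(NA, 3, 4, 5, 2, 3)   # leading NA
x <- c(2, 3, 4, 5, 6, NA)   # trailing NA
iv <- interp_col(v)
iw <- interp_col(w)
ix <- interp_col(x)
iv
iw
ix
```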
Note: For clarity, we used this data frame as input above:
df <- structure(list(t = c(1, 1, 2, 4, 5, 4), u = c(1, 3, 4, 5, 4,
2), v = c(2, 3, 4, 5, NA, 2), w = c(NA, 3, 4, 5, 2, 3), x = c(2,
3, 4, 5, 6, NA)), .Names = c("t", "u", "v", "w", "x"), row.names = c(NA,
-6L), class = "data.frame")
sapply(df, function(x){
replace(x, is.na(x), rowMeans(cbind(c(NA, head(x, -1)), c(x[-1], NA)), na.rm = TRUE)[is.na(x)])
})
# t u v w x
#[1,] 1 1 2.0 3 2
#[2,] 1 3 3.0 3 3
#[3,] 2 4 4.0 4 4
#[4,] 4 5 5.0 5 5
#[5,] 5 4 3.5 2 6
#[6,] 4 2 2.0 3 6
I have a simple question, that I can't seem to find the answer to on google, stackoverflow or stackexchange. I am currently working with the example of rollapply to find the sum of some values that contain NA's. For example:
z <- zoo(c(NA, NA, NA, NA,2, 3, 4, 5, NA))
rollapply(z, 3, sum, na.rm = TRUE, align = "right")
This outputs:
3 4 5 6 7 8 9
0 0 2 5 9 12 9
This looks good. However, there are two windows that contain three NAs in a row, and sum with na.rm = TRUE returns 0 for them. Unfortunately, that won't work with the data I'm going to work with, since 0 is a meaningful value. Is there a way to replace those 0s with NAs again?
I'm looking for an output as below:
3 4 5 6 7 8 9
NA NA 2 5 9 12 9
Thank you in advance!
You can do the following (even though not very nice)
require(zoo)
z <- zoo(c(NA, NA, NA, NA,2, 3, 4, 5, NA))
tmp <- rollapply(z, 3, sum, na.rm = TRUE, align = "right")
tmp[is.na(z)[-2:-1] & tmp == 0] <- NA
tmp
This assigns NA wherever z is NA and rollapply produced a 0, which gives you:
> tmp
3 4 5 6 7 8 9
NA NA 2 5 9 12 9
Here are two approaches:
1) Note that it is not that rollapply is giving 0 but that sum(x, na.rm = TRUE) is giving 0. The function sum(x, na.rm = TRUE) is not really what is desired here.
Instead, provide a version of sum that works in the way needed, i.e. it returns NA when the input is entirely NA values and returns sum(x, na.rm = TRUE) otherwise.
sum_na <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
rollapplyr(z, 3, sum_na)
2) Alternatively, use your code and fix it up afterwards by replacing any position whose input was all NA with an NA:
zz <- rollapplyr(z, 3, sum, na.rm = TRUE)
zz[rollapply(is.na(z), 3, all)] <- NA
giving:
> zz
3 4 5 6 7 8 9
NA NA 2 5 9 12 9
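The logic of both approaches can be checked on a plain numeric vector with base R's embed, which lays out each window of width 3 as one row of a matrix. This is only a sketch to illustrate the all-NA test; it drops the zoo index, and the variable names are made up.

```r
x <- c(NA, NA, NA, NA, 2, 3, 4, 5, NA)

# Each row of w is one right-aligned window of width 3 (positions 3 to 9)
w <- embed(x, 3)

res <- rowSums(w, na.rm = TRUE)        # 0 for windows that are all NA
res[rowSums(!is.na(w)) == 0] <- NA     # restore NA for all-NA windows only
res
```

A window that sums to 0 but contains real values keeps its 0, because the NA test looks at the inputs rather than at the result.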
I have for example vectors like the following:
a= c(1, NA, NA, 2, 3)
b=c(NA, 1, NA, NA, NA)
c=c(NA, NA, 5, NA, NA)
I wish to merge the three vectors to get
d=c(1,1,5,2,3)
Is there a way of doing this without extensive looping? Many thanks :)
You could try
rowSums(cbind(a,b,c), na.rm=TRUE)
#[1] 1 1 5 2 3
or
mat <- cbind(a,b,c)
mat[cbind(1:nrow(mat),max.col(!is.na(mat)))]
#[1] 1 1 5 2 3
Or
ind <- which(!is.na(mat), arr.ind=TRUE)
mat[ind[order(ind[,1]),]]
#[1] 1 1 5 2 3
I would consider pmin or pmax for a more direct approach given the conditions you describe:
pmin(a, b, c, na.rm = TRUE)
# [1] 1 1 5 2 3
pmax(a, b, c, na.rm = TRUE)
# [1] 1 1 5 2 3
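The rowSums, pmin and pmax versions all rely on each position having exactly one non-NA value across the vectors. If that is not guaranteed, a "first non-NA wins" merge may be closer to what is wanted. The following base R sketch is an assumption about the intended behaviour, not something stated in the question.

```r
a <- c(1, NA, NA, 2, 3)
b <- c(NA, 1, NA, NA, NA)
c <- c(NA, NA, 5, NA, NA)

# Take the value from the first vector that is not NA at each position
d <- Reduce(function(x, y) ifelse(is.na(x), y, x), list(a, b, c))
d
```

When two vectors both have a value at the same position, this keeps the one from the earlier vector, whereas rowSums would add them.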