Interpolate multiple NA values with R

I want to interpolate multiple NA values in a matrix called tester.
This is part of tester, showing only one column with NA values; in the full 744x6 matrix, other columns contain multiple NAs as well:
ZONEID TIMESTAMP U10 V10 U100 V100
1 20121022 12:00 -1.324032e+00 -2.017107e+00 -3.278166e+00 -5.880225574
1 20121022 13:00 -1.295168e+00 NA -3.130429e+00 -6.414975148
1 20121022 14:00 -1.285004e+00 NA -3.068829e+00 -7.101699541
1 20121022 15:00 -9.605904e-01 NA -2.332645e+00 -7.478168285
1 20121022 16:00 -6.268261e-01 -3.057278e+00 -1.440209e+00 -8.026791079
I have installed the zoo package and loaded it with library(zoo). I have tried to use the na.approx function, which interpolates on a linear basis, but it returns an error:
na.approx(tester)
# Error ----> need at least two non-NA values to interpolate
na.approx(tester, rule = 2)
# Error ----> need at least two non-NA values to interpolate
na.approx(tester, x = index(tester), na.rm = TRUE, maxgap = Inf)
Afterward I tried:
Lines <- "tester"
library(zoo)
z <- read.zoo(textConnection(Lines), index = 2)[,2]
na.approx(z)
Again I got the same multiple NA values error. I also tried:
Cz <- zoo(tester)
index(Cz) <- Cz[,1]
Cz_approx <- na.approx(Cz)
Same error.
I must be doing something really stupid, but I would really appreciate your help.

You may apply na.approx only on columns with at least two non-NA values. Here I use colSums on a boolean matrix to find relevant columns.
# create a small matrix
m <- matrix(data = c(NA, 1, 1, 1, 1,
NA, NA, 2, NA, NA,
NA, NA, NA, NA, 2,
NA, NA, NA, 2, 3),
ncol = 5, byrow = TRUE)
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 1 1 1 1
# [2,] NA NA 2 NA NA
# [3,] NA NA NA NA 2
# [4,] NA NA NA 2 3
library(zoo)
# na.approx on the entire matrix does not work
na.approx(m)
# Error in approx(x[!na], y[!na], xout, ...) :
# need at least two non-NA values to interpolate
# find columns with at least two non-NA values
idx <- colSums(!is.na(m)) > 1
idx
# [1] FALSE FALSE TRUE TRUE TRUE
# interpolate 'TRUE columns' only
m[ , idx] <- na.approx(m[ , idx])
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 1 1 1.000000 1.0
# [2,] NA NA 2 1.333333 1.5
# [3,] NA NA NA 1.666667 2.0
# [4,] NA NA NA 2.000000 3.0
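The same idea can be sketched in base R without zoo. This is only an illustration, not the method used above: the helper name interp_cols is hypothetical, and it calls stats::approx() column by column, skipping any column with fewer than two non-NA values. With approx()'s default rule = 1, leading and trailing NAs are left as NA.

```r
# Hypothetical helper: linear interpolation per column using base R only.
interp_cols <- function(m) {
  rows <- seq_len(nrow(m))
  for (j in seq_len(ncol(m))) {
    ok <- !is.na(m[, j])
    # approx() needs at least two known points, so skip sparse columns
    if (sum(ok) >= 2) {
      m[, j] <- approx(x = which(ok), y = m[ok, j], xout = rows)$y
    }
  }
  m
}

m <- matrix(data = c(NA,  1,  1,  1,  1,
                     NA, NA,  2, NA, NA,
                     NA, NA, NA, NA,  2,
                     NA, NA, NA,  2,  3),
            ncol = 5, byrow = TRUE)
res <- interp_cols(m)
res
```

Columns 1 and 2 are returned unchanged because they do not contain two known values to interpolate between.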

Related

Cbind get environment objects in R

I would like to cbind vectors of the same length using a vector of their names.
For example, from
a <- c(2, 5, NA, NA, 6, NA)
b <- c(NA, 1, 3, 4, NA, 8)
I would like to get the matrix produced by cbind(a, b):
a b
[1,] 2 NA
[2,] 5 1
[3,] NA 3
[4,] NA 4
[5,] 6 NA
[6,] NA 8
but by calling the variables through a vector of their object names, e.g. vectornames <- c("a","b").
My last attempt, cbind(for(i in vectornames) get(i)), failed.
You want to sapply/lapply the get function here. For example:
a <- c(2, 5, NA, NA, 6, NA)
b <- c(NA, 1, 3, 4, NA, 8)
nms <- c("a", "b")
# Apply get() to each name in the nmes vector
# Then convert the resulting matrix to a data frame
as.data.frame(sapply(nms, get))
a b
1 2 NA
2 5 1
3 NA 3
4 NA 4
5 6 NA
6 NA 8
Technically you can do this using cbind, but it's more awkward:
# Convert the vector of names to a list of vectors
# Then bind those vectors together as columns
do.call(cbind, lapply(nms, get))
We can use mget to get a list of the objects, then simplify that list to a matrix with sapply, using either function(x) x or `[` as the function:
sapply(mget(vectornames), \(x) x)
#OR
sapply(mget(vectornames), `[`)
a b
[1,] 2 NA
[2,] 5 1
[3,] NA 3
[4,] NA 4
[5,] 6 NA
[6,] NA 8
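As a small variation on the answers above, mget() can also feed cbind() directly through do.call(). Because mget() returns a named list, the column names should carry over. This is a sketch; passing envir = environment() is an assumption that the vectors live in the environment the call is made from.

```r
a <- c(2, 5, NA, NA, 6, NA)
b <- c(NA, 1, 3, 4, NA, 8)
vectornames <- c("a", "b")

# mget() looks up every name and returns a named list;
# do.call() then passes that list to cbind() as named arguments.
res <- do.call(cbind, mget(vectornames, envir = environment()))
res
```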

Shifting leading NA cells to the left, but NOT internal NAs in R

As a preface, I've noticed that there are at least 5 answers to a similar question, "How do I shift non-NA cells leftward." There are many good answers for this, and the zoo package has a na.locf() that does it well.
My particular issue is that I want to shift the cells to the right of the final leading NA leftward, replacing all leading NAs. This means that I need to retain "internal" and trailing NAs. Here's a small example with a matrix, but my real data is a large data.frame:
matrixtest <- matrix(4, ncol = 4, nrow = 4)
matrixtest[1, 1:3] <- NA
matrixtest[3, 1:2] <- NA
matrixtest[2, 3] <- NA
matrixtest[4, 2] <- NA
matrixtest
matrixresult <- matrix(4, ncol = 4, nrow = 4)
matrixresult[1, 2:4] <- NA
matrixresult[3, 3:4] <- NA
matrixresult[2, 3] <- NA
matrixresult[4, 2] <- NA
matrixresult
After manipulation, the original matrixtest should look like matrixresult, as shown here:
> matrixtest
[,1] [,2] [,3] [,4]
[1,] NA NA NA 4
[2,] 4 4 NA 4
[3,] NA NA 4 4
[4,] 4 NA 4 4
> matrixresult
[,1] [,2] [,3] [,4]
[1,] 4 NA NA NA
[2,] 4 4 NA 4
[3,] 4 4 NA NA
[4,] 4 NA 4 4
Apologies if I've missed an applicable answer already. I've spent too much time on this seemingly simple problem.
We create an index based on the last column and reverse those rows
i1 <- is.na(matrixresult[, ncol(matrixresult)])
matrixresult[i1, ] <- t(apply(matrixresult[i1,], 1, rev))
-output
matrixresult
# [,1] [,2] [,3] [,4]
#[1,] NA NA NA 4
#[2,] 4 4 NA 4
#[3,] NA NA 4 4
#[4,] 4 NA 4 4
1) shiftLeft removes the NAs from both ends of its input x using na.trim, giving y, and then overwrites the beginning of a vector of NAs with y. Apply it to every row and transpose the result, since apply returns the transpose of what we want.
library(zoo)
shiftLeft <- function(x, y = na.trim(x)) replace(NA * x, seq_along(y), y)
m <- t(apply(matrixtest, 1, shiftLeft))
# check
identical(m, matrixresult)
## [1] TRUE
2) An alternative that is a bit longer but only involves base R is:
shiftLeft2 <- function(x) {
ix <- which.max(!is.na(x))
replace(NA*x, seq_len(length(x) - ix + 1), x[ix:length(x)])
}
m2 <- t(apply(matrixtest, 1, shiftLeft2))
Note
We used the first matrix as input and the second to check against.
matrixtest <- structure(c(NA, 4, NA, 4, NA, 4, NA, NA, NA, NA, 4, 4, 4, 4,
4, 4), .Dim = c(4L, 4L))
matrixresult <- structure(c(4, 4, 4, 4, NA, 4, 4, NA, NA, NA, NA, 4, NA, 4, NA,
4), .Dim = c(4L, 4L))
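Since the question mentions that the real data is a data.frame, here is a hedged base R sketch of the same row-wise shift applied to one. The helper shift_left and the column names v1 to v4 are made up for illustration, and the sketch assumes all columns are numeric, because as.matrix() would coerce mixed types to character.

```r
# Hypothetical helper: drop leading NAs, pad the right side with NA.
shift_left <- function(x) {
  x <- unname(x)
  first <- which.max(!is.na(x))        # first non-NA position (1 if all NA)
  c(x[first:length(x)], rep(NA, first - 1))
}

df <- data.frame(v1 = c(NA, 4, NA, 4),
                 v2 = c(NA, 4, NA, NA),
                 v3 = c(NA, NA, 4, 4),
                 v4 = c(4, 4, 4, 4))

# apply() works row by row and returns the transpose, hence t()
shifted <- as.data.frame(t(apply(as.matrix(df), 1, shift_left)))
names(shifted) <- names(df)
shifted
```

Internal NAs (such as the one in row 2) stay where they are; only the leading run is removed.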

Delete only an entire row with NA, in R

I have something like this in my dataset and I only want to delete a row if it only has NA's, not if it has at least one value.
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 NA 4
[3,] 4 6 7
[4,] NA NA NA
[5,] 4 8 NA
In this example they were able to delete what I want, but when I try it in exactly the same way, it doesn't work.
I've already tried their example:
data[rowSums(is.na(data)) != ncol(data),]
But my number of rows doesn't change, unlike in this result:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 NA 4
[3,] 4 6 7
[4,] 4 8 NA
My NA's are not characters. If I ask for their class:
class(NA)
[1] "logical"
Do you know another way to ask for these, please?
______UPDATE_____
Maybe I explained it wrong.
My problem, and the reason why their code
mymat[rowSums(is.na(mymat)) != ncol(mymat), ]
is not working, is that I have 3 columns with information, but after that everything is NA, like this:
Date Product Code protein fat
2016-01-01 aaa 0001 NA NA
2016-01-01 bbb 0003 NA NA
2016-02-01 ccc 0032 NA NA
So the row is not entirely NA's, only after the 3rd column... But I want to remove the entire row (1:5).
Thank you!
First, I would coerce the matrix to a data frame, because this is the typical ("tidy") format to store variables and observations. Then you could use the remove_empty_rows() function from the sjmisc-package:
library(sjmisc)
df <- data.frame(
a = c(1, 1, 4, NA, 4),
b = c(2, NA, 6, NA, 8),
c = c(3, 4, 7, NA, NA)
)
# get row numbers of empty rows
empty_rows(df)
## [1] 4
# remove empty rows
remove_empty_rows(df)
## A tibble: 4 × 3
## a b c
## * <dbl> <dbl> <dbl>
## 1 1 2 3
## 2 1 NA 4
## 3 4 6 7
## 4 4 8 NA
There are also functions for columns: empty_cols() and remove_empty_cols().
If you just want to keep complete cases (rows), use complete.cases():
df[complete.cases(df), ]
## a b c
## 1 1 2 3
## 3 4 6 7
Check whether this works with the updated explanation. It subsets the data.frame so that the information columns are ignored when checking for NA. I added some additional rows that contain a mix of numbers and NA:
df1 <- data.frame(Date=c("2016-01-01", "2016-01-01", "2016-02-01", "2016-03-01", "2016-03-01"),
Product=c("aaa", "bbb", "ccc", "ddd", "eee"),
Code=c("0001", "0003", "0032", "0005", "0007"),
protein=c(NA, NA, NA, 5, NA),
fat=c(NA, NA, NA, NA, 4))
# place any columns you do not want to check for NA in names.info
names.info <- c("Date", "Product", "Code")
names.check <- setdiff(names(df1), names.info)
df1[rowSums(is.na(df1[, names.check])) != length(names.check), ]
Date Product Code protein fat
4 2016-03-01 ddd 0005 5 NA
5 2016-03-01 eee 0007 NA 4
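The subsetting above can be wrapped in a small reusable function. The name drop_all_na_rows is hypothetical, and this is a sketch rather than a tested utility. One detail it adds is drop = FALSE, so that rowSums() still receives a two-dimensional object when only a single column is checked.

```r
# Hypothetical helper: drop rows where every column in check_cols is NA.
drop_all_na_rows <- function(df, check_cols) {
  # drop = FALSE keeps a data.frame even if check_cols has length 1
  all_na <- rowSums(is.na(df[, check_cols, drop = FALSE])) == length(check_cols)
  df[!all_na, , drop = FALSE]
}

df1 <- data.frame(Code = c("0001", "0003", "0005", "0007"),
                  protein = c(NA, NA, 5, NA),
                  fat = c(NA, NA, NA, 4),
                  stringsAsFactors = FALSE)

res <- drop_all_na_rows(df1, c("protein", "fat"))  # keeps codes 0005 and 0007
one <- drop_all_na_rows(df1, "fat")                # keeps code 0007 only
```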
You need to delete the as.integer call. Take this matrix:
mymat <- matrix(c(1:3, NA, 4:6, NA, rep(NA, 4)), ncol = 3)
Which translates to
[,1] [,2] [,3]
[1,] 1 4 NA
[2,] 2 5 NA
[3,] 3 6 NA
[4,] NA NA NA
mymat[as.integer(rowSums(is.na(mymat)) != ncol(mymat)), ]
Gives you
[,1] [,2] [,3]
[1,] 1 4 NA
[2,] 1 4 NA
[3,] 1 4 NA
But you want
mymat[rowSums(is.na(mymat)) != ncol(mymat), ]
To get
[,1] [,2] [,3]
[1,] 1 4 NA
[2,] 2 5 NA
[3,] 3 6 NA
Cheers,
Marc

Setting matrix values comparing to vector in R

I want to set NA's in every element of a matrix where the value in a column is greater than or equal to the value of a given vector. For example, I can create a matrix:
set.seed(1)
zz <- matrix(data = round(10L * runif(12)), nrow = 4, ncol = 3)
which gives for zz:
[,1] [,2] [,3]
[1,] 8 5 7
[2,] 6 5 1
[3,] 5 10 3
[4,] 9 1 9
and for the comparison vector (for example):
xx <- round(10L * runif(4))
where xx is:
[1] 6 3 8 2
If I perform this operation:
apply(zz,2,function(x) x >= xx)
I get:
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE TRUE FALSE
[3,] FALSE TRUE FALSE
[4,] TRUE FALSE TRUE
What I want is everywhere I have a TRUE element I want an NA and everywhere I have a FALSE I get the number in the zz matrix (e.g., manually ...):
NA 5 NA
NA NA 1
5 NA 3
NA 1 NA
I can cobble together some "for" loops to do what I want, but is there a vector-based way to do this??
Thanks for any tips.
You could simply do:
zz[zz>=xx] <- NA
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
Here is one option to get the expected output. We get a logical matrix (zz >= xx), using NA^ on that returns NA for the TRUE values and 1 for the FALSE, then multiply it with original matrix 'zz' so that NA remains as such while the 1 changes to the corresponding value in 'zz'.
NA^(zz >= xx)*zz
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
Or another option is ifelse
ifelse(zz >= xx, NA, zz)
data
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)
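A note on why the direct comparison zz >= xx works here: R stores matrices column by column and recycles the shorter vector, so a vector whose length equals nrow(zz) is compared against each column in turn. The sketch below checks that against the apply() version from the question. It also shows sweep() for the different case of one threshold per column; the vector yy is a made-up example and not part of the question.

```r
zz <- matrix(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), nrow = 4)
xx <- c(6, 3, 8, 2)

# xx is recycled down each column because length(xx) == nrow(zz)
by_row <- zz >= xx
same <- all(by_row == apply(zz, 2, function(x) x >= xx))

masked <- zz
masked[by_row] <- NA

# Hypothetical per-column thresholds: compare column j against yy[j]
yy <- c(6, 5, 8)
by_col <- sweep(zz, 2, yy, FUN = ">=")
```

If the vector length matched neither the number of rows nor a divisor of the matrix length, the recycling would no longer line up with the columns, so sweep() is the safer choice whenever the intended margin is not obvious.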

Consecutive NAs in a column

I'd like to remove the rows that have 3 or more consecutive NAs in one column.
[,1] [,2]
[1,] 1 1
[2,] NA 1
[3,] 2 4
[4,] NA 3
[5,] 1 4
[6,] NA 8
[7,] NA 5
[8,] NA 6
so I'd have this data
[,1] [,2]
[1,] 1 1
[2,] NA 1
[3,] 2 4
[4,] NA 3
[5,] 1 4
I did some research and tried this code
data[! rowSums(is.na(data)) >3 , ]
but I think this only works for NAs within a row.
As mentioned, rle is a good place to start:
is.na.rle <- rle(is.na(data[, 1]))
Since NAs are "bad" only when they come by three or more, we can re-write the values:
is.na.rle$values <- is.na.rle$values & is.na.rle$lengths >= 3
Finally, use inverse.rle to build the vector of indices to filter:
data[!inverse.rle(is.na.rle), ]
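The three steps above can be traced on the data from the question. This is a sketch that rebuilds the matrix by hand, assuming the first column is the one being checked; the comments show what rle() holds at each stage.

```r
data <- cbind(c(1, NA, 2, NA, 1, NA, NA, NA),
              c(1, 1, 4, 3, 4, 8, 5, 6))

r <- rle(is.na(data[, 1]))
# r$lengths: 1 1 1 1 1 3
# r$values:  FALSE TRUE FALSE TRUE FALSE TRUE

# keep TRUE only for NA runs of length 3 or more
r$values <- r$values & r$lengths >= 3

# expand back to one logical flag per row
drop <- inverse.rle(r)
kept <- data[!drop, ]
kept
```

The single NAs in rows 2 and 4 survive because their runs have length 1; only the final run of three is flagged.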
You could use rle, or you could do this:
library(data.table)
d = data.table(a = c(1,NA,2,NA,3,4,NA,NA,NA), b = c(1:9))
d[d[, if(.N > 3) {.I[1]} else {.I}, by = cumsum(!is.na(a))]$V1]
# a b
#1: 1 1
#2: NA 2
#3: 2 3
#4: NA 4
#5: 3 5
#6: 4 6
Run d[, cumsum(!is.na(a))] to see why this works. Also, I could've used .SD instead of .I to get cleaner code, but opted for efficiency instead.
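To make the grouping trick visible without data.table, here is a rough base R sketch of the same logic, not the answer's actual implementation. cumsum(!is.na(a)) increases at every non-NA value, so each group consists of one non-NA value followed by the NAs that come after it. A group of size greater than 3 therefore contains at least 3 consecutive NAs, and only its first element is kept.

```r
a <- c(1, NA, 2, NA, 3, 4, NA, NA, NA)
b <- 1:9

grp   <- cumsum(!is.na(a))                       # 1 1 2 2 3 4 4 4 4
size  <- ave(seq_along(a), grp, FUN = length)    # group size repeated per element
first <- !duplicated(grp)                        # first element of each group

# keep whole groups of size <= 3, otherwise only the leading non-NA value
keep <- size <= 3 | first
result <- data.frame(a = a[keep], b = b[keep])
result
```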
As @DirkEddelbuettel suggested, the rle() function will help. You can create your own function to identify the elements of a vector with 3 or more consecutive NA values.
consecna <- function(x, n=3) {
# function to identify elements with n or more consecutive NA values
y <- rle(is.na(x))
y$values <- y$lengths > (n - 0.5) & y$values
inverse.rle(y)
}
Then you can apply this function to each column of your matrix.
# example matrix of data
m <- matrix(c(1, NA, 2, NA, 1, NA, NA, NA, 1, 1, 4, 3, 4, 8, 5, 6), ncol=2)
# index matrix identifying elements with 3 or more consecutive NA values
mindex <- apply(m, 2, consecna)
Then use the created index matrix to get rid of all those rows that were identified.
# removal of all the identified rows
m2 <- m[!apply(mindex, 1, any), ]

Resources