So I've got a column in a dataframe with 237 different pulses, and from those I need to take the pulses that are over 100 or less than 45, and see how many of them there are. I know that I can get the length of that with
length(survey$Pulse[survey$Pulse > 100 | survey$Pulse < 45])
However there are NA values in the column, and I have no idea how to remove those from the length.
If you need more info I'll try to provide it, but the only thing I don't know how to do is removing the NA values from the column.
I know I could use na.rm=TRUE, but I have no idea how to work it into that line.
One option is to use na.omit - it returns the object with NA values removed.
For example:
# With na.omit
length(na.omit(c(1:10, NA)))
# [1] 10

# Without na.omit
length(c(1:10, NA))
# [1] 11
In your case use:
length(na.omit(survey$Pulse[survey$Pulse > 100 | survey$Pulse < 45]))
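If you'd rather use na.rm=TRUE as you mentioned, note that a logical comparison can be summed directly, since each TRUE counts as 1. A small sketch with stand-in data (your real column is survey$Pulse):

```r
# Stand-in for survey$Pulse; the real column also contains NAs
pulse <- c(110, 40, 70, NA, 120)

# Each TRUE contributes 1 to the sum; na.rm = TRUE drops the NA comparisons
sum(pulse > 100 | pulse < 45, na.rm = TRUE)
# [1] 3
```

With your data that would be sum(survey$Pulse > 100 | survey$Pulse < 45, na.rm = TRUE), which counts the matching pulses without building an intermediate subset.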
Another way is to wrap which around the logical condition. When NA values are present, the logical condition alone is not enough. I'll give an example with fake data.
x <- c(1:3, NA, 4, NA, 5:7, NA, 8:10)
x[x < 4 | x > 7]
#[1] 1 2 3 NA NA NA 8 9 10
x[which(x < 4 | x > 7)]
#[1] 1 2 3 8 9 10
And the length is obviously different.
Related
I have some problems with NA values: my columns imported from Excel don't all have the same number of entries, so R fills the gaps with NA. Those NA values cause whole rows to be deleted when I calculate the similarity index with the Psicalc function in the RInSp package.
B F
4 7
5 6
6 8
7 5
NA 4
NA 3
NA 2
Do you know how to handle the NA values, or remove them without deleting whole rows or affecting the package? Besides, when I use import.RinSP it gives this message:
In if (class(filename) == "character") { :
the condition has length > 1 and only the first element will be used
Thank you so much
Many R functions (specifically in base R) have an na.rm argument, which is FALSE by default. That means that if you omit this argument and your data has NA values, your calculation will result in NA. To remove the NA values from the calculation, include the na.rm argument and set it to TRUE.
Example:
x <- c(4, 5, 6, 7, NA, NA)
mean(x) # Oops!
# [1] NA
mean(x, na.rm = TRUE)
# [1] 5.5
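The same pattern applies to other base summary functions, for example (a quick sketch):

```r
x <- c(4, 5, 6, 7, NA, NA)
sum(x, na.rm = TRUE)   # NAs dropped before summing
# [1] 22
max(x, na.rm = TRUE)   # and before taking the maximum
# [1] 7
```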
I have some data that I am looking at in R. One particular column, titled "Height", contains a few rows of NA.
I am looking to subset my data-frame so that all Heights above a certain value are excluded from my analysis.
df2 <- subset(df1, Height < 40)
However whenever I do this, R automatically removes all rows that contain NA values for Height. I do not want this. I have tried including arguments for na.rm
f1 <- function(x, na.rm = FALSE) {
  df2 <- subset(x, Height < 40)
}
f1(df1, na.rm = FALSE)
but this does not seem to do anything; the rows with NA still end up disappearing from my data-frame. Is there a way of subsetting my data as such, without losing the NA rows?
If we decide to use the subset function, then we need to watch out; from ?subset:
For ordinary vectors, the result is simply ‘x[subset & !is.na(subset)]’.
So only non-NA values will be retained.
If you want to keep the NA cases, use a logical "or" condition to tell R not to drop them:
subset(df1, Height < 40 | is.na(Height))
# or `df1[df1$Height < 40 | is.na(df1$Height), ]`
Don't use directly (to be explained soon):
df2 <- df1[df1$Height < 40, ]
Example
df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)
subset(df1, Height < 40 | is.na(Height))
# Height y
#1 NA 1
#2 2 2
#3 4 3
#4 NA 4
df1[df1$Height < 40, ]
# Height y
#1 NA NA
#2 2 2
#3 4 3
#4 NA NA
The reason the latter fails is that indexing by NA gives NA. Consider this simple example with a vector:
x <- 1:4
ind <- c(NA, TRUE, NA, FALSE)
x[ind]
# [1] NA 2 NA
We need to somehow replace those NA with TRUE. The most straightforward way is to add another "or" condition is.na(ind):
x[ind | is.na(ind)]
# [1] 1 2 3
This is exactly what happens in your situation. If your Height contains NA, then the logical operation Height < 40 ends up as a mix of TRUE / FALSE / NA, so we need to replace the NA with TRUE as above.
You could also do:
df2 <- df1[(df1$Height < 40 | is.na(df1$Height)),]
For subsetting by character/factor variables, you can use %in% to keep NAs. Specify the data you wish to exclude.
# Create Dataset
library(data.table)
df=data.table(V1=c('Surface','Bottom',NA),V2=1:3)
df
# V1 V2
# 1: Surface 1
# 2: Bottom 2
# 3: <NA> 3
# Keep all but 'Bottom'
df[!V1 %in% c('Bottom')]
# V1 V2
# 1: Surface 1
# 2: <NA> 3
This works because %in% never returns an NA (see ?match).
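To see the difference, compare ==, which propagates NA, with %in% on the same vector (a small sketch):

```r
v <- c("Surface", "Bottom", NA)

v == "Bottom"     # comparison with NA gives NA
# [1] FALSE  TRUE    NA
v %in% "Bottom"   # %in% treats NA as "no match" and never returns NA
# [1] FALSE  TRUE FALSE
```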
I have a vector:
vec <- c(2,3,5,5,5,5,6,1,9,4,4,4)
I want to check if a particular value is repeated consecutively and if yes, keep the first two values and assign NA to the rest of the values.
For example, in the above vector, 5 is repeated four times, so I will keep the first two 5's and set the last two to NA.
Similarly, 4 is repeated three times, so I will keep the first two 4's and set the third one to NA.
In the end my vector should look like:
2,3,5,5,NA,NA,6,1,9,4,4,NA
I did this:
bad.values <- vec - binhf::shift(vec, 1, dir="right")
bad.repeat <- bad.values == 0
vec[bad.repeat] <- NA
[1] 2 3 5 NA NA NA 6 1 9 4 NA NA
I can only get it to keep the first 5 and the first 4 (rather than the first two 5's and the first two 4's).
Any solutions?
Another option with just base R functions:
rl <- rle(vec)
i <- unlist(lapply(rl$lengths, function(l) if (l > 2) c(FALSE,FALSE,rep(TRUE, l - 2)) else rep(FALSE, l)))
vec * NA^i
which gives:
[1] 2 3 5 5 NA NA 6 1 9 4 4 NA
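The NA^i trick works because NA^FALSE is NA^0, which is 1, while NA^TRUE is NA. A perhaps more readable variant of the same rle idea (a sketch) marks positions within each run using sequence():

```r
vec <- c(2, 3, 5, 5, 5, 5, 6, 1, 9, 4, 4, 4)
rl <- rle(vec)

# sequence(rl$lengths) numbers the elements 1, 2, ... within each run,
# so "> 2" marks every element past the second of its run
vec[sequence(rl$lengths) > 2] <- NA
vec
# [1]  2  3  5  5 NA NA  6  1  9  4  4 NA
```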
I figured it out. I just had to change the argument to 2 in binhf::shift
vec <- c(2,3,5,5,5,5,6,1,9,4,4,4)
bad.values <- vec - binhf::shift(vec, 2, dir="right")
bad.repeat <- bad.values == 0
vec[bad.repeat] <- NA
[1] 2 3 5 5 NA NA 6 1 9 4 4 NA
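One caveat worth noting: comparing only against the value two positions back can falsely flag an element that happens to equal the value two before it without being part of a run. A small sketch, shifting by hand with head() so it runs without binhf:

```r
vec2 <- c(5, 6, 5, 7)                 # the second 5 is NOT a consecutive repeat
shifted <- c(NA, NA, head(vec2, -2))  # same as binhf::shift(vec2, 2, dir = "right")
vec2[which(vec2 - shifted == 0)] <- NA
vec2
# [1]  5  6 NA  7
```

The rle-based answer above does not have this problem, since it only looks at genuine runs.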
I think this might work, if I got your problem right:
vec <- c(2,3,5,5,5,5,6,1,9,4,4,4)
diffs1 <- vec - binhf::shift(vec, 1, dir = "right")
diffs2 <- vec - binhf::shift(vec, 2, dir = "right")
get_zeros <- abs(diffs1) + abs(diffs2)
vec[which(get_zeros == 0)] <- NA
I hope this helps!
This question may refer to a problem you encountered in a dataframe rather than a vector. In any case, here's a tidyverse solution to both:
library(dplyr)

tibble(x = vec) %>%
  group_by(x) %>%
  mutate(mycol = ifelse(row_number() > 2, NA, x)) %>%
  pull(mycol)
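One caveat about this approach: group_by(x) groups by value, not by run, so if the same value reappears in a later run, all its occurrences count toward one group and later ones can be wrongly set to NA. A run-aware variant (a sketch, assuming dplyr >= 1.1.0 for consecutive_id()):

```r
library(dplyr)

vec <- c(2, 3, 5, 5, 5, 5, 6, 1, 9, 4, 4, 5, 5)  # 5 occurs in two separate runs
tibble(x = vec) %>%
  group_by(run = consecutive_id(x)) %>%  # new group each time the value changes
  mutate(mycol = ifelse(row_number() > 2, NA, x)) %>%
  pull(mycol)
# [1]  2  3  5  5 NA NA  6  1  9  4  4  5  5
```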
I'm pretty new to R so the answer might be obvious, but so far I have only found answers to similar problems that don't match, or which I can't translate to mine.
The Requirement:
I have two vectors of the same length which contain numeric values as well as NA-values which might look like:
[1] 12 8 11 9 NA NA NA
[1] NA 7 NA 10 NA 11 9
What I need now is two vectors that only contain those values that are not NA in both original vectors, so in this case the result should look like this:
[1] 8 9
[1] 7 10
I was thinking about simply going through the vectors in a loop, but the dataset is quite large, so I would appreciate a faster solution. I hope someone can help me with that...
You are looking for complete.cases. But you should put your vectors in a data.frame:
dat <- data.frame(x = c(12, 8, 11, 9, NA, NA, NA),
                  y = c(NA, 7, NA, 10, NA, 11, 9))
dat[complete.cases(dat), ]
#   x  y
# 2 8  7
# 4 9 10
Try this:
#dummy vector
a <- c(12,8,11,9,NA,NA,NA)
b <- c(NA,7,NA,10,NA,11,9)
#result
a[!is.na(a) & !is.na(b)]
b[!is.na(a) & !is.na(b)]
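The repeated condition can also be computed once and reused, a minor variant of the same idea:

```r
a <- c(12, 8, 11, 9, NA, NA, NA)
b <- c(NA, 7, NA, 10, NA, 11, 9)

keep <- !is.na(a) & !is.na(b)  # positions where BOTH vectors have a value
a[keep]
# [1] 8 9
b[keep]
# [1]  7 10
```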
Something plus NA in R is generally NA. So, using that piece of information, you can simply do:
cbind(a, b)[!is.na(a + b), ]
# a b
# [1,] 8 7
# [2,] 9 10
More generally, you could write a function like the following to easily accept any number of vectors:
myFun <- function(...) {
  myList <- list(...)
  Names <- sapply(substitute(list(...)), deparse)[-1]
  out <- do.call(cbind, myList)[!is.na(Reduce("+", myList)), ]
  colnames(out) <- Names
  out
}
With that function, the usage would be:
myFun(a, b)
# a b
# [1,] 8 7
# [2,] 9 10
In my timings, this is by far the fastest option here, but that's only important if you are able to detect differences down to the microseconds or if your vector lengths are in the millions, so I won't bother posting benchmarks.