How to show indexes of NAs? - r

I have code that displays the NAs, but I can't figure out how to get their positions.
try(na.fail(x))
> Error in na.fail.default(x) : missing values in object
# display NAs
x[is.na(x)]
# returns
NA NA NA NA
The only thing I get from this is the length of the NA vector, which is not very helpful when the NAs were caused by a bug in my code that I am trying to track down. How can I get the index of the NA element(s)?
I also tried:
subset(x,is.na(x))
which has the same effect.
EDIT:
y <- complete.cases(x)
x[!y]
# just returns another
NA NA NA NA

You want the which function:
which(is.na(arr))
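As a minimal sketch of what this returns, here is which() applied to a made-up vector and a made-up matrix (x and m are illustrative names, not objects from the question). For a matrix, the arr.ind = TRUE argument of which() gives row/column coordinates rather than a single linear index.

```r
# Hypothetical example vector with missing values at positions 2 and 4
x <- c(10, NA, 30, NA, 50)

# Positions of the NA elements
na_idx <- which(is.na(x))
print(na_idx)

# For a matrix, arr.ind = TRUE gives row/column coordinates instead
m <- matrix(c(1, NA, 3, 4, NA, 6), nrow = 2)
na_pos <- which(is.na(m), arr.ind = TRUE)
print(na_pos)
```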

is.na() will return a boolean index of the same shape as the original data frame.
In other words, any cells in that m x n index with the value TRUE correspond to NA values in the original data frame.
You can then use this to change the NAs, if you wish:
DF[is.na(DF)] = 999
To get the total number of data rows with at least one NA:
cc = complete.cases(DF)
num_missing = nrow(DF) - sum(cc)
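A small sketch of the same idea on a made-up data frame (the contents of DF here are an assumption for illustration). It also shows how to get the row numbers of the incomplete rows, which is closer to what the question asks for.

```r
# Hypothetical data frame: rows 2 and 3 each contain at least one NA
DF <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))

cc <- complete.cases(DF)      # TRUE for rows without any NA
rows_with_na <- which(!cc)    # row numbers that contain an NA
num_missing <- sum(!cc)       # same as nrow(DF) - sum(cc)

print(rows_with_na)
print(num_missing)
```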

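which(Dataset$variable == "") returns the row numbers at which a particular column holds an empty string. Note that this finds blanks, not NA values; for those, use which(is.na(Dataset$variable)).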

R Code using loop and condition :
# Testing for missing values
is.na(x) # returns TRUE if x is missing
y <- c(1,NA,3,NA)
is.na(y)
# returns a logical vector (FALSE TRUE FALSE TRUE)
# Print the index of NA values
for (i in seq_along(y)) {
  if (is.na(y[i])) {
    cat(i, ' ')
  }
}
Output is :
2  4
Also :
which(is.na(y))

Related

doing for loop in R

I have filtered my SNPs for LD and stored the retained IDs in a file (my.filtered.snp.id in the example below). I want to keep only these SNPs in my genotype matrix (geno_snp). I am trying to write a for loop in R and would appreciate any help fixing my code. I want to keep the rows of the genotype matrix (the whole line, including snp.id and genotype information) whose snp.id matches an entry in my.filtered.snp.id, and delete the rows that do not match.
head(my.filtered.snp.id)
Chr10_31458
Chr10_31524
Chr10_45901
Chr10_102754
Chr10_102828
Chr10_103480
head (geno_snp)
XRQChr10_103805 NA NA NA 0 NA 0 NA NA NA NA NA 0 0
XRQChr10_103937 NA NA NA 0 NA 1 NA NA NA NA NA 0 2
XRQChr10_103990 NA NA NA 0 NA 0 NA NA NA NA NA 0 NA
I am trying something like this:
for (i in seq_len(nrow(geno_snp))) {
  for (j in seq_along(my.filtered.snp.id)) {
    if (geno_snp[i, 1] == my.filtered.snp.id[j]) {
      print(geno_snp[i, ])  # the whole line in geno_snp
    } else {
      # remove the line
    }
  }
}
If I understood it correctly, you want a subset of your data.frame geno_snp in which the row names must match the selected SNP IDs from the vector my.filtered.snp.id.
Please check if this solution works for you:
index <- unlist(sapply(row.names(geno_snp), function(x) grep(pattern = x, x = my.filtered.snp.id)))
selected_subset <- geno_snp[index,]
What I did was create an index addressing the rows whose names match any value in my.filtered.snp.id. Then I used the index to subset the data frame. Since applying grep with sapply returns a list, I used unlist to obtain the result as a vector.
EDIT:
I noticed you had some row.names that weren't an exact match with your original my.filtered.snp.id values. In this case, maybe what you wanna do is:
index <- unlist(sapply(my.filtered.snp.id, function(x) grep(pattern = x, x = row.names(geno_snp))))
selected_subset <- geno_snp[index,]
The thing is that you have row.names beginning with XRQ, so in this last case the code uses the reference values from my.filtered.snp.id to detect matches in row.names(geno_snp), even when the XRQ string is at the beginning.
Finally, in the case I have misunderstood your data and what I'm calling row names here are, in fact, data in a column (the SNP IDs), just use geno_snp[,1] instead of row.names(geno_snp) in both codes above.
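Because grep does pattern matching, an ID such as Chr10_3145 would also match Chr10_31458. A minimal alternative sketch, assuming the only difference between the two sets of IDs is a leading "XRQ" prefix (an assumption based on the sample rows shown, and using made-up stand-in data), is to strip the prefix and match exactly with %in%:

```r
# Hypothetical stand-ins for the objects in the question
my.filtered.snp.id <- c("Chr10_103805", "Chr10_103990", "Chr10_31458")
geno_snp <- data.frame(
  g1 = c(NA, 0, 1),
  g2 = c(0, 1, NA),
  row.names = c("XRQChr10_103805", "XRQChr10_103937", "XRQChr10_103990")
)

# Strip the assumed "XRQ" prefix, then keep rows whose ID is in the filtered set
ids <- sub("^XRQ", "", row.names(geno_snp))
selected_subset <- geno_snp[ids %in% my.filtered.snp.id, , drop = FALSE]
print(selected_subset)
```

The drop = FALSE argument keeps the result a data frame even if only one row or column survives the filter.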

Function to change blanks to NA

I'm trying to write a function that turns empty strings into NA. A summary of one of my columns looks like this:
      a   b
 12 210 468
I'd like to change the 12 empty values to NA. I also have a few other factor columns for which I'd like to change empty values to NA, so I borrowed some stuff from here and there to come up with this:
# change nulls to NAs
nullToNA <- function(df){
# split df into numeric & non-numeric columns
a<-df[,sapply(df, is.numeric), drop = FALSE]
b<-df[,sapply(df, Negate(is.numeric)), drop = FALSE]
# Change empty strings to NA
b<-b[lapply(b,function(x) levels(x) <- c(levels(x), NA) ),] # add NA level
b<-b[lapply(b,function(x) x[x=="",]<- NA),] # change Null to NA
# Put the columns back together
d<-cbind(a,b)
d[, names(df)]
}
However, I'm getting this error:
> foo<-nullToNA(bar)
Error in x[x == "", ] <- NA : incorrect number of subscripts on matrix
Called from: FUN(X[[i]], ...)
I have tried the answer found here: Replace all 0 values to NA but it changes all my columns to numeric values.
You can directly index fields that match a logical criterion. So you can just write:
df[is_empty(df)] = NA
Where is_empty is your comparison, e.g. df == "":
df[df == ""] = NA
But note that is.null(df) won’t work, and would be weird anyway [1]. I would advise against merging the logic for columns of different types, though! Instead, handle them separately.
[1] You’ll almost never encounter NULL inside a table, since that only works if the underlying vector is a list. You can create matrices and data.frames with this constraint, but then is.null(df) will never be TRUE, because the NULL values are wrapped inside the list.
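A minimal sketch of the df == "" approach on a made-up data frame (the columns n, s and f are assumptions for illustration). It suggests that a numeric column keeps its type, and that a factor column may still carry an unused "" level afterwards, which droplevels() removes.

```r
# Hypothetical data frame with empty strings in a character and a factor column
df <- data.frame(
  n = c(1, 2, 3),
  s = c("a", "", "b"),
  f = factor(c("", "x", "y")),
  stringsAsFactors = FALSE
)

df[df == ""] <- NA        # logical matrix index; the numeric column is untouched
df$f <- droplevels(df$f)  # drop the now-unused "" level from the factor

str(df)
```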
This worked for me
df[df == 'NULL'] <- NA
How about just:
df[apply(df, 2, function(x) x=="")] = NA
Works fine for me, at least on simple examples.
This is the function I used to solve this issue.
null_na <- function(vector) {
  new_vector <- rep(NA, length(vector))
  for (i in seq_along(vector)) {
    # test is.na() first: comparing NA with "" yields NA and would make if() fail
    if (!is.na(vector[i]) && vector[i] != "") {
      new_vector[i] <- vector[i]
    }
  }
  return(new_vector)
}
Just plug in the column or vector you are having an issue with.

Finding the unique identifier for a row with NA in a particular column in R

I have data in the following format:
ID Species Side_of_boat
1 spA Port
2 spB Starboard
3 spA NA
I would like to write a line of code that gives me the unique ID for all rows that have NA in 'side of boat'.
I have tried:
unique(df$ID[df$side_of_boat == "NA"])
But it doesn't give me the output I want. I would like the output to be:
"3"
Thanks!
Try
unique(df$ID[is.na(df$Side_of_boat)])
instead. NA is a special value in R and also has its own special function is.na() to test if an entry is NA. Check ?NA for more information.
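To make the difference concrete, here is a short sketch that rebuilds the three sample rows from the question and compares the two tests:

```r
# The sample data from the question
df <- data.frame(
  ID = c(1, 2, 3),
  Species = c("spA", "spB", "spA"),
  Side_of_boat = c("Port", "Starboard", NA)
)

# Comparing with the string "NA" yields NA for the missing entry, not TRUE
print(df$Side_of_boat == "NA")

# is.na() tests for missingness directly
missing_ids <- unique(df$ID[is.na(df$Side_of_boat)])
print(missing_ids)
```

Note also that column names are case-sensitive: df$side_of_boat refers to a column that does not exist in this data and returns NULL.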
#Method 1
n <- which(is.na(df$Side_of_boat))
You can also use *apply over the rows of the whole data frame, e.g. to list the columns that are NA in each row:
lapply(apply(df, 1, function(x) which(is.na(x))), paste, collapse=", ")
#Method 2
new_DF <- subset(df, is.na(Side_of_boat))
#Method 3
You could also write a function to do this for you:
getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )
#Note
In case you have NA character values, first run
df$Side_of_boat[df$Side_of_boat=='NA'] <- NA
Try:
df$ID[which(is.na(df$Side_of_boat))]
It should give you a vector of the IDs, regardless of whether they are numbers or characters.

Create new column with binary data based on several columns

I have a dataframe in which I want to create a new column with 0/1 (which would represent absence/presence of a species) based on the records in previous columns. I've been trying this:
update_cat$bobpresent <- NA #creating the new column
x <- c("update_cat$bob1999", "update_cat$bob2000", "update_cat$bob2001","update_cat$bob2002", "update_cat$bob2003", "update_cat$bob2004", "update_cat$bob2005", "update_cat$bob2006","update_cat$bob2007", "update_cat$bob2008", "update_cat$bob2009") #these are the names of the columns I want the new column to base its results on
bobpresent <- function(x){
if(x==NA)
return(0)
else
return(1)
} # if all the previous columns are NA then the new column should be 0, otherwise it should be 1
update_cat$bobpresence <- sapply(update_cat$bobpresent, bobpresent) #apply the function to the new column
Everything goes fine until the last line, where I get this error:
Error in if (x == NA) return(0) else return(1) :
missing value where TRUE/FALSE needed
Can somebody please advise me?
Your help will be much appreciated.
Comparisons involving NA yield NA, so x == NA always evaluates to NA. If you want to check if a value is NA, you must use the is.na function, for example:
> NA == NA
[1] NA
> is.na(NA)
[1] TRUE
The if statement inside your function needs a condition that is TRUE or FALSE, but x == NA gives it NA instead, hence the error message. You can fix that by rewriting your function like this:
bobpresent <- function(x) { ifelse(is.na(x), 0, 1) }
In any case, based on your original post I don't understand what you're trying to do. This change only fixes the error you get with sapply, but fixing the logic of your program is a different matter, and there is not enough information in your post.
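If the goal is what the comment in the question describes (0 when all the yearly columns are NA, 1 otherwise), one possible sketch is to count the non-missing entries per row. The data below is made up, only three of the yearly columns are included, and year_cols and has_record are illustrative names; this is an interpretation of the question, not a confirmed requirement.

```r
# Hypothetical stand-in for update_cat with three of the yearly columns
update_cat <- data.frame(
  bob1999 = c(NA, 2, NA),
  bob2000 = c(NA, NA, NA),
  bob2001 = c(NA, 1, 5)
)
year_cols <- c("bob1999", "bob2000", "bob2001")

# Count the non-missing entries per row; presence means at least one
has_record <- rowSums(!is.na(update_cat[, year_cols])) > 0
update_cat$bobpresent <- as.integer(has_record)

print(update_cat)
```

Note that the column names are used as plain strings to index the data frame, rather than strings like "update_cat$bob1999", which R would not evaluate as column references.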

Filling a 3D array in R: How to avoid coercion to list?

I have preallocated a 3D array and am trying to fill it with data. However, whenever I do this with a previously defined data.frame column, the array gets mysteriously converted to a list, which messes up everything. Converting the data.frame column to a vector does not help.
Example:
exampleArray <- array(dim=c(3,4,6))
exampleArray[2,3,] <- c(1:6) # direct filling works perfectly
exampleArray
str(exampleArray) # output as expected
Problem:
exampleArray <- array(dim=c(3,4,6))
exampleContent <- as.vector(as.data.frame(c(1:6)))
exampleArray[2,3,] <- exampleContent # filling array from a data.frame column
# no errors or warnings
exampleArray
str(exampleArray) # list-like output!
Is there any way I can get around this and fill up my array normally?
Thanks for your suggestions!
Try this:
exampleArray <- array(dim=c(3,4,6))
exampleContent <- as.data.frame(c(1:6))
> exampleContent[,1]
[1] 1 2 3 4 5 6
exampleArray[2,3,] <- exampleContent[,1] # take the desired column
# no errors or warnings
str(exampleArray)
int [1:3, 1:4, 1:6] NA NA NA NA NA NA NA 1 NA NA ...
You were trying to insert a data frame into an array, which won't work. You should use dataframe$column or dataframe[,1] instead.
Also, as.vector doesn't do anything in as.vector(as.data.frame(c(1:6))). You were probably after as.numeric(as.data.frame(c(1:6))), although that doesn't work either:
as.numeric(as.data.frame(c(1:6)))
Error: (list) object cannot be coerced to type 'double'
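The underlying reason is that a data frame is a list of columns, so assigning it into the array coerces the array to a list. As a minimal sketch, unlist() is another way to get a plain vector out of a one-column data frame before filling the array (values is an illustrative name):

```r
exampleArray <- array(dim = c(3, 4, 6))
exampleContent <- as.data.frame(c(1:6))

print(is.list(exampleContent))   # a data frame is a list of columns
values <- unlist(exampleContent, use.names = FALSE)  # plain integer vector

exampleArray[2, 3, ] <- values
print(is.list(exampleArray))     # the array stays an atomic array
str(exampleArray)
```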
