I have a dataframe in which I want to identify columns that have NA values.
Only about 10% of the columns have NA values, so I only want to list out the columns where the number of NAs is greater than 0.
The line below brings back all columns. Is there a way I can filter out the columns that don't have NA values?
colSums(is.na(df))
In pandas (Python), to list the columns with at least one null value you can use:
df.columns[df.isna().any()].tolist()
To list the columns in which every value is null you can use:
df.columns[df.isna().all()].tolist()
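Since the question itself uses R's colSums(is.na(df)), a rough R counterpart of the two snippets above might look like the following. This is only a sketch: the data frame and its column names are made up for illustration.

```r
# Hypothetical example data: only columns b and d contain NA values.
df <- data.frame(
  a = 1:3,
  b = c(1, NA, 3),
  c = c("x", "y", "z"),
  d = c(NA, NA, 2)
)

# Number of NA values per column (a named numeric vector).
na_counts <- colSums(is.na(df))

# Keep only the counts that are greater than 0.
na_counts[na_counts > 0]

# Names of columns with at least one NA.
cols_with_na <- names(df)[na_counts > 0]

# Names of columns in which every value is NA.
cols_all_na <- names(df)[na_counts == nrow(df)]
```

Filtering the named vector returned by colSums() keeps both the column names and their NA counts, which is usually what you want when only a few columns are affected.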
Related
I have a huge dataset of about 1.6 million rows, and the variable (column) I need to focus on is 'temperature'. The temperature column has many NA values, and the other columns have NA values throughout as well. I want to remove only the rows with NA values in the temperature column; I don't particularly care about the NA values in the other columns. How can I do this? If I end up needing to remove rows with NA values in more than just the temperature column (e.g. the depth column), how can I select two columns? This is my code:
otn <- tidync(filename, row.names=TRUE) %>% activate('D0')
glider_table <- hyper_tibble(otn)
attach(glider_table)
summary(temperature)
na.omit(glider_table)
na.omit() removes all rows with NA values regardless of which column they're in, so I need something more selective.
You can use the drop_na() function from the tidyr package. The first argument is the dataset, and the remaining arguments are optional: they name the specific columns to check, so only rows with NA in those columns are removed.
Like this: drop_na(dataset, column)
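Applied to the question above, that could look roughly like the sketch below. The small glider_table here is a made-up stand-in for the real data, and the salinity column is hypothetical; the base R lines at the end are an alternative that does not need tidyr.

```r
library(tidyr)

# Hypothetical stand-in for the real glider data.
glider_table <- data.frame(
  temperature = c(10.2, NA, 11.5, 9.8),
  depth       = c(5, 10, NA, 20),
  salinity    = c(NA, 35, 34, 36)
)

# Drop rows where temperature is NA; NAs in other columns are kept.
temp_only <- drop_na(glider_table, temperature)

# Drop rows where temperature OR depth is NA.
temp_depth <- drop_na(glider_table, temperature, depth)

# Base R alternatives that do not need tidyr.
temp_only_base  <- glider_table[!is.na(glider_table$temperature), ]
temp_depth_base <- glider_table[
  complete.cases(glider_table[, c("temperature", "depth")]),
]
```

Note that drop_na() returns a new data frame rather than modifying glider_table in place, so the result has to be assigned to a name.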
Suppose I have a data frame with 6 columns.
How do I replace all the NA values in the first 4 columns with a 0?
I have tried:
grades[is.na(grades), 1:4] = 0
Here is.na is applied to the full dataset, so it returns a logical matrix with the same dimensions as the original data, which cannot be used as a row index. It is better to subset the first four columns, apply is.na to that subset to get a logical matrix, and then use that matrix to index the same subset and assign 0 wherever it is TRUE.
grades[1:4][is.na(grades[1:4])] <- 0
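To see the effect, here is a small sketch with a made-up six-column grades data frame (the column names q1 to q6 are assumptions for illustration):

```r
# Hypothetical six-column data frame with NAs scattered throughout.
grades <- data.frame(
  q1 = c(90, NA),
  q2 = c(NA, 85),
  q3 = c(70, 75),
  q4 = c(NA, NA),
  q5 = c(NA, 60),
  q6 = c(88, NA)
)

# Replace NA with 0 in the first four columns only.
grades[1:4][is.na(grades[1:4])] <- 0

# Columns q5 and q6 still contain their original NA values.
grades
```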
I have a dataframe as follows:
base<-matrix(1:20,nrow=10)
base1<-matrix(rnorm(180),nrow=10)
base2<-cbind(base,base1)
What I need is to change part of each row to NA, based on the first 2 columns (the numbers in those columns show which columns of that row need to be changed to NA). So the first row would be something like this:
base2[1,1:11]<-NA
base2[1,]
This works for 1 row, but my real dataframe has over 100,000 rows. Any idea how to do this fast?
thanks!
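One possible vectorized approach, offered as a sketch rather than a definitive answer: build a logical mask by comparing each cell's column index with the start and end values stored in the first two columns, then assign NA through the mask in one step. It assumes, as in the example, that column 1 holds the first column to blank and column 2 the last. Note that base2 as built above is a matrix, not a data frame.

```r
set.seed(1)  # only so the example is reproducible
base  <- matrix(1:20, nrow = 10)
base1 <- matrix(rnorm(180), nrow = 10)
base2 <- cbind(base, base1)

# Read the bounds first, because columns 1 and 2 may be overwritten below.
from <- base2[, 1]  # first column to set to NA, per row
to   <- base2[, 2]  # last column to set to NA, per row

# col(base2) holds each cell's column index; the length-nrow vectors
# 'from' and 'to' are recycled down the rows, so row i is compared
# with from[i] and to[i].
idx  <- col(base2)
mask <- idx >= from & idx <= to

# Assign NA to every masked cell in a single step.
base2[mask] <- NA
base2[1, ]
```

Because the whole operation is a couple of matrix comparisons, it avoids looping over rows, which matters with 100,000 or more of them.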
I have a dataset and the row which I need has an NA value.
If I use na.omit, the rows will be omitted; however, I need the row. Hence I need to predict a value in place of the NA.
How do I proceed?
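One common way to proceed, shown here only as a minimal sketch: fit a model on the rows where the value is known and use it to predict the missing one. The data frame, its columns x and y, and the choice of a linear model are all assumptions made for illustration; the right model depends on the actual data.

```r
# Hypothetical data: y has one missing value, x is fully observed.
dat <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, NA, 8, 10)
)

missing <- is.na(dat$y)

# Fit a linear model using only the rows where y is known.
fit <- lm(y ~ x, data = dat[!missing, ])

# Predict y for the rows where it is missing and fill those values in.
dat$y[missing] <- predict(fit, newdata = dat[missing, , drop = FALSE])
dat
```

A simpler alternative is to fill in the column mean, mean(dat$y, na.rm = TRUE), which keeps the row but ignores any relationship with the other columns.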
I am new to R and have a fairly simple question that I just can't figure out. For my example I will use a data frame with 3 columns, but my actual data set has 139 columns and 10000 rows.
I want to replace all of the values in a given row with NA if the value in column C of that row is less than 10.
Assume that all of my columns hold numeric or integer values.
So I want to take the data frame:
x=data.frame(c(5,9,2),c(3,4,6),c(12,9,11))
names(x)=c("A","B","C")
and replace row 2 with NA to create
y=data.frame(c(5,NA,2),c(3,NA,6),c(12,NA,11))
names(y)=c("A","B","C")
Thanks!
How about:
x[x$C < 10, ] <- NA
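Run against the example data from the question, this can be checked as follows (a small sketch; only the data from the question is used):

```r
# Example data from the question.
x <- data.frame(c(5, 9, 2), c(3, 4, 6), c(12, 9, 11))
names(x) <- c("A", "B", "C")

# Rows where C is below 10 are selected and every column is set to NA.
x[x$C < 10, ] <- NA

# Only row 2 had C < 10, so only row 2 is now entirely NA.
x
```

This relies on column C having no NA values to begin with; if it did, the logical index would contain NA and the assignment would need something like which(x$C < 10) instead.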