Columns With NA Values - r

I have a data frame in which I want to identify the columns that contain NA values.
Only about 10% of the columns have NA values, so I only want to list the columns where the number of NAs is greater than 0.
The line below returns a count for every column. Is there a way to filter out the columns that don't have any NA values?
colSums(is.na(df))

To list the columns with at least one NA value, you can use:
names(df)[colSums(is.na(df)) > 0]
To list the columns in which every value is NA, you can use:
names(df)[colSums(is.na(df)) == nrow(df)]
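An alternative sketch checks one column at a time with sapply() and anyNA(), which avoids building the full logical matrix that is.na(df) creates. The toy data frame df below is hypothetical and only illustrates the idea:

```r
# Hypothetical toy data: column a has no NAs, b has one, c is entirely NA
df <- data.frame(a = 1:3, b = c(NA, 2, 3), c = c(NA, NA, NA))

# TRUE for each column that contains at least one NA
has_na <- sapply(df, anyNA)
cols_with_na <- names(df)[has_na]

# TRUE only for columns in which every value is NA
all_na_cols <- names(df)[sapply(df, function(col) all(is.na(col)))]

print(cols_with_na)
print(all_na_cols)
```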

Related

remove rows with NA values in a specific column

I have a large dataset of about 1.6 million rows, and the variable (column) I need to focus on is 'temperature'. The temperature column has many NA values, and the other columns have NA values scattered throughout as well. I want to remove only the rows with NA values in the temperature column; I don't particularly care about NA values in the other columns. How can I do this? If I later need to remove rows with NA values in more than just the temperature column (e.g. the depth column as well), how can I select two columns? This is my code:
otn <- tidync(filename, row.names=TRUE) %>% activate('D0')
glider_table <- hyper_tibble(otn)
attach(glider_table)
summary(temperature)
na.omit(glider_table)
na.omit() removes every row that has an NA in any column, so I need something more selective.
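You can use the drop_na() function from the tidyr package. Its first argument is the data frame; after that you can optionally name the specific columns whose NA values should cause a row to be removed. If you name no columns, it drops rows with an NA in any column, just like na.omit().
For example, drop_na(glider_table, temperature) for one column, or drop_na(glider_table, temperature, depth) for two.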
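If you would rather stay in base R, logical indexing with is.na() or complete.cases() does the same job. This sketch uses a small hypothetical stand-in for glider_table (the real one comes from hyper_tibble()); the column names temperature and depth are taken from the question, while salinity is invented for illustration:

```r
# Hypothetical stand-in for glider_table
glider_table <- data.frame(
  temperature = c(12.1, NA, 11.8, 10.5),
  depth       = c(5, 10, NA, 20),
  salinity    = c(NA, 34.9, 35.0, 35.1)
)

# Drop rows where temperature is NA; NAs in other columns are kept
temp_only <- glider_table[!is.na(glider_table$temperature), ]

# Drop rows where temperature OR depth is NA
temp_depth <- glider_table[complete.cases(glider_table[, c("temperature", "depth")]), ]

print(temp_only)
print(temp_depth)
```

Indexing like this also works on tibbles, and it avoids attach(), which can mask other objects with the same names.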

replace all NA values in first 4 columns of data frame

Suppose I have a data frame with 6 columns.
How do I replace all the NA values in the first 4 columns with a 0?
I have tried:
grades[is.na(grades), 1:4] = 0
In your attempt, is.na(grades) is applied to the full dataset, so it returns a logical matrix with the same dimensions as the whole data frame, which does not work as a row index. Instead, subset the first four columns, call is.na() on that subset to get a logical matrix, and use it to assign 0 to the TRUE positions within the same subset:
grades[1:4][is.na(grades[1:4])] <- 0
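A runnable sketch on a hypothetical six-column grades data frame, comparing the answer's one-liner with a column-wise alternative that uses lapply() and replace(). Both should leave columns 5 and 6 untouched:

```r
# Hypothetical data: four score columns, an extra score column, and a name column
grades <- data.frame(
  q1 = c(90, NA, 75), q2 = c(NA, 88, 92), q3 = c(70, 65, NA),
  q4 = c(NA, NA, 80), q5 = c(NA, 77, 85), name = c("Ann", "Bob", NA)
)

# Approach from the answer: logical-matrix assignment on the first four columns
g1 <- grades
g1[1:4][is.na(g1[1:4])] <- 0

# Alternative: replace NAs column by column
g2 <- grades
g2[1:4] <- lapply(g2[1:4], function(col) replace(col, is.na(col), 0))

print(g1)
```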

How to replace specific parts of rows in dataframe for NA

I have a matrix built as follows:
base<-matrix(1:20,nrow=10)
base1<-matrix(rnorm(180),nrow=10)
base2<-cbind(base,base1)
What I need is to change part of each row to NA based on the first 2 columns: the two numbers in each row give the first and last column of that row that should be set to NA. So for the first row it would be something like this:
base2[1,1:11]<-NA
base2[1,]
This works for one row, but my real data has over 100,000 rows. Is there a fast way to do this?
thanks!
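No answer to this question is included here. One vectorised approach, sketched under the assumption that base2 is a numeric matrix as in the example and that columns 1 and 2 hold the first and last column to blank out, builds a logical mask in one step instead of looping over rows:

```r
set.seed(1)
base  <- matrix(1:20, nrow = 10)
base1 <- matrix(rnorm(180), nrow = 10)
base2 <- cbind(base, base1)

# Save the start/end indices first: columns 1 and 2 may themselves be set to NA
start <- base2[, 1]
end   <- base2[, 2]

# col(base2) holds each cell's column number; start and end (length = nrow)
# recycle down the columns, so cell [i, j] is compared with start[i] and end[i]
mask <- col(base2) >= start & col(base2) <= end
base2[mask] <- NA

print(base2[1:2, ])
```

The mask is a single logical matrix the size of the data, so for 100,000 rows by 20 columns it stays small and the assignment happens in one pass.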

Predict values in place of NA by using the original data frame?

I have a dataset in which a row I need contains an NA value.
If I use na.omit(), that row is dropped, but I need to keep it, so I want to predict a value to put in place of the NA.
How do I proceed?
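No answer is included here either. One common option is regression imputation: fit a model on the complete rows, then use predict() to fill in the missing value. This sketch assumes hypothetical columns x and y with a roughly linear relationship; whether a linear model is appropriate depends on your data, and dedicated packages such as mice offer more careful multiple imputation.

```r
# Hypothetical data: y is missing in row 3
df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(3, 5, NA, 9, 11))

# With the default na.action (na.omit), lm() fits on the complete rows only
fit <- lm(y ~ x, data = df)

# Predict y only for the rows where it is missing; all rows are kept
missing <- is.na(df$y)
df$y[missing] <- predict(fit, newdata = df[missing, , drop = FALSE])

print(df)
```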

Change values in row based on a column value r

I am new to R and have a fairly simple question that I just can't figure out. For my example I will use a data frame with 3 columns, but my actual data set has 139 columns and 10,000 rows.
I want to replace all of the values in a row with NA if that row's value in column C is less than 10.
Assume that all of my columns are numeric or integer.
So I want to take the data frame:
x=data.frame(c(5,9,2),c(3,4,6),c(12,9,11))
names(x)=c("A","B","C")
and replace row 2 with NA to create
y=data.frame(c(5,NA,2),c(3,NA,6),c(12,NA,11))
names(y)=c("A","B","C")
Thanks!
How about:
x[x$C <10 ,] <- NA
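One caveat: if column C itself contains NA, x$C < 10 yields NA for that row, and R stops with an error about missing values in a subscripted data frame assignment. Wrapping the condition in which() keeps only the TRUE row numbers. A sketch using the question's data plus a hypothetical frame z with an NA in C:

```r
x <- data.frame(A = c(5, 9, 2), B = c(3, 4, 6), C = c(12, 9, 11))

# which() returns the row numbers where the test is TRUE, skipping NA results
x[which(x$C < 10), ] <- NA

# Hypothetical frame whose C column contains an NA
z <- data.frame(A = 1:3, C = c(5, NA, 20))
z[which(z$C < 10), ] <- NA

print(x)
print(z)
```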
