I need a function in R that does the following.
I have a data frame with some data:
mydata <- data.frame (matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3))
mydata
X1 X2 X3
1 1 NA NA
2 2 2 NA
3 3 3 2
Now I want to check every column of this data frame for NAs and create a vector that stores 0 if a column contains an NA and 1 if it does not.
So check column X1: if NA is in this column write 0 to the vector, if not write 1 to the vector. Then check the next column and so on.
After checking mydata the vector should look like this
(1 0 0)
With
colnames(mydata)[colSums(is.na(mydata)) > 0]
I get the column names which have NA. But how can I use this function to create the vector?
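colSums(is.na(mydata)) counts the NAs in each column. Negating the counts with ! gives TRUE for columns without any NA (count 0) and FALSE otherwise, and as.integer coerces that logical vector to 1/0: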
as.integer(!colSums(is.na(mydata)))
#[1] 1 0 0
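A variation on the same idea, as a small sketch: anyNA() tests each column directly instead of counting the NAs. The names has_na and no_na are only illustrative.

```r
mydata <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))

# TRUE for every column that contains at least one NA
has_na <- sapply(mydata, anyNA)

# 1 = no NA in the column, 0 = at least one NA
no_na <- as.integer(!has_na)
no_na
```

Both versions should give the same vector; which one to use is mostly a matter of taste.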
I am checking a list of ID numbers from one data frame against another to see if they exist, and I want to only see the values from two columns in the target data frame.
Thus:
df1[!is.na(df1$id %in% df2$id), c("D2", "D8")]
This returns:
D2 D8
1 2021-03-16T13:58:22.4700000Z NA
2 2021-03-22T08:10:12.3190000Z NA
3 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
4 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
5 2021-03-09T08:07:45.4410000Z NA
6 2021-03-23T12:03:46.4720000Z NA
7 2021-04-05T09:01:55.9520000Z NA
8 2021-03-29T12:57:58.5500000Z NA
9 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
10 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
I want to know if there is a way to exclude NA values in either of the columns so that the returned results look like this:
D2 D8
1 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
2 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
3 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
4 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
na.omit drops every row that contains an NA in any of the selected columns:
na.omit(df1[, c("D2", "D8")])
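To combine the ID check with the NA filter, something like the sketch below might work. The data here is made up for illustration (the real df1 and df2 are not shown in the question). Note that %in% never returns NA, so wrapping it in !is.na() selects every row; the sketch uses the %in% result directly.

```r
# Made-up stand-ins for the data frames in the question
df1 <- data.frame(
  id = c(1, 2, 3, 4),
  D2 = c("a", "b", "c", "d"),
  D8 = c(NA, "x", NA, "y"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(id = c(2, 3, 4))

# Keep only rows whose id also occurs in df2, and only the two columns
matched <- df1[df1$id %in% df2$id, c("D2", "D8")]

# Drop rows with an NA in either column
result <- matched[complete.cases(matched), ]
result
```

complete.cases() returns a logical vector, so it can be combined with other row conditions; na.omit(matched) keeps the same rows.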
I have a data frame:
L1 2020 NA
1 1 0 0
2 2 1 0
3 3 1 0
I want to delete the first and last columns to get a data frame like this:
2020
1 0
2 1
3 1
I tried:
1)
df <- df[,-c(1,ncol(df))]
or 2)
df <- subset(df, select = -c(1,ncol(df)))
For both I get result:
[1] 0 1 1
So I guess it changed the data frame into a vector. How can I delete these columns and keep the result as a data frame? This is important to me. I don't have this problem when more columns remain; it only happens when a single column is left.
After specifying the columns in the square brackets, add , drop = FALSE.
The drop argument of [ is TRUE by default, so a result with a single column is simplified to a vector; that default is what you are running into.
df <- data.frame(a=1:10,b=1:10)
df[,1] #R simplifies to a vector via implicit drop=TRUE default
df[,1,drop=FALSE] #dataframe-structure remains
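Applied to the case in the question, it could look like the sketch below. The data frame is a hypothetical reconstruction (the name of the last column, last, is invented), and single-argument indexing is shown as an alternative that does not simplify in the first place.

```r
# Hypothetical reconstruction of the data frame from the question
df <- data.frame(L1 = 1:3, `2020` = c(0, 1, 1), last = c(0, 0, 0),
                 check.names = FALSE)

# Matrix-style indexing with drop = FALSE keeps the data frame
kept <- df[, -c(1, ncol(df)), drop = FALSE]

# List-style indexing (no comma) always returns a data frame
same <- df[-c(1, ncol(df))]

is.data.frame(kept)
names(kept)
```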
I have a dataframe that looks like this:
column1 column2 column3
NA NA NA
0 NA NA
0 1 NA
0 1 2
and I would like to keep the last non-NA value of each row and add it in a new column.
This would be the desired output:
column4
NA
0
1
2
Use max.col with ties.method = "last"
df[cbind(1:nrow(df), max.col(!is.na(df), ties.method = "last"))]
#[1] NA 0 1 2
Explanation :
The logic is to create a row/column index to subset values from df.
max.col returns, for each row, the column number of the last non-NA value. This is the column index. If a row has no non-NA value, all columns tie and ties.method = "last" returns the last column number, where the value is NA anyway.
max.col(!is.na(df), ties.method = "last")
#[1] 3 1 2 3
We generate the row index with 1:nrow(df) and cbind the two into a matrix, which we use to subset the data frame (df).
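Put together on the example data, the steps above can be sketched as follows. The data frame is rebuilt from the question, and last_col is just an illustrative name.

```r
df <- data.frame(
  column1 = c(NA, 0, 0, 0),
  column2 = c(NA, NA, 1, 1),
  column3 = c(NA, NA, NA, 2)
)

# Column position of the last non-NA value in each row
last_col <- max.col(!is.na(df), ties.method = "last")

# Index with a two-column (row, column) matrix and store the result
df$column4 <- df[cbind(seq_len(nrow(df)), last_col)]
df
```

The index is computed before column4 is added, so the new column does not take part in the search.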
If the values never decrease along a row, as in your example, the last non-NA value is also the row maximum, so you can use pmax:
do.call(pmax, c(df, na.rm = TRUE))
#[1] NA 0 1 2
I have this data frame, df, with a set of zero values in two columns.
Sl_No No_of_Mails No_of_Responses
1 10 2
2 0 0
3 20 0
4 10 10
5 0 0
6 0 NA
7 10 NA
8 10 0
I want to remove those rows where No_of_Mails equals zero without disturbing the other column.
I have tried the following code
row_sub = apply(df, 1, function(row) all(row !=0 ))
df[row_sub,]
This removes every row with a 0 value, including those where the 0 is in the No_of_Responses column. I want to leave that column undisturbed.
I have also tried this
df[df$No_of_Mails][!(apply(df, 1, function(y) any(y == 0))),].
This deletes all the rows and gives me a table with zero rows.
Just subset the data frame based on the value in the No_of_Mails column:
df[df$No_of_Mails != 0, ]
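As a quick check on the data from the question (rebuilt here by hand), the subset might be used like this; kept is an illustrative name.

```r
df <- data.frame(
  Sl_No = 1:8,
  No_of_Mails = c(10, 0, 20, 10, 0, 0, 10, 10),
  No_of_Responses = c(2, 0, 0, 10, 0, NA, NA, 0)
)

# Keep rows where No_of_Mails is not zero; other columns are not inspected
kept <- df[df$No_of_Mails != 0, ]
kept
```

Rows 2, 5 and 6 are dropped, while zeros and NAs in No_of_Responses are left alone. If No_of_Mails itself could contain NA, the comparison yields NA for those rows and they come back as all-NA rows; wrapping the condition in which() is one way to avoid that.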
I have a large data set which consists of a column of IDs followed by a monthly time series for each ID. There are frequent missing values in this set, but what I would like to do is replace all NAs after the first non-zero value with a zero while leaving all the NAs before the first non-zero value as NAs.
eg.
[NA NA NA 1 2 3 NA 4 5 NA] would be changed to [NA NA NA 1 2 3 0 4 5 0]
Any help or advice you guys could offer would be much appreciated!
Easy to do using match() and numeric indices:
use match() to find the first occurrence of a non-NA value
use which() to convert the logical vector from is.na() to a numeric index
use that information to find the correct positions in x
Hence:
x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
isna <- is.na(x)
nonna <- match(FALSE,isna)
id <- which(isna)
x[id[id>nonna]] <- 0
gives:
> x
[1] NA NA NA 1 2 3 0 0 4 5 0
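For a whole data set, the same steps could be wrapped in a function and applied to each series. This is only a sketch: fill_after_first is a made-up name, the guard for all-NA series is an addition, and the small matrix m stands in for the real data (one row per ID, ID column left out).

```r
fill_after_first <- function(x) {
  first <- match(FALSE, is.na(x))  # position of the first non-NA value
  if (is.na(first)) return(x)      # series is all NA: leave it untouched
  idx <- which(is.na(x))           # positions of all NAs
  x[idx[idx > first]] <- 0         # only those after the first non-NA
  x
}

x <- c(NA, NA, NA, 1, 2, 3, NA, NA, 4, 5, NA)
filled <- fill_after_first(x)

# Row-wise use on a matrix of series; apply() returns the rows as
# columns, so the result is transposed back
m <- rbind(c(NA, 1, NA), c(NA, NA, 2))
m_filled <- t(apply(m, 1, fill_after_first))
```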
Here's another method. Convert all NAs to zeros first, then convert the leading zeros back to NA. Note that this also turns any genuine leading zeros into NA.
> x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
> x[is.na(x)] <- 0
### positions before the first element > 0
> x[seq_len(min(which(x>0))-1)] <- NA
> x
[1] NA NA NA 1 2 3 0 0 4 5 0
Alternatively, starting again from the original x:
### from the first element > 0 to the end of the vector
> endOfVec <- min(which(x>0)):length(x)
> x[endOfVec][is.na(x[endOfVec])] <- 0
> x
[1] NA NA NA 1 2 3 0 0 4 5 0