Suppose I have a data frame with 6 columns.
How do I replace all the NA values in the first 4 columns with a 0?
I have tried:
grades[is.na(grades), 1:4] = 0
is.na() applied to the full data set returns a logical matrix with the same dimensions as the whole data frame, so it cannot be used as a row index in grades[..., 1:4]. It is better to subset the first four columns, apply is.na() to that subset to get a logical matrix, and then use that matrix on the same subset to set the TRUE positions to 0:
grades[1:4][is.na(grades[1:4])] <- 0
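A small self-contained check of this idiom, assuming a hypothetical six-column grades data frame (the column names and values below are made up for illustration). Columns 5 and 6 keep their NAs because only grades[1:4] is touched:

```r
# Hypothetical example data: six numeric columns with scattered NAs
grades <- data.frame(
  q1 = c(90, NA, 75),
  q2 = c(NA, 80, NA),
  q3 = c(70, 65, NA),
  q4 = c(NA, NA, 88),
  q5 = c(NA, 92, 60),
  q6 = c(85, NA, 70)
)

# is.na(grades[1:4]) is a 3 x 4 logical matrix that marks exactly
# the NA cells of the first four columns
grades[1:4][is.na(grades[1:4])] <- 0

print(grades)
```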
I have a dataframe in which I want to identify columns that have NA values.
Only about 10% of the columns have NA values, so I only want to list out the columns where the number of NAs is greater than 0.
The line below returns a count for every column. Is there a way to filter out the columns that don't have NA values?
colSums(is.na(df))
If df is a pandas DataFrame (Python), you can list the columns with at least one null value using:
df.columns[df.isna().any()].tolist()
To list out columns with all null values you can use:
df.columns[df.isna().all()].tolist()
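Since the question itself is about an R data frame, the colSums(is.na(df)) call from the question can simply be filtered on its own result. A minimal sketch on a made-up data frame:

```r
# Hypothetical example data frame: only columns b and d contain NAs
df <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 5, 6),
  c = c("x", "y", "z"),
  d = c(NA, NA, 9)
)

# Number of NAs in each column (named numeric vector)
na_counts <- colSums(is.na(df))

# Keep only the columns with at least one NA
cols_with_na <- names(na_counts)[na_counts > 0]

print(na_counts[na_counts > 0])
print(cols_with_na)
```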
I would like to know how to subset in R based on a condition. I have a large object with 10 columns, 8 of which are logical. I want to extract the rows in which any 4 of those 8 columns are TRUE.
See below. I create a vector that includes the names of the TRUE/FALSE variables. R interprets TRUE as 1 and FALSE as 0, so when summing across rows we want to keep the rows with a sum of 4 or greater. rowSums(df[,tf_vars]) >= 4 creates a TRUE/FALSE vector indicating which rows have 4 or more TRUEs. (Note that df[,tf_vars] subsets the columns of the data frame, keeping only the variables in tf_vars.) I then use that vector to subset the data frame.
# Create a dummy data frame with 100 rows and 8 random TRUE/FALSE columns
df <- data.frame(matrix(nrow = 100, ncol = 0))
for (i in 1:8) {
  df[[paste0("TFvar", i)]] <- sample(c(TRUE, FALSE), 100, replace = TRUE, prob = c(0.5, 0.5))
}
# Subset the data frame to the rows where at least 4 of the columns are TRUE
tf_vars <- c("TFvar1", "TFvar2", "TFvar3", "TFvar4", "TFvar5", "TFvar6", "TFvar7", "TFvar8")
# (or use this to grab the names of all TRUE/FALSE columns automatically)
tf_vars <- names(df)[sapply(df, is.logical)]
df_subset <- df[rowSums(df[, tf_vars]) >= 4, ]
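The question mentions 10 columns of which only 8 are logical. The sketch below uses hypothetical data (id and score are made-up non-logical columns) and picks out the logical columns with sapply(df, is.logical), so rowSums() only ever sees TRUE/FALSE values while the result still keeps all 10 columns. If "any 4" should mean exactly 4, replace >= 4 with == 4.

```r
set.seed(42)  # reproducible random data
n <- 20

# Hypothetical data: 2 non-logical columns plus 8 logical ones
df <- data.frame(id = seq_len(n), score = round(runif(n, 0, 100)))
for (i in 1:8) {
  df[[paste0("TFvar", i)]] <- sample(c(TRUE, FALSE), n, replace = TRUE)
}

# Names of the logical columns only (TFvar1 ... TFvar8)
tf_vars <- names(df)[sapply(df, is.logical)]

# Number of TRUE values in each row, counted over the logical columns
n_true <- rowSums(df[, tf_vars])

# Rows with at least 4 TRUEs; all 10 columns are kept in the result
df_subset <- df[n_true >= 4, ]
```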
I have a column Total with about 300 values in a data frame. The first 30 values are NAs, and I would like to fill them in with a vector of values c(233,423,545,354,223,646,243,553,634----231).
Do you have any suggestions for getting this done?
You could just do:
df[1:30, "Total"] <- yourvector
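If the NAs are not guaranteed to sit exactly in rows 1 to 30, a variant is to index by is.na() and check the replacement length first. A sketch with a hypothetical small df and yourvector (values made up, only 3 missing entries instead of 30):

```r
# Hypothetical data: the first 3 values of Total are missing
df <- data.frame(Total = c(NA, NA, NA, 120, 340, 560))
yourvector <- c(233, 423, 545)

# Row positions of the missing values in Total
missing_rows <- which(is.na(df$Total))

# Stop early rather than letting R recycle a vector of the wrong length
stopifnot(length(yourvector) == length(missing_rows))

# Fill the missing values, in order, from yourvector
df$Total[missing_rows] <- yourvector
```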
I am trying to subset a data frame by taking the integer values of 2 columns of my data frame:
Subs1<-subset(DATA,DATA[,2][!is.na(DATA[,2])] & DATA[,3][!is.na(DATA[,3])])
but it gives me an error: longer object length is not a multiple of shorter object length.
How can I construct a subset composed of the non-NA values of column 2 AND column 3?
Thanks a lot!
Try this:
Subs1<-subset(DATA, (!is.na(DATA[,2])) & (!is.na(DATA[,3])))
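The second argument of subset() is a logical vector of length nrow(DATA), indicating whether to keep each corresponding row. In the original attempt, DATA[,2][!is.na(DATA[,2])] drops the NA entries, so the two vectors combined with & have different lengths (and neither matches nrow(DATA)), which is what triggers the length message.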
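A small sketch on a hypothetical DATA (values made up) that shows the length mismatch in the original approach and the one-value-per-row condition used by the fix:

```r
# Hypothetical data with NAs in columns 2 and 3
DATA <- data.frame(
  id = 1:5,
  x  = c(1, NA, 3, 4, NA),
  y  = c(10, 20, NA, 40, 50)
)

# Original approach: the filtered columns have lengths 3 and 4,
# and neither equals nrow(DATA), which is 5
print(length(DATA[, 2][!is.na(DATA[, 2])]))
print(length(DATA[, 3][!is.na(DATA[, 3])]))

# Fixed approach: !is.na() on whole columns gives one TRUE/FALSE per row
Subs1 <- subset(DATA, !is.na(DATA[, 2]) & !is.na(DATA[, 3]))
print(Subs1)
```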
The na.omit() function can also answer your question, although this version returns only columns 2 and 3:
Subs1 <- na.omit(DATA[2:3])
[https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html]
Here is an example.
a, b and c are 3 vectors, of which a and b each have a missing value.
Once they are created, I use cbind() to bind them into one matrix, which is then converted to a data frame.
The final result is a data frame in which 2 of the 3 columns have a missing value.
So we need to keep only the rows with complete cases. DATA[complete.cases(DATA), ] keeps only the rows that have no missing value in any column; the resulting subset object contains those complete rows.
a <- c(1,NA,2)
b <- c(NA,1,2)
c <- c(1,2,3)
DATA <- as.data.frame(cbind(a,b,c))
subset <- DATA[complete.cases(DATA), ]
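In the original question only columns 2 and 3 matter, so complete.cases() can be given just those columns while the result keeps every column. A sketch rebuilding the a, b, c example so it runs on its own; na.omit() on the same columns is shown for comparison:

```r
a <- c(1, NA, 2)
b <- c(NA, 1, 2)
c <- c(1, 2, 3)
DATA <- data.frame(a, b, c)

# Check only columns 2 and 3 (b and c) for NAs, but keep all columns
Subs1 <- DATA[complete.cases(DATA[, 2:3]), ]

# For comparison: na.omit() on the same columns drops column a entirely
Subs2 <- na.omit(DATA[2:3])

print(Subs1)
print(Subs2)
```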
I am new to R and have a fairly simple question; I just can't figure out the answer. For my example I will use a data frame with 3 columns, but my actual data set has 139 columns and 10000 rows.
I want to replace all of the values in a given row with NA if the value in column C of that row is < 10.
Assume that all of my columns are either numeric or integer.
So I want to take the data frame:
x=data.frame(c(5,9,2),c(3,4,6),c(12,9,11))
names(x)=c("A","B","C")
and replace row 2 with NA to create
y=data.frame(c(5,NA,2),c(3,NA,6),c(12,NA,11))
names(y)=c("A","B","C")
Thanks!
How about:
x[x$C <10 ,] <- NA
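One caveat, assuming column C itself may contain NAs in the real data: a logical row index that contains NA is not allowed in data frame assignment, so x[x$C < 10, ] <- NA would stop with an error. Wrapping the condition in which() keeps only the positions where it is TRUE. A sketch on the question's example plus one made-up row where C is NA:

```r
# The question's data plus a hypothetical 4th row where C is missing
x <- data.frame(A = c(5, 9, 2, 7), B = c(3, 4, 6, 8), C = c(12, 9, 11, NA))

# which() returns the row numbers where C < 10 is TRUE,
# skipping rows where the comparison is NA
rows_to_blank <- which(x$C < 10)

# Set every value in those rows to NA
x[rows_to_blank, ] <- NA
print(x)
```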