I would like to know how to subset in R based on a condition. I have a large object with 10 columns, 8 of which are logical. How can I extract the rows where any 4 of those 8 columns are TRUE?
See below. I create a vector containing the names of the TRUE/FALSE variables. R treats TRUE as 1 and FALSE as 0 in arithmetic, so summing across a row counts its TRUE values, and we keep the rows whose sum is 4 or greater. rowSums(df[,tf_vars]) >= 4 creates a TRUE/FALSE vector that marks the rows with 4 or more TRUEs. (Note that df[,tf_vars] subsets the columns of the data frame, keeping only the variables in tf_vars.) I then use that vector to subset the data frame.
# Create dummy dataframe
df <- data.frame(matrix(nrow=100, ncol=0))
for(i in 1:8){
df[[paste0("TFvar",i)]] <- sample(c(TRUE,FALSE), size=100, replace=TRUE, prob=c(.5,.5))
}
# Subset dataframe where at least 4 of the columns are true
tf_vars <- c("TFvar1", "TFvar2", "TFvar3", "TFvar4", "TFvar5", "TFvar6", "TFvar7", "TFvar8")
# (or you could use this to grab the names of the TRUE/FALSE variables;
# sapply tests each column separately, whereas apply would first coerce the data frame to a matrix)
tf_vars <- names(df)[sapply(df, is.logical)]
df_subset <- df[rowSums(df[,tf_vars]) >= 4,]
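To check the idea on something small, here is a minimal sketch with a hand-built data frame. The names toy, id, note, v1 to v5, logical_cols and keep are made up for illustration and are not from the question.

```r
# Hypothetical toy data: two non-logical columns plus five logical ones
toy <- data.frame(
  id   = 1:3,
  note = c("a", "b", "c"),
  v1 = c(TRUE,  TRUE,  FALSE),
  v2 = c(TRUE,  FALSE, FALSE),
  v3 = c(TRUE,  TRUE,  FALSE),
  v4 = c(TRUE,  TRUE,  TRUE),
  v5 = c(FALSE, TRUE,  FALSE)
)

# Names of the logical columns, tested column by column
logical_cols <- names(toy)[sapply(toy, is.logical)]

# TRUE counts as 1, so rowSums() gives the number of TRUEs per row
true_count <- rowSums(toy[, logical_cols])

# Keep rows with at least 4 TRUE values (rows 1 and 2 here)
keep <- toy[true_count >= 4, ]
```

If the logical columns can contain NA, rowSums(..., na.rm = TRUE) counts only the known TRUE values instead of returning NA for that row.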
Related
I have a data frame with 2 columns and 26 rows, the first column is composed of characters while the second column is composed of numbers.
I also have a vector with a random selection of 5 characters.
I want to sum the numbers from column two of the 5 random characters.
How can I calculate this sum?
We can use aggregate to get the sum of ints for each value of char:
aggregate(ints ~ char, data1, sum)
Maybe what you need is:
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
This sums the ints values in data1 for the rows whose char appears in sample1.
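A small sketch of how that plays out, assuming the column names char and ints used in the answers. The data and the fixed selection in sample1 are invented here so the result is reproducible.

```r
# Hypothetical data: one row per letter with an integer value
data1 <- data.frame(char = letters, ints = 1:26)

# Hypothetical selection of 5 characters (fixed rather than random)
sample1 <- c("a", "e", "j", "q", "z")

# Logical vector: TRUE where the row's character is in the selection
selected <- data1$char %in% sample1

# Sum of the matching values: 1 + 5 + 10 + 17 + 26
result <- sum(data1$ints[selected], na.rm = TRUE)
```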
Suppose I have a data frame with 6 columns.
How do I replace all the NA values in the first 4 columns with a 0?
I have tried:
grades[is.na(grades), 1:4] = 0
In your attempt, is.na is applied to the full dataset, so it returns a logical matrix with the same dimensions as the original dataset, which cannot be used as a row index. It is better to subset the dataset first, apply is.na to the first four columns to get a logical matrix, and then use the same subset of the data to assign 0 wherever that matrix is TRUE:
grades[1:4][is.na(grades[1:4])] <- 0
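Here is a minimal sketch of that on made-up data, split into two steps so the logical matrix is visible. The column names q1 to q6 and the values are assumptions for illustration only.

```r
# Hypothetical grades table with 6 columns and a few missing values
grades <- data.frame(
  q1 = c(90, NA, 70),
  q2 = c(NA, 85, 60),
  q3 = c(75, 80, NA),
  q4 = c(NA, NA, 95),
  q5 = c(NA, 50, 65),
  q6 = c(88, NA, 77)
)

# Logical matrix of missing values, restricted to the first 4 columns
missing_first4 <- is.na(grades[1:4])

# Assign 0 only where that matrix is TRUE
grades[1:4][missing_first4] <- 0
```

Columns 5 and 6 are not touched, so their NA values stay as they were.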
I have a dataset in which I wish to sum each value in column n, with its corresponding value in column (n+(ncol/2)); i.e., so I can sum a value in column 1 row 1 with a value in column 12 row 1, for a dataset with 22 columns, and repeat this until column 11 is summed with column 22. The solution needs to work for hundreds of rows.
How do I do this using R, while ignoring the column names?
Suppose your data is
d <- setNames(as.data.frame(matrix(rnorm(100 * 22), ncol = 22)), LETTERS[1:22])
You can do a simple matrix addition using numbers to select the columns:
output <- d[, 1:11] + d[, 12:22]
so, e.g.
all.equal(output[,1], d[,1] + d[,12])
# [1] TRUE
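If you would rather not hard-code 11 and 22, the same addition can be sketched with the split point computed from ncol. This assumes an even number of columns; d_small and half are names made up for this example.

```r
# Hypothetical data: 5 rows, 6 columns (any even number of columns works)
d_small <- as.data.frame(matrix(1:30, ncol = 6))

# Half the number of columns; the sketch assumes ncol(d_small) is even
half <- ncol(d_small) / 2

# Add column n to column n + half, for n = 1..half, selecting by position
output <- d_small[, seq_len(half)] + d_small[, half + seq_len(half)]
```

The result keeps the column names of the first half, since the columns are selected by position and the names of the second half are ignored.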
I have a data frame with many rows and columns in it (3000x37) and I want to be able to select only rows that may have >= 2 columns of value "NA". These columns have data of different data types. I know how to do this in case I want to select only one column via:
df[is.na(df$col.name), ]
How to make this selection if I want to select two (or more) columns?
First create a vector nn with the number of NAs in each row, and then select only those rows with >= 2 NAs using d[nn>=2,]
d = data.frame(x=c(NA,1,2,3), y=c(NA,"a",NA,"c"))
nn = apply(d, 1, FUN=function (x) {sum(is.na(x))})
d[nn>=2,]
#    x    y
# 1 NA <NA>
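An alternative sketch that avoids apply: is.na on a data frame already returns a logical matrix, so rowSums can count the missing values directly. It uses the same example data as above; na_count and d_two_na are names made up here.

```r
# Same example data as above, mixing a numeric and a character column
d <- data.frame(x = c(NA, 1, 2, 3), y = c(NA, "a", NA, "c"))

# is.na() on a data frame returns a logical matrix; rowSums() counts TRUEs per row
na_count <- rowSums(is.na(d))

# Rows with 2 or more missing values
d_two_na <- d[na_count >= 2, ]
```

To count NAs in only some of the columns, index the data frame before calling is.na, for example is.na(d[, c("x", "y")]).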
I am trying to subset a data frame by taking the integer values of 2 columns of my data frame:
Subs1<-subset(DATA,DATA[,2][!is.na(DATA[,2])] & DATA[,3][!is.na(DATA[,3])])
but it gives me an error: longer object length is not a multiple of shorter object length.
How can I construct a subset which is composed of the non-NA values of column 2 AND column 3?
Thanks a lot!
Try this:
Subs1<-subset(DATA, (!is.na(DATA[,2])) & (!is.na(DATA[,3])))
The second parameter of subset is a logical vector of length nrow(DATA), indicating whether to keep the corresponding row.
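The na.omit function can be an answer to your question. Note that applying it to DATA[2:3] returns only columns 2 and 3, with the incomplete rows dropped: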
Subs1 <- na.omit(DATA[2:3])
[https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html]
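To see the difference between the two answers, here is a minimal sketch on invented data. The column names id, a and b and the object names ok, Subs_all_cols and Subs_two_cols are assumptions for illustration.

```r
# Hypothetical data: an id column plus two columns with missing values
DATA <- data.frame(
  id = 1:4,
  a  = c(1, NA, 3, 4),
  b  = c("x", "y", NA, "w")
)

# TRUE for rows where columns 2 and 3 are both non-missing
ok <- complete.cases(DATA[, 2:3])

# Keeps every column of DATA for the complete rows
Subs_all_cols <- DATA[ok, ]

# na.omit() on the two-column subset drops the same rows,
# but returns only those two columns
Subs_two_cols <- na.omit(DATA[2:3])
```

Which one fits depends on whether the other columns of DATA are still needed afterwards.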
Here is an example.
a, b and c are three vectors; a and b each have a missing value.
Once they are created, I use cbind to bind them into one matrix, which is then converted to a data frame.
The final result is a data frame where 2 out of 3 columns have a missing value.
So we need to keep only the rows with complete cases. DATA[complete.cases(DATA), ] keeps only the rows that have no missing values in any column. The complete_rows object holds these rows.
a <- c(1,NA,2)
b <- c(NA,1,2)
c <- c(1,2,3)
DATA <- as.data.frame(cbind(a,b,c))
# Keep only the rows with no missing values
complete_rows <- DATA[complete.cases(DATA), ]