I have a dataset in which I wish to sum each value in column n, with its corresponding value in column (n+(ncol/2)); i.e., so I can sum a value in column 1 row 1 with a value in column 12 row 1, for a dataset with 22 columns, and repeat this until column 11 is summed with column 22. The solution needs to work for hundreds of rows.
How do I do this using R, while ignoring the column names?
Suppose your data is
d <- setNames(as.data.frame(matrix(rnorm(100 * 22), ncol = 22)), LETTERS[1:22])
You can do a simple matrix addition using numbers to select the columns:
output <- d[, 1:11] + d[, 12:22]
so, e.g.
all.equal(output[,1], d[,1] + d[,12])
# [1] TRUE
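The same idea can be written without hard-coding 11 and 22. This is only a sketch, assuming the data frame has an even number of numeric columns; the names `half`, `first`, and the `sum1`, `sum2`, ... output names are made up for illustration.

```r
# Sketch: add column n to column n + ncol/2 for any even number of columns.
set.seed(1)
d <- as.data.frame(matrix(rnorm(100 * 22), ncol = 22))

half <- ncol(d) / 2
stopifnot(half == floor(half))   # assumption: even number of columns

first <- seq_len(half)           # 1, 2, ..., ncol/2
# drop = FALSE keeps a data frame even when half == 1
output <- d[, first, drop = FALSE] + d[, first + half, drop = FALSE]

# Hypothetical output names; the original column names are ignored
names(output) <- paste0("sum", first)

all.equal(output[[1]], d[[1]] + d[[12]])
```

Selecting by position means the column names never enter into the calculation, which is what the question asks for.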
Related
I have a data frame with 2 columns and 26 rows, the first column is composed of characters while the second column is composed of numbers.
I also have a vector with a random selection of 5 characters.
I want to sum the numbers from column two of the 5 random characters.
How can I calculate this sum?
We can use aggregate to get the sum of ints for each value of char:
aggregate(ints ~ char, data1, sum)
Maybe what you need is:
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
This sums the ints values in data1 for the rows whose char value is present in sample1.
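For a self-contained check, here is a small sketch of the `%in%` approach. The names `data1`, `char`, `ints`, and `sample1` follow the answer above, but the contents are assumed: the question does not show the real data, and a fixed vector stands in for the random selection so the result is reproducible.

```r
# Assumed example data: 26 rows, letters in one column, numbers in the other
data1 <- data.frame(char = letters, ints = 1:26, stringsAsFactors = FALSE)

# Fixed stand-in for the random selection of 5 characters
sample1 <- c("a", "e", "i", "o", "u")

# Keep the ints whose char is in sample1, then sum them
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
result
# [1] 51
```

Here 1 + 5 + 9 + 15 + 21 = 51, since the letters a, e, i, o, u sit at positions 1, 5, 9, 15, and 21.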
I would like to know how to subset in R based on a condition. I have a large object with 10 columns, 8 of which are logical. How can I extract all rows where any 4 of those 8 columns are TRUE?
See below. I create a vector that includes the names of the true/false variables. R will interpret TRUE as 1 and FALSE as 0; consequently, when summing across rows we want to keep rows that have a sum of 4 or greater. rowSums(df[,tf_vars]) >= 4 creates a TRUE/FALSE vector that indicates where the row has 4 or more trues. (Note that df[,tf_vars] will subset the columns of the dataframe, only keeping the variables in tf_vars). I then use that vector to subset the dataframe.
# Create dummy dataframe
df <- data.frame(matrix(nrow=100, ncol=0))
for(i in 1:8){
df[[paste0("TFvar", i)]] <- sample(c(TRUE, FALSE), size = 100, replace = TRUE, prob = c(.5, .5))
}
# Subset dataframe where at least 4 of the columns are true
tf_vars <- c("TFvar1", "TFvar2", "TFvar3", "TFvar4", "TFvar5", "TFvar6", "TFvar7", "TFvar8")
# (or grab the names of the logical columns programmatically;
# sapply() checks each column's type without converting df to a matrix)
tf_vars <- names(df)[sapply(df, is.logical)]
df_subset <- df[rowSums(df[,tf_vars]) >= 4,]
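The question mentions 10 columns of which only 8 are logical, so a sketch with mixed column types may help. The `id` and `score` columns are invented for illustration. Whether "TRUE for any 4 columns" means at least 4 or exactly 4 is not stated, so both are shown.

```r
# Sketch: 2 non-logical columns (hypothetical) plus 8 logical columns
set.seed(42)
df <- data.frame(id = 1:100, score = rnorm(100))
for (i in 1:8) {
  df[[paste0("TFvar", i)]] <- sample(c(TRUE, FALSE), size = 100, replace = TRUE)
}

# Pick out the logical columns by type rather than by name
tf_vars <- names(df)[vapply(df, is.logical, logical(1))]

# TRUE counts as 1 and FALSE as 0, so rowSums() counts TRUEs per row
n_true <- rowSums(df[, tf_vars])

at_least_4 <- df[n_true >= 4, ]   # 4 or more TRUE values
exactly_4  <- df[n_true == 4, ]   # exactly 4 TRUE values
```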
I am interested in inserting all missing rows into a data table for a new range of values for 2 columns.
For example, dt1[,a] has some values from 1 to 5, as does dt1[,b], but I'd like not only all pairwise combinations of the existing values to be present in columns a and b, but all combinations within a newly defined range, e.g. 1 to 7.
# Example data.table
library(data.table)
dt1 <- data.table(a = c(1,1,1,1,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5),
                  b = c(1,3,4,5,1,2,3,4,1,2,3,1,2,3,4,5,3,4,5),
                  c = sample(1:10, 19, replace = TRUE))
setkey(dt1,a,b)
# CJ in data.table will create all rows to ensure all
# pairwise combinations are present (using the nominated columns).
dt1[CJ(a, b, unique = TRUE)]
The above is great, but it only uses the values already present in the nominated columns. I'd like the inserted rows to give me all combinations within a new, nominated range, e.g. 1 to 7. There would be 49 rows.
# the following is a temporary workaround
template <- data.table(a1=rep(1:7,each=7),b1=rep(1:7,7))
setkey(template,a1,b1)
full <- dt1[template]
Instead of using the values that already exist in column 'a', we can pass a range of values to 'CJ' for 'a':
dt1[CJ(a = 1:7, b, unique = TRUE)]
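Note that the call above widens only 'a'; 'b' still takes its values from dt1. To reach the 49 rows mentioned in the question, both columns would need the new range. A minimal sketch, assuming the data.table package is installed and that the (a, b) pairs in dt1 are unique, as they are in the example data:

```r
library(data.table)

set.seed(1)
dt1 <- data.table(a = c(1,1,1,1,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5),
                  b = c(1,3,4,5,1,2,3,4,1,2,3,1,2,3,4,5,3,4,5),
                  c = sample(1:10, 19, replace = TRUE))

# Cross join of the full 1:7 range for both columns, then join dt1 onto it.
# Combinations missing from dt1 get NA in column c.
full <- dt1[CJ(a = 1:7, b = 1:7), on = .(a, b)]

nrow(full)
# [1] 49
```

Using `on = .(a, b)` spells out the join columns, so the sketch does not depend on `setkey()` having been called first.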
I have a matrix with the size of 10x100. How can I swap the values between row 1 and row 2 in the first 30% of the columns?
We can reverse the row index for the first two rows and pair it with a column index covering the first 30% of the columns (the rounded 30% of the total number of columns). Assigning rows 1:2 into rows 2:1 swaps the values.
colS <- seq(round(ncol(m1)*0.3))
m1[2:1, colS] <- m1[1:2, colS]
data
m1 <- matrix(1:1000, 10, 100)
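Put together in execution order, the swap can be checked like this. It is only a sketch: the copy named `swapped` is introduced here so the original matrix stays available for comparison, and `seq_len()` is used in place of `seq()` so a zero-length column range is handled safely.

```r
m1 <- matrix(1:1000, 10, 100)

# Columns 1..30: the first 30% of 100 columns
colS <- seq_len(round(ncol(m1) * 0.3))

# Work on a copy so the result can be compared with the original
swapped <- m1
swapped[2:1, colS] <- m1[1:2, colS]

# Rows 1 and 2 are exchanged in the first 30 columns only
identical(swapped[1, colS], m1[2, colS])
identical(swapped[2, colS], m1[1, colS])
identical(swapped[, -colS], m1[, -colS])
```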
I have a data frame with many rows and columns in it (3000x37) and I want to be able to select only rows that may have >= 2 columns of value "NA". These columns have data of different data types. I know how to do this in case I want to select only one column via:
df[is.na(df$col.name), ]
How do I make this selection if I want to check two (or more) columns?
First create a vector nn with the number of NAs in each row, and then select only those rows with >= 2 NAs using d[nn>=2,]:
d = data.frame(x=c(NA,1,2,3), y=c(NA,"a",NA,"c"))
nn = apply(d, 1, FUN=function (x) {sum(is.na(x))})
d[nn>=2,]
   x    y
1 NA <NA>
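As an alternative sketch, the per-row NA count can also be computed with `rowSums(is.na(d))`, which avoids the row-by-row `apply()` call. The data is the same toy data frame as above; `sel` is a name introduced here.

```r
d <- data.frame(x = c(NA, 1, 2, 3), y = c(NA, "a", NA, "c"))

# is.na(d) is a logical matrix; rowSums() counts the TRUEs in each row
nn <- rowSums(is.na(d))

# Keep rows with two or more missing values
sel <- d[nn >= 2, ]
sel
```

Because `is.na()` works column by column on a data frame, the mixed column types mentioned in the question are not a problem.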