Conditional sum of rows by a column value - r

I'm trying to sum rows that contain a value in a different column.
rowSums(wood_plastics[,c(48,52,56,60)], na.rm=TRUE)
The above gives me row sums for the listed columns, but now I'd like to sum only the rows that have a certain year in a different column. I tried this:
rowSums(mydata[,c(48,52,56,60)], na.rm=TRUE, mydata$current_year = '2015')
with no success. I thought I might have to single out the year value from the column number, 7, in the initial column list.
Any help is appreciated.

I would say simply
rowSums(mydata[mydata$current_year == '2015',c(48,52,56,60)], na.rm=TRUE)
Since I don't have the original data frame, I can't show you the result. The idea is that, inside the brackets, you select the rows you want before the comma and the columns you want after it. Is this clear enough for you?
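To make the row/column selection concrete, here is a minimal sketch on a made-up data frame. The columns a and b stand in for columns 48, 52, 56 and 60, and the toy values are assumptions, not the asker's data. Wrapping the condition in which() is an optional extra that drops rows where current_year is NA instead of returning NA rows.

```r
# Toy stand-in for mydata; values are invented for illustration only
mydata <- data.frame(
  current_year = c("2015", "2014", "2015", NA),
  a = c(1, 2, NA, 4),
  b = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

value_cols <- c("a", "b")  # hypothetical names standing in for c(48, 52, 56, 60)

# which() keeps only positions where the comparison is TRUE (NA is dropped)
is_2015 <- which(mydata$current_year == "2015")

# Rows are chosen before the comma, columns after it
sums_2015 <- rowSums(mydata[is_2015, value_cols], na.rm = TRUE)
sums_2015
```

The result is named by the original row numbers, so you can still tell which rows the sums came from.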

Related

Conditionally replace values from multiple columns with values from one column

My aim is to replace values from multiple columns by values in a column b_Y_1, provided they are not missing. If there is a missing value in a column b_Y_1 for the corresponding line, it remains unchanged. My problem is that I do not know how to write "remains unchanged" for more columns at once.
I use data.table package.
data[ ,c("a_.Y3_1","a_.Y3_2","a_.Y3_3","a_Y1_1","a_Y1_2","a_Y1_3") := ifelse(!is.na(data$b_Y_1), data$b_Y_1,????)]
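One way to express "remains unchanged" is to pass each column back in as the fallback value of ifelse. Below is a minimal base R sketch, assuming data is a plain data.frame and using invented toy values (only two of the column names are reused). With data.table, the same per-column function could be applied through lapply(.SD, ...) together with .SDcols.

```r
# Toy data; the values are assumptions for illustration
data <- data.frame(
  a_Y1_1 = c(1, 2, 3),
  a_Y1_2 = c(4, 5, 6),
  b_Y_1  = c(10, NA, 30)
)

cols  <- c("a_Y1_1", "a_Y1_2")   # columns to overwrite
has_b <- !is.na(data$b_Y_1)      # TRUE where a replacement value exists

# For each target column: take b_Y_1 where present, otherwise keep the column's own value
data[cols] <- lapply(data[cols], function(x) ifelse(has_b, data$b_Y_1, x))
data
```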

R - removing rows where values of one column fail to match another column

I have a line of code in my R script which calculates a percentage from my dataset edata regarding how often the values of two columns GazeCueTarget.CRESP and GazeCueTarget.RESP match up per row of data.
I want to be able to delete all the rows where the values from both of these columns do not match up. So if my below code tells me 97% of the time the values of GazeCueTarget.CRESP match that of GazeCueTarget.RESP on a given row of my data, I want to be able to get rid of the remaining 3% where the values mismatch.
This is what I have produced to give me a percentage for when the rows match up.
Any advice would be very much appreciated. I think the solution should be quite simple but I am not sure.
paste0((100*with(edata, mean(GazeCueTarget.CRESP==GazeCueTarget.RESP, na.rm = TRUE))), "%")
You can drop the non-matching rows by subsetting:
edata <- edata[edata$GazeCueTarget.CRESP == edata$GazeCueTarget.RESP, ]
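One caveat with the subsetting above: if either column contains NA, the comparison yields NA and the result gets a row of NAs for each such case. A small sketch with invented values (the toy edata below is an assumption, not the real dataset) shows how which() avoids that:

```r
# Toy stand-in for edata; values are invented
edata <- data.frame(
  GazeCueTarget.CRESP = c("a", "b", "c", NA),
  GazeCueTarget.RESP  = c("a", "x", "c", "d"),
  stringsAsFactors = FALSE
)

# TRUE for a match, FALSE for a mismatch, NA where a value is missing
is_match <- edata$GazeCueTarget.CRESP == edata$GazeCueTarget.RESP

# which() returns only the positions that are TRUE, so NA comparisons are dropped too
kept <- edata[which(is_match), ]
kept
```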

Put column sums in a new row in a matrix

I have a data frame that consists of municipality names (factors) in the first column and number of projects (integers) in columns two and three.
Var.1<-c("Andover", "Avon", "Bethany")
Freq.x<-c(2,NA,10)
Freq.y<-c(4,2,9)
Projects<-data.frame(Var.1,as.integer(as.numeric(Freq.y)),as.integer(as.numeric(Freq.x)))
[Note: I am making the second and third columns as integers here because that's how they are categorized in my actual data set.]
I was able to take the row sums of the rows using:
Projects$Sum<-rowSums(Projects[,2:3])
However, I'm unable to figure out how to take the column sums. I tried using the following formula:
Projects[Total,]<-colSums(Projects[2:3,])
I get the error:
Error in colSums(Projects[2:3, ]) : 'x' must be numeric
Even when I convert the second and third columns to as.numeric, I get the same response.
Can someone advise how to obtain the column sums and create a new row at the bottom to hold the results?
You can do something like this:
Var.1<-c("Andover", "Avon", "Bethany")
Freq.x<-c(2,NA,10)
Freq.y<-c(4,2,9)
freq <- cbind(Freq.x, Freq.y)
freq <- rbind(freq, colSums(freq, na.rm=TRUE))
Projects <- data.frame(name=c(Var.1, "Total"), freq)
In particular: keep the numeric part separate and compute its sums; add "Total" to the character vector before it is converted to a factor, and only then build the data.frame.
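As for the error in the question: Projects[2:3,] selects rows 2 and 3 with every column, including the name column, so colSums sees non-numeric data. If you would rather append the totals to an existing data frame, a sketch along these lines should work. It assumes the name column is kept as character rather than factor, and the column names are chosen here for illustration.

```r
# Rebuild the example with the names kept as character strings
Projects <- data.frame(
  Var.1  = c("Andover", "Avon", "Bethany"),
  Freq.x = c(2L, NA, 10L),
  Freq.y = c(4L, 2L, 9L),
  stringsAsFactors = FALSE
)

# Select the numeric COLUMNS (after the comma), ignoring NA when summing
totals <- colSums(Projects[, c("Freq.x", "Freq.y")], na.rm = TRUE)

# Build a one-row data frame with matching column names and append it
total_row <- data.frame(
  Var.1  = "Total",
  Freq.x = totals[["Freq.x"]],
  Freq.y = totals[["Freq.y"]],
  stringsAsFactors = FALSE
)
Projects <- rbind(Projects, total_row)
Projects
```

Note that colSums returns doubles, so the two count columns become numeric rather than integer after the rbind.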

Calculating column totals and then sorting on the results in R

Please can anyone offer guidance on how to
calculate column totals and then
sort on the resultant totals in R?
Everything I've tried so far to total the columns (e.g. colsums() and sapply()) returns the totals as a vector (e.g. ABOUT 4022), and I cannot find any information on how to split this into the column header ABOUT and the column value 4022, and then sort both the header and the value by the column value.
Note that the function is called colSums (not colsums).
Use it with sort, or, if you want to reorder the columns themselves, use order:
colSums(mtcars)
sort(colSums(mtcars))
mtcars[ ,order(colSums(mtcars))]
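On splitting the header from the value: colSums returns a named numeric vector, so the headers are available through names(). A short sketch using mtcars (the names column and total for the result are just illustrative choices):

```r
totals <- colSums(mtcars)

# sort() keeps each name attached to its value
sorted <- sort(totals, decreasing = TRUE)

# Split the named vector into a header column and a value column
result <- data.frame(
  column = names(sorted),
  total  = unname(sorted),
  stringsAsFactors = FALSE
)
result
```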

Subset a dataframe based on a single condition applied to multiple columns

I've had a look through the existing subset Q&A's on this site and couldn't quite find what I was looking for.
I want to subset a data frame based on one condition (e.g. if the value is below 5). However, I only want the rows where the value in all of the columns is below 5.
For example using the iris dataset - I would like to select all the rows where columns 1-3 all have values below 5.
subdata <- iris[which(iris[,1:3]<5),]
This doesn't do it for me. I get lots of NA rows at the bottom of the subset data.
Any help much appreciated!
Try
subdata <- iris[apply(iris[,1:3] < 5, 1, all),]
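The reason the which() attempt gives NA rows: iris[,1:3] < 5 is a 150 x 3 logical matrix, and which() returns positions within that whole matrix (up to 450), so any position above 150 points at a row that does not exist. As an alternative to apply, here is a sketch using rowSums, which counts how many of the selected columns satisfy the condition in each row (this assumes the columns contain no NA, which holds for iris):

```r
cols <- iris[, 1:3]

# A row qualifies when the number of TRUEs equals the number of columns checked
keep <- rowSums(cols < 5) == ncol(cols)

subdata <- iris[keep, ]
head(subdata)
```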
