Adding across a dataset


I have a dataset and I am trying to sum across columns. For example, say there are 50 rows and 100 columns. For each row I want to add up the values in specific columns (not all of them).
Thanks for any help!

apply(df[, c(1, 5, 10, 11, 15)], 1, sum) adds up columns 1, 5, 10, 11 and 15 for each row.

rowSums is generally faster than apply(dat, 1, sum). Both may also need an extra argument, na.rm = TRUE, to keep NA values from sabotaging the results (otherwise a single NA makes the whole row sum NA):
rowSums(dat[, cols_to_sum], na.rm = TRUE)
If you want to have an irregular selection of columns, i.e. different columns from different rows, then that too is possible but you will need to clarify the question.
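A minimal sketch of the NA point above, using a small made-up data frame (dat and cols_to_sum here are hypothetical stand-ins for your own data and column choice). It compares rowSums with and without na.rm = TRUE, and shows that apply() passes na.rm through to sum().

```r
# Hypothetical example data: rows 2 and 3 each contain one NA
dat <- data.frame(a = c(1, 2, NA),
                  b = c(4, NA, 6),
                  c = c(7, 8, 9),
                  d = c(100, 100, 100))

# Columns to add up (by name here; column numbers such as c(1, 2, 3) work too)
cols_to_sum <- c("a", "b", "c")

# Without na.rm, a single NA makes the whole row sum NA
plain <- rowSums(dat[, cols_to_sum])
print(plain)    # 12 NA NA

# With na.rm = TRUE, NAs are skipped
with_rm <- rowSums(dat[, cols_to_sum], na.rm = TRUE)
print(with_rm)  # 12 10 15

# apply() gives the same result; na.rm is passed on to sum()
via_apply <- apply(dat[, cols_to_sum], 1, sum, na.rm = TRUE)
print(via_apply)
```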

Related

Getting subset of data based on conditional values from two columns

I have a data frame called combined_df_color_locat_cues in which I need to select rows based on the values of two columns. Specifically, I need to select the rows where c1Pos and c2Pos have consecutive values (e.g., 4 and 5, 3 and 2, etc.).
I have tried the following, but it doesn't work:
combined_df_locat_short_cues_consec<-subset(combined_df_color_locat_cues, subset=!(c2Pos==c1Pos+1|c1Pos==c1Pos-1))
Any help would be very much appreciated.
Thanks in advance,
Mikel

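Please make two changes:
replace subset=! with subset= (the ! negates the condition, so your call kept exactly the rows you wanted to drop);
replace c1Pos==c1Pos-1 with c2Pos==c1Pos-1 (a value can never equal itself minus 1, so that half of the condition was always FALSE).
The corrected call is:
combined_df_locat_short_cues_consec <- subset(combined_df_color_locat_cues, subset = (c2Pos == c1Pos + 1 | c2Pos == c1Pos - 1))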
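Since "consecutive" here means the two positions differ by exactly 1 in either direction, the same condition can also be written with abs(). A sketch on a small made-up data frame (cues is hypothetical; in the question the real data frame is combined_df_color_locat_cues), checking that both forms keep the same rows:

```r
# Hypothetical stand-in for combined_df_color_locat_cues
cues <- data.frame(c1Pos = c(4, 3, 1, 5),
                   c2Pos = c(5, 2, 3, 5))

# Corrected condition: c2Pos is one above or one below c1Pos
consec <- subset(cues, c2Pos == c1Pos + 1 | c2Pos == c1Pos - 1)

# Equivalent: the positions differ by exactly 1, in either direction
consec_abs <- subset(cues, abs(c2Pos - c1Pos) == 1)

print(consec_abs)  # keeps rows 1 (4 and 5) and 2 (3 and 2)
```

The abs() form is a matter of taste; it avoids writing the column comparison twice.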

Sum rows after a specific column for uncertain number of columns

Let's say I want to sum the values in each row of several data frames. I want to start at column 2 and sum every value from that column onward. The data frames may have different numbers of columns, though. I guess it could work with
rowSums(df[2:X])
but I just don't know what to replace the X with. Or is there a totally different way of doing it?
Regards

If you only want to exclude the first column, you can write:
rowSums(df[-1])
or
rowSums(df[,-1])

Use ncol to get the number of columns:
rowSums(df[2:ncol(df)])
You can also use length, because the length of a data frame is its number of columns:
rowSums(df[2:length(df)])
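One caveat when the number of columns varies: df[, -1] drops to a plain vector when only one column is left, and rowSums() then fails; likewise 2:ncol(df) becomes 2:1 if df has a single column. df[-1] (or df[, -1, drop = FALSE]) always keeps a data frame. A sketch with a hypothetical helper, sum_from_col2(), and made-up data, assuming every column after the first is numeric:

```r
# Hypothetical helper: for each row, sum every column from the 2nd onward.
# df[-1] always returns a data frame, even if only one column remains.
sum_from_col2 <- function(df) {
  rowSums(df[-1])
}

# Made-up data frames with different numbers of columns
df_a <- data.frame(id = c("r1", "r2"), x = c(1, 2), y = c(3, 4), z = c(5, 6))
df_b <- data.frame(id = c("r1", "r2"), x = c(10, 20))

print(sum_from_col2(df_a))  # 9 12
print(sum_from_col2(df_b))  # 10 20

# By contrast, df_b[, -1] is a plain numeric vector, so rowSums() errors here
res <- try(rowSums(df_b[, -1]), silent = TRUE)
print(inherits(res, "try-error"))  # TRUE
```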

Aggregate rows across some columns using ID and keep others unchanged in a large R dataframe

I have a large data frame (6000 rows x 42 columns) with an almost unique ID column. There are some duplicates where one ID has multiple rows; these rows differ only in 2 numeric columns, which I am happy to add up into a single row.
I've spent ages looking, and aggregate seems to work; however, I need to list every column I am keeping, which is a pain. Can someone suggest a better solution? I am not wedded to aggregate.
NewDF <- aggregate(cbind(AddCol1, AddCol2) ~ ID + OtherCol1 + OtherCol2 + OtherCol3...OtherCol39, DF, sum)
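One option that stays within aggregate is the formula's dot: on the right-hand side, . stands for every column not already used on the left-hand side, so the 39 grouping columns need not be listed. A sketch on a small made-up DF (the names ID, OtherCol1, AddCol1 and AddCol2 mirror the question's placeholders). It assumes the other columns are identical within each ID, as the question describes; note also that the formula method drops rows with NA in any column by default (na.action = na.omit).

```r
# Made-up data: ID 1 appears twice and differs only in AddCol1/AddCol2
DF <- data.frame(ID        = c(1, 1, 2),
                 OtherCol1 = c("x", "x", "y"),
                 AddCol1   = c(1, 2, 5),
                 AddCol2   = c(10, 20, 7))

# "." on the right-hand side means "all columns not on the left-hand side",
# so every other column becomes a grouping column without listing it
NewDF <- aggregate(cbind(AddCol1, AddCol2) ~ ., data = DF, FUN = sum)

# One row per ID: AddCol1 sums to 3 and 5, AddCol2 to 30 and 7
print(NewDF)
```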

How to pick out columns with the same number in every row of a matrix in R

Sorry for asking this naive question, but it's really hard for me.
I have a matrix with about 5000 columns and 80 rows. How can I pick out the columns that contain the same number in every row (cell) across all 80 rows?
Any help or suggestions will be appreciated!
Thanks,
Jing

If your dataframe or matrix is called df, try:
which(apply(df, 2, function(x) length(unique(x))) == 1)
This works because applying length(unique(x)) to a column x tells you exactly how many distinct values the column contains. If there is only one, the column holds the same value in every row, which is what you're looking for.
The apply part runs this logic over every column of your matrix/data frame.
The which(... == 1) part gives you the indices of the columns whose value never changes.
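A small check of the idea on a made-up matrix m, alongside an alternative (a different technique from the answer's) that compares every row with the first row and counts mismatches per column. The two treat NA differently: unique() counts NA as one value, so an all-NA column is returned, while the comparison version never returns a column containing NA, because the NA comparisons make colSums NA and which() drops them.

```r
# Made-up 3 x 4 matrix: columns 1 and 3 hold a single repeated value
m <- matrix(c(1, 1, 1,
              2, 3, 2,
              5, 5, 5,
              7, 7, 8), nrow = 3)

# Approach from the answer: count distinct values per column
idx_unique <- which(apply(m, 2, function(x) length(unique(x))) == 1)

# Alternative: compare each row with the first row, then count mismatches.
# m[rep(1, nrow(m)), , drop = FALSE] is a matrix whose rows all equal row 1.
first_row <- m[rep(1, nrow(m)), , drop = FALSE]
idx_compare <- which(colSums(m != first_row) == 0)

print(idx_unique)   # 1 3
print(idx_compare)  # 1 3
```

The comparison version avoids calling a function once per column, which may matter for wide matrices such as the 5000-column one in the question.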

Using data frame values to select columns of a different data frame

I'm relatively new to R, so excuse me if I'm not even posting this question the right way.
I have a matrix generated with the combn function:
double_expression_combinations <- combn(marker_column_vector,2)
This matrix has x columns and 2 rows. Each column holds 2 numbers that represent column numbers in my main data frame, named initial; each pair is a combination of columns to be tested. The initial data frame has 27 columns (and thousands of rows) containing values of 1 and 0. The test takes the 2 numbers in a column of double_expression_combinations as the column numbers to use from initial, adds those 2 columns row by row, and counts how many times the sum equals 2.
I believe I can come up with the counting part; I just don't know how to use the data from the double_expression_combinations data frame to select the columns to test from the "initial" data frame.
Edited to incorporate corrections made by commenters.

When using R, it's important to keep your terminology precise: double_expression_combinations is not a data frame but a matrix. It's easy to loop over the columns of a matrix with apply. I'm a bit unclear about the exact test, but this might succeed:
apply(double_expression_combinations, 2,  # the 2 selects each column in turn
      function(cols) sum(initial[, cols[1]] + initial[, cols[2]] == 2))
Both the + and == operators are vectorised, so no additional loop is needed inside the call to sum.
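A runnable sketch of that apply() call on a small made-up initial data frame with 3 columns of 0/1 values. It assumes marker_column_vector is a vector of column numbers (here simply 1:3); each result counts the rows in which both columns of a pair are 1.

```r
# Made-up 0/1 data standing in for the 27-column initial data frame
initial <- data.frame(a = c(1, 0, 1, 1),
                      b = c(1, 1, 0, 1),
                      c = c(0, 1, 1, 0))

# Assumed: marker_column_vector holds the column numbers to combine
marker_column_vector <- 1:3
double_expression_combinations <- combn(marker_column_vector, 2)  # 2 x 3 matrix

counts <- apply(double_expression_combinations, 2, function(cols) {
  # TRUE where both columns are 1; sum() counts the TRUEs
  sum(initial[, cols[1]] + initial[, cols[2]] == 2)
})

# Label each count with its column pair, e.g. "1-2"
names(counts) <- apply(double_expression_combinations, 2, paste, collapse = "-")
print(counts)  # 1-2: 2, 1-3: 1, 2-3: 1
```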