This question already has answers here:
Sum rows in data.frame or matrix
(7 answers)
Closed 7 years ago.
I need to sum the columns of a table whose names start with a particular string.
An example table might be:
tbl<-data.frame(num1=c(3,2,9), num2=c(3,2,9),n3=c(3,2,9),char1=c('a', 'b', 'c'))
I get the list of matching columns (in this example there are only 2, but the real case has more than 20):
a <- colnames(tbl)[grep('^num', colnames(tbl))]
I tried
sum(tbl[,a])
but I get only a single number: the total of all elements in both columns.
What I need is the result of:
tbl$num1+ tbl$num2
We can either use Reduce
Reduce(`+`, tbl[a])
Or rowSums, which also has the option of removing NA elements with na.rm = TRUE:
rowSums(tbl[a])
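A small self-contained check of both answers on the example table, offered as a sketch. It anchors the pattern with '^num' so only names that start with "num" are picked up. The NA placed in num2 is a hypothetical value added only to show how `+` propagates NA while rowSums can skip it.

```r
tbl <- data.frame(num1 = c(3, 2, 9), num2 = c(3, 2, 9), n3 = c(3, 2, 9),
                  char1 = c('a', 'b', 'c'))

# Names that *start* with "num" ("^" anchors the pattern to the beginning)
a <- grep('^num', names(tbl), value = TRUE)

# Both give one sum per row, equal to tbl$num1 + tbl$num2
by_reduce  <- Reduce(`+`, tbl[a])
by_rowsums <- rowSums(tbl[a])

# Hypothetical NA, only to show the difference in NA handling
tbl_na <- tbl
tbl_na$num2[2] <- NA
with_na <- rowSums(tbl_na[a])                # NA propagates to row 2
skip_na <- rowSums(tbl_na[a], na.rm = TRUE)  # row 2 sums only num1
```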
This question already has answers here:
Select rows from a data frame based on values in a vector
(3 answers)
Closed 2 years ago.
I'm trying to find a way to subset the first 30 groups in my data frame (171 in total, of unequal length).
Here's a smaller dummy data frame I've been practicing with (in this case I only try to subsample the first 3 groups):
groups=c(rep("A",times=5),rep("B",times=2), rep("C",times=3),rep("D",times=2), rep("E",times=8))
value=c(1,2,4,3,5,7,6,8,7,5,2,3,5,7,1,1,2,3,5,4)
dummy<-data.frame(groups,value)
So far, I've tried variations of:
subset<-c("A","B","C")
dummy2<-dummy[dummy$groups==subset,]
but I get the following warning: longer object length is not a multiple of shorter object length
Would anyone know how to fix this or have other options?
We can use filter from dplyr. Get the first n unique elements of 'groups' with head, then use %in% to build a logical vector that filter uses to subset the rows:
library(dplyr)
n <- 4
dummy %>%
  filter(groups %in% head(unique(groups), n))
Or with subset in base R:
subset(dummy, groups %in% head(unique(groups), n))
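== compares elementwise, so it only works as intended with vectors of equal length or when the second vector has length 1; otherwise the shorter vector is recycled, which is what produces the warning here. To test against multiple values, use %in%.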
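To make the recycling problem concrete, here is a base-R sketch on the dummy data from the question (no dplyr needed). It shows that == silently keeps the wrong rows, while %in% keeps every row of groups A, B and C. The names keep, wrong, right and first_n are illustrative only.

```r
groups <- c(rep("A", 5), rep("B", 2), rep("C", 3), rep("D", 2), rep("E", 8))
value  <- c(1, 2, 4, 3, 5, 7, 6, 8, 7, 5, 2, 3, 5, 7, 1, 1, 2, 3, 5, 4)
dummy  <- data.frame(groups, value)

keep <- c("A", "B", "C")

# `==` recycles `keep` along the 20 rows (A,B,C,A,B,C,...), so each row is
# compared with one rotating value; R warns because 20 is not a multiple of 3
wrong <- suppressWarnings(dummy[dummy$groups == keep, ])

# `%in%` asks, for each row, whether its group appears anywhere in `keep`
right <- dummy[dummy$groups %in% keep, ]

# Generalised to "the first n groups", as in the answer
n <- 3
first_n <- dummy[dummy$groups %in% head(unique(dummy$groups), n), ]
```

Here wrong keeps only 3 of the 10 rows belonging to A, B and C, which is why the warning should not be ignored.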
This question already has answers here:
R: Count number of objects in list [closed]
(5 answers)
Closed 2 years ago.
I have a data frame in R, and I am trying to set all cells that hold a vector, such as c(1,2,3) or 1:2, to NA. Is there an easy way to do this?
You can use lengths to count the number of elements in each value of the column, then set the values whose length is greater than 1 to NA. Here I assume the data frame is named df and the column col_name; change them to match your data.
df$col_name[lengths(df$col_name) > 1] <- NA
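A cell can only hold a vector if the column is a list column, so this sketch builds a small hypothetical data frame of that shape (the values are made up) and applies the one-liner to it.

```r
# Hypothetical data: a list column where some cells hold a single value
# and others hold a whole vector
df <- data.frame(id = 1:3)
df$col_name <- list(c(1, 2, 3), 5, 1:2)

lengths(df$col_name)   # elements per cell: 3 1 2

# Replace every multi-element cell with NA
df$col_name[lengths(df$col_name) > 1] <- NA
```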
This question already has answers here:
Normalizing selection of dataframe columns with dplyr
(2 answers)
Closed 3 years ago.
I need to normalise my data by dividing each value by the mean of the entire column, preferably using dplyr.
Assume:
inputs <- c(3,5,3,9,12)
mydata = data.frame(inputs)
I would like every value replaced by itself divided by the column mean, which is 6.4.
Any straightforward suggestion?
For a more general approach in base R, we can use sapply:
sapply(mydata, function(x) x/mean(x))
Or, if there is more than one column, with colMeans:
mydata/colMeans(mydata)[col(mydata)]
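A quick sketch comparing both base-R answers. The second column, other, is hypothetical and is only there to exercise the multi-column case; after normalising, every column should have a mean of 1.

```r
# Hypothetical second column added to show the multi-column case
mydata <- data.frame(inputs = c(3, 5, 3, 9, 12),
                     other  = c(1, 2, 3, 4, 10))

# 1) sapply: divide each column by its own mean (returns a matrix)
m1 <- sapply(mydata, function(x) x / mean(x))

# 2) colMeans + col(): col(mydata) gives each cell's column index, so
#    colMeans(mydata)[col(mydata)] lines up one mean per cell
d2 <- mydata / colMeans(mydata)[col(mydata)]
```

Note that sapply returns a matrix while the colMeans version keeps a data frame. Since the question prefers dplyr, the equivalent there (dplyr 1.0.0 or later) is mutate(mydata, across(everything(), ~ .x / mean(.x))).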
This question already has answers here:
How to index a vector sequence within a vector sequence
(5 answers)
Closed 5 years ago.
I have a data frame, and I need to find the row numbers where the values in one column match a given sequence.
Take this column as an example:
col1 <- matrix(c(1,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,1), nrow = 21, ncol = 1)
and the vector I need to match against it:
r <- c(2, 0, 2)
I need R to return the indices of the rows where the pattern in r matches the values in col1 (in this case rows 11, 12 and 13).
I thought I could achieve this with row.match, but that is not the case. I have also tried different combinations of the match function, but they don't yield any results either.
Maybe my approach is wrong from the beginning, but I have trouble believing that there isn't a function that would give me the expected result with some adjustment.
Thanks.
You could do this using rollapply from zoo. Basically, this runs identical on a rolling basis with a window of length(r). The result tells you that the sequence is present starting at position 11 of the col1 vector.
library(zoo)
which(rollapply(col1,length(r),identical,r))
[1] 11
To get a vector of positions, you could do:
which(rollapply(col1,length(r),identical,r))+0:(length(r)-1)
[1] 11 12 13
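The + 0:(length(r)-1) step assumes there is exactly one match; with several matches the addition recycles incorrectly. As an alternative (a swapped-in base-R technique, not the zoo answer), this sketch compares every window by brute force. The helper find_seq is hypothetical, and overlapping matches are all reported.

```r
col1 <- matrix(c(1,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,1), nrow = 21, ncol = 1)
r <- c(2, 0, 2)

# Hypothetical helper: start positions where `pattern` occurs as a
# contiguous run in `x` (sliding-window comparison, one window per start)
find_seq <- function(x, pattern) {
  x <- as.vector(x)
  k <- length(pattern)
  if (k == 0 || length(x) < k) return(integer(0))
  starts <- seq_len(length(x) - k + 1)
  hit <- vapply(starts,
                function(i) isTRUE(all(x[i:(i + k - 1)] == pattern)),
                logical(1))
  starts[hit]
}

starts <- find_seq(col1, r)

# Expand each start into the full run of row numbers it covers
rows <- sort(unique(unlist(lapply(starts, function(s) s:(s + length(r) - 1)))))
```

For example, find_seq(c(2, 0, 2, 0, 2), r) reports both overlapping starts, 1 and 3.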
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 4 years ago.
In R I have a matrix with several categorical variables: size (2 sqm, 4 sqm or 6 sqm), number of units (1 to 3), number of persons (1 to 4), and a final column holding the count of occurrences of each combination.
Example:
Size;Units;Pers;Count
4;3;4;3 # three times this row
2;1;1;2 # two times this row
6;2;2;1 # one time this row
How can I use the last column to repeat each row that many times, so that it prints out:
Size;Units;Pers;Count
4;3;4;1
4;3;4;1
4;3;4;1
2;1;1;1
2;1;1;1
6;2;2;1
Either in a spreadsheet or in R.
This is an assignment for school, and I just cannot find a way to use the last vector as a multiplier for the first 3 columns while still keeping 1 in the last column.
We can replicate the row indices by the 'Count' column, drop the original Count column (the -4), and use transform to create a new 'Count' column of 1.
transform(df1[rep(1:nrow(df1), df1$Count),-4], Count=1)
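A self-contained sketch of the base-R answer, assuming df1 is the sample table from the question read from its semicolon-separated text (the name df1 comes from the answer; expanded is illustrative). seq_len(nrow(df1)) is used in place of 1:nrow(df1) so that an empty table would not produce the indices 1 and 0.

```r
# Sample table from the question, parsed from its semicolon-separated text
df1 <- read.table(text = "Size;Units;Pers;Count
4;3;4;3
2;1;1;2
6;2;2;1", sep = ";", header = TRUE)

# Repeat each row index Count times, drop Count (column 4), then add Count = 1
expanded <- transform(df1[rep(seq_len(nrow(df1)), df1$Count), -4], Count = 1)
```

The row names of the result (1, 1.1, 1.2, ...) show which original row each copy came from; rownames(expanded) <- NULL resets them if they are not wanted.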
This can also be done with the wrapper function expandRows from the splitstackshape package:
library(splitstackshape)
transform(expandRows(df1, 'Count'), Count=1)