How can I summarise across multiple row names in R? [duplicate]

This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Calculate the mean by group
(9 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 1 year ago.
Here is what my dataframe looks like:
> head(full_malaria_data)
X index lonx laty District population malaria_cases river_length distance_to_river distance_to_coast
1 0 1 6.470243 0.2406650 Lemba 10 0 3098.054 136.9634 210.53670
2 1 2 6.474831 0.2397604 Lemba 395 23 15498.375 240.7952 214.72492
3 2 3 6.460882 0.2677222 Lemba 1862 8 13198.230 583.5622 65.33937
4 3 4 6.471704 0.2500610 Lemba 302 0 13198.230 231.2028 523.73073
I am trying to find the total number of malaria cases per district (for Lemba, Canta Calo, Principe, Agua Grande).
So far my code does one district at a time:
library(dplyr)

full_malaria_data %>%
  summarise(out = sum(malaria_cases[District == "Lobata"])) %>%
  pull(out)
How can I augment my code so that it finds the sum of malaria cases for all districts at once?
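The duplicates linked above cover this. As a minimal base R sketch, assuming full_malaria_data has the District and malaria_cases columns shown in head(), aggregate() sums the cases for every district in one call. The data frame built here is a stand-in: the four Lemba rows come from the head() output, and the two Lobata rows are hypothetical values added only so there is more than one group.

```r
# Stand-in for full_malaria_data: the Lemba rows come from head() above,
# the two Lobata rows are hypothetical values for illustration only
full_malaria_data <- data.frame(
  District      = c("Lemba", "Lemba", "Lemba", "Lemba", "Lobata", "Lobata"),
  malaria_cases = c(0, 23, 8, 0, 5, 12)
)

# One row per district with the summed cases (base R, no packages needed)
cases_by_district <- aggregate(malaria_cases ~ District,
                               data = full_malaria_data, FUN = sum)
print(cases_by_district)  # Lemba -> 31, Lobata -> 17
```

With dplyr, the equivalent is full_malaria_data %>% group_by(District) %>% summarise(total_cases = sum(malaria_cases)), which likewise returns one row per district instead of needing one filter per district.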

Related

group_by() with two counts, one per variable [duplicate]

This question already has answers here:
Frequency count of two column in R
(8 answers)
count number of rows in a data frame in R based on group [duplicate]
(8 answers)
Closed 3 years ago.
My data looks like this:
PDB.ID  Chain.ID  Space.Group  Uniprot.Recommended.Name
101M    A         P 6          Myoglobin
102L    A         P 32 2 1     Endolysin
102M    A         P 6          Myoglobin
103L    A         P 32 2 1     Endolysin
103M    A         P 6          Myoglobin
104L    A         H 3 2        Endolysin
After reading the data and loading the required package:
df <- read.delim("~/Downloads/dummy2.tsv")
library(dplyr)
I can count the number of entries for a specific variable with code like this:
df %>% count(Uniprot.Recommended.Name)
Or, alternatively:
df %>%
  group_by(Uniprot.Recommended.Name) %>%
  summarise(count = n())
This gives me two columns: each Uniprot.Recommended.Name and its count.
My question: is it possible to get a table with counts for two variables, i.e. the number of entries for every Space.Group within each Uniprot.Recommended.Name?
The expected table should look something like this:
Myoglobin  P 6       123
Myoglobin  P 32 2 1  124
Endolysin  P 32 2 1  125
Endolysin  H 3 2     126
Thanks
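One way to get both counts, sketched in base R with the six rows shown above: aggregate() with FUN = length counts the rows for each Uniprot.Recommended.Name and Space.Group combination. Counting PDB.ID is an arbitrary choice here; any column without missing values gives the same counts.

```r
# The six rows shown in the question (Space.Group values contain spaces)
df <- data.frame(
  PDB.ID      = c("101M", "102L", "102M", "103L", "103M", "104L"),
  Chain.ID    = rep("A", 6),
  Space.Group = c("P 6", "P 32 2 1", "P 6", "P 32 2 1", "P 6", "H 3 2"),
  Uniprot.Recommended.Name = c("Myoglobin", "Endolysin", "Myoglobin",
                               "Endolysin", "Myoglobin", "Endolysin")
)

# Count rows for every Space.Group within each Uniprot.Recommended.Name;
# only combinations that actually occur in the data are returned
counts <- aggregate(PDB.ID ~ Uniprot.Recommended.Name + Space.Group,
                    data = df, FUN = length)
names(counts)[names(counts) == "PDB.ID"] <- "n"
print(counts)
```

In dplyr, count() accepts more than one variable, so df %>% count(Uniprot.Recommended.Name, Space.Group) should produce the same three combinations, with the counts in a column named n.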

Cumulative values for columns based on previous row [duplicate]

This question already has an answer here:
Sum of previous rows in a column R
(1 answer)
Closed 3 years ago.
Suppose I need to calculate a cumulative value from another column in the same row plus the value of the same column in the previous row. For example, to obtain cumulative time from a column of time intervals.
> data <- data.frame(interval=runif(10),time=0)
> data
interval time
1 0.95197753 0
2 0.73623490 0
3 0.63938696 0
4 0.32085833 0
5 0.92621764 0
6 0.02801951 0
7 0.09071334 0
8 0.60624511 0
9 0.35364178 0
10 0.79759991 0
I can generate the cumulative value of time using the (ugly) code below:
for( i in 1:nrow(data)){
data[i,"time"] <- data[i,"interval"] + ifelse(i==1,0,data[i-1,"time"])
}
> data
interval time
1 0.95197753 0.9519775
2 0.73623490 1.6882124
3 0.63938696 2.3275994
4 0.32085833 2.6484577
5 0.92621764 3.5746754
6 0.02801951 3.6026949
7 0.09071334 3.6934082
8 0.60624511 4.2996533
9 0.35364178 4.6532951
10 0.79759991 5.4508950
Is it possible to do this without the for loop, using a single command?
Maybe what you are looking for is cumsum():
library(tidyverse)
data <- data %>%
mutate(time = cumsum(interval))
As Ronak says, you can also do this with dplyr and the pipe:
library(dplyr)
data <- data %>%
mutate(time = cumsum(interval))
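To check that a single command reproduces the loop, here is a base R sketch that runs both on the same random intervals (the set.seed() value is arbitrary). Reduce() with accumulate = TRUE is included as a more general alternative for recurrences where each row depends on the previous result through something other than plain addition.

```r
set.seed(42)  # arbitrary seed so the random intervals are reproducible
data <- data.frame(interval = runif(10), time = 0)

# The question's loop, writing into a separate vector for comparison
loop_time <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  previous <- if (i == 1) 0 else loop_time[i - 1]
  loop_time[i] <- data$interval[i] + previous
}

# Single command: running total of the intervals
data$time <- cumsum(data$interval)

# General form: Reduce() carries the previous result into each step;
# with addition as the step function it reproduces cumsum()
reduce_time <- Reduce(function(prev, x) prev + x, data$interval, accumulate = TRUE)
```

The results agree up to floating-point rounding, so compare them with all.equal() rather than ==.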

creating multiple rows depending on special conditions [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
I have a data.frame as follows:
duration classlabel
100 W
120 1
390 2
30 3
30 2
150 3
30 4
60 3
60 4
30 3
120 4
30 3
120 4
In R, I need to create, for each class label, as many rows as its duration. For example, I need 100 rows with the class label 'W', then 120 rows with the class label '1', and so on.
Can anyone help me solve this problem?
One option is uncount() from tidyr:
library(tidyr)
uncount(df1, duration, .remove = FALSE)
Or, in base R, use rep() to repeat each row index by the value in the 'duration' column, then expand the rows with that numeric index:
df1[rep(seq_len(nrow(df1)), df1$duration),]
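A self-contained base R sketch using the data from the question, stored as df1 to match the answer's code. It checks that the expanded frame has one row per unit of duration (1270 rows in total) and resets the row names that the row-index subset produces.

```r
df1 <- data.frame(
  duration   = c(100, 120, 390, 30, 30, 150, 30, 60, 60, 30, 120, 30, 120),
  classlabel = c("W", "1", "2", "3", "2", "3", "4", "3", "4", "3", "4", "3", "4"),
  stringsAsFactors = FALSE
)

# Repeat each row index `duration` times, then subset by those indices
expanded <- df1[rep(seq_len(nrow(df1)), df1$duration), ]
rownames(expanded) <- NULL  # drop the "1.1", "1.2", ... row names

# The same expansion for the label alone, as a plain character vector
labels <- rep(df1$classlabel, times = df1$duration)
```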

aggregate over multiple columns [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
Hey, I have some data that looks like this:
ExpNum Compound Peak Tau SS
1 a 100 30 50
2 a 145 23 45
3 b 78 45 56
4 b 45 43 23
5 c 344 23 56
I'd like to find the mean of each column by Compound name.
What I have so far:
Norm_Table$Norm_Peak <- aggregate(data[[3]], by = list(data$Compound), FUN = normalization)
This works, but I repeat this code 3 times, changing only the column number in data[[x]]. Would lapply work here, or a for loop?
A dplyr solution:
library(dplyr)
data %>%
  group_by(Compound) %>%
  summarise(across(-ExpNum, mean))  # mean of each remaining column per Compound
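To answer the lapply question directly, here is a base R sketch on the sample data: aggregate() with cbind() on the left-hand side handles all three columns in one call, and sapply() over the columns with tapply() shows the apply-family version. mean stands in for the question's own normalization function, which is not shown; either FUN could be swapped for it, assuming it returns one value per group.

```r
data <- data.frame(
  ExpNum   = 1:5,
  Compound = c("a", "a", "b", "b", "c"),
  Peak     = c(100, 145, 78, 45, 344),
  Tau      = c(30, 23, 45, 43, 23),
  SS       = c(50, 45, 56, 23, 56)
)

# One aggregate() call covers all three measurement columns
means <- aggregate(cbind(Peak, Tau, SS) ~ Compound, data = data, FUN = mean)

# Apply-family version: loop over the columns and compute per-Compound
# means with tapply(); the result is a matrix (rows = Compound, cols = column)
means_mat <- sapply(data[c("Peak", "Tau", "SS")],
                    function(col) tapply(col, data$Compound, mean))
```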

sum columns of a data frame depending on the category the observations belong to [duplicate]

This question already has answers here:
Sum of rows based on column value
(4 answers)
Closed 8 years ago.
I have a data frame like this:
Response Spent Saved
1 Yes 100 25
2 Yes 200 50
3 No 20 2
4 No 13 3
I would like to sum the Spent and Saved amounts for each Response, i.e.:
Response Spent Saved
1 Yes 300 75
2 No 33 5
Right now I am using a hackneyed approach: I subset the data frame into 2 new data frames, convert the 2nd and 3rd columns to numeric, run colSums on each column individually, save the outputs into a vector, and then create a new data frame. Suffice to say, it is a terrible approach.
How could I do this in a more efficient way?
Thanks for reading
Check ?aggregate
If your data.frame is DF, the following should do what you want:
aggregate(. ~ Response, data = DF, FUN = sum)
## Response Spent Saved
## 1 No 33 5
## 2 Yes 300 75
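Since the question mentions converting the amount columns to numeric, this base R sketch assumes Spent and Saved were read in as text, converts them once with lapply(), and then uses rowsum() to sum both columns per Response. The values are the four rows from the question.

```r
# Assumption: the amounts were read in as text, as the question hints
DF <- data.frame(
  Response = c("Yes", "Yes", "No", "No"),
  Spent    = c("100", "200", "20", "13"),
  Saved    = c("25", "50", "2", "3"),
  stringsAsFactors = FALSE
)

# Convert the amount columns to numeric once, up front
num_cols <- c("Spent", "Saved")
DF[num_cols] <- lapply(DF[num_cols], as.numeric)

# rowsum() sums every column of x within each group; groups are sorted,
# so the result has row names "No" and "Yes"
totals <- rowsum(DF[num_cols], group = DF$Response)
print(totals)
```

After the conversion, the aggregate(. ~ Response, data = DF, FUN = sum) call above gives the same totals.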
