This question already has answers here:
How to sum a variable by group
(18 answers)
How to group by two columns in R
(4 answers)
Closed 3 years ago.
I have a data frame in which many rows repeat the same values in the first two columns. I would like to sum the last column over those repeated rows and collapse them into a single row at the same time. Could anyone tell me how to do that?
Here is an example:
name <- c("a","b","c","a","c")
position <- c(192,7,6,192,99)
score <- c(1,2,3,2,5)
df <- data.frame(name,position,score)
> df
name position score
1 a 192 1
2 b 7 2
3 c 6 3
4 a 192 2
5 c 99 5
#I would like to sum the scores of rows whose first two columns are the
#same. The desired result looks like this:
name position score
1 a 192 3
2 b 7 2
3 c 6 3
4 c 99 5
Thank you sincerely for the help.
Try this:
library(dplyr)
df %>%
group_by(name, position) %>%
summarise(score = sum(score, na.rm = TRUE))  # one row per name/position pair
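If you would rather avoid the dplyr dependency, base R's aggregate() can do the same grouping. A minimal sketch, assuming the df built in the question; aggregate() returns the groups in a different row order than the desired result, so the last step sorts them by name and then position.

```r
name <- c("a", "b", "c", "a", "c")
position <- c(192, 7, 6, 192, 99)
score <- c(1, 2, 3, 2, 5)
df <- data.frame(name, position, score)

# Sum score within every unique (name, position) combination
res <- aggregate(score ~ name + position, data = df, FUN = sum)

# Sort by name, then position, to match the layout of the desired result
res <- res[order(res$name, res$position), ]
rownames(res) <- NULL
res
```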
This question already has answers here:
Conditional replacement of values in a data.frame
(5 answers)
Closed last year.
Given this data frame:
dat=data.frame(a=c("ll","pp","ml","ml","v"),value=c(1,2,12,1,2))
I want to multiply by 10 only the values whose a is "ml".
In base R:
dat <- data.frame(a = c("ll", "pp", "ml", "ml", "v"), value = c(1, 2, 12, 1, 2))
dat$value[dat$a == "ml"] <- dat$value[dat$a == "ml"] * 10
dat
Output:
a value
1 ll 1
2 pp 2
3 ml 120
4 ml 10
5 v 2
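One caveat with the logical index above: if column a ever contains NA, then dat$a == "ml" contains NA too, and the assignment can stop with the error "NAs are not allowed in subscripted assignments". Wrapping the test in which() avoids this, because which() returns only the positions that are TRUE. A small sketch on the same data:

```r
dat <- data.frame(a = c("ll", "pp", "ml", "ml", "v"), value = c(1, 2, 12, 1, 2))

# which() converts the logical test into row numbers and drops any NA,
# so the update still works if column `a` has missing values
rows <- which(dat$a == "ml")
dat$value[rows] <- dat$value[rows] * 10
dat
```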
Another solution is to use an ifelse() statement inside dplyr's mutate():
library(dplyr)
dat %>%
mutate(value = ifelse(a == "ml", value * 10, value))
a value
1 ll 1
2 pp 2
3 ml 120
4 ml 10
5 v 2
This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 2 years ago.
I have a dataframe that looks like the following:
unit_id outcome
1 3
1 5
1 4
2 1
2 2
2 3
I know how to calculate the mean for each unit_id.
df <- df %>%
group_by(unit_id) %>%
summarise(mean = mean(outcome))
This yields:
unit_id mean
1 4
2 2
I am trying to figure out a way to get the mean for each unit_id and include that in the original dataframe. I would like the output to look like the following.
unit_id outcome mean
1 3 4
1 5 4
1 4 4
2 1 2
2 2 2
2 3 2
We can use mutate() instead of summarise(): mutate() keeps every row and repeats the group mean on each one.
library(dplyr)
df <- df %>%
group_by(unit_id) %>%
mutate(mean = mean(outcome)) %>%
ungroup()
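A base R alternative to the mutate() answer is ave(): it applies a function within each group and returns a vector as long as the input, so the result can be assigned straight back as a new column. A sketch, assuming the example data from the question:

```r
df <- data.frame(unit_id = c(1, 1, 1, 2, 2, 2),
                 outcome = c(3, 5, 4, 1, 2, 3))

# ave() computes mean(outcome) per unit_id and repeats it on every row of that group
df$mean <- ave(df$outcome, df$unit_id, FUN = mean)
df
```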
This question already has answers here:
Collapse text by group in data frame [duplicate]
(2 answers)
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 3 years ago.
I want to condense information in a dataframe to reduce the number of rows.
Consider the dataframe:
df <- data.frame(id=c("A","A","A","B","B","C","C","C"),b=c(4,5,6,1,2,7,8,9))
df
id b
1 A 4
2 A 5
3 A 6
4 B 1
5 B 2
6 C 7
7 C 8
8 C 9
I want to collapse the data frame to one row per unique value of "id", listing the values of b in that row. The result should look like this:
df.results <- data.frame(id=c("A","B","C"),b=c("4,5,6","1,2","7,8,9"))
df.results
id b
1 A 4,5,6
2 B 1,2
3 C 7,8,9
A solution for the first step is:
library(dplyr)
df.results <- df %>%
group_by(id) %>%
# paste(collapse = ",") gives "4,5,6"; toString(b) would give "4, 5, 6"
summarise(b = paste(b, collapse = ",")) %>%
ungroup()
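In base R, the same collapse can be sketched with aggregate(), which forwards the extra collapse argument on to paste() and produces exactly the "4,5,6" strings from the desired result:

```r
df <- data.frame(id = c("A", "A", "A", "B", "B", "C", "C", "C"),
                 b = c(4, 5, 6, 1, 2, 7, 8, 9))

# paste() is called once per id; collapse = "," is passed through
# aggregate()'s ... argument
df.results <- aggregate(b ~ id, data = df, FUN = paste, collapse = ",")
df.results
```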
How would you turn df.results back into df?
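To go the other way, one approach is to split each string on the commas and repeat each id once per piece. A base R sketch, assuming df.results as constructed in the question (the name df.back is just for illustration); tidyr's separate_rows() is a tidyverse alternative.

```r
df.results <- data.frame(id = c("A", "B", "C"), b = c("4,5,6", "1,2", "7,8,9"))

# Split every comma-separated string into a character vector of pieces;
# as.character() guards against b being a factor (the default before R 4.0)
pieces <- strsplit(as.character(df.results$b), ",", fixed = TRUE)

# Repeat each id once per piece, then flatten the pieces into one numeric column
df.back <- data.frame(id = rep(df.results$id, lengths(pieces)),
                      b = as.numeric(unlist(pieces)))
df.back
```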
This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Aggregating rows for multiple columns in R [duplicate]
(3 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 4 years ago.
I have a large data frame with one column (Phylum) that has repeated names and 253 other columns (each with a unique name) that hold counts. I would like to sum the counts in each of those columns for each Phylum.
This is a simplified version of what my data look like:
Phylum sample1 sample2 sample3 ... sample253
1 P1 2 3 5 5
2 P1 2 2 10 2
3 P2 1 0 0 1
4 P3 10 12 3 1
5 P3 5 7 14 15
I have seen similar questions, but they are for fewer columns, where you can just list the names of the columns you want summed. I don't want to enter 253 unique column names.
I would like my results to look like this
Phylum sample1 sample2 sample3 ... sample253
1 P1 4 5 15 7
2 P2 1 0 0 1
3 P3 15 19 17 16
I would appreciate any help. Sorry for the format of the question; this is my first time asking for help on Stack Overflow (rather than sleuthing).
If your starting file looks like this (test.csv):
Phylum,sample1,sample2,sample3,sample253
P1,2,3,5,5
P1,2,2,10,2
P2,1,0,0,1
P3,10,12,3,1
P3,5,7,14,15
Then you can use group_by() and summarise() with across() from dplyr (summarise_each() and funs() still appear in older answers but are deprecated in current dplyr):
library(tidyverse)  # loads readr (read_csv) and dplyr
read_csv('test.csv') %>%
group_by(Phylum) %>%
summarise(across(everything(), sum))
Note that if you only need one column, plain summarise() is enough:
read_csv('test.csv') %>%
group_by(Phylum) %>%
summarise(sample1 = sum(sample1))
across() is what applies the function (here sum) to each non-grouping column.
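A base R option is aggregate() with a dot on the left-hand side of the formula; the dot stands for every column not used on the right, so none of the 253 sample names have to be typed. A sketch using the simplified data from the question (the data frame name phyla is made up for the example). Note that the formula interface drops any row containing an NA by default (na.action = na.omit).

```r
phyla <- data.frame(Phylum = c("P1", "P1", "P2", "P3", "P3"),
                    sample1 = c(2, 2, 1, 10, 5),
                    sample2 = c(3, 2, 0, 12, 7),
                    sample3 = c(5, 10, 0, 3, 14),
                    sample253 = c(5, 2, 1, 1, 15))

# ". ~ Phylum" sums every non-Phylum column within each Phylum
totals <- aggregate(. ~ Phylum, data = phyla, FUN = sum)
totals
```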
This question already has answers here:
Count number of rows per group and add result to original data frame
(11 answers)
Closed 5 years ago.
How to do COUNTIF in R
In Excel we can write the formula as
"COUNTIF($K$2:$K$205,K2),COUNTIF($K$2:$K$205,K3),..."
How can I do the same in R? The desired result is:
value col Countif
1 A 3
1 A 3
1 A 3
4 A 2
4 A 2
3 A 1
99 B 2
99 B 2
1000 B 4
1000 B 4
1000 B 4
1000 B 4
We can use the convenience function add_count() from dplyr:
library(dplyr)
df1 %>%
add_count(value, col, name = "Countif")
which is similar to
df1 %>%
group_by(value, col) %>%
mutate(Countif = n()) %>%
ungroup()
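Without dplyr, base R's ave() can build the same column: group by both value and col, and return each group's size. A sketch that rebuilds the example as df1; it applies length() to a row-index vector so the counts come back as integers regardless of the type of value.

```r
df1 <- data.frame(value = c(1, 1, 1, 4, 4, 3, 99, 99, 1000, 1000, 1000, 1000),
                  col = rep(c("A", "B"), each = 6))

# ave() splits the row indices by the interaction of value and col,
# applies length() within each group and repeats that count on the group's rows
df1$Countif <- ave(seq_len(nrow(df1)), df1$value, df1$col, FUN = length)
df1
```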