This question already has answers here:
Calculate the mean by group
(9 answers)
Closed 3 years ago.
I have data that includes a treatment group, which is indicated by a 1, and a control group, which is indicated by a 0. This is all contained under the variable treat_invite. How can I separate these and take the mean of pct_missing for the 1's and 0's? I've attached an image for clarification.
Assuming your data frame is called df:
library(dplyr)
df <- df %>% group_by(treat_invite) %>% mutate(MeanPCTMissing = mean(PCT_missing))
Or, if you just want the summary table (rather than the original table with an additional column):
df <- df %>% group_by(treat_invite) %>% summarise(MeanPCTMissing = mean(PCT_missing))
This question already has answers here:
filtering within the summarise function of dplyr
(3 answers)
Opposite of %in%: exclude rows with values specified in a vector
(13 answers)
Closed 3 months ago.
EDIT: I want to specify which values NOT to include in my calculation by providing a list of values for records to skip. I do NOT want to provide a list of values to include in my calculation because my dataset is too large.
I want to group records based on a certain value, and then I want to do some other calculations for certain variables; however, I want to exclude certain values from one of those calculations. Here is an example of what the data transformation would look like without any exclusions:
library(dplyr)
grouped <- starwars %>%
  group_by(species) %>% # group my data by a particular value
  summarise(Total_Mass = sum(mass), # make a calculation
            Average_Height = mean(height)) # make another calculation
and here's what I am attempting to do:
exclude <- c("R2-D2","Luke","Darth") #make a list of the names of records I would like to exclude
grouped2 <- starwars %>%
group_by(species) %>%
summarise(Total_Mass = sum(mass) where name !%in% exclude, #sum mass for all records except those where name is in the exclude list
Average_Height = mean(height)) # make another calculation without any exclusions
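One possible way to express this, offered as a sketch rather than a definitive answer: subset the vector inside sum() with a negated %in%, so only that one calculation skips the excluded records. Two things here are my assumptions and not part of the question: the exclude vector uses full names, because %in% matches exactly and "Luke" or "Darth" alone would not match "Luke Skywalker" or "Darth Vader"; and na.rm = TRUE is added because starwars has missing mass and height values.

```r
library(dplyr)

# Assumption: exact, full names. "Luke" alone would not match "Luke Skywalker".
exclude <- c("R2-D2", "Luke Skywalker", "Darth Vader")

grouped2 <- starwars %>%
  group_by(species) %>%
  summarise(
    # sum mass only for records whose name is NOT in the exclude vector
    Total_Mass = sum(mass[!name %in% exclude], na.rm = TRUE),
    # no exclusions here; na.rm = TRUE is an assumption to skip missing heights
    Average_Height = mean(height, na.rm = TRUE)
  )
```

Because the subsetting happens inside sum(), the excluded records still count toward Average_Height. If partial matches such as "Luke" are really wanted, a pattern-based test (for example grepl) would be needed instead of %in%.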
This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 2 years ago.
I have a table with 2 columns.
Type: 1 or 2 or 3 or 4
Data: corresponding data (there are multiple data for each type)
Now I want to create a third column that contains the mean of the data for each type, i.e., all the rows with type 1 have the same mean value. I think I should do it with the mutate function, but I'm not sure how to proceed.
data %>% mutate(meanData = ifelse(...))
Can somebody help?
Thank you in advance.
We can do a group-by operation:
library(dplyr)
data <- data %>%
group_by(Type) %>%
mutate(meanData = mean(Data))
This question already has answers here:
Extract row corresponding to minimum value of a variable by group
(9 answers)
Select the row with the maximum value in each group
(19 answers)
Closed 3 years ago.
I have data like this:
ID SHape Length
180139746001000 2
180139746001000 1
For duplicated IDs, I want to delete whichever row has the smaller shape length.
Can anyone help me with this?
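With
library(data.table)
df <- data.table(matrix(c(102:106,106:104,1:3,1:3,5:6),nrow = 8))
colnames(df) <- c("ID","Shape Length")
just use duplicated after sorting by shape length:
setkeyv(df, "Shape Length") # sort ascending by shape length
df[!duplicated(ID, fromLast = TRUE)] # keep the last (largest) row per ID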
You can select the row with the highest shape length for each ID by performing
df %>%
  group_by(ID) %>%
  arrange(desc(SHape.Length)) %>% # sort descending so the largest comes first
  slice(1) %>%
  ungroup()
This question already has answers here:
Filling missing dates by group
(3 answers)
Fastest way to add rows for missing time steps?
(4 answers)
Closed 5 years ago.
I have a data frame like the following and would like to pad the dates.
Notice that four days are missing for id 3.
df = data.frame(
id = c(1,1,1,2,2,3,3,3),
date = lubridate::ymd("2017-01-01","2017-01-02","2017-01-03",
"2017-05-10","2017-05-11","2017-01-03",
"2017-01-08","2017-01-09"),
type = c("A","A","A","B","B","C","C","C"),
val1 = rnorm(8),
val2 = rnorm(8))
df
I tried the padr package as I wanted a quick solution, but this doesn't seem to work.
?pad
padr::pad(df)
library(dplyr)
df %>% padr::pad(group = c('id'))
df %>% padr::pad(group = c('id','date'))
Any ideas on tools or other packages to pad a dataset across multiple columns and based on groupings?
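One approach that may work here, sketched under a few assumptions: tidyr::complete() applied per id can build the full daily sequence between each group's first and last date. The name padded is only illustrative, the dates are created with as.Date() instead of lubridate to keep the example self-contained, and filling type downward within each id is my assumption about the desired result. val1 and val2 stay NA on the added rows.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3),
  date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                   "2017-05-10", "2017-05-11", "2017-01-03",
                   "2017-01-08", "2017-01-09")),
  type = c("A", "A", "A", "B", "B", "C", "C", "C"),
  val1 = rnorm(8),
  val2 = rnorm(8)
)

padded <- df %>%
  group_by(id) %>%
  # one row per day between each id's first and last date
  complete(date = seq(min(date), max(date), by = "day")) %>%
  # assumption: carry type forward onto the newly added rows
  fill(type) %>%
  ungroup()
```

Only id 3 has gaps in this data, so the other groups should come back unchanged.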
EDIT:
So id 3 has only three dates in my df:
"2017-01-03","2017-01-08","2017-01-09"
Thus, I want the final data to include four extra rows that contain
"2017-01-04","2017-01-05","2017-01-06","2017-01-07"
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 5 years ago.
My DF has two columns: one is the state name and the other is the number of events that happened.
The data originally had more columns, and now I want to sum the number of events by state. How can I do that?
How about using dplyr:
library(dplyr)
DF %>%
group_by(state) %>%
summarize(total_event = sum(event))
Here I have assumed your data is in a data frame called DF and that it has columns named state and event.