I'm trying to calculate a percentage (here, win_percentage) grouped by a discrete variable, spread_favorite. The column I'm calculating from is favorite_win, which holds a 0 or 1. The numerator is easy, since I just want to sum the 1s in the column.
wins_by_spread <- df %>% group_by(spread_favorite) %>% summarise(wins = sum(favorite_win))
To get the denominator, I figured getting a row count would be easiest. However, when I try the below code, no values come back.
games_by_spread <- df %>% group_by(spread_favorite) %>% summarise(games = nrow(favorite_win))
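A likely reason the second attempt returns nothing: nrow() expects a data frame or matrix and returns NULL when given a plain vector such as favorite_win. Below is a minimal sketch of one way around it, assuming dplyr is loaded. The toy df and the name win_pct_by_spread are made up for illustration, since the real data is not shown. n() counts the rows in each group, and wins divided by games gives the win proportion.

```r
library(dplyr)

# Hypothetical toy data; the real df is not shown in the question
df <- data.frame(
  spread_favorite = c(-3, -3, -3, -7, -7),
  favorite_win    = c(1, 0, 1, 1, 1)
)

win_pct_by_spread <- df %>%
  group_by(spread_favorite) %>%
  summarise(
    wins           = sum(favorite_win),  # numerator: number of 1s in the group
    games          = n(),                # denominator: number of rows in the group
    win_percentage = wins / games
  )
```

Since favorite_win only holds 0 and 1, mean(favorite_win) inside summarise would give the same proportion in one step.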
I have a line of code that calculates the maximum value for a number of products
data2019 %>%
group_by(PRODUCT) %>%
summarise(max_amt = max(AMOUNT))
I then want to count the number of rows where AMOUNT == max_amt for each product. If I wrap it in a count or sum function, though, I get the max value for the whole set and the total number of rows for each product, which isn't very helpful, especially as the values vary considerably. How can I get the answer for each specific product?
You can count on a condition by writing the summarize call as sum(CONDITION), like so:
data2019 %>%
group_by(PRODUCT) %>%
summarize(max_count = sum(AMOUNT == max(AMOUNT)))
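To see why this works, here is a small sketch on made-up data. The data2019 below is a hypothetical stand-in, not the asker's table. Within each group, AMOUNT == max(AMOUNT) is a logical vector, and sum() treats TRUE as 1 and FALSE as 0.

```r
library(dplyr)

# Hypothetical stand-in for data2019
data2019 <- data.frame(
  PRODUCT = c("A", "A", "A", "B", "B"),
  AMOUNT  = c(10, 10, 4, 7, 3)
)

max_counts <- data2019 %>%
  group_by(PRODUCT) %>%
  summarise(
    max_amt   = max(AMOUNT),             # per-product maximum
    max_count = sum(AMOUNT == max_amt)   # rows in this product equal to that maximum
  )
```

With this toy data, product A has two rows at its maximum of 10 and product B has one row at its maximum of 7.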
I would like help with grouping by ID, counting the non-zero values, and presenting the result as a percentage of the total rows for that ID.
My data
library(dplyr)
id<-c(1,1,1,1,1,2,2,2,2)
x<-c(0,1,0,1,0,0,0,1,0)
df1<-data.frame(id,x)
head(df1)
In my results, for group id=1, I need a column with the count of ones (2) and another column with the percentage (2/5 = 40). For group id=2, I need the count to be 1 and the percentage to be 1/4 = 25.
df1 %>%
group_by(id) %>%
summarise(sum_of_1 = sum(x!=0),
pct= round((sum_of_1/n())*100))
This one works. Thanks for the help Hulk
Try this-
df1 %>%
group_by(id) %>%
summarise(sum_of_1 = sum(x, na.rm = TRUE),
pct = round((sum_of_1/n())*100))
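As a cross-check on the two answers, the proportion of non-zero values is also the mean of the logical vector x != 0, so the percentage can be computed without n(). A sketch using the df1 from the question (the name result is only for illustration):

```r
library(dplyr)

id  <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
x   <- c(0, 1, 0, 1, 0, 0, 0, 1, 0)
df1 <- data.frame(id, x)

result <- df1 %>%
  group_by(id) %>%
  summarise(
    sum_of_1 = sum(x != 0),               # count of non-zero values per id
    pct      = round(100 * mean(x != 0))  # share of non-zero rows, as a percentage
  )
```

Note that sum(x, na.rm = TRUE) only matches sum(x != 0) when x holds nothing but 0 and 1, as it does here.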
I am new to R and this is my first post on SO - so please bear with me.
I am trying to identify outliers in my dataset. I have two data.frames:
(1 - original data set, 192 rows): observations and their value (AvgConc)
(2 - created with dplyr, 24 rows): Group averages from the original data set, along with quantiles, minimum, and maximum values
I want to create a new column within the original data set that gives TRUE/FALSE based on whether (AvgConc) is greater than the maximum or less than the minimum I have calculated in the second data.frame. How do I go about doing this?
Failed attempt:
Outliers <- Original.Data %>%
group_by(Status, Stim, Treatment) %>%
mutate(Outlier = Original.Data$AvgConc > Quantiles.Data$Maximum | Original.Data$AvgConc < Quantiles.Data$Minimum) %>%
as.data.frame()
Error: Column Outlier must be length 8 (the group size) or one, not 192
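The error occurs because Original.Data$AvgConc and Quantiles.Data$Maximum refer to the full columns (192 and 24 values), not to the rows of the current group. Instead of using the $ references, join 'Quantiles.Data' to 'Original.Data' by 'Status', 'Stim', 'Treatment' so that each row carries its own group's 'Maximum' and 'Minimum':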
library(dplyr)
Original.Data %>%
  # bring each group's limits onto every matching row
  inner_join(Quantiles.Data %>%
               select(Status, Stim, Treatment, Maximum, Minimum),
             by = c("Status", "Stim", "Treatment")) %>%
  group_by(Status, Stim, Treatment) %>%
  mutate(Outlier = (AvgConc > Maximum) | (AvgConc < Minimum)) %>%
  as.data.frame()
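The join can be tried on a miniature example. Everything below is hypothetical toy data standing in for the two data frames described in the question. Once the limits are on each row, the comparison is row-wise, so in this sketch the group_by step is left out.

```r
library(dplyr)

# Hypothetical miniature versions of the two data frames
Original.Data <- data.frame(
  Status    = c("a", "a", "a", "b", "b", "b"),
  Stim      = "s1",
  Treatment = "t1",
  AvgConc   = c(1, 5, 20, 2, 3, -4)
)
Quantiles.Data <- data.frame(
  Status    = c("a", "b"),
  Stim      = "s1",
  Treatment = "t1",
  Minimum   = c(0, 0),
  Maximum   = c(10, 10)
)

Outliers <- Original.Data %>%
  inner_join(Quantiles.Data, by = c("Status", "Stim", "Treatment")) %>%
  mutate(Outlier = AvgConc > Maximum | AvgConc < Minimum)  # compared row by row
```

In this toy data only the values 20 and -4 fall outside their group's limits, so only those two rows are flagged.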
I am trying to generate a new column with values derived from the original chart. I would like to calculate the group average of same hotel and same date first, then use this group averages to divide the original sales.
Here is my code. I tried to calculate the group average with group_by and summarise from the dplyr package, but it did not produce the results I expected.
hotel = c(rep("Hilton",3), rep("Caesar",3))
date1 = c(rep('2018-01-01',2), '2018-01-02', rep('2018-01-01',3))
dba = c(2,0,1,3,2,1)
sales = c(3,5,7,5,2,3)
df = data.frame(cbind(hotel, date1, dba, sales))
df1 = df %>%
group_by(date1, hotel) %>%
dplyr::summarise(avg = mean(sales)) %>%
acast(., date1~hotel)
Any suggestion would be highly appreciated!
Instead of summarise, we can use mutate, which keeps every original row. After grouping by 'date1' and 'hotel', divide 'sales' by the mean of 'sales' to create a new column:
library(tidyverse)
df %>%
group_by(date1, hotel) %>%
mutate(SalesDividedByMean = sales/mean(sales))
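As a worked check of the arithmetic, here is a sketch that builds the data frame directly from the question's vectors (so sales stays numeric) and applies the same mutate. The name scaled is only for illustration.

```r
library(dplyr)

hotel <- c(rep("Hilton", 3), rep("Caesar", 3))
date1 <- c(rep("2018-01-01", 2), "2018-01-02", rep("2018-01-01", 3))
dba   <- c(2, 0, 1, 3, 2, 1)
sales <- c(3, 5, 7, 5, 2, 3)
df    <- data.frame(hotel, date1, dba, sales)

scaled <- df %>%
  group_by(date1, hotel) %>%
  mutate(SalesDividedByMean = sales / mean(sales)) %>%  # each row over its group mean
  ungroup()
```

For Hilton on 2018-01-01 the group mean is (3 + 5) / 2 = 4, so those two rows become 0.75 and 1.25. The single Hilton row on 2018-01-02 becomes 1.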
NOTE: cbind returns a matrix, and a matrix can hold only a single type. When the columns have different types, a single character vector therefore converts the whole matrix to character. Wrapping that matrix in data.frame carries the change over, giving either factor columns (stringsAsFactors = TRUE, the default before R 4.0.0) or character columns. Passing the vectors to data.frame directly avoids this.
data
df <- data.frame(hotel, date1, dba, sales)
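The coercion described in the note can be checked directly in base R. This sketch uses two of the question's vectors. The names m, bad and good are made up for illustration.

```r
hotel <- c(rep("Hilton", 3), rep("Caesar", 3))
sales <- c(3, 5, 7, 5, 2, 3)

m    <- cbind(hotel, sales)       # character matrix: the numbers become strings
bad  <- data.frame(m)             # sales is character (or factor before R 4.0.0)
good <- data.frame(hotel, sales)  # sales stays numeric

is.character(m)         # TRUE
is.numeric(bad$sales)   # FALSE
is.numeric(good$sales)  # TRUE
```

This is why mean(sales) misbehaves on the cbind version: the column is no longer numeric.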
I am editing a data frame using dplyr that holds multiple reaction times (rt) for different individuals (id). I now want to make a new column in which each reaction time is divided by that individual's maximum reaction time.
Currently, I have only managed to divide each specific reaction time by the maximum reaction time of the group, using the following code:
df <- mutate(df, spcRT=rt)
df <- group_by(df, id, rt) %>% summarise(
spcRT = max(df$rt, na.rm=TRUE) ) %>% as.data.frame()
which(is.na(df))
df <- mutate(df,IDspcRT = rt/spcRT)
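If we need to create a column ('spcRT') by dividing the reaction time ('rt') by the maximum reaction time (max(rt, na.rm=TRUE)) for each 'id', then we need to group by 'id' only and do the division inside mutate. Using max(df$rt) inside the pipeline always returns the maximum of the whole column, regardless of grouping.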
df %>%
group_by(id) %>%
mutate(spcRT = rt/max(rt, na.rm=TRUE))
It is not clear why the OP used 'rt' along with 'id' as grouping variable in the post. It would give only a single unique 'rt' value and there is no need for any max.
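A small sketch on made-up data shows the per-id scaling. The values and the name result are hypothetical, since the real data is not shown in the question.

```r
library(dplyr)

# Hypothetical reaction times for two individuals, with one missing value
df <- data.frame(
  id = c(1, 1, 1, 2, 2),
  rt = c(200, 400, NA, 300, 150)
)

result <- df %>%
  group_by(id) %>%
  mutate(spcRT = rt / max(rt, na.rm = TRUE)) %>%  # max is taken within each id
  ungroup()
```

Each individual's slowest trial gets spcRT = 1, and the missing rt stays NA, because na.rm = TRUE only affects the max and not the division.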