Row count in R with Group By

I'm trying to calculate a win percentage (win_percentage), grouped by a discrete variable, spread_favorite. The column I'm calculating from is favorite_win, which holds a 0 or 1. The numerator is easy, since I just want to sum all the 1s in that column:
wins_by_spread <- df %>% group_by(spread_favorite) %>% summarise(wins = sum(favorite_win))
For the denominator, I figured a row count would be easiest. However, when I try the code below, no values come back.
games_by_spread <- df %>% group_by(spread_favorite) %>% summarise(games = nrow(favorite_win))
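A minimal sketch of one way to get the denominator, assuming df has the spread_favorite and favorite_win columns from the question (the toy data below is made up for illustration). nrow() expects a data frame or matrix and returns NULL for a plain vector such as favorite_win, which is why no counts come back. Inside summarise(), dplyr's n() returns the number of rows in the current group:

```r
library(dplyr)

# Hypothetical toy data standing in for the question's df
df <- data.frame(
  spread_favorite = c(3, 3, 3, 7, 7),
  favorite_win    = c(1, 0, 1, 1, 1)
)

win_pct <- df %>%
  group_by(spread_favorite) %>%
  summarise(
    wins  = sum(favorite_win),              # numerator: number of 1s
    games = n(),                            # denominator: rows in this group
    win_percentage = 100 * wins / games     # reuses the two columns just made
  )

print(win_pct)
```

Because favorite_win only holds 0 or 1, 100 * mean(favorite_win) would give the same percentage in a single step.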

Related

Counting rows that match result of calculation in R

I have a line of code that calculates the maximum value for a number of products
data2019 %>%
group_by(PRODUCT) %>%
summarise(max_amt = max(AMOUNT))
I then want to count the number of rows where AMOUNT == max_amt for each product. But if I wrap it in count() or sum(), I get either the maximum for the whole data set or the total number of rows for each product. Neither helps, especially as the values vary considerably. How can I get the count for each individual product?

You can count the rows that meet a condition by writing sum(CONDITION) inside summarize(). The comparison yields TRUE/FALSE for each row, and sum() counts the TRUE values. Like so:
data2019 %>%
group_by(PRODUCT) %>%
summarize(max_count = sum(AMOUNT == max(AMOUNT)))
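A small sketch on hypothetical data (the values are made up; the column names mirror the question) that returns both the per-product maximum and how many rows hit it. It relies on summarise() letting a later expression refer to a summary created earlier in the same call:

```r
library(dplyr)

# Hypothetical data; PRODUCT and AMOUNT mirror the question's columns
data2019 <- data.frame(
  PRODUCT = c("A", "A", "A", "B", "B"),
  AMOUNT  = c(10, 5, 10, 7, 3)
)

max_counts <- data2019 %>%
  group_by(PRODUCT) %>%
  summarise(
    max_amt   = max(AMOUNT),            # per-product maximum
    max_count = sum(AMOUNT == max_amt)  # TRUE counts as 1, FALSE as 0
  )

print(max_counts)
```

Product A has two rows at its maximum of 10, and product B has one row at its maximum of 7.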

dplyr count, sum and calculate the percentage and round to whole number using R

I would like help grouping by ID, counting the non-zero values, and presenting the result as a percentage of the total number of rows for that ID.
My data
library(dplyr)
id<-c(1,1,1,1,1,2,2,2,2)
x<-c(0,1,0,1,0,0,0,1,0)
df1<-data.frame(id,x)
head(df1)
In my results, for id = 1 I need one column with the count of ones (2) and another column with the percentage (2/5 = 40). For id = 2, the count should be 1 and the percentage 1/4 = 25.

df1 %>%
group_by(id) %>%
summarise(sum_of_1 = sum(x!=0),
pct= round((sum_of_1/n())*100))
This one works. Thanks for the help, Hulk.

Try this (summing x counts the ones because x only contains 0 and 1; na.rm = TRUE skips missing values):
df1 %>%
group_by(id) %>%
summarise(sum_of_1 = sum(x, na.rm = TRUE),
pct = round((sum_of_1/n())*100))
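Running the sum(x != 0) version on the question's own data reproduces the requested numbers. Here n() is the number of rows in each id group, which serves as the denominator. The sum(x, na.rm = TRUE) version gives the same result for this data, because x only holds 0 and 1:

```r
library(dplyr)

# Data from the question
id <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
x  <- c(0, 1, 0, 1, 0, 0, 0, 1, 0)
df1 <- data.frame(id, x)

res <- df1 %>%
  group_by(id) %>%
  summarise(
    sum_of_1 = sum(x != 0),                 # non-zero entries per id
    pct      = round(sum_of_1 / n() * 100)  # share of that id's rows, in %
  )

print(res)
```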

How do I compare group means to individual observations and make a new TRUE/FALSE column?

I am new to R and this is my first post on SO - so please bear with me.
I am trying to identify outliers in my dataset. I have two data.frames:
(1 - original data set, 192 rows): observations and their value (AvgConc)
(2 - created with dplyr, 24 rows): Group averages from the original data set, along with quantiles, minimum, and maximum values
I want to create a new column in the original data set that is TRUE or FALSE depending on whether AvgConc is greater than the maximum or less than the minimum calculated for its group in the second data.frame. How do I go about doing this?
Failed attempt:
Outliers <- Original.Data %>%
group_by(Status, Stim, Treatment) %>%
mutate(Outlier = Original.Data$AvgConc > Quantiles.Data$Maximum | Original.Data$AvgConc < Quantiles.Data$Minimum) %>%
as.data.frame()
Error: Column Outlier must be length 8 (the group size) or one, not 192

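Inside mutate(), Original.Data$AvgConc and Quantiles.Data$Maximum refer to whole columns (192 and 24 values), which bypasses the grouping. That is why mutate() complains that the result is not the group size of 8. Instead of indexing with Quantiles.Data$, join Quantiles.Data to 'Original.Data' by 'Status', 'Stim' and 'Treatment', so that every row carries its own group's Maximum and Minimum: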
library(dplyr)
Original.Data %>%
# attach each group's Maximum/Minimum to every matching row
inner_join(Quantiles.Data %>%
select(Status, Stim, Treatment, Maximum, Minimum),
by = c("Status", "Stim", "Treatment")) %>%
group_by(Status, Stim, Treatment) %>%
mutate(Outlier = (AvgConc > Maximum) | (AvgConc < Minimum)) %>%
as.data.frame()
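A self-contained sketch on hypothetical toy data (the values are invented, and the column names follow the question). It swaps in left_join() for inner_join(), which keeps every row of the original data even when a group is missing from the quantile table. It also drops group_by(), because after the join the comparison is row by row:

```r
library(dplyr)

# Hypothetical toy data; column names follow the question
Original.Data <- data.frame(
  Status    = c("a", "a", "a", "b", "b"),
  Stim      = "s1",
  Treatment = "t1",
  AvgConc   = c(1, 5, 12, 3, 20)
)
Quantiles.Data <- data.frame(
  Status    = c("a", "b"),
  Stim      = "s1",
  Treatment = "t1",
  Minimum   = c(2, 2),
  Maximum   = c(10, 15)
)

flagged <- Original.Data %>%
  # attach each group's bounds to every row of that group
  left_join(Quantiles.Data, by = c("Status", "Stim", "Treatment")) %>%
  # plain row-wise comparison, so no grouping is needed
  mutate(Outlier = AvgConc > Maximum | AvgConc < Minimum)

print(flagged)
```

Rows whose group has no match in Quantiles.Data would get NA bounds and therefore an NA Outlier value, which makes missing groups easy to spot.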

Dividing values in each cell by the group average in R

I am trying to generate a new column with values derived from the original table. I would like to first calculate the average sales for each combination of hotel and date, then divide the original sales by these group averages.
Here is my code. I tried to calculate the group average with group_by() and summarise() from the dplyr package, but it did not give the expected results.
library(dplyr)
library(reshape2)  # acast() comes from reshape2
hotel = c(rep("Hilton",3), rep("Caesar",3))
date1 = c(rep('2018-01-01',2), '2018-01-02', rep('2018-01-01',3))
dba = c(2,0,1,3,2,1)
sales = c(3,5,7,5,2,3)
df = data.frame(cbind(hotel, date1, dba, sales))
df1 = df %>%
group_by(date1, hotel) %>%
dplyr::summarise(avg = mean(sales)) %>%
acast(., date1~hotel)
Any suggestion would be highly appreciated!

Instead of summarise(), use mutate(). summarise() collapses each group to a single row, whereas mutate() keeps every row. After grouping by 'date1' and 'hotel', divide 'sales' by the mean of 'sales' to create a new column:
library(tidyverse)
df %>%
group_by(date1, hotel) %>%
mutate(SalesDividedByMean = sales/mean(sales))
NOTE: When the columns have different types, cbind() produces a matrix, and a matrix can hold only a single type. So one character vector turns the whole data into character. Wrapping that matrix in data.frame() carries the change through, leaving the columns as either factor (stringsAsFactors = TRUE was the default before R 4.0.0) or character. Create the data frame directly instead:
df <- data.frame(hotel, date1, dba, sales)
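Putting the corrected data and the mutate() answer together gives a runnable check. The ungroup() at the end is an addition that simply drops the grouping once the column is built:

```r
library(dplyr)

hotel <- c(rep("Hilton", 3), rep("Caesar", 3))
date1 <- c(rep("2018-01-01", 2), "2018-01-02", rep("2018-01-01", 3))
dba   <- c(2, 0, 1, 3, 2, 1)
sales <- c(3, 5, 7, 5, 2, 3)
df <- data.frame(hotel, date1, dba, sales)  # no cbind(), so sales stays numeric

out <- df %>%
  group_by(date1, hotel) %>%
  mutate(SalesDividedByMean = sales / mean(sales)) %>%  # row / its group mean
  ungroup()

print(out)
```

For example, Hilton on 2018-01-01 has sales 3 and 5 (mean 4), giving 0.75 and 1.25. Caesar on 2018-01-01 has a mean of 10/3, giving 1.5, 0.6 and 0.9.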

R and dplyr: Creating a new column that divides values by multiple maximal values of another column

I am editing a data frame with dplyr that holds multiple reaction times (rt) for different individuals (id). I now want to make a new column where I divide each reaction time by that individual's maximum reaction time.
Currently, I have only managed to divide each specific reaction time by the maximum reaction time of the group, using the following code:
df <- mutate(df, spcRT=rt)
df <- group_by(df, id, rt) %>% summarise(
spcRT = max(df$rt, na.rm=TRUE) ) %>% as.data.frame()
which(is.na(df))
df <- mutate(df,IDspcRT = rt/spcRT)

If we need to create a column ('spcRT') by dividing the reaction time ('rt') with the maximum reaction time (max(rt, na.rm=TRUE)) for each 'id', then we need to group by 'id' and do the division.
df %>%
group_by(id) %>%
mutate(spcRT = rt/max(rt, na.rm=TRUE))
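It is not clear why the OP used 'rt' along with 'id' as grouping variables in the post. Grouping by 'rt' puts each distinct reaction time in its own group, so max() would just return that same value. Also, writing df$rt inside summarise() refers to the whole column rather than the current group, which is why the overall maximum was returned.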
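A small sketch on hypothetical reaction-time data (the values are invented) showing the group-by-id answer. One NA is included to show that na.rm = TRUE keeps the maximum defined while the NA row itself stays NA:

```r
library(dplyr)

# Hypothetical reaction-time data; one NA to show na.rm = TRUE at work
df <- data.frame(
  id = c(1, 1, 2, 2, 2),
  rt = c(200, 400, 300, 600, NA)
)

out <- df %>%
  group_by(id) %>%                                # one group per individual
  mutate(spcRT = rt / max(rt, na.rm = TRUE)) %>%  # divide by that id's maximum
  ungroup()

print(out)
```

Individual 1's maximum is 400 and individual 2's is 600, so spcRT comes out as 0.5, 1, 0.5, 1 and NA.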
