Counting rows that match result of calculation in R

I have a line of code that calculates the maximum value for a number of products
data2019 %>%
group_by(PRODUCT) %>%
summarise(max_amt = max(AMOUNT))
I want to then count the number of rows where AMOUNT == max_amt for that particular product. But if I try to wrap it in a count or sum function, it gives me the max value for the whole set and the total number of rows for each product, which isn't very helpful, especially as the values vary considerably. How can I get it to produce the answer for each specific product?

You can count on a condition by writing your summarise as sum(CONDITION), which adds up the TRUE values. Like so:
data2019 %>%
group_by(PRODUCT) %>%
summarize(max_count = sum(AMOUNT == max(AMOUNT)))
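For illustration only, with a small made-up data frame (not from the question); the comparison AMOUNT == max(AMOUNT) is evaluated within each PRODUCT group:
library(dplyr)
data2019 <- data.frame(PRODUCT = c("A", "A", "B"), AMOUNT = c(5, 5, 3))
data2019 %>%
group_by(PRODUCT) %>%
summarize(max_count = sum(AMOUNT == max(AMOUNT)))
# max_count is 2 for A (two rows at the maximum of 5) and 1 for B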

Related

Row count in R with Group By

I'm trying to calculate a percentage, grouped by a discrete element, in this case win_percentage. The column I'm calculating off of is favorite_win, which has a 0 or 1 value. I'm able to calculate the numerator easily, since I just want to sum all the 1's in the column.
wins_by_spread <- df %>% group_by(spread_favorite) %>% summarise(wins = sum(favorite_win))
To get the denominator, I figured getting a row count would be easiest. However, when I try the below code, no values come back.
games_by_spread <- df %>% group_by(spread_favorite) %>% summarise(games = nrow(favorite_win))
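nrow() expects a data frame and returns NULL for a vector such as favorite_win, which is why no values come back from summarise(). A minimal sketch using n() for the per-group row count instead, reusing the column names from the question:
library(dplyr)
games_by_spread <- df %>%
group_by(spread_favorite) %>%
summarise(wins = sum(favorite_win),
games = n(),
win_percentage = wins / games)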

Count occurrences of elements and sum up their values in R

I have a data frame with 2 columns (Company and Amount). There are multiple companies (string) and a value in another column as numeric.
I want to count all the companies and sum the amount for each company.
So in the end I want to have a new data frame with 3 columns (Company, Occurrences, Total Amount)
I could count all the occurrences and save them in a new data frame, but I have no idea how to get the amount.
library(dplyr)
df_companies %>% count(Geldgeber) %>% top_n(3, n) %>% arrange(desc(n))
You can use n() to get the count
library(dplyr)
df_companies %>%
group_by(Geldgeber) %>%
summarize(ct = n(), total = sum(Betrag))
You will want to consider how to account for missing values in Betrag, if any. For example, you could first filter(!is.na(Betrag)), or use na.rm = TRUE in the call to sum().
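A sketch of both options, assuming the data frame and column names from the question; note that filtering first also removes those rows from the count:
library(dplyr)
# Option 1: drop rows with a missing amount before summarising
df_companies %>%
filter(!is.na(Betrag)) %>%
group_by(Geldgeber) %>%
summarize(ct = n(), total = sum(Betrag))
# Option 2: keep every row in the count, ignore NAs only in the sum
df_companies %>%
group_by(Geldgeber) %>%
summarize(ct = n(), total = sum(Betrag, na.rm = TRUE))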

grouping_by_condition_and_mutate in R

I need your help:
The general idea is to create one new column:
Group the data by a column (DEP), then count the total number of lines per group using the column (id).
Then filter the data on another column (dely), keeping only dely >= 60, and count the id again.
Then calculate the share: the number of rows after filtering divided by the total calculated at the beginning.
total = count(id by group)
share = count(dely >= 60) / total
I was able to do it in 3 steps, but I wanted to know if it is possible to do it in a faster way.
# group the data by DEP
Total_group<-df %>%
group_by(DEP) %>%
summarise(n = n())
# filter the data: T_depart >= 60
Filter_60 <- df %>% filter(T_depart >= 60)
# then group the filtered data by DEP as I did for the total
Filter_60_group<-Filter_60 %>%
group_by(DEP) %>%
summarise(n = n())
# then calculate the share (share_dep); n.x is the total count, n.y the filtered count
share_data <- left_join(Total_group, Filter_60_group, by = "DEP") %>% mutate(share_dep = n.y / n.x)
Any idea how to put all these steps into one or two?
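A single summarise() can produce both numbers in one step; a sketch assuming the column names from the code above (DEP, T_depart):
library(dplyr)
share_data <- df %>%
group_by(DEP) %>%
summarise(total = n(),
share_dep = sum(T_depart >= 60, na.rm = TRUE) / n())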

How do you count the number of observations in multiple columns and use mutate to make the counts as new columns in R?

I have a dataset with multiple lines of survey responses from different years and from different organizations. There are 100 questions in the survey and people can skip them. I am trying to get the average for each question by year and by organization (so grouped by organization and year). I also want to get the count of the people included in those averages, since people can skip questions. I want both of these as new columns, so it will add 200 columns in total. I figured out how to get the average (see code below), but I can't seem to use the same function to get the count of observations.
This is how I successfully got the average.
df<- df%>%
group_by(Organization, Year) %>%
mutate(across(contains('Question'), mean, na.rm = TRUE, .names = "{.col}_average")) %>%
ungroup()
I am now trying to use a similar setup to get the count of observations. I duplicated the columns with the raw data and added 'Count' to the title so that the new average columns are not included among the columns R needs to compute the n count for.
df<- df%>%
group_by(Organization, Year) %>%
mutate(across(contains('Count'), function(x){sum(!is.na(.))}, .names = "{.col}_ncount")) %>%
ungroup()
The code above does get me the new columns, but the n count is the same for all columns and all rows. Any thoughts?
The issue is in the lambda function: it is defined as function(x), but the sum is taken on . instead of x. On its own, . is evaluated as the whole data, so every column gets the same count.
library(dplyr)
df%>%
group_by(Organization, Year) %>%
mutate(across(contains('Count'),
function(x){sum(!is.na(x))},
.names = "{.col}_ncount")) %>%
ungroup()
If we want to use . or .x, specify the lambda with ~:
df%>%
group_by(Organization, Year) %>%
mutate(across(contains('Count'),
~ sum(!is.na(.)),
.names = "{.col}_ncount")) %>%
ungroup()
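For illustration, a toy example with hypothetical column names (not the asker's data), showing the ~ lambda counting non-missing values within each group:
library(dplyr)
toy <- tibble(Organization = c("A", "A", "B"),
Year = 2020,
Question1_Count = c(1, NA, 3))
toy %>%
group_by(Organization, Year) %>%
mutate(across(contains('Count'), ~ sum(!is.na(.)), .names = "{.col}_ncount")) %>%
ungroup()
# Question1_Count_ncount is 1 for both A rows (one non-NA value) and 1 for the B row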

Find the maximum value for each group in dataframe and retrieve the entire row

I have the following data frame (with many records) and would like to retrieve, for each day, the entire row with the maximum difference.
require(dplyr)
# This gets the maximum value for each date
maxInfo = results %>% group_by(t) %>% summarise(Value = max(differences))
I'm able to get the max value for each day, but how do I get the entire row?
A possible solution, using slice_max:
library(dplyr)
results %>% group_by(t) %>% slice_max(differences)
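A side note: slice_max() keeps all tied rows by default; if only one row per day is wanted even when several rows share the maximum, with_ties = FALSE drops the ties:
results %>% group_by(t) %>% slice_max(differences, with_ties = FALSE)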
A data.table method, if speed is important:
results[results[, .I[differences == max(differences)], .(t)]$V1]
