I need to find the mean of a variable, and the number of times each particular combination occurs, in R.
In the example I have grouped by the variables cli, cus and ron and need to summarise to find the mean of age and the frequency of cash for each combination:
df%>% group_by(.dots=c("cli","cus","ron")) %>% summarise_all(mean(age),length(cash))
This doesn't work; is there another way out?
Maybe it is just me, as I seem to have overcomplicated this one; a plain summarise() gets me what I needed:
df%>% group_by(.dots=c("cli","cus","ron")) %>% summarise(mean(age),length(cash))
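For reference, in newer dplyr versions the .dots = form of group_by() is superseded, and naming the summary columns keeps the output tidy. A minimal sketch of the same idea, assuming df has the columns used above:

library(dplyr)

# Same result with named columns; n() gives the row count per group,
# which is what length(cash) amounts to here.
df %>%
  group_by(cli, cus, ron) %>%
  summarise(mean_age = mean(age), freq_cash = n(), .groups = "drop")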
I have a dataset of three columns and roughly 300000 rows:
The columns are: Person ID, Likelihood of Risk, and the Year the survey was taken.
Each Person has taken part multiple times and I only want the most recent likelihood of Risk.
I wanted to figure that out by grouping the Person ID and then finding the max year.
That did not work out; I still ended up with multiple rows for the same person ID.
To continue working I need one specific value of Likelihood of Risk for each ID.
# My attempt: grouping by A_risk as well still keeps one row per pid/A_risk pair
Riskytest <- Risk_Adult %>% group_by(pid, A_risk) %>% summarize(max = max(syear))
Riskytest <- Risk_Adult %>%
  group_by(pid) %>%
  slice_max(syear) %>%
  ungroup()
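One aside: slice_max() keeps ties by default, so a person with two rows in their most recent syear would still appear twice. A small variation, assuming you want exactly one row per pid:

library(dplyr)

# with_ties = FALSE keeps a single row per pid even when the latest syear is duplicated
Riskytest <- Risk_Adult %>%
  group_by(pid) %>%
  slice_max(syear, n = 1, with_ties = FALSE) %>%
  ungroup()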
i.e. getting the count of each value in each column without having to do multiple group_bys.
This post:
group_by(across(all_of(vars, YEARS))) - grouping by variables with a fixed YEAR variable
seems to answer my question using this method: purrr::map(vars, ~df %>% count(YEAR, .data[[.x]])). Is there a way to get the percentages at the same time as the counts?
I seem to get an error when I try to just add percent as the next step
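One way that should work is to add the percentage with a mutate() right after count(). A minimal sketch, reusing the vars, YEAR and df names from the question, and computing the share within each YEAR (drop the group_by(YEAR) step if you want the share of the whole table instead):

library(dplyr)
library(purrr)

# For each variable in vars: count YEAR x value combinations,
# then add each count's percentage within its YEAR.
purrr::map(vars, ~ df %>%
  count(YEAR, .data[[.x]]) %>%
  group_by(YEAR) %>%
  mutate(percent = 100 * n / sum(n)) %>%
  ungroup())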
I'm just 2 days into R, so I hope I can give enough info on my problem.
I have an Excel table on endothelial cell angiogenesis with technical repeats on 4 different dates (those dates are not in order and fall in different weeks).
My data looks like this (of course it's not only the 2nd of March):
I want to average the data on those 4 different days, so I can compare e.g. the "Nb Nodes" from day 1 to day 4.
The goal is to finally have a jitterplot containing the group, the investigated data point and the date.
I'm a medical student, so I don't really have any background in this kind of thing yet, but I'm trying to learn it. Hopefully I provided enough info!
Found the solution:
# Group by experiment date and group
library(dplyr)
DateGroup <- group_by(Exclude0, Exp.Date, Group)

# Summarise the mean of `Nb meshes` for every Group and Exp.Date
summarise(DateGroup, mymean = mean(`Nb meshes`))
I think the below code will work.
1. group_by() the dimension you want to summarize by.
2a. across() is a helper so that you don't need to type out each column manually; it lets us use tidyselect language to quickly reference all columns that contain "Nb" (a pattern I noticed from your screenshot).
2b. In the second argument of across() you supply the function (or formula) you want to apply to each column selected by the first argument.
2c. The optional .names argument of across() gives the new columns a consistent naming convention.
Good luck on your R learning! It's a really great language and you made the right choice.
# df is your data frame
df %>%
  group_by(Exp.Date) %>%
  summarize(across(contains("Nb"), list(mean = mean), .names = "{.fn}_{.col}"))
# if you just want a single column then do this
df %>%
  group_by(Exp.Date) %>%
  summarize(mean_nb_nodes = mean(`Nb nodes`))
I'm trying to group by year and then calculate the average of the means, but I don't know the fastest way to do it, and the way I do it gives me an error.
First I calculate how many rows per year the table has:
avg_awarded_moves_year <- imdb_globes %>%
  group_by(year_film) %>%
  tally()
And then I use the transmute function to add the average per year to the table.
avg_awarded_moves_year <- imdb_globes %>%
  group_by(year_film) %>%
  transmute(average_per_year = sum(averageRating) / avg_awarded_moves_year$n)
The error I encounter: Error: Column "average_per_year" must be length 12 (the group size) or one, not 76
I bet there is a faster and more elegant way to do it. I tried to divide the sum by n(), but that didn't work either. I don't want to use the mean function because the sample consists of means already.
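The error comes from indexing the separate tally table inside transmute: within each year_film group the expression must return one value per row or a single value, and avg_awarded_moves_year$n has a different length. Since n() is available inside summarise, the tally step isn't needed; a minimal sketch, assuming averageRating holds the per-film means:

library(dplyr)

# Divide each year's sum of ratings by that year's row count
avg_awarded_moves_year <- imdb_globes %>%
  group_by(year_film) %>%
  summarise(average_per_year = sum(averageRating) / n(), .groups = "drop")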
I have data for customer purchases across different products. I calculated amount_spent by multiplying the item numbers by the respective price.
I used the cut function to put people into different age bins. Now how can I find the aggregate amount spent by the different age groups, i.e. the contribution of each age group in terms of dollars spent?
Please let me know if you need any more info.
I am really sorry that I can't paste the data here due to remote desktop constraints. I am mainly concerned with the result I got after the summarise step.
library(dplyr)
customer_transaction %>%
  group_by(age_gr) %>%
  summarise(across(amount_spent, sum))
Though I am not sure if you want the contribution to the whole pie or just the sum in each age group.
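If it's the contribution to the whole pie you're after, one extra mutate() after the group sums gives each group's share. A minimal sketch, assuming the columns are named age_gr and amount_spent as above:

library(dplyr)

# Total spend per age group, plus each group's share of the overall total
customer_transaction %>%
  group_by(age_gr) %>%
  summarise(total_spent = sum(amount_spent), .groups = "drop") %>%
  mutate(share = total_spent / sum(total_spent))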
If your data is of class data.table you could go with
customer_transaction[,sum(amount_spent),by=age_gr]