This question has come up before, and there are some solutions, but none that I could find for this specific case. For example:
library(dplyr)
library(ggplot2) # for the diamonds data
my_diamonds <- diamonds %>%
mutate(blah_var1 = rnorm(n()),
blah_var2 = rnorm(n()),
blah_var3 = rnorm(n()),
blah_var4 = rnorm(n()),
blah_var5 = rnorm(n()))
my_diamonds %>%
group_by(cut) %>%
summarise(MaxClarity = max(clarity),
MinTable = min(table), .groups = 'drop') %>%
summarise_at(vars(contains('blah')), mean)
I want a new data frame showing the max clarity, the min table, and the mean of each of the blah variables, but the above returns an empty tibble. Based on some other SO posts, I tried using mutate() and then summarise_at():
my_diamonds %>%
group_by(cut) %>%
mutate(MaxClarity = max(clarity),
MinTable = min(table)) %>%
summarise_at(vars(contains('blah')), mean)
This returns a tibble with only cut and the blah variables; MaxClarity and MinTable are missing.
Is there a way to combine summarise and summarise_at in the same dplyr chain?
The issue is that after the first summarise() call, only the grouping column ('cut') and the summarised columns ('MaxClarity' and 'MinTable') remain, so the 'blah' columns no longer exist for summarise_at() to find. In addition, .groups = 'drop' removes the grouping after that first step. Instead, compute everything in a single summarise() call, using across() for the 'blah' columns:
library(dplyr) # version >= 1.0
my_diamonds %>%
group_by(cut) %>%
summarise(MaxClarity = max(clarity),
MinTable = min(table),
across(contains('blah'), ~ mean(.x, na.rm = TRUE)), .groups = 'drop')
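To see which columns come out of that single summarise() call without loading ggplot2, here is a small sketch on a made-up stand-in for my_diamonds (the toy data frame and its values are invented for illustration). It mixes named summaries with across() exactly as in the answer:

```r
library(dplyr)

# Hypothetical stand-in for my_diamonds: clarity is an ordered factor,
# as in ggplot2::diamonds, so max() respects the level order
toy <- data.frame(
  cut = c("Fair", "Fair", "Good", "Good"),
  clarity = factor(c("SI1", "VS2", "VS2", "IF"),
                   levels = c("SI1", "VS2", "IF"), ordered = TRUE),
  table = c(55, 61, 57, 54),
  blah_var1 = c(1, 3, 2, 4),
  blah_var2 = c(10, NA, 20, 40)
)

# One summarise() call: named summaries plus across() for the blah columns
res <- toy %>%
  group_by(cut) %>%
  summarise(MaxClarity = max(clarity),
            MinTable = min(table),
            across(contains("blah"), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

res
```

The result keeps cut, MaxClarity and MinTable alongside one mean column per blah variable, which is what the question asked for.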
I have a data frame like so:
Time <- seq(as.POSIXct("2017-11-14 00:01:00", tz = "CET"), as.POSIXct("2017-11-14 00:15:00", tz = "CET"), by = "min")
A <- c(2,3,5,2,5,8,17,3,5,8,17,3,5,1,5)
B <- c(1,1,2,1,2,1,2,2,2,4,6,7,8,8,9)
DF <- data.frame(Time=Time, A=A, B=B)
and I want a "newDF" where I aggregate the data into 5-minute intervals, but excluding, for each column, the max/min value before aggregating.
Using dplyr, I get something like this:
library(dplyr)
library(lubridate) # for floor_date()
DF$TimeStamp_round<-floor_date(DF$Time,unit="5 minutes")
DF<-DF %>%
group_by(TimeStamp_round) %>%
mutate(TimeStamp_count = cur_group_id())
newDF<-DF %>%
group_by(TimeStamp_count) %>%
summarise(across(where(is.numeric), mean))
but I still can't manage to exclude the max/min values before the summarise() call that creates newDF.
Note: I don't want to do this manually for each column, because the real DF has 350 columns.
After grouping by 'TimeStamp_round', we can drop the values equal to the range (the min and max) of each column before taking the mean:
library(dplyr)
DF %>%
group_by(TimeStamp_round) %>%
summarise(across(A:B, ~ mean(.[!. %in% range(.)])), .groups = 'drop')
Or, if there are more columns and you want the mean only for the numeric ones:
DF %>%
select(-Time) %>%
group_by(TimeStamp_round) %>%
summarise(across(where(is.numeric),
~ mean(.[!. %in% range(.)])), .groups = 'drop')
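One thing to be aware of: !. %in% range(.) drops every value equal to the min or the max, so ties can remove more than two values, and a group whose values are all the min or max ends up empty, giving NaN. If the intent is to drop exactly one min and one max per column, a sketch with a hypothetical helper, mean_drop_extremes() (not from the original answer), could look like this. It assumes groups with fewer than three values should yield NA:

```r
library(dplyr)
library(lubridate) # for floor_date()

# Sample data from the question
Time <- seq(as.POSIXct("2017-11-14 00:01:00", tz = "CET"),
            as.POSIXct("2017-11-14 00:15:00", tz = "CET"), by = "min")
DF <- data.frame(Time = Time,
                 A = c(2, 3, 5, 2, 5, 8, 17, 3, 5, 8, 17, 3, 5, 1, 5),
                 B = c(1, 1, 2, 1, 2, 1, 2, 2, 2, 4, 6, 7, 8, 8, 9))

# Hypothetical helper: remove one occurrence of the min and one of the max,
# then average the rest; sort() also drops NAs by default
mean_drop_extremes <- function(x) {
  x <- sort(x)
  if (length(x) < 3) return(NA_real_)  # assumption: too few values -> NA
  mean(x[-c(1, length(x))])
}

newDF <- DF %>%
  mutate(TimeStamp_round = floor_date(Time, unit = "5 minutes")) %>%
  group_by(TimeStamp_round) %>%
  # where(is.numeric) skips the POSIXct Time column; the grouping column
  # is never passed to across()
  summarise(across(where(is.numeric), mean_drop_extremes), .groups = "drop")

newDF
```

With this data, the first interval of B (1, 1, 2, 1) gives 1 here, whereas the range-based version leaves nothing to average and returns NaN.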
I have a dataframe like the following:
observations<- data.frame(X=c("00KS089001","00KS089001","00KS089002","00KS089002","00KS089003","00KS089003","00KS105001","00KS105001", "00KS177011","00KS177011","00P0006","00P006","00P006","00P006"), hzdept = c(0,20,0,15,0,13,0,20,0,16,0,6,13,29), hzdepb = c(20,30,15,30,13,30,20,30,16,30,6,13,29,30),Y=c("Red","White","Red","White","Green","Red","Red","Blue", "Black","Black","Red","White","White","White"), Z = c(0.67,0.33,0.5,0.5,0.43,0.57,0.67,0.33,0.53,0.47,0.2,0.23,0.53,0.04))
I want to reduce this so that whenever two rows have the same X and Y, they are combined into one row, i.e.:
data.frame(X=c("00KS089001","00KS089001","00KS089002","00KS089002","00KS089003","00KS089003","00KS105001","00KS105001", "00KS177011","00P0006","00P006"), hzdept = c(0,20,0,15,0,13,0,20,0,0,6), hzdepb = c(20,30,15,30,13,30,20,30,30,6,30),Y=c("Red","White","Red","White","Green","Red","Red","Blue", "Black","Red","White"), Z = c(0.67,0.33,0.5,0.5,0.43,0.57,0.67,0.33,1.00,0.20,0.80))
Any suggestions on how to best go about this?
Edit: OK, now that I see from your comment above how hzdept and hzdepb are supposed to be combined:
library(tidyverse)
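# Sum Z within each X/Y pair
df <- observations %>% count(X, Y, wt = Z, name = "Z")
# Smallest hzdept per X/Y pair
df_hzdept <- observations %>%
arrange(hzdept) %>%
distinct(X, Y, .keep_all = TRUE) %>%
select(X, Y, hzdept)
# Largest hzdepb per X/Y pair
df_hzdepb <- observations %>%
arrange(desc(hzdepb)) %>%
distinct(X, Y, .keep_all = TRUE) %>%
select(X, Y, hzdepb)
# Join the pieces back together on the grouping keys
df <- df %>%
left_join(df_hzdept, by = c("X", "Y")) %>%
left_join(df_hzdepb, by = c("X", "Y"))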
Using dplyr
Here is how you can group by two columns and summarise the other columns with the minimum, maximum, and sum:
library(magrittr) # For the pipe: %>%
observations %>%
dplyr::group_by(X, Y) %>%
dplyr::summarise(hzdept = min(hzdept),
hzdepb = max(hzdepb),
Z = sum(Z), .groups = 'drop')
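As a quick check against the desired output, here is a self-contained sketch that rebuilds the question's observations data, applies the same group_by()/summarise() and prints the two X/Y pairs that actually get merged:

```r
library(dplyr)

# Data from the question
observations <- data.frame(
  X = c("00KS089001", "00KS089001", "00KS089002", "00KS089002", "00KS089003",
        "00KS089003", "00KS105001", "00KS105001", "00KS177011", "00KS177011",
        "00P0006", "00P006", "00P006", "00P006"),
  hzdept = c(0, 20, 0, 15, 0, 13, 0, 20, 0, 16, 0, 6, 13, 29),
  hzdepb = c(20, 30, 15, 30, 13, 30, 20, 30, 16, 30, 6, 13, 29, 30),
  Y = c("Red", "White", "Red", "White", "Green", "Red", "Red", "Blue",
        "Black", "Black", "Red", "White", "White", "White"),
  Z = c(0.67, 0.33, 0.5, 0.5, 0.43, 0.57, 0.67, 0.33, 0.53, 0.47,
        0.2, 0.23, 0.53, 0.04)
)

result <- observations %>%
  group_by(X, Y) %>%
  summarise(hzdept = min(hzdept),  # smallest hzdept in the pair
            hzdepb = max(hzdepb),  # largest hzdepb in the pair
            Z = sum(Z),            # total Z in the pair
            .groups = "drop")

# 14 input rows collapse to 11 X/Y pairs; these two rows are the merged ones
result %>% filter(Y == "Black" | (X == "00P006" & Y == "White"))
```

Both merged rows match the desired output (0, 30, 1.00 and 6, 30, 0.80). Note that summarise() returns the rows sorted by X and then Y, so the row order can differ from the hand-written expected data frame.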
I couldn't find any help on the internet.
I have 3 columns in a .sav file loaded into RStudio.
M has values 1, 2, 3, 4, 5, 6, 7 and the label 'weight', and N has values 1, 2, 3 and the label 'diet'.
I want to group by these columns, but for column N I only want the rows where the value is 1. The last column, A, holds age data.
I wrote this:
library(dplyr)
df%>%
group_by(M, N) %>%
summarize(values = mean(A, na.rm = TRUE))
This groups the data, but for all values of N.
I tried something like this:
library(dplyr)
df%>%
group_by(M, N == 1) %>%
summarize(values = mean(A, na.rm = TRUE))
but again I got groups for all categories of N, with NA etc.
Expected: I want to group by M (all values) and by N only where its value is 1.
What should that group_by look like?
We can group by 'M' and summarise 'A' filtered to the rows where N == 1:
library(dplyr)
df %>%
group_by(M) %>%
summarise(values = mean(A[N == 1], na.rm = TRUE))
Another option is to filter before grouping, but this also removes any 'M' groups that have no rows with N == 1:
df %>%
filter(N == 1) %>%
group_by(M) %>%
summarise(values = mean(A, na.rm = TRUE))
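To see the practical difference between the two options, here is a small sketch on made-up data (the data frame below is hypothetical, not the asker's .sav file) in which M = 3 has no rows with N == 1:

```r
library(dplyr)

# Hypothetical toy data: M = 3 has no rows with N == 1
df <- data.frame(
  M = c(1, 1, 2, 2, 3),
  N = c(1, 2, 1, 1, 2),
  A = c(30, 40, 20, NA, 50)
)

# Option 1: subset A inside summarise(); every M group is kept
opt1 <- df %>%
  group_by(M) %>%
  summarise(values = mean(A[N == 1], na.rm = TRUE))

# Option 2: filter first; M = 3 disappears from the result
opt2 <- df %>%
  filter(N == 1) %>%
  group_by(M) %>%
  summarise(values = mean(A, na.rm = TRUE))

opt1
opt2
```

In option 1, the M = 3 group survives with values = NaN, because mean() of an empty vector is NaN; option 2 returns only the M = 1 and M = 2 groups.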
I have a dplyr table, movie_info_comb, from which I am calculating various statistics on one column, metascore. Here is the code:
summarise_each_(movie_info_comb, funs(min,max,mean,sum,sd,median,IQR),"metascore")
How do I incorporate na.rm = TRUE? I've only seen examples where a single statistic is calculated, and I'd hate to have to repeat this seven times (once for each function).
Thanks in advance.
You can do this with lazy evaluation via the lazyeval package:
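library(dplyr)    # for %>% and summarize_each()
library(lazyeval) # for lazy() and interp()
# Build the lazy call FUN(., na.rm = TRUE) for one function name
na.rm = function(FUN_string)
lazy(FUN(., na.rm = TRUE)) %>%
interp(FUN = FUN_string %>% as.name)
# Do the same for several function names, keeping the names
na.rm.apply = function(FUN_strings)
FUN_strings %>%
lapply(na.rm) %>%
setNames(FUN_strings)
# Note: this relies on the older summarize_each() interface, which is now deprecated
mtcars %>%
select(mpg) %>%
summarize_each(
c("min","max","mean","sum","sd","median","IQR") %>%
na.rm.apply)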
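In current dplyr (1.0 and later), summarise_each()/summarise_each_() and funs() are deprecated, so here is a sketch of the same idea with across(). It wraps each function in a closure that always passes na.rm = TRUE. The movie_info_comb below is a made-up stand-in with invented metascore values, not the asker's table:

```r
library(dplyr)

# Hypothetical stand-in for movie_info_comb, with one missing metascore
movie_info_comb <- data.frame(metascore = c(70, 85, NA, 60, 91))

# The seven statistics from the question, named for the output columns
fns <- list(min = min, max = max, mean = mean, sum = sum,
            sd = sd, median = median, IQR = IQR)

# Wrap each function so na.rm = TRUE is always passed
fns_na_rm <- lapply(fns, function(f) function(x) f(x, na.rm = TRUE))

# across() names the outputs "{.col}_{.fn}", e.g. metascore_min
stats <- movie_info_comb %>%
  summarise(across(metascore, fns_na_rm))

stats
```

Wrapping the functions once avoids passing na.rm through across()'s ..., which is itself deprecated in dplyr 1.1.0, and it avoids writing seven separate lambdas.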