I want to add an extra column to my existing data frame, which has location, coral type, percent bleached, and year as columns. The new column should hold the average bleach percent for each type of coral at each site over the years. For example, soft corals at site 01 have a bleach percent of 20 in 2010 and 10 in 2011, so the average column will contain 15.
Existing df:
type location year value
soft site01 2010 20
soft site01 2011 10
hard site01 2010 10
hard site01 2011 30
After adding the column:
type location year value avg
soft site01 2010 20 15
soft site01 2011 10 15
hard site01 2010 10 20
hard site01 2011 30 20
You can use ave:
transform(dat, avg = ave(value, type, location))
The result:
type location year value avg
1 soft site01 2010 20 15
2 soft site01 2011 10 15
3 hard site01 2010 10 20
4 hard site01 2011 30 20
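For anyone who wants to run this end to end, here is a minimal base R sketch. It rebuilds the four example rows by hand (the name dat matches the answer, but the construction is an assumption, since the question does not show how the data frame was created) and calls ave directly instead of going through transform.

```r
# Rebuild the example data from the question (hand-typed, for illustration only)
dat <- data.frame(
  type     = c("soft", "soft", "hard", "hard"),
  location = rep("site01", 4),
  year     = c(2010, 2011, 2010, 2011),
  value    = c(20, 10, 10, 30)
)

# ave() returns a vector as long as the input, holding each row's group mean
dat$avg <- ave(dat$value, dat$type, dat$location, FUN = mean)

print(dat)
```

Because ave returns one value per row rather than one per group, the result can be assigned straight back as a column without a merge.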
I have three columns in Excel: year, month, and value.
I want to average value for each combination of month and year. In R this is done with group_by(). How can it be done in Excel?
year month value
2019 1 12
2019 1 34
2019 2 56
2019 2 15
2020 1 16
2020 3 67
2020 4 89
2018 6 123
2018 6 45
2018 7 98
2019 3 53
2019 1 23
2020 1 12
2020 3 1
With Office 365 you can use:
=LET(
y,A2:A15,
m,B2:B15,
v,C2:C15,
u,SORT(UNIQUE(CHOOSE({1,2},y,m)),{1,2}),
CHOOSE({1,1,2},u,AVERAGEIFS(v,y,INDEX(u,0,1),m,INDEX(u,0,2))))
Put this in the first cell and it will spill the results.
Once HSTACK is released, it can replace CHOOSE:
=LET(
y,A2:A15,
m,B2:B15,
v,C2:C15,
u,SORT(UNIQUE(HSTACK(y,m)),{1,2}),
HSTACK(u,AVERAGEIFS(v,y,INDEX(u,0,1),m,INDEX(u,0,2))))
AVERAGEIFS would do what you want, but you might also look at the FILTER function as a way to replicate group_by() for similar tasks. Once the rows are filtered, you can sum, average, sort, and so on.
Averageifs:
=AVERAGEIFS(C:C,A:A,2018,B:B,6)
Filter:
=FILTER(C:C,(A:A=2018)*(B:B=6))
=AVERAGE(FILTER(C:C,(A:A=2018)*(B:B=6)))
See this spreadsheet for examples of both. I realize you're using Excel, but these formulas should work on both (though they are not the same)
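Since the question compares this to group_by() in R, a base R sketch of the same grouped average may be a useful cross-check for the spreadsheet results. The data frame name xl is hypothetical, and the values are typed in from the question's table.

```r
# The question's year/month/value table, typed in by hand
xl <- data.frame(
  year  = c(2019, 2019, 2019, 2019, 2020, 2020, 2020,
            2018, 2018, 2018, 2019, 2019, 2020, 2020),
  month = c(1, 1, 2, 2, 1, 3, 4, 6, 6, 7, 3, 1, 1, 3),
  value = c(12, 34, 56, 15, 16, 67, 89, 123, 45, 98, 53, 23, 12, 1)
)

# One row per (year, month) pair with the mean of value
avg_by_month <- aggregate(value ~ year + month, data = xl, FUN = mean)

# Sort by year then month, mirroring the SORT(...) in the LET formula
avg_by_month <- avg_by_month[order(avg_by_month$year, avg_by_month$month), ]
print(avg_by_month)
```

Each output row should correspond to one spilled row of the LET formula.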
I am trying to extract the team with the most wins each year in women's college basketball. So far I have the number of wins per year for each team, but I want only the team with the maximum number of wins in each year.
winsbyyear <- WomenCBnewdf %>%
group_by(Year,Team)%>%
summarise(totalwinsyr = sum(Outcome))
The output currently looks like this, but I expect to see each year only once, followed by the team with the most wins and its win total:
Year Team totalwinsyr
<fct> <chr> <dbl>
1 2014 AbileneChristian 10
2 2014 AirForce 0
3 2014 Akron 18
4 2014 Alabama 10
5 2014 AlabamaAM 3
6 2014 AlabamaHuntsville 0
7 2014 AlabamaMobile 0
8 2014 AlabamaSt 15
9 2014 AlaskaAnchorage 1
10 2014 AlbanyNY 16
I have already looked at "How to select the rows with maximum values in each group with dplyr?" but could not find anything that helps with a group_by() on multiple columns.
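Total the wins per team and year with summarise, then regroup by year alone and filter. (Filtering while still grouped by both Year and Team would keep every row, because each group's maximum is its own total.)
winsbyyear <- WomenCBnewdf %>%
  group_by(Year, Team) %>%
  summarise(totalwinsyr = sum(Outcome)) %>%
  group_by(Year) %>%  # regroup so max() compares teams within a year
  filter(totalwinsyr == max(totalwinsyr))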
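The same two-step idea (total per team and year, then keep the per-year maximum) can be sketched in base R without dplyr. The games data frame below is invented for illustration, as the real WomenCBnewdf is not shown; only the column names Year, Team, and Outcome come from the question.

```r
# Toy game log: one row per game, Outcome is 1 for a win and 0 for a loss
# (made-up data, not the asker's)
games <- data.frame(
  Year    = c(2014, 2014, 2014, 2014, 2015, 2015, 2015),
  Team    = c("Akron", "Akron", "Alabama", "Alabama",
              "Akron", "Alabama", "Alabama"),
  Outcome = c(1, 1, 1, 0, 0, 1, 1)
)

# Step 1: total wins per team and year
wins <- aggregate(Outcome ~ Year + Team, data = games, FUN = sum)
names(wins)[names(wins) == "Outcome"] <- "totalwinsyr"

# Step 2: keep rows whose total equals the maximum within their year
top <- wins[wins$totalwinsyr == ave(wins$totalwinsyr, wins$Year, FUN = max), ]
top <- top[order(top$Year), ]
print(top)
```

If two teams tie for the most wins in a year, both rows are kept, which is also how the filter-based version behaves.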
I have this data set:
month Year Rain
10 2010 376.8
11 2010 282.78
12 2010 324.58
1 2011 73.51
2 2011 225.89
3 2011 22.96
I used
df2prnext <- aggregate(Rain ~ Year, data = subdataprnext, mean)
but I need the mean value of 217.53.
I am not getting the expected result. Thank you for your help.
I have this data frame:
YEAR NATION VOTE
2015 NOR 1
2015 USA 0
2015 CAN 1
2015 RUS 1
2014 USA 1
2014 USA 1
2014 USA 0
2014 NOR 1
2014 NOR 0
2014 CAN 1
...and it goes on with more years, nations and votes. VOTE is binary: yes (1) or no (0). I am trying to build an output table that aggregates by year and nation and shows, for each nation and year, the number of 1s alongside the total number of votes (sumVOTES, i.e. the count of all 1s and 0s), like the table sketched below:
YEAR NATION VOTE-1 sumVOTES %-1s
2015 USA 8 17 47.1
2015 NOR 7 13 53.8
2015 CAN 3 11 27.2
2014 etc.
etc.
You are not providing your data.frame in a reproducible manner.
But this should work...
library(data.table)
# assuming 'df' is your data.frame
setDT(df)[, .('VOTE-1' = sum(VOTE==1),
'sumVOTES' = .N,
'%-1s' = 1e2*sum(VOTE==1)/.N),
by = .(YEAR, NATION)]
setDT converts data.frame to data.table by reference.
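If data.table is not available, roughly the same table can be built in base R. This is only a sketch: it uses the ten sample rows from the question, and the output column names yes, total, and pct are placeholders chosen here rather than the names in the question's sketch.

```r
# The ten sample rows from the question
df <- data.frame(
  YEAR   = c(2015, 2015, 2015, 2015, 2014, 2014, 2014, 2014, 2014, 2014),
  NATION = c("NOR", "USA", "CAN", "RUS", "USA", "USA", "USA", "NOR", "NOR", "CAN"),
  VOTE   = c(1, 0, 1, 1, 1, 1, 0, 1, 0, 1)
)

# Number of 1s and number of votes per (YEAR, NATION); both calls use the
# same grouping, so their rows come back in the same order
yes <- aggregate(VOTE ~ YEAR + NATION, data = df, FUN = sum)
tot <- aggregate(VOTE ~ YEAR + NATION, data = df, FUN = length)

res <- data.frame(
  YEAR   = yes$YEAR,
  NATION = yes$NATION,
  yes    = yes$VOTE,
  total  = tot$VOTE
)
res$pct <- 100 * res$yes / res$total  # share of 1s, in percent
print(res)
```

Summing VOTE counts the 1s only because the column is coded 0/1; with any other coding, sum(VOTE == 1) as in the data.table answer is the safer form.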
I have a data set with multiple sites that were each sampled over multiple years. As part of this I have climate data that were sampled throughout each year as well as calculated means for several variables (mean annual temp, mean annual precipitation, mean annual snow depth, etc). Here is what the data frame actually looks like:
site date year temp precip mean.ann.temp mn.ann.precip
a 5/1/10 2010 15 0 6 .03
a 6/2/10 2010 18 1 6 .03
a 7/3/10 2010 22 0 6 .03
b 5/2/10 2010 16 2 7 .04
b 6/3/10 2010 17 3 7 .04
b 7/4/10 2010 20 0 7 .04
c 5/3/10 2010 14 0 5 .06
c 6/4/10 2010 13 0 5 .06
c 7/8/10 2010 25 0 5 .06
d 5/5/10 2010 16 15 10 .2
d 6/6/10 2010 22 0 10 .2
d 7/7/10 2010 24 0 10 .2
...
It then goes on the same way for multiple years.
How can I extract the mean.ann.temp and mn.ann.precip for each site and year? I've tried doubling up tapply() with no success and using double for loops, but I can't seem to figure it out. Can someone help me? Or do I have to do it the long and tedious way of just subsetting everything out?
Thanks,
Paul
Subset the columns and wrap the result in unique():
unique(d[,c("site","year","mean.ann.temp","mn.ann.precip")])
A similar approach, if the last two columns vary within a site and year and you want only the first row of each:
d[!duplicated(d[,c("site","year")]),]
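Here is a small self-contained sketch of both approaches. The data frame d is a cut-down, hand-typed version of the question's table (two sites, one year), so the exact rows are an assumption made for illustration.

```r
# Cut-down version of the question's data: the annual means repeat on every row
d <- data.frame(
  site          = rep(c("a", "b"), each = 3),
  year          = 2010,
  temp          = c(15, 18, 22, 16, 17, 20),
  mean.ann.temp = rep(c(6, 7), each = 3),
  mn.ann.precip = rep(c(0.03, 0.04), each = 3)
)

# Approach 1: keep only the distinct site/year/annual-mean combinations
annual <- unique(d[, c("site", "year", "mean.ann.temp", "mn.ann.precip")])

# Approach 2: keep the first full row for each site/year pair
first_rows <- d[!duplicated(d[, c("site", "year")]), ]

print(annual)
print(first_rows)
```

The first approach returns only the four selected columns, while the second keeps every column of the first row in each group.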
To compute summaries using plyr:
library(plyr)
ddply(yourDF, .(site, year), summarize,
      meanTemp = mean(mean.ann.temp),
      meanPrec = mean(mn.ann.precip)
)