Aggregates by group and including counts across rows [duplicate]

Aggregates by group and including counts across rows [duplicate] - r

This question already has answers here:
Apply several summary functions (sum, mean, etc.) on several variables by group in one call
(7 answers)
Closed 6 years ago.
I have this data frame:
YEAR NATION VOTE
2015 NOR 1
2015 USA 0
2015 CAN 1
2015 RUS 1
2014 USA 1
2014 USA 1
2014 USA 0
2014 NOR 1
2014 NOR 0
2014 CAN 1
...and it goes on and on with more years, nations and votes. VOTE is binary, yes(1) or no(0). I am trying to code an output table that aggregates on year and nation, but that also that brings the total number of votes for each nation (the sum of 0's and 1's) together with the total number of 1's, in an output table like the one sketched below (sumVOTES being the total number of votes for that nation that year, i.e. sum of all 1s and 0s):
YEAR NATION VOTE-1 sumVOTES %-1s
2015 USA 8 17 47.1
2015 NOR 7 13 53.8
2015 CAN 3 11 27.2
2014 etc.
etc.

You are not providing your data.frame in a reproducible manner.
But this should work...
library(data.table)
# assuming 'df' is your data.frame
setDT(df)[, .('VOTE-1' = sum(VOTE==1),
'sumVOTES' = .N,
'%-1s' = 1e2*sum(VOTE==1)/.N),
by = .(YEAR, NATION)]
setDT converts data.frame to data.table by reference.

Related

Is there a way I can get the maximum value for each group after a double group_by in R?

I am trying to extract the team with the maximum number of wins each year in women's college basketball, and I am currently stuck with having the number of wins for each year for each team, and I want only the team with the maximum number of wins in each year.
winsbyyear <- WomenCBnewdf %>%
group_by(Year,Team)%>%
summarise(totalwinsyr = sum(Outcome))
Output currently looks like this, but I am expecting to see each year only once with the team with the maximum number of wins in the subsequent columns
Year Team totalwinsyr
<fct> <chr> <dbl>
1 2014 AbileneChristian 10
2 2014 AirForce 0
3 2014 Akron 18
4 2014 Alabama 10
5 2014 AlabamaAM 3
6 2014 AlabamaHuntsville 0
7 2014 AlabamaMobile 0
8 2014 AlabamaSt 15
9 2014 AlaskaAnchorage 1
10 2014 AlbanyNY 16
How to select the rows with maximum values in each group with dplyr?
I have already looked here but I could not find any resources to help with a group_by() with multiple values

Create a new column with the number of wins and then filter:
winsbyyear <- WomenCBnewdf %>%
group_by(Year,Team)%>%
mutate(totalwinsyr = sum(Outcome)) %>%
filter(totalwinsyr == max(totalwinsyr))

Ntile and decile function depended on two columns in R [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 4 years ago.
I would like to have a new column with Ntile but it should depend on column 1 - "year" and show the ntile number for column 2 - "mileage".
year mileage
<dbl> <dbl>
1 2011 7413
2 2011 10926
3 2011 7351
4 2011 11613
5 2012 8367
6 2010 25125
mydata$Ntile <- ntile(mydata$mileage, 10)
I know the easy to use function ntile, but I do not know how to make it depend on 2 columns. I would like to have ntiles for mileage but for each year, 2010, 2011 and 2012 to be calculated in new column "Ntile".
PS: I know there is not enough data to calculate Ntiles for 2011 and 2012, it is just an example.

I like the data.table approach:
library(data.table)
mydata <- as.data.table(mydata)
mydata[, Ntile:=ntile(mileage,10), by=year]
Best!

Using R, how to create a new data set with the Median of a column in my existing dataframe?

I am new to R and I want a new data set from my dataframe that will include a new column which represents the median of the values in an existing column (called Total Extras) of the dataframe. The latter consists of around 5,000 individual observations.
I am a bit confused on how to proceed with this task as the Median need to be calculated based on the following criteria: Property, Month, Year and Market
Currently, my dataframe (let's call it mydata1) stands as follows (first 5 rows shown):
Property Date Month Year Market TotalExtras
ZIL 1-Jan-15 1 2015 UK 450.00
ZIL 1-Jan-15 1 2015 UK 125.00
ZIL 1-Feb-15 2 2015 UK 300.00
ZIL 1-Feb-16 2 2016 FR 225.00
EBA 1-Feb-15 2 2015 UK 150.00
...
I need my R codes to create a new dataframe (let's call it mydata2) to appear like below:
Property Date Month Year Market MedianTotalExtras
ZIL 1-Jan-15 1 2015 UK 175.00
ZIL 1-Feb-15 2 2015 UK 250.00
ZIL 1-Feb-16 2 2016 FR 400.00
EBA 1-Feb-15 2 2015 UK 328.00
...
The figures above are for illustration purposes only. Basically, mydata2 is re-grouping the data based on Property, Date and Market with the column 'Median Total Extras' replacing the 'TotalExtras' column of mydata1.
Can this be done with R?

In dplyr the general gist will be something like:
mydata1 %>%
group_by(Property, Date, Market) %>%
summarise(MedianTotalExtras = median(TotalExtras))
where group_by arranges the cutting up of the dataset into pieces with unique Property, Date, Market combos, and the summarise + median calculates the median.

Summing part of data frame over identical factor-levels to get rid of abundance in identical levels in data frame [duplicate]

This question already has answers here:
Aggregate by specific year in R
(2 answers)
Closed 5 years ago.
i have this as part of dataset of about 6000 rows:
ÅR LM RE AGE PA REC
1 2012 PKORT Stockholm <19 17973 35508
2 2012 PKORT Stockholm 20-24 31042 63229
3 2012 PKORT Stockholm 25-29 27305 64558
4 2012 PKORT Stockholm 30-34 18256 42726
5 2012 PKORT Stockholm 35-39 13200 32145
6 2012 PKORT Stockholm 40< 9458 24422
7 2012 PKORT Stockholm 40< 6123 16152
and i want to sum all the rows for PA and REC where AGE is "40<" to reduce the data frame from an abundance of identical factor levels.
I have tried aggregate, tapply and also assumed that R understands that both "40<" should be summed when lm-functions are applied.
This seems like a really easy operation, any help is appreciated.

We can do this with dplyr
library(dplyr)
df1 %>%
filter(AGE == "40<") %>%
group_by_(.dots = names(df1)[1:3]) %>%
summarise_at(vars(PA, REC) , sum)

R counter, counting frequency in a table [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Add column with order counts
(2 answers)
Closed 6 years ago.
I have following data set
id year
2 20332 2005
3 6383 2005
14 20332 2006
15 6806 2006
16 23100 2006
I would like to have an additional column, which counts the number of years the id variable is already available:
id year Counter
2 20332 2005 1
3 6383 2005 1
14 20332 2006 2
15 6806 2006 1
16 23100 2006 1
The dataset is currently not sorted according to the year. I thought about mutate rather than a function.
Any ideas? Thanks!

We can use ave from base R
df1$Counter <- with(df1, ave(id, id, FUN = seq_along))

Categories

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Aggregates by group and including counts across rows [duplicate] - r

Related

Is there a way I can get the maximum value for each group after a double group_by in R?

Ntile and decile function depended on two columns in R [duplicate]

Using R, how to create a new data set with the Median of a column in my existing dataframe?

Summing part of data frame over identical factor-levels to get rid of abundance in identical levels in data frame [duplicate]

R counter, counting frequency in a table [duplicate]

Categories

Resources