Sub-setting with dplyr [duplicate] - r

This question already has answers here:
Select the row with the maximum value in each group
(19 answers)
Closed 2 years ago.
The data frame has the columns:
State Sex Year Name Number Percent
For every state and every year, I need to keep the one male and the one female name with the highest percentage.
Example:
Washington M 2011 John 34 0.46
Washington F 2011 Mary 42 0.67
Washington M 2012 John 46 0.46
Washington F 2012 Mary 64 0.67
and so on for every State and year.

You can try:
library(dplyr)
df %>%
  group_by(State, Year, Sex) %>%
  # keep the row with the highest Percent within each State/Year/Sex group
  slice(which.max(Percent))
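A self-contained sketch you can paste into a fresh session. It assumes dplyr >= 1.0.0, where slice_max() is available as an alternative to slice(which.max(...)). The data reuses the question's Washington rows; the extra "Paul" and "Anna" rows are hypothetical, added only so that some groups have more than one candidate.

```r
library(dplyr)

# Question's example rows plus two hypothetical lower-ranked names
df <- data.frame(
  State   = "Washington",
  Sex     = c("M", "M", "F", "F", "M", "F"),
  Year    = c(2011, 2011, 2011, 2011, 2012, 2012),
  Name    = c("John", "Paul", "Mary", "Anna", "John", "Mary"),
  Number  = c(34, 20, 42, 10, 46, 64),
  Percent = c(0.46, 0.27, 0.67, 0.16, 0.46, 0.67),
  stringsAsFactors = FALSE
)

top <- df %>%
  group_by(State, Year, Sex) %>%
  # one row per group; with_ties = FALSE mirrors which.max() keeping a single row
  slice_max(Percent, n = 1, with_ties = FALSE) %>%
  ungroup()

top
```

Setting with_ties = TRUE (the default) would instead keep every name that shares the top percentage in a group.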

Related

Trying to find a specific element based on a condition [duplicate]

This question already has answers here:
Find value corresponding to maximum in other column [duplicate]
(2 answers)
Closed 2 years ago.
This is my data frame in RStudio. I'm trying to find code that will produce the name of the student with the highest age.
students.df #Name of dataframe
name DAD BDA gender nationality age
1 Amy 80 70 F IRL 20
2 Bill 65 50 M UK 21
3 Carl 50 80 M IRL 22
as.character(subset(students.df,students.df$age==max(students.df$age))$name)
library(dplyr)
students.df %>% filter(age==max(age)) %>% select(name)
You can try this:
students.df[which.max(students.df$age), ]
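A base-R sketch (no packages) that rebuilds the printed data frame and returns just the name. The two approaches differ on ties: which.max() returns only the first maximum, while comparing against max() keeps every student who shares the highest age.

```r
# Rebuild the question's data frame from the printed rows
students.df <- data.frame(
  name        = c("Amy", "Bill", "Carl"),
  DAD         = c(80, 65, 50),
  BDA         = c(70, 50, 80),
  gender      = c("F", "M", "M"),
  nationality = c("IRL", "UK", "IRL"),
  age         = c(20, 21, 22),
  stringsAsFactors = FALSE
)

# which.max() gives the index of the first maximum, so ties resolve to the earliest row
oldest <- students.df$name[which.max(students.df$age)]

# Comparing with max() returns every name tied for the highest age
all_oldest <- students.df$name[students.df$age == max(students.df$age)]

oldest
all_oldest
```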

Calculate annual average of quarterly data in R [duplicate]

This question already has answers here:
Summarising by a group variable in r
(2 answers)
Closed 2 years ago.
I have a data frame with some time-series data reported quarterly, as follows:
quarter region value
2018T4 A 4
2018T3 A 2
2018T2 A 3
2018T1 A 9
2018T4 B 6
2018T3 B 2
2018T2 B 5
2018T1 B 8
2017T4 A 2
...
I want to aggregate the quarterly observations and average them to obtain an annual mean value for each year and region, like this:
year region value
2018 A 4.5
2018 B 5.25
2017 A 2
...
What would be an appropriate approach to this?
We can strip the quarter suffix (e.g. "T4") from quarter to get the year, then take the mean by year and region.
aggregate(value~year+region, transform(df, year = sub('T.*', '', quarter)), mean)
# year region value
#1 2017 A 2.00
#2 2018 A 4.50
#3 2018 B 5.25
The same using dplyr:
library(dplyr)
df %>%
  group_by(year = sub('T.*', '', quarter), region) %>%
  summarise(value = mean(value))
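A runnable version built from the rows shown in the question (the "..." rows are not reproduced). Instead of the sub() regex, this sketch takes the first four characters with substr(), which assumes every quarter code has the form YYYYTq; converting to integer also makes the years sort numerically. The .groups argument needs dplyr >= 1.0.0.

```r
library(dplyr)

# Rows transcribed from the question
df <- data.frame(
  quarter = c("2018T4", "2018T3", "2018T2", "2018T1",
              "2018T4", "2018T3", "2018T2", "2018T1", "2017T4"),
  region  = c("A", "A", "A", "A", "B", "B", "B", "B", "A"),
  value   = c(4, 2, 3, 9, 6, 2, 5, 8, 2),
  stringsAsFactors = FALSE
)

annual <- df %>%
  # first four characters are the year, assuming codes look like "2018T4"
  mutate(year = as.integer(substr(quarter, 1, 4))) %>%
  group_by(year, region) %>%
  # average the quarters and drop the grouping from the result
  summarise(value = mean(value), .groups = "drop")

annual
```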

Create multiple columns of values that are in second column and fill new data frame with number of occurences according to first column [duplicate]

This question already has answers here:
Frequency counts in R [duplicate]
(2 answers)
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 4 years ago.
I am new to Stack Overflow, so apologies if I am not asking this properly.
I have two columns, Country and Year.
INDIA 1970
USA 1970
USA 1971
INDIA 1970
.
.
UK 1972
I want a new data frame like this, filled with the number of occurrences:
        1970  1971  1972  ...
INDIA      2
USA        1     1
UK                     1
One option is to use reshape2::dcast with the fun.aggregate argument set to length:
library(reshape2)
dcast(df, Country~Year, length)
# Country 1970 1971 1972
# 1 INDIA 2 0 0
# 2 UK 0 0 1
# 3 USA 1 1 0
Data:
df <- read.table(text =
"Country Year
INDIA 1970
USA 1970
USA 1971
INDIA 1970
UK 1972",
header = TRUE, stringsAsFactors = FALSE)
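If you would rather avoid the reshape2 dependency, base R's table() gives the same counts. A minimal sketch using the same data; as.data.frame.matrix() is used because plain as.data.frame() on a table returns the long (one row per cell) format rather than the wide layout asked for.

```r
df <- read.table(text =
"Country Year
INDIA 1970
USA 1970
USA 1971
INDIA 1970
UK 1972",
header = TRUE, stringsAsFactors = FALSE)

# Cross-tabulate: each cell counts the rows sharing that Country/Year pair
counts <- table(df$Country, df$Year)

# Keep the wide layout: countries as row names, years as columns
wide <- as.data.frame.matrix(counts)
wide
```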

Summing part of data frame over identical factor-levels to get rid of abundance in identical levels in data frame [duplicate]

This question already has answers here:
Aggregate by specific year in R
(2 answers)
Closed 5 years ago.
I have this as part of a data set of about 6000 rows:
ÅR LM RE AGE PA REC
1 2012 PKORT Stockholm <19 17973 35508
2 2012 PKORT Stockholm 20-24 31042 63229
3 2012 PKORT Stockholm 25-29 27305 64558
4 2012 PKORT Stockholm 30-34 18256 42726
5 2012 PKORT Stockholm 35-39 13200 32145
6 2012 PKORT Stockholm 40< 9458 24422
7 2012 PKORT Stockholm 40< 6123 16152
I want to sum PA and REC over all rows where AGE is "40<", so that the repeated identical factor levels collapse into a single row.
I have tried aggregate and tapply, and I also assumed that R would understand that both "40<" rows should be summed when lm functions are applied.
This seems like a really easy operation; any help is appreciated.
We can do this with dplyr:
library(dplyr)
df1 %>%
  filter(AGE == "40<") %>%
  # group by the first three columns (ÅR, LM, RE); replaces the deprecated group_by_(.dots = ...)
  group_by_at(vars(1:3)) %>%
  summarise_at(vars(PA, REC), sum)
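To collapse the duplicates without dropping the other age bands, you can group by every identifying column, including AGE, and sum. A base-R sketch with aggregate(), built from the seven printed rows; the year column ÅR is renamed YEAR here (a hypothetical rename) only so the example runs in any locale.

```r
# Rows transcribed from the question; "ÅR" renamed to YEAR for this sketch
df1 <- data.frame(
  YEAR = 2012,
  LM   = "PKORT",
  RE   = "Stockholm",
  AGE  = c("<19", "20-24", "25-29", "30-34", "35-39", "40<", "40<"),
  PA   = c(17973, 31042, 27305, 18256, 13200, 9458, 6123),
  REC  = c(35508, 63229, 64558, 42726, 32145, 24422, 16152),
  stringsAsFactors = FALSE
)

# Sum PA and REC within each YEAR/LM/RE/AGE combination:
# the two "40<" rows merge into one, every other row is unchanged
collapsed <- aggregate(cbind(PA, REC) ~ YEAR + LM + RE + AGE, data = df1, FUN = sum)

collapsed
```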

Aggregates by group and including counts across rows [duplicate]

This question already has answers here:
Apply several summary functions (sum, mean, etc.) on several variables by group in one call
(7 answers)
Closed 6 years ago.
I have this data frame:
YEAR NATION VOTE
2015 NOR 1
2015 USA 0
2015 CAN 1
2015 RUS 1
2014 USA 1
2014 USA 1
2014 USA 0
2014 NOR 1
2014 NOR 0
2014 CAN 1
...and it goes on with more years, nations and votes. VOTE is binary: yes (1) or no (0). I am trying to produce an output table that aggregates by year and nation and also gives, for each nation, the total number of votes (0s and 1s together) alongside the number of 1s, as sketched below (sumVOTES is the total number of votes for that nation in that year, i.e. the count of all 1s and 0s):
YEAR NATION VOTE-1 sumVOTES %-1s
2015 USA 8 17 47.1
2015 NOR 7 13 53.8
2015 CAN 3 11 27.3
2014 etc.
etc.
You are not providing your data.frame in a reproducible manner.
But this should work...
library(data.table)
# assuming 'df' is your data.frame
setDT(df)[, .('VOTE-1'   = sum(VOTE == 1),
              'sumVOTES' = .N,
              '%-1s'     = 1e2 * sum(VOTE == 1) / .N),
          by = .(YEAR, NATION)]
setDT converts a data.frame to a data.table by reference, i.e. without making a copy.
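A reproducible sketch using only the rows shown in the question. Since VOTE is 0/1, sum(VOTE) already counts the 1s and mean(VOTE) is their share, so the percentage can be written as 100 * mean(VOTE). The column names VOTE_1 and pct_1 are hypothetical syntactic stand-ins for 'VOTE-1' and '%-1s', so they can be referenced without backticks.

```r
library(data.table)

# Rows transcribed from the question (the "..." continuation is not reproduced)
votes <- data.table(
  YEAR   = c(2015, 2015, 2015, 2015, 2014, 2014, 2014, 2014, 2014, 2014),
  NATION = c("NOR", "USA", "CAN", "RUS", "USA", "USA", "USA", "NOR", "NOR", "CAN"),
  VOTE   = c(1, 0, 1, 1, 1, 1, 0, 1, 0, 1)
)

# For binary VOTE: sum() counts the 1s, .N counts all votes, mean() is the share of 1s
summary_dt <- votes[, .(VOTE_1   = sum(VOTE),
                        sumVOTES = .N,
                        pct_1    = round(100 * mean(VOTE), 1)),
                    by = .(YEAR, NATION)]

summary_dt
```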
