This question already has answers here:
Select the first row by group
(8 answers)
Collapsing data frame by selecting one row per group
(4 answers)
Select the row with the maximum value in each group
(19 answers)
Closed 2 months ago.
Keep the unique value for each person
I have DF:
name   size
john   16
khaled 15
john   15
Alex   16
john   16
In the output, I need to remove the duplicated size values for each name:
name   size
john   16
khaled 15
john   15
Alex   16
What is the best function or library to do that?
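One way to get the output shown above, as a minimal base R sketch. The data frame is rebuilt here from the rows in the question, and `result` and `result2` are illustrative names, not from the original post.

```r
# Rebuild the example data from the question
DF <- data.frame(
  name = c("john", "khaled", "john", "Alex", "john"),
  size = c(16, 15, 15, 16, 16),
  stringsAsFactors = FALSE
)

# Option 1: drop rows that are exact duplicates across all columns
result <- unique(DF)

# Option 2: same idea, but naming the columns that define a duplicate,
# which is useful if DF has other columns that should be ignored
result2 <- DF[!duplicated(DF[, c("name", "size")]), ]

print(result)
```

If dplyr is already in use, `distinct(DF)` should give the same rows; base R needs no extra library for this.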
Related
This question already has answers here:
Find value corresponding to maximum in other column [duplicate]
(2 answers)
Closed 2 years ago.
This is my data frame in RStudio. I'm trying to find code that will produce the name of the student with the highest age.
students.df #Name of dataframe
name DAD BDA gender nationality age
1 Amy 80 70 F IRL 20
2 Bill 65 50 M UK 21
3 Carl 50 80 M IRL 22
as.character(subset(students.df,students.df$age==max(students.df$age))$name)
library(dplyr)
students.df %>% filter(age==max(age)) %>% select(name)
You can try this (note that which.max() returns only the first row if several students share the highest age):
students.df[which.max(students.df$age),]
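The approaches above differ in how they treat ties. The sketch below rebuilds the example data from the question and compares them; `oldest_first` and `oldest_all` are illustrative names.

```r
# Rebuild the example data from the question
students.df <- data.frame(
  name = c("Amy", "Bill", "Carl"),
  DAD = c(80, 65, 50),
  BDA = c(70, 50, 80),
  gender = c("F", "M", "M"),
  nationality = c("IRL", "UK", "IRL"),
  age = c(20, 21, 22),
  stringsAsFactors = FALSE
)

# which.max() returns the index of the first maximum only
oldest_first <- students.df$name[which.max(students.df$age)]

# Comparing against max() keeps every row that ties for the maximum
oldest_all <- students.df$name[students.df$age == max(students.df$age)]

print(oldest_first)
print(oldest_all)
```

With this data both give the same single name. They would differ only if two students shared the highest age.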
This question already has answers here:
How to count the number of unique values by group? [duplicate]
(1 answer)
count number of rows in a data frame in R based on group [duplicate]
(8 answers)
Closed 2 years ago.
I have a dataframe that contains a column representing the 'Year' and another column that represents 'Type':
a Year Creams
1 2004 11
2 2004 12
3 2001 13
4 2004 14
5 2002 15
. .... ..
How do I count every year in column 'Year' so it appears as:
a Year TypeCount
1 2004 3
2 2002 1
3 2001 1
It can be output into another dataframe, I don't mind. I just need it to be suitable to make a graph out of it at the end.
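A possible base R sketch for this count, using only the five rows visible in the question (the elided rows are not reconstructed). The name `counts` is illustrative.

```r
# Only the rows shown in the question; the real data has more
df <- data.frame(
  Year = c(2004, 2004, 2001, 2004, 2002),
  Creams = c(11, 12, 13, 14, 15)
)

# table() counts the occurrences of each Year;
# as.data.frame() turns the result into a two-column data frame
counts <- as.data.frame(table(Year = df$Year), responseName = "TypeCount")

# Year comes back as a factor; convert it to numeric if needed
counts$Year <- as.numeric(as.character(counts$Year))

# Sort so the most frequent year comes first
counts <- counts[order(-counts$TypeCount), ]

print(counts)
```

The resulting data frame can be passed to a plotting call such as `barplot(counts$TypeCount, names.arg = counts$Year)`.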
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 2 years ago.
I have a data frame that has multiple entries on the same day with a TSS score.
athlete workoutday tss
1 Athlete_1 2020-03-20 30
2 Athlete_1 2020-03-20 21
3 Athlete_1 2020-03-20 64
I would like some help in knowing how to combine the tss scores into a new column in a new data frame, so that there is only 1 entry for each athlete.
For example:
athlete workoutday tss
1 Athlete_1 2020-03-20 115
2
3
Cheers
SELECT athlete, workoutday, SUM(tss) AS tss
FROM your_table
GROUP BY athlete, workoutday;
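For the same aggregation in R itself, a minimal sketch with base R's `aggregate()` could look like this. The data frame is rebuilt from the three rows in the question, and `totals` is an illustrative name.

```r
# Rebuild the three rows shown in the question
df <- data.frame(
  athlete = c("Athlete_1", "Athlete_1", "Athlete_1"),
  workoutday = as.Date(c("2020-03-20", "2020-03-20", "2020-03-20")),
  tss = c(30, 21, 64)
)

# Sum tss within each athlete/workoutday combination
totals <- aggregate(tss ~ athlete + workoutday, data = df, FUN = sum)

print(totals)
```

Grouping by both `athlete` and `workoutday` gives one row per athlete per day. Dropping `workoutday` from the formula would instead give a single total per athlete.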
This question already has answers here:
Count number of rows per group and add result to original data frame
(11 answers)
Calculate group mean, sum, or other summary stats. and assign column to original data
(4 answers)
Closed 4 years ago.
Although I have found a lot of ways to calculate the sum of a variable by group, all the approaches end up creating a new data set that aggregates the repeated cases.
To be more precise, if I have a data frame:
id year
1 2010
1 2015
1 2017
2 2011
2 2017
3 2015
and I want to count the number of times I have the same ID by the different years, there are a lot of ways (using aggregate, tapply, dplyr, sqldf etc) which use a "group by" kind of functionality that in the end will give something like:
id count
1 3
2 2
3 1
I haven't managed to find a way to calculate the same thing but keep my original data frame, in order to obtain:
id year count
1 2010 3
1 2015 3
1 2017 3
2 2011 2
2 2017 2
3 2015 1
and therefore not aggregate my repeated cases.
Has somebody already figured this out?
Thank you in advance.
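One base R option that keeps every original row is `ave()`, which applies a function within each group and returns a vector as long as its input. A minimal sketch, with the data rebuilt from the question:

```r
# Rebuild the example data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  year = c(2010, 2015, 2017, 2011, 2017, 2015)
)

# For each id, compute the group size and repeat it on every row of that group
df$count <- ave(df$id, df$id, FUN = length)

print(df)
```

With dplyr, `add_count(df, id, name = "count")` or `group_by(id) %>% mutate(count = n())` should give the same column without collapsing rows.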
This question already has answers here:
calculate average over multiple data frames
(5 answers)
Closed 6 years ago.
I am new to R and need help with this. I have 3 data sets from 3 different years. They have the same columns, with different values for each year. I want to find the average of the column values across the three years, based on the Name field. To be specific:
Assume the first data set is:
Name Age Height Weight
A 4 20 20
B 5 22 22
C 8 25 21
D 10 25 23
Second data set:
Name Age Height Weight
A 5 22 25
B 6 23 26
Third data set:
Name Age Height Weight
A 6 24 24
B 7 24 27
C 10 27 28
I want to find the average height for "A" across the three data sets.
We can place them in a list, rbind them, group by 'Name', and get the mean of each column:
library(data.table)
rbindlist(list(df1, df2, df3))[, lapply(.SD, mean), by = Name]
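Or with dplyr:
library(dplyr)
# summarise_each() and funs() are deprecated; across() needs dplyr >= 1.0.0
bind_rows(df1, df2, df3) %>%
  group_by(Name) %>%
  summarise(across(everything(), mean))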
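A base R alternative, as a sketch with no extra packages. The three data frames are rebuilt from the question, and `combined` and `means` are illustrative names.

```r
# Rebuild the three data sets from the question
df1 <- data.frame(Name = c("A", "B", "C", "D"),
                  Age = c(4, 5, 8, 10),
                  Height = c(20, 22, 25, 25),
                  Weight = c(20, 22, 21, 23),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("A", "B"),
                  Age = c(5, 6),
                  Height = c(22, 23),
                  Weight = c(25, 26),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Name = c("A", "B", "C"),
                  Age = c(6, 7, 10),
                  Height = c(24, 24, 27),
                  Weight = c(24, 27, 28),
                  stringsAsFactors = FALSE)

# Stack the rows, then average every other column within each Name
combined <- rbind(df1, df2, df3)
means <- aggregate(. ~ Name, data = combined, FUN = mean)

print(means)

# Average height for "A" only
print(means$Height[means$Name == "A"])
```

Names that are missing from a year (such as "D") are averaged over the years in which they do appear, not over all three.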