How to find maximum value from dataframe with specific condition? [duplicate] - r

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 2 years ago.
I have a data frame named employee with 100 rows, like this:
Date Name ride food income bonus sallary
1 01 Jan 2020 Ludociel 10 6 330000 0 330000
2 01 Jan 2020 Estarossa 15 8 465000 100000 565000
3 01 Jan 2020 Tarmiel 8 10 420000 100000 520000
4 01 Jan 2020 Sariel 5 8 315000 0 315000
5 01 Jan 2020 Escanor 15 7 435000 100000 535000
6 01 Jan 2020 Ban 13 9 465000 100000 565000
7 01 Jan 2020 Meliodas 6 15 540000 100000 640000
8 01 Jan 2020 King 15 12 585000 100000 685000
9 01 Jan 2020 Zeldris 15 11 555000 100000 655000
10 01 Jan 2020 Rugal 15 6 405000 100000 505000
11 02 Jan 2020 Ludociel 14 6 390000 100000 490000
12 02 Jan 2020 Estarossa 12 14 600000 100000 700000
...
100 10 Jan 2020 Rugal 13 10 495000 100000 595000
I want to find which employee has the highest total salary (the sum of the sallary column) from 1 Jan to 10 Jan. My expected output is just a vector like this:
[1] "varName" is the highest with total sallary "varTotal_sallary"
I tried a for loop with an if clause, but it only returns the total for one name, so every name would need its own function.
function_ludociel<-function(name, date, sallary){
total=integer()
for(i in 100){
if(date[i]=="01 Jan 2020" & name[i]=="Ludociel"){
total=sum(sallary)
}
}
return(total)
}
ludociel=function_ludociel(employee$name,employee$date,employee$sallary)
After that I planned to combine the totals in one variable and use max(), but I know that is a silly way to code it.
Does anyone have a solution for this? Thank you very much.

Convert date to actual date class
Use aggregate to calculate total salary from 1st Jan to 10th Jan
Select row with maximum salary
Print the result.
employee$Date <- as.Date(employee$Date, '%d %b %Y')
sub_data <- aggregate(sallary~Name, employee,
subset = Date >= as.Date('2020-01-01') &
Date <= as.Date('2020-01-10'), sum)
max_data <- sub_data[which.max(sub_data$sallary), ]
sprintf('%s has the highest salary %d', max_data$Name, max_data$sallary)
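As a rough, self-contained sketch of the same idea, the snippet below sums the salary per name and formats the message the question asks for. The four-row employee data frame is a hypothetical subset built from the first rows of the question's table, and tapply() is swapped in for aggregate() purely for illustration. It assumes the data already covers only 1 Jan to 10 Jan, so no date filter is applied.

```r
# Hypothetical subset of the question's data (two employees, two days)
employee <- data.frame(
  Date    = c("01 Jan 2020", "01 Jan 2020", "02 Jan 2020", "02 Jan 2020"),
  Name    = c("Ludociel", "Estarossa", "Ludociel", "Estarossa"),
  sallary = c(330000, 565000, 490000, 700000)
)

# Total salary per employee: a named vector, one entry per name
totals <- tapply(employee$sallary, employee$Name, sum)

# Name of the employee with the largest total
best <- names(which.max(totals))

# Message in the shape requested in the question
msg <- sprintf('"%s" is the highest with total sallary "%.0f"',
               best, totals[[best]])
print(msg)
```

If several employees tie for the maximum, which.max() reports only the first one.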

Related

How to find sum of other categories in the data in R?

I have panel data, sampled monthly for various product types.
type year month cost
A 2020 01 3
A 2020 02 6
B 2020 01 7
B 2020 02 9
C 2020 01 10
C 2020 02 15
I need to calculate an additional variable stating: "For a given period, what is the sum of costs from other product types?"
For example,
For 2020-01, Total Cost = 3 + 7 + 10 = 20. Costs of non-A types are 20 - 3 = 17.
For 2020-02, Total Cost = 6 + 9 + 15 = 30. Costs of non-B types are 30 - 9 = 21.
So the final data frame should look like this:
type year month cost other_costs
A 2020 01 3 17
A 2020 02 6 24
B 2020 01 7 13
B 2020 02 9 21
C 2020 01 10 10
C 2020 02 15 15
I have tried group_by() + mutate() functions from dplyr, but I get NA with the following code:
df %>%
group_by(year, month) %>%
mutate(other_costs = sum(cost) - cost)
What are your suggestions?
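One possible base R sketch of the calculation described above uses ave() to attach the per-period total to every row and then subtracts the row's own cost. The question does not say why the dplyr attempt returned NA; one common cause, assumed here rather than confirmed, is a missing value in cost, because sum() returns NA when any input is NA. The na.rm = TRUE argument guards against that.

```r
# Data from the question
df <- data.frame(
  type  = c("A", "A", "B", "B", "C", "C"),
  year  = 2020,
  month = c("01", "02", "01", "02", "01", "02"),
  cost  = c(3, 6, 7, 9, 10, 15)
)

# Total cost per (year, month), repeated on every row of that period
period_total <- ave(df$cost, df$year, df$month,
                    FUN = function(x) sum(x, na.rm = TRUE))

# Cost of all other product types in the same period
df$other_costs <- period_total - df$cost
print(df)
```

A row whose own cost is NA would still get NA in other_costs, since the subtraction itself involves the missing value.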

calculate mean for subgroups [duplicate]

This question already has answers here:
Calculate group mean, sum, or other summary stats. and assign column to original data
(4 answers)
Closed 3 years ago.
I want to calculate a monthly value by taking the mean of the weekly sums per month.
E.g. for July (07) and distance 10 I have the weeksums 1 (2017_28), 6 (2017_29) and 1 (2017_31). I want to summarise these weeks to get the total monthsum 8 and the mean value 2.6667 (8/3).
I got the monthsum, but I don't know how to calculate the mean:
df %>%
group_by(year_month, distance) %>%
mutate(monthsum = sum(weeksum))
year year_month month year_week distance weeksum
1 2017 2017_05 05 2017_21 15 4
2 2017 2017_05 05 2017_21 10 1
3 2017 2017_05 05 2017_22 5 5
4 2017 2017_05 05 2017_22 0 1
5 2017 2017_06 06 2017_22 0 11
6 2017 2017_06 06 2017_23 20 7
7 2017 2017_06 06 2017_23 0 6
8 2017 2017_07 07 2017_28 10 1
9 2017 2017_07 07 2017_28 0 1
10 2017 2017_07 07 2017_29 10 6
11 2017 2017_07 07 2017_29 5 3
12 2017 2017_07 07 2017_30 0 12
13 2017 2017_07 07 2017_31 10 1
14 2017 2017_07 07 2017_31 0 7
This is what I want:
year year_month month year_week distance monthsum mean
1 2017 2017_05 05 2017_21 15 4 4
2 2017 2017_05 05 2017_21 10 1 1
3 2017 2017_05 05 2017_22 5 5 5
4 2017 2017_05 05 2017_22 0 1 1
5 2017 2017_06 06 2017_22 0 17 8.5
6 2017 2017_06 06 2017_23 20 7 7
7 2017 2017_07 07 2017_28 10 8 2.6667
8 2017 2017_07 07 2017_28 0 20 6.6667
9 2017 2017_07 07 2017_29 5 3 3
First off, I hope you are using dplyr and not plyr, to be up to date.
Also simply extend your statement with a mean() function like this:
df %>%
group_by(year_month, distance) %>%
mutate(monthsum = sum(weeksum), monthmean = mean(weeksum))
Also, in your case use summarize instead of mutate to get a better view:
df %>%
group_by(year_month, distance) %>%
summarize(monthsum = sum(weeksum), monthmean = mean(weeksum))
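For readers without dplyr installed, the same grouped sum and mean can be sketched in base R with aggregate(). This is an alternative written for illustration, not the answer's own method. The data frame below contains rows 5 to 14 of the question's table, reduced to the three columns that matter.

```r
# Rows 5-14 of the question's data (June and July 2017)
df <- data.frame(
  year_month = c(rep("2017_06", 3), rep("2017_07", 7)),
  distance   = c(0, 20, 0, 10, 0, 10, 5, 0, 10, 0),
  weeksum    = c(11, 7, 6, 1, 1, 6, 3, 12, 1, 7)
)

# One row per (year_month, distance) group for each statistic
monthsum  <- aggregate(weeksum ~ year_month + distance, data = df, FUN = sum)
monthmean <- aggregate(weeksum ~ year_month + distance, data = df, FUN = mean)

# Both results share the same group order, so the columns can be combined
result <- data.frame(
  monthsum[c("year_month", "distance")],
  monthsum  = monthsum$weeksum,
  monthmean = monthmean$weeksum
)
print(result)
```

The row for 2017_07 and distance 10 should show a sum of 8 and a mean of 8/3, matching the values the question expects.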

Apply automation in R to change rows of numbers into date

I have created a simple data.frame of 1 column:
x<-as.data.frame(replicate(1, sample(1:27, 1250, rep=TRUE)))
So x will be a column with repeated values from 1 to 27.
I wish to change these values into dates, eg.
x[x==1]<-"31 June 2018"
x[x==2]<-"1 July 2018"
x[x==3]<-"2 July 2018"
Is there a faster way to do this?
I believe I can do this using apply, but I don't have much experience with it.
Thank you for your suggestions.
Here's one way with as.Date() -
x$date <- as.Date(x$V1, origin = "2018-06-30")
head(x)
V1 date
1 5 2018-07-05
2 19 2018-07-19
3 13 2018-07-13
4 9 2018-07-09
5 10 2018-07-10
6 21 2018-07-21
If you want the format to be as per your post -
x$date <- format(as.Date(x$V1, origin = "2018-06-30"), "%d %B %Y")
head(x)
V1 date
1 5 05 July 2018
2 19 19 July 2018
3 13 13 July 2018
4 9 09 July 2018
5 10 10 July 2018
6 21 21 July 2018
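To make the role of origin concrete, here is a small sketch with a hypothetical vector day_index in place of x$V1. A numeric value n is read as "n days after origin", so adding the numbers to a Date gives the same result.

```r
# Hypothetical day numbers standing in for x$V1
day_index <- c(1, 2, 27)

# Each number is interpreted as days after the origin date
dates <- as.Date(day_index, origin = "2018-06-30")

# Equivalent: add the day numbers directly to a Date object
dates_alt <- as.Date("2018-06-30") + day_index

print(dates)
print(identical(dates, dates_alt))
```

With this origin, 1 maps to 2018-07-01 and 27 maps to 2018-07-27.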

How to lump sum the number of days of a data of several year?

I have data similar to this. I would like to compute a cumulative day count (I'm not sure whether "lump sum" is the right term) and store it in a new column "date", so that the new column counts the days across the 3 years of data in ascending order.
year month day
2011 1 5
2011 2 14
2011 8 21
2012 2 24
2012 3 3
2012 4 4
2012 5 6
2013 2 14
2013 5 17
2013 6 24
I wrote this code, but the result is wrong and the code is too long. It doesn't count February correctly, since February has only 28 days. Is there a shorter way?
cday <- function(data,syear=2011,smonth=1,sday=1){
year <- data[1]
month <- data[2]
day <- data[3]
cmonth <- c(0,31,28,31,30,31,30,31,31,30,31,30,31)
date <- (year-syear)*365+sum(cmonth[1:month])+day
for(yr in c(syear:year)){
if(yr==year){
if(yr%%4==0&&month>2){date<-date+1}
}else{
if(yr%%4==0){date<-date+1}
}
}
return(date)
}
op10$day.no <- apply(op10[,c("year","month","day")],1,cday)
I expect the result like this:
year month day date
2011 1 5 5
2011 1 14 14
2011 1 21 21
2011 1 24 24
2011 2 3 31
2011 2 4 32
2011 2 6 34
2011 2 14 42
2011 2 17 45
2011 2 24 52
Thank you for helping!!
Use Date classes. Dates and times are complicated, so look for tools that do this for you rather than writing your own. Pick whichever of these you want:
df$date = with(df, as.Date(paste(year, month, day, sep = "-")))
df$julian_day = as.integer(format(df$date, "%j"))
df$days_since_2010 = as.integer(df$date - as.Date("2010-12-31"))
df
# year month day date julian_day days_since_2010
# 1 2011 1 5 2011-01-05 5 5
# 2 2011 2 14 2011-02-14 45 45
# 3 2011 8 21 2011-08-21 233 233
# 4 2012 2 24 2012-02-24 55 420
# 5 2012 3 3 2012-03-03 63 428
# 6 2012 4 4 2012-04-04 95 460
# 7 2012 5 6 2012-05-06 127 492
# 8 2013 2 14 2013-02-14 45 776
# 9 2013 5 17 2013-05-17 137 868
# 10 2013 6 24 2013-06-24 175 906
# using this data
df = read.table(text = "year month day
2011 1 5
2011 2 14
2011 8 21
2012 2 24
2012 3 3
2012 4 4
2012 5 6
2013 2 14
2013 5 17
2013 6 24", header = TRUE)
This is all using base R. If you handle dates and times frequently, you may also want to look at the lubridate package.
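As a brief illustration of why the Date class is safer than hand-written month tables, the sketch below checks how many days separate 28 February and 1 March in three different years. The variable names are made up for this example. Note that 1900 is divisible by 4 but is not a leap year, which a plain yr %% 4 == 0 test like the one in the question would get wrong.

```r
# Days from 28 Feb to 1 Mar: 2 in a leap year, 1 otherwise
gap_2012 <- as.integer(as.Date("2012-03-01") - as.Date("2012-02-28"))
gap_2013 <- as.integer(as.Date("2013-03-01") - as.Date("2013-02-28"))
gap_1900 <- as.integer(as.Date("1900-03-01") - as.Date("1900-02-28"))

print(c(gap_2012, gap_2013, gap_1900))

# Day of year for a date after February in a leap year
doy <- as.integer(format(as.Date("2012-03-03"), "%j"))
print(doy)
```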

R, subset two data frames with different rows

df1 <-
Year Month
2011 08
2011 08
2011 09
2011 10
2012 11
2012 11
df2 <-
Year Month
2001 02
2011 08
2011 10
2013 01
2012 11
My goal is to build a data frame of the (Year, Month) pairs that are common to both data sets.
goal <-
Year Month
2011 10
2011 08
2012 11
Can anyone please help me???
You can merge() the two then find the unique rows.
unique(merge(df1, df2))
# Year Month
# 1 2011 10
# 2 2011 8
# 4 2012 11
If you load dplyr, you can take the intersection
library(dplyr)
intersect(df1,df2)
# Year Month
# 1 2011 8
# 2 2011 10
# 3 2012 11
which I find intuitive.
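For a version that can be pasted and run as-is, here is a sketch of the merge() approach with the two data frames from the question written out as data.frame() calls. Storing Month as a number (8 rather than "08") is an assumption about how the data was read in.

```r
# The question's data, with Month stored as numbers
df1 <- data.frame(Year  = c(2011, 2011, 2011, 2011, 2012, 2012),
                  Month = c(8, 8, 9, 10, 11, 11))
df2 <- data.frame(Year  = c(2001, 2011, 2011, 2013, 2012),
                  Month = c(2, 8, 10, 1, 11))

# merge() joins on the shared columns (Year and Month);
# unique() drops the repeats caused by duplicate rows in df1
common <- unique(merge(df1, df2))
print(common)
```

The result keeps merge()'s original row names, so they may skip numbers; rownames(common) <- NULL resets them if that matters.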
