Apply automation in R to change rows of numbers into dates

I have created a simple data.frame of 1 column:
x<-as.data.frame(replicate(1, sample(1:27, 1250, rep=TRUE)))
So x will be a column with repeated values from 1 to 27.
I wish to change these values into dates, eg.
x[x==1]<-"31 June 2018"
x[x==2]<-"1 July 2018"
x[x==3]<-"2 July 2018"
Is there a faster way to do this?
I believe I can do this using apply, but I do not have much experience with it.
Thank you for your suggestions.

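Here's one way with as.Date(), which treats each number as a count of days after the origin (with this origin, 1 becomes 2018-07-01; move the origin if your numbering starts on a different day) -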
x$date <- as.Date(x$V1, origin = "2018-06-30")
head(x)
V1 date
1 5 2018-07-05
2 19 2018-07-19
3 13 2018-07-13
4 9 2018-07-09
5 10 2018-07-10
6 21 2018-07-21
If you want the format to be as per your post -
x$date <- format(as.Date(x$V1, origin = "2018-06-30"), "%d %B %Y")
head(x)
V1 date
1 5 05 July 2018
2 19 19 July 2018
3 13 13 July 2018
4 9 09 July 2018
5 10 10 July 2018
6 21 21 July 2018
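If the codes did not map to consecutive days, a lookup vector indexed by the code would do the same job. The following is a minimal sketch under assumptions, not part of the answer above: the seed and the name `lookup` are hypothetical, and the mapping is chosen to match the origin used above (code 1 is 2018-07-01).

```r
# Hypothetical seed, only so that the sketch is reproducible
set.seed(1)
x <- data.frame(V1 = sample(1:27, 1250, replace = TRUE))

# One date per code: lookup[1] is the date for code 1, lookup[2] for code 2, ...
# Any vector of 27 dates would work here, consecutive or not.
lookup <- seq(as.Date("2018-07-01"), by = "day", length.out = 27)

# Vectorised replacement: index the lookup vector by the code column
x$date <- lookup[x$V1]

head(x)
```

Indexing replaces all 1250 values in one step, so no loop or apply() call is needed.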

Related

Creating Calendar df in R

I am currently creating a Calendar df to join to my other dfs and originally coded it in the following way:
Date <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by="days")
Calendar <- data.frame(Date)
Calendar$DateNo <- format(Calendar$Date, format = "%d")
Calendar$NameDay <- format(Calendar$Date, format = "%A")
Calendar$MonthNo <- format(Calendar$Date, format = "%m")
Calendar$NameMonth <- format(Calendar$Date, format = "%B")
Calendar$NameMonthShort <- format(Calendar$Date, format = "%b")
Calendar$Week <- format(Calendar$Date, format = "%V")
Calendar$Year <- format(Calendar$Date, format = "%Y")
Calendar$Quarter <- quarter(Calendar$Date, with_year = F, fiscal_start = 7)
Calendar$Month_Year <-paste(Calendar$NameMonthShort,Calendar$Year,sep="-")
Calendar$Quarter_Year <-paste(Calendar$Quarter,Calendar$Year,sep="-")
After some issues with plotting my data into ggplot I came across an alternate way of creating it using lubridate package with mutate. My new code is as follows:
Date <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by="days")
Calendar <- data.frame(Date)
Calendar <- Calendar %>%
mutate(
DateNo = day(Date),
NameDay = wday(Date,label = TRUE),
MonthNo = month(Date),
NameMonth = month(Date, label = TRUE),
NameMonthShort = month(Date, label = TRUE),
Week = week(Date),
Year = year(Date),
Quarter = quarter(Date, with_year = F, fiscal_start = 7))
The issues I am encountering are that I can't add the unabbreviated day/month names, and I am not sure whether I can add Month_Year/Quarter_Year inside the mutate so that the values are factors. Is it possible to add those values in, or do I have to add them the way I did previously? Thanks!
You might find it easier to use the built-in as.POSIXlt; no lubridate is needed. Apply it to your sequence and you get a list-type object,
Date <- as.POSIXlt(seq(as.Date("2020-01-01"), as.Date("2020-06-30"), by="7 days"))
## Note: shortened for sake of brevity
that has the desired information already stored in objects that can be accessed by $.
attr(Date, "names")
# [1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday" "isdst"
There are some minor conversions needed due to the storage format, and some helper functions like weekdays, quarters, and strftime. In addition we may use the built-in constants month.name and month.abb.
Calendar <- data.frame(Date,
DateNo=Date$mday,
NameDay=weekdays(Date),
MonthNo=Date$mon + 1,
NameMonth=month.name[Date$mon + 1],
NameMonthShort=month.abb[Date$mon + 1],
Week=strftime(Date, "%V"),
Year=1900 + Date$year,
Quarter=quarters(Date)
)
Result
Calendar
# Date DateNo NameDay MonthNo NameMonth NameMonthShort Week Year Quarter
# 1 2020-01-01 1 Wednesday 1 January Jan 01 2020 Q1
# 2 2020-01-08 8 Wednesday 1 January Jan 02 2020 Q1
# 3 2020-01-15 15 Wednesday 1 January Jan 03 2020 Q1
# 4 2020-01-22 22 Wednesday 1 January Jan 04 2020 Q1
# 5 2020-01-29 29 Wednesday 1 January Jan 05 2020 Q1
# 6 2020-02-05 5 Wednesday 2 February Feb 06 2020 Q1
# 7 2020-02-12 12 Wednesday 2 February Feb 07 2020 Q1
# 8 2020-02-19 19 Wednesday 2 February Feb 08 2020 Q1
# 9 2020-02-26 26 Wednesday 2 February Feb 09 2020 Q1
# 10 2020-03-04 4 Wednesday 3 March Mar 10 2020 Q1
# 11 2020-03-11 11 Wednesday 3 March Mar 11 2020 Q1
# 12 2020-03-18 18 Wednesday 3 March Mar 12 2020 Q1
# 13 2020-03-25 25 Wednesday 3 March Mar 13 2020 Q1
# 14 2020-04-01 1 Wednesday 4 April Apr 14 2020 Q2
# 15 2020-04-08 8 Wednesday 4 April Apr 15 2020 Q2
# 16 2020-04-15 15 Wednesday 4 April Apr 16 2020 Q2
# 17 2020-04-22 22 Wednesday 4 April Apr 17 2020 Q2
# 18 2020-04-29 29 Wednesday 4 April Apr 18 2020 Q2
# 19 2020-05-06 6 Wednesday 5 May May 19 2020 Q2
# 20 2020-05-13 13 Wednesday 5 May May 20 2020 Q2
# 21 2020-05-20 20 Wednesday 5 May May 21 2020 Q2
# 22 2020-05-27 27 Wednesday 5 May May 22 2020 Q2
# 23 2020-06-03 3 Wednesday 6 June Jun 23 2020 Q2
# 24 2020-06-10 10 Wednesday 6 June Jun 24 2020 Q2
# 25 2020-06-17 17 Wednesday 6 June Jun 25 2020 Q2
# 26 2020-06-24 24 Wednesday 6 June Jun 26 2020 Q2
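The "minor conversions" mentioned above come from how POSIXlt stores its fields: months are counted from 0 and years from 1900. A small sketch on a single date taken from the table above (the variable names are illustrative only):

```r
# One date from the result table: 2020-03-04, a Wednesday
d <- as.POSIXlt(as.Date("2020-03-04"))

month_no      <- d$mon + 1            # mon is 0-based, so March is stored as 2
year_no       <- d$year + 1900        # year is stored as years since 1900
month_name    <- month.name[month_no] # built-in constant, independent of locale
weekday_index <- d$wday               # 0 = Sunday, so Wednesday is 3

print(c(month_no, year_no, weekday_index))
print(month_name)
```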
Long month names are easy to add by passing abbr = FALSE to month().
Pasting quarters or months to years is done in a second mutate below (dplyr would also accept it in the same mutate, since later arguments can use columns created earlier).
Edit: Since paste creates character vectors and not factors, you will need to specify the factor levels manually:
monthlevels = c(
'Jan-2020','Feb-2020','Mar-2020','Apr-2020','May-2020','Jun-2020',
'Jul-2020','Aug-2020','Sep-2020','Oct-2020','Nov-2020','Dec-2020',
'Jan-2021','Feb-2021','Mar-2021','Apr-2021','May-2021','Jun-2021',
'Jul-2021','Aug-2021','Sep-2021','Oct-2021','Nov-2021','Dec-2021')
quarterlevels = c('1-2020','2-2020','3-2020','4-2020','1-2021','2-2021','3-2021','4-2021')
Calendar %>%
mutate(
DateNo = day(Date),
NameDay = wday(Date,label = TRUE),
MonthNo = month(Date),
NameMonth = month(Date, label = TRUE, abbr=FALSE), ## added abbr=FALSE
NameMonthShort = month(Date, label = TRUE),
Week = week(Date),
Year = year(Date),
Quarter = quarter(Date, with_year = F, fiscal_start = 7)) %>%
## added second mutate() to paste fields created by the first mutate
mutate(
QuarterYear = factor(paste(Quarter, Year, sep='-'), levels=quarterlevels),
MonthYear = factor(paste(NameMonthShort, Year, sep = "-"), levels = monthlevels)
) %>% head()
Returns:
Date DateNo NameDay MonthNo NameMonth NameMonthShort Week Year Quarter
1 2020-01-01 1 Wed 1 January Jan 1 2020 3
2 2020-01-02 2 Thu 1 January Jan 1 2020 3
3 2020-01-03 3 Fri 1 January Jan 1 2020 3
4 2020-01-04 4 Sat 1 January Jan 1 2020 3
5 2020-01-05 5 Sun 1 January Jan 1 2020 3
6 2020-01-06 6 Mon 1 January Jan 1 2020 3
QuarterYear MonthYear
1 3-2020 Jan-2020
2 3-2020 Jan-2020
3 3-2020 Jan-2020
4 3-2020 Jan-2020
5 3-2020 Jan-2020
6 3-2020 Jan-2020
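Typing the levels by hand gets tedious for longer date ranges. Because the calendar is already sorted by date, the order of first appearance is chronological, so the levels could be derived from the data instead. This is a base R sketch under that assumption (sorted dates); it swaps in POSIXlt fields and month.abb for the lubridate calls above, and the variable names are illustrative.

```r
Date <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
lt <- as.POSIXlt(Date)

# Build labels such as "Jan-2020" from locale-independent month abbreviations
month_year <- paste(month.abb[lt$mon + 1], lt$year + 1900, sep = "-")

# Dates are sorted, so unique() returns the labels in chronological order
MonthYear <- factor(month_year, levels = unique(month_year))

print(levels(MonthYear))
```

The same unique() approach would apply to the quarter labels, as long as the rows stay sorted by date.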

How to find maximum value from dataframe with specific condition? [duplicate]

I have a dataframe named employee with 100 rows like this :
Date Name ride food income bonus sallary
1 01 Jan 2020 Ludociel 10 6 330000 0 330000
2 01 Jan 2020 Estarossa 15 8 465000 100000 565000
3 01 Jan 2020 Tarmiel 8 10 420000 100000 520000
4 01 Jan 2020 Sariel 5 8 315000 0 315000
5 01 Jan 2020 Escanor 15 7 435000 100000 535000
6 01 Jan 2020 Ban 13 9 465000 100000 565000
7 01 Jan 2020 Meliodas 6 15 540000 100000 640000
8 01 Jan 2020 King 15 12 585000 100000 685000
9 01 Jan 2020 Zeldris 15 11 555000 100000 655000
10 01 Jan 2020 Rugal 15 6 405000 100000 505000
11 02 Jan 2020 Ludociel 14 6 390000 100000 490000
12 02 Jan 2020 Estarossa 12 14 600000 100000 700000
...
100 10 Jan 2020 Rugal 13 10 495000 100000 595000
The problem is that I want to find which employee has the highest total salary from 1 Jan to 10 Jan. My expected output is just a vector like this:
[1] "varName" is the highest with total sallary "varTotal_sallary"
I have tried using a for loop with an if clause, but it only returns the total for one name, so every name would need its own function.
function_ludociel<-function(name, date, sallary){
total=integer()
for(i in 100){
if(date[i]=="01 Jan 2020" & name[i]=="Ludociel"){
total=sum(sallary)
}
}
return(total)
}
ludociel=function_ludociel(employee$name,employee$date,employee$sallary)
After that I planned to combine them in one variable and use max(), but I know that is a silly way to code it.
Does anyone have a solution for this? Thank you very much.
1. Convert Date to an actual Date class.
2. Use aggregate to calculate the total salary from 1st Jan to 10th Jan.
3. Select the row with the maximum salary.
4. Print the result.
employee$Date <- as.Date(employee$Date, '%d %b %Y')
sub_data <- aggregate(sallary~Name, employee,
subset = Date >= as.Date('2020-01-01') &
Date <= as.Date('2020-01-10'), sum)
max_data <- sub_data[which.max(sub_data$sallary), ]
sprintf('%s has the highest salary %d', max_data$Name, max_data$sallary)
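To check the approach on something small, here is a self-contained sketch using four rows copied from the question's sample (Ludociel and Estarossa on 1 and 2 Jan). The dates are written in ISO format, an assumption made so that parsing does not depend on the locale's month names, and the date filter is omitted because all four rows fall inside the range.

```r
employee <- data.frame(
  Date    = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  Name    = c("Ludociel", "Estarossa", "Ludociel", "Estarossa"),
  sallary = c(330000, 565000, 490000, 700000),
  stringsAsFactors = FALSE
)

# Total salary per employee
totals <- aggregate(sallary ~ Name, data = employee, FUN = sum)

# Row with the largest total
top <- totals[which.max(totals$sallary), ]

print(sprintf("%s has the highest total salary: %.0f", top$Name, top$sallary))
```

Note that which.max() returns only the first maximum, so ties between employees would need separate handling.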

Use of EXCEL OFFSET IN R for a range of values and multiple times

This is file I want to append my data in
Collection A
Jan
Feb
March
April
Collection B
Jan
Feb
March
April
Revenue A
Jan
Feb
March
April
Revenue B
Jan
Feb
March
April
The file I want to pull my data from looks like this:
Collection Month Collection A Collection B Revenue Month Revenue A Revenue B
Collection January 1 5 Revenue January 4 8
Collection February 2 6 Revenue February 3 7
Collection March 3 7 Revenue March 2 6
Collection April 4 8 Revenue April 1 5
I want the final output to look like this:
Collection A
Jan 1
Feb 2
March 3
April 4
Collection B
Jan 5
Feb 6
March 7
April 8
Revenue A
Jan 4
Feb 3
March 2
April 1
Revenue B
Jan 8
Feb 7
March 6
April 5
I am able to do this in Excel using the OFFSET and INDIRECT functions, but I want to automate it better for future use, so I am trying it in R.
I am really stuck on how to combine the two datasets to produce the desired output. It seems like an impossible task to me. I have played around with several functions such as select, subset and arrange, but none of them have helped me make progress.
I would be glad if someone could help me out with this.
Here's a way to achieve that output. Note that I removed spaces from the column names in the sample data in order to make it easier to read into R. You didn't specify what you wanted the column names of the output dataframe to be so as given they make little sense.
library(tidyverse)
tbl <- read_table2(
"Collection Month CollectionA CollectionB Revenue Month RevenueA RevenueB
Collection January 1 5 Revenue January 4 8
Collection February 2 6 Revenue February 3 7
Collection March 3 7 Revenue March 2 6
Collection April 4 8 Revenue April 1 5"
)
#> Warning: Duplicated column names deduplicated: 'Month' => 'Month_1' [6]
tbl %>%
select(-Collection, -Revenue, -Month_1) %>%
gather(variable, value, -Month) %>%
group_by(variable) %>%
group_modify(~ add_row(.x, Month = .y$variable, value = NA, .before = 1)) %>%
ungroup() %>%
select(-variable)
#> # A tibble: 20 x 2
#> Month value
#> <chr> <dbl>
#> 1 CollectionA NA
#> 2 January 1
#> 3 February 2
#> 4 March 3
#> 5 April 4
#> 6 CollectionB NA
#> 7 January 5
#> 8 February 6
#> 9 March 7
#> 10 April 8
#> 11 RevenueA NA
#> 12 January 4
#> 13 February 3
#> 14 March 2
#> 15 April 1
#> 16 RevenueB NA
#> 17 January 8
#> 18 February 7
#> 19 March 6
#> 20 April 5
Created on 2019-06-18 by the reprex package (v0.3.0)
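For readers without the tidyverse installed, the same stacked layout could be built in base R. This is an alternative sketch, not the method shown above: it types the sample data in directly (with the duplicated label columns already dropped) and stacks one block per value column, each headed by the column's name.

```r
tbl <- data.frame(
  Month       = c("January", "February", "March", "April"),
  CollectionA = 1:4,
  CollectionB = 5:8,
  RevenueA    = 4:1,
  RevenueB    = 8:5,
  stringsAsFactors = FALSE
)

value_cols <- setdiff(names(tbl), "Month")

# One block per value column: a header row (value NA) followed by the months
blocks <- lapply(value_cols, function(col) {
  data.frame(
    Month = c(col, tbl$Month),
    value = c(NA, tbl[[col]]),
    stringsAsFactors = FALSE
  )
})

result <- do.call(rbind, blocks)
print(result)
```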

How to lump sum the number of days of a data of several year?

I have data similar to this. I would like to compute a running day count (I'm not sure whether "lump sum" is the correct term) and store it in a new column "date", so that the new column counts the days cumulatively across the 3 years of data in ascending order.
year month day
2011 1 5
2011 2 14
2011 8 21
2012 2 24
2012 3 3
2012 4 4
2012 5 6
2013 2 14
2013 5 17
2013 6 24
I wrote this code, but the result is wrong and the code is too long. It does not count February correctly, since February has only 28 days. Is there a shorter way?
cday <- function(data,syear=2011,smonth=1,sday=1){
year <- data[1]
month <- data[2]
day <- data[3]
cmonth <- c(0,31,28,31,30,31,30,31,31,30,31,30,31)
date <- (year-syear)*365+sum(cmonth[1:month])+day
for(yr in c(syear:year)){
if(yr==year){
if(yr%%4==0&&month>2){date<-date+1}
}else{
if(yr%%4==0){date<-date+1}
}
}
return(date)
}
op10$day.no <- apply(op10[,c("year","month","day")],1,cday)
I expect the result like this:
year month day date
2011 1 5 5
2011 1 14 14
2011 1 21 21
2011 1 24 24
2011 2 3 31
2011 2 4 32
2011 2 6 34
2011 2 14 42
2011 2 17 45
2011 2 24 52
Thank you for helping!!
Use Date classes. Dates and times are complicated, so look for tools that do this for you rather than writing your own. Pick whichever of these you want:
df$date = with(df, as.Date(paste(year, month, day, sep = "-")))
df$julian_day = as.integer(format(df$date, "%j"))
df$days_since_2010 = as.integer(df$date - as.Date("2010-12-31"))
df
# year month day date julian_day days_since_2010
# 1 2011 1 5 2011-01-05 5 5
# 2 2011 2 14 2011-02-14 45 45
# 3 2011 8 21 2011-08-21 233 233
# 4 2012 2 24 2012-02-24 55 420
# 5 2012 3 3 2012-03-03 63 428
# 6 2012 4 4 2012-04-04 95 460
# 7 2012 5 6 2012-05-06 127 492
# 8 2013 2 14 2013-02-14 45 776
# 9 2013 5 17 2013-05-17 137 868
# 10 2013 6 24 2013-06-24 175 906
# using this data
df = read.table(text = "year month day
2011 1 5
2011 2 14
2011 8 21
2012 2 24
2012 3 3
2012 4 4
2012 5 6
2013 2 14
2013 5 17
2013 6 24", header = TRUE)
This is all using base R. If you handle dates and times frequently, you may also want to look at the lubridate package.
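The February problem from the question disappears with Date arithmetic, because subtracting two Date values already accounts for leap years. A short sketch (the variable names are illustrative):

```r
# Length of February in a common year and in a leap year
feb_2011 <- as.integer(as.Date("2011-03-01") - as.Date("2011-02-01"))
feb_2012 <- as.integer(as.Date("2012-03-01") - as.Date("2012-02-01"))

# Day count since the end of 2010 for one row of the sample data
offset <- as.integer(as.Date("2012-03-03") - as.Date("2010-12-31"))

print(c(feb_2011, feb_2012, offset))
```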

duplicating/replicating only specific rows in a data frame

I have data organized by unique id and sorted by date of visit. Some people have multiple visits. The data is in long format, sorted by visit. I only want to replicate the row of the last visit of each person. How does one replicate only specific rows in a data frame?
id visit glucose
1 12 Jan 2015 12
1 3 Feb 2015 8
2 1 Feb 2015 13
3 12 Jan 2015 7
3 4 Feb 2015 13
3 1 March 2015 8
If we need to duplicate the last row based on 'visit' for each 'id', we can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), order by 'id' and 'visit', and, grouped by 'id', repeat the last row index (.N) after the full sequence of rows.
library(data.table)
setDT(df1)[order(id, as.Date(visit, "%d %b %Y")), .SD[c(seq_len(.N), .N)], by = id]
# id visit glucose
#1: 1 12 Jan 2015 12
#2: 1 3 Feb 2015 8
#3: 1 3 Feb 2015 8
#4: 2 1 Feb 2015 13
#5: 2 1 Feb 2015 13
#6: 3 12 Jan 2015 7
#7: 3 4 Feb 2015 13
#8: 3 1 March 2015 8
#9: 3 1 March 2015 8
If we want only the last row for each 'id'
setDT(df1)[order(id, as.Date(visit, "%d %b %Y")), .SD[.N], id]
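The same duplication could also be done without data.table. This is a base R sketch under assumptions, not the method above: the sample data is typed in with ISO dates to avoid locale-dependent parsing, and the names `is_last` and `idx` are illustrative.

```r
df1 <- data.frame(
  id      = c(1, 1, 2, 3, 3, 3),
  visit   = as.Date(c("2015-01-12", "2015-02-03", "2015-02-01",
                      "2015-01-12", "2015-02-04", "2015-03-01")),
  glucose = c(12, 8, 13, 7, 13, 8)
)

# Sort by id, then by visit date
df1 <- df1[order(df1$id, df1$visit), ]

# TRUE for the last row of each id (the first one seen when scanning from the end)
is_last <- !duplicated(df1$id, fromLast = TRUE)

# Repeat the last row of each id twice and every other row once
idx <- rep(seq_len(nrow(df1)), times = ifelse(is_last, 2, 1))
result <- df1[idx, ]

print(result)
```

Subsetting with `df1[is_last, ]` instead would return only the last visit per id.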
