I have date-time data spread across 4 columns of my data frame, and I want to combine them into a single date-time column.
A small example (apologies for poor formatting I'm working off a phone)
2014 3 1 23 2.1
2014 3 2 0 4.7
2014 3 2 1 2.4
So the above gives the data (final column) for three time steps an hour apart, 2300 (11pm) on 1st March 2014, midnight that night and 0100 (1am) on the 2nd March 2014.
I want to create an extra column that would have "2009-03-01 23:00:00 GMT".
I tried using
Mytimes <- with(mydata, ISOdatetime(year, month, day, time, 0, 0))
where my columns are year, month, day, time and datavalue, but I get "2009-03-01 EST" as output, with no hour data. I can then add this column to my data frame.
I'm not particularly worried about the time zone, that's just what the examples looked like.
As MrFlick and Joshua point out in the comments, the midnight time is not printed, but it is there.
Is this what you want?
> require(lubridate)
> input <- read.table(text = "2014 3 1 23 2.1
+ 2014 3 2 0 4.7
+ 2014 3 2 1 2.4")
> # create date column
> input$date <- ymd_h(paste(input$V1, input$V2, input$V3, input$V4))
>
> input
V1 V2 V3 V4 V5 date
1 2014 3 1 23 2.1 2014-03-01 23:00:00
2 2014 3 2 0 4.7 2014-03-02 00:00:00
3 2014 3 2 1 2.4 2014-03-02 01:00:00
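For comparison, here is a minimal base R sketch of the same idea, using the ISOdatetime() call from the question. The data frame name mydata and the choice of tz = "GMT" are assumptions made for illustration. It also shows that format() renders the time part even for midnight values, which is the point made about the missing hour above.

```r
# Sketch only: column names follow the question (year, month, day, time, datavalue).
mydata <- data.frame(
  year      = c(2014, 2014, 2014),
  month     = c(3, 3, 3),
  day       = c(1, 2, 2),
  time      = c(23, 0, 1),
  datavalue = c(2.1, 4.7, 2.4)
)

# tz = "GMT" is an assumption; the question says the time zone is not important.
mydata$datetime <- with(mydata, ISOdatetime(year, month, day, time, 0, 0, tz = "GMT"))

# format() with an explicit pattern always includes the time, even at midnight.
shown <- format(mydata$datetime, "%Y-%m-%d %H:%M:%S")
print(shown)
```

The column itself stays a POSIXct vector; only the printed representation changes.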
I have a dataset with a column date like this:
id date
1 2018-02-13
2 2019-02-28
3 2014-01-23
4 2021-10-28
5 2022-01-23
6 2019-09-28
I want to select the rows whose date falls before 15 February, whatever the year.
I tried an ifelse statement that defines 15 February for each year in my data, but I would like to know if there is an easier way.
Here's the result I'm expecting:
id date
1 2018-02-13
3 2014-01-23
5 2022-01-23
Thanks in advance
You could use lubridate
library(lubridate)
library(dplyr)
d %>% filter(month(date)<2 | (month(date)==2 & day(date)<=15))
Output:
id date
1 1 2018-02-13
2 3 2014-01-23
3 5 2022-01-23
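This option relies on format() from base R: check whether the month and day fall between January 1st and February 15th. Note that between() is not base R; it is provided by packages such as dplyr and data.table. In pure base R, the same test can be written with >= and <=.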
df[between(format(df$date, "%m%d"), "0101", "0215"),]
Output
id date
1 1 2018-02-13
3 3 2014-01-23
5 5 2022-01-23
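A package-free version of the same comparison might look like the sketch below. The data frame d is rebuilt from the question's table, and the cutoff is treated as inclusive (on or before 15 February) to match the answers above; both are assumptions for illustration.

```r
# Rebuild the example data from the question.
d <- data.frame(
  id   = 1:6,
  date = as.Date(c("2018-02-13", "2019-02-28", "2014-01-23",
                   "2021-10-28", "2022-01-23", "2019-09-28"))
)

# Zero-padded "MMDD" strings sort in calendar order within a year,
# so a plain string comparison works regardless of the year.
md <- format(d$date, "%m%d")
early <- d[md <= "0215", ]
print(early)
```

Use md < "0215" instead if 15 February itself should be excluded.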
This probably seems straightforward, but I am pretty stumped.
I have a set of dates ~ August 1 of each year and need to sum sales by week number. The earliest date is 2008-12-08 (YYYY-MM-DD). I need to create a "week_id" field where week #1 begins on 2008-12-08, and the date 2011-09-03 is week 142. Note that this differs from the usual week-of-year calculation, because the week number does not reset every year.
I am putting up a small example dataset here:
data <- data.frame(
dates = c("2008-12-08", "2009-08-10", "2010-03-31", "2011-10-16", "2008-06-03", "2009-11-14" , "2010-05-05", "2011-09-03"))
data$date = as.Date(data$dates)
Any help is appreciated
data$week_id = as.numeric(data$date - as.Date("2008-12-08")) %/% 7 + 1
This takes the difference in days between the two dates and counts the number of whole 7-day periods elapsed. I add one because dates where zero weeks have elapsed since the start should be week 1 instead of week 0.
dates date week_id
1 2008-12-07 2008-12-07 0 # added for testing
2 2008-12-08 2008-12-08 1
3 2008-12-09 2008-12-09 1 # added for testing
4 2008-12-14 2008-12-14 1 # added for testing
5 2008-12-15 2008-12-15 2 # added for testing
6 2009-08-10 2009-08-10 36
7 2010-03-31 2010-03-31 69
8 2011-10-16 2011-10-16 149
9 2008-06-03 2008-06-03 -26
10 2009-11-14 2009-11-14 49
11 2010-05-05 2010-05-05 74
12 2011-09-03 2011-09-03 143
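The same calculation can be wrapped in a small helper, sketched below. The function name week_id and its start argument are hypothetical names chosen for this illustration; the arithmetic is the one from the answer above.

```r
# Hypothetical helper: week number counted continuously from a start date.
week_id <- function(dates, start = as.Date("2008-12-08")) {
  # Date - Date yields a difftime in days; %/% 7 counts whole weeks elapsed.
  as.numeric(as.Date(dates) - start) %/% 7 + 1
}

ids <- week_id(c("2008-12-08", "2008-12-14", "2008-12-15", "2011-09-03"))
print(ids)
```

Dates before the start date give zero or negative week numbers, as in the table above, so those rows may need to be filtered out before summing sales by week.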
Given a date and its day of the week, I want to know if there is code that tells me which occurrence of that weekday in the month it is. For example, given 2/12/2020 and "Wednesday", I want the output "2", because it is the second Wednesday of the month.
You can do that in base R in essentially one operation. You also do not need the second input column.
Here is a slower walkthrough:
Code
dates <- c("2/12/2020","2/11/2020","2/10/2020","2/7/2020","2/6/2020", "2/5/2020")
Dates <- anytime::anydate(dates) ## one of several parsers
dow <- weekdays(Dates) ## for illustration, base R function
cnt <- (as.integer(format(Dates, "%d")) - 1) %/% 7 + 1
res <- data.frame(dt=Dates, dow=dow, cnt=cnt)
res
(Final) Output
R> res
dt dow cnt
1 2020-02-12 Wednesday 2
2 2020-02-11 Tuesday 2
3 2020-02-10 Monday 2
4 2020-02-07 Friday 1
5 2020-02-06 Thursday 1
6 2020-02-05 Wednesday 1
R>
Functionality like this is often found in dedicated date/time libraries. I wrapped some code from the (C++) Boost date_time library in the package RcppBDT, which made it easy to find 'the third Wednesday in the last month of each quarter' and the like.
(lubridate::day(your_date) - 1) %/% 7 + 1
The idea here is that the first 7 days of the month are all the first for their weekday. Next 7 are 2nd, etc.
> (1:30 - 1) %/% 7 + 1
# [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5
Just to offer an alternative calculation for the nth-weekday of the month, you can just divide the day by 7 and always round up:
date <- lubridate::mdy("02/12/2020")
ceiling(lubridate::day(date)/7)
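The two formulas given in these answers should agree for every day of the month. The sketch below checks that in base R and applies the integer-division form to a few dates; the helper name nth_weekday is hypothetical.

```r
# Hypothetical helper: which occurrence of its weekday a date is within its month.
nth_weekday <- function(dates) {
  dom <- as.integer(format(dates, "%d"))  # day of month, 1..31
  (dom - 1L) %/% 7L + 1L
}

d <- as.Date(c("2020-02-12", "2020-02-05", "2020-02-29"))
print(data.frame(date = d, dow = weekdays(d), nth = nth_weekday(d)))

# Check that ceiling(day / 7) and (day - 1) %/% 7 + 1 agree for days 1..31.
days <- 1:31
same <- all(ceiling(days / 7) == (days - 1) %/% 7 + 1)
print(same)
```

Both forms depend only on the day of the month, which is why the weekday column is not needed as an input.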
I am trying to de-seasonalize my data by dividing my monthly totals by the average seasonality ratio for that month. I have two data frames: avgseasonality, which has 12 rows holding the average seasonality ratio per month, and df1, which holds the order totals. The problem is that avgseasonality has only 12 rows, while the order-total data frame has 157 rows.
deseasonlize <- transform(avgseasonalityratio, deseasonlizedtotal =
df1$OrderTotal / avgseasonality$seasonalityratio)
This runs, but it does not pair the months correctly. It takes the first ratio (April) and applies it to the first order total (December).
> avgseasonality
Month seasonalityratio
1 April 1.0132557
2 August 1.0054602
3 December 0.8316988
4 February 0.9813396
5 January 0.8357475
6 July 1.1181648
7 June 1.0439899
8 March 1.1772450
9 May 1.0430667
10 November 0.9841149
11 October 0.9595041
12 September 0.8312318
> df1
# A tibble: 157 x 3
DateEntLabel OrderTotal `d$Month`
<dttm> <dbl> <chr>
1 2005-12-01 00:00:00 512758. December
2 2006-01-01 00:00:00 227449. January
3 2006-02-01 00:00:00 155652. February
4 2006-03-01 00:00:00 172923. March
5 2006-04-01 00:00:00 183854. April
6 2006-05-01 00:00:00 239689. May
7 2006-06-01 00:00:00 237638. June
8 2006-07-01 00:00:00 538688. July
9 2006-08-01 00:00:00 197673. August
10 2006-09-01 00:00:00 144534. September
# ... with 147 more rows
I need each order total divided by the ratio for its own month. For example, for December the calculation would be 512758/0.8316988 = 616518.864762. The results should go in a new column alongside the corresponding month and order total. Any help is greatly appreciated!
The easiest way would be to merge your data first, then do the operation. You can use base R's merge() function, though I will show it here using the tidyverse left_join() function. I see that one of your columns has the strange name d$Month; renaming it to Month will simplify the merge!
Reproducible example:
library(tidyverse)
df_1 <- data.frame(Month = c("Jan", "Feb"), seasonalityratio = c(1,2))
df_2 <- data.frame(Month = rep(c("Jan", "Feb"),each=2), OrderTotal = 1:4)
df_1 %>%
left_join(df_2, by = "Month") %>%
mutate(deseasonlizedtotal = OrderTotal / seasonalityratio)
#> Month seasonalityratio OrderTotal deseasonlizedtotal
#> 1 Jan 1 1 1.0
#> 2 Jan 1 2 2.0
#> 3 Feb 2 3 1.5
#> 4 Feb 2 4 2.0
Created on 2019-01-30 by the reprex package (v0.2.1)
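For the base R route mentioned above, a sketch with merge() could look like this. It uses two rows taken from the question's data and assumes the `d$Month` column has already been renamed to Month; the result column name deseasonalizedtotal is chosen for illustration.

```r
# Two months of ratios and order totals copied from the question.
avgseasonality <- data.frame(
  Month            = c("December", "January"),
  seasonalityratio = c(0.8316988, 0.8357475)
)
df1 <- data.frame(
  DateEntLabel = as.Date(c("2005-12-01", "2006-01-01")),
  OrderTotal   = c(512758, 227449),
  Month        = c("December", "January")  # assumed already renamed from `d$Month`
)

# Attach each month's ratio to every matching row of df1, then divide.
merged <- merge(df1, avgseasonality, by = "Month", all.x = TRUE)
merged$deseasonalizedtotal <- merged$OrderTotal / merged$seasonalityratio

# merge() sorts by the key, so restore chronological order afterwards.
merged <- merged[order(merged$DateEntLabel), ]
print(merged)
```

Because the join is keyed on Month, the 12 ratios are recycled correctly across all rows, whatever order the months appear in.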
I have a column of dates in the format:
16Jun10
and I would like to extract the Julian day.
I have various years.
I have tried the functions julian and mdy.date and it doesn't seem to work.
Try the following to convert from class character (i.e. text) to class POSIXlt, and then extract the Julian day (yday):
tmp <- as.POSIXlt("16Jun10", format = "%d%b%y")
tmp$yday
# [1] 166
For more details on function settings:
?POSIXlt
?DateTimeClasses
Another option is to use the Date class and then use format to extract the Julian day (note that format's %j counts days of the year from 1 to 366, while the yday element of POSIXlt runs from 0 to 365):
tmp <- as.Date("16Jun10", format = "%d%b%y")
format(tmp, "%j")
# [1] "167"
Similarly:
require(lubridate)
x = as.Date('2010-06-10')
yday(x)
[1] 161
Also note, using lubridate:
> dmy('16Jun10')
[1] "2010-06-16 UTC"
You can use R's insol package, which has a JD(x, inverse=FALSE) function that converts POSIXct to Julian Day Number (JDN).
The insol package also has JDymd(year,month,day,hour=12,minute=0,sec=0) for custom dates.
To display the whole Julian Date (JD) you may need to set options(digits=16).
my.data = read.table(text = "
OBS MONTH1 DAY1 YEAR1
1 3 1 2012
2 3 31 2012
3 4 1 2012
4 4 30 2012
5 5 1 2012
6 5 31 2012
7 6 1 2012
8 6 30 2012
9 7 1 2012
10 7 31 2012
", header = TRUE, stringsAsFactors = FALSE)
my.data$MY.DATE1 <- do.call(paste, list(my.data$MONTH1, my.data$DAY1, my.data$YEAR1))
my.data$MY.DATE1 <- as.Date(my.data$MY.DATE1, format=c("%m %d %Y"))
my.data$my.julian.date <- as.numeric(format(my.data$MY.DATE1, "%j"))
my.data
This returns the output below, which is technically not a Julian date, since Julian dates do not reset to 1 on the first day of each January:
http://en.wikipedia.org/wiki/Julian_day
The values below are ordinal dates (day of year):
OBS MONTH1 DAY1 YEAR1 MY.DATE1 my.julian.date
1 1 3 1 2012 2012-03-01 61
2 2 3 31 2012 2012-03-31 91
3 3 4 1 2012 2012-04-01 92
4 4 4 30 2012 2012-04-30 121
5 5 5 1 2012 2012-05-01 122
6 6 5 31 2012 2012-05-31 152
7 7 6 1 2012 2012-06-01 153
8 8 6 30 2012 2012-06-30 182
9 9 7 1 2012 2012-07-01 183
10 10 7 31 2012 2012-07-31 213
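To make the distinction concrete, the sketch below computes both the ordinal date and a Julian Day Number in base R. It relies on the fact that R stores a Date as days since 1970-01-01 and that 1970-01-01 has JDN 2440588; the variable names are illustrative.

```r
dates <- as.Date(c("2012-03-01", "1959-05-24", "1992-12-16"))

# Ordinal date: day of the year, resets to 1 every January 1st.
ordinal <- as.integer(format(dates, "%j"))

# Julian Day Number: a continuous day count that never resets.
# as.numeric(Date) is days since 1970-01-01, whose JDN is 2440588.
jdn <- as.numeric(dates) + 2440588

print(data.frame(date = dates, ordinal = ordinal, jdn = jdn))
```

This treats each Date as a whole day; for fractional Julian Dates with a time of day, a POSIXct-based approach such as the insol functions mentioned earlier would be needed.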
Here are my R versions of code originally written in APL and converted to J. We call this pseudo-Julian because it is only intended for dates after October 15, 1582, which is when the calendar reform, in some parts of the Western world, arbitrarily changed the date.
#* toJulian: convert 3-element c(Y,M,D) timestamp into pseudo-Julian day number.
toJulian<- function(TS3)
{ mm<- TS3[2]
xx<- 0
if( mm<=2) {xx<- 1}
mm<- (12*xx)+mm
yy<- TS3[1]-xx
nc<- floor(0.01*yy)
jd<- floor(365.25*yy)+floor(30.6001*(1+mm))+TS3[3]+1720995+(2-(nc-floor(0.25*nc)))
return(jd)
#EG toJulian c(1959,5,24) -> 2436713
#EG toJulian c(1992,12,16) -> 2448973
}
Here's the inverse function:
#* toGregorian: convert pseudo-Julian day number to timestamp in form c(Y,M,D)
# (>15 Oct 1582). Adapted from "Numerical Recipes in C" by Press,
# Teukolsky, et al.
toGregorian<- function(jdn)
{ igreg<- 2299161 # Gregorian calendar conversion day c(1582,10,15).
ja<- floor(jdn)
xx<- 0
if(igreg<=ja){xx<- 1}
jalpha<- floor((floor((xx*ja)-1867216)-0.25)/36524.25)
ja<- ((1-xx)*ja) + ((xx*ja)+1+jalpha-floor(0.25*jalpha))
jb<- ja+1524
jc<- floor(6680+((jb-2439870)-122.1)/365.25)
jd<- floor(365.25*jc)
je<- floor((jb-jd)/30.6001)
id<- floor((jb-jd)-floor(30.6001*je))
mm<- floor(je-1)
if(12<mm){mm<- mm-12}
iyyy<- floor(jc-4715)
if(mm>2){iyyy<- iyyy-1}
if(0>iyyy){iyyy<- iyyy-1}
gd<- c(iyyy, mm, id)
return(gd)
#EG toGregorian 2436713 -> c(1959,5,24)
#EG toGregorian 2448973 -> c(1992,12,16)
}
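As a cross-check on the examples in the comments above, the inverse conversion can also be sketched with base R's Date class, which uses the Gregorian calendar throughout. The function name jdn_to_ymd is hypothetical, and the constant 2440588 is the Julian Day Number of 1970-01-01.

```r
# Hypothetical helper: Julian Day Number -> c(Y, M, D), Gregorian calendar only.
jdn_to_ymd <- function(jdn) {
  # Shift to days since 1970-01-01, which is how R stores Date values.
  d <- as.Date(jdn - 2440588, origin = "1970-01-01")
  as.integer(c(format(d, "%Y"), format(d, "%m"), format(d, "%d")))
}

print(jdn_to_ymd(2436713))
print(jdn_to_ymd(2448973))
```

Like the functions above, this is only meant for dates after the Gregorian reform; it handles a single day number per call.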