I have a column of dates in the format:
16Jun10
and I would like to extract the Julian day.
I have various years.
I have tried the functions julian and mdy.date, but they don't seem to work.
Try the following to convert from class character (i.e. text) to class POSIXlt, and then extract the Julian day (yday, which counts from 0):
tmp <- as.POSIXlt("16Jun10", format = "%d%b%y")
tmp$yday
# [1] 166
For more details on function settings:
?POSIXlt
?DateTimeClasses
Another option is to use the Date class and then use format to extract the Julian day (note that the "%j" format gives days in the range 1:366, while yday in POSIXlt is 0:365):
tmp <- as.Date("16Jun10", format = "%d%b%y")
format(tmp, "%j")
# [1] "167"
Similarly:
require(lubridate)
x = as.Date('2010-06-10')
yday(x)
# [1] 161
Also note, using lubridate:
> dmy('16Jun10')
[1] "2010-06-16 UTC"
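For a whole column covering several years, the same conversion can be applied to a vector. This is a minimal sketch, assuming a hypothetical character vector called dates and an English locale (the month abbreviations matched by %b are locale-dependent):

```r
# Hypothetical input: dates stored as text in ddMonyy form, several years
dates <- c("16Jun10", "01Jan11", "31Dec12")

# Parse to Date; %b matches abbreviated month names in the current locale
d <- as.Date(dates, format = "%d%b%y")

# Day of year, 1-based (1..366), as a number rather than a string
doy <- as.integer(format(d, "%j"))

# The same information from POSIXlt, which counts from 0 (0..365)
doy0 <- as.POSIXlt(d)$yday

data.frame(date = d, doy = doy, doy0 = doy0)
```

The two columns differ by exactly one for every row, which is the 1:366 versus 0:365 difference mentioned above.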
You can use R's insol package, which has a JD(x, inverse=FALSE) function that converts POSIXct to Julian Day Number (JDN).
The insol package also has JDymd(year,month,day,hour=12,minute=0,sec=0) for custom dates.
To display the whole Julian Date (JD), you may have to set options(digits=16).
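If installing insol is not an option, a Julian Date can also be derived in base R from the fact that 1970-01-01 00:00 UTC corresponds to JD 2440587.5. The sketch below does not use insol, assumes the timestamp is in UTC, and ignores leap seconds; the variable names are only illustrative:

```r
# Assumption: the timestamp is expressed in UTC
t <- as.POSIXct("2010-06-16 18:00:00", tz = "UTC")

# Seconds since 1970-01-01 00:00 UTC, converted to days,
# then shifted by the Julian Date of that epoch
jd <- as.numeric(t) / 86400 + 2440587.5

# Request enough significant digits to see the fractional day
print(jd, digits = 16)
```

Passing digits to print() affects only this call, whereas options(digits=16) changes the setting for the whole session.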
my.data = read.table(text = "
OBS MONTH1 DAY1 YEAR1
1 3 1 2012
2 3 31 2012
3 4 1 2012
4 4 30 2012
5 5 1 2012
6 5 31 2012
7 6 1 2012
8 6 30 2012
9 7 1 2012
10 7 31 2012
", header = TRUE, stringsAsFactors = FALSE)
my.data$MY.DATE1 <- do.call(paste, list(my.data$MONTH1, my.data$DAY1, my.data$YEAR1))
my.data$MY.DATE1 <- as.Date(my.data$MY.DATE1, format=c("%m %d %Y"))
my.data$my.julian.date <- as.numeric(format(my.data$MY.DATE1, "%j"))
my.data
This returns the table below. Calling these values Julian dates is technically incorrect, since Julian dates do not return to 1 on the first day of each January:
http://en.wikipedia.org/wiki/Julian_day
The values below are ordinal dates (day of the year):
OBS MONTH1 DAY1 YEAR1 MY.DATE1 my.julian.date
1 1 3 1 2012 2012-03-01 61
2 2 3 31 2012 2012-03-31 91
3 3 4 1 2012 2012-04-01 92
4 4 4 30 2012 2012-04-30 121
5 5 5 1 2012 2012-05-01 122
6 6 5 31 2012 2012-05-31 152
7 7 6 1 2012 2012-06-01 153
8 8 6 30 2012 2012-06-30 182
9 9 7 1 2012 2012-07-01 183
10 10 7 31 2012 2012-07-31 213
Here are my R versions of code originally written in APL and converted to J. We call this pseudo-Julian because it is only intended for dates after October 15, 1582, which is when calendar reform, in some parts of the Western world, arbitrarily changed the date.
#* toJulian: convert 3-element c(Y,M,D) timestamp into pseudo-Julian day number.
toJulian<- function(TS3)
{ mm<- TS3[2]
xx<- 0
if( mm<=2) {xx<- 1}
mm<- (12*xx)+mm
yy<- TS3[1]-xx
nc<- floor(0.01*yy)
jd<- floor(365.25*yy)+floor(30.6001*(1+mm))+TS3[3]+1720995+(2-(nc-floor(0.25*nc)))
return(jd)
#EG toJulian c(1959,5,24) -> 2436713
#EG toJulian c(1992,12,16) -> 2448973
}
Here's the inverse function:
#* toGregorian: convert pseudo-Julian day number to timestamp in form c(Y,M,D)
# (>15 Oct 1582). Adapted from "Numerical Recipes in C" by Press,
# Teukolsky, et al.
toGregorian<- function(jdn)
{ igreg<- 2299161 # Gregorian calendar conversion day c(1582,10,15).
ja<- floor(jdn)
xx<- 0
if(igreg<=ja){xx<- 1}
jalpha<- floor((floor((xx*ja)-1867216)-0.25)/36524.25)
ja<- ((1-xx)*ja) + ((xx*ja)+1+jalpha-floor(0.25*jalpha))
jb<- ja+1524
jc<- floor(6680+((jb-2439870)-122.1)/365.25)
jd<- floor(365.25*jc)
je<- floor((jb-jd)/30.6001)
id<- floor((jb-jd)-floor(30.6001*je))
mm<- floor(je-1)
if(12<mm){mm<- mm-12}
iyyy<- floor(jc-4715)
if(mm>2){iyyy<- iyyy-1}
if(0>iyyy){iyyy<- iyyy-1}
gd<- c(iyyy, mm, id)
return(gd)
#EG toGregorian 2436713 -> c(1959,5,24)
#EG toGregorian 2448973 -> c(1992,12,16)
}
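As a cross-check on the two #EG examples, the same day numbers can be reproduced with base R's Date class, which counts days from 1970-01-01 (day number 2440588 at noon). This is a sketch with hypothetical helper names, and like the functions above it is only meant for Gregorian calendar dates:

```r
# Day number (at noon) for a Gregorian calendar date, base R only
jdn_from_date <- function(d) as.numeric(as.Date(d)) + 2440588

# Inverse: day number back to a Date
date_from_jdn <- function(jdn) as.Date(jdn - 2440588, origin = "1970-01-01")

jdn_from_date(c("1959-05-24", "1992-12-16"))
date_from_jdn(c(2436713, 2448973))
```

Both helpers are vectorised, so they can be applied directly to a whole column of dates.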
I have a daily revenue time series df from 01-01-2014 to 15-06-2017, and I want to aggregate the daily revenue to weekly revenue and make weekly predictions. Before I aggregate the revenue, I need to create a continuous week variable, which will NOT start from week 1 again when a new year starts. Since 01-01-2014 was not a Monday, I decided to start my first week from 06-01-2014.
My df now looks like this
date year month total
7 2014-01-06 2014 1 1857679.4
8 2014-01-07 2014 1 1735488.0
9 2014-01-08 2014 1 1477269.9
10 2014-01-09 2014 1 1329882.9
11 2014-01-10 2014 1 1195215.7
...
709 2017-06-14 2017 6 1677476.9
710 2017-06-15 2017 6 1533083.4
I want to create a unique week variable starting from 2014-01-06 until the last row of my dataset (1257 rows in total), which is 2017-06-15.
I wrote a loop:
week = c()
for (i in 1:179) {
week = rep(i,7)
print(week)
}
However, the result of this loop is not saved across iterations. When I type week, it just shows 179,179,179,179,179,179,179.
Where is the problem, and how can I add 180, 180, 180, 180 after the repeat loop?
And if I add more data after 2017-06-15, how can I create the week variable automatically based on the date in the last row? (In other words, I would not need to count my daily observations, divide by 7, and handle the leftover dates by hand to build the week index.)
Thank you!
Does this work?
library(lubridate)
#DATA
x = data.frame(date = seq.Date(from = ymd("2014-01-06"),
to = ymd("2017-06-15"), length.out = 15))
#Add year and week for each date
x$week = year(x$date) + week(x$date)/100
#Convert the addition of year and week to factor and then to numeric
x$week_variable = as.numeric(as.factor(x$week))
#Another alternative
x$week_variable2 = floor(as.numeric(x$date - min(x$date))/7) + 1
x
# date week week_variable week_variable2
#1 2014-01-06 2014.01 1 1
#2 2014-04-05 2014.14 2 13
#3 2014-07-04 2014.27 3 26
#4 2014-10-02 2014.40 4 39
#5 2014-12-30 2014.52 5 52
#6 2015-03-30 2015.13 6 65
#7 2015-06-28 2015.26 7 77
#8 2015-09-26 2015.39 8 90
#9 2015-12-24 2015.52 9 103
#10 2016-03-23 2016.12 10 116
#11 2016-06-21 2016.25 11 129
#12 2016-09-18 2016.38 12 141
#13 2016-12-17 2016.51 13 154
#14 2017-03-17 2017.11 14 167
#15 2017-06-15 2017.24 15 180
Here is the answer:
week = c()
for (i in 1:184) {
for (j in 1:7) {
week[j+(i-1)*7] = i
}
}
week = as.data.frame(week)
I created a week variable running from week 1 to week 184 (the end of my dataset). Each week number is repeated 7 times because there are 7 days in a week. I then assigned the week variable to my data frame.
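The loop in the question keeps only the last week because week = rep(i,7) overwrites the whole vector on every iteration instead of appending to it. The index can also be computed without a loop, directly from the dates, so it grows automatically when new rows are added. A minimal sketch, using a hypothetical data frame with one row per day and a placeholder total column:

```r
# Hypothetical stand-in for the daily revenue data: one row per day
dates <- seq(as.Date("2014-01-06"), as.Date("2017-06-15"), by = "day")
df <- data.frame(date = dates, total = 1)  # 'total' is placeholder data

# Days elapsed since the first date, integer-divided into blocks of 7
df$week <- as.numeric(df$date - min(df$date)) %/% 7 + 1

# Equivalent when there is exactly one row per day with no gaps
week_rep <- rep(seq_len(ceiling(nrow(df) / 7)), each = 7, length.out = nrow(df))

# Aggregate the daily values to one row per week
weekly <- aggregate(total ~ week, data = df, FUN = sum)
tail(weekly, 2)
```

The date-based version is the safer of the two, since it still assigns the right week when some days are missing from the data.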
I am a beginner in R and I am trying to convert sets of calendar dates to sets of Julian dates in a data frame. I know similar questions have been answered, but I have not been able to get what I want.
df <- data.frame(Date = c('2010-06-20','2005-10-19','2000-05-01','2003-04-04','2010-11-20','2009-09-14'), No = c(1, 4, 6, 11, 7, 9))
df$jDate <- as.POSIXct(as.numeric(df$Date), origin = '1970-01-01')
gives me
df
Date No jDate
1 2010-06-20 1 1969-12-31 19:00:05
2 2005-10-19 4 1969-12-31 19:00:03
3 2000-05-01 6 1969-12-31 19:00:01
4 2003-04-04 11 1969-12-31 19:00:02
5 2010-11-20 7 1969-12-31 19:00:06
6 2009-09-14 9 1969-12-31 19:00:04
How could I get a column with Julian days in the column 'jDate'?
Thank you for your help.
You can do
df$Date <- as.Date(df$Date)
to get the date, and then
df$jDate <- format(df$Date, "%j")
to get the day of the year (often loosely called the Julian day), or
df$jDateYr <- format(df$Date, "%Y-%j")
to prepend the year (if you want). This returns
df
Date No jDate jDateYr
1 2010-06-20 1 171 2010-171
2 2005-10-19 4 292 2005-292
3 2000-05-01 6 122 2000-122
4 2003-04-04 11 094 2003-094
5 2010-11-20 7 324 2010-324
6 2009-09-14 9 257 2009-257
To read more about the possible date-time formats, see ?strptime.
Based on aosmith's comments, I did this and got what I wanted.
> df$jDate <- julian(as.Date(df$Date), origin = as.Date('1970-01-01'))
df
Date No jDate
1 2010-06-20 1 14780
2 2005-10-19 4 13075
3 2000-05-01 6 11078
4 2003-04-04 11 12146
5 2010-11-20 7 14933
6 2009-09-14 9 14501
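The two answers return different quantities: format(x, "%j") gives the day within the year as text, while julian() gives the number of days since an origin. A short sketch on two of the dates from the question shows both side by side (the column names doy and days_since_1970 are only illustrative):

```r
df <- data.frame(Date = c("2010-06-20", "2003-04-04"), No = c(1, 11))
df$Date <- as.Date(df$Date)

# Day of the year as an integer (drops the leading zero of "094")
df$doy <- as.integer(format(df$Date, "%j"))

# Days since 1970-01-01; as.numeric() drops the "origin" attribute
df$days_since_1970 <- as.numeric(julian(df$Date))

df
```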
I have a vector of dates like this:
1 2014-03-10 22:54:24
2 2014-03-10 22:53:16
3 2014-03-10 22:53:01
4 2014-03-10 22:52:38
5 2014-03-10 22:52:00
6 2014-03-01 01:13:08
7 2014-03-01 01:11:30
8 2014-03-01 01:07:41
9 2014-03-01 01:05:28
10 2014-03-01 00:58:40
11 2014-03-27 18:11:57
How can I group by month, day, morning, afternoon or week? For instance:
month sum
2014-3 11
==============
week sum
2014-3-1 5
2014-3-9 5
==============
2014-3-1
morning sum
2014-3-1 5
Use the package data.table and get to know the class POSIXlt.
# x is assumed to be your vector of time objects (POSIXct or POSIXlt).
# The following lines are just for getting to know POSIXlt. You do not need to run these.
Secs <- as.POSIXlt(x)[[1]]
Mins <- as.POSIXlt(x)[[2]]
# ...
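Month <- as.POSIXlt(x)[[5]] + 1 # element 5 is mon; months start at 0 instead of 1
Year <- as.POSIXlt(x)[[6]] - 100 # element 6 is year, stored as years since 1900 (116 for 2016), so this gives 16
DayOfYear <- as.POSIXlt(x)[[8]] + 1 # element 8 is yday, which starts at 0 (element 9 is isdst)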
You can calculate more complicated values similarly. Use data.table now.
require(data.table)
X <- as.data.table(x) # creates a data.table object
setnames(X, "Time") # names the 1 column 'Time'
X[ , month := as.POSIXlt(Time)[[5]] + 1] #adds a column month
X[ , doy:= as.POSIXlt(Time)[[8]] + 1] #adds a column day of year
#....
Now you can group your data.table with:
X[ , .N, by = doy]
X[ , .N, by = month]
# ...
.N returns the number of items in each group. You could also combine the grouping:
X[ , .N, by = list(doy, month)]
There are many nice tutorials using data.tables and the grouping and evaluation is similar to sql syntax (which can also be found in tutorials).
A good link to start is the FAQ of the developer:
http://datatable.r-forge.r-project.org/datatable-faq.pdf
EDIT:
Of course you could also make more complicated columns for afternoon and morning like this:
X[ , afternoon := as.POSIXlt(Time)[[3]] >= 12] # element 3 is the hour; 12:00 onwards counts as afternoon
Assuming you have a data frame like this where time is in POSIXct format:
df
time
1 2014-03-10 22:54:24
2 2014-03-10 22:53:16
3 2014-03-10 22:53:01
4 2014-03-10 22:52:38
5 2014-03-10 22:52:00
6 2014-03-01 01:13:08
7 2014-03-01 01:11:30
8 2014-03-01 01:07:41
9 2014-03-01 01:05:28
10 2014-03-01 00:58:40
11 2014-03-27 18:11:57
You can get month, week and am/pm as follows:
df$month <- format(df$time, '%Y-%m')
df$week <- format(df$time, '%Y-%U')
df$ampm <- ifelse(as.numeric(format(df$time, '%H')) >= 12, 'pm', 'am')
df
time month week ampm
1 2014-03-10 22:54:24 2014-03 2014-10 pm
2 2014-03-10 22:53:16 2014-03 2014-10 pm
3 2014-03-10 22:53:01 2014-03 2014-10 pm
4 2014-03-10 22:52:38 2014-03 2014-10 pm
5 2014-03-10 22:52:00 2014-03 2014-10 pm
6 2014-03-01 01:13:08 2014-03 2014-08 am
7 2014-03-01 01:11:30 2014-03 2014-08 am
8 2014-03-01 01:07:41 2014-03 2014-08 am
9 2014-03-01 01:05:28 2014-03 2014-08 am
10 2014-03-01 00:58:40 2014-03 2014-08 am
11 2014-03-27 18:11:57 2014-03 2014-12 pm
Then, you can get your summaries using library dplyr like this:
library(dplyr)
count(df, month)
Source: local data frame [1 x 2]
month n
(chr) (int)
1 2014-03 11
count(df, week)
Source: local data frame [3 x 2]
week n
(chr) (int)
1 2014-08 5
2 2014-10 5
3 2014-12 1
count(df, ampm)
Source: local data frame [2 x 2]
ampm n
(chr) (int)
1 am 5
2 pm 6
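The same counts can be obtained without extra packages by passing the formatted labels to table(). This is a base R sketch on the timestamps from the question, assuming they are in UTC; the vector names are only illustrative:

```r
# Timestamps from the question; the time zone is an assumption
time <- as.POSIXct(c(
  "2014-03-10 22:54:24", "2014-03-10 22:53:16", "2014-03-10 22:53:01",
  "2014-03-10 22:52:38", "2014-03-10 22:52:00", "2014-03-01 01:13:08",
  "2014-03-01 01:11:30", "2014-03-01 01:07:41", "2014-03-01 01:05:28",
  "2014-03-01 00:58:40", "2014-03-27 18:11:57"), tz = "UTC")

month <- format(time, "%Y-%m")                 # year and month
week  <- format(time, "%Y-%U")                 # year and week (weeks start on Sunday)
ampm  <- ifelse(as.integer(format(time, "%H")) >= 12, "pm", "am")

table(month)
table(week)
table(ampm)
```

Passing two vectors, for example table(week, ampm), gives the cross-tabulated counts in one call.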
I've got some data that looks about like so:
demo <- read.table(text = "
date num
'12/31/2010' 35
'04/01/2013' 34
'06/02/2015' 34
'06/15/2015' 34
'01/30/2015' 33
'04/15/2014' 33
'05/28/2014' 33
'06/02/2014' 33
'06/17/2015' 33
'06/25/2015' 33
'06/24/2015' 32
'07/31/2013' 32
'08/31/2013' 32
'04/27/2015' 31
'05/07/2015' 31
'12/30/2013' 31
'11/21/2014' 30
'12/20/2013' 30
",header = TRUE, sep = "")
How do I group and count these by year?
2010 1
2013 5
etc.
I can use plyr to count each date: count(demo, vars = 'date'), but not group them.
I'd convert the dates to a date format first, rather than treating them as strings.
library(lubridate)
# Convert string to date format
demo$date <- as.Date(demo$date, "%m/%d/%Y")
# Table of counts by year
table(year(demo$date))
# 2010 2013 2014 2015
# 1 5 4 8
I like data.table for this. First we need to convert to "Date" class in the date column, then find the number of observations by year.
library(data.table)
demo$date <- as.Date(demo$date, "%m/%d/%Y")
as.data.table(demo)[, .N, keyby = year(date)]
# year N
# 1: 2010 1
# 2: 2013 5
# 3: 2014 4
# 4: 2015 8
We use keyby here so we get a nice ordered result. Alternatively, and to change your entire table to a data.table, you can use setDT() instead of as.data.table(). This is the preferred method.
setDT(demo)[, .N, keyby = year(date)]
table(substr(demo$date, 7,10))
2010 2013 2014 2015
1 5 4 8
substr allows you to isolate the year, and table tallies the counts. Note that this relies on date still being in its original mm/dd/yyyy text form.
demo$date <- as.Date(demo$date, format = "%m/%d/%Y")
demo$year <- format(demo$date, format = "%Y")
aggregate(num ~ year, demo, FUN = length)
## year num
## 1 2010 1
## 2 2013 5
## 3 2014 4
## 4 2015 8
Dates can be parsed and reformatted using the Date and POSIXct classes. This allows you to handle dates that look like '1/1/2010'.
dates <- as.Date(demo$date, format = "%m/%d/%Y")
head(dates)
# [1] "2010-12-31" "2013-04-01" "2015-06-02" "2015-06-15" "2015-01-30"
# [6] "2014-04-15"
table(format(dates, format = "%Y"))
#
# 2010 2013 2014 2015
# 1 5 4 8
I cleared one hurdle, with some help from SO and thought the next hurdle would be easier. What I really have is start and end dates in a data frame:
require(lubridate)
demo <- read.table(text = "
start end num
2010-12-31 <NA> 35
2013-04-01 <NA> 34
2015-06-02 <NA> 34
2015-06-15 2012-12-31 34
2015-01-30 2011-12-31 33
2014-04-15 2013-12-31 33
2014-05-28 2013-12-31 33
2014-06-02 <NA> 33
2015-06-17 <NA> 33
2015-06-25 <NA> 33
2015-06-24 <NA> 32
2013-07-31 <NA> 32
2013-08-31 <NA> 32
2015-04-27 <NA> 31
2015-05-07 <NA> 31
2013-12-30 <NA> 31
2014-11-21 <NA> 30
2013-12-20 2013-06-30 30
",header = TRUE, sep = "")
demo$start <- as.Date(demo$start, '%Y-%m-%d')
demo$end <- as.Date(demo$end, '%Y-%m-%d')
I can get a table of start years, or a table of end years, with table(year(demo$end)) or table(year(demo$start)) which is a lovely start. But what I really want to know is something more like: for each year, how many entries that started have not yet ended? So count is.na() for each start year.
I thought I could use aggregate() for that:
aggregate(is.na(end) ~ year(start), demo, FUN = length)
But that seems to be counting every observation, not just the observations for which the end date is NA.
You can use table with multiple arguments to give you 2-way or multi-way tables:
> with(demo, table( year=format(demo$start, "%Y"), Not.missing = !is.na(end) ) )
Not.missing
year FALSE TRUE
2010 1 0
2013 4 1
2014 2 2
2015 6 2
You could also use lubridate::year instead of the format call.
If you need to find the number of NA values for each year, use sum, since is.na(end) is a logical vector. length gives the total number of elements per year rather than the number of TRUE values.
aggregate(cbind(end=is.na(end)) ~ cbind(year=year(start)), demo, FUN = sum)
# year end
#1 2010 1
#2 2013 4
#3 2014 2
#4 2015 6
Or we can use data.table. We convert the data.frame to a data.table with setDT(demo), use is.na(end) as the row index i, group by the year of the start column, and take .N, the number of rows in each group.
library(data.table)
setDT(demo)[is.na(end), list(end = .N) , list(year=year(start))]
# year end
#1: 2010 1
#2: 2013 4
#3: 2015 6
#4: 2014 2
Here is another option:
library(dplyr)
library(lubridate)
demo %>% subset(is.na(end)) %>% group_by(year(start)) %>% summarise(n=length(end))
#Source: local data frame [4 x 2]
#
# year(start) n
#1 2010 1
#2 2013 4
#3 2014 2
#4 2015 6
This is pretty straightforward. With your original data (demo), subset to only get the NA in your end column. Afterwards (and using year() from the lubridate package), group by each year, and get the summary of the number of NAs present in the end column. This will return a data.frame object.
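For comparison, the same count can be done in base R with tapply, which also makes the difference between length and sum on a logical vector easy to see. The sketch below uses a small hypothetical subset (demo_small) rather than the full data from the question:

```r
# Hypothetical subset of the question's data
demo_small <- data.frame(
  start = as.Date(c("2010-12-31", "2013-04-01", "2015-06-02", "2015-06-15",
                    "2014-06-02", "2014-04-15", "2015-06-17")),
  end   = as.Date(c(NA, NA, NA, "2012-12-31", NA, "2013-12-31", NA))
)

yr <- format(demo_small$start, "%Y")

# sum() counts the TRUE values, i.e. entries with no end date
open_by_year <- tapply(is.na(demo_small$end), yr, sum)

# length() counts every entry, whether or not the end date is missing
all_by_year <- tapply(is.na(demo_small$end), yr, length)

open_by_year
all_by_year
```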