Convert Julian date to calendar dates within a data frame - r

I have a data frame
> df
Age year sex
12 80210 F
13 9123 M
I want to convert the year value 80210 to 26june1982. How can I do this so that the new data frame contains the year in day-month-year format, converted from Julian days?

You can convert Julian dates to dates using as.Date and specifying the appropriate origin:
as.Date(8210, origin=as.Date("1960-01-01"))
#[1] "1982-06-24"
However, 80210 needs an origin pretty long ago.
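Applied to the whole column, the same idea could look like the sketch below. It assumes the values count days since 1960-01-01 (the origin used in the answer above, not something the question confirms) and uses 8210 rather than 80210 for the first row. The column names `date` and `dmy` are made up for illustration.

```r
# Sample data from the question, with 8210 as used in the answer above.
df <- data.frame(Age = c(12, 13), year = c(8210, 9123), sex = c("F", "M"))

# Assumption: the values count days since 1960-01-01.
df$date <- as.Date(df$year, origin = "1960-01-01")

# Day-month-year text; a numeric month avoids locale-dependent month names.
df$dmy <- format(df$date, "%d-%m-%Y")
df
```

If a different origin applies to your data, only the `origin` argument needs to change.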

You should subtract the origin from the year column.
as.Date(c(80210,9123)-80210,origin='1982-06-26')
[1] "1982-06-26" "1787-11-08"

There are some options for doing this job in the R package date.
See for example on page 4, the function date.mmddyy, which says:
Given a vector of Julian dates, this returns them in the form “10/11/89”, “28/7/54”, etc.
Try this code:
age = c(12,13)
year = c(8210,9123)
sex = c("F","M")
df = data.frame(age, year, sex) # no cbind(): it would coerce every column to character
library(date)
date = date.mmddyy(year, sep = "/")
df2 = transform(df,year=date) #hint provided by jilber
df2
age year sex
1 12 6/24/82 F
2 13 12/23/84 M

Related

Detect UK holidays in date data

I am working in R. I have 20 years of data and I would like to check whether each given date is a UK holiday, creating a categorical variable (TRUE/FALSE).
I used this code:
library(timeDate)
c <- timeDate(data$Date)
b <- isHoliday(c, holidays = GBBankHoliday(), wday = 1:6)
or
b <- isHoliday(c, holidays = holidayLONDON(), wday = 1:6)
but it detects only Sundays (not Christmas or other holidays).
Does anyone have an idea what to do?
You can try creating wrapper functions for various holidays in the package, and extracting the dates for the holidays, and cross-referencing those dates for your analysis:
library(timeDate)
c <- timeDate(data$Date)
b <- isHoliday(c, holidays = GBBankHoliday(), wday = 1:6)
years <- list(Year = c(2019,2018,2017,2016))
year_fun <- function(year){timeDate::.easter(year)}
purrr::map(years, year_fun)
$Year
GMT
[1] [2019-04-21] [2018-04-01] [2017-04-16] [2016-03-27]
I created a new binary variable in my data called "holidays". It is TRUE if the date is a UK holiday and FALSE otherwise (stored as a factor with those two levels). The code is very simple:
library(timeDate)
data$holidays<-as.factor(data$Date %in% (as.Date(holidayLONDON(1990:2010))))
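A likely reason the original isHoliday() call flags only Sundays is that wday = 1:6 treats Monday to Saturday as working days, and that the holiday functions return a single year when called without a year argument. That reading is an assumption based on the timeDate documentation, not something confirmed in the thread. The lookup itself can be sketched in base R as below. The holiday vector is a hand-typed subset for 2019, used only so the example runs without extra packages. In real use it would come from as.Date(holidayLONDON(years)) as in the answer above.

```r
# Illustrative data: dates stored as character, as they often are after read.csv().
data <- data.frame(Date = c("2019-12-24", "2019-12-25", "2019-12-26", "2019-12-29"),
                   stringsAsFactors = FALSE)

# Hand-typed subset of UK bank holidays for 2019 (assumption: a stand-in for
# as.Date(holidayLONDON(years)) covering every year present in the data).
uk_holidays <- as.Date(c("2019-01-01", "2019-12-25", "2019-12-26"))

# Convert first so that %in% compares Date with Date.
data$Date <- as.Date(data$Date)
data$holiday <- data$Date %in% uk_holidays
data
```

Keeping the result as a plain logical column, rather than wrapping it in as.factor(), makes it easier to count holidays with sum(data$holiday).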

Convert irregular 6-hourly data to daily accumulated values using R

I have the following data:
Date,Rain
1979_8_9_0,8.775
1979_8_9_6,8.775
1979_8_9_12,8.775
1979_8_9_18,8.775
1979_8_10_0,0
1979_8_10_6,0
1979_8_10_12,0
1979_8_10_18,0
1979_8_11_0,8.025
1979_8_12_12,0
1979_8_12_18,0
1979_8_13_0,8.025
The data is six-hourly, but some dates have incomplete records. For example, August 11, 1979 has only one value, at 00H. I would like to get the daily accumulated rainfall from this kind of data using R. Any suggestions on how to do this easily in R?
I'll appreciate any help.
You can transform your data to dates very easily with:
dat$Date <- as.Date(strptime(dat$Date, '%Y_%m_%d_%H'))
After that you should aggregate with:
aggregate(Rain ~ Date, dat, sum)
The result:
Date Rain
1 1979-08-09 35.100
2 1979-08-10 0.000
3 1979-08-11 8.025
4 1979-08-12 0.000
5 1979-08-13 8.025
Based on the comment of Henrik, you can also transform to dates with:
dat$Date <- as.Date(dat$Date, '%Y_%m_%d')
# split the "date" variable into new, separate variable
splitDate <- stringr::str_split_fixed(string = df$Date, pattern = "_", n = 4)
df$Day <- splitDate[,3]
# split data by Day, loop over each split and add rain variable
unlist(lapply(split(df$Rain, df$Day), sum))
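Because some days have fewer than four readings, it may help to keep the number of observations next to each daily total, so incomplete days can be spotted later. The following is a self-contained sketch using the sample data from the question. The column names `Day` and `n_obs` are invented for illustration.

```r
# Sample data from the question.
dat <- read.csv(text = "Date,Rain
1979_8_9_0,8.775
1979_8_9_6,8.775
1979_8_9_12,8.775
1979_8_9_18,8.775
1979_8_10_0,0
1979_8_10_6,0
1979_8_10_12,0
1979_8_10_18,0
1979_8_11_0,8.025
1979_8_12_12,0
1979_8_12_18,0
1979_8_13_0,8.025", stringsAsFactors = FALSE)

# Parse year, month and day; the trailing hour field is ignored by the format.
dat$Day <- as.Date(dat$Date, format = "%Y_%m_%d")

# Daily totals, plus the number of 6-hourly readings behind each total.
daily  <- aggregate(Rain ~ Day, data = dat, FUN = sum)
counts <- aggregate(Rain ~ Day, data = dat, FUN = length)
daily$n_obs <- as.integer(counts$Rain)
daily
```

Days where `n_obs` is below 4 (here August 11, 12 and 13) are the ones with incomplete six-hourly coverage.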

Subset csv data based on Pentad dates using R

I would like to subset the following csv file based on Pentad dates (non overlapping average of dates). For example:
1. January 1 to January 5
2. January 6 to January 10
...
73. December 27 to December 31
Here's the complete list of pentad dates:
List of Pentad dates
The Complete Data
Sample Data
SN,CY,Y,M,D,H,lat,lon,cat
198305,5,1983,8,5,0,9.1,140.7,"TD"
198305,5,1983,8,5,6,9.3,140.5,"TD"
198305,5,1983,8,5,12,9.6,139.9,"TD"
198305,5,1983,8,5,18,9.9,139.4,"TS"
198305,5,1983,8,6,0,10.2,138.8,"TS"
198305,5,1983,8,6,6,11,138.1,"TS"
198305,5,1983,8,6,12,11.8,137.3,"TS"
198305,5,1983,8,6,18,12.4,136.4,"Cat1"
198305,5,1983,8,7,0,12.8,135.8,"Cat1"
198305,5,1983,8,7,6,13.6,134.7,"Cat1"
198305,5,1983,8,7,12,14.4,133.9,"Cat2"
198305,5,1983,8,7,18,15,133.5,"Cat4"
198305,5,1983,8,8,0,15.8,132.8,"Cat4"
198305,5,1983,8,8,6,16.3,132.4,"Cat4"
198305,5,1983,8,8,12,17.1,132,"Cat5"
198305,5,1983,8,8,18,17.4,131.4,"Cat5"
198305,5,1983,8,9,0,17.8,130.8,"Cat5"
198305,5,1983,8,9,6,18.1,130.7,"Cat4"
198305,5,1983,8,9,12,18.7,130.3,"Cat4"
198305,5,1983,8,9,18,18.9,130.4,"Cat4"
SN is a unique identifier, Y is the year, M is the month, D is the day, and H is the hour. If a unique identifier falls in one pentad, it should not be included in the next subset.
I have tried this for August (based on a previous post):
d1 <- c(1,6,11,16,21,26) # first day of each pentad
d2 <- c(5,10,15,20,25,30) # last day of each pentad
res <- Map(function(x,y) subset(df1, M==8 & D >=x & D <= y), d1, d2)
But I'm having a problem mapping pentads that span two months, such as pentad 7 (P7), which runs from January 31 to February 4.
Can anyone suggest a method to do this in R? I'll appreciate any help.
library(stringr)
df$Date = paste(df$Y, str_pad(df$M,2,'left','0'), str_pad(df$D,2,'left','0'), sep='-')
# POSIXlt stores the day of year as yday, counted from 0 (0 to 365), so add 1
df$yday = as.POSIXlt(df$Date)$yday + 1
Now it's trivial:
df$pentad = ceiling(df$yday/5)
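Two details are worth checking before relying on this. In a leap year, day 366 gives pentad 74, and every date after February 29 is shifted by one day relative to the fixed calendar list of pentads. The question also asks that an identifier not be counted again in a later pentad. The sketch below shows one possible way to handle both, using base R only. `pentad_of` is a hypothetical helper, the cap at 73 is an assumption (conventions for leap years differ), and the "first pentad per SN" rule is only one reading of the requirement.

```r
# Hypothetical helper (not from the original answer): pentad number from a date.
pentad_of <- function(date) {
  yday <- as.POSIXlt(date)$yday + 1   # yday is 0-based
  pmin(ceiling(yday / 5), 73)         # assumption: fold day 366 into pentad 73
}

# A few rows in the shape of the sample data (SN, Y, M, D columns).
df <- data.frame(SN = c(198305, 198305, 198305),
                 Y  = c(1983, 1983, 1983),
                 M  = c(8, 8, 8),
                 D  = c(5, 6, 9))
df$Date   <- as.Date(ISOdate(df$Y, df$M, df$D))
df$pentad <- pentad_of(df$Date)

# Assign each SN to the first pentad in which it appears, so that an
# identifier is not counted again in a later pentad.
df$first_pentad <- ave(df$pentad, df$SN, FUN = min)

# One data frame per pentad, named by pentad number.
by_pentad <- split(df, df$first_pentad)
names(by_pentad)
```

Because the pentad is derived from the day of year, pentads that span two months (such as January 31 to February 4) need no special handling.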

How to split columns in R

I have around 10k records. There is a variable named date_time with data in the following format 2013-01-07 10:17:08.
I need to split the column and arrive at a derived variable to identify year and month separately.
We can use the lubridate package to convert the 'date_time' column to POSIXct class and extract the year and month from the output:
library(lubridate)
v1 <- ymd_hms(df1$date_time)
transform(df1, Year= year(v1), Month = month(v1))
# date_time Year Month
#1 2013-01-07 10:17:08 2013 1
#2 2013-01-08 10:18:08 2013 1
data
df1 <- data.frame(date_time=c('2013-01-07 10:17:08',
'2013-01-08 10:18:08'), stringsAsFactors=FALSE)

Split date data (m/d/y) into 3 separate columns

I need to convert a date (m/d/y format) into 3 separate columns on which I hope to run an algorithm (I'm trying to convert my dates into Julian Day Numbers). I saw this suggestion, made to another user, for separating data into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this appropriately. Would A1, A2, ... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetxt, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as strings is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
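Note that julian() counts days from an origin (1970-01-01 by default), which is not the same as the astronomical Julian Day Number. If the latter is what the question means, which is an assumption here, it can be obtained by adding a constant offset: 1970-01-01 corresponds to Julian Day Number 2440588. A minimal sketch follows. The variable names are illustrative.

```r
# Dates in m/d/Y text form, as in the question.
x <- c("10/3/2001", "1/1/2000")
d <- as.Date(x, format = "%m/%d/%Y")

# Days since 1970-01-01, with the "origin" attribute dropped.
days_since_epoch <- as.numeric(julian(d))

# Assumption: the astronomical Julian Day Number is wanted.
# 1970-01-01 is Julian Day Number 2440588, so add that offset.
jdn <- days_since_epoch + 2440588
jdn
```

As a sanity check, 2000-01-01 should come out as 2451545, the commonly quoted Julian Day Number for that date.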
To convert a date (m/d/y format) into 3 separate columns, consider the df:
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day:
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
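One of the first things to do is change the emdate field, which is not yet stored as a date, into a date vector. (The call below assumes emdate holds m/d/Y text; as.character() is added so that it also works if the column was read in as a factor.)
realdate<-as.Date(as.character(emdate),format="%m/%d/%Y")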
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
dfdate <- data.frame(date=realdate)
year=as.numeric(format(realdate,"%Y"))
month=as.numeric(format(realdate,"%m"))
day=as.numeric(format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.

Resources