I have a netCDF file and I am attempting to identify the first date in the file and the 'base date'. The file contains monthly data. My notes, which are fairly old, indicate the first date is January 1, 1948.
The following R code gives the first date as 17067072:
library(ncdf)
library(chron)
my.data <- open.ncdf('my.netCDF.nc')
x = get.var.ncdf(my.data, "lon" )
y = get.var.ncdf(my.data, "lat" )
z = get.var.ncdf(my.data, "time")
z[1:5]
# [1] 17067072 17067816 17068512 17069256 17069976
I downloaded an application called ncdump.exe and after typing the following line in the Windows command window:
C:\Users\Mark W Miller\ncdump>ncdump -h my.netCDF.nc
I learned that the base date is:
time:units = "hours since 1-1-1 00:00:0.0" ;
The same base date is obtained in R using:
att.get.ncdf(my.data,"time","units")$value
[1] "hours since 1-1-1 00:00:0.0"
I tried to verify that with the following R code:
date1 <- as.Date("01/01/0001", "%m/%d/%Y")
date1
# [1] "0001-01-01"
date2 <- as.Date("01/01/1948", "%m/%d/%Y")
date2
# [1] "1948-01-01"
period <- as.Date(date1:date2, origin = "00-01-01")
hours <- 24 * (length(period)-1)
hours
# [1] 17067024
There is a difference of 48 hours between the number in z[1] and the number returned by the R code immediately above:
17067072 - 17067024
[1] 48
Where is my error? Since the netCDF file contains monthly data I doubt the first date is January 3, 1948. The website from which I downloaded the data does not offer the option of selecting the day within month.
The application ncdump.exe can be downloaded from here:
http://www.narccap.ucar.edu/data/ascii-howto.html
If I can figure out how to subset the netCDF file I might upload the smaller file somewhere.
Thank you for any advice.
Have you looked at your period vector? When I looked at the first few and last few values, the years did not make sense. Possibly something is going wrong in one of the conversions.
Also note that some computer programs treat 1900 as a leap year even though it was not; a difference between two programs on that point could account for 24 of the hours in your difference.
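One possible explanation for the full 48 hours, offered as an assumption rather than something confirmed for this particular file: the software that wrote the time axis may count "hours since 1-1-1" on a mixed Julian/Gregorian calendar, whereas R's Date class uses the proleptic Gregorian calendar. The two calendars are two days apart at year 1, which is exactly 48 hours. A minimal sketch that shifts the origin by two days and decodes the values printed in the question:

```r
# Time values copied from the question's output of z[1:5]
z <- c(17067072, 17067816, 17068512, 17069256, 17069976)

# Hours from 0001-01-01 to 1948-01-01 in R's proleptic Gregorian calendar
hours_gregorian <- as.numeric(as.Date("1948-01-01") - as.Date("0001-01-01")) * 24

# Assumption: the file's origin is Julian 0001-01-01, which lies two days
# before 0001-01-01 in the proleptic Gregorian calendar
offset_days <- 2
origin <- as.Date("0001-01-01") - offset_days

# Convert hours to days and add them to the shifted origin
dates <- origin + z / 24
decoded <- format(dates, "%Y-%m-%d")
decoded
```

If the decoded dates fall on the first day of consecutive months, as expected for monthly data, the offset assumption is at least consistent with the file.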
x1 <- read_excel("path",sheet = 1,skip=1,col_names =TRUE, col_types = c("date","date","date","date","date","date","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess"))
View(x1)
I was trying to load an Excel sheet with multiple columns in R and, for some reason, every date in the dataset turns out to be 1899-12-31 and never advances. The first four columns are supposed to be in "date" format. The values should be 2018-01-01, 2018-01-02 and so on. How do I fix this?
For this issue with R and Excel, you can use the following (the answer depends on whether you are using Windows or Mac):
On Windows, for dates (post-1901):
as.Date(43099, origin = "1900-01-01") # 2018-01-01
On Mac, for dates (post-1904):
as.Date(41639, origin = "1904-01-01") # 2018-01-01
Some pertinent information, taken from https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/as.Date
as.Date(32768, origin = "1900-01-01")
## Excel is said to use 1900-01-01 as day 1 (Windows default) or
## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
## incorrectly treating 1900 as a leap year.
## (these values come from http://support.microsoft.com/kb/214330)
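A short sketch of the serial-number conversion discussed above. The serial values are illustrative. The origin "1899-12-30" is an assumption that applies to serials from the Windows 1900 date system for dates after February 1900; it absorbs both the day-1 convention and Excel's fictitious 29 February 1900.

```r
# Serial number as stored by Excel's 1900 date system (Windows default)
serial_1900 <- 43101
d1900 <- as.Date(serial_1900, origin = "1899-12-30")

# Serial number as stored by Excel's 1904 date system (older Mac default),
# where 1904-01-01 is day 0
serial_1904 <- 41639
d1904 <- as.Date(serial_1904, origin = "1904-01-01")

format(c(d1900, d1904))
```

Both values decode to the same calendar day, which is a quick way to check which date system a workbook uses.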
I have an Excel-file with a lot of dates in the format "ww-yyyy". When it is loaded into R it is seen as "42370" (for instance).
I need to extract the week and year for each data point x.
I have tried as.POSIXct, as.POSIXlt, and as.Date(x, format = "%w-%y", origin = "1899-12-30").
as.Date(42370, format = "%w%y") gives the output "2023-11-14", whereas the result should have been week 53 of 2015.
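One way to approach this, sketched under two assumptions: the number is an Excel serial from the 1900 date system, and "week" means the ISO 8601 week. The serial is first turned into a Date, and the week and week-based year are then read off with the `%V` and `%G` output formats (these are output-only formats and may not be available on every platform).

```r
serial <- 42370

# Assumption: Excel 1900 date system, so the origin is 1899-12-30
d <- as.Date(serial, origin = "1899-12-30")

# ISO 8601 week number and the year that week belongs to
iso_week <- as.integer(format(d, "%V"))
iso_year <- as.integer(format(d, "%G"))

# Rebuild the "ww-yyyy" label used in the spreadsheet
label <- sprintf("%02d-%d", iso_week, iso_year)
label
```

The week-based year (`%G`) is used instead of the calendar year (`%Y`) because the first days of January can belong to the last week of the previous year.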
I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.
When the timedeltas are small, I get results that are off by 2 days, e.g.:
> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days
When they are larger, it doesn't work at all:
> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
I also read into 'R' some time delta objects I had processed with 'Python' and had a similar issue with the 26 days 04:53:36.000000000 format. As Gregor said, %d in strptime is the day of the month as a zero padded decimal number so won't work with numbers >31 and there doesn't seem to be an option for cumulative days (probably because strptime is for date time objects and not time delta objects).
My solution was to convert the objects to strings and extract the numerical data as Gregor suggested and I did this using the gsub function.
# convert to strings
data$tdelta <- as.character(data$tdelta)
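# extract numerical data
# (anchor the day count at the start of the string; a leading greedy '.*'
# would swallow every digit except the last one)
days <- as.numeric(gsub('^([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
# keep the fractional part of the seconds, if any
seconds <- as.numeric(gsub('^.*:([0-9.]+)$','\\1',data$tdelta))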
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
That should allow you to do computations with the time differences. Hope that helps.
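The same idea can be wrapped in one function. This is a minimal sketch that assumes non-negative timedeltas written exactly as "N days HH:MM:SS.fffffffff"; the name `parse_timedelta` is hypothetical, and strings that do not match the pattern become NA.

```r
# Parse strings such as "36 days 04:53:36.000000000" into a difftime in seconds
parse_timedelta <- function(x) {
  pattern <- "^([0-9]+) days ([0-9]+):([0-9]+):([0-9.]+)$"
  matches <- regmatches(x, regexec(pattern, x))
  secs <- vapply(matches, function(p) {
    if (length(p) == 0) return(NA_real_)   # string did not match the pattern
    v <- as.numeric(p[-1])                 # days, hours, minutes, seconds
    v[1] * 86400 + v[2] * 3600 + v[3] * 60 + v[4]
  }, numeric(1))
  as.difftime(secs, units = "secs")
}

td <- parse_timedelta(c("26 days 04:53:36.000000000",
                        "36 days 04:53:36.000000000"))
td
```

Because the day count is captured as a plain number, it is not limited to 31 the way `%d` is.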
I have 2 columns with ~ 2000 rows of dates in them. One is a variable with a visit date (df$visitdate), and the other is a birth date of the individual (df$birthday).
Wondering if there is any simple way to subtract the visit date - birth date to create the variable "age at the time of the visit", accounting for leap years, etc.
I tried to use the following code (from an answer in a similar question) but it didn't work in my case.
find number of seconds in one year:
seconds_in_a_year <- as.integer((seconds(ymd("2010-01-01")) - seconds(ymd("2009-01-01"))))
now obtain number of seconds between the 2 dates you desire
seconds_between_dates <- as.integer(seconds(date1) - seconds(date2))
your final answer for number of years in floating points will be
years_between_dates <- seconds_between_dates / seconds_in_a_year
When I tried to apply this to my data frame (note: using variables rather than specific dates, so this may be the cause) I got the following:
seconds_in_a_year <- as.integer((seconds(ymd(df$visitdate)) - seconds(ymd(df$birthday))))
Warning message:
NAs introduced by coercion
Following the code along I got a final output of:
years_between_dates
[1] 1.157407e-05 [2] 1.157407e-05
Any help is greatly appreciated!
Subtracting from a Date object another Date object gives you the time difference in days, e.g.
> dates = as.Date(c("2007-03-01", "2004-05-23"))
>
> dates[1] - dates[2]
Time difference of 1012 days
So, assuming 365 days in a year
> age_time_visit = as.numeric(dates[1] - dates[2]) / 365
> age_time_visit
[1] 2.772603
There are various answers for this scattered around the internet.
I think the one I've typically used was inspired by Professor Ripley:
http://r.789695.n4.nabble.com/Calculate-difference-between-dates-in-years-td835196.html
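age_years <- function(first, second)
{
  lt <- data.frame(first, second)
  # difference between the calendar years of the two dates
  age <- as.numeric(format(lt[,2],format="%Y")) - as.numeric(format(lt[,1],format="%Y"))
  # anniversary of the first date in the year of the second date
  first <- as.Date(paste(format(lt[,2],format="%Y"),"-",format(lt[,1],format="%m-%d"),sep=""))
  # subtract one year where that anniversary has not yet been reached
  age[which(first > lt[,2])] <- age[which(first > lt[,2])] - 1
  age
}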
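A variant of the same completed-years idea, sketched with the POSIXlt components instead of pasted strings. The function name `completed_years` is hypothetical. Comparing month and day directly avoids building an anniversary string, which may not be a valid date when the birthday is 29 February.

```r
# Completed years between two Date vectors, vectorised over both arguments
completed_years <- function(birth, visit) {
  b <- as.POSIXlt(birth)
  v <- as.POSIXlt(visit)
  age <- v$year - b$year
  # TRUE where the birthday has not yet occurred in the visit year
  before <- (v$mon < b$mon) | (v$mon == b$mon & v$mday < b$mday)
  age - as.integer(before)
}

birth <- as.Date(c("1990-06-15", "1990-06-15", "2000-02-29", "2000-02-29"))
visit <- as.Date(c("2020-06-14", "2020-06-15", "2015-02-28", "2015-03-01"))
ages <- completed_years(birth, visit)
ages
```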
There's another approach at https://gist.github.com/mmparker/7254445
Or, if you just want the raw decimal value of years, you can get the number of days and divide by 365.2425.
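Applied to two columns of a data frame, the days-divided-by-365.2425 approach might look like the sketch below. The example rows are made up for illustration, and the column names follow the question (`birthday`, `visitdate`); both columns are assumed to be of class Date already.

```r
# Hypothetical example data with Date columns
df <- data.frame(
  birthday  = as.Date(c("2000-01-01", "1985-07-15")),
  visitdate = as.Date(c("2010-01-01", "2015-09-09"))
)

# Subtracting Date columns gives a difftime in days, row by row
df$age_days <- as.numeric(df$visitdate - df$birthday)

# Average length of a Gregorian year in days
df$age_years <- df$age_days / 365.2425
df
```

The subtraction is already vectorised, so no loop or `Vectorize()` is needed for this version.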
Here is an approach that accounts for leap years (don't know if this has been done before, but suspect it has...).
get.age <- function(from, to) {
  require(lubridate) # for leap_year(...)
  n <- as.integer(to-from)
  n.l <- sum(leap_year(seq(from,to,by=1)))
  n.l/366 + (n+1-n.l)/365
}
get.age(as.Date("2009-01-01"),as.Date("2012-12-31"))
# [1] 4
get.age(as.Date("2012-01-01"),as.Date("2012-01-31")) # 2012 was a leap year
# [1] 0.08469945
get.age(as.Date("2011-01-01"),as.Date("2011-01-31")) # 2011 was not
# [1] 0.08493151
So the basic idea is to create a vector with one element for every day between from and to (inclusive), then for each day account for whether that day is part of a leap year or not. Then we add up the leap year days and the non-leap year days separately and calculate the number of years as:
leap-year-days/366 + non-leap-year-days/365
This works for single dates (vectors of length 1). To enable this for columns of dates, as you asked, we use Vectorize(...).
vget.age <- Vectorize(get.age) # vectorized version
And then a demo:
# example data set
set.seed(1) # for reproducible example
today <- as.Date("2015-09-09")
df <- data.frame(birth.date=today-sample(1000:10000,2000)) # 2000 birthdays
result <- vget.age(df$birth.date,today) # how old are they?
head(result)
# [1] 9.282192 11.909589 16.854795 25.115068 7.706849 24.865753
How to convert between year,month,day and dates in R?
I know one can do this via strings, but I would prefer to avoid converting to strings, partly because there may be a performance hit, and partly because I worry about regionalization issues, where some of the world uses "year-month-day" and some uses "year-day-month".
It looks like ISOdate covers the direction year, month, day -> DateTime, although it first converts the numbers to a string, so if there is a way that doesn't go via a string I would prefer it.
I couldn't find anything that goes the other way, from datetimes to numerical values? I would prefer not needing to use strsplit or things like that.
Edit: just to be clear, what I have is, a data frame which looks like:
year month day hour somevalue
2004 1 1 1 1515353
2004 1 1 2 3513535
....
I want to be able to freely convert to this format:
time(hour units) somevalue
1 1515353
2 3513535
....
... and also be able to go back again.
Edit: to clear up some confusion about what 'time' (hour units) means, here is what I ultimately did, using information from "How to find the difference between two dates in hours in R?":
forwards direction:
lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL
backwards direction:
... well, I didn't do backwards yet, but I imagine something like:
create difftime object out of lh$time (somehow...)
add ISOdate(2004,1,1,0) to difftime object
use one of the solution below to get the year,month,day, hour back
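The backwards steps outlined above could be sketched as follows, assuming the same origin of 2004-01-01 00:00 GMT used in the forwards direction. The `hours` vector stands in for `lh$time` and is made up for illustration.

```r
origin <- ISOdate(2004, 1, 1, 0)   # POSIXct in GMT
hours  <- c(1, 2, 25)              # illustrative values of lh$time

# POSIXct arithmetic is in seconds, so scale the hours before adding
stamps <- origin + hours * 3600

# Break the date-times into components; $year counts from 1900, $mon from 0
lt <- as.POSIXlt(stamps, tz = "GMT")
back <- data.frame(year  = lt$year + 1900,
                   month = lt$mon + 1,
                   day   = lt$mday,
                   hour  = lt$hour)

# Forwards again, as a round-trip check
roundtrip <- as.numeric(difftime(
  ISOdate(back$year, back$month, back$day, back$hour),
  origin, units = "hours"))
back
```

Working in GMT throughout avoids daylight-saving jumps in the hour count.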
I suppose that in future I could ask about the exact problem I'm trying to solve. I was trying to factor my specific problem into generic, reusable questions, but maybe that was a mistake?
Because there are so many ways in which a date can be passed in from files, databases etc and for the reason you mention of just being written in different orders or with different separators, representing the inputted date as a character string is a convenient and useful solution. R doesn't hold the actual dates as strings and you don't need to process them as strings to work with them.
Internally R is using the operating system to do these things in a standard way. You don't need to manipulate strings at all - just perhaps convert some things from character to their numerical equivalent. For example, it is quite easy to wrap up both operations (forwards and backwards) in simple functions you can deploy.
toDate <- function(year, month, day) {
  ISOdate(year, month, day)
}
toNumerics <- function(Date) {
  stopifnot(inherits(Date, c("Date", "POSIXt")))
  day <- as.numeric(strftime(Date, format = "%d"))
  month <- as.numeric(strftime(Date, format = "%m"))
  year <- as.numeric(strftime(Date, format = "%Y"))
  list(year = year, month = month, day = day)
}
I forgo a single call to strptime() and subsequent splitting on a separation character because you don't like that kind of manipulation.
> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004
$month
[1] 12
$day
[1] 21
Internally R's datetime code works well and is well tested and robust if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than having a date time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions etc.
I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:
library(lubridate)
library(nycflights13)
library(tidyverse)
a <- flights %>%
mutate(date = make_date(year, month, day))
Found one solution for going from date to year,month,day.
Let's say we have a date object, that we'll create here using ISOdate:
somedate <- ISOdate(2004,12,21)
Then, we can get the numerical components of this as follows:
unclass(as.POSIXlt(somedate))
Gives:
$sec
[1] 0
$min
[1] 0
$hour
[1] 12
$mday
[1] 21
$mon
[1] 11
$year
[1] 104
Then one can get what one wants for example:
unclass(as.POSIXlt(somedate))$mon
Note that $year is [actual year] - 1900, $mon is 0-based, and $mday is 1-based (as per the POSIX standard).
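To avoid remembering those offsets each time, the adjustment can be wrapped in a small helper. This is only a sketch, and the name `date_parts` is hypothetical.

```r
# Return calendar components as ordinary numbers (no string conversion)
date_parts <- function(x) {
  lt <- as.POSIXlt(x)
  list(year  = lt$year + 1900,  # $year is years since 1900
       month = lt$mon + 1,      # $mon is 0-based
       day   = lt$mday,         # $mday is already 1-based
       hour  = lt$hour)
}

parts <- date_parts(ISOdate(2004, 12, 21))
parts
```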