I am importing some data from NOAA and it has dates and times in separate columns. I searched for an elegant way to append a single datetime column in my R dataframe but was unable. I found a stack exchange question about the inverse but not this one. Is there a simple as.Date command I could run? I am simply using read.table for a downloaded text file and it imports just find.
Buoy data is here:
http://www.ndbc.noaa.gov/data/realtime2/51202.txt
>yr mo dy hr mn degT m.s m.s.1 m sec sec.1 degT.1 hPa degC degC.1 degC.2 nmi hPa.1 ft
>2012 1 16 3 55 MM MM MM 1.4 10 7.2 339 MM MM 23.9 MM MM MM MM
You can use ISOdatetime, which is just a simple wrapper to as.POSIXct. Make sure to specify the sec argument as zero.
Data$timestamp <- with(Data, ISOdatetime(YY,MM,DD,hh,mm,0))
Yep, you want to paste the date time columns together and then coerce that full string to a date time object.
dat <- within(dat, datetime <- as.POSIXlt(paste(yr, mo, dy, hr, mn),
format = "%Y %m %d %H %M"))
assuming dat is the object containing the buoy data. This adds a new columns that is a "POSIXlt" class object or you could use as.POSIXct() if you prefer the other format.
Or, having looked at the file so can use their column names:
dat <- within(dat, datetime <- as.POSIXlt(paste(YY, MM, DD, hh, mm),
format = "%Y %m %d %H %M"))
Related
I have a 24 hour data starting from 7:30 today (for example), until 7:30 the next day, because I didn't link the date to the line plot, R sorts the hour starting from 00:00 despite the data starting at 7:30, I am a beginner in R, and I don't know where to begin to even solve this problem, should I try linking the date also to the X axis, or is there a better solution?
My time function somehow didn't work either, it used to work when I was plotting data for 15 minute increments.
library(chron)
d <- read.csv(file="data.csv", header = T)
t <- times(d$Time)
plot(t,d$MCO2, type="l")
Graph created from the 24 hour data I have :
Graph created from a 15 minute data using the same code :
I wanted the outcome to be from 7:30 to 7:30 the next day, but it showed now a decimal number from 0.0 to 1
Here is the link to the data, just in case:
https://www.dropbox.com/s/wsg437gu00e5t08/Data%20210519.csv?dl=0
The question is actually about combining a date column and a time column to create a timestamp containing date AND time. Note that I suggest to process everything as if we are in GMT timezone. You can pick whatever timezone you want, then stick to it.
# use ggplot
library(ggplot2)
# assume everything happens in GMT timezone
Sys.setenv( TZ = "GMT" )
# replicating the data: a measurement result sampled at 1 sec interval
t <- seq(start, end, by = "1 sec")
Time24 <- trimws(strftime(t, format = "%k:%M:%OS", tz="GMT"))
Date <- strftime(t, format = "%d/%m/%Y", tz="GMT")
head(Time24)
head(Date)
d <- data.frame(Date, Time24)
# this is just a random data of temperature
d$temp <- rnorm(length(d$Date),mean=25,sd=5)
head(d)
# the resulting data is as follows
# Date Time24 temp
#1 22/05/2019 0:00:00 22.67185
#2 22/05/2019 0:00:01 19.91123
#3 22/05/2019 0:00:02 19.57393
#4 22/05/2019 0:00:03 15.37280
#5 22/05/2019 0:00:04 31.76683
#6 22/05/2019 0:00:05 26.75153
# this is the answer to the question
# which is combining the the date and the time column of the data
# note we still assume that this happens in GMT
t <- as.POSIXct(paste(d$Date,d$Time24,sep=" "), format = "%d/%m/%Y %H:%M:%OS", tz="GMT")
# print the data into a plot
png(filename = "test.png", width = 800, height = 600, units = "px", pointsize = 22 )
ggplot(d,aes(x=t,y=temp)) + geom_line() +
scale_x_datetime(date_breaks = "3 hour",
date_labels = "%H:%M\n%d-%b")
The problem is that the function times does not include information about the day. This is a problem since your data spans two days.
The data type you use should be able to include information about the day. Posix is this data type. Also, since Posix is the go-to date-time object in R it is much easier to plot.
Before plotting the data, the time column should have the correct difference in days. When just transforming the column with as.POSIXct, the times of day 2 are read as if it is from day 1. This is why we have to add 24 hours to the correct entries.
After that, it is just a matter of plotting. I added an example of the package of ggplot2 since I prefer these plots.
You might notice that using as.POSIXct will add an incorrect date to your time information. Don't bother about this, you use this date just as a dummy date. You don't use this date itself, you just use it to be able to work with the difference in days.
library(ggplot2)
# Read in your data set
d <- read.csv(file="Data 210519.csv", header = T)
# Read column into R date-time object
t <- as.POSIXct(d$Time24, format = "%H:%M:%OS")
# Add 24 hours to time the time on day 2.
startOfDayTwo <- as.POSIXct("00:00:00", format = "%H:%M:%OS")
endOfDayTwo <- as.POSIXct("07:35:00", format = "%H:%M:%OS")
t[t >= startOfDayTwo & t <= endOfDayTwo] <- t[t >= startOfDayTwo & t <= endOfDayTwo] + 24*60*60
plot(t,d$MCO2, type="l")
# arguably a nicer plot
ggplot(d,aes(x=t,y=MCO2)) + geom_line() +
scale_x_datetime(date_breaks = "2 hour",
date_labels = "%I:%M %p")
I want to calculate the number of months between two dates but before that I have a problem when loading the data in r. In csv sheet the format is mm/dd/yyyy but in R the variable is classified as character.
I tried
data$issue_d <- format(as.Date(data$issue_d), "%m/%d/%Y")
and to convert as date first but it gives the following error
character string is not in a standard unambiguous format
Any suggestion for this?
Example input:
issue_d <- c("Dec,2011","Nov,2014","Apr,2015")
Try below:
# example data
df1 <- data.frame(
issue_d1 = c("Dec,2011","Nov,2014","Apr,2015"),
issue_d2 = c("Nov,2015","Sep,2017","Apr,2018"))
library(zoo)
df1$Months <-
(as.yearmon(df1$issue_d2, "%b,%Y") -
as.yearmon(df1$issue_d1, "%b,%Y")) * 12
df1
# issue_d1 issue_d2 Months
# 1 Dec,2011 Nov,2015 47
# 2 Nov,2014 Sep,2017 34
# 3 Apr,2015 Apr,2018 36
I'm trying to set up a new variable that incorporates the difference (in number of days) between a known date and the end of a given year. Dummy data below:
> Date.event <- as.POSIXct(c("12/2/2000","8/2/2001"), format = "%d/%m/%Y", tz = "Europe/London")
> Year = c(2000,2001)
> Dates.test <- data.frame(Date.event,Year)
> Dates.test
Date.event Year
1 2000-02-12 2000
2 2001-02-08 2001
I've tried applying a function to achieve this, but it returns an error
> Time.dif.fun <- function(x) {
+ as.numeric(as.POSIXct(sprintf('31/12/%s', s= x['Year']),format = "%d/%m/%Y", tz = "Europe/London") - x['Date.event'])
+ }
> Dates.test$Time.dif <- apply(
+ Dates.test, 1, Time.dif.fun
+ )
Error in unclass(e1) - e2 : non-numeric argument to binary operator
It seems that apply() does not like as.POSIXct(), as testing a version of the function that only derives the end of year date, it is returned as a numeric in the form '978220800' (e.g. for end of year 2000). Is there any way around this? For the real data the function is a bit more complex, including conditional instances using different variables and sometimes referring to previous rows, which would be very hard to do without apply.
Here are some alternatives:
1) Your code works with these changes. We factored out s, not because it is necessary, but only because the following line gets very hard to read without that due to its length. Note that if x is a data frame then so is x["Year"] but x[["Year"]] is a vector as is x$Year. Since the operations are all vectorized we do not need apply.
Although we have not made this change, it would be a bit easier to define s as s <- paste0(x$Year, "-12-31") in which case we could omit the format argument in the following line owing to the use of the default format.
Time.dif.fun <- function(x) {
s <- sprintf('31/12/%s', x[['Year']])
as.numeric(as.POSIXct(s, format = "%d/%m/%Y", tz = "Europe/London") -x[['Date.event']])
}
Time.dif.fun(Dates.test)
## [1] 323 326
2) Convert to POSIXlt, set the year, month and day to the end of the year and subtract. Note that the year component uses years since 1900 and the mon component uses Jan = 0, Feb = 1, ..., Dec = 11. See ?as.POSIXlt for details on these and other components:
lt <- as.POSIXlt(Dates.test$Date.event)
lt$year <- Dates.test$Year - 1900
lt$mon <- 11
lt$mday <- 31
as.numeric(lt - Dates.test$Date.event)
## [1] 323 326
3) Another possibility is:
with(Dates.test, as.numeric(as.Date(paste0(Year, "-12-31")) - as.Date(Date.event)))
## [1] 323 326
You could use the difftime function:
Dates.test$diff_days <- difftime(as.POSIXct(paste0(Dates.test[,2],"-12-31"),format = "%Y-%m-%d", tz = "Europe/London"),Dates.test[,1],unit="days")
You can use ISOdate to build the end of year date, and the difftime(... units='days') to get the days til end of year.
From ?difftime:
Limited arithmetic is available on "difftime" objects: they can be
added or subtracted, and multiplied or divided by a numeric vector.
If you want to do more than the limited arithmetic, just coerce with as.numeric(), but you will have to stick with whatever units you specified.
By convention, you may wish to use the beginning of the next year (midnight on new year's eve) as your endpoint for that year. For example:
Dates.test <- data.frame(
Date.event = as.POSIXct(c("12/2/2000","8/2/2001"),
format = "%d/%m/%Y", tz = "Europe/London")
)
# use data.table::year() to get the year of a date
year <- function(x) as.POSIXlt(x)$year + 1900L
Dates.test$Date.end <- ISOdate(year(Dates.test$Date.event)+1,1,1)
# if you don't want class 'difftime', wrap it in as.numeric(), as in:
Dates.test$Date.diff <- as.numeric(
difftime(Dates.test$Date.end,
Dates.test$Date.event,
units='days')
)
Dates.test
# Date.event Date.end Date.diff
# 1 2000-02-12 2001-01-01 12:00:00 324.5
# 2 2001-02-08 2002-01-01 12:00:00 327.5
The apply() family are basically a clean way of doing for loops, and you should strive for more efficient, vectorized solutions.
If I have a vector of dates and hours such as...
c("2016-03-15 13","2016-03-16 23","2016-03-17 06","2016-03-18 15","2016-03-19 08","2016-03-20 21")
Can I find the number of hours that pass between each timestamp? I looked into difftime but it requires 2 vectors.
We can do this after converting to 'DateTime' class using lubridate, then get the difference in 'hour' between adjacent elements using difftime by passing two vectors after removing the last and first observation in the vector
library(lubridate)
v2 <- ymd_h(v1)
Or a base R option is as.POSIXct
v2 <- as.POSIXct(v1, format = "%Y-%m-%d %H")
and then do the difftime
difftime(v2[-length(v2)], v2[-1], unit = "hour")
data
v1 <- c("2016-03-15 13","2016-03-16 23","2016-03-17 06",
"2016-03-18 15","2016-03-19 08","2016-03-20 21")
You can do this by using strptime() function.
Try something like this.
data <- c("2016-03-15 13","2016-03-16 23","2016-03-17 06","2016-03-18 15","2016-03-19 08","2016-03-20 21")
datevec <- strptime(data,"%Y-%m-%d %H")
difftime(datevec[-length(datevec)],datevec[-1],units="hours")
Here is the output.
> difftime(datevec[-length(datevec)],datevec[-1],units="hours")
Time differences in hours
[1] -34 -7 -33 -17 -37
I have my data in following format.
x <- c("2012-03-01T00:05:55+00:00", "2012-03-01T00:06:23+00:00",
"2012-03-01T00:06:52+00:00")
Actual data is very long.
My objective is
convert them to hourly time-series in R
to aggregate my data to hourly data
First convert your dates into a date-time class using asPOSIXct
df = data.frame(x = c("2012-03-01T00:05:55+00:00", "2012-03-01T00:06:23+00:00",
"2012-03-01T00:06:52+00:00"))
df$times = as.POSIXct(df$x, format = "%Y-%m-%dT00:%H:%M+%S")
Then extract just the hour part using format
df$hour = format(df$times, '%H')
This give you :
x times hour
1 2012-03-01T00:05:55+00:00 2012-03-01 05:55:00 05
2 2012-03-01T00:06:23+00:00 2012-03-01 06:23:00 06
3 2012-03-01T00:06:52+00:00 2012-03-01 06:52:00 06
Or you can extract the date and the hour using:
df$date_hour = format(df$times, '%Y-%m-%d:%H')
for more infor see ?strftime it says "A conversion specification is introduced by %, usually followed by a single letter or O or E and then a single letter. Any character in the format string not part of a conversion specification is interpreted literally (and %% gives %). Widely implemented conversion specifications include:... %H
Hours as decimal number (00–23). As a special exception strings such as 24:00:00 are accepted for input, since ISO 8601 allows these."
Now you can do any aggregartion you want using something like plyr::ddply
library(plyr)
ddply(df, .(hour), nrow)
hour V1
1 05 1
2 06 2
or
ddply(df, .(date_hour), nrow)
date_hour V1
1 2012-03-01:05 1
2 2012-03-01:06 2