I have some data, for example:
Date CAC Index
2014-10-10 4073,71
2014-10-17 4033,18
2014-10-24 4128,9
But when I read it into R with the XLConnect library I get the following:
wb <- loadWorkbook(file.choose())
lp <- getSheets(wb)
data <- lapply(seq_along(lp), function(i) readWorksheet(wb, sheet = lp[i], startRow = 1))[[1]]
data[, 1] <- as.character(data[, 1])
tail(data, 3)[, c(1, 4)]
Date CAC.Index
719 2014-10-09 22:00:00 4073.71
720 2014-10-16 22:00:00 4033.18
721 2014-10-23 22:00:00 4128.90
Why don't I get the same dates?
For example, I don't get 2014-10-24; instead I get 2014-10-23 22:00:00.
Might it be an issue with
ttz<-Sys.getenv('TZ')
Sys.setenv(TZ='GMT')
?
Best regards
I think it comes from the data being imported as GMT and then converted to your local timezone, which seems to be GMT-2; therefore 2014-10-10 00:00 is shown as 2014-10-09 22:00.
Maybe you could solve this by specifying your tz with a name from the OlsonNames() list, or by specifying that your date column is a Date instead of a POSIXct.
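For example, something along these lines (just a sketch; it assumes the imported column is called Date, as in your output):
# Option 1: make the session time zone GMT before importing, so the POSIXct values are not shifted into your local zone
Sys.setenv(TZ = "GMT")
# Option 2: drop the (midnight GMT) time component after import by converting the POSIXct column to a plain Date
data$Date <- as.Date(data$Date, tz = "GMT")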
It looks like it has converted the date strings in the Excel file to a date object in R. Try str(data) to see what the types are in your data.frame (a good habit to get into).
If it is a Date object, then you can use format to put it in the way you would like to read it. Something like:
##assuming data$Date is a Date class object
data$DateFormatted <- format(data$Date, format="%Y-%m-%d")
See ?format for other examples.
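If str() shows the column is actually POSIXct (as your printed output suggests), format() works the same way, and you can pin the time zone at the same time. A sketch, assuming the column is data$Date:
# format() also works on POSIXct; tz = "GMT" keeps the date from shifting a day
data$DateFormatted <- format(data$Date, format = "%Y-%m-%d", tz = "GMT")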
Hope you can help. I have a massive CSV file that I need to import into R, manipulate, and export to Excel. All other data imports and manipulates fine apart from the date format: the CSV is supplied (and can't be changed) with all dates as dd/mm/yyyy hh:mm, and I need a way to strip it down to dd/mm/yyyy (or dd/mm/yy). All the methods I have tried so far have either altered the dates to mm/dd/yyyy or given me multiple errors.
The only workaround I have found is to convert the data in the CSV into General format before importing it; however, the "live" CSVs are too big to open and convert.
Any help would be great.
One potential solution could be to use lubridate to parse the text strings of the date/time columns after import. From this you can extract the date and time (using date() and hms::as_hms()):
library(readr)
library(dplyr)
library(lubridate)
read_csv("Date_time
01/09/2021 19:30
19/12/2020 12:45
16/03/2019 00:15") %>%
  mutate(Date_time = dmy_hm(Date_time),
         Date = date(Date_time),
         Time = hms::as_hms(Date_time))
#> # A tibble: 3 x 3
#> Date_time Date Time
#> <dttm> <date> <time>
#> 1 2021-09-01 19:30:00 2021-09-01 19:30
#> 2 2020-12-19 12:45:00 2020-12-19 12:45
#> 3 2019-03-16 00:15:00 2019-03-16 00:15
This at least gives you tidy and workable data imported into R, which can then be formatted for printing. Does this get you to a solution? If it's not working on your data, then perhaps post a small (representative) sample as an example to try and get working.
Created on 2021-12-07 by the reprex package (v2.0.1)
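If you then need the date written back out as dd/mm/yyyy text for the Excel export, format() on the Date column does that as a last step. A minimal sketch, assuming the result of the pipeline above has been assigned to df:
# render the Date column as dd/mm/yyyy character strings before exporting
df <- df %>%
  mutate(Date_ddmmyyyy = format(Date, "%d/%m/%Y"))
df$Date_ddmmyyyy
#> e.g. "01/09/2021" "19/12/2020" "16/03/2019"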
I'm struggling with converting character class dates of many different format types (e.g., yyyy/mm/dd; mm/dd/yyyy; yyyy-mm-dd; mm-dd-yyyy; yy-mm-dd; mm-dd-yy; etc.) to POSIXlt class. Ideally, I would like to convert all birth_dates to POSIXlt class with yyyy/mm/dd format (see sample data below). Is there a simple way to do this in R?
id birth_date start_date age
102 08/09/1993 2013/09/01 20
103 1995-02-21 2013/09/01 18
104 01-15-94 2013/09/01 19
105 88-12-30 2013/09/01 24
Here is what I have been doing thus far. Unfortunately, this doesn't seem to work (I wind up with more NAs than there should be) given all of the different ways in which the original date is formatted:
library(lubridate)
data$birth_date1 <- as.Date(data$birth_date, format = "%Y-%m-%d")  # Convert character class to Date class
data$birth_date2 <- ymd(data$birth_date1)  # Convert to POSIXlt class using the lubridate pkg
That's horrible. Could be worse though. At least there are delimiters in there, like "-" and "/".
Short Answer
Yes, there's an easy way to parse that in R. Apply parse_date_time() separately to each birth date, giving it a decent orders list to choose from, and carefully set the order of the guesses. You'll need to convert the "integer-time" to a useful time when you're done.
See the Long Answer for details.
Long Answer
This is why the lubridate package has parse_date_time(). But there are problems. Let's see:
require(lubridate)
# WRONG! doesn't work as intended.
as.Date(
parse_date_time(data$birth_date,
orders=c("ymd", "mdy", "mdY", "Ymd")
)
)
[1] "1993-08-09" "1995-02-21" "1994-01-15" "0088-12-30"
That looks great, except for the last one. What's going on?
parse_date_time() is selecting a "best fit" set of orders and formats to use when parsing the dates, and the last element is the odd one out.
To make this work as intended, you'll need to apply parse_date_time() one-by-one to each date, because each date format was apparently selected more-or-less at random. This will be slower, but it will give more useful answers.
# RIGHT. Some conversion of results required.
parsed <- sapply(data[,"birth_date"],
parse_date_time,
orders=c("ymd", "mdy", "mdY", "Ymd") )
parsed
08/09/1993 1995-02-21 01-15-94 88-12-30
744854400 793324800 758592000 599443200
Ok, those look like Unix-time integers, which are the unclass()'d version of what parse_date_time() produces. And none are negative, so they must all have happened after 1970. This is encouraging. Convert:
# Conversion of results
parsed <- as.POSIXct(parsed, origin="1970-01-01", tz = "GMT")
as.Date(parsed)
08/09/1993 1995-02-21 01-15-94 88-12-30
"1993-08-09" "1995-02-21" "1994-01-15" "1988-12-30"
lubridate and parse_date_time() are very good at what they do.
Since you asked for POSIXlt, not Date types:
as.POSIXlt(parsed)
08/09/1993 1995-02-21
"1993-08-09 10:00:00 AEST" "1995-02-21 11:00:00 AEDT"
01-15-94 88-12-30
"1994-01-15 11:00:00 AEDT" "1988-12-30 11:00:00 AEDT"
Though I personally prefer only having dates when the actual time isn't important; these are assumed to be all happening at midnight UTC, and are converted to my time zone (Eastern Australia).
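If you'd rather keep those values pinned to midnight instead of having them shifted into your local zone, you can request UTC explicitly. A small sketch, reusing parsed from above:
# keep the parsed date-times in GMT/UTC so they all display as midnight
as.POSIXlt(parsed, tz = "GMT")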
I'm trying to combine two vectors of dates into a single vector. I have been using dates with the lubridate package.
First I create two vectors of dates:
library(lubridate)
mydate <- mdy("04/01/2016")
mydate_range <- mydate + (1:12)*months(1)
anotherdate_range <- mdy("05/01/2017") + (1:12)*months(1)
Inspecting mydate_range and anotherdate_range these seem to have worked fine.
But then when I try to combine these into one vector things get weird.
combineddates <- c(mydate_range, anotherdate_range)
combineddates
[1] "2016-04-30 19:00:00 CDT" "2016-05-31 19:00:00 CDT" "2016-06-30 19:00:00 CDT"
The first date of combineddates is now "2016-04-30". Before I combined them using the c() function the first date of mydate_range was "2016-05-01".
Not sure why this changed. How should I join these date vectors?
The reason for the date change is the conversion due to time zone adjustments. 2016-04-30 19:00:00 CDT is the same as 2016-05-01 GMT. Most likely your initial sequence was in GMT and somewhere along the way it got converted to local time.
I find it best to define the time zone in your initial definition and it should stay consistent throughout.
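For example, one way to keep things consistent is to build everything as POSIXct in UTC from the start. A sketch (mdy()'s tz argument and the month arithmetic are as in your question):
library(lubridate)
mydate_range      <- mdy("04/01/2016", tz = "UTC") + (1:12) * months(1)
anotherdate_range <- mdy("05/01/2017", tz = "UTC") + (1:12) * months(1)
combineddates <- c(mydate_range, anotherdate_range)
# if c() drops the tzone attribute, put it back explicitly
attr(combineddates, "tzone") <- "UTC"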
I'm getting inconsistent results when trying to subset data based on a date being before or after some POSIXct date and time. When I make a string of dates like this:
myDates <- c(as.POSIXct("2014-12-27 08:10:00 UTC"),
as.POSIXct("2014-12-27 08:15:00 UTC"),
as.POSIXct("2014-12-27 09:30:00 UTC"))
and then try to subset to find all the entries in myDates that were before 8:15 a.m. on Dec. 27, 2014 like this:
myDates[myDates < as.POSIXct("2014-12-27 08:15:00")]
that works fine and I get
"2014-12-27 08:10:00 PST"
(although I don't understand why it says "PST" for the time zone; that's where I am, but I set it to UTC).
However, my original date and time data were in Excel, where they were in numeric format. I imported them as a data.frame called Samples and converted the date and time column into POSIXct format by doing:
as.POSIXct(Samples$DateTime, origin = "1970-01-01", tz = "UTC")
Now, I'm having hair-pulling, head-onto-desk-bashing frustrations with subsetting those dates. Take one date in particular, x <- Samples$DateTime[34], which, according to the output R gives me, is "2014-12-27 08:10:00 UTC". If I check whether x < 2014-12-27 08:15, that should be true, and here's what I see:
x < as.POSIXct("2014-12-27 08:15:00 UTC")
TRUE
But x should NOT be less than 2014-12-27 08:09:00 UTC, right? This is what I see:
X < as.POSIXct("2014-12-27 08:09:00 UTC")
TRUE
Why, for the love of Pete, does R tell me that 8:10 is before 8:09?!? This doesn't seem to be a problem for data that I just type in like above, only for data I've imported from Excel.
You probably need to get everything in the same timezone first. Try
as.numeric(as.POSIXct("2014-12-27 08:10:00 UTC", tz="UTC"))
#[1] 1419667800
# equivalent to "2014-12-27 08:10:00 UTC"
vs.
as.numeric(as.POSIXct("2014-12-27 08:10:00 UTC"))
#[1] 1419631800
# equivalent to 8:10 in local timezone - in my case Aust. EST.
# "2014-12-27 08:10:00 AEST"
You can see that they are actually numerically different.
To fix this, specify the tz= explicitly when importing as the "UTC" in your text strings will not be detected on input.
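The same goes for the comparison values: build them with the same tz as the imported data. A sketch (it assumes Samples$DateTime is the column you imported with tz = "UTC"):
# give both sides of the comparison the same time zone
cutoff <- as.POSIXct("2014-12-27 08:09:00", tz = "UTC")
Samples$DateTime[34] < cutoff  # should now be FALSE for the 08:10:00 UTC value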
Also, be really careful with variable names. Likely you just slipped in typing it in here, but in the description of the problem and the first logical comparison you used x and in the second one you used X.
R is case sensitive, so it would not compare your date to the one stored in x. If anything else was stored in memory with X it may actually be that you were given the right answer for the question you asked.
I use rmongodb to query a MongoDB. I connect to the DB, which works nicely (require(rmongodb); mongo <- mongo.create("foo")), and I am generally able to get stuff out of the database. I just don't know what to do about the date formats.
TIME <- strptime("2013-11-11 15:00",format="%Y-%m-%d %H:%M",tz="CET")
query = mongo.bson.buffer.create()
mongo.bson.buffer.append(query, "timestamp", TIME)
query = mongo.bson.from.buffer(query)
when I look at this query it says:
timestamp : 9 1198930688
So mongo.bson.buffer.append has properly recognized that timestamp is a date class and does some conversion, which I don't understand. This is not UNIX time, and I would not really care if the values returned from the database weren't in this format as well. I'm particularly puzzled because quite a few of these numeric date values are negative, while all my dates are from 2013... Some more examples:
# 2013-10-10 12:15 --> -1579369312
# 2013-10-10 12:30 --> -1578469312
# 2013-11-10 12:30 --> 1103530688
So basically my question is: How can I convert this funny date format (1198930688) back to POSIXct?
Thanks a lot!
skr
Try
myTIME <- mongo.bson.value(query, "timestamp")
myTIME
[1] "2013-11-11 15:00:00 CET"