I have this vector representing times recorded as hours (0 to 24) and minutes (0 to 59) packed into a single integer. I would like to transform it into a %H:%M time format in R so that I can use functions like difftime.
str(SF5$ES_TIME)
int [1:11452] 1940 600 5 1455 1443 2248 1115 900 200 420 ...
This is what I've tried, but in both cases, I got an error:
>SF5$time1<-as.POSIXct(SF5$ES_TIME, format = "%H:%M",tz="EST")
Error in as.POSIXct.numeric(SF5$ES_TIME, format = "%H:%M", tz = "EST") :
'origin' must be supplied
SF5$time1<-as.POSIXct(as.character(SF5$ES_TIME), format="%H:%M",tz="")
> str(SF5$time1)
POSIXct[1:11452], format: NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
Any help or reading suggestions would be much appreciated!
Thank you,
Aurelie
Well, the error message tells you to provide origin. The values are packed as HHMM (1940 means 19:40), so split them into hours and minutes and convert to seconds first:
SF5 <- list(ES_TIME=as.integer(c(1940,600,5,1455,1443,2248,1115,900,200,420)))
secs <- (SF5$ES_TIME %/% 100) * 3600 + (SF5$ES_TIME %% 100) * 60
x <- as.POSIXct(secs, origin="1970-01-01", tz="UTC")
format(x, format="%H:%M")
#[1] "19:40" "06:00" "00:05" "14:55" "14:43" "22:48" "11:15" "09:00" "02:00" "04:20"
Note that the POSIXct date is just a number (with a class), so you need the format call to print it as you want - the default printing of x would print the full date info (year/month/day etc).
...any origin date would do since you don't care about it, but 1970-01-01 is the usual origin...
I was able to crack the code! Thank you all for your tips!
#1) as suggested by Justin : put all numbers into four digits with zero padding
SF5$ES_TIME2<-sprintf("%04d",SF5$ES_TIME)
#2) Matched these %H%M with their corresponding date %y-%m-%d
SF5$ES.datetime <- paste(SF5$ES_TIME2,SF5$ES_DATE,sep=" ")
#3) Transform into Date-Time format
SF5$ES.datetime2 <- as.POSIXct(SF5$ES.datetime,format="%H%M %y-%m-%d", tz="")
# Did the same for my other time-date of interest
SF5$SH_TIME2<-sprintf("%04d",SF5$SH_TIME)
SF5$SH.datetime <- paste(SF5$SH_TIME2,SF5$SH_DATE,sep=" ")
SF5$SH.datetime2 <- as.POSIXct(SF5$SH.datetime,format="%H%M %y-%m-%d", tz="")
# Calculate the time difference between the 2 date-time in hours
SF5$duration<-difftime(SF5$SH.datetime2,SF5$ES.datetime2,units="hours",tz="")
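The steps above can be sketched end to end on a few made-up rows. The sample times and dates, the helper name to_datetime and the fixed tz = "UTC" are all assumptions for illustration; the real SF5 columns may be laid out differently.

```r
# Hypothetical sample data: HHMM integers plus dates written as yy-mm-dd
es_time <- c(1940L, 600L, 5L)
sh_time <- c(2110L, 745L, 130L)
obs_date <- c("11-08-30", "11-08-31", "11-09-01")

# Zero-pad the time to four digits, glue it to the date, then parse both at once
to_datetime <- function(hhmm, ymd, tz = "UTC") {
  as.POSIXct(paste(sprintf("%04d", hhmm), ymd),
             format = "%H%M %y-%m-%d", tz = tz)
}

es <- to_datetime(es_time, obs_date)
sh <- to_datetime(sh_time, obs_date)

# Elapsed time in hours between the two date-times
duration <- difftime(sh, es, units = "hours")
as.numeric(duration)
# 1.5, 1.75 and about 1.417 hours
```

Without the zero padding, a value such as 5 would not match "%H%M" and the result would be NA.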
Related
I have some very simple data in R that needs to have its date format changed:
date midpoint
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
4 31/05/2011 0.7970
5 30/04/2011 0.7877
6 31/03/2011 0.7411
7 28/02/2011 0.7624
8 31/01/2011 0.7665
9 31/12/2010 0.7500
10 30/11/2010 0.7734
11 31/10/2010 0.7511
12 30/09/2010 0.7263
13 31/08/2010 0.7158
14 31/07/2010 0.7110
15 30/06/2010 0.6921
16 31/05/2010 0.7005
17 30/04/2010 0.7113
18 31/03/2010 0.7027
19 28/02/2010 0.6973
20 31/01/2010 0.7260
21 31/12/2009 0.7154
22 30/11/2009 0.7287
23 31/10/2009 0.7375
Rather than %d/%m/%Y, I would like it in the standard R format of %Y-%m-%d
How can I make this change? I have tried:
nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")
But that just cut off the year and added zeros to the day:
[1] "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20"
[6] "0031/03/20" "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20"
[11] "0031/10/20" "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20"
[16] "0031/05/20" "0030/04/20" "0031/03/20" "0028/02/20" "0031/01/20"
[21] "0031/12/20" "0030/11/20" "0031/10/20" "0030/09/20" "0031/08/20"
[26] "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20" "0031/03/20"
[31] "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20" "0031/10/20"
[36] "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20"
Thanks!
There are two steps here:
Parse the data. Your example is not fully reproducible: is the data in a file, or is the variable stored as text or as a factor? Let us assume the latter; then, if your data.frame is called X, you can do
X$newdate <- strptime(as.character(X$date), "%d/%m/%Y")
Now the newdate column should be of class POSIXlt, which is what strptime() returns (wrap the call in as.Date() if you want class Date).
Format the data. That is a matter of calling format() or strftime():
format(X$newdate, "%Y-%m-%d")
A more complete example:
R> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"),
+ mid=c(0.8378,0.8457,0.8147))
R> nzd
date mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
R> nzd$newdate <- strptime(as.character(nzd$date), "%d/%m/%Y")
R> nzd$txtdate <- format(nzd$newdate, "%Y-%m-%d")
R> nzd
date mid newdate txtdate
1 31/08/2011 0.8378 2011-08-31 2011-08-31
2 31/07/2011 0.8457 2011-07-31 2011-07-31
3 30/06/2011 0.8147 2011-06-30 2011-06-30
R>
The difference between columns three and four is the type: newdate is a date-time object (class POSIXlt, as returned by strptime) whereas txtdate is character.
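To check which class each step actually produces, a small sketch along these lines can help. The variable names parsed_lt, parsed_date and txt are made up for the example, and tz = "UTC" is only there to keep the result independent of the local time zone.

```r
nzd <- data.frame(date = c("31/08/2011", "31/07/2011", "30/06/2011"),
                  mid  = c(0.8378, 0.8457, 0.8147),
                  stringsAsFactors = FALSE)

# strptime() parses text into a date-time object
parsed_lt <- strptime(nzd$date, "%d/%m/%Y", tz = "UTC")
class(parsed_lt)    # "POSIXlt" "POSIXt"

# as.Date() with an explicit format parses text into a plain date
parsed_date <- as.Date(nzd$date, format = "%d/%m/%Y")
class(parsed_date)  # "Date"

# format() turns either of them back into text
txt <- format(parsed_date, "%Y-%m-%d")
class(txt)          # "character"
```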
nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")
In the above piece of code, there are two mistakes. First, when you pass nzd$date to as.Date you do not say what format the dates are in, so it tries its default formats. If you look at the help page, ?as.Date, you will see
format
A character string. If not specified, it will try "%Y-%m-%d"
then "%Y/%m/%d" on the first non-NA element, and give an error
if neither works. Otherwise, the processing is via strptime
The second mistake is: even though you would like the output in %Y-%m-%d format, inside format you wrote "%Y/%m/%d".
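The first mistake can be made visible by looking at the pieces of the mis-parsed date. This is only a sketch: the names wrong, parts and right are invented here, and the components are inspected through as.POSIXlt rather than printed, because the way very early years are printed can differ between platforms.

```r
# No format given: as.Date() falls back to "%Y-%m-%d", then "%Y/%m/%d"
wrong <- as.Date("31/08/2011")
parts <- as.POSIXlt(wrong)
c(year = parts$year + 1900, month = parts$mon + 1, day = parts$mday)
# year 31, month 8, day 20: "31" was taken as the year, "20" as the day,
# and the trailing "11" was ignored

# Stating the input format gives the intended date
right <- as.Date("31/08/2011", format = "%d/%m/%Y")
format(right, "%Y-%m-%d")
# "2011-08-31"
```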
Now, the correct way of doing it is:
> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"),
+ mid=c(0.8378,0.8457,0.8147))
> nzd
date mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
> nzd$date <- format(as.Date(nzd$date, format = "%d/%m/%Y"), "%Y-%m-%d")
> head(nzd)
date mid
1 2011-08-31 0.8378
2 2011-07-31 0.8457
3 2011-06-30 0.8147
You could also use the parse_date_time function from the lubridate package:
library(lubridate)
day<-"31/08/2011"
as.Date(parse_date_time(day,"dmy"))
[1] "2011-08-31"
parse_date_time returns a POSIXct object, so we use as.Date to get a date object. The first argument of parse_date_time specifies a date vector, the second argument specifies the order in which your format occurs. The orders argument makes parse_date_time very flexible.
After reading your data in via a textConnection, the following seems to work:
dat <- read.table(textConnection(txt), header = TRUE)
dat$date <- strptime(dat$date, format= "%d/%m/%Y")
format(dat$date, format="%Y-%m-%d")
> format(dat$date, format="%Y-%m-%d")
[1] "2011-08-31" "2011-07-31" "2011-06-30" "2011-05-31" "2011-04-30" "2011-03-31"
[7] "2011-02-28" "2011-01-31" "2010-12-31" "2010-11-30" "2010-10-31" "2010-09-30"
[13] "2010-08-31" "2010-07-31" "2010-06-30" "2010-05-31" "2010-04-30" "2010-03-31"
[19] "2010-02-28" "2010-01-31" "2009-12-31" "2009-11-30" "2009-10-31"
> str(dat)
'data.frame': 23 obs. of 2 variables:
$ date : POSIXlt, format: "2011-08-31" "2011-07-31" "2011-06-30" ...
$ midpoint: num 0.838 0.846 0.815 0.797 0.788 ...
This is really easy using the lubridate package. All you have to do is tell R what format your date is already in. It then converts it into the standard format:
nzd$date <- dmy(nzd$date)
That's it.
Using one line to convert the dates to the preferred format:
nzd$date <- format(as.Date(nzd$date, format="%d/%m/%Y"),"%Y-%m-%d")
I believe that
nzd$date <- as.Date(nzd$date, format = "%d/%m/%Y")
is sufficient.
I have an Excel file which includes dates. I'm importing this file into a data frame.
After importing, I tried to convert one column to a date format, but it displays NA.
What I tried:
str(df$Date_of_visit) # prints type before conversion
df$Date_of_visit # values in the column
df$Date_of_visit <- as.Date(df$Date_of_visit, origin = "1899-12-30", format="%m%d%y") #converting to date
str(df$Date_of_visit) # prints type after conversion
print(df$Date_of_visit) # values in the column
Output I got :
chr [1:4] "43503" "43319" "43473" "43473"
Date[1:4], format: NA NA NA NA
[1] NA NA NA NA
Can someone help me out? What mistake am I making here?
Thanks in advance!
Regards
Mouni.
You need to drop the format= argument from as.Date() and convert the character values to numeric before calling as.Date(). Example:
dte <- c("43503","43319","43473","43473")
dte <- as.Date(as.numeric(dte), origin = "1899-12-30")
dte
#[1] "2019-02-07" "2018-08-07" "2019-01-08" "2019-01-08"
format(dte, "%m%d%Y")
#[1] "02072019" "08072018" "01082019" "01082019"
You can use format() to convert the Date objects to character strings in the format of your choice. Note that format() gives you a character object, not a Date anymore.
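The conversion can be wrapped in a small helper. This is a sketch only: excel_serial_to_date is a made-up name, and the origin "1899-12-30" is the one used in the answer above for serial numbers written by Excel on Windows.

```r
# Hypothetical helper: Excel serial numbers (possibly stored as text) to Date
excel_serial_to_date <- function(x, origin = "1899-12-30") {
  as.Date(as.numeric(x), origin = origin)
}

visits <- excel_serial_to_date(c("43503", "43319", "43473", "43473"))
class(visits)   # "Date"

# format() is for display only and returns character
shown <- format(visits, "%m/%d/%Y")
class(shown)    # "character"

# Going back from Date to the Excel serial number
serial_again <- as.numeric(visits - as.Date("1899-12-30"))
```

Keeping the column as Date and calling format() only when printing preserves sorting and date arithmetic.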
Good afternoon! I have share price data with the date and the time in separate columns. I need to combine them into one column.
date time open high low close
1 1999.04.08 11:00 1.0803 1.0817 1.0797 1.0809
2 1999.04.08 12:00 1.0808 1.0821 1.0806 1.0807
3 1999.04.08 13:00 1.0809 1.0814 1.0801 1.0813
4 1999.04.08 14:00 1.0819 1.0845 1.0815 1.0844
5 1999.04.08 15:00 1.0839 1.0857 1.0832 1.0844
6 1999.04.08 16:00 1.0842 1.0852 1.0824 1.0834
I tried to do that using this function:
df1 <- within(data, { timestamp = strptime(paste(date, time), "%Y/%m/%d%H:%M:%S") })
but I got a column of NAs.
I also tried:
data$date_time = mdy_hm(paste(data$date, data$time))
but again I got an error:
Warning message:
All formats failed to parse. No formats found.
Please tell me what I am doing wrong.
In your particular example, let's break it down first to see why you are getting NA values, and then generate a solution that creates your desired results.
> date <- c("1999.04.08", "1999.04.08")
> time <- c("11:00", "12:00")
> df <- data.frame(date, time, stringsAsFactors = F)
> df
date time
1 1999.04.08 11:00
2 1999.04.08 12:00
> str(df)
'data.frame': 2 obs. of 2 variables:
$ date: chr "1999.04.08" "1999.04.08"
$ time: chr "11:00" "12:00"
Don't forget to use str to understand the data type(s) you are dealing with. That can and will greatly influence the answer to your question. Looking at the help description of function strptime, we see the following definition:
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character. Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
So, let's break down your code:
df1 <- within(data,
{ timestamp = strptime(paste(date, time),
"%Y/%m/%d%H:%M:%S")
})
First, the paste function:
> paste(date[1], time[1])
[1] "1999.04.08 11:00"
This generates a character vector with the format above.
Next, the strptime command.
> strptime(paste(date[1], time[1]), "%Y/%m/%d%H:%M:%S")
[1] NA
Okay, we see an NA. First, get into the habit of writing format = explicitly; it may feel tedious, but it makes the call easier to read and to debug. Looking at the examples on the help page we see:
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
> z
[1] "1960-01-01 PST" "1960-01-02 PST" "1960-03-31 PST" "1960-07-30 PDT"
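Notice the help page also distinguishes upper- and lower-case Y (%Y is a four-digit year, %y a two-digit year), and the same goes for the month and day codes. In your case, the format string asks for something of the form YYYY/mm/ddHH:MM:SS, such as 2017/11/2011:28:30, whereas your strings look like 1999.04.08 11:00: dots instead of slashes, a space between date and time, and no seconds. Do you see the issue now?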
Using your string extraction attempt, we modify it slightly to get the format you are looking for:
> strptime(paste(date, time), format = "%Y.%m.%d %H:%M")
[1] "1999-04-08 11:00:00 PDT" "1999-04-08 12:00:00 PDT"
Putting it all together you get:
> df1 <- within(df, {timestamp = strptime(paste(date, time), format = "%Y.%m.%d %H:%M")})
> str(df1)
'data.frame': 2 obs. of 3 variables:
$ date : chr "1999.04.08" "1999.04.08"
$ time : chr "11:00" "12:00"
$ timestamp: POSIXlt, format: "1999-04-08 11:00:00" "1999-04-08 12:00:00"
> df1
date time timestamp
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
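The difference between the two format strings can be checked side by side. In this sketch the names stamp, bad and good are invented, and tz = "UTC" is an assumption used only to make the result independent of the local time zone.

```r
stamp <- paste("1999.04.08", "11:00")

# Original format string: slashes, no space, and seconds that are not in the data
bad <- strptime(stamp, format = "%Y/%m/%d%H:%M:%S", tz = "UTC")
is.na(bad)
# TRUE

# Format string that mirrors the text exactly: dots, a space, hours and minutes
good <- strptime(stamp, format = "%Y.%m.%d %H:%M", tz = "UTC")
format(good, "%Y-%m-%d %H:%M")
# "1999-04-08 11:00"
```

Checking anyNA() on the parsed column right after conversion is a cheap way to catch a format string that does not match.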
Oh yeah, and try out the dplyr package.
library(dplyr)
> df %>%
mutate(ts = as.POSIXct(paste(date,time),
format = "%Y.%m.%d %H:%M"))
date time ts
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
I have difficulty converting dates from Excel (read from a CSV file) into R. Help is much appreciated.
Here is what I'm doing:
df$date = as.Date(df$excel.date, format = "%d/%m/%Y")
However, some dates get converted but some not. Here is the output of:
head(df$date)
[1] NA NA NA "0006-01-05" NA NA
the first 5 entries imported from csv file are as follows:
7/28/05
7/28/05
12/16/05
5/1/06
4/21/05
and here is the output of:
head(df$excel.date)
[1] 7/28/05 7/28/05 12/16/05 5/1/06 4/21/05 1/25/07
1079 Levels: 1/1/00 1/1/02 1/1/97 1/10/96 1/10/99 1/11/04 1/11/94 1/11/96 1/11/97 1/11/98 ... 9/9/99
str(df)
.
.
$ excel.date : Factor w/ 1079 levels "1/1/00","1/1/02",..: 869 869 288 618 561 48 710 1022 172 241 ...
First of all, make sure you have the dates in your file in an unambiguous format, using full years (not just 2 last numbers). %Y is for "year with century" (see ?strptime) but you don't seem to have century. So you can use %y (at your own risk, see ?strptime again) or reformat the dates in Excel.
It is also a good idea to use as.is=TRUE with read.csv when reading in these data -- otherwise character vectors are converted to factors which can lead to unexpected results.
And on Windows it may be easier to use RODBC to read in dates directly from an xls or xlsx file.
(edit)
The following may give a hint:
> as.Date("13/04/2014", format= "%d/%m/%Y")
[1] "2014-04-13"
> as.Date(factor("13/04/2014"), format= "%d/%m/%Y")
[1] "2014-04-13"
> as.Date(factor("13/04/14"), format= "%d/%m/%Y")
[1] "14-04-13"
> as.Date(factor("13/04/14"), format= "%d/%m/%y")
[1] "2014-04-13"
(So as.Date can actually take care of factors - the magic happens in the as.Date.factor method, defined as:
function (x, ...) as.Date(as.character(x), ...)
It is not a good idea to represent dates as factors, but in this case it is not a problem either. I think the problem is Excel, which saves your years as 2-digit numbers in a CSV file without asking you.)
-
The ?strptime help file says that using %y is platform specific - you can have different results on different machines. So if there's no way of going back to the source and saving the csv in a better way, you might use something like the following:
x <- c("7/28/05", "7/28/05", "12/16/05", "5/1/06", "4/21/05", "1/25/07")
repairExcelDates <- function(x, yearcol=3, fmt="%m/%d/%Y") {
x <- do.call(rbind, lapply(strsplit(x, "/"), as.numeric))
year <- x[,yearcol]
if(any(year>99)) stop("don't know what to do")
x[,yearcol] <- ifelse(year <= as.numeric(format(Sys.Date(), "%y")), year+2000, year + 1900)
# if year <= current year then add 2000, otherwise add 1900
x <- apply(x, 1, paste, collapse="/")
as.Date(x, format=fmt)
}
repairExcelDates(x)
# [1] "2005-07-28" "2005-07-28" "2005-12-16" "2006-05-01" "2005-04-21"
# [6] "2007-01-25"
Your data is formatted as Month/Day/Year with a two-digit year, so
df$date = as.Date(df$excel.date, format = "%d/%m/%Y")
should be
df$date = as.Date(df$excel.date, format = "%m/%d/%y")
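Because the years in this file have only two digits, a sketch using %y (two-digit year) together with as.is = TRUE may be closer to what is needed. The CSV text below is made up from the sample values in the question, and the century that %y picks (00 to 68 become 20xx, 69 to 99 become 19xx) is the behaviour described in ?strptime, so check that it fits your data.

```r
# Hypothetical CSV content with month/day/two-digit-year dates
csv_text <- "excel.date\n7/28/05\n12/16/05\n5/1/06\n1/11/94"

# as.is = TRUE keeps the column as character instead of factor
df <- read.csv(text = csv_text, as.is = TRUE)
class(df$excel.date)
# "character"

# %m/%d/%y: month, day, two-digit year
df$date <- as.Date(df$excel.date, format = "%m/%d/%y")
format(df$date)
# "2005-07-28" "2005-12-16" "2006-05-01" "1994-01-11"
```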
I am working in R and I need to convert a column in the format
9/27/2011 3:33:00 PM
to a numeric value. In Excel I can use the function VALUE(), but I do not know how to do it in R.
My data looks like this:
9/27/2011 15:33 a 1 5 9
9/27/2011 15:33 v 2 6 2
9/27/2011 15:34 c 3 7 1
To convert a string into R date format, use as.POSIXct - then you can coerce it to a numeric value using as.numeric. Note that %p (AM/PM) is only honoured together with %I (12-hour clock), not with %H:
> x <- as.POSIXct("9/27/2011 3:33:00 PM", format="%m/%d/%Y %I:%M:%S %p")
> x
[1] "2011-09-27 15:33:00 BST"
> as.numeric(x)
[1] 1317133980
The value you get is the number of seconds since 1970-01-01 00:00:00 UTC. Note that this is different from Excel, where a date is stored as the number of days since a fixed start date (1/1/1900 if my memory serves me well - I try not to use Excel any more.)
For more information, see ?DateTimeClasses
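If the goal is a number comparable to what Excel's VALUE() returns, the seconds can be rescaled to days and shifted to Excel's origin. This is a sketch under assumptions: it uses the "1899-12-30" origin commonly used for Excel serial numbers on Windows, parses the text in UTC so that the wall-clock time is kept as written (Excel itself has no time zones), and relies on an English or C locale for the AM/PM marker.

```r
# %I (12-hour clock) is what %p (AM/PM) works with
x <- as.POSIXct("9/27/2011 3:33:00 PM",
                format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

seconds_since_1970 <- as.numeric(x)

# Days between Excel's assumed origin and R's origin (25569)
excel_offset_days <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))

# Whole part = day count, fractional part = time of day
excel_serial <- seconds_since_1970 / 86400 + excel_offset_days
excel_serial
# about 40813.648
```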
This was useful for me:
> test=as.POSIXlt("09/13/2006", format="%m/%d/%Y")
> test
[1] "2006-09-13"
> 1900+test$year
[1] 2006
> test$yday
[1] 255
> test$yday/365
[1] 0.6986301
> 1900+test$year+test$yday/365
[1] 2006.699
You can use similar approaches if you need day numbers like in Excel.
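The fractional-year idea can be written as a small function that picks 365 or 366 depending on the year. The name decimal_year is made up for this sketch, and dividing the zero-based day of year by the year length is just one possible convention.

```r
# Hypothetical helper: year plus the fraction of the year already elapsed
decimal_year <- function(d) {
  lt <- as.POSIXlt(d, tz = "UTC")
  year <- lt$year + 1900
  # Gregorian leap-year rule
  leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
  # yday is zero-based: 0 on January 1st
  year + lt$yday / ifelse(leap, 366, 365)
}

decimal_year(as.Date("2006-09-13"))  # 2006 + 255/365
decimal_year(as.Date("2008-07-01"))  # 2008 + 182/366 (leap year)
```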