I have a column of date-times in a data frame, stored in the form 201001011200, i.e. %Y%m%d%H%M. I want to split it into a Date (%Y%m%d) and a Time (%H%M).
I tried as.Date(data$Date, origin = "1970-01-01"), but I got an error message:
Error in charToDate(x) : character string is not in a standard
unambiguous format
The class of the column is numeric. So I converted it to character and applied the same as.Date call, but that did not help.
Any idea? Thank you in advance.
EDIT
Here is a sample of my data:
Index Date rank amount
81211 201004090000 11 4.9
81212 201004090100 11 4.6
81213 201004090200 11 3.3
81214 201004090300 11 2.7
81215 201004090400 11 3.1
81216 201004090500 11 3.7
81217 201004090600 11 4.0
81218 201004090700 11 4.2
81219 201004090800 11 4.2
81220 201004090900 11 4.0
Updated Answer: Beginning with your example data, you can do
data$Date <- as.POSIXct(as.character(data$Date), format = "%Y%m%d%H%M")
to change the column to a POSIX datetime value. Then, to extract the date and time into two separate columns, you can do
data$date <- as.character(as.Date(data$Date))
data$time <- format(data$Date, "%T")
This gives the following updated data frame data
Index Date rank amount date time
1 81211 2010-04-09 00:00:00 11 4.9 2010-04-09 00:00:00
2 81212 2010-04-09 01:00:00 11 4.6 2010-04-09 01:00:00
3 81213 2010-04-09 02:00:00 11 3.3 2010-04-09 02:00:00
4 81214 2010-04-09 03:00:00 11 2.7 2010-04-09 03:00:00
5 81215 2010-04-09 04:00:00 11 3.1 2010-04-09 04:00:00
6 81216 2010-04-09 05:00:00 11 3.7 2010-04-09 05:00:00
7 81217 2010-04-09 06:00:00 11 4.0 2010-04-09 06:00:00
8 81218 2010-04-09 07:00:00 11 4.2 2010-04-09 07:00:00
9 81219 2010-04-09 08:00:00 11 4.2 2010-04-09 08:00:00
10 81220 2010-04-09 09:00:00 11 4.0 2010-04-09 09:00:00
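For reference, here is a minimal, self-contained sketch of the same split. It makes two assumptions that are not in the code above: the time zone is fixed to "UTC" so the result does not depend on the machine's local zone, and the numeric column is converted with sprintf("%.0f", ...) as a defensive alternative to as.character() so that no value can end up in scientific notation. The two-row data frame is made up for illustration.

```r
# Hypothetical two-row sample in the question's %Y%m%d%H%M layout
data <- data.frame(Date = c(201004090000, 201004092300))

# Convert numeric -> character -> POSIXct with an explicit time zone
dt <- as.POSIXct(sprintf("%.0f", data$Date), format = "%Y%m%d%H%M", tz = "UTC")

# format() uses the time zone stored on dt, so date and time stay consistent
data$date <- format(dt, "%Y-%m-%d")
data$time <- format(dt, "%H:%M")
data
```

Using format() for both pieces keeps them as character columns; wrap the date in as.Date() if you need a true Date.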
Original Answer: If you are starting with a numeric value, wrap it in as.character() then run it through as.POSIXct() to get a POSIX date-time value.
data$Date <- as.POSIXct(as.character(data$Date), format = "%Y%m%d%H%M")
As an example I will use 201001011200 as you gave.
(x <- as.POSIXct(as.character(201001011200), format = "%Y%m%d%H%M"))
# [1] "2010-01-01 12:00:00 PST"
Then to separate out the date and time you can do the following.
list(as.Date(x), format(x, "%T"))
# [[1]]
# [1] "2010-01-01"
#
# [[2]]
# [1] "12:00:00"
That gives Date and character classed items, respectively. For a plain old character vector, just use format() twice.
c(format(x, "%m-%d-%Y"), format(x, "%T"))
# [1] "01-01-2010" "12:00:00"
or
c(as.character(as.Date(x)), format(x, "%T"))
# [1] "2010-01-01" "12:00:00"
I have a station-wise discharge data frame df. The date formats (I imported the data from an existing .csv file) are irregular. Below is an example data frame:
> df
Station Date Discharge
1 A 1981-01-01 0.1
2 A 1981-02-01 0.0
3 B 1981-03-01 0.0
4 B 1981-04-01 0.0
5 B 1/13/1981 0.4
6 C 1/14/1981 0.2
7 D 1/15/1981 0.6
8 D 1981-16-01 0.1
9 D 1981-17-01 0.5
Because of this, further processing of the data is difficult. I tried the following:
> df$Date <- as.Date(df$Date, "%m/%d/%Y")
> df
Station Date Discharge
1 A 1981-01-01 0.1
2 A 1981-02-01 0.0
3 B 1981-03-01 0.0
4 B 1981-04-01 0.0
5 B NA 0.4
6 C NA 0.2
7 D NA 0.6
8 D 1981-16-01 0.1
9 D 1981-17-01 0.5
NAs are being introduced. How can I make the format of all the dates the same? It would be nice to have the dates in d-m-y format. Any guidance is appreciated. Thanks.
You can first use lubridate::parse_date_time to get the data into a standard date-time class. Multiple candidate formats can be passed to the function.
lubridate::parse_date_time(df$Date, c('Ydm', 'mdY'))
#[1] "1981-01-01 UTC" "1981-01-02 UTC" "1981-01-03 UTC" "1981-01-04 UTC" "1981-01-13 UTC"
#[6] "1981-01-14 UTC" "1981-01-15 UTC" "1981-01-16 UTC" "1981-01-17 UTC"
Then use format to get data in any format you wish.
format(lubridate::parse_date_time(df$Date, c('Ydm', 'mdY')), '%d-%m-%Y')
#[1] "01-01-1981" "02-01-1981" "03-01-1981" "04-01-1981" "13-01-1981" "14-01-1981"
#[7] "15-01-1981" "16-01-1981" "17-01-1981"
Note that the output of format is of class character, not Date. A Date object in R always prints in the single format %Y-%m-%d, so to keep a true Date, convert with as.Date:
as.Date(lubridate::parse_date_time(df$Date, c('Ydm', 'mdY')))
#[1] "1981-01-01" "1981-01-02" "1981-01-03" "1981-01-04" "1981-01-13" "1981-01-14"
#[7] "1981-01-15" "1981-01-16" "1981-01-17"
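If you would rather not depend on lubridate, the same idea can be sketched in base R: parse with one format, then fill the entries that came back NA using the next format. This is only an illustration on a shortened copy of the Date column; the variable names are mine.

```r
# A few values from the question's Date column (year-day-month and month/day/year)
dates <- c("1981-01-01", "1981-02-01", "1/13/1981", "1981-16-01")

# First pass: year-day-month; strings that do not match become NA
parsed <- as.Date(dates, format = "%Y-%d-%m")

# Second pass: fill only the failures using month/day/year
failed <- is.na(parsed)
parsed[failed] <- as.Date(dates[failed], format = "%m/%d/%Y")

# Character output in d-m-y order, as requested in the question
format(parsed, "%d-%m-%Y")
# [1] "01-01-1981" "02-01-1981" "13-01-1981" "16-01-1981"
```

The order of the formats matters when a string could match more than one of them, so put the format you trust most first.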
I'm having trouble using the function as.POSIXct.
When I apply it to my dataset, the year appears with two leading zeros,
like this:
datu1$timestamp <- as.POSIXct(datu1$date.sec, origin = "1970-01-01", tz="GMT")
datu1$timestamp <- as.POSIXct(datu1$timestamp,
format = "%Y-%m-%d %H:%M:%S", tz = 'GMT')
head(datu1)
ID date.sec lon lat lon.025 lat.025 lon.5 lat.5 lon.975 lat.975
1 102211.10 -61827840000 -38.6616 -13.59272 -40.5025 -15.25025 -38.7 -13.76 -36.9000 -10.88950
2 102211.10 -61827818400 -38.6647 -13.60312 -40.4000 -15.17025 -38.7 -13.77 -37.0975 -11.03975
3 102211.10 -61827796800 -38.6723 -13.64505 -40.3000 -15.10000 -38.7 -13.79 -37.0000 -11.29950
4 102211.10 -61827775200 -38.6837 -13.68972 -40.2000 -14.98025 -38.7 -13.83 -37.2000 -11.45975
5 102211.10 -61827753600 -38.7030 -13.73054 -40.2000 -14.98100 -38.7 -13.84 -37.3000 -11.62925
6 102211.10 -61827732000 -38.7221 -13.77846 -40.0000 -15.04050 -38.7 -13.88 -37.5000 -11.69950
bmode bmode.5 timestamp
1 1.556 2 0010-10-03 00:00:00
2 1.565 2 0010-10-03 06:00:00
3 1.571 2 0010-10-03 12:00:00
4 1.571 2 0010-10-03 18:00:00
5 1.589 2 0010-10-04 00:00:00
6 1.599 2 0010-10-04 06:00:00
How can I fix this to get the full year (like: 2010) instead of two zeros?
Perhaps your data was encoded relative to an unusual origin (Excel, for example, uses "1899-12-30" for its day counts). Adjust the origin= argument until the result matches the dates you expect.
as.POSIXct(-61827840000, origin="1970-01-01", tz="GMT")
# [1] "0010-10-03 GMT"
as.POSIXct(-61827840000, origin="3970-01-01", tz="GMT")
# [1] "2010-10-03 GMT"
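An equivalent way to look at this, sketched below, is to keep the usual 1970 origin and add a fixed offset to the raw values. The offset of exactly 2000 years is my reading of the numbers shown in the question, not something the question states; the Gregorian calendar repeats every 400 years (146097 days), which makes the arithmetic exact.

```r
# Seconds in 2000 Gregorian years: five 400-year cycles of 146097 days each
offset_2000y <- 5 * 146097 * 86400

raw <- -61827840000  # first date.sec value from the question
shifted <- as.POSIXct(raw + offset_2000y, origin = "1970-01-01", tz = "GMT")
format(shifted, "%Y-%m-%d %H:%M:%S")
# [1] "2010-10-03 00:00:00"
```

If the shifted values line up with the dates you expect, the cleanest fix is usually to find out where the offset was introduced upstream.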
New R user here - I have many .csv files containing time stamps (date_time) in one column and temperature readings in the other. I am trying to write a function that detects the date_time format, and then changes it to a different format. Because of the way the data was collected, the date/time format is different in some of the .csv files. I want the function to change the date_time for all files to the same format.
Date_time format I want: %m/%d/%y %H:%M:%S
Date_time format I want to convert from: "%y-%m-%d %H:%M:%S"
> head(file1data)
x date_time temperature coupler_d coupler_a host_con stopped EoF
1 1 18-07-10 09:00:00 41.137 Logged
2 2 18-07-10 09:15:00 41.322
3 3 18-07-10 09:30:00 41.554
4 4 18-07-10 09:45:00 41.832
5 5 18-07-10 10:00:00 42.156
6 6 18-07-10 10:15:00 42.755
> head(file2data)
x date_time temperature coupler_d coupler_a host_con stopped EoF
1 1 07/10/18 01:00:00 PM 8.070 Logged
2 2 07/10/18 01:15:00 PM 8.095
3 3 07/10/18 01:30:00 PM 8.120
4 4 07/10/18 01:45:00 PM 8.120
5 5 07/10/18 02:00:00 PM 8.020
6 6 07/10/18 02:15:00 PM 7.795
file2data is in the correct format. file1data is incorrect.
I have tried using logicals to detect and replace the date format e.g.,
file1data %>%
if(str_match_all(date_time,"([0-9][0-9]{2})[-.])")){
format(as.POSIXct(date_time,format="%y-%m-%d %H:%M:%S"),"%m/%d/%y %H:%M:%S")
}else{format(date_time,"%m/%d/%y %H:%M:%S")}
but this has not worked, I get the following errors:
Error in if (.) str_match_all(date_time, "([0-9][0-9]{2})[-.])") else { :
argument is not interpretable as logical
In addition: Warning message:
In if (.) str_match_all(date_time, "([0-9][0-9]{2})[-.])") else { :
the condition has length > 1 and only the first element will be used
Any ideas?
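One possible approach, sketched here under assumptions: if() needs a single TRUE or FALSE, so a vectorised test such as grepl() is a better fit for a whole column. The helper name to_target_format is hypothetical, the time zone is fixed to "UTC" only to make the example reproducible, and the target format is the one stated in the question (the file2data sample appears to use a 12-hour clock with AM/PM, which this sketch leaves untouched).

```r
# Hypothetical helper: rewrite only the values that look like yy-mm-dd HH:MM:SS
to_target_format <- function(x) {
  x <- as.character(x)
  out <- x
  # TRUE for each element starting with two digits, dash, two digits, dash, two digits
  is_dash <- grepl("^[0-9]{2}-[0-9]{2}-[0-9]{2} ", x)
  parsed <- as.POSIXct(x[is_dash], format = "%y-%m-%d %H:%M:%S", tz = "UTC")
  out[is_dash] <- format(parsed, "%m/%d/%y %H:%M:%S")
  out
}

to_target_format(c("18-07-10 09:00:00", "07/10/18 01:00:00 PM"))
# [1] "07/10/18 09:00:00"    "07/10/18 01:00:00 PM"
```

It could then be applied per file, for example file1data$date_time <- to_target_format(file1data$date_time).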
I have a dataframe df with a certain number of columns. One of them, ts, is timestamps:
1462147403122 1462147412990 1462147388224 1462147415651 1462147397069 1462147392497
...
1463529545634 1463529558639 1463529556798 1463529558788 1463529564627 1463529557370.
I have also at my disposal the corresponding datetime in the datetime column:
"2016-05-02 02:03:23 CEST" "2016-05-02 02:03:32 CEST" "2016-05-02 02:03:08 CEST" "2016-05-02 02:03:35 CEST" "2016-05-02 02:03:17 CEST" "2016-05-02 02:03:12 CEST"
...
"2016-05-18 01:59:05 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:16 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:24 CEST" "2016-05-18 01:59:17 CEST"
As you can see, my dataframe contains data across several days. Let's say there are 3. I would like to add a column containing the number 1, 2 or 3: 1 if the row belongs to the first day, 2 for the second day, and so on.
Thank you very much in advance,
Clement
One way to do this is to keep track of total days elapsed each time the date changes, as demonstrated below.
# Fake data
library(lubridate)  # provides tz() and tz<-
dat = data.frame(datetime = c(seq(as.POSIXct("2016-05-02 01:03:11"),
as.POSIXct("2016-05-05 01:03:11"), length.out=6),
seq(as.POSIXct("2016-05-09 01:09:11"),
as.POSIXct("2016-05-16 02:03:11"), length.out=4)))
tz(dat$datetime) = "UTC"
Note, if your datetime column is not already in a datetime format, convert it to one using as.POSIXct.
Now, create a new column with the day number, counting the first day in the sequence as day 1.
dat$day = c(1, cumsum(as.numeric(diff(as.Date(dat$datetime, tz="UTC")))) + 1)
dat
datetime day
1 2016-05-02 01:03:11 1
2 2016-05-02 15:27:11 1
3 2016-05-03 05:51:11 2
4 2016-05-03 20:15:11 2
5 2016-05-04 10:39:11 3
6 2016-05-05 01:03:11 4
7 2016-05-09 01:09:11 8
8 2016-05-11 09:27:11 10
9 2016-05-13 17:45:11 12
10 2016-05-16 02:03:11 15
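The numbers above count calendar days elapsed, so gaps in the data produce gaps in the numbering. If what you want is a consecutive index (1, 2, 3, ... over the days that actually occur), a minimal base R sketch is to match each date against the sorted unique dates. The four timestamps below are made up for illustration.

```r
# Hypothetical timestamps spanning three distinct days with a gap
datetime <- as.POSIXct(c("2016-05-02 01:03:11", "2016-05-02 15:27:11",
                         "2016-05-03 05:51:11", "2016-05-09 01:09:11"),
                       tz = "UTC")
day_date <- as.Date(datetime, tz = "UTC")

# Consecutive index: position of each date among the distinct dates
day_index <- match(day_date, sort(unique(day_date)))
day_index
# [1] 1 1 2 3

# Elapsed-day numbering, equivalent to the cumsum(diff()) version for sorted data
day_elapsed <- as.integer(day_date - min(day_date)) + 1L
day_elapsed
# [1] 1 1 2 8
```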
I specified the timezone in the code above to avoid getting tripped up by potential silent shifts between my local timezone and UTC. For example, note the silent shift from my default local time zone ("America/Los_Angeles") to UTC when converting a POSIXct datetime to a date:
# Fake data
datetime = seq(as.POSIXct("2016-05-02 01:03:11"), as.POSIXct("2016-05-05 01:03:11"), length.out=6)
tz(datetime)
[1] ""
date = as.Date(datetime)
tz(date)
[1] "UTC"
data.frame(datetime, date)
datetime date
1 2016-05-02 01:03:11 2016-05-02
2 2016-05-02 15:27:11 2016-05-02
3 2016-05-03 05:51:11 2016-05-03
4 2016-05-03 20:15:11 2016-05-04 # Note day is different due to timezone shift
5 2016-05-04 10:39:11 2016-05-04
6 2016-05-05 01:03:11 2016-05-05
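To make the shift explicit, here is a small sketch that passes tz to as.Date directly. The zone name "America/Los_Angeles" is taken from the explanation above; the single timestamp is the fourth row of the fake data.

```r
# 20:15 in Los Angeles is already the next calendar day in UTC
x <- as.POSIXct("2016-05-03 20:15:11", tz = "America/Los_Angeles")

as.Date(x, tz = "UTC")
# [1] "2016-05-04"
as.Date(x, tz = "America/Los_Angeles")
# [1] "2016-05-03"
```

Passing tz explicitly in every as.Date call on a POSIXct value removes the dependence on defaults.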
I have data with some dates in MM/DD/YY HH:MM format and others in plain old MM/DD/YY format. I want to parse all of them into the same format, such as "2010-12-01 12:12 EST". How should I go about doing that? I tried the following ifelse statement, but it gave me a bunch of long integers and warned that a large number of my data points failed to parse:
df_prime$date <- ifelse(!is.na(mdy_hm(df$date)), mdy_hm(df$date), mdy(df$date))
df_prime is a duplicate of the data frame df that I initially loaded in
IEN date admission_number KEY_PTF_45 admission_from discharge_to
1 12 3/3/07 18:05 1 252186 OTHER DIRECT
2 12 3/9/07 12:10 1 252186 RETURN TO COMMUNITY- INDEPENDENT
3 12 3/10/07 15:08 2 252382 OUTPATIENT TREATMENT
4 12 3/14/07 10:26 2 252382 RETURN TO COMMUNITY-INDEPENDENT
5 12 4/24/07 19:45 3 254343 OTHER DIRECT
6 12 4/28/07 11:45 3 254343 RETURN TO COMMUNITY-INDEPENDENT
...
1046334 23613488506 2/25/14 NA NA
1046335 23613488506 2/25/14 11:27 NA NA
1046336 23613488506 2/28/14 NA NA
1046337 23613488506 3/4/14 NA NA
1046338 23613488506 3/10/14 11:30 NA NA
1046339 23613488506 3/10/14 12:32 NA NA
Sorry if some of the formatting isn't right, but the date column is the most important one.
EDIT: Below is some code for a portion of my data frame via a dput command:
structure(list(IEN = c(23613488506, 23613488506, 23613488506, 23613488506, 23613488506, 23613488506), date = c("2/25/14", "2/25/14 11:27", "2/28/14", "3/4/14", "3/10/14 11:30", "3/10/14 12:32")), .Names = c("IEN", "date"), row.names = 1046334:1046339, class = "data.frame")
Have you tried the function guess_formats() in the lubridate package?
A reproducible example to build a dataframe like yours could be helpful!
The lubridate package's mdy_hm has a truncated parameter that lets you supply dates that might not have all the bits. For your example (with d being the data frame from your dput output):
> mdy_hm(d$date,truncated=2)
[1] "2014-02-25 00:00:00 UTC" "2014-02-25 11:27:00 UTC"
[3] "2014-02-28 00:00:00 UTC" "2014-03-04 00:00:00 UTC"
[5] "2014-03-10 11:30:00 UTC" "2014-03-10 12:32:00 UTC"
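As for the long integers in the original attempt: ifelse() drops the class of its result, so date-times come back as bare numbers of seconds. A base R sketch that avoids ifelse() is to parse with the longer format first and then fill the failures with the date-only format. The time zone "UTC" and the variable names are assumptions for illustration.

```r
# Three values from the dput output in the question
x <- c("2/25/14", "2/25/14 11:27", "3/10/14 12:32")

# First pass: date plus time; date-only strings come back as NA
parsed <- as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "UTC")

# Second pass: fill the failures with the date-only format (midnight)
date_only <- is.na(parsed)
parsed[date_only] <- as.POSIXct(x[date_only], format = "%m/%d/%y", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M")
# [1] "2014-02-25 00:00" "2014-02-25 11:27" "2014-03-10 12:32"
```

Assigning into parsed with [<- keeps the POSIXct class, which is what ifelse() loses.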