I have a column of time values, except that they are in character format and do not have the colons to separate H, M, S. The column looks similar to the following:
Time
024201
054722
213024
205022
205024
125440
I want to convert all the values in the column to look like actual time values in the format H:M:S. The values are already in HMS format, so it is simply a matter of inserting colons, but that is proving more difficult than I thought. I found a package that adds commas every three digits from the right to make Strings look like currency values, but nothing for time (without also adding a date value, which I do not want to do). Any help would be appreciated.
Since the data is time related, you should consider storing it in a POSIX format:
> df <- data.frame(Time=c("024201", "054722", "213024", "205022", "205024", "125440"))
> df$Time <- as.POSIXct(df$Time, format="%H%M%S")
> df
Time
1 2014-01-05 02:42:01
2 2014-01-05 05:47:22
3 2014-01-05 21:30:24
4 2014-01-05 20:50:22
5 2014-01-05 20:50:24
6 2014-01-05 12:54:40
To output just the times:
> format(df, "%H:%M:%S")
Time
1 02:42:01
2 05:47:22
3 21:30:24
4 20:50:22
5 20:50:24
6 12:54:40
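One practical reason to keep the values as POSIXct rather than as text is that arithmetic works on them. A minimal sketch, assuming the same six-digit strings as above; the tz = "UTC" argument and the variable names are my additions, chosen so that daylight-saving changes cannot affect the result:

```r
# Parse two of the sample values; the date part defaults to the current date
tm <- as.POSIXct(c("024201", "054722"), format = "%H%M%S", tz = "UTC")

# Elapsed time between the two clock readings, in seconds
elapsed <- as.numeric(difftime(tm[2], tm[1], units = "secs"))
elapsed

# Back to display strings without the date
shown <- format(tm, "%H:%M:%S")
shown
```

Note that format() returns character strings, so do the arithmetic before formatting.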
A regular expression with lookaround works for this:
gsub('(..)(?=.)', '\\1:', x$Time, perl=TRUE)
The (?=.) means a character (matched by .) must follow, but is not considered part of the match (and is not captured).
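To see why the lookahead matters, it may help to compare the same replacement with and without it. This is only an illustration on two of the sample values; the variable names are made up:

```r
x <- c("024201", "125440")

# Without the lookahead every pair gets a colon, including the last one
with_trailing <- gsub("(..)", "\\1:", x)
with_trailing

# With (?=.) a pair only matches when another character follows it,
# so the final pair is left alone
no_trailing <- gsub("(..)(?=.)", "\\1:", x, perl = TRUE)
no_trailing
```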
Here is a regex solution:
x <- readLines(n=6)
024201
054722
213024
205022
205024
125440
gsub("(\\d\\d)(\\d\\d)(\\d\\d)", "\\1:\\2:\\3", x)
## [1] "02:42:01" "05:47:22" "21:30:24"
## [4] "20:50:22" "20:50:24" "12:54:40"
Here each (\\d\\d) matches two digits. The parentheses split the string into three captured groups, and in the replacement \\1, \\2 and \\3 refer back to those groups, so "\\1:\\2:\\3" writes the three chunks with a colon between them.
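If the column might contain values that are not exactly six digits, a slightly stricter variant could anchor the pattern and return NA for anything that does not fit. This is a sketch under that assumption; the extra test values and the names pattern and out are hypothetical:

```r
x <- c("024201", "125440", "1254", "abc123")

# ^ and $ force the whole string to be exactly three pairs of digits
pattern <- "^(\\d{2})(\\d{2})(\\d{2})$"

# Reformat the strings that match; mark everything else as missing
out <- ifelse(grepl(pattern, x), sub(pattern, "\\1:\\2:\\3", x), NA_character_)
out
```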
Or via date/times classes:
time <- c("024201", "054722", "213024", "205022", "205024", "125440")
time <- as.POSIXct(paste("1970-01-01", time), format="%Y-%m-%d %H%M%S")
(time <- format(time, "%H:%M:%S"))
# [1] "02:42:01" "05:47:22" "21:30:24" "20:50:22" "20:50:24" "12:54:40"
This gives a chron "times" class vector:
> library(chron)
> times(gsub("(..)(..)(..)", "\\1:\\2:\\3", DF$Time))
[1] 02:42:01 05:47:22 21:30:24 20:50:22 20:50:24 12:54:40
The "times" class can display times without having to display the date and supports various methods on the times.
On the other hand, if only a character string is wanted then only the gsub part is needed.
Related
I would like to convert all of these mixed date formats into a single format, for example d-m-y.
Here is the data frame:
x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
I have tried the code below, but it gives NAs:
newdateformat <- as.Date(x$Birthdate,
format = "%m%d%y", origin = "2020-6-25")
newdateformat
Then I tried parse_date_time from lubridate, but it also gives NAs for some values, which means they failed to parse:
require(lubridate)
parse_date_time(my_data$Birthdate, orders = c("ymd", "mdy"))
[1] NA NA "2001-09-12 UTC" NA
[5] "2005-02-18 UTC"
I also could not find out what format the first date in the data frame, "36085.0", is in.
I did find this code, but I still don't understand what the number means or what "origin" means:
dates <- c(30829, 38540)
betterDates <- as.Date(dates,
origin = "1899-12-30")
P.S. I'm quite new to R, so I would appreciate a simple explanation. Thank you!
You should parse each format separately. For each format, select the relevant rows with a regular expression and transform only those rows, then move on to the next format. I'll give the answer with data.table instead of data.frame because I've forgotten how to use data.frame.
library(lubridate)
library(data.table)
x = data.table("Name" = c("A","B","C","D","E"),
"Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
# or use setDT(x) to convert an existing data.frame to a data.table
# handle dates like "2001-sep-12" and "2020-6-25"
# this regex matches strings beginning with four numbers and then a dash
x[grepl('^[0-9]{4}-',Birthdate),Birthdate1:=ymd(Birthdate)]
# handle dates like "36085.0": days since 1904 (or 1900)
# see https://learn.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system
# this regex matches strings that only have numeric characters and .
x[grepl('^[0-9\\.]+$',Birthdate),Birthdate1:=as.Date(as.numeric(Birthdate),origin='1904-01-01')]
# assume the rest are like "Feb-18-2005" and "05/27/84" and handle those
x[is.na(Birthdate1),Birthdate1:=mdy(Birthdate)]
# result
> x
Name Birthdate Birthdate1
1: A 36085.0 2002-10-18
2: B 2001-sep-12 2001-09-12
3: C Feb-18-2005 2005-02-18
4: D 05/27/84 1984-05-27
5: E 2020-6-25 2020-06-25
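On the question of what the number and "origin" mean: a value like 36085 is a count of days, and the origin is the date that counts as day zero. Which origin is right depends on where the number came from, so the result differs by about four years between the two Excel date systems. A small sketch using base R only; 1899-12-30 is the origin from the snippet in the question and 1904-01-01 is the one used in the answer above:

```r
serial <- as.numeric("36085.0")

# Day zero = 1899-12-30 (the origin used in the question's snippet)
d1900 <- as.Date(serial, origin = "1899-12-30")

# Day zero = 1904-01-01 (the origin used in the answer above)
d1904 <- as.Date(serial, origin = "1904-01-01")

format(d1900)
format(d1904)
```

If you know which program wrote the file, check which date system it used before picking an origin.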
How could I modify raw date values? For example:
> DF2
Date
1 11012018
2 7312014
3 6102015
4 10202017
into modified date values, the ones with "/":
> DF2
Date
1 11/01/2018
2 7/31/2014
3 6/10/2015
4 10/20/2017
Use lubridate for all date and time related tasks
> lubridate::mdy(c("11012018", "7/31/2014"))
[1] "2018-11-01" "2014-07-31"
You can also format it if needed:
format(lubridate::mdy(c("11012018", "7/31/2014")), "%m/%d/%Y")
[1] "11/01/2018" "07/31/2014"
This assumes your dates are in month-day-year order; otherwise use the matching lubridate function (for example dmy or ymd).
We could also use the following (assuming you just need to insert a separator; you can still convert to a date-time type afterwards):
new<-gsub("([0-9]{,2})([0-9]{2})([0-9]{4})","\\1 \\2 \\3",df$Date)
gsub(" ","/",new)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
Edit:
More generally, as suggested by @jay.sf:
test4<-gsub("(^[0-1]?\\d)([0-3]?\\d)(\\d{4}$)","\\1 \\2 \\3",df$Date)
gsub(" ","/",test4)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
This is to account for such date formats as:
test3<-c("11012018", "1112015", "7312014", "7312014", "10202017", "772007", "772007",
"7072007")
One possible solution could be:
df <- transform(df, V1 = as.Date(as.character(V1), "%d%m%Y"))
Another, which reads the values in the month-day-year order the question uses, is:
df <- data.frame(lapply(df, function(x) as.Date(as.character(x), "%m%d%Y")))
Both solutions use only base R.
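One caveat with the base R route: values such as 7312014 have only seven digits, and "%m" expects a two-digit month when there are no separators. A possible workaround is to left-pad to eight digits first. This is a sketch that assumes the day always has two digits and the values fit in an integer; the names raw, padded and parsed are made up:

```r
raw <- c("11012018", "7312014", "6102015", "10202017")

# Left-pad with zeros so every value is exactly 8 digits (mmddyyyy)
padded <- sprintf("%08d", as.integer(raw))

# Parse as month-day-year, then print with slashes
parsed <- as.Date(padded, format = "%m%d%Y")
slashed <- format(parsed, "%m/%d/%Y")
slashed
```

The months come back zero-padded (07/31/2014 rather than 7/31/2014), which differs slightly from the output shown in the question.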
I have a data frame named date that looks like this:
Date Time
1 2017-05-19 08:52:21
2
3 2017-05-20 22:29:29
4 2017-05-20 15:21:35
Both date$Date and date$Time are integers.
I would like to obtain a new column like this:
Date Time
1 20170519 085221
2 NA NA
3 20170520 222929
4 20170520 152135
I've tried as.character, as.numeric, as.Date... but can't find the solution.
Sorry if this question was already answered in another post, but I wasn't able to find it!
You need format...
format(as.POSIXct("2017-05-19"),"%Y%m%d")
[1] "20170519"
format(as.POSIXct("08:52:21",format="%H:%M:%S"),"%H%M%S")
[1] "085221"
See ?strptime for the formatting codes.
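Applied to a whole data frame with a blank row, the same idea might look like the sketch below. It assumes the columns hold character strings (convert with as.character first if they are factors); the data frame d and the helper column stamp are hypothetical names:

```r
d <- data.frame(Date = c("2017-05-19", "", "2017-05-20", "2017-05-20"),
                Time = c("08:52:21", "", "22:29:29", "15:21:35"),
                stringsAsFactors = FALSE)

# Combine both columns into one date-time; the blank row fails to parse and becomes NA
stamp <- as.POSIXct(paste(d$Date, d$Time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Write each part back out without separators
d$Date <- format(stamp, "%Y%m%d")
d$Time <- format(stamp, "%H%M%S")
d
```

The results are character strings, so the leading zero in 085221 is kept.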
Since you apparently don't necessarily want date or time class objects (do you?), and since you don't further specify what exactly you need this for, there seems no need to work with date or time functions.
You could try this:
Step 1: First, if you want empty cells to contain NA, fill those in per column
df$Date[df$Date == ""] <- NA
df$Time[df$Time == ""] <- NA
Step 2: And then simply replace the "-" and ":" in the Date and Time values, respectively, to get the wanted strings
df$Date <- gsub(pattern = "-", x = df$Date, replacement = "")
df$Time <- gsub(pattern = ":", x = df$Time, replacement = "")
Date Time
1 20170519 85221
2 <NA> <NA>
3 20170520 222929
4 20170520 152135
The output might not yield integer classes (my starting df resembling your df did not contain integers, so can't double check; result here were character classes), so if you really want integer classes, simply apply as.integer().
As you can see, the output matches your expected output except for the leading "0" of the row 1 Time value. If need be, there's a workaround to get that in there, although I'm not sure what that would add, and after applying as.integer it would most likely disappear anyway.
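For the leading zero, one possible workaround is sprintf with a zero-padded width. This sketch assumes the times are stored as integers, as described in the question; the variable names are made up, and NA is handled separately because sprintf would otherwise turn it into the text "NA":

```r
time_int <- c(85221L, NA, 222929L, 152135L)

# Pad to six digits; keep missing values missing
time_chr <- ifelse(is.na(time_int), NA_character_, sprintf("%06d", time_int))
time_chr
```

The padded values have to stay character, since converting back with as.integer drops the zero again.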
I am experimenting with R and came across an issue I don't fully understand.
dates = c("03-19-76", "04/19/76", as.character("04\19\76"), "05.19.76", "060766")
dates
[1] "03-19-76" "04/19/76" "04\0019>" "05.19.76" "060766"
Why is the third date being interpreted, and what sort of interpretation is taking place? I also got this output when I left out the as.character function.
Thanks
Echoing the comments, make sure to escape backslashes in strings.
dates = c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")
> dates
[1] "03-19-76" "04/19/76" "04\\19\\76" "05.19.76" "060766"
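As for what the interpretation actually is: inside a quoted string, a backslash followed by digits is read as an octal character code, so "\1" becomes the control character with code 1 and "\76" becomes ">". A short sketch to check this; the raw-string line assumes R 4.0.0 or later, and the variable names are made up:

```r
bad  <- "04\19\76"    # \1 and \76 are read as octal escapes
good <- "04\\19\\76"  # doubled backslashes give literal backslashes

n_bad  <- nchar(bad)   # "0", "4", control char, "9", ">"
n_good <- nchar(good)  # the eight characters 04\19\76

# Octal 76 is the character ">"
is_gt <- identical("\76", ">")

# Raw strings (R >= 4.0.0) avoid the doubling
raw_str <- r"(04\19\76)"
same <- identical(raw_str, good)

# print() shows the escaped form; cat() shows the actual characters
cat(good, "\n")
```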
Now that you've got the dates stored, there's actually a lot of built in functions you can use with dates. Dates even have their own object types! To do so use as.Date. Since you're using nonstandard date formats, you have to tell R how you've formatted them.
> as.Date(dates[1], "%m-%d-%y")
[1] "1976-03-19"
> as.Date(dates[2], "%m/%d/%y")
[1] "1976-04-19"
> as.Date("20\\10\\1999", "%d\\%m\\%Y")
[1] "1999-10-20"
a <- as.Date(dates[1], "%m-%d-%y")
b <- as.Date(dates[2], "%m/%d/%y")
> b - a
Time difference of 31 days
d <- as.numeric(b-a)
> d
[1] 31
> a + d^2
[1] "1978-11-05"
Note that since you're using 2-digit years, you use %y. If you used 4-digit years, you'd use %Y. If you forget, you'll get oddities like this:
> as.Date("03/14/2001", "%m/%d/%y")
[1] "2020-03-14"
> as.Date("03/14/10", "%m/%d/%Y")
[1] "0010-03-14"
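One more thing to watch with %y: R has to guess the century. According to ?strptime, two-digit years 00 to 68 are read as 20xx and 69 to 99 as 19xx, which is why "76" above came out as 1976. A quick check at the boundary:

```r
# 68 falls on the 20xx side of the cutoff, 69 on the 19xx side
pivot <- as.Date(c("01/01/68", "01/01/69"), "%m/%d/%y")
format(pivot)
```

If your data includes birthdates before 1969, four-digit years are the safer choice.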
I have a dataframe in R, and a column called created_at which holds a text which I want to parse into a datetime. Here is a snappy preview:
head(pushes)
created_at repo.url repository.url
1 2013-06-17T00:14:04Z https://github.com/Mindful/blog
2 2013-07-31T21:08:15Z https://github.com/leapmotion/js.leapmotion.com
3 2012-11-04T07:08:15Z https://github.com/jplusui/jplusui
4 2012-06-21T08:16:22Z https://github.com/LStuker/puppet-rbenv
5 2013-03-10T09:15:51Z https://github.com/Fchaubard/CS108FinalProject
6 2013-10-04T11:34:11Z https://github.com/cmmurray/soccer
actor.login payload.actor actor_attributes.login
1 Mindful
2 joshbuddy
3 xuld
4 LStuker
5 ststanko
6 cmmurray
I wrote an instruction that works fine with some test data:
xts::.parseISO8601("2012-06-17T00:14:04",tz="UTC")$first.time returns a proper POSIXct date
But when I apply it to the whole column with this instruction:
pushes$created_at <- xts::.parseISO8601(substr(pushes$created_at,1,nchar(pushes$created_at)-1),tz="UTC")$first.time
every row in the data frame gets the same date, 2012-06-17 00:14:04 UTC,
as if the function ran only once for the first row and the result was then duplicated in the rest of the rows :( Can you please help me apply it properly, row by row, to the created_at column?
Thanks.
The first argument to .parseISO8601 is supposed to be a character string, not a vector. You need to use sapply (or equivalent) to loop over your vector.
library(xts)
created_at <-
c("2013-06-17T00:14:04Z", "2013-07-31T21:08:15Z", "2012-11-04T07:08:15Z",
"2012-06-21T08:16:22Z", "2013-03-10T09:15:51Z", "2013-10-04T11:34:11Z")
# Only parses first element
.parseISO8601(substr(created_at,1,nchar(created_at)-1),tz="UTC")$first.time
# [1] "2013-06-17 00:14:04 UTC"
firstParseISO8601 <- function(x) .parseISO8601(x,tz="UTC")$first.time
# parse all elements
datetimes <- sapply(sub("Z$","",created_at), firstParseISO8601, USE.NAMES=FALSE)
# note that "simplifying" the output strips the POSIXct class, so we re-add it
datetimes <- .POSIXct(datetimes, tz="UTC")
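If every timestamp has this exact shape, base R can also parse the whole vector in one call, which avoids the loop and the class stripping. This swaps in as.POSIXct in place of .parseISO8601, so it is an alternative rather than a fix to the xts call; the literal T and Z in the format string are matched as plain characters:

```r
created_at <- c("2013-06-17T00:14:04Z", "2013-07-31T21:08:15Z", "2012-11-04T07:08:15Z")

# as.POSIXct is vectorised: one call parses every element
datetimes <- as.POSIXct(created_at, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

class(datetimes)
format(datetimes, "%Y-%m-%d %H:%M:%S")
```

Anything that does not match the format comes back as NA, so check for missing values after converting a real column.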