Extract time from factor column in R

I would like to extract the time from a table column sd_data$start in R with the following characteristics:
str(sd_data$start)
Factor w/ 122 levels "01/03/2017 08:00",..: 1 2 5 10 12 14 18 19 20 21 ...
I found similar questions on the forum, but so far all the answers have only given me NAs or blank values (00:00:00), so I see no other option than to raise the question again specifically for my dataset.
I have managed to extract the dates and move them to a new column in the table with little effort and I am very surprised how difficult it is (for me at least) to do the same for hours, minutes and seconds. I must be overlooking something.
sd_data$start_date <- as.Date(sd_data$start,format='%d/%m/%Y')
sd_data$start_time <-
Thanks in advance for helping me to find the right lines of code to complete this task.
Here is an example of what I am trying to do and where I am failing to get the time out.
smpldata <- "01/03/2017 08:00"
smpltime <-as.Date(as.character(smpldata),format='%d/%m/%Y %M:%S')
smpltime
# [1] 08:00 = what I would like to see
# [1] "2017-03-01" = what I am seeing

Try converting the factor to character with as.character() before converting to a date, because a factor is not parsed reliably. Also include the remaining elements of the string in the format, as suggested above by Sotos.
# the data has no seconds, so the format stops at %M
sd_data$start_date <-
as.Date(as.character(sd_data$start),
format='%d/%m/%Y %H:%M')
Another tip is to take a look at the lubridate package. It's very useful for this kind of task.
library(lubridate)
smpldata <- as.factor("01/03/2017 08:00")
(smpltime <-dmy_hm(as.character(smpldata)))
[1] "2017-03-01 08:00:00 UTC"
Here you still see the date. You can handle just the time for plots and other needs using hour() and minute().
hour(smpltime)
[1] 8
minute(smpltime)
[1] 0
Or you can use the format() function to get exactly what you want.
format(smpltime, "%H:%M:%S")
[1] "08:00:00"
format(smpltime, "%H:%M")
[1] "08:00"
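For comparison, the same result can be reached in base R without lubridate. This is only a minimal sketch, assuming every value follows the day/month/year hour:minute layout shown in the question. The vector start is a hypothetical stand-in for sd_data$start, and tz = "UTC" is an assumption made to keep the result independent of the local time zone.

```r
# Hypothetical stand-in for sd_data$start (a factor, as in the question)
start <- factor(c("01/03/2017 08:00", "02/03/2017 14:30"))

# Convert the factor to character first, then parse date and time together
parsed <- as.POSIXct(as.character(start), format = "%d/%m/%Y %H:%M", tz = "UTC")

# Split into a Date column and a time-of-day string
start_date <- as.Date(parsed)          # Date class
start_time <- format(parsed, "%H:%M")  # character, e.g. "08:00"
```

Note that start_time is a character vector; base R has no time-of-day class, so keep the POSIXct column around if you need to do arithmetic.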

Related

Recode "date & time variable" into two separate variables

I'm a PhD student (not that experienced in R), and I'm trying to recode a string variable, called RecordedDate into two separate variables: a Date variable and a Time variable. I am using RStudio.
An example of values are:
8/6/2018 18:56
7/26/2018 10:43
7/28/2018 8:36
I would like to use the first part of the value (example: 8/6/2018) to reformat this into a date variable, and the second part of the value (example: 18:56) into a time variable.
I'm thinking the first step would be to create code that can break this up into two variables, based on some rule. I’m thinking maybe I can separate everything before the "space" into the Date variable, and everything after the "space" into the Time variable. I am not able to figure this out.
Then, I'm looking for code that would change the Date from a "string" variable to a "date" type variable. I’m not sure if this is correct, but I’m thinking something like:
better_date <- as.Date(Date, "%m/%d/%Y")
Finally, I would like to change the Time variable to a "time" type format (if this exists). I'm not sure how to do this part either, but something that indicates hours and minutes. This part is less important than getting the date variable.
Two immediate ways:
1. strsplit() on the white space
2. The proper way: parse, and then format back out.
Only 2. will guarantee you do not end up with hour 27 or minute 83 ...
Examples:
R> data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
R> strsplit(data, " ")
[[1]]
[1] "8/6/2018" "18:56"
[[2]]
[1] "7/26/2018" "10:43"
[[3]]
[1] "7/28/2018" "8:36"
R>
And:
R> data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
R> df <- data.frame(data)
R> df$pt <- anytime::anytime(df$data) ## anytime package used
R> df$time <- format(df$pt, "%H:%M")
R> df$day <- format(df$pt, "%Y-%m-%d")
R> df
             data                  pt  time        day
1  8/6/2018 18:56 2018-08-06 18:56:00 18:56 2018-08-06
2 7/26/2018 10:43 2018-07-26 10:43:00 10:43 2018-07-26
3  7/28/2018 8:36 2018-07-28 00:00:00 00:00 2018-07-28
R>
I often collect data in a data.frame (or data.table) and then add column by column.
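The parse-then-format route can also be written in base R with an explicit format string instead of automatic format detection. The sketch below assumes the month/day/year hour:minute layout from the question and uses tz = "UTC" as an arbitrary choice. With the format spelled out, the single-digit hour in "7/28/2018 8:36" parses as 08:36.

```r
data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
df <- data.frame(data, stringsAsFactors = FALSE)

# Parse once with an explicit format; leading zeros are optional on input
df$pt <- as.POSIXct(df$data, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Format the parsed value back out into separate columns
df$time <- format(df$pt, "%H:%M")  # character
df$day  <- as.Date(df$pt)          # Date class
```

Any row that does not match the format comes back as NA, so anyNA(df$pt) is a quick sanity check after parsing.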

How to convert a date with only a year to a date with the format "Year-Month-Day" in R

Sorry for the question, I started using RStudio a month ago and I keep running into things I've never learned. I have checked every website, help page and forum I could find over the past two days and this is driving me crazy.
I got a variable called Release giving the date of the release of a song. Some dates are following the format %Y-%m-%d whereas some others only give me a Year.
I'd like them to be all the same but I'm struggling to only modify the observations with the year.
Brief summary in word:
11/11/2011
01/06/2011
1974
1970
16/09/2003
I've imported the data with :
music<-read.csv("music2.csv", header=TRUE, sep = ",", encoding = "UTF-8",stringsAsFactors = F)
And this how I have it in RStudio
"2011-11-11" "2011-06-01" "1974" "1970" "2003-09-16"
This is an example as I got 2200 obs.
The working code is
Modifdates<- ifelse(nchar(music$Release)==4,paste0("01-01-",music$Release),music$Release)
Modifdates
I obtain this :
"2011-11-11" "2011-06-01" "01-01-1974" "01-01-1970" "2003-09-16"
I just would like them to be all with the same format "%Y-%m-%d". How can I do that?
So I tried this
as.Date(music$Release,format="%Y-%m-%d")
But I got NA's where I modified my dates.
Could anyone help?
Update
Use sub() to find the entries that consist of a single year (the "(^[0-9]{4}$)" part), use a back-reference to append -01-01 to them (the "\\1-01-01" part), and finally convert the result to the Date class with as.Date() (its default format is "%Y-%m-%d", so you don't need to specify it):
dat <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")
If dat is a character vector:
as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))
# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
If dat is a factor, sub() automatically coerces it to character for you:
# dat <- as.factor(dat); dat
# 2011-11-11 2011-06-01 1974 1970 2003-09-16
# Levels: 1970 1974 2003-09-16 2011-06-01 2011-11-11
as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))
# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
Welcome to SO, please try to provide a reproducible example next time so that we can best help you.
I think here you could use:
testdates <- c("1974", "12-12-2012")
betterdates <- ifelse(nchar(testdates)==4,paste0("01-01-",testdates),testdates)
> betterdates
[1] "01-01-1974" "12-12-2012"
EDIT: if your vector is a factor, you should convert it with as.character() first. If you then want to convert back to a factor, you can use as.factor().
EDIT2: do not call as.Date() before doing this. Only do it after this modification.
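The ifelse() approach also works on the question's data if the padding goes after the year instead of before it. In the question, "01-01-1974" is day-first and so does not match "%Y-%m-%d", which is why as.Date() returned NA for those rows. A minimal sketch, assuming every entry is either an ISO date or a bare four-digit year (release is a hypothetical stand-in for music$Release):

```r
release <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")

# Append "-01-01" to bare years so every string is year-month-day
padded <- ifelse(nchar(release) == 4, paste0(release, "-01-01"), release)

# Now a single format covers every entry
release_date <- as.Date(padded, format = "%Y-%m-%d")
```

Choosing 1 January for year-only entries is an arbitrary convention; a separate flag column would preserve the fact that the original value was only a year.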

Formatting Dates with non-standard format

I'm relatively new to this site so forgive me if my question is a bit vague for you guys. I also realize there are many threads on this topic, yet I feel they do not answer my question specifically since they are almost all about changing yy/mm/dd to dd/mm/yy or vice versa.
In short, what do I want? I want my current format changed into the year only.
I have a column full of dates of this format.
31OCT2016:23:52:00.000
I've seen in many topics you can use format commands but they go something like this;
dates <- c("05/27/84", "07/07/05")
I have over 100.000 observations so this can't be done manually.
So I tried;
mydata$dates <- format(as.Date(mydata$dates), "%Y")
But that didn't work. I saw on this website the proper values
http://www.statmethods.net/input/dates.html
But it did not say anything on how to get rid of hours minutes and seconds.
So what is the easiest way to strip it all down to year only?
Lubridate is your friend. To be precise, the function dmy_hms:
I'll generate some sample data which has the same format as your example so my code is reproducible. Don't worry about it too much. For your purposes, you can jump right to the conversion part.
#------------------------------------------------------------------------------------------
#This code block is entirely for generating reproducible sample data
d <- sample(1:27,10,T)
mon <- toupper(sample(month.abb,10,T))
y <- sample(2000:2017,10,T)
h <- sample(0:23,10,T)
min <- sample(0:59,10,T)
s <- sample(0:59,10,T)
#load package
library(lubridate)
dts <- sprintf('%02d%s%s:%s:%s:%s.000',d,mon,y,h,min,s)
> dts
[1] "01JAN2012:12:6:53.000" "01NOV2010:0:19:47.000" "03SEP2000:9:45:3.000" "25NOV2009:21:39:57.000" "08DEC2015:19:27:36.000"
[6] "23MAR2009:13:39:40.000" "03JUN2010:14:54:50.000" "03APR2002:6:34:45.000" "19NOV2012:5:17:29.000" "02FEB2003:0:3:59.000"
#------------------------------------------------------------------------------------------
So basically the variable dts is your column full of dates which you want to convert:
#conversion
> dmy_hms(dts)
[1] "2012-01-01 12:06:53 UTC" "2010-11-01 00:19:47 UTC" "2000-09-03 09:45:03 UTC" "2009-11-25 21:39:57 UTC"
[5] "2015-12-08 19:27:36 UTC" "2009-03-23 13:39:40 UTC" "2010-06-03 14:54:50 UTC" "2002-04-03 06:34:45 UTC"
[9] "2012-11-19 05:17:29 UTC" "2003-02-02 00:03:59 UTC"
And then to get just the years, you can use the year function:
> year(dmy_hms(dts))
[1] 2012 2010 2000 2009 2015 2009 2010 2002 2012 2003
So assuming you want to do everything inside the data.frame, your code could look like this:
# example dataframe
dframe <- data.frame(variable=c('A','B','C'),dates=sample(dts,3))
This is a data frame with some variable and the column with the dates.
> dframe
  variable                 dates
1        A  15JAN2000:0:37:6.000
2        B 13DEC2016:8:34:28.000
3        C 18AUG2005:2:27:16.000
So to convert the dates, we can simply do dframe$dates <- year(dmy_hms(dframe$dates))
If we look at dframe again, we can see that the conversion was successful:
> dframe
  variable dates
1        A  2000
2        B  2016
3        C  2005
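If only the year is needed, it can also be pulled straight out of the string without parsing the full timestamp. This is a sketch of a different technique (a regular expression rather than lubridate), assuming every value starts with a two-digit day, a three-letter month and a four-digit year, as in "31OCT2016:23:52:00.000". Unlike a real parse, it does not validate the date.

```r
dates <- c("31OCT2016:23:52:00.000", "15JAN2000:0:37:6.000")

# Capture the four digits that follow the day and month abbreviation
years <- as.integer(sub("^[0-9]{2}[A-Za-z]{3}([0-9]{4}).*$", "\\1", dates))
```

Strings that do not match the pattern are returned unchanged by sub() and then become NA (with a warning) in as.integer(), which makes malformed rows easy to spot.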

How to convert times over 24:00:00 in R

In R I have this data.frame
24:43:30 23:16:02 14:05:44 11:44:30 ...
Note that some of the times are over 24:00:00 ! In fact all my times are within 02:00:00 to 25:59:59.
I want to subtract 2 hours from every entry in my dataset. That way I get a regular dataset with all times below 24:00:00. How can I do this?
I tried this
strptime(data, format="%H:%M:%S") - 2*60*60
and this works for all entries up to 23:59:59. For all entries above that I simply get NA, since strptime() produces NA for any time above 23:59:59.
Using lubridate package can make the job easier!
> library(lubridate)
> t <- '24:43:30'
> hms(t) - hms('2:0:0')
[1] "22H 43M 30S"
Update:
Converting the date back to text!
> substr(strptime(hms(t) - hms('2:0:0'),format='%HH %MM %SS'),12,20)
[1] "22:43:30"
Adding @RHertel's update:
format(strptime(hms(t) - hms('2:0:0'),format='%HH %MM %SS'),format='%H:%M:%S')
A better way of formatting the lubridate object:
s <- hms('02:23:58') - hms('2:0:0')
paste(hour(s),minute(s),second(s),sep=":")
"0:23:58"
Although the answer by @amrrs solves the main problem, the formatting could remain an issue because hms() does not provide a uniform output. This is best shown with an example:
library(lubridate)
hms("01:23:45")
#[1] "1H 23M 45S"
hms("00:23:45")
#[1] "23M 45S"
hms("00:00:45")
#[1] "45S"
Depending on the time passed to hms() the output may or may not contain an entry for the hours and for the minutes. Moreover leading zeros are omitted in single-digit values of hours, minutes and seconds. This can result pretty much in a formatting nightmare if one tries to put that data into a common form.
To resolve this difficulty one could first convert the time into a duration with lubridate's as.duration() function. Then, the duration in seconds can be transformed into a POSIXct object from which the hours, minutes, and seconds can be extracted easily with format():
times <- c("24:43:30", "23:16:02", "14:05:44", "11:44:30", "02:00:12")
shifted_times <- hms(times) - hms("02:00:00")
format(.POSIXct(as.duration(shifted_times),tz="GMT"), "%H:%M:%S")
#[1] "22:43:30" "21:16:02" "12:05:44" "09:44:30" "00:00:12"
The last entry "02:00:12" would have caused difficulties if shifted_times had been passed to strptime().
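The shift can also be done with plain arithmetic on seconds, which avoids both the 24-hour limit of strptime() and the uneven printing of periods. This is a base R sketch, not the method from either answer, and it assumes every entry is a well-formed "HH:MM:SS" string that is at least 02:00:00.

```r
times <- c("24:43:30", "23:16:02", "14:05:44", "11:44:30", "02:00:12")

# Split each "HH:MM:SS" string into numeric hours, minutes, seconds
parts <- do.call(rbind, lapply(strsplit(times, ":", fixed = TRUE), as.numeric))

# Total seconds since 00:00:00, shifted back by two hours
secs <- parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3] - 2 * 3600

# Rebuild zero-padded "HH:MM:SS" strings
shifted <- sprintf("%02d:%02d:%02d",
                   as.integer(secs %/% 3600),
                   as.integer((secs %% 3600) %/% 60),
                   as.integer(secs %% 60))
shifted
# [1] "22:43:30" "21:16:02" "12:05:44" "09:44:30" "00:00:12"
```

Because the result is built with sprintf(), every field keeps its leading zero regardless of the value.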

difftime for multiple dates in r

I have chemistry water data taken from a river. Normally, the sample dates were on a Wednesday every two weeks. The data record starts in 1987 and ends in 2013.
Now, I want to re-check if there are any inconsistencies within the data, that is if the samples are really taken every 14 days. For that task I want to use the r function difftime. But I have no idea on how to do that for multiple dates.
Here is some data:
Date Value
1987-04-16 12:00:00 1,5
1987-04-30 12:00:00 1,2
1987-06-25 12:00:00 1,7
1987-07-14 12:00:00 1,3
Can you tell me on how to use the function difftime properly in that case or any other function that does the job. The result should be the number of days between the samplings and/or a true and false for the 14 days.
Thanks to you guys in advance. Any google-fu was to no avail!
Assuming your data.frame is named dd, you'll want to verify that the Date column is being treated as a date. Most of the time R will read it as character, which gets converted to a factor in a data.frame. If class(dd$Date) is "character" or "factor", run
dd$Date<-as.POSIXct(as.character(dd$Date), format="%Y-%m-%d %H:%M:%S")
Then you can do a simple diff() to get the time difference in days
diff(dd$Date)
# Time differences in days
# [1] 14 56 19
# attr(,"tzone")
# [1] ""
so you can check which ones are over 14 days.
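To get the TRUE/FALSE flags the question asks for, the gaps can be forced to days and compared with 14. A minimal sketch using the four sample dates from the question; tz = "UTC" is an assumption made so that daylight saving changes cannot distort the day counts, and the names gap_days and is_14 are illustrative.

```r
dd <- data.frame(
  Date = c("1987-04-16 12:00:00", "1987-04-30 12:00:00",
           "1987-06-25 12:00:00", "1987-07-14 12:00:00"),
  stringsAsFactors = FALSE
)
dd$Date <- as.POSIXct(dd$Date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Gap to the previous sample, always expressed in days
gap_days <- as.numeric(diff(dd$Date), units = "days")

# TRUE where the sampling interval was exactly 14 days
is_14 <- gap_days == 14

# Dates of the samples that followed an irregular gap
dd$Date[-1][!is_14]
```

Asking for units = "days" explicitly matters because diff() on date-times picks its units automatically and may report hours or minutes for closely spaced samples.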

Resources