How to convert character into time duration in R?

I want to calculate the time spent on different types of activities, collected in an Excel spreadsheet.
After reading the file, all time values come in as character type, and I'm unable to transform them into HH:MM:SS durations.
Data frame example:
df <- data.frame(id=c(1,2,3,4,5,6),
name=c('Sean','Bob','Dylan',"Barbara","Louis","Marine"),
Swimming=c("00:00:00","00:30:22","00:42:22",
"00:50:53","00:20:11","00:30:12"),
Skating=c("00:10:23","00:10:22","00:02:22",
"00:20:53","00:30:11","00:10:12"))
I need to transform these character values in the Swimming and Skating columns into time durations so I can manipulate them. For example, I want to know how many hours all of them spent swimming in total.
I tried the parse_date_time() function from the lubridate package:
parse_date_time(df[3:4],"HMS")
It gives me this warning:
Warning message:
All formats failed to parse. No formats found.
How can I transform this data in a way I can manipulate?

I've just successfully tested @thelatemail's suggestion. It worked perfectly. Then I just converted to hours.
I'll duplicate @thelatemail's response here for anyone who feels lost and skips the comments:
I think as.duration(hms(df$Swimming)) is preferable: sum(hms(df$Swimming)) gives a really odd result, while sum(as.duration(hms(df$Swimming))) gives the expected result.
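A base R alternative, for anyone who would rather not depend on lubridate: split each string on the colons and turn it into a number of seconds, which can then be summed and divided by 3600 to get hours. This is a sketch assuming every value is a well-formed HH:MM:SS string; hms_to_seconds is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper (not from lubridate): convert "HH:MM:SS" strings
# into a number of seconds. Malformed values are not validated.
hms_to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

df <- data.frame(id = c(1, 2, 3, 4, 5, 6),
                 name = c("Sean", "Bob", "Dylan", "Barbara", "Louis", "Marine"),
                 Swimming = c("00:00:00", "00:30:22", "00:42:22",
                              "00:50:53", "00:20:11", "00:30:12"),
                 Skating = c("00:10:23", "00:10:22", "00:02:22",
                             "00:20:53", "00:30:11", "00:10:12"))

# Convert both activity columns to seconds (one column per activity)
secs <- sapply(df[c("Swimming", "Skating")], hms_to_seconds)

# Total time per activity, in hours
total_hours <- colSums(secs) / 3600
round(total_hours, 3)   # Swimming 2.9 hours, Skating about 1.406 hours
```

Working in plain seconds sidesteps the odd sum() result from Period objects, because the values are ordinary numbers.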

Related

Time/Date conversion from numeric

I have recently come across a time series dataset (in R) that had a numeric time index in the following format:
1.586183e+12 1.586184e+12 1.586185e+12 1.586186e+12 1.586187e+12 1.586188e+12
The data should be in 15-minute intervals. I have tried some of the usual conversions, such as as.POSIXct(), but that doesn't seem to work. I was hoping someone could point me to the right conversion.
Many thanks
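The values in the question are roughly 1000 times larger than Unix timestamps in seconds from around 2020 (about 1.6e9), so one plausible reading, which is an assumption here, is milliseconds since 1970-01-01 UTC. Under that assumption, dividing by 1000 before calling as.POSIXct() gives usable date-times. The sample vector below is hypothetical, built to mimic the question's 15-minute spacing (900000 ms).

```r
# Hypothetical index values: 15-minute steps in milliseconds since
# 1970-01-01 00:00:00 UTC (assumed interpretation of the data)
x <- 1586183400000 + (0:5) * 900000

# as.POSIXct() expects seconds, so convert milliseconds first
stamps <- as.POSIXct(x / 1000, origin = "1970-01-01", tz = "UTC")

format(stamps, "%Y-%m-%d %H:%M:%S")   # first value: "2020-04-06 14:30:00"

# Spacing between consecutive time stamps, in minutes
as.numeric(diff(stamps), units = "mins")
```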

R problem: Date column stored as factor, R can't convert it

I have downloaded the SP500 data from Yahoo Finance (ticker GSPC) and am trying to filter it by year; however, the Date column is stored as a factor, so R can't filter it. Can anyone help me convert it? I tried multiple solutions, but nothing worked.
So far I've loaded the lubridate package and used the following code, but all the values just got replaced with NAs:
as.Date(SP500$Date, format = "%m-%d-%Y")
Then I used SP500$Date <- ymd(SP500$Date, format = "%Y-%m-%d") and again nothing happened. (SP500 is the name of the data frame that I stored the data in.)
I also tried just SP500$Date <- as.Date(SP500$Date), but R says it does not know how to convert it to Date.
Any help would be much appreciated! Thank you!
Classes only exist within a programming language, not in the data file itself. What likely happened is that your data (perhaps a .csv file?) was interpreted as a factor by R during reading.
Everything you're trying to do here can be accomplished with base R (meaning you don't need to load any packages).
If you're dealing with dates:
df$date <- as.Date(df$date, format = "%Y-%m-%d")
If you're dealing with datetimes:
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
(obviously the specific format may vary; see the list of format codes in ?strptime)
Occasionally, coercion in R can be finicky: the format parameter is unforgiving of mistakes. I frequently mix up - and /, or confuse "%Y-%m-%d" with "%d-%m-%Y", which makes the conversion return NA instead of a date. Likewise, if the format isn't consistent across your data, any values that can't be described by the format you supplied will become NA.
Sometimes your dates are actually integers. If an integer counts days since 1970-01-01 (e.g. 17846), you may need to supply '1970-01-01' to the origin parameter of as.Date(); if it spells out the digits of the date (e.g. 20181111), convert it to character and use format = "%Y%m%d" instead. Day counts can show up unexpectedly: if you iterate through a vector of Dates using a for loop, R drops the Date class and hands you the underlying numbers.
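The two kinds of integer dates need different treatment, and the for-loop behaviour mentioned above is easy to reproduce. A minimal base R sketch; the example integers are illustrative, and both correspond to 2018-11-11.

```r
# Case 1: the integer spells out the date digits (YYYYMMDD).
# Convert to character, then parse with a matching format string.
d1 <- as.Date(as.character(20181111), format = "%Y%m%d")

# Case 2: the integer counts days since 1970-01-01 (how R stores Dates).
d2 <- as.Date(17846, origin = "1970-01-01")

# A for loop over a Date vector drops the class and yields day counts
dates <- as.Date(c("2018-11-11", "2018-11-12"))
for (d in dates) print(d)                                  # plain numbers
for (d in dates) print(as.Date(d, origin = "1970-01-01"))  # class restored

# Looping over indices keeps the Date class intact
for (i in seq_along(dates)) print(dates[i])
```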
It may sound like a band-aid solution, but coercions from common types like character are usually robust; when I can't figure out why an attempt to coerce a class failed, I often coerce the object to character first.
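A minimal sketch of the factor case with base R only, assuming the Date column holds strings like 2019-01-02 (year-month-day, which is an assumption about the Yahoo Finance export). SP500 here is a small hypothetical stand-in data frame, and Value is a made-up column.

```r
# Hypothetical stand-in for the downloaded data: Date read in as a factor
SP500 <- data.frame(
  Date  = factor(c("2018-12-31", "2019-01-02", "2019-06-28", "2020-01-02")),
  Value = 1:4
)

# Go through character first, then parse with the format the file uses
SP500$Date <- as.Date(as.character(SP500$Date), format = "%Y-%m-%d")

# Filter by year: format() extracts the year as a string
sp2019 <- SP500[format(SP500$Date, "%Y") == "2019", ]
sp2019
```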

SPSS date format when imported into R

I have not worked with SPSS (.sav) files before and am trying to work with some data files provided to me by importing them into R. I did not receive any explanation of the files, and because communication is difficult I am trying to figure out as much as I can on my own.
Here's my first question. This is what the Date field looks like in an R data frame after import:
> dataset2$Date[1:4]
[1] 13608172800 13608259200 13608345600 13608345600
I don't know what dates the data is supposed to be for, but I found that if I divide the above numbers by 10, that seems to give a reasonable date (in February 2013). Can anyone confirm this is indeed what the above represents?
My second question is regarding another column called Begin_time. Here's what that looks like:
> dataset2$Begin_time[1:4]
[1] 29520 61800 21480 55080
Any idea what this represents? I want to believe it is some representation of time of day, because the records are for wildlife observations, but I don't have more info than that to go on. I noticed that if I take the difference between End_Time and Begin_time I get numbers like 120 and 180, which seem like minutes to me (3 hours seems reasonable for observing a wild animal), but the absolute numbers are far greater than the number of minutes in a day (1440), which leaves me puzzled. Is this some timekeeping format from SPSS? If so, what's the logic?
Unfortunately, I don't have access to SPSS, so any help would be much appreciated.
I had the same problem and this function is a good solution:
pss2date <- function(x) as.Date(x/86400, origin = "1582-10-14")  # seconds -> days since the SPSS origin
This is where I found the answer:
http://scs.math.yorku.ca/index.php/R:_Importing_dates_from_SPSS
Dates in SPSS Statistics are represented as floating-point doubles holding the number of seconds since October 14, 1582. If you use the SPSS R plugin APIs, they can be automatically converted to R dates, but any proper converter should be able to do this for you.
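Applying the pss2date() conversion from the answer to the four Date values in the question, and, as an assumption not confirmed by the answers, treating Begin_time as seconds since midnight (the way SPSS stores TIME values, consistent with its seconds-based dates). With this origin the dates fall in early January 2014 rather than February 2013, so the divide-by-10 result appears to be a coincidence; under the seconds-since-midnight reading, differences of 120 and 180 would be 2 and 3 minutes rather than hours.

```r
# Conversion from the answer: seconds since 1582-10-14 -> Date
pss2date <- function(x) as.Date(x / 86400, origin = "1582-10-14")

spss_dates <- pss2date(c(13608172800, 13608259200, 13608345600, 13608345600))
spss_dates   # "2014-01-04" "2014-01-05" "2014-01-06" "2014-01-06"

# Assumption: Begin_time holds seconds since midnight
begin <- c(29520, 61800, 21480, 55080)
begin_hm <- sprintf("%02d:%02d",
                    as.integer(begin %/% 3600),          # whole hours
                    as.integer((begin %% 3600) %/% 60))  # remaining minutes
begin_hm
```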

Importing custom formatted excel data into R

I'm trying to analyse some race data in R. The data is mainly finishing times, currently in the custom format hh:mm:ss; however, when I import it into R I cannot do any analysis, as I always receive the following warning message:
Warning message:
In mean.default(Swim) : argument is not numeric or logical: returning NA
Can anyone advise how best to get around this (probably simple) stumbling block? Thanks for your help.
When you import a column with the format hh:mm:ss from Excel into R, it will be imported as a factor (or character, depending on the import function/settings). That is the reason for your warning message (character/factor is neither numeric nor logical).
To be able to do any analysis on the data, you need to do some conversion. If you have your data as character, you can do
as.integer(as.POSIXct(Swim, format = "%H:%M:%S", tz = "UTC")) %% 86400
to get each hh:mm:ss value as a number of seconds. If Swim is a factor, convert it to character first:
Swim <- as.character(Swim)
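Putting the answer's steps together in a short sketch with hypothetical finishing times. It sets tz = "UTC", which matters here: without it, as.POSIXct() uses the local time zone and the %% 86400 step is off by the UTC offset. The last line formats the mean back into hh:mm:ss, which only works for values under 24 hours.

```r
# Hypothetical finishing times, as they might come out of the import
Swim <- factor(c("00:25:10", "00:27:45", "00:31:02"))

# Factor -> character -> seconds since midnight.
# tz = "UTC" keeps %% 86400 from being shifted by the local time zone.
Swim <- as.character(Swim)
swim_secs <- as.integer(as.POSIXct(Swim, format = "%H:%M:%S", tz = "UTC")) %% 86400

mean_secs <- mean(swim_secs)

# Format seconds back to hh:mm:ss for reporting (values under 24 h only)
mean_hms <- format(as.POSIXct(mean_secs, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")
mean_hms   # "00:27:59"
```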

How to convert date and time into a numeric value

As a new, self-taught R user, I am struggling with converting date and time character values into numbers so that I can group unique combinations of data. I'm hoping someone has come across this before and knows how I might go about it.
I'd like to convert a field of DateTime data (30/11/2012 14:35) to a numeric version of the date and time (seconds from 1970 maybe??) so that I can back reference the date and time if needed.
I have searched the R help and online help and only seem to be able to find POSIXct and strptime, which seem to convert the other way in the examples I've seen.
I will need to apply the conversion to a large dataset so I need to set the formatting for a field not an individual value.
I have tried to modify some python code but to no avail...
Any help with this, including pointers to tools I should read about would be much appreciated.
You can do this with base R just fine, but there are some shortcuts for common date formats in the lubridate package:
library(lubridate)
d <- dmy_hm("30/11/2012 14:35")
> as.numeric(d)
[1] 1354286100
From ?POSIXct:
Class "POSIXct" represents the (signed) number of seconds since the
beginning of 1970 (in the UTC timezone) as a numeric vector.
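The same conversion in base R, as a sketch: as.POSIXct() with a format string matching day/month/year parses a whole column at once, and as.numeric() gives seconds since 1970-01-01 UTC. Setting tz = "UTC" is a choice made here so the number does not depend on the machine's time zone.

```r
x <- "30/11/2012 14:35"

# Parse day/month/year hour:minute; works on a whole vector/column too
d <- as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")

n <- as.numeric(d)   # seconds since 1970-01-01 00:00:00 UTC
n                    # 1354286100

# Back-reference: turn the number into a date-time again
back <- as.POSIXct(n, origin = "1970-01-01", tz = "UTC")
format(back, "%d/%m/%Y %H:%M")
```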
