I have timestamp data that is of the "factor" class. It looks as follows:
"193:00:11" ; where it is hours:minutes:seconds ...
I am trying to convert this to the right timestamp class so I can perform calculations on it (like determine the mean, max, minimum etc.,). I have tried using lubridate, and doing:
hhmmss(df1$time) ; but this does not work and just gives me the seconds back.
Thank you for the help.
If the strings/factors are always in this format, the following will give the number of seconds elapsed. The data must be a character vector, so call as.character() on a factor first.
# example data
tm <- c("193:01:11", "96:22:47", "1:01:01", "2:02:02")
# split each string on ":" and arrange the pieces into a 3-column (h, m, s) matrix
tmm <- matrix(as.numeric(unlist(strsplit(tm, ":"))), ncol = 3, byrow = TRUE)
# weight the columns by 3600, 60 and 1 to get total seconds
tmm %*% c(3600, 60, 1)
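If you prefer the lubridate route mentioned in the question, hms() plus period_to_seconds() should give the same numbers; a minimal sketch (the factor is coerced to character explicitly, just to be safe):
library(lubridate)
tm_factor <- factor(c("193:01:11", "96:22:47"))      # factor input, as in the question
period_to_seconds(hms(as.character(tm_factor)))      # 694871 346967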
I have a matrix of time variables in the following format.
time <- matrix(c('01:11', '01:20', '00:51', '01:09',
'01:11', '01:00', '01:19', '00:14',
'00:57', '01:12', '01:14', '00:43',
'01:10', '01:19', '01:03', '00:27',
'00:59', '01:04', '00:46', '00:52',
'01:05', '01:13', '01:01', '00:48'), ncol=3)
Where the values before ':' are minutes and the values after it are seconds.
I want to convert all the values into seconds, but I am not sure how to transform the data so that the minute parts are converted to seconds and the values already in seconds can be used as numeric values.
I tried the chron package, but my dataset seems to be in the wrong format.
Use strsplit with apply. If the values are not character, you may want to convert them to character first.
apply(time, 2, function(x)
  sapply(strsplit(x, ":"), function(y) as.numeric(y[1]) * 60 + as.numeric(y[2])))
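If lubridate is available, ms() with period_to_seconds() is a possible alternative; a sketch that flattens the matrix first and restores its shape afterwards (this assumes every entry really is minutes:seconds):
library(lubridate)
secs <- matrix(period_to_seconds(ms(as.vector(time))), nrow = nrow(time))
secs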
I am new to R and need to use the function getnfac from the PANICr package. And it seems that the function only takes an xts object as its first argument. However, after I went through some reading I still don't understand what an xts object is. Could anyone please tell me how I can convert a matrix into an xts object?
Below I use the matrix return as the first argument, so I just need to convert return to an xts object.
getnfac(return,143,"BIC3")
Error in getnfac(return, 143, "BIC3") :
x must be an xts object so lags and differences are taken properly
xts is an extensible time series object, essentially a regular ts object (or more correctly a zoo object) with some bits added.
The 'extensible' part of the name refers to how you can add attributes of your own choice.
While a matrix can be converted into a multivariate time series quite easily:
library(xts)   # also loads zoo, which provides index()
m <- matrix(1:16, 4)
m.ts <- ts(m)
index(m.ts)    # a plain 1, 2, 3, ... index
An xts object requires its index (a vector describing at what time each sample was taken) to be in a date or time format:
m <- matrix(1:16, 4)
d <- as.Date(1:nrow(m), origin = "1970-01-01")   # as.Date() on plain numbers needs an origin
m.xts <- xts(m, order.by = d)
index(m.xts)
If your data is sampled at evenly spaced intervals, a dummy index like the one above is probably OK. If not, you'll need to supply a vector corresponding to the actual sampling times.
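For irregular sampling, a minimal sketch with made-up times (the values here are invented; a POSIXct index works as well as a Date one):
times <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + c(0, 90, 200, 3600)   # hypothetical sample times in seconds
m.xts2 <- xts(m, order.by = times)
index(m.xts2)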
In my opinion, the first argument to the getnfac() function should be a matrix containing the data.
In addition to the above answers: you can get the data back into plain matrix form from an xts object with coredata().
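For instance, continuing with the m.xts object built above:
coredata(m.xts)    # strips the time index and returns the underlying data matrix
as.matrix(m.xts)   # a base-style alternative that also works here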
I wrote the following function to convert a vector of Strings to a vector of Dates (the code inside the for loop was inspired by this post: R help converting factor to date). When I pass in a vector of size 1000, this takes about 30 seconds. Not terribly slow, but I ultimately need to pass in about 100,000 so this could be a problem. Any ideas why this is slow and/or how to speed it up?
toDate <- function(dates)
{
  theDates <- vector()
  for (i in 1:length(dates))
  {
    temp <- factor(dates[i])
    temp <- as.Date(temp, format = "%m/%d/%Y")
    theDates[i] <- temp
  }
  class(theDates) <- "Date"
  return(theDates)
}
Just do:
as.Date(dates, format = "%m/%d/%Y")
You don't need to loop over the dates vector, as as.Date() can handle a vector of characters just fine in a single shot. Your function incurs length(dates) separate calls to as.Date(), plus assignments and calls to other functions, all of which carry overhead that is totally unnecessary.
You also don't want to convert each individual date to a factor; in fact you don't need to convert them at all (as.Date() will just convert them back to characters). If you did want factors, factor() is also vectorised, so you could remove the factor() line and run dates <- as.factor(dates) once, outside the for() loop. But again, you don't need to do this anywhere in your function.
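For a sense of scale, here is a quick check on invented data of the size mentioned in the question (the strings below are made up for the example):
dates <- rep("03/15/2012", 100000)                          # 100,000 made-up date strings
system.time(converted <- as.Date(dates, format = "%m/%d/%Y"))
# typically well under a second, versus the reported ~30 seconds for just 1,000 dates with the looped version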
This is what I want to do:
First, I randomly generate a sequence of dates.
Then, I assign the earliest date to the variable.
site_start <- list()
for (i in 1:10) {    # "l0" in the original was a typo for 10
  for (j in 1:10) {
    date <- seq.Date(from = as.Date("1900-01-01"), to = as.Date("2000-01-01"), by = "week")
    site_start[[i]][j] <- sample(date, 1)
  }
}
Now, let us assume the date variable is correctly generated. The reason I say this is that in my real case I acquired the date variable from dozens of other steps that are irrelevant here.
My question is: why does the site_start[[i]][j] I generate keep coming out as POSIXct, with R requiring me to provide an 'origin'? Even when I format it with an origin of 1970-01-01, it is still a numeric date, such as 15600. I simply don't know how to format this number any more.
Any help is appreciated!!
W
Why don't you use this vectorized approach:
date.pool <- seq(from=as.Date("1900-01-01"), to=as.Date("2000-01-01"), by="1 week")
site_start <- replicate(10, sample(date.pool, 10, rep=T), simplify=F)
This produces a list with 10 items, each of which is a length-10 vector of random dates drawn from date.pool. Here are the first two items (site_start[1:2]):
[[1]]
[1] "1969-09-15" "1955-10-10" "1959-04-13" "1992-02-10" "1905-07-31" "1901-09-23"
[7] "1926-10-18" "1959-06-01" "1924-06-02" "1906-05-14"
[[2]]
[1] "1979-01-01" "1998-02-23" "1929-09-02" "1968-07-01" "1924-03-17" "1914-11-02"
[7] "1928-02-13" "1937-10-25" "1915-02-08" "1974-05-06"
In the past, when I have had to grab the oldest or most recent entry, I have used arrange(). E.g.,
library(plyr)        # arrange() and ddply()
library(lubridate)   # mdy_hm()
# read dataset
enforce <- read.csv(paste(input.dir, "provider_enforcement.csv", sep="/"))
# use lubridate to parse the date format
enforce$SNAPSHOT_DATE <- mdy_hm(enforce$SNAPSHOT_DATE)
# this function sorts a data.frame and returns a one-row data.frame containing the most recent snapshot
MostRecent <- function(data) {
  return(arrange(data, SNAPSHOT_DATE, decreasing=TRUE)[1, ])
}
# use plyr to apply MostRecent to my dataset for each provider
enforce <- ddply(enforce, .(PROVIDER_IDNO), MostRecent)
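An equivalent idea that avoids sorting each group is to index the row with the largest date directly; a sketch assuming the same column names as above:
enforce_latest <- ddply(enforce, .(PROVIDER_IDNO),
                        function(d) d[which.max(d$SNAPSHOT_DATE), ])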
My knowledge and experience of R is limited, so please bear with me.
I have a measurements of duration in the following form:
d+h:m:s.s
e.g. 3+23:12:11.931139, where d=days, h=hours, m=minutes, and s.s=decimal seconds. I would like to create a histogram of these values.
Is there a simple way to convert such string input into a numerical form, such as seconds? All the information I have found seems to be geared towards date-time objects.
Ideally I would like to be able to pipe a list of data to R on the command line and so create the histogram on the fly.
Cheers
Loris
Another solution based on SO:
op <- options(digits.secs = 6)   # 6 is the documented maximum; this makes fractional seconds print
z <- strptime("3+23:12:11.931139", "%d+%H:%M:%OS")
vec_z <- z + rnorm(100000)       # generate some more date-times scattered around z
hist(vec_z, breaks = 20)
Short explanation: first, I set the option so that fractional seconds are shown. Then I parse your string into a date object; if you now type z into the console you get "2012-05-03 23:12:11.93113". Finally I create some more dates and plot a histogram. I think the important step for you is the parsing, and strptime should help you with that.
I would do it like this:
str <- "3+23:12:11.931139"
result <- sum(as.numeric(unlist(strsplit(str, "[:\\+]", perl = TRUE))) * c(24*60*60, 60*60, 60, 1))
result
# [1] 342731.9
Then you can wrap it in a function and apply it over the list or vector, as sketched below.
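A possible wrapper along those lines (the helper name and the sample values below are made up for illustration):
to_seconds <- function(x) {
  parts <- strsplit(x, "[:+]")    # split each string on ':' and '+'
  vapply(parts, function(p) sum(as.numeric(p) * c(86400, 3600, 60, 1)), numeric(1))
}
durations <- c("3+23:12:11.931139", "1+00:30:00.0", "0+02:15:42.25")   # invented sample data
hist(to_seconds(durations), breaks = 20)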