I am having an issue with adding a time in ms to a specific timestamp. My data contain two variables: Start Time (Time, in HH:MM:SS.ss format) and the time of a specific event, given as the number of ms that have passed since the Start Time (STime, a numeric variable, e.g. 640654 ms, which equals 640.654 s or about 10.678 min). For example:
Time <- c("16:44:38.00", "16:44:38.00", "16:44:38.00")
STime <- c(640657, 940640, 955301)
Here on the forum I found code (thank you @tstev) that should compute FTime (the Start Time with STime added), shown in HH:MM:SS.ss format:
FTime <- strftime(strptime(Time,format="%H:%M:%OS")+(STime %% 1)+0.005,format="%H:%M:%OS3")
However, in my case it does not add STime properly. I also tried expressing STime in seconds instead of ms (640.657, 940.640, 955.301) before adding, but the results are still wrong. Here is what I got:
"16:44:38.662", "16:44:38.645", "16:44:39.306"
Does anyone know how to fix this?
Cheers
You can use the hms package. I divided the milliseconds by 1000 to convert them to seconds, then converted the values.
library(hms)
library(tidyverse)
Time <- c("16:44:38.00", "16:44:38.00", "16:44:38.00") %>% as_hms()
# divide the milliseconds by 1000 to get seconds; hms::hms() is written
# explicitly because lubridate (attached by recent tidyverse versions)
# masks hms() with its own parser
STime <- (c(640657, 940640, 955301) / 1000) %>% hms::hms(seconds = .)
(Time + STime) %>% as_hms()
# 16:55:18.657
# 17:00:18.640
# 17:00:33.301
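For comparison, a base-R sketch of the same computation, starting from the original character Time and numeric STime from the question. Note that %OS truncates rather than rounds fractional seconds when formatting, so half a millisecond is added first:
Time <- c("16:44:38.00", "16:44:38.00", "16:44:38.00")
STime <- c(640657, 940640, 955301)  # milliseconds
start <- strptime(Time, format = "%H:%M:%OS")
# divide by 1000 to get seconds; + 0.0005 compensates for %OS truncation
format(start + STime / 1000 + 0.0005, "%H:%M:%OS3")
# "16:55:18.657" "17:00:18.640" "17:00:33.301"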
Let me know if you have any questions.
I use this:
want=as.Date(date, '%d-%b-%y')
to convert dates like this: 1-JAN-52
Instead of returning '1952-01-01' I am getting '2052-01-01'. Any advice?
Welcome to modern computers, all shaped after the early Unix systems of the 1970s. The start of time, so to speak, is the epoch, aka 1 Jan 1970.
Your problem here, in a nutshell, is the limited input data. You supply only a two-digit year, and by a widely followed convention (the POSIX rule R applies for %y), values 00 to 68 are placed in the 2000s and 69 to 99 in the 1900s. It's all about the epoch.
So you have two choices. You could prepend '19' to the year part and parse via %Y, or you could take the year value out of the parsed date and reduce it by 100 if need be.
Some example code for the second (and IMHO better) option, making 1970 the cutoff date:
> datestr <- "1-Jan-52"
> d <- as.Date(datestr, '%d-%b-%y')
>
> d
[1] "2052-01-01"
>
> if (as.integer(strftime(d, "%Y")) >= 1970) {
+ dp <- as.POSIXlt(d)
+ dp$year <- dp$year - 100
+ d <- as.Date(dp)
+ }
> d
[1] "1952-01-01"
>
You need to go via POSIXlt to get the components easily.
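For completeness, a sketch of the first option, prepending the century and parsing via %Y (this assumes an English locale so that %b matches "Jan"):
datestr <- "1-Jan-52"
parts <- strsplit(datestr, "-")[[1]]
# rebuild the string with a four-digit year, then parse with %Y
as.Date(paste(parts[1], parts[2], paste0("19", parts[3]), sep = "-"), "%d-%b-%Y")
# [1] "1952-01-01"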
I am timing events that usually last several minutes, but occasionally an event lasts less than 60 seconds and lubridate::duration() returns an erroneous value.
I can work around this with an if and two duration() calls, checking whether the time difference is greater or less than 60 seconds, but that seems such an inelegant method.
library(lubridate)
for (x in c(59, 60)) {
  startTime <- Sys.time()
  Sys.sleep(x)
  print(duration(as.numeric(Sys.time() - startTime), "minutes"))
}
[1] "3543.86645793915s (~59.06 minutes)"
[1] "60.0540399551392s (~1 minutes)"
I just want time less than 60 seconds in decimal minutes or just seconds.
Thank you.
I assume your time intervals are objects of class difftime. If so, see the following script:
# Create a difftime object
a <- Sys.time()
b <- Sys.time()
DELTA_TIME <- b - a
# Print the duration, converting to minutes explicitly
print(duration(as.numeric(DELTA_TIME, units = 'mins'), units = 'minutes'))
Note that this way of printing the duration will accept any object of class difftime. The important thing is the argument units='mins' in the call to as.numeric(), which fixes the units before the difftime class is stripped.
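Applied to the loop from the question, that looks like this (a sketch; exact timings will vary):
library(lubridate)
for (x in c(59, 60)) {
  startTime <- Sys.time()
  Sys.sleep(x)
  # fix the units before stripping the difftime class
  print(duration(as.numeric(Sys.time() - startTime, units = "mins"), "minutes"))
}
# both iterations now report roughly 59 s and 60 s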
I'm working on implementing a finance model in R. I'm using quantmod::getSymbols(), which is returning a xts object. I'm using both stock data from google (or yahoo) and economic/yield data from FRED. Right now I'm receiving errors for non-conformable arrays when attempting to do a comparison.
require(quantmod)
fiveYearsAgo = Sys.Date() - (365 * 5)
bondIndex <- getSymbols("LQD",src="google",from = fiveYearsAgo, auto.assign = FALSE)[,c(0,4)]
bondIndex$score <- 0
bondIndex$low <- runMin(bondIndex,365)
bondIndex$high <- runMax(bondIndex,365)
bondIndex$score <- ifelse(bondIndex > (bondIndex$low * 1.006), bondIndex$score + 1, bondIndex$score)
# Error in `>.default`(bondIndex, (bondIndex$low * 1.006)) :
# non-conformable arrays
bondIndex$score <- ifelse(bondIndex < (bondIndex$high * .994), bondIndex$score - 1, bondIndex$score)
# Error in `<.default`(bondIndex, (bondIndex$high * 0.994)) :
# non-conformable arrays
print (bondIndex$score)
I added the following before the offending line:
print (length(bondIndex))
print (length(bondIndex$low))
print (length(bondIndex$high))
My results were 5024, 1256, and 1256. I want them all to be the same length, where every day has the close, 52-week high, and 52-week low. I additionally want to add more data so that each day also has a 50-day moving average. Further still, what really put an ax in my progress was incorporating yield data from FRED. My theory is that the stock and bond markets have different holidays, resulting in slightly different sets of days with data. In that case, I'd like to na.spline() the missing data.
I know I'm going about this the wrong way; what's the best way to do what I'm attempting? I want each row to be a day, with columns for the close price, 52-week high, 52-week low, moving average, a few different yields for that day, and finally a "score" computed daily from the other data.
Thanks for the help and let me know if you want or need more information.
You need to tell your statement which variable you want. Right now you are asking whether the whole bondIndex object is greater or less than low or high, which doesn't make sense. Presumably you want bondIndex[,1], aka bondIndex$LQD.Close:
bondIndex$score <- ifelse(bondIndex[,1] > (bondIndex$low * 1.006), bondIndex$score + 1, bondIndex$score)
bondIndex$score <- ifelse(bondIndex[,1] < (bondIndex$high * .994), bondIndex$score - 1, bondIndex$score)
As a side note, Sys.Date() - (365 * 5) is not five years ago (hint, leap years). This will be a bug that might bite you down the line.
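One way to get a true "five years ago" in base R is to step back by calendar years instead of days (a sketch):
# seq() understands calendar periods, so leap years are handled for you
fiveYearsAgo <- seq(Sys.Date(), length.out = 2, by = "-5 years")[2]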
I'm trying to take data from a CSV file that looks like this:
datetime,bytes
2014-10-24T10:38:49.453565,52594
2014-10-24T10:38:49.554342,86594
2014-10-24T10:38:49.655055,196754
2014-10-24T10:38:49.755772,272914
2014-10-24T10:38:49.856477,373554
2014-10-24T10:38:49.957182,544914
2014-10-24T10:38:50.057873,952914
2014-10-24T10:38:50.158559,1245314
2014-10-24T10:38:50.259264,1743074
and compute rates of change of the bytes value (which represents the number of bytes downloaded so far into a file) in a way that accurately reflects my detailed time data for when I took each sample. Samples should arrive roughly every 1/10 of a second, though for various reasons I expect that to be imperfect.
For example, in the above sampling, the second row got (86594-52594=)34000 additional bytes over the first, in (.554342-.453565=).100777 seconds, thus yielding (34000/0.100777=)337,378 bytes/second.
A second example is that the last row compared to its predecessor got (1743074-1245314=)497760 bytes in (.259264-.158559=).100705 seconds, thus yielding (497760/.100705=)4,942,753 bytes/sec.
I'd like to get a graph of these rates over time, and I'm fairly new to R, and not quite figuring out how to get what I want.
I found some related questions that seem like they might get me close:
How to parse milliseconds in R?
Need to calculate Rate of Change of two data sets over time individually and Net rate of Change
Apply a function to a specified range; Rate of Change
How do I calculate a monthly rate of change from a daily time series in R?
But none of them seem to quite get me there. When I try using strptime, I seem to lose the precision (even using %OS), and I'm just not sure how to plot this as a series of deltas with timestamps associated with them. The material in the second link's answer (the one with the AAPL stock delta graph) about diff(...) and -nrow(...) makes sense to me at a conceptual level, but not deeply enough that I understand how to apply it in this case.
I think I may have gotten close, but would love to see what others come up with. What options do I have for this? Anything that could show a rolling average (over, say, a second or 5 seconds), and/or using nice SI units (KB/s, MB/s, etc.)?
Edit:
I think I may be pretty close (or even getting the basic question answered) with:
my_data <- read.csv("my_data.csv")
my_deltas <- diff(my_data$bytes)
my_times <- strptime(my_data$datetime, "%Y-%m-%dT%H:%M:%S.%OS")
my_times <- my_times[2:nrow(my_data)]
df <- data.frame(my_times,my_deltas)
plot(df, type='l', xlab="When", ylab="bytes/s")
It's not terribly pretty, though (especially the y-axis labels, and with a longer data file it's all crammed with spikes), and it's not getting the sub-second precision. That might actually be OK for the larger problem (in the bigger graph you can't tell, whereas with the sample data above you really can), but it's still not quite what I was hoping for, so input is still welcome.
A possible solution:
# reading the data
df <- read.table(text="datetime,bytes
2014-10-24T10:38:49.453565,52594
2014-10-24T10:38:49.554342,86594
2014-10-24T10:38:49.655055,196754
2014-10-24T10:38:49.755772,272914
2014-10-24T10:38:49.856477,373554
2014-10-24T10:38:49.957182,544914
2014-10-24T10:38:50.057873,952914
2014-10-24T10:38:50.158559,1245314
2014-10-24T10:38:50.259264,1743074", header=TRUE, sep=",")
# formatting & preparing the data
df$bytes <- as.numeric(df$bytes)
df$datetime <- gsub("T", " ", df$datetime)
df$datetime <- strptime(df$datetime, "%Y-%m-%d %H:%M:%OS")
# fractional seconds within the current minute; fine here because all
# samples fall inside the same minute, but diff() would misbehave across
# a minute boundary
df$sec <- as.numeric(format(df$datetime, "%OS6"))
# calculating the change in bytes per second
df$difftime <- c(NA,diff(df$sec))
df$diffbytes <- c(NA,diff(df$bytes))
df$bytespersec <- df$diffbytes / df$difftime
# creating the plot
library(ggplot2)
ggplot(df, aes(x=sec,y=bytespersec/1000000)) +
geom_line() +
geom_point() +
labs(title="Change in bytes\n", x="\nWhen", y="MB/s\n") +
theme_bw()
which gives a line plot of the transfer rate in MB/s over time.
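The question also asked about a rolling average. Here is a sketch using zoo (assuming the sampling interval really is about 0.1 s, so a 10-row window spans roughly one second):
library(zoo)
# centred rolling mean over ~1 s of samples; na.rm skips the leading NA,
# fill = NA pads the edges so the column keeps its length
df$smooth <- rollapply(df$bytespersec, width = 10, FUN = mean, na.rm = TRUE, fill = NA)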
This is not homework. I am interested in setting up a simulation of a coin toss in R. I would like to run the simulation for a week. Is there a function in R that will allow me to start and stop the simulation over a time period such as a week? If all goes well, I may want to increase the length of the simulation period.
For example:
x <- rbinom(10, 1, 1/2)
So to clarify, instead of 10 in the code above, how do I keep the simulation going for a week (number of trials in a week versus set number of trials)? Thanks.
Here is code that will continue to run for three seconds, then stop and print the totals.
x <- Sys.time()
duration <- 3 # number of seconds
heads <- 0
tails <- 0
while (Sys.time() <= x + duration) {
  s <- sample(0:1, 1)
  if (s == 1) heads <- heads + 1 else tails <- tails + 1
  cat(s)  # print the toss that was just counted, not a fresh draw
}
cat("\nheads: ", heads, "\n")
cat("tails: ", tails, "\n")
The results:
001100111000011010000010110111111001011110100110001101101010 ...
heads: 12713
tails: 12836
Note of warning:
At the speed of my machine, I bet that you get a floating point error long before the end of the week. In other words, you may get to the maximum value your machine allows you to store as an integer, double, float or whatever you are using, and then your code will crash.
So you may have to build in some error checking or rollover mechanism to protect you from this.
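A minimal sketch of such a safeguard, checkpointing the running totals to disk every so often so an interrupted or crashed run loses little work (the file name is just an example):
# inside the while loop, after updating the tallies:
if ((heads + tails) %% 1e6 == 0) {
  saveRDS(list(heads = heads, tails = tails), "coin_totals.rds")
}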
For an accelerated illustration of what will happen, try the following:
x <- 1e300
while(is.finite(x)){
x <- x+x
cat(x, "\n")
}
R deals with the floating point overflow gracefully, and returns Inf.
So, whatever data you had in the simulation is now lost. It's not possible to analyse infinity to any sensible degree.
Keep this in mind when you design your simulation.
While the current time is earlier than a timestamp one week from now, append rbinom(1, 1, 1/2) to x:
week_later <- strptime("2012-06-22 16:45:00", "%Y-%m-%d %H:%M:%S")
x <- rbinom(1, 1, 1/2)  # init x
while (as.numeric(Sys.time()) < as.numeric(week_later)) {
  x <- append(x, rbinom(1, 1, 1/2))
}
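A variant sketch that draws in blocks rather than growing x one element at a time, which is much faster over a week-long run:
x <- integer(0)
while (Sys.time() < week_later) {
  x <- c(x, rbinom(1000, 1, 1/2))  # append 1000 tosses per iteration
}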
You may be interested in the fairly new package harvestr by Andrew Redd. It splits a task into pieces (the idea being that the pieces could be run in parallel). The part of the package that applies to your question is that it caches the results of pieces that have already been processed, so if the task is interrupted and restarted, the finished pieces will not be rerun; it picks up with those that did not complete (a piece that was interrupted partway through starts again from the beginning of that piece).
This may let you start and stop the simulation as you request.