R: adding to a difftime vector forgets about the units

When I extend a vector of difftimes by another difftime object, it seems that the unit of the added item is ignored and overridden without conversion:
> t = Sys.time()
> d = difftime(c((t+1), (t+61)), t)
> d
Time differences in secs
[1] 1 61
> difftime(t+61, t)
Time difference of 1.016667 mins
> d[3] = difftime(t+61, t)
> d
Time differences in secs
[1] 1.000000 61.000000 1.016667
> as.numeric(d)
[1] 1.000000 61.000000 1.016667
This is in R 3.1.0. Is there a reasonable explanation for this behavior? I just wanted to store some time differences this way for later use and didn't expect this at all. I didn't find it documented anywhere.
Okay, for now I'm working around it by always specifying the units:
> d[3] = difftime(t+61, t, units="secs")
> d
Time differences in secs
[1] 1 61 61

From help("difftime")
If units = "auto", a suitable set of units is chosen, the largest possible (excluding "weeks") in which all the absolute differences are greater than one.
units = "auto" is the default. So for a difference of 1 and 61 seconds, if you were to choose minutes,
difftime(c((t+1), (t+61)), t, units = "min")
# Time differences in mins
# [1] 0.01666667 1.01666667
One of those is less than one, so, since you did not specify the units, R chose "secs" for you according to the rule quoted above. Additionally, the units are saved with the object:
d <- difftime(c((t+1), (t+61)), t)
units(d)
# [1] "secs"
But you can change the units with units<-
d[3] <- difftime(t+61, t)
d
# Time differences in mins
# [1] 0.01666667 1.01666667 1.01666667
units(d) <- "secs"
d
# Time differences in secs
# [1] 1 61 61
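If you want to keep appending single differences without worrying about which units "auto" picked, a small sketch (reusing the units<- replacement shown above; new is just an illustrative name) is to convert the new value to the existing vector's units before assigning:
new <- difftime(t + 61, t)   # "auto" picked mins for this one
units(new) <- units(d)       # convert to d's units ("secs") before assigning
d[3] <- new
d
# Time differences in secs
# [1] 1 61 61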


Converting time in integer to MIN:SEC

I am using this code to find a difference between two times:
station_data.avg$duration[i] = if_else(
  station_data.avg$swath[i] != 0,
  round(difftime(station_data.avg$end[i], station_data.avg$start[i], units = "mins"), 3),
  0
)
But the output is 3.116667, and I want the output in the format min:sec, e.g. 3:18.
I tried
station_data.avg$duration[i] = as.character(times(station_data.avg$duration[i] / (24 * 60)))
and was hoping that would work, but it did not.
You can use the chron package to convert a fractional minute (e.g. x.25, meaning 25% of a minute) into a minutes:seconds display (x:15, since 0.25 * 60 = 15 seconds). An example is below, but if you edit your question to make it reproducible, I can provide more specific help.
Data
a <- Sys.time()
b <- Sys.time() + 60 * 3 + 15 # add 3 min 15 seconds
Code
difftime(b, a, units = "min")
# Time difference of 3.250006 mins
chron::times(as.numeric(difftime(b, a, units = "days")))
# [1] 00:03:15
Note the change to units = "days" here: chron::times() interprets a numeric value as a fraction of a day.
You could further parse this out by wrapping this in lubridate::hms:
lubridate::hms(
  chron::times(as.numeric(difftime(b, a, units = "days")))
)
# [1] "3M 15S"

function to get the power of a number

I'm looking for a way to get, from a floating point number, the power of 10 with which it is noted:
6.45e-8 - would be 8
3.21e-4 would be 4
0.013 would be 2
(or with a minus sign, in all of these)
Is there a function which would do the following:
instead of multiplying with 6.45e-8, it would first divide by 1e-8 and then multiply with the result (6.45e-8/1e-8 = ...).
How about
floor(log10(x))
? log10 computes the log base 10, floor finds the next smaller integer.
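Applied to the examples from the question, this gives the signed exponent (so -8 rather than 8):
floor(log10(6.45e-8))   # -8
floor(log10(3.21e-4))   # -4
floor(log10(0.013))     # -2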
tenexp <- function(x) {
  c <- trunc(log10(abs(x)))      # truncated base-10 logarithm
  return(abs(c - 1 * (c < 0)))   # when that is negative, shift down by one; return the absolute value
}
Here's the (desired?) result:
> tenexp(0.0134)
[1] 2
> tenexp(6.45e-8)
[1] 8
> tenexp(6.45e+3)
[1] 3
> tenexp(-1.28e+4)
[1] 4
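Building on floor(log10(x)), a minimal sketch that also recovers the mantissa, which seems to be what the "divide by 1e-8 first" part of the question is after (exp10 is just an illustrative name):
exp10 <- function(x) floor(log10(abs(x)))   # signed exponent of scientific notation
x <- 6.45e-8
e <- exp10(x)      # -8
m <- x / 10^e      # 6.45, the mantissa
c(exponent = e, mantissa = m)
# exponent mantissa
#    -8.00     6.45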

Binning data in R

I have a vector with around 4000 values. I would just need to bin it into 60 equal intervals for which I would then have to calculate the median (for each of the bins).
v<-c(1:4000)
v is really just a vector. I read about cut, but that needs me to specify the breakpoints. I just want 60 equal intervals.
Use cut and tapply:
> tapply(v, cut(v, 60), median)
(-3,67.7] (67.7,134] (134,201] (201,268]
34.0 101.0 167.5 234.0
(268,334] (334,401] (401,468] (468,534]
301.0 367.5 434.0 501.0
(534,601] (601,668] (668,734] (734,801]
567.5 634.0 701.0 767.5
(801,867] (867,934] (934,1e+03] (1e+03,1.07e+03]
834.0 901.0 967.5 1034.0
(1.07e+03,1.13e+03] (1.13e+03,1.2e+03] (1.2e+03,1.27e+03] (1.27e+03,1.33e+03]
1101.0 1167.5 1234.0 1301.0
(1.33e+03,1.4e+03] (1.4e+03,1.47e+03] (1.47e+03,1.53e+03] (1.53e+03,1.6e+03]
1367.5 1434.0 1500.5 1567.0
(1.6e+03,1.67e+03] (1.67e+03,1.73e+03] (1.73e+03,1.8e+03] (1.8e+03,1.87e+03]
1634.0 1700.5 1767.0 1834.0
(1.87e+03,1.93e+03] (1.93e+03,2e+03] (2e+03,2.07e+03] (2.07e+03,2.13e+03]
1900.5 1967.0 2034.0 2100.5
(2.13e+03,2.2e+03] (2.2e+03,2.27e+03] (2.27e+03,2.33e+03] (2.33e+03,2.4e+03]
2167.0 2234.0 2300.5 2367.0
(2.4e+03,2.47e+03] (2.47e+03,2.53e+03] (2.53e+03,2.6e+03] (2.6e+03,2.67e+03]
2434.0 2500.5 2567.0 2634.0
(2.67e+03,2.73e+03] (2.73e+03,2.8e+03] (2.8e+03,2.87e+03] (2.87e+03,2.93e+03]
2700.5 2767.0 2833.5 2900.0
(2.93e+03,3e+03] (3e+03,3.07e+03] (3.07e+03,3.13e+03] (3.13e+03,3.2e+03]
2967.0 3033.5 3100.0 3167.0
(3.2e+03,3.27e+03] (3.27e+03,3.33e+03] (3.33e+03,3.4e+03] (3.4e+03,3.47e+03]
3233.5 3300.0 3367.0 3433.5
(3.47e+03,3.53e+03] (3.53e+03,3.6e+03] (3.6e+03,3.67e+03] (3.67e+03,3.73e+03]
3500.0 3567.0 3633.5 3700.0
(3.73e+03,3.8e+03] (3.8e+03,3.87e+03] (3.87e+03,3.93e+03] (3.93e+03,4e+03]
3767.0 3833.5 3900.0 3967.0
In the past, I've used this function:
evenbins <- function(x, bin.count = 10, order = T) {
  # start with floor(length(x) / bin.count) elements per bin ...
  bin.size <- rep(length(x) %/% bin.count, bin.count)
  # ... and give the first (length(x) %% bin.count) bins one extra element
  bin.size <- bin.size + ifelse(1:bin.count <= length(x) %% bin.count, 1, 0)
  bin <- rep(1:bin.count, bin.size)
  if (order) {
    # assign bins by rank, so each bin covers an increasing range of values
    bin <- bin[rank(x, ties.method = "random")]
  }
  return(factor(bin, levels = 1:bin.count, ordered = order))
}
and then I can run it with
v.bin <- evenbins(v, 60)
and check the sizes with
table(v.bin)
and see that they all contain 66 or 67 elements. By default this will order the values, just as cut does, so each of the factor levels will have increasing values. If you want to bin them based on their original order, use
v.bin <- evenbins(v, 60, order=F)
instead. This just splits the data up in the order it appears.
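To get the per-bin medians the question asks for from these equal-sized bins, the tapply pattern from the first answer can be reused (a small usage sketch):
v.bin <- evenbins(v, 60)
tapply(v, v.bin, median)[1:6]
#  1   2   3   4   5   6
# 34 101 168 235 302 369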
The result below shows the 59 midpoints between consecutive break-points. The 60 bin values are probably as close to equal as possible (but probably not exactly equal).
> sq <- seq(1, 4000, length = 60)
> sapply(2:length(sq), function(i) median(c(sq[i-1], sq[i])))
# [1] 34.88983 102.66949 170.44915 238.22881 306.00847 373.78814
# [7] 441.56780 509.34746 577.12712 644.90678 712.68644 780.46610
# ......
Actually, after checking, the bins are pretty darn close to being equal.
> unique(diff(sq))
# [1] 67.77966 67.77966 67.77966

Moving Maximum in last 5 minutes in R

I was wondering how to implement a moving maximum and minimum for a price in the last 5 minutes in O(n) time in R. My data consists of two columns: one with the time of day in seconds and the other with price. Right now, I take the current time, subtract 5 minutes, subset for the last 5 minutes, and then search for min and max at each index, so the operation is O(n^2). Is there any way to do this in O(n)?
Sample data:
time
[34200.19, 34200.23, 34201.45, ..., 35800, 35800.2, 35800.5]
price
[100, 103, 102, ..., 95, 97, 99]
The following compares a direct approach with a slightly more efficient variant, but it looks to scale as about n^1.6 on the values I've tried (10,000 to 100,000); this partly depends on whether increasing n is assumed to mean more points in the same time period, or extending over a longer period.
#Create some data
n <- 10000
d <- data.frame(t=as.POSIXct(24*3600*runif(n), origin = "2014-01-01"),x=runif(n))
d <- d[order(d$t),]
d$inmax2 <- d$inmax <- rep(FALSE, n)
d$inmax2[1] <- d$inmax[1] <- TRUE
if (max(diff(d$t)) > 300) warning("There are gaps of more than 300 secs")
#Method 1, assume that you've done something like this
t1 <- system.time({
  for (i in 2:n) d$inmax[i] <- !any((difftime(d$t[i], d$t[1:(i-1)], units="secs") < 300) & (d$x[i] < d$x[1:(i-1)]))
})
#Method 2
t2 <- system.time({
  cand <- 1
  next_cand <- 2
  while (next_cand <= n) {
    cand <- cand[difftime(d$t[next_cand], d$t[cand], units="secs") < 300]
    cand <- c(cand[d$x[cand] > d$x[next_cand]], next_cand)
    if (length(cand) == 1) d$inmax2[cand] <- TRUE
    next_cand <- next_cand + 1
  }
})
rbind(method1=t1,method2=t2)
# user.self sys.self elapsed user.child sys.child
# method1 14.98 0.03 15.04 NA NA
# method2 2.59 0.05 2.63 NA NA
all(d[[3]]==d[[4]])
# TRUE
The approach is to run through the data, keeping all candidates from the past 5 minutes that are not less than the present value. If there are no such candidates, the current point must be the maximum. I assume you can generalise this to the minimum.
This possibly doesn't work if you want the maximum over the last 5 minutes between datapoints rather than at datapoints, though; I'm not sure whether you require that.
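For completeness, a sketch packaging the same candidate-list idea as a standalone function that returns the rolling maximum values themselves, rather than the inmax flags (roll_max_window is just an illustrative name; times are assumed to be numeric seconds, sorted ascending):
roll_max_window <- function(time, price, window = 300) {
  n <- length(price)
  out <- numeric(n)
  cand <- integer(0)                             # candidate indices, with decreasing prices
  for (i in seq_len(n)) {
    cand <- cand[time[i] - time[cand] < window]  # drop candidates older than the window
    while (length(cand) > 0 && price[cand[length(cand)]] <= price[i]) {
      cand <- cand[-length(cand)]                # drop candidates dominated by price[i]
    }
    cand <- c(cand, i)
    out[i] <- price[cand[1]]                     # the front candidate is the window maximum
  }
  out
}
# e.g. with the simulated data above: roll_max_window(as.numeric(d$t), d$x)
# For the minimum, change the <= comparison to >= (or run it on -price and negate the result).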
Sort the dataframe by time first. Then maintain a max heap of the price, removing the lost price entries after every shift. Since rebalancing a heap is O(log n), this will be O(n log n). For implementing a max heap, consult any algorithms textbook (although I may edit this post later with one).

How to do statistics with date-times

I have a series of times, as follows:
2013-12-27 00:31:15
2013-12-29 17:01:17
2013-12-31 01:52:41
....
My goal is to know which time of day is most common, e.g. most of the times fall in the period 17:00 ~ 19:00.
In order to do that, I think I should draw every single time as a point on the x-axis, with the x-axis in minutes.
I don't know how to do this exactly with R and ggplot2.
Am I on the right track? I mean, is there a better way to reach my goal?
library(chron)
library(ggplot2)
# create some test data - hrs
set.seed(123)
Lines <- "2013-12-27 00:31:15
2013-12-29 17:01:17
2013-12-31 01:52:41
"
tt0 <- times(read.table(text = Lines)[[2]]) %% 1
rng <- range(tt0)
hrs <- 24 * as.vector(sort(diff(rng) * runif(100)^2 + rng[1]))
# create density, find maximum of it and plot
d <- density(hrs)
max.hrs <- d$x[which.max(d$y)]
ggplot(data.frame(hrs)) +
  geom_density(aes(hrs)) +
  geom_vline(xintercept = max.hrs)
giving:
> max.hrs # in hours - nearly 2 am
[1] 1.989523
> times(max.hrs / 24) # convert to times
[1] 01:59:22
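To apply the same idea to your own timestamps instead of the simulated hrs, a small sketch (assuming the times are parsed as POSIXct; ts is just an illustrative name) is to measure each time's offset from midnight of its own day:
ts <- as.POSIXct(c("2013-12-27 00:31:15",
                   "2013-12-29 17:01:17",
                   "2013-12-31 01:52:41"))
hrs <- as.numeric(difftime(ts, trunc(ts, "days"), units = "hours"))
hrs
# roughly 0.52, 17.02 and 1.88 hours after midnight
From here the density and geom_density calls above can be reused unchanged.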
