How to do statistics with dates and times - R

I have a series of times, like the following:
2013-12-27 00:31:15
2013-12-29 17:01:17
2013-12-31 01:52:41
....
My goal is to find out which time of day the events concentrate in, e.g. whether most times fall in the period 17:00 ~ 19:00.
To do that, I think I should plot every single time as a point along the x-axis, with the x-axis in minutes.
I don't know exactly how to do this with R and ggplot2.
Am I on the right track? I mean, is there a better way to reach my goal?

library(chron)
library(ggplot2)

# create some test data - 100 random times of day, in hours
set.seed(123)
Lines <- "2013-12-27 00:31:15
2013-12-29 17:01:17
2013-12-31 01:52:41
"
tt0 <- times(read.table(text = Lines)[[2]]) %% 1
rng <- range(tt0)
hrs <- 24 * as.vector(sort(diff(rng) * runif(100)^2 + rng[1]))

# create the density, find its maximum and plot
d <- density(hrs)
max.hrs <- d$x[which.max(d$y)]
ggplot(data.frame(hrs)) +
  geom_density(aes(hrs)) +
  geom_vline(xintercept = max.hrs)
giving:
> max.hrs # in hours - nearly 2 am
[1] 1.989523
> times(max.hrs / 24) # convert to times
[1] 01:59:22
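If you are working from the actual timestamps rather than simulated data, you can also extract the time of day directly and bin it; a minimal sketch (the vector x and the histogram settings are my own illustration, not part of the answer above):
library(ggplot2)
x <- c("2013-12-27 00:31:15", "2013-12-29 17:01:17", "2013-12-31 01:52:41")
tt <- as.POSIXlt(x)
hrs <- tt$hour + tt$min / 60 + tt$sec / 3600   # time of day in hours
ggplot(data.frame(hrs), aes(hrs)) +
  geom_histogram(binwidth = 1, boundary = 0) +   # one bar per hour of the day
  scale_x_continuous(limits = c(0, 24), breaks = seq(0, 24, 4))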


Converting time in integer to MIN:SEC

I am using this code to find a difference between two times:
station_data.avg$duration[i] = if_else(
  station_data.avg$swath[i] != 0,
  round(difftime(station_data.avg$end[i], station_data.avg$start[i], units = "mins"), 3),
  0
)
But the output is 3.116667 and I want the output in min:sec format, e.g. 3:18.
I tried
station_data.avg$duration[i] = as.character(times(station_data.avg$duration[i] / (24 * 60)))
and was hoping that would work, but it did not.
You can use the chron package to convert a fractional minute (e.g. x.25, meaning 25% of a minute) into minutes and seconds (x:15, since 15/60 = 0.25). An example is below, but if you edit your question to make it reproducible, I can provide more specific help.
Data
a <- Sys.time()
b <- Sys.time() + 60 * 3 + 15 # add 3 min 15 seconds
Code
difftime(b, a, units = "min")
# Time difference of 3.250006 mins
chron::times(as.numeric(difftime(b, a, units = "days")))
# [1] 00:03:15
Note the change to units = "days" in this context.
You could further parse this out by wrapping this in lubridate::hms:
lubridate::hms(
chron::times(as.numeric(difftime(b, a, units = "days")))
)
# [1] "3M 15S"

Can I write my equation more efficiently in R?

I'm quite new to coding, so I don't know what the limits are for what I can do in R, and I haven't been able to find an answer for this particular kind of problem yet, although it probably has quite a simple solution.
For equation 2, A.1 is the starting value, but in each subsequent equation I need to use the previous answer (i.e. for A.3 I need A.2, for A.4 I need A.3, etc.).
A.1 <- start.x*(1-rate[1])+start.x*rate[1]
A.[2:n] <- A.[n-1]*(1-rate[2:n])+x*rate[2:n]
How do I set A.1 as the initial value, and is there a better way of writing equation 2 than to copy and paste the equation 58 times?
I've included the variables I have below:
A.1<- -13.2 # which is the same as start.x
x<- -10.18947 # x[2:n]
n<- 58
Age<-c(23:80)
rate <- function(Age){
  Turnover <- 1 / (1.0355 * Age - 3.9585)
  return(Turnover)
}
I need to find the age at which A can be rounded to -11.3. I expect to see it from ages 56 to 60.
Using the new information, try this:
x <- -10.18947
n <- 58
Age <- 23:80
rate <- 1 / (1.0355 * Age - 3.9585)
A <- vector("numeric", n)
A[1] <- -13.2
for (i in 2:n) {
  A[i] <- A[i - 1] * (1 - rate[i]) + x * rate[i]
}
Age[which.min(abs(A + 11.3))]
# [1] 58
plot(Age, A, type = "l")
abline(h = -11.3, v = 58, lty = 3)
So the closest age to -11.3 is 58 years.
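The loop can also be written with Reduce(); a sketch under the same inputs as above (A2 and the comparison are my own check, not part of the answer):
A2 <- Reduce(function(prev, r) prev * (1 - r) + x * r,
             rate[2:n], init = -13.2, accumulate = TRUE)
all.equal(A, A2)
# [1] TRUE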

R - interpolate time series to same interval but new time points

Apologies if this is an obvious question, but I am new to R, having spent many years with SAS.
I have two data files of measurements taken at 10-second intervals, but not at exactly the same time points. I would like to convert one of the time series to match the times of the other, adjusting its values with linear interpolation. In SAS I could do this pretty quickly with proc expand, but I can't find a similar function in an R package (I've looked at zoo and xts).
To show what I mean, here are snippets of my two files. In this case one time series is on the 3's and the other is on the 2's. Here the 1-second difference is probably trivial, but this is a problem I run into a lot in my work, so I'd like to know whether there is an easy way to recalculate, via linear interpolation, the values in the second data set so that they also fall on the 3's.
Date,Time,Value
3-Nov-16,13:15:53,264.651
3-Nov-16,13:16:03,264.58
3-Nov-16,13:16:13,264.368
3-Nov-16,13:16:23,264.273
3-Nov-16,13:16:33,264.391

11/3/16,1:15:52 PM,10.1
11/3/16,1:16:02 PM,10.1
11/3/16,1:16:12 PM,10.1
11/3/16,1:16:22 PM,10.1
11/3/16,1:16:32 PM,10.1
You can use the 'approx' function. Here is an example with your data:
> input <- read.table(text = "11/3/16,1:15:52 PM,10.1
+ 11/3/16,1:16:02 PM,10.1
+ 11/3/16,1:16:12 PM,10.1
+ 11/3/16,1:16:22 PM,10.1
+ 11/3/16,1:16:32 PM,10.1", as.is = TRUE, sep = ',')
>
> # convert the date
> input$time <- as.POSIXct(input$V2, format = "%I:%M:%S %p")
> library(lubridate)
>
> input$newtime <- input$time
>
> first <- read.table(text = "3-Nov-16,13:15:53,264.651
+ 3-Nov-16,13:16:03,264.58
+ 3-Nov-16,13:16:13,264.368
+ 3-Nov-16,13:16:23,264.273
+ 3-Nov-16,13:16:33,264.391", as.is = TRUE, sep = ',')
> first$time <- as.POSIXct(first$V2, format = "%H:%M:%S")
>
> # use "approx" to interprete values
> # find values for times in "input" since "first" has different values
> input$result <- approx(first$time,
+ first$V3,
+ xout = input$time,
+ rule = 2
+ )$y
>
>
> input
V1 V2 V3 time newtime result
1 11/3/16 1:15:52 PM 10.1 2017-01-11 13:15:52 2017-01-11 13:15:52 264.6510
2 11/3/16 1:16:02 PM 10.1 2017-01-11 13:16:02 2017-01-11 13:16:02 264.5871
3 11/3/16 1:16:12 PM 10.1 2017-01-11 13:16:12 2017-01-11 13:16:12 264.3892
4 11/3/16 1:16:22 PM 10.1 2017-01-11 13:16:22 2017-01-11 13:16:22 264.2825
5 11/3/16 1:16:32 PM 10.1 2017-01-11 13:16:32 2017-01-11 13:16:32 264.3792
> first
V1 V2 V3 time
1 3-Nov-16 13:15:53 264.651 2017-01-11 13:15:53
2 3-Nov-16 13:16:03 264.580 2017-01-11 13:16:03
3 3-Nov-16 13:16:13 264.368 2017-01-11 13:16:13
4 3-Nov-16 13:16:23 264.273 2017-01-11 13:16:23
5 3-Nov-16 13:16:33 264.391 2017-01-11 13:16:33
>
I apologize that I was a bit lazy about evaluating the input exactly as you asked; I am still learning R. I wonder if this piece of code solves your fundamental issue.
The algorithm is simple:
1. Convert all times to Unix time, i.e. the number of seconds since the epoch.
2. Use the Unix time as x and the measured value as y.
3. Create new data points at the second set of date-times, also converted to Unix time.
toUnixTime <- function(dateobj) {
  return(as.numeric(as.POSIXct(dateobj, origin = "1970-01-01")))
}
toDateTime <- function(unixtime) {
  return(as.POSIXct(unixtime, origin = "1970-01-01"))
}
toUnix <- function(datetime) {
  return(as.numeric(strptime(datetime, "%d-%b-%y,%H:%M:%S")))
}
toUnix2 <- function(datetime) {
  return(as.numeric(strptime(datetime, "%m/%d/%y,%I:%M:%S %p")))
}
main <- function() {
  x <- c(toUnix("3-Nov-16,13:15:53"), toUnix("3-Nov-16,13:16:03"))
  y <- c(264.651, 264.58)
  f <- approxfun(x, y)
  f(toUnix2("11/3/16,1:16:02 PM"))
}
main()
This outputs
264.5871 for 11/3/16,1:16:02 PM
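Since the question mentions zoo, here is one more possible route with zoo::na.approx; this is only a sketch, assuming the first and input data frames built in the first answer above:
library(zoo)
z1 <- zoo(first$V3, first$time)   # series on the 3's (values to interpolate)
z2 <- zoo(input$V3, input$time)   # series on the 2's (target time points)
m <- merge(z1, z2)                # union of both time indexes
m$z1 <- na.approx(m$z1, rule = 2) # linearly interpolate z1 at every index
window(m, index(z2))              # keep only the 2's time points
merge() aligns the two series on a common index, so na.approx() can fill in z1 at z2's time stamps.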

Print dates without scientific notation in rpart classification tree

When I create an rpart tree that uses a date cutoff at a node, the plot methods I use - both rpart.plot and fancyRpartPlot - print the dates in scientific notation, which makes the result hard to interpret. Here's the fancyRpartPlot (plot not reproduced here):
Is there a way to print this tree with more interpretable date values? The tree plot is meaningless as all those dates look the same.
Here's my code for creating the tree and plotting two ways:
library(rpart) ; library(rpart.plot) ; library(rattle)
my_tree <- rpart(a ~ ., data = dat)
rpart.plot(my_tree)
fancyRpartPlot(my_tree)
Using this data:
# define a random date/time selection function
generate_days <- function(N, st = "2012/01/01", et = "2012/12/31") {
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  dt <- as.numeric(difftime(et, st, unit = "sec"))
  ev <- runif(N, 0, dt)
  rt <- st + ev
  rt
}
set.seed(1)
dat <- data.frame(
  a = runif(1:100),
  b = rpois(100, 5),
  c = sample(c("hi", "med", "lo"), 100, TRUE),
  d = generate_days(100)
)
From a practical standpoint, perhaps you'd like to just use days from the start of the data:
dat$d <- dat$d - as.POSIXct(as.Date("2012/01/01"))
my_tree <- rpart(a ~ ., data = dat)
rpart.plot(my_tree, branch = 1, extra = 101, type = 1, nn = TRUE)
This reduces the numbers to something manageable and meaningful (though not as meaningful as a specific date, perhaps). You may even want to round it to the nearest day or week. (I can't install GTK+ on my computer, so I can't use fancyRpartPlot.)
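For instance, here is one possible way to do the rounding (a sketch that assumes the original dat from the question, i.e. before the subtraction above; not code from the answer):
library(rpart); library(rpart.plot)
# convert the POSIXct column to whole days since the start of the data
dat$d <- round(as.numeric(difftime(dat$d, as.POSIXct(as.Date("2012/01/01")),
                                   units = "days")))
my_tree <- rpart(a ~ ., data = dat)
rpart.plot(my_tree, branch = 1, extra = 101, type = 1, nn = TRUE)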
One possible approach is to use the digits option in print to examine the tree, and as.POSIXlt to convert the split values back to dates:
> print(my_tree,digits=100)
n= 100
node), split, n, deviance, yval
* denotes terminal node
1) root 100 7.0885590 0.5178471
2) d>=1346478795.049611568450927734375 33 1.7406368 0.4136051
4) b>=4.5 23 1.0294497 0.3654257 *
5) b< 4.5 10 0.5350040 0.5244177 *
3) d< 1346478795.049611568450927734375 67 4.8127122 0.5691901
6) d< 1340921905.3460228443145751953125 55 4.1140164 0.5368048
12) c=hi 28 1.8580913 0.4779574
24) d< 1335890083.3241622447967529296875 18 0.7796261 0.3806526 *
25) d>=1335890083.3241622447967529296875 10 0.6012662 0.6531062 *
13) c=lo,med 27 2.0584052 0.5978317
26) d>=1337494347.697483539581298828125 8 0.4785274 0.3843749 *
27) d< 1337494347.697483539581298828125 19 1.0618892 0.6877082 *
7) d>=1340921905.3460228443145751953125 12 0.3766236 0.7176229 *
## Get date on first node
> as.POSIXlt(1346478795.049611568450927734375,origin="1970-01-01")
[1] "2012-08-31 22:53:15 PDT"
I also checked the digits option available in rpart.plot and fancyRpartPlot:
rpart.plot(my_tree,digits=10)
fancyRpartPlot(my_tree, digits=10)
I don't know how important the specific chronological date is to your classification, but an alternative would be to break your dates down into characteristics. In other words, create indicator bins based on the year (2012, 2013, 2014, ...), the day of the week (Mon, Tue, Wed, ...), and maybe even the day of the month (1, 2, 3, ..., 31). This adds many more categories to classify by, but it avoids the issues of working with a fully formatted date.
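A rough sketch of that idea (the feature names and the formula are my own assumptions, applied to the dat generated in the question while d is still a POSIXct column):
library(rpart); library(rpart.plot)
dat$year <- as.integer(format(dat$d, "%Y"))   # calendar year
dat$wday <- factor(weekdays(dat$d))           # day of the week
dat$mday <- as.integer(format(dat$d, "%d"))   # day of the month
my_tree2 <- rpart(a ~ b + c + year + wday + mday, data = dat)
rpart.plot(my_tree2)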

How can I estimate the density of a target wind speed from daily Weibull distributions over 13 years of a large dataset?

Good day,
I want to fit a Weibull distribution to each day's wind speeds, measured half-hourly (48 wind speeds per day; sometimes a few hours are missing).
Then, based on each daily Weibull distribution, I want to calculate the density at a certain target wind speed (in this dataset, 29 km/hr).
To do this, I need to arrange 13 years of data by day, calculate the two Weibull parameters (scale = a and shape = b) for each day, and estimate the density at the target point each day. As this is a large dataset, I need a function to process it automatically and put the daily results in a separate table (a, b, density at 29 km/hr). (Possibly the 'return' function??)
My data looks like this:
Time windspeed direction Date day_index
1 24/07/2000 13:00 31 310 2000-07-24 13:00:00 2000_206
2 24/07/2000 13:30 41 320 2000-07-24 13:30:00 2000_206
3 24/07/2000 14:30 37 290 2000-07-24 14:30:00 2000_206
4 24/07/2000 15:00 30 300 2000-07-24 15:00:00 2000_206
5 24/07/2000 15:30 24 320 2000-07-24 15:30:00 2000_206
6 24/07/2000 16:00 22 330 2000-07-24 16:00:00 2000_206
7 24/07/2000 16:30 37 270 2000-07-24 16:30:00 2000_206
This is a follow-up question to this linked page (How can I apply "sapply" in R with multiple codes in one function?).
Previous comments indicate that I might need to use the 'aggregate' or 'ddply' functions. How can I pass multiple arguments to these functions for the analysis I intend?
My function for multiple arguments is:
library(bReeze)
library(xts)
time_ballarat <- strptime(ballarat_alldata[,1], "%d/%m/%Y %H:%M")
multiple.function <- function() {
  set1 <- createSet(height = 10, v.avg = ballarat_alldata[, 2],
                    dir.avg = ballarat_alldata[, 3])
  ballarat <- createMast(time.stamp = time_ballarat, set1)
  ballarat <- clean(mast = ballarat)
  ballarat.wb <- weibull(mast = ballarat, v.set = 1, print = FALSE)
  my.x <- density(ballarat_alldata$windspeed, from = 0,
                  to = max(ballarat_alldata$windspeed))$x
  my.y <- density(ballarat_alldata$windspeed, from = 0,
                  to = max(ballarat_alldata$windspeed))$y
  df <- data.frame(x = my.x, y = my.y)
  my.nls <- nls(y ~ (a/b) * (x/b)^(a-1) * exp(-(x/b)^a),
                data = df[df$x > 0, ],
                start = c(a = ballarat.wb[13, 2], b = ballarat.wb[13, 1]))
  xValues <- seq(from = 0, to = 40, length.out = 100)
  my.predicted <- predict(my.nls, data.frame(x = xValues))
  my.coef <- coef(my.nls)
  my.weibull.predict <- function(x, a, b) {
    y <- (a/b) * (x/b)^(a-1) * exp(-(x/b)^a)
    return(y)
  }
  return(c(ballarat.parameter = my.coef[1], ballarat.scale = my.coef[2],
           my29 = my.weibull.predict(29, my.coef[1], my.coef[2])))
}
I am not sure this can calculate the density at the target speed for each day. Could you please check whether it fits my intention? My primary concern with "multiple.function" is that the Weibull distribution code should be run separately on each day's data, and this is missing from the code.
For example, in "createSet(height=10, v.avg=ballarat_alldata[,2], dir.avg=ballarat_alldata[,3])", v.avg and dir.avg should not contain the whole dataset for the calculation, but I do not know how to restrict them.
I am still a beginner in R, so I apologise in advance if my questions are too specific. Please help me find a way to solve these problems!
Regards,
Kangmin.
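One possible per-day sketch (my own assumption of an approach, not code from this thread): split the wind speeds by day_index, fit a Weibull distribution to each day with MASS::fitdistr(), and evaluate the density at the 29 km/hr target:
library(MASS)
# split half-hourly wind speeds by day
daily <- split(ballarat_alldata$windspeed, ballarat_alldata$day_index)
fit_one_day <- function(v) {
  v <- v[!is.na(v) & v > 0]                    # fitdistr needs positive data
  if (length(v) < 10) return(c(shape = NA, scale = NA, dens29 = NA))
  fit <- tryCatch(fitdistr(v, "weibull"), error = function(e) NULL)
  if (is.null(fit)) return(c(shape = NA, scale = NA, dens29 = NA))
  shape <- unname(fit$estimate["shape"])
  scale <- unname(fit$estimate["scale"])
  c(shape = shape, scale = scale, dens29 = dweibull(29, shape, scale))
}
res <- do.call(rbind, lapply(daily, fit_one_day))
head(res)   # one row per day: shape, scale, density at 29 km/hr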
