Lubridate fix the time units - r

When we take the difference of two times, the units of the result are chosen automatically.
> ymd_hms("2016-05-09 15:17:03") - ymd_hms("2016-05-09 15:17:04")
Time difference of -1 secs
> ymd_hms("2016-05-09 16:17:03") - ymd_hms("2016-05-09 15:17:04")
Time difference of 59.98333 mins
> ymd_hms("2016-05-10 16:17:03") - ymd_hms("2016-05-09 15:17:04")
Time difference of 1.041655 days
How can I fix the units without using the difftime function,
so that I can do the following?
VECTOR = c(ymd_hms("2016-05-10 16:17:03"),
ymd_hms("2016-05-10 17:19:33"),
ymd_hms("2016-05-10 19:55:03")
)
diffs = diff(VECTOR)
IntervalsInHours = toHours(diffs)
Additionally, is there any way to find out the units used by a lubridate time-difference object? For example,
> ymd_hms("2016-05-09 15:17:03") - ymd_hms("2016-05-09 15:17:04")
Time difference of -1 secs
The units used here are seconds.

As I understand it, you want to use the diff function to take the time differences between the elements of a VECTOR, always in the units you specify.
Please try the code below, which uses the int_diff function:
> VECTOR = c(ymd_hms("2016-05-10 16:17:03"),
+ ymd_hms("2016-05-10 16:17:04"),
+ ymd_hms("2016-05-10 17:19:33"),
+ ymd_hms("2016-05-10 19:55:03")
+ )
> as.numeric(int_diff(VECTOR))
[1] 1 3749 9330
> round(as.numeric(int_diff(VECTOR))/3600,2)
[1] 0.00 1.04 2.59
As you can see, whether or not the smallest interval is only a few seconds long, the result is always expressed in seconds, as shown below.
> VECTOR = c(ymd_hms("2016-05-10 16:17:03"),
+ #ymd_hms("2016-05-10 16:17:04"),
+ ymd_hms("2016-05-10 17:19:33"),
+ ymd_hms("2016-05-10 19:55:03")
+ )
> as.numeric(int_diff(VECTOR))
[1] 3750 9330
> round(as.numeric(int_diff(VECTOR))/3600,2)
[1] 1.04 2.59

Please try the code below to convert the time difference into hours.
library(lubridate)
x=ymd_hms("2016-05-09 16:17:03")
y=ymd_hms("2016-05-19 15:17:04")
diffs=as.duration(x-y)
IntervalsInHours=as.numeric(abs(diffs))/3600;IntervalsInHours

Or you can do it this way:
library(lubridate)
x=ymd_hms("2016-05-09 16:17:03")
y=ymd_hms("2016-05-19 16:17:04")
diffs=as.duration(x-y);
IntervalsInHours=abs(diffs)/dhours(1);IntervalsInHours

I wrote two functions in case anyone finds them useful.
timeDiffUnitConvert = function(Diffs, to="day", roundingN = 1){
  # as.duration() gives the difference in seconds, whatever units diff() chose
  if(to == "day"){
    R = round(as.numeric(as.duration(Diffs))/3600/24,roundingN)
  } else if (to == "hour") {
    R = round(as.numeric(as.duration(Diffs))/3600, roundingN)
  } else if (to == "min") {
    R = round(as.numeric(as.duration(Diffs))/60, roundingN)
  } else if (to == "sec"){
    R = round(as.numeric(as.duration(Diffs)), roundingN)
  } else {
    stop("to which unit? it must be `day`, `hour`, `min` or `sec`.")
  }
  R
}
timeDiffVector = function(TimeVector, to="day", roundingN = 1, attachNaMode = "none"){
  R = timeDiffUnitConvert(diff(TimeVector), to = to, roundingN = roundingN)
  if(attachNaMode == "leading"){
    R = c(NA,R)
  } else if(attachNaMode == "trailing"){
    R = c(R,NA)
  } else if(attachNaMode != "none"){
    # "none" (the default) returns the differences without NA padding
    stop("check your attachNaMode: shall be `none`, `leading` or `trailing`")
  }
  R
}

Related

Time formatting: how to write a While loop that operates for a whole minute?

I have written the following function:
iterations_per_minute = function() {
  Sys.setenv(TZ='GMT+5') ## This line is optional; it just sets my timezone
  final_instant = as.numeric(format(Sys.time(), "%H.%M")) + 0.01
  counter = 0
  while(as.numeric(format(Sys.time(), "%H.%M")) < final_instant) {
    counter = counter + 1
  }
  return(counter)
}
You can infer from the code what the function does, but allow me to explain in lay words anyway: what number can you reach by counting as fast as possible during one minute starting from the number 1? Think of the computer doing exactly that same thing. The output of this function is the number that the computer reaches after counting for a whole minute.
Or at least that is how I would like this function to behave. It does work the way I have described if we pair exactly the call to the function with the beginning of a minute in the clock. However, it will count for less than a minute if we execute this function when the second hand of the clock is pointing at any other number besides twelve. How do I fix this?
Also, I figure I probably won't get the desired output if I execute the function between 23:59 and 0:00. How can this other issue be fixed?
Seems to me like you're trying to introduce more moving parts than you need.
Consider the following:
a <- Sys.time()
a
# [1] "2020-07-25 16:21:40 CDT"
a + 60
# [1] "2020-07-25 16:22:40 CDT"
So, we can just add 60 to Sys.time() without worrying about conversions or whatever else:
iterations_per_minute = function() {
  counter = 0
  end <- Sys.time() + 60
  while( Sys.time() < end ) {
    counter = counter + 1
  }
  return(counter)
}
Using this function, apparently my machine can count to 1474572 in one minute.

Why "c" is equal to 1000 here?

This loop is meant to go over all the values of i in range(92:1000), store in c the first value of i for which the condition holds, and then break. But when I run this code block in R, it gives me c=1000.
> c=0
> for (i in range(92:1000)){
+ if(dpois(i,94.32)<=dpois(5,94.32))
+ {c=i;
+ break;
+ }
+ }
> c
[1] 1000
But I expected it to give c=235, because the condition holds at i=235:
> dpois(235,94.32)
[1] 2.201473e-34
> dpois(5,94.32)
[1] 6.779258e-34
> dpois(235,94.32)<=dpois(5,94.32)
[1] TRUE
And it should break the first time the condition is true.
Where am I going wrong?
In R, range computes the range of the given data, i.e. the minimum and maximum:
> range(92:1000)
[1] 92 1000
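So the loop visits only two values, 92 and 1000. The condition is false at 92 and true at 1000, which is why c ends up as 1000. To iterate over every integer, use 92:1000 directly.
Also, using c as a variable name is bad practice in R, since c is a built-in function used to build vectors.
The following gives the expected answer. Note that the first i satisfying the condition is 234, one earlier than the 235 you checked: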
> c0=0
> for (i in 92:1000){
+ if(dpois(i,94.32)<=dpois(5,94.32))
+ {
+
+ c0=i
+ break
+
+ }
+ }
> c0
[1] 234

Calculate RSI indicator according to tradingview?

I would like to calculate RSI 14 so that it matches the TradingView chart.
According to their wiki, this should be the solution:
https://www.tradingview.com/wiki/Talk:Relative_Strength_Index_(RSI)
I implemented this in an object called RSI.
The call within the RSI object:
self.df['rsi1'] = self.calculate_RSI_method_1(self.df, period=self.period)
Implementation of the calculation:
def calculate_RSI_method_1(self, ohlc: pd.DataFrame, period: int = 14) -> pd.Series:
    delta = ohlc["close"].diff()
    ohlc['up'] = delta.copy()
    ohlc['down'] = delta.copy()
    ohlc['up'] = pd.to_numeric(ohlc['up'])
    ohlc['down'] = pd.to_numeric(ohlc['down'])
    ohlc['up'][ohlc['up'] < 0] = 0
    ohlc['down'][ohlc['down'] > 0] = 0
    # This one below is not correct, but why?
    ohlc['_gain'] = ohlc['up'].ewm(com=(period - 1), min_periods=period).mean()
    ohlc['_loss'] = ohlc['down'].abs().ewm(com=(period - 1), min_periods=period).mean()
    ohlc['RS`'] = ohlc['_gain']/ohlc['_loss']
    ohlc['rsi'] = pd.Series(100 - (100 / (1 + ohlc['RS`'])))
    self.currentvalue = round(self.df['rsi'].iloc[-1], 8)
    print(self.currentvalue)
    self.exportspreadsheetfordebugging(ohlc, 'calculate_RSI_method_1', self.symbol)
I tested several other solutions, for example the two below, but none of them returns a good value:
https://github.com/peerchemist/finta
https://gist.github.com/jmoz/1f93b264650376131ed65875782df386
Therefore I created a unit test based on:
https://school.stockcharts.com/doku.php?id=technical_indicators:relative_strength_index_rsi
I created an input file (see Excel image below)
and an output file (see Excel image below).
Running the unit test (code not included here) should produce the value below; it only checks the last value.
if result == 37.77295211:
    log.info("Unit test 001 - PASSED")
    return True
else:
    log.error("Unit test 001 - NOT PASSED")
    return False
But again I cannot pass the test.
I checked all values with the help of Excel.
So now I'm a little bit lost.
I also tried following this question:
Calculate RSI indicator from pandas DataFrame?
But this does not give any value for the gain.
a) How should the calculation be done in order to pass the unit test?
b) How should the calculation be done in order to align with TradingView?
Here is a Python implementation of the current RSI indicator version in TradingView:
https://github.com/lukaszbinden/rsi_tradingview/blob/main/rsi.py
I had the same issue: my calculated RSI differed from TradingView's.
I found the RSI Step 2 formula described on Investopedia and changed the code as below:
N = 14
# klines: sequence of candles; element 4 of each candle is the close price
close_price0 = float(klines[0][4])
# initial values for the running averages; their influence fades as more candles are processed
gain_avg0 = loss_avg0 = close_price0
for kline in klines[1:]:
    close_price = float(kline[4])
    if close_price > close_price0:
        gain = close_price - close_price0
        loss = 0
    else:
        gain = 0
        loss = close_price0 - close_price
    close_price0 = close_price
    # smoothed averages: (previous average * (N - 1) + current value) / N
    gain_avg = (gain_avg0 * (N - 1) + gain) / N
    loss_avg = (loss_avg0 * (N - 1) + loss) / N
    rsi = 100 - 100 / (1 + gain_avg / loss_avg)
    gain_avg0 = gain_avg
    loss_avg0 = loss_avg
N is the number of periods used to calculate RSI (14 by default).
The code runs in a loop so that an RSI value is calculated for every candle in the series.
For those who experience the same problem:
My raw data contained ticks where the volume was zero. Filtering out these OHLCV rows directly gives good results.
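For comparison, here is a minimal pure-Python sketch of the smoothing step discussed above. It is an illustration under stated assumptions, not TradingView's actual code: the function name wilder_rsi is made up for this example, the first average is seeded with the simple mean of the first `period` changes (the approach described on the StockCharts page linked in the question), and an RSI of 100 is returned when the average loss is zero. Whether the output matches a given chart still depends on the input data, for example on whether zero-volume rows were filtered out.

```python
from typing import List, Optional


def wilder_rsi(closes: List[float], period: int = 14) -> List[Optional[float]]:
    """RSI with Wilder-style smoothing; positions without a full window are None."""
    if period < 1:
        raise ValueError("period must be >= 1")
    rsi: List[Optional[float]] = [None] * len(closes)
    if len(closes) <= period:
        return rsi  # not enough data for even one value

    # Split each close-to-close change into a non-negative gain and loss.
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Assumption: seed with the simple average of the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain: float, loss: float) -> float:
        if loss == 0.0:
            return 100.0  # assumed convention when there are no losses
        return 100.0 - 100.0 / (1.0 + gain / loss)

    rsi[period] = to_rsi(avg_gain, avg_loss)
    for i in range(period, len(changes)):
        # Smoothing step: (previous average * (period - 1) + current) / period
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        rsi[i + 1] = to_rsi(avg_gain, avg_loss)  # changes[i] ends at closes[i + 1]
    return rsi
```

Calling wilder_rsi(closes) on a list of close prices returns a list of the same length, so each value lines up with the candle it belongs to.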

R - vectorised conditional replace

Hi, I'm trying to manipulate a list of numbers and I would like to do so without a for loop, using fast native operations in R. The pseudocode for the manipulation is:
By default the starting total is 100 (for every block of numbers between zeros).
From the first zero to the next zero, the moment the cumulative total falls by more than 2% from its running maximum, replace all subsequent numbers with zero.
Do this for all blocks of numbers between zeros.
The cumulative sum resets to 100 every time.
For example, if the following were my data:
d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1);
Results would be :
0 0 0 1 3 4 5 -1 2 3 -5 0 0 0 -2 -3 0 0 0 0 0 -1 -1 -1 0
Currently I have an implementation with a for loop, but since my vector is really long, the performance is terrible.
Thanks in advance.
Here is a running sample code :
d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1);
ans <- d;
running_total <- 100;
count <- 1;
max <- 100;
toggle <- FALSE;
processing <- FALSE;
for(i in d){
  if( i != 0 ){
    processing <- TRUE;
    if(toggle == TRUE){
      ans[count] = 0;
    }
    else{
      running_total = running_total + i;
      if( running_total > max ){ max = running_total;}
      else if ( 0.98*max > running_total){
        toggle <- TRUE;
      }
    }
  }
  if( i == 0 && processing == TRUE )
  {
    running_total = 100;
    max = 100;
    toggle <- FALSE;
  }
  count <- count + 1;
}
cat(ans)
I am not sure how to translate your loop into vectorized operations. However, there are two fairly easy options for large performance improvements. The first is to simply put your loop into an R function, and use the compiler package to precompile it. The second slightly more complicated option is to translate your R loop into a c++ loop and use the Rcpp package to link it to an R function. Then you call an R function that passes it to c++ code which is fast. I show both these options and timings. I do want to gratefully acknowledge the help of Alexandre Bujard from the Rcpp listserv, who helped me with a pointer issue I did not understand.
First, here is your R loop as a function, foo.r.
## Your R loop as a function
foo.r <- function(d) {
  ans <- d
  running_total <- 100
  count <- 1
  max <- 100
  toggle <- FALSE
  processing <- FALSE
  for(i in d){
    if(i != 0 ){
      processing <- TRUE
      if(toggle == TRUE){
        ans[count] <- 0
      } else {
        running_total <- running_total + i
        if (running_total > max) {
          max <- running_total
        } else if (0.98*max > running_total) {
          toggle <- TRUE
        }
      }
    }
    if(i == 0 && processing == TRUE) {
      running_total <- 100
      max <- 100
      toggle <- FALSE
    }
    count <- count + 1
  }
  return(ans)
}
Now we can load the compiler package and compile the function and call it foo.rcomp.
## load compiler package and compile your R loop
require(compiler)
foo.rcomp <- cmpfun(foo.r)
That is all it takes for the compilation route. It is all R and obviously very easy. Now for the c++ approach, we use the Rcpp package as well as the inline package which allows us to "inline" the c++ code. That is, we do not have to make a source file and compile it, we just include it in the R code and the compilation is handled for us.
## load Rcpp package and inline for ease of linking
require(Rcpp)
require(inline)
## Rcpp version
src <- '
  const NumericVector xx(x);
  int n = xx.size();
  NumericVector res = clone(xx);
  int toggle = 0;
  int processing = 0;
  // double, so non-integer inputs are not truncated
  double tot = 100;
  double max = 100;
  typedef NumericVector::iterator vec_iterator;
  vec_iterator ixx = xx.begin();
  vec_iterator ires = res.begin();
  for (int i = 0; i < n; i++) {
    if (ixx[i] != 0) {
      processing = 1;
      if (toggle == 1) {
        ires[i] = 0;
      } else {
        tot += ixx[i];
        if (tot > max) {
          max = tot;
        } else if (.98 * max > tot) {
          toggle = 1;
        }
      }
    }
    if (ixx[i] == 0 && processing == 1) {
      tot = 100;
      max = 100;
      toggle = 0;
    }
  }
  return res;
'
foo.rcpp <- cxxfunction(signature(x = "numeric"), src, plugin = "Rcpp")
Now we can test that we get the expected results:
## demonstrate equivalence
d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
all.equal(foo.r(d), foo.rcpp(d))
Finally, create a much larger version of d by repeating it 10^5 times. Then we can run the three different functions: pure R code, compiled R code, and the R function linked to c++ code.
## make larger vector to test performance
dbig <- rep(d, 10^5)
system.time(res.r <- foo.r(dbig))
system.time(res.rcomp <- foo.rcomp(dbig))
system.time(res.rcpp <- foo.rcpp(dbig))
Which on my system, gives:
> system.time(res.r <- foo.r(dbig))
user system elapsed
12.55 0.02 12.61
> system.time(res.rcomp <- foo.rcomp(dbig))
user system elapsed
2.17 0.01 2.19
> system.time(res.rcpp <- foo.rcpp(dbig))
user system elapsed
0.01 0.00 0.02
The compiled R code takes about 1/6 the time of the uncompiled R code, needing only about 2 seconds to operate on the vector of 2.5 million elements. The c++ code is orders of magnitude faster even than the compiled R code, requiring just .02 seconds to complete. Aside from the initial setup, the syntax for the basic loop is nearly identical in R and c++, so you do not even lose clarity. I suspect that even if parts or all of your loop could be vectorized in R, you would be hard pressed to beat the performance of the R function linked to c++. Lastly, just for proof:
> all.equal(res.r, res.rcomp)
[1] TRUE
> all.equal(res.r, res.rcpp)
[1] TRUE
The different functions return the same results.

script to get average based on timestamps

I have two fields in my text file which are
timestamp number
The format of timestamp is hh:mm:ss.mmm
some sample records are
18:31:48.345 0.00345
18:31:49.153 0.00123
18:32:23.399 0.33456
I want to print out averages of records which are no more than 30 seconds apart. What is a good and fast way of doing it?
Here is a starting point in awk. I'm sure the code can be optimized further.
count == 0 { startTime = timeToSeconds($1) }
{
    currentTime = timeToSeconds($1)
    elapsedTime = currentTime - startTime
    if (elapsedTime > 30.0) {
        calculateAverage()
        startTime = timeToSeconds($1)
    }
    print
    sum += $2
    count++
}
END { calculateAverage() }
function timeToSeconds(timeString) {
    # Convert an hh:mm:ss.mmm time string to a number of seconds
    split(timeString, tokens, ":")
    seconds = tokens[1]*3600.0 + tokens[2]*60.0 + tokens[3]
    return seconds
}
function calculateAverage() {
    # Use & modify global vars: count, sum
    if (count == 0) return    # nothing to average (e.g. empty input)
    average = sum / count
    printf "Average: %.4g\n\n", average
    sum = 0.0; count = 0
}
I would start by using some scripting language that has built-in date/time 'operations'. For instance, in Ruby you could easily do:
require 'time'
t,n = gets.chomp.split(/\s+/)
ts1 = Time.parse(t)
# ...
t,n = gets.chomp.split(/\s+/)
ts2 = Time.parse(t)
Which now allows you to do things like:
diff = ts2 - ts1
if diff > 30
# difference is greater than 30 seconds
end
Ruby Time objects can be used in context (float, int, String, etc) so it is trivial to start doing calculations as if the parsed dates are actually numeric values.
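The same windowing idea can also be sketched in Python using only the standard library. This is an illustration under assumptions rather than a definitive solution: the function names to_seconds and window_averages are made up for this example, timestamps are assumed to be in ascending order within a single day (no midnight rollover), and, as in the awk script, each window starts at the first record that falls outside the previous window.

```python
from typing import Iterable, List, Tuple


def to_seconds(stamp: str) -> float:
    """Convert an 'hh:mm:ss.mmm' string to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def window_averages(
    lines: Iterable[str], window: float = 30.0
) -> List[Tuple[str, float, int]]:
    """Return (window start timestamp, mean value, record count) per window."""
    results: List[Tuple[str, float, int]] = []
    start_stamp = ""
    start = 0.0
    total = 0.0
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        stamp, value = fields[0], float(fields[1])
        now = to_seconds(stamp)
        # Close the current window once a record is more than `window` seconds past its start.
        if count and now - start > window:
            results.append((start_stamp, total / count, count))
            total, count = 0.0, 0
        if count == 0:
            start_stamp, start = stamp, now  # this record opens a new window
        total += value
        count += 1
    if count:
        results.append((start_stamp, total / count, count))
    return results
```

To use it on a file, pass the open file object, for example window_averages(open("data.txt")), where data.txt is a placeholder name for the two-column text file described in the question.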
