Initially I was using the poissrnd command to generate Poisson-distributed numbers, but I had no idea how to make them 'arrive' in my code. So I decided to generate the inter-arrival times instead, as below.
t=cumsum(exprnd(1/0.1,1,6)); %arrival times = running sum of 6 exponential inter-arrival gaps (mean 10)
%t is like 31.3654 47.1014 72.0024 77.5162 102.3227 104.5794
%Even if this way of producing arrival times is wrong, still my question remains the same
This all looks fine, but how can I actually use these times in my code to say that arrival number 1 occurs at time 31.3654, arrival number 2 at time 47.1014, and so on? Ultimately I have to take an arrival, do some action, then receive another call. I cannot use a loop to increment through such irregular numbers (even if I use ceil()).
So what do people mean when they say they generated arrivals using a Poisson distribution? How do they make use of the Poisson distribution, given that the code won't know whether a number came from a Poisson or a uniform (rand) distribution? I am tired of trying to think of an answer to this question. Please suggest something.
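One way to picture using such times is an event loop that visits the arrivals in order instead of stepping a clock. The following is only a minimal Python sketch of that idea, assuming the same rate of 0.1 as above; `handle_arrival` is a hypothetical placeholder for whatever action each call needs.

```python
import random

def arrival_times(rate, count, seed=None):
    """Return `count` increasing arrival times of a Poisson process with the given rate."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(count):
        t += rng.expovariate(rate)  # exponential inter-arrival gap, mean 1/rate
        times.append(t)
    return times

def handle_arrival(index, time):
    # Hypothetical placeholder: replace with the real per-arrival action.
    return f"arrival {index} at t={time:.4f}"

def run(rate=0.1, count=6, seed=1):
    """Process arrivals one by one, in time order; no fixed-step clock is needed."""
    log = []
    for i, t in enumerate(arrival_times(rate, count, seed), start=1):
        log.append(handle_arrival(i, t))
    return log
```

The loop variable is the arrival itself, so the irregular spacing of the times does not matter.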
I'm trying to compute (MACD - signal) / signal for the prices of the Russell 1000 (an index of 1000 US large-cap stocks). I keep getting this error message and can't figure out why:
Error in EMA(c(49.85, 48.98, 48.6, 49.15, 48.85, 50.1, 50.85, 51.63, 53.5, :n = 360 is outside valid range: [1, 198]
I'm still relatively new to R, although I'm proficient in Python. I suppose I could have used "try" to work around this error, but I want to understand at least what causes it.
Without further ado, this is the code:
N <- 1000
DF_t <- data.frame(ticker=rep("", N), macd=rep(NA, N), stringsAsFactors=FALSE)
stock <- test[['Ticker']]
i <- 0
for (val in stock) {
  dfpx <- bdh(c(val), c("px_last"), start.date=as.Date("2018-1-01"), end.date=as.Date("2019-12-30"))
  macd <- MACD(dfpx[,"px_last"], 60, 360, 45, maType="EMA")
  num <- dim(macd)[1]
  ma <- (macd[num,][1] - macd[num,][2]) / macd[num,][2]
  i <- i + 1
  DF_t[i,] <- list(val, ma)
}
For your information, bdh() is a Bloomberg command to fetch historic data. dfpx is a data frame. MACD() is a function that takes a time series of prices and outputs a matrix, where the first column holds the MACD values and the second column holds the signal values.
Thank you very much! Any advice would be really appreciated. By the way, the code works with a small sample of a few stocks, but it produces the error message when I try to apply it to the universe of one thousand stocks. In addition, the number of data points is about 500, which should be large enough for the parameters I use to compute MACD.
Q : "...and error handling"
If I may add a grain of salt to this: error prevention is far better than any ex-post error handling.
For this there is a cheap step, constant O(1) in both the [TIME] and [SPACE] domains, that in principle prevents any such error-related crashes:
Just prepend to the instantiated vector of time-series data as many constant, process-invariant value cells as are needed to reach the maximum depth of any vector processing, and any such error or exception is avoided in principle.
The processing-invariant value is, in most cases, the first known value, repeated as many times as needed backwards in time, towards older (not present) bars. This avoids relying on NaNs, which can get us into trouble in methods that are sensitive to missing data, as described above. Q.E.D.
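The padding step described above can be sketched as follows. This is only an illustration in Python; the helper name `pad_front` is an assumption, not part of any library.

```python
def pad_front(series, min_length):
    """Prepend copies of the first known value until the series has min_length cells."""
    if not series:
        raise ValueError("cannot pad an empty series")
    missing = max(0, min_length - len(series))
    # The first known value is repeated backwards in time, towards older bars.
    return [series[0]] * missing + list(series)
```

A series that is already long enough is returned unchanged, so the call can be applied to every input before any windowed computation.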
For those who are interested, I have found the cause of the error: some stocks have missing prices. It's as simple as that. For instance, Dow US Equity has only about 180 daily prices (for whatever reason) over the past one and a half years, which definitely can't be used to compute a moving average of 360 days.
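A check of this kind can be made before the moving average is computed, so that tickers with too little history are skipped rather than crashing the loop. A minimal Python sketch, assuming prices arrive as a plain dict of lists with `None` for missing values (both function names are hypothetical):

```python
def usable_for_ema(prices, n):
    """True if there are at least n non-missing prices for an n-period average."""
    observed = [p for p in prices if p is not None]
    return len(observed) >= n

def split_by_history(price_table, n):
    """Partition tickers into (usable, skipped) according to available history."""
    usable, skipped = [], []
    for ticker, prices in price_table.items():
        (usable if usable_for_ema(prices, n) else skipped).append(ticker)
    return usable, skipped
```

The skipped list doubles as a report of which tickers need a closer look.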
I basically ran small samples until I eventually pinpointed what caused the error message. Generally speaking, unless you are trying to extract data for more than 6,000 stocks or so while querying, say, 50 fields, you are okay. A rule of thumb for the daily usage limit of Bloomberg is said to be around 500,000 for a school console. A PhD colleague working in a trading firm also told me that professional Bloomberg consoles are more forgiving.
I am totally new to R. Hopefully you can help. I am trying to simulate from a Hawkes process using R. The main idea is that I first simulate some events from a homogeneous Poisson process. Then each of these events creates its own children using a non-homogeneous Poisson process. The code is as below:
SimulateHawkesprocess<-function(n,tmax,lambda,lambda2){
  times<-Simulatehomogeneousprocess(n,lambda)
  count<-1
  while(count<n){
    newevent<-times[count] + Simulateinhomogeneousprocess(lambda2,tmax,lambdamax=NA)
    times<-c(times,newevent)
    count<-count+1
    n<-length(times)
  }
  return(times)
}
But the R code produces an infinite loop (probably because of the last line of the loop: n<-length(times)). How can I overcome this problem? How can I add a stopping condition?
This is not an R-specific problem. You need to get your algorithm working correctly first. Compare the code you have written against what you want to do. If you need help with the algorithm, then tag the question as such. Moreover, the call to Simulateinhomogeneousprocess is unclear. Some insight into that function would help. What does it return, a number or a vector?
Within the loop you are increasing the value of n by at least 1 each time, so you never reach the end.
newevent<-times[count] + Simulateinhomogeneousprocess(lambda2,tmax,lambdamax=NA)
This creates a non-empty variable.
times<-c(times,newevent)
This increases the length of the "times" vector by at least 1 (since newevent is non-empty).
count<-count+1
n<-length(times)
You increase count by 1 but also increase n by at least 1, thus creating a never-ending loop. One of these things has to change for the loop to stop.
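One possible stopping condition is a time horizon: children are kept only if they fall before tmax, and the loop ends when no unprocessed events remain. The sketch below is in Python and is not a translation of the code above. It assumes the child generator (here the hypothetical argument `offspring_offsets`) returns a finite list of strictly positive offsets for each parent.

```python
from collections import deque

def simulate_cascade(immigrants, offspring_offsets, tmax):
    """Collect immigrant events and all their descendants that occur before tmax.

    immigrants: event times from the homogeneous process.
    offspring_offsets: hypothetical callable, parent_time -> list of positive offsets.
    """
    times = []
    pending = deque(t for t in immigrants if t < tmax)
    while pending:  # stops once every event has had its children generated
        parent = pending.popleft()
        times.append(parent)
        for offset in offspring_offsets(parent):
            child = parent + offset
            if child < tmax:  # events beyond the horizon are discarded
                pending.append(child)
    return sorted(times)
```

Because the loop tracks unprocessed events rather than comparing a counter against a growing length, appending children no longer moves the goalposts.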
I have spent more than two months with RRDtool trying to work out how to store data and visualize it on a graph. I'm very close to my goal now, but I don't understand why some of my data is considered to be NaN.
I am counting lines in gigabyte-sized log files and feeding the result to an RRD database to visualize how often events occur. The step of the database is 60 seconds, and the data is inserted with one-second resolution whenever it is available, so there is no guarantee that the next timestamp will be within the heartbeat or within the step. Sometimes there is no data for minutes.
When there are such big gaps, most of my data is considered to be NaN.
b1_5D.rrd
1420068436:1
1420069461:1
1420073558:1
1420074583:1
1420076632:1
1420077656:1
1420079707:1
1420080732:1
1420082782:1
1420083807:1
1420086881:1
1420087907:1
1420089959:1
1420090983:1
1420094055:1
1420095080:1
1420097132:1
1420098158:1
1420103284:1
1420104308:1
1420107380:1
1420108403:1
1420117622:1
1420118646:1
1420121717:1
1420122743:1
1420124792:1
1420125815:1
1420131960:1
1420134007:1
1420147326:1
1420148352:1
rrdtool create b1_4A.rrd --start 1420066799 --step 60 DS:Value:GAUGE:120:0:U RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:10:1008 RRA:AVERAGE:0.5:30:1440 RRA:AVERAGE:0.5:360:1460
The above gives me an empty graph for the input shown.
If I extend the heartbeat, then it fills the time gaps with the same data. I've tried inserting zero values, but that averages out the counts and gives results in thousandths.
Maybe I am misunderstanding something about RRDtool.
It would be great if someone could explain what I am doing wrong.
Thank you.
It sounds as if your data - which is event-based at irregular timings - is not suitable for an RRD structure. RRD prefers to have its data at constant, regular intervals, and will coerce the incoming data to match its requirements.
Your RRD is defined to have a 60s step, and a 120s heartbeat. This means that it expects one sample every 60s, and no further apart than 120s.
Your DS is a gauge, and so the values you enter (all of them '1' in your example) will be the values stored, after any time normalisation.
If you increase the heartbeat, then a value received within this time will be used to make a linear approximation to fill in all samples since the last one. This is why doing so fills the gaps with the same data.
Since your step is 60s, the smallest sample time width will be 1 minute.
Since you are always storing '1's, your graph will therefore show either '1' (when the sample was received within the heartbeat window) or Unknown (when the heartbeat expired).
In other words, your graph is showing exactly what you gave it. Your data are being coerced into a regular set of numerical values at a 1-minute step, each being 1 or Unknown.
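To see why the sample input comes out as Unknown, compare the gaps between consecutive updates with the 120s heartbeat. The Python sketch below is a simplified check on the first few timestamps from the question, not a reimplementation of RRDtool's normalisation.

```python
STEP = 60        # seconds, from --step 60
HEARTBEAT = 120  # seconds, from DS:Value:GAUGE:120:0:U

# First few update timestamps from the question's input.
timestamps = [1420068436, 1420069461, 1420073558, 1420074583, 1420076632]

def gaps(ts):
    """Seconds between consecutive updates."""
    return [later - earlier for earlier, later in zip(ts, ts[1:])]

def exceeds_heartbeat(ts, heartbeat=HEARTBEAT):
    """True for each interval in which the heartbeat expired before the next update."""
    return [g > heartbeat for g in gaps(ts)]
```

Every gap in this sample is over a thousand seconds, so each interval exceeds the heartbeat.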
I have a "succeeded" metric that is just the timestamp. I want to see the time between successive successes (this is how long the data is stale for). I have
derivative(Success)
but I also want to know how long it has been between the last success time and the current time. Since derivative transforms xs[n] to xs[n+1] - xs[n], the "last" delta doesn't exist. How can I do this? Something like:
derivative(append(Success, now()))
I don't see any graphite functions for appending series, and I don't see any user-defined graphite functions.
The general problem is to be alerted when the data is stale, via graphite monitoring. There may be a better solution than the one I'm thinking about.
identity is a function whose value at any given time is the timestamp of that time.
keepLastValue is a function that takes a series and replicates data points forward over gaps in the data.
So then diffSeries(identity("now"), keepLastValue(Success)) will be a "sawtooth" series that climbs steadily while Success isn't updated, and jumps down to zero (or close to it — there might be some time skew) every time Success has a data point. If you use graphite monitoring to get the current value of that expression and compare it to some threshold, it will probably do what you want.
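The shape of that series can be illustrated outside Graphite. The Python sketch below only mimics the idea (current time minus the last success carried forward); the function names are hypothetical and it does not call any Graphite API.

```python
def staleness(now, success_times):
    """Seconds since the most recent success at or before `now`, or None if there is none."""
    past = [t for t in success_times if t <= now]
    if not past:
        return None
    return now - max(past)

def sawtooth(sample_times, success_times):
    """Staleness evaluated at each sample time: climbs between successes, resets at each one."""
    return [staleness(t, success_times) for t in sample_times]
```

An alert would then compare the latest value against a threshold.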
I am currently attempting to implement a trading idea that I have been playing around with. It covers 50+ securities and uses a strategy very similar to this one. (The package I am currently using is quantmod.)
http://www.r-bloggers.com/backtesting-a-simple-stock-trading-strategy/
For those who aren't interested in clicking, it is a strategy that looks at the past X days (in his case 200) and enters a position depending on the peak reached by the stock. I understand how to apply this strategy to my idea, but I cannot grasp how to aggregate my data into one summary.
Is there a way I can consolidate the summaries for all the positions I have entered into one larger portfolio summary and chart that against the S&P 500?
Any advice on where I can find resources, or pointers to the relevant information, would be appreciated. I have looked at the portfolio analysis package for R and I do not believe it will be much help to me.
Thank you in advance.
Edit: In the link, at the bottom, there are 3 indexes: FTSE, N225 and DJIA. Could I combine those 3 summaries to show the same output as below, but combined?
FTSE:
                                   Me        Index
Cumulative Return          3.56248582    3.8404476
Annual Return              0.05667121    0.0589431
Annualized Sharpe Ratio    0.45907768    0.3298633
Win %                      0.53216374    0.5239884
Annualized Volatility      0.12344579    0.1786895
Maximum Drawdown          -0.39653398   -0.5256991
Max Length Drawdown     1633.00000    2960.0000
Could I get that same output but for the data of the 3 securities combined? Is there an effective way of doing that? Thank you so much. Happy holidays.
It's a little unclear to me what you mean by "combine" in this case. If you want a single column representing the combined returns from all three exchanges as if they were a single unified market, that's really tricky, because the exchanges trade in different currencies (British pounds, U.S. dollars, Japanese yen, etc.). The underlying analysis would have to be modified substantially to take into account fluctuating daily foreign exchange rates.
I suspect that this is NOT what you want. Rather, you are simply asking how to take three sequential two-column outputs and turn them into a single parallel six-column output.
If that is indeed what you want, then you need to rewrite the testStrategy() function shown near the bottom of the link. As it's currently written, that function takes three inputs: an index name myStock (with allowed values of FTSE, DJIA, or N225), and two integer values, nHold and nHigh. You would need to change it so that it instead accepts five inputs; e.g., myStockA, myStockB and myStockC, plus the two integer values already mentioned. Then each of the lines currently referring to myStock would have to be replicated three times. Finally, the two cbind() lines that you see at the bottom would have to be modified so that instead of merging the data together into only two columns, you include all six.
For a good intro tutorial on how to write and modify your own R functions, please see this. To understand how to use the cbind() function, which you will have to call with six rather than two inputs, please see this.
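The column-merging step itself, independent of R, can be sketched as follows. This is a minimal Python illustration that assumes each summary is held as a dict mapping a metric name to a (Me, Index) pair; `combine_summaries` is a hypothetical helper, not a function from the linked post.

```python
def combine_summaries(summaries):
    """Merge per-index two-column summaries into one wide table.

    summaries: {index_name: {metric: (me_value, index_value)}}
    Returns (header, rows), with two columns per index placed side by side.
    """
    names = list(summaries)
    header = ["Metric"]
    for name in names:
        header += [f"{name} Me", f"{name} Index"]
    rows = []
    for metric in summaries[names[0]]:  # metric order taken from the first summary
        row = [metric]
        for name in names:
            me_value, index_value = summaries[name][metric]
            row += [me_value, index_value]
        rows.append(row)
    return header, rows
```

With three indexes this yields the six value columns described above, one row per metric.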