I am trying to store the results of reqHistoricalData into a dictionary of dataframes, where each key is the ticker name and the value is the corresponding time series data frame.
from datetime import datetime

import numpy as np
import pandas as pd
from ib.ext.Contract import Contract
from ib.opt import ibConnection, message


def historical_data_handler(msg):
    global hist, df
    if "finished" in msg.date:
        # Build the data frame once all bars for the request have arrived
        df = pd.DataFrame(index=np.arange(0, len(hist)), columns=('date', 'open', 'high', 'low', 'close', 'volume'))
        for index, msg in enumerate(hist):
            df.loc[index,'date':'volume'] = datetime.strptime(msg.date, '%Y%m%d %H:%M:%S'), msg.open, msg.high, msg.low, msg.close, msg.volume
    else:
        hist.append(msg)


def error_handler(msg):
    print(msg)


con = ibConnection(port=7496,clientId=75)
con.register(historical_data_handler, message.historicalData)
con.register(error_handler, message.Error)
con.connect()
print("Connection to IB is ", con.isConnected(), " and starting data pull")

contracts = ["SPY", "AAPL"]
result = {}

for ct in contracts:
    hist = []
    spec = Contract()
    spec.m_symbol = ct
    spec.m_secType = 'STK'
    spec.m_exchange = 'SMART'
    spec.m_currency = 'USD'
    spec.m_expiry = '' # For futures
    con.reqHistoricalData(0, spec, '', '2 D', '5 mins', 'TRADES', 0, 1)
    print(spec.m_symbol, hist, df)
    result[ct] = df

print("Connection is terminated ", con.disconnect(), " after finishing pulling data")
The behavior of the code is not what I would expect. When I look at my "result" dictionary, the values are the same for "SPY" and "AAPL". I think there is something wrong with how I am defining the global variables, as they don't seem to be updating properly in the for loop.
Any help would be greatly appreciated, thanks!
When I look at the head of the two dataframes stored in "result" (they are the same):
[326 rows x 6 columns]
SPY date open high low close volume
0 2018-02-09 04:00:00 261.08 261.16 260.92 260.99 68
1 2018-02-09 04:05:00 260.99 261 260.86 260.99 59
[326 rows x 6 columns]
AAPL date open high low close volume
0 2018-02-09 04:00:00 261.08 261.16 260.92 260.99 68
1 2018-02-09 04:05:00 260.99 261 260.86 260.99 59
Just looking at the data, it seems you have about twice as many rows as you should for 2 days. The hist = [] is run twice in succession before the replies from IB are processed. So you must be getting all the data for SPY and AAPL in one list.
Try clearing hist in the finished branch of the data handler. I think that's where you should ask for the next contract as well. You can then put a sleep(10) to avoid pacing violations if you ever ask for more data.
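A minimal sketch of the finished-branch idea, written as a variation: instead of one global list, it keeps one buffer per request id and moves it into the result when the "finished" message arrives. It does not connect to IB; the messages are simulated, and `buffers`, `result`, `req_to_symbol` and `bar` are hypothetical names. It also assumes each message carries its request id as `msg.reqId` and that every contract is requested with a distinct id.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical names for illustration only; none of these come from IbPy.
buffers = defaultdict(list)            # bars collected so far, keyed by request id
result = {}                            # completed bar lists, keyed by ticker
req_to_symbol = {0: "SPY", 1: "AAPL"}  # assumes one distinct request id per contract


def historical_data_handler(msg):
    """Route each bar to the buffer of the request it belongs to."""
    if "finished" in msg.date:
        # Move the completed buffer out so it cannot mix with other requests.
        result[req_to_symbol[msg.reqId]] = buffers.pop(msg.reqId, [])
    else:
        buffers[msg.reqId].append(msg)


def bar(req_id, date, close):
    """Build a simulated message; a real session would receive these from IB."""
    return SimpleNamespace(reqId=req_id, date=date, close=close)


# Replies for the two requests arrive interleaved, with made-up prices.
for m in [bar(0, "20180209 04:00:00", 261.0),
          bar(1, "20180209 04:00:00", 156.0),
          bar(0, "finished", None),
          bar(1, "finished", None)]:
    historical_data_handler(m)
```

With this layout the order in which replies arrive no longer matters, because nothing is shared between requests.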
Probably due to my limited knowledge of communicating with APIs (which I am trying to remedy :) ), I seem to be unable to execute a PUT request for more than one row of a data frame at a time. For example, if df_final consists of 1 row, the following code works. If there are multiple rows, it fails and I get a 400 status.
reqBody <- list(provName = df_final$Provider,site = df_final$Site,
monthJuly = df_final$July, monthAugust = df_final$August,
monthSeptember = df_final$September, monthOctober =df_final$October,
monthNovember = df_final$November ,
monthDecember = df_final$December, monthJanuary = df_final$January, monthFebruary = df_final$February,
monthMarch = df_final$March, monthApril = df_final$April, monthMay = df_final$May,
monthJune = df_final$June,
assumptions = paste("Monthly Volume:", input$Average, "; Baseline Seasonality:", input$Year, "; Trend:", input$Year_slopes),
rationale = as.character(input$Comments), fiscalYear = FY_SET, updateDtm = Sys.time())
r <- PUT(fullURL, body = reqBody, encode = "json", content_type_json())
Using with_verbose(), I can see that the JSON being sent is formatted differently in the two cases. I haven't found anything in the documentation (https://cran.r-project.org/web/packages/httr/httr.pdf) that has been particularly helpful in overcoming this.
The format it appears to send in the first instance (1 row in the data frame) looks like this:
{"provName":"Name","site":"site","monthJuly":56,"monthAugust":71,"monthSeptember":65,"monthOctober":78,"monthNovember":75,"monthDecember":98,"monthJanuary":23,"monthFebruary":39,"monthMarch":38,"monthApril":42,"monthMay":57,"monthJune":54,"assumptions":"Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017","rationale":"","fiscalYear":2022,"updateDtm":"2023-02-03 15:19:40"}
and again, it works sans issues.
With 2 rows I get the following format:
{"provName":["Name","Name"],"site":["site","site"],"monthJuly":[56,56],"monthAugust": [71,71],"monthSeptember":[65,65],"monthOctober":[78,78],"monthNovember":[75,75],"monthDecember": [98,98],"monthJanuary":[23,23],"monthFebruary":[39,39],"monthMarch":[38,38],"monthApril": [42,42],"monthMay":[57,57],"monthJune":[54,54],"assumptions":["Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017","Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017"],"rationale":["",""],"17":2,"18":2}
And it fails with status 400.
I suppose I could use lapply and PUT for each row, however with thousands of rows in a dataframe, I think that would be less than ideal.
Anyone have any light to share on this?
Any help would be greatly appreciated!
PS: This didn't really answer my question: "R httr put requets"
And as I mentioned, doing something like this is not ideal: "Convert each data frame row to httr body parameter list without enumeration"
Looks like you are using a list as the request body. Use a data frame instead.
Lists and data frames get serialized to JSON differently:
jsonlite::toJSON(list(x = 1:2, y = 3:4))
#> {"x":[1,2],"y":[3,4]}
jsonlite::toJSON(data.frame(x = 1:2, y = 3:4))
#> [{"x":1,"y":3},{"x":2,"y":4}]
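The same distinction, sketched in Python with the standard json module. This is only an illustration of column-oriented versus row-oriented JSON; the column names `x` and `y` and the variable names are made up, and no httr or jsonlite behaviour is being reproduced here.

```python
import json

# Hypothetical two-row table stored column by column, like an R list of vectors.
columns = {"x": [1, 2], "y": [3, 4]}

# Column-oriented: a single object whose values are arrays.
column_json = json.dumps(columns)

# Row-oriented: one object per row, which is the shape a data frame serializes to.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
row_json = json.dumps(rows)

print(column_json)  # {"x": [1, 2], "y": [3, 4]}
print(row_json)     # [{"x": 1, "y": 3}, {"x": 2, "y": 4}]
```

An endpoint that accepts a single record will usually reject the first shape, and it may or may not accept the second; that depends on the API.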
I am trying to create an indicator that has a dynamic n that changes each day. Basically, I am making a strategy that enters a trade when a stock's price reaches its all-time high.
The best way I can think of to do this is by using a Donchian Channel and entering when the closing price is equal to or greater than all previous DC highs. To do this I need:
n = (Current date of algo - start date).
This way the indicator will start working from day 1 and it won't "forget" about previous highs as the strategy runs through years of data. The problem I am having is that I don't know how to write a code/function that will express the current date of strategy in a way that I can turn it into a simple calculation. The best code I can come up with is:
##Problem in line below##
dcn <- difftime(initdate, as.Date(datePos), units = c("days"))
### This part will work fine once dcn is working
BuySig <- function(price, DC, ...)
{ifelse(price >= DC, 1, 0)}
add.indicator(strategy=strategyname,name="DonchianChannel",
arguments=list(HL=quote(mktdata$Close),n=dcn),label="DC")
dcn is, of course, going to be my Donchian Channel n. The problem I am having is that no matter what I use in place of as.Date(datePos), it keeps telling me "object 'datePos' not found". I have tried other things that I specify earlier in my code, such as Dates and timestamp.
Any advice would be really helpful.
You can't use DonchianChannel with an n that varies. n must be a fixed integer for that function. You need to create your own function that trades 'highest highs' since the start of your data set.
This achieves what you want; just make a function out of it and supply it as a function for add.indicator
library(quantmod)
getSymbols("SPY")
SPY_max <- runMax(Cl(SPY), n = 1, cumulative = TRUE)
SPY$all_time_high <- Cl(SPY) >= SPY_max
chart_Series(SPY["2018/", 1:4])
tail(SPY[SPY$all_time_high == 1,], 10)
# SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted all_time_high
# 2018-01-19 279.80 280.41 279.14 280.41 140920100 273.9762 1
# 2018-01-22 280.17 282.69 280.11 282.69 91322400 276.2038 1
# 2018-01-23 282.74 283.62 282.37 283.29 97084700 276.7901 1
# 2018-01-25 284.16 284.27 282.40 283.30 84587300 276.7998 1
# 2018-01-26 284.25 286.63 283.96 286.58 107743100 280.0046 1
# 2018-08-24 286.44 287.67 286.38 287.51 57487400 283.3048 1
# 2018-08-27 288.86 289.90 288.68 289.78 57072400 285.5416 1
# 2018-08-28 290.30 290.42 289.40 289.92 46943500 285.6796 1
# 2018-08-29 290.16 291.74 289.89 291.48 61485500 287.2167 1
# 2018-09-20 292.64 293.94 291.24 293.58 100360600 289.2860 1
When the column all_time_high returns 1, you're at an all time high for the time series in question.
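The cumulative-maximum idea is independent of quantmod. A minimal sketch in plain Python, with made-up closing prices and hypothetical variable names, looks like this:

```python
from itertools import accumulate

# Made-up closing prices, oldest first.
closes = [10.0, 12.0, 11.0, 12.0, 15.0, 14.0]

# Running maximum: the highest close seen up to and including each day.
running_max = list(accumulate(closes, max))

# A day is flagged when its close matches or exceeds every close before it.
all_time_high = [close >= high for close, high in zip(closes, running_max)]

print(running_max)    # [10.0, 12.0, 12.0, 12.0, 15.0, 15.0]
print(all_time_high)  # [True, True, False, True, True, False]
```

Because the comparison is `>=`, a close that only ties the previous high (the fourth value here) is also flagged, matching the `Cl(SPY) >= SPY_max` test above.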
I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z: time series data for which the event frame is to be generated, in the form of an xts object.
events: a data frame with two columns, unit and when. unit holds the name of the column whose response is to be measured on the event date, while when holds the event date.
width: the number of days on each side of the event date. For a given width, if there is any NA in the event window, the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Mahindra.&.Mahindra
2000-04-03 -8.3381609
2000-04-04 0.5923550
2000-04-05 6.8097616
2000-04-06 -0.9448889
2000-04-07 7.6843828
2000-04-10 4.1220462
2000-04-11 -1.9078480
2000-04-12 -8.3286900
2000-04-13 -3.8876847
2000-04-17 -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
61241
2011-01-03 0.024247
2011-01-04 0.039307
2011-01-05 0.010589
2011-01-06 -0.022172
2011-01-07 0.018057
2011-01-10 0.041488
> head(cDS)
unit when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
I have data.table named data like this:
> head(data)
start end unit
1: 2008-11-17 2007-01-23 ADM 2-05
2: 2008-12-29 2007-01-06 BOB 4-07
3: 2008-12-31 2007-01-01 DAT15-02
4: 2008-12-31 2010-01-01 DAT15-06
5: 2008-12-31 2010-01-02 TUW 4-09
6: 2008-12-31 2010-01-02 BEG 5-01
With data types as follows:
sapply(data, class)
start end unit
"Date" "Date" "character"
I'm trying to debug this line:
data[,
list(date = format(seq(from = start, to = end, by = "1 day"), "%Y-%m-%d")),
by = list(start, end, unit)
]
Then I get error message:
Error in del/by : non-numeric argument to binary operator
I figured out that the error is caused by a conversion to numeric, which takes place when I pass a Date column as an argument to the list in 'by'.
So this modified code works:
data[,
list(date = format(seq(
from = as.Date(start, origin = "1970-01-01"),
to = as.Date(end, origin = "1970-01-01"), by = "1 day"),
"%Y-%m-%d")),
by = list(start, end, unit)
]
This looks like a bug in the data.table package. I wonder if anybody knows about this one.
Thanks in advance.
This is now fixed in commit #1256 of v1.9.3. From NEWS:
o Using by columns with attributes (ex: factor, Date) in j did not retain the attributes, also in case of :=. This was partially a regression from an earlier fix (bug #2531) due to recent changes for R3.1.0. Now fixed and clearer tests added. Thanks to Christophe Dervieux for reporting and to AdamB for reporting here on SO. Closes #5437.
Thanks again for reporting.
I just spent a day debugging some R code only to find that the problem I was having was caused by a missing date in the data returned by Yahoo via getSymbols. At the time of writing, Yahoo is returning this:
QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
2014-01-03 87.27 87.35 86.62 86.64 35723700 86.64
2014-01-06 86.66 86.76 86.00 86.32 32073100 86.32
2014-01-07 86.72 87.25 86.56 87.12 25860600 87.12
2014-01-08 87.14 87.55 86.95 87.31 27197400 87.31
2014-01-09 87.63 87.64 86.72 87.02 23674700 87.02
2014-01-13 87.18 87.48 85.68 86.01 48842300 86.01
2014-01-14 86.30 87.72 86.30 87.65 37178900 87.65
2014-01-15 88.03 88.54 87.94 88.37 39835600 88.37
2014-01-16 88.30 88.51 88.16 88.38 31630100 88.38
2014-01-17 88.11 88.37 87.67 87.88 36895800 87.88
which is missing 2014-01-10. That date is returned for other ETFs. I expect that Yahoo will fix the data one of these days (the data is on Google) but for now it is wrong which caused my code some fits.
To address this issue I want to check my data to ensure that there is data for all dates the markets were open. If there's a canned way to do this in some package I'd appreciate info on that but to that end I started writing some code using the timeDate package. However I have ended up with xts index questions I don't understand. The code follows:
library(timeDate)
library(quantmod)
MyZone = "UTC"
Sys.setenv(TZ = MyZone)
YearStart = "1990"
YearEnd = "2014"
currentYear = getRmetricsOptions("currentYear")
dateStart = paste0(YearStart, "-01-01")
dateEnd = paste0(YearEnd, "-12-31")
DayCal = timeSequence(from = dateStart, to = dateEnd, by="day", zone = MyZone)
TradingCal = DayCal[isBizday(DayCal, holidayNYSE())]
testSym = "QQQ"
getSymbols(testSym, src="yahoo", from = dateStart, to = dateEnd)
testData = get(testSym)
head(testData)
tail(testData, n=10)
#Save date range of data being checked
firstIndex = index(testData)[1]
lastIndex = index(testData)[nrow(testData)]
#Create an xts series covering all dates
AllDates = xts(x=rep(1, length.out=length(TradingCal)),
order.by=TradingCal, tzone = MyZone)
head(AllDates)
tail(AllDates)
index(AllDates)[1:20]
index(testData)[1:20]
tzone(AllDates)
tzone(testData)
#Create an xts object that has all dates covered
#by testSym but using calendar I created
CheckData = subset(AllDates, ((index(AllDates)>=firstIndex) &&
(index(AllDates)<=lastIndex))
)
class(index(AllDates))
class(index(testData))
The goal here was to create a 'known good calendar' which I could use to create a simple xts object. With that object I would then check whether every index in that object had a corresponding index in the data being tested. However I'm not getting that far as it appears my indexes are not compatible. When I run the code I get this at the end:
> CheckData = subset(AllDates, ((index(AllDates)>=firstIndex) && (index(AllDates)<=lastIndex))
+ )
Error in `>=.default`(index(AllDates), firstIndex) :
comparison (5) is possible only for atomic and list types
> class(index(AllDates))
[1] "timeDate"
attr(,"package")
[1] "timeDate"
> class(index(testData))
[1] "Date"
>
Can someone show me the errors of my ways here so that I can move forward? Thanks!
You need to convert TradingCal to Date:
TradingDates <- as.Date(TradingCal)
And here's another way to find index values in TradingDates that aren't in your testData index.
AllDates <- xts(,TradingDates)
testSubset <- paste(start(testData), end(testData), sep="/")
CheckData <- merge(AllDates, testData)[testSubset]
BadDates <- CheckData[is.na(rowSums(CheckData))]
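The underlying check is a set difference between the dates you expect and the dates you received. Here is a rough sketch in plain Python using the ten QQQ dates listed in the question. The `weekdays` helper and the variable names are hypothetical, and the sketch treats every Monday to Friday as a trading day, so it ignores exchange holidays, which a real calendar such as holidayNYSE() would handle.

```python
from datetime import date, timedelta


def weekdays(start, end):
    """Yield every Monday-Friday date from start to end, inclusive."""
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            yield day
        day += timedelta(days=1)


# Dates present in the download, copied from the QQQ listing in the question.
observed = {date(2014, 1, d) for d in (3, 6, 7, 8, 9, 13, 14, 15, 16, 17)}

# Expected dates over the same range, assuming no holidays fall inside it.
expected = set(weekdays(min(observed), max(observed)))

missing = sorted(expected - observed)
print(missing)  # [datetime.date(2014, 1, 10)]
```

Over a longer range, the holiday calendar has to be subtracted from `expected` first, otherwise every market holiday will be reported as a missing date.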