I would like to import from Bloomberg into R, for a specified day, the entire option chain for a particular stock, i.e. all expiries and strikes for the exchange-traded options. I am able to import the option chain for an unspecified day (today):
bbgData <- bds(connection,sec,"OPT_CHAIN")
where connection is a valid Bloomberg connection and sec is a Bloomberg security ticker such as "TLS AU Equity".
However, if I add extra fields it doesn't work, i.e.
bbgData <- bds(connection, sec,"OPT_CHAIN", testDate, "OPT_STRIKE_PX", "MATURITY", "PX_BID", "PX_ASK")
bbgData <- bds(connection, sec,"OPT_CHAIN", "OPT_STRIKE_PX", "MATURITY", "PX_BID", "PX_ASK")
Similarly, if I switch to using the historical data function it doesn't work
bbgData <- dateDataHist <- bdh(connection,sec,"OPT_CHAIN","20160201")
I only need the data for a single, specified day, but including the additional fields.
Hint: I think the issue is that every field following "OPT_CHAIN" is dependent on the result of "OPT_CHAIN", so for example it is the strike price given the code in "OPT_CHAIN", but I am unsure how to introduce this conditionality into the R Bloomberg query.
It's better to use the field CHAIN_TICKERS and related overrides when retrieving option data for a given underlying from Bloomberg. You can, for example, request points for a given moneyness by getting CHAIN_TICKERS with an override of CHAIN_STRIKE_PX_OVRD equal to 90%-110%.
In either case you need to use the tickers that are the result of your first request in a second request if you want to retrieve additional data. So:
option_tickers <- bds("TLS AU Equity", "CHAIN_TICKERS",
                      overrides = c(CHAIN_STRIKE_PX_OVRD = "90%-110%"))
option_prices <- bdp(sapply(option_tickers, paste, "equity"), c("PX_BID", "PX_ASK"))
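If you also need the prices as of a specific historical day rather than today, one option (a minimal sketch, assuming the Rblpapi package and an existing default connection; I have not verified whether the chain itself can be resolved as of a past date without further overrides) is to feed the resulting tickers into bdh() with the same start and end date:
# Sketch only: historical bid/ask for the chain tickers on one specified day.
# option_tickers is assumed to be the one-column data frame returned by bds() above;
# 2016-02-01 is just an example date.
tickers <- paste(option_tickers[[1]], "Equity")
hist_prices <- bdh(tickers, c("PX_BID", "PX_ASK"),
                   start.date = as.Date("2016-02-01"),
                   end.date   = as.Date("2016-02-01"))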
I have a large (~18 million records) database of insurance policies, and I need to determine whether each policy has been renewed or not. Imagine that a few records look like this (today is October 5, 2022):
policy_number  prior_policy_number  zip_code  expiration_date
123456                              90210     2023-10-01
123456         987654               90210     2022-10-01
987654                              90210     2021-10-01
456654                              10234     2019-05-01
The first line is a current policy, because 2023-10-01 is in the future.
The second line was renewed (by the first line).
The third line was renewed by the second line--we can tell because the second line's prior policy number matches the third line's policy number.
The fourth line was not renewed.
So a policy is renewed if either:
a) there is another policy with the same policy number and zip code but a later expiration date
b) there is another policy whose prior policy number matches this policy number, they have the same zip code, and the other policy has a later expiration date.
(Zip code is necessary because some insurers use policy numbers like "00000002" and this disambiguates duplicates.)
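For reference, here is the sample above as a small R data frame, using the valid_zip_policies name the code below refers to (treating the blank prior policy numbers as NA):
# The four sample rows from the table above
valid_zip_policies <- data.frame(
  policy_number       = c("123456", "123456", "987654", "456654"),
  prior_policy_number = c(NA, "987654", NA, NA),
  zip_code            = c("90210", "90210", "90210", "10234"),
  expiration_date     = as.Date(c("2023-10-01", "2022-10-01", "2021-10-01", "2019-05-01"))
)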
I wrote the following code, which works but takes forever to execute. Basically, I sort the data frame by descending expiration date, and then for each observation I create a miniature data frame consisting of just the policies with the same policy number or prior policy number and the same zip code, and check the expiration date of the first (and therefore latest) one to see whether it is later than that of the policy in question. I realize this is probably a horrible way to do this.
Does anyone have suggestions for how to make it more efficient?
non_renewals <- valid_zip_policies %>% arrange(desc(expiration_date))
check_renewed <- function(policy, zip, exp) {
  # We create a subset of the main data frame containing only that policy number (or
  # policies with this policy as the prior policy number) and filter it for the matching zip code
  cat(policy, zip, exp)
  test_renewed <- valid_zip_policies %>%
    select(c("policy_number", "prior_policy_number", "zip_code", "expiration_date")) %>%
    filter(policy_number == policy | prior_policy_number == policy) %>%
    filter(zip_code == zip)
  # These are all the policies for the given policy number, sorted from latest to earliest
  # expiration date. Is the expiration date of the most recent one later than this one's? If so, it was renewed
  if (test_renewed$expiration_date[1] > exp) { return(TRUE) } else { return(FALSE) }
}

for (i in 1:nrow(non_renewals)) {
  non_renewals$renewed[i] <- check_renewed(non_renewals$policy_number[i],
                                           non_renewals$zip_code[i],
                                           non_renewals$expiration_date[i])
}
So I was able to answer my own question! The following code is literally about 100 times faster! Two things helped:
by far the greatest speed boost was from using data tables from package data.table rather than data frames. That package also has the fifelse command you see below.
using package parallel and its mclapply command gave an additional speed boost on my system.
It may also have helped that instead of passing three items from the original table to the function, I just pass the number and let the function retrieve the items as necessary.
library(data.table)
library(parallel)

non_renewals <- setDT(non_renewals)

check_renewed <- function(obs) {
  # If the expiration date of the latest example is later, then it was renewed
  if (non_renewals[policy_number == policy_number[obs] & zip_code == zip_code[obs], expiration_date][1] > non_renewals$expiration_date[obs]) { return("RENEWED") }
  # If not, check the prior policies
  final <- fifelse(non_renewals[prior_policy_number == policy_number[obs] & zip_code == zip_code[obs], expiration_date][1] > non_renewals$expiration_date[obs], "RENEWED", "NONRENEWED", na = "NONRENEWED")
  return(final)
}
renewed <- character(10000)
system.time(renewed <- mclapply(1:10000,function (i) {check_renewed(i)}))
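A small follow-up (not part of the original timing): mclapply() returns a list, so to get the flags back into the table as a character column you can unlist() the result, for example with data.table's := for the rows that were scored:
# Illustrative only: write the first 10,000 flags back into the table
non_renewals[, renewed := NA_character_]
non_renewals[1:10000, renewed := unlist(mclapply(1:10000, check_renewed))]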
I have a database called "db" with a table called "company" which has a column named "name".
I am trying to look up a company name in db using the following query:
dbGetQuery(db, 'SELECT name,registered_address FROM company WHERE LOWER(name) LIKE LOWER("%APPLE%")')
This gives me the following correct result:
name
1 Apple
My problem is that I have a bunch of companies to look up, and their names are in the following data frame:
df <- as.data.frame(c("apple", "microsoft","facebook"))
I have tried the following method to get the company name from my df and insert it into the query:
sqlcomp <- paste0("'SELECT name, ","registered_address FROM company WHERE LOWER(name) LIKE LOWER(",'"', df[1,1],'"', ")'")
dbGetQuery(db,sqlcomp)
However this gives me the following error:
tinyformat: Too many conversion specifiers in format string
I've tried several other methods but I cannot get it to work.
Any help would be appreciated.
This code should work:
df <- as.data.frame(c("apple", "microsoft","facebook"))
comparer <- paste(paste0(" LOWER(name) LIKE LOWER('%",df[,1],"%')"),collapse=" OR ")
sqlcomp <- sprintf("SELECT name, registered_address FROM company WHERE %s",comparer)
dbGetQuery(db,sqlcomp)
Hope this helps you move on.
Using paste to paste data into a query is generally a bad idea, due to SQL injection (whether true injection or just accidental spoiling of the query). It's also better to keep the query free of "raw data" because DBMSes tend to optimize a query once and reuse that optimized form every time they see the same query; if you encode data in it, it's a new query each time, so the optimization is defeated.
It's generally better to use parameterized queries; see https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
For you, I suggest the following:
df <- data.frame(names = c("apple", "microsoft","facebook"))
qmarks <- paste(rep("?", nrow(df)), collapse = ",")
qmarks
# [1] "?,?,?"
dbGetQuery(con, sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks),
           params = tolower(df$names))
This takes advantage of three things:
the SQL IN operator, which takes a list (vector in R) of values and conditions on "set membership";
optimized queries; if you subsequently run this query again (with three arguments), then it will reuse the query. (Granted, if you run with other than three companies, then it will have to reoptimize, so this is limited gain);
no need to deal with quoting/escaping your data values; for instance, if it is feasible that your company names might include single or double quotes (perhaps typos on user-entry), then adding the value to the query itself is either going to cause the query to fail, or you will have to jump through some hoops to ensure that all quotes are escaped properly for the DBMS to see it as the correct strings.
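If you do need the partial LIKE matching from the original query rather than exact matches, the same parameterized approach still works; here is a sketch, assuming a DBI backend that uses ? placeholders, with the wildcard patterns built in R rather than pasted into the SQL:
# Hypothetical variant: one "lower(name) like ?" clause per company, bound as parameters
patterns <- paste0("%", tolower(df$names), "%")
likes <- paste(rep("lower(name) like ?", length(patterns)), collapse = " or ")
dbGetQuery(con,
           sprintf("select name, registered_address from company where %s", likes),
           params = as.list(patterns))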
I use Telegraf to collect data and then write it to InfluxDB, and I would like to aggregate the data into one row. As far as I know there are a lot of plugins for Telegraf, including plugins to aggregate data. I use the BasicStats aggregator plugin for this purpose, but it has behavior I did not expect: it aggregates rows by their tags, whereas I need aggregation only by timestamp. How can I make this plugin aggregate rows only by their timestamps? For example, I have the following rows:
timestamp=1 tag1=foo field1=3
timestamp=1 tag1=bar field1=1
timestamp=1 tag1=baz field1=4
and for them I would like to get the following aggregated row:
timestamp=1 sum=8
Thank you in advance
I believe you can't aggregate by timestamp with Telegraf's conventional processors or aggregators (like BasicStats) because of InfluxDB's nature: it is a time-series database and it is indexed by time. But you can aggregate the data with an InfluxDB query:
SELECT mean("field1") FROM "statsdemo"."telegraf"."my_metric" WHERE time > now()-5m AND time < now() GROUP BY time(2000ms) FILL(null)
Another approach could be using the execd aggregator. Just write a script in bash or your favorite programming language that reads from STDIN, aggregates the data by timestamp, and prints the result to STDOUT following the influx line protocol. I have never used this aggregator before, but I think it may work.
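For illustration only, a rough sketch of such a script in R (the measurement and field names are assumptions based on the example rows above, and the real execd flush/signal protocol is not handled):
#!/usr/bin/env Rscript
# Hypothetical sketch: read influx line protocol ("measurement,tags fields timestamp")
# from STDIN, sum field1 per timestamp, and emit one aggregated line per timestamp.
con <- file("stdin", open = "r")
lines <- readLines(con)
close(con)
parts <- strsplit(lines, " ", fixed = TRUE)
vals  <- as.numeric(sub(".*field1=([0-9.]+).*", "\\1", sapply(parts, `[`, 2)))
ts    <- sapply(parts, `[`, 3)
sums  <- tapply(vals, ts, sum)
cat(sprintf("my_metric_agg sum=%s %s", sums, names(sums)), sep = "\n")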
After over a year of struggling to no avail, I'm turning to the SO community for help. I've used various RegEx creator sites, standalone RegEx creator software, as well as manual editing, all in a futile attempt to create a pattern to parse and extract dynamic data from the e-mail samples below (sanitized to protect the innocent):
Action to Take: Buy shares of Facebook (Nasdaq: FB) at market. Use a 20% trailing stop to protect yourself. ...
Action to Take: Buy Google (Nasdaq: GOOG) at $42.34 or lower. If the stock is above $42.34, don't chase it. Wait for it to come down. Place a stop at $35.75. ...
***Action to Take***
Buy International Business Machines (NYSE: IBM) at market. And use a protective stop at $51. ...
What needs to be parsed is both forms of the "Action to Take" section, and the extracted data must include the direction (i.e. buy or sell, though I'm only concerned with buys here), the ticker, the limit price (if applicable), and the stop value as either a percentage or a number (if applicable). Sometimes there are also multiple "Action to Take" sections in a single e-mail.
Here are examples of what the pattern should not match (or ideally be flexible enough to deal with):
Action to Take: Sell half of your Apple (NYSE: AAPL) April $46 calls for $15.25 or higher. If the spread between the bid and the ask is $0.20 or more, place your order between the bid and the ask - even if the bid is higher than $15.25.
Action to Take: Raise your stop on Apple (NYSE: AAPL) to $75.15.
Action to Take: Sell one-quarter of your Facebook (Nasdaq: FB) position at market. ...
Here's my R code with the latest Perl-style pattern (needed to be able to use lookaround in R). It sort of works, but not consistently or across multiple saved e-mails:
library(httr)
library("stringr")
filenames <- list.files("R:/TBIRD", pattern="*.eml", full.names=TRUE)
parse <- function(input) {
  text <- readLines(input, warn = FALSE)
  text <- paste(text, collapse = "")
  trim <- regmatches(text, regexpr("Content-Type: text/plain.*Content-Type: text/html", text, perl = TRUE))
  pattern <- "(?is-)(?<=Action to Take).*(?i-s)(Buy|Sell).*(?:\\((?:NYSE|Nasdaq)\\:\\s(\\w+)\\)).*(?:for|at)\\s(\\$\\d*\\.\\d* or|market)\\s"
  df <- str_match(text, pattern)
  return(df)
}
list <- lapply(filenames, function(x){ parse(x) })
table <- do.call(rbind,list)
table <- data.frame(table)
table <- table[rowSums(is.na(table)) < 1, ]
table <- subset(table, select=c("X2","X3","X4"))
The parsing has to operate on the text copy because the HTML appears far too complicated to work with, due to the lack of standardization from e-mail to e-mail. Unfortunately, the text copy also commonly has different line endings than the regexp expects, which greatly aggravates things.
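For what it's worth, a much narrower pattern is easier to keep consistent. A minimal sketch (not a tested solution) that pulls only the direction, ticker and entry price for the buy actions out of the collapsed text, leaving the stop value for a second pass since it can appear either before or after the word "stop":
library(stringr)
# Hypothetical, narrower pattern: header, "Buy", company name, "(EXCHANGE: TICKER)",
# then "at market" or "at $price"
buy_pattern <- paste0(
  "Action to Take[*:\\s]*",              # plain or ***-wrapped header
  "(Buy)[^(]*",                          # direction, then the company name
  "\\((?:NYSE|Nasdaq):\\s*(\\w+)\\)",    # exchange and ticker
  "\\s*at\\s*(market|\\$[\\d.]+)"        # entry: at market or at a limit price
)
str_match_all(text, buy_pattern)         # per e-mail: full match, direction, ticker, price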
I have data in the following form:
<Status>Active Leave Terminated</Status>
<date>05/06/2014 09/10/2014 01/10/2015</date>
I want to get the data in the following form:
<status>Active</status>
<date>05/06/2014</date>
<status>Leave</status>
<date>09/10/2014</date>
<status>Terminated</status>
<date>01/10/2015</date>
Please help me with a query to retrieve the data as specified above.
Well, you have a string and want to split it at the whitespace. That's what tokenize() is for, and \s matches a whitespace character. To get the corresponding date you can get the current position in the for loop using at. Together it looks something like this (note that I assume the input data is the current context item):
let $dates := tokenize(date, "\s+")
for $status at $pos in tokenize(Status, "\s+")
return (
  <status>{$status}</status>,
  <date>{$dates[$pos]}</date>
)
You did not indicate whether your data is on the file system or already loaded into MarkLogic. It's also not clear if this is something you need to do once on a small set of data or on an on-going basis with a lot of data.
If it's on the file system, you can transform it as it is being loaded. For instance, MarkLogic Content Pump can apply a transformation during load.
If you have already loaded the content and you want to transform it in place, you can use Corb2.
If you have a small amount of data, then you can just loop across it using Query Console.
Regardless of how you apply the transformation code, dirkk's answer shows how you need to change it. If you are updating content already in your database, you'll xdmp:node-delete() the original Status and date elements and xdmp:node-insert-child() the new ones.