Using MySQL as cache for quantmod getSymbols - r

I've been using quantmod's getSymbols function a lot and would like to reduce the load on the external data providers and cut the time longer code loops take because of network latency.
Ideally there would be a function that takes a list of symbols (like getSymbols), downloads them from the provider configured in 'setSymbolLookup' and saves them in a MySQL database for easy retrieval later using getSymbols.MySQL.
A major bonus would be if another function (or the same one) allowed downloading only the difference since the last update.
Alternatively, a kind of proxy where a symbol is downloaded only if it doesn't already exist in a local MySQL database/cache would also work.
Has anyone developed something like this, or come across any documentation on how to do it? I've searched around, but the closest I can find are some questions about how to use MySQL as an input source.
Thanks in advance!
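For reference, here is a minimal sketch of the cache-or-download proxy idea, using DBI/RMySQL next to quantmod. The database name, credentials and the Yahoo-style Open/High/Low/Close/Volume/Adjusted column layout are placeholders, not quantmod's own getSymbols.MySQL schema; the sketch simply round-trips each symbol through one table per symbol.

library(quantmod)
library(DBI)
library(RMySQL)

# Placeholder connection details
con <- dbConnect(MySQL(), dbname = "quotes", host = "localhost",
                 user = "quant", password = "secret")

# Return the symbol from the local cache if present, otherwise download and store it
getSymbolsCached <- function(sym, con, src = "yahoo") {
  if (dbExistsTable(con, sym)) {
    # Cache hit: rebuild the xts object from the stored table
    df <- dbReadTable(con, sym)
    x  <- xts(df[, -1], order.by = as.Date(df$date))
    colnames(x) <- paste(sym, c("Open", "High", "Low", "Close", "Volume", "Adjusted"), sep = ".")
    return(x)
  }
  # Cache miss: download from the provider and keep a local copy
  x  <- getSymbols(sym, src = src, auto.assign = FALSE)
  df <- data.frame(date = index(x), coredata(x))
  names(df) <- c("date", "open", "high", "low", "close", "volume", "adjusted")  # assumes a Yahoo-style 6-column layout
  dbWriteTable(con, sym, df, row.names = FALSE)
  x
}

spy <- getSymbolsCached("SPY", con)

Incremental updates could follow the same pattern: select max(date) from the symbol's table, call getSymbols with from set to the day after, and append the new rows with dbWriteTable(..., append = TRUE).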

Related

Reading a Locked SQLite DB into Memory

I'm working on a project that has some pretty specific requirements, and I'm running into a problem with one of them. We have a locked SQLite database. We can't unlock this database, but we need to read it (not write to it), and we cannot create any new files on the filesystem that contain the data from this database. What was suggested is to read the file into RAM and then access it from there. I've been trying to find a way to do this, but this project is on Windows, so it's not going as smoothly as it might otherwise.
What I've been trying to do is read the file into a bash variable and then pass that variable to sqlite as the database. This hasn't been working particularly well.
I installed win-bash, but when I run sqlite3.exe <(cat <<<"$database") I get "syntax error near unexpected token `<('". I checked, and win-bash looks like it's based on an older version of bash. I tried zsh, but it says "doesn't look like your system supports FIFOs.". I installed cygwin, which wouldn't really be a good solution anyway (once I figure out how to do this, I need to hand it off to our Qt developers so that they can roll it into a Qt application), but I was just trying to do a proof of concept - that didn't work either. sqlite3 opened just fine, but when I ran ".tables" it said "Error: unable to open database "/dev/fd/63": unable to open database file". So it looks like I'm barking up the wrong tree and need to think of some other way to do this.
I guess my questions are: first, is it possible to read a SQLite database into a variable as I was attempting, or am I going down an entirely wrong path there? Second, if it can't be done that way, is there some other way I'm overlooking that might make this possible?
Thanks!
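No answer is shown here, but since the rest of this page is R-centric, here is one hedged idea sketched with RSQLite: open the file read-only and copy it into an in-memory database via SQLite's online backup API (the same API is reachable from C/Qt as sqlite3_backup_init). The path is a placeholder, and whether the read-only open succeeds at all depends on what kind of lock is actually held on the file.

library(DBI)
library(RSQLite)

locked_path <- "C:/path/to/locked.db"   # placeholder

# Open the existing file read-only and an empty in-memory database
src  <- dbConnect(SQLite(), dbname = locked_path, flags = SQLITE_RO)
dest <- dbConnect(SQLite(), dbname = ":memory:")

# Copy the whole database into RAM using the backup API; no new file is created
sqliteCopyDatabase(src, dest)
dbDisconnect(src)

dbListTables(dest)   # query the in-memory copy from here on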

Running an R script to Update a Database

I have written a script in R that calculates a specific value for each of the S&P500 stocks. I would like to run this script every five minutes during trading hours and have the script upload the values to an online database.
I don't know anything about IT. I was thinking of running the script on AWS and having it upload to a SQL database, or an AWS version of a SQL server, every five minutes.
Do you guys have any ideas about how I should approach this problem, or any other methodologies I could use?
Thank you.
If you want to go the AWS route with a database then there are any number of ways to achieve this, but here is an outline of a fairly straightforward approach.
Launch a database. See e.g. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html.
Launch an EC2 instance. See e.g. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html.
Set up a cron job to launch your R script every 5 minutes during business hours from the EC2 instance. See e.g. https://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/.
Use an R package such as dplyr/dbplyr/DBI/odbc, as suggested by @rpolicastro, to connect to and write data to the database (a minimal sketch follows below).
I'm glossing over a lot of complication in setting up the system in AWS, but hopefully this can get you started. Also, if you really care about never missing any data timepoints, you may need to either set up some kind of redundancy or code in the ability to look backwards in time and fill in missing timepoints.
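Here is a minimal sketch of the last step, assuming a MySQL-flavoured RDS instance, a table called sp500_values, and an existing function compute_sp500_values() that produces the data frame; every name and connection detail is a placeholder, and DBI + RMySQL is just one of the package choices mentioned above (odbc works just as well).

# upload_values.R -- run by cron every 5 minutes during trading hours, e.g.
# (hours expressed in the server's timezone):
# */5 13-20 * * 1-5 Rscript /home/ec2-user/upload_values.R
library(DBI)
library(RMySQL)

values <- compute_sp500_values()   # placeholder for your existing calculation; returns a data frame

con <- dbConnect(MySQL(),
                 host     = "mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",  # RDS endpoint (placeholder)
                 dbname   = "markets",
                 user     = "writer",
                 password = Sys.getenv("DB_PASSWORD"))

values$timestamp <- Sys.time()
dbWriteTable(con, "sp500_values", values, append = TRUE, row.names = FALSE)
dbDisconnect(con)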

Is it possible to manage R sessions?

Is it possible to manage R sessions, as in:
Connect your R console to an existing R session process?
Can two R sessions transfer data to one another?
One might desire this in the following likely scenario:
You're happily working on your R project and have generated data that took 3 hours to compute.
You decide to save your workspace in case of a technical issue.
Upon saving, however, RStudio decides to hang for eternity, leaving the R session itself unaffected.
In this scenario, you would want to
Connect to the R session with a terminal to retrieve your data anyway.
Set up a new R session that continuously synchronizes with the existing R session as a backup.
Is it possible?
Connect your R console to an existing R session process?
Not possible.
Can two R sessions transfer data to one another?
Yes, there are multiple ways to do this. The general keyword for this is “inter-process communication”. You can use files, named pipes or sockets, for example. To serialise the data you can use either built-in functions (saveRDS, readRDS) or packages (e.g. feather).
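The simplest file-based variant of that looks like this, for example (object and path names are arbitrary):

# Session A: serialise the expensive result to disk
saveRDS(my_big_result, "/tmp/my_big_result.rds")

# Session B: read it back; wrap this in a loop that polls file.mtime() if the
# second session should keep picking up newer versions
my_big_result <- readRDS("/tmp/my_big_result.rds")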
But for your given use-case, there’s a much simpler solution:
Never rely on RStudio to save your R session. Instead, do so explicitly by calling saveRDS (or, to save the whole workspace, which I don’t generally recommend, save.image). In fact, the general recommendation is to disable the RStudio options for saving and restoring the session!
Make sure that, in RStudio's Tools → Global Options → General pane, “Restore .RData into workspace at startup” is unticked and “Save workspace to .RData on exit” is set to “Never”.

How can I share (GitHub) my code (R) with sensitive information (passwords)?

Imagine you are using a package that uses an access token. Maybe one from rOpenSci.
My current approach is to source a file at the beginning that is listed in .gitignore. It therefore never gets committed, and I can share the rest without worries.
source("never-commit-password.R")
However, there is still a danger that the password might be uploaded via .RData if I leave it in the workspace.
What is the leading practice that trades off convenience against safety?
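For reference, the gitignore-based approach described above looks roughly like this (file and variable names are placeholders); removing the secret from the workspace after use at least reduces the .RData risk mentioned above:

# never-commit-password.R -- listed in .gitignore, never committed
my_token <- "abc123"          # placeholder secret

# analysis.R -- the file that is committed and shared
source("never-commit-password.R")
# ... authenticate with my_token here ...
rm(my_token)                  # drop it from the workspace so it cannot end up in .RData

A common alternative is to keep the secret in ~/.Renviron and read it with Sys.getenv("MY_TOKEN") (name again a placeholder), so it never has to live in the workspace at all.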

Need help in choosing the right tool

I have a client who has set up a testing environment in some AI language. It basically runs some predefined test cases and stores the results as log files (comma-separated txt files). My job is to identify and suggest a reporting system, and I have these options in mind: either
1. import the logs into MSSQL and use its built-in reporting (SSRS), or
2. import the logs into MySQL and use PHP to develop custom reporting.
I am thinking that option 2 is better. The reason is that the logs are inconsistent and contain unexpected wild characters that databases normally don't accept, so with option 2 I can write some scripts in PHP to clean them before loading them into the database.
If this were your problem, what would you suggest doing?
It depends how fancy you need to be. If the data is in CSV files, you could even go so simple as to load it into Excel (or their favorite spreadsheet tool), and use spreadsheet macros to analyze it.
