How to import data from MQL4 (MetaTrader) to R in order to automate? - r

To work with financial-market data and time series in real time: most brokers that offer the MetaTrader platform allow downloading historical data for pairs and indexes, but this is done manually to create a CSV file. I need to automate this process to download the historical data of 96 markets every 10 days, and I have found no documentation or information about it.

If the question is how to organize the contact between MT4 and R, there are three general ways:
1. Use files; a pipe channel is an alternative.
2. REST; you need a web server for that.
3. A DLL (standard WinAPI; write a DLL file, use a websocket, or contact your broker). The latter might be the easiest way; try ZeroMQ.
If you need to download some data from MT4, you should write a small script that will collect the data. Something like:
bool getData(string symbol, int timeframe, int startFrom, string fileName)
  {
   // open (or create) the CSV file and append at the end
   int handle = FileOpen(fileName, FILE_READ|FILE_WRITE|FILE_CSV);
   if(handle == INVALID_HANDLE)
      return(false);
   FileSeek(handle, 0, SEEK_END);
   for(int i = startFrom; i >= 0; i--)
     {
      // one line per bar: time;open;high;low;close
      FileWrite(handle, StringFormat("%s;%.5f;%.5f;%.5f;%.5f",
                TimeToString(iTime(symbol, timeframe, i)),
                iOpen(symbol, timeframe, i),
                iHigh(symbol, timeframe, i),
                iLow(symbol, timeframe, i),
                iClose(symbol, timeframe, i)));
     }
   FileClose(handle);
   return(true);
  }
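On the R side you would read the resulting file with read.csv2 or similar. As a quick sanity check of the format the script writes, here is a sketch, in Python with only the standard library, that parses the time;open;high;low;close lines (parse_rates is a hypothetical helper name, not part of MT4 or R):

```python
import csv

def parse_rates(path):
    # Each row written by the MQL4 script above is: time;open;high;low;close
    bars = []
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter=";"):
            if not row:
                continue
            t, o, h, l, c = row
            bars.append({"time": t,
                         "open": float(o), "high": float(h),
                         "low": float(l), "close": float(c)})
    return bars
```

The same split-on-semicolon logic carries over directly to R's read.csv2(file, sep=";", header=FALSE).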

Related

Is it possible to create dynamic jobs with Dagster?

Consider this example - you need to load table1 from source database, do some generic transformations (like convert time zones for timestamped columns) and write resulting data into Snowflake. This is an easy one and can be implemented using 3 dagster ops.
Now, imagine you need to do the same thing but with 100s of tables. How would you do it with dagster? Do you literally need to create 100 jobs/graphs? Or can you create one job, that will be executed 100 times? Can you throttle how many of these jobs will run at the same time?
You have two main options for doing this:
1. Use a single job with Dynamic Outputs.
With this setup, all of your ETLs would happen in a single job. You would have an initial op that would yield a DynamicOutput for each table name that you wanted to do this process for, and feed that into a set of ops (probably organized into a graph) that would be run on each individual DynamicOutput.
Depending on what executor you're using, it's possible to limit the overall step concurrency (for example, the default multiprocess_executor supports this option).
2. Create a configurable job (I think this is more likely what you want):
from dagster import job, op
import pandas as pd

@op(config_schema={"table_name": str})
def extract_table(context) -> pd.DataFrame:
    table_name = context.op_config["table_name"]
    # do some load...
    return pd.DataFrame()

@op
def transform_table(table: pd.DataFrame) -> pd.DataFrame:
    # do some transform...
    return table

@op(config_schema={"table_name": str})
def load_table(context, table: pd.DataFrame):
    table_name = context.op_config["table_name"]
    # load to snowflake...

@job
def configurable_etl():
    load_table(transform_table(extract_table()))

# this is what the configuration would look like to extract from table
# src_foo and load into table dest_foo
configurable_etl.execute_in_process(
    run_config={
        "ops": {
            "extract_table": {"config": {"table_name": "src_foo"}},
            "load_table": {"config": {"table_name": "dest_foo"}},
        }
    }
)
Here, you create a job that can be pointed at a source table and a destination table by giving the relevant ops a config schema. Depending on those config options (which are provided when you create a run through the run config), your job will operate on different source/destination tables.
The example shows explicitly running this job using the Python APIs, but if you're running it from Dagit, you'll also be able to enter the YAML version of this config there. If you want to simplify the config schema (it's pretty nested as shown), you can always create a Config Mapping to make the interface nicer :)
From here, you can limit run concurrency by supplying a unique tag to your job, and using a QueuedRunCoordinator to limit the maximum number of concurrent runs for that tag.
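For the last step, a dagster.yaml fragment along these lines configures a QueuedRunCoordinator with a per-tag concurrency limit. This is a sketch; the tag key "etl" is just an example name, so check the QueuedRunCoordinator docs for the exact fields your Dagster version supports:

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    # at most 5 concurrent runs carrying the tag etl: "true"
    tag_concurrency_limits:
      - key: "etl"
        value: "true"
        limit: 5
```

Runs launched without the tag are unaffected; tagged runs beyond the limit wait in the queue.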

Firebase data structure - is the Firefeed structure relevant?

Firefeed is a very nice example of what can be achieved with Firebase - a fully client side Twitter clone. So there is this page : https://firefeed.io/about.html where the logic behind the adopted data structure is explained. It helps a lot to understand Firebase security rules.
By the end of the demo, there is this snippet of code :
var userid = info.id; // info is from the login() call earlier.
var sparkRef = firebase.child("sparks").push();
var sparkRefId = sparkRef.name();

// Add spark to global list.
sparkRef.set(spark);

// Add spark ID to user's list of posted sparks.
var currentUser = firebase.child("users").child(userid);
currentUser.child("sparks").child(sparkRefId).set(true);

// Add spark ID to the feed of everyone following this user.
currentUser.child("followers").once("value", function(list) {
  list.forEach(function(follower) {
    var childRef = firebase.child("users").child(follower.name());
    childRef.child("feed").child(sparkRefId).set(true);
  });
});
It shows how the writes are done in order to keep the reads simple - as stated:
When we need to display the feed for a particular user, we only need to look in a single place
So I do understand that. But if we take a look at Twitter, we can see that some accounts have several million followers (the most followed is Katy Perry, with over 61 million!). What would happen with this structure and this approach? Whenever Katy posted a new tweet, it would trigger 61 million write operations. Wouldn't this simply kill the app? And even more, isn't it consuming a lot of unnecessary space?
With denormalized data, the only way to connect data is to write to every location it's read from. So yes, publishing a tweet to 61 million followers would require 61 million writes.
You wouldn't do this in the browser. The server would listen for child_added events for new tweets, and then a cluster of workers would split up the load, paginating a subset of followers at a time. You could potentially prioritize online users to get writes first.
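The worker-side fan-out described above can be sketched like this (plain Python; fan_out, write_feed, and the batch size are all hypothetical names for illustration, not Firebase APIs):

```python
def chunks(seq, size):
    # Split a list of follower IDs into fixed-size pages.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def fan_out(spark_id, followers, write_feed, batch_size=500):
    """Write spark_id into each follower's feed, one page at a time.

    write_feed(follower_id, spark_id) performs a single feed write,
    e.g. users/<follower_id>/feed/<spark_id> = true in the Firebase tree.
    """
    writes = 0
    for page in chunks(followers, batch_size):
        for follower in page:
            write_feed(follower, spark_id)
            writes += 1
        # a real worker would checkpoint progress here so a crash
        # can resume from the last completed page
    return writes
```

Paging matters because a 61-million-element write loop must be resumable; a worker that crashes mid-way can pick up at the last completed page instead of starting over.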
With normalized data, you write the tweet once, but pay for the join on reads. If you cache the tweets in feeds to avoid hitting the database for each request, you're back to 61 million writes to redis for every Katy Perry tweet. To push the tweet in real time, you need to write the tweet to a socket for every online follower anyway.

Java Servlet: download multiple CSV files

I have a report which displays some information, and there is a link in the report to export to CSV. To download the CSV file, this is what we do:
public class ReportServlet extends XYXServlet {
    public void service(HttpServletRequest req, HttpServletResponse res) throws Exception {
        ...................
        res.setContentType("text/csv");
        res.setHeader("Content-Disposition", "attachment; filename=\"" + reportName + "\"");
        OutputStream out = res.getOutputStream();
        // Render the report
        ReportRender.renderReport(report, results, out, rtParam);
        out.close();
    }
}
This report is for one patient. Now I have a requirement to download reports for all the patients in the system, and we have more than 5000 patients. It is a one-time download, so basically I should have one CSV file per patient, e.g. the filename will be xyzreport-patientId. We are using Velocity templates: ReportRender takes the report result and merges it with the template, like
VelocityContext c = new VelocityContext(params);
Writer w = new OutputStreamWriter(out);
template.merge(c,w);
w.flush();
So now my problem is: how do I download the reports for all patients at one time? Can I use one request/response to download reports for all patients?
You can use ZIP file creation:
Best Practices to Create and Download a huge ZIP (from several BLOBs) in a WebApp
In the above example they have BLOBs to download; in your case you need to write CSV files onto the zipped stream. Processing everything at once and then sending it will cause memory issues. You need to do it in a loop, writing to the stream as soon as you render each report; this improves output efficiency as well as avoiding memory issues.
The question above also has an answer with an implementation, submitted by the original asker. It is tried and tested. :)
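The loop-and-stream idea is sketched here in Python's zipfile for brevity (export_all_reports and render_csv are hypothetical names; render_csv stands in for the Velocity rendering step). The Java equivalent wraps the servlet's response OutputStream in a ZipOutputStream and writes one entry per patient the same way:

```python
import io
import zipfile

def export_all_reports(patient_ids, render_csv, out_stream):
    # Stream one CSV entry per patient into a single ZIP archive,
    # writing each entry as soon as it is rendered instead of
    # buffering all 5000+ reports in memory first.
    with zipfile.ZipFile(out_stream, "w", zipfile.ZIP_DEFLATED) as zf:
        for pid in patient_ids:
            zf.writestr("xyzreport-%s.csv" % pid, render_csv(pid))
```

Because each entry is compressed and flushed as it is produced, peak memory stays near the size of a single report rather than the whole archive.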

Most efficient method of pulling in weather data for multiple locations

I'm working on a Meteor mobile app that displays information about local places of interest, and one of the things that I want to show is the weather in each location. I've currently got my locations stored with latlng coordinates and they're searchable by radius. I'd like to use the OpenWeatherMap API to pull in some useful 'current conditions' information so that when a user looks at an entry they can see basic weather data. Ideally I'd like to limit the number of outgoing requests to keep the pages snappy (and keep API requests down).
I'm wondering if I can create a server collection of weather data that I update regularly, server-side (hourly?) that my clients then query (perhaps using a mongo $near lookup?) - that way all of my data is being handled within meteor, rather than each client going out to grab the latest data from the API. I don't want to have to iterate through all of the locations in my list and do a separate call out to the api for each as I have approx. 400 locations(!). I'm afraid I'm new to API requests (and meteor itself) so apologies if this is a poorly phrased question.
I'm not entirely sure if this is doable, or if it's even the best approach - any advice (and links to any useful code snippets!) would be greatly appreciated!
EDIT / UPDATE!
OK, I haven't managed to get this working yet, but I have some more useful details on the data!
If I make a request to the OpenWeatherMap API I can get data back for all of their locations (which I would like to add to / update in a collection). I could then do a regular lookup against that collection, instead of making a client request straight out to them every time a user looks at a location. The JSON data looks like this:
{
  "message": "accurate",
  "cod": "200",
  "count": 50,
  "list": [
    {
      "id": 2643076,
      "name": "Marazion",
      "coord": { "lon": -5.47505, "lat": 50.125561 },
      "main": {
        "temp": 292.15,
        "pressure": 1016,
        "humidity": 68,
        "temp_min": 292.15,
        "temp_max": 292.15
      },
      "dt": 1403707800,
      "wind": { "speed": 8.7, "deg": 110, "gust": 13.9 },
      "sys": { "country": "" },
      "clouds": { "all": 75 },
      "weather": [
        { "id": 721, "main": "Haze", "description": "haze", "icon": "50d" }
      ]
    }, ...
Ideally I'd like to build my own local 'weather' collection that I can search using mongo's $near (to keep outbound requests down, and speed up), but I don't know if this will be possible because the format that the data comes back in - I think I'd need to structure my location data like this in order to use a geo search:
"location": {
  "type": "Point",
  "coordinates": [-5.47505, 50.125561]
}
My questions are:
How can I build that collection (I've seen this - could I do something similar and update existing entries in the collection on a regular basis?)
Does it just need to live on the server, or client too?
Do I need to manipulate the data in order to get a geo search to work?
Is this even the right way to approach it??
EDIT/UPDATE2
Is this question too long/much? It feels like it. Maybe I should split it out.
Yes, easily possible. Because your question is so large, I'll give you a high-level explanation of what I think you need to do.
1. Create a collection where you're going to save the weather data.
2. Add a request worker that fetches new data and updates the collection on a set interval. Use something like cron-tick for scheduling the interval.
3. Requesting data should only happen server-side, and I can recommend the request npm package for that.
4. Meteor.publish the weather collection and have the client subscribe to that, optionally with a filter for its location.
You should now be getting the weather data on your client and should be able to get freaky with it.
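On the geo-search question: yes, you would reshape each record so the coordinates form a GeoJSON Point (longitude first), which is what Mongo's 2dsphere index and $near expect. A sketch of that per-record transform (plain Python; to_geo_doc and the choice of which fields to keep are illustrative, not part of the OpenWeatherMap API):

```python
def to_geo_doc(owm_record):
    # Reshape one entry from the OpenWeatherMap "list" array so its
    # coordinates are a GeoJSON Point usable with a 2dsphere index.
    return {
        "_id": owm_record["id"],
        "name": owm_record["name"],
        "location": {
            "type": "Point",
            # GeoJSON order is [longitude, latitude]
            "coordinates": [owm_record["coord"]["lon"],
                            owm_record["coord"]["lat"]],
        },
        "main": owm_record.get("main", {}),
        "weather": owm_record.get("weather", []),
        "dt": owm_record.get("dt"),
    }
```

The hourly worker would map this over the API response and upsert each document by _id, so existing entries are refreshed in place.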

File locks in R

The short version
How would I go about blocking the access to a file until a specific function that both involves read and write processes to that very file has returned?
The use case
I often want to create some sort of central registry and there might be more than one R process involved in reading from and writing to that registry (in kind of a "poor man's parallelization" setting where different processes run independently from each other except with respect to the registry access).
I would not like to depend on any DBMS such as SQLite, PostgreSQL, MongoDB etc. early on in the devel process. And even though I later might use a DBMS, a filesystem-based solution might still be a handy fallback option. Thus I'm curious how I could realize it with base R functionality (at best).
I'm aware that having a lot of reads and writes to the file system in a parallel setting is not very efficient compared to DBMS solutions.
I'm running on MS Windows 8.1 (64 Bit)
What I'd like to get a deeper understanding of
What exactly happens when two or more R processes try to write to or read from a file at the same time? Does the OS figure out the "access order" automatically, and does the process that "came in second" wait, or does it trigger an error because the file access is blocked by the first process? How could I prevent the second process from returning with an error and instead have it "just wait" until it's its turn?
Shared workspace of processes
Besides the rredis Package: are there any other options for shared memory on MS Windows?
Illustration
Path to registry file:
path_registry <- file.path(tempdir(), "registry.rdata")
Example function that registers events:
registerEvent <- function(
  id = gsub("-| |:", "", Sys.time()),
  values,
  path_registry
) {
  if (!file.exists(path_registry)) {
    registry <- new.env()
    save(registry, file = path_registry)
  } else {
    load(path_registry)
  }
  message("Simulated additional runtime between reading and writing (5 seconds)")
  Sys.sleep(5)
  if (!exists(id, envir = registry, inherits = FALSE)) {
    assign(id, values, registry)
    save(registry, file = path_registry)
    message(sprintf("Registering with ID %s", id))
    out <- TRUE
  } else {
    message(sprintf("ID %s already registered", id))
    out <- FALSE
  }
  out
}
Example content that is registered:
x <- new.env()
x$a <- TRUE
x$b <- letters[1:5]
Note that the content usually is "nested", i.e. an RDBMS would not really be "useful" anyway, or at least would involve some normalization steps before writing to the DB. That's why I prefer environments (unique variable IDs and pass-by-reference are possible) over lists and why, if I do make the step to a true DBMS, I would rather turn to NoSQL approaches such as MongoDB.
Registration cycle:
The actual calls might be spread over different processes, so there is a possibility of concurrent access attempts.
I want to have other processes/calls "wait" until a registerEvent read-write cycle is finished before doing their read-write cycle (without triggering errors).
registerEvent(values=list(x_1=x, x_2=x), path_registry=path_registry)
registerEvent(values=list(x_1=x, x_2=x), path_registry=path_registry)
registerEvent(id="abcd", values=list(x_1=x, x_2=x),
path_registry=path_registry)
registerEvent(id="abcd", values=list(x_1=x, x_2=x),
path_registry=path_registry)
Check registry content:
load(path_registry)
ls(registry)
See the filelock R package, available since 2018. It is cross-platform; I am using it on Windows and have not found a single problem.
Make sure to read the documentation:
?filelock::lock
Although the docs suggest leaving the lock file in place, I have had no problems removing it on function exit in a multi-process environment:
on.exit({filelock::unlock(lock); file.remove(path.lock)})
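The acquire-and-wait behaviour filelock provides can also be sketched with nothing but an atomic exclusive file create, which is roughly the mechanism a lock file relies on. This is a Python sketch (with_file_lock is a hypothetical helper, shown here only to illustrate the pattern, not base R):

```python
import os
import time

def with_file_lock(lock_path, fn, timeout=10.0, poll=0.1):
    # Spin until we can create the lock file exclusively; O_EXCL makes
    # the create atomic, so only one process can hold the lock at a time.
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire %s" % lock_path)
            time.sleep(poll)  # another process holds the lock: just wait
    try:
        return fn()  # the protected read-modify-write cycle runs here
    finally:
        os.close(fd)
        os.remove(lock_path)  # release so the next process can proceed
```

A second process calling this while the first is inside fn() simply polls until the lock file disappears, which is exactly the "just wait, don't error" behaviour asked about above.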
