Fastest way to send multiple HTTP requests in R

You can use multithreading in Python to send lots of HTTP requests, as in this SO question. My question is: is there any easy way to do this in R? I've seen a guide for RCurl here, but I'd prefer a simpler solution if possible. Currently I'm looping through a series of ids; it'd be great to send all (or more) of them at once.

That guide to multiple requests in RCurl looks pretty simple; in fact, I'd say it looks simpler than the solution to the Python question you've linked. Better yet, the work is already done for you. Most of that guide goes into detail about the advantages of concurrent requests; the method itself is simple and is provided for you, pre-cooked, right at the top of the page.
You can literally cut and paste the code shown at the top of the post into an R script (include library(RCurl) above it), run that code to source the function, then call the function with a single line.
I won't paste the function code here, since you should get that from its author, but once you've sourced that function, their example usage is:
uris = c("http://www.omegahat.org/index.html", "http://www.omegahat.org/RecentActivities.html")
z <- getURIs(uris)
I just ran the above on my own computer, and it works perfectly. I'd be surprised if you could find a simpler solution than that.
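If you'd rather not source an external function at all, RCurl itself also ships a helper for concurrent requests, getURIAsynchronous(). A minimal sketch using the same example URIs (check ?getURIAsynchronous for the exact options and return format):
library(RCurl)
uris <- c("http://www.omegahat.org/index.html", "http://www.omegahat.org/RecentActivities.html")
# Issues all requests concurrently via libcurl's multi interface and
# returns the response bodies as a character vector.
z <- getURIAsynchronous(uris)
nchar(z)  # size of each response body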

Related

How to see where in my code a function gets called in RStudio?

I'm currently cleaning up my first big R project and am at a point where I have a lot of functions implemented, but I'm not sure which functions I call from other scripts and which are never used. So now I want to find all calls of a given function in my project. Is this possible?
I'm using RStudio, and many other IDEs I've used have a feature like this, so I was wondering whether it is also available in RStudio.
I searched the web and Stack Overflow but got no answer, so I assume this is not possible; I wanted to ask just in case it IS possible and I simply didn't find the right answer.
Thank you!
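For what it's worth, RStudio does offer a project-wide text search (Edit > Find in Files, Ctrl+Shift+F) that can locate calls by name. Failing that, here is a minimal script-based sketch of the same idea; the function name and project path are placeholders, and a plain text match like this will miss indirect calls such as do.call("myFunction", ...):
files <- list.files("path/to/project", pattern = "\\.R$", recursive = TRUE, full.names = TRUE)
for (f in files) {
  # Report every line in every .R file that appears to call myFunction()
  hits <- grep("myFunction(", readLines(f, warn = FALSE), fixed = TRUE)
  if (length(hits) > 0) cat(f, "- lines:", paste(hits, collapse = ", "), "\n")
}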

Bokeh and Joblib don't play together

I have a Bokeh script that loads its data using a function wrapped with joblib's @memory.cache decorator. When I run it as a plain Python script, the get_data function is fast (cached). When I run it with bokeh serve --show code.py, the cache seems to be lost and the function is re-evaluated, making data retrieval slow. How can I make Bokeh work nicely with joblib?
It's hard to say for certain without being able to run an example that reproduces what you are seeing, but my guess is that it has something to do with the way the Bokeh server code runner executes the app script on every session.
So, I can think of a few possible things to try.
First, as of 0.12.4 there are examples and guidance for embedding a Bokeh server as a library, e.g. in a standalone Python script, or in a Flask or Tornado app. The examples there all use FunctionHandler, which does not use exec. My hunch is that this is closer to the standard single-process/single-namespace Python execution model and will play better with your joblib decorator.
(If you try this route and it works, please let us know somehow; it's probably worth documenting better.)
Otherwise, another option that might work better is to use lifecycle hooks to provide your wrapped function in a way that is guaranteed to be shared across sessions. You can see this technique in the spectrogram example (cf. audio.py).
Finally, just some gentle advice for SO: including minimal example code greatly increases the odds of getting code back in an answer. If there were example code here that I could try to get working, I'd be able to post complete working code in my answer.

Use Julia to perform computations on a webpage

I was wondering if it is possible to use Julia to perform computations on a webpage in an automated way.
For example, suppose we have a 3x3 HTML form in which we input some numbers. These form a square matrix A, and finding its eigenvalues in Julia is pretty straightforward. I would like to use Julia to do the computation and then return the results.
In my understanding (which is limited in this direction) I guess the process should be something like:
collect the data entered in the form
send the data to a machine which has Julia installed
run the Julia code with the given data and store the result
send the result back to the webpage and show it.
Do you think something like this is possible? (I've seen some stuff using HttpServer that allows computation from the browser, but I'm not sure it's the right thing to use.) If so, what do I need to look into? Do you have any examples of such web-based calculations?
If you are using or can use Node.js, you can use node-julia. It has some limitations, but should work fine for this.
Coincidentally, I was already mostly done with putting together an example that does this. A rough mockup is available here, which uses express to serve the pages and plotly to display results (among other node modules).
Another option would be to write the server itself in Julia using Mux.jl and skip server-side javascript entirely.
Yes, it can be done with HttpServer.jl
It's pretty simple: you write a small script that starts your HttpServer, which then listens on the designated port. Part of configuring the web server is defining handlers (functions) that are invoked when certain events take place in your app's life cycle (new request, error, etc.).
Here's a very simple official example:
https://github.com/JuliaWeb/HttpServer.jl/blob/master/examples/fibonacci.jl
However, things can get complex fast:
you already need to perform two actions:
a. render the HTML page where you take the user input (the default)
b. render the response page as a consequence of receiving a POST request
you'll need to extract the data payload coming through the form. Data sent via GET is easy to reach; data sent via POST, not so much.
if you expose this to users, you need to set up failsafe measures to respawn your server script; otherwise it might just crash and exit.
if you open your script to the world, you must make sure it's not vulnerable to attacks: you don't want to empower a hacker to execute arbitrary Julia code on your server or access your DB.
So for basic usage in a small app, yes, HttpServer.jl should be enough.
If, however, you expect a bigger project, you can give Genie a try (https://github.com/essenciary/Genie.jl). It's still a work in progress, but it handles most of the low-level work, allowing developers to focus on the specific app logic rather than the transport layer (Genie's author here, btw).
If you get stuck there's GitHub issues and a Gitter channel.
Try Escher.jl.
It lets you build up the web page entirely in Julia.

R: how to know if a library is actually used?

I have a huge codebase in which all the libraries are attached at the beginning of the code. Now I'm cleaning this code up a bit, removing parts of it and rewriting others.
I was wondering if there is a way to know whether a specific library is actually used by the code (in order to clean up the library section too). I could list, for each library, all the functions it attaches, then search the code to verify that none of those functions are called, but that would take long (a sketch of this approach is below). I could also remove the library and try to run the code, but I don't like that solution (not robust enough).
I'm sorry if the question has already been asked, but so far I haven't found any solution :(.
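A minimal sketch of that search-based approach, assuming the project's scripts live under a single directory (the path is a placeholder; matching on "name(" will miss operators and functions passed by name, so treat the result as a first pass):
is_pkg_used <- function(pkg, path = ".") {
  exports <- getNamespaceExports(pkg)
  files <- list.files(path, pattern = "\\.R$", recursive = TRUE, full.names = TRUE)
  code <- unlist(lapply(files, readLines, warn = FALSE))
  # Keep the exported names that appear to be called somewhere in the code.
  used <- vapply(exports, function(f) any(grepl(paste0(f, "("), code, fixed = TRUE)), logical(1))
  exports[used]
}
# Example: does the project appear to call anything from RCurl?
# is_pkg_used("RCurl", path = "path/to/project")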

Recommended method to download tweets based on search terms and store

I would like to download tweets based on certain search terms. I'm aware of HTTP GET and such techniques, but I'm not sure of the best way to create a simple executable that downloads the tweets and saves them for subsequent analysis.
Any ideas? I'm a basic programmer; if you say "use curl", I know roughly what you mean, but not how to set up an application to run curl commands!
Hence my dilemma.
Thanks in advance!
You absolutely can do it in C# or any other language.
From a very rudimentary standpoint, the Twitter API wiki will tell you how, but I know that's not what you're really asking.
I would suggest getting familiar with a good API wrapper such as Tweetsharp, which has methods not only for getting your typical timelines but also for search. The advantage of this (aside from not having to handle your own serialization, etc.) is that it unifies the timeline and search calls, which are actually slightly different APIs.
The downside to this approach is that you won't be able to directly translate it to a Mac, unless you write it using Silverlight.
The upside is that Tweetsharp gives you a number of options for how it returns the data, which in turn gives you a number of options for how to save it.
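Since the rest of this thread is R-centric, here is what the raw HTTP GET route can look like in R. This is only a hedged sketch: the endpoint shown is Twitter's historical v1.1 search API, the credentials are placeholders you'd obtain by registering an app, and the API details may well have changed since this was written.
library(httr)
library(jsonlite)
# Placeholder OAuth credentials; register an app with Twitter to get real ones.
app <- oauth_app("tweet-downloader", key = "CONSUMER_KEY", secret = "CONSUMER_SECRET")
token <- oauth1.0_token(oauth_endpoints("twitter"), app)
# Search for tweets matching a term and save the raw JSON for later analysis.
resp <- GET("https://api.twitter.com/1.1/search/tweets.json",
            query = list(q = "my search term", count = 100),
            config(token = token))
json <- content(resp, as = "text", encoding = "UTF-8")
writeLines(json, "tweets.json")
tweets <- fromJSON(json)  # parsed version, ready for analysis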