I have an R script that previously operated on standalone CSV files but now needs to accept similar input from another, existing application. What are the typical options for calling R from an external application written in Python and passing data to it?
As a toy example, imagine a web application written in Python that needs to send R a dataset; the R script calculates summary statistics and sends them back to the application. The input dataset is small: think of it as one row from a database with approximately 20 fields, a mix of text and numbers. The number of fields is fixed for this call. In the earlier flow, these fields were the values of a single line in a CSV file.
Example:
New York, 23456,,25.5, 23/04/2015,, 0, 0, Yes, Yes, Absent
The return from R is something like:
0.87, Demographics, NA, History, NA
P.S. I don't mean something like Shiny, which provides both the front end and the back end. Here the external application already exists and just needs a way to call R with its data and get a result back.
I would suggest the rpy2 package, which lets you use R commands and functions from within a Python script rather than sending and receiving data back and forth to a separate R process.
rpy2 main website
Here is a nice tutorial on rpy2.
Have a deeper look at: Rserve
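For example, a minimal sketch of the R side, assuming you run Rserve and let the Python application connect as a client (the function name, fields, and return format below are made up for illustration):

# install.packages("Rserve")   # once
library(Rserve)

# Function the external application will ask Rserve to evaluate;
# it takes the ~20 mixed text/numeric fields and returns summary stats.
summarise_record <- function(record) {
  # 'record' is assumed to be a one-row data frame of the input fields
  data.frame(score = 0.87, section = c("Demographics", "History"))
}

# Run the server in the current session (default port 6311) so that
# summarise_record() is visible to connecting clients.
run.Rserve()

The Python side would then connect to that port (e.g. with the pyRserve client), push the row of fields, and evaluate summarise_record() remotely.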
I'm currently working on an R Shiny app that uses googlesheets4 to pull in a large dataset from Google Sheets on app launch. Loading this dataset takes ~2 minutes, which stalls the entire app's load time.
The only visual in my app is based on this Google Sheets data, so it is very dependent on this specific dataset. Once the dataset gets pulled into my app, it is filtered and becomes much smaller (85,000 rows -> 1,000 rows). The Google Sheets data is updated every day, so I don't have the luxury of downloading it once and storing it as a .csv forever.
There are two different fixes I have tried, both unsuccessfully... curious if anyone has any thoughts.
Have a separate app running. My first idea was to create an entirely separate Shiny app whose sole purpose would be to pull the Google Sheets data frame once a day. Once it pulled the data, it would do the necessary cleaning to get it down to ~1,000 rows and then push the smaller data frame to a different Google Sheet. My original app with the visual would then always reference that new Google Sheet (which would take much less time to load).
The problem I ran into here is that I couldn't figure out how to write a new GoogleSheets doc using googlesheets4. If anyone has any idea how to do that it would be much appreciated.
Temporarily delay loading the Google Sheets data, and let the visual populate first. My second idea was to delay the code that pulls the Google Sheets data frame at launch, letting the visual populate first (using old data), and then have the Google Sheets pull happen. Once the pull is complete, the visual would re-populate with the updated data.
I couldn't figure out the best/right way to make this happen. I tried messing around with Sys.sleep() and futures/promises but couldn't get things to work correctly.
Curious if anyone has any thoughts on my 2 different approaches, or if there's a better approach I'm just not considering...
Thanks!
There is a function called write_sheet that allows you to write data to a google sheet. Does that work for you?
googlesheets4::write_sheet(data = your_data,
                           ss = spread_sheet_identifier,
                           sheet = "name_of_sheet_to_write_in")
If you only want to add something without deleting everything already in the sheet, the function is sheet_append:
googlesheets4::sheet_append(data = your_data,
                            ss = spread_sheet_identifier,
                            sheet = "name_of_sheet_to_write_in")
I'm not sure you can store the credentials in a safe way, but couldn't you use GitHub Actions? Or alternatively a cron job on your local computer?
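For the scheduled job itself, a rough sketch of the daily script (the sheet IDs, the service-account path, and the filter condition are placeholders):

library(googlesheets4)

# Non-interactive auth, e.g. with a service account (path is a placeholder).
gs4_auth(path = "service-account.json")

# Pull the full ~85,000-row sheet; slow, but this runs outside the Shiny app.
raw <- read_sheet("SOURCE_SHEET_ID")

# Placeholder for the cleaning/filtering that gets it down to ~1,000 rows.
small <- raw[raw$keep_row, ]

# Overwrite the small sheet that the Shiny app reads at launch.
write_sheet(small, ss = "TARGET_SHEET_ID", sheet = "daily_snapshot")

The Shiny app then only ever reads the small target sheet, which should load in seconds.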
I want to issue something, e.g. a new option. Inside the flow where I'm issuing this new option, I need to get information from two separate oracles, each of which provides data for the output state.
How should I do this: should I have one output and three commands (a command with data from oracle 1, a command with data from oracle 2, and then the issue command), or can this be done with one command?
It's entirely up to you: your command can contain whatever data you want, so in theory you could do the whole thing with one command.
Having said that, I would probably split it out into at least two commands for clarity and privacy. The privacy element is that you can build a filtered transaction for the oracle to sign that contains only the oracle command.
If you don't mind the two oracles seeing the data sent to each for signing, you could encapsulate the data in one command e.g.
class OracleCommand(val spotPrice: SpotPrice, val volatility: Volatility) : CommandData
Where one oracle attests to spotPrice and the other to volatility.
However, you would find it hard to determine what part of the data they attested to since they will both sign over the entire filtered transaction.
Unless you know that the oracle's design can specifically pick out the correct data, you're probably better off going with three separate commands.
I'm writing my own R package to carry out some specific analyses for which I make a bunch of API calls to get some data from some websites. I have multiple keys for each API and I want to cycle them for two reasons:
Ensure I don't go over my daily limit
Depending on who is using the package, different keys may be used
All my keys are stored in a .csv file, api_details.csv. This file is read by a function that gets the latest usage statistics and returns the key with the most calls available. I can add the .csv file to the package's data folder so that it is available when the package is loaded, but this presents two problems:
The .csv file is not read properly: all column names are pasted together into a single variable name, and all values are pasted together into a single observation per row.
As I continue working, I would like to add more keys (and perhaps more details about the keys) to the api_details.csv but I'm not sure about how to do that.
I could save the details as an .RData file, but I'm not sure how it would be updated or read outside of R (by other people). Using a .csv means that anyone using the package can easily add or remove keys.
What's the best method to address problems 1 and 2 above?
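For what it's worth, one common pattern for problem 1 is to keep the raw file in inst/extdata/ rather than data/ (data/ expects R data files, and data() reads .csv files with sep = ";", which is why comma-separated columns get pasted together) and read it with system.file(). A minimal sketch, with a made-up package name, column names, and selection logic:

get_api_key <- function() {
  # Locate the installed copy of inst/extdata/api_details.csv.
  path <- system.file("extdata", "api_details.csv",
                      package = "yourpackage", mustWork = TRUE)
  keys <- utils::read.csv(path, stringsAsFactors = FALSE)

  # Placeholder logic: return the key with the most calls remaining.
  keys$key[which.max(keys$calls_remaining)]
}

The file stays a plain .csv, so keys can still be added or removed by editing it directly.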
I have an R script that creates multiple scripts and submits them simultaneously to a computer cluster. After all of these scripts have completed and their output has been written to the respective folders, I would like to automatically launch another R script that works on those outputs.
I haven't been able to figure out whether there is a way to do this in R: the function 'wait' is not what I want, since the scripts are submitted as separate jobs and each completes and writes its output file at a different time, whereas I want to run the subsequent script only after all of the outputs have appeared.
One way I thought of is to count the files that have been created and, once the correct number of output files is there, submit the next script. However, to do this I would have to keep a script running that checks for the presence of the files every now and then, and I am not sure whether that is a good idea, since it probably takes a day or more before the first scripts complete.
Can you please help me find a solution?
Thank you very much for your help
-fra
I think you are looking at this the wrong way:
This is not an R problem at all; R just happens to be the client of your batch job.
It is an issue that the queue/batch processor on your cluster can address.
Worst case, you could just wait/sleep in a shell (or R script) until a 'final condition reached' file has been touched (see the sketch below).
Inter-dependencies can be expressed with make, too.
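If you do go the wait/poll route from R, a minimal sketch (the output directory, file pattern, and expected count are placeholders):

output_dir     <- "results/"
expected_files <- 100                      # number of jobs submitted

repeat {
  done <- length(list.files(output_dir, pattern = "\\.out$"))
  if (done >= expected_files) break
  Sys.sleep(300)                           # check again in 5 minutes
}

source("analyse_outputs.R")                # all outputs present, run the follow-up

But as noted above, letting the cluster's scheduler (or a Makefile) handle the dependency is usually the cleaner solution.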
I would like to randomize survey respondents on Mechanical Turk (or Survey Monkey, or a comparable web-based instrument) to particular conditions using my own R code. For example, the respondent might answer five background questions, then be exposed to a random question. I want to use the background data, run my R code on it, and return the question to the respondent immediately. (To be clear, I have a particular way I want to do the randomization in R that differs from complete randomization or random allocation of, e.g., 60% to one condition, 40% to the other.)
Any suggestions for how to go about integrating R code into a web-based survey like this?
Have you considered having MTurk query a web server that you control, with R running on it, to get its randomization? You could then feed MTurk a spreadsheet of ID codes, put those ID codes in the URL to the web server, and have the web server keep track of which IDs it randomized to which conditions.
A demonstration of how simple this might be is in Section 3 here:
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RApacheProject/paper.pdf
Another more end user-oriented walkthrough:
http://www.jstatsoft.org/v08/i10/paper
You could also look at Rweb, but that would be less secure. Many other options exist.
Basically you want Mechanical Turk to load a frame with your webpage in it. The webpage it requests would have a CGI submit embedded in it (e.g. MT loads a frame with the contents of the URL http://www.myserver.com/myproject.html?MTid=10473). Then your R script on the web server does the randomization, returns a webpage containing only the random number, and records which MTid was in the URL and which random number was generated. At the end, just merge the web server's data with the MT data by MTid.
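A rough sketch of what that server-side R script could look like as a plain CGI script (the query-string parsing, the randomization, and the log file are all placeholders; an rApache setup would look somewhat different):

#!/usr/bin/env Rscript
# Read the MTid from the query string, e.g. "MTid=10473".
query <- Sys.getenv("QUERY_STRING")
mtid  <- sub("^.*MTid=([^&]+).*$", "\\1", query)

# Placeholder for your own randomization scheme; here just condition 1 or 2.
condition <- sample(1:2, 1)

# Record which ID was assigned which condition, for the later merge with MT data.
write(paste(Sys.time(), mtid, condition, sep = ","),
      file = "assignments.csv", append = TRUE)

# CGI response: header, blank line, then the assigned condition only.
cat("Content-type: text/plain\n\n")
cat(condition, "\n")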