Invoke R script on AWS Lambda from NodeJS

As a result of several hours of unfruitful searches, I am posting this question.
I suppose it is a duplicate of this one:
How do you run RServe on AWS Lambda with NodeJS?
But since it seems that the author of that question did not accomplish his/her goal successfully, I am going to try again.
What I currently have:
A NodeJS server, that invokes an R script through Rserve and passes data to evaluate through node-rio.
Function responsible for that looks like this:
const path = require('path');
const rio = require('rio'); // the node-rio package

const R = (arg1, arg2) => {
  return new Promise((resolve, reject) => {
    const args = {
      arg1, arg2
    };
    // send data to Rserve to evaluate
    rio.$e({
      filename: path.resolve('./r-scripts/street.R'),
      entrypoint: 'run',
      data: args,
    })
      .then((data) => {
        resolve(JSON.parse(data));
      })
      .catch((err) => {
        reject(`err: ${err}`);
      });
  });
};
And this works just fine. I am sending data over to my R instance and getting results back into my server.
What I am ultimately trying to achieve:
Every request seems to spawn its own R workspace, which has a considerable memory overhead. Thus, serving even hundreds of concurrent requests using this approach is impossible, as my AWS EC2 runs out of memory pretty quickly.
So, I am looking for a way to deploy all the memory intensive parts to AWS Lambda and thus get rid of the memory overhead.
I guess the specific question in my case is whether there is a way to package R and Rserve together with a NodeJS Lambda function, or whether this approach simply won't work on Lambda and I should look for an alternative.
Note: I cannot use anything other than R, since these are external R scripts, that I have to invoke from my server.
Thanks in advance!

Related

environment variable use in Cloudflare workers node.js

I have seen many articles about setting environment variables in Cloudflare Workers, but I am not able to read or retrieve them in Node.js.
Code:
async function handleRequest(request) {
  if ('OKOK' == process.env.API_KEY) {
    return new Response('found', {
      headers: { 'content-type': 'text/plain' },
    })
  }
}
wrangler.toml
name = "hello"
type = "javascript"
# account_id = ""
workers_dev = true
[env.production]
name = "API_KEY"
Cloudflare Workers does not use Node.js. In Workers, environment variables become simple globals. So, to access your environment variable, you would just write API_KEY, not process.env.API_KEY.
(Note: Workers is currently transitioning to a new syntax based on ES modules. In that syntax, environment variables work differently; an env object is passed to the event handler containing all variables. Most people aren't using this new syntax yet, though. You would know if you are using it if your JavaScript uses export default { to define event handlers; on the other hand, if it uses addEventListener("fetch", ...), then it is using the old syntax.)
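For example, with the variable defined under [vars] in wrangler.toml, a minimal sketch of the service-worker syntax would look like this (the [vars] entry and the 404 fallback are illustrative assumptions, not taken from the question):
// wrangler.toml would contain:
//   [vars]
//   API_KEY = "OKOK"
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  // in the service-worker syntax, bound variables are plain globals
  if (API_KEY === 'OKOK') {
    return new Response('found', {
      headers: { 'content-type': 'text/plain' },
    })
  }
  return new Response('not found', { status: 404 })
}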
I recommend you use Miniflare now; it's pretty easy and straightforward:
"start": "miniflare --watch --debug -e .env"
This is an official Cloudflare lib by the way:
https://www.npmjs.com/package/miniflare
Reason:
Miniflare is a simulator for developing and testing Cloudflare Workers.
🎉 Fun: develop workers easily with detailed logging, file watching and pretty error pages supporting source maps.
🔋 Full-featured: supports most Workers features, including KV, Durable Objects, WebSockets, modules and more.
⚡ Fully-local: test and develop Workers without an internet connection. Reload code on change quickly. It's an alternative to wrangler dev, written in TypeScript, that runs your workers in a sandbox implementing Workers' runtime APIs.

How do I disable firebase functions cache?

Suppose that I have the following firebase function that looks something like this:
const functions = require('firebase-functions')

exports.myFunction = functions.https.onRequest((request, response) => {
  // Do stuff...
})
After I deploy this function and execute it for the first time, it takes around 10 seconds to finish, but every execution after the first takes only 2 seconds.
I assume this has something to do with cache.
I would like every execution of my function to run just like the first execution.
Why does this happen and how can I disable this feature?
It's not possible, and it's not really a "cache". Cloud Functions reuses server instances after creating one to service a request. That server instance will continue to run to handle more incoming requests for as long as it likes. There isn't some command or configuration that you can use to shut it down manually.
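You can observe the reuse yourself: anything declared in module scope survives between invocations on a warm instance, while the handler body runs every time. A rough sketch (the counter and timing below are illustrative, not part of the question's code):
const functions = require('firebase-functions')

// runs once per cold start; the values persist while the instance stays warm
const coldStartedAt = Date.now()
let invocations = 0

exports.myFunction = functions.https.onRequest((request, response) => {
  invocations += 1
  response.send(`cold start was ${Date.now() - coldStartedAt} ms ago, invocation #${invocations}`)
})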

How to run cronjob with computer shut off (EC2 instance)

I outlined my small project in a different post - to summarize it again quickly, I am trying to do the following:
Write an R script that pulls data from a website
Schedule the R script to automatically run daily at the same time
Write / append the R script's output to a database
I am familiar with R web-scraping packages (rvest, rselenium) for doing the first bullet. For the 2nd bullet, just today I learned how to create a crontab to run my script when I desire, however the crontab does not run the script when my computer is off, or so I've read.
How can I have it such that the crontab is run even with my computer off? I am somewhat (not really) familiar with EC2 instances, but if I have my R script in an EC2 instance, could I schedule a crontab for the script there and then it would run with my computer off?
Thanks in advance for help!
Since cron is a service that runs on the instance you can't have it start the EC2 instance for you - it's a catch-22.
You can treat EC2 instances as computers that run in someone else's cellar (most of the time at least). You wouldn't expect a computer to run code when it's not turned on and it's exactly the same for an EC2 instance.
I suggest you consider if this is really the setup you want, it sounds to me that you'd be better served using AWS Lambda combined with one of Amazon's hosted data stores (RDS, DynamoDB, SimpleDB, or even S3). The downside here is that you're limited to JavaScript, Python, and Java and as such can't use R (well, you can, but it's messy since you'll have to package everything you need in a JS/Python/Java app and start it from there).
If you really want to run your R script on the EC2 instance you can start the instance with a lambda and then shut it down from your script. Just make sure your instance isn't set to terminate on shutdown.
Regardless of what path you chose you will need to create a lambda and run it from a scheduled CloudWatch Event.
Then you just need to implement the lambda, either to run your script or to use the EC2 API to start the instance.
If you use the lambda to start the EC2 instance you should not use cron on the instance to run the script at a specific time, but run it on startup. Then you have your script shut down the instance when it's finished.
Here's an example Python script for starting an EC2 instance from a lambda to get you started:
import logging

import boto3

# Set up logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# Set up a boto session to get credentials and region
session = boto3.session.Session()

# Set up EC2
ec2 = session.resource("ec2")

# The instance to start
instance_id = "i-1234567890abcd"

def lambda_handler(event, context):
    logger.info('Start handling event.')
    logger.info('Starting instance ' + instance_id)
    instance = ec2.Instance(instance_id)
    response = instance.start()
    try:
        # start() reports the instance's new state, e.g. {'Code': 0, 'Name': 'pending'}
        current_state = response['StartingInstances'][0]['CurrentState']['Name']
    except (KeyError, IndexError):
        logger.warning('Unexpected response when starting instance: {}'.format(response))
    else:
        if current_state not in ('pending', 'running'):
            logger.warning('Instance {} is in unexpected state {} after starting'.format(instance_id, current_state))
        else:
            logger.info('Started instance ' + instance_id)

Meteor methods - stream/yield data from server

I'm writing a Meteor app which allows clients to execute terminal commands on the server at the click of a button.
I know how to do this with a single command:
// server
Meteor.methods({ exec: cmd => { ... } })

// client
Meteor.call('exec', cmd, (err, result) => {
  console.log(result)
})
But now I'm trying to implement a more complex protocol and don't quite know what the best way is. I want the client to kick off a series of commands, have the server run them and tell me, step by step, whether they succeeded or failed.
Obviously I could implement this with the above code by writing client-side code that runs exec with the first command, checks the result from the server, runs exec with the next command and so on.
The crux is that in my case the series of commands is always the same, so it would make much more sense to only do one Meteor.call on the client -- the server would know what commands to run. However I would also like to have the results of the individual commands available on the client as they come in -- and this is what I can't do, because Meteor.call only returns once, of course.
What I'm looking for is a sort of stream or iterator through which I can send a number of messages to the client until everything is done. I've seen some outdated packages called meteor-streams and similar that might be able to do something like that, but I'm thinking there must be a smart way in Meteor itself to solve this. Ideas?
A common solution is a Notifications collection. Create the collection with a schema like for: <userId>, msg: <message string>, type: <err, success, etc.>. Create a Notifications publication which publishes docs matching the user's userId.
You can then subscribe to the Notifications collection in some main template page on the client. Use observeChanges to look for changes to the collection and either console.log them, use JavaScript to display them on the page, or simply install a package like sAlerts to handle them.
Inside the observe changes callback, a seenNotification method should be called which removes the notification from the db, so it is not shown again.
I'll post code snippets a bit later.
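In the meantime, here is a rough sketch of that pattern. The collection, publication, and method names come from the description above; the field names and the runCommandSomehow helper are assumptions:
// shared: collection declared on both client and server
Notifications = new Mongo.Collection('notifications')

// server: publish only the current user's notifications
Meteor.publish('notifications', function () {
  return Notifications.find({ for: this.userId })
})

// server: the long-running method inserts a notification after each step
Meteor.methods({
  runCommands() {
    // for each command in the fixed series:
    //   const result = runCommandSomehow(cmd)  // assumed helper
    //   Notifications.insert({ for: this.userId, msg: cmd, type: result.ok ? 'success' : 'err' })
  },
  seenNotification(id) {
    Notifications.remove(id)
  }
})

// client: subscribe and react to notifications as they arrive
Meteor.subscribe('notifications')
Notifications.find().observeChanges({
  added(id, fields) {
    console.log(`${fields.type}: ${fields.msg}`)
    Meteor.call('seenNotification', id) // remove it so it is not shown again
  }
})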
Have a look at this: https://github.com/RocketChat/meteor-streamer
I think it will solve your problem easily.

Async server or quickly loading state in R

I'm writing a webserver that sometimes has to pass data through an R script.
Unfortunately startup is slow, since I have to load some libraries, which load other libraries, etc.
Is there a way to either
load libraries, save the interpreter state to a file, and load that state fast when invoked next time? Or
maintain a background R process that can be sent messages (not just lowlevel data streams), which are delegated to asynchronous workers (i.e. sending a new message before the previous is parsed shouldn’t block)
R-Websockets is unfortunately synchronous.
Rserve and RSclient are an easy way to create and use an async server.
Open two R sessions.
In the first one, type:
require(Rserve)
run.Rserve(port=6311L)
In the second one, type:
require(RSclient)
rsc = RS.connect(port=6311L)
# start with a synchronous call
RS.eval(rsc, {x <<- vector(mode="integer")}, wait=TRUE)
# continue with an asynchronous call
RS.eval(rsc, {cat("begin")
for (i in 1:100000) x[[i]] <-i
cat("end")
TRUE
},
wait=FALSE)
# call collect until the result is available
RS.collect(rsc, timeout=1)
