We are running a larger-scale project on Firebase and have already invested in unit tests. Now we are also using the experimental shell to run integration tests against a testing environment and database. We would very much like to invoke functions via bash/shell script instead of opening the experimental shell, requiring our test data and invoking each function manually. We tried reverse-engineering the firebase-tools package for this, but that seems to be overkill. Any idea how we might be able to test-run all our functions in series?
Related
There are several R scripts that need to be run periodically. Currently I have an EC2 instance where these R scripts run via cron jobs. However, this is not cost-efficient, as the scripts do not run all the time.
I am looking for a service that lets me deploy the R scripts and schedule them, paying only per use. Something like AWS Lambda, for instance.
Note: rewriting these scripts is not an option for now, since there are many and I do not have the resources for it.
Any ideas or suggestions about it?
You can containerize your scripts and try to run them on ECS with a cron schedule.
A quick search will give you plenty of examples of dockerizing R scripts, like this one.
You can push the resulting images to AWS ECR, which is a Docker registry, and use the images to define ECS task definitions: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-task-definition.html
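For orientation, the build-and-push step might look roughly like this; the account ID, region and repository name are placeholders, not part of the original answer:

# Build the image from the Dockerfile wrapping an R script, then push it to ECR.
# 123456789012, us-east-1 and r-scripts are placeholder values.
aws ecr create-repository --repository-name r-scripts
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker build -t r-scripts .
docker tag r-scripts:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/r-scripts:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/r-scripts:latest

The resulting image URI is what goes into the ECS task definition linked above.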
After that you can run your tasks on schedule.
This way the scripts will only consume compute while they are actually running. It still requires some refactoring in the form of containerization, but once you have done it for one script it should scale to all the others.
If containerization is still too much work, you can combine the EC2 Instance Scheduler with Scheduled Reserved Instances for savings, but be aware that Scheduled Reserved Instances come with a lot of limitations.
I am developing processes for collecting, cleaning and storing various data sets. The development is done with RStudio projects. I won't say I'm following every tidyverse/RStudio workflow recommendation, but in general I'm using that framework -- relevant here is that I'm using the standard subdirectories and the here package for referencing them.
Every project has a MAIN.R script that ultimately sources the functions from the other scripts-- one only needs to run MAIN.R to execute the process. I did this not only for simplicity but also because the long-term intent is to have this be a scheduled process.
For now, at least, my method for scheduling R scripts is Windows Task Scheduler. Getting an R script scheduled and running is not a problem. The issue is the contextual assumptions of developing within a project: source(here("CODE", "some-file.R")) fails when I run MAIN.R outside the scope of the project.
One obvious solution would be to hard-code the project location as one of the parameters. I would need two different MAIN.R files: one for development that uses the project, and one for scheduling that uses that parameter. I don't hate that idea, but I don't love it either, as it somewhat nullifies the whole point of the project/here approach. Is there a more elegant solution that someone else has created that I couldn't find on Google, or better workaround ideas?
I ended up using the solution described here: https://community.rstudio.com/t/how-to-play-nice-with-taskscheduler-r-studio-projects-and-here/24406/2 .
I didn't have to make any changes to the MAIN.R script. Instead, I scheduled it directly but set the project directory in the "Start in" field of the Windows Task Scheduler task.
I'm using Travis to automatically deploy my Firebase hosted website and cloud functions as I push to GitHub, as detailed here. However, even for my small website with a limited amount of cloud functions, deploying all of the functions takes quite a long time. Were I deploying manually, I would be able to use --only to specify precisely those functions that I actually changed. Is there a way to make this information available to Travis, so that only the necessary functions are rebuilt?
https://m.youtube.com/watch?v=iyGHW4UQ_Ts
From minute 30 onwards.
This guy solves the problem by copying all functions to a cloud bucket and then computing a diff for every file. That works well if all your logic is in one file, but it is not what you want for larger projects. For my own project I used webpack to create one bundle per function that includes its imports. Then I generate an MD5 hash for each bundle and save it to a functions-lock.json. On the next run I can easily check against the old hash value and only deploy the changed functions. The CI should manage the state of the lock file by uploading it to the cloud or doing some git magic.
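A rough shell sketch of that hash check in CI could look like this; only functions-lock.json comes from the description above, while the dist/ bundle layout, the jq/md5sum usage and the FIREBASE_TOKEN variable are illustrative assumptions:

#!/usr/bin/env bash
# Assumes webpack emitted one bundle per function into dist/, named after the function,
# and that functions-lock.json (bundle name -> md5 from the previous run) was restored by the CI.
set -e
changed=()
for bundle in dist/*.js; do
  name=$(basename "$bundle" .js)
  new_hash=$(md5sum "$bundle" | cut -d' ' -f1)
  old_hash=$(jq -r --arg n "$name" '.[$n] // ""' functions-lock.json)
  if [ "$new_hash" != "$old_hash" ]; then
    changed+=("functions:$name")
  fi
done
if [ "${#changed[@]}" -gt 0 ]; then
  # e.g. firebase deploy --only functions:foo,functions:bar
  firebase deploy --only "$(IFS=,; echo "${changed[*]}")" --token "$FIREBASE_TOKEN"
fi
# Afterwards, regenerate functions-lock.json from the new hashes and upload it for the next run.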
Unfortunately this isn't going to be simple to do -- the Firebase CLI deploys all of your functions because it's next-to-impossible to just analyze the code and figure out which functions are impacted (since you can require other files, you might have updated dependencies but no files changed, etc.).
One hack I can think of would be to have named branches for functions or groups of functions. Then you could git push to the branch of the specific function you want to deploy, and have a script that uses the branch name as a signal to pass --only functions:<fnName> to the firebase deploy command. That's not the most glamorous solution, but, depending on how much this bugs you, it might help.
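A minimal sketch of that idea, assuming a made-up branch naming convention like deploy/<fnName>:

#!/usr/bin/env bash
# Map the current branch name to a --only target; anything else deploys all functions.
branch=$(git rev-parse --abbrev-ref HEAD)
case "$branch" in
  deploy/*)
    fn=${branch#deploy/}
    firebase deploy --only "functions:$fn"
    ;;
  *)
    firebase deploy --only functions
    ;;
esac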
So this is a bit late but the long deployment times have bothered us for a while now.
Our solution is based on CircleCI but it should be possible to adapt.
First we get all changed files in the last merged PR for our branch with
git log -m -1 --name-only --pretty="format:" ${process.env.CIRCLE_SHA1}
CIRCLE_SHA1 is the SHA of the last merge commit, i.e. featurebranch -> master.
Then we get all the function filenames from our /functions/ directory and use madge to generate an array of all the dependencies those functions have.
Next we go through all the changed files that we got from git and check whether their filename is part of the dependency array for a specific cloud function; if so, we add that cloud function to another array.
Once this is done we pretty much have an array of all the cloud functions affected by the changed files, which we can then map to their actual cloud function names for deployment.
Now, instead of always deploying 75 cloud functions, which takes 45 minutes, we only deploy maybe 20.
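Not the poster's actual script, but the same idea sketched as a shell step with madge's CLI and jq; the functions/src/ entry-file layout is an assumption, and normalizing git paths against madge's relative paths is glossed over:

#!/usr/bin/env bash
# Assumes one entry file per cloud function under functions/src/, named after the function.
changed=$(git log -m -1 --name-only --pretty="format:" "$CIRCLE_SHA1")
targets=()
for entry in functions/src/*.js; do
  name=$(basename "$entry" .js)
  # madge --json prints a map of module -> [dependencies]; its keys are every file the entry pulls in.
  deps=$( (madge --json "$entry" | jq -r 'keys[]'); echo "$entry")
  # Redeploy this function if any file from the last merge shows up in its dependency list.
  if grep -qxFf <(printf '%s\n' "$changed") <(printf '%s\n' "$deps"); then
    targets+=("functions:$name")
  fi
done
if [ "${#targets[@]}" -gt 0 ]; then
  firebase deploy --only "$(IFS=,; echo "${targets[*]}")"
fi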
I have an R script that I run every day; it scrapes data from a couple of different websites and then writes the scraped data to a couple of different CSV files. Each day, at a specific time (that changes daily), I open RStudio, open the file, and run the script. I check that it runs correctly each time, and then I save the output to a CSV file. It is a pain to have to do this every day (it takes ~10-15 minutes a day). I would love it if this script could somehow run automatically at a pre-defined time, and a buddy of mine said AWS is capable of doing this?
Is this true? If so, what is the specific feature/aspect of AWS that is able to do this, so I can look into it further?
Thanks!
Two options come to mind thinking about this:
Host an EC2 instance with R on it and configure a cron job to execute your R script regularly.
One easy way to get started: Use this AMI.
To execute the script, R offers a command-line front end, Rscript. See e.g. here on how to set this up.
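A typical crontab entry for this would look something like the following; the paths and the 06:00 run time are placeholders:

# Run the scraper every day at 06:00 and append output/errors to a log file.
0 6 * * * /usr/bin/Rscript /home/ec2-user/scripts/daily_scrape.R >> /home/ec2-user/logs/daily_scrape.log 2>&1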
Go serverless: AWS Lambda is a serverless compute service. Currently R is not natively supported, but the official AWS blog (here) offers a step-by-step guide on how to run R. Basically you execute R from Python using the rpy2 package.
Once you have this set up, schedule the function via CloudWatch Events (essentially a hosted cron job). Here you can find a step-by-step guide on how to do that.
One more thing: you say that your script outputs CSV files. To save them properly you will need to put them in a file storage service like AWS S3. You can do this in R via the aws.s3 package. Another option would be to use the AWS SDK for Python (boto3), which is preinstalled in the Lambda environment. You could e.g. write a CSV file to the /tmp/ directory and, after the R script is done, move the file to S3 via boto3's S3 upload_file function.
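If you go with option 1 (EC2 plus cron) instead, the upload step can also be a single AWS CLI call after the script finishes; the bucket and key here are placeholders:

# Copy today's output to S3 once the R script has written it.
aws s3 cp /tmp/output.csv s3://my-scrape-bucket/daily/$(date +%F).csv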
IMHO the first option is easier to set up, but the second one is more robust.
It's a bit counterintuitive, but you'd use CloudWatch with an event rule to run periodically. It can run a Lambda or send a message to an SNS topic or SQS queue. The challenge you'll have is that Lambda doesn't natively support R, so you'd either have to have a Lambda kick off something else or have something waiting on the SNS topic or SQS queue to run the script for you. It isn't a perfect solution, as there are potentially quite a few moving parts.
@stdunbar is right about using CloudWatch Events to trigger a Lambda function. You can set the trigger frequency or use a cron expression. But as he mentioned, Lambda does not natively support R.
This may help you to use R with Lambda: R Statistics ready to run in AWS Lambda and x86_64 Linux VMs
If you are running Windows, one of the easier solutions is to write a .BAT script that runs your R script and then use Windows Task Scheduler to run it as desired.
To call your R script from your batch file, use the following syntax:
"C:\Program Files\R\R-3.2.4\bin\Rscript.exe" C:\rscripts\hello.R
Just verify that the path to the Rscript executable and to your R code is correct.
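If you prefer the command line over the Task Scheduler GUI, an equivalent task can be registered with schtasks; the task name, schedule and .BAT path are placeholders:

schtasks /create /tn "DailyRScript" /tr "C:\rscripts\run_hello.bat" /sc DAILY /st 06:00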
Dockerize your script (write a Dockerfile, build an image)
Push the image to AWS ECR
Create an AWS ECS cluster and an ECS task definition within the cluster that will run the image from AWS ECR every time it is spun up
Use EventBridge to create a time-based trigger that will run the ECS task definition (a rough sketch of this step follows below)
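A rough sketch of that last step with the AWS CLI; every ARN, name, subnet and the schedule below are placeholders, and the role must be allowed to run ECS tasks:

# Create a daily schedule rule and point it at the ECS task definition.
aws events put-rule --name daily-r-script --schedule-expression "cron(0 6 * * ? *)"
aws events put-targets --rule daily-r-script --targets '[{
  "Id": "r-script",
  "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/r-jobs",
  "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
  "EcsParameters": {
    "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/r-script:1",
    "LaunchType": "FARGATE",
    "NetworkConfiguration": {
      "awsvpcConfiguration": {
        "Subnets": ["subnet-0123456789abcdef0"],
        "AssignPublicIp": "ENABLED"
      }
    }
  }
}]'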
I recently gave a seminar walking through this at the Why R? 2022 conference.
You can check out the video here: https://www.youtube.com/watch?v=dgkm0QkWXag
And the GitHub repo here: https://github.com/mrismailt/why-r-2022-serverless-r-in-the-cloud
We have a complete Lab Management environment running Coded UI tests in nightly builds. What we are trying to achieve is to run our integration tests (regular TestMethod() with SQL connections) just before all the Coded UI tests to verify that our db scripts are executed correctly and that there are no new changes causing any problems.
So far I have found a way to execute tests remotely through .testrunconfig. The problem we have with that approach is that it's not possible to choose a test controller connected to a team project, so I guess that would only be useful for running tests on physical machines outside of Lab Management?
One option seems to be to create a Test Case for each integration test, which would then run together with the UI tests, but it feels like too much maintenance to manage hundreds of test cases just to run the integration tests. Also, it would be better to completely separate the test runs for the different kinds of tests.
Is there any easy way to achieve this that I have totally missed? Or do I have to modify the lab build template to deploy and run the tests?
I guess that would be only useful for running tests on physical machines outside of Lab Management?
If you run your tests remotely through .testrunconfig you have to connect the Test Agent to another Test Controller which is NOT connected to the team project.
Unfortunately, to my knowledge, that is impossible for environments running under Lab Management.
What about this approach:
Create an Ordered Test containing all your integration tests.
Create a new Test Case "Integration Tests" and automate it with the ordered test.
So you do not have to maintain hundreds of Test Cases.
You could also create several Ordered Tests if you want to group the integration tests, and then create a "main" Ordered Test containing them.
This way it will be easier to analyze test results especially if you have a lot of tests.
Let the integration tests run as a part of your existing nightly build.
Create a new Build Definition which does not start a build itself but uses the last successful nightly build, and let your Coded UI tests run using the lab build template.
This way you will have different test runs for the different kinds of tests.
The only drawback is that you have to "synchronize" these two builds...
You could just schedule the second build later, so you can be sure the first build is done.
It's not really perfect, I know... but this way you could easily achieve your goal.
I am not sure if there is an alternative solution, but on the project I am currently working on we have both our unit and integration test assemblies set under the process options (Process > Basic > AutomatedTest > TestAssembly) in our nightly build. This was achieved by altering the Default Build Process Template (not the Lab Default) a bit, as you suggested (I thought this was standard, but it has been a while).