I am trying to reduce my development headaches while creating an ML web service on Azure ML Studio. One of the things that struck me was whether we can just upload .rda files in the workbench and load them via an R Script module (as in the figure below).
But I can't connect the uploaded file directly to the R Script block. There is another way to do it (it works for uploading packages that aren't available in Azure's R repositories): using a .zip. But I haven't found any resource that explains how to access the .rda file inside the .zip.
So I have two options: make the .zip approach work, or find some other workaround that lets me use my .rda model directly. If someone could guide me on how to go forward, I would appreciate it.
Note: currently I'm creating models via the "Create R Model" block, training them, and saving them so that I can use them to make a predictive web service. But for models like random forest, I'm not sure how the randomness affects the resulting model (the local and Azure versions come out different, and setting a seed hasn't helped much). I'm a bit tight on schedule, and Azure ML Studio feels boxed in when it comes to iterating and automating the ML workflow (or maybe I'm doing it wrong).
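What I would really like is to train locally and reuse the exact saved model. Roughly speaking (randomForest on iris here is just a stand-in for my real data and training code):

library(randomForest)

set.seed(42)                                     # doesn't make the Azure-trained model match
model <- randomForest(Species ~ ., data = iris)  # placeholder data; the real training happens here
save(model, file = "model.rda")                  # this is the .rda I want to reuse for scoring in Azure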
Here is an example of uploading a .rda file for scoring:
https://gallery.cortanaintelligence.com/Experiment/Womens-Health-Risk-Assessment-using-the-XGBoost-classification-algorithm-1
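For what it's worth, the .zip route can work with only a few lines inside an Execute R Script module. A minimal sketch, assuming the .zip contains model.rda (holding an object named model) and is connected to the module's Script Bundle (Zip) input, whose contents Studio extracts under ./src:

# Execute R Script module in Azure ML Studio (classic)
dataset <- maml.mapInputPort(1)        # data frame arriving on input port 1

load("src/model.rda")                  # the Script Bundle zip is unpacked into ./src

dataset$scored <- predict(model, newdata = dataset)

maml.mapOutputPort("dataset")          # return the scored data frame on the output port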
I'm very excited about the newly released Azure Machine Learning service (preview), which is a great step up from the previous (and now deprecated) Machine Learning Workbench.
However, I've been thinking a lot about best practices for structuring the folders and files in my project(s). I'll try to explain my thoughts.
Looking at the documentation for training a model (e.g. Tutorial #1), it seems to be good practice to put the training script and any necessary additional scripts inside a subfolder, so that the subfolder can be passed to the Estimator object without also passing all the other files in the project. This is fine.
But when it comes to deploying the service, specifically deploying the image, the documentation (e.g. Tutorial #2) seems to indicate that the scoring script needs to be located in the root folder. If I try to refer to a script located in a subfolder, I get an error message saying:
WebserviceException: Unable to use a driver file not in current directory. Please navigate to the location of the driver file and try again.
This may not be a big deal, except that I have some additional scripts that I import in both the training script and the scoring script, and I don't want to keep duplicate copies just to be able to import them from both places.
I work mainly in Jupyter Notebooks when executing the training and the deployment, and I could of course resort to some tricks: read the relevant scripts from another folder, save copies of them to disk, run the training or deployment against the copies, and finally delete the copies. That would be a decent workaround, but it seems to me there should be a better way than just decent.
What do you think?
Currently, score.py needs to be in the current working directory, but dependency scripts (passed via the dependencies argument of ContainerImage.image_configuration) can be in a subfolder.
Therefore, you should be able to use a folder structure like this:
./score.py
./myscripts/train.py
./myscripts/common.py
Note that the relative folder structure is preserved during web service deployment; if your score.py references the common file in the subfolder, that reference should remain valid inside the deployed image.
I'm working on an R package at work. My package has gotten large enough that I've decided I need some form of repeatable testing. I settled upon using testthat and mockery. I'm not a developer, so this is the first time I'm writing tests at this level.
I deal with a lot of data files, and it's very convenient to have functions in my package that help locate files. These functions interact with the file system via calls to dir(). For example:
Data from one event can be split over multiple files. If I have the file datafile_2017.10.20_12.00.00, I have a function that can find the next file that is part of the same event, i.e. datafile_2017.10.20_12.05.00.
My question is this: what is the best way to test functions like this? My intuition is to avoid using actual files stored somewhere else in my repository, because that can fail for a number of reasons, e.g. different paths or different repo states between systems. I searched around, and it looks like other languages have mocking libraries that allow for mocking directory structures. I haven't found anything like that for R (except for testthatsomemore, but it was removed from CRAN sometime in 2016).
Is there an R package that allows for mocking directory structures? Or am I wrong to move away from storing small test files in my repo?
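For reference, one approach I've been considering, which avoids both real stored files and full-blown mocking, is to build a throwaway directory inside the test itself. A sketch of what I mean, where find_next_file() is a hypothetical stand-in for my actual function:

# testthat sketch: create a temporary "data" directory on the fly, so the test
# never depends on real files stored in the repo. find_next_file() is hypothetical.
library(testthat)

test_that("find_next_file() returns the next file of the same event", {
  tmp <- tempfile("eventdata")
  dir.create(tmp)
  on.exit(unlink(tmp, recursive = TRUE), add = TRUE)

  file.create(file.path(tmp, c("datafile_2017.10.20_12.00.00",
                               "datafile_2017.10.20_12.05.00",
                               "datafile_2017.10.21_08.00.00")))

  expect_equal(
    basename(find_next_file(file.path(tmp, "datafile_2017.10.20_12.00.00"))),
    "datafile_2017.10.20_12.05.00"
  )
})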
I've written an R function (code available on demand) that improves some analysis workflows in my research group (~10 people), and now I would like to deploy it so it's easily accessible to the rest of the group. I am the only person in the group with any knowledge of R.
In a nutshell, the function does the following:
Takes one argument: the directory in which to search for microscopy images in proprietary formats
Asks the user (via readline()) which channels should be analysed and what they are named
Generates several histograms and scatter plots of intensity levels per image channel after various normalisation steps; these are deposited in a .pdf file for each image stack
Performs a linear regression and generates a .txt file per image stack
The .pdf and .txt files are written to the directory the user specifies as the argument when running the function. I want to turn this into something more user-friendly, essentially removing the need to install R and the function's dependencies. For the sake of universality, I would like to deploy it as a web application that takes a .zip file of the images as input, extracts it, and then runs the function with the newly created directory as the argument. When it's done, it should return a .zip file of the generated .pdf and .txt files. I'm not sure how feasible this is. I have looked into using Shiny, but I'm having a hard time figuring out how to apply it, as I have no experience with it. I do have experience with Unix server administration and a remote server that I can play around with.
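To make the idea concrete, the flow I'm imagining is roughly the sketch below, where analyse_images() is a placeholder for my actual function (and the readline() prompts would presumably have to become Shiny inputs):

library(shiny)

# Rough sketch of the upload -> analyse -> download flow.
# analyse_images() is a placeholder: it takes a directory and writes its
# .pdf and .txt output files back into that same directory.
ui <- fluidPage(
  fileInput("zipfile", "Upload a .zip of image stacks", accept = ".zip"),
  downloadButton("results", "Download results (.zip)")
)

server <- function(input, output) {
  output$results <- downloadHandler(
    filename = "results.zip",
    content = function(file) {
      req(input$zipfile)
      workdir <- tempfile("images")
      dir.create(workdir)
      unzip(input$zipfile$datapath, exdir = workdir)

      analyse_images(workdir)   # placeholder for the real analysis function

      outputs <- list.files(workdir, pattern = "\\.(pdf|txt)$",
                            full.names = TRUE, recursive = TRUE)
      zip(file, files = outputs, flags = "-j")   # -j drops the temp-dir paths
    }
  )
}

shinyApp(ui, server)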
The second option would be somewhat less universal: deploying it as a Windows executable (I am the only person in my group who doesn't use Windows as a daily OS, but I do have access to a Windows environment). Ideally, the executable would ask the user to navigate to a directory, use that directory as the argument to the function, and output the generated files to that same directory.
As my experience with R is limited, I cannot tell which option would be more feasible and more worthwhile in the long run. My gut feeling is that the web application would be the most flexible and user-friendly, but also the more challenging to deploy. Are either of my ideas implementable, and if so, what would be a good way to go about it?
Thanks in advance!
As part of a transition from MATLAB to R, I am trying to figure out how to read TDMS files created with National Instruments LabVIEW using R. TDMS is a fairly complex binary file format (http://www.ni.com/white-paper/5696/en/).
Add-ons exist for Excel and OpenOffice (http://www.ni.com/white-paper/3727/en/), and I could build something in LabVIEW to do the conversion, but I am looking for a solution that lets me read the TDMS files directly into R. This would allow us to test out the use of R for certain data processing requirements without changing what we do earlier in the data acquisition process. Having a simple process would also lower the barrier for others trying out R for this purpose.
Does anyone have any experience with reading TDMS files directly into R that they could share?
This is far from supporting the full TDMS specification, but I have started a port of the Python npTDMS package to R here: https://github.com/msuefishlab/tdmsreader. It has also been tested out in the context of a Shiny app.
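If anyone wants to try it, the port can be installed straight from GitHub with the remotes package:

# Install the in-progress TDMS reader directly from its GitHub repository.
# See the repository linked above for how to use the reader itself.
install.packages("remotes")
remotes::install_github("msuefishlab/tdmsreader")
library(tdmsreader)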
You don't say whether you need to automate the reading of these files using R, or just convert the data manually. I'm assuming you and your colleagues don't have access to LabVIEW yourselves; otherwise you could just create a LabVIEW tool to do the conversion (and build it as a standalone application or DLL if you have the Professional Development System or Application Builder; you could then run the built app from your R code by passing parameters on the command line).
The document at your first link refers to (a) add-ins for OpenOffice Calc and for Excel, which should work for a manual conversion and which you might be able to automate using those programs' respective macro languages, and (b) a C DLL for reading TDMS. Would it be possible for you to use one of those?
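If you do end up with a command-line converter (a built LabVIEW app, or something wrapping the C DLL), calling it from R is straightforward. A sketch, assuming a hypothetical tdms2csv.exe that writes a CSV next to the input file:

# Call a (hypothetical) external TDMS-to-CSV converter, then read the result into R.
convert_and_read <- function(tdms_path, converter = "tdms2csv.exe") {
  csv_path <- sub("\\.tdms$", ".csv", tdms_path)
  status <- system2(converter, args = shQuote(c(tdms_path, csv_path)))
  if (status != 0) stop("converter returned a non-zero exit status")
  read.csv(csv_path)
}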
First I should say that a lot of this is over my head, so I apologize in advance for using incorrect terminology and potentially asking an unclear question. I'm doing my best.
Also, I saw this post; is RCurl the tool I want to use for this task?
Every day for four months I'll be analyzing new data and generating .csv and .png files that need to be uploaded to a website where other team members can check them. I've (nearly) automated all of the data collection, downloading, analysis, and file saving. The analysis is carried out in R, and R saves the files. Currently I use FileZilla to manually upload the new files to the website. Is there a way to use R to upload the files to the website, so that I don't have to open FileZilla and drag and drop files?
It would be nice to run my R code and walk away, knowing that once it finishes, the newly saved files will automatically be put on the website.
Thanks for any help!
You didn't specify which protocol you use to upload your files with FileZilla; I assume it is FTP. If so, you can use the ftpUpload function from RCurl:
library(RCurl)

# Upload a local file to the server over FTP, authenticating with user:password
ftpUpload("yourfile", "ftp://ftp.yourserver.foo/yourfile",
          userpwd = "username:passwd")
RCurl also has methods for scp, and ftpUpload should support sftp as well.
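Building on that, uploading everything your script just generated could look roughly like this (the server address, credentials, and output folder name are placeholders):

# Push every freshly generated .csv/.png from the output folder to the server.
library(RCurl)

new_files <- list.files("output", pattern = "\\.(csv|png)$", full.names = TRUE)
for (f in new_files) {
  ftpUpload(f,
            paste0("ftp://ftp.yourserver.foo/", basename(f)),
            userpwd = "username:passwd")
}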