Rscript in Hive - r

I have a linear regression model created in R, and I have the model file stored. How do I run it on a Hive table and score inside the Hive CLI?
The document below has some useful information on this. My question is how to code the scorer.R wrapper file. What should the file contain? Thanks.
http://www.slideshare.net/huguk/hug-data-science
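A minimal sketch of what such a scorer.R wrapper could look like, assuming the script is invoked through Hive's TRANSFORM clause (which streams rows to the script as tab-separated lines on stdin and reads tab-separated lines back from stdout). The file name model.rds, the assumption that the model was stored with saveRDS(), the table name, and the column names id, wt and hp are all hypothetical and not taken from the question or the slides.

```r
#!/usr/bin/env Rscript
# scorer.R: hypothetical wrapper for Hive's TRANSFORM clause.
# Assumed Hive side (table and column names are made up for illustration):
#   ADD FILE scorer.R;
#   ADD FILE model.rds;
#   SELECT TRANSFORM (id, wt, hp)
#   USING 'Rscript scorer.R'
#   AS (id, wt, hp, prediction)
#   FROM cars;

# Parse tab-separated input lines, append a prediction column, and return
# tab-separated output lines. Hive writes NULL as \N in TRANSFORM streams,
# so that token is read as NA here.
score_rows <- function(model, lines, col_names) {
  if (length(lines) == 0) return(character(0))
  rows <- read.table(text = lines, sep = "\t", header = FALSE,
                     col.names = col_names, quote = "", comment.char = "",
                     na.strings = "\\N", stringsAsFactors = FALSE)
  rows$prediction <- predict(model, newdata = rows)
  do.call(paste, c(rows, sep = "\t"))
}

# Read everything Hive sends on stdin, score it, and write to stdout.
# Assumes the model was saved earlier with saveRDS(fit, "model.rds").
main <- function(model_path = "model.rds",
                 col_names = c("id", "wt", "hp")) {
  model <- readRDS(model_path)
  con <- file("stdin", open = "r")
  on.exit(close(con))
  writeLines(score_rows(model, readLines(con), col_names))
}

# Only run when executed as a script next to the model file, so that
# sourcing this file for testing does not block on stdin.
if (!interactive() && file.exists("model.rds")) main()
```

To check it locally before involving Hive, something like `printf 'a\t2.62\t110\n' | Rscript scorer.R` in a directory containing model.rds should print the input row with a prediction appended. Note that the script must print nothing else to stdout, since Hive treats every output line as a row.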

Related

How to knit Rmarkdown files without the need to run the codes

I am not sure how to express my question in a well-understood manner. Anyway, my problem is that when I knit the Rmarkdown file, R reruns everything in the file (importing data, running models, etc.), which takes a lot of time. Is there a way to save the output of the models, data frames, graphs, or tables as objects and then use these objects as they are, without rerunning the process that generated them during knitting?
Thanks
I believe that your best option is to use the cache capabilities in RMarkdown: {r cache=TRUE}.
See more here: https://bookdown.org/yihui/rmarkdown-cookbook/cache.html
I find it's effective to do the data preparation and model fitting in a separate .Rmd or .R file and save the resulting data frames and model objects with save.
The notebook I create with figures and tables simply loads the objects in the first chunk with load. That way I can easily iterate on the visualizations and tables without having to re-run the models every time.
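The save/load workflow described above can be sketched as follows. The object names fit and prepared and the use of mtcars are placeholders, and a temporary file stands in for whatever fixed path the two files would share in a real project.

```r
# --- Part 1: would live in a separate script, e.g. a hypothetical fit_models.R ---
# A fixed path such as "results/models.RData" would be used in practice;
# tempfile() is used here only so the sketch runs anywhere.
results_file <- tempfile(fileext = ".RData")

prepared <- transform(mtcars, wt_kg = wt * 453.592)  # wt is in 1000 lbs
fit <- lm(mpg ~ wt_kg, data = prepared)              # the slow step in real use

# Store both objects under their own names in one file.
save(prepared, fit, file = results_file)

# --- Part 2: would be the first chunk of the report .Rmd ---
# load() restores the objects under the names they were saved with.
# A separate environment is used here to show that nothing is refitted.
report_env <- new.env()
load(results_file, envir = report_env)

print(coef(report_env$fit))
```

In the actual report the first chunk would just call load() with the shared path, and every later chunk can use fit and prepared directly.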
Take a look at R Notebooks:
https://bookdown.org/yihui/rmarkdown/notebook.html
Notebooks are just like markdown, but with exactly the feature you are looking for.

How can I build a custom context based Question answering model SQuAD using deeppavlov

I have the following questions:
1. Dataset format (i.e., how to split train, test and validation data)
2. Where to place the dataset
3. How to change the path for the dataset reader
4. How to save the model in my own directory
5. How to use the trained model
Edit
my_config['dataset_reader']['data_path'] = '/home/ec2-user/SageMaker/squad/data/'
my_config['metadata']['variables']['MODELS_PATH'] = '/home/ec2-user/SageMaker/squad/model/'
I used these lines to change the dataset path and model path in the configuration file. My model is saved in this location, but training does not use my dataset; instead, it downloads its own dataset into that folder and uses it.
1. An example of the dataset format is here: https://github.com/deepmipt/DeepPavlov/blob/f5117cd9ad1e64f6c2d970ecaa42fc09ccb23144/deeppavlov/dataset_readers/squad_dataset_reader.py#L46
Your dataset should have the same format.
2-3. The dataset should be placed in the folder set here: https://github.com/deepmipt/DeepPavlov/blob/f5117cd9ad1e64f6c2d970ecaa42fc09ccb23144/deeppavlov/configs/squad/squad_torch_bert.json#L4
(you can change the folder name)
4. The model is saved in the directory set here: https://github.com/deepmipt/DeepPavlov/blob/f5117cd9ad1e64f6c2d970ecaa42fc09ccb23144/deeppavlov/configs/squad/squad_torch_bert.json#L166
(you can write your own directory there)
5. The trained model can be used with the command: python3 -m deeppavlov interact <your_config_name>
A more detailed tutorial on how to launch models is here: https://github.com/deepmipt/DeepPavlov

How to schedule an R script that reads data from a Postgres database, does some analysis, and writes the resulting data back to the database?

I have written an R script that reads data from a PostgreSQL database, does some analysis, predicts the dependent variable, and writes the results back into the database. Now I want to run that script at fixed intervals. How can I achieve this? Is there a way to run it from PostgreSQL, or some other way to schedule it? Please help me.
Try one of these, depending on your platform:
https://github.com/bnosac/cronR - Unix/Linux
https://github.com/bnosac/taskscheduleR - Windows
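For the Unix/Linux case, a rough sketch with cronR might look like the following. It follows the pattern in the cronR README (cron_rscript() to build the command, cron_add() to register it). The script name, job id and schedule are made up, a throwaway script stands in for the real analysis script, and dry_run = TRUE is used so that nothing is actually written to the crontab.

```r
# Stand-in for the real analysis script; in practice this would be the
# existing file that reads from PostgreSQL and writes predictions back.
script_path <- file.path(tempdir(), "score_and_write.R")
writeLines('cat("scoring run at", format(Sys.time()), "\\n")', script_path)

# Only attempt scheduling when cronR and the crontab utility are available.
if (requireNamespace("cronR", quietly = TRUE) && nzchar(Sys.which("crontab"))) {
  # Build the shell command that runs the script with Rscript and logs output.
  cmd <- cronR::cron_rscript(script_path)

  # Hypothetical schedule: once a day at 2 AM. With dry_run = TRUE the job
  # is only shown, not installed.
  cronR::cron_add(command = cmd, frequency = "daily", at = "2AM",
                  id = "score_and_write",
                  description = "Score data and write results to PostgreSQL",
                  dry_run = TRUE)
}
```

Dropping dry_run = TRUE should register the job; cronR::cron_ls() lists the jobs that cronR manages and cronR::cron_rm("score_and_write") removes this one again. The script itself needs to be runnable non-interactively, so database credentials have to come from a file or environment variables rather than a prompt.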

How to open SPSS metadata files in either SAS or Stata or R or Excel?

I have some SPSS metadata files (*.mdd). However, I do not have SPSS installed on my computer and I do not know how to use SPSS. I want to open the files in either Stata, SAS, R or Excel. Stat-Transfer only allows for SAV and portable files to be converted, and I am not familiar with MySQL either. Any help is appreciated!
mdd files are produced by SPSS Data Collection, not by SPSS Statistics or Modeler. There is an OLEDB driver for these, but you would need to contact the new owner of Data Collection, UNICOM Systems, Inc., a Division of UNICOM Global, or unicomsi.com to see about availability. Of course, you would need an app that supports OLEDB.
To expand on what @JKP stated, .mdd files are really only useful if you're trying to import data collected by an SPSS Data Collection server, and it's in a non-flat file. I worked on a Data Collection server for 3 years, and the only time I needed an .mdd file in SPSS Statistics was to import a file that was set up incorrectly on the server.
The scripting languages used in the .mdd are proprietary, so there isn't much use for them outside of either SPSS Statistics or SPSS Data Collection.

build R package using report generation with knitr

I am doing report generation in R with knitr.
So basically I have a dataset, do some preprocessing and then call knitr to output an html report.
This means the entire workflow consists of several R code files and some .Rhtml templates which are needed later on for report generation.
I would like to wrap all of this into an R package.
With just .R files, I would simply run package.skeleton() and have a start.
But how do I deal with the .Rhtml files? What is the proper way to handle them when building an R package?
Thanks,
Ben Bolker's answer:
Put them in a directory within an inst directory and use system.file() to retrieve them.
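As a sketch of that approach: anything under inst/ in the package source is copied to the top level of the installed package, so a template stored at inst/templates/report.Rhtml can be located at run time with system.file(). The package name myreportpkg, the templates directory and the file name are hypothetical.

```r
# Helper that would live in the package's R/ directory.
# Package, directory and file names are placeholders for illustration.
report_template <- function(name, package = "myreportpkg") {
  # inst/templates/<name> in the source tree is installed as templates/<name>.
  # mustWork = TRUE raises an error instead of silently returning "".
  system.file("templates", name, package = package, mustWork = TRUE)
}

# Runnable demonstration of the same lookup against a package that is
# always installed: locate the DESCRIPTION file of 'stats'.
stats_description <- system.file("DESCRIPTION", package = "stats",
                                 mustWork = TRUE)
print(basename(stats_description))
```

Inside the package, the report function could then call something like knitr::knit(report_template("report.Rhtml"), output = "report.html"), so the template is found wherever the package happens to be installed.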
