Connecting R with Hadoop

I am trying to integrate R with Hadoop using Revolution Analytics' RHadoop, but I am facing problems on Windows.
I am getting an error while running the rmr package:
Error in mr(map = map, reduce = reduce, reduce.on.data.frame = reduce.on.data.frame, :
hadoop streaming failed with error code 127
Does anyone have any idea how this issue can be resolved?

You need to use CDH3 or higher, or Apache Hadoop 1.0.2 or higher. If you absolutely have to run it with 0.20.2, the wiki page "Which Hadoop for rmr" lists the patches you need to apply. Did you also just ask this on the issue tracker, by any chance? There are many forums you can post to, but there is only one answer.
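Exit status 127 is what a POSIX shell returns when it cannot find the command it was asked to run, so before patching Hadoop it may be worth confirming which hadoop executable, if any, R can actually see, and which release it reports. A minimal sketch, assuming the hadoop launcher is expected on the PATH of the R session; hadoop_version() is a hypothetical helper, not part of rmr:

```r
# Hypothetical helper: report the Hadoop release visible to this R session.
# Returns NA when no `hadoop` executable is found on the PATH, the situation
# a shell reports with exit status 127 ("command not found").
hadoop_version <- function() {
  hadoop_bin <- Sys.which("hadoop")
  if (!nzchar(hadoop_bin)) {
    return(NA_character_)
  }
  # `hadoop version` prints the release on its first line, e.g. "Hadoop 1.0.2"
  out <- suppressWarnings(
    system2(hadoop_bin, "version", stdout = TRUE, stderr = TRUE)
  )
  if (length(out) == 0) NA_character_ else out[1]
}

v <- hadoop_version()
if (is.na(v)) {
  message("No 'hadoop' executable found on the PATH")
} else {
  # Compare against the minimum given above (CDH3, or Apache Hadoop 1.0.2)
  message("Found: ", v)
}
```

If the helper returns NA, the streaming job is probably failing before Hadoop's version even matters.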

Related

Telepresence Connection Error - Traffic Manager version unsupported, must be 2.4.5 or higher while it is 2.6.5

I have recently started facing this problem. When I try to connect, I get an error stating that my traffic-manager version is 2.1.5 and that it should be at least 2.4.5.
The "telepresence connect" command checks for new versions and upgrades the traffic manager if a newer version exists, so I suspect that is what started the problem; until then I had been using it normally.
When I check the connector.log file, these two lines point to the problem:
connector/session : Existing Traffic Manager 2.6.5 not owned by cli or does not need upgrade, will not modify
connector/session : failed to connect to root daemon: rpc error: code = Unknown desc = unsupported traffic-manager version 2.1.5. Minimum supported version is 2.4.5
So somehow I now have two versions: when checking for an update it finds 2.6.5, but when trying to connect it uses 2.1.5. I tried to uninstall Telepresence, but that runs into the same problem, and I couldn't locate and delete traffic-manager 2.1.5. My OS is Windows 11.
Because of this, my tests are blocked. Any help would be much appreciated. Thanks!
After I asked this question, a new version was released. If you encounter this problem, update Telepresence to 2.6.6; the issue is fixed there.

makeClusterPSOCK ERROR workers failed to connect

I get this error when running Seurat in R:
Error in makeClusterPSOCK(workers, ...) :
Cluster setup failed. 4 of 4 workers failed to connect.
This never happened before I installed R 4.1.
I have tried the following, to no avail:
parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
cl <- parallel::makeCluster(2, setup_strategy = "sequential")
Any suggestions (and maybe a little explanation, because I am still relatively new to R)? My computer overheats, and I believe the commands below are not working:
library(future)
options(future.globals.maxSize = 8000 * 1024^2)
plan("multiprocess", workers = 4)
R 4.1 with RStudio has all sorts of issues with the parallel package right now. I experienced similar issues with the CB2 package on R 4.1, which also uses parallel for multicore support. This is probably related to an as-yet unpatched bug in R 4.1 (mentioned here and here), though there is now a specific fix in R-devel (revision 80472). If the advice in those threads does not resolve your issue, I suggest rolling back to a previous R version that doesn't have the problem.
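Before rolling R back, it may help to confirm that the failure comes from R itself rather than from Seurat or future. A minimal sketch using only the base parallel package; psock_works() is a hypothetical helper, not part of any package. If it returns FALSE under R 4.1 but TRUE under an earlier R version (or outside RStudio), that points to the R bug described above rather than to Seurat:

```r
library(parallel)

# Hypothetical helper: TRUE if a small PSOCK cluster can be started and used,
# FALSE if cluster setup fails (e.g. "workers failed to connect").
psock_works <- function(n = 2) {
  cl <- tryCatch(makeCluster(n), error = function(e) {
    message("Cluster setup failed: ", conditionMessage(e))
    NULL
  })
  if (is.null(cl)) return(FALSE)
  on.exit(stopCluster(cl))
  # Each worker doubles one number; compare with the expected result
  res <- parLapply(cl, seq_len(n), function(i) i * 2)
  identical(unlist(res), seq_len(n) * 2)
}

message("R version: ", as.character(getRversion()))
message("Running inside RStudio: ", Sys.getenv("RSTUDIO") == "1")
message("PSOCK cluster works: ", psock_works(2))
```

Running the same script once from RStudio and once with Rscript from a terminal separates an RStudio-specific problem from a general one.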

AI platform R notebook

A few months ago I used RStudio to submit a job to cloudml (AI Platform), and it was successful.
Today I tried to submit the same job from an AI Platform notebook, but I get:
"ERROR: (gcloud.ai-platform.jobs.submit.training) INVALID_ARGUMENT: Field: runtime_version Error: The specified runtime version '1.9' with the Python version ''"
I even ran which python in the terminal and then passed its result to reticulate in the R environment:
library(reticulate)
use_python("result of the which python")
I also tried running R in the terminal and got the same error.
I don't know if it helps, but the previous run and this one were in different regions:
us-central was successful
australia-southeast1 was getting this error.
This error occurs because, as of March 16, 2020, you can no longer create training jobs that use runtime version 1.9. You can try submitting the job with runtime version 1.15, which is the only TensorFlow 1.x version currently supported for training jobs. It is still possible, though, that you will run into errors caused by incompatibilities in your code.

Azure Machine Learning integration of R: Should the 'azureml' module have an attribute 'core'?

I'm having issues with Azure Machine Learning SDK for R: "module 'azureml' has no attribute 'core'"...
For reasons beyond my control, I have to use azureml to apply machine learning (my own code, written in R) to data from our data warehouse that is put in blob storage. The modelled output should be put back into blob storage so it can be accessed from the data warehouse.
I've written the code in R on my local machine (stored in a git repo). Preferably, I'd find some method to pull my code from git into a pipeline in the azureml environment so that it can be directly run whenever new data is available in the blob storage.
I went on a tutorial spree and found this seemingly relevant walkthrough: Train and deploy your first model with Azure ML (and this one).
But after trying everything I could think of, I'm stuck on the first steps. After installing all the packages, modules, apps, etc. (or at least, so I think), and running the following code in RStudio:
library(azuremlsdk)
existing_ws <- get_workspace(name = name,
                             subscription_id = subscription_id,
                             resource_group = resource_group)
I run into an error that I haven't been able to fix:
AttributeError: module 'azureml' has no attribute 'core'
It seems that azureml is supposed to have an attribute "core", but when looking at it more closely, there is indeed no such attribute.
The function "get_workspace()" is trying to access "azureml$core$Workspace$get".
I found that "azureml$Workspace" does exist, but I can't figure out how to make that work.
Can anyone explain to me why I'm encountering this error?
Does anyone know of a better tutorial on how to connect my R code to azureml's cloud service?
Any pointers in the right direction are much appreciated!
EDITS - still not solved:
After advice from others, I double, triple and quadruple checked the installation.
I updated R and I'm now running:
R.version
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 6.2
year 2019
month 12
day 12
svn rev 77560
language R
version.string R version 3.6.2 (2019-12-12)
nickname Dark and Stormy Night
I installed Conda with Python 3.6.10.
I installed the azuremlsdk R package (I tried both provided options).
I then realized that there were some inconsistencies between the versions of the azure modules, so I also tried installing it with the '--no-multiarch' install option:
remotes::install_cran('azuremlsdk', repos = 'http://cran.us.r-project.org', INSTALL_opts=c("--no-multiarch"))
Then I installed the azureml Python SDK.
I had a look at all the versions again (using python -m pip freeze):
azure-common==1.1.24
azure-graphrbac==0.61.1
azure-mgmt-authorization==0.60.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.0.0
azure-mgmt-resource==7.0.0
azure-mgmt-storage==7.1.0
azureml==0.2.7
azureml-automl-core==1.0.83.1
azureml-core==1.0.69
azureml-dataprep==1.1.36
azureml-dataprep-native==13.2.0
azureml-pipeline==1.0.69
azureml-pipeline-core==1.0.69
azureml-pipeline-steps==1.0.69
azureml-sdk==1.0.69
azureml-telemetry==1.0.69
azureml-train==1.0.69
azureml-train-automl-client==1.0.83
azureml-train-core==1.0.69
azureml-train-restclients-hyperdrive==1.0.69
Surprised to see version 1.0.69 everywhere instead of 1.0.83, I reinstalled the azureml Python SDK using:
azuremlsdk::install_azureml(version = "1.0.83")
This worked, in the sense that all the packages that were at 1.0.69 are now at 1.0.83:
azure-common==1.1.24
azure-graphrbac==0.61.1
azure-mgmt-authorization==0.60.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.0.0
azure-mgmt-resource==7.0.0
azure-mgmt-storage==7.1.0
azureml==0.2.7
azureml-automl-core==1.0.83.1
azureml-core==1.0.83
azureml-dataprep==1.1.36
azureml-dataprep-native==13.2.0
azureml-pipeline==1.0.83
azureml-pipeline-core==1.0.83
azureml-pipeline-steps==1.0.83
azureml-sdk==1.0.83
azureml-telemetry==1.0.83
azureml-train==1.0.83
azureml-train-automl-client==1.0.83
azureml-train-core==1.0.83
azureml-train-restclients-hyperdrive==1.0.83
But I still get the error about the missing core, both when running:
library(azuremlsdk)
get_current_run()
and also when running:
library(azuremlsdk)
existing_ws <- get_workspace(name = name,
                             subscription_id = subscription_id,
                             resource_group = resource_group)
Note that the first time I run this code after starting RStudio, I get the error:
Error in py_get_attr_impl(x, name, silent) :
AttributeError: module 'azureml' has no attribute '_base_sdk_common'
And every time after that I get this error:
Error in py_get_attr_impl(x, name, silent) :
AttributeError: module 'azureml' has no attribute 'core'
Any help would be much appreciated!
This issue was introduced by the reticulate 1.14 release, in which reticulate creates a default r-reticulate conda environment. Since Azure ML installs the Python SDK into an environment named r-azureml, the r-reticulate environment used by reticulate was missing the Python SDK. A fix for this issue was made in a PR and has been merged into master. If you have reticulate 1.14 and are running into this issue, please install the package from GitHub for now. We will be releasing an update to CRAN shortly.
I seem to have fixed the issue by explicitly installing the Python packages azureml and azureml.core:
python -m pip install azureml
and then...
python -m pip install azureml.core
I did this in the Conda environment that R was using (r-reticulate). It's a bit odd that I can't use the Conda environment 'r-azureml' without R switching back to 'r-reticulate', but at least I no longer get the 'azureml' has no attribute 'core' error.
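An alternative to installing the SDK into r-reticulate is to tell reticulate explicitly which Conda environment to use before azuremlsdk touches Python. A minimal sketch, assuming the SDK was installed into a Conda environment named r-azureml as described in the answer above; activate_azureml_env() is a hypothetical helper, while use_condaenv() and py_module_available() are reticulate functions. It has to run in a fresh R session, because reticulate cannot switch to another Python environment once Python has been initialized:

```r
# Hypothetical helper: make reticulate use the conda environment that the
# Azure ML SDK was installed into (assumed here to be "r-azureml") instead
# of reticulate's default "r-reticulate" environment.
activate_azureml_env <- function(envname = "r-azureml") {
  # Must run before Python is initialized in this R session;
  # required = TRUE makes reticulate fail loudly if the env can't be used.
  reticulate::use_condaenv(envname, required = TRUE)
  # Confirm that the Python SDK module is importable from the chosen env
  if (!reticulate::py_module_available("azureml.core")) {
    stop("azureml.core is not available in conda environment '", envname, "'")
  }
  invisible(envname)
}

# Usage, in a fresh R session:
# activate_azureml_env()
# library(azuremlsdk)
# existing_ws <- get_workspace(name = name,
#                              subscription_id = subscription_id,
#                              resource_group = resource_group)
```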

How to bypass repos error when installing repository on internet disconnected machine?

I've found a number of different methods that can be used to install packages on a machine that isn't connected to the internet. This post offers a fairly straightforward method to download the packages, transfer to the disconnected machine, and then to point the R installation at this custom repository.
After transferring the files, I ran the following command on the machine not connected to the internet:
update.packages(repos="C:/Users/username/Documents/R/R Repository/3.4",repos = NULL,type = "source")
After running the above line, I am getting the following error:
Error in update.packages(repos = "C:/Users/username/Documents/R/R Repository/3.4", :
formal argument "repos" matched by multiple actual arguments
Another thing I noticed is that the downloaded packages are all ".tar.gz" files, and this is a Windows machine (as was the machine in the linked post above). Could this be part of the problem?
Any help would be much appreciated, thank you!
It looks like you have "repos =" twice in your call. Take out "repos = NULL" and try it again.
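The corrected call passes repos only once. A minimal sketch, assuming the folder from the question is a CRAN-style repository (with a src/contrib/PACKAGES index); it gives repos as a file:/// URL, which is the form R uses for a local repository. On the .tar.gz question: those are source packages, hence type = "source"; on Windows, source packages that contain compiled code additionally need Rtools to build.

```r
# Path of the local repository from the question (adjust to your own layout).
# Assumed to be CRAN-style, i.e. to contain src/contrib/PACKAGES.
local_repo <- "C:/Users/username/Documents/R/R Repository/3.4"

# Pass `repos` exactly once; a local repository is given as a file: URL.
repo_url <- paste0("file:///", local_repo)

if (dir.exists(local_repo)) {
  # type = "source" because the repository holds .tar.gz source packages
  update.packages(repos = repo_url, type = "source", ask = FALSE)
}
```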
