Avoiding mapred.child.env modification at runtime on HDP so that R can establish a connection to hiveserver2 using RHive

I'm trying to get R's RHive package to communicate nicely with hiveserver2.
I receive an error while trying to connect to hiveserver2 using:
>rhive.connect(host="localhost",port=10000, hiveServer2=TRUE, user="root", password="hadoop")
The output on the initial run:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/03/19 07:08:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/19 07:08:23 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/03/19 07:08:24 INFO jdbc.Utils: Supplied authorities: localhost:10000
15/03/19 07:08:24 INFO jdbc.Utils: Resolved authority: localhost:10000
15/03/19 07:08:24 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default
This leads to the error:
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.child.env at runtime. It is not in list of params that are allowed to be modified at runtime
On subsequent runs of the same command the output reduces to:
15/03/19 07:16:24 INFO jdbc.Utils: Supplied authorities: localhost:10000
15/03/19 07:16:24 INFO jdbc.Utils: Resolved authority: localhost:10000
15/03/19 07:16:24 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.child.env at runtime. It is not in list of params that are allowed to be modified at runtime
This suggests to me that I may be missing permissions somewhere. However, I'm running this as root, so I'm unsure which permissions I'm missing.
I've installed RHive following the installation guidelines in the README.
NOTE: The same error occurs if I use the CRAN version of the package.
I'm currently using the Hortonworks Data Platform 2.2 (HDP 2.2) VirtualBox image, so Hadoop and hiveserver2 are already installed. I've installed R version 3.1.2.
The following is how I am installing RHive:
# Set up paths for HIVE_HOME, HADOOP_HOME, and HADOOP_CONF_DIR
export HIVE_HOME=/usr/hdp/
export HADOOP_HOME=/usr/hdp/
export HADOOP_CONF_DIR=/etc/hadoop/conf
# R location via R RHOME
export R_HOME=$(R RHOME)
# Place R_HOME into hadoop config location
sudo sh -c "echo \"R_HOME='$R_HOME'\" >> $HADOOP_HOME/conf/hadoop-env.sh"
# Add remote enable to Rserve config.
sudo sh -c "echo 'remote enable' >> /etc/Rserv.conf"
# Launch the daemon
R CMD Rserve
# Confirm launch
netstat -nltp
# Install ant to build java files
sudo yum -y install ant
# Install package dependencies
sudo R --no-save << EOF
install.packages( c('rJava','Rserve','RUnit'), repos='http://cran.us.r-project.org', INSTALL_opts=c('--byte-compile') )
EOF
# Install RHive package
git clone https://github.com/nexr/RHive.git
cd RHive
ant build
To check, either open R and run the statements between the EOF markers, or run the command directly from the shell:
sudo R --no-save << EOF
library(RHive)
rhive.connect(host="localhost",port=10000, hiveServer2=TRUE, user="root", password="hadoop")
EOF

The answer is mentioned at this link.
Basically, you have to add the property "hive.security.authorization.sqlstd.confwhitelist.append" with the value "mapred.child.env" to /etc/hive/conf/hive-site.xml.
This solution worked for me, but I used the Ambari UI to make the configuration change.
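If you would rather script the change than use Ambari, it can be sketched in Python. This is a minimal sketch, assuming hive-site.xml uses the usual flat configuration/property/name/value layout and that the whitelist value is a "|"-separated list of regexes (splitting on "|" is a simplification that ignores alternations nested inside groups). The helper name add_whitelist_entry is hypothetical.

```python
import xml.etree.ElementTree as ET

WHITELIST_KEY = "hive.security.authorization.sqlstd.confwhitelist.append"


def add_whitelist_entry(xml_text: str, pattern: str = "mapred.child.env") -> str:
    """Return hive-site.xml content with `pattern` appended to the whitelist property.

    Hypothetical helper: assumes a flat <configuration><property>... layout.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == WHITELIST_KEY:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            # Existing value is treated as "|"-separated regexes (simplification).
            entries = [e for e in (value_el.text or "").strip().split("|") if e]
            if pattern not in entries:
                entries.append(pattern)
            value_el.text = "|".join(entries)
            break
    else:
        # No whitelist property yet: create it with just our pattern.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = WHITELIST_KEY
        ET.SubElement(prop, "value").text = pattern
    return ET.tostring(root, encoding="unicode")
```

To apply it, read /etc/hive/conf/hive-site.xml, pass its text through add_whitelist_entry, write the result back, and restart hiveserver2. ElementTree drops XML comments and the XML declaration, so back up the file first. On an Ambari-managed cluster, Ambari may overwrite manual edits to hive-site.xml, which is a good reason to make the change through Ambari as described above.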


Missing binary when running 'conan create': the automatically computed package_id is not the one that exists on the remote Artifactory

I ran conan create but encountered an error.
It says it can't find the required package with package_id 9a528cba5863064039249f6fd79f1ba70071cfb6 under version 0.1.3,
but the package under version 0.1.3 that exists on the remote Artifactory has package_id 48a2fe710ac47cd442376b3b7175a956b16574c4, not 9a528cba5863064039249f6fd79f1ba70071cfb6. The package_id 9a528cba5863064039249f6fd79f1ba70071cfb6 does not exist under any version, either remotely or locally.
Why does Conan keep looking for the nonexistent 9a528cba5863064039249f6fd79f1ba70071cfb6?
And how can I make it find the existing 48a2fe710ac47cd442376b3b7175a956b16574c4?
Steps to reproduce:
I set 'xxxx_app/0.1.3@xx1.3.0/stable' as a requirement in conanfile.py and ran the command below, which produced the error:
$ mkdir build && cd build
$ conan create .. xx1.3.0/stable -r xxx-conan-local --profile=orin -s build_type=Release
Error log:
ERROR: Missing binary: xxxx_app/0.1.3@xx1.3.0/stable:9a528cba5863064039249f6fd79f1ba70071cfb6
xxxx_app/0.1.3@xx1.3.0/stable: WARN: Can't find a 'xxxx_app/0.1.3@xx1.3.0/stable' package for the specified settings, options and dependencies:
Settings: arch=armv8, build_type=Release, compiler=gcc, compiler.libcxx=libstdc++11, compiler.version=9.3, os=Linux, platform=orin
Package ID: 9a528cba5863064039249f6fd79f1ba70071cfb6

Broken airflow upgrade_check command. Outputs "Please install apache-airflow-upgrade-check distribution from PyPI to perform upgrade checks"

I'm currently running Airflow 1.10.15 and wanted to perform some checks before upgrading to 2.x, so I ran pip install apache-airflow-upgrade-check in the scheduler pod, and it installed successfully. I then ran airflow upgrade_check, but it did not return the results I expected. Instead, it gives me this output in the terminal:
[2021-06-15 21:02:38,637] {{settings.py:233}} DEBUG - Setting up DB connection pool (PID 15732)
[2021-06-15 21:02:38,637] {{settings.py:300}} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=15732
[2021-06-15 21:02:38,735] {{sentry.py:179}} DEBUG - Could not configure Sentry: No module named 'blinker', using DummySentry instead.
[2021-06-15 21:02:38,754] {{__init__.py:45}} DEBUG - Cannot import due to doesn't look like a module path
[2021-06-15 21:02:38,916] {{cli_action_loggers.py:42}} DEBUG - Adding <function default_action_log at 0x7f9a637c3a70> to pre execution callback
Please install apache-airflow-upgrade-check distribution from PyPI to perform upgrade checks
[2021-06-15 21:02:39,266] {{settings.py:310}} DEBUG - Disposing DB connection pool (PID 15732)
What am I missing?
Update 6/16/2021: I verified whether the package was installed, and it does appear in the package list:
apache-airflow 1.10.15
apache-airflow-upgrade-check 1.3.0
apispec 1.3.3
argcomplete 1.12.2
The problem was that the container runs as a non-root user defined in the Dockerfile. When I install the package in the running pod, pip installs it into a user-local directory, and the airflow upgrade_check command cannot find it. To work around this, I added the package to the Dockerfile so that it is included when the Docker image is built.
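To confirm whether the interpreter that runs airflow can actually see the distribution, a small check like the following can help. It is a sketch that assumes Python 3.8+ (for importlib.metadata) and that you run it with the same Python executable, and as the same user, as the airflow command; installed_version and report are hypothetical names. When pip runs as a non-root user without write access to the system site-packages, it typically falls back to the user site-packages directory that report() prints.

```python
import importlib.metadata
import site
import sys
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the version of `dist_name` visible to this interpreter, or None."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def report(dist_name: str = "apache-airflow-upgrade-check") -> None:
    # Which interpreter is running, and where per-user installs would go.
    print("interpreter:", sys.executable)
    print("user site-packages:", site.getusersitepackages())
    print("user site enabled:", site.ENABLE_USER_SITE)
    print(dist_name, "->", installed_version(dist_name))


if __name__ == "__main__":
    report()
```

If the version prints as None under the user that runs airflow, installing the package at image build time, as described above, avoids depending on per-user installs inside a running pod.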

Installing SystemML from MVN/GitHub?

SystemML is available at https://github.com/SparkTC/systemml.
How do I get started with it? I am a newbie to GitHub.
I created a directory on my Ubuntu machine and copied the pom.xml file into it. When I ran mvn clean package, I got this error:
mvn clean package
[INFO] Scanning for projects...
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR] The project com.ibm.systemml:systemml-parent:5.2-SNAPSHOT (/home/vmuser/system-ml/pom.xml) has 1 error
[ERROR] Child module /home/vmuser/system-ml/system-ml of /home/vmuser/system-ml/pom.xml does not exist
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
When I ran the following command in 64-bit R version 3.1.1, I got an error too:
> install.packages(c("batch", "bitops", "boot", "caTools", "data.table", "doMC", "doSNOW", "ggplot2", "glmnet", "lda", "Matrix", "matrixStats", "moments", "plotrix", "psych", "reshape", "topicmodels", "wordcloud", "methods"), dependencies=TRUE)
--- Please select a CRAN mirror for use in this session ---
Warning: unable to access index for repository https://cran.rstudio.com/bin/windows/contrib/3.1
Warning: package ‘methods’ is in use and will not be installed
Warning message:
packages ‘batch’, ‘bitops’, ‘boot’, ‘caTools’, ‘data.table’, ‘doMC’, ‘doSNOW’, ‘ggplot2’, ‘glmnet’, ‘lda’, ‘Matrix’, ‘matrixStats’, ‘moments’, ‘plotrix’, ‘psych’, ‘reshape’, ‘topicmodels’, ‘wordcloud’ are not available (for R version 3.1.1)
The error message you received tells you what the problem is (formatting mine):
The project com.ibm.systemml:systemml-parent:5.2-SNAPSHOT (/home/vmuser/system-ml/pom.xml) has 1 error
Child module /home/vmuser/system-ml/system-ml of /home/vmuser/system-ml/pom.xml does not exist
You said:
I created a directory in my Ubuntu and copied the POM.xml file
You don't just need the pom.xml file; you need the whole project. Either git clone it or download the source as a zip and extract it, then run mvn clean package from the project directory.
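The "Child module ... does not exist" error means a module entry in the parent POM points at a path that isn't on disk. As a sketch, such entries can be listed before running Maven. missing_modules is a hypothetical helper; it assumes each module entry is either a directory containing its own pom.xml or a direct path to a POM file, and it matches module elements with or without the Maven POM namespace.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def missing_modules(pom_path: Path) -> List[str]:
    """List <module> entries in pom_path that do not resolve to a POM on disk."""
    root = ET.parse(pom_path).getroot()
    # Tags look like "{http://maven.apache.org/POM/4.0.0}module" when namespaced.
    modules = [
        el.text.strip()
        for el in root.iter()
        if isinstance(el.tag, str) and el.tag.split("}")[-1] == "module" and el.text
    ]
    base = pom_path.parent
    missing = []
    for module in modules:
        target = base / module
        # Accept either a direct POM file path or a directory with a pom.xml.
        if not (target.is_file() or (target / "pom.xml").is_file()):
            missing.append(module)
    return missing
```

Run from a copy that contains only pom.xml, missing_modules(Path("pom.xml")) would report the system-ml module from the error above; after a full git clone it should return an empty list.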
git clone is a better option if you intend to modify the source code. It will give you some powerful tools for integrating upstream changes and for submitting your modifications to the parent project. If you just want to use the project as-is, either option should be fine.
SystemML became an Apache (incubating) project in November of 2015. Its main website is located at http://systemml.apache.org/. The project can now be found on GitHub at https://github.com/apache/incubator-systemml.
Probably the quickest way to get started with Apache SystemML is to download a pre-built release package from the Apache SystemML Downloads page (see the main website). Information about Apache SystemML can be found at the Apache SystemML Documentation site, which is linked to from the main site. This includes information about running SystemML in notebooks, on Spark, and on Hadoop.
If you would like to clone the SystemML repository and build it locally with Maven, instructions to do so can be found in the project README on GitHub.

Getting "org.scala-sbt#sbt;0.13.8: not found" error even though I have the same version installed on my system. How do I resolve this?

set -eo pipefail; sbt -Dsbt.log.noformat=true -DchiselVersion="latest.release" "run Parity --genHarness --compile --test --backend c --vcd " | tee Parity.out
Getting org.scala-sbt sbt 0.13.8 ...
problems summary ::
module not found: org.scala-sbt#sbt;0.13.8
:: org.scala-sbt#sbt;0.13.8: not found
Server access Error: Connection refused url=https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt/0.13.8/ivys/ivy.xml
Server access Error: Connection refused url=https://repo1.maven.org/maven2/org/scala-sbt/sbt/0.13.8/sbt-0.13.8.pom
Server access Error: Connection refused url=https://repo1.maven.org/maven2/org/scala-sbt/sbt/0.13.8/sbt-0.13.8.jar
unresolved dependency: org.scala-sbt#sbt;0.13.8: not found
Error during sbt execution: Error retrieving required libraries
Error: Could not retrieve sbt 0.13.8
This might be a proxy issue.
Edit the $SBT_HOME/conf/sbtconfig.txt file and add the following entries:
-Dhttp.proxyHost=<proxy server>
-Dhttp.proxyPort=<proxy port>
-Dhttps.proxyHost=<proxy server>
-Dhttps.proxyPort=<proxy port>
-Dftp.proxyHost=<proxy server>
-Dftp.proxyPort=<proxy port>
The https settings are necessary because many of the URLs sbt refers to are HTTPS-based.
Don't include "http://" in the proxy host value.
I faced a similar issue. It seems the problem was the Java installation being used: by mistake, my environment was pointing to a JRE rather than a JDK. After pointing JAVA_HOME to the right location as shown below, sbt clean package compile worked fine.
[root@spark-sql-perf]# update-alternatives --config java
There are 2 programs which provide 'java'.
Selection Command
*+ 1 java-1.8.0-openjdk.ppc64le (/usr/lib/jvm/java-1.8.0-openjdk-
2 java-1.7.0-openjdk.ppc64le (/usr/lib/jvm/java-1.7.0-openjdk-
Enter to keep the current selection[+], or type selection number: q
There are 2 programs which provide 'java'.
Selection Command
*+ 1 java-1.8.0-openjdk.ppc64le (/usr/lib/jvm/java-1.8.0-openjdk-
2 java-1.7.0-openjdk.ppc64le (/usr/lib/jvm/java-1.7.0-openjdk-
Enter to keep the current selection[+], or type selection number: ^C
[root@spark-sql-perf]# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-
[root@spark-sql-perf]# export PATH=$JAVA_HOME/bin:$PATH
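A quick way to tell a JDK from a JRE is to look for the javac compiler under $JAVA_HOME/bin, since a plain JRE does not ship it. The check below is a heuristic sketch (looks_like_jdk is a hypothetical name); it does not verify the Java version.

```python
import os
from pathlib import Path
from typing import Optional


def looks_like_jdk(java_home: Optional[str]) -> bool:
    """Heuristic: a JDK has javac in bin/, a plain JRE does not."""
    if not java_home:
        return False
    bin_dir = Path(java_home) / "bin"
    return any((bin_dir / name).is_file() for name in ("javac", "javac.exe"))


if __name__ == "__main__":
    home = os.environ.get("JAVA_HOME")
    print(f"JAVA_HOME={home!r} looks like a JDK: {looks_like_jdk(home)}")
```

If this reports False for the current JAVA_HOME, re-export JAVA_HOME to the JDK directory as shown above before rerunning sbt.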

rhive.connect error - file:///rhive/lib/2.0-0.0/rhive_udf.jar does not exist

I am getting an error while connecting R to Hive using the RHive package. The package installed without problems, but rhive.connect returns an error. Please note the following:
Rserve is running as a daemon
R and Hive are installed on separate servers but within the same cluster
RHive was built from source using git. The version is 2.0-0.0
I am connecting to hiveserver running on port 10000
The error message says "file:///rhive/lib/2.0-0.0/rhive_udf.jar does not exist", although the file is there (in the local Linux directory), and the entire directory and the file have full permissions.
Below is a snapshot of the error:
Loading required package: rJava
Loading required package: Rserve
hadoop home: /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop
hive home: /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hive
14/07/04 00:45:51 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop/client-0.20/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop/client/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hadoop/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/07/04 00:45:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+ / hiveServer2 argument has not been provided correctly. +
+ / RHive will use a default value: hiveServer2=TRUE. +
Error: java.sql.SQLException: Error while processing statement: file:///rhive/lib/2.0-0.0/rhive_udf.jar does not exist.
Can someone please help? Thank you.
You need to specify the defaultFS parameter. If you don't provide it, RHive tries to write to the local filesystem instead of HDFS, which is why the error shows a file:/// path.
rhive.connect("", defaultFS="hdfs://")
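The value for defaultFS is presumably the cluster's default filesystem URI, which Hadoop normally reads from fs.defaultFS in core-site.xml (the log above notes that the older name fs.default.name is deprecated). As a sketch, assuming a flat core-site.xml layout, that value can be looked up like this; default_fs is a hypothetical helper, and the config path you read it from depends on your installation.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def default_fs(core_site_xml: str) -> Optional[str]:
    """Return fs.defaultFS (or deprecated fs.default.name) from core-site.xml text."""
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value
    # Prefer the current property name, fall back to the deprecated one.
    return props.get("fs.defaultFS") or props.get("fs.default.name") or None
```

Pass the returned URI as the defaultFS argument of rhive.connect.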
