I am running the CDH 5.10.0 VM.
When I create .sql files using gedit from the terminal in /home/cloudera, I can see the .sql file being created in Desktop -> Cloudera's Home. But the same file does not appear when I use hadoop fs -ls /home/cloudera.
Similarly, when I execute INSERT OVERWRITE INTO DIRECTORY /home/cloudera/somefolder, it does not show up physically in Desktop -> Cloudera's Home, but it is displayed when I use hadoop fs -ls /home/cloudera.
Is it a permission issue, or is my VM corrupted?
The Hadoop file system (HDFS) is different from your OS file system (the local filesystem), so the path Desktop -> Cloudera's Home is completely different from /home/cloudera in HDFS.
Hive in Cloudera is configured to use HDFS by default, so the query you issued:
INSERT OVERWRITE INTO DIRECTORY /home/cloudera/somefolder
ran against HDFS, not your local file system.
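A quick way to see the two namespaces side by side from the terminal (the file and folder names below are just placeholders):
ls /home/cloudera                              # local filesystem: this is where gedit writes your .sql files
hadoop fs -ls /home/cloudera                   # HDFS: this is where the Hive INSERT OVERWRITE output lands
hadoop fs -put myquery.sql /home/cloudera/     # copy a local file into HDFS
hadoop fs -get /home/cloudera/somefolder .     # copy HDFS output back to the local filesystem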
I had created a Google Compute Engine (virtual machine) instance with RStudio Server, unaware that RStudio Server is licensed software. Now my trial license for RStudio has expired, and I cannot log in to my R sessions anymore.
However, I had written some code which I need to recover. How do I download the files?
I have SSH-ed into my virtual machine but cannot find the relevant files or a way to download them.
I had a similar issue and I was able to recover the files by performing the following steps:
SSH to the virtual machine
Once you are in the virtual machine, run the following command: cd ../rstudio-user/
Run ls there and you will see the file structure you used to see in the RStudio Server interface
Navigate between the folders with cd and ls to get to the desired file
Once you are in the desired location (where ls shows the files you want to recover), run the following command: pwd
In the browser SSH window, open the settings (gear) menu and go to Download file
Enter the full path of the file you want to download; it will be something like /home/rstudio-user/FILENAME.R
Click on Download
You can do this for each of the files you want to recover.
If you want to recover a whole folder, it is easier to compress it into a zip file and then download that, as in the example below.
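For example, assuming the folder to recover is /home/rstudio-user/myproject (the folder name is just a placeholder):
cd /home/rstudio-user
zip -r myproject.zip myproject          # bundle the whole folder into one archive
# then use Download file with the path /home/rstudio-user/myproject.zip
(If zip is not available on the machine, tar -czf myproject.tar.gz myproject works just as well.)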
I have followed the documentation mentioned below, downloaded the parcel, and placed it as required.
Please let me know if anyone has installed it and what the steps are.
(https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html)
/opt/cloudera/csd/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658-el5.parcel
But service cloudera-scm-server restart is not executing.
To use Cloudera Express (free), run:
sudo /home/cloudera/cloudera-manager --express
This requires at least 8 GB of RAM and at least 2 virtual CPUs.
SPARK 2.2 Installation Setup on Cloudera VM
Step 1: Download a QuickStart VM from the Cloudera website.
Prefer the VMware platform as it is easy to use; in any case, all the options are viable.
The entire tar file is around 5.4 GB. You need to provide a business email ID, as the download won't accept personal email IDs.
Step 2: The virtual environment requires around 8 GB of RAM; please allocate sufficient memory to avoid performance glitches.
Step 3: Open the terminal and switch to the root user:
su root
password: cloudera
Step 4: Cloudera provides Java version 1.7.0_67, which is old and does not match our needs. To avoid Java-related exceptions, install Java with the following commands:
(a). Download Java:
wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
(b). Switch to the /usr/java/ directory with the "cd /usr/java/" command.
(c). Copy the downloaded Java tar file to the /usr/java/ directory.
(d). Untar it with "tar -zxvf jdk-8u131-linux-x64.tar.gz"
(e). Open the profile file with the command “vi ~/.bash_profile”
(f). Export JAVA_HOME to point to the new Java directory:
“export JAVA_HOME=/usr/java/jdk1.8.0_131”
Save and Exit.
(g). For the above change to take effect, the following command needs to be executed in the shell:
source ~/.bash_profile
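A quick sanity check, assuming the paths above, that the new JDK is in place:
echo $JAVA_HOME                    # should print /usr/java/jdk1.8.0_131
$JAVA_HOME/bin/java -version       # should report java version "1.8.0_131"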
Step 5: The Cloudera VM provides Spark 1.6 by default. However, the 1.6 APIs are old and do not match production environments, so we need to download and manually install Spark 2.2.
(a). Switch to /opt/ directory with the command:
“cd /opt/”
(b). Download spark with the command:
wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz
(c). Untar the spark tar with the following command:
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
(d). We need to define some environment variables as default settings.
Open the configuration file with the following command:
vi /opt/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh
Paste the following configuration into the file:
SPARK_MASTER_IP=192.168.50.1
SPARK_EXECUTOR_MEMORY=512m
SPARK_DRIVER_MEMORY=512m
SPARK_WORKER_MEMORY=512m
SPARK_DAEMON_MEMORY=512m
Save and exit
(e). We need to start spark with the following command:
/opt/spark-2.2.0-bin-hadoop2.7/sbin/start-all.sh
Export SPARK_HOME:
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7/
(f). Change the permissions of the directory:
chmod 777 -R /tmp/hive
(g). Try "spark-shell"; it should work.
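As a quick check, and to make the setting survive new logins, this is one possible sketch (assuming the install path used above):
/opt/spark-2.2.0-bin-hadoop2.7/bin/spark-submit --version    # should report Spark 2.2.0
echo 'export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7' >> ~/.bash_profile
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bash_profile
source ~/.bash_profile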
Please follow the video below; it has all the necessary steps required to install Spark 2 in the Cloudera VM.
YouTube link: https://www.youtube.com/watch?v=lQxlO3coMxM
Also, to start Cloudera Express (free), your VM should have at least 8 GB of RAM allocated; if you have the default 4 GB allocated, you can force the start using the command below and then follow the above video.
sudo /home/cloudera/cloudera-manager --force --express
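Before forcing it, you can check how much memory and how many virtual CPUs the VM actually has, for example:
free -m      # total memory in MB; Cloudera Express wants about 8 GB
nproc        # number of virtual CPUs; Cloudera Express wants at least 2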
Try this command
sudo /home/cloudera/cloudera-manager --express --force
I gave up on this; nothing works well with either the parcel or the non-parcel installation.
As soon as Cloudera Express is started there are numerous errors, and it runs Java 7 instead of Java 8.
I got a MapR VM install with Spark 2.x instead. No issues; it worked the first time.
That works well. This is my advice #1.
If you want Kudu, then I would install CentOS and set things up yourself. This is advice #2. OK, you may miss Impala, but for pure research and development that is not much of an issue.
My Spark 2.2 was automatically updated to Spark 2.4 with the following:
(i) sudo yum update
It might be that your Java home path is broken; in that case, export the Java home path in your bash profile:
(a) vi ~/.bash_profile
(b) export JAVA_HOME=<path to your JDK>, for example /usr/java/jdk1.8.0_131
(c) source ~/.bash_profile
Just download the version of Spark that you need, say 'spark-2.2.0-bin-hadoop2.6'.
Open ~/.bash_profile with the vi editor:
vi ~/.bash_profile
Paste in the two lines below:
SPARK_HOME=/home/cloudera/Downloads/spark-2.2.0-bin-hadoop2.6
PATH=$PATH:$HOME/bin:$SPARK_HOME/bin
Save it.
Then run the command: source ~/.bash_profile
Now start spark-shell.
Note: Make sure you have JDK 1.8 installed.
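For reference, a minimal sketch of what the relevant part of ~/.bash_profile could look like afterwards, adding export so that child processes such as spark-shell also see the variables (the Downloads path is the one assumed above):
# in ~/.bash_profile
export SPARK_HOME=/home/cloudera/Downloads/spark-2.2.0-bin-hadoop2.6
export PATH=$PATH:$HOME/bin:$SPARK_HOME/bin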
Same answer as swapnil shashank, with the small modifications below (SPARK_LOCAL_IP goes in conf/spark-env.sh):
SPARK_LOCAL_IP=127.0.0.1
tar -xvzf spark-2.2.0-bin-hadoop2.7.tgz
What is the right syntax to copy from Windows to a remote HDFS?
I'm trying to copy a file from my local machine to a remote Hadoop cluster using RStudio:
rxHadoopCopyFromLocal("C:/path/to/file.csv", "/target/on/hdfs/")
This throws
copyFromLocal '/path/to/file.csv': no such file or directory
Notice the C:/ disappeared.
This syntax also fails
rxHadoopCopyFromLocal("C:\\path\\to\\file.csv", "/target/on/hdfs/")
with error
-copyFromLocal: Can not create a Path from a null string
This is a common mistake.
It turns out that the rxHadoopCopyFromLocal command is a wrapper around hdfs dfs -copyFromLocal. All it does is copy from a local filesystem to an HDFS target.
In this case rxSetComputeContext(remotehost) was set to a remote cluster, and on the remote machine there is no C:\path\to\file.csv.
Here are a couple of ways to get the files there.
Configure your local hdfs-site.xml for the remote HDFS cluster
Ensure you have hadoop tools installed on your local machine
Edit your local hdfs-site.xml to point to the remote cluster
Ensure rxSetComputeContext("local")
Run rxHadoopCopyFromLocal("C:\\local\\path\\to\\file.csv", "/target/on/hdfs/")
SCP and Remote Compute Context
Copy your file to the remote machine with scp C:\local\path\to\file.csv user@remotehost:/tmp
Ensure rxSetComputeContext(remotehost)
Run rxHadoopCopyFromLocal("/tmp/file.csv", "/target/on/hdfs/")
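For reference, since rxHadoopCopyFromLocal just wraps the Hadoop client, the second approach is equivalent to running something like this on the remote machine (paths as in the example above):
hdfs dfs -copyFromLocal /tmp/file.csv /target/on/hdfs/
hdfs dfs -ls /target/on/hdfs/        # confirm the file arrived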
The dev version of dplyrXdf now supports files in HDFS. You can upload a file from the native filesystem as follows; this works both from the edge node and from a remote client.
hdfs_upload("C:\\path\\to\\file.csv", "/target/on/hdfs")
If you have a dataset (an R object) that you want to upload, you can also use the standard dplyr copy_to verb. This will import the data to an Xdf file and upload it, returning an RxXdfData data source pointing to the uploaded file.
txt <- RxTextData("file.csv")
hd <- RxHdfsFileSystem()
hdfs_xdf <- copy_to(hd, txt, name="uploaded_xdf")
I downloaded Oracle Instant Client version 11.2.0.4.0 (the basic, sqlplus, and devel .rpm files) from the Oracle website on Ubuntu.
After converting the .rpm files into .deb using alien, I installed them: basic first, then sqlplus, and devel last.
Then I tried to run sqlplus.
But it says: sqlplus64: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory
Even though my PATH contains the library location.
The output below shows my PATH and the location of libsqlplus.so.
A@ubuntu:~$ sudo find / -name libsqlplus.so
/usr/lib/oracle/11.2/client64/lib/libsqlplus.so
A@ubuntu:~$ echo $PATH
/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/sangmin/eclipse:/usr/lib/oracle/11.2/client64/lib:/usr/lib/oracle/11.2/client64
Test your Oracle client. Use either sqlplus or sqlplus64, depending on your platform. In my case, I used:
$ sqlplus64 username/password@//dbhost:1521/SID
If you get the following message, then you need to instruct sqlplus to use the proper library:
sqlplus64: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory.
To do so, first find the location of the Oracle libraries. The path should be something like /usr/lib/oracle/<version>/client(64)/lib/. In my case (Ubuntu 14.04 LTS, Intel 64-bit), it was /usr/lib/oracle/11.2/client64/lib/.
Now add this path to the system library list. Create and edit a new file:
$ sudo nano /etc/ld.so.conf.d/oracle.conf
Add the path inside it:
/usr/lib/oracle/11.2/client64/lib/
Now run the dynamic linker run-time bindings utility:
$ sudo ldconfig
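To confirm that the linker now resolves the library (assuming the 11.2 client path above), you can check the ldconfig cache:
ldconfig -p | grep libsqlplus    # should list /usr/lib/oracle/11.2/client64/lib/libsqlplus.so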
If sqlplus complains about a missing libaio.so.1 file, run:
$ sudo apt-get install libaio1
For other errors when trying to run sqlplus, please consult the Ubuntu help page.
It might be worth checking for a permissions issue:
sqlplus: error while loading shared libraries
PERMISSIONS:
I want to stress the importance of permissions for "sqlplus".
For any "other" UNIX user (besides the owner/group) to be able to run sqlplus and access an Oracle database, read/execute (rx) permissions are required on these 4 directories:
$ORACLE_HOME/bin, $ORACLE_HOME/lib, $ORACLE_HOME/oracore, $ORACLE_HOME/sqlplus
ENVIRONMENT: set these variables properly:
A. ORACLE_HOME
(example: ORACLE_HOME=/u01/app/oranpgm/product/12.1.0/PRMNRDEV/)
B. LD_LIBRARY_PATH
(example: LD_LIBRARY_PATH=/u01/app/oranpgm/product/12.1.0/PRMNRDEV/lib)
C. ORACLE_SID
D. PATH
export PATH="$ORACLE_HOME/bin:$PATH"
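Putting the four variables together, here is a minimal sketch for a shell profile; the paths are just the examples from above and the SID is a placeholder, so adjust them for your installation:
export ORACLE_HOME=/u01/app/oranpgm/product/12.1.0/PRMNRDEV
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export ORACLE_SID=ORCL                  # placeholder SID; use your own database's SID
export PATH="$ORACLE_HOME/bin:$PATH"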
I installed Hadoop, created a user named hduser, and changed the owner of the hadoop folder to hduser.
After installing Hadoop I tried to execute the hadoop command to check whether it is installed or not, but it gives "hadoop: command not found".
I then gave hduser execute permission on all the files inside the hadoop folder, including the bin folder,
but the output is still the same.
When I try the same hadoop command as the root user, it works fine.
I think this is related to Unix permissions. Please help me give my user the privilege to execute the hadoop command.
One more thing: if I switch to root, the hadoop commands work fine.
It is not a problem of privileges. You can still execute hadoop, if you type /usr/local/hadoop/bin/hadoop, right?
The problem is that $PATH is user-specific.
You have to add $HADOOP_HOME/bin to the $PATH as hduser, not as root. Log in as hduser first (or just type su hduser) and then run export PATH=$PATH:$HADOOP_HOME/bin, as @iamkristian suggests, where $HADOOP_HOME is the directory in which you have placed hadoop (usually /usr/local/hadoop). A persistent version of this is sketched below.
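A minimal sketch of making this permanent for hduser, assuming hadoop lives in /usr/local/hadoop:
echo 'export HADOOP_HOME=/usr/local/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bashrc
source ~/.bashrc
hadoop version        # should now work for hduser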
It sounds like hadoop isn't in your path. You can test that with
which hadoop
If that gives you "command not found", then you probably just need to add it to your path. Depending on where you installed hadoop, you need to add this to your ~/.bashrc:
export PATH=$PATH:/usr/local/hadoop/bin/
And then reopen your terminal