Want to copy files from HDFS to local machine - bigdata

Trying to copy files from HDFS to my local machine using copyToLocal with the following command:
hadoop fs -copyToLocal <remotePath (HDFS file)> <destinationPath (my local path)>
But I am getting the following error:
No such file or directory
Please help me with this.

You can copy data from HDFS to the local filesystem in either of two ways:
bin/hadoop fs -get /hdfs/source/path /localfs/destination/path
bin/hadoop fs -copyToLocal /hdfs/source/path /localfs/destination/path
Alternatively, download the file through the HDFS web UI: point your web browser at namenode_machine:50070, browse to the file, and download it.
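The "No such file or directory" error usually means either the HDFS source path or the local destination directory does not exist. A minimal check-then-copy sketch, with purely hypothetical paths:
# Verify the HDFS source path actually exists
hadoop fs -ls /user/me/data/file.txt
# Make sure the local destination directory exists
mkdir -p /home/me/downloads
# Then either form of the copy works
hadoop fs -get /user/me/data/file.txt /home/me/downloads/
hadoop fs -copyToLocal /user/me/data/file.txt /home/me/downloads/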

Related

Can't read CSV file on HDFS - Hadoop

I'm trying to read a CSV file but I get: No such file or directory.
The file is in the tmp folder.
These are the commands:
Your file is not at hdfs:///user/hdfs/titles.csv, and that is what the error is saying.
You are only showing ls, not hdfs dfs -ls, so you should just be using cat titles.csv.
If you want to read the file from HDFS, you need to run hdfs dfs -put titles.csv /user/hdfs/ first (and create the user directory with hdfs dfs -mkdir -p /user/hdfs if it doesn't already exist).
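Put together, the whole sequence looks roughly like this, assuming the file currently sits at /tmp/titles.csv on the local machine (as the question suggests):
# Read the local copy directly
cat /tmp/titles.csv
# Or upload it to HDFS first and read it from there
hdfs dfs -mkdir -p /user/hdfs
hdfs dfs -put /tmp/titles.csv /user/hdfs/
hdfs dfs -cat /user/hdfs/titles.csv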

Move multiple files from local Unix to HDFS

I have several files in a Unix directory that I have to move to Hadoop. I know the copyFromLocal command:
Usage: hadoop fs -copyFromLocal <localsrc> URI
but that lets me move them only one at a time.
Is there any way to move all of those files to HDFS in one command? I want to know whether there is a way to transfer several files at once.
The put command will work.
To copy a whole directory from local to HDFS:
hadoop fs -put /path1/file1 /pathx/target/
To copy all files from a directory to HDFS in one go:
hadoop fs -put /path1/file1/* /pathx/target/
The put command supports multiple sources: "Copy single src, or multiple srcs from local file system to the destination file system."
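Several files can also be listed explicitly in a single invocation; a small sketch with hypothetical file names:
# Multiple local sources in one command; the last argument is the HDFS destination
hadoop fs -put report1.xls report2.xls report3.xls /pathx/target/
# copyFromLocal accepts the same multi-source form
hadoop fs -copyFromLocal report1.xls report2.xls /pathx/target/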

How to rsync a file which does not have an extension?

I'm trying to write a script on a Synology NAS to copy a system file (a file containing the CPU temperature value) to another server. The file does not have an extension. I always get the error:
rsync: read errors mapping "/sys/class/hwmon/hwmon0/device/temp2_input": No data available (61)
Please note that I have already set up private/public keys so that rsync runs without prompting for the remote server's password. I've tried the rsync command in a terminal and it produces the same result. The location of the file is definitely correct.
Need your help.
cd /
rsync -e ssh /sys/class/hwmon/hwmon0/device/temp2_input bthoven@192.168.x.xx:/usr/share/hassio/homeassistant/syno
rsync: read errors mapping "/sys/class/hwmon/hwmon0/device/temp2_input": No data available (61)
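Files under /sys are virtual, and their reported size does not match what can actually be read, which rsync's file-mapping logic handles poorly. A common workaround, offered here only as a sketch under that assumption, is to snapshot the value into an ordinary file and transfer the copy:
# Snapshot the sysfs value into a regular temp file, then rsync the copy
cat /sys/class/hwmon/hwmon0/device/temp2_input > /tmp/temp2_input
rsync -e ssh /tmp/temp2_input bthoven@192.168.x.xx:/usr/share/hassio/homeassistant/syno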

zip command in Unix with wildcards

I am trying to zip files matching Amazon*.xls in Unix and also remove the source files after compression. Below is the command I used:
zip -m Amazon`date +%Y-%m-%d:%H:%M:%S`.zip Amazon*.xls
For the above command I am getting the error below:
zip I/O error: No such file or directory
zip error: Could not create output file Amazon.zip
PS: gzip is working fine, but I need zip-format files.
It is not zip; it is how your shell deals with expanding/substituting variables. A two-line solution for bash:
export mydate=`date +%Y-%m-%d:%H:%M:%S`
zip -m Amazon_$mydate.zip *matrix*
Execute the two lines by hand (the timestamp will just be a few seconds off) or, better, put them in a shell script myzipper.sh and source it.
Use -p instead of -m if the zip files are to be extracted on Windows:
export mydate=`date +%Y-%m-%d:%H:%M:%S`
zip -p Amazon_$mydate.zip *matrix*
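Applied to the original Amazon*.xls files, the same two-step approach would look roughly like this; the underscore-separated timestamp is my own choice, since colons are not accepted in file names on Windows:
# Capture the timestamp once, then let the shell expand the wildcard separately
export mydate=`date +%Y-%m-%d_%H-%M-%S`
zip -m Amazon_$mydate.zip Amazon*.xls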

SparkR: how to access files passed with --files in yarn-cluster mode

I am submitting a SparkR job to run on a YARN cluster in cluster mode with the ./bin/spark-submit script. I need to upload a file (an external dataset) with the --files option. This uploads the file to a temporary HDFS directory, but I need the path where the file ends up so that I can reference it directly in my SparkR code.
For Java and PySpark, files distributed using --files can be accessed via the SparkFiles.get(filename) method, which returns the absolute path of filename. Is there an equivalent in SparkR?
I know we can work around the problem in different ways:
Put the files manually on HDFS
Deploy the files on the worker nodes
But I want to use this option for convenience.
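For reference, a minimal submission sketch (the script and data file names are hypothetical). With --files in yarn-cluster mode, YARN localizes the file into the container's working directory, so the driver can usually open it by its bare name; SparkR on Spark 2.1+ also exposes spark.getSparkFiles() as a rough counterpart to SparkFiles.get(). Both points should be verified against your Spark version.
# Ship an external dataset alongside a SparkR script
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files data.csv \
  analysis.R
# Inside analysis.R, the localized copy can be referenced as "data.csv",
# or located via spark.getSparkFiles("data.csv") on Spark 2.1+.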
