Can't read CSV file on HDFS - Hadoop - Unix

I'm trying to read a CSV file, but I get: No such file or directory.
The file is in the /tmp folder.
These are the commands:

Your file is not at hdfs:///user/hdfs/titles.csv, and that is what the error is saying.
You are only showing ls, not hdfs dfs -ls, so the file is on your local filesystem and you should be reading it with just cat titles.csv.
If you want to read the file from HDFS, you need to hdfs dfs -put titles.csv /user/hdfs/ first (and create the user directory with hdfs dfs -mkdir -p /user/hdfs if it doesn't already exist).
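A minimal sketch of both options, assuming the file sits at /tmp/titles.csv on the local filesystem:

# Read the local copy directly (no HDFS involved)
cat /tmp/titles.csv

# Or stage the file into HDFS first, then read it from there
hdfs dfs -mkdir -p /user/hdfs
hdfs dfs -put /tmp/titles.csv /user/hdfs/
hdfs dfs -cat /user/hdfs/titles.csv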

Related

Want to copy files from HDFS to local machine

Trying to copy files from HDFS to the local machine using copyToLocal with the following command:
hadoop fs -copyToLocal <remote path (HDFS file)> <destination path (my local path)>
But I am getting the following error:
No such file or directory: error
Please help me with this.
You can copy data from HDFS to the local filesystem in either of two ways:
bin/hadoop fs -get /hdfs/source/path /localfs/destination/path
bin/hadoop fs -copyToLocal /hdfs/source/path /localfs/destination/path
Another alternative: download the file via the HDFS web UI. Point your web browser at namenode_machine:50070, browse to the file, and download it.
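For example, a hypothetical invocation (the paths here are made up), checking first that the source actually exists in HDFS, since "No such file or directory" usually means the remote path is wrong:

# Verify the source path is really there
hadoop fs -ls /user/hdfs/data.csv
# Then copy it down (-get and -copyToLocal are equivalent)
hadoop fs -copyToLocal /user/hdfs/data.csv /home/user/data.csv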

Move multiple file from local unix to HDFS

I have several files in a Unix directory that I have to move to Hadoop. I know the copyFromLocal command
(Usage: hadoop fs -copyFromLocal <localsrc> URI), but that only lets me
move them one by one.
Is there any way to move all those files to HDFS in one command?
I want to know if there is a way to transfer several files at once.
The put command will work.
If you want to copy a whole directory from local to HDFS:
hadoop fs -put /path1/file1 /pathx/target/
If you want to copy all files from a directory to HDFS in one go:
hadoop fs -put /path1/file1/* /pathx/target/
The put command supports multiple sources:
"Copy single src, or multiple srcs from local file system to the destination file system."
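So several named files can also be passed in a single call; a sketch with hypothetical file names:

# Multiple sources are allowed when the destination is a directory
hadoop fs -put file1.txt file2.txt file3.txt /pathx/target/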

Convert xlsx file to csv file in R when xlsx file is present in hdfs

I want to know how we can convert an .xlsx file residing in HDFS to a .csv file using an R script.
I tried using the XLConnect and xlsx packages, but they give me a 'file not found' error. I am providing the HDFS location as input in the R script using the above packages. I am able to read .csv files from HDFS using an R script (read.csv()).
Do I need to install any new packages for reading an .xlsx present in HDFS?
Sharing the code I used:
library(XLConnect)
d1=readWorksheetFromFile(file='hadoop fs -cat hdfs://............../filename.xlsx', sheet=1)
"Error: FileNotFoundException (Java): File 'filename.xlsx' could not be found - you may specify to automatically create the file if not existing."
I am sure the file is present at the specified location.
I hope my question is clear. Please suggest a method to resolve it.
Thanks in advance!
hadoop fs isn't a file, but a command that copies a file from HDFS to your local filesystem. Run this command from outside R (or from inside R using system()), and then open the spreadsheet from the local path.
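A sketch of that two-step approach (the HDFS path is left elided as in the question, and the local target is hypothetical):

# Copy the workbook out of HDFS to the local filesystem first
hadoop fs -get hdfs://............../filename.xlsx /tmp/filename.xlsx
# then, inside R: readWorksheetFromFile(file='/tmp/filename.xlsx', sheet=1)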

zip command in unix with wildcards

I am trying to zip files matching Amazon*.xls in Unix and also remove the source files after compression. Below is the command used:
zip -m Amazon`date +%Y-%m-%d:%H:%M:%S`.zip Amazon*.xls
For the above command I am getting the error below:
zip I/O error: No such file or directory
zip error: Could not create output file Amazon.zip
PS: GZIP is working fine. I need zip format files.
It is not zip; it is how your shell deals with expanding/substituting variables. A two-line solution for bash:
export mydate=`date +%Y-%m-%d:%H:%M:%S`
zip -m Amazon_$mydate.zip *matrix*
Execute it by hand (a few seconds' difference) or, better, put it in a shell script myzipper.sh and just source it.
Use '-p' instead of '-m' if the zip files are to be extracted on Windows.
export mydate=`date +%Y-%m-%d:%H:%M:%S`
zip -p Amazon_$mydate.zip *matrix*
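An equivalent one-liner sketch using $() substitution, with the original Amazon*.xls pattern and a colon-free timestamp (colons are not valid in Windows file names):

zip -m "Amazon_$(date +%Y-%m-%d_%H-%M-%S).zip" Amazon*.xls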

How to access an HDFS file path in normal R commands (installed packages: rmr2, rhdfs)?

I have zip files in HDFS. I am going to write a MapReduce program in R. R has a command to unzip a zip file:
unzip("filepath")
but it does not accept my HDFS file path. I tried:
unzip(hdfs.file("HDFS file path"))
and it throws an error:
invalid path argument
Is there any way to give an HDFS file path to my R commands?
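As with the spreadsheet question above, unzip() only understands local paths, so one workaround (a sketch; the paths are hypothetical) is to pull the archive out of HDFS first and unzip the local copy:

# Fetch the archive from HDFS to local disk
hadoop fs -get /hdfs/path/archive.zip /tmp/archive.zip
# then, inside R: unzip("/tmp/archive.zip")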
