How to recover an overwritten directory in Cloudera - unix

I was using Hive in my Cloudera VM.
I used the command below to write the output of my HQL statement to an output file.
`INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/output'
SELECT * from City; ....`
After getting the output, I found that all my files in the cloudera directory got overwritten; I can see only my output file in that path.
Is there any way I can undo this or recover all the files that I've lost?
My hive.log file is below for reference.

You can try searching the Hive trash folder (/user/hive/.Trash). I think you will find your lost files there.
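If that trash folder is on HDFS (which the path suggests, though that is an assumption here), a quick way to look through it is to list it recursively with the hdfs CLI, for example:
# Minimal sketch: recursively list the Hive trash folder mentioned above.
# Assumes the path is on HDFS and that the `hdfs` CLI is on this machine's PATH.
import subprocess

subprocess.run(["hdfs", "dfs", "-ls", "-R", "/user/hive/.Trash"], check=True)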

Related

Reading files present in a directory in a remote folder through SFTP

TL;DR: Convert the bash line that downloads SFTP files, get Inbox/*, to C++ or Python. We do not have execute permissions on the Inbox directory.
I am trying to read the files present in a directory on a remote server through SFTP. The catch is that I only have read and write permissions on the directory, not execute. This means any method that requires opening (cd-ing into) the folder would fail. I need to read the file names since they are variable. From what I understand, ls does not require execute privileges. If I can get a list of the files in the directory, then reading them would be fine. Here is the directory structure:
Inbox
--file-a.txt
--file_b.txt
...
I have tried libssh, but sftp_readdir requires a handle to the open directory. I also looked at Paramiko for Python, but that too requires opening the directory to read the file names.
I am able to do this in bash using send "get Inbox/* ${destination_dir}". Is there any way I can use a similar pattern match in C++ or Python?
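For reference, a rough Paramiko sketch of what that bash get Inbox/* does (host, credentials, and destination are placeholders); whether listdir() succeeds without execute permission on Inbox is exactly what I'm unsure about:
# Rough Paramiko equivalent of `get Inbox/*`; host/credentials/destination are placeholders.
import paramiko

transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)
try:
    for name in sftp.listdir("Inbox"):             # list remote file names
        sftp.get("Inbox/" + name, "/tmp/" + name)  # download each file
finally:
    sftp.close()
    transport.close()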
Also, I cannot execute bash commands through my binary. Does anyone know of any library in Python or C++ (preferred) that would support this?
I have not posted here in a while, so please excuse me if I am not following the formatting. I will learn from your suggestions. Thank you!

How to restore a deleted Jupyter notebook file

I accidentally deleted a Jupyter notebook file on my Google Cloud instance. I wonder if there's any way to restore/recover the file?
Thanks to this link, I found the solution. Files deleted in the browser should probably be in a Trash folder. In my case, on my Google Cloud instance, the deleted files were in the following path:
cd ~/.local/share/Trash/files/
Use ls to list the files and see whether your file is in this folder. If it is, you can simply use the mv command to move your deleted file back to the path you want.
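The same steps can be scripted if that's more convenient; a small sketch, where the notebook name is just a placeholder:
# List the local Trash folder and move a deleted notebook back home.
# "my_notebook.ipynb" is a placeholder for whatever file name you find.
from pathlib import Path
import shutil

trash = Path.home() / ".local/share/Trash/files"
for f in trash.iterdir():          # equivalent of `ls` in the trash folder
    print(f.name)
shutil.move(str(trash / "my_notebook.ipynb"), str(Path.home()))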

pyspark: how to show current directory?

Hi, I'm using pyspark interactively. I think I'm failing to load a LOCAL file correctly.
How do I check the current directory, so that I can go to a browser and take a look at the actual file?
Or is the default directory the one where pyspark is? Thanks.
You can't load a local file unless you have the same file on all workers under the same path. For example, if you want to read a data.csv file in Spark, copy this file to all workers under the same path (say /tmp/data.csv). Then you can use sc.textFile("file:///tmp/data.csv") to create an RDD.
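For example, a minimal sketch along those lines (assuming data.csv has already been copied to /tmp/data.csv on every worker):
# Read a file from the local filesystem of the workers; the file must
# already exist at /tmp/data.csv on every node for this to work.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()             # in the pyspark shell, `sc` already exists
rdd = sc.textFile("file:///tmp/data.csv")   # file:// forces the local filesystem
print(rdd.take(5))                          # preview the first few lines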
The current working directory is the folder from which you started pyspark. You can start pyspark using ipython and run the pwd command to check the working directory.
[Set PYSPARK_DRIVER_PYTHON=/path/to/ipython in spark-env.sh to use ipython]
import os
cwd = os.getcwd()  # the directory from which pyspark was launched
print(cwd)

Creating temp folder in Linux

I was using R installed on a Linux server over SSH. Everything was fine, but now I have been denied access to the temp folder, and when I load R it gives the error cannot create 'R_TempDir', since it can't create the temp folder.
Can you please tell me how to create my own local temp folder so that R can create its temporary directory there?
You can try to set one of these environment variables:
TMPDIR, TMP, TEMP:
Consulted (in that order) when setting the temporary directory for the session: see tempdir. TMPDIR is also used by some of the utilities; see the help for build.
by doing, for instance:
export TMPDIR=/tmp
(or, since access to the system temp folder is the problem here, point it at a directory you own, e.g. mkdir -p $HOME/tmp and then export TMPDIR=$HOME/tmp).
source
Hope this answers your question.
From what I understand,
I just thought that you could use the .bashrc file in your /home/username/ directory:
~# nano /home/username/.bashrc
You can put the command to create the folder inside this .bashrc file by just adding this line: mkdir -p /your/dir/path/yourDir
This file works like an autorun script: it runs every time you start a shell session on your Linux server.
But this only works as a per-user setting.

Hive looks in an HDFS private directory for the jar

How does add jar work in Hive? When I add a local jar file,
add jar /users/course/jars/json-serde-1.3.1.jar;
the Hive query fails and says it could not find the jar in HDFS, in the same directory path.
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/users/course/jars/json-serde-1.3.1.jar)
Then I put the jar into HDFS and added it using that HDFS file path.
add jar hdfs://localhost/users/course/jars/json-serde-1.3.1.jar;
Now the Hive query says
File does not exist: hdfs://localhost:9000/private/var/folders/k5/bn104n8s72sdpg3tg7d8kkpc0000gn/T/a598a513-d7c9-4d55-9280-b6554487cac7_resources/json-serde-1.3.1.jar
I have no idea why it keeps looking for the jar in the wrong places.
I believe Hive looks for the JAR locally, not on HDFS.
So if my home directory on the gateway server is
pwd
/home/my_username/
And the JAR is sitting locally at:
/home/my_username/hive_udfs/awesomeness.jar
Then I'd go into the hive shell and run:
add jar /home/my_username/hive_udfs/awesomeness.jar
At least, that works for me in my environment. HTH. Good luck! :)
