To list files based on size in Unix

GOAL: To fetch the list of files occupying the most space in Unix, using the command below:
ssh serverName du /folderName/* | grep -v 'cannot' | sort -nr | head -10
sort -nr is used to treat the first field as numeric and sort in reverse order (so the entries occupying the most space come first).
grep -v 'cannot' is used because a few folders are not accessible, and those error lines must be ignored before sorting.
Below is the sample output
624 /folder1/folder2/conf
16 /folder1/folder2/error/include
192 /folder1/folder2/error
284 /folder1/folder2/htdocs
264 /folder1/folder2/icons/small
du: cannot read directory `/folder1/folder2/file1': Permission denied
du: cannot read directory `/folder1/folder2/file3': Permission denied
The problem is that the grep and sort commands are not filtering out these error messages.

You need to redirect stderr to stdout using 2>&1 so that you can grep out the error messages. You should also escape the wildcard so that it gets expanded on the remote machine, not on the local one.
ssh serverName du /folderName/\* 2>&1 | grep -v 'cannot' | sort -nr | head -10

You don't need the grep if you close stderr.
ssh serverName du /folderName/\* 2>&- | sort -nr | head -10
Note that the wildcard is escaped.
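Quoting the whole remote command works too, and silencing stderr with 2>/dev/null keeps the pipeline clean without grep (a sketch; the second variant assumes GNU du and sort on the remote host for the -h flags):
ssh serverName 'du /folderName/* 2>/dev/null | sort -nr | head -10'
ssh serverName 'du -h /folderName/* 2>/dev/null | sort -hr | head -10'   # human-readable sizes
Quoting has the same effect as escaping the wildcard: the glob expands on the remote machine.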

Related

Get size of file on remote server

I have a requirement where I need to make an SFTP connection to a remote server, get the size of a file on the remote server and, depending on the size, fetch the file onto the local server.
Is there any command in SFTP to get the size of a file?
If you'd like the size output to be human readable, try: ls -lah
You can get the size of the remote files using the ls command with the right parameters.
To get the size of the files: ls -l
To get the size of the files (hidden files included): ls -al
To get it in human-readable format: ls -lh or ls -alh
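If you need the size without opening an interactive session, sftp can also read its commands from standard input (a sketch; the user, host and path are placeholders, and non-interactive key-based authentication is assumed):
echo "ls -l /remote/path/file.txt" | sftp -b - user@host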
You can get just the size by combining James's answer with awk:
ls -l | grep "filename" | awk '{print $5}'
If you are using it in a script and want to test the size, you can store it in a variable like so:
varname=$(ls -l | grep "filename" | awk '{print $5}')
Then call sftp and perform the transfer.
For a remote file, something like this should work:
filesize=$(ssh user@domain.ex "ls -l | grep 'filename' | awk '{print \$5}'")
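To then fetch the file only when it is small enough, here is a rough sketch (the host, path and 100 MB threshold are made-up placeholders; key-based authentication is assumed, and scp is used for the transfer, though an sftp get would work the same way):
filesize=$(ssh user@host "ls -l /remote/path/file.txt | awk '{print \$5}'")   # size in bytes (5th field of ls -l)
if [ "$filesize" -lt 104857600 ]; then
    scp user@host:/remote/path/file.txt /local/path/
fi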

Concatenating input to the svn list command with its output, then passing it to grep

I currently have the following shell command which is only partially working:
svn list $myrepo/libs/ |
xargs -P 10 -L 1 -I {} echo $myrepo/libs/ {} trunk |
sed 's/ //g' |
xargs -P 20 -L 1 svn list --depth infinity |
grep .xlsx
where $myrepo corresponds to the svn server address.
The libs folder contains a number of subfolders (currently about 30, though eventually up to 100), each of which contains a number of tags, branches and a trunk. I wish to get a list of xlsx files contained only within the trunk folder of each of these subfolders. The command above works fine; however, it only returns the relative path from $myrepo/libs/subfolder/trunk/, so I get this back:
1/2/3/file.xlsx
Because of the potentially large number of files I would have to search through, I am performing the search in two parallel steps using xargs -P (I do not have and cannot use GNU parallel). I am also trying to do this in one command so it can be used from PHP/Perl/etc. and avoid multiple system calls.
What I would like to do is concatenate the input to this part of the command:
xargs -P 20 -L 1 svn list --depth infinity
with the output from it, to give the following:
$myrepo/libs/subfolder/trunk/1/2/3/file.xlsx
Then pass this to the grep to find the xlsx files.
I appreciate any assistance that could be provided.
If I manage to correctly divine your intention, something like this might work for you.
svn list "$myrepo/libs/" |
xargs -P 20 -n 1 sh -c 'svn list -R "$0/libs/${1%/}/trunk" |
sed -n "s%.*\.xlsx$%$0/libs/${1%/}/trunk/&%p"' "$myrepo"
Briefly, we postprocess the output from the inner svn list to filter to just .xlsx files and tack the full SVN path back on at the same time. This way, the processing happens where the repo path is still known.
We hack things a bit by passing in "$myrepo" as "$0" to the subordinate sh so we don't have to export this variable. The input from the outer svn list comes as $1.
(The repos I have access to have a slightly different layout so there could be a copy/paste error somewhere.)
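The $0/$1 passing can be seen in isolation with a toy command (a minimal illustration; the repository URL and folder name are made up):
sh -c 'echo "repo=$0 folder=$1"' "https://svn.example.com/repo" "somelib"
# prints: repo=https://svn.example.com/repo folder=somelib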

Pipe output of cat to cURL to download a list of files

I have a list of URLs in a file called urls.txt. Each line contains one URL. I want to download all of the files at once using cURL. I can't seem to get the right one-liner down.
I tried:
$ cat urls.txt | xargs -0 curl -O
But that only gives me the last file in the list.
This works for me:
$ xargs -n 1 curl -O < urls.txt
I'm in FreeBSD. Your xargs may work differently.
Note that this runs sequential curls, which you may view as unnecessarily heavy. If you'd like to save some of that overhead, the following may work in bash:
$ mapfile -t urls < urls.txt
$ curl ${urls[@]/#/-O }
This saves your URL list to an array, then expands the array with options to curl to cause the targets to be downloaded. The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1), but it needs the -O option before each one in order to download and save each target. Note that special characters within some URLs may need to be escaped to avoid them interacting with your shell.
Or if you are using a POSIX shell rather than bash:
$ curl $(printf ' -O %s' $(cat urls.txt))
This relies on printf's behaviour of repeating the format pattern to exhaust the list of data arguments; not all stand-alone printfs will do this.
Note that this non-xargs method also may bump up against system limits for very large lists of URLs. Research ARG_MAX and MAX_ARG_STRLEN if this is a concern.
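The printf behaviour relied on above is easy to check on its own (a quick illustration):
printf ' -O %s' one two three
# prints: -O one -O two -O three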
A very simple solution would be the following:
If you have a file 'file.txt' like
url="http://www.google.de"
url="http://www.yahoo.de"
url="http://www.bing.de"
Then you can use curl and simply do
curl -K file.txt
curl will then fetch all the URLs contained in your file.txt!
So if you have control over your input-file-format, maybe this is the simplest solution for you!
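If your input starts out as a bare list of URLs instead, it can be converted into that config format first (a sketch; urls.txt and file.txt are the file names used above, and --remote-name-all makes every download keep its remote name):
sed 's|^|url = "|; s|$|"|' urls.txt > file.txt   # wrap each line as url = "..."
curl --remote-name-all -K file.txt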
Or you could just do this:
cat urls.txt | xargs curl -O
You only need to use xargs' -I parameter when you want to insert the cat output in the middle of a command.
Parallel downloads with xargs -P and curl
GNU xargs -P can run multiple curl processes in parallel. E.g. to run 10 processes:
xargs -P 10 -n 1 curl -O < urls.txt
This will speed up the download up to 10x if your maximum download speed is not reached and if the server does not throttle IPs, which is the most common scenario.
Just don't set -P too high or your RAM may be overwhelmed.
GNU parallel can achieve similar results.
The downside of those methods is that they don't use a single connection for all files, which is what curl does if you pass multiple URLs to it at once, as in:
curl -o out1.txt http://exmple.com/1 -o out2.txt http://exmple.com/2
as mentioned at https://serverfault.com/questions/199434/how-do-i-make-curl-use-keepalive-from-the-command-line
Maybe combining both methods would give the best results? But I imagine that parallelization is more important than keeping the connection alive.
See also: Parallel download using Curl command line utility
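If you do want to combine the two ideas, one rough sketch is to let xargs hand each curl process a batch of URLs, so each process reuses its connection within its batch (the batch size of 25 and the parallelism of 4 are arbitrary choices):
xargs -P 4 -n 25 curl --remote-name-all < urls.txt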
Here is how I do it on a Mac (OSX), but it should work equally well on other systems:
What you need is a text file that contains your links for curl
like so:
http://www.site1.com/subdirectory/file1-[01-15].jpg
http://www.site1.com/subdirectory/file2-[01-15].jpg
.
.
http://www.site1.com/subdirectory/file3287-[01-15].jpg
In this hypothetical case, the text file has 3287 lines and each line encodes 15 pictures.
Let's say we save these links in a text file called testcurl.txt on the top level (/) of our hard drive.
Now we have to go into the terminal and enter the following command in the bash shell:
for i in `cat /testcurl.txt` ; do curl -O "$i" ; done
Make sure you are using back ticks (`)
Also make sure the flag (-O) is a capital O and NOT a zero
With the -O flag, the original filename will be used.
Happy downloading!
As others have rightly mentioned:
-cat urls.txt | xargs -0 curl -O
+cat urls.txt | xargs -n1 curl -O
However, this paradigm is a very bad idea, especially if all of your URLs come from the same server -- you're not only going to be spawning another curl instance, but will also be establishing a new TCP connection for each request, which is highly inefficient, and even more so with the now ubiquitous https.
Please use this instead:
-cat urls.txt | xargs -n1 curl -O
+cat urls.txt | wget -i/dev/fd/0
Or, even simpler:
-cat urls.txt | wget -i/dev/fd/0
+wget -i/dev/fd/0 < urls.txt
Simplest yet:
-wget -i/dev/fd/0 < urls.txt
+wget -iurls.txt

I lost nginx.pid, it disappeared

Here is part of my nginx.conf:
pid /www/nginx0836/nginx.pid;
When I restart nginx and run ls /www/nginx0836 within a few seconds, nginx.pid is listed.
But running ls /www/nginx0836 again several seconds later, nginx.pid is no longer listed.
Why?
By the way, the nginx server works fine, and when I run
ps -ef | grep "nginx: master process" | grep -v "grep" | awk -F ' ' '{print $2}'
I can see the nginx PID.
Try monitoring the folder with incrond and logging any change on that directory with $@ $#.
Maybe you will see something like puppet or an rsync job deleting the PID file.
/www/nginx0836 IN_DELETE echo "$@ $#"
This will log any delete event on the directory.
It is simpler than audit.
Sorry for the poor English.
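If incrond is not available, the Linux audit framework can watch the directory as well (a sketch, assuming auditd is installed and you are root; the key name nginx_pid is arbitrary):
auditctl -w /www/nginx0836/ -p wa -k nginx_pid   # watch writes and attribute changes, which includes deletions
ausearch -k nginx_pid                            # later, shows which process touched the directory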
Try the default configuration for nginx; you will find a similar problem described there.

single quotes not working in shell script

I have a .bash_profile script and I can't get the following to work
alias lsls='ls -l | sort -n +4'
when I type the alias lsls
it does the sort but then posts this error message
"-bash: +4: command not found"
How do I get the alias to work with '+4'?
It works when I type ls -l | sort -n +4 on the command line.
I'm in OS X 10.4
Thanks for any help
bash-4.0$ ls -l | sort -n +4
sort: open failed: +4: No such file or directory
You need ls -l | sort -n -k 5; GNU sort is different from BSD sort.
alias lsls='ls -l | sort -n -k 5'
Edit: updated to reflect the change from 0-based to 1-based indexing, thanks Matthew.
alias lsls='ls -l | sort -n +4' should work fine with the sort in OS X 10.4 (which does support that syntax).
when I type the alias lsls it does the sort but then posts this error message "-bash: +4: command not found"
Is it possible that you inserted a stray newline when editing your .bash_profile? e.g. if you ended up with something like this:
alias lsls='ls -l | sort -n
+4'
...that might explain the error message.
As an aside, you can get the same effect without piping through sort at all, using:
ls -lrS
This link discusses a very similar alias containing a pipe.
The problem may not have been the pipe, but the interesting solution was to use a function.
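For completeness, a function version of the alias from this thread might look like the following (a small sketch using the -k syntax shown above, which works with both GNU and BSD sort):
lsls() {
    ls -l "$@" | sort -n -k 5
}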
