My goal is to be able to read new messages from a gmail account via a linux server. I guess I could do this via IMAP or something, but I'd like to avoid that complexity if possible given that gmail has this nice feed set up:
https://mail.google.com/mail/feed/atom/
The only issue is that I'm not sure how to authenticate the call to pull this. Is this possible?
A good starting point should be:
curl -u username:password --silent "https://mail.google.com/mail/feed/atom" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | sed -n "s/<title>\(.*\)<\/title.*name>\(.*\)<\/name>.*/\2 - \1/p"
Checks the Gmail ATOM feed for your account, parses it and outputs a list of unread messages.
Also, see this thread: http://www.commandlinefu.com/commands/view/3380/check-your-unread-gmail-from-the-command-line
OTOH, I would recommend using mutt and IMAP.
Related
Using tcpdump, I am capturing network traffic. I am interested in extracting the actual TCP payload data, i.e. HTTP traffic in my particular case.
I tried to achieve that using scapy, but I only found function remove_payload(). Is there a corresponding counterpart? Or do you know of any other tools that provide such functionality?
Unfortunately, I did not find a satisfactory scapy documentation.
You can read a pcap with Scapy easily with rdpcap, you can then use the Raw (right above TCP) layer of your packets to play with HTTP content:
from scapy.all import *
pcap = rdpcap("my_file.pcap")
for pkt in pcap:
if Raw in pkt:
print pkt[Raw]
In case other users might have similar questions: I ended up using the following script:
infile=infile.pcap
outfile=outfile
ext=txt
rm -f ${outfile}_all.${ext}
for stream in $(tshark -nlr $infile -Y tcp.flags.syn==1 -T fields -e tcp.stream | sort -n | uniq | sed 's/\r//')
do
echo "Processing stream $stream: ${outfile}_${stream}.${ext}"
tshark -nlr $infile -qz "follow,tcp,raw,$stream" | tail -n +7 | sed 's/^\s\+//g' | xxd -r -p | tee ${outfile}_${stream}.${ext} >> ${outfile}_all.${ext}
done
The following did not work.
wget -r -A .pdf home_page_url
It stop with the following message:
....
Removing site.com/index.html.tmp since it should be rejected.
FINISHED
I don't know why it only stops in the starting url, do not go into the links in it to search for the given file type.
Any other way to recursively download all pdf files in an website. ?
It may be based on a robots.txt. Try adding -e robots=off.
Other possible problems are cookie based authentication or agent rejection for wget.
See these examples.
EDIT: The dot in ".pdf" is wrong according to sunsite.univie.ac.at
the following cmd works for me, it will download pictures of a site
wget -A pdf,jpg,png -m -p -E -k -K -np http://site/path/
This is certainly because of the links in the HTML don't end up with /.
Wget will not follow this has it think it's a file (but doesn't match your filter):
page
But will follow this:
page
You can use the --debug option to see if it's the actual problem.
I don't know any good solution for this. In my opinion this is a bug.
In my version of wget (GNU Wget 1.21.3), the -A/--accept and -r/--recursive flags don't play nicely with each other.
Here's my script for scraping a domain for PDFs (or any other filetype):
wget --no-verbose --mirror --spider https://example.com -o - | while read line
do
[[ $line == *'200 OK' ]] || continue
[[ $line == *'.pdf'* ]] || continue
echo $line | cut -c25- | rev | cut -c7- | rev | xargs wget --no-verbose -P scraped-files
done
Explanation: Recursively crawl https://example.com and pipe log output (containing all scraped URLs) to a while read block. When a line from the log output contains a PDF URL, strip the leading timestamp (25 characters) and tailing request info (7 characters) and use wget to download the PDF.
I have a list URLs in a file called urls.txt. Each line contains 1 URL. I want to download all of the files at once using cURL. I can't seem to get the right one-liner down.
I tried:
$ cat urls.txt | xargs -0 curl -O
But that only gives me the last file in the list.
This works for me:
$ xargs -n 1 curl -O < urls.txt
I'm in FreeBSD. Your xargs may work differently.
Note that this runs sequential curls, which you may view as unnecessarily heavy. If you'd like to save some of that overhead, the following may work in bash:
$ mapfile -t urls < urls.txt
$ curl ${urls[#]/#/-O }
This saves your URL list to an array, then expands the array with options to curl to cause targets to be downloaded. The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1), but it needs the -O option before each one in order to download and save each target. Note that characters within some URLs ] may need to be escaped to avoid interacting with your shell.
Or if you are using a POSIX shell rather than bash:
$ curl $(printf ' -O %s' $(cat urls.txt))
This relies on printf's behaviour of repeating the format pattern to exhaust the list of data arguments; not all stand-alone printfs will do this.
Note that this non-xargs method also may bump up against system limits for very large lists of URLs. Research ARG_MAX and MAX_ARG_STRLEN if this is a concern.
A very simple solution would be the following:
If you have a file 'file.txt' like
url="http://www.google.de"
url="http://www.yahoo.de"
url="http://www.bing.de"
Then you can use curl and simply do
curl -K file.txt
And curl will call all Urls contained in your file.txt!
So if you have control over your input-file-format, maybe this is the simplest solution for you!
Or you could just do this:
cat urls.txt | xargs curl -O
You only need to use the -I parameter when you want to insert the cat output in the middle of a command.
xargs -P 10 | curl
GNU xargs -P can run multiple curl processes in parallel. E.g. to run 10 processes:
xargs -P 10 -n 1 curl -O < urls.txt
This will speed up download 10x if your maximum download speed if not reached and if the server does not throttle IPs, which is the most common scenario.
Just don't set -P too high or your RAM may be overwhelmed.
GNU parallel can achieve similar results.
The downside of those methods is that they don't use a single connection for all files, which what curl does if you pass multiple URLs to it at once as in:
curl -O out1.txt http://exmple.com/1 -O out2.txt http://exmple.com/2
as mentioned at https://serverfault.com/questions/199434/how-do-i-make-curl-use-keepalive-from-the-command-line
Maybe combining both methods would give the best results? But I imagine that parallelization is more important than keeping the connection alive.
See also: Parallel download using Curl command line utility
Here is how I do it on a Mac (OSX), but it should work equally well on other systems:
What you need is a text file that contains your links for curl
like so:
http://www.site1.com/subdirectory/file1-[01-15].jpg
http://www.site1.com/subdirectory/file2-[01-15].jpg
.
.
http://www.site1.com/subdirectory/file3287-[01-15].jpg
In this hypothetical case, the text file has 3287 lines and each line is coding for 15 pictures.
Let's say we save these links in a text file called testcurl.txt on the top level (/) of our hard drive.
Now we have to go into the terminal and enter the following command in the bash shell:
for i in "`cat /testcurl.txt`" ; do curl -O "$i" ; done
Make sure you are using back ticks (`)
Also make sure the flag (-O) is a capital O and NOT a zero
with the -O flag, the original filename will be taken
Happy downloading!
As others have rightly mentioned:
-cat urls.txt | xargs -0 curl -O
+cat urls.txt | xargs -n1 curl -O
However, this paradigm is a very bad idea, especially if all of your URLs come from the same server -- you're not only going to be spawning another curl instance, but will also be establishing a new TCP connection for each request, which is highly inefficient, and even more so with the now ubiquitous https.
Please use this instead:
-cat urls.txt | xargs -n1 curl -O
+cat urls.txt | wget -i/dev/fd/0
Or, even simpler:
-cat urls.txt | wget -i/dev/fd/0
+wget -i/dev/fd/0 < urls.txt
Simplest yet:
-wget -i/dev/fd/0 < urls.txt
+wget -iurls.txt
This is my first question on stackoverflow!
I want to have a unix script that will run grep on the console output. Here is what my script does:
1. Telnet into a remote server (I have done this part successfully)
2. On successful login, the remote server displays outputs information on the console. I need to run grep on that console output (need help with this)
So, I need a script to run grep on the output appearing on the console.
Any thoughts??
Thanks,
Puneet
Use SSH instead. It's more secure and far easier to script.
ssh remoteusername#remotehost:/path/to/remote/script | grep 'something'
with appropriate key setup, it won't even prompt you for a password.
Have you tried I/O redirection? You could either do
$ your-command > output.txt
and then run grep on that file, or just directly pipe the output through grep like so
$ your-command | grep ...
See this article or google around for similar. There are probably thousands of good articles about this around the web.
Instead of telnet, I would suggest using netcat (nc). You could then pass your login credentials via standard input and grep the standard output (nc prints anything sent by the server on standard output).
nc <host> <port> <auth.txt | grep 'string'
What you want to do is probably using a pipe. You can probably see it in the above answers it's the | sign you see in the command. It may be difficult to locate on your keyboard, depending on the layout. (I have to admit it is not very often used).
Pipes will redirect the output of one command. Instead of sending it to the console, they will send it as an input of another command.
cmd1 | grep foo is equivalent to running grep foo on the output of cmd1 (you can replace cmd1 by your netstat command).
One last thing is that you can have as many pipes as you want. For instance on my machine I can run ls -ltr | tail -1 | awk '{print $9}' | grep foo to look for the word foo in the last modified file.
I am running a FreeBSD server and I have been sent a warning that spam has been sent from my server. I do not have it set as an open relay and I have customized the sendmail configuration. I'd like to know who is sending what email along with their username, email subject line as well as a summary of how much mail they have been sending. I would like to run a report on a log similar to how it is done when processing Apache server logs.
What are my options?
One idea is to alias sendmail to be a custom script, which simply cats the sendmail arguments to the end of a log before calling sendmail in the usual manner.
You can also monitor all system calls to write and read functions by executing:
ps auxw | grep sendmail | awk '{print"-p " $2}' | xargs strace -s 256 -f 2>&1 | grep -E $'#|(([0-9]+\.){3}[0-9]+)' | tee -a "/var/log/sendmail-logs.log"
This will give you direct access to the information, you cannot go deeper I think.
Can you give some sample logs? I think you're best bet would be to look through them with either grep or cut to get the source/destinations that are being sent too. Also, you could write a Perl script to automate it once you have the correct regex. This would be the best option.
If FreeBSD have default config, you have only one way to handle output mail, check what sending through you sendmail system in /etc/mail.
All output mail must be logged by /var/log/maillog