First question may take care of this. When capturing in tshark using fields, like "-e tcp.flags", is there a way to have the output show the flag label, like "FIN", instead of "0x1"? I've done a few searches through the documentation. It's probably right under my nose.
If not, then I need a function in my data pipeline to convert the hex into the labels. I thought about having a dictionary like "{'0x1':'FIN'}" and mapping it, but I'm not sure of all the flag combos that might appear.
So I am taking the hex string, converting it to an integer, then to a binary string. I turn that into a list "[0,0,0,0,0,1]" and use that like a filter against a label list like "[u, a, p, r, s, f]" that returns any labels joined, like "f" or "a_s". Using Python.
Is this function necessary? Is there a more efficient/elegant way to convert the hex to labels?
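Because each TCP flag occupies its own bit, a bitwise AND against each flag's mask decodes any combination without enumerating them in a dictionary or building a binary string. A minimal sketch, assuming tshark's `0x`-prefixed hex output; `TCP_FLAGS` and `flags_to_labels` are hypothetical names, and full flag names are used where the question uses single letters:

```python
# Mask for each bit of the 12-bit TCP flags field, highest bit first.
# NS (0x100) is the ECN-nonce bit from RFC 3540; the three bits above it are reserved.
TCP_FLAGS = [
    (0x100, "NS"),
    (0x080, "CWR"),
    (0x040, "ECE"),
    (0x020, "URG"),
    (0x010, "ACK"),
    (0x008, "PSH"),
    (0x004, "RST"),
    (0x002, "SYN"),
    (0x001, "FIN"),
]


def flags_to_labels(hex_str: str, sep: str = "_") -> str:
    """Turn a tshark tcp.flags value such as '0x00000012' into 'ACK_SYN'."""
    value = int(hex_str, 16)  # base 16 accepts the leading '0x'
    # Keep the name of every flag whose bit is set, in the order listed above.
    return sep.join(name for mask, name in TCP_FLAGS if value & mask)
```

For example, `flags_to_labels("0x00000012")` returns `"ACK_SYN"`. Swapping the names for `"u"`, `"a"`, `"p"`, `"r"`, `"s"`, `"f"` reproduces the question's `a_s` style.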
Normally I would suggest using -e tcp.flags.str, but this doesn't display properly for me, at least when running on Windows 10 with TShark (Wireshark) 3.3.0 (v3.3.0rc0-1433-gcac1426dd6b2). For example, here's what I get for what should be a "SYN" indication only:
tshark.exe -r tcpfile.pcap -c 1 -T fields -e frame.number -e tcp.flags -e tcp.flags.str
1 0x00000002 A·A·A·A·A·A·A·A·A·A·SA·
You can try it on your system; maybe it'll display as intended. (In Wireshark, it's displayed correctly as ··········S·, so it may be a tshark bug or a problem with my shells. I tried both cmd and powershell.) In any case, if it doesn't display properly on your system, you can try the tcp-flags-postdissector.lua dissector that Didier Stevens wrote. It was inspired by Snort, and I believe it served as the inspiration for Wireshark's built-in tcp.flags.str field. I preferred a '.' instead of '*' for flag bits that aren't set, so I tweaked the Lua dissector to behave that way. Use it as is, or tweak it any way you choose. With the Lua dissector, I get the expected output:
tshark.exe -r tcpfile.pcap -c 1 -T fields -e frame.number -e tcp.flags -e tcpflags.flags
1 0x00000002 ........S.
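If neither tcp.flags.str nor the Lua dissector works for you, a similar string can be built inside a Python pipeline from the hex field instead. This sketch assumes a 12-bit flags value and the letter sequence RRRNCEUAPRSF (three reserved bits, then NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN, most significant first). That layout is an assumption modeled on Wireshark's ··········S· display; it is not the Lua dissector's format, which prints 10 characters. `flags_to_str` is a hypothetical name:

```python
# Assumed letter for each flag bit, most significant (bit 11) first.
LETTERS = "RRRNCEUAPRSF"


def flags_to_str(value: int, unset: str = ".") -> str:
    """Render a TCP flags value as a fixed-width string such as '..........S.'."""
    out = []
    for i, letter in enumerate(LETTERS):
        bit = 1 << (len(LETTERS) - 1 - i)  # bit 11 for the first letter, bit 0 for the last
        out.append(letter if value & bit else unset)
    return "".join(out)
```

With tshark's field value, `flags_to_str(int("0x00000002", 16))` gives `..........S.`; pass `unset="·"` to mimic Wireshark's middle dots.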
Since the same incorrect string is displayed in both cmd and powershell, it looks like a tshark bug to me, so I filed Wireshark Bug 16649.
I am attempting to bend zsh, my shell of choice, to my will, and am completely at a loss on the syntax and operation of completions.
My use case is this: I wish to have completions for 'ansible-playbook' under the '-e' option to support three variations:
Normal file completion: ansible-playbook -e vars/file_name.yml
Prepended file completion: ansible-playbook -e #vars/file_name.yml
Arbitrary strings: ansible-playbook -e key=value
I started out with https://github.com/zsh-users/zsh-completions/blob/master/src/_ansible-playbook which worked decently, but required modifications to support the prefixed file pathing. To achieve this I altered the following lines (the -e line):
...
"(-D --diff)"{-D,--diff}"[when changing (small files and templates, show the diff in those. Works great with --check)]"\
"(-e --extra-vars)"{-e,--extra-vars}"[EXTRA_VARS set additional variables as key=value or YAML/JSON]:extra vars:(EXTRA_VARS)"\
'--flush-cache[clear the fact cache]'\
to this:
...
"(-D --diff)"{-D,--diff}"[when changing (small files and templates, show the diff in those. Works great with --check)]"\
"(-e --extra-vars)"{-e,--extra-vars}"[EXTRA_VARS set additional variables as key=value or YAML/JSON]:extra vars:__at_files"\
'--flush-cache[clear the fact cache]'\
and added the '__at_files' function:
__at_files () {
compset -P #; _files
}
This may be very noobish, but for someone that has never encountered this before, I was pleased that this solved my problem, or so I thought.
This fails me if I have multiple '-e' parameters, which is totally a supported model (similar to how docker allows multiple -v or -p arguments). What this means is that the first '-e' parameter will have my prefixed completion work, but any '-e' parameters after that point become 'dumb' and only allow for normal '_files' completion from what I can tell. So the following will not complete properly:
ansible-playbook -e key=value -e #vars/file
but this would complete for the file itself:
ansible-playbook -e key=value -e vars/file
Did I mess up? I see the same type of behavior for this particular completion plugin's '-M' option (it also becomes 'dumb' and does basic file completion). I may simply not have searched for the correct terminology or combination of terms, or perhaps I missed the relevant part of the rather complicated documentation, but again, with only a few days of experience digging into this, I'm lost.
If multiple -e options are valid, the _arguments specification should start with *, so instead of:
"(-e --extra-vars)"{-e,--extra-vars}"[EXTR ....
use:
\*{-e,--extra-vars}"[EXTR ...
The (-e --extra-vars) part lists the options that cannot follow the one being specified. That isn't needed anymore, because it is presumably valid to do, e.g.:
ansible-playbook -e key=value --extra-vars #vars/file
I use rsync to back up a few thousand files and pipe the output to a file.
Given the number of files, I'd like to see a list of only those transfers that had issues, as well as a summary showing which completed.
Using the -q flag nicely displays only the errors, by exception.
Using --stats shows a helpful summary at the end.
The problem is that I cannot combine them because it appears that -q suppresses the stats output.
Any ideas welcome.
This did the trick for me:
rsync -azh --stats <source> <destination>
-a/--archive: archive mode; equals -rlptgoD (no -H,-A,-X)
-z/--compress: compress file data during the transfer
-h/--human-readable: output numbers in a human-readable format
--stats: give some file-transfer stats
Perhaps this will help someone else. In the end, the only thing that worked was to swap the output as suggested here.
So in my case it was simply redirecting as follows:
2>> /output.log >> /output.log
I would like to issue a patch command which is somewhat dumber than the default, but I cannot find the right flags (if they exist at all).
I don't want it to create .rej or .orig files, not even when the patch fails. If the patch fails I'd like the original files to remain unchanged.
I don't want it to try guessing if the patch is reversed or not, or try matching the lines before or after those given in the patch. If the lines at the given line numbers do not match, it should fail.
I've tried with -f -N -V never -r - --no-backup-if-mismatch, but still backup files are created and "fuzzy" matching is tried.
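Run it with --dry-run -s first, and only apply the patch if that doesn't report any problems. You can key off the return code: GNU patch exits with 0 only when all hunks apply, 1 when some hunks fail, and 2 on more serious trouble.
To disable fuzzy matching, you need -F0 (--fuzz=0).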
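Combining both suggestions, a small wrapper can do the dry run and apply the patch only when it exits cleanly. A minimal sketch, assuming GNU patch is on the PATH and the patch is applied with -p1 from the current directory; `patch_cmd` and `apply_strict` are hypothetical helpers, and `run` is injectable so the logic can be exercised without touching any files:

```python
import subprocess
from typing import Callable, List


def patch_cmd(patch_file: str, strip: int = 1, dry_run: bool = False) -> List[str]:
    """Build a GNU patch command line with fuzz, backups and reject files disabled."""
    cmd = [
        "patch",
        f"-p{strip}",               # strip this many leading path components
        "-F0",                      # no fuzz: context lines must match exactly
        "-s",                       # silent unless something goes wrong
        "--no-backup-if-mismatch",  # never write .orig files
        "-r", "-",                  # discard rejects instead of writing .rej files
        "-i", patch_file,
    ]
    if dry_run:
        cmd.append("--dry-run")     # report what would happen without changing files
    return cmd


def apply_strict(patch_file: str, strip: int = 1,
                 run: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Apply patch_file only if a dry run exits with status 0 (all hunks apply)."""
    if run(patch_cmd(patch_file, strip, dry_run=True)).returncode != 0:
        return False  # the dry run found a problem; leave the files untouched
    return run(patch_cmd(patch_file, strip)).returncode == 0
```

Note that -F0 only disables fuzz: GNU patch can still apply a hunk at an offset if its exact context is found at different line numbers.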
Informally, most of us understand that there are 'binary' files (object files, images, movies, executables, proprietary document formats, etc) and 'text' files (source code, XML files, HTML files, email, etc).
In general, you need to know the contents of a file to be able to do anything useful with it, and from that point of view, whether the encoding is 'binary' or 'text' doesn't really matter. And of course, files just store bytes of data, so they are all 'binary', and 'text' doesn't mean anything without knowing the encoding. And yet, it is still useful to talk about 'binary' and 'text' files, but to avoid offending anyone with this imprecise definition, I will continue to use 'scare' quotes.
However, there are various tools that work on a wide range of files, and in practical terms, you want to do something different based on whether the file is 'text' or 'binary'. An example of this is any tool that outputs data on the console. Plain 'text' will look fine, and is useful. 'binary' data messes up your terminal, and is generally not useful to look at. GNU grep at least uses this distinction when determining if it should output matches to the console.
So, the question is: how do you tell if a file is 'text' or 'binary'? And to restrict it further, how do you tell on a Linux-like file-system? I am not aware of any filesystem metadata that indicates the 'type' of a file, so the question becomes: by inspecting the content of a file, how do I tell if it is 'text' or 'binary'? For simplicity, let's restrict 'text' to mean characters that are printable on the user's console. In particular, how would you implement this? (I thought this was implied on this site, but I should have specified: while it is helpful, in general, to be pointed at existing code that does this, I'm not really after which existing programs I can use.)
You can use the file command. It does a bunch of tests on the file (man file) to decide if it's binary or text. You can look at/borrow its source code if you need to do that from C.
file README
README: ASCII English text, with very long lines
file /bin/bash
/bin/bash: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), stripped
The spreadsheet software my company makes reads a number of binary file formats as well as text files.
We first look at the first few bytes for a magic number which we recognize. If we do not recognize the magic number of any of the binary types we read, then we look at up to the first 2K bytes of the file to see whether it appears to be a UTF-8, UTF-16 or a text file encoded in the current code page of the host operating system. If it passes none of these tests, we assume that it is not a file we can deal with and throw an appropriate exception.
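The approach described above might look like this in Python. It is a sketch, not that product's code: the signature table is a small hypothetical sample, UTF-16 is only accepted with a byte-order mark, and the host code page is assumed to be whatever `locale.getpreferredencoding(False)` returns. `sniff` and `_decodes` are hypothetical names:

```python
import codecs
import locale
from typing import Optional

# A handful of well-known signatures, for illustration only; a real product
# would list the magic numbers of the binary formats it actually reads.
MAGIC_NUMBERS = {
    b"\x7fELF": "elf",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\x89PNG\r\n\x1a\n": "png",
}


def _decodes(sample: bytes, encoding: str, final: bool) -> bool:
    """True if sample decodes cleanly; with final=False a trailing partial character is allowed."""
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        decoder.decode(sample, final=final)
        return True
    except UnicodeDecodeError:
        return False


def sniff(data: bytes, limit: int = 2048, codepage: Optional[str] = None) -> str:
    """Classify data by magic number first, then by trying text encodings on the first bytes."""
    for magic, kind in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return kind
    sample = data[:limit]
    final = len(data) <= limit  # only tolerate a split character if we cut the data short
    if sample[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE) and _decodes(sample, "utf-16", final):
        return "utf-16"
    # NUL is valid UTF-8, so reject it explicitly to avoid accepting binary data.
    if b"\x00" not in sample and _decodes(sample, "utf-8", final):
        return "utf-8"
    if codepage is None:
        codepage = locale.getpreferredencoding(False)
    if b"\x00" not in sample and _decodes(sample, codepage, final):
        return "code page " + codepage
    raise ValueError("not a recognised file type")
```

Testing UTF-16 before UTF-8 matters only for BOM-marked data; for permissive single-byte code pages such as latin-1 almost any non-NUL sample passes, which is why that test comes last.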
You can determine the MIME type of the file with
file --mime FILENAME
The shorthand is file -i on Linux and file -I (capital i) on macOS (see comments).
If it starts with text/, it's text; otherwise it's binary. The only exceptions are XML applications, which you can match by looking for +xml at the end of the file type.
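A minimal Python sketch of that rule, assuming the `file` utility is installed. It asks for --mime-type (just the type, without the charset parameter that --mime adds) and -b to omit the file name; `mime_type` and `is_text_mime` are hypothetical names:

```python
import subprocess


def mime_type(path: str) -> str:
    """Return the MIME type reported by `file -b --mime-type` (needs the file utility)."""
    result = subprocess.run(["file", "-b", "--mime-type", path],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def is_text_mime(mime: str) -> bool:
    """text/* counts as text, and so do XML applications such as image/svg+xml."""
    mime = mime.split(";")[0].strip()  # drop parameters like '; charset=us-ascii'
    return mime.startswith("text/") or mime.endswith("+xml")
```

Types such as application/json fall outside both rules, so extend `is_text_mime` if those should count as text.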
To list text file names in current dir/subdirs:
grep -rIl ''
Binaries:
grep -rIL ''
To check for a particular file:
grep -qI '' FILE
Then exit status 0 means the file is text, and 1 means binary (an empty file also returns 1, because it has no lines for '' to match).
To check:
echo $?
The key option is:
-I Process a binary file as if it did not contain matching data;
Other options:
-r, --recursive
Read all files under each directory, recursively;
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would normally have been printed.
-L, --files-without-match
Suppress normal output; instead print the name of each input file from which no output would normally have been printed.
-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected.
Perl has a decent heuristic. Use the -B operator to test for binary (and its opposite, -T, to test for text). Here's a shell one-liner to list text files:
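$ find . -type f -print0 | perl -0lnE 'say if -f and -s _ and -T _'
(The -l switch chomps the trailing NUL that -0 leaves on each file name, so the file tests see the real name. Note also that those underscores without a preceding dollar sign are correct: a bare _ reuses the stat result from the previous file test instead of calling stat again. RTFM.)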
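To implement something similar yourself, here is a rough Python approximation of the heuristic that perlfunc describes for -T: it is not a port of Perl's code, and the 512-byte block and the set of "ordinary" control bytes are assumptions. `looks_like_text` and `is_text_file` are hypothetical names:

```python
import codecs

# Control bytes treated as ordinary text (an assumption): \b \t \n \f \r and ESC.
_TEXT_CONTROLS = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def looks_like_text(block: bytes) -> bool:
    """Approximate Perl's -T test on the first block of a file."""
    if not block:
        return True  # Perl's -T and -B are both true for an empty file
    if b"\x00" in block:
        return False  # any NUL byte in the examined portion means binary
    try:
        # final=False tolerates a multi-byte character cut off at the block end.
        codecs.getincrementaldecoder("utf-8")().decode(block, final=False)
        if any(b >= 0x80 for b in block):
            return True  # valid UTF-8 containing non-ASCII characters
    except UnicodeDecodeError:
        pass
    odd = sum(1 for b in block
              if b >= 0x80 or b == 0x7F or (b < 0x20 and b not in _TEXT_CONTROLS))
    return odd * 3 <= len(block)  # binary if roughly more than a third is odd


def is_text_file(path: str, block_size: int = 512) -> bool:
    with open(path, "rb") as f:
        return looks_like_text(f.read(block_size))
```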
Well, if you are just inspecting the entire file, see if every character is printable with isprint(c), allowing whitespace such as '\n' and '\t', for which isprint returns false. It gets a little more complicated for Unicode.
To distinguish a Unicode text file, MSDN offers some great advice as to what to do.
The gist of it is to first inspect up to the first four bytes:
EF BB BF UTF-8
FF FE UTF-16, little endian
FE FF UTF-16, big endian
FF FE 00 00 UTF-32, little endian
00 00 FE FF UTF-32, big-endian
That will tell you the encoding. Then, you'd want to use iswprint(c) for the rest of the characters in the text file. For UTF-8 and UTF-16, you need to parse the data manually since a single character can be represented by a variable number of bytes. Also, if you're really anal, you'll want to use the locale variant of iswprint if that's available on your platform.
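One detail when implementing the table above: FF FE is also the start of the UTF-32 little-endian BOM, so the four-byte BOMs must be tested before the two-byte ones. A minimal Python sketch, assuming UTF-8 when no BOM is present; Python's decoders handle the variable-length sequences, and `str.isprintable()` plus `str.isspace()` stand in for iswprint (their rules are Unicode-based, not locale-based). `detect_bom` and `is_printable_unicode` are hypothetical names:

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def is_printable_unicode(data: bytes, default: str = "utf-8") -> bool:
    encoding, skip = detect_bom(data)
    try:
        text = data[skip:].decode(encoding or default)
    except UnicodeDecodeError:
        return False
    # str.isprintable() is False for newlines and tabs, so allow whitespace too.
    return all(ch.isprintable() or ch.isspace() for ch in text)
```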
It's an old topic, but maybe someone will find this useful.
If you have to decide in a script whether something is a text file, you can simply do this:
if file -b --mime-type "$1" | grep -q '^text/'; # -b drops the file name, so the name itself can't match
then
.
.
fi
This will get the file type, and with a silent grep you can decide if it's text.
You can use libmagic, which is a library version of the Unix file command (source).
There are wrappers for many languages:
Python
.NET
Nodejs
Ruby
Go
Rust
Most programs that try to tell the difference use a heuristic, such as examining the first n bytes of the file and seeing if those bytes all qualify as 'text' or not (i.e., do they all fall within the range of printable ASCII characters). For finer distinction, there's always the 'file' command on UNIX-like systems.
One simple check is whether it contains \0 (NUL) bytes. ASCII and UTF-8 text files don't contain them, although UTF-16 and UTF-32 text does.
As previously stated, *nix operating systems have this ability within the file command. This command uses a configuration file that defines magic numbers contained within many popular file structures.
This file, called magic, was historically stored in /etc, although it may be in /usr/share on some distributions. The magic file defines offsets of values known to exist within particular file formats, so file can examine those locations to determine the type of the file.
The structure and description of the magic file can be found by consulting the relevant manual page (man magic).
As for an implementation, that can be found within file.c itself; however, the relevant portion of the file command that determines whether it is readable text or not is the following:
/* Make sure we are dealing with ascii text before looking for tokens */
for (i = 0; i < nbytes - 1; i++) {
    if (!isascii(buf[i]) ||
        (iscntrl(buf[i]) && !isspace(buf[i]) &&
         buf[i] != '\b' && buf[i] != '\032' && buf[i] != '\033'
        )
       )
        return 0; /* not all ASCII */
}
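For experimenting outside C, the quoted check translates directly to Python. This sketch assumes the C-locale meanings of isascii, iscntrl and isspace; `looks_ascii` is a hypothetical name:

```python
def looks_ascii(buf: bytes) -> bool:
    """Python rendering of the ASCII-text check quoted above from file's source."""
    for c in buf[:-1]:  # like the C loop, stop one byte short of nbytes
        if c >= 0x80:  # !isascii(c)
            return False
        is_cntrl = c < 0x20 or c == 0x7F  # iscntrl() in the C locale
        is_space = c in b" \t\n\v\f\r"    # isspace() in the C locale
        # Control characters are allowed only if they are whitespace, \b, \032 or \033.
        if is_cntrl and not is_space and c not in (0x08, 0x1A, 0x1B):
            return False
    return True
```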