In my file somehow  is getting added. I am not sure what it is and how it is getting added.
12345AÂ 210Â CBCDEM
I want to remove this character from the file . I tried basic sed command to get it remove but unsuccessful.
sed -i -e 's/\Â//g'
I also read that dos2unix will do the job but unfortunately that also didn't work .Assuming it was hex character I also tried to remove it using hex value sed -i 's/\xc2//g' but that also didnt work
I really want to understand what this character is and how it is getting added. Moreover , is there possible way to delete all such characters in a file .
Adding encoding details :--
file test.txt
test.txt: ISO-8859 text
echo $LANG
en_US.UTF-8
OS Details :--
uname -a
Linux vm-testmachine-001 3.10.0-693.11.1.el7.x86_64 #1 SMP Fri Oct 27 05:39:05 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Regards.
It looks like you have an encoding mismatch between the program that writes the file (in some part of ISO-8859) and the program reading the file (assuming it to be UTF-8). This is a textbook use-case for iconv. In fact the sample in the man-page is almost exactly applicable to your case:
iconv -f iso-8859-1 -t utf-8 test.txt
iconv is a fairly standard program on almost every Unix distribution I have seen, so you should not have any issues here.
Based on the fact that you appear to be writing with English as your primary language, you are probably looking for iso-8859-1, which is quite popular apparently.
If that does not fix your issue, You probably need to find the proper encoding for the output of your database. You can do
iconv -l
to get a list of encodings available for iconv, and use the one that works for you. Keep in mind that the output of file saying ISO-8859 text is not absolute. There is no way to distinguish things like pure ASCII and UTF-8 in many cases. If I am not mistaken, file uses heuristics based on frequencies of character codes in the file to determine the encoding. It is quite liable to make a mistake if the sample is small and/or ambiguous.
If you want to save the output of iconv and your version supports the -o flag, you can use it. Otherwise, use redirection, but carefully:
TMP=$(mktemp)
iconv -f iso-8859-1 -t utf-8 test.txt > "$TMP" && mv "$TMP" test.txt
(Sorry in advance - my english is very bad)
I encrypted a couple of *.ts files in the past. Usually, there is a *.m3u8 with the necessary information, a key-file and the *.ts chunks.
Example:
File sources:
https://support.jwplayer.com/customer/portal/articles/1430261-aes-content-protection
M3U8:
#EXTM3U
#EXT-X-VERSION:1
## Created with Unified Streaming Platform(version=1.6.7)
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-ALLOW-CACHE:NO
#EXT-X-TARGETDURATION:11
#EXT-X-KEY:METHOD=AES-128,URI="oceans.key"
#EXTINF:11, no desc
oceans_aes-audio=65000-video=236000-1.ts
...
What I do next is, download the key file "oceans.key" that is specified in #EXT-X-KEY and create a hexdump of the content:
cat oceans.key | hexdump -e '16/1 "%02x" "\n"'
f571bfecfd1ab9adb05ce6fa030efd81
With the sequence number converted to HEX and left-padding 0 to fill up the '16 octet', I get the initial vector and execute this statement to decrypt the *.ts file:
openssl aes-128-cbc -d -in oceans_aes-audio\=65000-video\=236000-1.ts -out decrypted.ts -K f571bfecfd1ab9adb05ce6fa030efd81 -iv 00000000000000000000000000000001
It works like a charm. To make a long story short, I've some trouble get these files decrypted:
https://www.sendspace.com/filegroup/ajezM9HbgI3OI%2BY9%2BvUdJNnxYyXqGXX6K6jsNcrzYNc
The file named 'security' is the key file. When I hex-dump its content it is 17 octets instead of 16. Run the openssl command I get the following output:
hex string is too long
invalid hex key value
I compared the content of the file with some other files from this host and recognized that the last character is always the same (0x5e / ^). So, I removed it and tried again - my guess was that it is some kind of escape sequence / line-ending. No luck so far. The result is:
bad decrypt
140050579783384:error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt:evp_enc.c:529:
Here the complete command:
openssl aes-128-cbc -d -in media_1496546273.ts -out media_1496546273.decrypt.ts -K aaf36c79be1fad3977a4e8b19e48f038 -iv 00000000000000000000000000008E97
I thought about padding, nopadding, different cipher, ... No result so far. Chrome did play the files in flowplayer without any issues.
I would appreciate any help and hint to get on with this.
BR
I'm developing a pre-commit hook to avoid committing files with non-ascii chars, it works as well from unix system, using the below REGEX:
grep -P -n '[\x80-\xFF]' /tmp/app.txt
Now the issue that is giving me a lot of pain is that when i commit from windows, the result of the grep change, giving me a lot of char more than non ascii chars...
Does someone know how to fix this? I really try a lot of different things..
strings -n 1 filename will show the normal characters, but what if you only want to see the kind of file? file filename will show the kind of file but I am afraid it won't work for you.
You might try something like
cat /tmp/app.txt | tr -d "[:print:]\r\n" | wc -c
or avoiding the cat
tr -d "[:print:]\r\n" < /tmp/app.txt | wc -c
I have a string in different ranges :
WATSON_AJAY_AB04_DOTHING.data
WATSON_NAVNEET_CK4_DOTHING.data
WATSON_PRASHANTH_KJ56_DOTHING.data
WATSON_ABHINAV_KD323_DOTHING.data
On these above string how can I extract
AB04,CK4,KJ56,KD323
in Unix?
echo "$string" | cut -d'_' -f3
You could use sed or grep for this task. But since the string is so simple, I dont think you will need to.
One method is to use the bash 'cut' command. Below is an example directly on the BASH shell/command line:
jimm#pi$ string='WATSON_AJAY_AB04_DOTHING.data'
jimm#pi$ cut -d '_' -f 3 <<< "$string"
AB04 <-- outputs the result directly
(edit: of course Lucas' answer above is also a quick 'one-liner' that does the same thing as above - he beat me to it) :)
The cut will take an _ character as the delimiter (the -d '_' part), then display the 3rd slice of the string (the -f 3 part).
Or, if you want to output that 3rd slice from a list of content (using your list above), you can write a simple BASH script.
First, save the lines above ('WATSON...etc') into something like text.txt. Then open up your favorite text editor and type:
#!/bin/sh
cut -d '_' -f 3 < $1
Save that script to some useful name like slice.sh, and make sure it is executable with something like chmod 775 slice.sh.
Then at the command line you can execute the script against your text file, and immediately get an output of those parts of the file you want (in this case the third set of text, separated by the _ character):
$ ./slice.sh text.txt
AB04
CK4
KJ56
KD323
Hope that helps! Bear in mind that the commands above may vary a bit, depending on the flavor of *nix you are using, but it should at least point you in the right direction.
I would like to generate a random filename in unix shell (say tcshell). The filename should consist of random 32 hex letters, e.g.:
c7fdfc8f409c548a10a0a89a791417c5
(to which I will add whatever is neccesary). The point is being able to do it only in shell without resorting to a program.
Assuming you are on a linux, the following should work:
cat /dev/urandom | tr -cd 'a-f0-9' | head -c 32
This is only pseudo-random if your system runs low on entropy, but is (on linux) guaranteed to terminate. If you require genuinely random data, cat /dev/random instead of /dev/urandom. This change will make your code block until enough entropy is available to produce truly random output, so it might slow down your code. For most uses, the output of /dev/urandom is sufficiently random.
If you on OS X or another BSD, you need to modify it to the following:
cat /dev/urandom | env LC_CTYPE=C tr -cd 'a-f0-9' | head -c 32
why do not use unix mktemp command:
$ TMPFILE=`mktemp tmp.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX` && echo $TMPFILE
tmp.MnxEsPDsNUjrzDIiPhnWZKmlAXAO8983
One command, no pipe, no loop:
hexdump -n 16 -v -e '/1 "%02X"' -e '/16 "\n"' /dev/urandom
If you don't need the newline, for example when you're using it in a variable:
hexdump -n 16 -v -e '/1 "%02X"' /dev/urandom
Using "16" generates 32 hex digits.
uuidgen generates exactly this, except you have to remove hyphens. So I found this to be the most elegant (at least to me) way of achieving this. It should work on linux and OS X out of the box.
uuidgen | tr -d '-'
As you probably noticed from each of the answers, you generally have to "resort to a program".
However, without using any external executables, in Bash and ksh:
string=''; for i in {0..31}; do string+=$(printf "%x" $(($RANDOM%16)) ); done; echo $string
in zsh:
string=''; for i in {0..31}; do string+=$(printf "%x" $(($RANDOM%16)) ); dummy=$RANDOM; done; echo $string
Change the lower case x in the format string to an upper case X to make the alphabetic hex characters upper case.
Here's another way to do it in Bash but without an explicit loop:
printf -v string '%X' $(printf '%.2s ' $((RANDOM%16))' '{00..31})
In the following, "first" and "second" printf refers to the order in which they're executed rather than the order in which they appear in the line.
This technique uses brace expansion to produce a list of 32 random numbers mod 16 each followed by a space and one of the numbers in the range in braces followed by another space (e.g. 11 00). For each element of that list, the first printf strips off all but the first two characters using its format string (%.2) leaving either single digits followed by a space each or two digits. The space in the format string ensures that there is then at least one space between each output number.
The command substitution containing the first printf is not quoted so that word splitting is performed and each number goes to the second printf as a separate argument. There, the numbers are converted to hex by the %X format string and they are appended to each other without spaces (since there aren't any in the format string) and the result is stored in the variable named string.
When printf receives more arguments than its format string accounts for, the format is applied to each argument in turn until they are all consumed. If there are fewer arguments, the unmatched format string (portion) is ignored, but that doesn't apply in this case.
I tested it in Bash 3.2, 4.4 and 5.0-alpha. But it doesn't work in zsh (5.2) or ksh (93u+) because RANDOM only gets evaluated once in the brace expansion in those shells.
Note that because of using the mod operator on a value that ranges from 0 to 32767 the distribution of digits using the snippets could be skewed (not to mention the fact that the numbers are pseudo random in the first place). However, since we're using mod 16 and 32768 is divisible by 16, that won't be a problem here.
In any case, the correct way to do this is using mktemp as in Oleg Razgulyaev's answer.
Tested in zsh, should work with any BASH compatible shell!
#!/bin/zsh
SUM=`md5sum <<EOF
$RANDOM
EOF`
FN=`echo $SUM | awk '// { print $1 }'`
echo "Your new filename: $FN"
Example:
$ zsh ranhash.sh
Your new filename: 2485938240bf200c26bb356bbbb0fa32
$ zsh ranhash.sh
Your new filename: ad25cb21bea35eba879bf3fc12581cc9
Yet another way[tm].
R=$(echo $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM | md5 | cut -c -8)
FILENAME="abcdef-$R"
This answer is very similar to fmarks, so I cannot really take credit for it, but I found the cat and tr command combinations quite slow, and I found this version quite a bit faster. You need hexdump.
hexdump -e '/1 "%02x"' -n32 < /dev/urandom
Another thing you can add is running the date command as follows:
date +%S%N
Reads nonosecond time and the result adds a lot of randomness.
The first answer is good but why fork cat if not required.
tr -dc 'a-f0-9' < /dev/urandom | head -c32
Grab 16 bytes from /dev/random, convert them to hex, take the first line, remove the address, remove the spaces.
head /dev/random -c16 | od -tx1 -w16 | head -n1 | cut -d' ' -f2- | tr -d ' '
Assuming that "without resorting to a program" means "using only programs that are readily available", of course.
If you have openssl in your system you can use it for generating random hex (also it can be -base64) strings with defined length. I found it pretty simple and usable in cron in one line jobs.
openssl rand -hex 32
8c5a7515837d7f0b19e7e6fa4c448400e70ffec88ecd811a3dce3272947cb452
Hope to add a (maybe) better solution to this topic.
Notice: this only works with bash4 and some implement of mktemp(for example, the GNU one)
Try this
fn=$(mktemp -u -t 'XXXXXX')
echo ${fn/\/tmp\//}
This one is twice as faster as head /dev/urandom | tr -cd 'a-f0-9' | head -c 32, and eight times as faster as cat /dev/urandom | tr -cd 'a-f0-9' | head -c 32.
Benchmark:
With mktemp:
#!/bin/bash
# a.sh
for (( i = 0; i < 1000; i++ ))
do
fn=$(mktemp -u -t 'XXXXXX')
echo ${fn/\/tmp\//} > /dev/null
done
time ./a.sh
./a.sh 0.36s user 1.97s system 99% cpu 2.333 total
And the other:
#!/bin/bash
# b.sh
for (( i = 0; i < 1000; i++ ))
do
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | head -c 32 > /dev/null
done
time ./b.sh
./b.sh 0.52s user 20.61s system 113% cpu 18.653 total
If you are on Linux, then Python will come pre-installed. So you can go for something similar to the below:
python -c "import uuid; print str(uuid.uuid1())"
If you don't like the dashes, then use replace function as shown below
python -c "import uuid; print str(uuid.uuid1()).replace('-','')"