Remove extra characters from diff output

I have the following output on Unix:
$ diff -y --suppress-common-lines backup.txt newfile.txt
> `jjj' int,
I need only jjj : int as output.
I tried the following, but it didn't work as expected:
$ diff -y --suppress-common-lines backup.txt newfile.txt | grep -i '>' |tr -d '[>]' |sed 's/,//g'

I suggest trying this gawk script, which uses FPAT to treat each run of alphanumeric characters as a field:
diff -y --suppress-common-lines backup.txt newfile.txt | gawk '{print $1 ":" $2}' FPAT="[[:alnum:]]+"
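If gawk is not available, roughly the same extraction can be sketched with grep, tr and plain awk. This assumes every added line looks like the one in the question (a backquoted name, a closing single quote, a type, a trailing comma) and that no other line contains ">". The function name extract_added is made up for illustration.

```shell
# Hypothetical helper: reads `diff -y --suppress-common-lines` output on stdin
# and prints "name : type" for each line marked with ">".
extract_added() {
    # 1. keep only the lines that carry the ">" marker
    # 2. delete the backquote, single quote, ">" and "," characters
    # 3. print the two remaining words with " : " between them
    grep '>' | tr -d "\`'>," | awk '{ print $1 " : " $2 }'
}

# Example with the line from the question (\140 is a backquote, \047 a single quote):
printf '\t\t\t> \140jjj\047 int,\n' | extract_added   # prints: jjj : int
```

In real use it would sit at the end of the pipeline: diff -y --suppress-common-lines backup.txt newfile.txt | extract_added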

The most common reasons for this not working are:
Your file is not plain ASCII; most commonly it is encoded as UTF-8.
(Save the text files as ASCII.)
You are running this in a command shell with colors enabled.
(Colors are produced by ANSI escape sequences, which can confuse sed.)
Your file uses a different EOL than the one used by your *nix OS (\n), such as \r\n (Windows) or \r (classic Mac OS).
There are hidden TAB (\t) characters in the file.
After you have fixed the above, try this:
diff -Ewy -r --suppress-common-lines -aB -W 512 file.txt file2.txt | tr -d '[>]'
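Before cleaning anything, it can help to confirm which of the causes listed above actually applies. A rough sketch using only grep and printf; has_cr and has_tab are made-up helper names, and od -c file | head is another way to eyeball the raw bytes.

```shell
# Hypothetical helpers: each succeeds (exit status 0) if the file named in $1
# contains at least one of the hidden characters in question.
has_cr()  { grep -q "$(printf '\r')" "$1"; }   # carriage return, i.e. Windows line endings
has_tab() { grep -q "$(printf '\t')" "$1"; }   # TAB characters

# Possible usage:
#   has_cr backup.txt && echo "backup.txt has CR characters"
```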

Related

How to remove ^M characters from file in aix?

I have tried the following commands, but they don't work. sed isn't installed, so it can't be used, and the same goes for dos2unix.
awk 'sub(/^M/,"")' finename
cat finename | sed 's/^M//’ > finename
awk '{sub(/^M/,"")}1' finename > finename
tr -d $'\r' < finename
tr -d '\015' < finename > finename
awk 'sub(/^M/,"");1' finename

This command worked:
tr -d '\r' < filename > new_file

The easiest way is dos2unix. If it's not installed, you might install it first (on a Debian-based system, for example):
sudo apt install dos2unix
If installing dos2unix is not permitted, you might try the following command (the \r escape needs GNU sed):
sed 's/\r//' input > output
If sed is not working either, you might go for the following awk solution:
awk '{sub(/\r$/,"")}1' input > output
(Afterwards, just rename output back to input.)
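Several of the attempts in the question redirect into the same file they read from (< finename > finename), which empties the file before it is read. A minimal sketch of the write-then-rename step using only tr and mv; the function name strip_cr and the temporary file name are illustrative.

```shell
# Hypothetical helper: remove every carriage return from the file named in $1
# by writing to a temporary file first and renaming it only if tr succeeded.
strip_cr() {
    tmp="$1.tmp.$$"
    tr -d '\r' < "$1" > "$tmp" && mv "$tmp" "$1"
}

# Possible usage:
#   strip_cr filename
```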

If the file hasn't been pre-mangled by cat into the caret notation "^M" for \r, one could try
{m,n,g}awk 3 ORS= RS='\r'
If it has already been mangled:
gawk -Pe/-ce NF=NF OFS= FS='[\\^]M' # these 2 gawk modes act up;
# switching to FS instead of RS
gawk/nawk 6 ORS= RS='\\^M'
mawk 9 ORS= RS='\^M'

Append "/" to end of directory

Completely noob question, but using ls piped to grep, I need to find files or directories whose names are all capitals, and directories need to have "/" appended to indicate that they are directories. Appending the "/" is the only part I am stuck on. Again, I apologize for the amateur question. I currently have ls | grep [A-Z] and the example output should be: BIRD, DOG, DOGDIR/

It's an interesting question because it's a somewhat difficult thing to accomplish with a bash one-liner.
Here's what I came up with. It doesn't seem very elegant, but I'm not sure how to improve it.
find /animals -type d -or -type f \
| grep '/[A-Z]*$' \
| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' \
| sed -e 's/, *$//'; echo
I'll break that down for you:
find /animals -type d -or -type f writes out, one per line, the directories and files it found in /animals (see below for my test environment Dockerfile; I created /animals to match your desired output). As far as I know, find can't do a regex match on just the name, so...
grep '/[A-Z]*$' filters find's output so that only paths are shown where the last part of the file or directory name, after the final /, is all uppercase.
xargs -I + bash -c '...': when you're in a shell and you want to use a "for" loop, chances are what you should be using is xargs. Learn it, know it, love it. xargs takes its input, separated by default by whitespace, and runs the command you give it for each piece of input. So this is going to run a bash shell for each path that passed the grep filter. In my case, -I + will make xargs replace the literal '+' character with its current input filename. -I also makes it pass one line at a time through xargs. For more information, see the xargs manual page.
'echo -n $(basename +)$( test -d + && echo -n /),\\ ' this is the inner bash script that will be run by xargs for each path that got through grep.
basename + cuts the directory component off the path; from your example output you don't want e.g. /animals/DOGDIR/, you want DOGDIR/. basename is the program that trims the directories for us.
test -d + && echo -n / checks whether + (remember, xargs will replace it with the filename) is a directory, and if so, runs echo -n /. The -n argument to echo suppresses the newline, which is important to get the output in the CSV format you specified.
Now we can put it all together: we echo -n the output of basename +, with / appended if it's a directory, and then , appended to that. All the echos run with -n to suppress newlines and keep the output looking like CSV.
| sed -e 's/, *$//'; echo is purely for formatting. Adding , to each individual output was an easy way to get the CSV, but it leaves us with a final , at the end of the list. The sed invocation removes , followed by any number of spaces at the end of the output so far, i.e. the entire output from all the xargs invocations. And since we never output a newline at the end of that output, the final echo adds one.
Usually in Unix shells, you probably wouldn't want CSV-style output. You'd probably instead want newline-separated output in most cases, one matching file per line, and that would be somewhat simpler to do because you wouldn't need all that faffing with -n and , to make it CSV style. But it's a valid requirement if the need is there.
FROM debian
RUN mkdir -p /animals
WORKDIR /animals
RUN mkdir -p DOGDIR lowerdir && touch DOGDIR/DOG DOGDIR/lowerDOG2 lowerdir/BIRD
ENTRYPOINT [ "/bin/bash" ]
CMD [ "-c" , "find /animals -type d -or -type f | grep '/[A-Z]*$'| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' | sed -e 's/, *$//'; echo"]
$ docker run --rm test
BIRD, DOGDIR/, DOG

You can start by looking at
ls -F | grep -v "[[:lower:]]"
I did not add anything for a comma-separated line, because this is the wrong method: parsing ls should be avoided! It will go wrong for filenames like
I am a terribble filename,
with newlines inside me,
and the ls command combined with grep
will only show the last line
BECAUSE THIS LINE HAS NO LOWERCASE CHARACTERS
To get the files without a pipe, you can use
shopt -s extglob
ls -dp +([[:upper:]])
shopt -u extglob
An explanation of the extglob and uppercase can be found at https://unix.stackexchange.com/a/389071/57293
When you want the output on one line, you can run into trouble with filenames that have newlines or commas in their names. You might want something like
# parsing ls, yes wrong and failing for some files
ls -dp +([[:upper:]]) | tr "\n" "," | sed 's/,$/\n/'
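One way to avoid parsing ls altogether is to loop over a glob and test each name in the shell. A sketch, assuming the names of interest are in the current directory and that "all capitals" means uppercase letters only (so a name like DOG2 is skipped); list_upper is a made-up name.

```shell
# Hypothetical helper: print every all-uppercase name in the current directory,
# one per line, with "/" appended to directories.
list_upper() {
    for name in *; do
        [ -e "$name" ] || continue           # the glob matched nothing
        case $name in
            *[![:upper:]]*) continue ;;      # contains a non-uppercase character
        esac
        if [ -d "$name" ]; then
            printf '%s/\n' "$name"
        else
            printf '%s\n' "$name"
        fi
    done
}
```

Because the names never pass through a pipe before they are tested, a name with embedded newlines is matched as a whole instead of line by line.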

Unable to remove control @ from file

I have a file which has ^@ in it and I am unable to remove it using sed or the replace command in Python. I can see ^@ only when I open the file in the vi editor. Please suggest a fix. Below is what I tried using sed.
sed 's/^@/?/g' filename

Tested on Linux; I'm not sure whether the syntax varies elsewhere. Try:
$ printf 'abc\0baz\n' | cat -v
abc^@baz
$ printf 'abc\0baz\n' | tr -d '\0' | cat -v
abcbaz
tr deletes all ASCII NUL characters from its input. cat -v is used here to make non-printing characters visible.
For file input, use tr -d '\0' <filename
GNU sed (and possibly a few other implementations) allows a hex value to represent a character:
$ printf 'abc\0baz\n' | sed 's/\x00//g' | cat -v
abcbaz
so, for in-place editing, use sed -i 's/\x00//g' filename (See also: sed in-place flag that works both on Mac (BSD) and Linux )
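Where sed -i is not available, the tr approach can be wrapped so that it counts the NUL bytes first and then rewrites the file through a temporary copy. A sketch only; count_nul and strip_nul are illustrative names, and the octal form '\000' is used because it is the most widely accepted spelling of NUL for tr.

```shell
# Hypothetical helpers; the names are illustrative only.
count_nul() {
    # Delete everything except NUL bytes, then count what is left.
    tr -cd '\000' < "$1" | wc -c
}

strip_nul() {
    # Write the cleaned data to a temporary file, then rename it over the original.
    tmp="$1.tmp.$$"
    tr -d '\000' < "$1" > "$tmp" && mv "$tmp" "$1"
}

# Possible usage:
#   count_nul filename    # how many NUL bytes are there?
#   strip_nul filename    # remove them
```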

using awk to get column values and then running another command on values and printing them

I've always used Stack Overflow to get help with issues, but this is my first post. I am new to UNIX scripting and I was given a task to get the values of column two and then run a command on them. The command I am supposed to run is 'echo -n "$2" | openssl dgst -sha1;', which hashes a value. My problem is not hashing one value, but hashing them all and then printing them. Can someone help me figure this out? This is how I am starting, but I think the path I am taking is wrong.
NOTE: this is a CSV text file and I know I need to use the awk command for this.
awk 'BEGIN { FS = "," } ; { print $2 }'
while [ "$2" != 0 ];
do
echo -n "$2" | openssl dgst -sha1
done
This prints the second column in its entirety and also prints some type of hashed value.
Sorry for the long first post, just trying to be as specific as possible. Thanks!

You don't really need awk just to extract the second column. You can do it with bash's read builtin by setting IFS to the delimiter. (printf '%s' is used instead of echo so that no trailing newline ends up in the hashed value, matching the echo -n in your command.)
while IFS=, read -ra line; do
[[ ${line[1]} != 0 ]] && printf '%s' "${line[1]}" | openssl dgst -sha1
done < inputFile
You should probably post some sample input data and the error you are getting so that someone can debug your existing code better.

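This will do the trick:
$ awk '{print $2}' file | while IFS= read -r value; do printf '%s' "$value" | openssl dgst -sha1; done
Use awk to print the second field of each record and a while read loop to pass each value separately to openssl. (The values must arrive on standard input; openssl dgst treats command-line arguments as names of files to hash.)
If by CSV you mean each field is separated by a comma, then you need to add -F, to awk.
$ awk -F, '{print $2}' file | while IFS= read -r value; do printf '%s' "$value" | openssl dgst -sha1; done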

Count number of blank lines in a file

In "Count (non-blank) lines-of-code in bash" they explain how to count the number of non-empty lines.
But is there a way to count the number of blank lines in a file? By blank line I also mean lines that have spaces in them.

Another way is:
grep -cvP '\S' file
-P '\S' (Perl regex) matches any line that contains a non-space character
-v selects the non-matching lines
-c prints a count of the selected lines
If your grep doesn't support the -P option, use -E '[^[:space:]]' instead.

One way using grep:
grep -c "^$" file
Or with whitespace:
grep -c "^\s*$" file

You can also use awk for this:
awk '!NF {sum += 1} END {print sum + 0}' file
From the manual, "The variable NF is set to the total number of fields in the input record". Since the default field separator is whitespace (spaces and tabs), any line consisting of either nothing or only whitespace will have NF=0.
Then, it is a matter of counting how many times this happens. The + 0 makes awk print 0 instead of an empty line when the file has no blank lines.
Test
$ cat a
aa dd
ddd
he llo
$ cat -vet a # -vet to show tabs and spaces
aa dd$
$
ddd$
$
^I$
he^Illo$
Now let's count the number of blank lines:
$ awk '!NF {s+=1} END {print s}' a
3
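To reproduce the test above without typing the tabs by hand, the sample file can be generated with printf. This is a sketch: count_blank is a made-up wrapper name, and the file content is rebuilt from the cat -vet listing shown above.

```shell
# Hypothetical wrapper around the awk one-liner; prints 0 when nothing matches.
count_blank() {
    awk '!NF { n++ } END { print n + 0 }' "$1"
}

# Rebuild the sample: lines 2 and 4 are empty, line 5 holds a single TAB.
sample=$(mktemp)
printf 'aa dd\n\nddd\n\n\t\nhe\tllo\n' > "$sample"
count_blank "$sample"   # prints 3
rm -f "$sample"
```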

grep -v '\S' file | wc -l
(On OS X, Perl regular expressions, i.e. the -P option, are not available.)

grep -cx '\s*' file
or
grep -cx '[[:space:]]*' file
That is faster than the code in Steve's answer.

Using Perl one-liner:
perl -lne '$count++ if /^\s*$/; END { print int $count }' input.file

To count how many useless blank lines your colleague has inserted in a project, you can run a one-line command like this:
blankLinesTotal=0; for file in $( find . -name "*.cpp" ); do blankLines=$(grep -cvE '\S' "${file}"); blankLinesTotal=$((blankLines + blankLinesTotal)); echo "${file} has ${blankLines} empty lines."; done; echo "Total: ${blankLinesTotal}"
This prints:
<filename0>.cpp #blankLines
....
....
<filenameN>.cpp #blankLines
Total #blankLinesTotal
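The for file in $( find ... ) loop splits file names on whitespace, so paths containing spaces break it. A sketch of an alternative that lets find hand the names directly to grep; count_blank_tree is a made-up name, the output format (path:count) differs from the one above, and names containing newlines are still not handled.

```shell
# Hypothetical helper: for every *.cpp file below the directory in $1, print
# "path:count" with its number of blank lines, followed by a total.
count_blank_tree() {
    # /dev/null is passed as an extra file so that grep always prefixes each
    # count with the file name, even when find passes only one file.
    find "$1" -type f -name '*.cpp' \
        -exec grep -c -v '[^[:space:]]' /dev/null {} + |
        awk -F: '$1 != "/dev/null" { print; total += $NF }
                 END { print "Total: " (total + 0) }'
}

# Possible usage:
#   count_blank_tree .
```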
