Change specific column if equal to something AWK - unix

I have a list of search phrases. I'm trying to replace the third column with N/A if it is equal to "error". I used the following code to successfully do this on the second column, so I'm not sure why it isn't working on the third column. Any thoughts?
data
protector new ipad,0,error
60 led lcd television,0,error
boost mobile new phone 2013,0,error
seagate st320014a,0,error
awk -F, '{$3=($3=="error"?"N/A":$3)}1' OFS=, nTotal.csv > n3Total.csv

Your awk command is correct (not efficient, but correct). However, there could be other reasons why it may not be working.
There could be a trailing space after error. Since you are testing for an exact match on the third column, the comparison would fail.
Your files may be DOS-formatted. If you made the files on a Windows machine and are using them on a Unix/Linux machine, you need to convert the line endings. Running cat -vet will show ^M characters at the end of each line. You can use dos2unix or a similar utility to convert the files to Unix format.
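If you cannot convert the files, you can also strip a trailing carriage return and any trailing whitespace inside awk itself; a minimal sketch, using the same file names as above:
awk -F, '{sub(/[ \t\r]+$/,"")} $3=="error"{$3="N/A"}1' OFS=, nTotal.csv > n3Total.csv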

Try this awk
awk -F, '$3=="error" {$3="N/A"}1' OFS=, file
data
protector new ipad,0,N/A
60 led lcd television,0,N/A
boost mobile new phone 2013,0,N/A
seagate st320014a,0,N/A
Your solution gives:
data,,
protector new ipad,0,N/A
60 led lcd television,0,N/A
boost mobile new phone 2013,0,N/A
seagate st320014a,0,N/A
You see the extra ,, after data because your version assigns $3 on every line; on the header line, which has only one field, that assignment creates two extra empty fields.

Related

UNIX matching pattern and extraction

I'm new to unix. I have a file which has network connection details. I am trying to extract only the hostname and port number from the file using a shell script. The data is like this: "(example.easyway.com=(description=(address_list=(protocol=tcp)(host=184.43.35.345)(port=1234))(connect=port))"
I have 100 lines of connection information like this. I have to extract only the host name and port and paste them into a new file. Can anyone guide me on how to do this?
There are different ways in Unix to do this, something like
sed 's/^..\([^=]*\)=.*port=\([^)]*\).*/\1 \2/' file
If that is hard to follow and you want something easier for now, you can try it in steps, checking after each step:
cut -d= -f1,7 file | cut -d")" -f1 | cut -c2-
The easiest way, when you are unfamiliar with these tools, is to open the file in an editor, globally replace the string =(description=(address_list=(protocol=tcp)(host= with a space (or use regular expressions in your editor), do the same for ))(connect=port)), and sit down for 10 minutes to edit the remaining part of the 100 lines.
That looks like Oracle TNS configuration to me. Presuming that host always comes before port, this call out to Perl would do the trick:
perl -ne 'print "$1:$2\n" if(/host=([\w\.-]+).*port=(\d+)/)' < my-tns-config.txt
If the order of port and host is unpredictable, use a second branch for the reversed case (a single alternation would misnumber the capture groups):
perl -ne 'if (/host=([\w.-]+).*port=(\d+)/) { print "$1:$2\n" } elsif (/port=(\d+).*host=([\w.-]+)/) { print "$2:$1\n" }' < my-tns-config.txt
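On the sample line from the question, either variant prints 184.43.35.345:1234.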
Check https://regex101.com/ or https://regexper.com for an explanation of those regular expressions.

Different results from awk and nawk

I just ran these two commands on a file having around 250 million records.
awk '{if(substr($0,472,1)=="9") print $0}' < file1.txt >> file2.txt
and
nawk '{if(substr($0,472,1)=="9") print $0}' < file1.txt >> file2.txt
The record length is 482. The first command gave the correct number of records in file2.txt, i.e. 60 million, but the nawk command gives only 4.2 million.
I am confused and would like to know if someone has come across an issue like this. How exactly is this simple command treated differently internally? Is there a buffer which can hold only up to a certain number of bytes when using nawk?
I would appreciate it if someone could throw some light on this.
My OS details are
SunOS <hostname> 5.10 Generic_147148-26 i86pc i386 i86pc
The difference probably lies in the record-size limit of nawk. One of the records (lines) in your input file has probably exceeded it.
This crucial line can be found in awk.h:
#define RECSIZE (8 * 1024) /* sets limit on records, fields, etc., etc. */
Your command can be reduced to just this:
awk 'substr($0,472,1)==9'
On Solaris (which you are on), running awk by default gives you the old, broken awk (/usr/bin/awk), so I suspect that nawk is the one producing the correct result.
Run /usr/xpg4/bin/awk with the same script/arguments and see which of your other results its output agrees with.
Also, check whether your input file was created on Windows by running dos2unix on it and seeing if its size changes; if so, re-run your awk commands on the modified file. If it was created on Windows then it will have some control-Ms in there that could be causing chaos.
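As a quick sanity check, something along these lines (using the file names from the question; tr is used here instead of dos2unix since it is available everywhere) would tell you whether line endings are the culprit:
head -1 file1.txt | cat -vet                # a trailing ^M means DOS line endings
tr -d '\r' < file1.txt > file1.unix.txt     # strip carriage returns
/usr/xpg4/bin/awk 'substr($0,472,1)=="9"' file1.unix.txt | wc -l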

Converting from 16-bit WAV to GSM using SOX

I'm currently working on some telephony integration with Asterisk and a PHP web interface. I want to give the user an option to upload their own custom greeting in a wav file, and then once it's on the server convert the wav to a gsm file at 8000 Hz. Currently, I'm trying to use sox to accomplish this.
However, it seems like when I convert anything other than an 8 kHz wav to gsm, the gsm file is severely distorted. It's almost like it slows down the file by a factor of 10 (a 3 second intro in wav format turns into a 30 second gsm file). I've tried several combinations of speed and resampling to no avail. Ideally, I would like to take any wav file that's uploaded and convert it, without putting too much responsibility on the user to encode it properly. I'm definitely not an audiophile, so if anybody could point me in the right direction it would be much appreciated.
This is the command that I use to convert regular 16-bit .wav files to 8-bit mono .gsm files (works fine):
sox input.wav -r 8000 -c1 output.gsm lowpass 4000 compand 0.02,0.05 -60,-60,-30,-10,-20,-8,-5,-8,-2,-8 -8 -7 0.05
I have seen cases with sox where I needed to break up changes and pipe them one after another rather than doing it in one command.
What does your sox cmd look like?
Could you first convert the wav to 8khz, then transcode, piping the output from the one sox call to the other?
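A minimal sketch of that idea (untested; file names assumed), letting the first sox do the rate and channel conversion and the second do the GSM encoding:
sox input.wav -r 8000 -c1 -t wav - | sox -t wav - output.gsm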
I use
sox foo.wav -r 8000 -c1 foo.gsm resample -ql
Just a little late, but I currently use:
sox somefile.wav -r 8000 -c1 output.gsm

Remove lines which are between given patterns from a file (using Unix tools)

I have a text file (more correctly, a “German style” CSV file, i.e. semicolon-separated, decimal comma) which has a date and the value of a measurement on each line.
There are stretches of faulty values which I want to remove before further work. I'd like to store these cuts in some script so that my corrections are documented and I can replay those corrections if necessary.
The lines look like this:
28.01.2005 14:48:38;5,166
28.01.2005 14:50:38;2,916
28.01.2005 14:52:38;0,000
28.01.2005 14:54:38;0,000
(long stretch of values that should be removed; could also be something else beside 0)
01.02.2005 00:11:43;0,000
01.02.2005 00:13:43;1,333
01.02.2005 00:15:43;3,250
Now I'd like to store a list of begin and end patterns like 28.01.2005 14:52:38 + 01.02.2005 00:11:43, and the script would cut the lines matching these begin/end pairs and everything that's between them.
I'm thinking about hacking an awk script, but perhaps I'm missing an already existing tool.
Have a look at sed:
sed '/start_pat/,/end_pat/d'
will delete lines between start_pat and end_pat (inclusive).
To delete multiple such pairs, you can combine them with multiple -e options:
sed -e '/s1/,/e1/d' -e '/s2/,/e2/d' -e '/s3/,/e3/d' ...
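Applied to the sample data above (file names assumed), with the dots escaped because they are regex metacharacters, that would be:
sed '/^28\.01\.2005 14:52:38/,/^01\.02\.2005 00:11:43/d' measurements.csv > cleaned.csv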
Firstly, why do you need to keep a record of what you have done? Why not keep a backup of the original file, or take a diff between the old & new files, or put it under source control?
For the actual changes I suggest using Vim.
The Vim :global command (abbreviated to :g) can be used to run :ex commands on lines that match a regex. This is in many ways more powerful than awk since the commands can then refer to ranges relative to the matching line, plus you have the full text processing power of Vim at your disposal.
For example, this will do something close to what you want (untested, so caveat emptor):
:g!/^\d\d\.\d\d\.\d\d\d\d/ -1 write >> tmp.txt | delete
This matches lines that do NOT start with a date (the ! negates the match), appends the previous line to the file tmp.txt, then deletes the current line.
You will probably end up with duplicate lines in tmp.txt, but they can be removed by running the file through uniq.
You can also use awk; to delete the range rather than print it:
awk '/start/,/end/{next}1' file
I would seriously suggest learning the basics of perl (i.e. not the OO stuff). It will repay you in bucket-loads.
It is fast and simple to write a bit of perl to do this (and many other such tasks) once you have grasped the fundamentals, which, if you are used to using awk, sed, grep, etc., are pretty simple.
You won't have to remember how to use lots of different tools and where you would previously have used multiple tools piped together to solve a problem, you can just use a single perl script (usually much faster to execute).
And, perl is installed on virtually every unix/linux distro now.
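For this particular task, a Perl one-liner equivalent to the sed range delete (patterns assumed) would be:
perl -ne 'print unless /start_pat/../end_pat/' file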
(that sed is neat though :-)
Use grep -v (print non-matching lines).
Sorry - thought you just wanted lines without 0,000 at the end.

Compare two files by key and replace or append records - unix script

I have two files ...
file1:
002009092312291100098420090922111
010555101070002956200453T+00001190.81+00001295.920010.87P
010555101070002956200449J+00003128.85+00003693.90+00003128
010555101070002956200176H+00000281.14+00000300.32+00000281
file2:
002009092410521000098420090709111
010560458520002547500432M+00001822.88+00001592.96+00001822
010560458520002547500432D+00000106.68+00000114.77+00000106
In both files, in every record starting with 01, the string from the 3rd to the 25th character, i.e. up to the letter, is the key.
Based on this key, I have to compare the two files; if a record in file2 matches one in file1, I have to replace that record in file1, otherwise I have to append it.
Well, this is a fairly unspecific (and basic) programming question. We'll be better able to help you if you explain exactly what you did and where you got stuck.
Also, it looks a bit like homework, and people are wary of giving too much help on homework problems, as it might look like cheating.
To get you started:
I'd recommend Perl to solve this, but awk or another scripting language will also do. I'd recommend against sh/bash, as they are weak on text manipulation; also combining grep et al will become rather cumbersome.
First write a Perl program that filters records starting with 01. Then extract the key and put it into a hash (a Perl structure). Then output a new, combined file as required.
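A rough sketch of that outline, assuming the files are named file1 and file2 as above, that the key is characters 3-25 of each 01 record, and writing to an assumed merged.txt:
perl -e '
    # load file2: key => full record, for records starting with 01
    open my $f2, "<", "file2" or die $!;
    while (<$f2>) { $new{substr($_, 2, 23)} = $_ if /^01/ }
    # pass file1 through, swapping in the file2 version where the key matches
    open my $f1, "<", "file1" or die $!;
    while (<$f1>) {
        if (/^01/ && exists $new{substr($_, 2, 23)}) {
            print delete $new{substr($_, 2, 23)};
        } else {
            print;
        }
    }
    # append the file2 records whose key never matched
    print values %new;
' > merged.txt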
Using awk, extract the key (characters 3 to 25) from the 01 records, doing something like
awk '/^01/' file_name | cut -c3-25
for both files, collect the keys into two different buffers, and compare the buffers with a for loop in a shell script.
Whenever a key in the second buffer matches one in the first, grep for that line in the first file and replace it with the corresponding line from the second file. I think you need to work a bit on the logic.
