Unix command to cut file and recreate new one [closed]

I have an input.txt which contains data like this:
123
1234
1223
I want to convert it to another file, output.txt, which should look like this:
'123','1234','1223'
Can someone please tell me how this can be done in Unix?

You can try this:
tr -s '\n' < input.txt | sed "s/.*/'&'/g" | tr '\n' ',' | sed 's/,$//g' > output.txt

I'm afraid I can't help you with bash. Try this in Python:
InputFilepath = "/path/to/input.txt"
OutputFilepath = "/path/to/output.txt"
with open(InputFilepath, "r") as f:
    words = f.read().splitlines()  # one entry per line, newline characters stripped
result = ",".join("'" + w + "'" for w in words if w)  # quote each entry; skip blank lines
with open(OutputFilepath, "w") as g:
    g.write(result + "\n")

I bet there is a cleaner way to do this but can't think of it so far.
sed "/^[ \t]*$/d; s/\(.*\)/'\1'/" input.txt | tr "\n" "," | sed 's/,$//'
The pipeline works in four steps:
1. remove blank lines (including lines containing only spaces/tabs)
2. add single quotes around each line
3. replace each newline with a comma
4. remove the trailing comma

You could use sed
cat input.txt | sed -n "s!\(.*\)!'\1'!;H;\$!b;x;s!^\n!!;s!\n!,!g;p"
Read each line in (-n turns off automatic printing), and then append it to the hold space H - then stop for all lines except the last one (\$!b).
On the last line - copy the hold space into the pattern space x, ditch the first newline (the hold space has a newline in it to start with), and then replace the remaining newlines with ','. Finally print out the pattern space p.
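For readability, the same logic can go in a sed script file; here is a commented sketch (the file name join.sed is arbitrary), run as sed -n -f join.sed input.txt:
# wrap the current line in single quotes
s!\(.*\)!'\1'!
# append the pattern space to the hold space
H
# if this is not the last line, start the next cycle here
$!b
# last line: swap the hold space into the pattern space
x
# drop the leading newline (the hold space starts with one)
s!^\n!!
# join the remaining lines with commas
s!\n!,!g
# print the result (automatic printing is off because of -n)
p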
You could use a perl script
#!/usr/bin/perl
my @lines = <>;
chomp(@lines);
print join(',', map { "'$_'" } @lines), "\n";
./script input.txt

Here is an awk version
awk 'NF{s=s q$0q","} END {sub(/,$/,x,s);print s}' q="'" file
'123','1234','1223'
How it works:
awk '
NF {                # when the line is not blank:
    s=s q$0q","     #   append the line wrapped in single quotes, plus a comma
}
END {               # end block:
    sub(/,$/,x,s)   #   remove the last comma (x is an unset, hence empty, variable)
    print s         #   print the result
}
' q="'" file        # q="'" lets awk print a single quote; file is the input
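An alternative to passing q="'" in from outside (just a variation, not from the original answer) is the octal escape \047, which awk expands to a single quote inside a string:
awk 'NF{s=s "\047" $0 "\047,"} END {sub(/,$/,"",s); print s}' file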

With GNU awk for a multi-char RS:
$ awk -v RS='\n+$' -v FS='\n+' -v OFS="','" -v q="'" '{$1=$1; print q $0 q }' file
'123','1234','1223'
It reads the whole file as one record (RS='\n+$'), using sequences of contiguous newlines as the input field separator (FS='\n+'). It then rebuilds the record with ',' as the output field separator (OFS="','") by assigning a field to itself ($1=$1), and prints the result with a ' at the front and back.

Linux - Get Substring from 1st occurrence of character

FILE1.TXT
0020220101
or
01 20220101
I need to extract the date part from the file, where the date starts with 2.
Options tried:
t_FILE_DT1='awk -F"2" '{PRINT $NF}' FILE1.TXT'
t_FILE_DT2='cut -d'2' -f2- FILE1.TXT'
echo "$t_FILE_DT1"
echo "$t_FILE_DT2"
1st output : 0101
2nd output : 0220101
Expected Output: 20220101
I'm new to Linux scripting. Could someone help point out where I'm going wrong?
Use grep like so:
echo "0020220101\n01 20220101" | grep -P -o '\d{8}\b'
20220101
20220101
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
grep manual
perlre - Perl regular expressions
Using any awk:
$ awk '{print substr($0,length()-7)}' file
20220101
20220101
The above was run on this input file:
$ cat file
0020220101
01 20220101
Regarding PRINT $NF in your question - PRINT != print. Get out of the habit of using all-caps unless you're writing Cobol. See correct-bash-and-shell-script-variable-capitalization for some reasons.
The 2 in your scripts is telling awk and cut to use the character 2 as the field separator, so each will carve up the input into substrings everywhere a 2 occurs.
The 's in your question are single quotes used to make strings literal, you were intending to use backticks, `cmd`, but those are deprecated in favor of $(cmd) anyway.
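Putting those fixes together, the assignment was presumably meant to look something like this (a sketch reusing the awk answer above):
t_FILE_DT1=$(awk '{print substr($0,length()-7)}' FILE1.TXT)
echo "$t_FILE_DT1"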
Instead of looking for what comes after the 2 (and having to worry about whether there is a space involved as well), think about extracting the last 8 characters, which you know for a fact is your date.
input="/path/to/txt/file/FILE1.TXT"
while IFS= read -r line
do
# read in the last 8 characters of $line .. You KNOW this is the date ..
# No need to worry about exact matching at that point, or spaces ..
myDate=${line: -8}
echo "$myDate"
done < "$input"
About the cut and awk commands that you tried:
Using awk -F"2" '{PRINT $NF}' file will set the field separator to 2, and $NF is the last field, so printing the value of the last field is 0101
Using cut -d'2' -f2- file uses a delimiter of 2 as well, and then print all fields starting at the second field, which is 0220101
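To see that splitting concretely, print each field of the first sample line (a quick demonstration, not part of any solution):
$ echo 0020220101 | awk -F2 '{for (i=1; i<=NF; i++) print i, "[" $i "]"}'
1 [00]
2 [0]
3 []
4 [0101]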
If you want to match the 2 followed by 7 digits until the end of the string:
awk '
match($0, /2[0-9]{7}$/) {
    print substr($0, RSTART, RLENGTH)
}
' file
Output
20220101
The accepted answer shows how to extract the last eight digits, but that's not what you asked.
grep -o '2.*' file
will extract from the first occurrence of 2, and
grep -o '2[0-9]*' file
will extract all the digits after every occurrence of 2. If you specifically want eight digits, try
grep -Eo '2[0-9]{7}'
maybe also with a -w option if you want to only accept a match between two word boundaries. If you specifically want only digits after the first occurrence of 2, maybe try
sed -n 's/[^2]*\(2[0-9]*\).*/\1/p' file

Need help parsing a file via UNIX commands

I have a file that has lines that look like this
LINEID1:FIELD1=ABCD,&FIELD2-0&FIELD3-1&FIELD4-0&FIELD9-0;
LINEID2:FIELD1=ABCD,&FIELD5-1&FIELD6-0;
LINEID3:FIELD1=ABCD,&FIELD7-0&FIELD8-0;
LINEID1:FIELD1=XYZ,&FIELD2-0&FIELD3-1&FIELD9-0
LINEID3:FIELD1=XYZ,&FIELD7-0&FIELD8-0;
LINEID1:FIELD1=PQRS,&FIELD3-1&FIELD4-0&FIELD9-0;
LINEID2:FIELD1=PQRS,&FIELD5-1&FIELD6-0;
LINEID3:FIELD1=PQRS,&FIELD7-0&FIELD8-0;
I'm interested in only the lines that begin with LINEID1, and only some elements (FIELD1, FIELD2, FIELD4 and FIELD9) from those lines. The output should look like this (no & signs; they can be replaced with |):
FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ|FIELD2-0|FIELD9-0;
FIELD1=PQRS|FIELD4-0|FIELD9-0;
If additional information is required, do let me know and I'll post it in edits. Thanks!!
This is not exactly what you asked for, but no one else is answering and it is close enough for you to get started with!
awk -F'[&:]' '/^LINEID1:/{print $2,$3,$5,$6}' OFS='|' file
Output
FIELD1=ABCD,|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ,|FIELD2-0|FIELD9-0|
FIELD1=PQRS,|FIELD3-1|FIELD9-0;|
The -F sets the Input Field Separator to colon or ampersand. Then it looks for lines starting LINEID1: and prints the fields you need. The OFS sets the Output Field Separator to the pipe symbol |.
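A variation on the same idea that copes with missing fields, looping over whatever fields are present instead of printing fixed positions (a sketch, not part of the original answer):
awk -F'[&:]' '/^LINEID1:/ {
    out = ""
    for (i = 2; i <= NF; i++) {
        sub(/,$/, "", $i)             # drop the comma left over after FIELD1=...
        if ($i ~ /^FIELD[1249][-=]/)  # keep only FIELD1, FIELD2, FIELD4, FIELD9
            out = (out == "" ? $i : out "|" $i)
    }
    print out
}' file
On the first sample line this yields FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;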
Pure awk:
awk -F ":" ' /LINEID1[^0-9]/{gsub(/FIELD[^1249]+[-=][A-Z0-9]+/,"",$2); gsub(/,*&+/,"|",$2); print $2} ' file
Updated to give proper formatting and to omit LINEID11, etc...
Output:
FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ|FIELD2-0|FIELD9-0
FIELD1=PQRS|FIELD4-0|FIELD9-0;
Explanation:
awk -F ":" - split lines into LHS ($1) and RHS ($2) since output only requires RHS
/LINEID1[^0-9]/ - match only lines containing LINEID1, ignoring LINEID11, LINEID100, etc.
gsub(/FIELD[^1249]+[-=][A-Z0-9]+/,"",$2) - remove all fields that aren't 1, 4 or 9 on the RHS
gsub(/,*&+/,"|",$2) - clean up the leftover delimiters on the RHS
To select rows from data with Unix command lines, use grep, awk, perl, python, or ruby (in increasing order of power & possible complexity).
To select columns from data, use cut, awk, or one of the previously mentioned scripting languages.
First, let's get only the lines with LINEID1 (assuming the input is in a file called input).
grep '^LINEID1' input
will output all the lines beginning with LINEID1.
Next, extract the columns we care about:
grep '^LINEID1' input | # extract lines with LINEID1 in them
cut -d: -f2 | # extract column 2 (after ':')
tr ',&' '\n\n' | # turn ',' and '&' into newlines
egrep 'FIELD[1249]' | # extract only fields FIELD1, FIELD2, FIELD4, FIELD9
tr '\n' '|' | # turn newlines into '|'
sed -e $'s/\\|\\(FIELD1\\)/\\\n\\1/g' -e 's/\|$//'
The last line inserts newlines in front of the FIELD1 lines, and removes any trailing '|'.
That last sed pattern is a little more challenging because sed doesn't like literal newlines in its replacement patterns. To put a literal newline, a bash escape needs to be used, which then requires escapes throughout that string.
Here's the output from the above command:
FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ|FIELD2-0|FIELD9-0
FIELD1=PQRS|FIELD4-0|FIELD9-0;
This command took only a couple of minutes to cobble up.
Even so, it's bordering on the complexity threshold where I would shift to perl or ruby because of their excellent string processing.
The same script in ruby might look like:
#!/usr/bin/env ruby
#
while line = gets do
  if line.chomp =~ /^LINEID1:(.*)$/
    f1, others = $1.split(',')
    fields = others.split('&').map {|f| f if f =~ /FIELD[1249]/}.compact
    puts [f1, fields].flatten.join("|")
  end
end
Run this script on the same input file and the same output as above will occur:
$ ./parse-fields.rb < input
FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ|FIELD2-0|FIELD9-0
FIELD1=PQRS|FIELD4-0|FIELD9-0;

Replacement by dictionary possible with AWK or Sed? [closed]

You have a dictionary, Dictionary.txt, and an input file, inFile.txt. The dictionary tells you about possible translations. The solution to a similar problem in unix shell: replace by dictionary seems to hardcode things that I cannot fully understand. You may come up with a better replacement technique than a dictionary, but the AWK/sed script should be able to read in multiple files; in the simplest case, one dictionary file and one input file.
How to replace elegantly by dictionary with AWK or Sed?
Example
Dictionary.txt
1 one
2 two
3 three
four fyra
five fem
inFile.txt
one 1 hello hallo 2 three hallo five five
Desired output, for a command of the form awk/sed {} Dictionary.txt inFile.txt:
one one hello hallo two three hallo fem fem
An AWK example where I specifically selected the replacements, but the one-to-one replacements are not working:
awk 'BEGIN {
    lvl[1] = "one"
    lvl[2] = "two"
    lvl[3] = "three"
    # TODO: this does not work
    # lvl[four] = "fyra"
    # lvl[five] = "fem"
    # lvl[one] = "one"
    # lvl["hello"] = "hello"
    # lvl[hallo] = "hallo"
    # lvl[three] = "three"
}
NR == FNR {
    evt[$1] = $2; next
}
{
    print $1, evt[$2], $3, $4, evt[$5], $6, $7, evt[$8], evt[$9]
    # TODO: this does not work, e.g. one-to-one mapping
    # print evt[$1], evt[$2], evt[$3], evt[$4], evt[$5], evt[$6], evt[$7], evt[$8], evt[$9]
}' dictionary.txt infile.txt
$ awk 'NR==FNR{map[$1]=$2;next} { for (i=1;i<=NF;i++) $i=($i in map ? map[$i] : $i) } 1' fileA fileB
one one hello hallo two three hallo fem fem
Note that it will compress any chains of contiguous white space to a single blank char. Tell us if that is an issue.
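Spelled out with comments (same logic, just expanded; Dictionary.txt and inFile.txt stand in for fileA and fileB):
awk '
NR == FNR {                    # first file (the dictionary):
    map[$1] = $2               #   remember the translation for each left-hand word
    next
}
{                              # second file (the input):
    for (i = 1; i <= NF; i++)  #   swap in the translation for every field that has one
        $i = ($i in map ? map[$i] : $i)
}
1                              # awk shorthand for: print the (rebuilt) line
' Dictionary.txt inFile.txt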
If you have GNU sed, it supports a script file with -f:
`-f SCRIPT-FILE'
`--file=SCRIPT-FILE'
Add the commands contained in the file SCRIPT-FILE to the set of
commands to be run while processing the input.
You could write your substitutions in "c.sed", for example, then
sed -f c.sed file
example c.sed:
s/1/one/g
s/2/two/g
...
EDIT
You didn't originally tag the question with awk; sure, the awk one-liner would be simpler (with your example):
awk '$1=$2' file
test:
kent$ echo "1 one
2 two
3 three
four fyra
five fem"|awk '$1=$2'
one one
two two
three three
fyra fyra
fem fem
EDIT
This answers the original post; it doesn't answer the multiple-times-edited and restructured question...
On top of that I got a -1 from the OP who asked this question... Damn!
Yes, much simpler in awk:
This will print the second column in place of both columns:
awk '{print $2, $2}' file
If you want to swap the first and second columns:
awk '{print $2, $1}' file
If ReplaceLeftWithRight_where_you_do_not_replace_things.txt contains pairs of string replacements, where any occurrence of the text in the first column should be replaced by the second column,
1 one
2 two
3 three
four fyra
five fem
then this can trivially be expressed as a sed script.
s/1/one/g
s/2/two/g
s/3/three/g
s/four/fyra/g
s/five/fem/g
and you can trivially use sed to create this sed script:
sed 's%.*%s/&/g%;s% %/%' ReplaceLeftWithRight_where_you_do_not_replace_things.txt
then pass the output of that to a second instance of sed:
sed 's%.*%s/&/g%;s% %/%' ReplaceLeftWithRight_where_you_do_not_replace_things.txt |
sed -f - someFile_Where_You_Replace_Things.txt
to replace all the matches in the file someFile_Where_You_Replace_Things.txt and have the output printed to standard output.
Sadly, not all sed dialects support the -f - option to read a script from standard input, but this should work at least on most Linuxes.
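Where -f - is unavailable, a temporary script file does the same job (a sketch; mktemp supplies the scratch file):
script=$(mktemp) &&
sed 's%.*%s/&/g%;s% %/%' ReplaceLeftWithRight_where_you_do_not_replace_things.txt > "$script" &&
sed -f "$script" someFile_Where_You_Replace_Things.txt
rm -f "$script"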
Sorry if I misunderstood your problem statement.

UNIX: Replace Newline w/ Colon, Preserving Newline Before EOF

I have a text file ("INPUT.txt") of the format:
A<LF>
B<LF>
C<LF>
D<LF>
X<LF>
Y<LF>
Z<LF>
<EOF>
which I need to reformat to:
A:B:C:D:X:Y:Z<LF>
<EOF>
I know you can do this with 'sed'. There's a billion google hits for doing this with 'sed'. But I'm trying to emphasize readability, simplicity, and using the correct tool for the correct job. 'sed' is a line editor that consumes and hides newlines. Probably not the right tool for this job!
I think the correct tool for this job would be 'tr'. I can replace all the newlines with colons with the command:
cat INPUT.txt | tr '\n' ':'
There's 99% of my work done. I have a problem now, though. By replacing all the newlines with colons, I not only get an extraneous colon at the end of the sequence, but I also lose the newline at the end of the input. It looks like this:
A:B:C:D:X:Y:Z:<EOF>
Now, I need to remove the colon from the end of the input. However, if I attempt to pass this processed input through 'sed' to remove the final colon (which would now, I think, be a proper use of 'sed'), I find myself with a second problem. The input is no longer terminated by a newline at all! 'sed' fails outright, for all commands, because it never finds the end of the first line of input!
It seems like appending a newline to the end of some input is a very, very common task, and considering I myself was just sorely tempted to write a program to do it in C (which would take about eight lines of code), I can't imagine there's not already a very simple way to do this with the tools already available on any Linux system.
This should do the job (cat and echo are unnecessary):
tr '\n' ':' < INPUT.TXT | sed 's/:$/\n/'
Using only sed:
sed -n ':a; $ ! {N;ba}; s/\n/:/g;p' INPUT.TXT
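The same script spelled out in a commented file (a sketch; the commands are identical, with the file name colon.sed assumed), run as sed -n -f colon.sed INPUT.TXT:
:a
# if this is not the last line, append the next line and loop back to :a
$!{
N
ba
}
# the whole file is now in the pattern space; join the lines with colons
s/\n/:/g
p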
Bash without any externals:
string=($(<INPUT.TXT))    # read the file into an array, one element per word (here, per line)
string=${string[@]/%/:}   # append ':' to each element, joined into one string
string=${string//: /:}    # collapse the ': ' separators to ':'
string=${string%*:}       # strip the trailing ':'
echo "$string"
Using a loop in sh:
colon=''
while read -r line
do
string=$string$colon$line
colon=':'
done < INPUT.TXT
Using AWK:
awk '{a=a colon $0; colon=":"} END {print a}' INPUT.TXT
Or:
awk '{printf "%s%s", colon, $0; colon=":"} END {printf "\n"}' INPUT.TXT
Edit:
Here's another way in pure Bash:
string=($(<INPUT.TXT))
saveIFS=$IFS
IFS=':'
newstring="${string[*]}"   # ${string[*]} joins the elements with the first character of IFS
IFS=$saveIFS
echo "$newstring"
Edit 2:
Here's yet another way, which does use echo (head -c -1 is GNU-specific; it drops the final byte, i.e. the trailing colon, and echo supplies the newline):
echo "$(tr '\n' ':' < INPUT.TXT | head -c -1)"
Old question, but:
paste -sd: INPUT.txt
Here -s joins all input lines serially instead of pasting files in parallel, and -d: sets the delimiter to a colon; the trailing newline is preserved.
Here's yet another solution (it assumes a character set where ':' is octal 072, e.g. ASCII):
perl -l72 -pe '$\="\n" if eof' INPUT.TXT

How do I extract lines from a file using their line number on unix?

Using sed or similar how would you extract lines from a file? If I wanted lines 1, 5, 1010, 20503 from a file, how would I get these 4 lines?
What if I have a fairly large number of lines I need to extract?
If I had a file with 100 lines, each representing a line number that I wanted to extract from another file, how would I do that?
Something like "sed -n '1p;5p;1010p;20503p'. Execute the command "man sed" for details.
For your second question, I'd transform the input file into a bunch of sed(1) commands to print the lines I wanted.
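For example, turning each line number N into an Np command and feeding the result back into sed (a sketch; line_num_file is a hypothetical file of line numbers, and -f - requires GNU sed):
sed 's/$/p/' line_num_file | sed -n -f - data_file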
With awk it's as simple as:
awk 'NR==1 || NR==5 || NR==1010 || NR==20503' "file"
@OP, you can do this more easily and efficiently with awk. For your first question:
awk 'NR~/^(1|5|1010|20503)$/{print}' file
For the second question:
awk 'FNR==NR{a[$1];next}(FNR in a){print}' file_with_linenr file
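For example, with hypothetical files (file holds twenty numbered lines, file_with_linenr the wanted line numbers):
$ seq 20 > file
$ printf '1\n5\n7\n' > file_with_linenr
$ awk 'FNR==NR{a[$1];next}(FNR in a){print}' file_with_linenr file
1
5
7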
This ain't pretty and it could exceed command length limits under some circumstances*:
sed -n "$(while read a; do echo "${a}p;"; done < line_num_file)" data_file
Or its much slower but more attractive, and possibly more well-behaved, sibling:
while read a; do echo "${a}p;"; done < line_num_file | xargs -I{} sed -n \{\} data_file
A variation:
xargs -a line_num_file -I{} sed -n \{\}p\; data_file
You can speed up the xarg versions a little bit by adding the -P option with some large argument like, say, 83 or maybe 419 or even 1177, but 10 seems as good as any.
*xargs --show-limits </dev/null can be instructive
I'd investigate Perl, since it has the regexp facilities of sed plus the programming model surrounding it to allow you to read a file line by line, count the lines and extract according to what you want (including from a file of line numbers).
my $row = 1;
while (<STDIN>) {
    # the current line is in $_; check $row against a suitable list here
    $row++;
}
In Perl:
perl -ne 'print if $. =~ m/^(1|5|1010|20503)$/' file
