Looping through and combining two files in UNIX

This should be simple for those of you who have some programming knowledge... Unfortunately I don't.
I'm trying to iterate through a text file of image captions and add them as title tags to an html file. The image captions file has 105 captions (each is separated by a carriage return) and the gallery file has blank alt tags on each a tag (set up like alt="#"). The order of the captions corresponds with the order of the images in the html file.
So in other words... the pseudo code would be: "Loop through every line in captions.txt and for every alt="#" inside the gallery.html file, replace the # with the corresponding caption."
I'm on a Mac so I'd like to use UNIX.
Any help is greatly appreciated!
Thanks,
Mike

If all the alt="#" are on separate lines, you can use ed:
{
  while read cap
  do echo "/alt=\"#\"/ s//alt=\"$cap\"/"
  done < captions.txt
  echo wq
} | ed gallery.html
This assumes none of your captions contain a slash.
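If some captions might contain a slash (ed's s command also treats & and \ specially), one possible workaround, sketched here, is to escape those characters before generating each command:
{
  while read cap
  do
    # escape /, & and \ so ed's s command treats them literally
    esc=$(printf '%s\n' "$cap" | sed 's/[/&\\]/\\&/g')
    printf '%s\n' "/alt=\"#\"/ s//alt=\"$esc\"/"
  done < captions.txt
  echo wq
} | ed gallery.html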

There are many ways to accomplish this goal. awk is the classic text manipulation program. (Well, awk and sed, for different purposes, but sed won't help here.)
awk '
BEGIN {
    # the first argument is the captions file; remove it from ARGV
    # so awk does not also read it as ordinary input
    caps = ARGV[1]
    delete ARGV[1]
}
/#/ {
    # for each line containing a #, read the next caption
    # and substitute it for every #
    getline cap < caps
    gsub("#", cap)
}
{ print }
' captions.txt gallery.html
You could put it into a script to avoid having to type it more than once. Just start a plain text file with "#!/usr/bin/awk -f", put the "BEGIN ... { print }" below it, and give the file execute permissions.
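For example, the script file (the name addcaps.awk is just illustrative) would be:
#!/usr/bin/awk -f
BEGIN {
    caps = ARGV[1]
    delete ARGV[1]
}
/#/ {
    getline cap < caps
    gsub("#", cap)
}
{ print }
Make it executable and run it:
chmod +x addcaps.awk
./addcaps.awk captions.txt gallery.html > gallery_new.html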
This translates trivially into most scripting languages. Perl:
#!/usr/bin/perl -p
BEGIN { open CAPS, shift }
if (/#/) {
    chomp($cap = <CAPS>);
    s/#/$cap/g;
}
Almost the same in Ruby:
#!/usr/bin/ruby
caps = IO.readlines(ARGV.shift).each {|s| s.chomp!}
while gets
  $_.gsub!(/#/, caps.shift) if $_ =~ /#/
  print
end
And Python:
#!/usr/bin/python
import sys
caps = [s.strip() for s in file(sys.argv[1]).readlines()]
for f in [file(s, 'r') for s in sys.argv[2:]] or [sys.stdin]:
    for s in f:
        if s.find('#') >= 0:
            s = s.replace('#', caps.pop(0))
        print s,
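All three follow the same calling convention as the awk version: the captions file is the first argument and the gallery file(s) come after it (or on standard input, in the Python case). So a typical run, with a hypothetical script name, is:
./addcaps captions.txt gallery.html > gallery_new.html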

Related

Implement tr and sed functions in awk

I need to process a text file - a big CSV - to correct its format. This CSV has a field which contains XML data, formatted to be human readable: broken up into multiple lines and indented with spaces. I need to have every record on one line, so I am using awk to join lines, after that sed to get rid of extra spaces between XML tags, and after that tr to eliminate unwanted "\r" characters.
(The first field is always 8 numbers and the field separator is the pipe character: "|".)
The awk script is (join4.awk):
BEGIN {
    # initialise "line" variable. Maybe unnecessary
    line=""
}
{
    # check if this line is the beginning of a new record
    # (the pipe is bracketed so the regex treats it literally;
    # a bare | would act as alternation and match every line)
    if ( $0 ~ "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][|]" ) {
        # if it is a new record, then print stuff already collected,
        # then update line variable with $0
        print line
        line = $0
    } else {
        # if it is not, then just attach $0 to the line
        line = line $0
    }
}
END {
    # print out the last record kept in line variable
    if (line) print line
}
and the commandline is
cat inputdata.csv | awk -f join4.awk | tr -d "\r" | sed 's/> *</></g' > corrected_data.csv
My question is: is there an efficient way to implement the tr and sed functionality inside the awk script? This is not Linux, so I have no gawk, just plain old awk and nawk.
thanks,
--Trifo
tr -d "\r"
Is just gsub(/\r/, "").
sed 's/> *</></g'
That's just gsub(/> *</, "><")
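Folding both into join4.awk (a sketch; gsub exists in nawk, though it may be missing from the oldest awks) might look like:
{
    if ( $0 ~ "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][|]" ) {
        if (line) {
            gsub(/\r/, "", line)         # was: tr -d "\r"
            gsub(/> *</, "><", line)     # was: sed 's/> *</></g'
            print line
        }
        line = $0
    } else {
        line = line $0
    }
}
END {
    if (line) {
        gsub(/\r/, "", line)
        gsub(/> *</, "><", line)
        print line
    }
}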
mawk NF=NF RS='\r?\n' FS='> *<' OFS='><'
Thank you all folks!
You gave me the inspiration to get to a solution. It is like this:
BEGIN {
    # initialize "line" variable. Maybe unnecessary.
    line=""
}
{
    # if the line begins with 8 numbers and a pipe char (the format of the first field)...
    if ( $0 ~ "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][|]" ) {
        # ... then the previous record is ready. We can post-process it, then print it out.
        # workarounds for the missing gsub function:
        # remove extra \r characters
        while ( line ~ "\r" ) { sub( /\r/, "", line ) }
        # remove extra spaces between xml tags the same way:
        # "<text text> <tag tag>" should look like "<text text><tag tag>"
        while ( line ~ "> *<" ) { sub( /> *</, "><", line ) }
        # then print the record and update line var with the beginning of the new record
        print line
        line = $0
    } else {
        # just keep extending the record with the actual line
        line = line $0
    }
}
END {
    # print the last record kept in line var
    if (line) {
        while ( line ~ "\r" ) { sub( /\r/, "", line ) }
        while ( line ~ "> *<" ) { sub( /> *</, "><", line ) }
        print line
    }
}
And yes, it is efficient: the embedded version runs about 33% faster.
And yes, it would be nicer to put the post-processing of the "line" variable into a function; as it stands, the same code has to appear twice so the last record can be handled in the END section. But it works, it creates the same output as the chained commands, and it is way faster.
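Such a refactor might look like the following sketch (user-defined functions exist in nawk, though they may be missing from the oldest awks):
function clean(s) {
    # the same sub() workarounds as above, gathered in one place
    while ( s ~ "\r" )   { sub( /\r/, "", s ) }
    while ( s ~ "> *<" ) { sub( /> *</, "><", s ) }
    return s
}
{
    if ( $0 ~ "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][|]" ) {
        # skip the initial empty value of line
        if (line) print clean(line)
        line = $0
    } else {
        line = line $0
    }
}
END {
    if (line) print clean(line)
}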
So, thanks for the inspiration again!
--Trifo

Unix: Using filename from another file

A basic Unix question.
I have a script which counts the number of records in a delta file.
awk '{
    n++
} END {
    if(n >= 1000) print "${completeFile}"; else print "${deltaFile}";
}' <${deltaFile} >${fileToUse}
Then, depending on the IF condition, I want to process the appropriate file:
cut -c2-11 < ${fileToUse}
But how do I use the contents of the file as the filename itself?
And if there are any tweaks to be made, feel free.
Thanks in advance
Cheers
Simon
To use as a filename the contents of a file which is itself identified by a variable (as asked):
cut -c2-11 <"$( cat "$filetouse" )"
// or in zsh just
cut -c2-11 <"$( < "$filetouse" )"
That works unless the filename in the file ends with one or more newline character(s), which people rarely do because it's quite awkward and inconvenient; in that case use something like:
read -rdX var <"$filetouse"; cut -c2-11 < "${var%?}"
// where X is a character that doesn't occur in the filename
// maybe something like $'\x1f'
Tweaks: your awk prints the variable reference ${completeFile} or ${deltaFile} (because they're within the single-quoted awk script), not the value of either variable. If you actually want the value, as I'd expect from your description, you should pass the shell vars to awk vars like this:
awk -vf="$completeFile" -vd="$deltaFile" '{n++} END{if(n>=1000)print f; else print d}' <"$deltaFile"
# the " around $var can be omitted if the value contains no whitespace and no glob chars
# people _often_ but not always choose filenames that satisfy this
# and they must not contain backslash in any case
or export the shell vars as env vars (if they aren't already) and access them like
awk '{n++} END{if(n>=1000) print ENVIRON["completeFile"]; else print ENVIRON["deltaFile"]}' <"$deltaFile"
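For the ENVIRON route the variables must actually be exported; a minimal sketch, assuming they were plain shell variables so far:
export completeFile deltaFile
After that, ENVIRON["completeFile"] and ENVIRON["deltaFile"] are visible inside awk.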
Also, you don't need your own counter; awk already counts input records:
awk -vf=... -vd=... 'END{if(NR>=1000)print f;else print d}' <...
or more briefly
awk -vf=... -vd=... 'END{print (NR>=1000?f:d)}' <...
or using a file argument instead of redirection so the name is available to the script
awk -vf="$completeFile" 'END{print (NR>=1000?f:FILENAME)}' "$deltaFile" # no <
and barring trailing newlines as above you don't need an intermediate file at all, just
cut -c2-11 <"$( awk -vf="$completeFile" -'END{print (NR>=1000?f:FILENAME)}' "$deltaFile")"
Or you don't really need awk; wc can do the counting and any POSIX or classic shell can do the comparison:
if [ $(wc -l <"$deltaFile") -ge 1000 ]; then c="$completeFile"; else c="$deltaFile"; fi
cut -c2-11 <"$c"

How to remove leading and trailing " and leading and trailing spaces from each field of each row in ksh

I have many functions in ksh scripts (which use gawk a lot) that do many computations on files. The files are pipe delimited.
But now my source files have changed: each field in the file now comes within double quotes, as below.
Also, I have to trim the leading and trailing spaces or tabs if any.
Old_Myfile.txt
Name|Designation|emlid
Alex|Software Design Engg|E0023
Corner|SDE|E0056
New_Myfile.txt
"Name"|"Designation"|"emlid"
"Alex"|"Software Design Engg"|" E0023"
" Corner "|" SDE"|" E0056 "
Please suggest ways that will be compatible with my already written scripts.
With sed:
$ sed 's/ *" *//g' file
Name|Designation|emlid
Alex|Software Design Engg|E0023
Corner|SDE|E0056
This can be folded into the awk script as well, without the extra sed step.
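For instance, a standalone sketch of that (assuming, as in the sample data, that quotes never occur inside a field's text):
awk '{ gsub(/ *" */, "") } 1' New_Myfile.txt
The trailing 1 is the usual awk idiom for "print the (now modified) line".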
This script may be over-engineered for what you need, but it will operate on each field individually (within the for-loop), in case you need to add additional logic at a later time.
BEGIN{
    FS="|";
    OFS="|";
}
{
    for(i=1; i<=NF; i++){
        # strip the leading quote (and any spaces after it) and the
        # trailing quote (and any spaces before it) from this field
        gsub(/(^"[ ]*|[ ]*"$)/, "", $i);
        if (i == NF) {
            printf("%s\n", $i);
        }
        else {
            printf("%s%s", $i, OFS);
        }
    }
}
Here's the output
$ awk -f /tmp/script.awk </tmp/input.txt
Name|Designation|emlid
Alex|Software Design Engg|E0023
Corner|SDE|E0056
If your quoted fields cannot contain |s then within your existing awk script add this as the first line:
awk '
{ gsub(/[[:space:]]*"[[:space:]]*/,"") }
<existing script>
'

Split a line into multiple lines of 42 characters in Unix after the last given char

I have a text file in Unix made up of multiple long lines:
ALTER Tit como(titel('42423432;434235111;757567562;2354679;5543534;6547673;32322332;54545453'))
ALTER Mit como(Alt('432322;434434211;754324237562;2354679;5543534;6547673;32322332;54545453'))
I need to split each line into multiple lines of no more than 42 characters. Each split should come after the last ";" that fits, so my ideal output file will be:
ALTER Tit como(titel('42423432;434235111; -
757567562;2354679;5543534;6547673; -
32322332;54545453'))
ALTER Mit como(Alt('432322;434434211; -
754324237562;2354679;5543534;6547673; -
32322332;54545453'))
I used fold -w 42 givenfile.txt | sed 's/ $/ -/g'
It splits the lines but doesn't add the "-" at the end of each line and doesn't split after the ";".
Any help is much appreciated.
Thanks!
awk -F';' '
w {
    print ""
}
{
    w = length($1)
    printf "%s", $1
    for (i=2; i<=NF; i++) {
        if ((w + length($i) + 1) < 42) {
            w += length($i) + 1
            printf ";%s", $i
        } else {
            w = length($i)
            printf "; -\n%s", $i
        }
    }
}
END {
    print ""
}
' file
This produces the output:
ALTER Tit como(titel('42423432;434235111; -
757567562;2354679;5543534;6547673; -
32322332;54545453'))
ALTER Mit como(Alt('432322;434434211; -
754324237562;2354679;5543534;6547673; -
32322332;54545453'))
How it works
Awk implicitly loops through each line of its input and each line is divided into fields. This code uses a single variable w to keep track of the current width of the output line.
-F';'
Tell awk to break fields on semicolons.
`w{print""}
If the last line was not completed, w>0, then print a newline to terminate it before we start with a new line.
w=length($1); printf "%s",$1
Print the first field of the new line and set w according to its length.
Loop over the remaining fields:
for (i=2;i<=NF;i++){
if ((w+length($i)+1)<42){
w+=length($i)+1
printf";%s",$i
} else {
w=length($i)
printf"; -\n%s",$i
}
}
This loops over the second through final fields of the line. Whenever we reach the point where we can't print another field without exceeding the 42 character limit, we print "; -" and a newline, then start the next output line with the current field.
END{print""}
Print a newline at the end of the file.
This might work for you (GNU sed):
sed -r 's/.{1,42}$|.{1,41};/& -\n/g;s/...$//' file
This globally matches either 1 to 42 characters followed by the end of line, or 1 to 41 characters followed by a ;, and appends " -" plus a newline to each match. The last match also gets those three extra characters, which the final substitution deletes.

Using Vim, how can I make CSS rules into one liners?

I would like to come up with a Vim substitution command to turn multi-line CSS rules, like this one:
#main {
    padding: 0;
    margin: 10px auto;
}
into compacted single-line rules, like so:
#main {padding:0;margin:10px auto;}
I have a ton of CSS rules that are taking up too many lines, and I cannot figure out the :%s/ commands to use.
Here's a one-liner:
:%s/{\_.\{-}}/\=substitute(submatch(0), '\n', '', 'g')/
\_. matches any character, including a newline, and \{-} is the non-greedy version of *, so {\_.\{-}} matches everything between a matching pair of curly braces, inclusive.
The \= allows you to substitute the result of a vim expression, which we here use to strip out all the newlines '\n' from the matched text (in submatch(0)) using the substitute() function.
The inverse (converting the one-line version to multi-line) can also be done as a one liner:
:%s/{\_.\{-}}/\=substitute(submatch(0), '[{;]', '\0\r', 'g')/
If you are at the beginning or end of the rule, V%J will join it into a single line:
Go to the opening (or closing) brace
Hit V to enter visual mode
Hit % to match the other brace, selecting the whole rule
Hit J to join the lines
Try something like this:
:%s/{\n/{/g
:%s/;\n/;/g
:%s/{\s\+/{/g
:%s/;\s\+/;/g
This removes the newlines after opening braces and semicolons ('{' and ';') and then removes the extra whitespace between the concatenated lines.
If you want to change the file, go for rampion's solution.
If you don't want to (or can't) change the file, you can play with custom folding, since it lets you choose what to display for the folded text and how. For instance:
" {rtp}/fold/css-fold.vim
" [-- local settings --] {{{1
setlocal foldexpr=CssFold(v:lnum)
setlocal foldtext=CssFoldText()
let b:width1 = 20
let b:width2 = 15
nnoremap <buffer> + :let b:width2+=1<cr><c-l>
nnoremap <buffer> - :let b:width2-=1<cr><c-l>
" [-- global definitions --] {{{1
if exists('*CssFold')
setlocal foldmethod=expr
" finish
endif
function! CssFold(lnum)
let cline = getline(a:lnum)
if cline =~ '{\s*$'
return 'a1'
elseif cline =~ '}\s*$'
return 's1'
else
return '='
endif
endfunction
function! s:Complete(txt, width)
let length = strlen(a:txt)
if length > a:width
return a:txt
endif
return a:txt . repeat(' ', a:width - length)
endfunction
function! CssFoldText()
let lnum = v:foldstart
let txt = s:Complete(getline(lnum), b:width1)
let lnum += 1
while lnum < v:foldend
let add = s:Complete(substitute(getline(lnum), '^\s*\(\S\+\)\s*:\s*\(.\{-}\)\s*;\s*$', '\1: \2;', ''), b:width2)
if add !~ '^\s*$'
let txt .= ' ' . add
endif
let lnum += 1
endwhile
return txt. '}'
endfunction
I leave the sorting of the fields as an exercise. Hint: get all the lines between v:foldstart+1 and v:foldend into a List, sort the list, build the string, and that's all.
I won’t answer the question directly; instead I suggest you reconsider your needs. I think that your “bad” example is in fact the better one: it is more readable, and easier to modify and reason about. Good indentation is very important not only in programming languages, but also in CSS and HTML.
You mention that CSS rules are “taking up too many lines”. If you are worried about file size, you should consider using CSS and JS minifiers like YUI Compressor instead of making the code less readable.
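For instance, a typical YUI Compressor run looks something like this (the jar and file names here are illustrative):
java -jar yuicompressor-2.4.8.jar main.css -o main.min.css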
A convenient way of doing this transformation is to run the following short command:
:g/{/,/}/j
For every line containing a {, this joins all the lines from there through the next } into one.
Assuming runs of nonempty lines should be collapsed throughout the whole file: go to the first line of the file and use the command gqG to run the whole file through the formatter.
