Pipe a certain field of an input stream in a command, and paste back the results - unix

Say I have an input stream consisting of lines separated into a certain number of fields. I would like to cut on the various fields, pipe a certain field (or fields) to a program (which is assumed to return one line for each input line) and leave the other fields as is, and paste the results back together. I can probably imagine convoluted solutions, but there ought to be a clean and natural way to do that.
As a specific example, say I have a program producing lines of the form:
$ inputprog
<a> hello world!
<b> hi everyone!
<a> hi!
Say I would like to put the message in uppercase while leaving the first field unchanged. Here is how I would imagine things:
$ inputprog | program -d' ' -f2- "tr a-z A-Z"
<a> HELLO WORLD!
<b> HI EVERYONE!
<a> HI!
I am looking for a reasonably clean way to approximate program. (I am not interested in solutions which are specific to this example.)
Thanks in advance for your help!

awk can do what you want. For example:
$ echo "field1 field2" | awk '{$2 = toupper($2); print;}'
field1 FIELD2
This comes pretty close to what you want to do. $2 = toupper($2); changes the second field, while print prints out the whole (modified) line.
However, there is a problem with how you define a 'field'. In the example above, fields are separated by whitespace (you can change the field separator to an arbitrary regexp like so: -F'<[a-zA-Z]+>', which would treat a tag such as <a> as the field separator).
But in your example you seem to view <a> as one field and hello world! as another. With a space as the separator, a program could only arrive at that behaviour by guessing: why wouldn't world! be considered a third field?
So, if you can get input with a clear policy of separating fields, awk is exactly what you want.
Check out pages like http://people.cs.uu.nl/piet/docs/nawk/nawk_92.html (awk string functions) and http://www.pement.org/awk/awk1line.txt (awk 1 liners) for more information.
BTW, one could also make your specific example above work by looping over all the fields except the first one (NF == Number of Fields):
$ echo "<a> hello world!
<b> hi everyone!
<a> hi" |
awk '{for(i=2;i<=NF;++i) { $i=toupper($i); }; print;}'
<a> HELLO WORLD!
<b> HI EVERYONE!
<a> HI
Even though you are not interested in the solution to this example. ;-)
P.S.: sed should also be able to do the job (http://en.wikipedia.org/wiki/Sed)
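For the general case asked about in the question, here is a minimal Python sketch of the hypothetical program. The function name apply_to_fields and its parameters are made up for illustration; it assumes, as the question does, that the command prints exactly one output line per input line. It also reads all input before running the command, so it is not a streaming filter.

```python
import subprocess


def apply_to_fields(lines, command, delimiter=" ", first_field=2):
    """Pipe fields first_field.. of each line through command, keep the rest.

    Hypothetical helper: the name and parameters are illustrative only.
    Assumes the command prints exactly one output line per input line.
    """
    heads, tails = [], []
    for line in lines:
        # Split off the leading fields; everything after them stays in one piece.
        parts = line.rstrip("\n").split(delimiter, first_field - 1)
        heads.append(parts[:first_field - 1])
        # A line with too few fields sends an empty string to the command.
        tails.append(parts[first_field - 1] if len(parts) >= first_field else "")
    if not tails:
        return []
    result = subprocess.run(
        command,
        input="".join(tail + "\n" for tail in tails),
        capture_output=True,
        text=True,
        check=True,
    )
    processed = result.stdout.splitlines()
    if len(processed) != len(tails):
        raise ValueError("command did not return one line per input line")
    # Paste the untouched leading fields back in front of the processed part.
    return [delimiter.join(head + [new]) for head, new in zip(heads, processed)]
```

Calling apply_to_fields(sys.stdin, ["tr", "a-z", "A-Z"]) and printing the returned lines would roughly correspond to inputprog | program -d' ' -f2- "tr a-z A-Z" from the question.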

Related

filter that prints a value and then forwards the entire json document to the next filter, like tee

Is it possible to print parts of the document in a filter and then move on to select further down and print more later?
Here is pseudo code for what I want.
{
  "version": "1",
  "some": {
    "more": {
      "depth": "here"
    }
  }
}
select(.some.more.depth=="here") | tee .version | .some.more.depth
This would output
"1"
"here"
I know that in this case it would work with .version, .some.more.depth, but in a more complex case it's more about working down the document while printing parts along the way.
https://jqplay.org/s/n7WjphEVf7
Not just in this case, in any case. That's what the comma operator does, and what it's for. It runs two expressions in the same context and produces all of the outputs of both. Remember that you can always use parentheses, so it's legit to do things like
.a.b.c | (.d, (.e.f | (.g, .h)))
to produce .a.b.c.d, .a.b.c.e.f.g, and .a.b.c.e.f.h (the parentheses around .e.f | (.g, .h) are needed because the comma binds more tightly than the pipe).
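To make the "same context, all outputs of both" idea concrete, here is a toy Python model of the comma and pipe operators. It is only an illustration and not how jq is implemented; the helper names path, comma, pipe and select are chosen for this sketch. Each filter is modelled as a function that takes one value and yields zero or more outputs.

```python
def path(*keys):
    """Model of a path such as .some.more.depth: yields one value."""
    def run(value):
        for key in keys:
            value = value[key]
        yield value
    return run


def comma(*filters):
    """Model of f, g: run every filter on the same input, in order."""
    def run(value):
        for f in filters:
            yield from f(value)
    return run


def pipe(first, second):
    """Model of f | g: feed each output of first into second."""
    def run(value):
        for intermediate in first(value):
            yield from second(intermediate)
    return run


def select(predicate):
    """Model of select(...): pass the input through only if it matches."""
    def run(value):
        if predicate(value):
            yield value
    return run


doc = {"version": "1", "some": {"more": {"depth": "here"}}}

# Roughly: select(.some.more.depth == "here") | (.version, .some.more.depth)
program = pipe(
    select(lambda d: d["some"]["more"]["depth"] == "here"),
    comma(path("version"), path("some", "more", "depth")),
)
outputs = list(program(doc))
```

Here outputs holds the two values from the question, in the order the comma lists them.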

.ksh paste user input value into dataset

Good morning.
First things first: I know next to nothing about shell scripting in Unix, so please pardon my naivety.
Here's what I'd like to do, and I think it's relatively simple: I would like to create a .ksh file to do two things: 1) take a user-provided numerical value (argument) and paste it into a new column at the end of a dataset (a separate .txt file), and 2) execute a different .ksh script.
I envision calling this script at the Unix prompt, with the input value added thereafter. Something like, "paste_and_run.ksh 58", where 58 would populate a new, final (un-headered) column in an existing dataset (specifically, it'd populate the 77th column).
To be perfectly honest, I'm not even sure where to start with this, so any input would be very appreciated. Apologies for the lack of code within the question. Please let me know if I can offer any more detail, and thank you for taking a look.
I have found the answer: the "nawk" command.
TheNumber=$3
PE_Infile=$1
Where the above variables correspond to the third and first arguments from the command line, respectively. "PE_Infile" represents the file (with full path) to be manipulated, and "TheNumber" represents the number to populate the final column. Then:
nawk -F"|" -v TheNewNumber="$TheNumber" '{print $0 "|" TheNewNumber/10000}' "$PE_Infile" > "$BinFolder/Temp_Input.txt"
Here, the -F"|" sets the field delimiter, and the -v passes the shell value into the awk program as an awk variable (TheNewNumber). That declaration is needed because shell variables are not expanded inside the single-quoted awk program, so $TheNumber cannot be used there directly. print $0 means that the whole line is printed, with the "|" symbol and the value of the command line input divided by 10000 tacked onto the end. Finally, we have the input file and an output file (Temp_Input.txt, within a path represented by the $BinFolder variable).
Running the desired script afterward was as simple as typing out the script name (with path), and adding corresponding arguments ($2 $3) afterward as needed, each separated by a space.
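For comparison, the same column-appending step can be sketched in Python. The function name append_column and its parameters are hypothetical; the "%.6g"-style formatting is used on the assumption that awk's default number-to-string conversion applies in the nawk command above.

```python
def append_column(lines, number, delimiter="|", divisor=10000):
    """Append number / divisor as a new last column to every line.

    Hypothetical helper mirroring: print $0 "|" TheNewNumber/10000
    """
    # Format once, the way awk's default conversion (%.6g) would.
    value = format(number / divisor, ".6g")
    return [line.rstrip("\n") + delimiter + value for line in lines]
```

For example, append_column(open(path), 58) would return every line of the file with |0.0058 added at the end, where path stands for whatever input file you use.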

Multiple lines of text in single cell of simple table?

I found this question, but I don't want explicit <br>s in my cell; I just want it to line-wrap where necessary.
e.g.,
================  ============
a short sentence  second cell
a much longer     bottom right
sentence
================  ============
I want "a much longer sentence" to all fit in one cell. I'd need to use very long lines of text unless I can find a way to wrap it. Is this possible?
I'm using NoTex w/ PDF output if relevant.
There is a clean way. The issue is by default the columns are set to no-wrap, so that's why you get the scroll. To fix that you have to override the css with the following:
/* override table no-wrap */
.wy-table-responsive table td, .wy-table-responsive table th {
    white-space: normal;
}
The simple table style does not support wrapping blocks. Use the grid style instead, like this:
+------------------+--------------+
| a short sentence | second cell  |
+------------------+--------------+
| a much longer    | bottom right |
| sentence         |              |
+------------------+--------------+
These tables are more tedious to work with, but they're more flexible. See the full documentation for details.
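Since grid tables are tedious to maintain by hand, one option is to generate them. The following is a minimal sketch using only the standard library; the function name grid_table and the fixed per-column widths are assumptions of this example, and it does not handle header rows or cell spans.

```python
import textwrap


def grid_table(rows, widths):
    """Render rows of cell strings as a reStructuredText grid table.

    Hypothetical helper: each cell is wrapped to its column's width.
    """
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    out = [border]
    for row in rows:
        # Wrap each cell; an empty cell still occupies one (blank) line.
        wrapped = [textwrap.wrap(cell, w) or [""] for cell, w in zip(row, widths)]
        height = max(len(cell_lines) for cell_lines in wrapped)
        for i in range(height):
            cells = [
                (cell_lines[i] if i < len(cell_lines) else "").ljust(w)
                for cell_lines, w in zip(wrapped, widths)
            ]
            out.append("| " + " | ".join(cells) + " |")
        out.append(border)
    return "\n".join(out)


table = grid_table(
    [["a short sentence", "second cell"],
     ["a much longer sentence", "bottom right"]],
    widths=[16, 12],
)
```

Printing table gives a grid table with the long sentence wrapped onto two lines inside its cell.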
A workaround for this problem is to use a replace directive:
================  ============
a short sentence  second cell
|long_sentence|   bottom right
================  ============

.. |long_sentence| replace:: a much longer sentence
The example ddbeck presented may work only because the sentence is too short. If the sentence is too long to fit on the screen, it will not continue on a new line; instead, the table will get a horizontal scrollbar. There is no clean way of solving this problem. You can use a pipe to force a line break, as you saw here.
If you want alternative, more practical ways to write your tables in reStructuredText, you can check the Sphinx/Rest Memo.
I wrote a python utility to format fixed-width plaintext table with multiline cells: https://github.com/kkew3/tabulate. Hope it helps.

How to grep value from a line in Unix

Suppose I have lines as follows:
<Instance name="cd" id="sa1">
<work id="23" permission="r">
I want to get the id value printed, where the id field is not constant.
It's hard to give a hint without doing it for you. But assuming your real needs are more involved than you describe, perhaps some learning can happen while applying this answer.
Grep isn't really powerful enough to do the job you describe, although it may be useful in a pipeline to select data at a larger "grain". If your file has one tag per line, as your example shows, you can use grep to filter just the Instance or work tags.
grep Instance | program to extract id val
or
grep work| program to extract id val
To extract the value you need something more powerful than grep. Assuming the value is enclosed in double-quotes and contains no embedded quotes; and that there are no similarly named attributes that could confuse the expression, this sed magic should do the trick.
sed 's/.*id="\([^"]*\)".*/\1/'
If any one of the above assumptions is not true, the expression will have to be more complicated.
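The same extraction can be sketched in Python under the same assumptions (double-quoted value, no embedded quotes). The function name extract_id is made up for this example. The \b in the pattern keeps attributes such as uid from matching, although something like data-id would still match.

```python
import re

# Matches id="..." where id starts at a word boundary.
ID_ATTRIBUTE = re.compile(r'\bid="([^"]*)"')


def extract_id(line):
    """Return the value of the first id attribute on the line, or None."""
    match = ID_ATTRIBUTE.search(line)
    return match.group(1) if match else None
```

Applied to each line of the input, this returns sa1 and 23 for the two sample lines in the question.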

combining captures in regex

some text I want to capture. <tag> junk I don't care about</tag> more stuff I want.
Is there an easy way to write a regex that captures the first and third sentences in one capture?
You could also consider stripping out the unwanted data and then capturing.
import re
data = "some text to capture. <tag>junk</tag> other stuff to capture"
# Strip the unwanted element first, then match what is left.
data = re.sub(r'<tag>[^<]*</tag>', '', data)
data_match = re.match(r'[\w. ]+', data)
Not to my knowledge. Usually that's why regex search-and-replace functions allow you to refer to multiple capturing groups in the first place.
Unfortunately no, it's not possible. The solution is to capture into two separate captures and then concatenate them after the fact.
According to this older thread on this site:
Regular expression to skip character in capture group
A group capture is consecutive, so you can't. You can do it in one pass with a regex like the one below and join the parts in code:
^(?<line1>.*?)(?:\<\w*\>.*?\</\w*\>)(?<line3>.*?)$
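As a sketch of the "capture twice, then join in code" approach in Python: the group names before and after and the function name capture_around_tag are assumptions of this example (Python spells named groups (?P<name>...) rather than (?<name>...)), and it only handles a single tagged section on one line.

```python
import re

# Two named groups around one <tag>...</tag> section; the tag itself is skipped.
PATTERN = re.compile(r'^(?P<before>.*?)<\w+>.*?</\w+>(?P<after>.*)$')


def capture_around_tag(text):
    """Return the text before and after the tagged section, joined by a space."""
    match = PATTERN.match(text)
    if match is None:
        # No tagged section: nothing to cut out.
        return text.strip()
    parts = (match.group("before"), match.group("after"))
    return " ".join(part.strip() for part in parts)
```

The two groups are still separate captures; the joining happens in ordinary code after the match.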
Here's a non-regex way: split on </tag>, go through the array items, find <tag>, then split on <tag> and take the first element, e.g.
>>> s="some text I want to capture. <tag> junk I don't care about</tag> more stuff I want. <tag> don't care </tag> i care"
>>> for item in s.split("</tag>"):
...     if "<tag>" in item:
...         print(item.split("<tag>")[0])
...     else:
...         print(item)
...
some text I want to capture.
more stuff I want.
i care
Use the split() function in ASP.NET to do the same.
