I want to use the grep command to print only the lines that contain a specific string from a file. My file is in xlsx format. I only want to extract the lines that contain 5'UTR in the Annotation column.
The command I use is:
grep 'UTR$' Lee2012.xslx > utr.txt
However, I end up with an empty file. I'd appreciate it if someone could explain what I am doing wrong. Thank you!
Your sixth column ends with UTR, not the whole line; that is why grep 'UTR$' returns no results.
To get all lines where the sixth column ends with UTR, you can use
awk '$6 ~ /UTR$/' Lee2012.xslx > utr.txt
This assumes your column (field) separators are whitespace.
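If the fields are tab-separated instead, set the field separator explicitly. A minimal sketch, assuming the spreadsheet has first been exported to plain text (awk and grep cannot search a binary .xlsx directly; Lee2012.txt is a hypothetical name for that export):

awk -F'\t' '$6 ~ /UTR$/' Lee2012.txt > utr.txt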
I have an application where I create a .csv file and then create a .xlsx file from the .csv file. The problem I am having now is that the rows in the .csv end with a trailing space. My database is in MarkLogic and I am using a custom REST endpoint to create this. The code where I create the .csv file is:
document{($header-row,for $each in $data-rows return fn:concat('
',$each))}
I pass in the header row, then pass each data row with a carriage return and a line feed at the beginning. I do this to put the CR/LF at the end of the header and at the end of every other line except the last one. I know my data rows do not have the space at the end. I have tried normalize-space() around $each, and the extra space is still present. It seems to be tied to the line feed. Any suggestions on getting rid of that space?
Example of current output:
Name,Email,Phone
Bill,bill@company.com,999-999-9999
Steve,steve@company.com,999-999-9999
Bob,bob@company.com,999-999-9999
. . .
You are creating a document whose content is a single text() node built from a sequence of strings. When those strings are combined into that single text() node, they are separated by spaces. For instance, this:
text{("a","b","c")}
will produce this:
"a b c"
Change your code to use string-join() to join your $header-row and the sequence of $data-rows string values with a newline separator:
document{ string-join(($header-row, $data-rows), "
") }
I'm trying to search for a pattern and, when it occurs a second time in a line, delete from there to the end of the line. Please help.
For that I'm using :%s/my_character.*//g, but this works from the 1st occurrence of the character in a line; I need it to work from the 2nd occurrence.
I'm not sure I understand it fully (you should give a plain example; that always makes it clearer).
I would do it like this:
:s/^\(.\{-}my_character.\{-}\)my_character.*$/\1/
This will search for:
As few characters as possible before my_character
my_character
As few characters as possible before the second my_character
my_character
Any remaining characters to the end of the line
And replace the whole line with the text captured in steps 1 to 3 above (the \1 group).
Example:
Input:
werklj z sdkl Azlksd er.
search and replace:
:s/^\(.\{-}z.\{-}\)z.*$/\1/
Output:
werklj z sdkl A
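Outside Vim, the same trim-after-the-second-occurrence can be sketched as a sed one-liner (shown here with z as the character; this relies on sed's greedy matching to anchor the capture at the first occurrence, and it leaves lines with fewer than two occurrences untouched):

printf 'werklj z sdkl Azlksd er.\n' | sed 's/\(z[^z]*\)z.*/\1/'
# prints: werklj z sdkl A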
Good morning.
First things first: I know next to nothing about shell scripting in Unix, so please pardon my naivety.
Here's what I'd like to do, and I think it's relatively simple: I would like to create a .ksh file to do two things: 1) take a user-provided numerical value (argument) and paste it into a new column at the end of a dataset (a separate .txt file), and 2) execute a different .ksh script.
I envision calling this script at the Unix prompt, with the input value added thereafter. Something like, "paste_and_run.ksh 58", where 58 would populate a new, final (un-headered) column in an existing dataset (specifically, it'd populate the 77th column).
To be perfectly honest, I'm not even sure where to start with this, so any input would be very appreciated. Apologies for the lack of code within the question. Please let me know if I can offer any more detail, and thank you for taking a look.
I have found the answer: the "nawk" command.
TheNumber=$3
PE_Infile=$1
Where the above variables correspond to the third and first arguments from the command line, respectively. "PE_Infile" represents the file (with full path) to be manipulated, and "TheNumber" represents the number to populate the final column. Then:
nawk -F"|" -v TheNewNumber=$TheNumber '{print $0 "|" TheNewNumber/10000}' $PE_Infile > $BinFolder/Temp_Input.txt
Here, the -F"|" dictates the delimiter, and -v passes a value into nawk. The declaration of a new awk variable (TheNewNumber) is necessary because awk cannot read shell variables directly; -v imports the value so the arithmetic can be done inside the print statement. print $0 means the whole line is printed, with the "|" symbol and the command-line value divided by 10000 tacked onto the end. Finally, we have the input file and an output file (Temp_Input.txt, within a path held in the $BinFolder variable).
Running the desired script afterward was as simple as typing out the script name (with path) and adding the corresponding arguments ($2 $3) as needed, each separated by a space.
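Putting the pieces together, a minimal sketch of paste_and_run.ksh under the assumptions above (the data file as argument 1 and the number as argument 3; the paths and the name other_script.ksh are illustrative, not from the original):

#!/bin/ksh
# paste_and_run.ksh -- append a value as a new pipe-delimited column, then run another script
PE_Infile=$1                 # data file to extend (argument 1)
TheNumber=$3                 # user-provided number (argument 3)
BinFolder=/path/to/bin       # hypothetical output folder

# Append "|<TheNumber/10000>" as a new final column on every line
nawk -F"|" -v TheNewNumber="$TheNumber" '{print $0 "|" TheNewNumber/10000}' "$PE_Infile" > "$BinFolder/Temp_Input.txt"

# Run the follow-up script, passing along the remaining arguments
/path/to/other_script.ksh "$2" "$3"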
I found a command which takes the input data from a binary file and writes it into an output file.
nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg
It works, but I am not able to figure out how. Could anyone please explain how the above command works and what it is doing? This nawk is too tough for me to understand. :(
Thanks in advance.
nawk is not tough to understand; it is much like other languages. I suspect you cannot follow it because it is not properly formatted; once you format it, you will see how it works.
To answer your question: this command searches for lines containing a given text in the input file and prints a few lines before and a few lines after each matched line. How many lines are printed is controlled by the variables b (number of lines before) and a (number of lines after), and the string to search for is passed in the variable s.
This kind of command is helpful in debugging/troubleshooting, where one wants to extract lines from large log files (difficult to open in vi or another editor on UNIX/Linux) by searching for some error text and printing a few lines above and below it.
So in your command
b=1 ## means print only 1 line before the matching line
a=19 ## means print 19 lines after the matching line
s="<Comment>Ericsson_OCS_V1_0.0.0.7" ## means search for this string
/var/opt/fds/config/ServiceConfig/ServiceConfig.cfg ## search in this file
/opt/temp/"$circle"_"$sdpid"_RG.cfg ## store the output in this file
Your formatted command is below. The very first condition, which looked like c-->0 before formatting, is easier to read as c-- > 0: compare c with 0, then decrement it (a post-decrement). The NR variable in awk gives the line number of the line currently being processed.
nawk '
c-- > 0;                              # while the countdown is positive, print the line (the "after" window)
$0 ~ s {                              # the current line matches the search string s
    if (b)                           # if a "before" window was requested,
        for (c = b + 1; c > 1; c--)
            print r[(NR - c + 1) % b]   # print the buffered preceding lines in order
    print                            # print the matching line itself
    c = a                            # start the countdown for the a lines that follow
}
b { r[NR % b] = $0 }                 # keep a rolling buffer of the last b lines
' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg
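As an aside, where GNU grep is available, the same before/after extraction can be done with its context options; a sketch, not part of the original command (-F treats the pattern as a fixed string so the dots are not regex metacharacters):

grep -F -B 1 -A 19 '<Comment>Ericsson_OCS_V1_0.0.0.7' /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg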
I'm a bit new to UNIX, but I have a question with regards to altering csv files going into a data feed.
There are a few |-separated columns where the date has come back as (for example)
|07-04-2006 15:50:33:44:55:66|
and this needs to be changed to
|07-04-2006|
It doesn't matter if all the data gets written to another file. There are thousands of rows in these files.
Ideally, I'm looking for a way of going to the 3rd and 7th piped columns, keeping the first 10 characters, and removing everything else up to the next |.
Thanks in advance for your help.
What exactly do you want? You can replace |07-04-2006 15:50:33:44:55:66| with |07-04-2006| using file I/O.
This operates on all columns, but that should do unless there are date columns that must not be changed:
sed 's/|\(..-..-....\) ..:..:..:..:..:..|/|\1|/g'
If you want to change the files in place, you can use sed's -i option.
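If only the 3rd and 7th piped columns should be touched, as the question asks, an awk sketch (assuming the date is always the first 10 characters of those fields; input.csv and output.csv are placeholder names, and the field numbers may need shifting by one if each line starts with a |):

awk -F'|' 'BEGIN { OFS = "|" } { $3 = substr($3, 1, 10); $7 = substr($7, 1, 10); print }' input.csv > output.csv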