using sed in a loop on a single file - unix

I need to make 58 changes to an HTML file.
A for loop runs 29 times, and each iteration runs the sed command below, replacing 2 of the 58 placeholders:
sed "s?$Plc_hldr1?$DateTime?;s?$Plc_hldr2?$Total?" html_format.htm >> html_final.htm
The command above makes the changes on every pass of the loop and appends the result to html_final.htm.
Thus there are 29 copies of html_format.htm in html_final.htm.
What I need is a single copy of html_format.htm with all 58 placeholder values replaced.
Below is a small example of the whole table:
01/02/2014 15
%%DDMS2RT%% %%DDMS2C%%
%%DDMS3RT%% %%DDMS3C%%
%%DDMS4RT%% %%DDMS4C%%
%%DDMS5RT%% %%DDMS5C%%
%%DDMS6RT%% %%DDMS6C%%
%%DDMS7RT%% %%DDMS7C%%
After the 2nd pass of the for loop, html_final.htm contains:
01/02/2014 15
%%DDMS2RT%% %%DDMS2C%%
%%DDMS3RT%% %%DDMS3C%%
%%DDMS4RT%% %%DDMS4C%%
%%DDMS5RT%% %%DDMS5C%%
%%DDMS6RT%% %%DDMS6C%%
%%DDMS7RT%% %%DDMS7C%%
%%DDMS1RT%% %%DDMS1C%%
01/02/2014 817
%%DDMS3RT%% %%DDMS3C%%
%%DDMS4RT%% %%DDMS4C%%
%%DDMS5RT%% %%DDMS5C%%
%%DDMS6RT%% %%DDMS6C%%
%%DDMS7RT%% %%DDMS7C%%
Note that the same table is appended again after the 2nd pass: the placeholders in the 2nd row now contain values, but the 1st row has reverted to placeholders.
What I would like is the output below, i.e. a single table instead of multiple copies, with all the placeholders replaced within that one table:
01/02/2014 15
01/02/2014 817
01/02/2014 512
01/02/2014 765
%%DDMS5RT%% %%DDMS5C%%
%%DDMS6RT%% %%DDMS6C%%
%%DDMS7RT%% %%DDMS7C%%
I tried to play with sed -i, but it is not available on AIX.
I really hope I have expressed this clearly and that my question is no longer an XY problem!

The quickest solution would surely be to use > rather than >>.
>> appends standard output to the named file.
> writes standard output to the named file, replacing the old file if it already exists.
i.e.
sed "s?$Plc_hldr1?$DateTime?;s?$Plc_hldr2?$Total?" html_format.htm > html_final.htm
cp html_final.htm html_format.htm
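Note that the cp step overwrites your original template, so keep a backup of html_format.htm. An alternative that leaves the template untouched (useful since AIX sed has no -i) is to accumulate all 58 substitutions into a sed script file inside the loop, then apply them in a single pass at the end. A minimal sketch; the loop body that computes the placeholder variables is elided, since it isn't shown in the question:
#!/bin/sh
# Sketch only: collect every substitution, then run sed once at the end.
: > all_subs.sed                   # start with an empty sed script
for i in 1 2 3                     # placeholder loop; adapt to your real one
do
    # ... compute Plc_hldr1, DateTime, Plc_hldr2, Total as you already do ...
    echo "s?$Plc_hldr1?$DateTime?" >> all_subs.sed
    echo "s?$Plc_hldr2?$Total?"    >> all_subs.sed
done
# One pass over the template; html_format.htm is left untouched.
sed -f all_subs.sed html_format.htm > html_final.htm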

If we imagine that your real problem statement is "how do I piece together an HTML table from fragments I retrieve e.g. from a database", the answer might look something like this.
#!/bin/sh
# Output HTML header
cat <<'____HERE'
<html><head><title>Table</title></head>
<body><table>
<tr><th>Date</th><th>Result</th></tr>
____HERE
# Obtain results from database
sql 'select date, result from table;' |
# Read each record, format an HTML table fragment
while read date result; do
cat <<________HERE
<tr><td>$date</td><td>$result</td></tr>
________HERE
done
# Output HTML footer
cat <<____HERE
</table></body></html>
____HERE
The script prints an HTML page on its standard output. Redirect to a file if you want it in a file.
(Sorry if my HTML skills are rusty. It's been a while...)

Related

Extract exactly one file (any) from each 7zip archive, in bulk (Unix)

I have 1,500 7zip archives, each archive contains 2 to 10 files, with no subdirectories.
Each file has the same extension, but the filenames vary.
I only want one file out of each archive, but I'd like to perform this in bulk. I do not care which file is taken out, as long as only one file is taken out. It can be the first file, the newest, the biggest, the smallest, it doesn't matter.
Here's an example:
aa.7z {blah 56.smc, blah 57.smc, 1 blah 58.smc}
ab.7z {xx.smc, xx 1.smc, xx_2.smc}
ac.7z {1.smc}
I want to run something equivalent to:
7z e *.7z # But somehow only extract one file
Thank you!
Ultimately my solution was to extract all files and run the following in the directory:
for n in *; do echo "$n"; done > files.txt
I then imported that list into Excel and split the filenames on the special character that separates the title from the qualifying data (for example: Some Title (V1) [X2].smc); specifically, I used a bracket delimiter.
Then I removed all duplicates, leaving only one edition of each file from the archives. I remerged the columns (unfortunately the bracket was deleted during the split, so I wrote a function to add it back whenever the next column had content), resaved files.txt, and then, after reviewing Stack Overflow for answers, deleted files based on that input file (files.txt). A word of warning: spaces in filenames cause problems with rm and xargs, so I had to wrap the variable in quotes.
Ultimately this still didn't serve me well enough, so I used a different resource entirely.
Posting this answer so others who find themselves in a similar predicament find an alternative resolution.
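For anyone who would rather stay in the shell, a minimal sketch follows. It assumes p7zip's 7z, whose "l -slt" listing prints one "Path = ..." line per entry, the first of which is the archive itself; verify that assumption on your version before trusting the script.
#!/bin/sh
# Sketch: extract one arbitrary file (the first listed) from each archive.
mkdir -p extracted
for a in *.7z; do
    # The second "Path = " line of the -slt listing is the first file entry.
    first=$(7z l -slt "$a" | sed -n 's/^Path = //p' | sed -n '2p')
    # Note: identically named files from different archives will
    # overwrite each other in extracted/.
    [ -n "$first" ] && 7z e -oextracted "$a" "$first"
done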

.ksh paste user input value into dataset

Good morning.
First things first: I know next to nothing about shell scripting in Unix, so please pardon my naivety.
Here's what I'd like to do, and I think it's relatively simple: I would like to create a .ksh file to do two things: 1) take a user-provided numerical value (argument) and paste it into a new column at the end of a dataset (a separate .txt file), and 2) execute a different .ksh script.
I envision calling this script at the Unix prompt, with the input value added thereafter. Something like, "paste_and_run.ksh 58", where 58 would populate a new, final (un-headered) column in an existing dataset (specifically, it'd populate the 77th column).
To be perfectly honest, I'm not even sure where to start with this, so any input would be very appreciated. Apologies for the lack of code within the question. Please let me know if I can offer any more detail, and thank you for taking a look.
I have found the answer: the "nawk" command.
TheNumber=$3
PE_Infile=$1
Where the above variables correspond to the third and first arguments from the command line, respectively. "PE_Infile" represents the file (with full path) to be manipulated, and "TheNumber" represents the number to populate the final column. Then:
nawk -F"|" -v TheNewNumber=$TheNumber '{print $0 "|" TheNewNumber/10000}' $PE_Infile > $BinFolder/Temp_Input.txt
Here, the -F"|" sets the field delimiter, and the -v passes the shell value into nawk; nawk cannot read shell variables directly, which is why declaring the new variable (TheNewNumber) was necessary to perform the arithmetic manipulation within the print statement. print $0 prints the whole line, tacking the "|" symbol and the command-line value divided by 10000 onto the end. Finally, we have the input file and an output file (Temp_Input.txt, within the path held in the $BinFolder variable).
Running the desired script afterward was as simple as typing the script name (with its path) and adding the corresponding arguments ($2 $3) as needed, each separated by a space.
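To make the moving parts concrete, here is a hypothetical invocation; the script name, the paths, and the assumption that $2 names the follow-on script are all made up for illustration:
# Append 58/10000 = 0.0058 as a new "|"-separated column, then run the next script:
./paste_and_run.ksh /data/PE_input.txt next_step.ksh 58
# A line such as:
#   a|b|c
# becomes, in $BinFolder/Temp_Input.txt:
#   a|b|c|0.0058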

Character strings are converted to date in .csv. Can't even be converted back in Excel (R)

I have a data.frame that looks like this:
a=data.frame(c("MARCH3","SEPT9","XYZ","ABC","NNN"),c(1,2,3,4,5))
> a
c..MARCH3....SEPT9....XYZ....ABC....NNN.. c.1..2..3..4..5.
1 MARCH3 1
2 SEPT9 2
3 XYZ 3
4 ABC 4
5 NNN 5
Write into csv: write.csv(a,"test.csv")
I want everything to stay the way it is, but MARCH3 and SEPT9 become 3-Mar and 9-Sep. I have tried everything in Excel: formatting as date, text, custom... none works. 3-Mar gets converted to 42066 and 9-Sep to 42256. In reality, a is a fairly large table, so this can't be done manually. Is there a way to coerce a[,1] so that Excel ignores its format?
The best way to prevent Excel from autoformatting would probably be to store the data as an Excel file:
library(xlsx)
write.xlsx(a, "test.xlsx")
Your best bet is probably to change the file extension (e.g. make it ".txt" or ".dat" or something like that). When you open such a file in Excel the text import wizard will open. Specify that the file is delimited with commas, then make sure to change the appropriate column from "General" to "Text".
As an example: looking at the data in the question it appears that your CSV file might look like
,,,,MARCH3,,,,1
,,,,SEPT9,,,,2
,,,,XYZ,,,,3
,,,,ABC,,,,4
,,,,NNN,,,,5
If I save this file with a ".csv" extension and open it in Excel I get:
3-Mar 1
9-Sep 2
XYZ 3
ABC 4
NNN 5
with the date values changed as you noted. When I change the file extension to ".dat", making no other changes to the file, and open it in Excel I'm presented with the Text Import Wizard. I tell Excel that the file is "Delimited", choose "Comma" as the delimiter, and in the column with the "MARCH3" and "SEPT9" values I change the Column Data Type to "Text" (instead of "General"). After I clicked the Finish button on the wizard I got the following data in the spreadsheet:
MARCH3 1
SEPT9 2
XYZ 3
ABC 4
NNN 5
I tried putting the MARCH3 and SEPT9 values in double-quotes to see if that would convince Excel to treat these values as text but Excel still converted these cells to dates.
Share and enjoy.
My solution was to append a semicolon to all the gene names. The added character convinces Excel that the column is text, not a date. You can find and replace the semicolon later if you want, but most programs - like Perseus - will let you ignore everything after the semicolon, so it's not usually a problem...
df$Gene.name <- paste(df$Gene.name, ";", sep="")
I would be interested if anyone has a trick for doing this to just the Sept/March gene names, though...
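As a hedged answer to that follow-up: you could target only the names that look like a month abbreviation followed by digits. The pattern below is an assumption about which names Excel mangles; check it against your own gene list before relying on it.
# Sketch: append ";" only to gene names Excel would read as dates,
# i.e. a month-like prefix followed by a number (MARCH3, SEPT9, ...).
date_like <- grepl("^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)[0-9]+$",
                   df$Gene.name, ignore.case = TRUE)
df$Gene.name[date_like] <- paste0(df$Gene.name[date_like], ";")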

How to view all special characters

I am having a hard time removing special characters from a csv file.
I have done a head -1, so I am comparing only 1 row.
wc filename shows a byte count of 1396.
If I go to the end of the file, the cursor stops at 1394.
In vi I do :set list (to check for control characters) and see a $ (nothing after it), so that accounts for byte 1395.
Can someone please tell me where the 1396th byte is?
I am trying to compare 2 files using diff and its giving me a lot of trouble.
Please help.
The last 2 bytes of your line are \r\n - this is a Windows line ending. dos2unix converts this into a Unix line ending, which is \n - hence the line is shortened by 1 byte following conversion.
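To see those bytes for yourself, od -c prints each one explicitly (od is POSIX, so it should be present on AIX as well), and tr offers a portable fallback if dos2unix is not installed:
# Show the bytes of the last line; expect "\r \n" at the end.
tail -1 filename | od -c
# Strip the carriage returns before diffing:
tr -d '\r' < filename > filename.unix
diff filename.unix otherfile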

Stripping out time components for data in a csv file with | separated variables

A bit new to UNIX, but I have a question with regards to altering csv files going into a data feed.
There are a few | separated columns where the date has come back as (for example)
|07-04-2006 15:50:33:44:55:66|
and this needs to be changed to
|07-04-2006|
It doesn't matter if all the data gets written to another file. There are thousands of rows in these files.
Ideally, I'm looking for a way of going to the 3rd and 7th piped columns, taking the first 10 characters, and removing everything else up to the next |.
Thanks in advance for your help.
What exactly do you want? You can replace |07-04-2006 15:50:33:44:55:66| with |07-04-2006| using simple file I/O.
This operates on all columns, but should do unless there are date columns which must not be changed:
sed 's/|\(..-..-....\) ..:..:..:..:..:..|/|\1|/g'
If you want to change the files in place, you can use sed's -i option.
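If you really want to touch only the 3rd and 7th pipe-separated columns, as the question asks, awk is a closer fit. A sketch, assuming the lines do not begin with a leading | (if they do, every field shifts by one, so adjust the numbers):
# Keep only the first 10 characters (the date) of fields 3 and 7.
awk -F'|' 'BEGIN { OFS = "|" }
           { $3 = substr($3, 1, 10); $7 = substr($7, 1, 10); print }' \
    input.csv > output.csv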
