How do I load a column using sqlloader if the CSV column is empty except for the header? - unix

I would like to load a CSV file using sqlloader without having to edit the CSV file each time. When the file is sent to me, the data in the first column is only included in a header above the rest of that data. First code snippet is an example of what I'm receiving with fake data, second code snippet is what I would like to change it to.
Each row in the SQL table needs to include the number from row one.
1,Intern,
,1/8/2023,Bob
,5/3/2022,Alice
,7/25/2022,Charles
2,Assistant,
,1/8/2023,Heather
,5/3/2022,Harold
,7/25/2022,Dave
3,Manager,
,1/1/2023,Tim
,1/8/2023,Lyon
,5/3/2022,Greg
,7/25/2022,Tyler
5,Head Manager,
,1/8/2023,Charles
,5/3/2022,Zack
How I need it to look:
1,Intern,
1,1/8/2023,Bob
1,5/3/2022,Alice
1,7/25/2022,Charles
2,Assistant,
2,1/8/2023,Heather
2,5/3/2022,Harold
2,7/25/2022,Dave
3,Manager,
3,1/1/2023,Tim
3,1/8/2023,Lyon
3,5/3/2022,Greg
3,7/25/2022,Tyler
5,Head Manager,
5,1/8/2023,Charles
5,5/3/2022,Zack
I'm thinking there may be some way to edit them in the shell script that calls the sqlldr command, such as sed or awk.

With an input file (input.csv) as shown in the question, the command
awk -F, -v 'OFS=,' '{ if($1!="") x=$1; else $1=x } 1' input.csv
prints
1,Intern,
1,1/8/2023,Bob
1,5/3/2022,Alice
1,7/25/2022,Charles
2,Assistant,
2,1/8/2023,Heather
2,5/3/2022,Harold
2,7/25/2022,Dave
3,Manager,
3,1/1/2023,Tim
3,1/8/2023,Lyon
3,5/3/2022,Greg
3,7/25/2022,Tyler
5,Head Manager,
5,1/8/2023,Charles
5,5/3/2022,Zack
Explanation:
-F, : set the input field separator to ,
-v 'OFS=,' : set the output field separator to ,
if($1!="") x=$1 : save a non-empty value from column 1
else $1=x : replace an empty value with the saved value
1 : always-true condition with the default action print (shortcut for {print})
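As a rough sketch of how this could sit inside the shell script that calls sqlldr: the function name fill_first_column, the sample rows, and the file names in the trailing comment (filled.csv, load.ctl) are assumptions for illustration, not part of the original answer.

```shell
#!/bin/sh
# fill_first_column FILE
# Print FILE with every empty first column replaced by the most recent
# non-empty first-column value (same awk program as in the answer).
fill_first_column() {
    awk -F, -v 'OFS=,' '{ if ($1 != "") x = $1; else $1 = x } 1' "$1"
}

# Demo on a small sample shaped like the data in the question.
sample=$(mktemp)
printf '%s\n' '1,Intern,' ',1/8/2023,Bob' '2,Assistant,' ',5/3/2022,Harold' > "$sample"
filled=$(fill_first_column "$sample")
printf '%s\n' "$filled"
rm -f "$sample"

# In a real wrapper the filled copy would be written to a file and handed
# to SQL*Loader, for example (file names are hypothetical):
#   fill_first_column input.csv > filled.csv
#   sqlldr control=load.ctl data=filled.csv
```

Writing the result to a separate file leaves the CSV you received untouched, so the load can be repeated from the original if needed.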

Related

keep the leading zeros when echo'ing a string to a variable

I'm trying to keep the leading zeros on a number when I echo it to a csv file but I cannot find a way w/o echoing it with a single quote. How can I do this using a bash script?
After going through a file and extracting what I want and storing it in a variable, I echo it to a variable that's mapped to a file.
file name:
TMP_RESULT_STORE="/tmp/AdvertisedTotals_GLB_`date +%y%m%d`.csv"
Let's say:
SYM=0090498
Later SYM value can change to:
SYM=034249822
SYM=BVZ342
When I loop through a file with a for loop and echo the SYM to a file like this:
echo $SYM >> $TMP_RESULT_STORE
The output looks like this for the above entries:
90498
34249822
BVZ342
The leading zeros are lost. I can get it to keep the zeros like this:
echo "'$SYM" >> $TMP_RESULT_STORE
But then it looks like this and I cannot get rid of the single quote:
'0090498
'034249822
'BVZ342
Obviously, I'm doing other stuff in this script while scraping a log, I wanted to keep it simple to understand and just focus on the issue.
How can I get the bash script to keep the leading zeros whenever a value has them?

Split file without separating rows beginning with like values in Unix

I have a sorted .csv file that is something like this:
AABB1122,ABC,BLAH,4
AABB1122,ACD,WHATEVER,1
AABB1122,AGT,CAT,4
CCDD4444,AYT,DOG,4
CCDD4444,ACG,MUMMY,8
CCEE4444,AOP,RUN,5
DDFF9900,TUI,SAT,33
DDFF9900,WWW,INDOOR,5
I want to split the file into smaller files of roughly two lines each, but I do not want rows with like values in the first column separated.
Here, I would end up with three files:
x00000:
AABB1122,ABC,BLAH,4
AABB1122,ACD,WHATEVER,1
AABB1122,AGT,CAT,4
x00001:
CCDD4444,AYT,DOG,4
CCDD4444,ACG,MUMMY,8
x00002:
CCEE4444,AOP,RUN,5
DDFF9900,TUI,SAT,33
DDFF9900,WWW,INDOOR,5
My actual data is about 7 gigs in size and contains over 100 million lines. I want to split it into files of about 100K lines each or ~6MB. I am fine with using either file size or line numbers for splitting.
I know that I can use "split", such as:
split -a 5 -d -l 2
Here, that would give me four files, and like values in the first column would be split over files in most cases.
I think I probably need awk, but, even after reading through the manual, I am not sure how to proceed.
Help is appreciated! Thanks!
An awk script:
BEGIN { FS = "," }
!name { name = sprintf("%06d-%s.txt", NR, $1) }
count >= 2 && prev != $1 {
    close(name)
    name = sprintf("%06d-%s.txt", NR, $1)
    count = 0
}
{
    print > name
    prev = $1
    ++count
}
Running this on the given data will create three files:
$ awk -f script.awk file.csv
$ cat 000001-AABB1122.txt
AABB1122,ABC,BLAH,4
AABB1122,ACD,WHATEVER,1
AABB1122,AGT,CAT,4
$ cat 000004-CCDD4444.txt
CCDD4444,AYT,DOG,4
CCDD4444,ACG,MUMMY,8
$ cat 000006-CCEE4444.txt
CCEE4444,AOP,RUN,5
DDFF9900,TUI,SAT,33
DDFF9900,WWW,INDOOR,5
I have arbitrarily chosen to use the line number from the original file from where the first line was taken, along with the first field's data on that line as the filename.
The script counts the number of lines printed to the current output file, and if that number is greater than or equal to 2, and if the first field's value is different from the previous line's first field, the current output file is closed, a new output name is constructed, and the count is reset.
The last block simply prints to the current filename, remembers the first field in the prev variable, and increments the count.
The BEGIN block initializes the field delimiter (before the first line is read) and the !name block sets the initial output file name (when reading the very first line).
To get exactly the filenames that you have in the question, use
name = sprintf("x%05d", n++)
to set the output filename in both places where this is done.
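For reference, here is a self-contained sketch of the same idea with the x00000-style names, run against the sample data from the question. The min variable (minimum lines per file), the scratch directory, and folding the two name-setting rules into one are my own assumptions; the counter uses n++ so that numbering starts at x00000. For the real data, min=100000 would presumably give files of roughly 100K lines.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not litter the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file.csv <<'EOF'
AABB1122,ABC,BLAH,4
AABB1122,ACD,WHATEVER,1
AABB1122,AGT,CAT,4
CCDD4444,AYT,DOG,4
CCDD4444,ACG,MUMMY,8
CCEE4444,AOP,RUN,5
DDFF9900,TUI,SAT,33
DDFF9900,WWW,INDOOR,5
EOF

awk -F, -v min=2 '
    # Start a new output file on the very first line, or once the current
    # file holds at least "min" lines and the first field changes.
    NR == 1 || (count >= min && prev != $1) {
        if (name) close(name)
        name = sprintf("x%05d", n++)
        count = 0
    }
    { print > name; prev = $1; ++count }
' file.csv

# Summarise the result as "filename:linecount", one entry per line.
summary=$(for f in x0*; do printf '%s:%s\n' "$f" "$(awk 'END { print NR }' "$f")"; done)
printf '%s\n' "$summary"

cd / && rm -rf "$workdir"
```

On the sample data this produces three files with 3, 2 and 3 lines, matching the layout asked for in the question.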
With csplit if available
With the given data
csplit -s infile %^A% /^C/ %^C% /^D/ /^Z/ {*}

.ksh paste user input value into dataset

Good morning.
First things first: I know next to nothing about shell scripting in Unix, so please pardon my naivety.
Here's what I'd like to do, and I think it's relatively simple: I would like to create a .ksh file to do two things: 1) take a user-provided numerical value (argument) and paste it into a new column at the end of a dataset (a separate .txt file), and 2) execute a different .ksh script.
I envision calling this script at the Unix prompt, with the input value added thereafter. Something like, "paste_and_run.ksh 58", where 58 would populate a new, final (un-headered) column in an existing dataset (specifically, it'd populate the 77th column).
To be perfectly honest, I'm not even sure where to start with this, so any input would be very appreciated. Apologies for the lack of code within the question. Please let me know if I can offer any more detail, and thank you for taking a look.
I have found the answer: the "nawk" command.
TheNumber=$3
PE_Infile=$1
Where the above variables correspond to the third and first arguments from the command line, respectively. "PE_Infile" represents the file (with full path) to be manipulated, and "TheNumber" represents the number to populate the final column. Then:
nawk -F"|" -v TheNewNumber=$TheNumber '{print $0 "|" TheNewNumber/10000}' $PE_Infile > $BinFolder/Temp_Input.txt
Here, -F"|" sets the field delimiter, and -v assigns the value of the shell variable to an awk variable (TheNewNumber). Declaring this new variable was necessary to perform the arithmetic within the print statement, because the single-quoted awk program cannot see shell variables directly. print $0 prints the whole line, with the "|" symbol and the command-line value divided by 10000 tacked onto the end. Finally, we have the input file and an output file (Temp_Input.txt, within a path represented by the $BinFolder variable).
Running the desired script afterward was as simple as typing out the script name (with path), and adding corresponding arguments ($2 $3) afterward as needed, each separated by a space.
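A minimal sketch of the column-appending step, wrapped in a function so it can be tried in isolation. It uses plain awk instead of nawk, and the function name append_scaled_column, the sample rows, and the script path in the trailing comment are assumptions for illustration only.

```shell
#!/bin/sh
# append_scaled_column FILE NUMBER
# Print FILE with NUMBER/10000 appended as a new last "|"-separated column.
# The shell value reaches awk through -v, since the single-quoted awk
# program cannot read shell variables itself.
append_scaled_column() {
    awk -v value="$2" '{ print $0 "|" value / 10000 }' "$1"
}

# Demo on a two-line sample file.
infile=$(mktemp)
printf '%s\n' 'a|b|c' 'd|e|f' > "$infile"
result=$(append_scaled_column "$infile" 58)
printf '%s\n' "$result"
rm -f "$infile"

# In the real script the output would be redirected to a file and the
# second script started afterwards, for example (path is hypothetical):
#   append_scaled_column "$PE_Infile" "$TheNumber" > "$BinFolder/Temp_Input.txt"
#   /path/to/other_script.ksh "$2" "$3"
```

With the argument 58, each line gains a final column containing 0.0058.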

not able to understand NAWK use

I found a command which takes the input data from a binary file and writes into a output file.
nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg
It's working but I am not able to find out how...Could anyone please help me out how above command is working and what is it doing?...this nawk is too tough to understand...:(
Thanks in advance......
nawk is not tough to understand and works like other languages. I guess you are not able to understand the command because it is not properly formatted; once you format it, you will see how it works.
To answer your question: this command searches the given input file for lines containing an input text, and prints a few lines before each matched line and a few lines after it. How many lines are printed is controlled by the variables "b" (number of lines before) and "a" (number of lines after), and the string/text to search for is passed in the variable "s".
This command is helpful in debugging/troubleshooting, where one wants to extract lines from large log files (difficult to open in vi or another editor on UNIX/LINUX) by searching for some error text and printing some lines above it and some lines after it.
So in your command
b=1 ## means print only 1 line before the matching line
a=19 ## means print 19 lines after the matching line
s="<Comment>Ericsson_OCS_V1_0.0.0.7" ## means search for this string
/var/opt/fds/config/ServiceConfig/ServiceConfig.cfg ## search in this file
/opt/temp/"$circle"_"$sdpid"_RG.cfg ## store the output in this file
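Your formatted command is below. The very first condition, which looked like c-->0 before formatting, is now easy to interpret: it means "c-- greater than 0", so a line is printed while c (the number of after-lines still to print) is positive. The NR variable in AWK holds the number of the input line currently being processed. Note that in AWK the opening { of an action has to stay on the same line as its pattern; otherwise the pattern and the block become two separate rules.
nawk '
# print the line while after-lines remain to be printed
c-- > 0;
# on a match: print the b saved lines before it, then the match itself
$0 ~ s {
    if (b)
        for (c = b + 1; c > 1; c--)
            print r[(NR - c + 1) % b];
    print;
    c = a
}
# remember the last b lines in a circular buffer
b {
    r[NR % b] = $0
}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg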
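To see the behaviour on something small, here is a sketch that runs the same program on a six-line sample. It uses plain awk rather than nawk, and the sample file, the search string MATCH, and the values b=1 and a=2 are made up for the demonstration.

```shell
#!/bin/sh
# Build a tiny input file with a single matching line.
sample=$(mktemp)
printf '%s\n' one two MATCH three four five > "$sample"

# Same program as above: 1 line before and 2 lines after each match.
context=$(awk '
c-- > 0;
$0 ~ s {
    if (b)
        for (c = b + 1; c > 1; c--)
            print r[(NR - c + 1) % b];
    print;
    c = a
}
b {
    r[NR % b] = $0
}' b=1 a=2 s="MATCH" "$sample")

printf '%s\n' "$context"
rm -f "$sample"
```

This prints two, MATCH, three and four: one line of context before the match and two after it. The lines one and five fall outside the window and are dropped.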

Decrypt many PDFs in one go using pdftk

I have 10 PDFs that ask for a user password to open. I know that password. I want to keep them in a decrypted format. Their filenames follow the form:
static_part.dynamic_part_like_date.pdf
I want to convert all the 10 files. I can give a * after the static part and work on all of them, but I also want the corresponding output filenames. So there has to be a way to capture the dynamic part of the filename and then use it in the output filename.
The normal way of doing this for one file is:
pdftk secured.pdf input_pw foopass output unsecured.pdf
I want to do something like:
pdftk var=secured*.pdf input_pw foopass output unsecured+var.pdf
Thanks.
Your request is a little ambiguous, but here are some ideas that might help you.
Assuming 1 of your 10 files is
# static_part.dynamic_part_like_date.pdf
# SalesReport.20110416.pdf (YYYYMMDD)
And you want the unsecured output to be named SalesReport.pdf, you can use a shell script to achieve your requirement:
# make a file with the following contents,
# then make it executable with `chmod 755 pdfFixer.sh`
# the #!/bin/bash line has to be the first line of the file.
$ cat pdfFixer.sh
#!/bin/bash
# call the script like $ pdfFixer.sh staticPart.*.pdf
# ( note: the '$' char is not part of the command, it is the cmd-line prompt in this example,
# yours may look different )
# use a variable to hold the password you want to use
pw=foopass
# "$@" expands to all file names given on the command line
for file in "$@" ; do
    # %%.* strips off everything after the first '.' char
    unsecuredName=${file%%.*}.pdf
    # your example : pdftk secured.pdf input_pw foopass output unsecured.pdf
    # converts to
    pdftk "${file}" input_pw "${pw}" output "${unsecuredName}"
done
You may find that you need to modify the %%.* part to:
strip less from the end (use %.*), removing just the last '.' and all chars after it (strip from right),
strip from the front (use #*.), removing the static part and leaving the dynamic part, OR
strip from the front (use ##*.), removing everything up to and including the last '.' char.
It will really be much easier for you to figure out what you need at the cmd-line.
Set a variable with 1 sample fileName
myTestFileName=staticPart.dynamicPart.pdf
and then use echo combined with the variable modifiers to see the results.
echo ${myTestFileName##*.}
echo ${myTestFileName#*.}
echo ${myTestFileName##.*}
echo ${myTestFileName#.*}
echo ${myTestFileName%%.*}
etc.
Also notice how I combine a modified variable value with a plain string (.pdf), in unsecuredName=${file%%.*}.pdf
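As a quick illustration of what those modifiers return, here is a sketch using the sample name from above. The helper unsecured_name and the "unsecured." prefix are assumptions of mine, chosen so that each input maps to a distinct output name that keeps the dynamic part.

```shell
#!/bin/sh
# unsecured_name FILE
# Map static.dynamic.pdf to unsecured.dynamic.pdf by stripping everything
# up to and including the first '.' and prepending a fixed prefix.
unsecured_name() {
    printf 'unsecured.%s\n' "${1#*.}"
}

name=SalesReport.20110416.pdf
static_part=${name%%.*}    # strip from the first '.' to the end
without_ext=${name%.*}     # strip only the last '.' and what follows
after_static=${name#*.}    # strip up to and including the first '.'
extension=${name##*.}      # strip up to and including the last '.'
out=$(unsecured_name "$name")

printf '%s\n' "$static_part" "$without_ext" "$after_static" "$extension" "$out"

# Dry run of the conversion loop: print the pdftk commands instead of
# running them (remove "echo" once the names look right).
# for file in SalesReport.*.pdf; do
#     echo pdftk "$file" input_pw foopass output "$(unsecured_name "$file")"
# done
```

The five values printed are SalesReport, SalesReport.20110416, 20110416.pdf, pdf and unsecured.20110416.pdf.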
IHTH
