Remove date from filename UNIX

I am working in UNIX and trying to write the following commands. I am receiving a source file daily whose filename is in the format :
ONSITE_EXTR_ONSITE_EXTR_20170707.
Since I am receiving a file daily, the file name changes based on the current date, so ONSITE_EXTR_ONSITE_EXTR_20170708, ONSITE_EXTR_ONSITE_EXTR_20170709 etc. I need to strip the date out of the filename and rename it to ONSITE_EXTR_ONSITE_EXTR. After I have finished whatever data reading and processing I need to do, I need to change the file name back to ONSITE_EXTR_ONSITE_EXTR_20170707, for example. Since the file is delivered daily, I can't hard-code the date in whatever commands I write. Any help would be greatly appreciated.

Depending on your toolchain, this may be as simple as running:
$ mv ONSITE_EXTR_ONSITE_EXTR_$(date +%Y%m%d) ONSITE_EXTR_ONSITE_EXTR
... before running the rest of your script, assuming you're using a Bash-like shell.
Having said that, you can just use ONSITE_EXTR_ONSITE_EXTR_$(date +%Y%m%d) in your script wherever you access the file, and skip the rename entirely.
This is all assuming the script's run the same day and in the same time zone as the file is downloaded.

If you were using bash and you had the file name in a variable, you could do:
IN="ONSITE_EXTR_ONSITE_EXTR_20170707"
echo "${IN:0:23}"
to give ONSITE_EXTR_ONSITE_EXTR
Googling gives all sorts of guides here...
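A minimal sketch of the whole round trip, assuming bash and that the date is always a trailing _YYYYMMDD stamp. The function names strip_date and with_undated_name are made up for illustration, and the command you pass in stands for whatever processing you actually do:

```bash
#!/usr/bin/env bash

# Print NAME without a trailing _YYYYMMDD stamp (hypothetical helper).
strip_date() {
  printf '%s\n' "${1%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]}"
}

# Rename DATED to its undated name, run the given command with that name
# as its last argument, then rename the file back to the dated name.
# Returns the command's exit status (hypothetical helper).
with_undated_name() {
  local dated=$1 plain status
  shift
  plain=$(strip_date "$dated")
  if [ "$plain" = "$dated" ]; then
    printf 'no date stamp in %s\n' "$dated" >&2
    return 1
  fi
  mv -- "$dated" "$plain" || return
  "$@" "$plain"
  status=$?
  mv -- "$plain" "$dated" || return
  return "$status"
}
```

Looping over a glob such as ONSITE_EXTR_ONSITE_EXTR_[0-9]* and calling with_undated_name "$f" your_command on each match picks up whatever dated file actually arrived, so nothing depends on the script running on the same day the file was delivered.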

Related

Looping through the content of a file in Zsh

I'm trying to loop through the contents of a file in zsh. In my loop I want to get user input. Going off of this answer for Bash, I'm attempting to do:
while read -u 10 line; do
echo $line;
# TODO read from stdin here, etc.
done 10<myfile.txt
However I get an error:
zsh: parse error near `10'
Referring to the 10 after the done. Obviously I'm not getting the file descriptor syntax right, but I'm having trouble figuring out the docs.
Use a file descriptor number less than 10. If you want to hard-code file descriptor numbers, stick to the range 3-9 (plus 0-2 for stdin, stdout and stderr). When zsh needs file descriptors itself, it uses them in the 10+ range.
If you're even getting close to needing more than the 7 available hard-coded file descriptors, you should really think about using variables to name them. Syntax like exec {myfd}<myfile.txt will open a file with zsh allocating a file descriptor of 10 or above and assigning it to $myfd.
Bourne shell syntax is not entirely unambiguous for file descriptors numbered 10 and over, and even in bash I'd advise against using them. I'm not entirely sure how bash avoids conflicts if it needs to open any for internal use; I guess it never needs to leave any open. This may look like a zsh limitation at first sight, but it is actually a sensible feature.
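For example, here is a small sketch of both approaches, written for bash 4.1+ (the same constructs should also work in zsh). The function names are just for illustration. The first reads the file on fd 3 so that stdin stays free for user input; the second lets the shell pick the descriptor and store it in a variable:

```bash
#!/usr/bin/env bash

# Read FILE line by line on fd 3 and read one answer per line from stdin.
ask_per_line() {
  local file=$1 line answer
  while IFS= read -r -u 3 line; do
    printf 'Line: %s\n' "$line"
    IFS= read -r answer || answer=""   # this read uses stdin, not fd 3
    printf 'Answer: %s\n' "$answer"
  done 3<"$file"
}

# Count the lines of FILE using a descriptor allocated by the shell.
count_lines_named_fd() {
  local file=$1 fd line n=0
  exec {fd}<"$file"                    # shell picks a free fd and sets $fd
  while IFS= read -r -u "$fd" line; do
    n=$((n + 1))
  done
  exec {fd}<&-                         # close the descriptor again
  printf '%s\n' "$n"
}
```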

Unix remove old files based on date from file name

I have filenames in a directory like:
ACCT_GA12345_2015-01-10.xml
ACCT_GA12345_2015-01-09.xml
ACCT_GDC789g_2015-01-09.xml
ACCT_GDC567g_2015-01-09.xml
ACCT_GDC567g_2015-01-08.xml
ACCT_GCC7894_2015-01-01.xml
ACCT_GCC7894_2015-01-02.xml
ACCT_GAC7884_2015-02-01.xml
ACCT_GAC7884_2015-01-01.xml
I want to have only the latest file in the folder. The latest file can be found using only the file name (NOT the date stamp). For example ACCT 12345 has files from 1/10 & 1/09. I need to delete 1/09 file and have only 1/10 file, for ACCT 789g there is only one file so I have to have that file, and ACCT 567g the latest file is 1/09 so I have to remove 1/08 and have 1/09. So the combination for latest file should be ACCT & Max date for that ACCT.
I would need the final list of files as:
ACCT_GA12345_2015-01-10.xml
ACCT_GDC789g_2015-01-09.xml
ACCT_GDC567g_2015-01-09.xml
ACCT_GCC7894_2015-01-02.xml
ACCT_GAC7884_2015-02-01.xml
Can someone help me with this command in unix? Any help is appreciated
I'd do something like this. To test, start with the ls command; once it lists exactly what you want to delete, switch to rm.
ls ACCT_{GDC,GA1}*-{09,10}.xml
This will list any GDC or GA1 files that end in 09 or 10. You can play with combinations and different values until the files showing are exactly the set you want deleted. Once you do, just change ls to rm and you should be golden.
With some more info I could help you out. To test this out I did:
touch ACCT_{GDC,GA1}_{01..10}_{05..10}.xml
This will make 120 different dummy files (2 prefixes x 10 x 6) with different combinations. Make a directory, run this command, and get your hands dirty. That is the best way to learn the Linux CLI. Also, 65% of the commands you need you will learn, understand, use, and then never use again, so learn how to teach yourself with man pages and set up a spot to play around in.
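For the actual question (keep only the newest file per account), here is a sketch that assumes bash 4+ for associative arrays and that every name looks like ACCT_<id>_YYYY-MM-DD.xml, so the dates compare correctly as plain strings. stale_files is a made-up name. It only prints the files that are not the newest for their account, in keeping with the advice to look before you rm:

```bash
#!/usr/bin/env bash

# Print every file in DIR that is NOT the newest one for its account.
# Assumes names of the form ACCT_<id>_YYYY-MM-DD.xml (hypothetical helper).
stale_files() {
  local dir=$1 f base acct date
  local -A newest=()                 # account prefix -> newest date seen
  for f in "$dir"/ACCT_*_????-??-??.xml; do
    [[ -e $f ]] || continue          # glob matched nothing
    base=${f##*/}                    # drop the directory part
    base=${base%.xml}                # drop the extension
    acct=${base%_*}                  # e.g. ACCT_GA12345
    date=${base##*_}                 # e.g. 2015-01-10
    if [[ -z ${newest[$acct]:-} ]]; then
      newest[$acct]=$date
    elif [[ $date > ${newest[$acct]} ]]; then
      printf '%s\n' "$dir/${acct}_${newest[$acct]}.xml"
      newest[$acct]=$date
    else
      printf '%s\n' "$f"
    fi
  done
}
```

Once the output looks right, the list can be fed to rm, for example with stale_files /path/to/dir | while IFS= read -r f; do rm -- "$f"; done.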

Unix SQLLDR script gives 'Unexpected End of File' error

All, I am running the following script to load data onto an Oracle server from a Unix box using sqlldr. Earlier it gave me an error saying sqlldr: command not found. I added "SQLPLUS < EOF", but it still gives me an unexpected end of file syntax error on line 12, even though there are only 11 lines of code. What seems to be the problem?
#!/bin/bash
FILES='ls *.txt'
CTL='/blah/blah1/blah2/name/filename.ctl'
for f in $FILES
do
cat $CTL | sed "s/:FILE/$f/g" >$f.ctl
sqlplus ID/'PASSWORD'@SERVERNAME << EOF sqlldr SCHEMA_NAME/SCHEMA_PASSWORD control=$f.ctl data=$f EOF
done
sqlplus will never know what to do with the command sqlldr. They are two complementary cmd-line utilities for interfacing with Oracle DB.
Note NO sqlplus or EOF etc required to load data into a schema:
#!/bin/bash
# you don't want this: FILES='ls *.txt'
CTL_PATH='/blah/blah1/blah2/name'
CTL_FILE="$CTL_PATH/filename.ctl"
SCHEMA_USER=SCHEMA_NAME
SCHEMA_PASSWORD=SCHEMA_PASSWORD
SERVER_NAME=SERVERNAME
for f in *.txt
do
  # don't need cat! (was: cat $CTL | sed "s/:FILE/$f/g" >"$f".ctl)
  sed "s/:FILE/$f/g" "$CTL_FILE" > "$CTL_PATH/$f.ctl"
  # myBad: sqlldr "$SCHEMA_NAME/$SCHEMA_PASSWORD" control="$CTL_PATH/$f.ctl" data="$f"
  # sqlldr takes the connect string directly; no sqlplus wrapper or here-document is needed
  sqlldr "$SCHEMA_USER/$SCHEMA_PASSWORD@$SERVER_NAME" control="$CTL_PATH/$f.ctl" data="$f" rows=10000 direct=true errors=999
done
Without getting too philosophical, using assignments like FILES=$(ls *.txt) is a bad habit to get into. By contrast, for f in *.txt will deal correctly with files that have odd characters in their names (like spaces or other syntax-breaking values). BUT the other habit you do want to get into is to quote all variable references (like $f) with dbl-quotes: "$f", OK? ;-) This is the other side of protection for files with spaces etc. embedded in their names.
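To see the difference concretely, here is a tiny sketch (hypothetical function names) that counts loop iterations both ways. With a file called my report.txt in the directory, the ls version splits it into two words while the glob version keeps it whole:

```bash
#!/usr/bin/env bash

# Fragile: the unquoted $(ls ...) output is split on whitespace.
count_with_ls() {
  local n=0 f
  for f in $(ls "$1"); do
    n=$((n + 1))
  done
  printf '%s\n' "$n"
}

# Robust: the glob expands to one word per file, whatever the name contains.
count_with_glob() {
  local n=0 f
  for f in "$1"/*.txt; do
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  printf '%s\n' "$n"
}
```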
In the edit update, I've variablized your CTL_PATH and CTL_FILE. I think I understand your intent, that you have 1 std CTL_FILE that you pass thru sed to create a table-specific .ctl file (a good approach in my experience). Note that you don't need to use cat to send a file to sed, but your use of redirection (> $f.ctl) to create an altered file is very shell-like too.
In 2nd edit update, I looked here on S.O. and found an example sqlldr cmdline that has the correct syntax and have modified to work with your variable names.
To finish up,
A. Are you sure the Oracle Client package is installed on the machine that you are running your script on?
B. Is the /path/to/oracle/client/tools/bin included in your working $PATH?
C. Try which sqlldr. If you don't get anything, either it's not installed or it's not in the path.
D. If not installed, you'll have to get it installed.
E. Once installed, note the directory that contains the sqlldr cmd. find / -name 'sqlldr*' will take a long time to run, but it will print out the path you want to use.
F. Take the "path" part of what is returned (like /opt/oracle/11.2/client/bin/, but not the sqlldr at the end), and edit the script at the 2nd line with:
export ORCL_PATH="/path/you/found/to/oracle/client"
export PATH="$ORCL_PATH:$PATH"
These steps should solve any remaining issues. If this doesn't work, see if there is someone where you work that understands your local computing environment that can help explain any missing or different steps.
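Steps B and C can also be checked from inside the script before the loop starts. This is only a sketch; require_cmd is a made-up helper name, not something that ships with Oracle or bash:

```bash
#!/usr/bin/env bash

# Return 0 if CMD can be found via $PATH, otherwise complain on stderr
# and return 1 (hypothetical helper).
require_cmd() {
  local cmd=$1
  if ! command -v "$cmd" >/dev/null 2>&1; then
    printf '%s: not found in PATH (%s)\n' "$cmd" "$PATH" >&2
    return 1
  fi
}
```

Putting require_cmd sqlldr || exit 1 near the top of the script turns a confusing failure in the middle of the loop into one clear message up front.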
IHTH

How to create a new output file in R if a file with that name already exists?

I am trying to run an R-script file using windows task scheduler that runs it every two hours. What I am trying to do is gather some tweets through Twitter API and run a sentiment analysis that produces two graphs and saves it in a directory. The problem is, when the script is run again it replaces the already existing files with that name in the directory.
As an example, when I used the pdf("file") function, it ran fine the first time because no file with that name existed in the directory. The problem is that I want the R script to run every other hour, so I need a solution that creates a new file in the directory instead of replacing the existing one, just like what happens when a file is downloaded multiple times from Google Chrome.
I'd just time-stamp the file name (now() here comes from the lubridate package; Sys.time() is the base R equivalent).
> filename = paste("output-",now(),sep="")
> filename
[1] "output-2014-08-21 16:02:45"
Use any of the standard date formatting functions to customise to taste - maybe you don't want spaces and colons in your file names:
> filename = paste("output-",format(Sys.time(), "%a-%b-%d-%H-%M-%S-%Y"),sep="")
> filename
[1] "output-Thu-Aug-21-16-03-30-2014"
If you want the behaviour of adding a number to the file name, then something like this:
serialNext = function(prefix){
if(!file.exists(prefix)){return(prefix)}
i=1
repeat {
f = paste(prefix,i,sep=".")
if(!file.exists(f)){return(f)}
i=i+1
}
}
Usage. First, "foo" doesn't exist, so it returns "foo":
> serialNext("foo")
[1] "foo"
Write a file called "foo":
> cat("fnord",file="foo")
Now it returns "foo.1":
> serialNext("foo")
[1] "foo.1"
Create that, then it returns "foo.2" and so on...
> cat("fnord",file="foo.1")
> serialNext("foo")
[1] "foo.2"
This kind of thing can break if more than one process might be writing a new file though - if both processes check at the same time there's a window of opportunity where both processes don't see "foo.2" and think they can both create it. The same thing will happen with timestamps if you have two processes trying to write new files at the same time.
Both these issues can be resolved by generating a random UUID and pasting that on the filename, otherwise you need something that's atomic at the operating system level.
But for a job that runs once every two hours I reckon a timestamp down to minutes is probably enough.
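As a sketch of the "atomic at the operating system level" option, here is the same serial-number idea in bash rather than R, so treat it as an illustration of the technique and not a drop-in for the R script. With noclobber set, the redirection refuses to overwrite an existing file, so the check and the create are a single step rather than two. The name serial_next_atomic and the cap of 10000 attempts are arbitrary choices:

```bash
#!/usr/bin/env bash

# Create and print the first free name among PREFIX, PREFIX.1, PREFIX.2, ...
# The file is created as part of the check, so two processes cannot both
# claim the same name (hypothetical helper).
serial_next_atomic() {
  local prefix=$1 candidate=$1 i=0
  # The subshell keeps noclobber from leaking into the caller's shell.
  until ( set -o noclobber; : > "$candidate" ) 2>/dev/null; do
    i=$((i + 1))
    if [ "$i" -gt 10000 ]; then
      return 1                       # give up, e.g. unwritable directory
    fi
    candidate="$prefix.$i"
  done
  printf '%s\n' "$candidate"
}
```

Unlike the file.exists version, the returned name already exists as an empty file, so the caller writes into it rather than creating it.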
See ?files for file manipulation functions. You can check if file exists with file.exists, and then either rename the existing file, or create a different name for the new one.

Process many EDI files through single MFX

I've created a mapping in MapForce 2013 and exported the MFX file. Now, I need to be able to run the mapping using MapForce Server. The problem is, I need to specify both the input EDI file and the output file. As far as I can tell, the usage pattern is to run the mapping with MapForce server using the input/output configuration in the MFX itself, not passed in on the command line.
I suppose I could change the input/output to some standard file name and then just write the input file to that path before performing the mapping, and then grab the output from the standard output file path when the mapping is complete.
But I'd prefer to be able to do something like:
MapForceServer run -in=MyInputFile.txt -out=MyOutputFile.xml MyMapping.mfx > MyLogFile.txt
Is something like this possible? Perhaps using parameters within the mapping?
There are two options that I've come across in dealing with a similar situation.
Option 1 - If you set the input XML file to *.xml in the component settings, mapforceserver.exe will automatically process all matching files in the directory. This assumes your source is XML, but it should work for text just the same. Similar to the example below, you can set a cleanup routine to move the files into another folder after processing.
Note: It looks in the folder where the schema file is located.
Option 2 - Since your output is XML you can use Altova's raptorxml (rack up another license charge). Now you can generate code in XSLT 2.0 and use a batch file to automatically execute, something like this.
::@echo off
for %%f IN (*.xml) DO (RaptorXML xslt --xslt-version=2 --input="%%f" --output="out/%%f" %* "mymapping.xslt"
if NOT errorlevel 1 move "%%f" processed
if errorlevel 1 move "%%f" error)
sleep 15
mymapping.bat
I tossed in a sleep command to loop the batch for rechecking every 15 seconds. Unfortunately this does not work if your output target is a database.
