Based on the count from some other file, I need to rename all the files extensions.
Ex: If the count is 10 and there are 5 files exists, I need to rename all the files as below.
from File_1.txt to File_11.txt,
from File_2.txt to File_12.txt,
from File_3.txt to File_13.txt,
from File_4.txt to File_14.txt,
from File_5.txt to File_15.txt
Can I use one unix command to do this, appreciate your help on this.
Regards,
NPK
With standard UNIX you'd need more than one command, e. g.
count=10
for file in File_*.txt
do augend=`echo $file|sed 's/File_\(.*\).txt/\1/'`
mv $file File_`expr $augend + $count`.txt
done
But if you have a system with this rename available, you can
rename 's/File_(.*).txt/"File_".($1+$ENV{count}).".txt"/e' File_*.txt
(assuming count has been exported to the environment) or
rename 's/File_(.*).txt/"File_".($1+'$count').".txt"/e' File_*.txt
as well.
Related
I need to download many files from a server (specifically tectia) ideally using the ssh package. These files all follow the a predictable pattern across multiple sub folders. The filepath is formatted like this
/directory/subfolder/A001/abcde001.csv
Where A001 counts up alongside the last 3 digits of the filename (/A002/abcde002.csv and so on)
In the vignette for scp_download it states that the files parameter may contain wildcards so I have tried to do something like
scp_download(session, "/directory/subfolder/A.*/abcde.*[.]csv", to=tempdir())
and
scp_download(session, "directory/subfolder/A\\d{3}/abcde\\d{3}[.]csv", to=tempdir())
but no matter which combination of patterns or wildcards I can think of (which isn't many) I only get something like
Warning: SSH warning: scp: /directory/subfolder/A\d{3}/abcde\d{3}[.]csv: No such file or directory
What I'm hoping to do is either find a way to do pattern matching here, or to find a way to store tectia directories as a string to be read by scp_download. I've made sure that my session is connected properly and it works without attempting to pattern match, which it does.
I had the same problem. The problem is that when you use * in your pattern it gets escaped when you send it to the server. However, when you request a special file name like this /directory/subfolder/A001/abcde001.csv, it works fine.
Finally I changed my code based on the below steps:
I got the list of files/folders using ls command with ssh_exec_wait function and then store them on a variable.
Download files in the variable separately
session <- ssh_connect("username#ip",passwd="password")
files<-capture.output(ssh_exec_wait(session, command = 'ls /directory/subfolder/A001/*'))
dnc1<- scp_download(session, files[1], to = paste0(getwd(),"/data/"))
dnc2<- scp_download(session, files[2], to = paste0(getwd(),"/data/"))
dnc3<- scp_download(session, files[3], to = paste0(getwd(),"/data/"))
The bottom 3 commands can be done in a loop as this could be hundreds or thousands of records.
DataStage - There are 7 folders in a path and in each folder there are 2 files . for eg : the 2 files are in the folllowing format- filename = test_s1_YYYYMMDD.txt, test_s1_YYYYMMDD.done. The path for these files are user/test/test_s1/
user/test/test_s2/
...
...
..
user/test/test_s7/------here s1,s2...s7 represents the different folders
In these folders the 2 above mentioned files are present , so how can i process each file in a sequence job?
First you need a job to process a file and the filename needs to be a parameter of that job.
For the Sequence level you need two levels - the inner one for the two files within each folder and a outer one for the different directories.
For the inner one you can choose to build a loop with to iterations or simply add the processing job twice to the sequence (which will reduce complexity in case it will always be two files).
The outer Sequence is a loop where you could parameterize the path in a way that the loop counter could be used to generate your 1-7 flexible path addon.
Check out more details on loops here
You can use the loop counter (stage_label.$Counter) to parameterize your job.
Depending on what you want to do with the files, it is an important decision how to process your files. Starting a job (or more) in a sequence for each file can lead to heavy overhead for just starting the jobs. Try loading all files at once in a parallel job using the sequenial file stage.
In the Sequential File Stage, set the appropriate Format. You can also set everything to none to just put each row in one column and process that in a later job. This will make the reading very flexible and forgiving. If your files are all the same structure, define your columns as needed.
To select the files, use File Patterns. In the Options of the Sequential File Stage, choose to have a File Name Column so you can process the filenames in a later job. You might also want to add a Row Number Column.
This method works pretty fast.
I am working in UNIX and trying to write the following commands. I am receiving a source file daily whose filename is in the format :
ONSITE_EXTR_ONSITE_EXTR_20170707.
Since I am receiving a file daily, the file name would change based on the current date, so ONSITE_EXTR_ONSITE_EXTR_20170708, ONSITE_EXTR_ONSITE_EXTR_20170709 etc. I need to strip the date out of the filename and rename it to ONSITE_EXTR_ONSITE_EXTR. After I have finished whatever data reading and processing I need to do, I need to change the file name back to ONSITE_EXTR_ONSITE_EXTR_20170707 for example. So since the file is being delivered daily, I cant hard code the date in whatever commands I write. Any help would be greatly appreciated
Depending on your toolchain, this may be as simple as running:
$ mv ONSITE_EXTR_ONSITE_EXTR_$(date +%Y%m%d) ONSITE_EXTR_ONSITE_EXTR
... before running the rest of your script, assuming you're using a Bash-like shell.
Having said that, you can just drop in ONSITE_EXTR_ONSITE_EXTR_$(date +%Y%m%d) into your script when trying to access your file instead.
This is all assuming the script's run the same day and in the same time zone as the file is downloaded.
If you were using bash and you had the file name in a variable, you could do:
IN="ONSITE_EXTR_ONSITE_EXTR_20170707"
echo ${IN:0:23}
to give ONSITE_EXTR_ONSITE_EXTR
Googling gives all sorts of guides here...
I have filenames in a directory like:
ACCT_GA12345_2015-01-10.xml
ACCT_GA12345_2015-01-09.xml
ACCT_GDC789g_2015-01-09.xml
ACCT_GDC567g_2015-01-09.xml
ACCT_GDC567g_2015-01-08.xml
ACCT_GCC7894_2015-01-01.xml
ACCT_GCC7894_2015-01-02.xml
ACCT_GAC7884_2015-02-01.xml
ACCT_GAC7884_2015-01-01.xml
I want to have only the latest file in the folder. The latest file can be found using only the file name (NOT the date stamp). For example ACCT 12345 has files from 1/10 & 1/09. I need to delete 1/09 file and have only 1/10 file, for ACCT 789g there is only one file so I have to have that file, and ACCT 567g the latest file is 1/09 so I have to remove 1/08 and have 1/09. So the combination for latest file should be ACCT & Max date for that ACCT.
I would need the final list of files as:
ACCT_GA12345_2015-01-10.xml
ACCT_GDC789g_2015-01-09.xml
ACCT_GDC567g_2015-01-09.xml
ACCT_GCC7894_2015-01-02.xml
ACCT_GAC7884_2015-02-01.xml
Can someone help me with this command in unix? Any help is appreciated
I'd do something like this.... to test start with ls command, when you get what you want to delete, then do rm.
ls ACCT_{GDC,GA1}*-{09,10}.xml
this will list any GDC or GA1 files that end in 09 or 10. You can play with combinations and different values until you have the right set of files showing that you want deleted. once you to just change ls to rm and you should be golden.
With some more info I could help you out. To test this out I did:
touch ACCT_{GDC,GA1}_{01..10}_{05..10}.xml
this will make 56 different dummy files with different combinations. Make a directory, run this command, and get your hands dirty. That is the best way to learn linux cli. Also 65% of commands you need, you will learn, understand, use then never use again...so learn how to teach yourself how to use man pages and setup a spot to play around in.
I have a very large file (~10 GB) that can be compressed to < 1 GB using gzip. I'm interested in using sort FILE | uniq -c | sort to see how often a single line is repeated, however the 10 GB file is too large to sort and my computer runs out of memory.
Is there a way to compress the file while preserving newlines (or an entirely different method all together) that would reduce the file to a small enough size to sort, yet still leave the file in a condition that's sortable?
Or any other method of finding out / countin how many times each line is repetead inside a large file (a ~10 GB CSV-like file) ?
Thanks for any help!
Are you sure you're running out of the Memory (RAM?) with your sort?
My experience debugging sort problems leads me to believe that you have probably run out of diskspace for sort to create it temporary files. Also recall that diskspace used to sort is usually in /tmp or /var/tmp.
So check out your available disk space with :
df -g
(some systems don't support -g, try -m (megs) -k (kiloB) )
If you have an undersized /tmp partition, do you have another partition with 10-20GB free? If yes, then tell your sort to use that dir with
sort -T /alt/dir
Note that for sort version
sort (GNU coreutils) 5.97
The help says
-T, --temporary-directory=DIR use DIR for temporaries, not $TMPDIR or /tmp;
multiple options specify multiple directories
I'm not sure if this means can combine a bunch of -T=/dr1/ -T=/dr2 ... to get to your 10GB*sortFactor space or not. My experience was that it only used the last dir in the list, so try to use 1 dir that is big enough.
Also, note that you can go to the whatever dir you are using for sort, and you'll see the acctivity of the temporary files used for sorting.
I hope this helps.
As you appear to be a new user here on S.O., allow me to welcome you and remind you of four things we do:
. 1) Read the FAQs
. 2) Please accept the answer that best solves your problem, if any, by pressing the checkmark sign. This gives the respondent with the best answer 15 points of reputation. It is not subtracted (as some people seem to think) from your reputation points ;-)
. 3) When you see good Q&A, vote them up by using the gray triangles, as the credibility of the system is based on the reputation that users gain by sharing their knowledge.
. 4) As you receive help, try to give it too, answering questions in your area of expertise
There are some possible solutions:
1 - use any text processing language (perl, awk) to extract each line and save the line number and a hash for that line, and then compare the hashes
2 - Can / Want to remove the duplicate lines, leaving just one occurence per file? Could use a script (command) like:
awk '!x[$0]++' oldfile > newfile
3 - Why not split the files but with some criteria? Supposing all your lines begin with letters:
- break your original_file in 20 smaller files: grep "^a*$" original_file > a_file
- sort each small file: a_file, b_file, and so on
- verify the duplicates, count them, do whatever you want.