OK, actually I have a loop of 50 iterations, and I need an output file for each iteration. What happens is that with my current code I only obtain the output file corresponding to the last iteration, so could you give me code that lets me get all the files in my current folder? Thank you.
part is a vector of length 50 (really a list, but that does not matter).
Use
for (i in seq_along(part)) {
    write.table(part[[i]], paste(i, "txt", sep = "."))
}
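This writes 1.txt through 50.txt into the current working directory. A quick shell-side check (not part of the fix) to confirm all 50 files landed there:

ls *.txt | wc -l    # should print 50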
How about using list.files()?
That lists all the files in the current directory, or you can specify a directory as the first argument of the function.
I have a sorted .csv file that is something like this:
AABB1122,ABC,BLAH,4
AABB1122,ACD,WHATEVER,1
AABB1122,AGT,CAT,4
CCDD4444,AYT,DOG,4
CCDD4444,ACG,MUMMY,8
CCEE4444,AOP,RUN,5
DDFF9900,TUI,SAT,33
DDFF9900,WWW,INDOOR,5
I want to split the file into smaller files of roughly two lines each, but I do not want rows with matching values in the first column to be separated.
Here, I would end up with three files:
x00000:
AABB1122,ABC,BLAH,4
AABB1122,ACD,WHATEVER,1
AABB1122,AGT,CAT,4
x00001:
CCDD4444,AYT,DOG,4
CCDD4444,ACG,MUMMY,8
x00002:
CCEE4444,AOP,RUN,5
DDFF9900,TUI,SAT,33
DDFF9900,WWW,INDOOR,5
My actual data is about 7 gigs in size and contains over 100 million lines. I want to split it into files of about 100K lines each or ~6MB. I am fine with using either file size or line numbers for splitting.
I know that I can use "split" to do this, such as:

split -a 5 -d -l 2 file.csv

Here, that would give me four files, and matching values in the first column would be split across files in most cases.
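For the real file, the size-based part alone would be something like the following, though that still ignores the grouping constraint (the trailing x is the output-name prefix):

split -a 5 -d -l 100000 file.csv x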
I think I probably need awk, but, even after reading through the manual, I am not sure how to proceed.
Help is appreciated! Thanks!
An awk script:
BEGIN { FS = "," }
!name { name = sprintf("%06d-%s.txt", NR, $1) }
count >= 2 && prev != $1 {
    close(name)
    name = sprintf("%06d-%s.txt", NR, $1)
    count = 0
}

{
    print > name
    prev = $1
    ++count
}
Running this on the given data will create three files:
$ awk -f script.awk file.csv
$ cat 000001-AABB1122.txt
AABB1122,ABC,BLAH,4
AABB1122,ACD,WHATEVER,1
AABB1122,AGT,CAT,4
$ cat 000004-CCDD4444.txt
CCDD4444,AYT,DOG,4
CCDD4444,ACG,MUMMY,8
$ cat 000006-CCEE4444.txt
CCEE4444,AOP,RUN,5
DDFF9900,TUI,SAT,33
DDFF9900,WWW,INDOOR,5
I have arbitrarily chosen to name each output file after the line number in the original file where the file's first line came from, together with the value of the first field on that line.
The script counts the number of lines printed to the current output file. If that count is at least 2, and if the first field's value differs from the previous line's first field, the current output file is closed, a new output name is constructed, and the count is reset.
The last block simply prints to the current filename, remembers the first field in the prev variable, and increments the count.
The BEGIN block initializes the field delimiter (before the first line is read) and the !name block sets the initial output file name (when reading the very first line).
To get exactly the filenames that you have in the question (starting at x00000), use

name = sprintf("x%05d", n++)

to set the output filename in both places where this is done (the post-increment makes the numbering start at zero).
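With that change, the whole thing can be run directly from the shell (the script is otherwise unchanged; n is implicitly zero at its first use):

awk '
BEGIN { FS = "," }

!name { name = sprintf("x%05d", n++) }

count >= 2 && prev != $1 {
    close(name)
    name = sprintf("x%05d", n++)
    count = 0
}

{
    print > name
    prev = $1
    ++count
}' file.csv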
With csplit, if it is available. For the given data:

csplit -s infile %^A% /^C/ %^C% /^D/ /^Z/ {*}

Here /regexp/ writes a section up to (but not including) the next matching line, %regexp% does the same without writing a file, {*} repeats the last pattern as many times as possible, and -s suppresses the printing of output-file sizes.
I found a command which takes the input data from a binary file and writes it into an output file.
nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg
It works, but I am not able to figure out how. Could anyone please explain how the above command works and what it is doing? This nawk is too tough for me to understand.
Thanks in advance.
nawk is not tough to understand; it is much like other languages. I suspect you cannot follow it because it is not properly formatted. If you format it, you will see how it works.
To answer your question: this command searches for lines containing a given text in the given input file, and prints a few lines before each matched line and a few lines after it. How many lines to print is controlled by the variables b (number of lines before) and a (number of lines after), and the string/text to search for is passed in the variable s.
This command is helpful in debugging and troubleshooting, when you want to extract lines from large log files (difficult to open in vi or another editor on Unix/Linux) by searching for some error text and printing some lines above and below it.
So in your command
b=1 ## means print only 1 line before the matching line
a=19 ## means print 19 lines after the matching line
s="<Comment>Ericsson_OCS_V1_0.0.0.7" ## means search for this string
/var/opt/fds/config/ServiceConfig/ServiceConfig.cfg ## search in this file
/opt/temp/"$circle"_"$sdpid"_RG.cfg ## store the output in this file
Your formatted command is below. The very first condition, which looked like c-->0 before formatting, is easy to interpret once you see that it means "c-- greater than 0" rather than some arrow operator. The NR variable in awk holds the line number of the input line currently being processed.
nawk '
c-- > 0;
$0 ~ s {
    if (b)
        for (c = b + 1; c > 1; c--)
            print r[(NR - c + 1) % b];
    print;
    c = a
}
b {
    r[NR % b] = $0
}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg

Note that each opening brace must stay on the same line as its pattern; if the brace moves to the next line, awk treats the pattern and the block as two separate rules, which changes the behavior.
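As an aside, on systems with GNU grep the same before/after extraction can be done with context options; a rough equivalent of the command above (using -F because the search text is a fixed string, not a regular expression) would be:

grep -F -B 1 -A 19 '<Comment>Ericsson_OCS_V1_0.0.0.7' /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg

Unlike the nawk version, grep inserts a "--" separator line between non-adjacent groups of matches.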
I am hoping to perform a series of edits to a large text file composed almost entirely of single letters, separated by spaces. The file is about 300 rows by about 400,000 columns, and about 250 MB.
My goal is to transform this table through a series of steps, for eventual processing with another language (R, probably). I don't have much experience working with big data files, but Perl has been suggested to me as the best way to go about this. Please let me know if there is a better way :).
So, I am hoping to write a Perl script that does the following:
Open the file, and edit or write to a new file, doing the following:
remove columns 2-6
merge/concatenate pairs of columns, starting with column 2 (so, merge columns 2-3, 4-5, etc.)
replace each character pair according to a sequential conditional algorithm running across each row:
[example PSEUDOCODE: if both characters of a cell are "a", cell = 1;
else if both characters of a cell are "b", cell = 2;
etc.] such that, except for the first column, the table becomes a numerical matrix
remove every nth column, or keep every nth column and remove all the others
I am just starting to learn Perl, so I was wondering whether these operations are possible in Perl, whether Perl would be the best way to do them, and whether there are any suggestions for the syntax of these operations in the context of reading from and writing to a file.
I'll start:
use strict;
use warnings;

while (<>) {
    chomp;
    my @cols = split /\s+/;   # split on whitespace
    splice(@cols, 1, 5);      # remove columns 2-6 (five fields)
    my @transformed = ($cols[0]);
    for (my $i = 1; $i < @cols; $i += 2) {
        push @transformed, "$cols[$i]$cols[$i+1]";
    }
    # other transforms as required
    print join(' ', @transformed), "\n";
}
That should get you on your way.
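If you save the above as, say, transform.pl (the name is just an example), it runs as a filter:

perl transform.pl input.txt > output.txt

Because it processes one row at a time, memory use stays modest even on a 250 MB file.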
You need to post some sample input and expected output, or we're just guessing what you want, but maybe this will be a start:
awk '{
    printf "%s ", $1                 # keep the first column as-is
    for (i = 7; i <= NF; i += 2) {   # skip columns 2-6, then walk the rest in pairs
        printf "%s%s ", $i, $(i+1)
    }
    print ""                         # end the output line
}' file
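The pair-to-number recoding step from the question could be layered on top with a lookup table. A sketch follows; the map entries (aa to 1, bb to 2) mirror the question's pseudocode and are only placeholders for the real mapping:

awk 'BEGIN { map["aa"] = 1; map["bb"] = 2 }   # hypothetical pair-to-number table
{
    printf "%s ", $1
    for (i = 7; i <= NF; i += 2) {
        pair = $i $(i+1)
        printf "%s ", (pair in map) ? map[pair] : pair   # recode known pairs, pass others through
    }
    print ""
}' file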
I have 10 PDFs that ask for a user password to open. I know that password. I want to keep them in a decrypted format. Their filenames follow the form:
static_part.dynamic_part_like_date.pdf
I want to convert all the 10 files. I can give a * after the static part and work on all of them, but I also want the corresponding output filenames. So there has to be a way to capture the dynamic part of the filename and then use it in the output filename.
The normal way of doing this for one file is:
pdftk secured.pdf input_pw foopass output unsecured.pdf
I want to do something like:
pdftk var=secured*.pdf input_pw foopass output unsecured+var.pdf
Thanks.
Your request is a little ambiguous, but here are some ideas that might help you.
Assuming one of your 10 files is

# static_part.dynamic_part_like_date.pdf
# SalesReport.20110416.pdf (YYYYMMDD)

and you want it converted to an unsecured SalesReport.pdf, you can use a shell script to achieve your requirement:
# make a file with the following contents,
# then make it executable with `chmod 755 pdfFixer.sh`
# the #!/bin/bash has to be the first line of the file.
$ cat pdfFixer.sh
#!/bin/bash
# call the script like: $ pdfFixer.sh staticPart.*.pdf
# (note: the '$' is not part of the command; it is the cmd-line prompt
# in this example; yours may look different)

# use a variable to hold the password you want to use
pw=foopass

for file in "$@" ; do
    # %%.* strips off everything after the first '.' char
    # (caution: with %%.* every output file gets the same name;
    # see the variants discussed below to keep the dynamic part)
    unsecuredName=${file%%.*}.pdf
    # your example: pdftk secured.pdf input_pw foopass output unsecured.pdf
    # converts to
    pdftk "${file}" input_pw "${pw}" output "${unsecuredName}"
done
You may find that you need to modify the %%.* thing to:
strip less from the end: use %.* to strip just the last '.' and all chars after it (strip from the right),
strip from the front: use #*. to strip just the static part, leaving the dynamic part, OR
strip from the front: use ##*. to strip everything up to and including the last '.' char.
It will really be much easier for you to figure out what you need at the cmd-line. Set a variable with one sample filename:
myTestFileName=staticPart.dynamicPart.pdf
and then use echo combined with the variable modifiers to see the results.
echo ${myTestFileName##*.}    # pdf
echo ${myTestFileName#*.}     # dynamicPart.pdf
echo ${myTestFileName%.*}     # staticPart.dynamicPart
echo ${myTestFileName%%.*}    # staticPart
etc.
Also notice how I combine a modified variable value with a plain string (.pdf), in unsecuredName=${file%%.*}.pdf
IHTH
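Putting the pieces together for the exact filename pattern in the question, a minimal loop (assuming the password foopass and names like static_part.dynamic_part.pdf) that keeps the dynamic part in the output name:

pw=foopass
for f in static_part.*.pdf ; do
    # strip the static part and the first '.', keeping the dynamic part
    pdftk "$f" input_pw "$pw" output "unsecured.${f#*.}"
done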