I have 1000 files named in the format framexxx.dat, such as
frame0.dat frame1.dat frame2.dat .... frame999.dat
I want to change these files' names to
frame000.dat frame001.dat frame002.dat .... frame999.dat
Is there any way to do this with a simple Linux command?
Also, if my files are framexx.dat or framexxxx.dat (where xx is a 2-digit number and xxxx is a 4-digit number), how can I change the code to do the same?
You have to handle them in groups:
group 0: from frame100.dat to frame999.dat: nothing to do here.
group 1: from frame10.dat to frame99.dat: add one 0
for i in {10..99}; do mv frame$i.dat frame0$i.dat; done
group 2: from frame0.dat to frame9.dat: add 2 0s
for i in {0..9}; do mv frame$i.dat frame00$i.dat; done
A general guideline is to handle the big numbers first (in some cases complications could arise otherwise).
This can be extended to bigger numbers... you get the idea.
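Alternatively, here is a sketch that pads everything to three digits in one pass, assuming the numbers never go above 999 and the files really are named frameN.dat with no existing padding (names that are already three digits just trigger a harmless "same file" complaint from mv):
for f in frame*.dat; do
    n=${f#frame}; n=${n%.dat}                      # strip the prefix and suffix to get the bare number
    mv -n "$f" "$(printf 'frame%03d.dat' "$n")"    # zero-pad to three digits; -n never overwrites
done
For the 4-digit case, change %03d to %04d.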
I am working with a large number of image files within several subdirectories of one parent folder.
I am attempting to run an ImageJ macro to batch-process the images (specifically, I am trying to stitch together a series of images taken on the microscope into single images). Unfortunately, I don't think I can run this as a plain ImageJ macro, because the images were taken with varying grid sizes, e.g. some are 2x3, some are 3x3, some are 3x2, etc.
I've written an R script that is able to evaluate the image folders and determine the grid size, now I am trying to feed that information to my ImageJ macro to batch process the folder.
The issue I am running into seems like it should be easy to solve, but I haven't had any luck figuring it out: in R, I have a data.frame that I need to pass to the system command line-by-line with the columns concatenated into a single character string delimited by *'s.
Here's an example from the data.frame I have in R:
X xcoord ycoord input
1 4_10249_XY01_Fused_CH2 2 3 /XY01
2 4_10249_XY02_Fused_CH2 2 2 /XY02
3 4_10249_XY03_Fused_CH2 3 3 /XY03
4 4_10249_XY04_Fused_CH2 2 2 /XY04
5 4_10249_XY05_Fused_CH2 2 2 /XY05
6 4_10249_XY06_Fused_CH2 2 3 /XY06
Here's what each row needs to be transformed into so that ImageJ can understand it:
4_10249_XY01_Fused_CH2*2*3*/XY01
4_10249_XY02_Fused_CH2*2*2*/XY02
4_10249_XY03_Fused_CH2*3*3*/XY03
4_10249_XY04_Fused_CH2*2*2*/XY04
4_10249_XY05_Fused_CH2*2*2*/XY05
4_10249_XY06_Fused_CH2*2*3*/XY06
I tried achieving this with a for loop inside of a function that I thought would pass each row into the system command, but the macro only runs for the first line, none of the others.
macro <- function(i) {
  for (row in 1:nrow(i)) {
    df <- paste(i$X, i$xcoord, i$ycoord, i$input, sep = '*')
  }
  system2('/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx', args = c('-batch "/Users/All Stitched CH2.ijm"', df))
}
macro(table)
I think this is because the for loop is not maintaining the list-form of the data.frame. How do I concatenate the table by row and maintain the list-structure? I don't know if I'm asking the right question, but hopefully I'm close enough that someone here understands what I'm trying to do.
I appreciate any help or tips you can provide!
Turns out taking a break helps a lot!
I came back to this after lunch and came up with an easy solution (duh!). I thought I would post it in case anyone comes along later with a similar issue.
I used stringr to combine my datatable by columns, then put them back into list form using as.list. Finally, for feeding the list into my macro, I edited the macro to only contain the system command and then used lapply to apply the macro to my list of inputs. Here is what my code looks like in the end:
library(stringr)
tablecombined<- str_c(table$X, table$xcoord, table$ycoord, table$input, sep = "*")
listylist<-as.list(tablecombined)
macro <- function(i) {
  system2('/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx', args = c('-batch "/Users/All Stitched CH2.ijm"', i))
}
runme<- lapply(listylist, macro)
Note: I am using the system2 command because it can take arguments, which is necessary for me to be able to feed it a series of images to iterate over. I started with the solution posted here: How can I call/execute an imageJ macro with R?
but needed additional flexibility for my specific situation. Hopefully someone may find this useful in the future when running ImageJ Macros from R!
I have some trouble using CombiTimeTable.
I want to fill the table using a txt file that contains two columns: the first is the time and the second is the related value (a current sample). Furthermore, I added #1 in the first line, as the manual says.
Moreover, I add the following parameters:
tableOnFile=true,
fileName="C:/Users/gg/Desktop/CurrentDrivingCycle.txt"
I also have to add the parameter tableName but I don't know how to define it. I tried to define it using the name of the file (i.e. CurrentDrivingCycle) but I got this error message at the end of the simulation:
Table matrix "CurrentDrivingCycle" not found on file "C:/Users/ggalli/Desktop/CurrentDrivingCycle.txt".
simulation terminated by an assertion at initialization
Simulation process failed. Exited with code -1.
Do you know how I can solve this issue?
Thank you in advance!
See the documentation:
https://build.openmodelica.org/Documentation/Modelica.Blocks.Sources.CombiTimeTable.html
The name tab1(6,2) in the example in the documentation is the tableName; in your case you would set tableName="CurrentDrivingCycle" and declare that same name inside the file. So yours should look something like:
#1
double CurrentDrivingCycle(6,2) # comment line
0 0
1 0
1 1
2 4
3 9
4 16
I am fully aware that similar questions may have been posted, but after searching it seems that the details of our questions are different (or at least I did not manage to find a solution that can be adopted in my case).
I currently have two files: "messyFile" and "wantedID". "messyFile" is of size 80,000,000 X 2,500, whereas "wantedID" is of size 1 x 462. On the 253rd line of "messyFile", there are 2500 IDs. However, all I want is the 462 IDs in the file "wantedID". Assuming that the 462 IDs are a subset of the 2500 IDs, how can I process the file "messyFile" such that it only contains information about the 462 IDs (i.e. of size 80,000,000 X 462)?
Thank you so much for your patience!
ps: Sorry for the confusion. But yeah, the question can be boiled down to something like this. In the 1st row of "File#1", there are 10 IDs. In the 1st row of "File#2", there are 3 IDs ("File#2" consists of only 1 line). The 3 IDs are a subset of the 10 IDs. Now, I hope to process "File#1" so that it contains only information about the 3 IDs listed in "File#2".
ps2: "messyFile" is a vcf file, whereas "wantedID" can be a text file (I said "can be" because it is small, so I can make almost any type for it)
ps3: "File#1" should look something like this:
sample#1 sample#2 sample#3 sample#4 sample#5
0 1 0 0 1
1 1 2 0 2
"File#2" should look something like this:
sample#2 sample#4 sample#5
Desired output should look like this:
sample#2 sample#4 sample#5
1 0 1
1 0 2
For parsing VCF format, use bcftools:
http://samtools.github.io/bcftools/bcftools.html
Specifically for your task see the view command:
http://samtools.github.io/bcftools/bcftools.html#view
Example:
bcftools view -Ov -S 462sample.list -r chr:pos -o subset.vcf superset.vcf
You will need to get the position of the SNP to specify chr:pos above.
You can do this using DbSNP:
http://www.ncbi.nlm.nih.gov/SNP/index.html
Just make sure to match the genome build to the one used in the VCF file.
You can also use plink:
https://www.cog-genomics.org/plink2
But, PLINK is finicky about duplicated SNPs and other things, so it may complain unless you address these issues.
I've done what you are attempting in the past using the awk programming language. For your sanity, I recommend using one of the above tools :)
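If you do want to try the awk route anyway, here is a rough sketch (untested against a real VCF; it assumes whitespace-separated columns, the sample IDs on the first line of File#1, and File#2 being a single line of wanted IDs):
awk 'NR==FNR { for (i=1; i<=NF; i++) want[$i]; next }                 # remember every wanted ID from File#2
     FNR==1  { for (i=1; i<=NF; i++) if ($i in want) keep[++n] = i }  # map those IDs to column numbers
     { out = ""; for (j=1; j<=n; j++) out = out (j>1 ? OFS : "") $(keep[j]); print out }' File2 File1
It keeps only the wanted columns on every line, header included.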
OK, I have no idea what a VCF file is, but if the File#1 and File#2 samples you gave are files containing tab-separated columns, this will work:
declare -a data=( $(head -1 data.txt) )       # column headers of the big file (File#1)
declare -a header=( $(head -1 header.txt) )   # the wanted IDs (File#2)
declare fields
declare -i count                              # -i makes assignments arithmetic
for i in "${header[@]}" ; do
    count=0
    for j in "${data[@]}" ; do
        count=$count+1
        if [ "$i" = "$j" ] ; then
            fields=$fields,$count             # collect the matching column numbers
        fi
    done
done
cut -f "${fields:1}" data.txt                 # drop the leading comma and cut those columns
If they aren't tab-separated values, perhaps it can be amended for the actual data format.
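For example, if the columns are separated by single spaces instead, the last line would become something like (this assumes exactly one space between columns, since cut cannot merge runs of delimiters):
cut -d ' ' -f "${fields:1}" data.txt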
I have a very large file (~10 GB) that can be compressed to < 1 GB using gzip. I'm interested in using sort FILE | uniq -c | sort to see how often a single line is repeated, however the 10 GB file is too large to sort and my computer runs out of memory.
Is there a way to compress the file while preserving newlines (or an entirely different method all together) that would reduce the file to a small enough size to sort, yet still leave the file in a condition that's sortable?
Or is there any other method of finding out / counting how many times each line is repeated inside a large file (a ~10 GB CSV-like file)?
Thanks for any help!
Are you sure you're running out of memory (RAM) with your sort?
My experience debugging sort problems leads me to believe that you have probably run out of disk space for sort to create its temporary files. Also recall that the disk space used for sorting is usually in /tmp or /var/tmp.
So check your available disk space with:
df -g
(some systems don't support -g; try -m (megabytes) or -k (kilobytes))
If you have an undersized /tmp partition, do you have another partition with 10-20GB free? If yes, then tell your sort to use that dir with
sort -T /alt/dir
Note that for sort version
sort (GNU coreutils) 5.97
The help says
-T, --temporary-directory=DIR use DIR for temporaries, not $TMPDIR or /tmp;
multiple options specify multiple directories
I'm not sure if this means you can combine a bunch of -T /dir1 -T /dir2 ... options to get to your 10GB*sortFactor space or not. My experience was that it only used the last dir in the list, so try to use one dir that is big enough.
Also, note that you can go to whatever dir you are using for sort, and you'll see the activity of the temporary files used for sorting.
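For what it's worth, newer GNU sort releases can also compress their own temporary files, which addresses the "compress but keep it sortable" part of your question. A sketch, assuming your sort supports --compress-program and that /alt/dir has enough room:
sort -T /alt/dir --compress-program=gzip FILE | uniq -c | sort -rn > line_counts.txt
The final sort -rn just orders the (much smaller) count summary by frequency.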
I hope this helps.
As you appear to be a new user here on S.O., allow me to welcome you and remind you of four things we do:
1) Read the FAQs.
2) Please accept the answer that best solves your problem, if any, by pressing the checkmark sign. This gives the respondent with the best answer 15 points of reputation. It is not subtracted (as some people seem to think) from your reputation points ;-)
3) When you see good Q&A, vote them up by using the gray triangles, as the credibility of the system is based on the reputation that users gain by sharing their knowledge.
4) As you receive help, try to give it too, answering questions in your area of expertise.
There are some possible solutions:
1 - Use any text processing language (perl, awk) to extract each line and save the line number and a hash for that line, and then compare the hashes.
2 - Can / want to remove the duplicate lines, leaving just one occurrence per file? You could use a command like:
awk '!x[$0]++' oldfile > newfile
3 - Why not split the file according to some criterion? Supposing all your lines begin with letters:
- break your original_file into smaller files, one per initial letter: grep "^a" original_file > a_file
- sort each small file: a_file, b_file, and so on
- verify the duplicates, count them, do whatever you want.
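A rough sketch of option 1 with awk (assuming the number of distinct lines, not the full 10 GB, fits in memory): count every line in a single pass, then sort only the much smaller summary.
awk '{ count[$0]++ } END { for (line in count) print count[line], line }' original_file | sort -rn > line_counts.txt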
I have these two files
File: 11
11
456123
File: 22
11
789
Output of diff 11 22
2c2
< 456123
---
> 789
Output to be
< 456123
> 789
I want it to not print the 2c2 and --- lines. I looked at the man page but could not locate any help. Any ideas? The file has more than 1000 lines.
What about diff 11 22 | grep "^[<|>]"?
Update: As knitti pointed out the correct pattern is ^[<>]
Diff has a whole host of useful options, like --old-group-format, that are described very briefly in the help. They are explained in more detail at http://www.network-theory.co.uk/docs/diff/Line_Group_Formats.html
The following is producing something similar to what you want.
diff 11.txt 22.txt --unchanged-group-format="" --changed-group-format="<%<>%>"
<456123
>789
You might also need to play with --old-group-format=format (groups hunks containing only lines from the first file), --new-group-format=format, --old-line-format=format (formats lines just from the first file), --new-line-format=format, etc.
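If I'm reading the docs right, the per-line formats alone should give exactly the output you asked for (an untested sketch; %L stands for the text of the line including its newline):
diff --unchanged-line-format="" --old-line-format="< %L" --new-line-format="> %L" 11.txt 22.txt
which should print:
< 456123
> 789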
Disclaimer - I have not used this for real before, in fact I have only just understood them. If you have further questions I am happy to look at it later.
Edited to change order of lines