unix diff special options - unix

How can I diff 2 files so that:
1. I do not care about any kind of white space (-b option)
2. I do not care about position of content. ( ?? )
What I mean by 2 above is: file with a on line1 and b on line2 is equal to another file with b on line1 and a on line2.
Please let me know if the question is still not clear.
thanks.

Sort the two files first, then diff them. There's no way to convince diff that the a and b lines are in any way interchangeable. Order is extremely important to diff.
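The sort-then-compare idea can also be sketched in Python. This is only an illustration under stated assumptions: the function names are made up for the example, whitespace runs are collapsed to a single space (roughly what -b does, plus trimming), and blank lines are ignored.

```python
from collections import Counter


def normalized_lines(text):
    """Count each line after collapsing whitespace runs; blank lines are dropped."""
    counts = Counter()
    for line in text.splitlines():
        key = " ".join(line.split())  # trim and collapse runs of whitespace
        if key:
            counts[key] += 1
    return counts


def unordered_diff(text_a, text_b):
    """Return (only_in_a, only_in_b) as sorted lists, ignoring line order."""
    a = normalized_lines(text_a)
    b = normalized_lines(text_b)
    # Counter subtraction keeps only lines that occur more often on the left side.
    return sorted((a - b).elements()), sorted((b - a).elements())
```

Two files are "equal" in the sense of the question when both returned lists are empty. Because a Counter is used, a line that appears twice in one file and once in the other still shows up as a difference.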
Edit for comment -
Tools like diff do not understand any higher-level semantics beyond simply ordered lines. You might try writing a tool that converts your files into those higher-level concepts, one per line, that diff can then process (rather than writing a custom diff, which is kind of a pain). Since you can't sort the entire file, perhaps you can sort those small sections where "order doesn't matter", so that they won't matter to diff either.
The final file doesn't necessarily have to be in a proper file format (i.e. compatible with the original syntax); it only has to be enough to convey to you the differences you're looking for, while still capturing the semantics you're after and leveraging an off-the-shelf tool like diff.
Example:
File 1:
block thing {
a
b
}
block thing 2 {
c
d
}
File 2:
block thing {
b
c
a
}
block thing 3 {
f
e
}
"sorted" File 1:
block thing {
a
b
}
block thing 2 {
c
d
}
"sorted" File 2:
block thing {
a
b
c
}
block thing 3 {
e
f
}
In the end, ideally, you'll find that Block 3 is "new" as well as the "c" in Block 1.
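A minimal Python sketch of this "sort inside each block, then diff" approach follows. It assumes the block syntax of the example above (a header line ending in "{" and a closing line containing only "}"), does not handle nested blocks, and uses the standard difflib module in place of the diff command. The function names are hypothetical.

```python
import difflib


def sort_blocks(text):
    """Sort the lines inside each `{ ... }` block; block order and headers are kept."""
    out, body, in_block = [], [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith("{"):
            in_block = True
            out.append(stripped)
        elif stripped == "}":
            out.extend(sorted(body))  # order inside a block does not matter
            out.append(stripped)
            body, in_block = [], False
        elif in_block:
            body.append(stripped)
        else:
            out.append(stripped)
    return out


def block_diff(text_a, text_b):
    """Return (removed, added) lines after sorting each block."""
    a, b = sort_blocks(text_a), sort_blocks(text_b)
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(a[i1:i2])
            added.extend(b[j1:j2])
    return removed, added
```

Run on File 1 and File 2 from the example, the "a" and "b" lines of the first block no longer show up as differences, while the "block thing 3" lines are reported as added.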

Related

I need help for gamemaker 2.3

Please help me.
GameMaker 2.3 came out a few weeks ago, and in practice it changed scripts into functions in the GameMaker language. After converting my project files so that I could reopen them, I double-checked all the scripts, but when I start the game it stays on a black screen. It doesn't give me any compilation errors or anything else. What could be the problem?
PS:
This might sound stupid, but if someone has the same program as me, I can pass the project to them so they can see the scripts for themselves. It's basically just the base, with only the scripts for player walking and collisions. I know no one wants to waste time, but I'm asking anyway.
It's possible that your code is stuck in an infinite loop. Here's an example of what that might look like:
var doloop = true
while(doloop == true){
x += 1
y += 1
}
The "doloop" variable is never changed within the while loop, so it is always equal to true and the loop never ends. Because the code never finishes looping, it never gets around to drawing anything, so you end up with a black screen. The easiest way to check for this is to put a breakpoint at the beginning of, and just after, every while/for/do/etc. loop and run the debugger. For example (asterisks "*" represent breakpoints):
var doloop = true
* while(doloop == true){
x += 1
y += 1
}
*
When you get to one of the loops, remove the first breakpoint and hit the "continue" button in the debugger. If the computer takes longer than it should to hit the second breakpoint (say you wait anywhere from ten seconds to two minutes, depending on how complex the code is, and it still hasn't hit the second breakpoint), put the breakpoint back at the beginning of the loop to check whether execution is still in there. If it is still in the loop, that is likely where the code is getting stuck. Review the loop and everywhere any associated variables are set or changed, and you should be able to find the problem (even if it takes a while).
Majestic_Monkey_ and the commenters are correct: use the debugger. It's easy and it's your friend. Just place a red circle on the very first line of code that runs, click the little bug icon, and you can step through your code easily.
But to address your specific issue (or if anyone in the future has this issue): scripts have changed into files that can have many functions. Where you used to have
//script_name
var num = argument0 + argument1;
return num;
You would now have
function script_name(a, b) {
var num = a + b;
return num;
}
All you have to do is create a declaration for your new function:
function my_function_name(argument_names, etc...)
Then wrap all your old code in { }, and replace all those ugly "argument0" things with actual names. It's that easy. Plus you can have more than one function per script!

How to match ID in column in unix?

I am fully aware that similar questions may have been posted, but after searching it seems that the details of our questions are different (or at least I did not manage to find a solution that can be adopted in my case).
I currently have two files: "messyFile" and "wantedID". "messyFile" is of size 80,000,000 X 2,500, whereas "wantedID" is of size 1 x 462. On the 253rd line of "messyFile", there are 2500 IDs. However, all I want is the 462 IDs in the file "wantedID". Assuming that the 462 IDs are a subset of the 2500 IDs, how can I process the file "messyFile" such that it only contains information about the 462 IDs (ie. of size 80,000,000 X 462).
Thank you so much for your patience!
ps: Sorry for the confusion. But yeah, the question can be boiled down to something like this. In the 1st row of "File#1", there are 10 IDs. In the 1st row of "File#2", there are 3 IDs ("File#2" consists of only 1 line). The 3 IDs are a subset of the 10 IDs. Now, I hope to process "File#1" so that it contains only information about the 3 IDs listed in "File#2".
ps2: "messyFile" is a vcf file, whereas "wantedID" can be a text file (I said "can be" because it is small, so I can make almost any type for it)
ps3: "File#1" should look something like this:
sample#1 sample#2 sample#3 sample#4 sample#5
0 1 0 0 1
1 1 2 0 2
"File#2" should look something like this:
sample#2 sample#4 sample#5
Desired output should look like this:
sample#2 sample#4 sample#5
1 0 1
1 0 2
For parsing VCF format, use bcftools:
http://samtools.github.io/bcftools/bcftools.html
Specifically for your task see the view command:
http://samtools.github.io/bcftools/bcftools.html#view
Example:
bcftools view -Ov -S 462sample.list -r chr:pos -o subset.vcf superset.vcf
You will need to get the position of the SNP to specify chr:pos above.
You can do this using DbSNP:
http://www.ncbi.nlm.nih.gov/SNP/index.html
Just make sure to match the genome build to the one used in the VCF file.
You can also use plink:
https://www.cog-genomics.org/plink2
But, PLINK is finicky about duplicated SNPs and other things, so it may complain unless you address these issues.
I've done what you are attempting in the past using the awk programming language. For your sanity, I recommend using one of the above tools :)
Ok, I have no idea what a vcf file is but if the File#1 and File#2 samples you gave were files containing tab separated columns this will work:
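# Read the first (header) line of each file into an array
declare -a data=($(head -1 data.txt))
declare -a header=($(head -1 header.txt))
declare fields
declare -i count
for i in "${header[@]}" ; do
  count=0
  for j in "${data[@]}" ; do
    count=$count+1
    if [ "$i" == "$j" ] ; then
      # Remember the 1-based column number of each wanted ID
      fields=$fields,$count
    fi
  done
done
# Strip the leading comma and print only the selected tab-separated columns
cut -f "${fields:1}" data.txt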
If they aren't tab separated values perhaps it can be amended for the actual data format.
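For a file with 80,000,000 rows, a single streaming pass may be preferable to loading everything into memory. Here is a minimal Python sketch of the same column selection, assuming the simplified layout of File#1 and File#2 (tab-separated, IDs in the first row). A real VCF file also has metadata lines and fixed leading columns that this sketch does not handle, and the function name is made up for the example.

```python
def select_columns(data_lines, wanted_ids, sep="\t"):
    """Yield each row of data_lines restricted to the columns named in wanted_ids.

    The first line of data_lines is treated as the header row.
    """
    lines = iter(data_lines)
    header = next(lines).rstrip("\n").split(sep)
    position = {name: idx for idx, name in enumerate(header)}
    # A KeyError here means a wanted ID is not present in the header.
    keep = [position[name] for name in wanted_ids]
    yield sep.join(header[i] for i in keep)
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        yield sep.join(fields[i] for i in keep)
```

Passing an open file object as data_lines processes the file one line at a time. Unlike cut, which always prints columns in file order, this version emits them in the order given in wanted_ids.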

R - read html files within a folder, count frequency, and export output

I'm planning to use R to do some simple text mining tasks. Specifically, I would like to do the following:
Automatically read each html file within a folder, then
For each file, do frequency count of some particular words (e.g., "financial constraint" "oil export" etc.), then
Automatically write output to a csv. file using the following data structure (e.g., file 1 has "financial constraint" showing 3 times and "oil export" 4 times, etc.):
file_name count_financial_constraint count_oil_export
1 3 4
2 0 3
3 4 0
4 1 2
Can anyone please let me know where I should start, so far I think I've figured out how to clean html files and then do the count but I'm still not sure how to automate the process (I really need this as I have around 5 folders containing about 1000 html files within each)? Thanks!
Try this:
gethtml <- function(path = ".") {
  # Keep only files whose names end in ".html", with their full paths
  files <- list.files(path, pattern = "\\.html$", full.names = TRUE)
  htmlcount <- vector()
  for (i in files) {
    htmlcount[i] <- ##### add function that reads html file and counts it
  }
  return(sum(htmlcount))
}
R is not intended for doing rigorous text parsing. Consequently, the tools for such tasks are limited. If you insist on doing it with R, then you had better get familiar with regular expressions and have a look at this.
However, I highly recommend using Python with the BeautifulSoup library, which is specifically designed for this task.
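As a rough illustration of the Python route, here is a sketch that uses only the standard library (html.parser in place of BeautifulSoup, to keep the example self-contained). It assumes UTF-8 files, case-insensitive matching, and plain substring counting. The function names and the CSV layout are assumptions modelled on the table in the question.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return lower-cased text with whitespace runs collapsed to one space."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split()).lower()


def count_phrases(folder, phrases, out_csv):
    """Write one CSV row per .html file in folder, with a count for each phrase."""
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file_name"] + ["count_" + p.replace(" ", "_") for p in phrases])
        for path in sorted(Path(folder).glob("*.html")):
            text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
            writer.writerow([path.name] + [text.count(p.lower()) for p in phrases])
```

Calling count_phrases once per folder covers the "5 folders of about 1000 files" case. Note that text inside script and style elements is counted too, so real pages may need extra filtering.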

How can I change numbering in all of the file names?

I have 1000 files, which have a format of framexxx.dat, such as
frame0.dat frame1.dat frame2.dat .... frame999.dat
I hope to change these files' names to
frame000.dat frame001.dat frame002.dat .... frame999.dat
Is there any way to do this with a simple Linux command?
Also, if my files are framexx.dat or framexxxx.dat (xx is a 2-digit number and xxxx is a 4-digit number), then how can I change the code to do the same?
You have to handle them in groups:
group 0: from frame100.dat to frame999.dat: nothing to do here.
group 1: from frame10.dat to frame99.dat: add one 0
for i in {10..99}; do mv frame$i.dat frame0$i.dat; done
group 2: from frame0.dat to frame9.dat: add two 0s
for i in {0..9}; do mv frame$i.dat frame00$i.dat; done
A general guideline is to handle the big numbers first (otherwise complications could arise in some cases).
This can be extended to bigger numbers; you get the idea.
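The group-by-group renaming can be generalised to any width with a short script. The following Python sketch assumes the names follow the pattern prefix + digits + suffix; the function name and its parameters are hypothetical, and it refuses to overwrite an existing file.

```python
import os
import re


def pad_frame_names(directory, width=3, prefix="frame", suffix=".dat"):
    """Rename e.g. frame7.dat to frame007.dat; return the (old, new) pairs renamed."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix) + r"\Z")
    renames = []
    for name in sorted(os.listdir(directory)):
        match = pattern.match(name)
        if not match:
            continue  # not a frame file, leave it alone
        new_name = f"{prefix}{int(match.group(1)):0{width}d}{suffix}"
        if new_name == name:
            continue  # already padded to the requested width
        target = os.path.join(directory, new_name)
        if os.path.exists(target):
            raise FileExistsError(target)
        os.rename(os.path.join(directory, name), target)
        renames.append((name, new_name))
    return renames
```

For the framexx.dat and framexxxx.dat cases, only the width argument changes (2 or 4).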

How can I create multiple directories in a single line?

I want to know the source code: how can I create multiple directories using Turbo C++. For example, you can see in MS-DOS, in a single line:
md a b c d
creates a, b, c, and d directories simultaneously.
I have used this code in Turbo C++ (Borland Compiler 5.5):
char dir_name[256];
int status=mkdir(dir_name);
if(status==0)
{
cout<<"Directory created.";
}
else
{
cout<<"Error!";
}
Can anybody help me out, please...?
You can store the names in a 2-D array. The names can be entered on a single line, delimited by spaces; then use a loop to create the directories until you reach the end of the array.
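The logic of that answer (split the input line on spaces, then loop over the names) can be sketched as follows. This is written in Python rather than Turbo C++ purely to show the control flow, and the function name is made up for the example. In C++ the same loop would call mkdir once per name.

```python
import os


def make_directories(line, base="."):
    """Create one directory per whitespace-separated name in line."""
    created = []
    for name in line.split():
        path = os.path.join(base, name)
        os.makedirs(path, exist_ok=True)  # do not fail if it already exists
        created.append(path)
    return created
```

For example, make_directories("a b c d") creates the four directories a, b, c, and d under the current directory.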
