Does anyone know of a good string comparison / checking tool? - similarity

I am looking for a good online string checking tool that lets me enter two long strings and shows me where any differences occur. It would also be nice if there were a tool that, given a string, showed an index below each character, with escaped characters handled correctly so that an escape such as \0 takes up one position and not two.
Does anyone know of such a tool? It would greatly assist in string verification.

Have you heard of Levenshtein distance? It seems similar to what you're looking for, and the wiki article has some links to online implementations.

There are lots of techniques, such as SimMetrics or locality-sensitive hashing. Also look at Python's difflib library (http://docs.python.org/library/difflib.html) for this purpose.
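
As a rough illustration of the difflib suggestion, the sketch below reports where two strings differ, together with the character positions. The function name `describe_differences` is made up for this example; `SequenceMatcher` and `get_opcodes` come from Python's standard difflib module. Python treats an escape such as \0 as a single character, so it occupies one index position.

```python
import difflib

def describe_differences(a, b):
    """Return (tag, index_in_a, text_in_a, index_in_b, text_in_b) for each differing span."""
    # autojunk=False keeps difflib from ignoring frequent characters in long strings.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, i1, a[i1:i2], j1, b[j1:j2]))
    return changes

first = "abc\0def"
second = "abc\0xef"
for tag, i, old, j, new in describe_differences(first, second):
    # repr() makes escaped characters such as \0 visible in the output.
    print(f"{tag}: first[{i}] {old!r} -> second[{j}] {new!r}")
```

The tags are difflib's own: "replace", "delete" and "insert". Identical strings produce an empty list.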

After realizing that what I should be searching for is a "diff tool", I found:
http://www.quickdiff.com/

Related

How can I read this kind of text?

First of all, I don't understand all this coding stuff; I'm just doing this out of curiosity and boredom.
So what I want to ask is...
Is there a way to read this in English,
or, you know, in any language people speak?
(Any language will do, since I can use Google Translate.)
\x79\x5D\xEF\xBF\xBD\x0F\xE6\x99\xB8\xEF\xBF\xBD\x5D\x2C\xEF\xBF\xBD\x10\x33\x2D\x23\xEF\xBF\xBD
If yes, then how do I do it?
If no, thanks for reading and trying to help.
Thanks
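Most people could learn to read this with a bit of practice.
As it's a notation not designed for humans, it will take a while.
The text above is a sequence of bytes written as hexadecimal escapes. Not all of it is ASCII.
Broken down: \xST is how a single byte is written.
\x tells the machine that the next two characters are a hexadecimal value.
S and T are each any one of 0-9 and a-f (or A-F).
Values up to \x7F are ASCII characters (\x79 is "y", \x5D is "]"); higher values such as \xEF\xBF\xBD belong to multi-byte UTF-8 sequences.
So you just have to find a hex-to-text converter and start learning.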
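
To make the conversion concrete, here is a small sketch that turns the escaped text from the question back into characters. It assumes the bytes are UTF-8, which the question does not state, and `decode_hex_escapes` is a name invented for this example. One thing it makes visible: the sequence \xEF\xBF\xBD is the UTF-8 encoding of U+FFFD, the replacement character that decoders emit for bytes they could not read, so part of the original data was probably lost before it was posted and there may be no readable message left to recover.

```python
escaped = r"\x79\x5D\xEF\xBF\xBD\x0F\xE6\x99\xB8\xEF\xBF\xBD\x5D\x2C\xEF\xBF\xBD\x10\x33\x2D\x23\xEF\xBF\xBD"

def decode_hex_escapes(text, encoding="utf-8"):
    """Convert text made only of \\xHH escapes into a string (encoding is an assumption)."""
    hex_digits = text.replace("\\x", "")   # "\x79\x5D" becomes "795D"
    raw = bytes.fromhex(hex_digits)        # two hex digits per byte
    return raw.decode(encoding, errors="replace")

decoded = decode_hex_escapes(escaped)
print(repr(decoded))
print("replacement characters:", decoded.count("\ufffd"))
```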

Trying to understand how to accomplish this task

I'm new to posting here on Stack Overflow, so please forgive any transgressions that occur.
So a little background....
My grandfather is currently a computer science professor at a university. I have always taken a great interest in computers, and have really grown into dealing with the hardware side of things. However, being who he is, he wants me to have a broader understanding of computing in general, including coding/programming.
So, to my question.... He has given me a key, P5FW-93F6. He told me that if I am able to make other keys "with the same value", he will give me a reward. I haven't a clue where to start on this problem. To begin with, I entered the key into Excel, followed its pattern of letter, number, letter, letter, etc., and used the random value function. However, none of these keys work in his program. He told me there is a massive number of different keys that will work, but he will not provide hints on how to solve the problem. What language should I learn to solve this? Should I be looking for a hash value that is the same as that of the key listed above? I am completely lost... any help would be appreciated!
Thanks!!
P.S. I do have an unlimited number of attempts, but I can only enter one line at a time, so I can't make batch entries.

About the pattern matching algorithm in OCaml

I am writing a compiler, in OCaml, for a functional language I designed. I want my little language to have pattern matching, but I got stuck trying to come up with an algorithm to implement it. It seems really complicated as I dig into the problem, and I can't find much useful information about the corresponding algorithm with Google. I would appreciate it if someone could give me a hint or point me to some resources. Or are there any tricks for taking advantage of OCaml's own pattern matching so that I don't need to implement it myself? Thanks!
There are a few good papers on compiling pattern matching by some of the people behind OCaml. In particular, see "Compiling Pattern Matching to Good Decision Trees" and "Optimizing Pattern Matching". It might also be useful to go over this Stack Overflow post.

Using R to process Mail Files

I've done a bit of searching and after not finding much I thought I would post this question. Actually, because I've not found much, I think that may be an indicator of what the answer will be, but anyway...here it is:
Does anyone have any experience using R to process files for postal mailings...and if so...what packages do you use?
I realize R might not be the best tool for this task but sometimes you have to use the tools you have at hand and sometimes you have to do "extra" things at work to stay employed...so please don't flame me too hard for this question.
Basically I'm looking at merge purge, dup/elim sort of stuff. I've played with the compare() and merge() commands a bit. I'd like to incorporate some equivalencies in the compares such as
ST=St=St.=Street
BLVD=Blvd=Blvd.=Boulevard
etc...
I'm mostly wondering if packages have already been developed for this sort of data processing so I'm not reinventing the wheel.
I'd suggest the following basic workflow:
(1) Read in your data. I don't know what it looks like based on your question, so I'll assume that's easy for you.
(2) Use a mix of gsub, toupper, and other string manipulation tools to convert all the data to the same formats. I.e., get all addresses to use ST instead of St or street, etc.
(3) merge everything into a single dataframe.
(4) Use unique and/or sort/order to clean up the list and remove duplicates.
(5) Output to whatever format you're going for. Again, not clear from the question, so I can't offer specific advice here.
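
Steps (2) and (4) can be sketched as follows. The example is written in Python rather than R purely for illustration; in R the same idea maps onto toupper, gsub and unique. The `SUFFIXES` table and both function names are hypothetical, and the table only covers the two equivalences given in the question.

```python
import re

# Hypothetical mapping, built from the equivalences listed in the question.
SUFFIXES = {"STREET": "ST", "BOULEVARD": "BLVD"}

def normalize_address(address):
    """Upper-case, drop periods and commas, and map long suffixes to short ones."""
    cleaned = re.sub(r"[.,]", "", address.upper())
    words = [SUFFIXES.get(word, word) for word in cleaned.split()]
    return " ".join(words)

def remove_duplicates(addresses):
    """Keep the first address of each group that normalizes to the same text."""
    seen = set()
    unique = []
    for address in addresses:
        key = normalize_address(address)
        if key not in seen:
            seen.add(key)
            unique.append(address)
    return unique

mailing_list = ["12 Main Street", "12 Main St.", "12 MAIN ST", "9 Oak Blvd.", "9 Oak Boulevard"]
print(remove_duplicates(mailing_list))
```

Comparing on the normalized key while keeping the original spelling means the output file still shows addresses as they were entered.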

Should I use an expression parser in my Math game?

I'm writing some children's Math Education software for a class.
I'm going to try to present randomly generated math problems of different types, in fun ways, to students of varying skill levels.
One of the frustrations of using computer based math software is its rigidity. If anyone has taken an online Math class, you'll know all about the frustration of taking an online quiz and having your correct answer thrown out because your problem isn't exactly formatted in their form or some weird spacing issue.
So, originally I thought, "I know! I'll use an expression parser on the answer box so I'll be able to evaluate anything they enter and even if it isn't in the same form I'll be able to check if it is the same answer." So I fire up my IDE and start implementing the Shunting Yard Algorithm.
This would solve the problem of it not taking fractions in the smallest form and other issues.
However, it then hit me that a tricky student could simply enter most of the problems into the answer box, and my expression parser would dutifully parse and evaluate them to the correct answer!
So, should I not be using an expression parser in this instance? Do I really have to generate a single form of the answer and do a string comparison?
One possible solution is to note how many steps your expression evaluator takes to evaluate the problem's original expression, and to compare this to the optimal answer. If there's too much difference, then the problem hasn't been reduced enough and you can suggest that the student keep going.
Don't be surprised if students come up with better answers than your own definition of "optimal", though! I was a TA/grader for several classes, and the brightest students routinely had answers on their problem sets that were superior to the ones provided by the professor.
For simple problems where you're looking for an exact answer, removing whitespace and doing a string compare is reasonable.
For more advanced problems, you might do the Shunting Yard Algorithm (or similar) but perhaps parametrize it so you could turn on/off reductions to guard against the tricky student. You'll notice that "simple" answers can still use the parser, but you would disable all reductions.
For example, on a division question, you'd disable the "/" reduction.
This is a great question.
If you are writing an expression system and an evaluation/transformation/equivalence engine (isn't there one available somewhere? I am almost 100% sure that there is an open source one somewhere), then it's more of an education/algebra problem: is the student's answer algebraically closer to the original expression or to the expected expression.
I'm not sure how to answer that, but here is an idea (not necessarily practical): perhaps your evaluation engine can count the transformation steps needed to reach equivalence. If the answer takes fewer steps to reach the expected expression than to reach the original, it might be OK. If it's too close to the original, it's not.
You could use an expression parser, but apply restrictions on the complexity of the expressions permitted in the answer.
For example, if the goal is to reduce (4/5)*(1/2) and you want to allow either (2/5) or (4/10), then you could restrict the set of allowable answers to expressions whose trees take the form (x/y) and which also evaluate to the correct number. Perhaps you would also allow "0.4", i.e. expressions of the form (x) which evaluate to the correct number.
This is exactly what you would (implicitly) be doing if you graded the problem manually -- you would be looking for an answer that is correct but which also falls into an acceptable class.
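
A minimal sketch of that restriction, assuming answers are typed in a syntax that Python's own parser can read. It uses the standard ast module instead of a hand-written Shunting Yard parser, and the names `parse_answer` and `is_acceptable` are invented for this example. Only a bare number or a single integer/integer fraction is accepted, so typing the original problem back in is rejected even though it evaluates to the right value.

```python
import ast
from fractions import Fraction

def _integer(node):
    """Return the value of an integer literal node, or None for anything else."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    return None

def parse_answer(text):
    """Return the answer as a Fraction if it has the form (x) or (x/y), else None."""
    try:
        node = ast.parse(text.strip(), mode="eval").body
    except (SyntaxError, ValueError):
        return None
    # Form (x): a single numeric literal such as 0.4
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        try:
            return Fraction(str(node.value))
        except ValueError:
            return None
    # Form (x/y): exactly one division of two integer literals
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div):
        numerator, denominator = _integer(node.left), _integer(node.right)
        if numerator is not None and denominator not in (None, 0):
            return Fraction(numerator, denominator)
    return None

def is_acceptable(text, expected):
    """True if the answer is in an allowed form and equals the expected value."""
    value = parse_answer(text)
    return value is not None and value == expected

expected = Fraction(4, 5) * Fraction(1, 2)
for answer in ["(2/5)", "4/10", "0.4", "(4/5)*(1/2)"]:
    print(answer, is_acceptable(answer, expected))
```

As written, the sketch also rejects negative answers such as -2/5, because the parser sees a unary minus; a real grader would need to decide which extra forms to allow.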
The usual way of doing this in mathematics assessment software is to allow the question setter to specify expressions/strings that are not allowed in a correct answer.
If you happen to be interested in existing software, there's the open-source Stack http://www.stack.bham.ac.uk/ (or various commercial options such as MapleTA). I suspect most of the problems that you'll come across have also been encountered by Stack so even if you don't want to use it, it might be educational to look at how it approaches things.
