How to test math-related units?

I'm giving TDD a serious try today and have found it really helpful, in line with all the praise it receives.
In my quest for exercises on which to learn both Python and TDD, I have started coding SPOJ exercises using the TDD technique, and I have arrived at a question:
Given that SPOJ's exercises are mostly math applied to programming, how does one test a math procedure in the TDD fashion? With sample known-to-be-correct data? Against a known implementation?
I have found that using the sample data given in the problem itself is valuable, but it feels like overkill for something you can check so quickly in the console, not to mention the overhead of designing your algorithms in a testable fashion (proxying the stdout and stdin objects is simply too much work for a very small reward). While this is good because it forces you to think about your solutions in testable terms, I suspect I might be trying way too hard.
All guidance is welcome.

Test all the edge cases. Your program is most likely to fail when the input is special for some reason: negative or zero values, very large values, inputs in reverse order, empty inputs. You might also want some destructive tests to see how large the inputs can be before things break or grind to a halt.
Sphere Online Judge may not be the best fit for TDD. For one, the input data might be better behaved than what a real person would type in. Second, some problems have a code size limit, and an extensive test suite might put you over it.
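For what it's worth, here is a minimal sketch of what such edge-case tests can look like in Python with pytest. The `solve` function is a made-up stand-in (a plain sum), not anything from a specific SPOJ problem; the point is only the shape of the tests and that the pure computation stays separate from stdin/stdout.

```python
# Minimal edge-case tests for a hypothetical SPOJ-style solution.
# `solve` is an assumed pure function (a plain sum stands in for the
# real algorithm); keeping it free of stdin/stdout is what makes it
# cheap to test.

def solve(values):
    """Stand-in for the actual algorithm under test."""
    return sum(values)

def test_empty_input():
    assert solve([]) == 0

def test_zero_value():
    assert solve([0]) == 0

def test_negative_values():
    assert solve([-5, -7]) == -12

def test_large_values():
    assert solve([10**9, 10**9]) == 2 * 10**9

def test_reverse_order_gives_same_result():
    data = [3, 1, 2]
    assert solve(data) == solve(list(reversed(data)))
```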

You might want to take a look at Uncle Bob's "Transformation Priority Premise." It offers some guidance on how to pick a sequence of tests to test-drive an algorithm.

Use sample inputs for which you know the results (outputs). Use equivalence partitioning to identify a suitable set of test cases. With maths code you might find that you cannot implement as incrementally as you would other code: you might need several test cases for each incremental improvement. By that I mean that non-maths code can typically be thought of as having a set of "features" that you can implement one at a time, but maths code is not much like that.
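As a concrete illustration of equivalence partitioning (a sketch only, with gcd standing in for whatever the problem actually requires), pytest's parametrize makes it cheap to keep one representative input per partition:

```python
# Equivalence partitioning sketch: each parametrize row is one
# representative of a class of inputs expected to behave the same.
# gcd is just an example procedure; it delegates to the stdlib here
# so the file runs as-is.
import math
import pytest

def gcd(a, b):
    """The procedure under test; imagine your own implementation here."""
    return math.gcd(a, b)

@pytest.mark.parametrize(
    "a, b, expected",
    [
        (12, 8, 4),              # typical: both positive, share a factor
        (7, 13, 1),              # coprime pair
        (5, 0, 5),               # one operand zero
        (0, 0, 0),               # degenerate case
        (6, 42, 6),              # one divides the other
        (10**12, 10**8, 10**8),  # very large values
    ],
)
def test_gcd_partitions(a, b, expected):
    assert gcd(a, b) == expected
```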

Related

RaptorQ FEC Implementation Obstacle

I am trying to implement the RaptorQ Forward Error Correction scheme in Java as specified here:
https://datatracker.ietf.org/doc/html/draft-ietf-rmt-bb-fec-raptorq-04#section-5.3.3
The core of the problem is to perform Gaussian elimination on a matrix A in a way that is smart enough to be fast.
The matrix A is composed of submatrices, among them G_LDPC,1 and G_LDPC,2
(generator matrices for Low Density Parity Checks).
On page 22, in section "5.3.3.3. Pre-coding relationships", it is stated that these matrices can be deduced from the code snippet on the same page.
My problem: I am not able to derive the structure of these two submatrices from the code snippet.
Does anyone see how to do that, or what the structure looks like?
Thanks for any kind of help!
Max
I'm also trying to implement RaptorQ, and I ran into this exact same problem. My suggestion is this book:
Raptor Codes (Foundations and Trends(R) in Communications and Information Theory) [Paperback]
Amin Shokrollahi (Author), Michael Luby (Author)
It has a better explanation of constructing the constraint matrix in section 3.3.3 (I'd quote it, but I don't have a digital copy).
@Max, is there any way we can chat, or could you share your RFC 5053 implementation? I could really use someone familiar with these difficulties to talk to and to share some doubts/ideas with.
After being stuck with the problem, I decided to implement the Raptor codec according to RFC 5053 as described here:
https://www.rfc-editor.org/rfc/rfc5053
This is actually the predecessor version of RaptorQ.
The general working principle seems to be the same, but it is less optimized and therefore has worse properties, especially in terms of reception efficiency.
On the other hand, it was less complex and more intuitive to me, so I was able to write a working implementation in Java.
In the end, I have to admit I'm quite astonished by the capabilities of the resulting codec!
With the deeper understanding gained while coding the RFC 5053 implementation, I will probably now be able to tackle the RaptorQ codec as well.
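Not an answer to the G_LDPC structure question itself, but since both posts revolve around the elimination step, here is a plain Gaussian-elimination sketch over GF(2) for context (Python for brevity, even though the implementations discussed are in Java). It is deliberately naive: a real RaptorQ/RFC 5053 decoder exploits the sparse structure of A and mixes GF(2) and GF(256) rows, none of which is shown here.

```python
# Naive Gaussian elimination over GF(2) (XOR arithmetic). Purely an
# illustration of the operation the question refers to; it ignores
# the sparsity tricks and GF(256) rows a real RaptorQ decoder needs.

def gf2_solve(A, b):
    """Solve A x = b over GF(2). A is a list of 0/1 rows, b a list of
    0/1 values. Returns a solution x, or None if inconsistent."""
    n, m = len(A), len(A[0])
    # Augmented rows, so row operations also update the right-hand side.
    rows = [A[i][:] + [b[i]] for i in range(n)]

    pivot_row, pivot_cols = 0, []
    for col in range(m):
        # Find a row with a 1 in this column at or below pivot_row.
        pivot = next((r for r in range(pivot_row, n) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Eliminate the column everywhere else: row addition is XOR.
        for r in range(n):
            if r != pivot_row and rows[r][col]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_cols.append(col)
        pivot_row += 1
        if pivot_row == n:
            break

    # A zero row with a nonzero right-hand side means no solution.
    if any(rows[r][m] for r in range(pivot_row, n)):
        return None

    x = [0] * m
    for r, col in enumerate(pivot_cols):
        x[col] = rows[r][m]
    return x
```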

How to identify that code is over-abstracted?

What measures should be used to identify that code is over-abstracted and very hard to understand, and what should be done to reduce the over-abstraction?
"Simplicity over complexity, complexity over complicatedness"
So there's a benefit to abstracting something only if you are "de-leveling" complicatedness to complexity. Reasons to do that can vary: better modularity, better encapsulation, etc.
Identifying over-abstraction is a chicken-and-egg problem. In order to reduce over-abstraction you need to understand the actual reason behind the code. That includes understanding the idea of the particular abstraction itself (in contrast to calling it over-abstracted because of a lack of understanding). And that's not enough: you need to know a better, simpler solution to prove that it's over-abstracted.
If you are looking for a tool that could do this in your place, look no further: only a mind can reliably judge that.
I will give an answer that will get a LOT of downvotes!
If the code is written in an OO language, it is necessarily heavily over-abstracted. The purer the language, the worse the problem.
Abstraction should be used with great caution. If in doubt, always use concrete data structures. (You can always abstract later; that is easier than de-abstracting. :)
You must be very certain you have the right abstraction in your current context, and you must be very sure that the concept will stand the test of change. Abstraction has a high price in the performance of both the code and the coder.
Some weak tests for over-abstraction: if the data structure is a product type (a struct in C) and the programmer has written get and set methods for each field, they have utterly failed to provide any real abstraction, have disabled operators like C's increment for no purpose, and have simply not understood that the struct field names are already the abstract representation of a product. Duplicating and crippling the interface is not a good idea.
A good test for the product case is whether there exist any data invariants to maintain. For example, a pair of integers representing a rational number is almost sufficient; there's little need for any abstraction because all pairs are valid except when the denominator is zero. However, for performance reasons one may choose to maintain an invariant: typically the denominator is required to be greater than zero, and the numerator and denominator must be relatively prime. To ensure the invariant, the product representation is encapsulated: the initial value is protected by a constructor and the methods are constrained to maintain the invariant.
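To make the rational-number example concrete, here is a minimal sketch (Python, purely illustrative) of an encapsulation whose only job is to maintain those two invariants:

```python
# The only reason the fields are hidden behind a constructor and
# methods is to maintain the invariants: denominator > 0, and
# numerator and denominator relatively prime. Without such
# invariants, plain fields would do.
from math import gcd

class Rational:
    __slots__ = ("num", "den")

    def __init__(self, num, den):
        if den == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        if den < 0:                      # invariant: den > 0
            num, den = -num, -den
        g = gcd(num, den)                # invariant: gcd(num, den) == 1
        self.num, self.den = num // g, den // g

    def __mul__(self, other):
        # Results go back through the constructor, so the invariants
        # are re-established on every new value.
        return Rational(self.num * other.num, self.den * other.den)

    def __repr__(self):
        return f"{self.num}/{self.den}"
```

Rational(2, -4), for instance, normalizes to -1/2, and because every operation builds its result through the constructor, the invariants cannot silently break.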
To fix code I recommend these steps:
Document the representation invariants the abstraction is maintaining.
Remove the abstraction (the methods) if you can't find strong invariants.
Rewrite the code that used those methods to access the data directly.
This procedure only works for low-level abstraction, i.e. abstraction of small values by classes.
Over-abstraction at a higher level is much harder to deal with. Ideally you'd refactor the code repeatedly, checking after each step that it continues to work. However, this will be hard, and sometimes a major rewrite is required rather than a refinement. It's probably not worth it unless the abstraction is so far off base that it is not tenable to continue maintaining it.
Download Magento and have a look at the code, read some documents on it and have a look at their ERD: http://www.magentocommerce.com/wiki/_media/doc/magento---sample_database_diagram.png?cache=cache
I'm not joking: this is over-abstraction. Trying to please everyone and cover every base is a terrible idea and makes life extremely difficult for everyone.
Personally I would say that "What is the ideal level of abstraction?" is a subjective question.
I don't like code that uses a new line for every atomic operation, but I also don't like 10 nested operations within one line.
I like the use of recursive functions, but I don't appreciate recursion for the sole sake of recursion.
I like generics, but I don't like (nested) generic functions that e.g. use different code for each specific type that's expected...
It is a matter of personal opinion as well as common sense. Does this answer your question?
I completely agree with what @ArnisLapsa wrote:
"Simplicity over complexity, complexity over complicatedness"
And that
an abstraction is used to "de-level" those, from complicated to complex
(and from complex to simpler)
Also, as stated by @MartinHemmings, a good abstraction is quite subjective because we don't all think the same way. And our way of thinking changes with time, so something that someone finds simple might look complex to others, and may even become simpler with more experience. E.g. a monadic operation is trivial for a functional programmer, but can be seriously confusing for others. Similarly, a design with mutable objects communicating with each other can feel natural to some and impossible to track for others.
That being said, I would like to add a couple of indicators. Note that this applies to abstractions used in a code base, not "paradigm abstractions" such as everything-is-a-function or everything-is-designed-as-objects. So:
To the people it concerns, the abstraction should be conceptually simpler than the alternatives, without looking at the implementation. If you find that thinking through all possible cases is simpler than reasoning with the abstraction, then this abstraction is not suitable (for you).
Its implementation should reason only about the abstraction, not about the specific cases it will be used for. As soon as the abstraction's implementation has parts made for specific cases, that indicates an "unfit" abstraction. And increasing generalization to cope with each new case is going the wrong way (and tends to lead to the next issue).
A very common indicator of over-abstraction I have found (and actually fallen for) is abstractions that represent more than what is needed, now. As much as possible, they should allow you to do exactly what is required, and nothing more. For example, say you're thinking of, or already have, a "2d point" abstraction on which you can define the operators you need. Then you have another need that could really be a "4d point" similar to the 2d one. Don't start to use an "N-dimensional point" abstraction, especially thinking that you might need it later. Maybe you'll never have anything other than 2d and 4d (because the rest stays as "a good idea" in the backlog forever), but instead a requirement pops up to convert 4d points into pairs of 2d points, and that's going to be hard to generalize to n dimensions. So each abstraction can be checked to cover, and only cover, the actual needs (see the sketch after this list).
Finally, from a more global point of view, a code base that has many unrelated abstractions is an indicator that the dev team tends to abstract every little issue, so probably many of them are or have become over-abstracted.
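Here is the point example as code, just to make the indicator tangible (Python, names invented): the concrete types cover exactly today's needs, while the premature generalization complicates the one real requirement.

```python
# Sketch of the 2d/4d point indicator above. The concrete types do
# exactly what is needed; the "general" n-dimensional version buys
# nothing yet and already muddies the one real requirement
# (splitting a 4d point into two 2d points). All names are made up.
from dataclasses import dataclass

@dataclass(frozen=True)
class Point2D:
    x: float
    y: float

    def __add__(self, other):
        return Point2D(self.x + other.x, self.y + other.y)

@dataclass(frozen=True)
class Point4D:
    x: float
    y: float
    z: float
    w: float

    def split(self):
        """The actual requirement: a 4d point as a pair of 2d points."""
        return Point2D(self.x, self.y), Point2D(self.z, self.w)

# The premature generalization: dimension-agnostic, but every caller
# now has to worry about lengths, and `split` has no obvious meaning
# for odd dimensions.
class PointND:
    def __init__(self, *coords):
        self.coords = coords

    def __add__(self, other):
        if len(self.coords) != len(other.coords):
            raise ValueError("dimension mismatch")
        return PointND(*(a + b for a, b in zip(self.coords, other.coords)))
```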

Fiddling with point-free code?

I have been learning the Factor and J languages to experiment with point-free programming. The basic mechanics of the languages seem clear, but getting a feeling for how to approach algorithm design is a challenge.
A particular source of confusion for me is how one should structure code so that it is easy to experiment with different parameters. By this I mean the sort of thing Mathematica and MATLAB are so good at: you set up an algorithm, then manipulate the variables and watch what happens.
How do you do this without explicit variables? Maybe I'm thinking about this all wrong. How should I approach this in point-free programming?
Here are three important pieces of advice that I found really helpful while dealing with the concatenative paradigm (applied to the Factor programming language in my case):
Factor your code mercilessly. Write extremely small functions: if a word takes more than 3-4 stack parameters, you can probably break it into smaller parts.
Invest your time in learning the dataflow combinators (bi, tri, cleave, spread, ...). They let you express common dataflow patterns while removing the need for complex stack shuffling.
Learn to build quotations from other quotations. Use currying techniques (curry, with, ...) to build simple quotations from stack parameters, and when things get too complex use fried quotations (the "fry" vocab). They make it easy to build complex nested quotations from patterns, without any stack shuffling.
And, as always, read and "walk" through existing code. In Factor it is quite easy to explore the runtime and see how things work.
For your specific source of confusion: if you have a lot of input parameters in your algorithm, the most important thing to do is study how they will be used. Hunt for dataflow patterns. You must really THINK about the best way to "schedule" operations on the smallest set of related parameters.
It is quite a difficult experience, but it is also a really rewarding one when it succeeds. You feel like a human compiler afterwards...
Good luck!
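The Factor idioms above don't translate literally, but the currying advice can be approximated in most languages. Here is a rough Python analogue (the pipeline and its parameters are invented, just to show the shape): the "quotation" is built from its parameters up front, so experimenting with a parameter means rebuilding one small pipeline instead of threading a variable through the whole algorithm.

```python
# Rough Python analogue of the currying advice: Factor's curry/with/
# fry have no direct equivalent, so functools.partial and function
# composition stand in. The pipeline itself is made up.
from functools import partial, reduce

def compose(*fns):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fns), x)

def scale(factor, xs):
    return [factor * x for x in xs]

def clip(lo, hi, xs):
    return [min(hi, max(lo, x)) for x in xs]

def total(xs):
    return sum(xs)

# Build the pipeline from its parameters once, point-free style.
pipeline = compose(total, partial(clip, 0, 10), partial(scale, 3))
print(pipeline([1, 2, 5]))        # scale by 3, clip to [0, 10], sum -> 19

# Sweeping a parameter is just rebuilding the small pipeline.
for factor in (1, 2, 3):
    p = compose(total, partial(clip, 0, 10), partial(scale, factor))
    print(factor, p([1, 2, 5]))
```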
I have had a little experience with the concatenative programming language Joy and with a Backus FP-like language. Regarding algorithm design, I can say that it leads to a very structured style of design.
How do you do this without explicit variables?
In fact, there are no global variables in languages like Backus FP. However, there is nothing to prevent the use of somewhat restricted local variables, such as instance variables.

How To: Pattern Recognition

I'm interested in learning more about pattern recognition. I know that's somewhat of a broad field, so I'll list some specific types of problems I would like to learn to deal with:
Finding patterns in a seemingly random set of bytes.
Recognizing known shapes (such as circles and squares) in images.
Noticing movement patterns given a stream of positions (Vector3)
This is a new area of experimentation for me personally, and to be honest, I simply don't know where to start :-) I'm obviously not looking for the answers to be provided to me on a silver platter, but some search terms and/or online resources where I can start to acquaint myself with the concepts of the above problem domains would be awesome.
Thanks!
PS: For extra credit, it would be grand if said resources provided code examples/discussion in C# :-) but they don't need to.
Hidden Markov Models are a great place to look, as well as Artificial Neural Networks.
Edit: You could take a look at NeuronDotNet, it's open source and you could poke around the code.
Edit 2: You can also take a look at ITK, it's also open source and implements a lot of these types of algorithms.
Edit 3: Here's a pretty good intro to neural nets. It covers a lot of the basics and includes source code (albeit in C++). He implemented an unsupervised learning algorithm; I think you may be looking for a supervised backpropagation algorithm to train your network.
Edit 4: Another good intro, avoids really heavy math, but provides references to a lot of that detail at the bottom, if you want to dig into it. Includes pseudo-code, good diagrams, and a lengthy description of backpropagation.
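As a rough idea of what a supervised backpropagation setup looks like end to end, here is a toy sketch (Python/NumPy rather than C#, and not tied to any of the tutorials above): a small 2-4-1 network trained on XOR with plain gradient descent.

```python
# Toy supervised backpropagation: a 2-4-1 sigmoid network trained on
# XOR with full-batch gradient descent. Illustrative only; real work
# would use an existing library.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for epoch in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: squared-error gradient pushed through the sigmoids.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))   # should approach [[0], [1], [1], [0]]
```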
This is kind of like saying "I'd like to learn more about electronics... anyone tell me where to start?" Pattern recognition is a whole field: there are hundreds, if not thousands, of books out there, and any university has at least several (probably 10 or more) graduate-level courses on it. There are numerous journals dedicated to it as well that have been publishing for decades, not to mention the conferences.
You might start with Wikipedia:
http://en.wikipedia.org/wiki/Pattern_recognition
This is kind of an old question, but it's relevant so I figured I'd post it here :-) Stanford began offering an online Machine Learning class here - http://www.ml-class.org
OpenCV has some functions for pattern recognition in images.
You might want to look at this: http://opencv.willowgarage.com/documentation/pattern_recognition.html (broken link: the closest thing in the new docs is http://opencv.willowgarage.com/documentation/cpp/ml__machine_learning.html, although it is no longer what I'd call helpful documentation for a beginner - see the other answers).
However, I also recommend starting with MATLAB, because OpenCV is not intuitive to use.
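For the "circles and squares" item specifically, OpenCV's Python bindings make a quick experiment fairly painless. A rough sketch (the file name is a placeholder, the thresholds will need tuning per image, and it assumes OpenCV 4's findContours return signature):

```python
# Rough shape-recognition sketch: Hough transform for circles,
# contour polygon approximation for squares. "shapes.png" is a
# placeholder; all thresholds need tuning for real images.
import cv2
import numpy as np

img = cv2.imread("shapes.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)

# Circles via the Hough transform (dp=1, minDist=20).
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, 1, 20,
                           param1=100, param2=30, minRadius=5, maxRadius=0)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(img, (x, y), r, (0, 255, 0), 2)

# Squares via contours: 4 vertices and a roughly 1:1 bounding box.
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)   # OpenCV 4 signature
for c in contours:
    approx = cv2.approxPolyDP(c, 0.04 * cv2.arcLength(c, True), True)
    if len(approx) == 4 and cv2.contourArea(approx) > 100:
        x, y, w, h = cv2.boundingRect(approx)
        if 0.9 <= w / float(h) <= 1.1:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)

cv2.imwrite("shapes_annotated.png", img)
```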
There are a lot of useful links on this page about computer-vision-related pattern recognition. Some of the links seem to be broken now, but you may still find it useful.
I am not an expert on this, but reading about Hidden Markov Models is a good way to start.
Beware false patterns! For any decently large data set you will find subsets that appear to have a pattern, even if it is a data set of coin flips. No good process for pattern recognition should be without statistical techniques to assess confidence that the detected patterns are real. When possible, run your algorithms on random data to see what patterns they detect. These experiments will give you a baseline for the strength of pattern that can be found in random (a.k.a. "null") data. This kind of technique can help you assess the "false discovery rate" for your findings.
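A tiny sketch of that null-data baseline idea (Python; the "pattern" here is simply the longest run of identical coin flips, standing in for whatever your detector reports):

```python
# "Run it on null data first": simulate fair coin flips, record the
# longest run each time, and use a high percentile as the baseline a
# real finding has to beat before it counts as a pattern.
import random
from itertools import groupby

def longest_run(bits):
    return max(len(list(g)) for _, g in groupby(bits))

def null_baseline(n_flips, n_trials=1000, seed=0):
    rng = random.Random(seed)
    runs = sorted(longest_run([rng.randint(0, 1) for _ in range(n_flips)])
                  for _ in range(n_trials))
    return runs[int(0.95 * n_trials)]   # 95th percentile of chance alone

if __name__ == "__main__":
    threshold = null_baseline(n_flips=200)
    print("95% of random 200-flip sequences have a longest run <=", threshold)
```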
Learning pattern recognition is easier in MATLAB: there are several examples and ready-made functions to use, and it is good for understanding the concepts and running experiments.
I would recommend starting with a MATLAB toolbox. MATLAB is an especially convenient place to start playing around with this kind of thing thanks to its interactive console. A nice toolbox I personally used and really liked is PRTools (http://prtools.org); it has an implementation of pretty much every pattern recognition tool and also some other machine learning tools (neural networks, etc.). The nice thing about MATLAB is that there are many other toolboxes you can try out as well (there is even a proprietary toolbox from MathWorks).
Once you feel comfortable with the different tools (and have found out which classifier performs best for your problem), you can start thinking about implementing the machine learning in a different application.

When is it best to change code to match standards?

I have recently been put in charge of debugging two different programs which will eventually need to share an XML parsing script, at the minimum. One was written with PureMVC, and the other was built from scratch. While it originally made sense to write the one from scratch (it saved a good deal of memory), the memory problems have since been resolved.
Porting the non-PureMVC application will take a good deal of time and effort which would not otherwise need to be spent, but it will make documentation and code sharing easier. It will also lower the overall learning curve. With that in mind:
1. What should be taken into account when considering whether it is best to move things to one standard?
(On a related note)
Some of the code is a little odd. Because the interpreting App had to convert commands from one syntax to another, it made sense to have an interpreter Object. Because there needed to be communication with the external environment, it made more sense to have one object interact with the environment, and for that to deal with the interpreter exclusively.
Effectively, an anti-Singleton was created. The object would only interface with the interpreter, and that's it. If a member of another class were to try to call one of its public methods, the object would raise an Exception.
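A rough sketch of the kind of guard described (Python here, with invented names; the original is presumably ActionScript), just to make the "raises an Exception for any caller other than the interpreter" behaviour concrete:

```python
# Invented-name sketch of the "anti-Singleton" described above: the
# environment-facing object inspects its caller and refuses anything
# that is not the interpreter it was bound to. Quirky, and fragile
# across runtimes.
import inspect

class Interpreter:
    def __init__(self, bridge):
        self.bridge = bridge

    def run(self, command):
        self.bridge.send(command.upper())    # stand-in for real translation

class EnvironmentBridge:
    """Only its Interpreter is allowed to call send()."""

    def bind(self, interpreter):
        self._interpreter = interpreter

    def send(self, command):
        caller = inspect.currentframe().f_back.f_locals.get("self")
        if caller is not getattr(self, "_interpreter", None):
            raise RuntimeError("EnvironmentBridge may only be used by its Interpreter")
        print("sending to environment:", command)

bridge = EnvironmentBridge()
interpreter = Interpreter(bridge)
bridge.bind(interpreter)
interpreter.run("ping")          # allowed
# bridge.send("ping")            # would raise RuntimeError
```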
There are better ways to accomplish this, but it is definitely a bit odd. There are more standard means of accomplishing the same thing, though they often involve the creation of classes or class files which are extraordinarily large. The only solution which I could find that was standards compliant would involve as much commenting and explanation as is currently required, if not more. Considering this:
2. If some code is quirky but effective, is it better to change it to make it less quirky, even if that makes it more unwieldy?
In my opinion this type of refactoring is often not considered in schedules and can only be done when there is extra time.
More often than not, the criterion for shipping code is if it works, not necessarily if it's the best possible code solution.
So, in answer to your question, I try to refactor when I have time to do so. Priority one remains producing a functional piece of code.
Things to take into account:
Does it work as-is?
As Galwegian notes, this is the only criterion in many shops. However, IMO just as important is:
How skilled are the programmers who are going to maintain it? Have they ever encountered nonstandard code? Compare the cost of their time to learn it (including the cost of delayed dot releases) to the cost of your time to refactor it.
If you're maintaining it, then instead consider:
How much time will dealing with the nonstandard code cost you over the intended lifecycle of the codebase (e.g., the time between now and when the whole thing is rewritten)?
That's hard to guess, but consider that many codebases FAR outlive the usefulness envisioned by their original authors. (Y2K anyone?) I've gradually developed a sense of when a refactoring is worthwhile and when it's not, mostly by erring on the side of "not" too often and regretting it later.
Only change it if you need to be making changes anyway. But less quirky is always a good goal. Most of the time spent on a particular piece of software is in maintenance, so if you can do something to make that easier, you'll be reducing the overall time spent on that piece of code. Nonetheless, don't change something if it's working and doesn't need any modifications.
If you have time, now. If you don't have time and it can be avoided, later.
