Anyone know of the logic to compute the Version of a QR code needed to encode data? - qr-code

The spec has 4 of these tables:
http://www.denso-wave.com/qrcode/vertable1-e.html
to handle versions 1-40
I'm wondering if anyone has coded something to calculate the version needed for a given string of data. None of the libraries I've seen for encoding the data offer this.

http://code.google.com/p/jsqrencode/downloads/list
Inside is genframe, which finds the smallest version that a string will fit in.
It doesn't really use a formula; it simply tests versions linearly (a binary search would be faster). There is no algorithm or equation, nor is a compact one really possible, since the tables use fixed values and aren't algorithmically generated.
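If you want to roll your own, the same linear scan is easy to write against the capacity tables. Below is a minimal sketch in R; the capacity vector holds only the byte-mode, error-correction-level-L capacities for versions 1-5, so you would need to fill in the remaining versions (and the other EC levels and modes) from the spec's tables before relying on it.
byte_capacity_L <- c(17, 32, 53, 78, 106)   # versions 1..5, byte mode, EC level L

smallest_version <- function(data, capacities = byte_capacity_L) {
  n <- nchar(data, type = "bytes")
  v <- which(capacities >= n)[1]            # first version whose capacity fits the data
  if (is.na(v)) stop("data too long for the versions listed in the table")
  v
}

smallest_version("HELLO WORLD")             # 1: fits in a version 1 symbol at level L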

Related

R numerical method similar to vpasolve in MATLAB

I am trying to solve a numerical equation in R, but I want a method that performs similarly to vpasolve in MATLAB. I have a nonlinear equation (involving a lot of log functions) which, when solved in R with uniroot, gives me a completely different answer from what vpasolve gives in MATLAB.
First, a word of caution: it's often much more productive to learn that there's a better way to do something than to keep doing it the way you are used to.
Edit:
I went back to MATLAB and realized that the "vpa" collection is using extended precision. Is that absolutely necessary for your purposes? If not, then my suggestions below may suffice.
If you do require extended precision, then perhaps the Rmpfr::unirootR function will suffice. I would point out that, since all these solvers generate an approximate solution (as opposed to an analytic one), the use of extended-precision operations seems a bit pointless.
Next, you need to determine whether MATLAB's vpasolve or R's uniroot is getting you the correct answer. Or maybe you are simply converging to a root that's not the one you want, in which case you need to read up on setting limits on the starting conditions or the search region.
Finally, in addition to uniroot, I recommend you learn to use the R packages BBsolve, nleqslv, rootSolve, and ktsolve (disclaimer: I am the owner and maintainer of ktsolve). These packages are pretty flexible and may lead you to better solutions to your original problem.
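A hedged sketch of both approaches (the equation below is made up, not the poster's): uniroot in ordinary double precision, then Rmpfr::unirootR if extended precision really is required. Note that uniroot needs an interval that brackets the root; a bad bracket is a common reason for getting a completely different answer than another solver.
f <- function(x) log(x) + 2 * log(x + 1) - 5   # illustrative log-heavy equation

# double precision; the interval must bracket the root
uniroot(f, interval = c(1, 20), tol = 1e-12)$root

# extended precision via the Rmpfr package (assuming it is installed)
library(Rmpfr)
unirootR(f, interval = mpfr(c(1, 20), precBits = 256))$root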

Effect of setting seeds on an algorithm

I am writing R code where I use the set.seed() function to generate the data, which I then pass to a function; I plot that function and use optim to find its minimum. The issue is that the graph of the function changes if I change the seed value, and sometimes it isn't even concave but looks exponential instead.
I am not able to understand why this is happening or how I can fix it. If anyone can point me to a reference on this subject or suggest what can be done, that would be great.
Thanks in advance
set.seed() configures the random number generator to start from that seed. This may be a bit more complicated, depending on the precise implementation, but the effects are always the same: The sequence of numbers will be identical.
This is useful in a number of applications where you want some randomness, but you want to get the same result if you re-run the code. Say for example you need to randomly sample your data, but since you are debugging, it's useful if you get the same sample so that the bugs don't disappear on you.
Also if you want other people to replicate the results, you simply pick some random number as the seed and tell them that you used that seed. Anything in the algorithm based on random numbers will behave the same because you are both using the same sequence of numbers.
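A minimal illustration of that behavior (nothing specific to the poster's code):
set.seed(42)
a <- rnorm(5)

set.seed(42)
b <- rnorm(5)

identical(a, b)   # TRUE: same seed, same sequence of draws

d <- rnorm(5)     # no reset, so the generator keeps going and this differs
identical(a, d)   # FALSE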
For your graph problem you need to share some code so that people understand what you are doing; it's very hard to guess what went wrong. At first glance it seems that your algorithm is very strongly influenced by the random numbers (usually not a good sign).
Put simply: if you set a seed and then draw a random number, the number will always be the same. If you do not set a seed, the number will be different every time. The seed lets you replicate your experiment.

Feature selection on subsets of feature set

I am trying to do feature selection using the Boruta package in R. The problem is that my feature set is way too large (70,518 features), so the data frame is too large (2 GB) to be processed by the Boruta package at once. I am wondering if I can split the data frame into several sets, each containing a smaller number of features? This sounds a bit odd to me, as I am not sure the algorithm can correctly identify the weights if not all features are present.
If not, I would be very grateful if someone can suggest an alternative way of doing it.
I think your best bet in this case might be to first try to filter out some of the features that are either low-information (e.g. near-zero variance) or highly correlated.
The caret package has some useful functions to help with this.
For example, findCorrelation() can be used to remove redundant features:
library(caret)
# keep the correlation matrix separate so the original data isn't overwritten
cors <- cor(dat, method = 'spearman')
cors[is.na(cors)] <- 0
features_to_ignore <- findCorrelation(cors, cutoff = 0.75, verbose = FALSE)
if (length(features_to_ignore) > 0) dat <- dat[, -features_to_ignore]
This removes features so that no remaining pair has a Spearman correlation of 0.75 or higher.
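For the low-information filter mentioned above, caret also has nearZeroVar(); a hedged sketch, assuming the features live in a data frame called dat as in the snippet above:
library(caret)
nzv <- nearZeroVar(dat)                  # column indices with (near-)zero variance
if (length(nzv) > 0) dat <- dat[, -nzv]  # drop them before running Boruta on the rest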
I'm going to start by asking why you believe this can even work. In this case not only is p >> n, but p >>>>>> n, so you're always going to find spurious associations. More than that, even if you could do this (say by renting a sufficiently large machine from a cloud computing service, which is what I'd suggest), you're looking at an absurd amount of computation, since the computational complexity of building a single decision tree is O(n * v log(v)), where n is the number of records and v is the number of fields in each record. Building a random forest takes that much work for each tree.
Instead of solving the problem as stated, you might want to rethink it from the ground up. What are you really trying to do here? Can you go back to first principles and rethink that?

Handling extremely small numbers

I need to find a way to handle extremely small numbers in R, particularly in order to take the log of extremely small numbers. According to the R-manual, “on a typical R platform the smallest positive double is about 5e-324.” Well, I need to deal with numbers even smaller (at least as small as 10^-350). If R is incapable of doing this, I was wondering if there is a way I can use a program that can do this (such as Matlab or Mathematica) from R.
Specifically, I am computing a matrix of probabilities, and some of these probabilities are so small that R does not distinguish them from 0. The reason I know this is because each probability is the product of two other probabilities; so I’ll have p(x)=10^-300, p(y)=10^-50, and then p(x)*p(y)=0. I’d like to be able to do these computations, take the log of the resultant very small number (-805.905 for my example, according to Mathematica), and then continue working with the log values in R.
So to be more detailed, I have a matrix of values for p(x) and a matrix of values for p(y), both computed using dnorm, and I'm computing their product. In many cases R is capable of evaluating p(x) and p(y), but the product p(x)*p(y) is too small. In a few cases even the p(x) or p(y) value itself is too small and just comes out as 0 in R.
I’ve seen that there is stuff out there for calling R from Mathematica, but not much pertaining to calling Mathematica from R. I’d honestly prefer to do the latter than the former here. So if any one either knows how to do this (either employing Mathematica or Matlab or something else in R) or has another solution to this issue, I’d greatly appreciate it.
Note that I realize there are a few other threads on this topic, discussing such things as using the Brobdingnag package to deal with small numbers, but these do not appear applicable here.
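Not part of the original thread, but for the computation described above the usual workaround is to stay in log space: dnorm() can return the log-density directly, and log(p(x)*p(y)) is just log p(x) + log p(y), so the underflowing product never has to be formed. A minimal sketch with made-up inputs:
dnorm(40)                          # 0: the density underflows double precision
log_px <- dnorm(40, log = TRUE)    # about -800.92, perfectly finite
log_py <- dnorm(15, log = TRUE)    # about -113.42
log_pxy <- log_px + log_py         # log(p(x) * p(y)) without ever forming the product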

How can I do blind fitting on a list of x, y value pairs if I don't know the form of f(x) = y?

If I have a function f(x) = y that I don't know the form of, and if I have a long list of x and y value pairs (potentially thousands of them), is there a program/package/library that will generate potential forms of f(x)?
Obviously there's a lot of ambiguity to the possible forms of any f(x), so something that produces many non-trivial unique answers (in reduced terms) would be ideal, but something that could produce at least one answer would also be good.
If x and y are derived from observational data (i.e. experimental results), are there programs that can create approximate forms of f(x)? On the other hand, if you know beforehand that there is a completely deterministic relationship between x and y (as with the input and output of a pseudo-random number generator), are there programs that can create exact forms of f(x)?
Soooo, I found the answer to my own question. Cornell has released a piece of software for doing exactly this kind of blind fitting called Eureqa. It has to be one of the most polished pieces of software that I've ever seen come out of an academic lab. It's seriously pretty nifty. Check it out:
It's even got turnkey integration with Amazon's ec2 clusters, so you can offload some of the heavy computational lifting from your local computer onto the cloud at the push of a button for a very reasonable fee.
I think that I'm going to have to learn more about GUI programming so that I can steal its interface.
(This is more of a numerical methods question.) If there is some kind of observable pattern (you can kinda see the function), then yes, there are several ways you can approximate the original function, but they'll be just that, approximations.
What you want to do is called interpolation. Two very simple (and not very good) methods are Newton's method and Lagrange's method of interpolation. They both work on the same principle but are implemented differently (Lagrange's is iterative, Newton's is recursive, for one).
If there's not much going on between any two of your data points (ie, the actual function doesn't have any "bumps" whose "peaks" are not represented by one of your data points), then the spline method of interpolation is one of the best choices you can make. It's a bit harder to implement, but it produces nice results.
Edit: Sometimes, depending on your specific problem, these methods above might be overkill. Sometimes, you'll find that linear interpolation (where you just connect points with straight lines) is a perfectly good solution to your problem.
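A minimal sketch of the two simplest options mentioned above, using base R's approxfun() (piecewise linear) and splinefun() (cubic spline) on some made-up points:
x <- c(0, 1, 2, 3, 4, 5)
y <- c(0, 0.8, 0.9, 0.1, -0.8, -1.0)   # made-up sample points

f_lin <- approxfun(x, y)   # connect the dots with straight lines
f_spl <- splinefun(x, y)   # smooth cubic spline through the same points

f_lin(2.5)
f_spl(2.5)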
It depends.
If you're using data acquired from the real world, then statistical regression techniques can provide you with some tools to evaluate the best fit; if you have several hypotheses for the form of the function, you can use regression to discover the "best" one, though you may need to be careful about over-fitting a curve -- sometimes the best fit (highest correlation) for a specific dataset completely fails to work for future observations.
If, on the other hand, the data was generated synthetically (say, you know it came from a polynomial), then you can use polynomial curve-fitting methods that will give you the exact answer you need.
Yes, there are such things.
If you plot the values and see that there's some functional relationship that makes sense, you can use least squares fitting to calculate the parameter values that minimize the error.
If you don't know what the function should look like, you can use simple spline or interpolation schemes.
You can also use software to guess what the function should be. Maybe something like Maxima can help.
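A hedged sketch of the least-squares idea above: if you can hypothesize a form (here an exponential, purely for illustration), nls() in R will fit its parameters to the data.
x <- seq(0, 5, by = 0.1)
y <- 2.5 * exp(-1.3 * x) + rnorm(length(x), sd = 0.05)   # synthetic data

fit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = -1))
coef(fit)   # recovers values close to a = 2.5, b = -1.3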
Wolfram Alpha can help you guess:
http://blog.wolframalpha.com/2011/05/17/plotting-functions-and-graphs-in-wolframalpha/
Polynomial interpolation is the way to go if you have a totally random set of points:
http://en.wikipedia.org/wiki/Polynomial_interpolation
If your set is nearly linear, then regression will give you a good approximation.
Creating an exact form from the x's and y's alone is mostly impossible.
Notice that what you are trying to achieve is at the heart of many machine learning algorithms, and therefore you might find what you are looking for in specialized libraries.
A list of N x/y pairs can always be generated by a polynomial of degree N - 1 (assuming no two x values are the same). See this article for more details:
http://en.wikipedia.org/wiki/Polynomial_interpolation
Some lists may also match other function types, such as exponential, sinusoidal, and many others. It is impossible to find the 'simplest' matching function, but the best you can do is go through a list of common ones (exponential, sinusoidal, etc.) and, if none of them match, fall back to polynomial interpolation.
I'm not aware of any software that can do this for you, though.
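A minimal sketch of that fallback in R: N points with distinct x values determine a unique polynomial of degree N - 1, and its coefficients can be found by solving the Vandermonde system directly (fine for small N; numerically fragile for large N).
x <- c(1, 2, 3, 4)
y <- c(2, 3, 5, 4)                      # made-up points

V <- outer(x, seq_along(x) - 1, `^`)    # Vandermonde matrix: columns x^0, x^1, x^2, x^3
coefs <- solve(V, y)                    # coefficients of 1, x, x^2, x^3

f <- function(t) drop(outer(t, seq_along(coefs) - 1, `^`) %*% coefs)
f(x)                                    # reproduces y exactly (up to rounding)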
