From my understanding, a metric defines a more abstract entity than a norm, but I don't feel like I truly understand. Can someone please explain it to me in layman's terms?
A norm is a concept that only makes sense when you have a vector space. It defines the notion of the magnitude of a vector and can be used to measure the distance between two vectors as the magnitude of their difference. Norms are positively homogeneous: they preserve (positive) scaling. This means that if you scale (zoom) a configuration of vectors up or down (an operation that only makes sense in a vector space), the distances between the vectors are scaled in the same proportion.
A metric is a more general notion that can be defined on spaces with no underlying algebraic structure. It captures the concept of distance independently of any algebraic features (which might not even exist on the space). If you have a norm, you have a distance, but you can have a distance without having any sum operation or scalar action.
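To make the contrast concrete, here is a small Python sketch (the function names are just illustrative): the Euclidean distance is induced by a norm and therefore scales with the configuration, whereas the discrete metric is a perfectly valid metric that no norm can induce, since it ignores scaling entirely.

```python
import numpy as np

# Distance induced by a norm: the magnitude of the difference vector.
def norm_dist(x, y):
    return np.linalg.norm(x - y)

# The discrete metric is a metric on any set, but it is not induced by a norm.
def discrete_dist(x, y):
    return 0.0 if np.array_equal(x, y) else 1.0

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(norm_dist(a, b), norm_dist(2 * a, 2 * b))          # 5.0 10.0 -- distance doubles
print(discrete_dist(a, b), discrete_dist(2 * a, 2 * b))  # 1.0 1.0  -- unchanged
```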
There is a third level of abstraction where the concept of proximity can be expressed without any distance at all. These are called topological spaces, and they rely not on the concept of distance (or norms) but on the concept of neighborhood.
First of all, can someone explain what vector quantization is, its purpose, and what it does? Secondly, an explanation of how k-means is used to do this would be appreciated as well.
For the record, I don't know if this will make a difference in the explanation, but I'm trying to learn about vector quantization in the context of boundary descriptors. If I calculated a number of boundary descriptors for a particular segment in an image, and I wanted to vector quantize them using k-means, what would this mean, what would this do, why would I want to do it, and how would I do it?
Vector quantization is the process of discretizing a random variable valued in some vector space. The result is the projection of that random variable onto a finite set of knots. It is used for signal transmission, quadrature, variance reduction and a lot of other applications.
Optimal quantization consists in choosing the knots in such a way as to minimize the mean L^p discretization error.
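Written out under one common convention, for a random vector X and a codebook \Gamma = \{x_1, \dots, x_K\}, the quantity being minimized is

$$ e_p(\Gamma) \;=\; \Big( \mathbb{E}\,\min_{1 \le i \le K} \lVert X - x_i \rVert^{p} \Big)^{1/p}. $$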
K-means, also called the Lloyd algorithm, consists in starting from an arbitrary set of knots (or codebook) and iteratively replacing each one of them by the L^p-median (or simply by the mean, for quadratic quantization) of the probability distribution conditioned on falling in the Voronoi cell of that knot. An interactive animation is available here.
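For the quadratic case (p = 2) on an empirical sample, a bare-bones Lloyd iteration looks roughly like this (plain NumPy; the function name and parameters are just illustrative):

```python
import numpy as np

def lloyd(samples, k, n_iter=50, seed=0):
    """Plain Lloyd / k-means iteration for quadratic (p = 2) quantization
    of an empirical sample.  samples: (n, d) array, k: codebook size."""
    rng = np.random.default_rng(seed)
    knots = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest knot (i.e. to its Voronoi cell).
        d2 = ((samples[:, None, :] - knots[None, :, :]) ** 2).sum(axis=-1)
        cell = d2.argmin(axis=1)
        # Replace each knot by the conditional mean over its Voronoi cell.
        for i in range(k):
            members = samples[cell == i]
            if len(members):
                knots[i] = members.mean(axis=0)
    return knots

# Example: an 8-point codebook for a 2-D standard normal sample.
codebook = lloyd(np.random.default_rng(1).normal(size=(2000, 2)), k=8)
print(codebook)
```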
The historical reference on the Lloyd algorithm is the following
Stuart P. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982
K-means always decreases the quantization error, but it does not always converge to the globally optimal quantizer. However, in the case of one-dimensional log-concave distributions, the algorithm does converge to the unique global minimum.
The optimal quantization web site contains an extensive bibliography on the matter of vector quantization and functional quantization.
Suppose there are 14 objects, each of which has or does not have each of 1000 binary features. I have a 14x14 similarity matrix, but not the raw 14x1000 data. Is there a way to reconstruct or generate something similar to the raw data, given the similarity matrix?
I tried Monte Carlo simulations, but unconstrained they would take way too much time to achieve even a low level of consistency with the original similarity matrix.
I saw this relevant question: Similarity matrix -> feature vectors algorithm?. However, they wanted to reduce, not increase, dimensionality. Also, I am not sure (1) which matrix or matrices to use, and (2) how to convert the result into a binary matrix.
It's impossible to say for sure unless you describe how the similarity scores were computed.
In general, for the usual kind of similarity scoring this is not possible: information has been lost in the transformation from individual features to aggregate statistics. The best you can hope to do is to arrive at a set of features that are consistent with the similarity scores.
I think that is what you are talking about when you say "similar to" the original. That problem is pretty interesting. Suppose similarity was computed as the dot product of two feature vectors (i.e. the count of features that both objects have set to 1/true). This is not the only choice: it is consistent with a value of 0 (false) meaning no information. But it may generalize to other similarity measures.
In such a case, the problem is really a constraint-satisfaction (integer programming) problem: a naive approach is to exhaustively search the space of possible objects - not randomly, but guided by the constraints. For example, let SIM(A,B) denote the similarity of objects A and B, and define an order on the candidate feature vectors.
If SIM(A,B) = N, then choose A = B minimal (like (1, ..., 1 (N times), 0, ..., 0 (1000-N times))), and then choose the minimal C such that SIM(A,C) and SIM(B,C) take the given values. Once you find an inconsistency, backtrack and increment.
This will find a consistent answer, although the complexity is very high (but probably better than Monte Carlo).
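To make the backtracking idea concrete on a toy size, here is a rough Python sketch. The function and variable names are mine, it assumes dot-product similarity with the diagonal of the matrix giving each object's own feature count, and it is only feasible for a handful of features - certainly not 14 x 1000 as it stands.

```python
from itertools import product
import numpy as np

def reconstruct(S, n_features):
    """Backtracking search for binary feature vectors whose pairwise
    dot products match the similarity matrix S (toy sizes only)."""
    n = len(S)
    candidates = list(product((0, 1), repeat=n_features))  # all binary vectors
    assigned = []

    def place(i):
        if i == n:
            return True
        for v in candidates:
            # Consistency with every object placed so far, plus the diagonal
            # (assumed to be the object's own feature count).
            if all(np.dot(v, assigned[j]) == S[i][j] for j in range(i)) \
               and sum(v) == S[i][i]:
                assigned.append(np.array(v))
                if place(i + 1):
                    return True
                assigned.pop()          # inconsistency downstream: backtrack
        return False

    return np.array(assigned) if place(0) else None

# Toy example: 3 objects, 4 features.
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]])
S = X @ X.T
print(reconstruct(S, 4))   # some binary matrix consistent with S
```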
Finding a better algorithm is an interesting problem, but more than this I can't say in a SO post - that's probably a topic for a CS thesis!
I have run the clv package, which includes the S_Dbw and SD validity indexes for clustering, in R Commander. (http://cran.r-project.org/web/packages/clv/index.html)
I evaluated my clustering results from the DBSCAN, k-means, and Kohonen algorithms with the S_Dbw index, but for all three algorithms S_Dbw is "Inf".
Is it "Infinite" meaning? Why did i confront with "Inf". Is there any problem in my clustering results?
In general, when is the S_Dbw index "Inf"?
Be careful when comparing different algorithms with such an index.
The reason is that the index is pretty much an algorithm in itself. One particular clustering will necessarily be the "best" for each index. The main difference between an index and an actual clustering algorithm is that the index doesn't tell you how to find the "best" solution.
Some examples: k-means minimizes the distances from cluster members to cluster centers. Single-link hierarchical clustering will find the partition with the optimal minimum distance between partitions. Well, DBSCAN will find the partitioning of the dataset, where all density-connected points are in the same partition. As such, DBSCAN is optimal - if you use the appropriate measure.
Seriously: do not assume that one algorithm works better just because it scores higher than another on a particular measure. All you find out this way is that a particular algorithm is more (cor-)related to a particular measure. Think of it as a kind of correlation between the measure and the algorithm, on a conceptual level.
Using a measure to compare different results of the same algorithm is different. Then, obviously, there is no benefit of one algorithm over itself. There may still be a similar effect with respect to parameters, though: for example, the in-cluster distances in k-means necessarily go down when you increase k.
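As a quick illustration of that last point, here is a scikit-learn sketch (the data is just random noise, so no value of k is "right"): the within-cluster sum of squares shrinks monotonically as k grows, so it cannot naively be used to pick the "best" k.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))   # unstructured data: no "true" clusters

for k in (2, 4, 8, 16):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))   # inertia_ = within-cluster sum of squares
```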
In fact, many of the measures are not even well-defined on DBSCAN results, because DBSCAN has the concept of noise points, which the indexes do not handle, as far as I know.
Do not assume that the measure gives you an indication of what is "true" or "correct", and even less of what is useful or new. You should be using cluster analysis not to find a mathematical optimum of a particular measure, but to learn something new and useful about your data - and that is probably not captured by some measure number.
Back to the indices. They are usually designed entirely around k-means. From a short look at S_Dbw, I have the impression that the moment one "cluster" consists of a single object (e.g. a noise object in DBSCAN), the value becomes infinite - i.e. undefined. It seems as if the authors of that index did not consider this corner case, but only used it on toy data sets where such situations did not arise. The R implementation can't fix this without deviating from the original index and turning it into yet another index.
Handling noise objects and singletons is far from trivial. I have not yet seen an index that doesn't fail in one way or another - typically, a solution such as "all objects are noise" will either score perfectly, or every clustering can trivially be improved by putting each noise object into the nearest non-singleton cluster. If you want your algorithm to be able to say "this object doesn't belong to any cluster", then I do not know of any appropriate index.
The IEEE floating point standard defines Inf and -Inf as positive and negative infinity respectively. It means your result was too large to represent in the given number of bits.
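For instance, in Python (R uses the same IEEE 754 doubles, so the behaviour is analogous):

```python
# Anything above ~1.8e308 overflows the 64-bit float range and becomes inf.
x = 1e308 * 10
print(x)            # inf
print(x > 1e308)    # True: inf compares greater than every finite float
```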
I have code (Python) that must perform some operations regarding distances between reflected segments of a curve.
In order to make the thinking and the code clearer, I apply two rotations (using matrix multiplication) before performing the actual calculation. I suppose it would be possible to perform the calculation without any rotation at all, but the code and the thinking would be much more awkward.
What I have to ask is: are these rotations too much of a penalty in terms of precision lost to floating-point rounding errors? Is there a way to estimate the magnitude of this error?
Thanks for reading
As a rule of thumb in numerical calculations -- only take the first 12 digits seriously :)
Now, assuming 3D rotations, and that outcomes of trig functions are infinitely precise, a matrix multiplication will involve 3 multiplications and 2 additions per element in the rotated vector. Since you do two rotations, this amounts to 6 multiplications and 4 additions per element.
If you read the standard references on floating-point arithmetic (which you should do, front to back, one day), you'll find that individual arithmetic operations in IEEE 754 are guaranteed to be accurate to within half a ULP (unit in the last place).
Applied to your problem, that means that the 10 operations per element in the result vector will be accurate to within 5 ULPs.
In other words -- suppose you're rotating a unit vector. The elements of the rotated vector will be accurate to within about 10^-15 (5 ULPs for components of magnitude near 1) -- I'd say that's nothing to worry about.
Including the errors in the trig functions themselves, well, that's a bit more complicated: it really depends on your programming language and/or compiler version. But I guarantee it'll be comparable to the 5 ULPs.
If you do think this accuracy is not going to be enough, then I'd suggest you perform the two rotations in one go. Work out the matrix multiplication analytically, and implement the rotation as a single matrix multiplication. Alternatively: have a look at quaternions (although I suspect that's a bit overkill for your situation).
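A quick way to convince yourself of the scale involved is to compare applying the rotations one after the other against applying the pre-composed matrix. A NumPy sketch with made-up angles:

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rot_x(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s,  c]])

R1, R2 = rot_z(0.3), rot_x(1.1)
v = np.array([1.0, 0.0, 0.0])

two_steps = R2 @ (R1 @ v)     # rotate twice, one step at a time
one_step  = (R2 @ R1) @ v     # pre-compose the rotations, apply once

# Typically on the order of 1e-16 (a handful of ULPs), sometimes exactly 0.
print(np.max(np.abs(two_steps - one_step)))
```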
What you need to do is compute the condition number of your operations and determine whether it may incur loss of significance. That should allow you to estimate the error that could be introduced.
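In this particular case the news is good: rotation matrices are orthogonal, so their condition number is 1, meaning they do not amplify relative errors in the input. A quick NumPy check (illustrative only):

```python
import numpy as np

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# Condition number in the 2-norm; for an orthogonal matrix it is 1.
print(np.linalg.cond(R))   # ~1.0
```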
Non-standard mathematical analysis extends the real number line to include "hyperreals" -- infinitesimals and infinite numbers. Is there a (specification for an) implementation of a data type for performing computations with hyperreals? I'm looking for something analogous to the complex number data type you find in Python, Fortran, and elsewhere. I actually don't know whether such computations are useful: I'm just curious. I've played around with this concept a bit, but since I probably made errors I will spare you all the details. Reference: the Wikipedia page on hyperreals.
Edit: These are not the hyperreal numbers, but the construction could still be useful for computing derivatives or limits.
Consider quotients of polynomials with real coefficients over the variable w where w denotes the "smallest" infinity (i.e. not a product of smaller infinite numbers).
The polynomials are ordered lexicographically, i.e. the coefficient of the highest power at which the polynomials differ determines the ordering. This can be extended in the standard way to quotients of polynomials (like the order on rational numbers, which are quotients of integers).
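A minimal Python sketch of that ordering (all names are mine; coefficient lists are indexed by the power of w, and the quotient comparison assumes denominators with a positive leading coefficient, as one would get after normalizing):

```python
from itertools import zip_longest

def leading_sign(p):
    """Sign of a polynomial in w (coefficients listed from w^0 upward),
    treating w as an infinitely large positive element."""
    for c in reversed(p):
        if c != 0:
            return 1 if c > 0 else -1
    return 0

def poly_sub(p, q):
    return [a - b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def quotient_less(num1, den1, num2, den2):
    """num1/den1 < num2/den2, assuming positive denominators:
    cross-multiply and check the sign of the leading coefficient."""
    return leading_sign(poly_sub(poly_mul(num1, den2), poly_mul(num2, den1))) < 0

# 1/w (an infinitesimal) is smaller than any positive real, e.g. 0.001:
print(quotient_less([1], [0, 1], [0.001], [1]))   # True
# w (an infinite element) is larger than 10**9:
print(quotient_less([10**9], [1], [0, 1], [1]))   # True
```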