Determining the minimum Hamming Distance - math

How can I find the minimum Hamming Distance for the above?
I understand the string comparison idea and putting it into a table based on C0, C1, C2, etc but I'm not sure how to group the code above. Any suggestions? Thank you in advance.

Generally, to find the minimum Hamming distance you have to compute the Hamming distance of each pair of code words and then take the minimum of these. For special cases, e.g. linear codes there are theorems for quicker determination of the minimum Hamming distance (https://en.wikipedia.org/wiki/Linear_code).
In your example, the eye spots several adjacent code word pairs differing only in one bit, so as Egor wrote, the minimum Hamming distance is 1.

Related

How to calculate hamming distance for more than 2 inputs

here the question
1001110
1110101
1010011
0011011
How can calculate hamming distance for this example. Can you explain pls for more than 2 inputs.
Do you mean the minimum Hamming distance? The minimum Hamming distance of a code is the smallest Hamming distance between a pair of codewords. In your example code, which contains four codewords, the two closest codewords are the last two codewords, and they differ in exactly two positions. Hence, the minimum Hamming distance of this code is 2.

Hamming distance

I'm reading the book of Computer networks by Andrew S. Tanenbaum. I'm confused about the hamming distance in this paragraph. Why does it say the Hamming distance is 3 in this construction? Thanks.
You are confusing between the figure and the written part.They are 2 different examples.In the diagram example of (11,7) Hamming code is given,while the example above it is discussing about (11,29).
The Hamming distance between two strings of equal length is the number
of positions at which the corresponding symbols are different.
Hamming distance of (11,29) will be 3 as 11=01011 and 29=11101
Reach out if you want something else.
Cheers

Why find the Hamming Distance in Dynamical Networks?

In dynamical networks, one may calculate the Hamming distance to compare the similarity between two graphs, can anyone explain how?
Assuming that the Hamming distance of two graphs have equal edge density, what is the difference between Hamming distance and expected Hamming distance between two independent Erdos-Renyi random graphs? How does the later arise?
The Hamming distance measures the minimum number of substitutions required to change (transform) one mathematical 'object' (i.e. strings or binary) into another.
So in network theory it can be defined as a the number of different connections between two networks (it can be formulated also for not equally-sized networks and for weighted or directed graphs). In a simple case in which you have two Erdos-Renyi networks (the adjacency matrix has 1 if the node pair is connected and 0 if not) the distance is mathematically defined as follows:
The values that are subtracted are the two adjacency matrix. If you take two Erdos-Renyi networks with wiring probability of 0.5 and compute the hamming distance between them you should get a value around 0.5. I generated different Erdos-Renyi graph and their Hamming distances produced a Gaussian curve around 0.5 (as we can expect; see below).
If it is needed I can give you the code I used.

Mathematical representation of a set of points in N dimensional space?

Given some x data points in an N dimensional space, I am trying to find a fixed length representation that could describe any subset s of those x points? For example the mean of the s subset could describe that subset, but it is not unique for that subset only, that is to say, other points in the space could yield the same mean therefore mean is not a unique identifier. Could anyone tell me of a unique measure that could describe the points without being number of points dependent?
In short - it is impossible (as you would achieve infinite noiseless compression). You have to either have varied length representation (or fixed length with length being proportional to maximum number of points) or dealing with "collisions" (as your mapping will not be injective). In the first scenario you simply can store coordinates of each point. In the second one you approximate your point clouds with more and more complex descriptors to balance collisions and memory usage, some posibilities are:
storing mean and covariance (so basically perofming maximum likelihood estimation over Gaussian families)
performing some fixed-complexity density estimation like Gaussian Mixture Model or training a generative Neural Network
use set of simple geometrical/algebraical properties such as:
number of points
mean, max, min, median distance between each pair of points
etc.
Any subset can be identified by a bit mask of length ceiling(lg(x)), where bit i is 1 if the corresponding element belongs to the subset. There is no fixed-length representation that is not a function of x.
EDIT
I was wrong. PCA is a good way to perform dimensionality reduction for this problem, but it won't work for some sets.
However, you can almost do it. Where "almost" is formally defined by the Johnson-Lindenstrauss Lemma, which states that for a given large dimension N, there exists a much lower dimension n, and a linear transformation that maps each point from N to n, while keeping the Euclidean distance between every pair of points of the set within some error ε from the original. Such linear transformation is called the JL Transform.
In other words, your problem is only solvable for sets of points where each pair of points are separated by at least ε. For this case, the JL Transform gives you one possible solution. Moreover, there exists a relationship between N, n and ε (see the lemma), such that, for example, if N=100, the JL Transform can map each point to a point in 5D (n=5), an uniquely identify each subset, if and only if, the minimum distance between any pair of points in the original set is at least ~2.8 (i.e. the points are sufficiently different).
Note that n depends only on N and the minimum distance between any pair of points in the original set. It does not depend on the number of points x, so it is a solution to your problem, albeit some constraints.

An "asymmetric" pairwise distance matrix

Suppose there are three sequences to be compared: a, b, and c. Traditionally, the resulting 3-by-3 pairwise distance matrix is symmetric, indicating that the distance from a to b is equal to the distance from b to a.
I am wondering if TraMineR provides some way to produce an asymmetric pairwise distance matrix.
No, TraMineR does not produce 'assymetric' dissimilaries precisely for the reasons stressed in Pat's comment.
The main interest of computing pairwise dissimilarities between sequences is that once we have such dissimilarities we can for instance
measure the discrepancy among sequences, determine neighborhoods, find medoids, ...
run cluster algorithms, self-organizing maps, MDS, ...
make ANOVA-like analysis of the sequences
grow regression trees for the sequences
Inputting a non symmetric dissimilarity matrix in those processes would most probably generate irrelevant outcomes.
It is because of this symmetry requirement that the substitution costs used for computing Optimal Matching distances MUST be symmetrical. It is important to not interpret substitution costs as the cost of switching from one state to the other, but to understand them for what they are, i.e., edit costs. When comparing two sequences, for example
aabcc and aadcc, we can make them equal either by replacing arbitrarily b with d in the first one or d with b in the second one. It would then not make sense not giving the same cost for the two substitutions.
Hope this helps.

Resources