Find the mean for points of binary features

I have groups of binary strings in which each bit represents one feature of a variable. For example, I have a color variable whose features are red, blue and green, so 010 means I have a blue object.
I need to get the center of these objects by calculating a weighted mean. For example, if 010 has weight 0.5, 100 has weight 0.4 and 001 has weight 0.8: [010*0.5 + 100*0.4 + 001*0.8]/[1.7]
Is it possible to get a point that represents the center of those points and has the same properties as the other points (binary on 3 bits)?
Thank you in advance for your help.

I guess you can use the following approach from cluster analysis: choose a metric for your object space (Euclidean, taxicab or something else), and then, for every object in the group (or, if the object space is small, for every possible object), calculate the average distance to all objects in the group. You can then take the object with the smallest average distance as the center of the group.
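A minimal sketch of that idea, assuming the Hamming distance as the metric (on bit strings it coincides with the taxicab distance) and using the weights from the question to form a weighted average distance. The names `hamming` and `weighted_medoid` are made up for this illustration.

```python
from itertools import product

def hamming(a, b):
    """Number of bit positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def weighted_medoid(points, weights, candidates=None):
    """Return the candidate with the smallest weighted average distance to the group."""
    if candidates is None:
        candidates = points  # restrict the search to members of the group
    total = sum(weights)

    def cost(c):
        return sum(w * hamming(c, p) for p, w in zip(points, weights)) / total

    return min(candidates, key=cost)

points = ["010", "100", "001"]
weights = [0.5, 0.4, 0.8]

# Search only among the group members.
center_in_group = weighted_medoid(points, weights)

# Search among all 3-bit strings (feasible because the space is tiny).
all_strings = ["".join(bits) for bits in product("01", repeat=3)]
center_any = weighted_medoid(points, weights, all_strings)

print(center_in_group, center_any)
```

With these weights the group member closest to the others is 001, while the search over all 3-bit strings picks 000, which is not a valid single color. If the center must be a valid object, restricting the candidates to the valid encodings is probably the safer choice.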

Related

Given a set of points with x, y and z coordinates whose bounds are 0 to 1 (inclusive), determine if they're all uniformly distributed (or close to)

I'm trying to determine whether a set of points is uniformly distributed in a 1 x 1 x 1 cube. Each point comes with an x, y, and z coordinate that corresponds to its location in the cube.
A trivial way that I can think of is to flatten the set of points into 2 graphs and check how normally distributed both are; however, I do not know whether that is a correct way of doing it.
Does anyone have a better idea?

I would compute a point density map and then check it for anomalies:
definitions
Let us assume we have N points to test. If the points are uniformly distributed, they should form a "uniform grid" of m*m*m points:
m * m * m = N
m = N^(1/3)
To account for deviations from the uniform grid and to assess statistics, divide your cube into a grid of smaller cubes, each holding several points (so that statistical properties can be computed). Let us assume k>=5 points along each edge of a grid cube (about k^3 points per cube), so the number of grid cubes per axis is:
cubes = m/k
create a 3D array of counters
We simply need one integer counter per grid cube:
int map[cubes][cubes][cubes];
Fill it with zeroes.
process all points p(x,y,z) and update map[][][]
Simply loop through all of your points, compute the grid cube each one belongs to, and increment that cube's counter.
// cast to int; fmin keeps a coordinate of exactly 1.0 inside the last cube
map[(int)fmin(x*cubes,cubes-1)][(int)fmin(y*cubes,cubes-1)][(int)fmin(z*cubes,cubes-1)]++;
compute average count of the map[][][]
A simple average like this will do:
double avg=0; // floating point, so the division below is not truncated
for (xx=0;xx<cubes;xx++)
 for (yy=0;yy<cubes;yy++)
  for (zz=0;zz<cubes;zz++)
   avg+=map[xx][yy][zz];
avg/=cubes*cubes*cubes;
now just compute the average absolute difference from this average
double d=0;
for (xx=0;xx<cubes;xx++)
 for (yy=0;yy<cubes;yy++)
  for (zz=0;zz<cubes;zz++)
   d+=fabs(map[xx][yy][zz]-avg);
d/=cubes*cubes*cubes;
d now holds a metric telling how far the points are from uniform density, where 0 means a uniform distribution. So just threshold it. Note that d also depends on the number of points, and my intuition tells me that d>=k means totally non-uniform, so if you want to make it more robust you can do something like this (the threshold might need tweaking):
d/=k;
if (d<0.25) uniform;
else nonuniform;
As you can see, all of this runs in O(N) time, so it should be fast enough for you. If it isn't, you can evaluate only every 10th point by skipping points; however, that works only if the order of the points is random. If not, you would need to pick N/10 random points instead. The 10 can be any constant, but keep in mind that you still need enough points for the statistics to represent your set, so I would not go below 250 points (but that depends on what exactly you need).
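The steps above can be sketched in Python as follows. This is only an illustration under assumptions: the function name `uniformity_score` is invented here, the number of cells per axis is taken as N^(1/3)/k as in the text, and a dictionary stands in for the 3D counter array. It returns the normalised score rather than applying a threshold, since the threshold is a tuning choice.

```python
import random

def uniformity_score(points, k=5):
    """Mean absolute deviation of per-cell counts from their average, divided by k.

    points: iterable of (x, y, z) with coordinates in [0, 1].
    A score of 0 means every cell holds exactly the same number of points.
    """
    pts = list(points)
    n = len(pts)
    cubes = max(1, int(round(n ** (1.0 / 3.0)) / k))  # cells per axis
    counts = {}
    for x, y, z in pts:
        # Clamp so that a coordinate of exactly 1.0 falls into the last cell.
        cell = tuple(min(int(c * cubes), cubes - 1) for c in (x, y, z))
        counts[cell] = counts.get(cell, 0) + 1
    cells = cubes ** 3
    avg = n / cells
    # Cells that never received a point contribute |0 - avg| each.
    d = sum(abs(c - avg) for c in counts.values()) + (cells - len(counts)) * avg
    return d / cells / k

rng = random.Random(1)
uniform = [(rng.random(), rng.random(), rng.random()) for _ in range(8000)]
clustered = [(x / 4, y / 4, z / 4) for x, y, z in uniform]  # squeezed into one corner
print(uniform_score := uniformity_score(uniform), uniformity_score(clustered))
```

Randomly drawn uniform points still give a non-zero score because the per-cell counts fluctuate, so in practice the threshold is probably best calibrated by scoring a few point sets known to be uniform.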
Here are a few of my answers that use the density map technique:
Finding holes in 2d point sets?
Location of highest density on a sphere

Count outside edges of adjacent cells in a matrix in R

I'm working on some gridded temperature data, which I have categorised into a matrix where each cell can be one of two classes - let's say 0 or 1 for simplicity. For each class I want to calculate patch statistics, taking inspiration from FRAGSTATS, which is used in landscape ecology to characterise the shape and size of habitat patches.
For my purposes, a patch is a cluster of adjacent cells of the same class. Here's an example matrix, mat:
mat <-
matrix(c(0,1,0,
1,1,1,
1,0,1), nrow = 3, ncol = 3,
byrow = TRUE)
0 1 0
1 1 1
1 0 1
All the 1s in mat form a single patch (we'll ignore the 0s), and in order to calculate various different shape metrics I need to be able to calculate the perimeter (i.e. number of outside edges).
EDIT
Sorry I apparently can't post an image because I don't have enough reputation, but you can see in the black lines of G5W's answer below that the outside borders of 1's represent the outside edges I'm referring to.
Manually I can count that the patch of 1s has 14 outside edges and I know the area (i.e. number of cells) is 6. Based on a paper by He et al. and this other question I've figured out how to calculate the number of inside edges (5 in this example), but I'm really struggling to do the same for the outside edges! I think it's something to do with how the patch shape compares to the largest integer square that has a smaller area (in this case, a 2 x 2 square), but so far my research and pondering have been to no avail.
N.B. I am aware of the package SDMTools, which can calculate various FRAGSTATS metrics. Unfortunately the metrics returned are too processed e.g. instead of just Aggregation Index, I need to know the actual numbers used to calculate it (number of observed shared edges / maximum number of shared edges).
This is my first post on here so I hope it's detailed enough! Thanks in advance :)

If you know the area and the number of inside edges, it is simple to calculate the number of outside edges. Every cell has four edges, so at first glance the total number of edges is 4 * area. But that is not quite right, because every inside edge is shared between two cells and has been counted twice. So the right number of total edges is
4*area - inside
The number of outside edges is the total edges minus the inside edges, so
outside = total - inside = (4*area - inside) - inside = 4*area - 2*inside.
You can see that the area is made up of 6 squares, each of which has 4 sides. The inside edges (the red ones) are shared by two adjacent squares.
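As a quick check of the formula, here is a small Python sketch (the question uses R; the helper name `patch_edges` is invented for this example). It counts area and inside edges directly from the matrix and derives the outside edges from them. Note that it counts all cells of the class together and does not separate individual patches.

```python
def patch_edges(grid, cls=1):
    """Return (area, inside_edges, outside_edges) for all cells equal to cls."""
    rows, cols = len(grid), len(grid[0])
    area = inside = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != cls:
                continue
            area += 1
            # Count each shared edge once by looking only right and down.
            if c + 1 < cols and grid[r][c + 1] == cls:
                inside += 1
            if r + 1 < rows and grid[r + 1][c] == cls:
                inside += 1
    outside = 4 * area - 2 * inside
    return area, inside, outside

mat = [[0, 1, 0],
       [1, 1, 1],
       [1, 0, 1]]
print(patch_edges(mat))  # (6, 5, 14)
```

For the example matrix this reproduces the numbers from the question: area 6, 5 inside edges and 14 outside edges.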

How does adaptive.density() (spatstat) handle duplicated points, and what is the default f value?

I cannot find this information in the reference literature [1]:
1) How does adaptive.density() (package spatstat) handle duplicated spatial points? I have duplicated points at exactly the same position because I am combining measurements from different years. I expect the density to be higher in those areas, but I am not sure about it.
2) Is the default value of f in adaptive.density() f=0 or f=1?
My guess is that it is f=0, so that it produces an adaptive estimate in which the intensity estimate at every location equals the average intensity (number of points divided by window area).
Thank you for your time and input!

The default value of f is 0.1 as you can see from the "Usage" section in the help file.
The function subsamples the point pattern with this selection probability and uses the resulting pattern to generate a Dirichlet tessellation (duplicated points are ignored at this stage). The remaining fraction of points (1-f) is used to estimate the intensity as the number of points in each tile of the tessellation divided by the tile's area (here every duplicated point counts toward the tile's total).

How to calculate the average stretch of a Quad?

I'm not sure if I'm formulating the question correctly, but basically I need to know how to calculate the average size of a stretched quad, or, taking the 2 diagonals AC and BD, how to calculate their average.
The blue square shows the original size and the pink lines show the quad when it is deformed. I need to calculate some sort of average so that I can change its color according to how it is deformed: if it expands, change to a lighter color; if it contracts, change to a darker color. I hope that makes sense.

Not quite sure what the question here is.
If you're asking how to find the average diagonal length, just find the length of each diagonal, add them together and divide by two.
If you're asking how to determine the area of the resulting shape, there are several formulas for finding the area of an arbitrary quadrilateral. They are all equivalent to the formula
Area = 1/2 * |AC x BD|
Where x means the cross-product.
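Both quantities can be computed in a few lines. The sketch below assumes 2D corner coordinates given in order A, B, C, D; the function name `diagonal_stats` and the idea of comparing against the undeformed quad to drive the color are illustrative assumptions, not part of the original answer.

```python
import math

def diagonal_stats(a, b, c, d):
    """Average diagonal length and area of quadrilateral ABCD (2D points, in order)."""
    acx, acy = c[0] - a[0], c[1] - a[1]  # diagonal AC
    bdx, bdy = d[0] - b[0], d[1] - b[1]  # diagonal BD
    avg_diag = (math.hypot(acx, acy) + math.hypot(bdx, bdy)) / 2
    # In 2D the cross product reduces to a single scalar component.
    area = abs(acx * bdy - acy * bdx) / 2
    return avg_diag, area

rest = diagonal_stats((0, 0), (1, 0), (1, 1), (0, 1))       # original square
stretched = diagonal_stats((0, 0), (2, 0), (2, 1), (0, 1))  # deformed quad

# A ratio above 1 means the quad expanded, below 1 that it contracted.
area_ratio = stretched[1] / rest[1]
print(rest, stretched, area_ratio)
```

Either ratio (area or average diagonal) could then be mapped to a lighter or darker color.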

A quadrilateral will have a 2nd order symmetric strain tensor:
| Exx Exy |
| Exy Eyy |
Exx is the strain in the x-direction; Eyy is the strain in the y-direction; Exy is the shear strain that occurs when one of the angles that is originally 90 degrees changes its value.
If you have large strains, you'll need a Lagrangian point of view and the Green-Lagrange large strain measure.
See Malvern's Continuum Mechanics for definitions of each term. I'd give you the formulas, but I don't have LaTeX available here.
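For reference, the standard textbook definitions can be written out as follows. The notation is an assumption made here: u = (u_x, u_y) is the displacement field, (X, Y) are the coordinates in the undeformed configuration, and F is the deformation gradient.

```latex
\[
\mathbf{F} = \mathbf{I} + \frac{\partial \mathbf{u}}{\partial \mathbf{X}},
\qquad
\mathbf{E} = \tfrac{1}{2}\left(\mathbf{F}^{\mathsf{T}}\mathbf{F} - \mathbf{I}\right)
\]
\[
E_{xx} = \frac{\partial u_x}{\partial X}
  + \tfrac{1}{2}\left[\left(\frac{\partial u_x}{\partial X}\right)^{2}
  + \left(\frac{\partial u_y}{\partial X}\right)^{2}\right],
\qquad
E_{yy} = \frac{\partial u_y}{\partial Y}
  + \tfrac{1}{2}\left[\left(\frac{\partial u_x}{\partial Y}\right)^{2}
  + \left(\frac{\partial u_y}{\partial Y}\right)^{2}\right]
\]
\[
E_{xy} = \tfrac{1}{2}\left[\frac{\partial u_x}{\partial Y} + \frac{\partial u_y}{\partial X}
  + \frac{\partial u_x}{\partial X}\frac{\partial u_x}{\partial Y}
  + \frac{\partial u_y}{\partial X}\frac{\partial u_y}{\partial Y}\right]
\]
```

For small strains the quadratic terms can be dropped, which leaves the usual linear strain components.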

3D Trilateration using given distances of unknown fixed points

I am new to this forum and not a native english speaker, so please be nice! :)
Here is the challenge I face at the moment:
I want to calculate the (approximate) relative coordinates of as yet unknown points in 3D Euclidean space, based on a set of given distances between pairs of points.
In my first approach I want to ignore possible multiple solutions and just take one of them at random.
e.g.:
given set of distances: (I think it forms a pyramid with a right-angled triangle as its base)
P1-P2-Distance
1-2-30
2-3-40
1-3-50
1-4-60
2-4-60
3-4-60
Step1:
Now, how do I calculate the relative coordinates for those points?
I figured that the first point goes to 0,0,0, so the second one is at 30,0,0.
After that, the third point can be calculated by finding the intersection of the 2 circles around points 1 and 2 with radii equal to their distances to point 3 (50 and 40 respectively). How do I do that mathematically? (I chose these simple numbers so that I can easily picture the situation in my mind.) Although I do not know how to derive it mathematically, the third point is at 30,40,0 (or 30,0,40, but I will ignore that).
But getting the fourth point is not as easy as that. I think I have to calculate the intersection of 3 spheres to get the point, but how do I do that?
Step2:
After I have figured out how to calculate this "simple" example, I want to use more unknown points. For each point there is at least 1 given distance to another point to "link" it to the others. If the coordinates cannot be determined uniquely because of the remaining degrees of freedom, I want to ignore all possibilities except one that I choose randomly, while still respecting the known distances.
Step3:
Now the final stage should be this: each measured distance is slightly inaccurate because it comes from a real-life situation. So if there is more than 1 distance for a given pair of points, the distances are averaged. But because of the imprecise distances it can be difficult to determine the exact (relative) location of a point. So I want to average the different possible locations into the "optimal" one.
Can you help me go through my challenge step by step?

You need to use trigonometry - specifically, the 'cosine rule'. This will give you the angles of the triangle, which lets you solve for the 3rd and 4th points.
The rule states that
c^2 = a^2 + b^2 - 2abCosC
where a, b and c are the lengths of the sides, and C is the angle opposite side c.
In your case, the side of length 50 is 1-3, so the angle opposite it is the one between 2-1 and 2-3 - the angle between the two lines crossing at point 2, (30,0,0). It's going to be 90 degrees because you have the 3-4-5 triangle, but let's prove it:
50^2 = 30^2 + 40^2 - 2*30*40*CosC
CosC = 0
C = 90 degrees
This is the angle between the lines (30,0,0)-(0,0,0) and (30,0,0)-point 3; starting at (30,0,0), go perpendicular to the x-axis for the length of side 2-3 (which is 40) and you'll get your third point (30,40,0).
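The cosine rule is easy to evaluate numerically. In the sketch below (the helper name `angle_opposite` is invented for this example) the side of length 50 is 1-3, so the angle opposite it sits at point 2, and the side of length 40 is 2-3, so the angle opposite it sits at point 1.

```python
import math

def angle_opposite(c, a, b):
    """Angle in degrees opposite side c in a triangle with side lengths a, b, c."""
    cos_c = (a * a + b * b - c * c) / (2 * a * b)
    return math.degrees(math.acos(cos_c))

angle_at_2 = angle_opposite(50, 30, 40)  # between sides 2-1 and 2-3
angle_at_1 = angle_opposite(40, 30, 50)  # between sides 1-2 and 1-3
print(angle_at_2, angle_at_1)
```

The right angle comes out at point 2, which agrees with the question's placement of point 3 at (30,40,0).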
Finding your 4th point is slightly trickier. The most straightforward algorithm that I can think of is to first find the (x,y) component of the point; from there the z component is straightforward using Pythagoras' theorem.
Consider that there is a point on the (x,y,0) plane which sits directly 'below' your point 4 - call this point 5. You can now create 3 right-angled triangles 1-5-4, 2-5-4, and 3-5-4.
You know the lengths of 1-4, 2-4 and 3-4. Because these are right triangles sharing the side 4-5 (the height h), Pythagoras gives (1-5)^2 = (1-4)^2 - h^2, and likewise for 2-5 and 3-5. In your example 1-4, 2-4 and 3-4 are all 60, so 1-5 = 2-5 = 3-5 and point 5 is equidistant from points 1, 2 and 3. Find point 5 using trigonometric methods - the 'sine rule' relates the sides and angles of the triangles 1-2-5, 2-3-5 and 1-3-5.
The 'sine rule' states that (in any triangle)
a / SinA = b / SinB = c / SinC
So for triangle 1-2-5, although you don't know the lengths 1-5 and 2-5, you do know that they are equal in this example. The same holds for 2-5 and 3-5, and for 1-5 and 3-5, in the other triangles.
I'll leave you to solve for point 5. Once you have it, you can easily find the z component of point 4 using Pythagoras' theorem: in the right triangle 1-5-4 you know the sides 1-4 and 1-5, and the length 4-5 is the z component.

I'll initially assume you know the distances between all pairs of points.
As you say, you can choose one point (A) as the origin, orient a second point (B) along the x-axis, and place a third point (C) in the xy-plane. You can solve for the coordinates of C as follows:
given: distances ab, ac, bc
assume
A = (0,0)
B = (ab,0)
C = (x,y) <- solve for x and y, where:
ac^2 = (A-C)^2 = (0-x)^2 + (0-y)^2 = x^2 + y^2
bc^2 = (B-C)^2 = (ab-x)^2 + (0-y)^2 = ab^2 - 2*ab*x + x^2 + y^2
-> bc^2 - ac^2 = ab^2 - 2*ab*x
-> x = (ab^2 + ac^2 - bc^2)/(2*ab)
-> y = +/- sqrt(ac^2 - x^2)
For this to work accurately, you will want to avoid cases where the points {A,B,C} are in a straight line, or close to it.
Solving for additional points in 3-space is similar -- you can expand the Pythagorean formula for the distance, cancel the quadratic elements, and solve the resulting linear system. However, this does not directly help you with your steps 2 and 3...
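A minimal sketch of this construction for four points, assuming all six pairwise distances are known and consistent. The function name `place_four_points` and the argument names are made up for this example, and of the two mirror-image solutions it simply takes the one with positive y and z.

```python
import math

def place_four_points(d12, d13, d23, d14, d24, d34):
    """Coordinates of 4 points from their pairwise distances (one mirror solution)."""
    p1 = (0.0, 0.0, 0.0)
    p2 = (d12, 0.0, 0.0)
    # Point 3 lies in the xy-plane.
    x3 = (d12**2 + d13**2 - d23**2) / (2 * d12)
    y3 = math.sqrt(d13**2 - x3**2)
    p3 = (x3, y3, 0.0)
    # Point 4: subtracting the sphere equations cancels the quadratic terms
    # and leaves linear equations for x4 and y4.
    x4 = (d12**2 + d14**2 - d24**2) / (2 * d12)
    y4 = (d14**2 - d34**2 + x3**2 + y3**2 - 2 * x3 * x4) / (2 * y3)
    z4 = math.sqrt(d14**2 - x4**2 - y4**2)
    return p1, p2, p3, (x4, y4, z4)

pts = place_four_points(30, 50, 40, 60, 60, 60)
for p in pts:
    print(p)
```

With the distances from the question this places point 3 at (30, 40, 0) and point 4 above (15, 20, 0). If the distances are inconsistent, `math.sqrt` raises a `ValueError` because its argument becomes negative, and nearly collinear points make the division by `y3` unstable.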
Unfortunately, I don't know a well-behaved exact solution for steps 2 and 3, either. Your overall problem will generally be both over-constrained (due to conflicting noisy distances) and under-constrained (due to missing distances).
You could try an iterative solver: start with a random placement of all your points, compare the current distances with the given ones, and use that to adjust your points in such a way as to improve the match. This is an optimization technique, so I would look up books on numerical optimization.
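One possible form of such an iterative solver is sketched below. It is not a specific published algorithm, just a simple relaxation: for every given distance, both endpoints are moved along their connecting line by a fraction of the current error. The function name `relax`, the step count, the rate and the random starting range are all assumptions chosen for illustration.

```python
import math
import random

def relax(n_points, distances, steps=5000, rate=0.05, seed=0):
    """Adjust random 3D positions so pairwise distances approach the given ones.

    distances: dict mapping (i, j) index pairs to a measured distance.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_points)]
    for _ in range(steps):
        for (i, j), target in distances.items():
            delta = [a - b for a, b in zip(pos[i], pos[j])]
            current = math.sqrt(sum(c * c for c in delta)) or 1e-9
            # Positive shift pulls the pair together, negative pushes it apart.
            shift = rate * (current - target) / current
            for axis in range(3):
                pos[i][axis] -= shift * delta[axis]
                pos[j][axis] += shift * delta[axis]
    return pos

given = {(0, 1): 30, (1, 2): 40, (0, 2): 50, (0, 3): 60, (1, 3): 60, (2, 3): 60}
positions = relax(4, given)
for (i, j), target in given.items():
    print(i, j, target, round(math.dist(positions[i], positions[j]), 3))
```

The result is only determined up to rotation, translation and mirroring. With noisy or conflicting distances the loop settles on a compromise rather than an exact fit, and points with too few distances stay wherever the random start and the updates leave them.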

If you know the distances between the nodes (the fixed part of the system) and the distances from the nodes to the tag (the mobile part), you can use trilateration to find the x,y position.
I have done this using Nanotron radio modules, which have a ranging capability.
