I'm not sure this is the right place but here I go:
I have a database of 300 picture in high-resolution. I want to compute the PCA on this database and so far here is what I do: - reshape every image as a single column vector - create a matrix of all my data (500x300) - compute the average column and substract it to my matrix, this gives me X - compute the correlation C = X'X (300x300) - find the eigenvectors V and Eigen Values D of C. - the PCA matrix is given by XV*D^-1/2, where each column is a Principal Component
This is great and gives me correct component.
Now what I'm doing is doing the same PCA on the same database, except that the images have a lower resolution.
Here are my results, low-res on the left and high-res on the right. Has you can see most of them are similar but SOME images are not the same (the ones I circled)
Is there any way to explain this? I need for my algorithm to have the same images, but one set in high-res and the other one in low-res, how can I make this happen?
thanks
It is very possible that the filter you used could have done a thing or two to some of the components. After all, lower resolution images don't contain higher frequencies that, too, contribute to which components you're going to get. If component weights (lambdas) at those images are small, there's also a good possibility of errors.
I'm guessing your component images are sorted by weight. If they are, I would try to use a different pre-downsampling filter and see if it gives different results (essentially obtain lower resolution images by different means). It is possible that the components that come out differently have lots of frequency content in the transition band of that filter. It looks like images circled with red are nearly perfect inversions of each other. Filters can cause such things.
If your images are not sorted by weight, I wouldn't be surprised if the ones you circled have very little weight and that could simply be a computational precision error or something of that sort. In any case, we would probably need a little more information about how you downsample, how you sort the images before displaying them. Also, I wouldn't expect all images to be extremely similar because you're essentially getting rid of quite a few frequency components. I'm pretty sure it wouldn't have anything to do with the fact that you're stretching out images into vectors to compute PCA, but try to stretch them out in a different direction (take columns instead of rows or vice versa) and try that. If it changes the result, then perhaps you might want to try to perform PCA somewhat differently, not sure how.
Related
I am trying to enlarge a point cloud data set. Suppose I have a point cloud data set consisting of 100 points & I want to enlarge it to say 5 times. Actually I am studying some specific structure which is very small, so I want to zoom in & do some computations. I want something like imresize() in Matlab.
Is there any function to do this? What does resize() function do in PCL? Any idea about how can I do it?
Why would you need this? Points are just numbers, regardless whether they are 1 or 100, until all of them are on the same scale and in the same coordinate system. Their size on the screen is just a visual representation, you can zoom in and out as you wish.
You want them to be a thousandth of their original value (eg. millimeters -> meters change)? Divide them by 1000.
You want them spread out in a 5 times larger space in that particular coordinate system? Multiply their coordinates with 5. But even so, their visual representations will look exactly the same on the screen. The data remains basically the same, they will not be resized per se, they numeric representation will change a bit. It is the simplest affine transform, just a single multiplication.
You want to have finer or coarser resolution of your numeric representation? Or have different range? Change your data type accordingly.
That is, if you deal with a single set.
If you deal with different sets, say, recorded with different kinds of sensors and the numeric representations differ a bit (there are angles between the coordinate systems, mm vs cm scale, etc.) you just have to find the transformation from one coordinate system to the other one and apply it to the first one.
Since you want to increase the number of points while preserving shape/structure of the cloud, I think you want to do something like 'upsampling'.
Here is another SO question on this.
The PCL offers a class for bilateral upsampling.
And as always google gives you a lot of hints on this topic.
Beside (what Ziker mentioned) increasing allocated memory (that's not what you want, right?) or zooming in in visualization you could just rescale your point cloud.
This can be done by multiplying each points dimensions with a constant factor or using an affine transformation. So you can e.g switch from mm to m.
If i understand your question correctly
If you have defined your cloud like this
pcl::PointCloud<pcl::PointXYZ>::Ptr cloud (new pcl::PointCloud<pcl::PointXYZ>);
in fact you can do resize
cloud->points.resize (cloud->width * cloud->height);
Note that doing resize does nothing more than allocate more memory for variable thus after resizing original data remain in cloud. So if you want to have empty resized cloud dont forget to add cloud->clear();
If you just want zoom some pcd for visual puposes(i.e you cant see what is shape of cloud because its too small) why dont you use PCL Visualization and zoom by scrolling up/down
Let's say I have a set of 5 markers. I am trying to find the relative distances between each marker using an augmented reality framework such as ARToolkit. In my camera feed thee first 20 frames show me the first 2 markers only so I can work out the transformation between the 2 markers. The second 20 frames show me the 2nd and 3rd markers only and so on. The last 20 frames show me the 5th and 1st markers. I want to build up a 3D map of the marker positions of all 5 markers.
My question is, knowing that there will be inaccuracies with the distances due to low quality of the video feed, how do I minimise the inaccuracies given all the information I have gathered?
My naive approach would be to use the first marker as a base point, from the first 20 frames take the mean of the transformations and place the 2nd marker and so forth for the 3rd and 4th. For the 5th marker place it inbetween the 4th and 1st by placing it in the middle of the mean of the transformations between the 5th and 1st and the 4th and 5th. This approach I feel has a bias towards the first marker placement though and doesn't take into account the camera seeing more than 2 markers per frame.
Ultimately I want my system to be able to work out the map of x number of markers. In any given frame up to x markers can appear and there are non-systemic errors due to the image quality.
Any help regarding the correct approach to this problem would be greatly appreciated.
Edit:
More information regarding the problem:
Lets say the realworld map is as follows:
Lets say I get 100 readings for each of the transformations between the points as represented by the arrows in the image. The real values are written above the arrows.
The values I obtain have some error (assumed to follow a gaussian distribution about the actual value). For instance one of the readings obtained for marker 1 to 2 could be x:9.8 y:0.09. Given I have all these readings how do I estimate the map. The result should ideally be as close to the real values as possible.
My naive approach has the following problem. If the average of the transforms from 1 to 2 is slightly off the placement of 3 can be off even though the reading of 2 to 3 is very accurate. This problem is shown below:
The greens are the actual values, the blacks are the calculated values. The average transform of 1 to 2 is x:10 y:2.
You can use a least-squares method, to find the transformation that gives the best fit to all your data. If all you want is the distance between the markers, this is just the average of the distances measured.
Assuming that your marker positions are fixed (e.g., to a fixed rigid body), and you want their relative position, then you can simply record their positions and average them. If there is a potential for confusing one marker with another, you can track them from frame to frame, and use the continuity of each marker location between its two periods to confirm its identity.
If you expect your rigid body to be moving (or if the body is not rigid, and so forth), then your problem is significantly harder. Two markers at a time is not sufficient to fix the position of a rigid body (which requires three). However, note that, at each transition, you have the location of the old marker, the new marker, and the continuous marker, at almost the same time. If you already have an expected location on the body for each of your markers, this should provide a good estimate of a rigid pose every 20 frames.
In general, if your body is moving, best performance will require some kind of model for its dynamics, which should be used to track its pose over time. Given a dynamic model, you can use a Kalman filter to do the tracking; Kalman filters are well-adapted to integrating the kind of data you describe.
By including the locations of your markers as part of the Kalman state vector, you may be able to be able to deduce their relative locations from purely sensor data (which appears to be your goal), rather than requiring this information a priori. If you want to be able to handle an arbitrary number of markers efficiently, you may need to come up with some clever mutation of the usual methods; your problem seems designed to avoid solution by conventional decomposition methods such as sequential Kalman filtering.
Edit, as per the comments below:
If your markers yield a full 3D pose (instead of just a 3D position), the additional data will make it easier to maintain accurate information about the object you are tracking. However, the recommendations above still apply:
If the labeled body is fixed, use a least-squares fit of all relevant frame data.
If the labeled body is moving, model its dynamics and use a Kalman filter.
New points that come to mind:
Trying to manage a chain of relative transformations may not be the best way to approach the problem; as you note, it is prone to accumulated error. However, it is not necessarily a bad way, either, as long as you can implement the necessary math in that framework.
In particular, a least-squares fit should work perfectly well with a chain or ring of relative poses.
In any case, for either a least-squares fit or for Kalman filter tracking, a good estimate of the uncertainty of your measurements will improve performance.
I have a 3D floating-point matrix, in worst-case scenario the size could be (200000x1000000x100), I want to visualize this matrix using Qt/OpenGL.
Since the number of elements is extremely high, I want to render them in a way that when the camera is far away from the matrix, I just show a number of interesting points that gives an approximation of how the matrix look like. When the camera gets closer, I want to get more details and hence more elements are calculated.
I would like to know if there are techniques that deals with this kind of visualization.
The general idea is called level-of-detail rendering and is a whole science in itself.
For your domain i would recommend two steps:
1) Reduce the number of cells by averaging (arithmetic-mean function) them in cubes of different sizes and caching those cubes (on disk as well as RAM). "Different" means here, that you have the same data in multiple sizes of cubes, e.g. coarse-grained cubes of 10000x10000x10000 and finer cubes of 100x100x100 cells resulting in multiple levels-of-detail. You have to organize these in a hierarchical structure (the larger ones containing multiple smaller ones) and for this i would recommend an Octree:
http://en.wikipedia.org/wiki/Octree
2) The second step is to actually render parts of this Octree:
To do this use the distance of your camera-point to the sub-cubes. Go through the cubes and decide to either enter the sub-cube or render the larger cube by using this distance-function and heuristically chosen or guessed threshold-values.
(2) can be further optimized but this is optional: To optimize this rendering organize the to-be-rendered cube's into layers: The direction of the layers (whether it is in x, y, or z-slices) depends on your camera-viewpoint to which it should be near-perpendicular. Then render each slice into a texture and voila you only have to render a single quad with that texture for each slice, 1000 quads are no problem to render.
Qt has some way of rendering huge number of elements efficiently. Check the examples/demo that is part of QT.
I was wondering what kind of model / method / technique Trendly might use to achieve this model:
[It tries to find the moments where significant changes set in and ignores random movements]
Any pointers very welcome! :)
I've never seen 'Trendly', and don't know anything about it, but if I wanted to produce that red line from that blue line, in an algorithmic fashion, I would try:
Fourier the whole data set
Choose a block size longer than the period of the dominant frequency
Divide the data up into blocks of the chosen size
Compare adjacent ones with a statistical test of some sort.
Where the test says two blocks belong to the same underlying distribution, merge them.
If any were merged, go back to 4.
Red trend line is the mean of each block.
A simple "median" function could produce smoother curves over a mostly un-smooth curve.
Otherwise, a brute-force or genetic algorithm could be used; attempting to find the way to split the data into sections, so that more sections = worse solution, and less accuracy of the lines = worse solution.
Another way would be like this: Start at the beginning. As soon as the line moves outside of some radius (3 above or 3 below the first, for instance) set the new height to an average of the current line's height and the previous marker.
If you keep doing that, it would ignore small fluctuations. However, if the fluctuation was large enough, it would still effect it.
I'm writing a program that works with images and at some point I need to posterize the image. This means I need to bin the colors, but I'm having trouble deciding how to tell how close one color is to another.
Given a color in RGB, I can think of at least 2 ways to see how different they are:
|r1 - r2| + |g1 - g2| + |b1 - b2|
sqrt((r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2)
And if I move into HSV, I can think of other ways of doing it.
So I ask, ignoring speed, what is the best way to tell how similar two colors are? Best meaning most accurate to the human eye.
Well, if speed is not an issue, the most accurate way would be to take some sample images and apply the filter to them using various cutoff values for the distance (distance being determined by one of the equations on the Color_difference page that astander linked to, meaning you'd have to use one of those color spaces listed there with the calculations, then convert to sRGB or something [which also means that you'd need to convert the image into the other color space first if it's not in it to begin with]), and then have a large number of people examine the images to see what looks best to them, then go with the cutoff value for the images that the majority agrees looks best.
Basically, it's largely a matter of subjectiveness; in fact, it also depends on how stylized you want the images, and you might even want to add in some sort of control so that you can alter the cutoff distance on the fly.
If speed does become a bit of an issue and/or you want more simplicity, then just use your second choice for distance calculation (which is simply the CIE76 equation; just make sure to use the Lab* color space) with the cutoff being around 2 or 2.3.
What do you mean by "posterize the image"?
If you're trying to cluster the colors into bins, you should look at
cluster analysis
Just a comment if you are going to move to HSV (or similar spaces):
Diffing on H: difference between 0° and 359° is numerically big but perceptually is negligible.
H difference if V or S are small - is small.
For computer vision apps, more important not perceptual difference (used mostly by paint manufacturers) but are these colors belong to the same object/segment or not. Which means that we might partially ignore V, which can change from lighting conditions.