What is the difference between segments (from segmentation) and classes (from hierarchical clustering)? - r

I am working on mean-shift segmentation using R, and I am now a bit confused. My first question is how I can cluster the segmentation output (defining each segment as a superpixel), and my second is how I can then determine how many objects and how many classes I have. When I do the clustering, many neighbouring segments end up in the same class, so I can't count them as separate segments; they are effectively one segment, right? Please, can someone help?
Thanks in advance,

Segmentation assigns each pixel a "segment id", usually with the proviso that all segments are physically contiguous. Hierarchical clustering clusters based on a similarity value. You'll need to write your own distance function, but it can be quite simple: just take the mean RGB value of each segment and compute the distance between them. However, you also need to return an infinite distance for segments which are not physically contiguous, if you want to keep the segmentation criterion.
I'm not familiar with R's hierarchical clustering functions, but it will certainly have one which accepts an arbitrary distance function or a precomputed distance matrix.
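As a rough illustration of that distance idea (sketched in C++ rather than R, with a hypothetical Segment summary that stores the mean colour and the ids of physically touching segments), the pairwise distance could look something like this:

#include <cmath>
#include <limits>
#include <set>
#include <vector>

// Hypothetical per-segment summary: mean colour plus the ids of the
// segments it physically touches in the image.
struct Segment {
    double meanR = 0.0, meanG = 0.0, meanB = 0.0;
    std::set<int> neighbours;   // ids of physically contiguous segments
};

// Distance for the clustering: Euclidean distance between mean RGB values,
// or "infinity" when the two segments do not touch, so the clustering can
// never merge segments that are not physically contiguous.
double segmentDistance(const Segment& a, const Segment& b, int idOfB) {
    if (a.neighbours.count(idOfB) == 0)
        return std::numeric_limits<double>::infinity();
    const double dr = a.meanR - b.meanR;
    const double dg = a.meanG - b.meanG;
    const double db = a.meanB - b.meanB;
    return std::sqrt(dr * dr + dg * dg + db * db);
}

// Full pairwise distance matrix that an agglomerative clustering routine
// (hclust in R, or any equivalent) would consume.
std::vector<std::vector<double>> distanceMatrix(const std::vector<Segment>& segs) {
    const std::size_t n = segs.size();
    std::vector<std::vector<double>> d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            d[i][j] = d[j][i] = segmentDistance(segs[i], segs[j], static_cast<int>(j));
    return d;
}

Cutting the resulting tree at a chosen height then gives you the classes.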

Related

pcl::NormalEstimation setSearchMethod explanation

// Compute the normals
pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> normalEstimation;
pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
normalEstimation.setInputCloud(source_cloud);
normalEstimation.setSearchMethod(tree);
Hello everyone,
I am a beginner learning PCL.
I don't understand the line "normalEstimation.setSearchMethod(tree);".
What does this part mean?
Does it mean there are some search methods that we have to choose between?
Sometimes I see code like this:
// Normal estimation
pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> n;
pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
tree->setInputCloud(cloud_smoothed); // <- I don't understand this part either
n.setInputCloud(cloud_smoothed);
n.setSearchMethod(tree);
Thank you guys
cheers
You can find some information on how normals are computed here: http://pointclouds.org/documentation/tutorials/normal_estimation.php.
Basically, to compute a normal at a specific point, you need to "analyze" its neighbourhood, i.e. the points that are around it. Usually, you take either the N closest points or all the points that are within a certain radius around the target point.
Finding these neighbouring points is not trivial if the point cloud is completely unstructured. KD-trees are there to speed up this search: they are an optimized data structure which allows, for example, very fast nearest-neighbour searches. You can find more information on KD-trees here: http://pointclouds.org/documentation/tutorials/kdtree_search.php.
So, the line normalEstimation.setSearchMethod(tree); just sets an empty KD-tree that will be used by the normal estimation algorithm.
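To put the two snippets above in context, here is a minimal, self-contained normal estimation sketch (the file name and the 3 cm search radius are placeholder assumptions, not values from the question). The KD-tree passed to setSearchMethod() is only a search structure; the estimator hands it the input cloud internally before computing, which is why passing an "empty" tree is enough:

#include <pcl/features/normal_estimation.h>
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/search/kdtree.h>

int main()
{
    // Load an input cloud (the file name is just a placeholder).
    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
    if (pcl::io::loadPCDFile<pcl::PointXYZ>("input.pcd", *cloud) < 0)
        return -1;

    // The KD-tree is only a neighbour-search structure; NormalEstimation
    // fills it with the input cloud internally before computing.
    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);

    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(cloud);
    ne.setSearchMethod(tree);

    // Neighbourhood definition: either a fixed radius ...
    ne.setRadiusSearch(0.03);   // 3 cm, an arbitrary example value
    // ... or a fixed number of nearest neighbours:
    // ne.setKSearch(20);

    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    ne.compute(*normals);       // one normal per input point

    return 0;
}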

How to find a point on a 2-d weighted map which will have equidistant (as close as possible) paths to multiple endpoints?

So let's say I have a matrix with two types of cells: 0 and 1, where 1 is not passable.
I want to find a point from which I can run paths (say, A*) to a bunch of destinations (I don't expect more than 4), and I want the lengths of these paths to satisfy l1/l2/l3/l4 = 1, or as close to 1 as possible.
For two destinations it is simple: run a path between them and take the midpoint. For more destinations, I imagine I can run paths between each pair, which will create a sort of polygon, and I could grab the centroid (or the average of all path point coordinates)? Or would it be better to take all the midpoints of the paths between each pair and then use them as vertices of a polygon which will contain my desired point?
It seems you want to find the point with the best access to multiple endpoints. For other readers, this is like trying to found an ideal settlement to trade with nearby cities; you want them all to be as accessible as possible. It appears to be a variant of the Weber Problem applied to pathfinding.
The best solution, as you can no longer rely on exploiting geometry (imagine a mountain path or two blocking the way), is going to be an iterative approach. I don't imagine it will be easy to find an optimal solution because you'll need to check every square; you can't guess by pathing between endpoints anymore. In nearly any large problem space, you will need to path from each possible centroid to all endpoints. A suboptimal solution will be fairly fast. I recommend these steps:
Try to estimate the centroid using geometry, forming a search area.
Use a modified A* algorithm from each point S in the search area to all your target points T to generate a perfect path from S to each T.
Add the length of each path S -> T together to get Cost (probably stored in a matrix for all sample points)
Select the lowest Cost from all your samples in the matrix (or the entire population if you didn't cull the search space).
The algorithm above can also work without estimating a centroid and limiting solutions. If you choose to search the entire space, the search will be much longer, but you can find a perfect solution even in a labyrinth. If you estimate the centroid and start the search near it, you'll find good answers faster.
I mentioned earlier that you should use a modified A* algorithm... Rather than repeating a generic A* search S->Tn for every T, code A* so that it seeks multiple target locations, storing the paths to each one and stopping when it has found them all.
If you really want a perfect solution to the problem, you'll be waiting a long time, so I recommend that you use any exploit you can to reduce wasteful calculations. Even go so far as to store found paths in a lookup table for each T, and see if a point already exists along any of those paths.
To put it simply, finding the point is easy. Finding it fast-enough might take lots of clever heuristics (cost-saving measures) and stored data.
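As a rough sketch of the brute-force version of those steps (assuming an unweighted, 4-connected grid so that plain BFS gives shortest path lengths; for weighted cells you would swap in Dijkstra or the multi-target A* described above), you can build a distance map from each target once and then score every free cell:

#include <climits>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;   // 0 = passable, 1 = blocked
using Cell = std::pair<int, int>;

// BFS distance map from one target over an unweighted 4-connected grid.
// Unreachable / blocked cells keep the value INT_MAX.
std::vector<std::vector<int>> distanceMap(const Grid& g, Cell start) {
    const int rows = static_cast<int>(g.size());
    const int cols = static_cast<int>(g[0].size());
    std::vector<std::vector<int>> dist(rows, std::vector<int>(cols, INT_MAX));
    std::queue<Cell> q;
    dist[start.first][start.second] = 0;
    q.push(start);
    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [r, c] = q.front();
        q.pop();
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            if (g[nr][nc] == 1 || dist[nr][nc] != INT_MAX) continue;
            dist[nr][nc] = dist[r][c] + 1;
            q.push({nr, nc});
        }
    }
    return dist;
}

// Score every free cell by the summed distance to all targets (the "Cost"
// matrix from the steps above) and return the cheapest cell. Swapping the
// score for the max-min spread would favour strictly equidistant points.
Cell bestMeetingPoint(const Grid& g, const std::vector<Cell>& targets) {
    std::vector<std::vector<std::vector<int>>> maps;
    for (const Cell& t : targets) maps.push_back(distanceMap(g, t));

    Cell best{-1, -1};
    long long bestCost = LLONG_MAX;
    for (int r = 0; r < static_cast<int>(g.size()); ++r) {
        for (int c = 0; c < static_cast<int>(g[0].size()); ++c) {
            if (g[r][c] == 1) continue;
            long long cost = 0;
            bool reachable = true;
            for (const auto& m : maps) {
                if (m[r][c] == INT_MAX) { reachable = false; break; }
                cost += m[r][c];
            }
            if (reachable && cost < bestCost) { bestCost = cost; best = {r, c}; }
        }
    }
    return best;   // {-1, -1} if no cell reaches every target
}

Building one distance map per target and reusing it for every candidate cell is in the same spirit as the lookup-table suggestion above: each search is run once per target instead of once per candidate.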

Edge bipartization with equal number of nodes

I'm trying to solve the standard bipartization problem, i.e., find a subset of the edges such that the output graph is a bipartite graph.
My additional constraints are:
The number of vertices on each side must be equal.
Each vertex has exactly 1 edge.
In fact, it would suffice to know whether such a subset exists at all - I don't really need the construction itself.
Optimally, the algorithm should be fast as I need to run it for O(400) nodes repeatedly.
If each vertex is to be incident to exactly one edge, it seems what you want is a matching; more precisely, a perfect matching, since every vertex must be covered. If so, Edmonds's blossom algorithm will do the job, and note that the matched edges automatically give equal sides, since each chosen edge contributes one vertex to either side of the bipartition. I haven't used an implementation of the algorithm that I can recommend, but you might check out http://www.algorithmic-solutions.com/leda/ledak/index.htm
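If it helps, here is a rough existence-check sketch using the Boost Graph Library's implementation of Edmonds's algorithm (assuming Boost is an acceptable dependency); the edge subset you want exists exactly when the graph has a perfect matching:

#include <cstddef>
#include <utility>
#include <vector>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/max_cardinality_matching.hpp>

using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS>;

// Returns true if every vertex can be covered by exactly one chosen edge,
// i.e. the graph admits a perfect matching.
bool hasPerfectMatching(std::size_t numVertices,
                        const std::vector<std::pair<std::size_t, std::size_t>>& edges)
{
    Graph g(numVertices);
    for (const auto& e : edges)
        boost::add_edge(e.first, e.second, g);

    // mate[v] will hold the vertex matched to v (or a null vertex if unmatched).
    std::vector<boost::graph_traits<Graph>::vertex_descriptor> mate(numVertices);
    bool ok = boost::checked_edmonds_maximum_cardinality_matching(g, &mate[0]);

    // A perfect matching covers all vertices: matching_size counts matched
    // edges, so it must equal numVertices / 2 (and numVertices must be even).
    return ok && numVertices % 2 == 0 &&
           2 * boost::matching_size(g, &mate[0]) == numVertices;
}

For a few hundred vertices the blossom algorithm should be comfortably fast enough to run repeatedly.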

What Are Large Graphs? What is Large Graph Analysis? What Is Big Data? What is Big Data Analysis?

I know what these are, as I have started working with them, but for now I just want to know the formal definitions of these terms.
Any help in this regard is highly appreciated.
In my opinion, there is no absolute, formal criterion for when a graph becomes 'large' or when an amount of data becomes 'big'. These adjectives are meaningless without a frame of reference.
For instance, when you say someone is 'tall', it is implicitly assumed that you are either comparing this person to yourself, or to a perceived average height of people. If you change your frame of reference and compare this person to, let's say, Mount Everest, this person's height becomes negligible. I could give a billion other examples, but the take-home message is: there is no absolute notion of 'bigness' or 'smallness'. Scale is a relative notion. A simple concept, but with very strong implications: in a sense, physics has been so successful because physicists understood this very early.
So, to answer this question, I think a good rule of thumb is:
'large graphs' are graphs whose exploration requires long computation times on a typical quad-core machine compared to what people judge reasonable (an hour, a day; your patience may vary);
'big data' is typically data which takes too much space to be stored on a single hard drive.
Of course, these are just rules of thumb.
Usually, a graph whose nodes and arrows form sets is a small graph; otherwise, it is a large graph.
For example, if we denote the collection of nodes of a graph G by G0 and the collection of arrows by G1, let G0 = {1, 2}, G1 = {a, b, c}, source(a) = target(a) = source(b) = target(c) = 1 and target(b) = source(c) = 2. This is a small graph. By contrast, the graph of sets and functions has all sets as nodes and all functions between sets as arrows; the source of a function is its domain, and its target is its codomain.
In this example, unlike the previous one, the nodes do not form a set. Thus the graph of sets and functions is a large graph.
More generally, we refer to any kind of mathematical structure as 'small' if the collection(s) it is built on form sets, and 'large' otherwise.

Graph library implementation

I'm trying to implement a weighted graph. I know that there are two ways to implement one: either with a two-dimensional array (adjacency matrix) or with an array of linked lists (adjacency list). Which of the two is more efficient and faster?
Which one of the two is more efficient and faster?
That depends on your usage and the kinds of graphs you want to store.
Let n be the number of nodes and m be the number of edges. If you want to know whether two nodes u and v are connected (and the weight of the edge), an adjacency matrix allows you to determine this in constant time (in O-notation, O(1)), simply by retrieving the entry A[u,v]. With an adjacency list, you will have to look at every entry in u's list, or v's list - in the worst case, there could be n entries. So edge lookup for an adjacency list is in O(n).
The main downside of an adjacency matrix is the memory required: altogether, you need to store n^2 entries. With an adjacency list, you need to store only the edges that actually exist (m entries, assuming a directed graph). So if your graph is sparse, adjacency lists clearly occupy much less memory.
My conclusion would be: Use an adjacency matrix if your main operation is retrieving the edge weight for two specific nodes; under the condition that your graphs are small enough so that n^2 entries fit in memory. Otherwise, use the adjacency list.
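To make the trade-off concrete, here is a minimal sketch of both representations (weights as double, vertices as indices 0..n-1; the names are just illustrative). The matrix answers weight queries in O(1) at the cost of n^2 storage, while the list stores only the m existing edges but has to scan a neighbour list per query:

#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Adjacency matrix: O(n^2) memory, O(1) edge-weight lookup.
struct MatrixGraph {
    std::vector<std::vector<std::optional<double>>> w;
    explicit MatrixGraph(std::size_t n) : w(n, std::vector<std::optional<double>>(n)) {}
    void addEdge(std::size_t u, std::size_t v, double weight) { w[u][v] = weight; }
    std::optional<double> weight(std::size_t u, std::size_t v) const { return w[u][v]; }
};

// Adjacency list: O(n + m) memory, O(deg(u)) edge-weight lookup.
struct ListGraph {
    std::vector<std::vector<std::pair<std::size_t, double>>> adj;
    explicit ListGraph(std::size_t n) : adj(n) {}
    void addEdge(std::size_t u, std::size_t v, double weight) { adj[u].push_back({v, weight}); }
    std::optional<double> weight(std::size_t u, std::size_t v) const {
        for (const auto& [to, wgt] : adj[u])
            if (to == v) return wgt;
        return std::nullopt;   // edge not present
    }
};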
Personally I'd go for the linked lists approach, assuming that it will often be a sparse graph (i.e. most of the array cells are a waste of space).
I went to Wikipedia to read up on adjacency lists (it's been ages since I used graphs) and it has a nice section on the trade-offs between the two approaches. Ultimately, as with many either/or choices, it will come down to 'it depends', based on the likely use cases for your library.
After reading the wiki article, I think another point in favor of using lists would be attaching data to each directed segment (or even different weights; think of walk/bike/car distances between two points, etc.).
