Extract attributes of a picture for image recognition - r

My question is inspired by the following Kaggle competition: https://www.kaggle.com/c/leaf-classification
I have a set of leaves that I would like to classify by how they look. The classification part I managed to do using Random Forests and K-means. However, I am more interested in the pre-processing part, so that I can replicate this analysis with my own set of pictures.
The characteristics that describe each leaf are given by:
id - an anonymous id unique to an image
margin_1, margin_2, margin_3, ... margin_64 - each of the 64 attribute vectors for the margin feature
shape_1, shape_2, shape_3, ..., shape_64 - each of the 64 attribute vectors for the shape feature
texture_1, texture_2, texture_3, ..., texture_64 - each of the 64 attribute vectors for the texture feature
So, focusing on the question: I would like to obtain these characteristics from a raw picture. I have tried the jpeg R package, but I haven't succeeded. I am not showing the code I've tried because this is a rather more theoretical question about how to tackle the issue; the code is not needed.
I would really appreciate any advice on how to proceed to get the best descriptors of each image.

The problem is more about which kinds of features you can extract from the margin, shape and texture of the images you have (leaves). It depends: some plants can easily be identified from shape alone, while others need additional features such as texture because their shapes are similar, so this is still an open area of research. A number of features have been proposed for plant species identification, aiming at performance, efficiency or usability, and good features must be invariant to scale, translation and rotation. Please refer to the link below, a state-of-the-art review of the feature extraction techniques that have been used in plant species identification:
https://www.researchgate.net/publication/312147459_Plant_Species_Identification_Using_Computer_Vision_Techniques_A_Systematic_Literature_Review
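As a concrete starting point, here is a minimal sketch in R of the kind of pre-processing involved. It assumes only the jpeg package, an RGB photo of a dark leaf against a light background, and placeholder file name and threshold values; the descriptors it computes are deliberately crude and are not how the Kaggle feature vectors were generated.
# From a raw JPEG to a binary mask and a few crude shape descriptors.
library(jpeg)
img <- readJPEG("leaf.jpg")                     # H x W x 3 array with values in [0, 1]
gray <- 0.299 * img[,,1] + 0.587 * img[,,2] + 0.114 * img[,,3]
mask <- gray < 0.5                              # TRUE where the leaf is (tune the threshold)
# Simple global descriptors from the mask
area   <- sum(mask)                             # leaf area in pixels
rows   <- range(which(rowSums(mask) > 0))
cols   <- range(which(colSums(mask) > 0))
extent <- area / ((diff(rows) + 1) * (diff(cols) + 1))   # how much of the bounding box is filled
aspect <- (diff(rows) + 1) / (diff(cols) + 1)            # bounding-box aspect ratio
# A scale-invariant shape signature: centroid-to-boundary distances, normalised
# and binned into a 64-bin histogram (loosely analogous to the "shape" vector).
ys <- row(mask)[mask]; xs <- col(mask)[mask]
cy <- mean(ys); cx <- mean(xs)
pad <- rbind(FALSE, cbind(FALSE, mask, FALSE), FALSE)    # pad so edge pixels have neighbours
boundary <- mask & !(pad[1:(nrow(pad)-2), 2:(ncol(pad)-1)] &   # up
                     pad[3:nrow(pad),     2:(ncol(pad)-1)] &   # down
                     pad[2:(nrow(pad)-1), 1:(ncol(pad)-2)] &   # left
                     pad[2:(nrow(pad)-1), 3:ncol(pad)])        # right
by <- row(boundary)[boundary]; bx <- col(boundary)[boundary]
d <- sqrt((by - cy)^2 + (bx - cx)^2)
shape_vec <- hist(d / max(d), breaks = seq(0, 1, length.out = 65), plot = FALSE)$counts
Texture descriptors (for example, statistics of local intensity differences inside the mask) and margin descriptors can be built on top of the same mask; dedicated image packages such as EBImage or imager make this considerably less painful than doing it by hand.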

Related

Questions about a research paper on salient region detection and segmentation

I am reading this paper in an attempt to recreate the salient region detection and segmentation model employed. I have the following questions pertaining to section 3 of the paper and I would highly appreciate it if someone could provide clarity on them.
The word "scales" is used at multiple points in the section, for example, line 4 of the section states "saliency maps are created at different scales". I do not exactly understand what the authors mean by the word scales. Moreover, is there a mathematical way to think about it?
I understand that a saliency value is computed for each pixel position using the equation given in the section. However, the pixel position itself does not appear in the equation, so I am confused as to which pixel the saliency value is being computed for.
I did not understand what the authors meant by the term "bin" in section 3.2 line 5 where it is stated, "The hill-climbing algorithm can be seen as a search window being run across the space of the d-dimensional histogram to find the largest bin within that window."
Lastly, any other tips or clarifications are most welcome and much appreciated!

How to make a good mesh in a biologically accurate model with very small domains

I have been trying to make a biologically accurate 2D spatial model of tissue layers, where different physiological processes happen. This includes mainly chemical reactions, diffusion and fluxes over boundaries.
I am making this model in COMSOL Multiphysics, a finite element software package that solves different physics like reaction-diffusion systems, although for my question this might not be really relevant.
In my geometry, I have really small regions between the cells of the tissue layers. These regions serve as openings where diffusion can take place between the cells (junctions). The quality of the mesh is not great here and if I want to improve the quality (mainly by introducing more elements and such), my simulation time increases drastically. The lesser quality mesh also causes convergence to take longer. I added a picture of the geometry to give an idea. I tried different meshes, all with different qualities of the elements and the number of elements ranging from 16000 to 50000.
My background in FEM is really limited and I wanted to know if I can tackle this problem in such a way that it
doesn't negatively affect the biology (keep the tissue domain sizes/problem etc as biologically accurate as possible),
doesn't increase the simulation time drastically,
gives a better mesh quality.
So I really want to know what the best way forward is, since I have already thought of some options.
Can I go with the lower quality mesh (which is not really bad, but not good either), so that I keep the small regions for optimum biological accuracy and a relatively small computation time (and hope I won't run into convergence errors)?
But maybe there are possibilities I am missing. For instance, is it possible to make the small domains bigger and then apply some kind of correction factor to the diffusion rates? In other words, if I make a domain twice as large, should I halve the diffusion rate? Is that even consistent with the chemical/physical laws? :S
Hopefully I made the problem a bit clear and thank you greatly in advance for the help.
Cheers,
Mesh of the tissue model
I know this thread was posted some months back but I am unsure if you found a solution.
To find the relationship between accuracy and computational time, run a mesh (refinement) analysis on your model and see how the mesh size directly affects the results you expect to obtain (pore pressure, fluid velocity, strain, etc.). This will allow you to determine the most appropriate meshing strategy for your specific problem.
Also keep in mind that the diffusion rate of a material depends on the pore size and the permeability (by means of Darcy's law), so depending on the assumptions behind your constitutive law and the boundary conditions of your problem, you may be able to simplify or enlarge some of the smaller domains in your model, as long as they stay within those assumptions.
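A rough sketch of such a mesh study, assuming the quantity of interest (for example a flux over one of the junctions) is exported from COMSOL to a CSV file for a series of progressively refined meshes; the file name and column names here are hypothetical:
# Hypothetical mesh-convergence check on results exported from COMSOL.
runs <- read.csv("mesh_study.csv")              # assumed columns: n_elements, flux
runs <- runs[order(runs$n_elements), ]
# Relative change of the result between successive refinements
rel_change <- abs(diff(runs$flux)) / abs(runs$flux[-1])
print(cbind(n_elements = runs$n_elements[-1], rel_change))
# Common heuristic: accept the coarsest mesh whose result changes by less than,
# say, 1% when the mesh is refined further.
ok <- which(rel_change < 0.01)
if (length(ok) > 0) cat("Mesh with", runs$n_elements[ok[1]], "elements looks converged\n")
plot(runs$n_elements, runs$flux, type = "b", log = "x",
     xlab = "Number of elements", ylab = "Quantity of interest")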

Network Analysis

I have a network problem.
From a document I am extracting some information and drawing graphs from it. But in a document the information flows; I am trying to depict that flow in a graph, the way one reads a text: the most important entity first, then the next most important one, and so on.
To understand and tackle this problem, what kinds of things do I have to study, and which aspects of network theory or graph theory deal with it?
If anyone can kindly point me to references.
Regards,
SK.
First of all, I'm not an expert in linguistics or the study of languages. I think I understand what you're trying to do, and I don't know what the best way to do it is.
If I got it right, you want to determine some centrality measure for your words (that would explain the social network reference), to find the ones that are most linked to the others, is that it?
The problem, if you try that, is that you will almost certainly find that the most central words are the least interesting ones (the, if, then, some redundant adjectives...) unless you apply a tokenization and lemmatization procedure beforehand. So you could keep only nouns and stemmed verbs, and only then try your approach.
Another thing to keep in mind is that words are important both by their presence and by their rarity (see the tf-idf weighting scheme, for instance).
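To make the tf-idf remark concrete, a tiny base-R illustration on a made-up three-document "corpus" (the sentences are arbitrary):
# tf-idf on a toy corpus: ubiquitous words get low weight, rare ones get high weight.
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "cats and dogs are animals")
tokens <- lapply(strsplit(docs, "\\s+"), tolower)
vocab <- sort(unique(unlist(tokens)))
tf <- t(sapply(tokens, function(tok) table(factor(tok, levels = vocab))))  # docs x terms
df <- colSums(tf > 0)                    # number of documents containing each term
idf <- log(length(docs) / df)            # 0 for terms that appear in every document
tfidf <- sweep(tf, 2, idf, "*")
round(tfidf, 2)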
To conclude, I did the following search on Google:
"n gram graph language centrality word"
and found this paper, which seems interesting for what you're asking (I might give it a look myself!):
LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
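For the centrality idea itself, here is a minimal sketch assuming the igraph R package: it builds a co-occurrence graph over a toy text after a crude stop-word filter (the text and stop-word list are placeholders) and ranks words by degree centrality.
# Word co-occurrence graph and degree centrality with igraph.
library(igraph)
text <- "the quick brown fox jumps over the lazy dog while the dog sleeps"
stopwords <- c("the", "a", "an", "over", "while", "and", "of", "in")
words <- tolower(unlist(strsplit(text, "\\s+")))
words <- words[!words %in% stopwords]
edges <- cbind(words[-length(words)], words[-1])              # link adjacent words (2-gram window)
g <- simplify(graph_from_edgelist(edges, directed = FALSE))   # drop duplicate edges / self-loops
sort(degree(g), decreasing = TRUE)                            # most "central" words first
Other measures such as betweenness(g) or page_rank(g)$vector follow the same pattern, and LexRank applies the same idea to sentences rather than individual words.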

Rapidly exploring random trees

http://msl.cs.uiuc.edu/rrt/
Can anyone explain how RRT works in simple wording that is easy to understand?
I read the description on the site and on Wikipedia.
What I would like to see is a short implementation of an RRT, or a thorough explanation of the following things:
Why does the rrt grow outwards instead of just growing very dense around the center?
How is it different from a naive random tree?
How is the next new vertex that we attempt to reach picked?
I know there is a Motion Strategy Library I could download, but I would much rather understand the idea before I delve into the code, rather than the other way around.
The simplest possible RRT algorithm has been so successful because it is pretty easy to implement. Things tend to get complicated when you:
need to visualise planning concepts in more than two dimensions
are unfamiliar with the terminology associated with planning, and
are faced with the huge number of RRT variants that have been described in the literature.
Pseudo code
The basic algorithm looks something like this:
Start with an empty search tree
Add your initial location (configuration) to the search tree
while your search tree has not reached the goal (and you haven't run out of time)
3.1. Pick a location (configuration), q_r, (with some sampling strategy)
3.2. Find the vertex in the search tree closest to that random point, q_n
3.3. Try to add an edge (path) in the tree between q_n and q_r, if you can link them without a collision occurring.
Although that description is adequate, after working in this space for a while I really do prefer the pseudocode of Figure 5.16 (on RRTs/RDTs) in Steven LaValle's book "Planning Algorithms".
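To make the steps above concrete, here is a small self-contained sketch in R of a plain 2D RRT in an obstacle-free square world; the bounds, step size, goal tolerance and iteration limit are arbitrary placeholder values.
# Minimal 2D RRT following the pseudocode above (no obstacles, so step 3.3's
# collision check is only indicated by a comment).
set.seed(1)
start <- c(0.5, 0.5); goal <- c(9.5, 9.5)
lo <- 0; hi <- 10                      # world bounds
step <- 0.5                            # maximum edge length
tol <- 0.5                             # "close enough" to the goal
vertices <- matrix(start, ncol = 2)    # one row per tree vertex
parent <- c(NA)                        # index of each vertex's parent
for (i in 1:5000) {
  q_r <- runif(2, lo, hi)                                  # 3.1 sample a configuration
  d <- sqrt(rowSums((vertices - matrix(q_r, nrow(vertices), 2, byrow = TRUE))^2))
  n <- which.min(d)                                        # 3.2 nearest tree vertex, q_n
  dir <- q_r - vertices[n, ]
  len <- sqrt(sum(dir^2)); if (len == 0) next
  q_new <- vertices[n, ] + dir / len * min(step, len)      # steer from q_n towards q_r
  # 3.3 a real planner would check the segment q_n -> q_new for collisions here
  vertices <- rbind(vertices, q_new)
  parent <- c(parent, n)
  if (sqrt(sum((q_new - goal)^2)) < tol) break             # reached the goal region
}
# Walk the parent pointers back from the last vertex to recover a path
path <- nrow(vertices)
while (!is.na(parent[path[1]])) path <- c(parent[path[1]], path)
plot(vertices, pch = 16, cex = 0.4, xlab = "x", ylab = "y")
lines(vertices[path, ])
Plotting the vertices makes the outward, space-filling growth described below quite obvious, even for this toy setup.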
Tree Structure
The reason that the tree ends up covering the entire search space (in most cases) is the combination of the sampling strategy and always connecting from the nearest point in the tree: vertices on the frontier of the explored region have large Voronoi regions, so they are the most likely to be selected as nearest neighbours. This effect is known as the Voronoi bias, and it is what pulls the tree outwards instead of letting it grow dense around the start.
Sampling Strategy
The choice of where to place the next vertex that you will attempt to connect to is the sampling problem. In simple cases, where the search space is low dimensional, uniform random placement (or uniform random placement biased toward the goal) works adequately. In high dimensional problems, when motions are very complex (when joints have positions, velocities and accelerations), or when the configuration is difficult to control, sampling strategies for RRTs are still an open research area.
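As an illustration of the simplest biased strategy mentioned above, a goal-biased sampler (the 5% bias and the bounds are placeholder values) only needs a few lines:
# With small probability return the goal itself, otherwise a uniform random sample.
sample_config <- function(goal, lo = 0, hi = 10, goal_bias = 0.05) {
  if (runif(1) < goal_bias) goal else runif(length(goal), lo, hi)
}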
Libraries
The MSL library is a good starting point if you're really stuck on implementation, but it hasn't been actively maintained since 2003. A more up-to-date library is the Open Motion Planning Library (OMPL). You'll also need a good collision detection library.
Planning Terminology & Advice
From a terminology point of view, the hard bit is to realise that although many of the diagrams you see in (the early years of) publications on RRTs are in two dimensions (trees that link 2D points), this is the absolute simplest case.
Typically, a mathematically rigorous way to describe complex physical situations is required. A good example is planning for a robot arm with n linkages: describing the pose of such an arm requires a minimum of n joint angles. This minimal set of parameters describing a position is called a configuration (some publications call it a state). A single configuration is often denoted q.
The combination of all possible configurations (or a subset thereof) that can be achieved make up a configuration space (or state space). This can be as simple as an unbounded 2d plane for a point in the plane, or incredibly complex combinations of ranges of other parameters.

Rough Set-based Attribute Reduction

I tried RSAR, a free package, but I wonder if there are any other good attribute reducers out there. Packages for R or MATLAB would also do; any resource capable of finding the minimal set of attributes that classifies the data.
For example, given a set with hundreds of example emails, each described by different attributes and labelled as spam or not spam, I want to find the minimal set of attributes that describes all the data, so as to discard useless information.
Considering the type of problem you describe, that is, choosing the right attributes for email classification, the best way might be to use Weka (Weka home). It has several feature-selection algorithms, which can be applied either interactively, to visualize their effect, or in conjunction with various classification algorithms, to evaluate their effect on actual classification. (Note that choosing attributes for classification without proper validation against a specific classifier might lead to less-than-optimal results in real life.)
Some relevant links:
Weka's manual regarding attribute selection
A (somewhat outdated) hands-on example
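If you want to experiment outside Weka, here is a small self-contained R sketch of filter-style attribute ranking by information gain. It is a simpler criterion than a rough-set reduct and is shown only to illustrate the idea; the data frame and its column names are made up.
# Rank categorical attributes by information gain with respect to the class.
entropy <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log2(p))
}
info_gain <- function(attr, class) {
  cond <- tapply(seq_along(class), attr, function(idx)
    length(idx) / length(class) * entropy(class[idx]))
  entropy(class) - sum(cond)
}
# Made-up spam data set with three binary attributes and a class column.
spam <- data.frame(
  has_link  = sample(c("yes", "no"), 200, replace = TRUE),
  all_caps  = sample(c("yes", "no"), 200, replace = TRUE),
  word_free = sample(c("yes", "no"), 200, replace = TRUE),
  class     = sample(c("spam", "ham"), 200, replace = TRUE)
)
gains <- sapply(spam[setdiff(names(spam), "class")], info_gain, class = spam$class)
sort(gains, decreasing = TRUE)   # keep the top-ranked attributes, discard the rest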
You can use the RoughSets package for R. See the description of FS.one.reduct.computation (after installing the RoughSets package).
E.g.: HIRING2Matrix is a decision table with a number of attributes, and reduct1 is the reduced set of attributes:
library(RoughSets)
reduct1 <- FS.one.reduct.computation(HIRING2Matrix, greedy = TRUE, power = 1)
