I am reading this paper in an attempt to recreate the salient region detection and segmentation model employed. I have the following questions pertaining to section 3 of the paper and I would highly appreciate it if someone could provide clarity on them.
The word "scales" is used at multiple points in the section, for example, line 4 of the section states "saliency maps are created at different scales". I do not exactly understand what the authors mean by the word scales. Moreover, is there a mathematical way to think about it?
I understand that a saliency value is computed for each pixel at () using the equation
However, there is no mention of in the equation. Hence, I am confused as to what pixel the saliency value is being computed for. Is it ?
I did not understand what the authors meant by the term "bin" in section 3.2 line 5 where it is stated, "The hill-climbing algorithm can be seen as a search window being run across the space of the d-dimensional histogram to find the largest bin within that window."
Lastly, any other tips or clarifications are most welcome and much appreciated!
My question is inspired by the following Kaggle competition: https://www.kaggle.com/c/leaf-classification
I have a set of leaves which I would like to classify by how they look. I managed to do the classification part using Random Forests and K-means. However, I am more interested in the pre-processing part, so that I can replicate this analysis with my own set of pictures.
The characteristics that describe each leaf are given by:
id - an anonymous id unique to an image
margin_1, margin_2, margin_3, ... margin_64 - each of the 64 attribute vectors for the margin feature
shape_1, shape_2, shape_3, ..., shape_64 - each of the 64 attribute vectors for the shape feature
texture_1, texture_2, texture_3, ..., texture_64 - each of the 64 attribute vectors for the texture feature
So, focusing on the question: I would like to obtain these characteristics from a raw picture. I have tried with the jpeg R package, but I haven't succeeded. I am not showing the code I've tried because this is rather a theoretical question about how to tackle the issue; there is no need for the code.
I would really appreciate any advice on how to proceed to get the best descriptors of each image.
The problem is really which kinds of features you can extract based on the margin, shape and texture of the images you have (leaves). It depends: some plants can easily be identified with shape alone, while others need additional features such as texture because their shapes are similar, so this is still an open area of research. A number of features have been proposed for plant species identification, aiming at performance, efficiency or usability, and good features must be invariant to scale, translation and rotation. Please refer to the link below for a state-of-the-art review of the feature extraction techniques that have been used in plant species identification:
https://www.researchgate.net/publication/312147459_Plant_Species_Identification_Using_Computer_Vision_Techniques_A_Systematic_Literature_Review
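To make the idea concrete, here is a minimal sketch in R (my own illustration, not the pipeline behind the Kaggle features): it reads a picture with the jpeg package the questioner mentioned and derives a crude 64-value centroid-to-contour distance signature as a simple, roughly scale- and translation-invariant shape descriptor. The file name, the 0.5 threshold and the assumption that the leaf is darker than the background are all placeholders.
library(jpeg)
img <- readJPEG("leaf.jpg")                              # placeholder file name
gray <- if (length(dim(img)) == 3) apply(img, c(1, 2), mean) else img
mask <- gray < 0.5                                       # assumes the leaf is darker than the background
px <- which(mask, arr.ind = TRUE)                        # coordinates of leaf pixels
ctr <- colMeans(px)                                      # centroid of the leaf
d <- sqrt((px[, 1] - ctr[1])^2 + (px[, 2] - ctr[2])^2)   # distance of each leaf pixel from the centroid
ang <- atan2(px[, 2] - ctr[2], px[, 1] - ctr[1])         # angle of each leaf pixel around the centroid
bins <- cut(ang, breaks = seq(-pi, pi, length.out = 65)) # 64 angular sectors
shape <- tapply(d, bins, max) / max(d)                   # normalised contour-distance signature (64 values)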
I recently started to work with a huge dataset provided by a medical emergency service. I have about 25,000 spatial points of incidents.
I have been searching books and the internet for quite some time, and I am getting more and more confused about what to do and how to do it.
The points are, of course, very clustered. I calculated the K, L and G functions for them, and they confirm serious clustering.
I also have a population point dataset - one point for every citizen - that is clustered similarly to the incidents dataset (incidents happen to people, so there is a strong link between these two datasets).
I want to compare these two datasets to figure out whether they are similarly distributed. I want to know whether there are places with more incidents than the population would suggest. In other words, I want to use the population dataset to explain the intensity and then figure out whether the incident dataset corresponds to that intensity. The assumption is that incidents should occur randomly with respect to the population.
I want to get a plot of the region showing where there are more or fewer incidents than expected if incidents happened to people at random.
How would you do it with R?
Should I use Kest or Kinhom to calculate the K function? I read the descriptions, but I still don't understand the basic difference between them.
I tried using Kcross, but as far as I can tell, one of the two datasets used should be CSR - completely spatially random.
I also found Kcross.inhom, should I use that one for my data?
How can I get a plot (image) of incident deviations with respect to the population?
I hope I asked clearly.
Thank you for taking the time to read my question, and even more thanks if you can answer any of my questions.
Best regards!
Jernej
I do not have time to answer all your questions in full, but here are some pointers.
DISCLAIMER: I am a coauthor of the spatstat package and the book Spatial Point Patterns: Methodology and Applications with R [1], so I have a preference for using these (and I genuinely believe they are the best tools for your problem).
Conceptual issue: How big is your study region, and does it make sense to treat the points as distributed anywhere in the region, or are they confined to the road network?
For now I will assume they can be treated as distributed anywhere.
A simple approach would be to estimate the population density using density.ppp and then fit a Poisson model to the incidents with the population density as the intensity using ppm. This would probably be a reasonable null model, and if it fits the data well you can basically say that incidents happen "completely at random in space when controlling for the uneven population density". More info on density.ppp and ppm is in chapters 6 and 9 of [1], respectively, and of course in the spatstat help files.
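As a rough sketch of that workflow (the names incidents and people below are placeholders for your two ppp objects, and the small constant only guards against taking the log of zero):
library(spatstat)
popdens <- density(people)                        # density.ppp: kernel estimate of the population intensity
popdens <- eval.im(pmax(popdens, 1e-8))           # keep it strictly positive for the log offset below
fit <- ppm(incidents ~ offset(log(popdens)))      # null model: incident intensity proportional to population density
summary(fit)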
If you use summary statistics like the K/L/G/F/J-functions you should always use the inhom versions to take the population density into account. This is covered in chapter 7 of [1].
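For example, using the fitted intensity from the null model above to supply lambda (again only a sketch, reusing the placeholder names):
Ki <- Kinhom(incidents, lambda = predict(fit))    # inhomogeneous K-function under the population-driven intensity
plot(Ki)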
Also, it could probably be interesting to look at the relative risk (relrisk) if you combine all your points into a marked point pattern with two types (background and incidents). See chapter 14 of [1].
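A minimal sketch of that, again with placeholder names:
combined <- superimpose(background = people, incident = incidents)  # marked pattern with two types
rr <- relrisk(combined)                           # spatially varying probability of the 'incident' type
plot(rr)                                          # higher where incidents are over-represented relative to the population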
Unfortunately, only chapters 3, 7 and 9 of [1] are available as free-to-download sample chapters, but I hope you have access to it at your library or have the option of buying it.
Do you guys have an idea how to approach the problem of finding artefacts/outliers in a blood pressure curve? My goal is to write a program that finds the start and end of each artefact. Here are some examples of different artefacts; the green area is the correct blood pressure curve and the red one is the artefact that needs to be detected:
And this is an example of a whole blood pressure curve:
My first idea was to calculate the mean of the whole curve and many means over short intervals of the curve, and then find out where they differ. But the blood pressure varies so much that I don't think this could work; it would find too many non-existent "artefacts".
Thanks for your input!
EDIT: Here is some data for two example artefacts:
Artefact1
Artefact2
Without any data there is just the option to point you towards different methods.
First (without knowing your data, which is always a huge drawback), I would point you towards Markov switching models, which can be analysed using the HiddenMarkov package or the HMM package. (Unfortunately the RHmm package that the first link describes is no longer maintained.)
You might find it worthwhile to look into Twitter's outlier detection.
Furthermore, there are many blogposts that look into change point detection or regime changes. I find this R-bloggers blog post very helpful for a start. It refers to the CPM-package, which stands for "Sequential and Batch Change Detection Using Parametric and Nonparametric Methods", the BCP-package ("Bayesian Analysis of Change Point Problems"), and the ECP-package ("Non-Parametric Multiple Change-Point Analysis of Multivariate Data"). You probably want to look into the first two as you don't have multivariate data.
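As a tiny illustration of the change point idea on made-up data (not your blood pressure signal; the bcp package provides the bcp() function used here):
library(bcp)
y <- c(rnorm(200, mean = 80), rnorm(60, mean = 120), rnorm(200, mean = 80))  # fake signal with one 'artefact'
fit <- bcp(y)                              # Bayesian change point analysis
plot(fit)                                  # posterior probability of a change spikes near the artefact boundaries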
Does that help you get started?
I can offer a graphical answer that does not use any statistical algorithm. From your data I observe that the "abnormal" sequences seem to contain either constant portions or, conversely, very high variations. Working on the derivative and setting limits on it could work. Here is a workaround:
library(forecast)
bp <- as.numeric(df2$BP)                    # raw blood pressure signal
smoothed <- as.numeric(ma(bp, order = 50))  # moving average to smooth out micro-variations
smoothed <- smoothed[!is.na(smoothed)]
# flag large jumps in the derivative, then smooth the flags so that a whole
# abnormal block is marked without tiny gaps
jumps <- as.numeric(abs(diff(smoothed)) > 1)
flags <- as.numeric(ma(jumps, order = 10))
abnormal_idx <- c(FALSE, !is.na(flags) & flags > 0.1)   # pad to align with 'smoothed'
abnormal <- smoothed
abnormal[!abnormal_idx] <- NA               # keep only the flagged portions
plot(seq_along(smoothed), smoothed, type = "l")
lines(seq_along(smoothed), abnormal, col = "red")
What it does: it first "smooths" the data with a moving average to prevent the micro-variations from being detected. Then it applies the "diff" function (derivative) and tests whether it is greater than 1 (this value has to be adjusted manually depending on the smoothing amplitude). Then, in order to get a whole "block" of abnormal sequence without tiny gaps, we smooth the boolean flags again and test whether they exceed 0.1, to better capture the boundaries of the zone. Finally, I overplot the spotted portions in red.
This works for one type of abnormality. For the other type (the near-constant portions), you could instead set a low threshold on the derivative and play with the tuning parameters of the smoothing.
I have a problem involving networks.
For one document I am extracting some information, and I am drawing nice graphs for it. But in a document, information flows. I am trying to depict that flow in a graph: the way one reads a text, following along, with the most important entity first and then the next most important one.
To understand and grasp this problem, what kinds of things do I have to study, and which aspect of network theory or graph theory deals with it?
I would be grateful if anyone can kindly point me to references.
Regards,
SK.
First of all, I'm not an expert in linguistics or the study of languages. I think I understand what you're trying to do, but I don't know the best way to do it.
If I got it right, you want to determine some centrality measure for your words (that would explain the social network reference), to find the ones that are most linked to others, is that it?
The problem, if you try that, is that you will almost certainly find that the most central words are the most uninteresting ones (the, if, then, some redundant adjectives...) if you don't apply tokenization and lemmatization beforehand. So you could keep only the nouns and the stems of the verbs used, and only then try your approach.
Another problem that you must keep in mind is that words are important both by their presence and by their rarity (see the tf-idf weighting scheme, for instance).
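To illustrate the centrality idea on tokenized words, here is a toy sketch of my own (a hand-made stop-word list stands in for proper tokenization/lemmatization):
library(igraph)
text <- tolower("the cat sat on the mat and the cat slept")
words <- unlist(strsplit(text, "\\s+"))
words <- words[!words %in% c("the", "and", "on")]          # crude stop-word removal
edges <- cbind(head(words, -1), tail(words, -1))           # adjacent-word co-occurrence pairs
g <- graph_from_edgelist(edges, directed = FALSE)
sort(degree(g), decreasing = TRUE)                         # most "central" words first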
To conclude, I did the following search on Google:
"n gram graph language centrality word"
and found this paper, which seems relevant to what you're asking (I might give it a look myself!):
LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
http://msl.cs.uiuc.edu/rrt/
Can anyone explain how RRT works in simple wording that is easy to understand?
I read the description on the site and on Wikipedia.
What I would like to see is a short implementation of an RRT or a thorough explanation of the following things:
Why does the RRT grow outwards instead of just growing very dense around the center?
How is it different from a naive random tree?
How is the next new vertex that we attempt to reach picked?
I know there is a Motion Strategy Library I could download, but I would much rather understand the idea before delving into the code than the other way around.
The simplest possible RRT algorithm has been so successful because it is pretty easy to implement. Things tend to get complicated when you:
need to visualise planning concepts in more than two dimensions
are unfamiliar with the terminology associated with planning, and
have to wade through the huge number of variants of RRT that have been described in the literature.
Pseudo code
The basic algorithm looks something like this:
1. Start with an empty search tree.
2. Add your initial location (configuration) to the search tree.
3. While your search tree has not reached the goal (and you haven't run out of time):
3.1. Pick a location (configuration), q_r, with some sampling strategy.
3.2. Find the vertex in the search tree closest to that random point, q_n.
3.3. Try to add an edge (path) in the tree between q_n and q_r, if you can link them without a collision occurring.
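To make this concrete, here is a toy 2D RRT sketch in R (unit square, no obstacles; the step size, iteration budget and goal tolerance are arbitrary choices of mine, and a real planner would add a collision check at step 3.3):
rrt <- function(start, goal, n_iter = 500, step = 0.05, tol = 0.05) {
  verts <- matrix(start, nrow = 1)                 # steps 1-2: tree holds the initial configuration
  parent <- NA
  for (i in seq_len(n_iter)) {                     # step 3: grow until the goal is reached or the budget is spent
    q_r <- runif(2)                                # step 3.1: uniform random sample
    d <- sqrt(colSums((t(verts) - q_r)^2))         # distances from all tree vertices to the sample
    nearest <- which.min(d)                        # step 3.2: nearest vertex in the tree
    q_n <- verts[nearest, ]
    q_new <- q_n + min(step, d[nearest]) * (q_r - q_n) / max(d[nearest], 1e-9)  # step 3.3: extend towards q_r
    verts <- rbind(verts, q_new)                   # (collision check would go here)
    parent <- c(parent, nearest)
    if (sqrt(sum((q_new - goal)^2)) < tol) break   # stop once close enough to the goal
  }
  list(vertices = verts, parent = parent)
}
tree <- rrt(start = c(0.1, 0.1), goal = c(0.9, 0.9))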
Although that description is adequate, after a while working in this space, I really do prefer the pseudocode of figure 5.16 on RRT/RDT in Steven LaValle's book "Planning Algorithms".
Tree Structure
The reason that the tree ends up covering the entire search space (in most cases) is the combination of the sampling strategy and always connecting from the nearest point in the tree. This effect is usually described in terms of the Voronoi bias: the vertex nearest to a uniformly drawn sample is most likely to be one with a large Voronoi region, so the tree keeps being pulled into large unexplored areas instead of growing dense around the start.
Sampling Strategy
The choice of where to place the next vertex that you will attempt to connect to is the sampling problem. In simple cases, where the search is low-dimensional, uniform random placement (or uniform random placement biased toward the goal) works adequately. In high-dimensional problems, when motions are very complex (when joints have positions, velocities and accelerations), or when the configuration is difficult to control, sampling strategies for RRTs are still an open research area.
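For instance, in the toy sketch above, biasing the sampler towards the goal (the 5% figure is an arbitrary illustration) only means replacing step 3.1 with something like:
q_r <- if (runif(1) < 0.05) goal else runif(2)     # occasionally sample the goal configuration itself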
Libraries
The MSL library is a good starting point if you're really stuck on implementation, but it hasn't been actively maintained since 2003. A more up-to-date library is the Open Motion Planning Library (OMPL). You'll also need a good collision detection library.
Planning Terminology & Advice
From a terminology point of view, the hard bit is to realise that although lots of the diagrams you see in the (early years of) publications on RRT are in two dimensions (trees that link 2d points), this is the absolute simplest case.
Typically, a mathematically rigorous way to describe complex physical situations is required. A good example of this is planning for a robot arm with n linkages. Describing the pose of the end of such an arm requires a minimum of n joint angles. This minimal set of parameters describing a position is a configuration (or, as some publications call it, a state). A single configuration is often denoted q.
The combination of all possible configurations (or a subset thereof) that can be achieved make up a configuration space (or state space). This can be as simple as an unbounded 2d plane for a point in the plane, or incredibly complex combinations of ranges of other parameters.