I am looking for applications of the Visibility Graph. Based on the articles I have read, I have found the following applications of this algorithm:
Robot path planning
Placement of radio antennas
Complex network theory
Regional planning
This algorithm is also used to analyse time series. When analysing a time series with the visibility graph algorithm, a question arose: once we have obtained the graph, what does it actually tell us?
If we take meteorological data and obtain its graph with the Visibility Graph algorithm, from this graph we can obtain statistical properties such as the degree distribution of the network, which follows a power law.
In general, my question is: what information does the graph obtained from a meteorological time series, from medicine purchases over time, or from many other time series actually provide us?
As explained in the paper entitled From time series to complex networks: The visibility graph by Lucas Lacasa, Bartolo Luque, Fernando Ballesteros, Jordi Luque, and Juan Carlos Nuño (2008), the visibility graph of a time series is invariant under translation, rescaling, the addition of a linear trend, and other transformations. It nevertheless captures key features of the series, such as periodicity and self-similarity.
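For anyone who wants to experiment, here is a minimal R sketch of the natural visibility criterion described in that paper (my own illustration, not code from the paper): two points of the series are linked whenever every intermediate sample lies below the straight line joining them.

```r
# Natural visibility graph (Lacasa et al. 2008), naive O(n^2)-per-pair sketch:
# points a and b "see" each other if every c between them satisfies
# y_c < y_b + (y_a - y_b) * (t_b - t_c) / (t_b - t_a).
visibility_graph <- function(y, t = seq_along(y)) {
  n <- length(y)
  adj <- matrix(FALSE, n, n)
  for (a in 1:(n - 1)) {
    for (b in (a + 1):n) {
      if (b == a + 1) {
        adj[a, b] <- TRUE  # consecutive samples always see each other
      } else {
        cs <- (a + 1):(b - 1)
        adj[a, b] <- all(y[cs] < y[b] + (y[a] - y[b]) * (t[b] - t[cs]) / (t[b] - t[a]))
      }
    }
  }
  adj | t(adj)  # undirected adjacency matrix
}

# Example: degree distribution of the visibility graph of white noise
set.seed(1)
g <- visibility_graph(rnorm(200))
deg <- rowSums(g)
plot(table(deg), xlab = "degree", ylab = "count")
```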
My course covers Fourier Series and several other chapters, such as Fourier Integrals and Transforms, Ordinary Differential Equations, and Partial Differential Equations.
I am pursuing a Bachelor's degree in Computer Science & Engineering. Never having been fond of mathematics, I am a little curious to know where this can be useful for me.
The Fourier transform is a brilliant algorithm with a great many use cases; signal processing is the most significant among them.
Here are some use cases:
You can separate a song into its individual frequencies and boost the ones you care about
Used for compression (audio, for instance)
Used to predict earthquakes
Used to analyse DNA
Used to build apps like Shazam, which identify what song is playing
Used in kinesiology to predict muscle fatigue by analysing muscle signals (in short, the frequency variations of the signal can be fed to a machine learning algorithm, which can then predict the type of fatigue, and so on)
I guess this gives you an idea of how important it is.
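To make the first point concrete, here is a small base-R illustration (my own toy example, not something taken from the list above): fft() exposes the individual frequencies of a signal, which you could then boost or filter.

```r
# Toy example: recover the dominant frequencies of a signal with base R's fft().
fs <- 1000                                    # sampling rate in Hz
t  <- seq(0, 1, by = 1 / fs)
x  <- sin(2 * pi * 50 * t) +                  # 50 Hz component
      0.5 * sin(2 * pi * 120 * t) +           # 120 Hz component
      rnorm(length(t), sd = 0.2)              # noise

spec  <- Mod(fft(x))^2                        # power spectrum
freqs <- (seq_along(x) - 1) * fs / length(x)  # frequency axis

keep <- freqs < fs / 2                        # keep frequencies below Nyquist
plot(freqs[keep], spec[keep], type = "l", xlab = "Hz", ylab = "power")
# The peaks near 50 Hz and 120 Hz are the components you could boost or remove.
```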
I recently started working with a huge dataset provided by a medical emergency service. I have about 25,000 spatial points of incidents.
I have been searching books and the internet for quite some time and am getting more and more confused about what to do and how to do it.
The points are, of course, very clustered. I calculated the K, L, and G functions for them, and they confirm serious clustering.
I also have a population point dataset, with one point for every citizen, which is clustered similarly to the incidents dataset (incidents happen to people, so there is a strong link between these two datasets).
I want to compare these two datasets to figure out whether they are similarly distributed. I want to know whether there are places where there are more incidents relative to the population. In other words, I want to use the population dataset to explain the intensity, and then figure out whether the incident dataset corresponds to that intensity. The assumption is that incidents should occur randomly with respect to the population.
I want to get a plot of the region showing where there are more or fewer incidents than expected if incidents happened to people at random.
How would you do it with R?
Should I use Kest or Kinhom to calculate the K function?
I read the description, but I still don't understand the basic difference between them.
I tried using Kcross, but as far as I can tell, one of the two datasets used should be CSR (completely spatially random).
I also found Kcross.inhom, should I use that one for my data?
How can I get a plot (image) of incident deviations relative to the population?
I hope I have asked my questions clearly.
Thank you for taking the time to read my question, and even more thanks if you can answer any of it.
Best regards!
Jernej
I do not have time to answer all your questions in full, but here are some pointers.
DISCLAIMER: I am a coauthor of the spatstat package and the book Spatial Point Patterns: Methodology and Applications with R so I have a preference for using these (and I genuinely believe these are the best tools for your problem).
Conceptual issue: how big is your study region, and does it make sense to treat the points as distributed anywhere in the region, or are they confined to the road network?
For now I will assume they can be located anywhere in the region.
A simple approach would be to estimate the population density using density.ppp and then fit a Poisson model to the incidents with the population density as the intensity, using ppm. This would probably be a reasonable null model, and if it fits the data well you can basically say that incidents happen "completely at random in space when controlling for the uneven population density". More information on density.ppp and ppm is in chapters 6 and 9 of [1], respectively, and of course in the spatstat help files.
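A rough sketch of how this could look in code (hedged: pop_ppp and incidents_ppp are placeholder ppp objects for your population and incident points; in practice you would choose the smoothing bandwidth deliberately):

```r
library(spatstat)

# Placeholder objects: population and incident points observed in the same window
lambda_pop <- density.ppp(pop_ppp)                 # population density surface (pick sigma carefully)

# Null model: incident intensity proportional to the population density
fit <- ppm(incidents_ppp ~ offset(log(lambda_pop)))
summary(fit)

# Smoothed residuals show where incidents exceed or fall short of the null model
diagnose.ppm(fit, which = "smooth")
```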
If you use summary statistics like the K/L/G/F/J-functions, you should always use the inhom versions to take the population density into account. This is covered in chapter 7 of [1].
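Continuing the sketch above (same placeholder objects), the inhomogeneous K-function would take the population-based intensity as its lambda argument, for example:

```r
# Inhomogeneous K-function using the population density as the intensity
Ki <- Kinhom(incidents_ppp, lambda = lambda_pop)
plot(Ki)

# Simulation envelopes under the fitted inhomogeneous Poisson null model
E <- envelope(fit, Kinhom, nsim = 39)
plot(E)
```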
Also, it could probably be interesting to look at the relative risk (relrisk) if you combine all your points into a marked point pattern with two types (background and incidents). See chapter 14 of [1].
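A possible sketch of that idea, again with the placeholder objects from above:

```r
# Combine the two patterns into one marked point pattern and map the relative risk
combined <- superimpose(pop = pop_ppp, incident = incidents_ppp)
rr <- relrisk(combined, relative = TRUE, control = "pop")
plot(rr)   # values above 1 flag areas with more incidents than the population alone suggests
```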
Unfortunately, only chapters 3, 7 and 9 of [1] are available as free-to-download sample chapters, but I hope you have access to it at your library or have the option of buying it.
For my thesis assignment I need to perform a cluster analysis on a high-dimensional dataset containing purchase data from a retail store (over 1000 dimensions). Because traditional clustering algorithms are not well suited to high dimensions (and dimension reduction is not really an option), I would like to try algorithms specifically developed for high-dimensional data (e.g. ProClus).
Here, however, my problem starts.
I have no clue what value I should use for parameter d. Can anyone help me?
This is just one of the many limitations of ProClus.
The parameter d is the average dimensionality of your clusters. ProClus assumes there is a linear cluster somewhere in your data, which will likely not hold for purchase data, but you can try. For sparse data such as purchases, I would rather focus on frequent itemset mining.
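If you go the frequent-itemset route, a hedged sketch with the arules package could look like this (purchase_df is a placeholder for your customers-by-items purchase matrix):

```r
library(arules)

# Convert a binary purchase matrix into transactions
trans <- as(as.matrix(purchase_df) > 0, "transactions")

# Frequent itemsets with at least 1% support
itemsets <- apriori(trans, parameter = list(supp = 0.01, target = "frequent itemsets"))
inspect(sort(itemsets, by = "support")[1:10])

# Or association rules, if co-purchase patterns are what you are after
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5))
inspect(sort(rules, by = "lift")[1:10])
```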
There is no universal clustering algorithm. Any clustering algorithm will come with a variety of parameters that you need to experiment with.
For cluster analysis it is essential that you somehow can visualize or analyze the result, to be able to find out if and how well the method worked.
Please help me solve this homework. I need to draw the ER diagram, with relationships and cardinality.
An environmental agency needs to catalog all the plants in an area that is vulnerable to acid rain. Plants exist in quadrants, and a botanist is responsible for cataloging plants. The data to be stored should include the genus, species, and quantity (in numbers, kg) of the plants, the date of record, the quadrant id, quadrant location, and average altitude of the quadrant, and the botanist's information, such as name.
Before you begin to learn how to draw an ER diagram, you would do well to learn the difference between a relational model of the data and an ER model of the data. Most of the ER diagrams presented here on SO are really diagrams of a relational model.
This may seem overly picky, but the confusion between the two kinds of models slows down beginners enormously. If you have decided on a relational model, and want to use an ERD to depict it, you can do that. But learn how to make a model before you learn how to draw a picture of a model.
I would like for this to become a sign-post for various time series breakout/change/disturbance detection methods in R. My question is to describe the motivation and differences in approaches with each of the following packages. That is, when does it make more sense to use one approach over the other, similarities/differences, etc.
Packages in question:
strucchange (example here)
changepoint (example here)
BreakoutDetection (link includes simple example)
qcc's Control Charts (tutorial here)
bfast
Perhaps (?) to a lesser extent: AnomalyDetection and mvoutlier
I am hopeful for targeted answers. Perhaps a paragraph for each method. It is easy to slap each of these across a time series but that can come at the cost of abusing/violating assumptions. There are resources that provide guidelines for ML supervised/unsupervised techniques. I (and surely others) would appreciate some guidepost/pointers around this area of time-series analysis.
Two very different motivations have led to time-series analysis:
Industrial quality control and the detection of outliers, i.e. detecting deviations from stable noise.
Scientific understanding of trends, where the trends themselves and their determinants are of central importance.
Of course both are, to a large extent, two sides of the same coin, and the detection of outliers can be important for cleaning a time series before analysing trends. I will nevertheless try hereafter to use this distinction as a guiding thread to explain the diversity of packages offered by R to study time series.
In quality control, the stability of the mean and standard deviation is of major importance, as exemplified by the history of one of the first statistical efforts to maintain industrial quality, the control chart. In this respect, qcc is a reference implementation of the most classical quality control charts: Shewhart, CUSUM, and EWMA charts.
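For illustration, a small hedged example of those charts on simulated data (x stands for subgrouped measurements, one row per sample):

```r
library(qcc)
set.seed(1)
x <- matrix(rnorm(100, mean = 10, sd = 1), ncol = 5)  # 20 samples of 5 observations

qcc(x, type = "xbar")   # Shewhart chart of the subgroup means
qcc(x, type = "R")      # range chart
cusum(x)                # cumulative-sum chart
ewma(x)                 # exponentially weighted moving-average chart
```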
The old but still active mvoutlier and the more recent AnomalyDetection focus on outlier detection. mvoutlier mainly uses the Mahalanobis distance and can work with two-dimensional datasets (rasters) and even multi-dimensional datasets, using the algorithm of Filzmoser, Maronna, and Werner (2007). AnomalyDetection uses time series decomposition to identify both local anomalies (outliers) and global anomalies (variations not explained by seasonal patterns).
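A hedged sketch of both packages on placeholder data (series is a numeric vector with a known period, multivar_data a numeric matrix):

```r
library(AnomalyDetection)   # from github.com/twitter/AnomalyDetection
library(mvoutlier)

# AnomalyDetection on a plain numeric series with, say, a period of 24 observations
res <- AnomalyDetectionVec(as.numeric(series), period = 24,
                           max_anoms = 0.02, direction = "both", plot = TRUE)
res$anoms                   # the detected anomalies

# mvoutlier: multivariate outlier detection (Filzmoser, Maronna & Werner algorithm)
out <- pcout(as.matrix(multivar_data))
str(out)                    # inspect the returned outlier weights/flags
```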
Like AnomalyDetection, BreakoutDetection was open-sourced by Twitter in 2014. It intends to detect breakouts in time series, that is, groups of anomalies, using non-parametric statistics. The detection of breakouts comes very close to the detection of trends and the understanding of patterns. In a similar vein, the brca package focuses on the analysis of irregularly sampled time series, particularly to identify behavioural changes in animal movement.
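BreakoutDetection's single entry point is breakout(); a hedged sketch on a placeholder numeric vector series:

```r
library(BreakoutDetection)  # from github.com/twitter/BreakoutDetection

bo <- breakout(series, min.size = 30, method = "multi", beta = 0.001, plot = TRUE)
bo$loc    # estimated breakout locations
bo$plot
```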
Shifting definitely to the detection of changes in trends, changepoint implements multiple (simple) frequentist and non-parametric methods to detect single or multiple breaks in time series trends. strucchange allows you to fit, plot, and test trend changes using regression models. Finally, bfast builds on strucchange to analyse raster time series (e.g. satellite images) and handles missing data.
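To illustrate the difference in interfaces, a small hedged example with a toy series containing a single mean shift:

```r
library(changepoint)
library(strucchange)

set.seed(1)
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 2))   # one shift in the mean

# changepoint: penalised search for shifts in the mean
cpt <- cpt.mean(y, method = "PELT")
cpts(cpt)        # estimated change-point locations
plot(cpt)

# strucchange: breakpoints in a regression model (here just a constant mean)
bp <- breakpoints(y ~ 1)
summary(bp)
plot(bp)
```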