How to compute a dendrogram for a 1-dimensional data set, such as {1,23,45}, in R

I'm not really sure how to represent the 1-D data set properly in R, so that I will be able to plot a dendrogram.
Please help.
##data set {1,23,45}
##this is what I have done so far, but the dendrogram doesn't seem correct.
data <-c(1,23,45)
datas <-data.frame(data)
d<- dist(datas,method="euclidean")
H.fit<- hclust(d,method="single")
plot(H.fit)

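The plot is correct: all three points are joined at the same height, so no intermediate cluster is visible.
The reason is that you are using single linkage, which defines the distance between two clusters as the minimum distance between their members. Your points are equally spaced (pairwise distances 22, 22 and 44), so after the first pair merges at height 22, the single-linkage distance from that pair to the remaining point is also 22, and both merges appear at the same height.
Try complete linkage, which uses the maximum distance between members and therefore places the second merge at height 44. Your representation of the 1-D data is fine; dist() would also accept the plain vector directly.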
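A short check of the explanation above, using only base R (stats::dist and stats::hclust). It compares the merge heights stored in the fitted objects for the two linkage methods and draws both dendrograms side by side:

```r
# 1-D data from the question; dist() treats a plain vector as an n x 1 matrix
x <- c(1, 23, 45)
d <- dist(x, method = "euclidean")

fit_single   <- hclust(d, method = "single")
fit_complete <- hclust(d, method = "complete")

# Merge heights: single linkage joins the third point at min(22, 44) = 22,
# complete linkage joins it at max(22, 44) = 44
h_single   <- fit_single$height
h_complete <- fit_complete$height
print(h_single)
print(h_complete)

# Side-by-side dendrograms
op <- par(mfrow = c(1, 2))
plot(fit_single, main = "single")
plot(fit_complete, main = "complete")
par(op)
```

With single linkage both heights are 22, which is why the original dendrogram looks flat.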


How to get covariate data from a geographic raster for `ppm`?

I want to fit a Poisson point-process model with spatstat::ppm and I'm unsure of the best way to feed covariate data to the function. I understand that spatstat expects planar coordinates, so I have transformed my point location data to a planar CRS before creating a ppp point pattern object. The covariate data are in a raster stack with unprojected geographic coordinates, and I understand that projecting rasters is generally ill-advised. I extracted covariate values for the point locations from the raster using the points' original geographic coordinates and raster::extract. So far so good. The issue is ...
"It is not sufficient to have observed the covariate only at the points of the data point pattern; the covariate must also have been observed at other locations in the window." (ppm help file)
I appear to have two options for providing the covariate data to the data argument.
A pixel image: this seems ill-advised because of raster projection issues.
A list of functions (one per covariate) that can be evaluated at any location (x,y) to obtain corresponding covariate values. This seems like the way to go, but my attempt at writing such a function turns out to be ridiculously slow. It calls raster::extract for each coordinate pair after transforming the coordinates to the raster's crs. While raster::extract is reasonably fast when given a large number of points, there appears to be a substantial overhead for each call. According to microbenchmark, the coordinate transformation takes about 4ms and the extraction takes about 582ms for a single covariate, or about 4 seconds for each point to get all 7 covariates. I don't know how many times ppm will want to call this, but if it's even once per point in the pattern, it'll take too long.
Is there some way to find out the complete set of points that ppm will query for covariate data, so that I can extract those beforehand with a single call?
It seems like my use case (covariates in a geographic raster) should be pretty common, so I'm guessing there's an established way to do this right. What is it?
Thanks for a well-written question that clearly identifies your need. It would have been even better with a simple reproducible example using, e.g., built-in data from raster and spatstat or artificially generated data. In the absence of a reproducible example, my answer will not contain any code but will outline what you could do.
The first step in ppm is to build a quadrature scheme of class quad or logiquad, depending on which maximum likelihood approximation ppm uses. These can also be generated directly by the user via quadscheme or quadscheme.logi. The quadrature scheme contains all the points where ppm will evaluate the covariates, and you can extract their coordinates using the function coords. If you construct a data.frame with all covariates evaluated at these points, you can supply it as the data argument to ppm, with the quadrature scheme as the first argument. To understand this better, read the Details section of help(ppm.quad).
Another approach, which may make optimal use of your data, is to extract the grid points of your current raster stack together with all the covariate values and project this point data. Then convert it to a simple data.frame with columns x, y, covar1, covar2, etc. You can then use x and y, together with your point observations of interest, to create a quadrature scheme manually, and supply the remaining columns as data to ppm.
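The quadrature-scheme approach could look roughly like this. It is a sketch under assumptions, not the answerer's code: it uses simulated data in the unit square instead of your projected ppp, a hypothetical vectorised function covar_at() standing in for a single batched raster::extract call (in your case you would first transform the quadrature coordinates to the raster's CRS), and it assumes, as the answer describes, that ppm accepts a data frame with one row per quadrature point in the order of union.quad(Q). Check help(ppm.quad) for your spatstat version.

```r
library(spatstat)  # meta-package: spatstat.geom, spatstat.random, spatstat.model

set.seed(1)
X <- rpoispp(100)  # simulated data pattern (stand-in for your projected ppp)

# Hypothetical stand-in for the raster lookup: it must be vectorised,
# taking all x and y coordinates in one call
covar_at <- function(x, y) 2 * x + y

Q  <- quadscheme(X)   # data + dummy points; ppm evaluates covariates only here
qp <- union.quad(Q)   # all quadrature points as a single ppp
xy <- coords(qp)      # data.frame with columns x and y

# One batched covariate lookup for every quadrature point
df <- data.frame(elev = covar_at(xy$x, xy$y))

fit <- ppm(Q, ~ elev, data = df)
print(fit)
```

The point of this design is that the expensive lookup runs once over all quadrature points instead of once per point inside a covariate function.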
It would be interesting to compare the results from both these approaches as well as the results from just projecting the raster stack and converting it to a list of im objects.

r alphahull post-processing/ avoid two hulls

I have a table with coordinates of points and want to get the smallest polygon around them. I tried different functions and so far alphahull works best for my purposes. My main interest is the area of the hull. I have approximately 3500 datasets, so I need a reliable method for my analysis.
I analysed some datasets and realised that in some cases I get a hull in a hull and areahull() is not able to return an area. A higher alpha-value would avoid this but would overestimate my area by far.
Is there a possibility to post-process my alpha-hull to remove the second hull? Or a better method to get the size of the area?
library(alphahull)
tmp <- ahull(path.points.1$x, path.points.1$y, alpha = 50)
plot(tmp, wpoints = FALSE)
Link to example dataset
I found a solution that seems to work for my purposes: the function ahull_track() returns only the boundary, as a geom_path() object. The coordinates of the individual boundary segments are stored in a list. Unfortunately they are not in the correct order, so it is not a straightforward solution: I had to write a function that rearranges the segments into the correct order and generates a polygon.
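The reordering function itself is not shown. A minimal sketch of the idea, assuming the boundary has already been collected into a matrix with one row per segment and columns x1, y1, x2, y2 (converting the ahull_track() output into that form is not shown); chain_segments() and polygon_area() are hypothetical names. Segments are chained by matching endpoints, then the enclosed area is computed with the shoelace formula:

```r
# seg: numeric matrix, one row per boundary segment: x1, y1, x2, y2
# Segments may be in any order and any direction.
chain_segments <- function(seg, tol = 1e-9) {
  n <- nrow(seg)
  used <- logical(n)
  used[1] <- TRUE
  path <- rbind(seg[1, 1:2], seg[1, 3:4])
  for (k in seq_len(n - 1)) {
    tip <- path[nrow(path), ]
    # distance of each segment's start and end point to the current tip
    d_start <- abs(seg[, 1] - tip[1]) + abs(seg[, 2] - tip[2])
    d_end   <- abs(seg[, 3] - tip[1]) + abs(seg[, 4] - tip[2])
    hit <- which(!used & (d_start < tol | d_end < tol))
    if (length(hit) == 0) break  # loop closed or boundary broken
    i <- hit[1]
    used[i] <- TRUE
    # append the far endpoint of the matched segment
    nxt <- if (d_start[i] < tol) seg[i, 3:4] else seg[i, 1:2]
    path <- rbind(path, nxt)
  }
  unname(path)
}

# Shoelace formula; a repeated closing vertex contributes zero
polygon_area <- function(p) {
  x <- p[, 1]
  y <- p[, 2]
  abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2
}

# Unit square, segments shuffled and one reversed
seg <- rbind(c(1, 1, 0, 1), c(0, 0, 1, 0), c(0, 1, 0, 0), c(1, 1, 1, 0))
p <- chain_segments(seg)
polygon_area(p)
```

If the boundary consists of several separate loops (the "hull in a hull" case), this sketch only follows the loop that contains the first segment.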

Graph distances as a dist object

I have a graph with undirected weighted links, and I want to compute the graph distance between all pairs of its nodes. Because it is a large graph, I would like to get the result as a dist object (as opposed to a full, symmetric matrix).
Is there a way to do that with igraph? According to the documentation, it doesn't seem so, but I may be missing something. Obviously, I don't want to get the full symmetric matrix and convert it using as.dist().
Is there any alternative R library that can produce this result?
Thanks.
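No answer is recorded here. One possible workaround, not from the original thread: a dist object stores only the lower triangle, column by column, so the distances from vertex s to vertices s+1, ..., n occupy one contiguous slice. The sketch below fills those slices source by source, so the full n x n matrix is never built. dist_from is a hypothetical callback; with igraph it could wrap something like distances(g, v = s, to = targets), which returns a one-row matrix. Note that this still runs one shortest-path search per source vertex.

```r
# n: number of vertices; dist_from(s, targets) returns distances from s to targets
fill_dist <- function(n, dist_from, labels = NULL) {
  out <- numeric(n * (n - 1) / 2)
  pos <- 1
  for (s in seq_len(n - 1)) {
    targets <- (s + 1):n
    # this slice of a dist vector holds d(s+1, s), ..., d(n, s)
    out[pos:(pos + length(targets) - 1)] <- dist_from(s, targets)
    pos <- pos + length(targets)
  }
  structure(out, Size = n, Labels = labels, Diag = FALSE, Upper = FALSE,
            method = "graph", class = "dist")
}

# Toy check with points on a line instead of a graph
x <- c(1, 23, 45, 50)
d1 <- fill_dist(length(x), function(s, t) abs(x[t] - x[s]))
print(d1)
```

The result can be passed to anything that expects a dist, such as hclust().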

R: Determining whether a point lies inside a region made up of separate polygons generated from contourLines()

I am using the function contourLines() in R to record the vertices of a contour based on a probability density estimate. Then I test whether a point lies inside the contour region. I can do this test easily when contourLines creates only one region (polygon), but sometimes it creates multiple polygons. I am trying to come up with a way to determine whether a point lies inside a contour made up of multiple polygons.
My idea so far is to calculate the number of polygons generated and treat each one separately. I was thinking I could use graph theory to determine the number of polygons generated because there will not be a path between points on 2 separate polygons.
Probably there is an easier way. Any suggestions?
Thanks in advance,
HS
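No answer is recorded here either. A possible simpler route, not from the thread: contourLines() already returns one list element per connected contour line, each with its own x and y vectors, so no graph-theory step is needed to separate the polygons. The sketch tests the point against each element with a ray-casting test and combines the results with the even-odd rule, so a contour nested inside another (a hole) is handled. It assumes the contours are closed, i.e. not cut by the edge of the grid; point_in_poly and in_contour_region are hypothetical helper names.

```r
# Ray casting: count edge crossings of a horizontal ray to the right of (px, py)
point_in_poly <- function(px, py, vx, vy) {
  n <- length(vx)
  j <- c(n, seq_len(n - 1))  # index of the previous vertex
  crosses <- ((vy > py) != (vy[j] > py)) &
    (px < (vx[j] - vx) * (py - vy) / (vy[j] - vy) + vx)
  sum(crosses) %% 2 == 1
}

# Even-odd rule over all polygons returned by contourLines()
in_contour_region <- function(px, py, cl) {
  inside <- vapply(cl, function(p) point_in_poly(px, py, p$x, p$y), logical(1))
  sum(inside) %% 2 == 1
}

# Two separated bumps give two separate contour polygons at level 0.5
x <- y <- seq(0, 1, length.out = 101)
bumps <- function(a, b) {
  exp(-((a - 0.25)^2 + (b - 0.5)^2) / 0.01) +
    exp(-((a - 0.75)^2 + (b - 0.5)^2) / 0.01)
}
z <- outer(x, y, bumps)
cl <- contourLines(x, y, z, levels = 0.5)

length(cl)
in_contour_region(0.25, 0.503, cl)
in_contour_region(0.5, 0.503, cl)
```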

Plotting cosine wave samples in Maple

I'm having trouble with Maple.
I have a cosine wave, which I figured out how to plot, but now I have to take samples
from that wave and plot those (as dots) on top of the original cosine wave.
Here is the question from the assignment:
"Produce the samples from Q1 above and plot the result (plot the points on a plot of the cosine wave - use different colours for both, it will look like a cosine wave with dots on it)"
The problem is that my samples keep appearing as horizontal straight lines at different heights:
http://i197.photobucket.com/albums/aa221/Haseo_Ame/Maple.png
I'm not sure what I'm doing wrong, since I've never used Maple before.
Firstly, try not to build up lists by repeated concatenation (which can incur an O(n^2) cost in resources) when you can use the seq command instead (which can incur an O(n) cost). Whenever you find yourself writing s:=[op(s),...] inside a loop, reconsider.
Next, a point-plot needs pairs of x-y values. Your list is just a collection of scalar values, and hence is being interpreted as a collection of constant functions to be plotted.
The pairs of x-y values can be given as a list of (2-element) lists, such as [[x1,y1],...,[xn,yn]].
It's not clear how you want your x-axis scaled, but you could start off with something like this,
s:=[seq([i, 4*cos(2*Pi*i*70/200+Pi/4)],i=0..20)]:
plot(s, style=point);
# s:=[seq([2*Pi*i*70/200+Pi/4, 4*cos(2*Pi*i*70/200+Pi/4)],i=0..20)]:
P.S. Please post source code as text, not as embedded images, so that anyone trying to help doesn't have to type it all in.
