how to draw ellipses without scatterplot in ggplot - r

I am trying to represent niche of species by drawing inertia ellipses. The function to do this in ade4 is niche. Here is an example:
library(ade4)
data(trichometeo)
pca1 <- dudi.pca(trichometeo$meteo, scan = FALSE)
nic1 <- niche(pca1, log(trichometeo$fau + 1), scan = FALSE)
s.distri(dfxy = nic1$ls, dfdistri = eval.parent(as.list(nic1$call)[[3]]))
This graph is not really clear.
PCA is done on environmental variables.
Each point of the PCA is a study site. In each study site, several species have been observed. The ellipses are the niches of each species.
When building the ellipse of one species, each study site (point) is given a weight equal to the relative abundance of the species at that site. The center of gravity of these weighted points is the center of the ellipse, and the size of the ellipse is linked to the variance of the weighted points.
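A minimal base-R sketch of that construction, with hypothetical site scores and abundances. It assumes (not checked against ade4's source) that the ellipse is the weighted covariance ellipse scaled by a factor of 1.5, which as far as I know is the default ellipse size in s.distri:

```r
# Hypothetical inputs: 20 study sites scored on two PCA axes,
# and the abundance of one species at each site
set.seed(1)
sites <- cbind(x = rnorm(20), y = rnorm(20))
abund <- rpois(20, lambda = 3)

w <- abund / sum(abund)              # weight of each site = relative abundance
center <- colSums(sites * w)         # weighted center of gravity
dev <- sweep(sites, 2, center)       # site scores centred on that point
S <- crossprod(dev * sqrt(w))        # weighted covariance matrix (weights sum to 1)

# Map a unit circle through the "square root" of S, then scale it.
# The factor 1.5 is an assumed size, not taken from the question.
theta <- seq(0, 2 * pi, length.out = 100)
e <- eigen(S, symmetric = TRUE)
A <- e$vectors %*% diag(sqrt(e$values))
pts <- center + 1.5 * A %*% rbind(cos(theta), sin(theta))
ell <- data.frame(x = pts[1, ], y = pts[2, ])
```

The resulting `ell` data frame has exactly the x/y shape that geom_polygon expects, so one ellipse per species can be drawn without any scatterplot or grouping factor.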
So there is no scatterplot with a grouping factor that I could pass to stat_ellipse.
Any suggestions on how to do that in ggplot graphics ?
thank you

So, I finally found how to plot ellipses in ggplot. The first part of this answer explains how; the second part describes how to extract the ellipse coordinates from a niche analysis in ade4.
Draw a simple ellipse in ggplot: build a data frame with two columns, x and y, holding the coordinates of points along the ellipse, and draw it with geom_polygon as follows:
> dput(test)
structure(list(x = c(-0.74970124137657, -0.776450364352299, -0.804256933708176,
-0.833011209618567, -0.862599712093033, -0.892905668830007, -0.923809476063724,
-0.955189170585639, -0.986920911077492, -1.01887946685642, -1.05093871210323,
-1.08297212362341, -1.11485328017637, -1.14645636140231, -1.1776566443777,
-1.20833099583969, -1.23835835813684, -1.26762022698836, -1.29600111916637,
-1.32338902825544, -1.34967586669074, -1.37475789233028, -1.39853611787775,
-1.42091670154015, -1.44181131737848, -1.46113750388986, -1.47881898944545,
-1.49478599329964, -1.50897550098285, -1.52133151299083, -1.53180526578912,
-1.5403554242604, -1.54694824483541, -1.55155770866329, -1.55416562429619,
-1.55476169948255, -1.55334358178592, -1.54991686786897, -1.54449508140603,
-1.53709961971128, -1.52775966929336, -1.51651209066957, -1.50340127289419,
-1.48847895837522, -1.47180403867071, -1.45344232207069, -1.43346627388192,
-1.41195473044039, -1.38899258798039, -1.3646704675878, -1.3390843575601,
-1.31233523458437, -1.28452866522849, -1.2557743893181, -1.22618588684363,
-1.19587993010666, -1.16497612287294, -1.13359642835103, -1.10186468785917,
-1.06990613208025, -1.03784688683344, -1.00581347531326, -0.973932318760297,
-0.94232923753436, -0.911128954558971, -0.880454603096979, -0.850427240799828,
-0.821165371948307, -0.792784479770296, -0.765396570681226, -0.739109732245926,
-0.714027706606386, -0.690249481058919, -0.66786889739652, -0.646974281558191,
-0.627648095046803, -0.609966609491222, -0.593999605637033, -0.579810097953819,
-0.567454085945835, -0.556980333147553, -0.548430174676264, -0.541837354101259,
-0.537227890273375, -0.534619974640476, -0.534023899454122, -0.535442017150752,
-0.538868731067695, -0.544290517530637, -0.551685979225388, -0.561025929643302,
-0.572273508267099, -0.585384326042479, -0.600306640561448, -0.616981560265954,
-0.63534327686597, -0.655319325054748, -0.676830868496273, -0.699793010956271,
-0.724115131348859), y = c(0.325013216091984, 0.336960163623126,
0.346538198705152, 0.353709521209382, 0.358445829202159, 0.360728430639646,
0.360548317136793, 0.357906199519309, 0.352812505018361, 0.345287336119057,
0.335360391225119, 0.323070847452856, 0.308467206016976, 0.291607100818459,
0.272557070989873, 0.251392298295821, 0.228196310424862, 0.203060651343884,
0.176084520015889, 0.147374378907005, 0.117043533827776, 0.085211686766882,
0.0520044634820859, 0.0175529177127555, -0.0180069860293719,
-0.0545349090500008, -0.0918866923249894, -0.129914925430158,
-0.168469528302887, -0.207398343539562, -0.246547736891326, -0.285763203588276,
-0.324889978099203, -0.363773644920435, -0.402260747983286, -0.440199396275052,
-0.477439863283444, -0.513835177898732, -0.549241704441568, -0.583519709527388,
-0.616533913530252, -0.648154024469714, -0.678255252213751, -0.7067188009684,
-0.733432338110486, -0.758290437513162, -0.781194995614671, -0.802055618588285,
-0.820789979085438, -0.837324141144163, -0.851592851980555, -0.863539799511696,
-0.873117834593722, -0.880289157097953, -0.885025465090729, -0.887308066528217,
-0.887127953025363, -0.884485835407879, -0.879392140906931, -0.871866972007626,
-0.861940027113689, -0.849650483341425, -0.835046841905545, -0.818186736707027,
-0.799136706878442, -0.77797193418439, -0.754775946313431, -0.729640287232453,
-0.702664155904458, -0.673954014795575, -0.643623169716345, -0.611791322655452,
-0.578584099370656, -0.544132553601326, -0.508572649859198, -0.47204472683857,
-0.434692943563581, -0.396664710458413, -0.358110107585684, -0.31918129234901,
-0.280031898997246, -0.240816432300296, -0.201689657789369, -0.162805990968137,
-0.124318887905287, -0.0863802396135213, -0.0491397726051291,
-0.012744457989841, 0.0226620685529939, 0.0569400736388145, 0.0899542776416779,
0.12157438858114, 0.151675616325177, 0.180139165079826, 0.206852702221912,
0.231710801624588, 0.254615359726098, 0.275475982699712, 0.294210343196865,
0.31074450525559)), .Names = c("x", "y"), row.names = c(NA, -100L
), class = "data.frame")
then just plot the polygon:
library(ggplot2)
ggplot() + geom_polygon(data = test, aes(x = x, y = y))
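Instead of typing the coordinates in, such a data frame can be generated from a center, two semi-axes and a rotation angle. A minimal sketch; the helper `ellipse_df` and its arguments are hypothetical, not part of ggplot2 or ade4:

```r
# Hypothetical helper: n points on an ellipse with centre (cx, cy),
# semi-axes a and b, rotated by `angle` radians
ellipse_df <- function(cx, cy, a, b, angle = 0, n = 100) {
  theta <- seq(0, 2 * pi, length.out = n)
  data.frame(
    x = cx + a * cos(theta) * cos(angle) - b * sin(theta) * sin(angle),
    y = cy + a * cos(theta) * sin(angle) + b * sin(theta) * cos(angle)
  )
}

test2 <- ellipse_df(cx = -1, cy = -0.3, a = 0.6, b = 0.3, angle = pi / 6)
# library(ggplot2)
# ggplot() + geom_polygon(data = test2, aes(x = x, y = y))
```

The first and last rows coincide, so the polygon closes cleanly; increase `n` for a smoother outline.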
For this specific issue, here is how to extract the ellipse coordinates from a niche analysis with ade4.
Plots from ade4 can be stored in an object (this needs the adegraphics package, which makes s.distri return an S4 object):
library(ade4)
library(adegraphics)
data(trichometeo)
pca1 <- dudi.pca(trichometeo$meteo, scan = FALSE)
nic1 <- niche(pca1, log(trichometeo$fau + 1), scan = FALSE)
p1 <- s.distri(dfxy = nic1$ls, dfdistri = eval.parent(as.list(nic1$call)[[3]]))
p1 is an S4 object, and its slots can be accessed with @ as follows:
p1@s.misc$ellipse
This command displays a list containing, for each species:
one vector of x coordinates of the ellipse
one vector of y coordinates
one vector with the coordinates of the axes of the ellipse
To extract these coordinates, use sapply:
listx=sapply(p1@s.misc$ellipse, "[", "x")
listy=sapply(p1@s.misc$ellipse, "[", "y")
then transform them into data frames:
tabx=do.call(data.frame, listx)
taby=do.call(data.frame, listy)
and combine them in one data frame (I use melt from the reshape package to get a long data frame for ggplot):
library(reshape)
tabx.long=melt(tabx)
taby.long=melt(taby)
# melt() names the columns "variable" and "value" in both tables, so name them explicitly
tab.fin=data.frame(species=tabx.long$variable, x=tabx.long$value, y=taby.long$value)
You can then use this data frame with the method explained above, adding group = species to the aesthetics so that each species gets its own ellipse.
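If you would rather not depend on the reshape package, the same long data frame can be built in base R. A sketch, where `ell_list` is a hypothetical stand-in with the same x/y structure as the ellipse list described above:

```r
# Hypothetical stand-in for the per-species ellipse list: x and y vectors per species
theta <- seq(0, 2 * pi, length.out = 50)
ell_list <- list(
  sp1 = list(x = cos(theta), y = 0.5 * sin(theta)),
  sp2 = list(x = 1 + 0.4 * cos(theta), y = 0.8 * sin(theta))
)

# Stack into one long data frame: one row per point, labelled by species
tab.fin <- do.call(rbind, lapply(names(ell_list), function(sp)
  data.frame(species = sp, x = ell_list[[sp]]$x, y = ell_list[[sp]]$y)))

# Draw one filled ellipse per species (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(tab.fin, aes(x = x, y = y, group = species, fill = species)) +
    geom_polygon(alpha = 0.3, colour = "black"))
}
```

The `group` aesthetic is what keeps the points of different species from being joined into a single polygon.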

Related

Exporting a contoured Kernel density estimation plot to raster or shapefile format

I'm trying to perform kernel density estimation in R using some GPS data that I have. My aim is to create a contoured output with each line representing 10% of the KDE. From there I want to import the output (as a shapefile or raster) into either QGIS or ArcMap so I can overlay it on top of existing environmental layers.
So far I have used adehabitatHR to create the following output using the code below:
kud<-kernelUD(locs1[,1], h="href")
vud<-getvolumeUD(kud)
vud <- estUDm2spixdf(vud)
xyzv <- as.image.SpatialGridDataFrame(vud)
contoured<-contour(xyzv, add=TRUE)
Apart from removing the colour, this is how I want the output to appear (or close to it). However, I am struggling to figure out how to export it as either a shapefile or a raster. Any suggestions would be gratefully received.
With the amt package this should be relatively straightforward:
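library(adehabitatHR)
library(sf)
library(amt)
library(raster)    # raster(), used to build the template grid passed to hr_kde()
library(magrittr)  # %>% pipe (may already be available through amt)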
data("puechabonsp")
relocs <- puechabonsp$relocs
hr <- as.data.frame(relocs) %>% make_track(X, Y, name = Name) %>%
hr_kde(trast = raster(amt::bbox(., buffer = 2000), res = 50)) %>%
hr_isopleths(level = seq(0.05, 0.95, 0.1))
# Use the sf package to write a shape file, or any other supported format
st_write(hr, "~/tmp/home_ranges.shp")
Note, it is also relatively easy to plot
library(ggplot2)
ggplot(hr) + geom_sf(fill = NA, aes(col = level))
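For comparison, here is a sketch of the same idea with no home-range packages, swapping in MASS::kde2d for the kernel estimate and base R's contourLines() for the isopleths. The relocations are hypothetical, and the "volume" thresholds are computed by assuming equal grid-cell areas (which holds for kde2d's regular grid). The result is a CSV of contour vertices rather than a shapefile; sf could turn it into lines if needed:

```r
library(MASS)   # kde2d(); MASS ships with R as a recommended package

# Hypothetical relocations
set.seed(1)
x <- rnorm(500, 0, 1)
y <- rnorm(500, 0, 2)
kd <- kde2d(x, y, n = 100)

# Density value whose super-level set holds a fraction p of the total volume
volume_threshold <- function(z, p) {
  zs <- sort(as.vector(z), decreasing = TRUE)
  zs[which(cumsum(zs) / sum(zs) >= p)[1]]
}
probs <- seq(0.1, 0.9, by = 0.1)          # one line per 10% of the volume
thr <- sapply(probs, volume_threshold, z = kd$z)

cl <- contourLines(kd$x, kd$y, kd$z, levels = sort(thr))

# One row per vertex, with the contour-line id and its volume fraction
iso <- do.call(rbind, lapply(seq_along(cl), function(i) {
  data.frame(line = i,
             volume = probs[which.min(abs(thr - cl[[i]]$level))],
             x = cl[[i]]$x, y = cl[[i]]$y)
}))
out <- file.path(tempdir(), "kde_isopleths.csv")
write.csv(iso, out, row.names = FALSE)
```

Larger volume fractions correspond to lower density thresholds, which is why the 90% line is the outermost one.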

How to get a good dendrogram using R

I am using R to do a hierarchical cluster analysis with Ward's method on squared Euclidean distances. I have a matrix with x columns (stations) and y rows (floating-point values); the first row contains the header (the stations' names). I want a readable dendrogram where the station names appear at the bottom of the tree, because otherwise I cannot interpret the result. My aim is to find the stations that are similar. However, with the following code I get numbers (100, 101, 102, ...) for the lower branches.
Yu<-read.table("yu_s.txt",header = T, dec=",")
library(cluster)
agn1 <- agnes(Yu, metric = "euclidean", method="ward", stand = TRUE)
hcd<-as.dendrogram(agn1)
par(mfrow=c(3,1))
plot(hcd, main="Main")
plot(cut(hcd, h=25)$upper,
main="Upper tree of cut at h=25")
plot(cut(hcd, h=25)$lower[[2]],
main="Second branch of lower tree with cut at h=25")
A nice collection of examples is available here (http://gastonsanchez.com/blog/how-to/2012/10/03/Dendrograms.html)
Two methods:
with hclust from base R (method = "ward" has been renamed; "ward.D2" applies Ward's criterion to Euclidean distances)
hc <- hclust(dist(mtcars), method = "ward.D2")
plot(hc)
with ggplot2 and ggdendro
library(ggplot2)
library(ggdendro)
# basic option
ggdendrogram(hc, rotate = TRUE, size = 4, theme_dendro = FALSE)
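The numbers at the leaves come from clustering the rows (observations) rather than the stations: dist() computes distances between rows, and hclust takes its leaf labels from those row names. A sketch, assuming the stations are the columns as the question describes, with a hypothetical random matrix standing in for yu_s.txt (standardisation is left out here):

```r
# Hypothetical stand-in for yu_s.txt: 30 observations (rows) for 6 stations (columns)
set.seed(7)
Yu <- as.data.frame(matrix(rnorm(30 * 6), nrow = 30,
                           dimnames = list(NULL, paste0("station_", 1:6))))

# Transpose so that each station is a row; its name becomes the leaf label
d <- dist(t(Yu))
hc <- hclust(d, method = "ward.D2")
plot(hc, main = "Stations clustered with Ward's method")
```

The same transposition should apply to the agnes call in the question, e.g. agnes(t(Yu), metric = "euclidean", method = "ward"), so that stations rather than rows are clustered.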

How to export data from a Thin Plate Spline (TPS) plot in R language?

I'm a beginner to R and I am trying to extract data in a gridded format from a Thin Plate Spline plot in the R language. Basically I have a data-set of points from across the UK containing the longitude, latitude and amount of rainfall for a particular day. Using the following code I can plot this data onto a UK map:
dat <- read.table("~jan1.csv", header=T, sep=",")
names(dat) <- c("gauge", "date", "station", "mm", "lat", "lon", "location", "county", "days")
library(fields)
quilt.plot(cbind(dat$lon,dat$lat),dat$mm)
world(add=TRUE)
So far so good. I can also perform a thin plate spline interpolation (TPS) using:
fit <- Tps(cbind(dat$lon, dat$lat), dat$mm, scale.type="unscaled")
and then I can do a surface plot at a grid scale of my choice e.g.:
surface (fit, nx=100, ny=100)
This effectively gives me a gridded data plot at the resolution of 100*100. So here are my questions:
How do I extract the data from this gridded data set (i.e. actual values) to put in a file such as excel or .txt?
How could I change the grid size so the grid starts at a particular x value (and y value) in steps of my choice?
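With a predict function available, a typical strategy would be to use something like:
rnglat <- range(dat$lat)
rnglon <- range(dat$lon)
xvals <- seq(rnglon[1], rnglon[2], len=100)
yvals <- seq(rnglat[1], rnglat[2], len=100)
# Column order must match the fit: lon first, then lat
griddf <- expand.grid(lon = xvals, lat = yvals)
griddf$pred <- as.vector(predict(fit, x = as.matrix(griddf)))
# Write the gridded values to a text file that Excel can open
write.csv(griddf, "tps_grid.csv", row.names = FALSE)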
(Since Tps doesn't use a formula interface and predict.Krig doesn't appear to use a newdata argument, I'm not making this in a form that would work for most regression problems.) If you want to narrow the range to something less than the full range or change the number of "grid lines", then modify the seq arguments. (Tested with the fit0-object constructed in the last example on the fields::predict.Krig help page.)
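For the second question (a grid starting at a chosen point with a chosen step), seq() takes by = instead of len =. A sketch in which the origin, extent, step and data are all hypothetical, and loess from base R stands in for the Tps fit (it is NOT a thin plate spline; it is only there so the grid and export steps run without fields):

```r
# Hypothetical grid origin, extent and step (degrees)
x0 <- -6; x1 <- 2
y0 <- 50; y1 <- 58
step <- 0.25
xvals <- seq(from = x0, to = x1, by = step)
yvals <- seq(from = y0, to = y1, by = step)
griddf <- expand.grid(lon = xvals, lat = yvals)

# Stand-in data and model (loess), NOT a thin plate spline
set.seed(42)
dat <- data.frame(lon = runif(200, x0, x1), lat = runif(200, y0, y1))
dat$mm <- 5 + sin(dat$lon) + cos(dat$lat) + rnorm(200, sd = 0.1)
fit <- loess(mm ~ lon + lat, data = dat)

# predict.loess may return an array for expand.grid input; flatten it in grid order
griddf$pred <- as.vector(predict(fit, newdata = griddf))

# Long format: one row per grid cell
out_long <- file.path(tempdir(), "grid_long.csv")
write.csv(griddf, out_long, row.names = FALSE)

# Wide format: rows = longitudes, columns = latitudes
wide <- matrix(griddf$pred, nrow = length(xvals), dimnames = list(xvals, yvals))
write.csv(wide, file.path(tempdir(), "grid_wide.csv"))
```

expand.grid varies its first argument fastest, which is why filling the matrix column by column with nrow = length(xvals) puts each longitude on its own row. Cells outside the data's range may come back as NA with loess.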

plot multiple shp file on a graph using spplot in R

I have 3 shp files representing the house, rooms, and beds of a house respectively. I need to plot them on a graph using R so that they all overlap with each other. I know that with plot I can use lines() to draw new lines on top of the existing plot; is there anything equivalent in spplot? Thanks.
Here's one approach, using the nifty layer() function from the latticeExtra package:
# (1) Load required libraries
library(sp)
library(rgeos) # For its readWKT() function
library(latticeExtra) # For layer()
# (2) Prepare some example data
sp1 = readWKT("POLYGON((0 0,1 0,1 1,0 1,0 0))")
sp2 = readWKT("POLYGON((0 1,0.5 1.5,1 1,0 1))")
sp3 = readWKT("POLYGON((0.5 0,0.5 0.5,0.75 0.5,0.75 0, 0.5 0))")
# spplot provides "Plot methods for spatial data with attributes",
# so at least the first object plotted needs a (dummy) data.frame attached to it.
spdf1 <- SpatialPolygonsDataFrame(sp1, data=data.frame(1), match.ID=1)
# (3) Plot several layers in a single panel
spplot(spdf1, xlim=c(-0.5, 2), ylim=c(-0.5, 2),
col.regions="grey90", colorkey=FALSE) +
layer(sp.polygons(sp2, fill="saddlebrown")) +
layer(sp.polygons(sp3, fill="yellow"))
Alternatively, you can achieve the same result via spplot()'s sp.layout= argument. (Specifying first=FALSE ensures that the 'roof' and 'door' will be plotted after/above the grey square given as spplot()'s first argument.)
spplot(spdf1, xlim=c(-0.5, 2), ylim=c(-0.5, 2),
col.regions="grey90", colorkey=FALSE,
sp.layout = list(list(sp2, fill="saddlebrown", first=FALSE),
list(sp3, fill="yellow", first=FALSE)))
You can use the sp.layout argument in spplot. Alternatively, you can use ggplot2. Some example code (untested):
library(ggplot2)
# fortify() converts each Spatial* object to a data frame with long, lat, id and group columns
shp1_data.frame = fortify(shp1)
shp1_data.frame$layer = "shp1"
shp2_data.frame = fortify(shp2)
shp2_data.frame$layer = "shp2"
shp = rbind(shp1_data.frame, shp2_data.frame)
ggplot(aes(x = long, y = lat, group = interaction(layer, group), col = layer), data = shp) + geom_path()
In ggplot2, columns in the data are mapped to aesthetics in the plot. Here long is the x-coordinate, lat is the y-coordinate, group specifies which polygon a point belongs to (combined with layer, so that polygons from different files never share a group), and col is the colour of the outline. The geometry used is geom_path, which draws each group as a connected line. An alternative is geom_polygon, which also supports filling the polygons.
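The long format described above can be illustrated without any spatial packages. A minimal sketch, assuming the polygon vertices have already been extracted (here they are typed in by hand from the house, roof and door example earlier); the column names `layer` and `group` are my own choices:

```r
# Vertices of the three example polygons (house, roof, door)
house <- data.frame(x = c(0, 1, 1, 0, 0), y = c(0, 0, 1, 1, 0))
roof  <- data.frame(x = c(0, 0.5, 1, 0), y = c(1, 1.5, 1, 1))
door  <- data.frame(x = c(0.5, 0.5, 0.75, 0.75, 0.5), y = c(0, 0.5, 0.5, 0, 0))
layers <- list(house = house, roof = roof, door = door)

# One long data frame: every vertex tagged with its layer and polygon group
shp <- do.call(rbind, lapply(names(layers), function(nm)
  cbind(layers[[nm]], layer = nm, group = paste0(nm, ".1"))))

# Later layers are drawn on top of earlier ones
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(shp, aes(x = x, y = y, group = group, fill = layer)) +
    geom_polygon(colour = "black") + coord_equal())
}
```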

How can I overlay two dense scatter plots so that I can see the outlines of each in R or Matlab?

See this example
This was created in matlab by making two scatter plots independently, creating images of each, then using imagesc to draw them into the same figure, and finally setting the alpha of the top image to 0.5.
I would like to do this in R or matlab without using images, since creating an image does not preserve the axis scale information, nor can I overlay a grid (e.g. using 'grid on' in matlab). Ideally I would like to do this properly in matlab, but would also be happy with a solution in R. It seems like it should be possible but I can't for the life of me figure it out.
So generally, I would like to be able to set the alpha of an entire plotted object (i.e. of a matlab plot handle in matlab parlance...)
Thanks,
Ben.
EDIT: The data in the above example is actually 2D. The plotted points are from a computer simulation. Each point represents 'amplitude' (y-axis) (an emergent property specific to the simulation I'm running), plotted against 'performance' (x-axis).
EDIT 2: There are 1796400 points in each data set.
Using ggplot2 you can plot both data sets with geom_point, coloured by set, and make the points transparent with the alpha parameter. Overlapping transparent points build up opacity, so dense regions look darker, which I think is what you want. This should work, although I haven't run it.
library(ggplot2)
dat = data.frame(x = runif(1000), y = runif(1000), cat = rep(c("A","B"), each = 500))
ggplot(aes(x = x, y = y, color = cat), data = dat) + geom_point(alpha = 0.3)
ggplot2 is awesome!
This is an example of calculating and drawing a convex hull:
library(automap)
library(ggplot2)
library(plyr)
loadMeuse()
theme_set(theme_bw())
meuse = as.data.frame(meuse)
chull_per_soil = ddply(meuse, .(soil),
function(sub) sub[chull(sub$x, sub$y),c("x","y")])
ggplot(aes(x = x, y = y), data = meuse) +
geom_point(aes(size = log(zinc), color = ffreq)) +
geom_polygon(aes(color = soil), data = chull_per_soil, fill = NA) +
coord_equal()
which draws the points together with one convex-hull outline per soil type.
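The per-group hull step does not need plyr. A base-R sketch using split() and lapply() in place of ddply(), with hypothetical points standing in for the meuse data:

```r
# Hypothetical stand-in for meuse: points labelled by soil class
set.seed(3)
pts <- data.frame(x = rnorm(60), y = rnorm(60),
                  soil = rep(c("1", "2", "3"), each = 20))

# chull() returns the indices of the hull vertices; keep those rows per soil class
hulls <- do.call(rbind, lapply(split(pts, pts$soil), function(sub)
  sub[chull(sub$x, sub$y), c("x", "y", "soil")]))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(pts, aes(x = x, y = y, colour = soil)) +
    geom_point() +
    geom_polygon(data = hulls, aes(group = soil), fill = NA) +
    coord_equal())
}
```

chull() returns the vertices in order around the hull, so the rows can be passed straight to geom_polygon.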
You could first export the two data sets as bitmap images, re-import them, and add transparency:
library(grid)
N <- 1e7 # Warning: slow
d <- data.frame(x1=rnorm(N),
x2=rnorm(N, 0.8, 0.9),
y=rnorm(N, 0.8, 0.2),
z=rnorm(N, 0.2, 0.4))
v <- with(d, dataViewport(c(x1,x2),c(y, z)))
png("layer1.png", bg="transparent")
with(d, grid.points(x1, y, vp=v, default.units="native", pch=".", gp=gpar(col="blue")))
dev.off()
png("layer2.png", bg="transparent")
with(d, grid.points(x2, z, vp=v, default.units="native", pch=".", gp=gpar(col="red")))
dev.off()
library(png)
i1 <- readPNG("layer1.png", native=FALSE)
i2 <- readPNG("layer2.png", native=FALSE)
ghostize <- function(r, alpha=0.5)
matrix(adjustcolor(rgb(r[,,1],r[,,2],r[,,3],r[,,4]), alpha.f=alpha), nrow=dim(r)[1])
grid.newpage()
grid.rect(gp=gpar(fill="white"))
grid.raster(ghostize(i1))
grid.raster(ghostize(i2))
You can then add these images as layers in, say, ggplot2 (for example with annotation_raster()).
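Use the transparency (alpha) channel of colour specifications. A colour can be written as a hex string of four two-digit bytes, e.g. muddy <- "#888888FF". The first three pairs set the red, green and blue levels (00 to FF); the final pair sets the opacity, from 00 (fully transparent) to FF (fully opaque), so "#88888880" is roughly half transparent.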
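A base-R sketch of that idea: two dense, hypothetical point clouds drawn with semi-transparent colours on ordinary axes, so the axis scale is preserved and a grid can be overlaid. adjustcolor() appends the alpha byte for you. The sketch writes to pdf() for portability; with millions of points a raster device such as png() keeps the file much smaller:

```r
# Hypothetical data: two dense, overlapping point clouds
set.seed(10)
n <- 2e4
a <- data.frame(x = rnorm(n), y = rnorm(n, 0.8, 0.2))
b <- data.frame(x = rnorm(n, 0.8, 0.9), y = rnorm(n, 0.2, 0.4))

# Low alpha: sparse areas stay faint, dense areas build up colour
col_a <- adjustcolor("blue", alpha.f = 0.05)
col_b <- adjustcolor("red", alpha.f = 0.05)

out <- file.path(tempdir(), "overlay.pdf")
pdf(out)
plot(a$x, a$y, pch = 16, cex = 0.3, col = col_a,
     xlim = range(a$x, b$x), ylim = range(a$y, b$y),
     xlab = "performance", ylab = "amplitude")
points(b$x, b$y, pch = 16, cex = 0.3, col = col_b)
grid()   # grid lines drawn in the same data coordinates as the points
dev.off()
```

With adjustcolor("blue", alpha.f = 0.5) the result is "#0000FF80", i.e. the same hex notation as above with 80 as the alpha byte.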
AFAIK, your best option with Matlab is to just make your own plot function. The scatter plot points unfortunately do not yet have a transparency attribute so you cannot affect it. However, if you create, say, most crudely, a bunch of loops which draw many tiny circles, you can then easily give them an alpha value and obtain a transparent set of data points.
