I am undertaking research looking at the interactions of individual rats with a grid of traps distributed across the landscape (I have x, y coordinates for all trap locations). For each rat, I have generated a kernel utilisation density "home range" estimate using the R package adehabitatHR. What I'd like to do next is the following:
1- For each rat, calculate fine-scale home range contours from 1 - 99%
2- For each trap, calculate the minimum isopleth on which it is located: for example, trap 1 might "first" be on the 20% isopleth, trap 2 might "first" be on the 71% isopleth
My ultimate goal is to use the minimum isopleths calculated in a logistic regression to estimate the probability that a particular rat will "encounter" a particular trap within a specified time period.
Step 1 is easy enough, but I'm having trouble imagining a way to accomplish Step 2 short of plotting it all out manually (possible, but I think there must be a better way). I suspect that part of my problem is that I'm new to both R and the analysis of spatial data, and I'm probably not searching with the right keywords. Of what I've managed to find, the discussion that most closely resembles what I want to do is this:
How can I get the value of a kernel density estimate at specific points?
The above succeeds in calculating the probability value at specific points within a kernel utilisation distribution. However, what I'm trying to do is closer to assigning specific locations to a "category" - e.g. the 5% category, the 22% category, etc.
Here is a small sample of my rat location data (coordinate system NZTM)
RatID Easting Northing
18 1732782.018 5926656.26
18 1732746.074 5926624.161
18 1732775.206 5926617.687
18 1732750.443 5926653.985
18 1732759.188 5926645.705
18 1732765.358 5926624.287
18 1732762.588 5926667.765
18 1732707.336 5926638.793
18 1732759.54 5926693.451
18 1732743.532 5926645.08
18 1732724.905 5926637.952
18 1732729.757 5926594.709
18 1732743.725 5926603.689
18 1732754.217 5926591.804
18 1732733.287 5926619.997
18 1732813.398 5926632.372
18 1732764.513 5926609.795
18 1732756.472 5926607.948
18 1732771.352 5926609.855
18 1732789.088 5926598.158
18 1732768.952 5926620.593
18 1732742.667 5926630.391
18 1732751.399 5926595.63
18 1732749.846 5926624.015
18 1732756.466 5926661.141
18 1732748.507 5926597.018
18 1732782.934 5926620.3
18 1732779.814 5926633.227
18 1732773.356 5926613.596
18 1732755.782 5926627.243
18 1732786.594 5926619.327
18 1732758.493 5926610.918
18 1732760.756 5926617.973
18 1732748.722 5926621.693
18 1732767.133 5926655.643
18 1732774.129 5926646.358
18 1732766.18 5926659.081
18 1732747.999 5926630.82
18 1732755.94 5926606.326
18 1732757.592 5926586.467
And here are the location data for my grid of traps:
TrapNum Easting Northing
HA1 1732789.055 5926589.589
HA2 1732814.738 5926605.615
HA3 1732826.837 5926614.635
HA4 1732853.275 5926621.766
HA5 1732877.903 5926638.804
HA6 1732893.335 5926649.771
HA7 1732917.186 5926651.287
HA8 1732944.25 5926669.952
HA9 1732963.233 5926679.758
HB1 1732778.721 5926613.718
HB2 1732798.169 5926624.735
HB3 1732818.44 5926631.303
HB4 1732844.132 5926647.878
HB5 1732862.387 5926662.465
HB6 1732884.118 5926671.112
HB7 1732903.641 5926681.234
HB8 1732931.883 5926695.332
HB9 1732947.286 5926698.757
HC1 1732766.385 5926629.555
HC2 1732785.31 5926647.128
HC3 1732801.985 5926657.742
HC4 1732835.289 5926664.553
HC5 1732843.434 5926694.72
HC6 1732862.648 5926702.187
HC7 1732878.385 5926709.82
HC8 1732916.886 5926712.215
HC9 1732935.947 5926715.582
HD1 1732755.253 5926654.033
HD2 1732774.911 5926672.812
HD3 1732794.617 5926671.724
HD4 1732820.064 5926689.754
HD5 1732816.794 5926714.769
HD6 1732841.166 5926732.481
HD7 1732865.646 5926734.21
HD8 1732906.592 5926738.893
HD9 1732930.1 5926752.73
Below is the code I used to calculate the 1-99% home range contours with package adehabitatHR (Step 1), followed by the code to plot selected home range isopleths over the grid of traps.
### First, load adehabitatHR and its dependencies
library(adehabitatHR)
## specifying which variables are coordinates converts the data frame into
## class SpatialPointsDataFrame
coordinates(RatLocs) <- c("Easting", "Northing")
## create KUDs (stored in object kudH) using the default bivariate normal
## kernel function and least-squares cross-validation for the smoothing
## bandwidth
kudH <- kernelUD(RatLocs[, 1], h = "LSCV")
kudH
## estimating home range from the KUD - mode VECTOR
homerange <- getverticeshr(kudH)
## calculate home-range area for ALL probability levels (every 1%)
hr1to100 <- kernel.area(kudH, percent = seq(1, 100, by = 1))
## generates an error for the 100% kernel: rerun kernelUD with a larger
## extent parameter. I tried a range of other extent values but couldn't
## get one that worked for a 100% isopleth; 99% works.
hr1to99 <- kernel.area(kudH, percent = seq(1, 99, by = 1))
## An example of calculating and plotting selected home range isopleths
## over the grid of traps
## plot the trap grid
plot(Grid[, 2], Grid[, 3], xlab = "Easting", ylab = "Northing", pch = 3,
     cex = 0.6, col = "black", bty = "n", xlim = c(1742650, 1743100),
     ylim = c(5912900, 5913200), main = "KUD Home Range rat 33")
text(Grid[, 2], Grid[, 3], Grid[, 1], cex = 0.6, pos = 2)
# calculate and plot 95%, 75% and 50% contours for rat ID 33
# (rat 2 in the dataset)
HR95pc <- getverticeshr(kudH)   # default percent is 95
plot(HR95pc[2, ], col = rgb(1, 0, 0, alpha = 0.1), border = "red1", add = TRUE)
HR75pc <- getverticeshr(kudH, percent = 75)
plot(HR75pc[2, ], col = rgb(0, 0, 1, alpha = 0.3), border = "purple", add = TRUE)
HR50pc <- getverticeshr(kudH, percent = 50)
plot(HR50pc[2, ], col = rgb(0, 1, 1, alpha = 0.3), border = "blue2", add = TRUE)
# add individual location points for rat ID 33
rat33L <- RatLocs[RatLocs$RatID == "33", ]
plot(rat33L, pch = 16, col = "blue", add = TRUE)
Can anyone help me get started on Step 2? I'd be grateful for any ideas.
Thanks.
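One possible starting point for Step 2, sketched below and untested: adehabitatHR's getvolumeUD() transforms each UD so that every pixel stores the smallest percentage contour ("volume") that contains it, and sp::over() can then read that value off underneath each trap. Object names (kudH, Grid) follow the code above; taking the first animal in the list and rounding up to a whole percent are illustrative choices.
library(adehabitatHR)   # attaches sp as a dependency
## convert each UD to a "volume" UD: pixel values become the smallest
## percentage contour (isopleth) containing that pixel
vud <- getvolumeUD(kudH)
## traps as SpatialPoints in the same coordinate system (NZTM)
trapPts <- SpatialPoints(Grid[, c("Easting", "Northing")])
## minimum isopleth under each trap, for the first rat in the list
## (loop or lapply over names(vud) for all rats); traps outside the
## estimation grid come back NA - this is where the extent parameter
## mentioned above matters
vol1 <- as(vud[[1]], "SpatialPixelsDataFrame")
trapVol <- over(trapPts, vol1)[, 1]   # continuous volume value, 0-100
minIso <- ceiling(trapVol)            # e.g. 20 = "first on the 20% isopleth"
data.frame(TrapNum = Grid$TrapNum, minIso)
The resulting minimum-isopleth values could then feed directly into the logistic regression as the per-trap predictor.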
Related
I want to identify a couple of points with high leverage on the plot below, but unfortunately their row numbers are illegible: there are several such points and their ids are printed one on top of the other. They are all the way to the right of the plot:
How can the printing of these labels on the plot be resized and spread out so that they become legible?
The easiest way to find Cook's distance is the built-in function:
LM = lm(speed ~ dist, cars)
cooks.distance(LM)
You can pick out whatever values you want:
> which(cooks.distance(LM) > 0.05)
1 2 23 35 39 49
1 2 23 35 39 49
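To come back to the legibility question: one untested sketch is to rebuild the leverage plot by hand and label only the flagged points, with smaller text offset to one side (the 0.05 cutoff is just the example value used above).
idx <- which(cooks.distance(LM) > 0.05)   # points flagged above
plot(hatvalues(LM), rstandard(LM),
     xlab = "Leverage", ylab = "Standardized residuals")
text(hatvalues(LM)[idx], rstandard(LM)[idx], labels = names(idx),
     cex = 0.7, pos = 4)                  # smaller labels, offset to the right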
Here is a plot of several different time series that I made in R:
I made these using a simple loop:
for(i in 1:ngroups){
x[paste0("Group_",i)] = apply(x[,group == i],1,mean)
}
plot(x$Group_1,type="l",ylim=c(0,300))
for(i in 2:ngroups){
lines(x[paste0("Group_",i)],col=i)
}
I also could have made this plot using matplot. Now, as you can see, each group is the mean of several other columns. What I would like to do is plot the series as in the plot above, but additionally show the range of the underlying data contributing to each mean. For example, the purple line would be bounded by a region shaded light purple. At any given time index, the purple region will extend from the lowest to the highest value in the purple group (or, say, from the 5th to the 95th percentile). Is there an elegant/clever way to do this?
Here is an answer using the graphics package (the base graphics that come with R). I also try to explain how the polygon (which is used to draw the CI) is created. This can be repurposed to solve your problem, for which I do not have the exact data.
# Values for noise and CI size
s.e. <- 0.25 # standard error of noise
interval <- s.e.*qnorm(0.975) # standard error * 97.5% quantile
# Values for Fake Data
x <- 1:10 # x values
y <- (x-1)*0.5 + rnorm(length(x), mean=0, sd=s.e.) # generate y values
# Main Plot
ylim <- c(min(y)-interval, max(y)+interval) # account for CI when determining ylim
plot(x, y, type="l", lwd=2, ylim=ylim) # plot x and y
# Determine the x values that will go into CI
CI.x.top <- x # x values going forward
CI.x.bot <- rev(x) # x values backwards
CI.x <- c(CI.x.top, CI.x.bot) # polygons are drawn clockwise
# Determine the Y values for CI
CI.y.top <- y+interval # top of CI
CI.y.bot <- rev(y)-interval # bottom of CI, but rev Y!
CI.y <- c(CI.y.top,CI.y.bot) # forward, then backward
# Add a polygon for the CI
CI.col <- adjustcolor("blue",alpha.f=0.25) # Pick a pretty CI color
polygon(CI.x, CI.y, col=CI.col, border=NA) # draw the polygon
# Point out path of polygon
arrows(CI.x.top[1], CI.y.top[1]+0.1, CI.x.top[3], CI.y.top[3]+0.1)
arrows(CI.x.top[5], CI.y.top[5]+0.1, CI.x.top[7], CI.y.top[7]+0.1)
arrows(CI.x.bot[1], CI.y.bot[1]-0.1, CI.x.bot[3], CI.y.bot[3]-0.1)
arrows(CI.x.bot[6], CI.y.bot[6]-0.1, CI.x.bot[8], CI.y.bot[8]-0.1)
# Add legend to explain what the arrows are
legend("topleft", legend="Arrows indicate path\nfor drawing polygon", xjust=0.5, bty="n")
And here is the final result:
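Adapting the same polygon idea to the group-range case in the question might look like the untested sketch below; x, group and ngroups are assumed to be as in the question's loop, with the raw series in the columns of x flagged by group, and the 5th/95th percentile band is one of the options the question mentions.
plot(x$Group_1, type = "l", ylim = c(0, 300))
for (i in 1:ngroups) {
  raw <- x[, group == i]                        # raw columns behind group i (assumed layout)
  lo  <- apply(raw, 1, quantile, probs = 0.05)  # lower edge of the band
  hi  <- apply(raw, 1, quantile, probs = 0.95)  # upper edge of the band
  tt  <- seq_along(lo)                          # time index
  polygon(c(tt, rev(tt)), c(lo, rev(hi)),       # forward, then backward
          col = adjustcolor(i, alpha.f = 0.25), border = NA)
  lines(x[[paste0("Group_", i)]], col = i)      # mean line on top
}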
I have made a df using some random data.
Here's the df
df
x y
1 1 3.1667912
2 1 3.5301539
3 1 3.8497014
4 1 4.4494311
5 1 3.8306889
6 1 4.7681518
7 1 2.8516945
8 1 1.8350802
9 1 5.8163498
10 1 4.8589443
11 2 0.3419090
12 2 2.7940851
13 2 1.9688636
14 2 1.3475315
15 2 0.9316124
16 2 1.3208475
17 2 3.0367743
18 2 3.2340156
19 2 1.8188969
20 2 2.5050162
When you plot using stat_summary with mean_cl_normal and geom = "smooth":
ggplot(df,aes(x=x,y=y))+geom_point() +
stat_summary(fun.data=mean_cl_normal, geom="smooth", colour="red")
As someone commented, maybe mean_cl_boot is better, so I used it:
ggplot(df,aes(x=x,y=y))+geom_point() +
stat_summary(fun.data=mean_cl_boot, geom="smooth", colour="red")
They are indeed a little different. You could also adjust the conf.int argument (passed via fun.args) depending on your needs.
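For example, the confidence level can be passed through fun.args (this assumes Hmisc is installed, since it supplies mean_cl_normal and mean_cl_boot):
ggplot(df, aes(x = x, y = y)) + geom_point() +
  stat_summary(fun.data = mean_cl_boot, fun.args = list(conf.int = 0.99),
               geom = "smooth", colour = "red")  # 99% interval instead of the default 95%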
Assuming I have the posterior samples for each of the four parameters. My question is how to plot the pairwise marginal distribution on a grid of 4*4=16 with ggplot2?
I would like to create a plot like the picture below, but with pairwise marginal distributions in place of the scatter plots, organized in the same kind of grid.
I am wondering: can the ggmcmc package achieve my goal?
Thanks in advance, guys!!
After getting help from the previous comments, I post the code below in case other people would like to do the same thing as me.
Below is a simple dataset I created for demonstration: a data frame df with four variables x, y, z and w, for which we want the pairwise joint kernel density estimates. One easy way, based on the comments by user20650, is ggpairs from the GGally package. The code is below; it creates the following plot:
library(GGally)
ggpairs(df, upper = list(continuous = "density"),
        lower = list(combo = "facetdensity"))
x y z w
1 0.49916998 -0.07439680 0.37731097 0.0927331640
2 0.25281542 -1.35130718 1.02680343 0.8462638556
3 0.50950876 -0.22157249 -0.71134553 -0.6137126948
4 0.28740609 -0.17460743 -0.62504812 -0.7658094835
5 0.28220492 -0.47080289 -0.33799637 -0.7032576540
6 -0.06108038 -0.49756810 0.49099505 0.5606988283
7 0.29427440 -1.14998030 0.89409384 0.5656682378
8 -0.37378096 -1.37798177 1.22424964 1.0976507702
9 0.24306941 -0.41519951 0.17502049 -0.1261603208
10 0.45686871 -0.08291032 0.75929106 0.7457002259
11 -0.16567173 -1.16855088 0.59439600 0.6410396945
12 0.22274809 -0.19632766 0.27193362 0.5532901113
13 1.25555629 0.24633499 -0.39836999 -0.5945792966
14 1.30440121 0.05595755 1.04363679 0.7379212885
15 -0.53739075 -0.01977930 0.22634275 0.4699563173
16 0.17740551 -0.56039760 -0.03278126 -0.0002523205
17 1.02873716 0.05929581 -0.74931661 -0.8830775310
18 -0.13417946 -0.60421101 -0.24532606 -0.1951831558
19 0.11552305 -0.14462104 0.28545703 -0.2527437818
20 0.71783902 -0.12285529 1.23488185 1.3224880574
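As for the ggmcmc question above: a hedged sketch, assuming the posterior samples are available as a coda mcmc.list called fit (a hypothetical object name). ggmcmc's ggs_pairs() wraps GGally::ggpairs for exactly this kind of grid.
library(ggmcmc)
S <- ggs(fit)   # `fit` is assumed; ggs() reshapes the samples to long format
ggs_pairs(S, lower = list(continuous = "density"))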
Not sure whether this should go on Cross Validated or not, but we'll see. Basically, I recently obtained data from an instrument (masses of compounds from 0 to 630), which I binned into 0.025-unit bins before plotting the histogram seen below:
I want to identify the bins that have high frequency and stand out against the background noise (the background noise increases as you move from right to left on the x-axis). Imagine drawing a curved line on top of the points that have almost blurred together into a black lump, and then selecting the bins that sit above that curve for further investigation - that's what I'm trying to do. I plotted a kernel density estimate to see if I could overlay it on my histogram and use it to identify points lying above it. However, the density plot makes no headway with this, as the density values are too low (see the second plot). Does anyone have any recommendations on how I can go about solving this problem? The blue line represents the overlaid density function and the red line represents the ideal solution (I need a way of somehow automating this in R).
The data below are only part of my dataset, so they're not really a good representation of my plot (which contains about 300,000 points), and since my bin size is quite small (0.025) there's a huge spread of data (about 25,000 bins in total).
df <- read.table(header = TRUE, text = "
values
1 323.881306
2 1.003373
3 14.982121
4 27.995091
5 28.998639
6 95.983138
7 2.0117459
8 1.9095478
9 1.0072853
10 0.9038475
11 0.0055748
12 7.0964916
13 8.0725191
14 9.0765316
15 14.0102531
16 15.0137390
17 19.7887675
18 25.1072689
19 25.8338140
20 30.0151683
21 34.0635308
22 42.0393751
23 42.0504938
")
bin <- seq(0, 324, by = 0.025)
hist(df$values, breaks = bin, prob=TRUE, col = "grey")
lines(density(df$values), col = "blue")
Assuming you're dealing with a vector bin.densities that has the densities for each bin, a simple way to find outliers would be:
look at a window around each bin, say +- 50 bins
current.bin <- 1
window.size <- 50
# note the parentheses: `:` binds tighter than `-` and `+` in R
window <- bin.densities[(current.bin - window.size):(current.bin + window.size)]
find the upper and lower quantile values, say 95% and 5% (or really any values you think work)
lower.quant <- quantile(window, 0.05)
upper.quant <- quantile(window, 0.95)
then say that the current bin is an outlier if it falls outside your quantile range.
this.is.too.high <- (bin.densities[current.bin] > upper.quant)
this.is.too.low <- (bin.densities[current.bin] < lower.quant)
#final result
this.is.outlier <- this.is.too.high | this.is.too.low
I haven't actually tested this code, but this is the general approach I would take. You can play around with the window size and the quantile percentages until the results look reasonable. Again, not exactly super-complex math, but hopefully it helps.
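Wrapped in a loop over every bin, with the window clipped at the edges of the vector, the same idea might look like this (still untested; bin.densities as above):
window.size <- 50
n <- length(bin.densities)
this.is.outlier <- logical(n)
for (current.bin in seq_len(n)) {
  lo <- max(1, current.bin - window.size)   # clip the window at the vector edges
  hi <- min(n, current.bin + window.size)
  window <- bin.densities[lo:hi]
  this.is.outlier[current.bin] <-
    bin.densities[current.bin] > quantile(window, 0.95) ||
    bin.densities[current.bin] < quantile(window, 0.05)
}
which(this.is.outlier)   # bins standing out from the local background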
I have a number of coordinates and I want to plot them in a gridded interface using R.
The problem is that the relative distances between observations are large. The coordinates are in a geographic coordinate system and the study area is Switzerland. Moreover, the ids of the points need to be plotted.
Two clusters of the points are dense, while some other points are separated by large distances. How can I plot them in a way that gives a readable presentation? Any suggestions for plotting the data?
Preferably not with ggplot, as I have used it before and it did not produce proper results.
Data:
id x y
2 7.1735 45.86880001
3 7.17254 45.86887001
4 7.171636 45.86923601
5 7.18018 45.87158001
6 7.17807 45.87014001
7 7.177229 45.86923001
8 7.17524 45.86808001
9 7.181409 45.87177001
10 7.179299 45.87020001
11 7.178359 45.87070001
12 7.175189 45.86974001
13 7.179379 45.87081001
14 7.175509 45.86932001
15 7.176839 45.86939001
17 7.18099 45.87262001
18 7.18015 45.87248001
19 7.18122 45.87355001
20 7.17491 45.86922001
25 7.15497 45.87058001
28 7.153399 45.86954001
29 7.152649 45.86992001
31 7.154419 45.87004001
32 7.156099 45.86983001
GSBi_1 7.184 45.896
GSBi__1 7.36 45.901
GSBj__1 7.268 45.961
GSBj_1 7.276 45.836
GSB 7.272 45.899
GSB_r 7.166667 45.866667
Location of points:
As you can see in the plot, the points' ids are not readable both for the dense parts and others.
Practically, it is not always possible to ensure that all points are visually separable on screen when a set of points contains both very close and very far points at the same time.
Think of a 1000x800 pixel screen, and say we have three points A, B and C on the same horizontal line, such that the distance between A and B is 1 unit and the distance between A and C is 4000 units.
If you map this maximum distance (4000 units) to the width of the screen (1000 px), then one pixel corresponds to 4 horizontal units. That means A and B will fall within the same pixel, since the distance between them is only 1 unit, so they will not be visually separable on the screen.
Your points are far too close to really do too much with, but an idea might be spread.labels from plotrix:
library(plotrix)
opar <- par(xpd = TRUE)   # allow labels to extend outside the plot region
plot(dat$x, dat$y)
spread.labels(dat$x, dat$y, dat$id)
par(opar)
You may want to consider omitting all the numerical labels and placing them in a different graph.