R: Interpreting distance classes in correlog {pgirmess}?
I'm using the correlog function in the pgirmess package to get Moran's I over 20-30 distance classes, but am not sure what the unit of measurement is for the output distance classes. The input coordinates are in decimal degrees. The function documentation notes "Distances are euclidian and in the same unit as the spatial coordinates", but I'm still not exactly sure how to interpret the distance classes in the output - are the distance class bins in m, km, degrees, etc? Here's my code:
# longitude = mid.lon
# latitude = mid.lat
# variable of interest for spatial autocorrelation = std_cpue
library(pgirmess)
library(spdep)
df.xy = cbind(data$mid.lon, data$mid.lat)
pgi.cor = correlog(coords=df.xy, z=data$std_cpue, method="Moran", nbclass=30)
With 30 neighborhood classes, the result for the entire dataset looks like this. The distance class given is the midpoint for the bin:
print(pgi.cor)
# Moran I statistic
# dist.class coef p.value n
# [1,] 0.2519862 2.738572e-02 3.911359e-32 698490
# [2,] 0.7559590 -2.650938e-03 9.050678e-01 1084054
# [3,] 1.2599313 6.667723e-05 4.116504e-01 1063526
# [4,] 1.7639037 3.513692e-03 1.228453e-02 884720
# [5,] 2.2678760 2.719341e-03 4.536515e-02 729678
# [6,] 2.7718483 -5.959940e-03 9.988661e-01 690428
# [7,] 3.2758207 3.388526e-03 2.280808e-02 718940
# [8,] 3.7797930 1.443793e-03 1.830925e-01 633504
# [9,] 4.2837653 -4.573091e-04 5.278008e-01 519468
# [10,] 4.7877377 -8.749218e-03 9.999291e-01 397686
# [11,] 5.2917100 2.405016e-03 1.493334e-01 311976
# [12,] 5.7956823 2.089573e-03 2.258621e-01 256072
# [13,] 6.2996547 -1.182670e-03 5.998478e-01 202578
# [14,] 6.8036270 -2.270657e-03 7.158043e-01 166596
# [15,] 7.3075993 -4.629743e-03 9.011101e-01 156026
# [16,] 7.8115716 -3.213096e-03 8.094323e-01 160848
# [17,] 8.3155440 -4.373410e-03 8.707319e-01 163870
# [18,] 8.8195163 -3.356690e-04 5.015126e-01 169376
# [19,] 9.3234886 -4.467592e-03 8.685484e-01 169512
# [20,] 9.8274610 -2.546946e-03 7.127175e-01 150146
# [21,] 10.3314333 1.370106e-02 4.662235e-04 122808
# [22,] 10.8354056 -8.699153e-03 9.719764e-01 109024
# [23,] 11.3393780 -9.322568e-03 9.750500e-01 102748
# [24,] 11.8433503 -2.383252e-03 6.464213e-01 85680
# [25,] 12.3473226 -3.473310e-03 7.210551e-01 85942
# [26,] 12.8512950 2.053248e-03 3.396486e-01 66042
# [27,] 13.3552673 -1.037403e-02 8.547700e-01 32428
# [28,] 13.8592396 -1.033826e-02 6.762256e-01 11012
# [29,] 14.3632120 -3.007297e-02 7.217509e-01 1244
# [30,] 14.8671843 -6.886551e-02 6.864535e-01 154
Reproducible data (only the first 50 rows of the dataset):
> dput(data)
structure(list(mid.lat = c(28.7969496928494, 28.3930867867479,
29.994, 27.4784336939524, 29.422593551961, 28.5826238813314,
28.7477216329144, 29.3433487514478, 29.4226940782315, 29.3535708114362,
28.113333, 28.1130776659231, 28.2415339610655, 29.0009495727289,
29.7557386166675, 30.1020183777123, 28.0200002127096, 28.7864004408834,
30.1284937679637, 29.8328992823496, 28.9037836662043, 29.8021310079424,
28.0232807300034, 28.3553360292622, 29.0875191742967, 29.0220856353549,
27.9313060847168, 28.83, 29.5104509959267, 29.8466720353246,
28.8814346610816, 28.1373531188643, 29.3582385823534, 28.809044113648,
29.3867773013913, 29.4805574724306, 28.465504194069, 28.6696044277849,
29.5300092012194, 28.0430185205882, 28.2061620529272, 29.4275806851126,
26.5081134049796, 28.1275544648238, 29.8995981792495, 27.9848607011733,
26.709333, 28.0248252141179, 27.9728617106042, 28.9710761741436
), mid.lon = c(-84.5963462803782, -90.2686343226641, -87.374667,
-84.7457473224263, -87.9880238574933, -84.8349303764527, -84.6637647705975,
-87.8703015583197, -87.6622139897327, -88.5050810721282, -94.3925,
-90.346370340355, -92.8455008541893, -85.8699396759243, -86.9236199327813,
-86.9270244367842, -84.1683543397277, -89.2031178427517, -86.7908469980617,
-86.7643717886603, -85.819506226643, -86.7113004426214, -95.8135406472186,
-91.6316607122335, -85.2654292446955, -85.3228098920376, -93.9566215033579,
-89.526667, -87.6660902037082, -86.0710278956076, -89.5803704536036,
-90.8071728375477, -85.9890923714648, -84.7585523170688, -86.3493169018374,
-87.9960861956136, -84.266238497227, -84.5619763017653, -87.516209287989,
-91.3888746998191, -90.5451786588464, -87.3552938848394, -82.8477832707687,
-93.3828028011249, -86.2444455292202, -95.0747515699181, -82.891333,
-93.7656918819001, -92.8027598646245, -85.9850645824538), std_cpue = c(4.15234074914,
5.66057254934, 9.18436048054, 57.3175320669, 18.8400703246, 1843.47803667,
2.11506377428, 12.7170026758, 11.1626934066, 8.54011518736, 15.86758562,
13.8956556998, 4.38083061994, 67.7079534217, 5.76247720007, 25.4144340451,
9.46034915015, 14.8236026456, 22.8203364264, 5.79376884735, 89.6224743353,
8.45411201327, 23.9702041714, 13.1097292376, 75.4677852659, 1.56569331032,
44.990410447, 19.7090607295, 18.1197937416, 21.593493236, 46.9911787332,
19.2194902326, 55.782614307, 12.6585921867, 87.6939183102, 7.76649659183,
5.01359412606, 14.7829900356, 28.2493550901, 22.752832268, 7.43168604362,
75.9057643933, 1.18254364377, 5.98151873107, 23.1061861061, 41.3675267384,
11.4449526399, 45.7536886171, 10.6669337284, 66.5718319458)), .Names = c("mid.lat",
"mid.lon", "std_cpue"), row.names = c(1L, 67L, 69L, 536L, 842L,
2203L, 2586L, 2997L, 2998L, 3472L, 3474L, 3475L, 3855L, 4582L,
5084L, 5088L, 5987L, 6776L, 6778L, 7648L, 7651L, 8075L, 8079L,
8086L, 9069L, 9073L, 9080L, 9532L, 10526L, 11307L, 11308L, 11683L,
12082L, 12086L, 12087L, 12094L, 12500L, 12503L, 12505L, 12506L,
12507L, 12994L, 13016L, 13488L, 13497L, 13507L, 13520L, 14605L,
15487L, 15792L), class = "data.frame")
Thanks in advance!
After working with this data and the spdep package a bit more, I believe the distance class variable here is in km. Other functions that take decimal-degree coordinates as inputs also give outputs in km, or Great Circle distances (which are in km). Since the correlog documentation notes that "Distances are euclidian and in the same unit as the spatial coordinates", I'm interpreting this as km. An example in "Applied Spatial Data Analysis with R" by Bivand et al. also indicated that the bins used in correlog are in km.
If you look inside the correlog function, it simply calculates the Euclidean distances between the coordinates you provide and returns the average distance for each of the nbclass bins it generates. In other words, it returns the dist.class values in the same units as the coordinates you supplied.
It makes no special conversion from whatever format you provide it: in the code snippet below, you can see it calculates plain distances for the bins with no unit transformation.
function (coords, z, method = "Moran", nbclass = NULL, ...)
{
coords <- as.matrix(coords)
matdist <- dist(coords)
...
etendue <- range(matdist)
breaks1 <- seq(etendue[1], etendue[2], l = nbclass + 1)
breaks2 <- breaks1 + 0.000001
breaks <- cbind(breaks1[1:length(breaks1) - 1], breaks2[2:length(breaks2)])
...
res <- cbind(dist.class = rowMeans(breaks), coef = mat[,
1], p.value = mat[, 2], n = mat[, 3])
}
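As a quick sanity check (this is a sketch, not part of the original answer), you can compare the range of plain Euclidean distances on the question's df.xy coordinates with the reported dist.class values, and then reproject the points to a metric CRS before calling correlog() to get distance classes in kilometres. The use of EPSG:4326 for the input and EPSG:32616 (UTM zone 16N) for the projection are illustrative assumptions for these Gulf of Mexico coordinates.
library(sf)
library(pgirmess)
# The distances correlog() works with internally, in the input units (decimal degrees here)
range(dist(df.xy))
# Reproject to metres, convert to km, then rerun correlog() so dist.class is in km
pts <- st_as_sf(data, coords = c("mid.lon", "mid.lat"), crs = 4326)  # WGS84 assumed
xy.km <- st_coordinates(st_transform(pts, 32616)) / 1000
pgi.cor.km <- correlog(coords = xy.km, z = data$std_cpue, method = "Moran", nbclass = 30)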
Related
Using terra and sf in R: Why am I getting illogical distance measurements?
I am using terra to get "curvy" distances between points within a bounding polygon and comparing those to straight-line distances that ignore the polygon. The results I'm getting back don't make sense, and I am hoping you all could help me figure out what is going on.
We load the US Congressional map used in the 114th Congress for the state of Texas first:
texas = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/texascongressmaps")
ggplot() + geom_sf(data = texas$geometry)
We also make some storage objects:
longest.dist.district.straight = rep(NA, 36)
longest.dist.district.curved = rep(NA, 36)
Then, we go district by district (n = 36). For each, we take a sample of 100 random points within that district's polygon. Then, we ask "What is the longest straight-line distance between any two of our 100 points?" We then rasterize the polygon, mask it, and go point by point, asking "How far is this point from all others, assuming we cannot travel outside the polygon?" This means we'll have to bend around within the polygon to get between the points some of the time. We find the longest such distance between any two points. We then compare the straight-line and curvy-line approaches, with the assumption that the curvy-line approaches will always be longer by some amount...
for(c in 1:36) { #Texas had 36 districts.
  if(c %% 3 == 0) {print(c)} # Progress bar

  this.district = texas[c, ] #Get the current district

  #We'll get a sample of 100 randomly placed points around the district.
  rand.ptsDistrict = sf::st_sample(this.district, size = 100, type = 'random', exact = TRUE)

  #What's the max straight-line distance between any two points?
  longest.dist.district.straight[c] = max(sf::st_distance(rand.ptsDistrict))

  #Now, calculate our 'as the politician would walk' distances (aka curvy distances).
  #We need to do this for each of our 100 points separately, with each as the target
  #point in turn, and save the longest value we get...
  current.raster = terra::ext(this.district) # Rasterizing
  current.raster = terra::rast(current.raster, nrow=100, ncol=100,
                               crs = crs(this.district), vals = 1)
  current.raster = terra::mask(current.raster, # Masking
                               terra::vect(this.district), updatevalue = NA)
  point.locs = terra::cellFromXY(current.raster, # Getting point locations in the new grid
                                 sf::st_coordinates(rand.ptsDistrict))

  longest.dists.i = rep(NA, 100) # Storage object

  for(i in 1:100) {
    point.i.loc = cellFromXY(current.raster, #Focal point this time.
                             st_coordinates(rand.ptsDistrict[i]))
    point.noni.loc = cellFromXY(current.raster, #All other points
                                st_coordinates(rand.ptsDistrict[-i]))
    terra::values(current.raster)[point.i.loc] = 2 # Make focal point the target value
    all.dists = terra::gridDistance(current.raster, #Get all distances to the target value
                                    target = 2, scale = 1)
    longest.dists.i[i] = max(values(all.dists)[point.noni.loc], na.rm=TRUE) # Longest distance for this focal point
    terra::values(current.raster)[point.i.loc] = 1
  }

  longest.dist.district.curved[c] = max(longest.dists.i) # Longest curved distance between any two points in the current district.
}
When I do this, I always get straight-line distances that are strictly longer than the curvy distances from the same district, which doesn't logically make sense--how could a straight line between two points ever be longer than a curvy line between them?
> (cbind(longest.dist.district.straight, longest.dist.district.curved))
      longest.dist.district.straight longest.dist.district.curved
 [1,]  239285.77  121703.64
 [2,]   63249.88   48238.89
 [3,]   49495.09   24823.91
 [4,]  290542.38  147894.80
 [5,]  213758.13  108663.63
 [6,]  129261.83   68351.77
 [7,]   36705.18   22081.22
 [8,]  165759.58   87749.33
 [9,]   38317.61   19903.54
[10,]  196211.38  100959.66
[11,]  505130.81  261479.58
[12,]   79502.87   45134.11
[13,]  604901.43  313317.24
[14,]  201724.57  115286.81
[15,]  414257.14  208204.75
[16,]   61867.34   32115.77
[17,]  193198.96  103829.75
[18,]   41693.26   26462.02
[19,]  433902.07  225041.00
[20,]   32201.45   17060.41
[21,]  212300.45  119597.54
[22,]   88143.49   46720.59
[23,]  777236.95  394663.54
[24,]   39692.06   21192.98
[25,]  299336.81  153871.46
[26,]   65901.64   35200.83
[27,]  272822.43  158724.70
[28,]  362477.84  205297.74
[29,]   40210.19   30094.43
[30,]   44693.37   23430.33
[31,]   93781.16   50340.85
[32,]   38941.81   21047.40
[33,]   52395.85   31169.46
[34,]  394586.71  206545.50
[35,]  138182.61   73556.10
[36,]  223351.15  112601.38
I can only guess I have either messed up the code somewhere or else have found a bug. Please help! Thanks!
Edit: I just noticed after posting this that it looks like if I were to multiply the curvy distances by 2, I'd get values that were believable (the curvy distances are always longer, but by a variable amount)--but I don't see a coding reason to need to do this... can anyone else see one I'm missing?
You are comparing the shortest distance ("as the crow flies", to those who have not seen crows fly) with the grid distance (move from the center of a cell to the center of a neighboring cell), only allowing use of the grid cells that fall within a district. When I run a condensed version of your code, I see that the distances are very similar, with the grid distance always longer, as it should be, except for district 14, as that district is not contiguous.
library(terra)
#terra 1.6.47
texas <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/texascongressmaps")
tex <- vect(texas)

# generate random points
set.seed(0)
b <- spatSample(tex[, "DISTRICT"], size = 100, method="random", strata=1:nrow(tex))

# max distance between any two random points by district.
pdist <- sapply(tex$DISTRICT, \(i) max( distance( b[b$DISTRICT == i, ])) )

# max grid distance between any two random points by district.
pgrid <- rep(NA, nrow(tex))
for (i in 1:nrow(tex)) {
    r <- rast(tex[i,], nrow=100, ncol=100)
    r <- rasterize(tex[i,], r)
    xy <- crds(b[b$DISTRICT==i, ])
    cells <- cellFromXY(r, xy)
    maxdists <- rep(NA, 100)
    for(j in 1:100) {
        r[cells[j]] <- 2
        dists <- gridDist(r, target=2)
        # Find the longest of these for this point
        maxdists[j] <- max( dists[ cells[-j] ], na.rm=TRUE)
        r[cells[j]] <- 1
    }
    pgrid[i] <- max(maxdists)
}
The results look good:
head(cbind(pdist, pgrid))
#      pdist     pgrid
#1 217746.46 223906.22
#2  61707.87  99422.07
#3  50520.61  51479.98
#4 282744.13 293656.59
#5 196074.08 202014.45
#6 120913.60 126532.72

plot(pdist, pgrid)
abline(0, 1, col="red")
If your results are different, you are perhaps using an older version of "terra"? I assume you are, because you are using gridDistance, which works with a warning because it was renamed to gridDist in the current version.
You use different grid cell sizes for each district. I do not know what your goal is, but it might be more reasonable to use a single template raster for all of Texas. You could do something like
# outside the loop
rr <- rast(tex, res=1/60, vals=1)
# inside the loop
r <- crop(rr, tex[i,], mask=TRUE)
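If you want to confirm which terra build you are running (a small check, not part of the original answer):
packageVersion("terra")
# gridDist() replaced gridDistance() in newer terra releases, per the note above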
Generate n-dim random samples based on empirical distribution and copula
I am given an empirical distribution F_X^emp of a real-valued random variable X. Now suppose X1, ..., Xn have the same distribution as X, with dependencies given by a copula C. I would like to produce random samples of X1, ..., Xn ∈ ℝ. E.g. I am given a vector of samples and the corresponding cdf:
x <- rnorm(1000)
df <- ecdf(x)
Assume that I pick, for example, a t-Student or Clayton copula C. How can I produce random samples of, for example, 10 copies of x, where their dependency is determined by C? Is there an easy way? Or are there any packages that can be used here?
You can sample from the copula (with uniform margins) by using the copula package, and then apply the inverse ecdf to each component:
library(copula)

x <- rnorm(100) # sample of X
d <- 5          # desired number of copies

copula <- claytonCopula(param = 2, dim = d)

nsims <- 25                  # number of simulations
U <- rCopula(nsims, copula)  # sample from the copula (with uniform margins)

# now sample the copies of X ####
Xs <- matrix(NA_real_, nrow = nsims, ncol = d)
for(i in 1:d){
  Xs[,i] <- quantile(x, probs = U[,i], type = 1) # type=1 is the inverse ecdf
}

Xs
#             [,1]       [,2]       [,3]       [,4]         [,5]
#  [1,] -0.5692185 -0.9254869 -0.6821624 -1.2148041 -0.682162391
#  [2,] -0.4680407 -0.4263257 -0.3456553 -0.6132320 -0.925486872
#  [3,] -1.1322063 -1.2148041 -0.8115089 -1.0074435 -1.430405604
#  [4,]  0.9760268  1.2600186  1.0731551  1.2369623  0.835024471
#  [5,] -1.1280825 -0.8995429 -0.5761037 -0.8115089 -0.543125426
#  [6,] -0.1848303 -1.2148041 -0.5692185  0.8974921 -0.613232036
#  [7,] -0.5692185 -0.3070884 -0.8995429 -0.8115089 -0.007292346
#  [8,]  0.1696306  0.4072428  0.7646646  0.4910863  1.236962330
#  [9,] -0.7908557 -1.1280825 -1.2970952  0.3655081 -0.633521404
# [10,] -1.3226053 -1.0074435 -1.6857615 -1.3226053 -1.685761474
# [11,] -2.5410325 -2.3604936 -2.3604936 -2.3604936 -2.360493569
# [12,] -2.3604936 -2.2530003 -1.9311289 -2.2956444 -2.360493569
# [13,]  0.4072428 -0.2150035 -0.3564803 -0.1051930 -0.166434458
# [14,] -0.4680407 -1.0729763 -0.6335214 -0.8995429 -0.899542914
# [15,] -0.9143225 -0.1522242  0.4053462 -1.0729763 -0.158375658
# [16,] -0.4998761 -0.7908557 -0.9813504 -0.1763604 -0.283013334
# [17,] -1.2148041 -0.9143225 -0.5176347 -0.9143225 -1.007443492
# [18,] -0.2150035  0.5675260  0.5214050  0.8310799  0.464151265
# [19,] -1.2148041 -0.6132320 -1.2970952 -1.1685962 -1.132206305
# [20,]  1.4456635  1.0444720  0.7850181  1.0742214  0.785018119
# [21,]  0.3172811  1.2369623 -0.1664345  0.9440006  1.260018624
# [22,]  0.5017980  1.4068250  1.9950305  1.2600186  0.976026807
# [23,]  0.5675260 -1.0729763 -1.2970952 -0.3653535 -0.426325703
# [24,] -2.5410325 -2.2956444 -2.3604936 -2.2956444 -2.253000326
# [25,]  0.4053462 -0.5431254 -0.5431254  0.8350245  0.950891450
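The question also mentions a t-Student copula; the same recipe works with a different copula object. A small sketch (the correlation parameter 0.5 and df = 4 are arbitrary choices for illustration):
library(copula)
copula_t <- tCopula(param = 0.5, dim = d, df = 4)  # t copula instead of Clayton
U_t <- rCopula(nsims, copula_t)                    # uniform-margin sample
Xs_t <- apply(U_t, 2, function(u) quantile(x, probs = u, type = 1))  # inverse ecdf, as above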
Rolling PCA and plotting proportional variance of principal components
I'm using the following code to perform PCA:
PCA <- prcomp(Ret1, center = TRUE, scale. = TRUE)
summary(PCA)
I get the following result:
#Importance of components:
#                          PC1    PC2     PC3     PC4
#Standard deviation     1.6338 0.9675 0.60446 0.17051
#Proportion of Variance 0.6673 0.2340 0.09134 0.00727
#Cumulative Proportion  0.6673 0.9014 0.99273 1.00000
What I would like to do is a rolling PCA for a specific window (e.g. 180 days). The result should be a matrix which shows the evolution of the "Proportion of Variance" of all principal components through time. I tried it with
rollapply(Ret1, 180, prcomp)
but this doesn't work and I have no idea how to save the "Proportion of Variance" for each time step in a matrix. The output matrix should look like this:
#             PC1    PC2     PC3     PC4
#Period 1  0.6673 0.2340 0.09134 0.00727
#Period 2  0.7673 0.1340 0.09134 0.00727
# ....
Here is a mini subset of my data Ret1:
                    Cats          Dogs         Human         Frogs
2016-12-13  0.0084041063  6.518479e-03  6.096295e-04  5.781271e-03
2016-12-14 -0.0035340384 -8.150321e-03  4.418382e-04 -5.978296e-03
2016-12-15  0.0107522782  3.875708e-03 -1.784663e-02  3.012253e-03
2016-12-16  0.0033034130 -1.752174e-03 -1.753624e-03 -4.448850e-04
2016-12-17  0.0000000000  0.000000e+00  0.000000e+00  0.000000e+00
2016-12-18  0.0000000000  0.000000e+00  0.000000e+00  0.000000e+00
2016-12-19  0.0019876743  1.973190e-03 -8.577261e-03  1.996151e-03
2016-12-20  0.0033235161  3.630921e-03 -4.757395e-03  4.594355e-03
2016-12-21  0.0003401156 -2.460351e-03  3.708875e-03 -1.636413e-03
2016-12-22 -0.0010940147 -1.864724e-03 -7.991572e-03 -1.158029e-03
2016-12-23 -0.0005387228  1.250898e-03 -2.843725e-03  7.492594e-04
2016-12-24  0.0000000000  0.000000e+00  0.000000e+00  0.000000e+00
2016-12-25  0.0000000000  0.000000e+00  0.000000e+00  0.000000e+00
2016-12-26  0.0000000000  0.000000e+00  0.000000e+00  0.000000e+00
2016-12-27  0.0019465877  2.245918e-03  0.000000e+00  5.632058e-04
2016-12-28  0.0002396803 -8.391658e-03  8.307552e-03 -5.598988e-03
2016-12-29 -0.0020884556 -2.933868e-04  1.661246e-03 -7.010738e-04
2016-12-30  0.0026172923 -4.647865e-03  9.574997e-03 -2.889166e-03
I tried the following:
PCA <- function(x){
  Output = cumsum((apply((prcomp(x, center = TRUE, scale. = TRUE))$x, 2, var)) / sum(vars))
  return(Output)
}
window <- 10
data <- Ret1
result <- rollapply(data, window, PCA)
plot(result)
#Gives you the Proportion of Variance = cumsum((apply((prcomp(x,center = TRUE, scale. = TRUE))$x, 2, var))/sum(vars))
First, the correct function for your purpose may be written as follows, using the $sdev result of prcomp. I have left out center = TRUE and scale. = TRUE here:
PCA <- function(x){
  oo <- prcomp(x)$sdev
  oo / sum(oo)
}
Now, we can easily use sapply to do the rolling operation:
## for your mini dataset of 18 rows
window <- 10
n <- nrow(Ret1)
oo <- sapply(seq_len(n - window + 1), function (i) PCA(Ret1[i:(i + window - 1), ]))
oo <- t(oo)  ## an extra transposition as `sapply` does `cbind`
#           [,1]      [,2]       [,3]       [,4]
# [1,] 0.5206345 0.3251099 0.12789683 0.02635877
# [2,] 0.5722264 0.2493518 0.14588631 0.03253553
# [3,] 0.6051199 0.1973694 0.16151859 0.03599217
# [4,] 0.5195527 0.2874197 0.16497219 0.02805543
# [5,] 0.5682829 0.3100708 0.09456654 0.02707977
# [6,] 0.5344804 0.3149862 0.08912882 0.06140464
# [7,] 0.5954948 0.2542775 0.10434155 0.04588616
# [8,] 0.5627977 0.2581071 0.13068875 0.04840648
# [9,] 0.6089650 0.2559285 0.11022974 0.02487672
Each column is a PC, while each row gives the proportional variance of each component in that period. To plot the result, you can use matplot:
matplot(oo, type = "l", lty = 1, col = 1:4, xlab = "period", ylab = "proportional variance")
PCs 1-4 are sketched with colours 1:4, i.e., "black", "red", "green" and "blue".
Additional comments:
If you want to use zoo::rollapply, do
oo <- zoo::rollapply(Ret1, window, PCA, by.column = FALSE)
Precisely, I am reporting proportional standard deviation. If you really want proportional variance, change the PCA function to:
PCA <- function(x){
  oo <- prcomp(x)$sdev ^ 2
  oo / sum(oo)
}
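If you want the output labelled like the target matrix sketched in the question, you can set dimnames on the result afterwards (a small addition, not part of the original answer):
rownames(oo) <- paste("Period", seq_len(nrow(oo)))  # one row per rolling window
colnames(oo) <- paste0("PC", seq_len(ncol(oo)))     # one column per component
head(oo)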
Grid Bootstrap - Error in solve.default(t(x) %*% x) : system computationally singular
I'm new to R, so I apologize in advance if my question is a little dumb, but I really need some expert advice. I have a problem applying someone else's code to my data. The author's code works perfectly with the examples he provides, and it seems to me I am doing all the correct steps in my case, but apparently I am not.
The main function is:
grid_boot <- function(dat,name,t,ar,grid,bq,c,all,grph)
and I should simply specify the parameters and then run the code given in the script; at least for the author's example this works. All the specifications in the examples are close to my case, except for the ar parameter. The ar parameter is the autoregressive order of a time series, so it is simply a number from 1 to n that you choose. My series requires a simple ar = 1, but if I run the code with this specification, R gives me back the following error:
Error in solve.default(t(x) %*% x) :
  system is computationally singular: reciprocal condition number = 6.07898e-34
In addition: Warning messages:
1: In dat[(ar - k + 2):(n - k + 1)] - dat[(ar - k + 1):(n - k)] :
  longer object length is not a multiple of shorter object length
2: In dat[(ar - k + 2):(n - k + 1)] - dat[(ar - k + 1):(n - k)] :
  longer object length is not a multiple of shorter object length
(I know there are other posts with this error in the title, but nothing seems to fit my case.)
In the example, the author specifies the following:
orig <- 2    # set to 1 for original data, set to 2 for extended data
# t <- 2
grid <- 200
bq <- 9999
c <- .9
i <- 7
d <- np[i,]
if (orig==1){
  y <- as.matrix(dat[d[1]:(d[2]-18)])
  if (i==4) y <- y[21:82]
}else{
  y <- as.matrix(dat[d[1]:d[2]])
}
name <- "GNP per Capita: 1869-1988"
ar <- d[3]
What I can't figure out is the indication ar <- d[3], and in general what precisely he means by specifying i and d. I think this specification is due to the fact that his dataset is made of several variables all written in the same column, associated with an index. When I give these inputs to R (I use RStudio), ar appears in the environment pane as ar = 1. When I give the numerical input (ar <- 1) for my exercise instead, the only result is the error mentioned above.
Below, I report my data (as you can see, it is only a single series with few observations, so it should be an easy one) and my inputs, while the script, the example and the data for the example are downloadable here. I hope someone can help me figure out what I am doing wrong, and I will be very grateful to anyone who is willing to help a newbie like me.
install.packages("pracma")
library(pracma)

source(file.choose())
dat <- as.matrix(read.csv(file.choose(), header = TRUE))
print(dat)
             USA
 [1,]  0.01075000
 [2,]  0.01116000
 [3,]  0.01214000
 [4,]  0.01309000
 [5,]  0.01668000
 [6,]  0.02991000
 [7,]  0.02776000
 [8,]  0.04218000
 [9,]  0.05415000
[10,]  0.05895000
[11,]  0.04256000
[12,]  0.03306000
[13,]  0.00622000
[14,]  0.11035000
[15,]  0.09132000
[16,]  0.05737000
[17,]  0.06486000
[18,]  0.07647000
[19,]  0.11266000
[20,]  0.13509000
[21,]  0.10316000
[22,]  0.06161000
[23,]  0.03212000
[24,]  0.04317000
[25,]  0.03561000
[26,]  0.01859000
[27,]  0.03741000
[28,]  0.04009000
[29,]  0.04827000
[30,]  0.05398000
[31,]  0.04235000
[32,]  0.03029000
[33,]  0.02952000
[34,]  0.02607000
[35,]  0.02805000
[36,]  0.02931000
[37,]  0.02338000
[38,]  0.01552000
[39,]  0.02188000
[40,]  0.03377000
[41,]  0.02826000
[42,]  0.01586000
[43,]  0.00002270
[44,]  0.02677000
[45,]  0.03393000
[46,]  0.03226000
[47,]  0.02853000
[48,]  0.03839000
[49,] -0.00000356
[50,]  0.00001640
[51,]  0.03157000
[52,]  0.02069000
[53,]  0.01465000
[54,]  0.01622000
[55,]  0.01622000

dat <- dat
name <- "Inflation"
t <- 1
ar <- 1
grid <- 200
bq <- 1999
c <- .9
all <- 0
grph <- 1

out <- grid_boot(dat, name, t, ar, grid, bq, c, all, grph)
Calculating weighted polygon centroids in R
I need to calculate the centroids of a set of spatial zones based on a separate population grid dataset. Grateful for a steer on how to achieve this for the example below. Thanks in advance.
require(raster)
require(spdep)
require(maptools)

dat <- raster(volcano)  # simulated population data
polys <- readShapePoly(system.file("etc/shapes/columbus.shp", package="spdep")[1])

# set consistent coordinate ref. systems and bounding boxes
proj4string(dat) <- proj4string(polys) <- CRS("+proj=longlat +datum=NAD27")
extent(dat) <- extent(polys)

# illustration plot
plot(dat, asp = TRUE)
plot(polys, add = TRUE)
Three steps:
First, find all the cells in each polygon, returning a list of 2-column matrices with the cell number and the value:
require(plyr) # for llply, laply in a bit...

cell_value = extract(dat, polys, cellnumbers=TRUE)
head(cell_value[[1]])
     cell value
[1,]   31   108
[2,]   32   108
[3,]   33   110
[4,]   92   110
[5,]   93   110
[6,]   94   111
Second, turn that into a list of similar matrices but add the x and y coords:
cell_value_xy = llply(cell_value, function(x) cbind(x, xyFromCell(dat, x[,"cell"])))
head(cell_value_xy[[1]])
     cell value        x        y
[1,]   31   108 8.581164 14.71973
[2,]   32   108 8.669893 14.71973
[3,]   33   110 8.758623 14.71973
[4,]   92   110 8.581164 14.67428
[5,]   93   110 8.669893 14.67428
[6,]   94   111 8.758623 14.67428
Third, compute the weighted mean coordinate. This neglects any edge effects and assumes all grid cells are the same size:
centr = laply(cell_value_xy, function(m){
  c(weighted.mean(m[,3], m[,2]), weighted.mean(m[,4], m[,2]))
})
head(centr)
            1        2
[1,] 8.816277 14.35309
[2,] 8.327463 14.02354
[3,] 8.993655 13.82518
[4,] 8.467312 13.71929
[5,] 9.011808 13.28719
[6,] 9.745000 13.47444
Now centr is a 2-column matrix. In your example it's very close to coordinates(polys), so I'd make a contrived example with some extreme weights to make sure it's working as expected.
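For instance, a contrived check along those lines might look like this (a sketch; the choice of cell 10 is arbitrary):
m <- cell_value_xy[[1]]            # cells, values and coordinates for the first polygon
w <- rep(0, nrow(m)); w[10] <- 1   # put all the weight on one cell
c(weighted.mean(m[,"x"], w), weighted.mean(m[,"y"], w))  # weighted centroid...
m[10, c("x", "y")]                 # ...should coincide with that cell's coordinates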
Another alternative. I like it for its compactness, but it will likely only make sense if you're fairly familiar with the full family of raster functions:
## Convert polygons to a raster layer
z <- rasterize(polys, dat)

## Compute weighted x and y coordinates within each rasterized region
xx <- zonal(init(dat, v="x")*dat, z) / zonal(dat, z)
yy <- zonal(init(dat, v="y")*dat, z) / zonal(dat, z)

## Combine results in a matrix
res <- cbind(xx[,2], yy[,2])
head(res)
#          [,1]     [,2]
# [1,] 8.816277 14.35309
# [2,] 8.327463 14.02354
# [3,] 8.993655 13.82518
# [4,] 8.467312 13.71929
# [5,] 9.011808 13.28719
# [6,] 9.745000 13.47444
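As a quick visual check (not part of the original answer), the weighted centroids can be plotted over the polygons alongside the plain polygon centroids returned by coordinates(polys):
plot(polys)
points(res, pch = 19, col = "red")   # population-weighted centroids
points(coordinates(polys), pch = 3)  # unweighted polygon centroids, for comparison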
The answers by Spacedman and Josh are really great, but I'd like to share two other alternatives which are relatively fast and simple.
library(data.table)
library(spatialEco)
library(raster)
library(rgdal)
Using a data.table approach:
# get centroids of raster data
data_points <- rasterToPoints(dat, spatial=TRUE)

# intersect with polygons
grid_centroids <- point.in.poly(data_points, polys)

# calculate weighted centroids
grid_centroids <- as.data.frame(grid_centroids)
w.centroids <- setDT(grid_centroids)[, lapply(.SD, weighted.mean, w=layer),
                                     by=POLYID, .SDcols=c('x','y')]
Using wt.centroid {spatialEco}:
# get a list of the ids from each polygon
poly_ids <- unique(grid_centroids@data$POLYID)

# use lapply to calculate the weighted centroids of each individual polygon
w.centroids.list <- lapply(poly_ids, function(i){
  wt.centroid(subset(grid_centroids, grid_centroids@data$POLYID == i), 'layer', sp = TRUE)
})
My own less elegant solution below. It gives exactly the same results as Spacedman and Josh.
library(dplyr)  # for %>%, mutate, filter, select, bind_rows

# raster to pixels
p = rasterToPoints(dat) %>% as.data.frame()
coordinates(p) = ~ x + y
crs(p) = crs(polys)

# overlay pixels on polygons
ol = over(p, polys) %>%
  mutate(pop = p$layer) %>%
  cbind(coordinates(p)) %>%
  filter(COLUMBUS_ %in% polys$COLUMBUS_) %>%  # i.e. a unique identifier
  dplyr::select(x, y, pop, COLUMBUS_) %>%
  as_data_frame()

# weighted means of x/y values, by pop
pwcs = split(ol, ol$COLUMBUS_) %>%
  lapply(function(g){
    data.frame(x = weighted.mean(g$x, g$pop), y = weighted.mean(g$y, g$pop))
  }) %>%
  bind_rows() %>%
  as_data_frame()