Spatial correlogram - r

I am trying to compute a spatial correlogram for a project looking at deforestation in the Atlantic Forest, Brazil.
However, I am confused as to why I am hitting the problem below.
Problem
When I run the initial part of my code I receive the error
Error: ncol(x) == 2 is not TRUE
My code is
r.nb <- dnearneigh(as.matrix(shapeS$POINT_X,shapeS$POINT_Y),
d1=200, d2=100000, latlong=FALSE)
and then I hope to move on to run this code
p.cor <- sp.correlogram(r.nb, deforestation, order=15,
method="I", randomisation=FALSE)
My data is
A vector data set with the headings
POINTID GRID_CODE POINT_X POINT_Y

You need to use cbind, not as.matrix, or the approach that I show below. Always identify the R packages you are using. You say that your data set is a 'vector data set'. I doubt that; I am assuming it is a matrix.
If it is a matrix, you can do
m <- shapeS[, c('POINT_X', 'POINT_Y')]
library(spdep)
r.nb <- dnearneigh(m, d1=200, d2=100000, latlong=FALSE)
If it is a data.frame, you can do
m <- as.matrix(shapeS[, c('POINT_X', 'POINT_Y')])
or
m <- cbind(shapeS$POINT_X, shapeS$POINT_Y)
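The difference between the two calls can be checked on toy data. This is a minimal sketch: the vectors x and y are made-up stand-ins for shapeS$POINT_X and shapeS$POINT_Y, not the asker's data. as.matrix() only uses its first argument here, so the result has a single column, which is consistent with the ncol(x) == 2 check failing.

```r
# Hypothetical stand-ins for shapeS$POINT_X and shapeS$POINT_Y
x <- c(100, 250, 400)
y <- c(10, 20, 30)

m_wrong <- as.matrix(x, y)  # the second argument is ignored
m_right <- cbind(x, y)      # one column per coordinate

dim(m_wrong)  # 3 rows, 1 column
dim(m_right)  # 3 rows, 2 columns
```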

Related

Using popbio package to calculate a population projection correctly?

I have been working through a population ecology exercise using the popbio package in RStudio that focuses on Leslie matrices. I have successfully created a Leslie matrix with the proper dimensions using the fecundity (mx) and annual survival (sx) values that I calculated with my life table. I am now trying to use the pop.projection function in the popbio package to multiply my Leslie matrix (les.mat) by a starting population vector (N0), followed by the number of time intervals (4 years). It is my understanding that you should be able to take a Leslie matrix and multiply it by a population vector to calculate a population size after a set number of time intervals. Have I done something wrong here? When I try to run my pop.projection line of code I get the following error message in R:
"> projA <- pop.projection(les.mat,N0,10)
Error in A %*% n : non-conformable arguments"
Could the problem be an issue with my pop.projection call? I am thinking it may be an issue with my N0 argument (the population vector). When I look at my N0 values, they seem to have been saved in R as a "numeric" type. Should I be converting N0 into its own matrix, or into its own vector somehow, to get my pop.projection line of code to run? Any advice would be greatly appreciated. The short code I have been using is below!
Sx <- c(0.8,0.8,0.7969,0.6078,0.3226,0)
mx <- c(0,0,0.6,1.09,0.2,0)
Fx <- mx # fecundity values
S <- Sx # dropping the first value
F <- Fx
les.mat <- matrix(rep(0,36),nrow=6)
les.mat[1,] <- F
les.mat
for(i in 1:5){
les.mat[(i+1),i] <- S[i]
}
les.mat
N0 <- c(100,80,64,51,31,10,0)
projA <- pop.projection(les.mat,N0,10)
The function uses matrix multiplication on the first and second arguments, so their dimensions must conform: the number of columns of the matrix has to equal the length of the vector. The les.mat matrix is 6x6, but N0 is length 7. Try
projA <- pop.projection(les.mat, N0[-7], 10) # Delete last value
or
projA <- pop.projection(les.mat, N0[-1], 10) # Delete first value
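The conformability rule can be checked without popbio by doing one projection step by hand with %*%. This is only a sketch of a single time step, not of what pop.projection does internally; dropping the seventh value of N0 (name N0_trim is introduced here for illustration) is an assumption about which entry is the extra one.

```r
# Same Leslie matrix as in the question
Sx <- c(0.8, 0.8, 0.7969, 0.6078, 0.3226, 0)
mx <- c(0, 0, 0.6, 1.09, 0.2, 0)
les.mat <- matrix(0, nrow = 6, ncol = 6)
les.mat[1, ] <- mx                          # fecundities in the first row
for (i in 1:5) les.mat[i + 1, i] <- Sx[i]   # survival on the sub-diagonal

N0 <- c(100, 80, 64, 51, 31, 10, 0)
ncol(les.mat) == length(N0)   # FALSE: 6 columns versus 7 values

# Assumption: the last value is the surplus one
N0_trim <- N0[-7]
N1 <- les.mat %*% N0_trim     # population vector after one time step
dim(N1)                       # 6 rows, 1 column
```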

Using cbind to take values from a dataframe to a matrix to do partition data into test/train for knn

I'm trying to run a knn for prediction, and I'm having trouble partitioning the data in a way that R will accept for the function. It seems to have a problem at the cbind step. There might be additional problems after that, but I haven't gotten past the ### step yet. Here is my code
library(class)
set.seed(1)
a<-runif(10)
b<-runif(10)
c<-rnorm(10)
sample_outcome<-sample(c(0,1), replace=TRUE, size=10)
sample.df<-data.frame(a,b,c,sample_outcome)
stack_smp_size <- floor(0.75 * nrow(sample.df))
sample_train_ind <- sample(seq_len(nrow(sample.df)), size = stack_smp_size)
sample_train <- sample.df[sample_train_ind, ]
sample_test <- sample.df[-sample_train_ind, ]
###Problem here
sample_train_x <-cbind(a,b,c)[sample_train,]
sample_test_x <-cbind(a,b,c)[!sample_train,]
sample_train_outcome<-sample_outcome[sample_train]
sample_knn=knn(sample_train_x,sample_test_x,sample_train_outcome,k=1)
sample_train is a dataframe.
class(sample_train)
#[1] "data.frame"
cbind(a, b, c) gives a matrix. I am not sure what you expect to happen when you subset a matrix with the data frame (sample_train_x <-cbind(a,b,c)[sample_train,]). Subset it with the row indices (sample_train_ind) instead.
I think what you might be looking for is :
library(class)
sample_train_x <- cbind(a,b,c)[sample_train_ind,]
sample_test_x <- cbind(a,b,c)[-sample_train_ind,]
sample_train_outcome <- sample_outcome[sample_train_ind]
sample_knn = knn(sample_train_x,sample_test_x,sample_train_outcome,k=1)
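The split itself can be verified with base R before calling knn. The sketch below mirrors the setup in the question under shortened, hypothetical names (X, train_ind and so on) and only checks the shapes of the pieces; it does not run the classifier.

```r
set.seed(1)
X <- cbind(a = runif(10), b = runif(10), c = rnorm(10))  # predictor matrix
outcome <- sample(c(0, 1), size = 10, replace = TRUE)

# Row indices of the training set (75% of the rows, rounded down)
train_ind <- sample(seq_len(nrow(X)), size = floor(0.75 * nrow(X)))

train_x <- X[train_ind, ]         # positive indices keep those rows
test_x <- X[-train_ind, ]         # negative indices drop those rows
train_outcome <- outcome[train_ind]

dim(train_x)  # 7 rows, 3 columns
dim(test_x)   # 3 rows, 3 columns
```

The training matrix and the training outcome are indexed by the same vector, so their rows stay aligned.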

Loop in R not appending all the values

I have a testing data set of 88 observations. I have built a model and am predicting on this new data set.
Here is the twist. I am adding a new column to it and trying to store the predictions into a dataframe.
My training and test data are all matching.
Now when I execute this loop, I am not getting the desired output.
#creating an EMI vector
em = c(10000,20000,30000,40000,50000,60000,70000,80000,90000,100000)
#my dataframe where i want to store predictions
v <- c()
v <- data.frame(v)
for(i in em){
newdata$EMI.Amount=i
prediction=predict(rf,newdata,type="response")
kl <- table(prediction)
v <- rbind(v,kl)
}
I am getting predictions for only the last EMI value from the vector em,
i.e. for 100000.
Here is the output
I want the output for each em vector i.e the predictions of the binary class to be in the dataframe like this.
You don't need to run a loop.
Instead I would simply run the code
newdata$EMI.Amount <- em
v = predict(rf, newdata, type="response")
I hope that is the answer to your problem.
I am not so sure how your code works as it is not reproducible. Maybe you would like something like this?
#creating an EMI vector
em=c(10000,20000,30000,40000,50000,60000,70000,80000,90000,100000)
#my dataframe where i want to store predictions
v<-c()
v <- data.frame(v)
newdata$EMI.Amount<-em
prediction=predict(rf,newdata,type="response")
kl <- table(prediction)
v <- rbind(v,kl)
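If the goal is one row of class counts per EMI value, another option is to collect the counts in a list and bind them once at the end. The sketch below is an assumption-laden illustration: the model rf from the question is not available, so a glm on simulated data stands in for it, and the class labels "no" and "yes" are invented. Note also that with 88 rows in newdata, assigning the whole 10-element em vector to a column would fail with a replacement-length error, which is why each amount is assigned one at a time here.

```r
set.seed(1)
# Simulated training data and a stand-in model (NOT the asker's rf)
train <- data.frame(EMI.Amount = runif(200, 1e4, 1e5), x = rnorm(200))
score <- train$EMI.Amount / 1e5 + rnorm(200, sd = 0.3)
train$y <- factor(ifelse(score > 0.5, "yes", "no"), levels = c("no", "yes"))
fit <- glm(y ~ EMI.Amount + x, data = train, family = binomial)

newdata <- data.frame(x = rnorm(88))   # 88 test observations, as in the question
em <- seq(10000, 100000, by = 10000)   # EMI values to try

# One vector of class counts per EMI value
rows <- lapply(em, function(amount) {
  newdata$EMI.Amount <- amount         # same amount for every row (local copy)
  p <- predict(fit, newdata, type = "response")
  pred <- factor(ifelse(p > 0.5, "yes", "no"), levels = c("no", "yes"))
  as.vector(table(pred))               # fixed levels keep both counts, even if zero
})

v <- data.frame(EMI.Amount = em, do.call(rbind, rows))
names(v)[2:3] <- c("no", "yes")
v
```

Fixing the factor levels before calling table() keeps every row the same length, which avoids surprises when one class is never predicted for a given amount.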

spBayes spLM function with duplicate coordinates

I am using the spRecover function in package spBayes to produce a spatial univariate model.
Here is a reproducible example in which I made a duplicate coordinate point. The modeling procedure itself executes just fine, but it won't let me recover the spatial effects for each site:
require(spBayes)
set.seed(444)
N = 200
y = rnorm(N,0,100)
x = rnorm(N,2,7)
df <- as.data.frame(cbind((rnorm(N,5,2.5)),rep('location1',N)))
coord <- cbind(runif(N,-30,30),runif(N,-180,180))
coord[2,] <- coord [1,]
n.samples <- 1000
bef.sp <- spLM(y ~ x, ## the equation
data = df, coords=coord, ## data and coordinates
starting=list("phi"=3/200,"sigma.sq"=0.08,"tau.sq"=0.02),## start values
tuning=list("phi"=0.1, "sigma.sq"=0.05, "tau.sq"=0.05), ## tuning values
priors=list("phi.Unif"=c(3/1500, 3/50), "sigma.sq.IG"=c(2, 0.08),"tau.sq.IG"=c(2, 0.02)), ## priors
cov.model="exponential",n.samples=n.samples)
burn.in <- floor(0.75*n.samples)
bef.sp <- spRecover(bef.sp, start=burn.in, thin=2)
The error received is:
Error in spRecover(bef.sp, start = burn.in, thin = 2) :
c++ error: dpotrf failed
I found a post by the package author indicating this error might come up if one has replicated coordinates. I definitely have duplicated coordinates, since many sites were sampled many times (on the same day; this is not a time-series issue). How do I get the model to accept that there is lots of replication within each coordinate pair, and to recover individual spatial effects values for each site?
Thanks!
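As a starting point, the replicated coordinates can at least be located with base R. The sketch below rebuilds the coordinate matrix from the example and flags repeated rows; the jitter at the end is only one possible workaround, its size is arbitrary, and whether it is statistically appropriate for this model (or resolves the dpotrf error) is an assumption that is not confirmed here.

```r
set.seed(444)
N <- 200
coord <- cbind(runif(N, -30, 30), runif(N, -180, 180))
coord[2, ] <- coord[1, ]           # the deliberate duplicate from the example

dup <- duplicated(coord)           # TRUE for rows that repeat an earlier row
sum(dup)                           # number of replicated rows
which(dup)                         # their row indices

# Hypothetical workaround: nudge the replicated rows by a tiny amount
coord_j <- coord
coord_j[dup, ] <- coord_j[dup, ] + runif(2 * sum(dup), -1e-4, 1e-4)
anyDuplicated(coord_j)             # 0 when no repeated rows remain
```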

Error in plot.new() : figure margins too large in R (RGui 64-bit) [duplicate]

I am trying to perform k-means clustering on over a million rows with 4 variables, all numeric. I am using the following code:
kmeansdf<-as.data.frame(rbind(train$V3,train$V5,train$V8,train$length))
km<-kmeans(kmeansdf,2)
As it can be seen, I would like to divide my data into two clusters. The object km is getting populated but I am having trouble plotting the results. Here is the code I am using to plot:
plot(kmeansdf,col=km$cluster)
This piece of code gives me the following error:
Error in plot.new() : figure margins too large
I tried researching online but could not find a solution. I also tried working from the command line but still get the same error (I am using RStudio at the moment).
Any help to resolve the error would be highly appreciated.
TIA.
When I run your code on a df with 1e6 rows, I don't get the same error, but the system hangs (interrupted after 10 min). It may be that creating a scatterplot matrix with 1e6 points per frame is just too much.
You might consider taking a random sample:
# all this to create a df with two distinct clusters
set.seed(1)
center.1 <- c(2,2,2,2)
center.2 <- c(-2,-2,-2,-2)
n <- 5e5
f <- function(x){return(data.frame(V1=rnorm(n,mean=x[1]),
V2=rnorm(n,mean=x[2]),
V3=rnorm(n,mean=x[3]),
V4=rnorm(n,mean=x[4])))}
df <- do.call("rbind",lapply(list(center.1,center.2),f))
km <- kmeans(df,2) # run kmeans on full dataset
df$cluster <- km$cluster # append cluster column to df
# sample is 10% of population (100,000 rows)
s <- 1e5
df <- df[sample(nrow(df),s),]
plot(df[,1:4],col=df$cluster)
Running the same thing with a 1% sample (50,000 rows) gives this.
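One more thing that may be worth checking in the question's code is the use of rbind. Binding four vectors by row gives a table with 4 rows and one column per observation, so plot() on that data frame would attempt a scatterplot matrix with as many panels per side as there are observations, which could plausibly explain the margins error. The sketch below uses a small simulated train data frame (an assumption, not the asker's data) to show the shapes.

```r
set.seed(1)
n <- 1000
# Toy stand-in for the four numeric columns of `train`
train <- data.frame(V3 = rnorm(n), V5 = rnorm(n), V8 = rnorm(n), length = rnorm(n))

by_row <- as.data.frame(rbind(train$V3, train$V5, train$V8, train$length))
by_col <- as.data.frame(cbind(train$V3, train$V5, train$V8, train$length))

dim(by_row)  # 4 rows, 1000 columns: each variable became a row
dim(by_col)  # 1000 rows, 4 columns: one row per observation
```

With the column-bound version, kmeans clusters observations rather than variables, and the scatterplot matrix has 4 panels per side.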
