Computing Euclidean distance whilst holding point A constant and changing point B in R

I am currently working on a project for which I am interested in calculating the distance between the location of a basketball player and the ball during an event.
To do this I created the following function:
## Euclidean distance
distance <- function(x, y){
  x2 <- (x[i] - x[j])^2
  y2 <- (y[i] - y[j])^2
  dis <- sqrt(x2 + y2)
}
What I want to achieve is to calculate the distance between the basketball and each of the players, and then repeat this process for each time frame of data I have. So for each time frame x[1] and y[1] (the ball's location) would have to stay constant whilst x[j] and y[j] keep going from 2 to 11. I thought of this nested for loop, but it is giving me a constant result of 28.34639. I added a link to an image of a sample of my data frame: Data Frame Sample
for (i in i:length(all.movement$x_loc)) {
  for (j in j:11) {
    all.movement$distance[j] <- distance(all.movement$x_loc, all.movement$y_loc)
  }
  i <- i + 11
}
I would really appreciate some help with this problem.

I'd go about it like this:
set.seed(101)
x <- rnorm(30, 10, 5)   # x coordinate
y <- rnorm(30, 15, 7)   # y coordinate
df <- data.frame(x, y)  # sample data frame
# the calculation is vectorised over all rows, so no loop is needed
df$distance <- sqrt((x - 5)^2 + (y + 4)^2)  # assume basket coordinates (5, -4)
df                      # output
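That keeps the reference point fixed at the basket. To follow the question's actual setup, where the reference point is the ball and it changes every frame, a minimal sketch could look like the following. It assumes each frame occupies 11 consecutive rows of all.movement with the ball in the first row of the frame; the frame.id helper and the toy data are illustrative, not from the question.
# toy stand-in for the question's data: 3 frames x 11 rows (ball first, then 10 players)
set.seed(1)
all.movement <- data.frame(x_loc = runif(33, 0, 94), y_loc = runif(33, 0, 50))
n <- nrow(all.movement)
frame.id <- rep(seq_len(n / 11), each = 11)   # hypothetical frame index, 11 rows per frame
all.movement$distance <- ave(seq_len(n), frame.id, FUN = function(idx) {
  x <- all.movement$x_loc[idx]
  y <- all.movement$y_loc[idx]
  sqrt((x - x[1])^2 + (y - y[1])^2)           # distance of every row to the ball (row 1 of its frame)
})
The ball's own row gets distance 0, and each player's row gets its distance to the ball within that frame.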

Related

Min values by row and column of a distance matrix "without replacement"

I have a distance matrix. I would essentially like to perform the equivalent of matching without replacement on the minimum value by row and column.
Here is some reproducible code.
library(magrittr)  # provides the %>% pipe used below

x1 <- c(runif(100, 0, 5), runif(100, 1, 6))
x2 <- c(runif(100, 0, 5), runif(100, 1, 6))
t <- c(rep("C", 100), rep("T", 100))
t.num <- ifelse(t == "C", 0, 1)
y <- 2 * ifelse(t == "C", 0, 1) + x1 + x2 + rnorm(200)
x <- list(x1 = x1, x2 = x2, t = t, t.num = t.num, y = y) %>% as.data.frame()
X <- x[, 1:2]
# calculate the Mahalanobis distance between rows
cx <- cov(X)
out <- lapply(1:nrow(X), function(i) {
  mahalanobis(x = X,
              center = do.call("c", X[i, ]),
              cov = cx)
})
# matrix of Mahalanobis distances
d.m <- as.dist(do.call("rbind", out), upper = T) %>% as.matrix() %>% as.data.frame()
# remove irrelevant values from the matrix
# (those where a unit matches with itself, or control to control, or treat to treat)
# d.m[d.m == 0] <- NA
d.m[x$t == "C", x$t == "C"] <- NA
d.m[x$t == "T", x$t == "T"] <- NA
d.m is the distance matrix. Currently I am using a for loop with the which function to locate the minimum value in the matrix, store its position, then remove the data from that row and column before moving on to finding the next minimum position.
The problem I have is that, computationally, this process is way more expensive than I would like further down the line (I loop through 180 x 100 iterations on randomly generated data).
I have attempted to investigate an elegant solution via the usual means, to no avail, so I am appealing to the wider community (for the first time!). I have already been helped tremendously by this online resource, so I have my fingers crossed. TIA!!!
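For reference, the row-and-column elimination described above might be written roughly like this (a sketch of the procedure as described, not the asker's actual code):
m <- as.matrix(d.m)                      # working copy of the distance matrix
pairs <- list()
for (k in seq_len(min(sum(x$t == "C"), sum(x$t == "T")))) {
  idx <- which(m == min(m, na.rm = TRUE), arr.ind = TRUE)[1, ]  # position of the smallest remaining distance
  pairs[[k]] <- idx
  m[c(idx["row"], idx["col"]), ] <- NA   # drop both matched units from further matching ...
  m[, c(idx["row"], idx["col"])] <- NA   # ... by blanking their rows and columns
}
matched <- do.call(rbind, pairs)         # one matched control/treated pair per row
Scanning the full matrix with which() at every step is what makes this expensive: each pass is roughly O(n^2), so the whole loop is on the order of O(n^3).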

Creating a point distance component to a monte carlo simulation function in R

I am attempting to do some Monte Carlo simulations, where I have a population of 325 samples in a field. I want to create a list of composite samples (samples consisting of multiple subsamples) from the dataset, while increasing sample size, repeated 100 times. I have created the function that will do so, and have supplied that below in the code.
##Create an example data set
# x and y are coordinates
x <- c(1:100)
y <- rev(c(1:100))
## z and w are soil test values
set.seed(2345)
z <- rnorm(100,mean=50, sd=10)
set.seed(2345)
w <- rnorm(100, mean=75, sd=5)
data <- data.frame(x, y, z, w)
##Initialize list
data.step.sim.list <- list()
## Code that increases sample size
for (i in seq_len(nrow(data))) {
  thisdat <- replicate(100, data[sample(1:nrow(data), size = i, replace = FALSE), ], simplify = FALSE)
  data.step.sim.list[[i]] <- thisdat
}
The product is a list of length n (n being the number of rows in the dataset), where element i is itself a list of 100 data frames (one per replication), each containing i rows.
I have x and y data for each sample as well, and want to stipulate that each subsample collected must be at least 'm' meters from the other subsamples.
I have created a function that calculates each distance, shown below. I cannot find a way to implement this in my current code. Would anyone know how to do this?
# function to compute distances
calc.dist <- function(x1, y1, x2, y2) {
  d <- sqrt(((x2 - x1)^2) + ((y2 - y1)^2))
  return(d)
} # end function calc.dist
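One way to fold the distance constraint into the sampling step is to accept a candidate point only if it is at least m metres from every point already drawn. The helper below is a minimal sketch of that idea (sample.min.dist is a hypothetical name, and m = 3 metres an assumed threshold, neither from the question); it could replace the plain sample() call inside replicate(). Note that it can run for a very long time, or never finish, if 'size' points spaced at least 'm' apart do not fit in the field.
sample.min.dist <- function(dat, size, m) {
  picked <- sample(nrow(dat), 1)                 # start from one random point
  while (length(picked) < size) {
    remaining <- setdiff(seq_len(nrow(dat)), picked)
    cand <- remaining[sample.int(length(remaining), 1)]
    d <- calc.dist(dat$x[picked], dat$y[picked], dat$x[cand], dat$y[cand])
    if (all(d >= m)) picked <- c(picked, cand)   # accept only sufficiently distant candidates
  }
  dat[picked, ]
}
# drop-in replacement for the sample() call inside the original loop:
# thisdat <- replicate(100, sample.min.dist(data, size = i, m = 3), simplify = FALSE)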

Generating bivariate data where x is uniformly distributed between 0 and 1 and y is normally distributed with mean 1/x plus some noise

I used x <- c(runif(100, 0, 1)) to generate 100 x's between 0 and 1.
Now for each of the x's I am trying to generate 10 y's with mean 1/x and variance of 1.
Preferably stored in a matrix, so that if I were to plot the 1000 points of y against x, it would look like the graph y = 1/x plus some error.
Any help would be greatly appreciated.
If you want the data in a matrix, then you can do
x <- runif(100, 0, 1)
y <- sapply(x, function(m) rnorm(10, 1/m, 1))
This uses sapply to generate 10 normal values for each x value, giving a 10 x 100 matrix (one column per x).
If you wanted a single two-column matrix, then maybe
points <- do.call("rbind", lapply(x, function(m) cbind(x = m, y = rnorm(10, 1/m, 1))))
is what you want. You can plot that with
plot(y ~ x, data = as.data.frame(points))

Linear regression on each raster pixel to predict a future month (in R)

I have successfully run this code, which I took from:
Can't Calculate pixel-wise regression in R on raster stack with fun
library(raster)
# Example data
r <- raster(nrow=15, ncol=10)
set.seed(0)
# Now I make 6 rasters (1 raster per month), then assign each pixel's value randomly
s <- stack(lapply(1:6, function(i) setValues(r, rnorm(ncell(r), i, 3))))
names(s) <- paste0('Month', c(1,2,3,4,5,6))
# Extract each pixel values
x <- values(s)
# Model with linear regression
m <- lm(Month6 ~ ., data=data.frame(x))
# Prediction raster
p <- predict(s, m)
If you run that code, p will be a raster. But I am still confused: how do I make a raster for a future month? For example, I want a 'Month8' raster based on the 6 previous rasters.
What I mean is, each pixel has its own linear-regression equation (fitted on X = Month1, ..., Month6). If I input X = Month8, I should get 150 cells of Y for the 8th month, one for each pixel of the raster.
What I have done
# Lets try make a data frame for clear insight for my data
x <- values(s)
DF <- data.frame(x)
# Make Month the predictor (X) and each pixel column the target (y)
library(data.table)
DF_T <- transpose(DF)
Month <- seq(1,nrow(DF_T))
DF_T <- cbind(Month, DF_T)
# Fit a linear model for the first pixel
V1_lr <- lm(V1 ~ Month, data=DF_T)
# prediction for the 8th month in that pixel
V1_p <- predict(V1_lr, data.frame(Month=8))
V1_p
This is just one pixel; I want the entire raster for 'Month8'.
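One way to get the whole 'Month8' raster in one go is to let calc() from the raster package run that per-pixel regression for every cell, with the month number as the predictor. The sketch below follows the pixel-wise regression pattern from the raster documentation (the helper name predict.month8 is illustrative), applied to the stack s built above:
month <- 1:nlayers(s)                        # predictor: months 1..6, one value per layer
predict.month8 <- function(v) {              # v holds the 6 monthly values of one pixel
  if (all(is.na(v))) return(NA_real_)
  fit <- lm(v ~ month)
  as.numeric(predict(fit, data.frame(month = 8)))
}
month8 <- calc(s, fun = predict.month8)      # a 150-cell raster of Month8 predictions
plot(month8)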

Generate random values in R with a defined correlation in a defined range

For a science project, I am looking for a way to generate random data in a certain range (e.g. min=0, max=100000) with a certain correlation with another variable which already exists in R. The goal is to enrich the dataset a little so I can produce some more meaningful graphs (no worries, I am working with fictional data).
For example, I want to generate random values that correlate at r = -.78 with the following data:
var1 <- rnorm(100, 50, 10)
I already came across some pretty good solutions (i.e. https://stats.stackexchange.com/questions/15011/generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable), but I only get very small values, which I cannot transform so that they make sense in the context of the other, original values.
Following the example:
var1 <- rnorm(100, 50, 10)
n <- length(var1)
rho <- -0.78
theta <- acos(rho)
x1 <- var1
x2 <- rnorm(n, 50, 50)
X <- cbind(x1, x2)
Xctr <- scale(X, center=TRUE, scale=FALSE)
Id <- diag(n)
Q <- qr.Q(qr(Xctr[ , 1, drop=FALSE]))
P <- tcrossprod(Q) # = Q Q'
x2o <- (Id-P) %*% Xctr[ , 2]
Xc2 <- cbind(Xctr[ , 1], x2o)
Y <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2)))
var2 <- Y[ , 2] + (1 / tan(theta)) * Y[ , 1]
cor(var1, var2)
What I get for var2 are values ranging between -0.5 and 0.5, with a mean of 0. I would like to have much more widely spread data, so I could simply transform it (for instance by adding 50) and get quite a similar range to my first variable.
Does anyone know a way to generate this kind of more or less meaningful data?
Thanks a lot in advance!
Starting with var1, renamed to A, and using 10,000 points:
set.seed(1)
A <- rnorm(10000,50,10) # Mean of 50
First convert values in A to have the new desired mean 50,000 and have an inverse relationship (ie subtract):
B <- 1e5 - (A*1e3) # Note that { mean(A) * 1000 = 50,000 }
This only results in r = -1. Add some noise to achieve the desired r:
B <- B + rnorm(10000,0,8.15e3) # Note this noise has mean = 0
# the amount of noise, 8.15e3, was found through parameter-search
This has your desired correlation:
cor(A,B)
[1] -0.7805972
View with:
plot(A,B)
Caution
Your B values might fall outside your range of 0 to 100,000. You might need to filter out values outside your range if you use a different seed or generate more numbers.
That said, the current range is fine:
range(B)
[1] 1668.733 95604.457
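If that happens, a simple filter would keep only the in-range pairs (a sketch; the bounds are the question's 0 to 100,000):
keep <- B >= 0 & B <= 1e5
A <- A[keep]; B <- B[keep]   # note: dropping points can shift the correlation slightly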
If you're happy with the correlation and the marginal distribution (i.e., shape) of the generated values, multiply the values (which fall between -.5 and +.5) by 100,000 and add 50,000.
> c(-0.5, 0.5) * 100000 + 50000
[1] 0e+00 1e+05
Edit: this approach, or anything else where 100,000 and 50,000 are exchanged for different numbers, is an example of the 'linear transformation' recommended by @gregor-de-cillia.
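Applied to the question's var2, that transformation might look like this (100,000 and 50,000 are just example scale and shift values):
var2.scaled <- var2 * 100000 + 50000
cor(var1, var2.scaled)   # unchanged: correlation is invariant to a linear transformation
range(var2.scaled)       # roughly 0 to 100,000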
