Get maximum distance between points in a vector (R) - r

I have two vectors of latitudes and longitudes. I would like to find the maximum distance between the points. The way I see it, I should get a matrix of distances between all points and get the max of those.
So far I’ve done (using geosphere package for the last command):
> lat = dt[assetId == u_assetIds[1000], latitude]
> lon = dt[assetId == u_assetIds[1000], longitude]
>
> head(cbind(lat, lon))
lat lon
[1,] 0.7266145 -1.512977
[2,] 0.7270650 -1.504216
[3,] 0.7267265 -1.499622
[4,] 0.7233676 -1.487970
[5,] 0.7232196 -1.443160
[6,] 0.7225059 -1.434848
>
> distm(c(lat_1K[1], lon_1K[1]), c(lat_1K[4], lon_1K[4]), fun = distHaversine)
[,1]
[1,] 2807.119
How do I convert the last command so that it gives me a matrix of all pairwise distances? I am not familiar with how to do that in R, as I have more experience in Python.
Thanks.

I briefly read the help page for distm; here is what I found:
distm(x, y, fun=distHaversine)
x: longitude/latitude of point(s). Can be a vector of two numbers, a matrix of 2 columns (first one is longitude, second is latitude) or a SpatialPoints* object
y: Same as x. If missing, y is the same as x
So you should simply pass cbind(lon, lat) as the first argument x; note that longitude comes first. Here is a test:
> lat <- c(0.7266145, 0.7270650, 0.7267265, 0.7233676, 0.7232196, 0.7225059)
> lon <- c(-1.512977, -1.504216, -1.499622, -1.487970, -1.443160, -1.434848)
> distm(cbind(lon,lat))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0000 976.4802 1486.6045 2806.912 7780.5544 8708.6036
[2,] 976.4802 0.0000 512.7471 1854.601 6809.6464 7738.0538
[3,] 1486.6045 512.7471 0.0000 1349.813 6296.9308 7225.3240
[4,] 2806.9123 1854.6008 1349.8129 0.000 4987.8561 5913.8213
[5,] 7780.5544 6809.6464 6296.9308 4987.856 0.0000 928.6189
[6,] 8708.6036 7738.0538 7225.3240 5913.821 928.6189 0.0000
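To answer the original question, the maximum distance is then just the largest entry of that matrix. A minimal sketch, assuming the geosphere package is installed and the coordinates are in degrees as distm expects; fun = distHaversine is passed explicitly here because the default distance function may not be the Haversine formula in every geosphere version:

```r
library(geosphere)

lat <- c(0.7266145, 0.7270650, 0.7267265, 0.7233676, 0.7232196, 0.7225059)
lon <- c(-1.512977, -1.504216, -1.499622, -1.487970, -1.443160, -1.434848)

# All pairwise distances in metres; longitude must be the first column
d <- distm(cbind(lon, lat), fun = distHaversine)

# Largest pairwise distance
max_d <- max(d)

# Row/column indices of the two points that are farthest apart
# (the matrix is symmetric, so the pair appears twice; keep the first hit)
far_pair <- which(d == max_d, arr.ind = TRUE)[1, ]
```

For a large number of points the full matrix grows quadratically, so this approach is best suited to a few thousand points at most.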

Related

Count all values in a correlation matrix that are above 0.8 and below -0.8

I have a matrix of 2134 by 2134 of correlation values and I would like to count the total number of values that are above 0.8 and below -0.8. I have tried
length(TFcoTF[TFcoTF>.8])
but this does not seem to be correct as I am getting about 50 percent of values above .8 which does not correspond to the histogram I have for the data. Also when I do
length(TFcoTF[TFcoTF<-.8])
I got 0 as the output. Any help is appreciated.
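The data.table package has a function called between. It returns TRUE or FALSE for each value in your matrix, depending on whether the value lies between two bounds (inclusive by default).
In the example below, I created a 10x10 matrix of random values in [-1, +1], then used length on the subset of values that fall within [-0.8, +0.8]. That counts the values inside the range; to count the values outside it, as you asked, negate the condition with !between(data, -0.8, 0.8), which here gives 100 - 84 = 16.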
library(data.table)
data <- matrix(runif(100,-1,1), nrow = 10, ncol=10)
data
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.05585901 -0.7497720 -0.8371569 -0.401079424 -0.4130752 -0.788961736 0.2909987 0.48965177 0.4076504 -0.0682856
[2,] -0.42442920 0.7476111 0.8238973 -0.912507391 -0.4450897 -0.001308901 0.5151425 -0.16838841 -0.1648151 0.8370660
[3,] -0.73295874 0.5271986 0.5822628 -0.008554908 -0.2785803 -0.499058508 -0.5661172 0.35957967 0.5807055 0.2350893
[4,] 0.18949338 0.3827603 -0.6112584 0.209209240 -0.5883962 -0.087900052 0.1272227 0.58165922 -0.9950324 -0.9118599
[5,] 0.40862973 0.9496163 0.4996253 0.079538601 0.9839763 -0.119883751 0.3667418 -0.02751815 -0.6724141 0.3217434
[6,] 0.77338548 -0.7698167 -0.5632436 0.223301216 -0.9936610 0.650110638 -0.9400395 -0.47808065 -0.1579283 -0.6896787
[7,] 0.93210326 0.5360980 0.7677325 0.815231731 -0.4320206 0.647954028 0.5180600 -0.09574138 -0.3848389 0.9726445
[8,] -0.66411834 0.1125759 -0.4021577 -0.711363103 0.7161801 -0.071971464 0.7953436 0.40326575 0.6895480 0.7496597
[9,] 0.14118154 0.4775983 0.8966069 0.852880293 0.4715885 -0.542526148 0.5200246 -0.62649677 -0.3677738 0.1961003
[10,] -0.59353193 -0.2358892 0.5769562 -0.287113142 -0.7100862 -0.107092848 -0.8101459 -0.46754146 -0.4082147 -0.4475972
length(data[between(data,-0.8,0.8)])
[1] 84
It is difficult to answer without your dataset; please provide a minimal reproducible example.
Your first line of code looks correct.
In the second, the problem is how R parses the expression. R assigns values with both = and <-, so TFcoTF<-.8 is read as an assignment of 0.8 to TFcoTF (which overwrites your matrix), whereas TFcoTF < -.8, with a space, is a comparison that returns logical values. In short, x<-1 assigns a value, whereas x < -1 returns a logical.
You can then combine the logical conditions and run the code below:
set.seed(42)
m <- matrix(runif(25, min = -1, max = 1), nrow = 5, ncol = 5)
m
length(m[m > .8]) + length(m[m < -.8]) # long version of what you did
length(m[m < -.8 | m > .8]) # | means "or": TRUE | FALSE returns TRUE
sum(m > .8 | m < -.8)
# sum() of a logical vector counts the TRUE values, since TRUE is 1 and FALSE is 0
sum(abs(m) > .8) # the shortest version
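For a real correlation matrix, keep in mind that the diagonal is always 1 and that every pair appears twice, so a plain count over the whole matrix overstates the number of strongly correlated pairs. A small sketch on made-up data (the matrix x and its columns are hypothetical) that counts each pair once with upper.tri:

```r
set.seed(1)
# Hypothetical data: 50 observations of 6 variables, two of them strongly related
x <- matrix(rnorm(300), nrow = 50, ncol = 6)
x[, 2] <- x[, 1] + rnorm(50, sd = 0.1)
cm <- cor(x)

strong <- abs(cm) > 0.8

# Counts the diagonal and both triangles of the symmetric matrix
n_all <- sum(strong)

# Counts each pair of variables exactly once
n_pairs <- sum(strong[upper.tri(cm)])
```

Here n_all equals the number of variables plus twice n_pairs, which is why the two counts differ.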

How to fix? 'It is recommended to have a covariance matrix with a determinant bigger than 1/ ((2*PI)^k) .' when using RcppHMM package in R

I am using the RcppHMM package to build a GHMM (a multivariate Gaussian mixture HMM) with continuous observations.
I want to fit the model with the EM algorithm, using continuous observations whose sequence lengths differ.
To be specific, each observation has a sequence length between 3 and 6.
I first tried to fit the model on the whole dataset at once (I built it with ncol = 6, the maximum sequence length, and padded the missing entries with zeros), but that did not work.
So I split the observations into groups of equal length, [O3, O4, O5, O6],
and updated the model one group at a time.
Each observation group looks like this:
O3
[,1] [,2] [,3]
[1,] 0.8550940 0.3231340 0.8639223
[2,] 0.4453262 0.5840305 0.4356958
[3,] 0.4344789 -1.2234760 0.4344789
[4,] -0.5003085 3.0322560 -0.5003085
[5,] -0.1459598 -0.4661041 -0.1459598
[6,] -0.1977263 -0.6352724 -0.1977263
O4
[,1] [,2] [,3] [,4]
[1,] 0.8965332 0.3338220 0.7270241 0.8824540
[2,] 0.4033438 0.4131293 0.1593136 0.4187023
[3,] -0.7329015 -1.6828296 -0.1550487 -0.1550487
[4,] -0.3213490 7.3449076 -0.2787857 -0.2787857
[5,] -0.2868067 -0.3743332 -0.1340566 -0.1340566
[6,] 2.6832742 -0.5844305 0.2320774 0.2320774
O5
[,1] [,2] [,3] [,4] [,5]
[1,] 0.83401341 0.2492370 0.47493190 0.6440035 0.84985396
[2,] 0.37988234 0.2335883 0.17043570 0.2116066 0.36260248
[3,] -0.05240445 -0.3034002 -0.05240445 -0.3034002 -0.05240445
[4,] -0.37240867 1.1500528 -0.37240867 1.1500528 -0.37240867
[5,] -0.02056839 0.9343497 -0.02056839 0.9343497 -0.02056839
[6,] -0.27586584 -0.4406833 -0.27586584 -0.4406833 -0.27586584
O6
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.9287066 0.35065802 0.4493442 0.6142040 0.7423286 0.9217381
[2,] 0.3852644 0.09612516 0.1623447 0.1320334 0.1875127 0.3928661
[3,] 0.1436024 -0.08326038 0.7800491 0.1436024 0.1926751 0.1436024
[4,] -0.4284304 -0.27916609 -0.5224586 -0.4284304 0.1267840 -0.4284304
[5,] -0.8846364 -0.81131525 -0.1781479 -0.8846364 -0.1266250 -0.8846364
[6,] -0.2141231 -0.78377461 -0.4440142 -0.2141231 -0.7888260 -0.2141231
nrow is the number of dimensions of each observation, and ncol is the sequence length.
When I updated the model with the first group, which has sequence length 3, it worked.
But when I tried to update the model again with the second group, which has sequence length 4, the following warning appeared:
In learnEM(newModel, O4[, 1:4, ], iter = 20, delta = 1e-05, print = TRUE) :
It is recommended to have a covariance matrix with a determinant bigger than 1/ ((2*PI)^k) .
Does anyone know how to fix this warning?
And is there a proper way to run the EM algorithm on observations with different sequence lengths using this package?
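The threshold quoted in the warning can be checked by hand for any covariance matrix. A small sketch in base R, assuming that k in the message is the dimension of the observations; the matrices sigma_ok and sigma_small are made up for illustration and are not taken from an RcppHMM model:

```r
# Hypothetical 3-dimensional covariance matrices (not taken from a fitted model)
k <- 3
sigma_ok <- diag(1, k)        # unit variances
sigma_small <- diag(0.01, k)  # very small variances

# Threshold quoted in the warning message
threshold <- 1 / ((2 * pi)^k)

det(sigma_ok) > threshold     # TRUE
det(sigma_small) > threshold  # FALSE
```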

How to obtain right eigenvectors of matrix in R?

Edit: the problem in my question was that I tried to find matrix S from equation 8, but that equation contains an error.
How can I directly obtain the right eigenvectors of a matrix in R? 'eigen()' gives only left eigenvectors.
Final edit (I have made a big mess here, but this question is really important to me):
eigen() returns a matrix of eigenvectors. From the function's help page:
" If ‘r <- eigen(A)’, and ‘V <- r$vectors; lam <- r$values’, then
A = V Lmbd V^(-1)
(up to numerical fuzz), where Lmbd =diag(lam)"
That is, A V = V Lmbd, where V is a matrix. Now we check it:
set.seed(1)
A<-matrix(rnorm(16),4,4)
Lmbd=diag(eigen(A)$values)
V=eigen(A)$vectors
A%*%V
> A%*%V
[,1] [,2] [,3] [,4]
[1,] 0.0479968+0.5065111i 0.0479968-0.5065111i 0.2000725+0i 0.30290103+0i
[2,] -0.2150354+1.1746298i -0.2150354-1.1746298i -0.4751152+0i -0.76691563+0i
[3,] -0.2536875-0.2877404i -0.2536875+0.2877404i 1.3564475+0i 0.27756026+0i
[4,] 0.9537141-0.0371259i 0.9537141+0.0371259i 0.3245555+0i -0.03050335+0i
> V%*%Lmbd
[,1] [,2] [,3] [,4]
[1,] 0.0479968+0.5065111i 0.0479968-0.5065111i 0.2000725+0i 0.30290103+0i
[2,] -0.2150354+1.1746298i -0.2150354-1.1746298i -0.4751152+0i -0.76691563+0i
[3,] -0.2536875-0.2877404i -0.2536875+0.2877404i 1.3564475+0i 0.27756026+0i
[4,] 0.9537141-0.0371259i 0.9537141+0.0371259i 0.3245555+0i -0.03050335+0i
I would like to find the matrix of right eigenvectors R.
The equation that defines the matrix of left eigenvectors L is:
L A = LambdaM L
The equation that defines the matrix of right eigenvectors R is:
A R = LambdaM R
and eigen() provides only the matrix V:
A V = V Lmbd
I would like to obtain the matrices R and LambdaM for a real matrix A, which may be negative-definite.
Here is a worked example.
By default, eigen() returns right eigenvectors:
m <- matrix(1:9,nrow=3)
e <- eigen(m)
e1 <- e$vectors
zapsmall((m %*% e1)/e1) ## right e'vec
## [,1] [,2] [,3]
## [1,] 16.11684 -1.116844 0
## [2,] 16.11684 -1.116844 0
## [3,] 16.11684 -1.116844 0
Left eigenvectors:
eL <- eigen(t(m))
eL1 <- eL$vectors
(We have to go to a little more effort since we need to multiply by row vectors on the left. If we extracted just a single eigenvector, R's ignorance of row/column vector distinctions would make it "do the right thing", i.e. (eL1[,1] %*% m)/eL1[,1] just works.)
zapsmall(t(eL1) %*% m/(t(eL1)))
## [,1] [,2] [,3]
## [1,] 16.116844 16.116844 16.116844
## [2,] -1.116844 -1.116844 -1.116844
## [3,] 0.000000 0.000000 0.000000
This should work for a given matrix A:
lefteigen <- function(A){
  return(t(eigen(t(A))$vectors))
}
Every left eigenvector of a matrix is the transpose of a right eigenvector of its transpose.
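Both defining equations can be verified numerically. A minimal sketch in base R, using a small made-up matrix m with real eigenvalues; the names eR, eL, V and L are only for this illustration:

```r
# Small non-symmetric example matrix with real eigenvalues (5 and 2)
m <- matrix(c(4, 1, 2, 3), nrow = 2)

eR <- eigen(m)      # right eigenvectors are the columns of eR$vectors
eL <- eigen(t(m))   # left eigenvectors of m are the columns of eL$vectors
V <- eR$vectors
L <- t(eL$vectors)  # one left eigenvector per row

# Right eigenvectors satisfy  m V = V Lambda
all.equal(m %*% V, V %*% diag(eR$values))

# Left eigenvectors satisfy  L m = Lambda L
all.equal(L %*% m, diag(eL$values) %*% L)
```

Both comparisons should return TRUE up to numerical tolerance. For matrices with complex eigenvalues the same checks apply, but the vectors and values are complex.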

Weighted variance-covariance matrices and lapply

I have a list prob with 50 elements. Each element is a 601x3 matrix of probabilities, each row of which represents a complete sample space (i.e., each row of each matrix sums to 1). For instance, here are the first five rows of the first element of prob:
> prob[[1]][1:5,]
[,1] [,2] [,3]
[1,] 0.6027004 0.3655563 0.03174335
[2,] 0.6013667 0.3665756 0.03205767
[3,] 0.6000306 0.3675946 0.03237481
[4,] 0.5986921 0.3686131 0.03269480
[5,] 0.5973513 0.3696311 0.03301765
Now, what I want to do is to create the following matrix for each row of each matrix/element in the list prob. Taking the first row, let a = .603, b = .366, and c = .032 (rounding to three decimal places). Then,
> w
[,1] [,2] [,3]
[1,] a*(1-a) -a*b -a*c
[2,] -b*a b*(1-b) -b*c
[3,] -c*a -c*b c*(1-c)
Such that:
> w
[,1] [,2] [,3]
[1,] 0.239391 -0.220698 -0.019296
[2,] -0.220698 0.232044 -0.011712
[3,] -0.019296 -0.011712 0.030976
I want to obtain a similar 3x3 matrix 600 more times (for the rest of the rows of this matrix) and then to repeat this entire process 49 more times for the rest of the elements of prob. The only thing I can think of is to call apply within lapply so that I am accessing each row of each matrix one-at-a-time. I'm sure that is not an elegant way to do this (not to mention I can't get it to work), but I can't think of anything else. Can anyone help me out with this? I'd also love to hear suggestions for using a different structure (e.g., is it bad to use matrices within lists?).
Running this process with lapply on a list of similarly dimensioned matrices should be very simple. If that part is a challenge, post the dput(.) output for a two-element list of similar matrices. The real challenge is the row-by-row processing, which is illustrated here for a single matrix M (the five rows shown in the question), with the output reshaped into a 3x3xN array:
w <- apply(M, 1, function(rw) diag(rw * (1 - rw)) +
             rbind(rw * c(0, -rw[1], -rw[1]),
                   rw * c(-rw[2], 0, -rw[2]),
                   rw * c(-rw[3], -rw[3], 0)))
w
[,1] [,2] [,3] [,4] [,5]
[1,] 0.23945263 0.23972479 0.23999388 0.24025987 0.24052272
[2,] -0.22032093 -0.22044636 -0.22056801 -0.22068575 -0.22079962
[3,] -0.01913173 -0.01927842 -0.01942588 -0.01957412 -0.01972314
[4,] -0.22032093 -0.22044636 -0.22056801 -0.22068575 -0.22079962
[5,] 0.23192489 0.23219793 0.23246881 0.23273748 0.23300395
[6,] -0.01160398 -0.01175156 -0.01190081 -0.01205173 -0.01220435
[7,] -0.01913173 -0.01927842 -0.01942588 -0.01957412 -0.01972314
[8,] -0.01160398 -0.01175156 -0.01190081 -0.01205173 -0.01220435
[9,] 0.03073571 0.03102998 0.03132668 0.03162585 0.03192748
w <- array(w, c(3,3,5) )
w
, , 1
[,1] [,2] [,3]
[1,] 0.23945263 -0.22032093 -0.01913173
[2,] -0.22032093 0.23192489 -0.01160398
[3,] -0.01913173 -0.01160398 0.03073571
, , 2
[,1] [,2] [,3]
[1,] 0.23972479 -0.22044636 -0.01927842
[2,] -0.22044636 0.23219793 -0.01175156
[3,] -0.01927842 -0.01175156 0.03102998
.... snipped remaining output
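To cover the whole list, the row-wise step can be wrapped in lapply. A sketch under a few assumptions: the two-element list prob below is a hypothetical stand-in built from the first rows shown in the question, and the helper vcov_row uses the equivalent formula diag(p) - outer(p, p) instead of the rbind construction above:

```r
# First rows of prob[[1]] from the question; the two-element list is hypothetical
M <- rbind(c(0.6027004, 0.3655563, 0.03174335),
           c(0.6013667, 0.3665756, 0.03205767),
           c(0.6000306, 0.3675946, 0.03237481))
prob <- list(M, M[3:1, ])

# Matrix for one probability vector p: p_i * (1 - p_i) on the diagonal,
# -p_i * p_j off the diagonal
vcov_row <- function(p) diag(p) - outer(p, p)

# One 3 x 3 x nrow array per list element
w_list <- lapply(prob, function(m) {
  array(apply(m, 1, vcov_row), dim = c(ncol(m), ncol(m), nrow(m)))
})

w_list[[1]][, , 1]  # matrix for the first row of the first element
```

Keeping matrices inside a list is a reasonable structure here; each list element simply becomes a three-dimensional array with one slice per row.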

How to calculate the spatial average for a certain area in a binary file using R?

I have been searching for a long time for a way to calculate a spatial average in R, without success. I have a binary file of air temperature stored as floats, with 720 columns and 360 rows. I want first to calculate the spatial average over the whole area, then to calculate the spatial average over a sub-area (columns 500-600 and rows 200-250), and then extract that area.
t<- file("C:annual_Prc2000_without999_1.img","rb")
e=readBin(t, double(), size=4,n=720*360, signed=TRUE)
From answer 1:
mean(e)
I got
NaN
From answer 2, I got:
spat_mean = apply(e, c(2,1), mean)
Error in apply(e, c(2, 1), mean) : dim(X) must have a positive length
Let me explain what I want.
Assume:
m <- matrix(rnorm(16),4,4)
> m
[,1] [,2] [,3] [,4]
[1,] 0.7930132 2.1973895 0.063493345 -0.5484056
[2,] 0.5222513 0.4331308 -0.002393336 0.1725495
[3,] 1.7462222 -1.5701996 -2.276781240 0.5628531
[4,] -1.2713361 -0.9349057 0.757412225 1.5118180
> mean(m)
[1] 0.134757
Now I want this:
[,1] [,2] [,3] [,4]
[1,] 0.134757 0.134757 0.134757 0.134757
[2,] 0.134757 0.134757 0.134757 0.134757
[3,] 0.134757 0.134757 0.134757 0.134757
[4,] 0.134757 0.134757 0.134757 0.134757
I did this manually, but I want to do it in R.
Assuming you have a regular spatial grid, you probably want something like this:
set.seed(42)
m <- matrix(rnorm(10000),100,100)
#calculate mean of all values
mean(m)
#calculate mean of region 1:10 * 1:10
mean(m[1:10,1:10])
From your comment to the other answer I gather that you want to calculate the mean value per x,y pair. For this you can use apply. Assuming you have a multi-dimensional array with x,y,time dimensions, getting the mean per x,y pair is done like this:
spat_mean = apply(multi_dim_array, c(1,2), mean)
spat_mean is now a 2d array with x,y dimensions, giving the spatial mean.
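The two errors in the question have simple causes: mean(e) returns NaN when the data contain NaN values, and apply fails because readBin returns a plain vector without dimensions. A sketch of the full workflow, assuming the values are stored row by row; the temporary file written here is a hypothetical stand-in for the real .img file:

```r
# Hypothetical stand-in for the real file: a 360 x 720 grid written as 4-byte floats
n_row <- 360
n_col <- 720
vals <- as.numeric(seq_len(n_row * n_col))
vals[1] <- NaN                      # one missing cell, to mimic the NaN result
path <- tempfile(fileext = ".img")
writeBin(vals, path, size = 4)

# Read the floats back; readBin returns a plain vector without dimensions
con <- file(path, "rb")
e <- readBin(con, what = numeric(), size = 4, n = n_row * n_col)
close(con)

# Give the vector its grid shape (assumption: values are stored row by row)
m <- matrix(e, nrow = n_row, ncol = n_col, byrow = TRUE)

overall_mean <- mean(m, na.rm = TRUE)   # ignore NaN cells
region <- m[200:250, 500:600]           # rows 200-250, columns 500-600
region_mean <- mean(region, na.rm = TRUE)

# Matrix of the same shape filled with the overall mean
mean_grid <- matrix(overall_mean, nrow = n_row, ncol = n_col)
```

If the file is stored column by column instead, drop byrow = TRUE; plotting the matrix is a quick way to see which layout is right.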
