R: Converting Cartesian to polar and sorting

Consider a polygon (without a hole) with vertices in the Cartesian coordinate system:
> X
x y
[1,] -10.951654 5.1999753
[2,] -8.648792 7.5526423
[3,] -5.682459 8.2535191
[4,] -1.714430 6.4598705
[5,] -1.606767 3.1157852
[6,] -0.143845 0.3147358
[7,] 3.823051 -1.4814188
[8,] 7.789705 -3.2781072
[9,] 10.053431 -0.8261339
[10,] 13.061571 -0.2348249
[11,] 13.394428 -3.4885483
[12,] 11.767807 -7.5279214
[13,] 9.264051 -9.3765475
[14,] 5.298010 -7.5785077
[15,] 1.331421 -5.7816749
[16,] -2.635669 -3.9859493
[17,] -6.603322 -2.1914693
[18,] -10.571699 -0.3985887
[19,] -14.541248 1.3916934
[20,] -14.102558 4.9583269
[21,] -10.951654 5.1999753
[Plot of the polygon with its centre marked as a red point.]
Transforming X into the polar coordinate system:
> theta_1 = atan(X[,2] / X[,1])
> r_1 = sqrt(X[,1]^2 + X[,2]^2)
Assume the centre is (0, 0). Now I want to arrange the points in order of increasing theta, with theta taking values from 0 to 2*pi.
I am stuck at this step.
Any suggestions?

Since atan() returns values from -pi/2 to pi/2, you can map the calculated theta into (0, 2*pi) by conditioning on the signs of the x and y coordinates. Something like this may work for you:
library(dplyr)
coords <- as.data.frame(X)
mutate(coords, theta = ifelse(x < 0, atan(y / x) + pi,
                       ifelse(y < 0, atan(y / x) + 2 * pi,
                              atan(y / x)))) %>%
  arrange(theta)
x y theta
1 -1.714430 6.4598705 1.830213
2 -0.143845 0.3147358 1.999484
3 -1.606767 3.1157852 2.046914
4 -5.682459 8.2535191 2.173755
5 -8.648792 7.5526423 2.423749
6 -10.951654 5.1999753 2.698298
7 -10.951654 5.1999753 2.698298
8 -14.102558 4.9583269 2.803502
9 -14.541248 1.3916934 3.046177
10 -10.571699 -0.3985887 3.179278
11 -6.603322 -2.1914693 3.462029
12 -2.635669 -3.9859493 4.128153
13 1.331421 -5.7816749 4.938726
14 5.298010 -7.5785077 5.322500
15 9.264051 -9.3765475 5.491752
16 11.767807 -7.5279214 5.714082
17 7.789705 -3.2781072 5.884856
18 3.823051 -1.4814188 5.913504
19 13.394428 -3.4885483 6.028398
20 10.053431 -0.8261339 6.201195
21 13.061571 -0.2348249 6.265209
If a point is in the second or third quadrant, add pi to the atan() value; if it is in the fourth quadrant, add 2*pi; otherwise keep the value as is. In this way, you guarantee that theta lies in (0, 2*pi). (The duplicated row in the sorted output is the polygon's closing vertex, which appears twice in X.)
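As a side note (not part of the original answer), base R's atan2() handles the quadrants automatically, so a shorter sketch is:
## atan2(y, x) returns angles in (-pi, pi]; %% (2 * pi) folds them into [0, 2*pi)
theta <- atan2(X[, 2], X[, 1]) %% (2 * pi)
X_sorted <- X[order(theta), ]  # vertices ordered by increasing theta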


Calculate average of lowest values of matrix rows

I have a large matrix, e.g.
> mat = matrix(runif(100), ncol = 5)
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 0.264442954 0.6408534 0.76472904 0.2437074 0.08019882
[2,] 0.575443586 0.6428957 0.44188123 0.0230842 0.07502289
[3,] 0.894885901 0.5926238 0.55431966 0.7717503 0.52806173
[4,] 0.231978411 0.1192595 0.08170498 0.4264405 0.97486053
[5,] 0.344765840 0.5349323 0.85523617 0.2257759 0.20549035
[6,] 0.499130844 0.9882825 0.99417390 0.8070708 0.29963075
[7,] 0.613479990 0.8877605 0.34282782 0.9525512 0.91488004
[8,] 0.967166001 0.6115709 0.68169111 0.3067973 0.30094691
[9,] 0.957612804 0.5565989 0.88180650 0.3359184 0.17980137
[10,] 0.342177768 0.7735620 0.48154937 0.3692096 0.31299886
[11,] 0.871928110 0.3397143 0.57596030 0.4749349 0.47800019
[12,] 0.387563040 0.1656725 0.47796646 0.8956274 0.68345302
[13,] 0.628535870 0.3418692 0.86513964 0.8052477 0.01850535
[14,] 0.379472842 0.9176644 0.08829197 0.8548662 0.42151935
[15,] 0.071958980 0.6644800 0.90061596 0.4484674 0.32649345
[16,] 0.229463192 0.9995178 0.63995121 0.8369698 0.35091430
[17,] 0.291761976 0.5014815 0.35260028 0.6188047 0.68192891
[18,] 0.077610797 0.2747788 0.07084273 0.5977530 0.37134566
[19,] 0.675912490 0.6059304 0.29321852 0.5638336 0.73866322
[20,] 0.006010715 0.7697045 0.43627939 0.1723969 0.88665973
I want to extract the lowest 2 and highest 2 values of each row and calculate their averages.
Eventually, I'd like to generate a new matrix where the first column is the average of the lowest values and the second column is the average of the highest values.
Thanks in advance!
I believe this does what you want:
do.call(rbind, apply(mat, 1, function(x) {
  sorted <- sort(x)
  data.frame(min = mean(head(sorted, 2)), max = mean(tail(sorted, 2)))
}))
Output:
min max
1 0.14333229 0.8877635
2 0.12311651 0.5283049
3 0.09367614 0.5433373
4 0.39926848 0.6361645
5 0.05196898 0.5473783
6 0.12876148 0.6153546
7 0.29893684 0.8436462
8 0.14254481 0.7023039
9 0.20889814 0.8863141
10 0.44838327 0.8641790
11 0.14859312 0.5533045
12 0.19728414 0.8619284
13 0.37049481 0.7448965
14 0.30070570 0.9320575
15 0.30333510 0.6774024
16 0.21908982 0.7077274
17 0.61804571 0.9239816
18 0.36525615 0.8531795
19 0.22751108 0.4993744
20 0.14251095 0.6353147
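A vectorized alternative sketch that returns a plain matrix instead of a data frame (the column names low and high are my own):
## Two apply() passes: sort ascending for the smallest pair, descending for the largest.
low_high <- cbind(
  low  = apply(mat, 1, function(x) mean(sort(x)[1:2])),
  high = apply(mat, 1, function(x) mean(sort(x, decreasing = TRUE)[1:2]))
)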
Hope this helps!

R: calling a matrix value of column 2 dependent on the value of column 1

I admit that I am totally new to R and have a few beginner's problems;
my problem is the following:
I have quite a long matrix TEST with 5000 rows and 2 columns (column 1 = time; column 2 = concentration of a species).
I want to use the right concentration values for calculating propensities in stochastic simulations.
I already have an algorithm that gives me the simulation time t_sim; what I need is a line of code that returns the concentration value at t = t_sim.
Also, the time vector might have a large step size, so t_sim may have to be rounded up to the next time value in order to look up the respective concentration.
I know this is probably quite an easy problem, but I really do not see the solution in R.
Best wishes and many thanks,
Arne
Without sample data this answer is kind of a shot in the dark, but I think that this might work:
t_conc <- TEST[which.min(abs(t_sim-TEST[,1])),2]
where TEST is the matrix with two columns as described in the OP and the output t_conc is the concentration that corresponds to the value of time in the matrix that is closest to the input value t_sim.
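For instance, a minimal sketch with made-up data (the time grid and values here are purely illustrative):
## Illustrative only: times 0, 10, ..., 100 with random concentrations.
TEST <- cbind(time = seq(0, 100, by = 10), conc = runif(11))
t_sim <- 39
t_conc <- TEST[which.min(abs(t_sim - TEST[, 1])), 2]  # value at time 40, the closest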
Here's another shot in the dark:
set.seed(1);
N <- 20; test <- matrix(c(sort(sample(100,N)),rnorm(N,0.5,0.2)),N,dimnames=list(NULL,c('time','concentration')));
test;
## time concentration
## [1,] 6 0.80235623
## [2,] 16 0.57796865
## [3,] 19 0.37575188
## [4,] 20 0.05706002
## [5,] 27 0.72498618
## [6,] 32 0.49101328
## [7,] 34 0.49676195
## [8,] 37 0.68876724
## [9,] 43 0.66424424
## [10,] 57 0.61878026
## [11,] 58 0.68379547
## [12,] 61 0.65642726
## [13,] 62 0.51491300
## [14,] 63 0.10212966
## [15,] 67 0.62396515
## [16,] 83 0.48877425
## [17,] 86 0.46884090
## [18,] 88 0.20584952
## [19,] 89 0.40436999
## [20,] 97 0.58358831
t_sim <- 39;
test[findInterval(t_sim,test[,'time']),'concentration'];
## concentration
## 0.6887672
Note that findInterval() returns the index of the lesser time value if t_sim falls between two time values, as my example shows. If you want the greater, you need a bit more work:
i <- findInterval(t_sim,test[,'time']);
if (test[i,'time'] != t_sim && i < nrow(test)) i <- i+1;
test[i,'concentration'];
## concentration
## 0.6642442
If you want the nearest, see R: find nearest index.
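As a quick illustration with the sample data above (not in the original answer), the which.min() approach from the first answer returns the nearest match directly:
test[which.min(abs(t_sim - test[,'time'])), 'concentration'];
## concentration
## 0.6887672
Time 37 is nearer to t_sim = 39 than time 43, so its concentration is returned.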

Visualization of multi-dimensional data clusters in R

For a set of documents, I have a feature matrix of size 30 x 32, where rows represent documents and columns represent features: 30 documents with 32 features each. After running a PSO algorithm, I have found some cluster centroids (which I am not yet sure are optimal), each of which is a row vector of length 32. I also have a column vector of size 30 x 1 that shows the centroid each document has been assigned to, obtained by computing the Euclidean distance of each document from the centroids: index 1 of this vector contains the index of the centroid to which document 1 has been assigned, and so on.

I wanted some hints on whether there is a way in R to plot this multi-dimensional data in the form of clusters. Is there a way, for example, to collapse these dimensions to 1-D, or to show them in a graph that is reasonably pretty to look at? I have been reading about multidimensional scaling; so far I understand it as a way to reduce multi-dimensional data to lower dimensions, which does seem to be what I want. So I tried it with this code (centroids[[3]] is a 4 x 32 matrix holding the 4 centroids):
points <- features.dataf[2:ncol(features.dataf)]
row.names(points) <- features.dataf[,1]
fit <- cmdscale(points, eig = TRUE, k = 2)
x <- fit$points[, 1]
y <- fit$points[, 2]
plot(x, y, pch = 19, xlab="Coordinate 1", ylab="Coordinate 2", main="Clustering Text Based on PSO", type="n")
text(x, y, labels = row.names(points), cex=.7)
It gives me this error:
Error in cmdscale(pointsPlusCentroids, eig = TRUE, k = 2) :
distances must be result of 'dist' or a square matrix
However, it does seem to produce a plot, except that the pch = 19 point symbols do not appear, only the text labels.
In addition to the above, I want to colour the points so that documents in cluster 1 get one colour, those in cluster 2 another, and so on. Is there any way to do this, given a column vector of centroid assignments like this:
[,1]
[1,] 1
[2,] 3
[3,] 1
[4,] 4
[5,] 1
[6,] 4
[7,] 3
[8,] 4
[9,] 4
[10,] 4
[11,] 2
[12,] 2
[13,] 2
[14,] 2
[15,] 1
[16,] 2
[17,] 1
[18,] 4
[19,] 2
[20,] 4
[21,] 1
[22,] 1
[23,] 1
[24,] 1
[25,] 1
[26,] 3
[27,] 4
[28,] 1
[29,] 4
[30,] 1
Could anyone please help me with this? Or if there is any other way to plot multi-dimensional clusters like these. Thank you!
As cmdscale() needs distances, try cmdscale(dist(points), eig = TRUE, k = 2). The point symbols do not appear because of type = "n" in your plot() call, which sets up the axes without drawing anything. For colouring the text by cluster, pass the assignment vector as col:
text(x, y, rownames(points), cex = 0.6, col = centroids)
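Putting it all together, a minimal end-to-end sketch (assuming points is the 30 x 32 feature data frame with document names as row names, and centroids is the 30 x 1 assignment vector from the question):
fit <- cmdscale(dist(points), eig = TRUE, k = 2)   # classical MDS down to 2-D
x <- fit$points[, 1]
y <- fit$points[, 2]
plot(x, y, type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Clustering Text Based on PSO")        # set up empty axes
text(x, y, labels = rownames(points), cex = 0.7,
     col = centroids)                              # one colour per cluster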

Modified rollapply mean

I have a data file which consists of daily xy locations and a logical vector denoting whether or not the location is an outlier. Here is some (poorly created, I know) sample data:
x=seq(3,10,length.out=30)
y=seq(42,45,length.out=30)
outlier=c(F,F,F,F,F,F,F,F,T,T,T,F,F,F,F,F,F,F,F,F,F,T,F,T,F,F,F,F,F,F)
data = cbind(x, y, outlier)
Note that cbind() coerces the logical outlier vector to numeric, which is why it prints as 0/1:
> data
x y outlier
[1,] 3.000000000 42.00000000 0
[2,] 3.241379310 42.10344828 0
[3,] 3.482758621 42.20689655 0
[4,] 3.724137931 42.31034483 0
[5,] 3.965517241 42.41379310 0
[6,] 4.206896552 42.51724138 0
[7,] 4.448275862 42.62068966 0
[8,] 4.689655172 42.72413793 0
[9,] 4.931034483 42.82758621 1
[10,] 5.172413793 42.93103448 1
[11,] 5.413793103 43.03448276 1
[12,] 5.655172414 43.13793103 0
[13,] 5.896551724 43.24137931 0
[14,] 6.137931034 43.34482759 0
[15,] 6.379310345 43.44827586 0
[16,] 6.620689655 43.55172414 0
[17,] 6.862068966 43.65517241 0
[18,] 7.103448276 43.75862069 0
[19,] 7.344827586 43.86206897 0
[20,] 7.586206897 43.96551724 0
[21,] 7.827586207 44.06896552 0
[22,] 8.068965517 44.17241379 1
[23,] 8.310344828 44.27586207 0
[24,] 8.551724138 44.37931034 1
[25,] 8.793103448 44.48275862 0
[26,] 9.034482759 44.58620690 0
[27,] 9.275862069 44.68965517 0
[28,] 9.517241379 44.79310345 0
[29,] 9.758620690 44.89655172 0
[30,] 10.000000000 45.00000000 0
What I need is to take a non-overlapping 6-day mean of the x and y columns. This is easy enough with rollapply(). However, I do not want outlier=1 values to be included in the 6-day mean; nor do I want the 6-day window to 'span' the gap left behind by removing all rows where outlier=T. Instead, I want to make an exception to the 'non-overlapping rule'.
I think this is best explained using the sample data above: the first value should be the mean of rows 1:6, but rather than the second value being the mean of rows 7:12 (including outlier=1 values) or of rows c(7:8,12:15) (skipping over outlier=1 values) I want it to overlap with the first window and take the mean of rows 3:8.
So for the length 30 sample data above, the end result should be of length 5, showing the mean values of rows 1:6, 3:8, 12:17, 16:21 & 25:30 (ideally all values which result from overlapping windows should be labelled as such; i.e. values 1:4 overlap, whereas the final value is unique)
Here is a function that will give you the indices of the endpoints of the averages that you want:
findIndices <- function(outlier, window = 6) {
  r <- rle(outlier)                            # runs of outlier / non-outlier rows
  rends <- cumsum(r$lengths)                   # end index of each run
  segs <- cbind(rends - r$lengths + 1, rends)  # start and end of each run
  # keep only outlier-free runs long enough to hold at least one window;
  # drop = FALSE keeps segs a matrix even if a single run qualifies
  segs <- segs[with(r, lengths >= window & values == 0), , drop = FALSE]
  # endpoints of the non-overlapping windows inside each clean segment
  indices <- unlist(apply(segs, 1, function(x) seq(x[1] + window - 1, x[2], by = window)))
  # always include each segment's last row, so the final (possibly overlapping)
  # window ends exactly at the segment end
  sort(unique(c(indices, segs[, 2])))
}
findIndices(data[,3])
## [1] 6 8 17 21 30
You can then get the averages you want like this:
id <- findIndices(data[, 3])
require(zoo)
# rollmean(x, 6) entry j averages rows j:(j+5), so the window that ends at
# row id starts at entry id - 5
cbind(index = id, rollmean(data[, 1:2], 6)[id - 5, ])
## index x y
## [1,] 6 3.603448 42.25862
## [2,] 8 4.086207 42.46552
## [3,] 17 6.258621 43.39655
## [4,] 21 7.224138 43.81034
## [5,] 30 9.396552 44.74138
You can put it all together in a single function like this:
maWithOutliers <- function(x, outlier, window) {
  id <- findIndices(outlier, window)
  cbind(index = id, rollmean(x, window)[id - window + 1, ])
}
> maWithOutliers(data[,1:2],data[,3],6)
index x y
[1,] 6 3.603448 42.25862
[2,] 8 4.086207 42.46552
[3,] 17 6.258621 43.39655
[4,] 21 7.224138 43.81034
[5,] 30 9.396552 44.74138
> maWithOutliers(data[,1:2],data[,3],4)
index x y
[1,] 4 3.362069 42.15517
[2,] 8 4.327586 42.56897
[3,] 15 6.017241 43.29310
[4,] 19 6.982759 43.70690
[5,] 21 7.465517 43.91379
[6,] 28 9.155172 44.63793
[7,] 30 9.637931 44.84483

transform this function using normal programming code and without using R functions

I have this function in R from a previous question here
shift <- function(d, k) rbind( tail(d,k), head(d,-k), deparse.level = 0 )
This function rotates the data frame d by k: it takes k rows from the end of the data frame and places them on top.
I want to create the same function (in the same language) but without using R's pre-made functions (head, tail, ...), using only programming basics (for loops, ...).
How can this be done?
Well, I don't know what you mean by "without using R functions", since pretty much everything in R is a function, but here is a solution using only the very generic nrow() (number of rows of a matrix), %% (modulus) and seq_len() (like 1:n, except that it behaves correctly when n is 0):
m <- matrix(1:40,,2,byrow=TRUE)
shift2 <- function(d, k) d[(seq_len(nrow(d))-k-1)%%(nrow(d))+1,]
shift2(m,5)
[,1] [,2]
[1,] 31 32
[2,] 33 34
[3,] 35 36
[4,] 37 38
[5,] 39 40
[6,] 1 2
[7,] 3 4
[8,] 5 6
[9,] 7 8
[10,] 9 10
[11,] 11 12
[12,] 13 14
[13,] 15 16
[14,] 17 18
[15,] 19 20
[16,] 21 22
[17,] 23 24
[18,] 25 26
[19,] 27 28
[20,] 29 30
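To see why the index arithmetic in shift2() works, here is the index vector on its own for n = 20 and k = 5 (a quick check, not part of the original answer):
n <- 20; k <- 5
(seq_len(n) - k - 1) %% n + 1
## [1] 16 17 18 19 20  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Row 16 of m, which holds 31 and 32, moves to the top, matching the output above.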
If by "normal programming code" you mean that it shouldn't be vectorized then, well, you are learning either the wrong language in the right way or the right language in the wrong way. Every time you come up with a vectorized solution instead of a for loop, you should be happy in R.
But if you really, really want to do this with loops, here is exactly the same function unvectorized:
shift3 <- function(d, k)
{
  out <- matrix(, nrow(d), ncol(d))                   # empty (NA) result matrix
  sorts <- (seq_len(nrow(d)) - k - 1) %% nrow(d) + 1  # source row for each target row
  for (i in seq_len(nrow(d))) out[i, ] <- d[sorts[i], ]
  return(out)
}
Proof they are all equal:
all(shift(m,5) == shift2(m,5) & shift2(m,5) == shift3(m,5))
[1] TRUE
EDIT:
Actually, shift3() above STILL contained a lot of vectorization, showing just how native it is in R. Here is a fully unvectorized version:
shift3 <- function(d, k)
{
  out <- matrix(, nrow(d), ncol(d))
  sor <- numeric(1)
  for (i in seq_len(nrow(d)))
  {
    # wrap around: the first k target rows come from the last k source rows
    if (i - k < 1) sor <- nrow(d) - k + i else sor <- i - k
    for (j in seq_len(ncol(d))) out[i, j] <- d[sor, j]
  }
  return(out)
}
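A quick sanity check (not in the original answer) that the fully unvectorized version still matches the original:
all(shift(m, 5) == shift3(m, 5))
[1] TRUE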
