Given a bunch of 2D points and a polygon, I want to evaluate which points are on the boundary of the polygon, and which are strictly inside/outside of the polygon.
The 2D points are:
> grp2
x2 y2
1 -5.233762 1.6213203
2 -1.107843 -7.9349705
3 4.918313 8.9073019
4 7.109651 -3.9571781
5 7.304966 -4.3280168
6 6.080564 -3.5817545
7 8.382685 0.4638735
8 6.812215 6.1610483
9 -4.773094 -3.4260797
10 -3.269638 1.1299852
and the vertices of the polygon are:
> dfC
px py
1 7.304966 -4.3280167
2 8.382685 0.4638735
3 6.812215 6.1610483
4 5.854366 7.5499780
5 2.385478 7.0895268
6 -5.233762 1.6213203
7 -4.773094 -3.4260797
8 -1.107843 -7.9349705
A plot of the points and the polygon makes it clear that there are 3 points inside the polygon, 1 point outside, and 6 points on the boundary (as is also evident from the data points).
Now I am using point.in.polygon to estimate this. According to the documentation of package sp, this should return 'integer array; values are: 0: point is strictly exterior to pol; 1: point is strictly interior to pol; 2: point lies on the relative interior of an edge of pol; 3: point is a vertex of pol.'
But my code is not able to detect the points that are vertices of the polygon:
> point.in.polygon(grp2$x2,grp2$y2,dfC$px,dfC$py)
[1] 0 0 0 1 0 1 0 0 0 1
How can I resolve this problem?
The points are not equal. For example, grp2$x2[1] == -5.23376158438623 while dfC$px[6] == -5.23376157160271, so the coordinates differ past the eighth decimal place. As the comments suggest, you will have more luck if you round the values:
grp3 <- round(grp2, 3)
dfC3 <- round(dfC, 3)
point.in.polygon(grp3$x2,grp3$y2,dfC3$px,dfC3$py)
# [1] 3 3 0 1 3 1 3 3 3 1
Now
grp3[1, ]
# x2 y2
# 1 -5.234 1.621
dfC3[6, ]
# px py
# 6 -5.234 1.621
Rounding to 4 or 5 decimals gives the same results as rounding to 3. For the floating-point values to compare equal with ==, they would have to match exactly across all 14 decimal places shown.
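If you would rather not round the data themselves, another option is to snap each point to the nearest polygon vertex when it lies within a small tolerance before calling point.in.polygon. A minimal sketch (the tolerance value 1e-6 is an arbitrary choice; adjust it to the precision of your data):

library(sp)
# Snap each point to the nearest polygon vertex if it is within tol of that vertex,
# then classify the snapped points against the polygon.
tol <- 1e-6
snapped <- grp2
for (i in seq_len(nrow(grp2))) {
  d <- sqrt((dfC$px - grp2$x2[i])^2 + (dfC$py - grp2$y2[i])^2)
  j <- which.min(d)
  if (d[j] < tol) {
    snapped$x2[i] <- dfC$px[j]
    snapped$y2[i] <- dfC$py[j]
  }
}
point.in.polygon(snapped$x2, snapped$y2, dfC$px, dfC$py)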
I need to write a program that will create a vector of size N that will contain K non-zero elements according to the following requirements:
Non-zero elements should be mostly concentrated near the middle element (at position N/2) of the vector.
Elements at distance D or further from the middle element (on either side) should be zero.
As we move away from the middle element, the probability that an element is non-zero should be decreasing.
A rather small example of what I would like to accomplish follows, where N = 40 (middle element is 20), K = 11 non-zero elements, and D = 8. Since D = 8, elements at positions > 20 + 8 = 28 and elements at positions < 20 - 8 = 12 should always be zero. In the zone where non-zeros are allowed (positions from 12 to 28) K = 11 non-zero elements are present. There are more non-zero elements close to position 20 and they become more sparse as we move further away from the middle element.
Position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Vector:    0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  1  0  1  1  1  1  1  0  1  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
I have not yet written any code, since I cannot wrap my head around how to even start. One idea I had was to somehow use the binomial distribution to generate random indices and set the corresponding elements to non-zero values. However, this distribution can produce the same index multiple times, and hence fewer than K non-zero elements would result. If I use a loop to generate new random numbers until an unused index is found, will the result still follow a binomial distribution, so that more non-zero elements end up around the middle element?
The programming language that will be used is not that important, but I would prefer something in Matlab, Python, C++ or C, as I am more familiar with them.
I hope someone can provide directions and/or examples.
This is existing functionality in numpy (choice)
import numpy as np
from scipy import stats
N = 40
K = 11
Your description of the distribution you want is vague, so I'm just going to use a normal probability distribution with a mean of N/2 and a standard deviation of sqrt(N/2).
center = int(N / 2)
scale = np.sqrt(N / 2)
Create a probability vector from the probability density function for each possible index (up to N):
p = stats.norm(loc=center, scale=scale).pdf(np.arange(N))
Make sure it sums to 1:
p /= np.sum(p)
Initialize a random number generator and call .choice() on the possible indices, with the probability distribution p, setting replace to False:
rng = np.random.default_rng()
nz_indices = rng.choice(np.arange(N), size=K, p=p, replace=False)
>>> nz_indices
array([27, 20, 23, 19, 16, 24, 13, 25, 26, 22, 21])
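Note that the normal weights alone do not enforce the distance-D requirement, since the pdf is positive everywhere. The question is language-agnostic, so here is the same idea as a minimal R sketch (N, K, and D taken from the example above), zeroing the weights more than D positions from the middle and then drawing K distinct indices without replacement:

N <- 40; K <- 11; D <- 8
center <- N / 2
idx <- 1:N
w <- dnorm(idx, mean = center, sd = sqrt(N / 2))  # bell-shaped weights around the middle
w[abs(idx - center) > D] <- 0                     # hard cutoff: only positions 12..28 stay eligible
nz <- sample(idx, size = K, prob = w)             # K distinct weighted indices, without replacement
v <- integer(N)
v[nz] <- 1
v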
(RStudio) Suppose I have a data set of:
# Circle X Y
1 A 21 8
2 A 32 17
3 A 23 32
4 B 22 4
5 B 43 12
6 C 12 4
.....
I need to find the instantaneous velocity of each circle at each time frame.
Line 1 is the starting point, so its velocity is 0. The formula I want to apply to each circle's (X, Y) coordinates is sqrt(((x2-x1)^2 + (y2-y1)^2)/2)), where x1 and y1 come from the previous line (e.g. line 1 & line 2, line 2 & line 3). The final result I want is as below:
# Circle X Y Instant velocity
1 A 21 8 0
2 A 32 17 sqrt(((32-21)^2 + (17-8)^2)/2))
3 A 23 32 sqrt(((23-32)^2 + (32-17)^2)/2))
4 B 22 4 0
5 B 43 12 sqrt(((43-22)^2 + (12-4)^2)/2))
6 C 12 4 0
.....
Could anyone help me achieve this in RStudio?
You have one more ) than ( in your code example, which makes me a bit confused about where the /2 goes, but if you verify my syntax something like this should work:
library(dplyr)
your_data %>%
  group_by(Circle) %>%
  mutate(
    # distance-based formula between each row and the previous row within a Circle;
    # coalesce() replaces the NA produced for each group's first row with 0
    instant_velocity = coalesce(sqrt(((X - lag(X))^2 + (Y - lag(Y))^2) / 2), 0)
  )
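As a quick check, running the pipeline on the sample rows from the question (values copied from the post) gives 0 for the first row of each Circle and the distance-based value for the rest:

your_data <- data.frame(
  Circle = c("A", "A", "A", "B", "B", "C"),
  X = c(21, 32, 23, 22, 43, 12),
  Y = c(8, 17, 32, 4, 12, 4)
)

your_data %>%
  group_by(Circle) %>%
  mutate(instant_velocity = coalesce(sqrt(((X - lag(X))^2 + (Y - lag(Y))^2) / 2), 0))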
Suppose we have a data frame of two columns
X Y
10 14
12 16
14 17
15 19
21 19
The first element of Y is 14; the nearest (or equal) value to it in X is 14, which is the 3rd element of X. Similarly, the next element of Y (16) is closest to 15, which is the 4th element of X.
So, the output I would like should be
3
4
4
5
5
As my data is large, can you give me some advice on a systematic/proper way of coding this?
You can try this piece of code:
apply(abs(outer(d$X,d$Y,FUN = '-')),2,which.min)
# [1] 3 4 4 5 5
Here, abs(outer(d$X,d$Y,FUN = '-')) returns a matrix of absolute differences between every element of d$X (rows) and every element of d$Y (columns), and apply(..., 2, which.min) returns, for each column (i.e. for each element of d$Y), the row index of the smallest difference, which is the position of the nearest element of d$X.
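Since outer() materializes a length(d$X) by length(d$Y) matrix, it can get memory-hungry when both vectors are long. A sketch of a sort-and-search alternative using findInterval() (assuming it is acceptable that ties are broken toward the smaller X value; d is rebuilt here from the values in the question):

d <- data.frame(X = c(10, 12, 14, 15, 21), Y = c(14, 16, 17, 19, 19))

nearest_index <- function(X, Y) {
  o  <- order(X)                 # sort X, remembering original positions
  xs <- X[o]
  i  <- findInterval(Y, xs)      # index of the largest xs <= each Y (0 if none)
  lo <- pmax(i, 1)
  hi <- pmin(i + 1, length(xs))
  # pick whichever neighbour is closer to each Y
  pick <- ifelse(abs(Y - xs[lo]) <= abs(xs[hi] - Y), lo, hi)
  o[pick]                        # map back to positions in the original X
}

nearest_index(d$X, d$Y)
# [1] 3 4 4 5 5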
I have this code in R :
corr = function(x, y) {
  sx = sign(x)
  sy = sign(y)
  cond_a = sx == sy && sx > 0 && sy > 0
  cond_b = sx < sy && sx < 0 && sy > 0
  cond_c = sx > sy && sx > 0 && sy < 0
  cond_d = sx == sy && sx < 0 && sy < 0
  cond_e = sx == 0 || sy == 0
  if(cond_a) return('a')
  else if(cond_b) return('b')
  else if(cond_c) return('c')
  else if(cond_d) return('d')
  else if(cond_e) return('e')
}
Its role is to be used in conjunction with the mapply function in R in order to count all the possible sign patterns present in a time series. In this case the pattern has a length of 2 and all the possible tuples are: (+,+), (+,-), (-,+), (-,-).
I use the corr function this way :
> with(dt['AAPL'], table(mapply(corr, Return[-1], Return[-length(Return)])) /length(Return)*100)
a b c d e
24.6129416 25.4466058 25.4863041 24.0174672 0.3969829
> dt["AAPL",list(date, Return)]
symbol date Return
1: AAPL 2014-08-29 -0.3499903
2: AAPL 2014-08-28 0.6496702
3: AAPL 2014-08-27 1.0987923
4: AAPL 2014-08-26 -0.5235654
5: AAPL 2014-08-25 -0.2456037
I would like to generalize the corr function to n arguments. This means that for every n I would have to write down all the conditions corresponding to all the possible n-tuples. Currently the best thing I can think of is writing a Python script that generates the code string using loops, but there must be a way to do this properly. Do you have an idea of how I could generalize this tedious condition writing? Maybe I could use expand.grid, but how would I do the matching then?
I think you're better off using rollapply(...) in the zoo package for this. Since you seem to be using quantmod anyway (which loads xts and zoo), here is a solution that does not use all those nested if(...) statements.
library(quantmod)
AAPL <- getSymbols("AAPL", auto.assign = FALSE)
AAPL <- AAPL["2007-08::2009-03"]   # AAPL during the crash...
Returns <- dailyReturn(AAPL)
get.patterns <- function(ret, n) {
  f <- function(x) {   # identifies which row of `patterns` matches sign(x)
    which(apply(patterns, 1, function(row) all(row == sign(x))))
  }
  returns  <- na.omit(ret)
  patterns <- expand.grid(rep(list(c(-1, 1)), n))
  labels   <- apply(patterns, 1, function(row) paste0("(", paste(row, collapse = ","), ")"))
  result   <- rollapply(returns, width = n, f, align = "left")
  data.frame(100 * table(labels[result]) / (length(returns) - (n - 1)))
}
get.patterns(Returns,n=2)
# Var1 Freq
# 1 (-1,-1) 22.67303
# 2 (-1,1) 26.49165
# 3 (1,-1) 26.73031
# 4 (1,1) 23.15036
get.patterns(Returns,n=3)
# Var1 Freq
# 1 (-1,-1,-1) 9.090909
# 2 (-1,-1,1) 13.397129
# 3 (-1,1,-1) 14.593301
# 4 (-1,1,1) 11.722488
# 5 (1,-1,-1) 13.636364
# 6 (1,-1,1) 13.157895
# 7 (1,1,-1) 12.200957
# 8 (1,1,1) 10.765550
The basic idea is to create a patterns matrix with 2^n rows and n columns, where each row represents one of the possible patterns (e.g. (1,1), (-1,1), etc.). Then pass the daily returns to this function n-wise using rollapply(...) and identify which row in patterns matches sign(x) exactly. Then use this vector of row numbers as an index into labels, which contains a character representation of the patterns, and finally use table(...) as you did.
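To make the pattern and label construction concrete, here is what those two pieces look like for n = 2 (a small illustration only):

patterns <- expand.grid(rep(list(c(-1, 1)), 2))
patterns
#   Var1 Var2
# 1   -1   -1
# 2    1   -1
# 3   -1    1
# 4    1    1
apply(patterns, 1, function(row) paste0("(", paste(row, collapse = ","), ")"))
# [1] "(-1,-1)" "(1,-1)"  "(-1,1)"  "(1,1)"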
This is general for an n-day pattern, but it ignores situations where any return is exactly zero, so the $Freq column does not add up to 100. As you can see, this doesn't happen very often.
It's interesting that even during the crash it was (very slightly) more likely to have two up days in succession than two down days. If you look at plot(Cl(AAPL)) during this period, you can see that it was a pretty wild ride.
This is a little different approach, but it may give you what you're looking for and allows you to use any size of n-tuple. The basic idea is to find the signs of the adjacent changes for each sequential set of n returns, convert the n-length sign changes into n-tuples of 1's and 0's (where 0 = negative return and 1 = positive return), and then calculate the decimal value of each n-tuple taken as a binary number. These numbers will clearly be different for each distinct n-tuple. Using a zoo time series for these calculations provides several useful functions, including get.hist.quote() (from tseries) to retrieve stock prices, diff() to calculate returns, and rollapply() to calculate the n-tuples and their sums. The code below does these calculations, converts the sum of the sign changes back to an n-tuple of binary digits, and collects the results in a data frame.
library(zoo)
library(tseries)
n <- 3 # set size of n-tuple
#
# get stock prices and compute % returns
#
dtz <- get.hist.quote("AAPL","2014-01-01","2014-10-01", quote="Close")
dtz <- merge(dtz, (diff(dtz, arithmetic=FALSE ) - 1)*100)
names(dtz) <- c("prices","returns")
#
# calculate the sum of the sign changes
#
dtz <- merge(dtz, rollapply( data = (sign(dtz$returns)+1)/2, width = n,
                             FUN = function(x, y) sum(x*y), y = 2^(0:(n-1)), align = "right" ))
dtz <- fortify.zoo(dtz)
names(dtz) <- c("date","prices","returns", "sum_sgn_chg")
#
# convert the sum of the sign changes back to an n-tuple of binary digits
#
for( i in 1:nrow(dtz) )
  dtz$sign_chg[i] <- paste(((as.numeric(dtz$sum_sgn_chg[i]) %/% (2^(0:(n-1)))) %% 2), collapse="")
#
# report first part of result
#
head(dtz, 10)
#
# report count of changes by month and type
#
table(format(dtz$date,"%Y %m"), dtz$sign_chg)
An example of possible output is a table showing the count of changes by type for each month.
000 001 010 011 100 101 110 111 NANANA
2014 01 1 3 3 2 3 2 2 2 3
2014 02 1 2 4 2 2 3 2 3 0
2014 03 2 3 0 4 4 1 4 3 0
2014 04 2 3 2 3 3 2 3 3 0
2014 05 2 2 1 3 1 2 3 7 0
2014 06 3 4 3 2 4 1 1 3 0
2014 07 2 1 2 4 2 5 5 1 0
2014 08 2 2 1 3 1 2 2 8 0
2014 09 0 4 2 3 4 2 4 2 0
2014 10 0 0 1 0 0 0 0 0 0
This would show that in month 1, January of 2014, there was one set of three days with pattern 000, indicating 3 down returns; 3 sets with pattern 001, indicating two down returns followed by one positive return; and so forth. Most months seem to have a fairly random distribution, but May and August show 7 and 8 sets of 3 days of positive returns, reflecting the fact that these were strong months for AAPL.
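To make the encoding concrete, a small worked example (assuming n = 3): returns of (down, down, up) map to bits (0, 0, 1) with the oldest day first, the weighted sum is 0*2^0 + 0*2^1 + 1*2^2 = 4, and decoding 4 reproduces the string "001", which is why 001 reads as two down returns followed by one up return.

x <- c(0, 0, 1)                              # down, down, up (oldest day first)
s <- sum(x * 2^(0:2))                        # 4
paste((s %/% 2^(0:2)) %% 2, collapse = "")   # "001"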