Optimizing a Surrogates radial basis function in Julia

I am trying to optimize a radial basis surrogate using surrogate_optimize in the following example.
using Surrogates
x = [1, 3, 4, 6, 7, 7.5, 8]
y = [2, 3, 6, 6, 7, 8.5, 9]
z = [1, 2, 4, 7, 8, 2, 4]
XY = zip(x,y) |> collect
lb, ub = vcat(minimum(x),minimum(y)), vcat(maximum(x), maximum(y))
radial = Surrogates.RadialBasis(XY, z, lb, ub)
# I tried calling surrogate_optimize directly, but I know it requires an objective function:
surrogate_optimize(z, SRBF(), lb, ub, radial, UniformSample(), maxiters=50)
The call above is wrong, so I would like to know how to use surrogate_optimize with the script above.
Documentation reference:
http://surrogates.sciml.ai/stable/radials/
http://surrogates.sciml.ai/stable/optimizations/
Thanks!

Related

How to make seaborn plot display and take multiplicity of points into account

I have these two lists of data extracted from a dataframe. 
[5, 5, 5, 5, 5, 4, 4, 5, 4, 4, 2, 4, 5, 5, 5] (Col 1)
[5, 5, 5, 4, 4, 3, 2, 2, 3, 2, 2, 4, 2, 2, 5] (Col 2)
Calling stats.pearsonr from the scipy library gives (-0.5062175977346661, 0.20052806464412476), indicating a negative correlation. However, the line of best fit produced by calling
graph = sns.jointplot(x = 'col1name', y = 'col2name', data = df_name, kind = 'reg')
is positive. I think this is because the calculation of the line of best fit does not take the multiplicity of the points into account. In particular, (5,2) seems to be considered only once even though it occurs 3 times. What can I do so that (a) someone looking at this plot can tell how many students a single data point represents, and (b) the line of best fit takes the multiplicity of points into account?
Here is a picture of the plot: 
The coinciding points aren't ignored. Here is a visualization that adds some random noise (jitter) so that all points are visible, and marks the mean of 'col2' for each value of 'col1'. The r-value is calculated before the jitter is applied.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from scipy.stats import pearsonr
df = pd.DataFrame({'col1': [5, 5, 5, 5, 5, 4, 4, 5, 4, 4, 2, 4, 5, 5, 5],
'col2': [5, 5, 5, 4, 4, 3, 2, 2, 3, 2, 2, 4, 2, 2, 5]})
r, p = pearsonr(df['col1'], df['col2'])
xs = np.unique(df['col1'])
ys = [df[df['col1'] == x]['col2'].mean() for x in xs]
df['col1'] += np.random.uniform(-0.1, 0.1, len(df))
df['col2'] += np.random.uniform(-0.1, 0.1, len(df))
g = sns.jointplot(x='col1', y='col2', data=df, kind='reg')
g.ax_joint.scatter(x=xs, y=ys, marker='X', color='crimson') # show the means
g.ax_joint.text(2.5, 4.5, f'$r={r:.2f}$', color='navy') # display the r-value
plt.show()
The regression line seems to go quite close to the means, as expected. For col1==5 there are 4 values at 5, 2 at 4 and 3 at 2; their mean is 3.78.
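One way to check that coinciding points do count is sketched below. It assumes plain NumPy rather than seaborn's internals, and the variable names are illustrative only. It tallies how often each (col1, col2) pair occurs, then compares an ordinary least-squares line through all 15 rows with a weighted line through the unique points.

```python
from collections import Counter
import numpy as np

col1 = [5, 5, 5, 5, 5, 4, 4, 5, 4, 4, 2, 4, 5, 5, 5]
col2 = [5, 5, 5, 4, 4, 3, 2, 2, 3, 2, 2, 4, 2, 2, 5]

# Multiplicity of every distinct point, e.g. (5, 2) occurs 3 times.
counts = Counter(zip(col1, col2))
for point, n in sorted(counts.items()):
    print(point, "x", n)

# Ordinary least-squares line through all rows (duplicates included).
slope_all, intercept_all = np.polyfit(col1, col2, 1)

# Weighted fit through the unique points only. np.polyfit applies w to the
# unsquared residuals, so sqrt(count) weights each squared residual by count.
ux = np.array([p[0] for p in counts], dtype=float)
uy = np.array([p[1] for p in counts], dtype=float)
w = np.sqrt(np.array(list(counts.values()), dtype=float))
slope_w, intercept_w = np.polyfit(ux, uy, 1, w=w)

# Pearson correlation of the two lists exactly as posted.
r = np.corrcoef(col1, col2)[0, 1]

print(slope_all, slope_w)  # the two slopes agree up to rounding
print(r)
```

Since the two fits agree, a fit over all rows already accounts for multiplicity. For the two lists exactly as posted, both the slope and the correlation come out positive, so the negative coefficient quoted in the question presumably came from different data. To make multiplicity visible in a plot, the counts could also drive the marker size instead of jitter.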

Why does -1*List object return an empty list?

I was trying some operations on the List object and wanted to see some "broadcast" behavior :
x = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
x = -1*x
In [46]: x
Out[46]: []
I was expecting something like x = [1, -1, -2, -3, -4, -5, -6, -7, -8, -9].
What is actually happening?
You can only do this kind of multiplication with a pandas Series (or, better, the underlying numpy array). If you write something like
List = n * List
with n as an integer, your list is repeated n times:
x = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
x = 3*x
print(x)
>> [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Negative values of n empty the list, because they are treated as 0. The Python documentation on sequence operations says:
Values of n less than 0 are treated as 0 (which yields an empty sequence of the same type as s).
So you have to use one of these methods to multiply each list element:
# 1. List comprehension
NewList = [i * 5 for i in List]
# 2. Explicit loop
NewList = []
for i in List:
    NewList.append(i * 5)
# 3. pandas Series
import pandas as pd
s = pd.Series(List)
NewList = (s * 5).tolist()
You want the following:
x = [-1 * i for i in x]
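The difference between sequence repetition and element-wise multiplication can be seen side by side in the short sketch below. The variable names are illustrative, and the NumPy line assumes numpy is installed.

```python
import numpy as np

x = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]

repeated = 2 * x   # sequence repetition: the list twice in a row
emptied = -1 * x   # n < 0 is treated as 0, so the result is []

negated_list = [-v for v in x]     # element-wise with a comprehension
negated_array = -1 * np.array(x)   # element-wise broadcast on an ndarray

print(len(repeated), emptied)
print(negated_list)
print(negated_array.tolist())
```

The comprehension needs no extra dependency; converting to an array is mainly worthwhile when further numeric operations follow.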

How to calculate Euclidean distance between two points stored in rows of two separate matrices?

I have two matrices:
I would like to compute the distance between point X and point Y without using a loop, in such a way that the expression/function still works when the matrices are expanded by additional columns.
For validation one could use:
sqrt((m1[,1] - m2[,1])^2 + (m1[,2] - m2[,2])^2 + (m1[,3] - m2[,3])^2 + (m1[,4] - m2[,4])^2 + (m1[,5] - m2[,5])^2)
The expression above gives the correct result for the distance between X and Y; however, once the matrices are expanded by additional columns, the expression also has to be expanded, which is not an acceptable solution.
Would you be so kind as to tell me how to achieve this? Any help would be more than welcome. I've been stuck on this one for a while.
The - operator between matrices is element-wise in R, and rowSums is useful for summing along each row:
m1 <- matrix(
c(4, 3, 1, 6,
2, 4, 5, 7,
9, 0, 1, 2,
6, 7, 8, 9,
1, 6, 4, 3),
nrow = 4
)
m2 <- matrix(
c(2, 6, 3, 2,
9, 4, 1, 4,
1, 3, 0, 1,
4, 5, 0, 2,
7, 2, 1, 3),
nrow = 4
)
sqrt((m1[,1] - m2[,1])^2 + (m1[,2] - m2[,2])^2 + (m1[,3] - m2[,3])^2 + (m1[,4] - m2[,4])^2 + (m1[,5] - m2[,5])^2)
# [1] 12.529964 6.164414 9.695360 8.660254
sqrt(rowSums((m1 - m2) ^ 2))
# [1] 12.529964 6.164414 9.695360 8.660254
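For comparison, the same row-wise computation can be sketched in NumPy. This is only a rough counterpart to the R answer, not part of it; order="F" is used so the values fill column by column, as R's matrix() does.

```python
import numpy as np

# Same values as the R example, filled column by column (order="F").
m1 = np.array([4, 3, 1, 6, 2, 4, 5, 7, 9, 0,
               1, 2, 6, 7, 8, 9, 1, 6, 4, 3],
              dtype=float).reshape((4, 5), order="F")
m2 = np.array([2, 6, 3, 2, 9, 4, 1, 4, 1, 3,
               0, 1, 4, 5, 0, 2, 7, 2, 1, 3],
              dtype=float).reshape((4, 5), order="F")

# Element-wise difference, squared, summed along each row, then the root.
dist = np.sqrt(((m1 - m2) ** 2).sum(axis=1))

# Equivalent shortcut: the Euclidean norm of each row of the difference.
dist_norm = np.linalg.norm(m1 - m2, axis=1)

print(np.round(dist, 6))
```

Neither expression mentions the number of columns, so both keep working when columns are added.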

R error using DBSCAN on Data frame

Error in data - x : non-numeric argument to binary operator
My code is as follows:
x <- as.factor(c(2, 2, 8, 5, 7, 6, 1, 4))
y <- as.factor(c(10, 5, 4, 8, 5, 4, 2, 9))
coordinates <- data.frame(x, y)
colnames(coordinates) <- c("x_coordinate", "y_coordinate")
print(coordinates)
point_clusters <- dbscan(coordinates, 2, MinPts = 2, scale = FALSE,
                         method = c("hybrid", "raw", "dist"), seeds = TRUE,
                         showplot = 1, countmode = NULL)
point_clusters
But I'm getting the following error while executing the above code:
> point_clusters <- dbscan(coordinates, 2, MinPts = 2, scale = FALSE, method = c("hybrid", "r ..." ... [TRUNCATED]
Error in data - x : non-numeric argument to binary operator
I don't know what the problem is with the above code.
I solved the problem for my purposes. I read somewhere that the data needs to be a numeric matrix, although I'm not sure about that; the original columns were factors, which would explain the "non-numeric argument" error. So, here is what I did:
x <- c(2, 2, 8, 5, 7, 6, 1, 4)
y <- c(10, 5, 4, 8, 5, 4, 2, 9)
coordinates <- matrix(c(x, y), nrow = 8, byrow = FALSE)
The remaining code is the same as above. Now it works fine for me.

Subsample a matrix by selecting locations with specific values within a matrix in R

I have to use R instead of MATLAB and I'm new to it.
I have a large array of data repeating like 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10...
I need to find the locations where values equal to 1, 4, 7, 10 are found to create a sample using those locations.
In this case it will be position(=corresponding value) 1(=1) 4(=4) 7(=7) 10(=10) 11(=1) 14(=4) 17(=7) 20(=10) and so on.
In MATLAB it would be y=find(ismember(x,[1, 4, 7, 10 ])).
Please, help! Thanks, Pavel
Something like this?
foo <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
bar <- c(1, 4, 7, 10)
which(foo %in% bar)
#> [1] 1 4 7 10 11 14 17 20
#nicola, feel free to copy my answer and get the recognition for your answer, simply trying to close answered questions.
The %in% operator is what you want. For example,
# data in x
targets <- c(1, 4, 7, 10)
locations <- x %in% targets
# locations is a logical vector you can then use:
y <- x[locations]
There'll be an extra step or two if you want the row and column indices of the locations, but it's not clear if you do. (Note that for a matrix, the logicals will be in column-major order.)
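For readers coming from Python instead, a rough NumPy counterpart of find(ismember(...)) and of which(foo %in% bar) might look like the sketch below. The names foo and bar are taken from the first answer; the + 1 only converts NumPy's 0-based indices to the 1-based positions used in the question.

```python
import numpy as np

foo = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
bar = [1, 4, 7, 10]

# Boolean mask: True wherever a value of foo is one of the targets.
mask = np.isin(foo, bar)

# Positions of the matches, shifted to 1-based to match R and MATLAB.
positions = np.flatnonzero(mask) + 1

# The sample itself, taken with the mask.
sample = foo[mask]

print(positions.tolist())
print(sample.tolist())
```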
