I am using a modified Weibull CDF to predict the height of trees in a stand at a given age: Y = b0 * {1.0 + EXP[-b1 * (X**b2)]}, where Y = height of dominant trees; X = stand age; b0 is the asymptote parameter to be estimated; b1 is the rate parameter; and b2 is the shape parameter. Each of b0, b1, b2 is a linear function of one or more bio-geo-climatic and/or physiographic variables. I have a data set of N ~ 2000 stands. I get an initial fit at a higher tolerance, then sample the data with replacement (i.e. bootstrap) and re-fit at a lower tolerance. With ~200 iterations, I generate a distribution of parameter estimates from which I can identify the least significant element across b0, b1, b2. Eliminate (or add) that term, and repeat.
I have a working version of this process in R using optim, where the function being minimized returns Z = the residual sum of squares (SSQ) rather than the Y values directly. My problem is computer time: the initial fit requires about 1 day, and the 200 bootstraps require an additional 2-3 days. The 40 or so additions/reductions of variables have been spinning continuously on a server since August 2022. So I am attempting to port this into a FORTRAN 90 .dll called from R.
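For reference, the workflow above can be expressed compactly around optim; the sketch below is purely illustrative (object names, tolerances, and iteration caps are placeholders, and ssq_fn is the SSQ objective sketched further down, with D, X, Y holding the design matrix, stand ages, and observed heights):

# Illustrative sketch of the fit-then-bootstrap loop described above.
fit0 <- optim(start, ssq_fn, D = D, X = X, Y = Y,
              method = "Nelder-Mead",
              control = list(reltol = 1e-6, maxit = 5000))     # looser tolerance for the initial fit

boot_est <- replicate(200, {
  i <- sample(nrow(D), replace = TRUE)                         # resample stands with replacement
  optim(fit0$par, ssq_fn, D = D[i, ], X = X[i], Y = Y[i],
        method = "Nelder-Mead",
        control = list(reltol = 1e-10, maxit = 20000))$par     # tighter tolerance for each re-fit
})
# boot_est is a (number of coefficients) x 200 matrix: the bootstrap
# distribution used to judge which element of b0, b1, b2 is least significant.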
Here is what my data look like:
Y <- c(50.1, 80.3, 60.4, ... , 75.5, 90.2)   # length(Y) = 2000
X <- c(21, 38, 27, ... , 34, 37)             # length(X) = 2000
b0 <- f(P1, P2, P3, P4)                      # 4 predictors
b1 <- f(P3, P5, P7, P8, P12)                 # 5 predictors
b2 <- f(P6, P2, P8, P9, P10)                 # 5 predictors
where P is the set of bio-geo-climatic and physiographic variables, with values for each stand. Note that some of the same predictors are repeated across the parameters, but since they act on different parts of the equation (asymptote, rate, shape), their sign and magnitude will vary, so they are treated as separate variables. My data matrix has 2000 rows by (len(b0) + len(b1) + len(b2) + 3) columns: one for each predictor in each parameter, plus an intercept term for each of b0, b1, b2. The number of columns may vary over time as I add/subtract terms. So my fitting data is a matrix with 2000 rows and a column structure that looks like this:
(icpt0, P1, P2, P3, P4, icpt1, P3, P5, P7, P8, P12, icpt2, P6, P2, P8, P9, P10), cols = 17, rows = 2000
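Purely for illustration, such a matrix could be assembled in R along these lines (stands is a hypothetical data frame holding the P variables; the real code may differ):

# hypothetical assembly of the 2000 x 17 design matrix; duplicated predictors
# are kept because each copy belongs to a different parameter (asymptote, rate, shape)
D <- as.matrix(cbind(icpt0 = 1, stands[, c("P1", "P2", "P3", "P4")],
                     icpt1 = 1, stands[, c("P3", "P5", "P7", "P8", "P12")],
                     icpt2 = 1, stands[, c("P6", "P2", "P8", "P9", "P10")]))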
When evaluating the function I grab the appropriate columns to get the parameters:
Y.hat = (icpt0 + P1 + P2 + P3 + P4) * {1.0 + EXP[-(icpt1 + P3 + P5 + P7 + P8 + P12) * (X**(icpt2 + P6 + P2 + P8 + P9 + P10))]}
(each term above stands for that column multiplied by its current coefficient estimate)
residual = Y(X) - Y.hat(X)
squares = residual**2
sum the squares across D to get SSQ
Z(i) = SSQ   (this is what I'm actually minimizing, not Y across D)
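In R, this objective is only a few lines; here is a minimal sketch of the function handed to optim, assuming D is the 2000 x 17 matrix with the column order shown above, X the stand ages, and Y the observed heights (the column indices are illustrative and must match the current column layout):

# SSQ objective as described above: three blocks of columns form the linear
# predictors for the asymptote, rate, and shape parameters.
ssq_fn <- function(theta, D, X, Y) {
  b0 <- drop(D[, 1:5]   %*% theta[1:5])     # icpt0, P1, P2, P3, P4
  b1 <- drop(D[, 6:11]  %*% theta[6:11])    # icpt1, P3, P5, P7, P8, P12
  b2 <- drop(D[, 12:17] %*% theta[12:17])   # icpt2, P6, P2, P8, P9, P10
  Y.hat <- b0 * (1.0 + exp(-b1 * X^b2))     # the modified Weibull form above
  sum((Y - Y.hat)^2)                        # Z = SSQ, the value being minimized
}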
I need help constructing the initial coefficient simplex S, a (cols+1) x cols matrix of vertices, to pass to the FORTRAN implementation of Nelder-Mead. With R and optim I only need to pass a single starting point (b0, b1, b2), where the estimates are initially applied only to the intercepts. I'm not sure how to construct the appropriate unit vectors to build a robust initial simplex.
If my initial point estimate is b0 = 200.0, b1 = 0.001, b2 = 1.5, then the first rows of vertices (S(1) through S(5)) look like:
        icpt0   P1   P2    P3   P4  icpt1   P3   P5   P7   P8  P12  icpt2   P6   P2   P8   P9  P10
S(1)    200.0  0.0  0.0   0.0  0.0  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
S(2)      0.0  4.0  0.0   0.0  0.0  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
S(3)      0.0  0.0  0.7   0.0  0.0  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
S(4)      0.0  0.0  0.0  12.0  0.0  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
S(5)      0.0  0.0  0.0   0.0  2.3  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
If the average of P1 across D is 50, then S(2) is row 2 above, with the expected parameter value = 200/50 = 4. I repeated this process for each row, so that S has a diagonal of positive coefficient values but zeroes (or the intercept estimates) everywhere else. I see that I could substitute 0.00025 for the zeroes and perturb the positive values by 0.05 (5%), but I'm not sure what that really changes.
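For what it's worth, the 0.00025 / 0.05 values mentioned above are the conventional defaults used by several Nelder-Mead implementations (MATLAB's fminsearch, for example, though not necessarily what R's optim does internally): every vertex starts at the full initial point, and vertex j+1 perturbs only coordinate j, by 5% if it is nonzero and by 0.00025 if it is zero. A minimal R sketch of that construction, assuming start is the 17-element initial coefficient vector:

# Standard "perturb one coordinate per vertex" initial simplex (a sketch).
build_simplex <- function(start, rel_step = 0.05, zero_step = 0.00025) {
  n <- length(start)
  S <- matrix(start, nrow = n + 1, ncol = n, byrow = TRUE)   # every vertex = the initial point
  for (j in seq_len(n)) {
    S[j + 1, j] <- if (start[j] != 0) (1 + rel_step) * start[j] else zero_step
  }
  S                                                          # (n+1) x n matrix of vertices
}

Row 1 is the initial point itself; each subsequent row differs from it in exactly one coordinate, so the simplex spans all 17 directions and cannot be degenerate.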
My FORTRAN .dll appears to work, but the results do not match the output of the R/optim version (using the same data/predictors): the FORTRAN results yield Z values that are at least an order of magnitude larger than R's, and my re-starts don't improve anything. I figure that R/optim is constructing a different version of S than the one above. What would a better initial simplex S look like? What would the unit vectors look like? I'm struggling because my initial points are (b0, b1, b2), but each is a linear function, so I don't know how to construct unit vectors to get a full array of positive values, which I suspect is what's needed.
Suppose I have a very simple undirected graph G with n=4 nodes:
G = graph_from_literal(1--2, 1:2--3, 1--4)
and an n x n weight matrix W such as:
     1    2    3    4
1  0.0  0.5  0.9  1.3
2  0.5  0.0  1.0  0.0
3  0.9  1.0  0.0  0.0
4  1.3  0.0  0.0  0.0
Question: What is the fastest way of applying weights in W to the edges of G?
I could use the graph_from_adjacency_matrix function like the following:
G1 = graph_from_adjacency_matrix(W, mode="undirected", weighted=TRUE, diag=FALSE)
and then map the weight attribute of G1 to the edges of G.
But that is an expensive (and not very elegant) solution when G is a very big graph.
How can this be done?
Something like the following should work:
library(igraph)
G = graph_from_literal(1--2, 1:2--3, 1--4)
# The weighted adjacency matrix of the new graph should be obtained simply
# by element-wise multiplication of W with the binary adjacency of G
G = graph_from_adjacency_matrix(as.matrix(W) * as.matrix(get.adjacency(G, type = "both")),
                                mode = "undirected", weighted = TRUE)
plot(G, edge.label=E(G)$weight)
[Edit]
As per the quick fixes discussed in the comments, if the weight matrix contains zeros and we don't want the corresponding edges to be deleted, we can first set those entries to a small number:
W[W == 0] <- .Machine$double.xmin
Then, to show the weight values correctly in the plot, we can reset the affected edge weights (without touching the adjacency matrix) before plotting:
E(G)[E(G)$weight == .Machine$double.xmin]$weight <- 0.0
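Putting the original answer and the two fixes together, an end-to-end version (same objects as above, sketched rather than benchmarked) might look like this:

library(igraph)

G <- graph_from_literal(1--2, 1:2--3, 1--4)

# keep zero-weight edges from being dropped by the element-wise product
W[W == 0] <- .Machine$double.xmin

# element-wise product of W with G's binary adjacency (assuming W's rows and
# columns follow the same vertex order as G) gives the weighted graph directly
G <- graph_from_adjacency_matrix(
  as.matrix(W) * as.matrix(get.adjacency(G, type = "both")),
  mode = "undirected", weighted = TRUE)

# restore the true zero weights before displaying them
E(G)[E(G)$weight == .Machine$double.xmin]$weight <- 0.0
plot(G, edge.label = E(G)$weight)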
My math is a bit elementary so I apologize for any assumptions in advance.
I want to fetch values that lie on a simulated bell curve. I don't want to actually create or plot a bell curve; I'd just like a function that, given an input value, returns the corresponding Y-axis value on a hypothetical bell curve.
Here's the full problem statement:
I am generating floating point values between 0.0 and 1.0.
0.50 maps to 2.0 on the bell curve, which is the maximum. Values below and above 0.50 drop off on this curve, so for example 0.40 and 0.60 map to the same value, which could be something like 1.8 (arbitrarily chosen for this example); I'd like to know how I can tweak this 'gradient'.
Right now I'm using a very crude implementation: for example, for any value > 0.40 and < 0.60 the function returns 2.0. I'd like to smooth this and gain more control over the descent/gradient.
Any ideas how I can achieve this in Go?
The Gaussian function described here: https://en.wikipedia.org/wiki/Gaussian_function has a bell-curve shape. Example implementation:
package main

import (
    "fmt"
    "math"
)

const (
    a = 2.0 // height of the curve's peak
    b = 0.5 // position of the peak
    c = 0.1 // standard deviation controlling the width of the curve
    // (smaller c -> narrower peak, larger c -> wider, flatter curve)
)

// curveFunc returns the height of the bell curve at x.
func curveFunc(x float64) float64 {
    return a * math.Exp(-math.Pow(x-b, 2)/(2.0*math.Pow(c, 2)))
}

func main() {
    fmt.Println(curveFunc(0.5)) // the peak: 2.0
    fmt.Println(curveFunc(0.4)) // same as curveFunc(0.6), partway down the curve
}
I'm interested in the fastest way to linearly interpolate a 1D function on regularly spaced data.
I don't quite understand how to use the scale function in Interpolations.jl:
using Interpolations
v = [x^2 for x in 0:0.1:1]
itp = interpolate(v, BSpline(Linear()), OnGrid())
itp[1]
# 0.0
itp[11]
# 1.0
scale(itp, 0:0.1:1)
itp[0]
# -0.010000000000000002
# why is this not equal to 0.0, i.e. the value at the lowest index?
scale does not alter the object in place (as scale! would); you need to use the value it returns:
julia> sitp = scale(itp,0:0.1:1)
11-element Interpolations.ScaledInterpolation{Float64,1,Interpolations.BSplineInterpolation{Float64,1,Array{Float64,1},Interpolations.BSpline{Interpolations.Linear},Interpolations.OnGrid,0},Interpolations.BSpline{Interpolations.Linear},Interpolations.OnGrid,Tuple{FloatRange{Float64}}}:
julia> sitp[0]
0.0
Thanks to spencerlyon for pointing that out.
I would like to plot a 4-dimensional graph in R. I have three position coordinates and a fourth variable (time), and I would like to show the 3 coordinates as a function of time. I only have one observation for each coordinate at each time. Does anybody know if this is possible in R? I have only found solutions for 3D plots.
For example:
coord_1<-c(0.5,0.3,0.9)
coord_2<-c(0.2,0.1,0.6)
coord_3<-c(0.7,0.4,0.8)
time_seg<-c(0.1,0.5,1)
data_plot<-data.frame(coord_1,coord_2,coord_3, time_seg)
data_plot
  coord_1 coord_2 coord_3 time_seg
1     0.5     0.2     0.7      0.1
2     0.3     0.1     0.4      0.5
3     0.9     0.6     0.8      1.0