Initial simplex for non-linear application of nelder-mead [closed] - r
I am using a modified Weibull CDF to predict the height of trees in a stand at a given age: Y = b0 * {1.0 + EXP[-b1 * (X**b2)]}, where Y is the height of dominant trees, X is stand age, b0 is the asymptote parameter to be predicted, b1 is the rate parameter to be predicted, and b2 is the shape parameter to be predicted. Each of b0, b1, b2 is a linear function of one or more bio-geo-climatic and/or physiographic variables. I have a data set of N ~ 2000 stands. I get an initial fit at a loose tolerance, then sample the data with replacement (i.e. bootstrap) and re-fit at a tighter tolerance. With ~200 bootstrap iterations, I generate a distribution of parameter estimates from which I can identify the least significant element across b0, b1, b2. I eliminate (or add) that term, and repeat.
I have a working version of this process in R using optim(), where the function being minimized returns Z = the sum of squared residuals (SSQ) rather than the Y values directly. My problem is computing time: the initial fit takes about a day, and the 200 bootstraps take an additional 2-3 days. The 40 or so additions/removals of variables have kept this spinning continuously on a server since August 2022. So I am attempting to port the fitting into a Fortran 90 DLL called from R.
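For context, a stripped-down sketch of the R side (toy simulated data and hypothetical names; the real model uses the full multi-column design described below):

```r
# Toy illustration of the optim() fit (intercept-only model, simulated data).
set.seed(1)
X <- runif(200, 10, 60)                                   # stand ages
Y <- 30 * (1.0 + exp(-0.001 * X^1.5)) + rnorm(200, 0, 2)  # simulated heights

ssq <- function(theta, X, Y) {
  b0 <- theta[1]; b1 <- theta[2]; b2 <- theta[3]
  Y.hat <- b0 * (1.0 + exp(-b1 * X^b2))
  sum((Y - Y.hat)^2)                                      # Z = SSQ
}

fit <- optim(c(200, 0.001, 1.5), ssq, X = X, Y = Y,
             method = "Nelder-Mead",
             control = list(maxit = 5000, reltol = 1e-10))
```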
Here is what my data look like:
```
Y  <- c(50.1, 80.3, 60.4, ... , 75.5, 90.2)   # length(Y) = 2000
X  <- c(21, 38, 27, ... , 34, 37)             # length(X) = 2000
b0 <- f(P1, P2, P3, P4)                       # len(b0) = 4 predictors
b1 <- f(P3, P5, P7, P8, P12)                  # len(b1) = 5 predictors
b2 <- f(P6, P2, P8, P9, P10)                  # len(b2) = 5 predictors
```
where P is the set of bio-geo-climatic and physiographic variables, with values at each X. Note that some predictors repeat across the parameters, but since they act on different parts of the equation (asymptote, rate, shape), their sign and magnitude will differ, so they are treated as separate columns. My data matrix has 2000 rows by (len(b0) + len(b1) + len(b2) + 3) columns: one for each predictor in each parameter, plus an intercept term for each of b0, b1, b2. The number of columns varies over time as I add/remove terms. So my fitting data is a matrix with 2000 rows and a column structure that looks like this:
(icpt0, P1, P2, P3, P4, icpt1, P3, P5, P7, P8, P12, icpt2, P6, P2, P8, P9, P10), cols = 17, rows = 2000
When evaluating the function I grab the appropriate columns (each parameter is the coefficient-weighted sum of its columns) to get the parameters:
Y.hat = (icpt0 + P1 + P2 + P3 + P4) * {1.0 + EXP[-(icpt1 + P3 + P5 + P7 + P8 + P12) * (X**(icpt2 + P6 + P2 + P8 + P9 + P10))]}
residual = Y(X) - Y.hat(X)
squares = residual**2
SSQ = sum of squares across D
Z(i) = SSQ (this is what I'm actually minimizing, not Y across D)
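In R terms, that evaluation can be sketched as follows (the column indices 1:5, 6:11, 12:17 are assumptions based on the layout above; D carries a column of 1s for each intercept):

```r
# theta: coefficient vector (length 17, ordered as the columns of D);
# D: 2000 x 17 design matrix; X: stand age; Y: observed height.
ssq <- function(theta, D, X, Y) {
  b0 <- drop(D[, 1:5]   %*% theta[1:5])    # asymptote: icpt0, P1..P4
  b1 <- drop(D[, 6:11]  %*% theta[6:11])   # rate:      icpt1, P3,P5,P7,P8,P12
  b2 <- drop(D[, 12:17] %*% theta[12:17])  # shape:     icpt2, P6,P2,P8,P9,P10
  Y.hat <- b0 * (1.0 + exp(-b1 * X^b2))
  sum((Y - Y.hat)^2)                       # Z = SSQ
}
```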
I need help constructing the initial coefficient simplex S, a (cols + 1) x cols matrix of vertices, to pass to the Fortran implementation of Nelder-Mead. Within R/optim I only need to pass a single starting point, where the estimates (b0, b1, b2) are applied only to the intercepts. I'm not sure how to construct the appropriate unit vectors to build a robust initial matrix.
If my initial point estimate is b0 = 200.0, b1 = 0.001, b2 = 1.5, then the first rows of S look like this:
(first five vertices shown; columns as in the data matrix)

      icpt0  P1   P2   P3    P4   icpt1  P3   P5   P7   P8   P12  icpt2  P6   P2   P8   P9   P10
S(1)  200.0  0.0  0.0   0.0  0.0  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
S(2)    0.0  4.0  0.0   0.0  0.0  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
S(3)    0.0  0.0  0.7   0.0  0.0  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
S(4)    0.0  0.0  0.0  12.0  0.0  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
S(5)    0.0  0.0  0.0   0.0  2.3  0.001  0.0  0.0  0.0  0.0  0.0    1.5  0.0  0.0  0.0  0.0  0.0
If the average of P1 across D is 50, then S(2) is the starting row with the P1 coefficient set to 200/50 = 4.0 and icpt0 zeroed. I repeat this process for each row, so that I have a diagonal of positive coefficient values across S, but zeroes (or the starting intercept values) otherwise. I see that I could substitute 0.00025 for all the zeroes and jitter the positive values by 0.05, but I'm not sure what that really changes.
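A sketch of that construction in R (make_simplex is a hypothetical name, and the 5% nudge for the intercept-perturbing vertices is my own assumption, since the scheme above doesn't pin those rows down):

```r
# Build the (n+1) x n simplex described above: row 1 is the intercept-only
# start; row j+1 moves the block's weight off the intercept and onto
# predictor j, scaled by that predictor's mean so the implied parameter
# stays near its starting value (e.g. 200 / mean(P1) = 200/50 = 4.0).
make_simplex <- function(start, col_means, icpt_idx = c(1, 6, 12)) {
  n <- length(start)
  S <- matrix(rep(start, each = n + 1), nrow = n + 1)  # every row = start
  for (j in seq_len(n)) {
    if (j %in% icpt_idx) {
      S[j + 1, j] <- start[j] * 1.05               # assumed 5% intercept nudge
    } else {
      blk <- max(icpt_idx[icpt_idx <= j])          # intercept heading this block
      S[j + 1, blk] <- 0.0                         # zero the block's intercept...
      S[j + 1, j] <- start[blk] / col_means[j]     # ...and scale predictor j
    }
  }
  S
}
```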
My Fortran DLL appears to work; however, the results do not match the R/optim version (using the same data/predictors). The Fortran fits yield Z values at least an order of magnitude larger than R's, and restarts don't improve them. I figure R/optim is constructing a different initial simplex S than the one above. What would a better initial simplex S look like? What would the unit vectors look like? I'm struggling because my initial point is (b0, b1, b2), but each is a linear function, so I don't know how to construct unit vectors that give a full array of positive values, which I suspect is what's needed.
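For what it's worth, if I read R's source correctly (nmmin() in src/main/optim.c), optim()'s default simplex is built quite differently from mine: it uses one common step equal to 10% of the largest starting value in absolute terms (falling back to 0.1 if the start is all zeros) and adds that step to one coordinate per vertex. A hedged replication:

```r
# Assumed replication of optim()'s default Nelder-Mead starting simplex
# (based on my reading of nmmin() in R's optim.c; not an official API).
optim_default_simplex <- function(start) {
  n <- length(start)
  step <- max(0.1 * abs(start))
  if (step == 0) step <- 0.1
  S <- matrix(rep(start, each = n + 1), nrow = n + 1)  # every row = start
  for (j in seq_len(n)) S[j + 1, j] <- S[j + 1, j] + step
  S
}
# With start = (200, 0.001, 1.5) the common step is 20, which swamps the
# rate and shape coordinates -- one plausible source of the scale mismatch.
```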