I have a very specific error, so googling wasn't helpful, and I'm sorry that I don't know how to provide a simple reproducible example for this issue. The code below runs perfectly on my local machine, but on the HPC it produces this error:
*** caught segfault ***
address 0x2ad718ba0440, cause 'memory not mapped'
Traceback:
1: array(.Fortran("hus_vertical_interpolation", m = as.integer(DIM[1]), n = as.integer(DIM[2]), o = as.integer(DIM[3]), p = as.integer(DIM[4]), req = as.integer(length(req_press_levels)), hus_on_model_level = as.numeric(spec_hum_data[]), pres = as.numeric(req_press_levels), pressure_full_level = as.numeric(pressure[]), hus_on_press_level = as.numeric(output_array[]))$hus_on_press_level, dim = output_DIM)
2: Specific_humidity_afterburner(spec_hum_file = q_nc.files[x], req_press_levels = required_PLev)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
The code is supposed to:
1. Loop over a vector of NetCDF files and pass each filename (spec_hum_file) to the function Specific_humidity_afterburner.
2. Inside the function, read the NetCDF file, extract the data, pass it to the first compiled subroutine, do the math, and return the values.
3. Take the result, pass it to another Fortran subroutine, and return the second result.
4. Write the second result to a new NetCDF file.
The error occurs in step 3. The R function is:
Specific_humidity_afterburner<-function(spec_hum_file,req_press_levels){
require(ff)
require(ncdf4)
require(stringi)
require(DescTools)
library(stringr)
library(magrittr)
#1============================================================================
#Reading data from netCDF file
#2============================================================================
#Reading other variables
#3============================================================================
# First Fortran subroutine
#4============================================================================
#load vertical interpolate subroutine for specific humidity
dyn.load("spec_hum_afterburner/vintp2p_afterburner_hus.so")
#check
is.loaded("hus_vertical_interpolation")
DIM<-dim(spec_hum_data)
output_DIM<-c(DIM[1],DIM[2],length(req_press_levels),DIM[4])
output_array<-ff(array(0.00,dim =output_DIM),dim =output_DIM)
result<- array(.Fortran("hus_vertical_interpolation",
m=as.integer(DIM[1]),
n=as.integer(DIM[2]),
o=as.integer(DIM[3]),
p=as.integer(DIM[4]),
req = as.integer(length(req_press_levels)),
pres=as.numeric(req_press_levels),
pressure_full_level=as.numeric(pressure[]),
hus_on_model_level=as.numeric(spec_hum_data[]),
hus_on_press_level=as.numeric(output_array[]))$hus_on_press_level,
dim =output_DIM)
DIMNAMES<-dimnames(spec_hum_data)
DIMNAMES[["lev"]]<-req_press_levels
Specific_humidity<- ff(result, dim = output_DIM,
dimnames =DIMNAMES )
rm(result)
#5============================================================================
# Writing NetCDF file of the interpolated values
}
Fortran subroutine:
subroutine hus_vertical_interpolation(m,n,o,p,req,pres, &
pressure_full_level,hus_on_model_level,hus_on_press_level)
implicit none
integer :: m,n,o,p,req
integer :: x,y,s,t,plev
double precision :: pres(req),hus_on_model_level(m,n,o,p)
double precision :: pressure_full_level(m,n,o,p)
double precision :: delta_hus,delta_p,grad_hus_p,diff_p
double precision, intent(out) :: hus_on_press_level(m,n,req,p)
real :: arg = -1.0,NaN
NaN= sqrt(arg)
do plev=1,req
do t=1,p
do x=1,m
do y=1,n
do s=1,o
!above uppest level
if(pres(plev) .LT. pressure_full_level(x,y,1,t)) then
hus_on_press_level(x,y,plev,t) = NaN
end if
! in between levels
if(pres(plev) .GE. pressure_full_level(x,y,s,t) .AND. pres(plev) .LE. &
pressure_full_level(x,y,s+1,t) ) then
delta_hus = hus_on_model_level(x,y,s,t) - hus_on_model_level(x,y,s+1,t)
delta_p = log(pressure_full_level(x,y,s,t))&
- log(pressure_full_level(x,y,s+1,t))
grad_hus_p = delta_hus /delta_p
diff_p = log(pres(plev)) - log(pressure_full_level(x,y,s,t))
hus_on_press_level(x,y,plev,t) = hus_on_model_level(x,y,s,t)&
+ grad_hus_p * diff_p
end if
! extrapolation below the ground
if(pres(plev) .GT. pressure_full_level(x,y,o,t)) then
hus_on_press_level(x,y,plev,t) = hus_on_model_level(x,y,o,t)
end if
end do
end do
end do
end do
end do
end subroutine hus_vertical_interpolation
Fortran subroutine was compiled with:
gfortran -fPIC -shared -ffree-form vintp2p_afterburner_hus.f90 -o vintp2p_afterburner_hus.so
The error behaviour is unpredictable: it can happen at index 1, 2, 8, etc. of the loop. We tried handing the big array to the Fortran subroutine as the last argument, which reduced how often the error occurs.
Also, the NetCDF files are ~2 GB each. Another point to mention: the modules are built with EasyBuild, so conflicts are unlikely, according to the HPC support team. We have tried every solution we know of, with no progress!
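One detail of the subroutine is worth checking: the loop `do s=1,o` evaluates `pressure_full_level(x,y,s+1,t)`, so for s = o it reads one element past the declared extent o. Out-of-bounds reads of this kind often produce intermittent, machine-dependent segfaults, although whether that is the cause here is an assumption; recompiling with gfortran's `-fcheck=bounds` would confirm it. The following Python sketch is not the original code. It shows the same log-pressure interpolation for a single column with the inner loop stopped one level early, and the name `interpolate_column` and the sample values are purely illustrative.

```python
import math

def interpolate_column(pres_levels, p_full, hus):
    """Interpolate one column of hus, given on model levels p_full
    (ordered from the top level to the ground), onto pres_levels,
    linearly in log(pressure)."""
    o = len(p_full)
    out = []
    for p in pres_levels:
        if p < p_full[0]:
            value = math.nan        # above the uppermost level
        elif p > p_full[-1]:
            value = hus[-1]         # below the ground: reuse the lowest level
        else:
            value = math.nan
            # Stop at o - 2 so that index s + 1 never leaves the array.
            for s in range(o - 1):
                if p_full[s] <= p <= p_full[s + 1]:
                    delta_hus = hus[s] - hus[s + 1]
                    delta_p = math.log(p_full[s]) - math.log(p_full[s + 1])
                    diff_p = math.log(p) - math.log(p_full[s])
                    value = hus[s] + delta_hus / delta_p * diff_p
        out.append(value)
    return out

# Illustrative values only: three model levels, three requested levels.
column = interpolate_column([50.0, 200.0, 500.0],
                            [100.0, 200.0, 400.0],
                            [1.0, 2.0, 3.0])
print(column)
```

In the Fortran version the equivalent change would be to run the inner loop over `s=1,o-1`.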
I've written a rudimentary algorithm in Fortran 95 to calculate the gradient of a function (an example of which is prescribed in the code) using central differences augmented with a procedure known as Richardson extrapolation.
function f(n,x)
! The scalar multivariable function to be differentiated
integer :: n
real(kind = kind(1d0)) :: x(n), f
f = x(1)**5.d0 + cos(x(2)) + log(x(3)) - sqrt(x(4))
end function f
!=====!
!=====!
!=====!
program gradient
!==============================================================================!
! Calculates the gradient of the scalar function f at x=0 using a finite !
! difference approximation, with a low order Richardson extrapolation. !
!==============================================================================!
parameter (n = 4, M = 25)
real(kind = kind(1d0)) :: x(n), xhup(n), xhdown(n), d(M), r(M), dfdxi, h0, h, gradf(n)
h0 = 1.d0
x = 3.d0
! Loop through each component of the vector x and calculate the appropriate
! derivative
do i = 1,n
! Reset step size
h = h0
! Carry out M successive central difference approximations of the derivative
do j = 1,M
xhup = x
xhdown = x
xhup(i) = xhup(i) + h
xhdown(i) = xhdown(i) - h
d(j) = ( f(n,xhup) - f(n,xhdown) ) / (2.d0*h)
h = h / 2.d0
end do
r = 0.d0
do k = 3,M
r(k) = ( 64.d0*d(k) - 20.d0*d(k-1) + d(k-2) ) / 45.d0
if ( abs(r(k) - r(k-1)) < 0.0001d0 ) then
dfdxi = r(k)
exit
end if
end do
gradf(i) = dfdxi
end do
! Print out the gradient
write(*,*) " "
write(*,*) " Grad(f(x)) = "
write(*,*) " "
do i = 1,n
write(*,*) gradf(i)
end do
end program gradient
In single precision it runs fine and gives me decent results. But when I try to change to double precision as shown in the code, I get an error when trying to compile claiming that the assignment statement
d(j) = ( f(n,xhup) - f(n,xhdown) ) / (2.d0*h)
is producing a type mismatch real(4)/real(8). I have tried several different declarations of double precision, appended every appropriate double precision constant in the code with d0, and I get the same error every time. I'm a little stumped as to how the function f is possibly producing a single precision number.
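For readers unfamiliar with the scheme, here is a minimal Python sketch of the same idea: central differences with the step halved each time, combined as (64*d(k) - 20*d(k-1) + d(k-2))/45, which is two rounds of Richardson extrapolation. It is an illustration, not a translation of the program above. The name `richardson_gradient` is made up for this sketch, and unlike the Fortran code it does not compare the first extrapolated value against an initial zero.

```python
import math

def f(x):
    # Same test function as in the question.
    return x[0] ** 5 + math.cos(x[1]) + math.log(x[2]) - math.sqrt(x[3])

def richardson_gradient(func, x, h0=1.0, max_steps=25, tol=1e-4):
    """Approximate the gradient of func at x (needs max_steps >= 3)."""
    grad = []
    for i in range(len(x)):
        h = h0
        d = []            # central-difference estimates for h0, h0/2, h0/4, ...
        previous = None   # last extrapolated value
        for _ in range(max_steps):
            up, down = list(x), list(x)
            up[i] += h
            down[i] -= h
            d.append((func(up) - func(down)) / (2.0 * h))
            h /= 2.0
            if len(d) < 3:
                continue
            # Combine the three most recent estimates.
            current = (64.0 * d[-1] - 20.0 * d[-2] + d[-3]) / 45.0
            converged = previous is not None and abs(current - previous) < tol
            previous = current
            if converged:
                break
        grad.append(previous)
    return grad

gradient = richardson_gradient(f, [3.0, 3.0, 3.0, 3.0])
print(gradient)
```

The result can be compared with the analytic gradient (5*x1^4, -sin(x2), 1/x3, -1/(2*sqrt(x4))) evaluated at the same point.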
The problem is that f is not explicitly declared in your main program, so it is implicitly assumed to be single precision, which is the type real(4) for gfortran.
I completely agree with High Performance Mark's comment that you really should use implicit none in all your Fortran code, to make sure every object is explicitly declared. That way, you would have obtained a more appropriate error message about f not being explicitly declared.
Also, you could consider two more things:
Define your function within a module and import that module in the main program. It is a good practice to define all subroutines/functions within modules only, so that the compiler can make extra checks on number and type of the arguments, when you invoke the function.
You could (again in a module) introduce a constant for the precision and use it everywhere the kind of a real must be specified. Taking the example below, by changing only the line
integer, parameter :: dp = kind(1.0d0)
into
integer, parameter :: dp = kind(1.0)
you would change all your real variables from double to single precision. Also note the _dp suffix for the literal constants instead of the d0 suffix, which would automatically adjust their precision as well.
module accuracy
implicit none
integer, parameter :: dp = kind(1.0d0)
end module accuracy
module myfunc
use accuracy
implicit none
contains
function f(n,x)
integer :: n
real(dp) :: x(n), f
f = 0.5_dp * x(1)**5 + cos(x(2)) + log(x(3)) - sqrt(x(4))
end function f
end module myfunc
program gradient
use myfunc
implicit none
integer, parameter :: n = 4, M = 25
real(dp) :: x(n), xhup(n), xhdown(n), d(M), r(M), dfdxi, h0, h, gradf(n)
:
end program gradient
I cannot solve a problem in Scilab because it gets stuck on round-off errors. I get the message
!--error 9999
Error: Round-off error detected, the requested tolerance (or default) cannot be achieved. Try using bigger tolerances.
at line 2 of function scalpol called by :
at line 7 of function gram_schmidt_pol called by :
gram_schmidt_pol(a,-1/2,-1/2)
It's a Gram-Schmidt process whose scalar product is the integral, between -1 and 1, of the product of two functions and a weight.
gram_schmidt_pol is the process specialized for polynomials, and scalpol is the scalar product defined for polynomials.
a and b are the parameters of the weight, which is (1-x)^a*(1+x)^b.
The input is a matrix representing a set of vectors. It works well with the matrix [[1;2;3],[4;5;6],[7;8;9]], but it fails with the error message above on the matrix eye(2,2), and I need to run it on eye(9,9)!
I have looked for a "tolerance setting" in the menus. There are some in General->Preferences->Xcos->Simulation, but I don't believe they are what I want: I tried low settings (high tolerance) there and nothing changed.
So how can I solve this round-off problem?
Feel free to tell me if my message is unclear.
Thank you.
Edit: Code of the functions :
// function that evaluate a polynomial (vector of coefficients) in x
function [y] = pol(p, x)
y = 0
for i=1:length(p)
y = y + p(i)*x^(i-1)
end
endfunction
// weight function evaluated in x, parametrized by a and b
// (poids = weight in french)
function [y] = poids(x, a, b)
y = (1-x)^a*(1+x)^b
endfunction
// scalpol compute scalar product between polynomial p1 and p2
// using integrate, the weight and the pol functions.
function [s] = scalpol(p1, p2, a, b)
s = integrate('poids(x,a, b)*pol(p1,x)*pol(p2,x)', 'x', -1, 1)
endfunction
// norm associated to scalpol
function [y] = normscalpol(f, a, b)
y = sqrt(scalpol(f, f, a, b))
endfunction
// finally the gram schmidt process on a family of polynome
// represented by a matrix
function [o] = gram_schmidt_pol(m, a, b)
[n,p] = size(m)
o(1:n) = m(1:n,1)/(normscalpol(m(1:n,1), a, b))
for k = 2:p
s =0
for i = 1:(k-1)
s = s + (scalpol(o(1:n,i), m(1:n,k), a, b) / scalpol(o(1:n,i),o(1:n,i), a, b) .* o(1:n,i))
end
o(1:n,k) = m(1:n,k) - s
o(1:n,k) = o(1:n,k) ./ normscalpol(o(1:n,k), a, b)
end
endfunction
By default, Scilab's integrate routine tries to achieve absolute error at most 1e-8 and relative error at most 1e-14. This is reasonable, but its treatment of relative error does not take into account the issues that occur when the exact value is zero. (See How to calculate relative error when true value is zero?). For this reason, even the simple
integrate('x', 'x', -1, 1)
throws an error (in Scilab 5.5.1).
And this is what happens in the process of running your program: some integrals are zero. There are two solutions:
(A) Give up on the relative error bound, by specifying it as 1:
integrate('...', 'x', -1, 1, 1e-8, 1)
(B) Add some constant to the function being integrated, then subtract from the result:
integrate('100 + ... ', 'x', -1, 1) - 200
(The latter should work in most cases, though if the integral happens to be exactly -200, you'll have the same problem again)
The above works for gram_schmidt_pol(eye(2,2), -1/2, -1/2), but for larger inputs, say gram_schmidt_pol(eye(9,9), -1/2, -1/2), it throws the error "The integral is probably divergent, or slowly convergent".
It appears that the adaptive integration routine can't handle the functions of the kind you have. A fallback is to use the simple inttrap instead, which just applies the trapezoidal rule. Since at x=-1 and 1 the function poids is undefined, the endpoints have to be excluded.
function [s] = scalpol(p1, p2, a, b)
t = -0.9995:0.001:0.9995
y = poids(t,a, b).*pol(p1,t).*pol(p2,t)
s = inttrap(t,y)
endfunction
In order for this to work, other related functions must be vectorized (* and ^ changed to .* and .^ where necessary):
function [y] = pol(p, x)
y = 0
for i=1:length(p)
y = y + p(i)*x.^(i-1)
end
endfunction
function [y] = poids(x, a, b)
y = (1-x).^a.*(1+x).^b
endfunction
The result is guaranteed to work, though the precision may be a bit lower: you are going to get some numbers like 3D-16 which are actually zeros.
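To experiment with the trapezoidal fallback outside Scilab, here is a minimal Python sketch of the same scalar product. It mirrors pol, poids and scalpol from above on the grid -0.9995:0.001:0.9995. The parameter n and the way the grid is built are choices made for this sketch, not part of the Scilab code.

```python
def pol(p, x):
    # Evaluate the polynomial with coefficients p (lowest degree first) at x.
    return sum(c * x ** i for i, c in enumerate(p))

def poids(x, a, b):
    # Weight function (1-x)^a * (1+x)^b.
    return (1 - x) ** a * (1 + x) ** b

def scalpol(p1, p2, a, b, n=2000):
    # Symmetric grid -0.9995, -0.9985, ..., 0.9995 for n = 2000.
    # The endpoints -1 and 1 are excluded because the weight is undefined
    # there when a or b is negative.
    half_step = 1.0 / n
    t = [(2 * i - (n - 1)) * half_step for i in range(n)]
    y = [poids(ti, a, b) * pol(p1, ti) * pol(p2, ti) for ti in t]
    # Composite trapezoidal rule.
    return sum(0.5 * (t[i + 1] - t[i]) * (y[i] + y[i + 1])
               for i in range(n - 1))

# <1, x> vanishes by symmetry; <1, 1> approximates the integral of the weight.
print(scalpol([1, 0], [0, 1], -0.5, -0.5))
print(scalpol([1, 0], [1, 0], -0.5, -0.5))
```

For a = b = -1/2 the exact value of the second product is pi. The trapezoidal value comes out somewhat smaller because the grid stops short of the singular endpoints, which is the loss of precision mentioned above.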
I asked a question a few days ago here and got an answer that seems like it would work. It involves using linsolve to find the solutions to a system of equations that are all modulo p, where p is a non-prime integer.
However, when I try to run the commands from the provided answer, or the linsolve help page, I get an error saying linsolve doesn't support arguments of type 'sym'. Is using linsolve with sym variables only possible in R2013b? I've also tried it with my school's copy, which is R2012b. Here is the code I'm attempting to execute (from the answer at the above link):
A = [0 5 4 1;1 7 0 2;8 1 0 2;10 5 1 0];
b = [2946321;5851213;2563617;10670279];
s = mod(linsolve(sym(A),sym(b)),8)
And the output is:
??? Undefined function or method 'linsolve' for input arguments of type 'sym'.
I've also tried to use the function solve for this, however even if I construct the equations represented by the matrices A and b above, I'm having issues. Here's what I'm attempting:
syms x y z q;
solve(5*y + 4*z + q == 2946321, x + 7*y + 2*q == 5851213, 8*x + y + 2*q == 2563617, 10*x + 5*y + z == 10670279,x,y,z,q)
And the output is:
??? Error using ==> char
Conversion to char from logical is not possible.
Error in ==> solve>getEqns at 169
vc = char(v);
Error in ==> solve at 67
[eqns,vars] = getEqns(varargin{:});
Am I using solve wrong? Should I just try to execute my code in R2013b to use linsolve with symbolic data types?
The Symbolic Math toolbox has changed a lot (for the better) over the years. You might not have sym/linsolve, but does this work?
s = mod(sym(A)\sym(b),8)
That will basically do the same thing. sym/linsolve just does some extra input checking and rank calculation to mirror the capabilities of linsolve.
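As a rough cross-check outside MATLAB, the exact solve followed by a reduction modulo 8 can be sketched in Python with rational arithmetic. This is only an analogue of the symbolic backslash, not how the toolbox implements it, and `solve_exact` is a name made up for this sketch.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

A = [[0, 5, 4, 1], [1, 7, 0, 2], [8, 1, 0, 2], [10, 5, 1, 0]]
b = [2946321, 5851213, 2563617, 10670279]
x = solve_exact(A, b)
s = [xi % 8 for xi in x]
print(s)
```

If the exact solution is not integer-valued, the remainders come out as fractions. In that case solving over the rationals and reducing afterwards is not the same as solving the congruences directly.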
You're using solve correctly for current versions, but it looks like R2010b may not understand the == operator (sym/eq) in this context. You can use the old string format to specify your equations:
eqs = {'5*y + 4*z + q = 2946321',...
'x + 7*y + 2*q = 5851213',...
'8*x + y + 2*q = 2563617',...
'10*x + 5*y + z = 10670279'};
vars = {'x','y','z','q'};
[x,y,z,q] = solve(eqs{:},vars{:})
What happens for a global variable when running in the parallel mode?
I have a global variable, "to_be_optimized_parameterIndexSet", which is a vector of indexes that should be optimized using gamultiobj and I have set its value only in the main script(nowhere else).
My code works properly in serial mode but when I switch to parallel mode (using "matlabpool open" and setting proper values for 'gaoptimset' ) the mentioned global variable becomes empty (=[]) in the fitness function and causes this error:
??? Error using ==> parallel_function at 598
Error in ==> PF_gaMultiFitness at 15 [THIS LINE: constants(to_be_optimized_parameterIndexSet) = individual;]
In an assignment A(I) = B, the number of elements in B and
I must be the same.
Error in ==> fcnvectorizer at 17
parfor (i = 1:popSize)
Error in ==> gamultiobjMakeState at 52
Score =
fcnvectorizer(state.Population(initScoreProvided+1:end,:),FitnessFcn,numObj,options.SerialUserFcn);
Error in ==> gamultiobjsolve at 11
state = gamultiobjMakeState(GenomeLength,FitnessFcn,output.problemtype,options);
Error in ==> gamultiobj at 238
[x,fval,exitFlag,output,population,scores] = gamultiobjsolve(FitnessFcn,nvars, ...
Error in ==> PF_GA_mainScript at 136
[x, fval, exitflag, output] = gamultiobj(@(individual)PF_gaMultiFitness(individual, initialConstants), ...
Caused by:
Failure in user-supplied fitness function evaluation. GA cannot continue.
I have checked all the code to make sure I don't change this global variable anywhere else.
I have a quad-core processor.
Where is the bug? Any suggestions?
EDIT 1: The MATLAB code in the main script:
clc
clear
close all
format short g
global simulation_duration % PF_gaMultiFitness will use this variable
global to_be_optimized_parameterIndexSet % PF_gaMultiFitness will use this variable
global IC stimulusMoment % PF_gaMultiFitness will use these variables
[initialConstants IC] = oldCICR_Constants; %initialize state
to_be_optimized_parameterIndexSet = [21 22 23 24 25 26 27 28 17 20];
LB = [ 0.97667 0.38185 0.63529 0.046564 0.23207 0.87484 0.46014 0.0030636 0.46494 0.82407 ];
UB = [1.8486 0.68292 0.87129 0.87814 0.66982 1.3819 0.64562 0.15456 1.3717 1.8168];
PopulationSize = input('Population size? ') ;
GaTimeLimit = input('GA time limit? (second) ');
matlabpool open
nGenerations = inf;
options = gaoptimset('PopulationSize', PopulationSize, 'TimeLimit',GaTimeLimit, 'Generations', nGenerations, ...
'Vectorized','off', 'UseParallel','always');
[x, fval, exitflag, output] = gamultiobj(@(individual)PF_gaMultiFitness(individual, initialConstants), ...
length(to_be_optimized_parameterIndexSet),[],[],[],[],LB,UB,options);
matlabpool close
% ... some other code to show the results ...
The MATLAB code of the fitness function, "PF_gaMultiFitness":
function objectives =PF_gaMultiFitness(individual, constants)
global simulation_duration IC stimulusMoment to_be_optimized_parameterIndexSet
%THIS FUNCTION RETURNS MULTI OBJECTIVES AND PUTS EACH OBJECTIVE IN A COLUMN
constants(to_be_optimized_parameterIndexSet) = individual;
[smcState , ~, Time]= oldCICR_CompCore(constants, IC, simulation_duration,2);
targetValue = 1; % [uM]desired [Ca]i peak concentration
afterStimulus = smcState(Time>stimulusMoment,14); % values of [Ca]i after stimulus
peak_Ca_value = max(afterStimulus); % smcState(:,14) is [Ca]i
if peak_Ca_value < 0.8 * targetValue
objectives(1,1) = inf;
else
objectives(1, 1) = abs(peak_Ca_value - targetValue);
end
pkIDX = peakFinder(afterStimulus);
nPeaks = sum(pkIDX);
if nPeaks > 1
peakIndexes = find(pkIDX);
period = Time(peakIndexes(2)) - Time(peakIndexes(1));
objectives(1,2) = 1e5* 1/period;
elseif nPeaks == 1 && peak_Ca_value > 0.8 * targetValue
objectives(1,2) = 0;
else
objectives(1,2) = inf;
end
end
Global variables do not get passed from the MATLAB client to the workers executing the body of the PARFOR loop. The only data that does get sent into the loop body are variables that occur in the text of the program. This blog entry might help.
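A common way around this is to stop relying on the global and hand the index set to the fitness function as an ordinary argument, bound in when the function handle is created (in MATLAB, a variable referenced inside the anonymous function is captured with it). The Python sketch below only illustrates that pattern. The function body, the placeholder objective and the sample values are hypothetical and are not taken from PF_gaMultiFitness.

```python
from functools import partial

def fitness(individual, constants, index_set):
    # Work on a copy so the shared baseline is never modified.
    updated = list(constants)
    for idx, value in zip(index_set, individual):
        updated[idx] = value
    # Placeholder objective, for illustration only.
    return sum(updated)

constants = [0.0] * 5
index_set = [1, 3]

# Bind the extra data into the callable instead of reading a global.
# Whatever runs bound_fitness receives the data together with the function.
bound_fitness = partial(fitness, constants=constants, index_set=index_set)
print(bound_fitness([10.0, 20.0]))
```

The MATLAB counterpart would be to add the index set as a third input of the fitness function and pass it through the anonymous function given to gamultiobj, which assumes you are free to change the function's signature.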
It really depends on the type of variable you're passing in. I would need to see more of your code to point out the flaw, but in general it is good practice not to assume that complicated variables will be passed to each worker. In other words, anything more than a primitive may need to be reinitialized inside a parallel routine, or may need specific function calls (like using feval for function handles).
My advice: RTM