I know how to make softmax stable by adding to element -max _i x_i. This avoids overflow and underflow.
Now, taking log of this can cause underflow. log softmax(x) can evaluate to zero, leading to -infinity.
I am not sure how to fix it. I know this is a common problem. I read several answers on it, which I didn't understand. But I am still confused on how to solve this problem.
PS: If you provide a simple example, it would be awesome.
In order to stabilize Logsoftmax, most implementations such as Tensorflow and Thenao, use a trick which takes out the largest component max(x_i). This trick is often used for stably computing softmax. For logsoftmax, we begin with:
After extracting out the exp(b) and using the fact that log(exp(x)) = x, we have:
If we set , this new equation has both overflow and underflow stability conditions.
In terms of code, if x is a vector:
def log_softmax(x):
x_off = x - np.max(x)
return x_off - np.log(np.sum(np.exp(x_off)))
See also: https://timvieira.github.io/blog/post/2014/02/11/exp-normalize-trick/
logsoftmax = logits - log(reduce_sum(exp(logits), dim))
refer: https://www.tensorflow.org/api_docs/python/tf/nn/log_softmax
Just use this as it take care of Nan
tf.nn.softmax_cross_entropy_with_logits(
labels, logits, axis=-1, name=None
)
logits = tf.constant([[4, 5, 1000]], dtype = tf.float32)
labels = tf.constant([[1,0,1]], dtype = tf.float32)
# Case-1
output = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(output)
>>> tf.Tensor([996.], shape=(1,), dtype=float32)
#Case-2
a = tf.nn.softmax(logits)
output = tf.reduce_sum(-(labels * tf.math.log(a)))
print(output)
>>> tf.Tensor(nan, shape=(), dtype=float32)
# this happens because value of softmax truncates to zero
print(a)
>>> <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[0., 0., 1.]], dtype=float32)>
Mathematical tricks cannot help you create log 0 be something other that -inf.
If you think it trough, the only way is you normalize the data so that you don't end in there.
Related
I am trying to use the DifferentialEquations.jl provided by julia, and it's working all right until I try to use it on a second order ODE.
Consider for instance the second order ODE
x''(t) = x'(t) + 2* x(t), with initial conditions
x'(0) = 0, x(0) = 1
which has an analytic solution given by: x(t) = 2/3 exp(-t) + 1/3 exp(2t).
To solve it numerically, I run the following code:
using DifferentialEquations;
function f_simple(ddu, du, u, p, t)
ddu[1] = du[1] + 2*u[1]
end;
du0 = [0.]
u0 = [1.]
tspan = (0.0,5.0)
prob2 = SecondOrderODEProblem(f_simple, du0, u0, tspan)
sol = solve(prob2,reltol=1e-8, abstol=1e-8);
With that,
sol(3)[2] = 122.57014434362732
whereas the analytic solution yields 134.50945587649028, and so I'm a bit lost here.
According to the the documentation for DifferentialEquations.jl, Vern7() is appropriate for high-accuracy solutions to non-stiff equations:
sol = solve(prob2, Vern7(), reltol=1e-8, abstol=1e-8)
julia> println(sol(3)[2])
134.5094558872943
On my machine, this matches the analytical solution quite closely. I'm not exactly sure what the default method used is: the documentation indicates that solve has some method of choosing an appropriate solver when one isn't specified.
For more information on Vern7(), check out Jim Verner's page on Runge-Kutta algorithms.
I've got a function, KozakTaper, that returns the diameter of a tree trunk at a given height (DHT). There's no algebraic way to rearrange the original taper equation to return DHT at a given diameter (4 inches, for my purposes)...enter R! (using 3.4.3 on Windows 10)
My approach was to use a for loop to iterate likely values of DHT (25-100% of total tree height, HT), and then use optimize to choose the one that returns a diameter closest to 4". Too bad I get the error message Error in f(arg, ...) : could not find function "f".
Here's a shortened definition of KozakTaper along with my best attempt so far.
KozakTaper=function(Bark,SPP,DHT,DBH,HT,Planted){
if(Bark=='ob' & SPP=='AB'){
a0_tap=1.0693567631
a1_tap=0.9975021951
a2_tap=-0.01282775
b1_tap=0.3921013594
b2_tap=-1.054622304
b3_tap=0.7758393514
b4_tap=4.1034897617
b5_tap=0.1185960455
b6_tap=-1.080697381
b7_tap=0}
else if(Bark=='ob' & SPP=='RS'){
a0_tap=0.8758
a1_tap=0.992
a2_tap=0.0633
b1_tap=0.4128
b2_tap=-0.6877
b3_tap=0.4413
b4_tap=1.1818
b5_tap=0.1131
b6_tap=-0.4356
b7_tap=0.1042}
else{
a0_tap=1.1263776728
a1_tap=0.9485083275
a2_tap=0.0371321602
b1_tap=0.7662525552
b2_tap=-0.028147685
b3_tap=0.2334044323
b4_tap=4.8569609081
b5_tap=0.0753180483
b6_tap=-0.205052535
b7_tap=0}
p = 1.3/HT
z = DHT/HT
Xi = (1 - z^(1/3))/(1 - p^(1/3))
Qi = 1 - z^(1/3)
y = (a0_tap * (DBH^a1_tap) * (HT^a2_tap)) * Xi^(b1_tap * z^4 + b2_tap * (exp(-DBH/HT)) +
b3_tap * Xi^0.1 + b4_tap * (1/DBH) + b5_tap * HT^Qi + b6_tap * Xi + b7_tap*Planted)
return(y=round(y,4))}
HT <- .3048*85 #converting from english to metric (sorry, it's forestry)
for (i in c((HT*.25):(HT+1))) {
d <- KozakTaper(Bark='ob',SPP='RS',DHT=i,DBH=2.54*19,HT=.3048*85,Planted=0)
frame <- na.omit(d)
optimize(f=abs(10.16-d), interval=frame, lower=1, upper=90,
maximum = FALSE,
tol = .Machine$double.eps^0.25)
}
Eventually I would like this code to iterate through a csv and return i for the best d, which will require some rearranging, but I figured I should make it work for one tree first.
When I print d I get multiple values, so it is iterating through i, but it gets held up at the optimize function.
Defining frame was my most recent tactic, because d returns one NaN at the end, but it may not be the best input for interval. I've tried interval=c((HT*.25):(HT+1)), defining KozakTaper within the for loop, and defining f prior to the optimize, but I get the same error. Suggestions for what part I should target (or other approaches) are appreciated!
-KB
Forestry Research Fellow, Appalachian Mountain Club.
MS, University of Maine
**Edit with a follow-up question:
I'm now trying to run this script for each row of a csv, "Input." The row contains the values for KozakTaper, and I've called them with this:
Input=read.csv...
Input$Opt=0
o <- optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='Input$Species',
DHT=x,
DBH=(2.54*Input$DBH),
HT=(.3048*Input$Ht),
Planted=0)),
lower=Input$Ht*.25, upper=Input$Ht+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
Input$Opt <- o$minimum
Input$Mht <- Input$Opt/.3048. # converting back to English
Input$Ht and Input$DBH are numeric; Input$Species is factor.
However, I get the error invalid function value in 'optimize'. I get it whether I define "o" or just run optimize. Oddly, when I don't call values from the row but instead use the code from the answer, it tells me object 'HT' not found. I have the awful feeling this is due to some obvious/careless error on my part, but I'm not finding posts about this error with optimize. If you notice what I've done wrong, your explanation will be appreciated!
I'm not an expert on optimize, but I see three issues: 1) your call to KozakTaper does not iterate through the range you specify in the loop. 2) KozakTaper returns a a single number not a vector. 3) You haven't given optimize a function but an expression.
So what is happening is that you are not giving optimize anything to iterate over.
All you should need is this:
optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='RS',
DHT=x,
DBH=2.54*19,
HT=.3048*85,
Planted=0)),
lower=HT*.25, upper=HT+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
$minimum
[1] 22.67713 ##Hopefully this is the right answer
$objective
[1] 0
Optimize will now substitute x in from lower to higher, trying to minimize the difference
I am calculating the solution of Chen's chaotic system using differential transform method. The code that I am using is:
x=zeros(1,7);
x(1)=-0.1;
y=zeros(1,7);
y(1)=0.5;
z=zeros(1,7);
z(1)=-0.6;
for k=0:5
x(k+2)=(40*gamma(1+k)/gamma(2+k))*(y(k+1)-x(k+1));
sum=0;
for l=0:k
sum=sum+x(l+1)*z(k+1-l);
end
y(k+2)=(gamma(1+k)/gamma(2+k))*(-12*x(k+1)-sum+28*y(k+1));
sum=0;
for l=0:k
sum=sum+x(l+1)*y(k+1-l);
end
z(k+2)=(gamma(1+k)/(1+k))*(sum-3*z(k+1));
end
s=fliplr(x);
t=0:0.05:2;
a=polyval(s,t);
plot(t,a)
What this code does is calculate x(k), y(k) and z(k) these are the coefficients of the polynomial that is approximating the solution.
The solution x(t) = sum_0^infinity x(k)t^k, and similarly the others. But this code doesn't give the desired output of a chaotic sequence the graph of x(t) that I am getting is:
This is not an answer, but a clearer and more correct (programmatically speaking) to write your loop:
for k = 1:6
x(k+1)=(40*1/k)*(y(k)-x(k));
temp_sum = sum(x(1:k).*z(k:-1:1),2);
y(k+1) = (1/k)*(-12*x(k)-temp_sum+28*y(k));
temp_sum = sum(x(1:k).*y(k:-1:1),2);
z(k+1) = (1/k)*(temp_sum-3*z(k));
end
The most important issue here is not overloading the built-in function sum (I replaced it with temp_sum. Other things include vectorization of the inner loops (using sum...), indexing that starts in 1 (instead of writing k+1 all the time), and removing unnecessary calls to gamma (gamma(k)/gamma(k+1) = 1/k).
I need to make a histogram, and my data points each carry a statistical weight. The standard hist function isn't equipped to handle this. I could of course import the numpy.histogram function, which handles weighted data just fine, but I thought it would be a good exercise in learning julia to try and augment the hist() function to accept weights as an optional (named) argument.
I started by looking at the julia source for hist(), and was able to modify it slightly (if amateurishly -- suggestions for improvements welcome), to get it sort of working:
function sturges(n) # Sturges' formula
n==0 && return one(n)
iceil(log2(n))+1
end
function weightedhist!{HT}(h::AbstractArray{HT}, v::AbstractVector, edg::AbstractVector; init::Bool=true, weights::AbstractVector = ones(HT,length(v)))
n = length(edg) - 1
length(weights) == length(v) || error("length(weights) must equal length(v)")
length(h) == n || error("length(h) must equal length(edg) - 1.")
if init
fill!(h, zero(HT))
end
for j=1:length(v)
i = searchsortedfirst(edg, v[j])-1
if 1 <= i <= n
h[i] += weights[j]
end
end
edg, h
end
weightedhist(v::AbstractVector, edg::AbstractVector; weights::AbstractVector = ones(Int,length(v))) = weightedhist!(Array(Float64, length(edg)-1), v, edg; weights=weights)
weightedhist(v::AbstractVector, n::Integer; weights::AbstractVector = ones(Int,length(v))) = weightedhist(v, histrange(v,n); weights=weights)
weightedhist(v::AbstractVector; weights::AbstractVector = ones(Int,length(v))) = weightedhist(v, sturges(length(v)); weights=weights)
If I generate some random data with
v = randn(10^5);
w = rand(length(v));
edges = floor(minimum(v)):0.1:ceil(maximum(v));
then weightedhist(v, edges; weights=w) agrees with numpy.histogram(v, edges, weights=w). If I leave out the optional keyword argument for weights, then weightedhist(v, edges) agrees with the built in hist(v, edges), and weightedhist(v) agrees with the built in hist(v), except for the fact that my function outputs floats rather than ints when no weights are provided.
I don't understand why this is the case (is h getting created as a float array? promoted?), and I'd like for the my function to fall back on the behavior of the built in one as closely as possible when no weights are provided.
Can anyone suggest why my function is outputting floats, and how I might change that behavior to output ints when no weights are provided? I'd like to do this without first creating the h array and then converting it from one type to another, since I'd like the code to be as fast as possible.
If I understand correctly, when you call
weightedhist(v, edges)
you are using the first of your three "extra" definitions at the bottom.
This calls
weightedhist!(Array(Float64, length(edg)-1), v, edg; weights=weights)
so in your "main" weightedhist! the HT parameterization will be Float64, so h will be filled with HT == Float64, hence the Float64 output. So changing it to Array(eltype(weights), length(edg)-1) would be sufficient, I believe.
How could one approximate Inverse Incomplete gamma function Г(s,x) by some simple analytical function f(s,Г)?
That means write something like x = f(s,Г) = 12*log(123.45*Г) + Г + 123.4^s .
(I need at least ideas or references.)
You can look at the code in Boost: http://www.boost.org/doc/libs/1_35_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_gamma/igamma.html and see what they're using.
EDIT: They also have inverses: http://www.boost.org/doc/libs/1_35_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_gamma/igamma_inv.html
I've found out that x = f(s,Г) with given s can be nicely approximated by x = p0*(1-Г)^p1*ln(Г*p2). At least it worked for me with s <= 15 in region 0.001 < Г < 0.999.
Here p0,p1,p2 - is constants, which are chosen by approximation of f(s,Г) after you have chosen s.
There's a pretty good implementation in Cephes. There's also a D translation that I think fixes a few bugs in the Cephes version.