I'm trying to predict the gender of samples for which that value is missing, using gene expression data.
First I train a logistic regression model on the samples that don't have missing values (i.e. that are labeled male or female). This is the data (called mydata) I'm using to train the model; in the Gender feature, 1 is male and 0 is female:
structure(list(CA5B = c(30.8594477594147, 30.8773853294407, 31.5109543268185,
29.852812443292, 31.9303544611987, 32.1541109784662, 32.6520127984013,
32.9726252284503, 31.4152036112846, 32.6206677736732), DDX3X = c(35.25792,
35.17134, 36.28966, 36.08013, 36.2734, 35.60448, 36.01073, 36.28618,
35.42917, 35.85764), EIF1AX = c(32.12871, 31.99721, 33.5218,
34.90091, 33.33981, 33.07818, 32.95223, 34.47241, 31.50087, 32.53821
), VAX2 = c(26.0371, 23.2217, 19.53356, 23.92908, 22.51166, 22.45692,
23.62209, 19.53356, 19.53356, 19.53356), KLRC1 = c(30.35354,
28.63985, 25.67501, 26.18108, 30.0377, 29.63008, 25.20041, 28.79883,
30.04889, 31.12243), KLRC2 = c(30.69315, 29.72534, 23.88161,
28.60153, 30.28375, 28.74612, 24.03185, 25.71121, 28.1028, 30.75633
), ARSD = c(31.6010966942421, 31.2081406187661, 32.525989520392,
33.4006989772133, 31.8554455039159, 32.3438989185126, 32.103684088194,
32.2785447752453, 32.028984695614, 31.5829276898759), DDX43 = c(29.90975,
28.0152, 26.15494, 25.70774, 26.4806, 27.44477, 30.52285, 31.97889,
31.50345, 26.90941), RPS4Y1 = c(35.94301, 36.79795, 38.03506,
26.53381, 29.87951, 37.13222, 35.91265, 26.53172, 35.37051, 37.71164
), TRAPPC2 = c(31.73251, 32.12647, 32.91964, 33.16043, 32.28315,
33.24194, 31.20461, 31.56589, 32.482, 34.21314), SNCG = c(28.78017,
33.80945, 31.28264, 35.49992, 31.63203, 29.34577, 29.78785, 30.73165,
29.9412, 26.04425), KDM6A = c(34.19294, 34.71109, 33.94433, 34.64027,
34.93768, 34.25181, 34.2198, 34.88605, 33.38825, 34.8068), ZFX = c(33.84244,
34.04817, 33.83408, 34.90102, 34.77175, 33.54326, 34.39611, 34.50292,
33.27768, 33.87074), PNPLA4 = c(31.15101, 31.32295, 33.38545,
34.34879, 30.98438, 32.77684, 31.26002, 32.36503, 31.15222, 32.12835
), KDM5C = c(33.6612, 34.3589, 33.50819, 34.56994, 34.46354,
33.27832, 34.10299, 34.48084, 34.4775, 34.5186), SMC1A = c(34.18368,
33.39101, 34.2632, 34.28327, 34.15166, 33.94223, 34.71688, 34.61705,
33.99106, 33.76364), DDX3Y = c(34.14224, 34.8835, 34.7245, 26.66744,
29.06797, 34.71189, 33.96947, 26.66531, 34.68055, 34.48187),
SYAP1 = c(32.03834, 32.42337, 32.51431, 33.51916, 32.82407,
32.4735, 32.49154, 33.51064, 31.29551, 31.83166), Gender = c(1,
1, 1, 0, 0, 1, 1, 1, 1, 1)), row.names = c("EA595454", "EA595500",
"EA595522", "EA595529", "EA595597", "EA595624", "EA595632", "EA595635",
"EA595647", "EA595654"), class = "data.frame")
Code:
library(caTools)  # provides sample.split
# split on the label vector, not on the whole data frame
split = sample.split(mydata$Gender, SplitRatio = 0.8)
train_reg = subset(mydata, split == TRUE)
test_reg = subset(mydata, split == FALSE)
logistic_model = glm(Gender ~ ., data = train_reg, family = binomial)
predict_reg = predict(logistic_model, test_reg, type = "response")
predict_reg = ifelse(predict_reg > 0.5, 1, 0)
This produced an AUC of 0.75 on the test set. Not bad.
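The AUC computation itself is not shown above. As an aside, here is a minimal sketch of how it is typically computed (in Python with scikit-learn, with made-up labels and probabilities); note that the score should be computed from the predicted probabilities, not from labels already thresholded at 0.5:
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1]              # made-up true test-set genders
y_prob = [0.2, 0.6, 0.4, 0.8, 0.9]    # made-up predicted probabilities
print(roc_auc_score(y_true, y_prob))  # 0.8333... for these values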
Then I take only the samples whose gender is missing, and predict whether they are male or female using the model:
pred = predict(logistic_model, mydata_NA_samples)
These are some of the results I get:
Pt1 Pt10 Pt101 Pt103 Pt106 Pt11 Pt17 Pt18
1548291146811975 -443770882316732 100625892356271 420508521495519 1756507132742650 -883868739619674 -262910227380331 2442533193074350
Pt2 Pt24 Pt26 Pt27 Pt28 Pt29 Pt3 Pt30
569411355627798 1699537030844227 -703783585812457 3495433064250008 -399805416449645 -339035064434972 2024260475793067 109885153661113
Pt31 Pt34 Pt36 Pt37 Pt38 Pt39 Pt4 Pt44
-367070086585505 1330361581729001 1740587250736183 3489930082447853 -976790159879838 1751865170092986 -283113980482947 1902539723154004
Pt46 Pt47 Pt48 Pt49 Pt5 Pt52 Pt59 Pt62
1412716353779596 1108256151592894 1074657527777400 -113959545517722 109187189819909 -57895108035064 792635620314 255566834903770
Pt65 Pt66 Pt67 Pt72 Pt77 Pt78 Pt79 Pt8
-46167159563698 -346701109064255 51185327645114 -795349064523229 244860086302444 4635500642717655 926236606202554 645399266579567
Pt82 Pt84 Pt85 Pt89 Pt9 Pt90 Pt92 Pt94
-651113408988261 -641572344400162 -594901636707441 1514453985992888 -227744411687312 166300730517187 2842003327373200 2502780813663413
I mean, what is this? I'm supposed to get 0 or 1, and maybe some very small numbers close to 0, but this is very strange. I should mention that mydata and mydata_NA_samples have exactly the same features, just different samples. How can this happen with logistic regression, which should only return a binary result in the first place?
Thanks!
You forgot to add type = "response" in the second predict call. Without it, predict.glm returns predictions on the scale of the linear predictor, i.e. the log-odds, which can be any real number. Use predict(logistic_model, mydata_NA_samples, type = "response") to get probabilities between 0 and 1, which you can then threshold at 0.5 exactly as you did for the test set.
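To see what is going on, here is a minimal sketch in Python/NumPy with made-up log-odds values (the arithmetic is language-agnostic): the inverse-logit, which type = "response" applies for you, maps any real-valued linear predictor onto a probability in [0, 1].
import numpy as np

# made-up link-scale predictions (log-odds); any real number is possible
eta = np.array([-2.3, 0.0, 1.7, 4.1])
prob = 1.0 / (1.0 + np.exp(-eta))   # inverse-logit: what type = "response" returns
labels = (prob > 0.5).astype(int)   # threshold at 0.5 for hard 0/1 labels
print(prob)    # approx. [0.0911 0.5 0.8455 0.9837]
print(labels)  # [0 1 1 1]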
I'm trying to fit a Neural ODE to a time series using Julia's DiffEqFlux. Here is my code:
u0 = Float32[2.;0]
train_size = 15
tspan_train = (0.0f0,0.75f0)
function trueODEfunc(du,u,p,t)
true_A = [-0.1 2.0; -2.0 -0.1]
du .= ((u.^3)'true_A)'
end
t_train = range(tspan_train[1],tspan_train[2],length = train_size)
prob = ODEProblem(trueODEfunc, u0, tspan_train)
ode_data_train = Array(solve(prob, Tsit5(),saveat=t_train))
dudt = Chain(
Dense(2,50,tanh),
Dense(50,2))
ps = Flux.params(dudt)
n_ode = NeuralODE(dudt, tspan_train, Tsit5(), saveat = t_train, reltol=1e-7, abstol=1e-9)
n_ode.p # inspect the parameters before training
function predict_n_ode(p)
n_ode(u0,p)
end
function loss_n_ode(p)
pred = predict_n_ode(p)
loss = sum(abs2, ode_data_train .- pred)
loss,pred
end
final_p = []
losses = []
cb = function(p,l,pred)
display(l)
display(p)
push!(final_p, p)
push!(losses,l)
pl = scatter(t_train, ode_data_train[1,:],label="data")
scatter!(pl,t_train,pred[1,:],label="prediction")
display(plot(pl))
end
DiffEqFlux.sciml_train!(loss_n_ode, n_ode.p, ADAM(0.05), cb = cb, maxiters = 100)
n_ode.p # inspect again after training: same values as before
The problem is that calling n_ode.p (or Flux.params(dudt)) before and after the train function gives me back the same values. I would have expected to get the latest updated values from the training. That's why I've created an array to gather all parameter values during the training, so I can access it afterwards to get the updated parameters.
Am I doing something wrong in the code? Does the train function automatically update the parameters? If not, how can I enforce it?
Thanks in advance!
sciml_train does not mutate n_ode.p in place; the result is an object that holds the best parameters. Here's a complete example:
using DiffEqFlux, OrdinaryDiffEq, Flux, Optim, Plots
u0 = Float32[2.; 0.]
datasize = 30
tspan = (0.0f0,1.5f0)
function trueODEfunc(du,u,p,t)
true_A = [-0.1 2.0; -2.0 -0.1]
du .= ((u.^3)'true_A)'
end
t = range(tspan[1],tspan[2],length=datasize)
prob = ODEProblem(trueODEfunc,u0,tspan)
ode_data = Array(solve(prob,Tsit5(),saveat=t))
dudt2 = FastChain((x,p) -> x.^3,
FastDense(2,50,tanh),
FastDense(50,2))
n_ode = NeuralODE(dudt2,tspan,Tsit5(),saveat=t)
function predict_n_ode(p)
n_ode(u0,p)
end
function loss_n_ode(p)
pred = predict_n_ode(p)
loss = sum(abs2,ode_data .- pred)
loss,pred
end
loss_n_ode(n_ode.p) # n_ode.p stores the initial parameters of the neural ODE
cb = function (p,l,pred;doplot=false) #callback function to observe training
display(l)
# plot current prediction against data
if doplot
pl = scatter(t,ode_data[1,:],label="data")
scatter!(pl,t,pred[1,:],label="prediction")
display(plot(pl))
end
return false
end
# Display the ODE with the initial parameter values.
cb(n_ode.p,loss_n_ode(n_ode.p)...)
res1 = DiffEqFlux.sciml_train(loss_n_ode, n_ode.p, ADAM(0.05), cb = cb, maxiters = 300)
cb(res1.minimizer,loss_n_ode(res1.minimizer)...;doplot=true)
res2 = DiffEqFlux.sciml_train(loss_n_ode, res1.minimizer, LBFGS(), cb = cb)
cb(res2.minimizer,loss_n_ode(res2.minimizer)...;doplot=true)
# result is res2 as an Optim.jl object
# res2.minimizer are the best parameters
# res2.minimum is the best loss
At the end, the sciml_train function returns a result object that holds information about the optimization, including the final parameters as .minimizer.
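In the question's terms: capture the return value, e.g. res = DiffEqFlux.sciml_train(loss_n_ode, n_ode.p, ADAM(0.05), cb = cb, maxiters = 100), and then read the trained parameters from res.minimizer; n_ode.p itself is never overwritten.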
Using R, it is very easy to approximate basic functions through a neural network:
library(nnet)
x <- sort(10*runif(50))
y <- sin(x)
nn <- nnet(x, y, size=4, maxit=10000, linout=TRUE, abstol=1.0e-8, reltol = 1.0e-9, Wts = seq(0, 1, by=1/12) )
plot(x, y)
x1 <- seq(0, 10, by=0.1)
lines(x1, predict(nn, data.frame(x=x1)), col="green")
predict( nn , data.frame(x=pi/2) )
A simple neural network with one hidden layer of a mere 4 neurons is sufficient to approximate a sine. (As per stackoverflow question Approximating function with Neural Network.)
But I cannot obtain the same result in PyTorch.
In fact, the neural network created by R contains not only an input neuron, four hidden neurons, and an output neuron, but also two "bias" neurons: the first connected to the hidden layer, the second to the output.
The plot of the network is obtained through the following:
library(devtools)
library(scales)
library(reshape)
source_url('https://gist.github.com/fawda123/7471137/raw/cd6e6a0b0bdb4e065c597e52165e5ac887f5fe95/nnet_plot_update.r')
plot.nnet(nn$wts,struct=nn$n, pos.col='#007700',neg.col='#FF7777') ### this plots the graph
plot.nnet(nn$wts,struct=nn$n, pos.col='#007700',neg.col='#FF7777', wts.only=1) ### this prints the weights
Attempting the same with PyTorch produces a different network: the bias neurons are missing.
The following is my attempt to do in PyTorch what was done previously in R. The results are not satisfactory: the function is not approximated. The most evident difference is the absence of the bias neurons.
import torch
import random
import math

N, D_in, H, D_out = 1000, 1, 4, 1

l_x = []
l_y = []
for a in range(N):
    r = random.random() * 10
    l_x.append([r])
    l_y.append([math.sin(r)])

x = torch.tensor(l_x)
y = torch.tensor(l_y)

# Only two weight matrices are learned; there are no bias terms anywhere.
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-5
for t in range(1000):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)   # ReLU hidden layer, no bias
    loss = (y_pred - y).pow(2).sum()
    if t < 10 or t % 100 == 1:
        print(t, loss.item())
    loss.backward()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

t = [[math.pi]]
print(str(t) + " -> " + str(torch.tensor(t).mm(w1).clamp(min=0).mm(w2).detach()))
t = [[math.pi / 2]]
print(str(t) + " -> " + str(torch.tensor(t).mm(w1).clamp(min=0).mm(w2).detach()))
How can I make the network approximate the given function (sine, in this case), either by inserting the "bias" neurons or by fixing some other missing detail?
Moreover, I have difficulty understanding why R inserts the "bias". I found information that the bias may be akin to the intercept in a regression model, but I still find it unclear. Any information would be appreciated.
EDIT: an excellent explanation turned out to be the stackoverflow question Role of Bias in Neural Networks.
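To make the intercept analogy concrete, here is a minimal PyTorch sketch (using the current nn.Linear API, which bundles the bias into the layer rather than drawing it as a separate neuron):
import torch
import torch.nn as nn

# nn.Linear computes y = x @ W.T + b; the bias b plays exactly the role of
# the intercept in a regression model. Without it, an all-zero input is
# forced to produce an all-zero output.
layer_with_bias = nn.Linear(1, 4, bias=True)   # learns W and b (the default)
layer_no_bias = nn.Linear(1, 4, bias=False)    # what the manual x.mm(w1) code does

x = torch.zeros(1, 1)                          # an input of all zeros
print(layer_with_bias(x))                      # can be nonzero, like an intercept
print(layer_no_bias(x))                        # always exactly zero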
EDIT: an example that obtains the result, though using the "fuller" framework ("not reinventing the wheel"), is as follows:
import torch
import math

N, D_in, H, D_out = 1000, 1, 4, 1

l_x = []
l_y = []
for a in range(N):
    t = (a / 1000.0) * 10
    l_x.append([t])
    l_y.append([math.sin(t)])

x = torch.tensor(l_x)
y = torch.tensor(l_y)

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        # nn.Linear layers carry their own bias terms by default
        self.to_hidden = torch.nn.Linear(n_feature, n_hidden)
        self.to_output = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = self.to_hidden(x)
        x = torch.tanh(x)   # activation function
        x = self.to_output(x)
        return x

net = Net(n_feature=D_in, n_hidden=H, n_output=D_out)

learning_rate = 0.01
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
for t in range(1000):
    y_pred = net(x)
    loss = (y_pred - y).pow(2).sum()
    if t < 10 or t % 100 == 1:
        print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

t = [[math.pi]]
print(str(t) + " -> " + str(net(torch.tensor(t))))
t = [[math.pi / 2]]
print(str(t) + " -> " + str(net(torch.tensor(t))))
Unfortunately, while this code works properly, it does not solve the original problem of making the more "low level" code work as expected (e.g. by introducing the bias).
Following up on @jdhao's comment, this is a super-simple PyTorch model that computes exactly what you want:
import torch
import torch.nn as nn

class LinearWithInputBias(nn.Linear):
    def __init__(self, in_features, out_features, out_bias=True, in_bias=True):
        # out_bias is the ordinary nn.Linear bias; in_bias adds a second,
        # independently learned bias vector on top of the layer's output
        nn.Linear.__init__(self, in_features, out_features, out_bias)
        if in_bias:
            in_bias = torch.zeros(1, out_features)
            # in_bias.normal_()  # if you want it to be randomly initialized
            self._out_bias = nn.Parameter(in_bias)

    def forward(self, x):
        out = nn.Linear.forward(self, x)
        try:
            out = out + self._out_bias
        except AttributeError:
            # no extra bias was requested in __init__
            pass
        return out
However, there's an additional bug in your code: from what I can see, you don't train it; that is, you never call an optimizer (like torch.optim.SGD(mod.parameters())) before you delete the gradient information by calling grad.data.zero_().
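Here is a minimal sketch of the pattern the answer is pointing at (with a hypothetical stand-in model and data): let an optimizer own both the parameter update and the gradient reset, instead of zeroing the gradients by hand right after backward():
import torch

net = torch.nn.Linear(1, 1)                       # stand-in model
x, y = torch.randn(64, 1), torch.randn(64, 1)     # stand-in data
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)

for _ in range(100):
    optimizer.zero_grad()                 # clear old gradients first
    loss = (net(x) - y).pow(2).sum()      # same squared-error loss as the post
    loss.backward()                       # accumulate fresh gradients
    optimizer.step()                      # apply the SGD update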