I'm trying to generate predictions of the column "dubina" using the algorithm below, built with R's "neuralnet" package, but I keep getting unreliable neural-net output. I have tried changing the number of hidden layers and normalizing and denormalizing the data. Is there a mistake in the algorithm, maybe in the activation function (neuralnet's default is the logistic sigmoid)?
My dataset:
http://www26.zippyshare.com/v/Nc2Sbuny/file.html
Here is the algorithm:
#Data input
podatci <- read.csv("rs5.csv", sep = ";", header = TRUE)
#Division into train and test data
smp_size <- floor(0.75 * nrow(podatci))
smp_size
train_ind <- sample(seq_len(nrow(podatci)), size = smp_size)
train <- podatci[train_ind, ]
test <- podatci[-train_ind, ]
train
#Normalization
train_normal <- (train - min(train)) / (max(train) - min(train))
test_normal <- (test - min(test)) / (max(test) - min(test))
#Training of the network
install.packages("neuralnet")
library("neuralnet")
n <- names(train_normal)
f <- as.formula(paste("dubina ~", paste(n[!n %in% "dubina"], collapse = " + ")))
net_tr <- neuralnet(f, data = train_normal, hidden = c(3, 2), linear.output = TRUE)
plot(net_tr)
#Testing of the network
pr.nn <- compute(net_tr, test_normal[, 1:2])
ls(pr.nn)
print(pr.nn$net.result)
#Printing
cleanoutput <- cbind(test_normal, as.data.frame(pr.nn$net.result))
colnames(cleanoutput) <- c("x", "y", "dubina", "Neural Net Output")
print(cleanoutput)
#Denormalization of output data
denorm <- (min(test) + cleanoutput * (max(test) - min(test)))
denorm
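One thing worth checking in the normalization above: min() and max() applied to a whole data frame return a single global value, so every column (x, y and dubina) is scaled by the same coordinate-sized range, and the test set is scaled by its own range rather than the training range. A column-wise sketch of the alternative (variable names mirror the code above):
# Column-wise min-max scaling using only the training ranges (a sketch)
mins <- sapply(train, min)
rngs <- sapply(train, max) - mins
train_normal <- as.data.frame(scale(train, center = mins, scale = rngs))
test_normal  <- as.data.frame(scale(test,  center = mins, scale = rngs))
# Denormalize only the prediction, with the training range of dubina:
# pr.nn$net.result * rngs["dubina"] + mins["dubina"]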
Test sample (part of it; there are 743 rows in total, split 75/25):
x y dubina
6 6451993 5057986 0
7 6451993 5055986 0
15 6447993 5063986 0
17 6447993 5059986 0
25 6445993 5059986 0
27 6445993 5055986 0
28 6443993 5073986 1980
29 6443993 5071986 1910
30 6443993 5069986 1700
Neural-net output:
x y dubina Neural Net Output
6 6451993 5057986 0 3180.43153834
7 6451993 5055986 0 3342.40866485
15 6447993 5063986 0 2564.23694019
17 6447993 5059986 0 2888.11216362
25 6445993 5059986 0 2822.94367093
27 6445993 5055986 0 3146.83988044
28 6443993 5073986 1980 1624.49331419
29 6443993 5071986 1910 1786.36349347
30 6443993 5069986 1700 1948.24298143
I hope I don't have a big gap in my education.
I need to obtain the final best-fitting alpha (the learning rate) for the model, but I can't manage to get the function right.
I have a data that looks something like this:
ID Turn_no p_mean t_mean
1 1 170 99
1 2 176 93
1 3 138 92
1 4 172 118
1 5 163 96
1 6 170 105
1 7 146 99
1 8 172 94
and so on...
I want to use the equation:
p(turn) = p(turn-1) + alpha * (p(turn-1) - t(turn-1))
I'm pretty stuck on writing the function and its log-likelihood based on the Rescorla-Wagner model.
This is the function so far:
RWmodel <- function(data, par) {
  ll <- NA
  alpha <- par[1]
  ID <- data$ID
  Turn_no <- data$Turn_no
  p_mean <- data$p_mean
  t_mean <- data$t_mean
  num_reps <- length(data$Turn_no)  # was length(df$Turn_no); df does not exist inside the function
  for (i in 2:num_reps) {
    # calculate prediction error
    PE <- p_mean[i - 1] - t_mean[i - 1]
    # update p's value
    p_mean[i] <- p_mean[i - 1] + alpha * PE
  }
  # negative log-likelihood, using the sum and log functions
  ll <- -sum(log(??))
  # return ll
  ll
}
I know I'm missing an important step in the function; I just can't figure out how to compute the log-likelihood correctly in this situation.
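For what it's worth, here is one common way to complete such a function (a sketch only: it assumes normally distributed prediction errors with a free standard deviation sigma as a second parameter, which is my assumption, not something stated above), minimizing the negative log-likelihood with optim:
# Sketch: Rescorla-Wagner negative log-likelihood under ASSUMED Gaussian errors
RWmodel <- function(par, data) {      # par comes first so optim() can pass it
  alpha <- par[1]
  sigma <- par[2]                     # assumed error SD, estimated alongside alpha
  obs  <- data$p_mean
  targ <- data$t_mean
  pred <- numeric(length(obs))
  pred[1] <- obs[1]                   # no prediction error before the first turn
  for (i in 2:length(obs)) {
    PE <- pred[i - 1] - targ[i - 1]   # prediction error, as in the loop above
    pred[i] <- pred[i - 1] + alpha * PE
  }
  -sum(dnorm(obs[-1], mean = pred[-1], sd = sigma, log = TRUE))
}
# df: the data frame shown above; alpha is constrained to [0, 1]
fit <- optim(c(alpha = 0.1, sigma = 10), RWmodel, data = df,
             method = "L-BFGS-B", lower = c(0, 1e-6), upper = c(1, Inf))
fit$par  # best-fitting alpha (and sigma)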
I would like to know how to proceed with the following non-linear regression analysis, which is a simplified version of my real problem.
5 participants were asked to observe the speeds of three different cars (Audis, VWs and Porsches) over a ten-second time frame. This gives me the following data set:
S_t_c <- read.table(text = "
time S_c_1 S_c_2 S_c_3
1 20 15 40
2 45 30 50
3 60 45 60
4 75 60 60
5 90 70 60
6 105 70 90
7 120 70 120
8 125 70 140
9 130 70 160
10 145 70 180
",header = T)
After observing these 10 seconds, the 5 participants were asked to guess how fast each car would go at t = 11. This gives me this data:
S_11_i_c <-read.table(text = "
i c_1 c_2 c_3
1 150 70 190
2 155 70 200
3 150 75 195
4 160 80 190
5 150 75 180
",header = T)
I now want to run a non-linear regression to estimate the free parameters beta_0, beta_1 and beta_2 of the following model:
S_11,i,c = beta_0 + beta_1 * sum_{s=0..9} w_s * S_(10-s),c   with   w_s = beta_2^s / sum_{j=0..9} beta_2^j
The indices stand for the following:
i= participant
c=car brand
s=time
My problems are the sums, and the fact that I have to estimate the parameters based on three different observation sets (one per car). I do not know how to code sums into a regression, and I have trouble with the fact that my DVs depend on different time-series IVs. I would like to learn how to do this in R.
EDIT: Attempt at solving the problem.
What I managed to do so far is write w_s and Sum_S:
# Normalizer of the weights: sum of beta_2^j for j = 0..9
geom_sum <- function(beta_2) {
  total <- 0
  for (j in 0:9) {
    total <- total + beta_2^j
  }
  total
}
w_s <- beta_2^s / geom_sum(beta_2)  # needs values for beta_2 and s
# Running sums of the observed speeds, newest first: one row per s = 0..9, one column per car
Sum_S_t_c <- data.frame(
  s = 0:9,
  c_1 = numeric(10),
  c_2 = numeric(10),
  c_3 = numeric(10)
)
for (c in 2:4) {
  running <- 0
  for (s in 0:9) {
    running <- running + S_t_c[10 - s, c]
    Sum_S_t_c[s + 1, c] <- running
  }
}
Now I somehow need to fit these variables into a non-linear regression. This is my dummy code for it:
for (c in 2:4) {
  for (i in 1:5) {
    for (s in 0:9) {
      S_11_i_c ~ beta_0 + beta_1 * Sum_S_t_c[s, c] * beta_2^s / geom_sum(beta_2)
    }
  }
}
I also need to set an upper and a lower limit for beta_2, which I do not know how to do. I also wonder whether it is even possible to use a function within a regression.
Edit:
Should I group the DV and IVs somehow? If so, is it possible to group variables from two different data tables together?
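A sketch of one possible way to fit this (based on the model as written above; the helper names long, S_rev and wsum are mine, and the bounds on beta_2 are illustrative): nls() with algorithm = "port" accepts lower and upper limits.
# Response in long form: one row per participant x car
long <- data.frame(
  S11 = c(S_11_i_c$c_1, S_11_i_c$c_2, S_11_i_c$c_3),
  car = rep(1:3, each = 5)
)
# Reversed speed series S[10 - s, c] for s = 0..9, one column per car
S_rev <- sapply(2:4, function(cc) S_t_c[10:1, cc])
# Normalized geometric weights and the weighted sum, as a function of beta_2
wsum <- function(beta_2) {
  w <- beta_2^(0:9) / sum(beta_2^(0:9))
  as.vector(w %*% S_rev)               # one weighted sum per car
}
fit <- nls(S11 ~ beta_0 + beta_1 * wsum(beta_2)[car],
           data = long,
           start = list(beta_0 = 0, beta_1 = 1, beta_2 = 0.5),
           algorithm = "port",          # "port" supports box constraints
           lower = c(-Inf, -Inf, 0.001),
           upper = c(Inf, Inf, 1))
summary(fit)
Stacking the guesses into long format with a car index is also one way to combine variables from the two tables (re the edit above).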
My program divides my dataset into a train and a test set, builds a decision tree on the train set, evaluates it on the test set, and calculates the accuracy, sensitivity and specificity from the confusion matrix.
I added a for loop to rerun my program 100 times, which gives me 100 train and test sets. The output of the for loop is a result_df with columns for accuracy, sensitivity and specificity.
This is the for loop:
result_df <- matrix(ncol = 3, nrow = 100)
colnames(result_df) <- c("Acc", "Sens", "Spec")
for (g in 1:100) {
  # Divide into train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]
  # REST OF MY CODE
}
My result_df (first 20 rows) looks like this:
> result_df[1:20,]
Acc Sens Spec id
1 26 22 29 1
2 10 49 11 2
3 37 43 36 3
4 4 79 4 4
5 21 21 20 5
6 31 17 34 6
7 57 4 63 7
8 33 3 39 8
9 56 42 59 9
10 65 88 63 10
11 6 31 7 11
12 57 44 62 12
13 25 10 27 13
14 32 24 32 14
15 19 8 19 15
16 27 27 29 16
17 38 89 33 17
18 54 32 56 18
19 35 62 33 19
20 37 6 40 20
I use ggplot() to plot the sensitivity against the specificity as a scatterplot.
What I want to do:
I want to see, e.g., the train and test set behind data point 17.
I think I could do this with the set.seed function, but I am very unfamiliar with it.
First, clearly, if your code stored the estimated models, e.g. in a list, then you could recover your data from those models. However, it doesn't look like that's the case.
With your current code, all you can do is see the last train and test sets (number 100), because you keep redefining the test, train and train_ind variables. The cheapest (in terms of memory) way to achieve what you want would be to store train_ind from each iteration. For instance, you could use
train_inds <- vector("list", 100)  # pre-allocate a list with one slot per iteration
for (g in 1:100) {
  smp_size <- floor(0.8 * nrow(mydata1))
  train_inds[[g]] <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_inds[[g]], ]
  test <- mydata1[-train_inds[[g]], ]
  # The rest
}
and in this way you would always know which observations were in which set. If you are interested in only one specific iteration, you could save only that one.
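For example, to revisit the split behind data point 17 afterwards:
# Recreate the train/test split of iteration 17 from the stored indices
train17 <- mydata1[train_inds[[17]], ]
test17  <- mydata1[-train_inds[[17]], ]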
Lastly, set.seed isn't really going to help here. If all you were doing was running rnorm(1) a hundred times, then yes, by using set.seed you could quickly recover the n-th generated value later. In your case, however, you are not only using sample for train_ind; the model estimation functions are also very likely generating random values.
I would like to produce nested tables for a multilevel factorial experiment. I have 10 paints examined for time to reach an end point under 4 levels of humidity, 3 temperatures and 2 wind speeds. Of course I have searched online, but without success.
Some sample code can be generated using:
## Made-up data. NB: the real data is continuous, whereas observations were made at 40/168, so the data is censored.
time3 <- 4 * (1:24)  # Dependent: times in hrs; made-up values, not really representative, but they will do
wind <- c(1, 2)           # Independent: factor, draught on or off
RH <- c(0, 35, 75, 95)    # Independent: value for RH, but can be processed as a factor
temp <- c(5, 11, 20)      # Independent: value for temperature, but can be processed as a factor
paint <- c("paintA", "paintB", "paintC")  # Independent: experimental material
# Combine into a data frame
dfa <- data.frame(rep(temp, 8))
dfa$RH <- rep(RH, 6)
dfa$wind <- rep(wind, 12)
dfa$time3 <- time3
dfa$paint <- rep(paint[1], 24)
# Replicate for the other paints
dfb <- dfa
dfb$paint <- paint[2]
dfc <- dfa
dfc$paint <- paint[3]
dfx <- do.call("rbind", list(dfa, dfb, dfc))
# Rename the first column
colnames(dfx)[1] <- "temp"
# Prepare xtabs tables
tx <- xtabs(time3 ~ wind + RH + temp + paint, data = dfx)
tx
And the target I hope to obtain would be like this xtabs example. This
tx <- xtabs(time3 ~ wind + RH + temp, data = dfx)
does not work well enough. I would also like to write the result to C:\file.csv for printing and reporting. Please advise on how to achieve the desired output.
You can paste the two variables you want to nest together. Since the items will be ordered lexicographically, you will need to zero-pad the temp variable, to get numerical ordering.
xtabs(time3 ~ wind + paste(sprintf("%02d", temp), RH, sep = ":") + paint, dfx)
, , paint = paintA
paste(sprintf("%02d", temp), RH, sep = ":")
wind 05:0 05:35 05:75 05:95 11:0 11:35 11:75 11:95 20:0 20:35 20:75 20:95
1 56 0 104 0 88 0 136 0 120 0 72 0
2 0 128 0 80 0 64 0 112 0 96 0 144
, , paint = paintB
paste(sprintf("%02d", temp), RH, sep = ":")
wind 05:0 05:35 05:75 05:95 11:0 11:35 11:75 11:95 20:0 20:35 20:75 20:95
1 56 0 104 0 88 0 136 0 120 0 72 0
2 0 128 0 80 0 64 0 112 0 96 0 144
, , paint = paintC
paste(sprintf("%02d", temp), RH, sep = ":")
wind 05:0 05:35 05:75 05:95 11:0 11:35 11:75 11:95 20:0 20:35 20:75 20:95
1 56 0 104 0 88 0 136 0 120 0 72 0
2 0 128 0 80 0 64 0 112 0 96 0 144
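On the C:\file.csv part of the question, which the answer above does not cover (a sketch; an xtabs result is a table object, so base R's table tools apply):
ftable(tx)  # nested "flat" layout for on-screen reporting
# Long format (wind, RH, temp, paint, Freq) for writing out and further processing
write.csv(as.data.frame(tx), "C:/file.csv", row.names = FALSE)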
I am trying to carry out diagnostics on the mixed-effects logistic regression model below.
mod <- glmer(CEever ~ (1 | SL)   # glmer, not lmer: current lme4 fits binomial models with glmer
             + birthWeightCat
             + AFno
             + FRAgeY * factor(genCat)
             + damGirBir
             + factor(YNSUPPLEM),
             data = Data, family = "binomial")
The data for this model is in the form:
head(Data)
CalfID CEever birthWeightCat AFno FRAgeY damGirBir YNSUPPLEM
305 CA010110001 1 <20 2 48 140.0 1
306 CA010110002 1 21-25 1 45 144.0 0
307 CA010110004 0 21-25 1 47 151.5 0
308 CA010110005 0 <20 2 71 147.0 0
309 CA010110006 0 <20 1 57 141.5 1
310 CA010110007 0 <20 1 53 141.5 1
I can plot the residuals:
res <- resid(mod)
plot(res)
... but I can't get values for leverage, Cook's distance or DFBETAs.
Firstly, are these techniques useful with this model type? And if so, what code have people used to get these values?
Have a look at the influence.ME package on CRAN.
alt.est <- influence(mod, group = "SL")
will produce an estex object from which you may derive DFBETAS, Cook's distance, etc.
alt.est.cooks <- cooks.distance(alt.est)
alt.est.dfB <- dfbetas(alt.est)
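influence.ME also has a plot method for estex objects if a quick visual check is wanted (a hedged pointer; the available arguments may vary by package version):
plot(alt.est, which = "cook")  # Cook's distances per SL group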