I have a very large data set with around 700,000 rows and 26 predictor variables. My neural network cannot run on all of it because the default maximum number of weights is set to 100. I am trying to use MaxNWts to set it to 10000, but this gives me the following error. What can I do so it runs, even if it takes a bit longer?
trainingData=sample(1:nrow(data),0.7*nrow(data))
predictors=c(1:21,23:27)
myNet=nnet.formula(data[,22] ~ data$ï..PatientMRN + data$IsNewToProvider +
                     data$IsOverbooked + data$IsOverride + data$AppointmentDateandTime +
                     data$VisitType + data$DayofWeek + data$DepartmentSpecialty +
                     data$LeadDays,
                   data=data, subset=trainingData, size=7, MaxNWts = 10000)
I forgot to include this at first; here is the error message I get:
Error in nnet.default(x, y, w, ...) :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(x, y, w, ...) : NAs introduced by coercion
Instead of trying to change the maximum number of weights, you should first normalize your data column-wise (before splitting into train and test!) so that the maximal value of each column is 1.0. Then every value in the table satisfies 0 <= value <= 1.0, and your weights won't be huge.
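For example, a minimal sketch of such column-wise min-max scaling, assuming all predictor columns are numeric (non-numeric columns such as dates or IDs would need to be encoded first):
# Min-max scale every numeric column to [0, 1] before the train/test split.
normalize01 <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) return(rep(0, length(x)))  # constant column: avoid division by zero
  (x - min(x)) / rng
}
numericCols <- sapply(data, is.numeric)
data[numericCols] <- lapply(data[numericCols], normalize01)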
I'm trying to compute the covariance matrix of a very large image data matrix. I have tried both
cov(data)
and
data %*% t(data)/ (nrow(t(data))-1)
and ended up with a matrix of NaN values, which makes absolutely no sense. The size of the covariance matrix is correct, but the all-NaN values are not. If I try
cov(data)
and
t(data) %*% data/ (nrow(data)-1)
I get an error message saying
Error: cannot allocate vector of size ...
I have also tried using bigcor(), but I get this error every time:
Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 0 and .Machine$integer.max") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In ff(vmode = "double", dim = c(NCOL, NCOL)) :
NAs introduced by coercion to integer range
Any idea of what could be causing this and how to fix it?
I'm following this tutorial:
https://rpubs.com/dherrero12/543854
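As an aside on the manual formula above: cov() centers each column before taking the cross-product, so a manual computation only matches cov(data) if the columns are centered first. A minimal sketch, assuming data is a numeric n-by-p matrix with observations in rows and no NAs:
centered <- scale(data, center = TRUE, scale = FALSE)      # subtract column means
manualCov <- t(centered) %*% centered / (nrow(data) - 1)   # or crossprod(centered)/(n-1)
all.equal(manualCov, cov(data), check.attributes = FALSE)  # TRUE when data is clean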
I am trying to impute missing values but keep getting a "system is computationally singular" error. Hence, I am trying to drop collinear variables.
I tried the following code:
indexesToDrop <- findCorrelation(cor(df_before, use = "pairwise.complete.obs"), cutoff = 0.85)
This produces the following error:
Error in findCorrelation_fast(x = x, cutoff = cutoff, verbose = verbose) :
The correlation matrix has some missing values.
In addition: Warning message:
In cor(as.matrix(df_before), use = "pairwise.complete.obs") :
the standard deviation is zero
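One common reason for "the standard deviation is zero" is a constant column, whose correlations come out as NA and trip findCorrelation(). A minimal sketch of filtering such columns first (df_before and the 0.85 cutoff follow the question; treating all-NA columns as constant is an assumption):
library(caret)
# Drop constant (zero-variance) columns: their correlations are NA,
# which findCorrelation() cannot handle.
constantCols <- sapply(df_before, function(x) {
  s <- sd(x, na.rm = TRUE)
  is.na(s) || s == 0  # all-NA columns count as constant here
})
df_filtered <- df_before[, !constantCols, drop = FALSE]
corMat <- cor(df_filtered, use = "pairwise.complete.obs")
indexesToDrop <- findCorrelation(corMat, cutoff = 0.85)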
I am trying to fit a panel spatial model in R using spml() from the splm package. I first define the NxN weighting matrix as follows:
neib <- dnearneigh(coordinates(coord), 0, 50, longlat = TRUE)
dlist <- nbdists(neib, coordinates(coord))
idlist <- lapply(dlist, function(x) 1/x)
w50 <- nb2listw(neib,zero.policy=TRUE, glist=idlist, style="W")
Thus I define two observations to be neighbours if they lie within 50 km of each other. The weights attached to each pair of neighbouring observations correspond to the inverse of their distance, so that closer neighbours receive higher weights. I also use the option zero.policy=TRUE so that observations which do not have neighbours are associated with a vector of zero weights.
Once I have done this, I try to fit the panel spatial model in the following way:
mod <- spml(y ~ x , data = data_p, listw = w50, na.action = na.fail, lag = F, spatial.error = "b", model = "within", effect = "twoways" ,zero.policy=TRUE)
but I get the following error and warning messages
Error in lag.listw(listw, u) : Variable contains non-finite values
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
...
50: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
I believe this to be related to the observations without neighbours. Can anyone please help me with this? Is there any way to deal with non-neighbour observations besides the zero.policy option?
Many many thanks for helping me.
You should check two things:
1) Make sure that the weight matrix is row-normalized.
2) Handle any NA values properly, both in the dataset and in the W matrix.
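A minimal sketch of those two checks (object names follow the question; spdep's card() gives the neighbour count per observation):
library(spdep)
# 1) style="W" already row-normalizes, but verify: the weight vector of
#    every observation that has neighbours should sum to 1.
wSums <- sapply(w50$weights, sum)
summary(wSums[card(w50$neighbours) > 0])
# 2) Look for NA / non-finite values in the model variables and in the
#    weights themselves.
any(!is.finite(data_p$y))
any(!is.finite(data_p$x))
any(sapply(w50$weights, function(w) any(!is.finite(w))))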
I am trying to apply logistf() to a data frame (dim: 11359 x 139). All the variables are binary. I get the following message:
"Error in logistf.fit(x = x, y = y, weight = weight, offset = offset, firth, :
no memory available".
Even if I use only 20 rows and all 139 predictors of the data frame, I get the same error. Is it a hardware issue or my fault?
I am trying to fit a neural network to predict whether a transaction should be flagged, and I have a large sample from my data (50,000+ rows by 211 variables, with no blanks, NAs, etc., due to preprocessing and sampling only complete data). I am trying to fit both a NN on the raw data and another NN after running PCA. The variable I want to predict is in column 23. Here is my code:
apply(Train,2,function(x) sum(is.na(x)))
#returns 0 for all columns
NN=nnet(Train[,-23],Train[,23], softmax = TRUE)
# Error in nnet.default(x, y, ...) : missing values in 'x'
PCANN=pcaNNet(CBdata_Train2_IT[,-23],CBdata_Train2_IT[,23])
# Error in nnet.default(x, y, ...) : missing values in 'x'
I can't for the life of me figure out why. I have been debugging, and it seems related to the nnet.default function call...
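One frequent cause (an assumption here, not verified against this data) is a column that is not numeric, e.g. a factor, character, or date column: is.na() on the data frame reports nothing, but the values turn into NA when the frame is coerced to a numeric matrix. A minimal diagnostic sketch:
str(Train[, -23])  # look for factor / character / date columns
# Replicate the numeric coercion column by column: values that cannot be
# parsed as numbers become NA, pinpointing the offending columns.
badCols <- sapply(Train[, -23], function(col) {
  any(is.na(suppressWarnings(as.numeric(as.character(col)))))
})
names(Train[, -23])[badCols]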