How to solve an out of memory error?

I am doing an OCR project. I use 64x64 images because some pixels are lost when I try smaller sizes such as 32x32. I have tried features such as zonal density, Zernike moments, projection histograms, distance profiles and crossings. The main problem is that the feature vector is too big. I have tried combinations of the above features, but whenever I train the neural network I get an "out of memory" error. I have tried PCA for dimensionality reduction, but it did not work well: I did not get good efficiency during training. I ran the code on my PC and on my laptop and got the same error on both. My RAM is 2 GB, so I am thinking about reducing the image size. Is there any way to solve this problem?
I have one more problem: whenever I train the neural network with the same features, the results vary. How can I solve this as well?

It's not about the size of the image. A 64*64 image will certainly not exhaust your RAM. There must be a bug in your neural network or in one of your other algorithms.
Please also post more details about your implementation. We don't even know what language you are using.
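To put a number on that claim, here is a rough back-of-the-envelope sketch. It assumes single-channel images stored as 8-byte floats and takes the 2 GB figure from the question as 2 GiB; the function name is made up for illustration, and real free memory is lower because the operating system needs some too.

```python
# Rough memory estimate for 64x64 single-channel images.
# Assumption: every pixel is stored as an 8-byte float (a pessimistic choice).

def image_bytes(height, width, bytes_per_value=8):
    """Bytes needed to hold one dense single-channel image."""
    return height * width * bytes_per_value

one_image = image_bytes(64, 64)          # 64 * 64 * 8 bytes
ram_bytes = 2 * 1024 ** 3                # 2 GiB, as in the question
images_that_fit = ram_bytes // one_image

print(one_image, "bytes per image")
print(images_that_fit, "such images would be needed to fill 2 GiB")
```

A single image is tiny, so if memory runs out the cause is more likely the size of the full training matrix, the feature vectors, or the network's internal buffers.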

Related

Architecture of regression with Convolution Neural Network

I would like to use the AlexNet architecture, which was originally designed for classification tasks, to solve a regression problem.
Furthermore, for the learning step I want to include a batch size parameter.
So I have several questions:
What do I need to change in the network architecture to do regression? Specifically, the last layer, the loss function, or other things?
If I use a batch size of 5, what is the output size of the last layer?
Thanks!
It would be helpful to share:
Q Framework: Which deep learning framework are you working with? If possible, share the specific piece of code that you need help modifying.
A: e.g. TensorFlow, PyTorch, Keras, etc.
Q Type of loss, output size: What is the task you are trying to achieve with regression? This affects the kind of loss you want to use, the output dimension, how you fine-tune the pretrained network, etc.
A: e.g. auto-colorization of grayscale images is a regression task, where you try to regress the RGB channel pixel values from a monochrome image. You might use an L2 loss (or some other loss for improved performance). The output size is independent of the batch size; it is determined by the dimension of the output of the final layer (i.e. the prediction op). With a batch size of 5, the final layer simply produces 5 outputs of that dimension, one per example. The batch size is a training parameter that you can change without altering the model architecture or output dimensions.
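The point about batch size can be illustrated with a toy final layer written in plain Python rather than in any particular framework. This is only a sketch: the layer sizes (4 inputs, 1 output), the weights, and the function names are arbitrary assumptions, not part of AlexNet.

```python
# A toy final regression layer, to show how output shape relates to batch size.
# Assumptions: 4 input features and 1 regression output, chosen for illustration.

def linear(batch, weights, bias):
    """Apply y = W x + b to every example in the batch."""
    return [
        [sum(w * x for w, x in zip(row, example)) + b
         for row, b in zip(weights, bias)]
        for example in batch
    ]

def mse_loss(predictions, targets):
    """Mean squared error (an L2-style loss) averaged over all outputs."""
    squared = [(p - t) ** 2
               for pred, targ in zip(predictions, targets)
               for p, t in zip(pred, targ)]
    return sum(squared) / len(squared)

weights = [[0.5, -0.25, 0.0, 1.0]]   # shape (out_features=1, in_features=4)
bias = [0.1]

for batch_size in (1, 5):
    batch = [[1.0, 2.0, 3.0, 4.0]] * batch_size
    out = linear(batch, weights, bias)
    # The first dimension follows the batch size, the second is fixed by the layer.
    print(batch_size, "->", (len(out), len(out[0])))
```

The weights never change shape when the batch size changes; only the number of rows in the output does.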

OCR tables in R, tesseract and pre-processing images

I am trying to extract tables from old books using tesseract in R. Here is an example: Image
The quality of the image is quite poor and the recognition rate was quite bad at first. However, I managed to improve it with GIMP: rescaling, grey scale, auto threshold for colours, Gaussian blur and/or sharpen filters.
I also gave Fred's ImageMagick scripts (textcleaner) a shot, and used ImageMagick to successfully remove the black lines.
This is what I'm doing in R:
library(tesseract)
library(magick)
img <- image_read('img.png')
img_data <- ocr(img, engine = tesseract('eng', options = list(
  tessedit_char_whitelist = '0123456789.-',
  tessedit_pageseg_mode = 'auto',
  textord_tabfind_find_tables = '1',
  textord_tablefind_recognize_tables = '1')))
cat(img_data)
Given that I only want to deal with digits, I set tessedit_char_whitelist and, while I get better results, they are still not reliable.
What would you do in this case? Any other thoughts to improve accuracy before I try to train tesseract? I've never done it - let alone with digits only. Any idea/experience on how to do it? I've checked this out: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 but I'm still a bit baffled.
I worked on a project that used Tesseract to read data fields off video frames and create an indexed spreadsheet from them. What I found to work well was to crop each text field out of each image (using ffmpeg), process it (with ImageMagick, using techniques similar to the ones you mentioned), OCR it, and then have Python (something similar could be done in R) create a spreadsheet from the OCR results. The benefit of this method is that Tesseract only has to deal with small, single-line text images, which in my case seemed to improve results (with the -psm 7 option). The downside is that it's quite processing intensive. Perhaps creating an image for each line of the page would help.
I did find that training Tesseract for a new font/language helped my results immensely. It can be tedious and time consuming, but it significantly improved my results, sometimes going from 0% correct to 100% correct. This site helped me understand the process. I just followed their steps and, sure enough, it worked. From my experience in creating training images, it helped a lot to crop out single characters, with at least a dozen of each character to create a good training sample. Also try to have a similar number of samples for each character; it seemed that if one character had many more samples than the others, Tesseract would (incorrectly) return that character more often.
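The idea of creating one image per line of the page can be sketched with a simple row scan. This is a minimal sketch on a tiny hand-made binary image, assuming the page has already been thresholded to 0 (background) and 1 (ink); the function name and the sample data are made up, and a real page would first need the clean-up steps described above.

```python
# Split a binarized page into horizontal bands that contain ink.
# Assumption: the image is a list of rows, each a list of 0/1 values.

def split_into_lines(image):
    """Return (start_row, end_row) pairs, end exclusive, for runs of inked rows."""
    lines = []
    start = None
    for r, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = r                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, r))       # the text line ended on the previous row
            start = None
    if start is not None:                  # ink ran to the bottom edge
        lines.append((start, len(image)))
    return lines

page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
bands = split_into_lines(page)
crops = [page[a:b] for a, b in bands]      # one small image per text line
print(bands)
```

Each crop could then be passed to the OCR engine on its own, in single-line mode.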

Stochastic Gradient Descent design matrix too big for R

I'm trying to implement a baseline prediction model of movie ratings (akin to the various baseline models from the Netflix prize), with parameters learned via stochastic gradient descent. However, because both explanatory variables are categorical (users and movies), the design matrix is really big, and cannot fit into my RAM.
I thought that the sgd package would automagically find its way around this issue (since it's designed for large amounts of data), but that does not seem to be the case.
Does anyone know a way around this? Maybe there is a way to build the design matrix as a sparse matrix?
Cheers,
You can try Matrix::sparseMatrix to build the matrix from (row, column, value) triplets, which stores only the nonzero entries and is therefore far more memory efficient.
You can also move your problem to Amazon EC2 and use an instance with more RAM, or configure a cluster to run it as a MapReduce job.
Check out the xgboost package (https://github.com/dmlc/xgboost) and its documentation to understand how to deal with memory problems.
This is also a more practical tutorial: https://cran.r-project.org/web/packages/xgboost/vignettes/discoverYourData.html
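To show why the triplet form is so much smaller, here is a language-neutral sketch in plain Python. It assumes a design matrix with one indicator column per user followed by one per movie; the function name and the three sample ratings are made up for illustration.

```python
# Build (row, column, value) triplets for a one-hot user/movie design matrix.
# Assumption: columns [0, n_users) are users, the remaining columns are movies.

def build_triplets(ratings, n_users):
    """ratings: list of (user_index, movie_index, rating), 0-based indices."""
    rows, cols, vals, y = [], [], [], []
    for r, (u, m, rating) in enumerate(ratings):
        rows += [r, r]                 # each observation has exactly two nonzeros
        cols += [u, n_users + m]       # its user column and its movie column
        vals += [1.0, 1.0]
        y.append(rating)
    return rows, cols, vals, y

ratings = [(0, 1, 4.0), (1, 0, 3.0), (2, 1, 5.0)]   # made-up sample data
n_users, n_movies = 3, 2
rows, cols, vals, y = build_triplets(ratings, n_users)

dense_cells = len(ratings) * (n_users + n_movies)
print(len(vals), "stored values instead of", dense_cells, "dense cells")
```

The stored size grows with the number of ratings only, not with the number of users times movies. In R, the same three vectors would be passed as the i, j and x arguments of Matrix::sparseMatrix, which expects 1-based indices by default.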

Extracting feature vector from Images Tensorflow OOM

I have used pretrained network weights downloaded from the Caffe model zoo to build a feature extractor (VGG-16) in TensorFlow.
I therefore redefined the architecture of the network in TF with the imported weights as constants, and added an extra fully connected layer with tf.Variables to train a linear SVM by SGD on a hinge loss.
My initial training set consists of 100000 32x32x3 images stored in a numpy array.
I had to resize them to 224x224x3, which is the input size of VGG, but the result does not fit into memory.
So I removed unnecessary examples and narrowed the set down to 10000 images of 224x224x3. That is awful but still acceptable, since only the support vectors matter, yet even then I still get an OOM error from TF while training.
That should not be the case, as the only important representation is the output of the penultimate layer, of size 4096, which is easily manageable, and the weights to backpropagate on are only of size 4096 plus 1 bias.
One option is to first transform all my images into features with the constants-only TF network, forming a 10000x4096 dataset, and then train a second TensorFlow model on it.
Another is to recompute the features for each batch inside the next_batch method. A third is to use the panoply of buffers and queue runners that TF provides, but that is a bit scary as I am not really familiar with them.
I do not like these methods; I think there should be something more elegant (without too many queues if possible).
What would be the most TensorFlow-idiomatic way to deal with this?
If I understand your question correctly, 100K images do not fit in memory at all, while 10K images do fit, but then the network itself runs out of memory. That sounds very reasonable: 10K images alone, at 4 bytes per pixel per channel, occupy 5.6 GiB (or 1.4 GiB if you somehow spend only 1 byte per pixel per channel). So even if the dataset happens to fit in memory, once you add your model, which will occupy a couple more GiB, you will OOM.
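The figures above can be checked with a few lines of arithmetic. This is just a sketch of the calculation; the function name is made up.

```python
# Size of a dense image dataset in GiB.

def dataset_gib(n_images, height, width, channels, bytes_per_value):
    """Bytes for all images, converted to GiB (1 GiB = 1024**3 bytes)."""
    total_bytes = n_images * height * width * channels * bytes_per_value
    return total_bytes / 1024 ** 3

float32_gib = dataset_gib(10_000, 224, 224, 3, 4)   # 4 bytes per value
uint8_gib = dataset_gib(10_000, 224, 224, 3, 1)     # 1 byte per value

print(f"{float32_gib:.1f} GiB at 4 bytes per value")
print(f"{uint8_gib:.1f} GiB at 1 byte per value")
```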
Now, there are several ways you can address it:
You should train using minibatches (if you do not already). With a minibatch of size 512 you will load significantly less data to the GPU. With minibatches you also do not need to load your entire dataset into a numpy array at the beginning. Build your iterator so that it loads 512 images at a time, runs the forward and backward pass (sess.run(train...)), loads the next 512 images, and so on. This way you never need to have 10K or 100K images in memory simultaneously.
It also appears very wasteful to upscale images when your original images are so much smaller. Consider instead taking the convolution layers from the VGG net (the dimensions of the conv layer weights do not depend on the dimensions of the input images) and training the fully connected layers on top of them from scratch. To do that, trim the VGG net after the flatten layer, run it on all your images to produce the flatten-layer output for each one, then train a three-layer fully connected network on those features (this will be relatively fast compared to training the entire conv network), and plug the resulting net in after the flatten layer of the original VGG net. This might also produce better results, because the convolution layers are trained to find features in original-size images, not in blurry upscaled ones.
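The minibatch iterator described above could look roughly like the following. It is a minimal sketch in plain Python: load_example is a hypothetical stand-in for reading and resizing one image from disk, and the training step itself is left out.

```python
# Lazy minibatch iterator: only one batch of examples exists in memory at a time.

def iter_minibatches(n_examples, batch_size, load_example):
    """Yield lists of at most batch_size examples, loaded on demand."""
    for start in range(0, n_examples, batch_size):
        stop = min(start + batch_size, n_examples)
        yield [load_example(i) for i in range(start, stop)]

def load_example(index):
    # Hypothetical placeholder: a real version would read image `index` from disk.
    return [float(index)] * 4

batch_sizes = []
for batch in iter_minibatches(1200, 512, load_example):
    # A real training loop would run the forward and backward pass on `batch` here.
    batch_sizes.append(len(batch))

print(batch_sizes)
```

Because the generator builds each batch only when asked, the previous batch can be garbage collected before the next one is loaded.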
One way to do this with some queues and threads, but not too many, would be to save the training set in the TensorFlow protobuf format (in one or several files) using tf.python_io.TFRecordWriter.
Then create a method that reads and decodes a single example from the protobuf, and finally use tf.train.shuffle_batch to feed BATCH_SIZE examples to the optimizer using that method.
This way at most capacity tensors (capacity being an argument of shuffle_batch) are in memory at the same time.
This awesome tutorial from Indico explains it all.
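The bounded-memory idea behind a shuffling queue can be illustrated without TensorFlow. The following is a plain Python sketch of a shuffle buffer, not the TensorFlow API: the function name, the seed argument, and the integer stream are assumptions made for illustration.

```python
import random

# Shuffle a stream while holding at most `capacity` examples in memory.

def shuffled_stream(examples, capacity, seed=0):
    """Yield every example exactly once, in pseudo-random order."""
    rng = random.Random(seed)
    buffer = []
    for example in examples:
        buffer.append(example)
        if len(buffer) == capacity:
            # Buffer is full: swap a random element to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    rng.shuffle(buffer)                # input exhausted: drain what is left
    while buffer:
        yield buffer.pop()

stream = range(20)                     # stand-in for examples read from disk
shuffled = list(shuffled_stream(stream, capacity=5))
print(shuffled)
```

A larger capacity gives better mixing at the cost of more memory, which is the same trade-off the capacity setting controls in a queue-based input pipeline.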

R becomes unresponsive while running randomforest on huge data. Does this mean it is still running or it has stopped working?

My data contains 229907 rows and 200 columns, and I am training a random forest on it. I know it will take time, but I do not know how much. While randomForest is running on this data, R becomes unresponsive: "R Console (64 Bit) (Not Responding)". What does this mean? Is R still working, or has it stopped working so that I should close it and start again?
It's common for RGui to be unresponsive during a long calculation. If you wait long enough, it will usually come back.
The running time won't scale linearly with your data size. With the default parameters, more data means both more observations to process and more nodes per tree. Try building some small forests with ntree=1, different values of the maxnodes parameter and different amounts of data, to get a feel for how long it should take. Have the Windows task manager or similar open at the same time so that you can monitor CPU and RAM usage.
Another thing you can try is making some small forests (small values of ntree) and then using the combine function to make a big forest.
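Why combining small forests works can be seen from a toy model in plain Python. This is only a conceptual sketch, not the randomForest implementation: the "trees" are hypothetical one-threshold classifiers, and combine here is a made-up function that mimics the idea of merging forests.

```python
from collections import Counter

# A forest is just a collection of trees, so forests grown separately
# can be merged by pooling their trees and voting over all of them.

def make_stump(threshold):
    """Hypothetical stand-in for a trained tree: classify by one threshold."""
    return lambda x: 1 if x > threshold else 0

def combine(*forests):
    """Merge several forests (lists of trees) into one bigger forest."""
    return [tree for forest in forests for tree in forest]

def predict(forest, x):
    """Majority vote over all trees in the forest."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

forest_a = [make_stump(0.2), make_stump(0.4)]                    # grown in one run
forest_b = [make_stump(0.5), make_stump(0.6), make_stump(0.9)]   # grown in another
big_forest = combine(forest_a, forest_b)

print(len(big_forest), "trees, prediction for 0.55:", predict(big_forest, 0.55))
```

Since each small forest is grown independently, the runs can be short, which keeps R responsive between them and lets you check progress along the way.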
You should check your CPU usage and memory usage. If the CPU is still showing a high usage with the R process, R is probably still going strong.
Consider switching to R 32 bit. For some reason, it seems more stable for me - even when my system is perfectly capable of 64 bit support.
