How can I use half floats with CoreML neural nets? - coreml

I have converted a Keras model to CoreML. I want to make sure that CoreML uses half floats and not full floats for all its textures. How can I do this?
Updated:
How can I make sure that the output of the network is half-float or at least Float (or any other type) and not Double?

You don't have to do anything for this if you run the CoreML model on the GPU. MPS (Metal Performance Shaders) will use half floats automatically. You can see this if you run a GPU Frame Capture on your CoreML model.
I wrote a blog post about how CoreML works under the hood, which actually demonstrates that it uses half floats: http://machinethink.net/blog/peek-inside-coreml/
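If you also want the .mlmodel file itself to store its weights in half precision (this affects the size on disk, not what MPS does at runtime), coremltools ships a quantization utility. A minimal sketch, assuming coremltools 2.0 or later; the model filenames are illustrative:

```python
import coremltools
from coremltools.models.neural_network import quantization_utils

# Load the converted Keras model (filename is illustrative).
model = coremltools.models.MLModel("MyKerasModel.mlmodel")

# Re-encode the weights as 16-bit floats; on macOS this returns an MLModel.
model_fp16 = quantization_utils.quantize_weights(model, nbits=16)
model_fp16.save("MyKerasModel_fp16.mlmodel")
```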

Related

Performance drop when converting Mask RCNN to uff format

My goal is to deploy, with Nvidia DeepStream, a Mask RCNN model trained with the well-known Matterport repo.
To do so, first I have to convert the generated .h5 model into a .uff. This operation is described here.
After the conversion, I ran the generated .uff model with TensorRT and DeepStream, and it has very poor performance compared to the .h5 model (it almost never detects/masks the objects).
Before the conversion, I made the corresponding changes to handle NCHW models and configured the number of classes and backbone (in this case resnet50).
I don't know how to continue. Any advice would really help me. Thanks!
To solve the problem, one must use the same configuration for the training and the conversion.
In particular, since most models start from transfer learning on the pretrained COCO model, one has to use that model's exact config.
In addition, the input image sizes have to be consistent with the training configuration.
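For example, with Matterport's mrcnn package the config used for export/conversion can simply repeat the training settings, so the two cannot drift apart. A minimal sketch (the values shown are illustrative and must match whatever was actually used for training):

```python
from mrcnn.config import Config

class InferenceConfig(Config):
    """Config used for export/conversion; every value must match training."""
    NAME = "coco_transfer"      # illustrative name
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 80        # background + the classes trained on
    BACKBONE = "resnet50"       # same backbone as training
    IMAGE_MIN_DIM = 800         # same input sizing as training
    IMAGE_MAX_DIM = 1024
```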

Keras & R: cropping center of layer output

I want to use the U-Net architecture for an image segmentation task. To reduce edge effects, I thought about using U-Net without zero-padding, same as the creators of U-Net did. But everywhere I looked, I did not find an implementation of U-Net without padding. There is even a package called "unet" in R (link to the package on github), but it does not allow for padding = "valid". I guess the reason is the resulting necessity to "crop" the images before the concatenation (as can be seen in this picture). How could I implement this in Keras with R? I am still a newbie to Keras and R and would appreciate any help.
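The cropping step itself is straightforward once the spatial sizes are known. Below is a minimal sketch written in Python Keras (to keep the code examples in one language); the R keras interface exposes the same building blocks as layer_cropping_2d() and layer_concatenate(). It assumes a fixed input size, so the shapes are static:

```python
from tensorflow.keras import layers

def crop_and_concat(encoder_feat, decoder_feat):
    # With padding="valid", the encoder feature map is larger than the
    # upsampled decoder feature map, so crop it symmetrically to match
    # before concatenating (the U-Net "copy and crop" arrows).
    dh = encoder_feat.shape[1] - decoder_feat.shape[1]
    dw = encoder_feat.shape[2] - decoder_feat.shape[2]
    cropped = layers.Cropping2D(
        cropping=((dh // 2, dh - dh // 2), (dw // 2, dw - dw // 2))
    )(encoder_feat)
    return layers.Concatenate()([cropped, decoder_feat])
```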

StyleGAN how to generate B image using A source image

I am studying StyleGAN. It is new to me, and I could not understand the style-mixing way of generating images.
In this image it is shown that the B images were created using A. How can I do that if I want to use an A source image that is not from the training data?
In that example image, the A source images are not training data. They are generated images of people who do not exist. The trained network (which is just the generator part) does not take any images as input; it only takes a random 512-dimensional vector (a latent).
Thus, it is impossible to do what you ask using just the StyleGAN. You would need some way to reduce an input image to a latent vector, which is hard to do and isn't guaranteed to give reasonable results anyway.
The followup paper, StyleGAN2 (https://github.com/NVlabs/stylegan2) has an architecture where it is slightly easier to try to find a matching latent for an input image, and they even talk a bit about how to do it.
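To make the "no image input" point concrete, this is roughly how the official repo's pretrained_example.py drives the generator: you feed it a random 512-dimensional latent and nothing else. A sketch, assuming the FFHQ pickle has already been downloaded (the filename is illustrative):

```python
import pickle
import numpy as np
import PIL.Image
import dnnlib.tflib as tflib

tflib.init_tf()

# Load the pretrained networks; Gs is the averaged generator used for sampling.
with open("karras2019stylegan-ffhq-1024x1024.pkl", "rb") as f:
    _G, _D, Gs = pickle.load(f)

# The only input: a random 512-dimensional latent vector.
rnd = np.random.RandomState(5)
latents = rnd.randn(1, Gs.input_shape[1])

fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
images = Gs.run(latents, None, truncation_psi=0.7,
                randomize_noise=True, output_transform=fmt)
PIL.Image.fromarray(images[0], "RGB").save("example.png")
```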

Extracting feature vector from Images Tensorflow OOM

I have used pretrained network weights that I downloaded from the Caffe model zoo to build a feature extractor (VGG-16) in TensorFlow.
I have therefore redefined the architecture of the network in TF with the imported weights as constants, and added an extra fully connected layer with tf.Variables to train a linear SVM by SGD on a hinge loss.
My initial training set is composed of 100000 32x32x3 images in the form of a numpy array.
I therefore had to resize them to 224x224x3, which is the input size of VGG, but that does not fit into memory.
So I removed unnecessary examples and narrowed it down to 10000x224x224x3 images, which is awful but still acceptable, as only the support vectors matter; but even then I still get OOM with TF while training.
That should not be the case, since the only important representation is the one from the penultimate layer, of size 4096, which is easily manageable, and the weights to backprop on have size only (4096 + 1 bias).
So what I can do is first transform all my images into features with the TF network (with only constants) to form a 10000x4096 dataset, and then train a second TensorFlow model.
Or, at each batch, recalculate all features for that batch in the next_batch method. Or use the panoply of buffers/queue runners that TF provides, but that is a bit scary as I am not really familiar with those.
I do not like those methods; I think there should be something more elegant (without too many queues if possible).
What would be the most Tensorflow-ic method to deal with this?
If I understand your question correctly, 100K images do not fit in memory at all, while 10K images do fit in memory, but then the network itself OOMs. That sounds very reasonable, because 10K images alone, assuming they are represented using 4 bytes per pixel per channel, occupy 5.6GiB of space (or 1.4GiB if you somehow only spend 1 byte per pixel per channel). So even if the dataset happens to fit in memory, once you add your model, which occupies a couple more GiBs, you will OOM.
Now, there are several ways you can address it:
You should train using minibatches (if you do not already). With a minibatch of size 512 you will load significantly less data to the GPU. With minibatches you also do not need to load your entire dataset into a numpy array at the beginning. Build your iterator in a way that loads 512 images at a time, runs the forward and backward pass (sess.run(train...)), then loads the next 512 images, and so on. This way, at no point will you need to have 10K or 100K images in memory simultaneously.
It also appears to be very wasteful to upscale images when your original images are so much smaller. What you might consider doing is taking the convolution layers from the VGG net (the dimensions of conv layers do not depend on the dimensions of the original images) and training the fully connected layers on top of them from scratch. To do that, just trim the VGG net after the flatten layer, run it for all the images you have to produce the output of the flatten layer for each image, then train a three-layer fully connected network on those features (this will be relatively fast compared to training the entire conv network), and plug the resulting net after the flatten layer of the original VGG net. This might also produce better results, because the convolution layers are trained to find features in the original-size images, not blurry upscaled ones. A sketch of this idea follows below.
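A minimal sketch of the "extract conv features once, then train a small head" idea. The question rebuilds VGG from Caffe weights by hand in TF; purely for illustration, the same idea is shown here with keras.applications.VGG16, which accepts 32x32 inputs when include_top=False (at that size the last conv block yields a 1x1x512 map, so global average pooling stands in for the flatten step). Class counts and batch sizes are illustrative.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen conv stack; no need to upscale the 32x32 images to 224x224.
extractor = VGG16(weights="imagenet", include_top=False,
                  pooling="avg", input_shape=(32, 32, 3))

def extract_features(images, batch_size=512):
    # Run the conv layers in minibatches so only one batch sits on the GPU.
    feats = [extractor.predict(images[i:i + batch_size])
             for i in range(0, len(images), batch_size)]
    return np.concatenate(feats)              # shape (N, 512)

# Train a small fully connected head (or a linear SVM) on the cached features.
head = models.Sequential([
    layers.Dense(256, activation="relu", input_shape=(512,)),
    layers.Dense(10, activation="softmax"),   # number of classes is illustrative
])
head.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# head.fit(extract_features(train_images), train_labels, batch_size=512, epochs=10)
```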
I guess a way to do that using some queues and threads, but not too many, would be to save the training set in a TensorFlow protobuf format (or several files) using tf.python_io.TFRecordWriter.
Then create a method to read and decode a single example from the protobuf, and finally use tf.train.shuffle_batch to feed BATCH_SIZE examples to the optimizer using that method.
This way, at most capacity (as defined in shuffle_batch) tensors are in memory at the same time.
This awesome tutorial from Indico explains it all.
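For reference, the TF 1.x queue-based pipeline described above looks roughly like this (feature keys, image size, and filenames are illustrative):

```python
import tensorflow as tf  # TF 1.x queue-based API

def read_and_decode(filename_queue):
    # Read and decode a single serialized example from the TFRecord file.
    reader = tf.TFRecordReader()
    _, serialized = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized,
        features={
            "image_raw": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64),
        })
    image = tf.decode_raw(features["image_raw"], tf.uint8)
    image = tf.reshape(image, [224, 224, 3])
    image = tf.cast(image, tf.float32) / 255.0
    label = tf.cast(features["label"], tf.int32)
    return image, label

filename_queue = tf.train.string_input_producer(["train.tfrecords"])
image, label = read_and_decode(filename_queue)

# At most `capacity` examples are held in memory at any one time.
# (Remember to call tf.train.start_queue_runners(sess) before sess.run.)
images_batch, labels_batch = tf.train.shuffle_batch(
    [image, label], batch_size=64,
    capacity=2000, min_after_dequeue=1000, num_threads=2)
```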

How to solve out of memory error?

I am doing my project on OCR. For this I am using an image size of 64x64, because when I tried 32x32 etc. some pixels were lost. I have tried features such as zonal density, Zernike moments, projection histogram, distance profile, and crossings. The main problem is that the feature vector size is too big. I have tried combinations of the above features, but whenever I train the neural network I get an "out of memory" error. I have tried PCA for dimensionality reduction, but it does not work well; I did not get good efficiency during training. I have run the code on both my PC and my laptop and got the same error on both. My RAM is 2GB, so I am thinking about reducing the size of the image. Is there any solution to this problem?
I have one more problem: whenever I train the neural network using the same features, the result varies. How can I solve this as well?
It's not about the size of the image. A 64x64 image is surely not going to blow your RAM. There must be bugs in your neural network or other algorithms.
And please paste more details about your implementation. We don't even know what language you are using.
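A quick back-of-the-envelope check of that claim (the sample and feature counts are made up, since the question does not give them):

```python
# One 64x64 grayscale image in double precision is tiny...
image_bytes = 64 * 64 * 8                      # ~32 KB
# ...what usually exhausts 2 GB of RAM is the full n_samples x n_features
# feature matrix (or a weight/covariance matrix derived from it) in float64.
n_samples, n_features = 50_000, 4096           # illustrative numbers
matrix_bytes = n_samples * n_features * 8      # ~1.6 GB
print(image_bytes, matrix_bytes)
```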
