I am looking for best practices or case studies using R's "mxnet" for the pixel-wise classification of multi-band imagery (RGB, multispectral/hyperspectral aerial or satellite remote sensing). There is indeed plenty of best practice for image tagging (e.g. dogs vs. cats in huge image archives like ImageNet), where the whole image is classified and a lot of training data (or pretrained models) is usually available. However, I cannot find anything concerning pixel-wise image classification/regression, where training data is typically sparser and applications deal e.g. with land cover categories, objects (like cars, buildings, etc.) or biophysical variables (biomass, soil wetness, chlorophyll content, etc.).
An FCN (Fully Convolutional Network) performs pixel-wise segmentation and seems to fit your need.
MXNet has a nice FCN-xs example that uses Python; if you really want to stay in R you should be able to port it, or just use the pre-trained network that is available with the example.
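As a rough illustration of how such a pre-trained network can be used for per-pixel prediction, here is a minimal Python sketch with the old mxnet Module API. The checkpoint prefix FCN8s_VGG16, the epoch number and the output layout are assumptions based on the FCN-xs example and may differ from the files you actually download; treat this as a sketch, not a drop-in script.

    import mxnet as mx
    import numpy as np

    # Load a pre-trained FCN-xs checkpoint (prefix/epoch are assumptions,
    # adjust to the files you downloaded from the example).
    sym, arg_params, aux_params = mx.model.load_checkpoint('FCN8s_VGG16', 19)

    # One RGB image in NCHW layout; H and W can be arbitrary for an FCN.
    img = np.random.rand(1, 3, 500, 500).astype('float32')   # stand-in for a real tile

    mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
    mod.bind(data_shapes=[('data', img.shape)], for_training=False)
    mod.set_params(arg_params, aux_params, allow_missing=True)

    mod.forward(mx.io.DataBatch(data=[mx.nd.array(img)]), is_train=False)
    out = mod.get_outputs()[0].asnumpy()      # shape (1, n_classes, H, W)

    label_map = np.argmax(out, axis=1)[0]     # per-pixel class labels, shape (H, W)
    print(label_map.shape)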
I would like to compare a satellite pseudo-image generated by the WRF model with a real satellite image using the R package spatstat, but I do not know how to begin. I have read that it is possible to carry out a spatial pattern comparison, but I do not know which function I should use. I have two temperature images on a predefined domain, and I would like to know whether the comparison is carried out point-by-point over those images, or whether I first have to adapt the model output and the satellite data. In that case, how should I do that? Is there any available script?
Thanks in advance.
Kind regards, Lara.
The spatstat package is designed mainly to analyse spatial patterns of point locations (like the pattern of locations of accidents that occurred in a given year). It does have some basic facilities for comparing pixel images. If A and B are two pixel images, you can do arithmetic by typing e.g. mean(A-B) or mean(abs(A-B)). There are also some image metrics (distances between images) available. Please visit the spatstat.org website for more information.
My question is inspired by the following Kaggle competition: https://www.kaggle.com/c/leaf-classification
I have a set of leaves which I would like to classify by how they look. I managed to do the classification part using Random Forests and k-means. However, I am more interested in the pre-processing part, so that I can replicate this analysis with my own set of pictures.
The characteristics that describe each leaf are given by:
id - an anonymous id unique to an image
margin_1, margin_2, margin_3, ... margin_64 - each of the 64 attribute vectors for the margin feature
shape_1, shape_2, shape_3, ..., shape_64 - each of the 64 attribute vectors for the shape feature
texture_1, texture_2, texture_3, ..., texture_64 - each of the 64 attribute vectors for the texture feature
So, focusing on the question: I would like to extract these characteristics from a raw picture. I have tried the jpeg R package, but I haven't succeeded. I am not showing any code I've tried, as this is a rather more theoretical question about how to tackle the issue; there is no need for code.
I would really appreciate any advice on how to proceed to get the best descriptors of each image.
The problem is more about which kinds of features you can extract based on the margin, shape and texture of the images you have (leaves). This depends: some plants can easily be identified from shape alone, while others have similar shapes and need additional features such as texture, so this is still an open area of research. A number of features have been proposed for plant species identification, focusing on performance, efficiency or usability, and good features must be invariant to scale, translation and rotation. Please refer to the link below for a state-of-the-art review of the feature extraction techniques that have been used in plant species identification:
https://www.researchgate.net/publication/312147459_Plant_Species_Identification_Using_Computer_Vision_Techniques_A_Systematic_Literature_Review
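The exact margin/shape/texture vectors in the Kaggle dataset were precomputed by the organisers and are not fully documented, but descriptors in the same spirit can be computed with scikit-image. Below is a minimal Python sketch (the file name leaf.jpg, the crude threshold segmentation and the histogram sizes are assumptions): Hu moments as a shape descriptor, a centroid-to-contour distance histogram as a rough margin descriptor, and a local-binary-pattern histogram as a texture descriptor.

    import numpy as np
    from skimage import io, color, measure, feature

    img = io.imread('leaf.jpg')                      # hypothetical input file
    gray = color.rgb2gray(img)
    mask = gray < 0.5                                # crude segmentation: dark leaf on light background

    # Shape: Hu moments of the binary silhouette (scale/translation/rotation invariant).
    m = measure.moments(mask.astype(float))
    cr, cc = m[1, 0] / m[0, 0], m[0, 1] / m[0, 0]    # centroid (row, col)
    mu = measure.moments_central(mask.astype(float), center=(cr, cc))
    hu = measure.moments_hu(measure.moments_normalized(mu))

    # Margin: distances from the centroid to points along the longest contour,
    # summarised as a fixed-length normalised histogram.
    contours = measure.find_contours(mask.astype(float), 0.5)
    contour = max(contours, key=len)
    dists = np.hypot(contour[:, 0] - cr, contour[:, 1] - cc)
    margin_hist, _ = np.histogram(dists / dists.max(), bins=64, range=(0, 1), density=True)

    # Texture: histogram of uniform local binary patterns inside the leaf.
    lbp = feature.local_binary_pattern(gray, P=8, R=1, method='uniform')
    texture_hist, _ = np.histogram(lbp[mask], bins=10, range=(0, 10), density=True)

    descriptor = np.concatenate([hu, margin_hist, texture_hist])
    print(descriptor.shape)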
I have used pretrained network weights that I downloaded from the Caffe model zoo to build a feature extractor (VGG-16) in TensorFlow.
I therefore redefined the architecture of the network in TF with the imported weights as constants, and added an extra fully connected layer with tf.Variables in order to train a linear SVM by SGD on a hinge-loss cost.
My initial training set is composed of 100000 32x32x3 images in the form of a numpy array.
I therefore had to resize them to 224x224x3, which is the input size of VGG, but that does not fit into memory.
So I removed unnecessary examples and narrowed it down to 10000 images of 224x224x3, which is awful but still acceptable since only the support vectors matter; yet even then I still get OOM in TF while training.
That should not be the case, since the only important representation is the one from the penultimate layer, of size 4096, which is easily manageable, and the weights to backprop on are only of size 4096 (+1 bias).
So what I could do is first transform all my images into features with the TF network made of constants only, to form a 10000x4096 dataset, and then train a second TensorFlow model on it.
Or recalculate all the features for each batch, in the next_batch method. Or use the panoply of buffers/queue runners that TF provides, but that is a bit scary as I am not really familiar with them.
I do not like those methods; I think there should be something more elegant (without too many queues if possible).
What would be the most TensorFlow-ic way to deal with this?
If I understand your question correctly, 100K images do not fit in memory at all, while 10K images do fit, but then the network itself OOMs. That sounds very reasonable, because 10K resized images alone, assuming they are represented using 4 bytes per pixel per channel, occupy about 5.6 GiB of space (or 1.4 GiB if you somehow only spend 1 byte per pixel per channel); so even if the dataset happens to fit in memory, once you add your model, which occupies a couple more GiB, you will OOM.
Now, there are several ways you can address it:
You should train using minibatches (if you do not already). With a minibatch of size 512 you load significantly less data onto the GPU at once. With minibatches you also do not need to load your entire dataset into a numpy array at the beginning: build your iterator so that it loads 512 images at a time, runs a forward and backward pass (sess.run(train...)), loads the next 512 images, and so on. This way you never need to hold 10K or 100K images in memory simultaneously.
It also seems very wasteful to upscale images when your original images are so much smaller. What you might consider instead is taking the convolutional layers from the VGG net (the dimensions of conv layers do not depend on the dimensions of the original images) and training the fully connected layers on top of them from scratch. To do that, trim the VGG net after the flatten layer, run it over all the images you have to produce the flatten-layer output for each image, then train a three-layer fully connected network on those features (this will be relatively fast compared to training the entire conv network), and plug the resulting net in after the flatten layer of the original VGG net. This might also produce better results, because the convolutional layers were trained to find features in original-size images, not blurry upscaled ones. A minimal sketch of this two-stage approach follows below.
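A minimal TF1-style sketch of the two-stage idea, under stated assumptions: images_ph and flatten_out are hypothetical names for the input placeholder and flatten-layer tensor of the frozen VGG graph you already built, all_images is your numpy array (or a memory-mapped version of it), and the class count is made up. Only the small head is trained, so memory stays bounded by one minibatch of features.

    import numpy as np
    import tensorflow as tf

    BATCH = 512

    def extract_features(sess, images_ph, flatten_out, all_images):
        """Run the frozen conv layers once, in minibatches, and stack the features."""
        feats = []
        for start in range(0, len(all_images), BATCH):
            batch = all_images[start:start + BATCH]
            feats.append(sess.run(flatten_out, feed_dict={images_ph: batch}))
        return np.concatenate(feats, axis=0)      # shape (N, feature_dim)

    # Stage 2: a small trainable head (here a linear hinge-loss classifier)
    # on the cached features.
    feat_dim, n_classes = 4096, 10                # n_classes is an assumption
    x = tf.placeholder(tf.float32, [None, feat_dim])
    y = tf.placeholder(tf.float32, [None, n_classes])    # +1/-1 one-vs-rest labels
    w = tf.Variable(tf.zeros([feat_dim, n_classes]))
    b = tf.Variable(tf.zeros([n_classes]))
    scores = tf.matmul(x, w) + b
    hinge = tf.reduce_mean(tf.maximum(0.0, 1.0 - y * scores))
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(hinge)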
I guess a way to do that, using some queues and threads but not too many, would be to save the training set into one or several TensorFlow protobuf files using tf.python_io.TFRecordWriter.
Then create a method that reads and decodes a single example from the protobuf, and finally use tf.train.shuffle_batch to feed BATCH_SIZE examples to the optimizer using that method.
This way there are at most capacity (as defined in shuffle_batch) tensors in memory at the same time.
This awesome tutorial from Indico explains it all.
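For reference, here is a minimal sketch of that pipeline with the TF1 queue-based API. The file name, the 32x32x3 image shape, the batch size and the capacity values are assumptions, and images/labels are assumed to already exist as numpy arrays.

    import numpy as np
    import tensorflow as tf

    # --- Writing: serialize (image, label) pairs into a TFRecord file. ---
    writer = tf.python_io.TFRecordWriter('train.tfrecords')
    for img, lab in zip(images, labels):            # images: uint8 Nx32x32x3, labels: ints
        example = tf.train.Example(features=tf.train.Features(feature={
            'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img.tobytes()])),
            'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(lab)])),
        }))
        writer.write(example.SerializeToString())
    writer.close()

    # --- Reading: decode one example at a time, let shuffle_batch buffer them. ---
    filename_queue = tf.train.string_input_producer(['train.tfrecords'])
    reader = tf.TFRecordReader()
    _, serialized = reader.read(filename_queue)
    parsed = tf.parse_single_example(serialized, features={
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
    image = tf.reshape(tf.decode_raw(parsed['image'], tf.uint8), [32, 32, 3])
    image_batch, label_batch = tf.train.shuffle_batch(
        [image, parsed['label']], batch_size=512,
        capacity=4096, min_after_dequeue=1024, num_threads=2)

    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        imgs, labs = sess.run([image_batch, label_batch])   # one shuffled minibatch
        coord.request_stop()
        coord.join(threads)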
I have a very large self-collected dataset of size [2000000 12672], where the rows are the number of instances and the columns the number of features. The dataset occupies about 60 gigabytes on the local hard disk. I want to train a linear SVM on this dataset. The problem is that I have only 8 gigabytes of RAM, so I cannot load all the data at once. Is there any solution to train an SVM on such a large dataset? I generated the dataset myself, and it is currently in HDF5 format.
Thanks
Welcome to machine learning! One of the hard things about working in this space is the compute requirements. There are two main kinds of algorithms, on-line and off-line.
On-line: supports feeding in examples one at a time, each one improving the model slightly.
Off-line: processes the entire dataset at once, typically achieving higher accuracy than an on-line model.
Many typical algorithms have both on-line and off-line implementations, but an SVM is not one of them. To the best of my knowledge, the SVM is traditionally an off-line-only algorithm. The reason for this lies in the fine details around "shattering" the dataset; I won't go too far into the math here, but if you read into it, it should become apparent.
It's also worth noting that the training complexity of an SVM is somewhere between O(n^2) and O(n^3), meaning that even if you could load everything into memory, it would take ages to actually train the model. It is very typical to test with a much smaller portion of your dataset before moving to the full one.
When moving to the full dataset you would have to run this on a much larger machine than your own; AWS should have something large enough for you. However, at your data size I highly advise using something other than an SVM: at large data sizes, neural network approaches really shine and can be trained in a more realistic amount of time.
As alluded to in the comments, there is also the concept of an out-of-core algorithm that can operate directly on objects stored on disk. The only group I know of with a good offering of out-of-core algorithms is Dato. It's a commercial product, but it might be your best solution here.
A stochastic gradient descent approach to the SVM could help, as it scales well and avoids the n^2 problem. An implementation available in R is RSofia, which was created by a team at Google and is discussed in "Large Scale Learning to Rank". In the paper, they show that, compared to a traditional SVM, the SGD approach significantly decreases training time (this is due to (1) the pairwise learning method and (2) the fact that only a subset of the observations ends up being used to train the model).
Note that RSofia is a little more bare bones than some of the other SVM packages available in R; for example, you need to do your own centering and scaling of features.
As for your memory problem, it would be a little surprising if you needed the entire dataset: I would expect that you'd be fine reading in a sample of your data and training your model on that. To confirm this, you could train multiple models on different samples and then estimate performance on the same holdout set; the performance should be similar across the different models.
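If Python is an option instead of RSofia, the same SGD-on-hinge-loss idea can be run out of core with scikit-learn's SGDClassifier, reading the HDF5 file in chunks so only one chunk sits in RAM at a time. The file name, dataset keys, chunk size and number of passes below are assumptions; as with RSofia, you would still want to center and scale the features yourself.

    import h5py
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    CHUNK = 5000                                      # rows loaded per step (assumption)
    clf = SGDClassifier(loss='hinge', alpha=1e-4)     # linear SVM trained by SGD

    # Assumed layout: 'data.h5' holds datasets 'X' (2000000 x 12672) and 'y' (2000000,).
    with h5py.File('data.h5', 'r') as f:
        X, y = f['X'], f['y']
        # Label set inferred from a sample; supply it explicitly if you know it.
        classes = np.unique(y[:100000])
        for epoch in range(3):                        # a few passes over the data
            for start in range(0, X.shape[0], CHUNK):
                Xb = X[start:start + CHUNK]           # only this chunk is in memory
                yb = y[start:start + CHUNK]
                clf.partial_fit(Xb, yb, classes=classes)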
You don't say why you want a linear SVM, but if you can consider another model that often gives superior results, check out the hpelm Python package. It can read an HDF5 file directly. You can find it here: https://pypi.python.org/pypi/hpelm It trains on segmented data, which can even be pre-loaded ("async") to speed up reading from slow hard disks.
I am trying to build a basic emotion detector from speech using MFCCs, their deltas and delta-deltas. A number of papers report good accuracy from training GMMs on these features.
I cannot seem to find a ready-made package to do this. I did play around with scikit-learn in Python, Voicebox and similar toolkits in Matlab, and Rmixmod, stochmod, mclust, mixtools and some other packages in R. What would be the best library for fitting GMMs to training data?
The challenging part is the training data, which has to contain the emotion information embedded in the feature set; the same features that encapsulate emotion must then be used on the test signal. Testing with a GMM will only be as good as your universal background model. In my experience, with a GMM alone you can typically only separate male from female and a few individual speakers. Simply feeding the MFCCs into a GMM would not be sufficient, since a GMM does not capture time-varying information, and emotional speech contains time-varying parameters such as pitch and pitch changes over time, in addition to the spectral variation captured by the MFCC parameters. I am not saying it is impossible with the current state of technology, but it is challenging in a good way.
If you want to use Python, here is the GMM code from the famous speech recognition toolkit Sphinx:
http://sourceforge.net/p/cmusphinx/code/HEAD/tree/trunk/sphinxtrain/python/cmusphinx/gmm.py
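If a higher-level library is acceptable, scikit-learn's GaussianMixture can fit one GMM per emotion on MFCC(+delta) frames and classify an utterance by which model gives the highest average log-likelihood. The use of librosa for the MFCCs, the file names and the number of mixture components below are assumptions; this is a sketch of the standard one-GMM-per-class recipe, not a tuned system.

    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def mfcc_features(path):
        """MFCCs plus deltas and delta-deltas, one row per frame."""
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        feats = np.vstack([mfcc,
                           librosa.feature.delta(mfcc),
                           librosa.feature.delta(mfcc, order=2)])
        return feats.T                                   # shape (n_frames, 39)

    # Hypothetical layout: emotion label -> list of training wav files.
    train_files = {'angry': ['a1.wav', 'a2.wav'], 'neutral': ['n1.wav', 'n2.wav']}

    models = {}
    for emotion, paths in train_files.items():
        X = np.vstack([mfcc_features(p) for p in paths])
        models[emotion] = GaussianMixture(n_components=16, covariance_type='diag',
                                          max_iter=200).fit(X)

    def predict(path):
        X = mfcc_features(path)
        # score() is the average per-frame log-likelihood under each emotion's GMM
        return max(models, key=lambda e: models[e].score(X))

    print(predict('test.wav'))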