Can anyone suggest a real example of dimensionality reduction of a hyperspectral image, using a model such as PCA, ICA, or others, in R or Python?
Dimension reduction is critical when dealing with hyperspectral data, and it is quite easy to implement in Python.
Import the Spectral Python (SPy) library and call its principal_components function on your data to get the PCA result.
For a worked example, check out the Spectral Python (SPy) documentation, or see the sketch below.
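A minimal sketch of what that could look like, assuming SPy is installed; the file name '92AV3C.lan' is only a placeholder for your own hyperspectral cube, and the 99.9% variance fraction is an arbitrary choice:
# PCA-based dimensionality reduction of a hyperspectral cube with Spectral Python (SPy)
import spectral as spy

img = spy.open_image('92AV3C.lan').load()   # (rows, cols, bands) array; placeholder file name

pc = spy.principal_components(img)          # PCA over the spectral bands
pc_reduced = pc.reduce(fraction=0.999)      # keep components explaining 99.9% of the variance

img_pc = pc_reduced.transform(img)          # image cube with a reduced number of bands
print(img.shape, '->', img_pc.shape)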
Coming from TensorFlow and PyTorch, does Flux.jl contain a tensor-like structure? If not, what is the common way to structure your data?
From the Flux.jl docs:
The starting point for all of our models is the Array (sometimes referred to as a Tensor in other frameworks). This is really just a list of numbers, which might be arranged into a shape like a square.
So, given this, the way to represent data is just via traditional matrices (which are just arrays). You can find out more about Julia's first-class array support here: https://docs.julialang.org/en/v1/manual/arrays/
Could you please help me add a zooming option for a wordcloud?
Please find a reproducible example here:
http://shiny.rstudio.com/gallery/word-cloud.html
I tried to incorporate rbokeh and plotly, but couldn't find an equivalent render function for wordclouds.
Additionally, I found ECharts2Shiny on GitHub:
https://github.com/XD-DENG/ECharts2Shiny/tree/8ac690a8039abc2334ec06f394ba97498b518e81
But incorporating ECharts is also not convenient for real zooming.
Thanks in advance,
Abi
Normalisation is required only if the predictors are not meant to be comparable on their original scales. There is no rule that says you must normalize.
PCA is a statistical method that gives you a new linear transformation. By itself, it loses nothing; all it does is give you new principal components.
You lose information only if you keep just a subset of those principal components.
Usually PCA includes centering the data as a preprocessing step.
PCA only re-expresses the data in its own axis system (the eigenvectors of the data).
If you use all the axes, you lose no information.
Yet usually we want to apply dimensionality reduction, that is, intuitively, to describe the data with fewer coordinates.
This means projecting the data onto a subspace spanned by only some of the eigenvectors of the data.
If one chooses the number of vectors wisely, one can end up with a significant reduction in the number of dimensions with negligible loss of information.
The way to do so is to choose the eigenvectors whose eigenvalues sum to most of the data's power (variance).
PCA itself is invertible, so lossless.
But:
It is common to drop some components, which will cause a loss of information.
Numerical issues may cause a loss in precision.
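To make the lossless vs. lossy point concrete, here is a small R sketch (the data set, prcomp, and the choice of two retained components are mine, purely for illustration):
# PCA via prcomp (centering is on by default)
x <- as.matrix(iris[, 1:4])
pc <- prcomp(x)

# Reconstruct from ALL components: exact up to floating-point precision
x_full <- sweep(pc$x %*% t(pc$rotation), 2, pc$center, "+")
max(abs(x_full - x))   # ~1e-15, i.e. effectively lossless

# Reconstruct from the first k = 2 components only: information is lost
k <- 2
x_k <- sweep(pc$x[, 1:k] %*% t(pc$rotation[, 1:k]), 2, pc$center, "+")
max(abs(x_k - x))      # noticeably larger reconstruction error

# Cumulative share of total variance captured as components are added
cumsum(pc$sdev^2) / sum(pc$sdev^2)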
Goal: I aim to use t-SNE (t-distributed Stochastic Neighbor Embedding) in R for dimensionality reduction of my training data (with N observations and K variables, where K>>N) and subsequently aim to come up with the t-SNE representation of my test data.
Example: Suppose I aim to reduce the K variables to D=2 dimensions (often, D=2 or D=3 for t-SNE). There are two R packages, Rtsne and tsne; I use the former here.
# load packages
library(Rtsne)
# Generate training data: random standard normal matrix with K=400 variables and N=100 observations
x.train <- matrix(rnorm(n=40000, mean=0, sd=1), nrow=100, ncol=400)
# Generate test data: random standard normal vector with N=1 observation of the K=400 variables
x.test <- rnorm(n=400, mean=0, sd=1)
# perform t-SNE
set.seed(1)
fit.tsne <- Rtsne(X=x.train, dims=2)
where fit.tsne$Y is the (100x2)-dimensional object containing the t-SNE representation of the data; it can also be plotted via plot(fit.tsne$Y).
Problem: Now, what I am looking for is a function that returns a prediction pred of dimension (1x2) for my test data based on the trained t-SNE model. Something like,
# The function I am looking for (but doesn't exist yet):
pred <- predict(object=fit.tsne, newdata=x.test)
(How) Is this possible? Can you help me out with this?
From the author himself (https://lvdmaaten.github.io/tsne/):
Once I have a t-SNE map, how can I embed incoming test points in that
map?
t-SNE learns a non-parametric mapping, which means that it does not
learn an explicit function that maps data from the input space to the
map. Therefore, it is not possible to embed test points in an existing
map (although you could re-run t-SNE on the full dataset). A potential
approach to deal with this would be to train a multivariate regressor
to predict the map location from the input data. Alternatively, you
could also make such a regressor minimize the t-SNE loss directly,
which is what I did in this paper (https://lvdmaaten.github.io/publications/papers/AISTATS_2009.pdf).
So you cannot directly embed new data points. However, you can fit a multivariate regression model between your original data and the embedded dimensions. The author recognizes that this is a limitation of the method and suggests this way to get around it.
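A minimal sketch of that work-around in R, under the assumption that a random forest per embedded dimension is an acceptable regressor (any multivariate regression method could be substituted; the data sizes are arbitrary):
# Learn a map from the original features to the t-SNE coordinates,
# then apply it to new observations.
library(Rtsne)
library(randomForest)

set.seed(1)
x.train <- matrix(rnorm(100 * 50), nrow=100, ncol=50)
x.test  <- matrix(rnorm(5 * 50), nrow=5, ncol=50)

fit.tsne <- Rtsne(X=x.train, dims=2, perplexity=10)

# One regressor per embedded dimension
rf1 <- randomForest(x=x.train, y=fit.tsne$Y[, 1])
rf2 <- randomForest(x=x.train, y=fit.tsne$Y[, 2])

# Approximate t-SNE coordinates for the new observations
pred <- cbind(predict(rf1, x.test), predict(rf2, x.test))
pred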
t-SNE does not really work this way:
The following is an excerpt from the t-SNE author's website (https://lvdmaaten.github.io/tsne/):
Once I have a t-SNE map, how can I embed incoming test points in that
map?
t-SNE learns a non-parametric mapping, which means that it does not
learn an explicit function that maps data from the input space to the
map. Therefore, it is not possible to embed test points in an existing
map (although you could re-run t-SNE on the full dataset). A potential
approach to deal with this would be to train a multivariate regressor
to predict the map location from the input data. Alternatively, you
could also make such a regressor minimize the t-SNE loss directly,
which is what I did in this paper.
You may be interested in his paper: https://lvdmaaten.github.io/publications/papers/AISTATS_2009.pdf
This website, in addition to being really cool, offers a wealth of information about t-SNE: http://distill.pub/2016/misread-tsne/
On Kaggle I have also seen people do things like this, which may also be of interest:
https://www.kaggle.com/cherzy/d/dalpozz/creditcardfraud/visualization-on-a-2d-map-with-t-sne
This is the e-mail answer from Jesse Krijthe, the author of the Rtsne package:
Thank you for the very specific question. I had an earlier request for
this and it is noted as an open issue on GitHub
(https://github.com/jkrijthe/Rtsne/issues/6). The main reason I am
hesitant to implement something like this is that, in a sense, there
is no 'natural' way explain what a prediction means in terms of tsne.
To me, tsne is a way to visualize a distance matrix. As such, a new
sample would lead to a new distance matrix and hence a new
visualization. So, my current thinking is that the only sensible way
would be to rerun the tsne procedure on the train and test set
combined.
Having said that, other people do think it makes sense to define
predictions, for instance by keeping the train objects fixed in the
map and finding good locations for the test objects (as was suggested
in the issue). An approach I would personally prefer over this would
be something like parametric tsne, which Laurens van der Maaten (the
author of the tsne paper) explored in a paper. However, this would best
be implemented using something else than my package, because the
parametric model is likely most effective if it is selected by the
user.
So my suggestion would be to 1) refit the mapping using all data or 2)
see if you can find an implementation of parametric tsne, the only one
I know of would be Laurens's Matlab implementation.
Sorry I can not be of more help. If you come up with any other/better
solutions, please let me know.
t-SNE fundamentally does not do what you want. t-SNE is designed only for visualizing a dataset in a low (2 or 3) dimension space. You give it all the data you want to visualize all at once. It is not a general purpose dimensionality reduction tool.
If you are trying to apply t-SNE to "new" data, you are probably not thinking about your problem correctly, or perhaps simply did not understand the purpose of t-SNE.
Functions in spatstat are mainly designed for analysing 2- and 3-dimensional data. Is there a good way to apply them to one-dimensional data?
There is huge capability for the class ppp in two dimensions.
There is a very general class ppx for arbitrary dimensions, but this is the problem: only very few functions are available for it.
Can I take a sledgehammer to crack a nut by inflating the one-dimensional data to two dimensions and projecting back to one dimension at the end?
Or should I rather rewrite the functions for one dimension (rpoispp, rmpoispp, ...)?
It all depends on which analysis you are doing. In general I would not recommend inflating one-dim data to two-dim data.
As you state the class ppx is general, but doesn't have many functions implemented for it yet. If you just need to simulate an unmarked Poisson point process in 1-dim you can use rpoisppx.
To have more functions available, one solution could be to represent your data as a point pattern on a linear network (class lpp). Here is a crude example, representing a section of one-dimensional space as a line in the unit square and simulating a Poisson process on the line with intensity 10:
library(spatstat)
# Endpoints of the 1-dim "space", embedded as a horizontal line in the unit square
X <- ppp(x=c(0,1), y=c(.5,.5), window=square(1))
L <- linnet(X, edges=matrix(1:2, 1, 2))   # network consisting of the single segment
Y <- rpoislpp(10, L)                      # Poisson process with intensity 10 on the network
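As a possible follow-up (my addition, not part of the answer above), the simulated points can be mapped back to one-dimensional coordinates, since the network runs along y = 0.5:
plot(Y)          # plot the point pattern on the network
coords(Y)$x      # the x-coordinates are the one-dimensional positions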
I have nonuniformly located samples of an image, and would like to interpolate to a regular grid because (among other things) most image graphics functions expect a regular grid.
I notice there are some MATLAB functions (see Image interpolation from random pixels, for example) which apparently will do this, but I couldn't find an R package that does.
Here's a simple example.
# make up some 2D function sampled at jittered (non-uniform) grid locations
y <- matrix(rep(1:10, 10) - .5 + runif(100), nrow=10)
x <- matrix(rep(1:10, 10) - .5 + runif(100), nrow=10)
inmat <- sin(x) + cos(y)
So the values of inmat are at random locations. I want some sort of outmat <- interpolate(inmat, x, y, gridx, gridy) function, where inmat, x, and y are either all matrices or all vectors (unwrapped matrices).
I see also that SciPy has http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html which does this. Is there such a function in an R package, or do I need to port from SciPy or MATLAB code?
The linked pages provide pointers to a gazillion R packages which do Kriging or other interpolation functions.
I'm posting my personal choice as an answer just to close out this question.
I found akima::interp to be a straightforward function to do 2D interpolation on arbitrary collections of sample locations.
That doesn't mean it's going to be best for everyone, and my guess is those working with geodata may prefer packages designed to muck with specific geo-survey-related file types and lat/long coordinate systems.
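For illustration, a minimal sketch of akima::interp applied to the scattered samples from the question (the 40 x 40 output grid is an arbitrary choice):
# Interpolate the scattered samples onto a regular grid with akima::interp
library(akima)

set.seed(1)
y <- matrix(rep(1:10, 10) - .5 + runif(100), nrow=10)
x <- matrix(rep(1:10, 10) - .5 + runif(100), nrow=10)
inmat <- sin(x) + cos(y)

out <- interp(x=as.vector(x), y=as.vector(y), z=as.vector(inmat),
              xo=seq(min(x), max(x), length=40),
              yo=seq(min(y), max(y), length=40))

# out$z is the interpolated matrix on the regular grid given by out$x and out$y
image(out$x, out$y, out$z)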