Kernel for classification of variable-length sequences of factors in kernlab - R

What is the best approach to defining a suitable kernel for classification of variable-length sequences of factors? I'm using kernlab with R.
Thanks!

There is no generally good way. Variable-length sequences of factors mean that there is no dimension-to-dimension correspondence, so the suitable kernel function is entirely data (problem) dependent.
However, the most basic approach, assuming that your factors are just elements of some big set, is to use a set-intersection (Jaccard-style) kernel,
K(A, B) = |A ∩ B|
which simply measures the size of the intersection. It is easy to prove that this is a valid kernel: think of the feature map phi(A) that encodes the set A as a bit vector with a "1" in the i-th dimension iff the i-th element of the universe (from which A is drawn) is contained in A. K is then the ordinary scalar product of such vectors.
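For illustration, here is a minimal sketch (not part of the original answer) of how such a set-intersection kernel could be plugged into kernlab by precomputing the Gram matrix; the example sequences and labels below are made up:
library(kernlab)

# variable-length sequences of factor levels and their class labels (toy data)
seqs = list(c("a", "b", "c"), c("b", "c"), c("d", "e"), c("a", "d", "e", "f"))
y    = factor(c(1, 1, 2, 2))

# K(A, B) = |A n B|
set_intersection_kernel = function(a, b) length(intersect(unique(a), unique(b)))

n = length(seqs)
K = matrix(0, n, n)
for (i in seq_len(n))
    for (j in seq_len(n))
        K[i, j] = set_intersection_kernel(seqs[[i]], seqs[[j]])

# pass the precomputed Gram matrix directly to ksvm
m = ksvm(as.kernelMatrix(K), y, type = "C-svc", C = 10)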

You should read about:
Dynamic Time Warping (DTW) inspired kernels (with positive-definiteness constraints, such as global alignment kernels).
String kernels, usually used for DNA-sequence analysis (see the spectrum kernel, mismatch kernel, ...).

Related

What do you call it when the kernel of a matrix is sought with a set (nonzero) tolerance?

This will be a strange question: I know what to do, and I am actually doing it, and it works, but I don't know how to write about it. Looking for solutions to a homogeneous matrix equation, say AX=0, I use the kernel of the parameter matrix A. But, the world being imperfect as it is, the matrix does not have a "perfect" kernel; it does have an "imperfect" one if you set a nonzero "tolerance" parameter. FWIW I'm using Scilab, the function is kernel(A,tol).
Now what are the correct terms for "imperfect kernel", or "tolerance" (of what?), how should this whole process be described in correct English and maths terminology? Should I say something like a "least-squares kernel"? "Approximate kernel"? Is tol the "tolerance of kernel-determination algorithm"? Sounds lame to me...
Depending on the method used (QR or SVD; a third argument lets you choose between them in the Scilab implementation), the tolerance determines when pivots (QR case) or singular values (SVD case) are considered to be zero. The kernel is then taken to be the associated subspace.
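For illustration, here is a small sketch of the SVD variant in NumPy rather than Scilab (the function name is made up): singular values below tol are treated as zero, and the corresponding right singular vectors span the approximate kernel.
import numpy as np

def approx_kernel(A, tol=1e-8):
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))   # "numerical rank" at this tolerance
    return Vt[rank:].T            # basis of the approximate null space of A

A = np.array([[1.0, 2.0],
              [2.0, 4.0 + 1e-12]])  # nearly rank-deficient
print(approx_kernel(A))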

How can OpenMDAO be used to solve a linear system of equations without inverting the A matrix?

I have a system of equations that is in the form:
Ax = b
Where A and b are a mixture of known states and state rates derived from earlier components, and x is a vector of four as-yet-unknown state rates. I've used Matlab to linearise the problem; all I need to do now is create some components to find x. However, the inverse of A is large in terms of the number of variables in each index, so I can't just turn this into a straightforward linear equation. Could someone suggest a route to go?
I don't fully understand what you mean by "the inverse of A is large in terms of the number of variables in each index", but I think you mean that the inverse of A is too large and dense to compute and store in memory.
OpenMDAO or not, when you run into this situation you are forced to use an iterative linear solver such as GMRES. So that is broadly the approach that is needed here too.
OpenMDAO does have a LinearSystemComponent that you can use as a rough blueprint here. However, it computes a factorization and stores it, which is not what you want. Regardless, it shows how to represent a linear system as an implicit component in OpenMDAO.
Broadly, you have to think of defining a linear residual:
R = Ax-b = 0
Your component will have two inputs, A and b, and one output, x.
The two key methods here are apply_nonlinear and solve_nonlinear. I realize that the word nonlinear in the method names is confusing. OpenMDAO assumes that the analysis is nonlinear; in your case it happens to be linear, but you use the nonlinear methods all the same.
I will assume that, although you can't compute/store the inverse of [A], you can compute/store A itself (perhaps in a sparse format). In that case you might pass the sparse data array of [A] as the input and fill the sparse matrix as needed from that.
The apply_nonlinear method would look like this:
def apply_nonlinear(self, inputs, outputs, residuals):
    """
    R = Ax - b.

    Parameters
    ----------
    inputs : Vector
        unscaled, dimensional input variables read via inputs[key]
    outputs : Vector
        unscaled, dimensional output variables read via outputs[key]
    residuals : Vector
        unscaled, dimensional residuals written to via residuals[key]
    """
    residuals['x'] = inputs['A'].dot(outputs['x']) - inputs['b']
The key to your question is really the solve_nonlinear method. It would look something like this (using scipy gmres):
from scipy.sparse.linalg import gmres

def solve_nonlinear(self, inputs, outputs):
    """
    Use scipy's gmres to solve Ax = b for x.

    Parameters
    ----------
    inputs : Vector
        unscaled, dimensional input variables read via inputs[key]
    outputs : Vector
        unscaled, dimensional output variables read via outputs[key]
    """
    x, exit_code = gmres(inputs['A'], inputs['b'])
    outputs['x'] = x
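For completeness, here is a rough sketch (not from the original answer) of how such an implicit component might declare its variables; the class name, option, and sizes are purely illustrative, and the two methods above would live on this class:
import numpy as np
import openmdao.api as om


class IterativeLinearSystemComp(om.ImplicitComponent):

    def initialize(self):
        self.options.declare('size', default=4, desc='number of unknowns')

    def setup(self):
        n = self.options['size']
        self.add_input('A', val=np.eye(n))
        self.add_input('b', val=np.zeros(n))
        self.add_output('x', val=np.zeros(n))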

Frequencies in Julia's real FFT

I'm using Julia's FFT implementation to perform a 2D real FFT on a couple of arrays but I can't be sure of the order of the frequencies in the output. Consider the MWE
N=64
U = rand(Float64, N, N);
FFTW.set_num_threads(2)
prfor = plan_rfft(U, (1,2), flags=FFTW.MEASURE);
size(prfor*U)
The output is an array of size (33, 64).
Julia doesn't have a rfftfreq function like Numpy does, and the fact that Julia's output is different from Numpy's fft.rfftn default output makes me not want to use Numpy's default here. I read the documentation but it's not clear how the frequencies are organized just by reading that.
Is there anywhere that tells us the order of the frequencies?
I'm not sure what you are seeking exactly, but if you use DSP.jl, its util.jl file probably has what you may need:
https://github.com/JuliaDSP/DSP.jl/blob/master/src/util.jl
"""
rfftfreq(n, fs=1)
Return discrete fourier transform sample frequencies for use with
`rfft`. The returned Frequencies object is an AbstractVector
containing the frequency bin centers at every sample point. `fs`
is the sample rate of the input signal.
"""

Memory Efficient Centered Sparse SVD/PCA (in Julia)?

I have a 3 million x 9 million sparse matrix with several billion non-zero entries. R and Python do not allow sparse matrices with more than MAXINT non-zero entries, which is why I found myself using Julia.
While scaling this data with the standard deviation is trivial, demeaning is of course a no-go in a naive manner as that would create a dense, 200+ terabyte matrix.
The relevant code for doing svds in Julia can be found at https://github.com/JuliaLang/julia/blob/343b7f56fcc84b20cd1a9566fd548130bb883505/base/linalg/arnoldi.jl#L398
From my reading, a key element of this code is the AtA_or_AAt struct and several of the functions around it, specifically A_mul_B!. Copied below for your convenience:
struct AtA_or_AAt{T,S} <: AbstractArray{T, 2}
    A::S
    buffer::Vector{T}
end

function AtA_or_AAt(A::AbstractMatrix{T}) where T
    Tnew = typeof(zero(T)/sqrt(one(T)))
    Anew = convert(AbstractMatrix{Tnew}, A)
    AtA_or_AAt{Tnew,typeof(Anew)}(Anew, Vector{Tnew}(max(size(A)...)))
end

function A_mul_B!(y::StridedVector{T}, A::AtA_or_AAt{T}, x::StridedVector{T}) where T
    if size(A.A, 1) >= size(A.A, 2)
        A_mul_B!(A.buffer, A.A, x)
        return Ac_mul_B!(y, A.A, A.buffer)
    else
        Ac_mul_B!(A.buffer, A.A, x)
        return A_mul_B!(y, A.A, A.buffer)
    end
end

size(A::AtA_or_AAt) = ntuple(i -> min(size(A.A)...), Val(2))
ishermitian(s::AtA_or_AAt) = true
This is passed into the eigs function, where some magic happens, and the output is then processed into the relevant components for the SVD.
I think the best way to make this work for a 'centering on the fly' type setup is to do something like subtype AtA_or_AAt with an AtA_or_AAt_centered version that more or less mimics the behavior but also stores the column means, and redefines the A_mul_B! function appropriately.
However, I do not use Julia very much and have run into some difficulty modifying things already. Before I try to dive into this again, I was wondering if I could get feedback on whether this would be considered an appropriate plan of attack, or if there is simply a much easier way of doing SVD on such a large matrix (I haven't seen one, but I may have missed something).
edit: Instead of modifying base Julia, I've tried writing a "Centered Sparse Matrix" package that keeps the sparsity structure of the input sparse matrix, but uses the column means where appropriate in various computations. It is limited in what it has implemented, but it works. Unfortunately, it is still too slow, despite some pretty extensive efforts to optimize things.
After much fiddling with the sparse matrix algorithm, I realized that distributing the multiplication over the subtraction was dramatically more efficient.
Suppose our centered matrix Ac is formed from the original n x m matrix A and its m x 1 vector of column means M, with an n x 1 vector of ones that I will just call 1, and we are multiplying by an m x k matrix X:
Ac := A - 1M'
AcX = (A - 1M')X
    = AX - 1(M'X)
And we are basically done. Stupidly simple, actually.
AX can be carried out with the usual sparse matrix multiplication routine, M'X is a dense vector-matrix product, and the vector of 1's "broadcasts" (to use Julia's terminology) the resulting row to each row of the AX intermediate result. Most languages have a way of doing that broadcasting without actually allocating the extra memory.
This is what I've implemented in my package for AcX and Ac'X. The resulting object can then be passed to algorithms, such as the svds function, which only depend on matrix multiplication and transpose multiplication.
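As a rough sketch (not the author's actual package; the wrapper type and function names are made up, only the algebra AcX = AX - 1(M'X) comes from the text above), the trick can be written like this:
using SparseArrays, Statistics

struct CenteredSparse{T}
    A::SparseMatrixCSC{T,Int}   # original sparse matrix, never densified
    M::Vector{T}                # vector of column means
end

CenteredSparse(A::SparseMatrixCSC) =
    CenteredSparse(A, [mean(A[:, j]) for j in 1:size(A, 2)])

# (A - 1M') * X without forming the dense centered matrix:
# the 1 x k row M'X is broadcast-subtracted from every row of A*X
centered_mul(C::CenteredSparse, X::AbstractMatrix) = C.A * X .- C.M' * X

# (A - 1M')' * X = A'X - M * (1'X); sum(X, dims=1) plays the role of 1'X
centered_tmul(C::CenteredSparse, X::AbstractMatrix) = C.A' * X .- C.M .* sum(X, dims=1)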

R: SVM performance using custom kernel (user defined kernel) is not working in kernlab

I'm trying to use a user-defined kernel. I know that kernlab offers user-defined (custom) kernel functions in R. I used the spam data included in the kernlab package
(number of variables = 57, number of examples = 4061).
I defined the kernel as follows:
kp = function(d, e) {
    as = v * d
    bs = v * e
    cs = as - bs
    cs = as.matrix(cs)
    exp(-(norm(cs, "F")^2) / 2)
}
class(kp) = "kernel"
It is a transformed Gaussian kernel, where v is a continuously varying vector of values, the inverse of the standard deviation of each variable, for example:
v = (0.1666667, ........, 0.1666667)
The training set is 60% of the spam data (preserving the proportions of the different classes).
If an example's type is spam, then its label is set to 1 for training the SVM.
m=ksvm(xtrain,ytrain,type="C-svc",kernel=kp,C=10)
But this step does not finish; it just keeps running without returning.
So I ask: why? Is it because the number of examples is too big? Is there any other R package that can train SVMs with a user-defined kernel?
First, your kernel looks like a classic RBF kernel with v = 1/sigma, so why define it yourself? You can use the built-in RBF kernel and simply set the sigma parameter. In particular, instead of using the Frobenius norm on matrices you could use the classic Euclidean norm on the vectorized matrices.
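As a hedged sketch of that suggestion, this is the built-in RBF kernel ("rbfdot") with an explicitly chosen sigma; the value used here is illustrative only:
library(kernlab)
m = ksvm(xtrain, ytrain, type = "C-svc", kernel = "rbfdot",
         kpar = list(sigma = 0.01), C = 10)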
Second - this is working just fine.
> xtrain = as.matrix( c(1,2,3,4) )
> ytrain = as.factor( c(0,0,1,1) )
> v= 0.01
> m=ksvm(xtrain,ytrain,type="C-svc",kernel=kp,C=10)
> m
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 10
Number of Support Vectors : 4
Objective Function Value : -39.952
Training error : 0
There are at least two reasons why you are still waiting for results:
RBF kernels induce one of the hardest optimization problems for an SVM (especially for large C).
User-defined kernels are far less efficient than built-in ones.
As I am not sure whether ksvm actually optimizes the user-defined kernel computation (in fact I'm pretty sure it does not), you could try to build the kernel matrix (K[i,j] = K(x_i, x_j), where x_i is the i-th training vector) and provide ksvm with it. You can achieve this with:
K <- kernelMatrix(kp,xtrain)
m <- ksvm(K,ytrain,type="C-svc",kernel='matrix',C=10)
Precomputing the kernel matrix can be quite a long process, but the optimization itself will then be much faster, so it is a good method if you want to test many different C values (which you certainly should do). Unfortunately this requires O(n^2) memory, so if you use more than 100 000 vectors, you will need a really large amount of RAM.
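As a small illustrative sketch of reusing the precomputed kernel matrix across several C values (the C grid is made up; error() is kernlab's training-error accessor for ksvm objects):
K = kernelMatrix(kp, xtrain)
for (C in c(0.1, 1, 10, 100)) {
    m = ksvm(K, ytrain, type = "C-svc", kernel = "matrix", C = C)
    cat("C =", C, " training error =", error(m), "\n")
}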
