I want to produce a matrix using the process shown below, but it takes a very long time. Is there any other way to get the same result more quickly, using foreach and parallel processing?
"""
library("foreach")
library("doParallel")
lambdas=seq(0.01,7, by = 0.01)
cl <- makeCluster(2) # create a cluster with 2 cores
registerDoParallel(cl) # register the cluster
nlambdas <- foreach(i = 1:1, .inorder = FALSE, .combine = 'cbind', .multicombine = TRUE, .packages = "quantreg") %dopar% {
  first <- rep()
  second <- rep()
  third <- rep()
  fourth <- rep()
  for (m in 1:700) {
    for (j in 1:700) {
      for (n in 1:700) {
        for (k in 1:700) {
          first <- rbind(first, lambdas[m])
          second <- rbind(second, lambdas[j])
          third <- rbind(third, lambdas[n])
          fourth <- rbind(fourth, lambdas[k])
        }
      }
    }
  }
  lambda_total <- cbind(first, second, third, fourth)
}
stopCluster(cl)
"""
You don't need parallel processing; you need an algorithm with less than O(n**4) complexity that avoids slow operations (the repeated rbind calls) to compose this matrix.
The same matrix can easily be constructed by repeatedly applying a sort permutation to four vectors. In your case, however, with 700**4 ≈ 2.4e+11 rows, this might still take some time.
I illustrate the algorithm with only 7 different values in the vector lambdas (and 7**4 = 2401 rows in total).
nsteps = 7
lambdas = seq(0.01, 7, by = 7/nsteps)  # 7 distinct values
h = rep(lambdas, nsteps**3)
i = rep(lambdas, nsteps**3)
j = rep(lambdas, nsteps**3)
k = rep(lambdas, nsteps**3)
# 'ordering' is the permutation that sorts the cyclic vector.
# Applying it once (h), twice (i), three times (j) and four times (k) makes the
# columns repeat in blocks of nsteps**3, nsteps**2, nsteps and 1, respectively --
# exactly the Cartesian product that the four nested loops build.
ordering = order(k)
k = k[ordering]
k = k[ordering]
k = k[ordering]
k = k[ordering]
j = j[ordering]
j = j[ordering]
j = j[ordering]
i = i[ordering]
i = i[ordering]
h = h[ordering]
lambdas.total = cbind(first = h, second = i, third = j, fourth = k)
If your memory is large enough to cope with ordering a vector of length > 10**10, you can do this with 700 steps for lambda in very little time.
No need for any for loops.
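As a quick sanity check on the small example (a sketch, not part of the original answer): the first column should repeat in blocks of nsteps**3, the fourth column should cycle through lambdas, and the whole matrix should match the Cartesian product that base R's expand.grid builds.
# sanity checks for the nsteps = 7 illustration
stopifnot(all(lambdas.total[1:nsteps**3, "first"] == lambdas[1]))  # slowest-varying column
stopifnot(all(lambdas.total[1:nsteps, "fourth"] == lambdas))       # fastest-varying column
eg <- expand.grid(fourth = lambdas, third = lambdas,
                  second = lambdas, first = lambdas)               # last argument varies slowest
stopifnot(all(as.matrix(eg[, 4:1]) == lambdas.total))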
I am trying to create an SEIR model with multiple patches using the package deSolve in R. At each time step, there is some movement of individuals between patches that can infect individuals in other patches. I also have an external forcing parameter that is specific to each patch (representing different environmental conditions). I've been able to get this working in base R, but given the number of patches and compartments and the duration of the model, I'm trying to convert it to compiled code to speed it up.
I've gotten the different patches working, but am struggling with how to incorporate a different forcing parameter for each patch. When forcings are provided, there is an automatic check, checkforcings (https://rdrr.io/cran/deSolve/src/R/forcings.R), that doesn't allow a matrix with more than two columns, and I'm not quite sure what the best workaround is. Should I write my own ode and checkforcings functions to override this? Restructure the forcings data once it gets into C? My final model has 195 patches, so I'd prefer to be able to automate it somehow so that I am not writing out thousands of equations or hundreds of functions.
Also fine if the answer is just, do this in a different language, but would appreciate insight into what language I should switch to. Julia maybe?
Below is code for a very simple example that just highlights this "different forcings in different patches" problem.
R Code
# Packages #########################################################
library(deSolve)
library(ggplot2); theme_set(theme_bw())
library(tidyr)
library(dplyr)
# Initial Parameters and things ####################################
times <- 1:500
n_patch <- 2
patch_ind <- 100
state_names <- (c("S", "I"))
n_state <- length(state_names)
x <- rep(0, n_patch*n_state)
names(x) <- unlist(lapply(state_names, function(x)
  paste(x, stringr::str_pad(seq(n_patch), width = 3, side = "left", pad = 0),
        sep = "_")))
#start with infected individuals in patch 1
x[startsWith(names(x), "S")] <- patch_ind
x['S_001'] <- x['S_001'] - 5
x['I_001'] <- x['I_001'] + 5
x['I_002'] <- x['I_002'] + 20
params <- c(gamma = 0.1, betam = 0.2)
#seasonality
forcing <- data.frame(times = times,
                      rain = rep(rep(c(0.95, 1.05), each = 50), 5))
new_approx_fun <- function(rain.column, t){
  approx_col <- approxfun(rain.column, rule = 2)
  return(approx_col(t))
}
rainfall2 <- data.frame(P1 = forcing$rain,
                        P2 = forcing$rain + 0.01)
# model in R
r.mod2 <- function(t, x, params){
  # turn the state vector into a matrix:
  # columns are different states, rows are different patches
  states <- matrix(x, nrow = n_patch, ncol = n_state, byrow = F)
  S <- states[, 1]
  I <- states[, 2]
  N <- rowSums(states[, 1:2])
  with(as.list(params), {
    # seasonal forcing
    rain <- as.numeric(apply(as.matrix(rainfall2), MARGIN = 2, FUN = new_approx_fun, t = t))
    dS <- gamma*I - rain*betam*S*I/N
    dI <- rain*betam*S*I/N - gamma*I
    return(list(c(dS, dI), rain))
  })
}
out.R2 <- data.frame(ode(y = x, times = times, func = r.mod2,
                         parms = params))
#create seasonality for C
ftime <- seq(0, max(times), by = 0.1)
rain.ft <- approx(times, rainfall2$P1, xout = ftime, rule = 2)$y
forcings2 <- cbind(ftime, rain.ft, rain.ft +0.01)
# C model
system("R CMD SHLIB ex-patch-season-multi.c")
dyn.load(paste("ex-patch-season-multi", .Platform$dynlib.ext, sep = ""))
out.dll <- data.frame(ode(y = x, times = times, func = "derivsc",
                          dllname = "ex-patch-season-multi", initfunc = "parmsc",
                          parms = params, forcings = forcings2,
                          initforc = "forcc", nout = 1, outnames = "rain"))
C code
#include <R.h>
#include <math.h>
#include <Rmath.h>
// this is for testing to try and get different forcing for each patch //
/*define parameters, pay attention to order */
static double parms[2];
static double forc[1];
#define gamma parms[0]
#define betam parms[1]
//define forcing
#define rain forc[0]
/* initialize parameters */
void parmsc(void (* odeparms)(int *, double *)){
  int N = 2;
  odeparms(&N, parms);
}
/* forcing */
void forcc(void (* odeforcs)(int *, double *)){
  int N = 1;
  odeforcs(&N, forc);
}
/* model function */
void derivsc(int *neq, double *t, double *y, double *ydot, double *yout, int *ip){
  //use for-loops for patches
  //define all variables at start of block
  int npatch = 2;
  double S[npatch]; double I[npatch]; double N[npatch];
  int i;
  for(i = 0; i < npatch; i++){
    S[i] = y[i];
  }
  for(i = 0; i < npatch; i++){
    int ind = npatch + i;
    I[i] = y[ind];
  }
  for(i = 0; i < npatch; i++){
    N[i] = S[i] + I[i];
  }
  //use for loops for equations
  // Susceptible
  for(i = 0; i < npatch; i++){
    ydot[i] = gamma*I[i] - rain*betam*I[i]*S[i]/N[i];
  }
  //infected
  for(i = 0; i < npatch; i++){
    int ind = npatch + i;
    ydot[ind] = rain*betam*I[i]*S[i]/N[i] - gamma*I[i];
  }
  yout[0] = rain;
}
The standard way for multiple forcings in compiled code of the deSolve package is described in the lsoda help page:
forcings only used if ‘dllname’ is specified: a list with the forcing function data sets, each present as a two-columned matrix
Such a list can be created automatically in a script.
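For example, such a list could be built from the rainfall2 data frame and ftime defined in the question (a sketch; the C side would then also need forc[] sized to the number of patches and forcc initialized with that count):
## one two-column (time, value) matrix per patch, collected in a list
forcings_list <- lapply(seq_len(ncol(rainfall2)), function(p) {
  cbind(ftime, approx(times, rainfall2[[p]], xout = ftime, rule = 2)$y)
})
## then pass forcings = forcings_list to ode() instead of the three-column matrix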
Other approaches are also possible with some creative C or Fortran programming.
For more complex models, I would recommend the rodeo package. It allows dynamic models to be specified in tabular form (CSV, LibreOffice, Excel), including parameters and forcing functions. The package's code generator then creates fast Fortran code that can be solved with deSolve. An overview can be found in the paper by Kneis et al. (2017), https://doi.org/10.1016/j.envsoft.2017.06.036, and a more extended tutorial at https://dkneis.github.io/ .
I want to use StaticArrays with StatsBase. Consider the following functions:
using StatsBase

function update_weights_1(N, M)
    weights_vector_to_update = ones(N) / N
    wvector = Weights(weights_vector_to_update, 1)
    M_vector = 1:N   # values to sample from
    res = [0.0]
    for m in 1:M
        sample!(M_vector, wvector, res)
    end
end

function update_weights_2(N, M)
    weights_vector_to_update = ones(N) / N
    M_vector = 1:N   # values to sample from
    res = [0.0]
    for m in 1:M
        sample!(M_vector, Weights(weights_vector_to_update, 1), res)
    end
end
update_weights_1 requires substantially less memory allocation than update_weights_2, because the call Weights(weights_vector_to_update, 1) allocates and update_weights_2 repeats it on every iteration. However, suppose I have a list of small vectors, say z,
z = [ones(3) / 3 for i in 1:10000]
and this function
using Random   # for rand!

function update_weights_3(z, M)
    N = size(z[1], 1)
    M_vector = 1:N
    for i in 1:size(z, 1)
        rand!(z[i])
        res = [0.0]
        for m in 1:M
            sample!(M_vector, Weights(z[i]), res)
        end
    end
end
update_weights_3(z, 1000) allocates a lot of memory. I know that using StaticArrays for z can significantly speed up the code and reduce memory allocation. However, following the procedure in this post, whenever I wrap Weights around a StaticArray, it still allocates memory.
Would you know how to apply StaticArray in this case? Essentially I have a collection of small arrays that I would like to transform into Weights.
Weights is a mutable type, which can cause unnecessary heap allocations (sometimes they are stack allocated... I don't fully understand when this optimization happens). You can define your own immutable weights type, though:
using StaticArrays, StatsBase

struct StaticWeights{S<:Real, T<:Real, N, V<:StaticVector{N, T}} <: AbstractWeights{S, T, V}
    values::V
    sum::S
end

StaticWeights(values) = StaticWeights(values, sum(values))
Used in your example:
function update_weights_3(z, M)
    N = size(z[1], 1)
    M_vector = 1:N
    for i in 1:size(z, 1)
        rand!(z[i])
        res = [0.0]
        for m in 1:M
            sample!(M_vector, StaticWeights(z[i]), res)
        end
    end
end
With this change I don't see any allocations in the inner loop.
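Note that this assumes the elements of z are themselves static vectors; since update_weights_3 mutates them with rand!, they need to be mutable static vectors, e.g. MVector (a sketch):
using StaticArrays, StatsBase, Random

z = [MVector{3}(ones(3) / 3) for i in 1:10000]  # mutable, statically sized vectors
update_weights_3(z, 1000)                       # the inner loop now wraps an MVector in StaticWeights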
I am trying to solve a wave-equation problem (related to my PhD) using the finite difference method. For this, I have translated (line by line) a Fortran code (link below): (https://github.com/geodynamics/seismic_cpml/blob/master/seismic_CPML_2D_anisotropic.f90)
Inside this code, within the time loop, there are four main loops that are independent of each other; in fact, I could arrange them into four functions.
As I have to run this code about a hundred times, it would be nice to speed up the process, so I am looking toward parallelization. See the example below:
function main()
    # ...some common code...
    for time = 1:N
        fun1()   # I want this to run in parallel with fun2, fun3 and fun4
        fun2()   # ...in parallel with fun1, fun3 and fun4
        fun3()   # ...in parallel with fun1, fun2 and fun4
        fun4()   # ...in parallel with fun1, fun2 and fun3
    end
    # ...more code here...
    return
end
So,
1) Is it possible to do what I mention before?
2) Will this approach speed up my code?
3) Is there a better way to think this problem?
A minimal working example could be like this:
function fun1(t)
    for i = 1:1000
        for j = 1:1000
            t += (0.5)^t + (0.3)^(t-1)
        end
    end
    return t
end

function fun2(t)
    for i = 1:1000
        for j = 1:1000
            t += (0.5)^t
        end
    end
    return t
end

function fun3(r)
    for i = 1:1000
        for j = 1:1000
            r = (r + rand())/r
        end
    end
    return r
end
function main()
    a = 2
    b = 2.5
    c = 3.0
    for i = 1:100
        a = fun1(a)
        b = fun2(b)
        c = fun3(c)
    end
    return
end
So, as can be seen, none of the three functions above (fun1, fun2 and fun3) depends on any of the others, so they can certainly run in parallel. Can this be achieved, and will it boost my computational speed?
Edited:
Hi @BogumiłKamiński, I have altered the finite-difference code in order to implement a "loop" (as you suggested) over the inputs and outputs of my functions. If it is not too much trouble, I would like your opinion on the parallelization design of the code:
Key elements
1) I have packed all inputs into 4 tuples: sig_xy_in and sig_xy_cros_in (for the 2 sigma functions) and vel_vx_in and vel_vy_in (for the 2 velocity functions). I then packed the 4 tuples into 2 vectors for "looping" purposes...
2) I packed the 4 functions into 2 vectors for "looping" purposes...
3) I run the first parallel loop and then unpack its output tuples...
4) I run the second parallel loop (for velocities) and then unpack its output tuples...
5) Finally, I pack the output elements back into the input tuples and continue the time loop until it finishes.
...code
l = Threads.SpinLock()
arg_in_sig = [sig_xy_in,sig_xy_cros_in]; # Inputs tuples x sigma funct
arg_in_vel = [vel_vx_in, vel_vy_in]; # Inputs tuples x velocity funct
func_sig = [sig_xy , sig_xy_cros]; # Vector with two sigma functions
func_vel = [vel_vx , vel_vy]; # Vector with two velocity functions
for it = 1:NSTEP # time steps
#------------------------------------------------------------
# Compute sigma functions
#------------------------------------------------------------
Threads.@threads for j in 1:2 # Start parallel run of the two sigma functions
Threads.lock(l);
Threads.unlock(l);
arg_in_sig[j] = func_sig[j](arg_in_sig[j]);
end
# Unpack tuples for sig_xy and sig_xy_cros
# Unpack tuples for sig_xy
sigxx = arg_in_sig[1][1]; # changed by sig_xy
sigyy = arg_in_sig[1][2]; # changed by sig_xy
m_dvx_dx = arg_in_sig[1][3]; # changed by sig_xy
m_dvy_dy = arg_in_sig[1][4]; # changed by sig_xy
vx = arg_in_sig[1][5]; # unchanged by sig_xy
vy = arg_in_sig[1][6]; # unchanged by sig_xy
delx_1 = arg_in_sig[1][7]; # unchanged by sig_xy
dely_1 = arg_in_sig[1][8]; # unchanged by sig_xy
...more unpacking...
# Unpack tuples for sig_xy_cros
sigxy = arg_in_sig[2][1]; # changed by sig_xy_cros
m_dvy_dx = arg_in_sig[2][2]; # changed by sig_xy_cros
m_dvx_dy = arg_in_sig[2][3]; # changed by sig_xy_cros
vx = arg_in_sig[2][4]; # unchanged by sig_xy_cros
vy = arg_in_sig[2][5]; # unchanged by sig_xy_cros
...more unpacking....
#--------------------------------------------------------
# velocity
#--------------------------------------------------------
Threads.@threads for j in 1:2 # Start parallel run of the two velocity functions
Threads.lock(l)
Threads.unlock(l)
arg_in_vel[j] = func_vel[j](arg_in_vel[j])
end
# Unpack tuples for vel_vx
vx = arg_in_vel[1][1]; # changed by vel_vx
m_dsigxx_dx = arg_in_vel[1][2]; # changed by vel_vx
m_dsigxy_dy = arg_in_vel[1][3]; # changed by vel_vx
sigxx = arg_in_vel[1][4]; # unchanged by vel_vx
sigxy = arg_in_vel[1][5];....
# Unpack tuples for vel_vy
vy = arg_in_vel[2][1]; # changed by vel_vy
m_dsigxy_dx = arg_in_vel[2][2]; # changed by vel_vy
m_dsigyy_dy = arg_in_vel[2][3]; # changed by vel_vy
sigxy = arg_in_vel[2][4]; # unchanged by vel_vy
sigyy = arg_in_vel[2][5]; # unchanged by vel_vy
.....
...more unpacking...
# ensamble new input variables
sig_xy_in = (sigxx,sigyy,
m_dvx_dx,m_dvy_dy,
vx,vy,....);
sig_xy_cros_in = (sigxy,
m_dvy_dx,m_dvx_dy,
vx,vy,....;
vel_vx_in = (vx,....
vel_vy_in = (vy,.....
end #time loop
Here is a simple way to run your code in multithreading mode:
function fun1(t)
    for i = 1:1000
        for j = 1:1000
            t += (0.5)^t + (0.3)^(t-1)
        end
    end
    return t
end

function fun2(t)
    for i = 1:1000
        for j = 1:1000
            t += (0.5)^t
        end
    end
    return t
end

function fun3(r)
    for i = 1:1000
        for j = 1:1000
            r = (r + rand())/r
        end
    end
    return r
end
function main()
    l = Threads.SpinLock()
    a = [2.0, 2.5, 3.0]
    f = [fun1, fun2, fun3]
    Threads.@threads for i in 1:3
        for j in 1:4
            Threads.lock(l)
            println((thread = Threads.threadid(), iteration = j))
            Threads.unlock(l)
            a[i] = f[i](a[i])
        end
    end
    return a
end
I have added locking, just as an example of how you can do it (in Julia 1.3 you will not have to do this, as IO is thread-safe there).
Also note that rand() shares state among threads prior to Julia 1.3, so it would not be safe to run these functions in parallel if all of them used rand() (again, in Julia 1.3 it will be safe to do so).
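A common workaround prior to Julia 1.3 (a sketch, not from the original answer) is to give each thread its own random number generator, for example:
using Random

const rngs = [MersenneTwister(1234 + i) for i in 1:Threads.nthreads()]

function fun3_perthread(r)
    rng = rngs[Threads.threadid()]      # each thread draws from its own RNG
    for i = 1:1000
        for j = 1:1000
            r = (r + rand(rng))/r
        end
    end
    return r
end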
To run this multithreaded code, first set the maximum number of threads you want to use before starting Julia, e.g. on Windows set JULIA_NUM_THREADS=4 (on Linux you should export it). Here is an example run of this code (I have reduced the number of iterations in order to shorten the output):
julia> main()
(thread = 1, iteration = 1)
(thread = 3, iteration = 1)
(thread = 2, iteration = 1)
(thread = 3, iteration = 2)
(thread = 3, iteration = 3)
(thread = 3, iteration = 4)
(thread = 2, iteration = 2)
(thread = 1, iteration = 2)
(thread = 2, iteration = 3)
(thread = 2, iteration = 4)
(thread = 1, iteration = 3)
(thread = 1, iteration = 4)
3-element Array{Float64,1}:
21.40311930108456
21.402807510451463
1.219028489573526
Now one small cautionary note: while it is relatively easy to make code multithreaded in Julia (and in Julia 1.3 it will be even simpler), you have to be careful when you do it, as you have to take care of race conditions.
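For instance (a sketch, not from the original answer), incrementing a shared counter from several threads without synchronization typically loses updates, whereas Threads.Atomic (or a lock) keeps the count correct:
function racy()
    x = 0
    Threads.@threads for i in 1:100_000
        x += 1                          # unsynchronized read-modify-write: updates can be lost
    end
    return x
end

function safe()
    x = Threads.Atomic{Int}(0)
    Threads.@threads for i in 1:100_000
        Threads.atomic_add!(x, 1)       # atomic increment: no lost updates
    end
    return x[]
end

# with more than one thread, racy() usually returns less than 100_000,
# while safe() always returns 100_000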
I am working with the nsga2 function in R (package mco).
My NSGA2 code takes forever to run, so I am wondering:
1) Is there a way to limit the precision of the solution values (say, to 3 decimal places) rather than using full double precision?
2) How do I set an equality constraint (the examples online all seem to be about >= or <=, rather than =)? I am not sure I am doing it right.
My entire relevant code for reference, for easy tracing: https://docs.google.com/document/d/1xj7OPng11EzLTTtWLdRWMm8zJ9f7q1wsx2nIHdh3RM4/edit?usp=sharing
Relevant sample part of code reproduced here:
VTR = get.hist.quote(instrument = 'VTR',
start="2010-01-01", end = "2015-12-31",
quote = c("AdjClose"),provider = "yahoo",
compress = "d")
ObjFun1 <- function(xh){
  f1 <- sum(HSVaR_P(merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP), xh, 0.05, 2))
  tempt = merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP)
  tempt2 = tempt[(nrow(tempt)-(2*N)):nrow(tempt),]
  for (i in 1:nrow(tempt2)) {
    for (j in 1:ncol(tempt2)) {
      if (is.na(tempt2[i,j])) {
        tempt2[i,j] = 0
      }
    }
  }
  f2 <- ((-1)*abs(sum((xh*t(tempt2)))))
  c(f1 = f1, f2 = f2)
}
Constr <- function(xh){
  totwt <- (1-sum(-xh))
  totwt2 <- (sum(xh)-1)
  c(totwt, totwt2)
}
Solution1 <- nsga2(ObjFun1, n.projects, 2,
lower.bounds=rep(0,n.projects), upper.bounds=rep(1,n.projects),
popsize=n.solutions, constraints = Constr, cdim=1,
generations=generations)
The function HSVaR_P returns matrix(x,2*500,1).
Even when I set generations = 1, the code does not seem to run. There is clearly an error somewhere in the code, but I am not entirely sure about the mechanics of the NSGA2 algorithm.
Thanks.
I wanted to use a user-defined kernel function for ksvm in R.
So, as practice, I tried to make a vanilladot kernel and compare it with the "vanilladot" kernel that is built into "kernlab".
I wrote my kernel as follows.
#
###vanilla kernel with class "kernel"
#
kfunction.k <- function(){
  k <- function(x, y){crossprod(x, y)}
  class(k) <- "kernel"
  k
}
l<-0.1 ; C<-1/(2*l)
###use kfunction.k
tmp<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel=kfunction.k(), C = C)
alpha(tmp)[[1]]
ind<-alphaindex(tmp)[[1]]
x.s<-x[ind,] ; y.s<-y[ind]
w.class.k<-t(alpha(tmp)[[1]]*y.s)%*%x.s
w.class.k
I thought the result of this operation would be equal to that of the following. However, it isn't.
#
###use "vanilladot"
#
l<-0.1 ; C<-1/(2*l)
tmp1<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel="vanilladot", C = C)
alpha(tmp1)[[1]]
ind1<-alphaindex(tmp1)[[1]]
x.s<-x[ind1,] ; y.s<-y[ind1]
w.tmp1<-t(alpha(tmp1)[[1]]*y.s)%*%x.s
w.tmp1
I think this problem may be related to the kernel class.
When the class is set to "kernel", the problem occurs.
However, when the class is set to "vanillakernel", the result of ksvm using the user-defined kernel is equal to that of ksvm using the "vanilladot" kernel built into kernlab.
#
###vanilla kernel with class "vanillakernel"
#
kfunction.v.k <- function(){
  k <- function(x, y){crossprod(x, y)}
  class(k) <- "vanillakernel"
  k
}
# The only difference between kfunction.k and kfunction.v.k is "class(k)".
l<-0.1 ; C<-1/(2*l)
###use kfunction.v.k
tmp<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel=kfunction.v.k(), C = C)
alpha(tmp)[[1]]
ind<-alphaindex(tmp)[[1]]
x.s<-x[ind,] ; y.s<-y[ind]
w.class.v.k<-t(alpha(tmp)[[1]]*y.s)%*%x.s
w.class.v.k
I don't understand why the result differs from "vanilladot" when the class is set to "kernel".
Is there an error in my operation?
First, it seems like a really good question!
Now to the point. In the sources of ksvm we can find where the line is drawn between using a user-defined kernel and the built-ins:
if (type(ret) == "spoc-svc") {
if (!is.null(class.weights))
weightedC <- class.weights[weightlabels] * rep(C,
nclass(ret))
else weightedC <- rep(C, nclass(ret))
yd <- sort(y, method = "quick", index.return = TRUE)
xd <- matrix(x[yd$ix, ], nrow = dim(x)[1])
count <- 0
if (ktype == 4)
K <- kernelMatrix(kernel, x)
resv <- .Call("tron_optim", as.double(t(xd)), as.integer(nrow(xd)),
as.integer(ncol(xd)), as.double(rep(yd$x - 1,
2)), as.double(K), as.integer(if (sparse) xd@ia else 0),
as.integer(if (sparse) xd@ja else 0), as.integer(sparse),
as.integer(nclass(ret)), as.integer(count), as.integer(ktype),
as.integer(7), as.double(C), as.double(epsilon),
as.double(sigma), as.integer(degree), as.double(offset),
as.double(C), as.double(2), as.integer(0), as.double(0),
as.integer(0), as.double(weightedC), as.double(cache),
as.double(tol), as.integer(10), as.integer(shrinking),
PACKAGE = "kernlab")
reind <- sort(yd$ix, method = "quick", index.return = TRUE)$ix
alpha(ret) <- t(matrix(resv[-(nclass(ret) * nrow(xd) +
1)], nclass(ret)))[reind, , drop = FALSE]
coef(ret) <- lapply(1:nclass(ret), function(x) alpha(ret)[,
x][alpha(ret)[, x] != 0])
names(coef(ret)) <- lev(ret)
alphaindex(ret) <- lapply(sort(unique(y)), function(x)
which(alpha(ret)[,
x] != 0))
xmatrix(ret) <- x
obj(ret) <- resv[(nclass(ret) * nrow(xd) + 1)]
names(alphaindex(ret)) <- lev(ret)
svindex <- which(rowSums(alpha(ret) != 0) != 0)
b(ret) <- 0
param(ret)$C <- C
}
There are two important parts. First, if we provide ksvm with our own kernel, then ktype=4 (while for vanillakernel, ktype=0), which makes two changes:
in the case of a user-defined kernel, the kernel matrix is precomputed instead of the kernel function being used directly
the tron_optim routine is run with the information regarding the kernel
Now, in svm.cpp we can find the tron routines, and in tron_run (called from tron_optim) we see that the LINEAR kernel has a separate optimization routine:
if (param->kernel_type == LINEAR)
{
/* lots of code here */
while (Cpj < Cp)
{
totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w,
Cpj, Cnj, param->eps, sii, param->shrinking,
param->qpsize);
/* lots of code here */
}
totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w, Cp, Cn,
param->eps, sii, param->shrinking, param->qpsize);
delete[] w;
}
else
{
Solver_B s;
s.Solve(l, BSVC_Q(*prob,*param,y), minus_ones, y, alpha, Cp, Cn,
param->eps, sii, param->shrinking, param->qpsize);
}
As you can see, the linear case is treated in a more complex, more detailed way: there is an inner optimization loop that calls the solver many times. It would require a really deep analysis of the actual optimization being performed here, but at this point one can answer your question as follows:
There is no error in your operation
kernlab's svm has a separate routine for training an SVM with a linear kernel, selected based on the type of kernel passed to the code; changing "kernel" to "vanillakernel" made ksvm think it was actually working with vanillakernel, and so it performed this separate optimization routine
It does not in fact seem to be a bug, as the linear SVM really is very different from the kernelized version in terms of efficient optimization techniques. The amount of heuristics as well as numerical issues that have to be taken care of is really large. As a result, some approximations are required and can lead to different results. While for a rich feature space (like the one induced by the RBF kernel) it should not really matter, for simple kernels like linear ones these simplifications can lead to significant output changes.
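As a quick way to convince yourself that the two kernels themselves agree and only the optimization path differs, you can compare the kernel matrices directly (a sketch, assuming x is the training matrix from the question):
library(kernlab)
K_user    <- kernelMatrix(kfunction.k(), x)   # user-defined kernel of class "kernel"
K_builtin <- kernelMatrix(vanilladot(), x)    # built-in linear kernel
all.equal(as.vector(K_user), as.vector(K_builtin))  # should be TRUE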