I have a dataset with over 10 categorical variables and about 20 numerical ones. I'm trying to edit Stef van Buuren's mice.impute.logreg function which is available on github, to call glm.fit(), but with a higher maxit value to try to reach convergence. However, on running the code as is, I get the following error:
Error: Only strings can be converted to symbols
and it comes from this line in the code:
rv <- t(chol(sym(fit.sum$cov.unscaled)))
I went ahead to print out the content of fit.sum$cov.unscaled, and got a huge covariance matrix(?) with all variables (categorical ones kinda one-hot-encoded(?)), something like this, but way larger:
Proteinuria22 Proteinuria23 Proteinuria24 Proteinuria25 Aetiol22 Aetiol23 Aetiol24
-0.0775687218 6.603074e-02 6.995692e-01 -1.0462947407 -1.990400e-01 -3.756997e+01 -6.198267e-01
Weight2 -0.0003022753 6.802872e-04 -1.138967e-03 -0.0043737786 2.550278e-04 3.380858e-02 6.343819e-04
Height2 0.0174235854 -8.945169e-02 -2.588742e-01 0.2947104430 -1.763788e-01 2.027542e+00 -3.676413e-02
BMI22 0.0038176385 -2.246294e-02 3.529623e-02 0.0507158023 -1.959203e-03 1.515110e+00 3.618223e-02
BMI23 0.0463573025 4.600740e-02 1.210799e-01 0.1009359117 6.368376e-03 7.268413e-01 -4.677462e-03
BMI24 0.0230542190 4.822956e-02 1.424563e-01 0.2136974371 -7.688207e-02 -4.099045e+00 -4.920604e-02
Proteinuria21 0.2564365948 2.399999e-01 2.869407e-01 0.2866854741 -3.345524e-02 7.021764e+00 -1.380307e-02
Proteinuria22 0.5114421153 2.658057e-01 2.444392e-01 0.2575295706 -5.555202e-02 2.132465e+00 -2.367527e-02
Proteinuria23 0.2658056994 8.278569e-01 2.805812e-01 0.1743841777 -5.433797e-02 -5.289189e+00 -1.905688e-02
Proteinuria24 0.2444391680 2.805812e-01 5.436426e-01 0.2272864202 -4.551615e-02 2.533664e+00 -1.962130e-02
Proteinuria25 0.2575295706 1.743842e-01 2.272864e-01 1.1656567100 -7.355628e-02 9.412580e+00 -1.330318e-01
Aetiol22 -0.0555520221 -5.433797e-02 -4.551615e-02 -0.0735562813 4.327236e-01 4.698377e+00 1.196196e-01
Aetiol23 2.1324651321 -5.289189e+00 2.533664e+00 9.4125804535 4.698377e+00 1.175992e+04 2.984111e+00
Since I'm still not very conversant with R, I really have no idea what this means... I understand that sym() is used to convert a string to a symbol, but I don't understand how (or why) such a huge matrix would be converted into a symbol. Any ideas, please?
Thanks to pointers in @arun's comment, I discovered that I only needed to remove the sym() call, given what the surrounding chol() function does according to its documentation:
Compute the Choleski factorization of a real symmetric positive-definite square matrix.
I'm yet to figure out why the code author put the sym() function there in the first place, though, since the code apparently breaks with it, but works fine without it.
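One possible explanation, offered as an assumption rather than a confirmed reading of the package source: inside mice, sym() may be an internal helper that symmetrizes a matrix, whereas in a copy of the function run outside the package namespace the name sym resolves to some other function (for example rlang::sym(), which converts a string to a symbol and rejects anything else). The sketch below uses a hypothetical stand-in, symmetrize(), and a made-up matrix to show what such a helper would do before chol():

```r
# Hypothetical stand-in for an internal symmetrizing helper (assumption):
# averaging a matrix with its transpose removes tiny asymmetries caused
# by floating-point round-off.
symmetrize <- function(x) (x + t(x)) / 2

# Made-up positive-definite matrix with a slightly perturbed off-diagonal entry
set.seed(1)
a <- matrix(rnorm(9), nrow = 3)
v <- crossprod(a) + diag(3)
v[1, 2] <- v[1, 2] + 1e-10

v_sym <- symmetrize(v)
rv <- t(chol(v_sym))            # lower-triangular Cholesky factor

isSymmetric(v_sym)              # TRUE
all.equal(rv %*% t(rv), v_sym)  # TRUE
```

Because chol() reads only the upper triangle of its argument, the code still runs with the call removed; such a helper would only matter if the two triangles differed noticeably.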
Hi I have the following code in Scilab:
>Tc=0;
>Tm=1;
>Tf=1800;
>t=(Tc:1:Tf)';
where t is a column vector with 1801 components.
I am asked to write a piecewise function that satisfies certain conditions.
My first attempt was something along the lines of:
> function vg=simula_vg(t,Tcg,Tfg,Ag)
> if (t<Tcg | t>Tfg) then
> vg=0;
> else
> vg=Ag*Ag*(1-cos(2*%pi*(t-Tcg)/(Tfg-Tcg)));
>end
>endfunction
But it doesn't work, because I am asking it to compare a vector with scalars.
Then I tried to write this:
>for i=[Tc:1:Tf]
>function vg=simula_vg(t,Tcg,Tfg,Ag)
> vg(t<Tcg)=0
>vg(t>Tfg)=0
>vg((Tcg<=t)&(t<=Tfg))=sin(t(i))
>endfunction
>end
But it doesn't work either, and I have run out of ideas. Is there anything else I can do? All the variables are well defined:
>vm=10;
>Ag=2;
>Tcg=200;
>Tfg=400;
>Ar=2;
>Tcr=1000;
>Tfr=1500;
>As=2;
>fs=0.0008;
>h=20;
>d=0.6;
There are more variables, because there are more functions similar to that one that I have to define, and I don't know how. Any suggestions on how to do it?
You can do it like this, where the zero values are defined afterwards:
function vg=simula_vg(t,Tcg,Tfg,Ag)
vg=Ag*Ag*(1-cos(2*%pi*(t-Tcg)/(Tfg-Tcg)));
vg(t<Tcg|t>Tfg)=0;
endfunction
Ag=2;
Tcg=200;
Tfg=400;
Tc=0;
Tm=1;
Tf=1800;
t=Tc:1:Tf;
vg = simula_vg(t,Tcg,Tfg,Ag);
plot(t,vg)
I am trying to plot the vectors of an electric field in Scilab, but I always get this error:
champ: Wrong size for input arguments: Incompatible sizes.
The code:
epsilon0=1e-9/(36*%pi);
q=3e-9;
p=[-1,0,0];
x=-2:0.2:2;
y=-2:0.2:2;
[px,py]=meshgrid(x,y);
for m=1:length(x),
for n=1:length(y),
xp=px(m,n);
yp=py(m,n);
vektorr1x=xp-p(1);
vektorr1y=yp-p(3);
r1=sqrt(vektorr1x^2+vektorr1z^2);
if r1~=0 then
ar1x=vektorr1x/r1;
ar1y=vektorr1y/r1;
E1x=q*ar1x/(4*%pi*epsilon0*r1^2);
E1y=q*ar1y/(4*%pi*epsilon0*r1^2);
else
E1x=0;
E1y=0;
end,
end,
end,
pl=champ(px,py,E1x,E1y,[-2,-1,2,-1]);
You don't have to use loops. The following script does what you want:
epsilon0=1e-9/(36*%pi);
q=3e-9;
p=[-1,0,0];
x=-2:0.2:2;
y=-2:0.2:2;
[px,py]=ndgrid(x,y);
vektorr1x=px-p(1);
vektorr1y=py-p(3);
r1=sqrt(vektorr1x.^2+vektorr1y.^2);
ar1x=vektorr1x./r1;
ar1y=vektorr1y./r1;
E1x=q*ar1x./(4*%pi*epsilon0*r1.^2);
E1y=q*ar1y./(4*%pi*epsilon0*r1.^2);
E1x(r1==0)=0;
E1y(r1==0)=0;
clf
champ(x,y,E1x,E1y,[-2,-1,2,-1]);
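To plot fields, don't use meshgrid to sample the domain; use ndgrid instead, because champ expects fx(i,j) and fy(i,j) to be the field components at the point (x(i),y(j)). Moreover, don't forget to use the dot-prefixed (element-wise) operators.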
I have a sequence encoded in a string, but one type of step in this sequence is entirely conditional on a previous step.
When this occurs, I'd like to remove the previous step.
For example, in the case:
"alpha_i, bravo_i, alpha_i, alpha_c, charlie_i, bravo_i, bravo_c,
alpha_i, delta_c"
for those steps where a *_c event occurs directly after the *_i event of the same name, I'd like to have the *_i event removed, the desired result being:
"alpha_i, bravo_i, alpha_c, charlie_i, bravo_c, alpha_i,
delta_c"
In other words,
"alpha_i, alpha_c" goes to just "alpha_c"
"bravo_i, bravo_c" goes to just "bravo_c",
but we do not change "alpha_i, delta_c" because they are a different event name.
I think the syntax would use the gsub function, but I don't know how to match the prefixed term either side of the comma, and would appreciate some help.
*In response to the point raised below: yes, there will be many different event names, not just the two being replaced here.
Try this:
wds <- c("alpha_i", "bravo_i", "alpha_i", "alpha_c", "charlie_i", "bravo_i", "bravo_c", "alpha_i", "delta_c")
# Keep the last step in each run of consecutive steps that share an event prefix
wds[cumsum(rle(substr(wds, 1, regexpr('_', wds)))$lengths)]
Alternatively, if your vector is of length 1, try this:
wds <- c("alpha_i, bravo_i, alpha_i, alpha_c, charlie_i, bravo_i, bravo_c, alpha_i, delta_c")
wds_split <- unlist(strsplit(wds, ', '))
wds_split[cumsum(rle(substr(wds_split, 1, regexpr('_', wds_split)))$lengths)]
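Since the question asks about gsub, here is a minimal regex sketch of the same idea. It assumes the steps are always separated by a comma and a single space, and that event names contain only letters and digits. The variable names steps and cleaned are illustrative:

```r
steps <- "alpha_i, bravo_i, alpha_i, alpha_c, charlie_i, bravo_i, bravo_c, alpha_i, delta_c"

# Capture the event name before "_i", then use a lookahead with a
# backreference (\\1) so the "_i" step is dropped only when the very next
# step is the "_c" step of the same event. Lookahead needs perl = TRUE.
cleaned <- gsub("\\b([[:alnum:]]+)_i, (?=\\1_c\\b)", "", steps, perl = TRUE)

cleaned
# [1] "alpha_i, bravo_i, alpha_c, charlie_i, bravo_c, alpha_i, delta_c"
```

Unlike the rle() approach, this leaves a repeated step such as "alpha_i, alpha_i" untouched, which may or may not be what is wanted.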
I have some problems with the viterbiTraining function from the HMM package.
I tried using it on a pretty straightforward HMM and a vector of observations.
Here's the code:
Emisije<-rep("IntervalC",length(Cl1.res))
Emisije[IntervalA[,1]]<-"IntervalA"
Emisije[IntervalB[,1]]<-"IntervalB"
The Emisije vector looks like this:
head(Emisije)
[1] "IntervalA" "IntervalA" "IntervalA" "IntervalC" "IntervalB" "IntervalA"
startProbs<-c(0.6873065,0.3126935)
transProbs<-matrix(c(0.8, 0.7, 0.2,0.3),ncol=2)
emissionProbs<-matrix(rep(1/3,6),ncol=3)
stanji<-initHMM(c("NizkaVar", "VisokaVar"), c("IntervalA", "IntervalB",
"IntervalC"), startProbs, transProbs, emissionProbs)
After running this everything works, except for the viterbiTraining function, which gives the following result:
viterbiTraining(stanji,Emisije)
Error in if (d < delta) { : missing value where TRUE/FALSE needed
Even the similar function baumWelch, which takes the exact same parameters, works without errors, so I really don't understand what's wrong here.
Can anyone please explain to me what I am doing wrong? Thank you in advance.
I get warnings when running the code below.
If I run a single assignment such as
tm1<- summary(tmfit)[c(4,8,9)]
I get the result, but I need to run this for each $i$.
Why do I get the warnings?
Is there a way to do this without a for loop?
Specifically, I have many regressands ($y$) with the same two regressors ($x$'s).
How can I collect these regression results (to make some comparisons)?
dreg=read.csv("dayreg.csv")
fundr=read.csv("fundreturnday.csv")
num=ncol(fundr)
exr=dreg[,2]
tm=dreg[,4]
for(i in 2:num)
{
tmfit=lm(fundr[,i]~exr+tm)
tm1[i]<- summary(tmfit)[c(4,8,9)]
}
Any help is highly appreciated
Try storing your result into a list instead of a vector.
dreg=read.csv("dayreg.csv")
fundr=read.csv("fundreturnday.csv")
num=ncol(fundr)
exr=dreg[,2]
tm=dreg[,4]
tm1=list()
for(i in 2:num)
{
tmfit=lm(fundr[,i]~exr+tm)
tm1[[i]]<- summary(tmfit)[c(4,8,9)]
}
You can look at an element in the list like so:
tm1[[2]]
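As for avoiding the explicit for loop, one option is lapply() over the response columns, which returns a named list directly. The sketch below uses made-up data in place of dayreg.csv and fundreturnday.csv (the column names date, fundA and fundB are hypothetical) and assumes, as in the question, that the first column of fundr is not a response:

```r
# Made-up stand-ins for the two CSV files (assumption, for illustration only)
set.seed(42)
n <- 50
exr <- rnorm(n)
tm <- rnorm(n)
fundr <- data.frame(
  date  = seq_len(n),
  fundA = 0.5 * exr + rnorm(n),
  fundB = -0.2 * tm + rnorm(n)
)

# Fit one regression per response column and keep the same three pieces
# that summary(tmfit)[c(4, 8, 9)] selects, but by name
results <- lapply(fundr[-1], function(y) {
  s <- summary(lm(y ~ exr + tm))
  s[c("coefficients", "r.squared", "adj.r.squared")]
})

names(results)               # "fundA" "fundB"
results$fundA$coefficients   # coefficient table for the first fund
```

Selecting the summary components by name rather than by position keeps the code readable and does not depend on the order of the elements in the summary object.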