Power regression with the basicTrendline package in R

I want to fit a power regression model for my water quality analysis.
In one section of the analysis I have these two data series:
Q= 0.7409845 1.2736854 0.0713900 1.5316926 1.4607059 0.6124793 1.5902551 1.7286422
1.6547936 1.6088377 1.6054299 1.7810355 1.4429110 1.1905836 2.2374064 1.3004641
1.7137979 1.6578471 1.6386083 1.0181250
Cl= 1.6863990 0.9932518 1.7749524 1.1631508 2.0918641 0.9162907 1.1631508 1.3862944
1.2809338 1.0647107 2.3978953 1.4350845 1.6677068 1.8245493 1.7578579 1.6677068
1.4816045 1.3862944 1.2527630 1.3862944
I want to fit a power regression like the one I produced in Excel (screenshot omitted), i.e. show the equation, the R-squared value, and the regression line on the plot in R.
The only package I know of for this is basicTrendline; it works for the linear model, but not for the power model.
library(basicTrendline)
Q  = log(gol$debi)
Cl = log(gol$Cl)
trendline(x = Q, y = Cl, model = "power2P", show.pvalue = FALSE,
          ePos.x = "topleft", eDigit = 3, CI.level = 0.95,
          xlab = "Q", ylab = "Cl", type = "p", Pvalue.corrected = FALSE)
It shows this message when I run it, even though none of my data are below zero:
Error in trendline_summary(x = x, y = y, model = model, Pvalue.corrected = Pvalue.corrected, :
'power2P' model need ALL x values greater than 0. Try other models.
Please help me with this package or another one; I just want to reproduce in RStudio something like the Excel chart described above.

The results seem a bit different from the Excel version. I wonder whether those "logs" are being taken to a different base?
plot(log(Cl) ~ log(Q) )
trendline <- lm(log(Cl)~log(Q))
abline(trendline)
trendline
#------
Call:
lm(formula = log(Cl) ~ log(Q))

Coefficients:
(Intercept)       log(Q)
    0.37431     -0.02898
#---------------
title(main = "Log(Cl) ~ Log(Q)")
Your Excel "logs" are all greater than 0, so it wouldn't be the case that Excel was using log-base-10. That would make the values even more negative. So how the log of 0.7 becomes positive is a great mystery. Maybe you need to explain or offer a citation to this "power regression" method. It doesn't look like standard statistics or mathematics.
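That said, if the aim is simply to reproduce the Excel-style power fit Cl = a * Q^b with the equation and R-squared shown on the plot, a minimal sketch (assuming Q and Cl hold the raw, un-logged series) is to back-transform the log-log fit and annotate the plot yourself:
# Sketch only: fit log(Cl) ~ log(Q), back-transform to Cl = a * Q^b,
# and print the equation and R-squared on the plot.
fit <- lm(log(Cl) ~ log(Q))
a   <- exp(coef(fit)[1])        # multiplier a from the intercept on the log scale
b   <- coef(fit)[2]             # exponent b from the slope on the log scale
r2  <- summary(fit)$r.squared

plot(Q, Cl, xlab = "Q", ylab = "Cl")
curve(a * x^b, add = TRUE)
legend("topleft", bty = "n",
       legend = sprintf("Cl = %.3f * Q^%.3f,  R-squared = %.3f", a, b, r2))
Note also that the error in the question most likely comes from logging the data before calling trendline(): log(0.07139) is negative, and model = "power2P" requires all x values to be greater than 0, so passing the raw Q values should avoid it.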

Related

How to simulate the posterior filtered estimates of a Kalman Filter using the DSE package in R

How do I call for the posterior (refined) state estimates from a Kalman Filter simulation in R using the DSE package?
I have added an example below. Assume that I have created a simple random walk state space with the error being a standard normal distribution. The model is created using the SS function, with initialised state and covariance estimates of zero. The theoretical model form is thus:
X(t) = X(t-1) + e(t),  e(t) ~ N(0,1)   (state evolution)
Y(t) = X(t) + w(t),    w(t) ~ N(0,1)   (measurement)
We now implement this in R by following the instructions on page 6 and 7 of the "Kalman Filtering in R" article in the Journal of Statistical Software. First we create the state space model using the SS() function and store it in the variable called kalman.filter:
kalman.filter = dse::SS(F  = matrix(1, 1, 1),
                        Q  = matrix(1, 1, 1),
                        H  = matrix(1, 1, 1),
                        R  = matrix(1, 1, 1),
                        z0 = matrix(0, 1, 1),
                        P0 = matrix(0, 1, 1))
Then we simulate 100 observations from the model using simulate() and put them in a variable called simulate.kalman.filter:
simulate.kalman.filter=simulate(kalman.filter, start = 1, freq = 1, sampleT = 100)
Then we run the Kalman filter against the measurements using l() and store the result in a variable called test:
test=l(kalman.filter, simulate.kalman.filter)
From the outputs, which ones are my filtered estimates?
I have found the answer to this question.
Firstly, the filtered estimates of the model are not given by the l() function; it only gives the one-step-ahead predictions. The above framing of my problem was coded as:
kalman.filter = dse::SS(F  = matrix(1, 1, 1),
                        Q  = matrix(1, 1, 1),
                        H  = matrix(1, 1, 1),
                        R  = matrix(1, 1, 1),
                        z0 = matrix(0, 1, 1),
                        P0 = matrix(0, 1, 1))
simulate.kalman.filter=simulate(kalman.filter, start = 1, freq = 1, sampleT = 100)
test=l(kalman.filter, simulate.kalman.filter)
The one step ahead predictions are given by:
predictions = test$estimates$pred
A quick way to visualize this is given by:
tfplot(test)
This lets you quickly plot your one-step-ahead predictions against the actual data. To get your filtered estimates, you need the smoother() function from the same dse package. It takes the model together with the data; in the code below these come from test and simulate.kalman.filter respectively. The output contains smoothed estimates for all time points, but note that these are computed after considering the full data set, not sequentially as each observation comes in. The first line of the code below gives the smoothed estimates; the following lines plot the example:
smooth = smoother(test, simulate.kalman.filter)

# plot the one-step-ahead predictions, with y-limits wide enough for all three series
plot(test$estimates$pred,
     ylim = range(test$estimates$pred, smooth$smooth$state,
                  simulate.kalman.filter$output))
points(smooth$smooth$state, col = 3)            # smoothed state estimates
points(simulate.kalman.filter$output, col = 4)  # simulated (actual) data
This plot overlays your actual data, the one-step-ahead model predictions, and the smoothed estimates.
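If you want the three series to be distinguishable at a glance, a legend can be added; this is a small cosmetic addition (not part of the original answer), with colours matching the points() calls above:
legend("topright",
       legend = c("one-step-ahead predictions", "smoothed estimates", "simulated data"),
       col = c(1, 3, 4), pch = 1)   # adjust pch/lty to match how each series is drawn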

How to weight observations in mxnet?

I am new to neural networks and the mxnet package in R. I want to do a logistic regression on my predictors since my observations are probabilities varying between 0 and 1. I'd like to weight my observations by a vector obsWeights I have, but I'm not sure where to implement the weights. There seems to be a weight= option in mx.symbol.FullyConnected but if I try weight=obsWeights I get the following error message
Error in mx.varg.symbol.FullyConnected(list(...)) :
Cannot find argument 'weight', Possible Arguments:
----------------
num_hidden : int, required
Number of hidden nodes of the output.
no_bias : boolean, optional, default=False
Whether to disable bias parameter.
How should I proceed to weight my observations? Here is my code at the moment.
# Prepare data
train.mm = model.matrix(obs ~ . , data = train_data)
train_label = train_data$obs
# Normalize
train.mm = apply(train.mm, 2, function(x) (x-min(x))/(max(x)-min(x)))
# Create MXDataIter compatible iterator
batch_size = 128
train.iter = mx.io.arrayiter(data = t(train.mm), label = train_label,
                             batch.size = batch_size, shuffle = T)
# Symbolic model definition
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data=data, num.hidden=128, name='fc1')
act1 = mx.symbol.Activation(data=fc1, act.type='relu', name='act1')
final = mx.symbol.FullyConnected(data=act1, num.hidden=1, name='final')
logistic = mx.symbol.LogisticRegressionOutput(data=final, name='logistic')
# Run model
mxnet_train = mx.model.FeedForward.create(
  symbol = logistic,
  X = train.iter,
  initializer = mx.init.Xavier(rnd_type = 'gaussian', factor_type = 'avg', magnitude = 2),
  num.round = 25)
Assigning the fully connected layer's weight argument is not what you want to do in any case. That weight refers to the parameters of the layer, i.e. what the inputs are multiplied by to produce the output values; these are the parameter values you're trying to learn.
If you want to make some samples matter more than others, you'll need to adjust the loss function. For example, multiply each sample's loss by its weight, so that down-weighted samples contribute less to the overall average loss.
I do not believe the standard MXNet loss functions have a spot for assigning weights (that is, LogisticRegressionOutput won't cover this). However, you can make your own cost function that does. This would involve passing your final layer through a sigmoid activation to first generate the usual logistic regression output value, and then passing that into the loss function you define. You could use squared error, but for logistic regression you'll probably want the binary cross-entropy loss:
-(l * log(y) + (1 - l) * log(1 - y)),
where l is the label and y is the predicted value (the minus sign makes it a quantity to minimize).
Ideally, you'd write a symbol with an efficient definition of the gradient (MXNet has a cross-entropy function, but it's for softmax input, not a binary output; you could translate your output to two outputs with softmax as an alternative, but that seems less easy to work with in this case), but the easiest path is to let MXNet do its autodiff for you. Then you multiply that cross-entropy loss by the weights.
I haven't tested this code, but you'd ultimately end up with something like this (this is what you'd do in Python; it should be similar in R):
label = mx.sym.Variable('label')
out = mx.sym.Activation(data=final, act_type='sigmoid')
# negative log-likelihood (binary cross entropy), so MakeLoss minimizes the right quantity
ce = -(label * mx.sym.log(out) + (1 - label) * mx.sym.log(1 - out))
weights = mx.sym.Variable('weights')
loss = mx.sym.MakeLoss(weights * ce, normalization='batch')
Then you want to input your weight vector into the weights Variable along with your normal input data and labels.
As an added tip, an mxnet network with a custom loss via MakeLoss outputs the loss, not the prediction. In practice you'll probably want both, in which case it's useful to group the loss with a gradient-blocked version of the prediction so that you can retrieve both. You'd do that like this:
pred_loss = mx.sym.Group([mx.sym.BlockGrad(out), loss])
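For completeness, here is a rough R translation of the same idea. It is untested and assumes the R bindings expose the equivalent symbol operations (mx.symbol.log, mx.symbol.MakeLoss, mx.symbol.BlockGrad, mx.symbol.Group, and arithmetic between symbols); check the package documentation before relying on it:
# Hedged sketch: weighted binary cross entropy in the R mxnet bindings,
# mirroring the Python code above. 'final' is the last FullyConnected layer
# from the model definition in the question.
label   <- mx.symbol.Variable('label')
weights <- mx.symbol.Variable('weights')
out     <- mx.symbol.Activation(data = final, act.type = 'sigmoid')

# negative log-likelihood, multiplied element-wise by the per-sample weights
ce   <- (label * mx.symbol.log(out) + (1 - label) * mx.symbol.log(1 - out)) * (-1)
loss <- mx.symbol.MakeLoss(weights * ce, normalization = 'batch')

# group a gradient-blocked copy of the prediction with the loss so both are retrievable
# (the exact argument form of mx.symbol.Group may differ; see ?mx.symbol.Group)
pred_loss <- mx.symbol.Group(mx.symbol.BlockGrad(out), loss)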

Query about SVM classifier in R

I am working on a music data set where I have to classify the music data in to genres. I have both test and train data sets.
I have linked the datasets for you to check here.
I am working in RStudio.
Here's the code I have written. I am a beginner and have no clue what I am doing; I am shooting arrows randomly. Let me know if you need more information.
The library used is:
library("e1071")
The code:
svm.model <- svm(GENRE ~ ., data = musictraindata, cost = 62.5, gamma = 0.5)
Now my problem is what to put in the x parameter. I have put GENRE from the train data set, but it gives me the following error:
Error in svm.default(x, y, scale = scale, ..., na.action = na.action) :
Need numeric dependent variable for regression.
Someone please guide me on what I should do. Thanks.
After corrections:
I ran the code with the suggested corrections and got an svm.model as follows:
svm.model
Call:
svm(formula = factor(GENRE) ~ ., data = musictraindata, cost = 62.5, gamma = 0.5, type = "C-classification",
tolerance = 0.01)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 62.5
gamma: 0.5
Number of Support Vectors: 11880
Now I try to predict on the test data using this model:
svm.pred <- predict(svm.model, musictestdata)
When I plot svm.pred, I get a distribution of predictions that looks highly unlikely (plot not shown here).
Is this how I am supposed to proceed? Am I doing something wrong?
Let me know.
Tough to say without a reproducible example, but I would confirm that the class of your dependent variable (GENRE) is a factor and doesn't have anything goofy going on like NAs. Check this with class(musictraindata$GENRE). Also worth noting: R is case-sensitive, so "Genre" and "GENRE" are different names.
You can also try specifying the type of SVM you want to run with
type = "C-classification"
and see if that gives you a more helpful error.
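Putting those two suggestions together, a minimal sketch of the corrected workflow (untested, using the data set names from the question) would look like this:
library(e1071)

# make sure the target really is a factor and has no NAs
musictraindata$GENRE <- factor(musictraindata$GENRE)
class(musictraindata$GENRE)        # should print "factor"
sum(is.na(musictraindata$GENRE))   # should be 0

svm.model <- svm(GENRE ~ ., data = musictraindata,
                 type = "C-classification", cost = 62.5, gamma = 0.5)
svm.pred  <- predict(svm.model, musictestdata)
table(svm.pred)                    # distribution of predicted genres on the test set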

Issues plotting count distribution with distplot()

I have count data. I'm trying to document my decision to use a negative binomial distribution rather than Poisson (I couldn't get a quasi-poisson dist. in lme4) and am having graphical issues (the vector is appended to the end of the post).
I've been trying to implement the distplot() function to inform my decision about which distribution to model:
here's the outcome variable (physician count):
plot(d1.2$totalmds)
This might look Poisson,
but the mean and variance aren't close (the variance is doubled by two extreme values, but either way it is nowhere near the mean):
> var(d1.2$totalmds, na.rm = T)
[1] 114240.7
> mean(d1.2$totalmds, na.rm = T)
[1] 89.3121
My outcome is partly population driven, so I'm using the total population as an offset variable in preliminary models. As I understand it, this adds log(poptotal) to the linear predictor, so the rate totalmds/poptotal is essentially what's being modeled.
But when I try to model this using:
# plot 1
distplot(x = d1.2$totalmds, type = "poisson")
# plot 2 (looks way off)
distplot(x = d1.2$totalmds, type = "nbinomial")
# plot 3
plot(fitdist(data = d1.2$totalmds, distr = "pois", method = "mle"))
# plot 4 (throws warnings)
plot(fitdist(data = d1.2$totalmds, distr = "nbinom", method = "mle"))
# plot 5
qqcomp(fitdist(data = d1.2$totalmds, distr = "pois", method = "mle"))
# plot 6 (throws warnings)
qqcomp(fitdist(data = d1.2$totalmds, distr = "nbinom", method = "mle"))
Does anyone have suggestions for why these plots look a little screwy/inconsistent?
As I mentioned I'm using another variable as an offset variable in my actual analysis, if that makes a difference.
Here's the vector:
https://gist.github.com/timothyslau/f95a777b713eb33a2fe6
I'm fairly sure NB is better than Poisson, since var(d1.2$totalmds) / mean(d1.2$totalmds), the variance-to-mean ratio (VMR), is well above 1.
But if NB is appropriate the plots should look a lot cleaner (I think, unless I'm doing something wrong with these plotting functions/packages).
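One way to document the Poisson-vs-negative-binomial decision more formally, using the same fitdistrplus calls as above (a sketch only, not run against the linked vector), is to compare the two fits by AIC/BIC:
library(fitdistrplus)

y <- d1.2$totalmds[!is.na(d1.2$totalmds)]         # drop the NAs noted above
fit.pois <- fitdist(y, distr = "pois",   method = "mle")
fit.nb   <- fitdist(y, distr = "nbinom", method = "mle")

gofstat(list(fit.pois, fit.nb), fitnames = c("Poisson", "NegBinomial"))   # AIC/BIC and GoF stats
cdfcomp(list(fit.pois, fit.nb), legendtext = c("Poisson", "NegBinomial")) # visual comparison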

GWR fitting using packages mgcv and R2BayesX in R

I want to compare GWR fits produced by spgwr and mgcv, but I get an error with the gam function of mgcv. Here is an example:
require(spgwr)
require(mgcv)
require(R2BayesX)

data(columbus)
col.bw <- gwr.sel(crime ~ income + housing, data = columbus, verbose = FALSE,
                  coords = cbind(columbus$x, columbus$y))
col.gauss <- gwr(crime ~ income + housing, data = columbus,
                 coords = cbind(columbus$x, columbus$y),
                 bandwidth = col.bw, hatmatrix = TRUE)

# gwr fitting with intercept
col.gam <- gam(crime ~ s(x, y) + s(x, y) * income + s(x, y) * housing, data = columbus)    # mgcv ERROR
b1 <- bayesx(crime ~ sx(x, y) + sx(x, y) * income + sx(x, y) * housing, data = columbus)   # R2BayesX ERROR
Questions:
1. How do I fit the same GWR using the gam and bayesx functions (i.e. with smooth functions of location)?
2. How do I control the parameters so the fits are as similar as possible, including the optimal bandwidth?
The mgcv error comes from the fact that you are mis-specifying the "interactions" between the spatial smooth and the variables income and housing. Read ?gam.models for details on using by terms. I think for this you need:
col.gam <- gam(crime ~ s(x, y, k = 5) + s(x, y, by = income, k = 5) +
                 s(x, y, by = housing, k = 5), data = columbus)
In this example, as there are only 49 observations, you need to restrict the dimensions of the basis functions, which I do here with k = 5, but you should investigate whether you need to vary these a little, within the constraints of the data.
By the looks of the error from bayesx, you have the same issue of specifying the model incorrectly. I'm not familiar with bayesx(), but it looks like it uses the same s() function as supplied with mgcv, so the model specification should be the same as I show above.
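For the bayesx() side, a hedged sketch of the analogous specification, assuming R2BayesX's sx() accepts a by argument and a bivariate basis in the same spirit (check ?sx before relying on this), would be:
# Untested sketch: the same varying-coefficient structure on the bayesx side.
# bs = "te" (a bivariate tensor-product basis) is an assumption; another 2-D basis
# such as a geospline may be more appropriate -- see ?sx.
b1 <- bayesx(crime ~ sx(x, y, bs = "te") +
               sx(x, y, bs = "te", by = income) +
               sx(x, y, bs = "te", by = housing),
             data = columbus)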
As for 2., can you expand on what you mean here? Comparable between gam() and bayesx(), or getting both (or one) of these comparable with the spgwr() model?
