Hi guys, so my problem is possibly either a stats or a programming issue. I have two xts time series covering mostly overlapping time periods, and I'm simply plotting a regression of their log differences:
logdiff <- merge.xts(diff(log(ts1)),diff(log(ts2)))
plot(logdiff[,1],logdiff[,2])
abline(lm(logdiff[,1]~logdiff[,2]),col=2)
which gives me this plot
So just on an intuitive level, I would rather the regression line fit the wider range of data points, even if the result it's giving me is technically the correct one on a least-squares basis. Is there any built-in capability to do this "broader regression", or do I have to resort to manual fudging?
I think you are plotting y as a function of x, but regressing x as a function of y.
Try abline(lm(logdiff[,2]~logdiff[,1]),col=2) -- and yes, using column names instead of indices is a good idea.
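A minimal sketch of that fix on simulated series standing in for ts1 and ts2 (the column names x and y are just placeholders):
library(xts)
set.seed(1)
idx <- as.Date("2020-01-01") + 0:249
ts1 <- xts(100*exp(cumsum(rnorm(250, 0, 0.01))), idx)
ts2 <- xts(100*exp(cumsum(rnorm(250, 0, 0.01))), idx)
logdiff <- na.omit(merge.xts(diff(log(ts1)), diff(log(ts2))))
colnames(logdiff) <- c("x","y")
df <- data.frame(coredata(logdiff))    #plain data frame with columns x and y
plot(df$x, df$y)                       #y on the vertical axis, x on the horizontal
abline(lm(y ~ x, data = df), col = 2)  #regress y on x so the line matches the axes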
I'm trying to plot some measurements over time with a best-fit curve, using Julia's plotting tools. There's already a fairly good answer how to do this with a simple numeric data series:
Julia: How to find best fit curve / equation when I have a plot?
However, these tools (well - at least the Polynomials package) do not like being given Date values. I imagine some sort of conversion from Dates into a numeric value needs to be done, but I want the resulting plots to retain their scales in date units, which would need a conversion back again.
I suspect this is a fairly common problem and there must be an elegant way to solve it, but being new to Julia I would appreciate some pointers to what a good solution looks like.
Thanks!
I'm completely new to R, so apologies for asking something I'm sure must be basic. I just wonder if I can use the nls() command in R to fit a non-linear curve to a data structure where I have means and sd's, but not the actual replicates. I understand how to fit a curve to single data points or to replicates, but I can't see how to proceed when I have a mean+sd for each data point and I want R to consider variation in my data when fitting.
One possible way to go would be to simulate data using your means and standard deviations and run the regression on the simulated data. Doing this a number of times gives you a good impression of the range of plausible values for your regression coefficients.
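A minimal sketch of that idea; the data frame dat, its values, and the Michaelis-Menten model are all hypothetical placeholders for your own means, sds and curve:
set.seed(42)
dat <- data.frame(x      = 1:8,
                  mean_y = c(1.75, 2.80, 3.50, 4.00, 4.38, 4.67, 4.90, 5.09),
                  sd_y   = 0.2)
n_sim <- 200
coefs <- t(replicate(n_sim, {
  sim <- dat
  sim$y <- rnorm(nrow(sim), mean = sim$mean_y, sd = sim$sd_y)  #one simulated replicate per point
  fit <- nls(y ~ SSmicmen(x, Vm, K), data = sim)               #self-starting Michaelis-Menten fit
  coef(fit)
}))
#spread of plausible coefficient values across the simulations
apply(coefs, 2, quantile, probs = c(0.025, 0.5, 0.975))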
First, I gathered from this link Applying a function to multiple columns that the "function" function would perhaps do what I'm looking for. However, I have not been able to make the leap from the approach presented there to making it actually work in my situation (or really even knowing where to start). I'm a beginner in R, so I apologize in advance if this is a really basic question.

My data is a data frame consisting of an event variable (tumor recurrence) and a time variable (follow-up time / time to recurrence), as well as recurrence risk factors (t-stage, tumor size, age at diagnosis, etc.). Some risk factors are categorical and some are continuous. I have been running my univariate analyses by hand, one at a time, like this: univariateageatdx <- coxph(survobj ~ agedx), and then collecting the results. This gets very tedious with multiple factors, and I'm doing it for a few different recurrence types.

I figured there must be a way to write essentially one line of code containing the coxph call, apply it to each of my variables of interest, and get back the univariate results for each factor. I tried using cbind to bind variables (i.e. x <- cbind("agedx","tumor size")) and then running coxph(recurrencesurvobj ~ x), but this of course just ran a multivariate analysis on those variables and didn't split them out as true univariate analyses.
I also tried the following code, based on a similar problem I found on a different site, but it initially gave an error and I don't know quite what to make of it (see the edit below). Is this on the right track?
f <- as.formula(paste('regionalsurvobj ~', paste(colnames(nodcistradmasvssubcutmasR)[6:9], collapse='+')))
I then ran it as coxph(f), which gave me the results of a multivariate Cox analysis.
Thanks!
Edit: I just fixed the error; I needed to index by column numbers rather than names. The change is reflected in the code above. However, it still runs the selected variables as a multivariate analysis and not as true univariate analyses...
If you want to go the formula route (which in your case, with multiple outcomes and multiple variables, is probably the most practical approach), you need to create one formula per model you want to fit. I've split the steps here a bit (making formulas, making models, and extracting results); they can of course be combined, but this way you can inspect all your models.
#example using the transplant data from the survival package
library(survival)
#make a new event variable (death or no death) to have a dichotomous outcome
transplant$death <- transplant$event=="death"
#making formulas
univ_formulas <- sapply(c("age","sex","abo"),
                        function(x) as.formula(paste('Surv(futime,death)~',x)))
#making a list of models
univ_models <- lapply(univ_formulas, function(x){coxph(x,data=transplant)})
#extract data (here I've gone for HR and confint)
univ_results <- lapply(univ_models,function(x){return(exp(cbind(coef(x),confint(x))))})
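If you then want everything in one table, the per-variable matrices can simply be stacked:
#stack the per-variable results into one table
univ_table <- do.call(rbind, univ_results)
colnames(univ_table) <- c("HR","lower .95","upper .95")
univ_table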
I'm trying to plot curves for a data set with a large number of different groups. I want to visualize the curves all together on one graph, fit to a common model (stat_smooth using a glm with quasipoisson errors), so I'm using colour to group them. However, for some curves the fitting function fails and I get
Error: no valid set of coefficients has been found: please supply starting values
And then there is no plot.
Is there a way to have the plot come up without the curves for those "bad" groups? I ask because there are a huge number of groups, and while I could write an error-check script and kick them out of the data, it would be nicer if everything except the groups that error out were plotted.
I don't think there's a very easy way to do this, but here's what I would try:
Write a loop or a dlply statement to run the model you have in mind on each group, wrapped in try(), e.g.
trymodelList <- dlply(mydata, .(grp1,grp2), function(d)
  try(glm(y~x, family=quasipoisson, data=d), silent=TRUE))
(dlply hands each group's chunk of the data to the function, here as d, so it can be passed as the data argument.)
Figure out which ones were bad: something like sapply(trymodelList, inherits, what="try-error"), which gives a logical vector.
Use this logical vector to subset out the groups you don't want, then pass the subsetted data to geom_smooth instead of the full data set.
I know there are a few details left out ...
edit: I see that I've essentially written down your "write an error-check script ... then kick them out of the data" strategy. Sorry, I don't think there's an easier way to do this. You might try the ggplot users' list ...
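To fill in a few of those details, here is a minimal sketch assuming a data frame mydata with hypothetical columns y, x and a single grouping column grp:
library(plyr)
library(ggplot2)
#1. fit the model per group, wrapped in try() so failures don't stop the loop
fits <- dlply(mydata, .(grp), function(d)
  try(glm(y ~ x, family = quasipoisson, data = d), silent = TRUE))
#2. flag the groups whose fit failed
bad_groups <- names(fits)[sapply(fits, inherits, what = "try-error")]
#3. drop the bad groups and plot only the rest
ok_data <- subset(mydata, !(grp %in% bad_groups))
ggplot(ok_data, aes(x, y, colour = grp)) +
  geom_point() +
  stat_smooth(method = "glm", method.args = list(family = quasipoisson), se = FALSE)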
I have data points that represent a logarithmic function.
Is there an approach where I can just estimate the function that describes this data using R?
Thanks.
I assume you mean that you have vectors y and x and are trying to fit a function y(x) = A*log(x).
First of all, fitting the log directly is a bad idea because it doesn't behave well numerically. Luckily we have the inverse relation x(y) = exp(y/A), so we can fit an exponential function instead, which is much more convenient. We can do it using nonlinear least squares:
nls(x~exp(y/A),start=list(A=1.),algorithm="port")
where start is an initial guess for A. This approach is a numerical optimization, so it may fail.
The more stable way is to transform it to a linear relation, log(x(y)) = y/A, and fit a straight line using lm:
lm(log(x)~y)
Here the coefficient on y estimates 1/A; lm also fits an intercept, which you can drop with log(x) ~ y - 1 if you want the y/A form exactly.
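A minimal worked sketch of both routes, on simulated data where the true A is 2:
set.seed(1)
x <- seq(1, 100, length.out = 50)
y <- 2*log(x) + rnorm(length(x), sd = 0.1)
#linearised fit: the slope of log(x) ~ y - 1 estimates 1/A
fit_lm <- lm(log(x) ~ y - 1)
A_lm <- 1/coef(fit_lm)[["y"]]
#use that as the starting value for the nonlinear fit of x = exp(y/A)
fit_nls <- nls(x ~ exp(y/A), start = list(A = A_lm), algorithm = "port")
c(A_lm = A_lm, A_nls = coef(fit_nls)[["A"]])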
If I understand correctly, you want to estimate a function given some (x, y) values of it. If so, check the following links.
Read about this:
http://en.wikipedia.org/wiki/Spline_%28mathematics%29
http://en.wikipedia.org/wiki/Polynomial_interpolation
http://en.wikipedia.org/wiki/Newton_polynomial
http://en.wikipedia.org/wiki/Lagrange_polynomial
From a quick search:
http://www.stat.wisc.edu/~xie/smooth_spline_tutorial.html
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/smooth.spline.html
http://www.image.ucar.edu/GSP/Software/Fields/Help/splint.html
I have never used R, so I am not sure whether those work, but if you have Matlab I can explain more.
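For what it's worth, here is a minimal R sketch of the smooth.spline route from the links above, on simulated data:
set.seed(1)
x <- seq(1, 100, length.out = 60)
y <- 2*log(x) + rnorm(length(x), sd = 0.2)
fit <- smooth.spline(x, y)                                    #fit a smoothing spline to the points
plot(x, y)
lines(predict(fit, seq(1, 100, length.out = 200)), col = 2)   #overlay the estimated curve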