In R and the ggplot2 package, how do I add lines?

I am doing a survival analysis and have produced the survival graph using
the plot function to plot the Kaplan-Meier estimate (variable KA) as the y value against time, and
the lines function to draw the step lines between estimates i and i+1, for each i = 1, 2, ...
The code is as follows:
plot(KA)
for (i in 1:(length(KA) - 1)) {
  lines(c(i, i + 1), c(KA[i], KA[i]))          # the horizontal step lines
  lines(c(i + 1, i + 1), c(KA[i], KA[i + 1]))  # the vertical step lines
}
Now I want to make a nicer-looking survival graph with the ggplot2 package.
The question is: how do I add the step lines to the graph?
Sorry, I can't post images because my reputation is below 10.

Have a look at geom_step, geom_path, or geom_segment.
They should help with what you are trying to achieve.
http://had.co.nz/ggplot2/geom_step.html
http://had.co.nz/ggplot2/geom_path.html
http://had.co.nz/ggplot2/geom_segment.html
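As a minimal sketch of the geom_step route, assume KA is a plain numeric vector of survival estimates indexed by time 1, 2, ..., as in the base-graphics loop in the question. The values below are made up for illustration only.

```r
library(ggplot2)

# Hypothetical Kaplan-Meier estimates at times 1, 2, ..., standing in for KA
KA <- c(1.00, 0.95, 0.88, 0.80, 0.71, 0.60, 0.52)
km <- data.frame(time = seq_along(KA), surv = KA)

# direction = "hv" (the default) draws a horizontal segment at KA[i] from
# time i to i + 1 and then a vertical drop to KA[i + 1], which is what the
# two lines() calls in the loop did by hand
p <- ggplot(km, aes(x = time, y = surv)) +
  geom_step(direction = "hv") +
  geom_point() +
  scale_y_continuous(limits = c(0, 1)) +
  labs(x = "Time", y = "Survival probability")

print(p)
```

Because geom_step builds the stair shape itself, the data frame only needs one row per estimate. There is no need to construct the horizontal and vertical segments explicitly as you would with geom_segment.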

Related

How to plot multiple curves on the same plot in Julia?

I am using Julia to build a linear regression model from scratch. After doing all my mathematical calculations, I need to plot a linear regression graph.
I have a scatter plot and a linear fit (straight line) plot ready separately. How do I combine them, i.e. draw my linear fit on the scatter plot?
Basically, how do I draw multiple plots on a single plot in Julia?
Note: I don't know Python or R.
x = [1,2,3,4,5]
y = [2,3,4,5,6]
plot1 = scatter(x,y)
plot2 = plot(x,y) #line plot
#plot3 = plot1+plot2 (how?)
Julia doesn't come with a built-in plotting package, so you need to choose one. Popular plotting packages are Plots, Gadfly, PyPlot, GR, PlotlyJS and others. You need to install them first, and with Plots you'll also need to install a "backend" package (e.g. one of the last three mentioned above).
With Plots, e.g., you'd do
using Plots; gr() # if GR is the plotting "backend" you've chosen
scatter(point_xs, point_ys) # the points
plot!(line_xs, line_ys) # the line
The key here is the plot! command (as opposed to plot), which modifies an existing plot rather than creating a new one.
More simply you can do
scatter(x,y, smooth = true) # fits the trendline automatically
See also http://docs.juliaplots.org/latest/
(disclaimer: I'm associated with Plots - others may give you different advice)

How to smooth non-linear regression curve in R

I've been asked to obtain an estimate theta for the variable Length in the muscle data set from the MASS package. The code I used is shown below. Somehow, I don't end up with a smooth curve but with a very "blocky" one, with some extra lines between points on the curve. Can anyone help me get a smooth curve?
utils::data(muscle,package = "MASS")
Length.fit<-nls(Length~t1+t2*exp(-Conc/t3),muscle,
start=list(t1=3,t2=-3,t3=1))
plot(Length~Conc,data=muscle)
lines(muscle$Conc, predict(Length.fit))
Edit: as a follow-up question:
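To predict the curves more accurately, I use nonlinear regression to fit a curve for each of the 21 species. This gives me a vector
theta = (T11, T12, ..., T21, T22, ..., T3), i.e. the first parameter for each of the 21 species, then the second parameter for each species, then a shared third parameter (43 values, matching the indexing in the loop below).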
I can write a for loop that plots all of the curves, but as before, they come out blocky. Because I have to plot these curves as follows:
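for (i in 1:21) {
  # species-specific t1 and t2, shared t3; color holds one color per species
  lines(muscle$Conc, theta[i] + theta[i + 21] * exp(-muscle$Conc / theta[43]),
        col = color[i])
  # no i = i + 1 needed: for() advances i on its own
}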
I don't see how I can use the same trick to smooth out these curves, since muscle$Conc still only has 4 values.
Edit 2:
I figured it out, and changed it to the following:
lines(seq(0,4,0.1),theta[i]+theta[i+21]*exp(-seq(0,4,0.1)/theta[43]), col=color[i])
If you look at the output of cbind(muscle$Conc, predict(Length.fit)), you'll see that many points are repeated and that they're not sorted by Conc. lines just connects the points in the order given, which produces multiple back-and-forth lines. The code below runs predict on a unique, ordered grid of Conc values; because the grid is fine (steps of 0.1), the curve also looks smooth rather than blocky.
plot(Length ~ Conc,data=muscle)
lines(seq(0,4,0.1),
predict(Length.fit, newdata=data.frame(Conc=seq(0,4,0.1))))
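Another option uses base R's curve(). This sketch assumes a single fitted curve from the question's model is wanted. curve() evaluates an expression in x on an ordered grid of n points between from and to, so the line comes out smooth without building newdata by hand. The names t1, t2 and t3 come from the start list in the question.

```r
utils::data(muscle, package = "MASS")

# Same model and starting values as in the question
Length.fit <- nls(Length ~ t1 + t2 * exp(-Conc / t3), data = muscle,
                  start = list(t1 = 3, t2 = -3, t3 = 1))
cf <- coef(Length.fit)

plot(Length ~ Conc, data = muscle)
# curve() evaluates the fitted formula on 200 ordered x values and adds the line
curve(cf[["t1"]] + cf[["t2"]] * exp(-x / cf[["t3"]]),
      from = min(muscle$Conc), to = max(muscle$Conc), n = 200, add = TRUE)
```

The same pattern also fits the per-species loop from the follow-up. Put the corresponding theta entries into the expression inside the loop, and curve() supplies the fine grid each time.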

How to hide the underlying data in a ggplot scatterplot

I am trying to make a scatter plot in ggplot and would like to lose the points and show only the smooth line and the confidence interval. I checked the geom_point() function and there is no option for turning it off so that the points/underlying data are hidden. Any suggestions? Much appreciated.
Joseph
To plot a smooth line with a confidence interval around it, use geom_smooth() and leave out geom_point(). By default it smooths with loess when there are fewer than 1000 observations and with gam otherwise, but you can change the smoothing method with the method= argument.
ggplot(mtcars,aes(wt,mpg))+geom_smooth()
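For example, to get a least-squares straight line instead of the default smoother, pass method = "lm". Here is a sketch using the same mtcars example. se = TRUE is already the default and shades the confidence band. Passing formula explicitly only avoids ggplot2's message about which formula it chose.

```r
library(ggplot2)

# Only the smoother layer is added, so the raw points are never drawn;
# method = "lm" fits y ~ x by least squares and se = TRUE shades the band
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

print(p)
```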

Plot a least-squares line in a scatter plot

I have created a scatter plot with pandas, but I don't know how to add the least-squares regression line for the points.
Looking through the examples at http://matplotlib.org, I haven't found any similar graph.
Thanks a lot in advance!
Pandas has an ordinary least squares (ols) function, and the 0.10.1 docs have a very detailed example of how to plot the result. Here's a snippet:
model = ols(y=rets['AAPL'], x=rets.ix[:, ['GOOG']], window=250)
# just plot the coefficient for GOOG
model.beta['GOOG'].plot()
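Note: this example is no longer in the docs after 0.10.1; I'm not sure why. The ols function and the .ix indexer have since been removed from pandas, so this snippet only runs on old versions.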

Problem with axis limits when plotting curve over histogram [duplicate]

This question already has an answer here:
How To Avoid Density Curve Getting Cut Off In Plot
Newbie here. I have a script that creates graphs, with a part that goes something like this:
png("Test.png")
ht=hist(step[i],20)
curve(insert_function_here,add=TRUE)
I essentially want to plot a distribution curve over a histogram. My problem is that the axis limits are apparently set by the histogram rather than the curve, so the curve sometimes goes beyond the y-axis limits. I have played with par("usr"), to no avail. Is there any way to set the axis limits based on the maximum value of either the histogram or the curve (or, alternatively, of the curve only)? In case this changes anything, this needs to be done within a for loop where multiple such graphs are plotted, within a series of subplots (par("mfrow")).
Inspired by other answers, this is what I ended up doing:
curve(insert_function_here)
boundsc=par("usr")
ht=hist(A[,1],20,plot=FALSE)
par(usr=c(boundsc[1:2],0,max(boundsc[4],max(ht$counts))))
plot(ht,add=TRUE)
It fixes the bounds based on the highest of either the curve or the histogram.
You could compute mx <- max(curve_vector, ht$counts) and set ylim = c(0, mx), but I rather doubt your code actually looks like that, since [] is not a proper parameter-passing idiom and step is not an R plotting function but a model selection function. So I am guessing this is code in Matlab or some other language. In R, try this:
set.seed(123)
png("Test.png")
# offset the breaks so integer counts don't fall exactly on bin boundaries
ht <- hist(rpois(20, 1), plot = FALSE, breaks = 0:10 - 0.1)
x <- round(ht$breaks)                                      # integers 0..10
lambda <- sum(ht$counts * x[-length(x)]) / sum(ht$counts)  # sample mean
dens <- dpois(x, lambda)                                   # Poisson density
plot(x, dens, type = "l", ylim = c(0, max(ht$density, dens)))  # room for both
plot(ht, freq = FALSE, add = TRUE)                         # plot the histogram
dev.off()
You could plot the curve first, then compute the histogram with plot=FALSE, and use the plot function on the histogram object with add=TRUE to add it to the plot.
Even better would be to calculate the highest y-value of the curve (there may be shortcuts for this depending on the nature of the curve) and the highest bar in the histogram, and pass the larger of the two to the ylim argument when plotting the histogram.
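Here is a minimal sketch of that idea. It assumes the curve is a density drawn on the same scale as the histogram (freq = FALSE). Normal samples and dnorm stand in for the question's data and insert_function_here, and the two-panel loop only mimics the par(mfrow) setup. All of these are hypothetical placeholders.

```r
set.seed(1)
samples <- list(rnorm(200), rnorm(200, sd = 0.5))  # hypothetical data per panel

op <- par(mfrow = c(1, length(samples)))
for (s in samples) {
  ht <- hist(s, breaks = 20, plot = FALSE)          # compute bins, don't draw
  xs <- seq(min(ht$breaks), max(ht$breaks), length.out = 200)
  ys <- dnorm(xs, mean = mean(s), sd = sd(s))       # the curve to overlay
  ymax <- max(ht$density, ys)                       # tallest bar or curve point
  plot(ht, freq = FALSE, ylim = c(0, ymax), main = "")
  lines(xs, ys)                                     # curve now fits in the axes
}
par(op)
```

Computing the histogram with plot = FALSE first means both heights are known before any axes are drawn, so there is no need to adjust par("usr") afterwards.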
