How can one display multiple overlaid lines? - graph

I'm working in Stata 13.1. Is there any way to plot 3 different predicted_responses (with different colour) for every hour (24,48,72), in the same plot? With my code I receive 3 graphs but I want only one.
My code:
gr7 predicted_response Response Concentration if Hours==24, xlab ylab c(l.) s(iO)
gr7 predicted_response Response Concentration if Hours==48, xlab ylab c(l.) s(iO)
gr7 predicted_response Response Concentration if Hours==72, xlab ylab c(l.) s(iO)

This is a fairly literal translation, with some extra bells and whistles.
line predicted_response Concentration if Hours==24, lc(red) sort ///
|| line predicted_response Concentration if Hours==48, lc(blue) sort ///
|| line predicted_response Concentration if Hours==72, lc(black) sort ///
|| scatter Response Concentration if Hours==24, mc(red) ///
|| scatter Response Concentration if Hours==48, mc(blue) ///
|| scatter Response Concentration if Hours==72, mc(black) ///
legend(order(4 "24 hours" 5 "48 hours" 6 "72 hours")) ///
ytitle(Observed and predicted response)
Note also the separate command and what it allows.

twoway scatter x y if z == 1, c(L) || ///
scatter x y if z == 2, c(L) || ///
scatter x y if z == 3, c(L)
gr7 is not compatible with twoway but you should be able to use scatter or line instead.
Note: gr7 is an outdated Stata command that you shouldn't rely on; learn to use the standard graphing commands; see help graph.

Related

Passing labels to Plots.jl histogram

I am new to Julia and was wondering how to pass labels to the Histogram function in Plots.jl package.
using Plots
gr()
histogram(
data[:sentiment_labels],
title = "Histogram of sentiment labels",
xlabel = "Sentiment",
ylabel = "count",
label = ["Negative" "Positive" "Neutral"],
fillcolor = [:coral,:dodgerblue,:slategray]
)
Only the first label, "Negative", appears in the plot.
So the short answer is: there's only one label in your plot because there's only one data series in your plot. A histogram call like yours plots a single data series, which has one label attached to it. It might seem odd that you get multiple colours but only one legend entry, so I'll break down why that happens, as it's instructive and, I believe, a frequent source of confusion for Plots.jl users:
It is a bit of a coincidence that you are getting three different colours for the bars you are plotting. What happens here is that you are providing a single color argument that is cycled through for the bars in the histogram. You can see this if you provide more colours to your histogram call:
using Plots
sentiment_labels = [fill(-1, 200); fill(0, 700); fill(1, 100)]
histogram(
sentiment_labels,
fillcolor = [:coral, :red, :green, :dodgerblue, :slategray]
)
gives:
What's happening here? We have provided five colours, and it turns out that your histogram only has a bar every five increments (there are bins between -1, 0, and 1, it's just that there are zero observations in those bins). Therefore every fifth bar has the same colour, and with the zero bars disappearing, we only end up with one colour visible in the plot.
Another way of seeing this is having data that's more continuous than your sentiment labels:
cont_data = rand(1_000)
histogram(
cont_data,
fillcolor = [:coral, :red, :green, :dodgerblue, :slategray]
)
gives:
So actually there's only one colour argument passed in here. The crucial difference between colours and labels in your histogram call is that one is a row, the other a column vector:
julia> ["Negative" "Neutral" "Positive"]
1×3 Array{String,2}:
"Negative" "Neutral" "Positive"
julia> [:coral, :slategrey, :dodgerblue]
3-element Array{Symbol,1}:
:coral
:slategrey
:dodgerblue
Plots will interpret the first of these as applying to three different series ("Negative" is the label for the first series, "Neutral" for the second, "Positive" for the third), while it interprets the second as applying to one series only (so :coral, :slategrey, :dodgerblue are all colours for the first series passed in). This is quite a subtle distinction in Plots.jl, which often trips people up (me included!)
To get three labels, you should therefore have three series for which you plot histograms. One way of doing this is to split your vector of sentiment labels into three vectors:
histogram(
# one series per label value, in the order -1 (negative), 0 (neutral), 1 (positive)
[filter(x -> x == y, sentiment_labels) for y ∈ -1:1],
fillcolor = [:coral :slategray :dodgerblue],
label = ["Negative" "Neutral" "Positive"]
)
gives:
Although I would probably argue that in your case a histogram isn't the right tool - if your labels are only ever going to be negative, neutral and positive, a simple bar chart will do, as you don't need the automatic binning functionality that a histogram provides. So I would probably do:
bar(
# positional arguments first: x positions, then one count per label value (-1, 0, 1)
[-1 0 1], [[sum(sentiment_labels .== x)] for x ∈ -1:1],
title = "Count of sentiment labels",
xlabel = "Sentiment",
ylabel = "count",
label = ["Negative" "Neutral" "Positive"],
fillcolor = [:coral :slategray :dodgerblue],
linecolor = [:coral :slategray :dodgerblue],
xticks = -1:1
)
to get:

How to add baseline prevalence in survival/cumulative incidence curve in R

I have approx. 40 years of follow-up data from a cohort, and I want to calculate and plot the cumulative incidence of an event (DM) in five groups (Clustnumber), all in the same plot. I made an initial survival curve using the code
fit = survfit(Surv(Folowup_time, DM_inc) ~ as.factor(Clustnumber), data=Co_followUp)
and then
plot(fit, conf.int=F, xlab = "Time in years", ylab = "Survival probability")
to get the following survival curve. Each line represents one group.
I converted it to a cumulative incidence plot using the code
plot(fit, conf.int=F, fun = function(x) 1-x, xlab = "Time in years", ylab = "Cumulative incidence")
and got the following plot.
Question: the event 'DM' shows the incidence of the event over time, but I also have another column (DM_B) that records its prevalence at baseline (at the follow-up start date), and I want to show that prevalence in the plot. For example, say I don't want my cumulative incidence curve to start from zero; instead I want a line to start from 0.3 to show that 30% of individuals already had the event at baseline when follow-up started. How do I go about getting such graphs? Any help will be appreciated, as I am really struggling with it :(

change axis/scale for time series plot after forecast

I'm struggling with changing the x-axis (time) of my time series forecast plot. I have run many models and hit the same issue with each. Here is the code for the model fit, forecast and plot for one of the models, starting from my original time series. Note: I'm fitting my model on my training data, which runs from 2008 to 2016, and testing it on my test data for the 11 months in 2017.
Data Split.
sal.ts <- window(sal.ts.original, start=c(2008,1), end=c(2016,12))
sal.test <- window(sal.ts.original, start=c(2017,1))
Now, the model.
sal.hw.mul <- HoltWinters(sal.ts, seasonal = "mult")
sal.hw.mul
fc.hwm <- forecast(sal.hw.mul, h=11)
fc.hwm
plot(fc.hwm, xlim=c(2017,2017+11/12), main = "Forecast from Multiplicative HW", xlab = "Year", ylab = "Total Sales, $M")
lines(sal.test,col='red', lwd=2)
# legend colours follow the plot: actual values in red (2), forecast in blue (4)
legend("topleft", c("Actual", "Predicted"), col = c(2,4), lty = 1)
Here's my forecast plot:
See that ugly 2017.0, 2017.2.... 2017.8? I want it to instead say 1,2,3,....11 for the 11 months of 2017.
Yes, I only want to plot my test data and forecast on it and not the whole series.
I am pretty sure my problem is around my use of the xlim argument. I am using xlim to plot just the months of 2017; if I don't use it, R plots the whole series from 2008-2017. I tried to play around with the axis function a lot after setting xaxt="n" in the plot command, but still couldn't figure it out.
Let me know if you need more information from me. Any help will be appreciated.
Update: on someone's suggestion, I tried to draw a custom axis after setting xaxt = 'n' in my plot. Here's the changed code.
x <- seq(1,11,1)
fc.hwm <- forecast(sal.hw.mul, h=11)
fc.hwm
layout(1:1)
plot(fc.hwm, xaxt='n', xlim=c(2017,2017+11/12), main = "Forecast from Multiplicative HW", xlab = "Year", ylab = "Total Sales, $M")
axis(side=1, at= x, labels=c("1","2","3","4","5","6","7","8","9","10","11"))
lines(sal.test,col='red', lwd=2)
# legend colours follow the plot: actual values in red (2), forecast in blue (4)
legend("topleft", c("Actual", "Predicted"), col = c(2,4), lty = 1)
As you can see, it gets me halfway there. I can remove the current axis labels, but I am not able to draw a new axis. The new code doesn't even give me an error, or else I would have tried to debug it. R accepts my code but doesn't give me the desired output.
Here's an idea. I'm not sure what the data look like, but I'm guessing that you have a Date type for the date variable -- and that means that your "by" sequence of integer 1 to 11 might be placing those new labels outside the plot limits. Try using a Date sequence instead.
Change this:
x <- seq(1,11,1)
To something like this:
x <- seq.Date(as.Date("2017-01-01"), as.Date("2017-11-01"), "months")
I'm not sure how far into November your data go, so you might want to set that "to" Date in the sequence to December instead, so you can fully cover your November data points.
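A hedged alternative, in case the series is a ts object and not Date-indexed: the window() calls in the question suggest a ts, whose time axis is measured in decimal years, so the tick positions would have to be in those units as well. The sketch below uses base R only and a made-up monthly series (sal.demo and sal.demo.test are hypothetical stand-ins, not the asker's data):

```r
# Hypothetical monthly series from 2008-01 to 2017-11 (119 observations),
# standing in for sal.ts.original; the values are random, not real sales.
set.seed(1)
sal.demo <- ts(100 + cumsum(rnorm(119)), start = c(2008, 1), frequency = 12)
sal.demo.test <- window(sal.demo, start = c(2017, 1))

# A ts stores time as decimal years: month m of 2017 sits at 2017 + (m - 1)/12.
tick.at <- as.numeric(time(sal.demo.test))
# cycle() returns the position within the year, i.e. the month number.
tick.labels <- as.integer(cycle(sal.demo.test))

# Suppress the default axis, then draw ticks at the decimal-year positions.
plot(sal.demo.test, xaxt = "n", xlab = "Month of 2017", ylab = "Value")
axis(side = 1, at = tick.at, labels = tick.labels)
```

Under that assumption, the same at/labels pair could be passed to axis() after the forecast plot in the question, since the tick positions then fall inside xlim=c(2017,2017+11/12).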

Understanding the Local Polynomial Regression

Could someone explain to me why I get different lines when I plot? I thought the two lines should be the same.
library(sm) # assumption: the aircraft data set comes from the sm package
data(aircraft)
help(aircraft)
attach(aircraft)
lgWeight <- log(Weight)
library(KernSmooth)
# a) Fit a nonparametric regression to data (xi,yi) and save the estimated values m̂(xi).
# Regression of degree 2 polynomial of lgWeight against Yr
op <- par(mfrow=c(2,1))
lpr1 <- locpoly(Yr,lgWeight, bandwidth=7, degree = 2, gridsize = length(Yr))
plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year")
lines(lpr1,lwd=2, col="blue")
lines(lpr1$y, col="black")
How can I get the values from the model? If I print the model, it gives me the values in $x and $y, but when I plot them, the result is not the same as the blue line. I need the values of the fitted model (blue) for every x. Could someone help me?
The fitted model (blue curve) is correctly in lpr1. As you said, the correct y-values are in lpr1$y and the correct x-values are in lpr1$x.
The reason the second plot looks like a straight line is because you are only giving the plot function one variable, lpr1$y. Since you don't specify the x-coordinates, R will automatically plot them along an index, from 1 to the length of the y variable.
The following are explicit equivalents of your two lines() calls, the first drawing the curve and the second the near-straight line:
lines(x = lpr1$x, y = lpr1$y,lwd=2, col="blue") # plots curve
lines(x = 1:length(lpr1$y), y = lpr1$y, col="black") # plot line
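On getting fitted values "for every x": locpoly() evaluates the fit on an equally spaced grid over the range of x, not at the observed x values themselves. One possible approach, sketched here under assumptions, is to interpolate the grid estimate back onto the observations with approx(). The data below are simulated and the names (x.obs, y.obs, fit, fitted.at.x) are illustrative, not from the question:

```r
library(KernSmooth)

# Simulated data standing in for Yr and lgWeight (not the aircraft data).
set.seed(42)
x.obs <- sort(runif(200, 0, 10))
y.obs <- sin(x.obs) + rnorm(200, sd = 0.3)

# Local quadratic fit, evaluated on an equally spaced grid of 401 points.
fit <- locpoly(x.obs, y.obs, bandwidth = 0.5, degree = 2, gridsize = 401)

# Linear interpolation of the grid estimate at each observed x;
# rule = 2 guards against NA at the two endpoints of the range.
fitted.at.x <- approx(fit$x, fit$y, xout = x.obs, rule = 2)$y

plot(x.obs, y.obs, col = "grey")
lines(fit, lwd = 2, col = "blue")               # curve on the grid
points(x.obs, fitted.at.x, pch = 20, cex = 0.4)  # values at the observations
```

A finer grid makes the interpolation error smaller; setting gridsize = length(Yr), as in the question, does not by itself make the grid coincide with the observed years.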

R X-axis Date Labels using plot()

Using the plot() function in R, I'm trying to produce a scatterplot of points of the form (SaleDate,SalePrice) = (saldt,salpr) from a time-series, cross-section real estate sales dataset in dataframe format. My problem concerns labels for the X-axis. Just about any series of annual labels would be adequate, e.g. 1999,2000,...,2013 or 1999-01-01,...,2013-01-01. What I'm getting now, a single label, 2000, at what appears to be the proper location, won't work.
The following is my call to plot():
plot(r12rgr0$saldt, r12rgr0$salpr/1000, type="p", pch=20, col="blue", cex.axis=.75,
xlim=c(as.Date("1999-01-01"),as.Date("2014-01-01")),
ylim=c(100,650),
main="Heritage Square Sales Prices $000s 1990-2014",xlab="Sale Date",ylab="$000s")
The xlim and ylim are called out to bound the date and price ranges of the data to be plotted; note prices are plotted as $000s. r12rgr0$saldt really is a date; str(r12rgr0$saldt) returns:
Date[1:4190], format: "1999-10-26" "2013-07-06" "2003-08-25" NA NA "2000-05-24" xx
I have reviewed several threads here concerning similar questions, and see that the solution probably lies with turning off the default X-axis behavior and using axis.Date, but i) at my current level of R skill, I'm not sure I'd be able to solve the problem, and ii) I wonder why the plotting defaults are producing these rather puzzling (to me, at least) results?
Addl Observations: The Y-axis labels are just fine 100, 200,..., 600. The general appearance of the scatterplot indicates the called-for date ranges are being observed and the relative positions of the plotted points are correct. Replacing xlim=... as above with xlim=c("1999-01-01","2014-01-01")
or
xlim=c(as.numeric(as.character("1999-01-01")),as.numeric(as.character("2014-01-01")))
or
xlim=c(as.POSIXct("1999-01-01", format="%Y-%m-%d"),as.POSIXct("2014-01-01", format="%Y-%m-%d"))
all result in error messages.
With plots it's very hard to reproduce results without sample data. Here's a sample I'll use
dd<-data.frame(
saldt=seq(as.Date("1999-01-01"), as.Date("2014-01-10"), by="6 mon"),
salpr = cumsum(rnorm(31))
)
A simple plot with
with(dd, plot(saldt, salpr))
produces a few year marks
If I wanted more control, I could use axis.Date as you alluded to
with(dd, plot(saldt, salpr, xaxt="n"))
axis.Date(1, at=seq(min(dd$saldt), max(dd$saldt), by="30 mon"), format="%m-%Y")
which gives
Note that xlim will only zoom in on part of the plot. It is not directly connected to the axis labels, but the axis labels will adjust to provide a "pretty" range to cover the data that is plotted. Doing just
xlim=c(as.Date("1999-01-01"),as.Date("2014-01-01"))
is the correct way to zoom the plot. No need for conversion to numeric or POSIXct.
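For the annual labels asked about in the question, the two pieces above can be combined. This is only a sketch on invented data: the sales data frame and year.ticks are hypothetical, with the column names borrowed from the question.

```r
# Hypothetical sales data; saldt and salpr mimic the question's column names,
# but the dates and prices are random.
set.seed(1)
sales <- data.frame(
  saldt = as.Date("1999-01-01") + sort(sample(0:5400, 200)),
  salpr = runif(200, 100000, 650000)
)

# One tick per year, 1999-01-01 through 2014-01-01.
year.ticks <- seq(as.Date("1999-01-01"), as.Date("2014-01-01"), by = "year")

# Suppress the default x-axis, zoom with Date-valued xlim, then label by year.
plot(sales$saldt, sales$salpr / 1000, type = "p", pch = 20, col = "blue",
     xaxt = "n", xlim = range(year.ticks), ylim = c(100, 650),
     xlab = "Sale Date", ylab = "$000s")
axis.Date(1, at = year.ticks, format = "%Y", cex.axis = 0.75)
```

If the labels overlap at the chosen device width, R drops some of them; passing las = 2 to axis.Date, or using a coarser step such as by = "2 years", are two ways around that.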
If you are running a plot in real time and don't mind some warnings, you can just pass, e.g., format = "%Y-%m-%d" in the plot function. For instance:
plot(seq((Sys.Date()-9),Sys.Date(), 1), runif(10), xlab = "Date", ylab = "Random")
yields:
while:
plot(seq((Sys.Date()-9), Sys.Date(), 1), runif(10), format = "%Y-%m-%d", xlab = "Date", ylab = "Random")
yields:
with lots of warnings about format not being a graphical parameter.
