R Plot of Two Detrended Series Shows Line Chart Rather Than Scatterplot

I have two data series that cover exactly the same time frame and have exactly the same number of points. I have detrended both so that their comovement can be analyzed. When I plot them against each other, R draws a connected line chart labelled with dates instead of a scatterplot.
[Image: plot]
This is what the series look like in the environment:
[Image: environment variables]
This is what the data looks like:
[Image: data screenshot]
I would like a scatterplot of one variable against the other: just points, with no lines or dates in the plot.

So I sort of figured it out, but it's a hack and I don't recommend that anyone else do it this way.
Essentially, I bound the two series together with:
testvar <- cbind(dewagerealM, dewagerealF)
I was then able to select the rows of the left and right columns and plot them against each other like so:
plot(testvar[1:23,1], testvar[1:23,2])
This works because indexing the rows drops the time-series attributes, so plot() sees two plain numeric vectors. It's not pretty and definitely not how it should be done, but it got the job done.

The easiest way to do this is to set the options xy.lines and xy.labels to FALSE:
plot(dewagerealM, dewagerealF, type = 'p', xy.lines = FALSE, xy.labels = FALSE)
Since you are plotting time series (ts) objects, help("plot.ts") gives more details on the options available for plotting them.
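
For anyone who wants to try this without the original data, here is a minimal sketch using two made-up annual series. The names m and f, the start year and the random values are assumptions, not the asker's data. It shows the plot.ts options next to a second route: stripping the ts attributes with as.numeric() so that the default scatterplot method is used.

```r
set.seed(1)
# Two made-up annual series of 23 points each (stand-ins for the detrended data)
m <- ts(rnorm(23), start = 1995)
f <- ts(rnorm(23), start = 1995)

# Option 1: keep the ts objects and switch off the connecting lines and time labels
plot(m, f, xy.lines = FALSE, xy.labels = FALSE)

# Option 2: drop the ts attributes so plot() sees plain numeric vectors
m_num <- as.numeric(m)
f_num <- as.numeric(f)
plot(m_num, f_num, xlab = "m", ylab = "f")
```

Both calls should give a plain point cloud; the second one is handy when the vectors are passed on to functions that do not know about ts objects.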

Related

dygraphs doesn't show the line for more than 10,000 datapoints

I am trying to plot a graph using the dygraph function for a dataset with more than 100,000 datapoints. As soon as I try, the graph appears empty. I shortened the dataset and found that dygraph only draws the line for fewer than 10,000 entries. Here is a sample with 9,999 datapoints:
dygraph(ts(1:9999))
[Image: up to 9999 datapoints]
As soon as I change it to 10,000, it doesn't show anything:
dygraph(ts(1:10000))
[Image: 10000 datapoints]
After some research I came to the conclusion that this is a bug. Nevertheless, I found a workaround: if you convert your data to a time series using the timeSeries function (from the timeSeries package), it starts working.
For example:
library(dygraphs)
library(timeSeries)  # provides timeSeries()
y = timeSeries(1:1000000, 1:1000000)
dygraph(y)

geom_bspline across multiple plots combined into a single figure

I would like to create a ggplot2 layer that includes multiple geom_bspline() calls, or something similar, to point to regions on different plots after combining them into a single figure. A feature in the data seen in one plot appears in another plot after a transformation. However, it may not be clear to a non-expert that both are due to the same phenomenon. The plots are to be combined into a single figure using ggarrange(), cowplot, patchwork or something similar.
I can get by using ggforce::geom_ellipse() on each plot, but it's not as clean. Any suggestions?
Of course, after asking the question and staring at the figure, it came to me that I simply need to add a geom_bspline() layer to the combined figure. I had tried that earlier but hadn't given enough thought to the coordinates on the new layer: the coordinates of the spline are given in the range 0 to 1 for both x and y. Simple and obvious.

How to get the actual data from the function hist

I am very new to R, so I apologize if this is a basic question.
Is there any way to get the data behind the graph that the function "hist" produces?
I don't need the graphic, I just need the data.
In general, it would be nice to have the option to get only the data behind functions that produce graphs, without drawing the actual plots.
Thank you.
The original (raw) data cannot be recovered from what hist returns, because it stores only the binned summary.
If you mean just the data required to draw the plot, they are stored in hist(x)$mids and hist(x)$counts, which contain the bin midpoints and the counts respectively. To collect them in one matrix, you can call this function on the object returned by hist:
dataHist <- function(y) {
  rbind(y$mids, y$counts)
}
Try using hist(*yourvectorname*, plot = FALSE), which returns the same object without drawing the plot.
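
As a rough illustration of what comes back, here is a small sketch with a made-up vector x. The data frame layout and its column names are just one possible way to arrange the result, not something hist provides itself.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)   # made-up sample
h <- hist(x, plot = FALSE)          # compute the bins without drawing anything

# Bin edges, counts per bin and bin midpoints
h$breaks
h$counts
h$mids

# One row per bin: lower edge, upper edge, midpoint and count
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  mid   = h$mids,
  count = h$counts
)
bins
```

There is always one more break than there are bins, which is why the first and last elements are dropped when building the lower and upper columns.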

R - Adding series to multiple plots

I have the following plot:
plot.ts(returns)
I have another data frame, ma_sd, which contains the rolling SD from moving averages of the above returns. It is structured exactly like returns. Is there a simple way to add each line to the corresponding panel?
lines(1:N, ma_sd) seemed intuitive, but it does not work.
Thanks
The only way I can see to do this is to plot the panels separately. This code is a bit clunky, but it gives you full flexibility to specify labels and axis ranges, and you can build on it.
# Stack three panels with no inner margins; the outer margins hold the axes
par(mfrow = c(3, 1), oma = c(5, 4, 4, 2), mar = c(0, 0, 0, 0))
time <- seq_len(nrow(returns))
for (i in 1:3) {
  # Draw the x-axis on the bottom panel only
  plot(time, returns[, i], type = 'l', xaxt = if (i < 3) 'n' else 's')
  # Overlay the rolling SD for the same series
  lines(time, ma_sd[, i], col = 'red')
}
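
One thing to watch with this approach is that each panel's y-range is set by returns alone, so an overlaid line can be clipped if it falls outside that range. A possible variant, sketched with made-up data, is below. The roll_sd helper, the 10-observation window and the column names are assumptions for illustration, not part of the question.

```r
set.seed(42)
n <- 100
# Made-up stand-in for the returns data: three series of 100 observations
returns <- matrix(rnorm(n * 3, sd = 0.02), ncol = 3,
                  dimnames = list(NULL, c("A", "B", "C")))

# Hypothetical helper: trailing rolling SD, NA until the window is full
roll_sd <- function(x, k = 10) {
  out <- rep(NA_real_, length(x))
  for (i in k:length(x)) out[i] <- sd(x[(i - k + 1):i])
  out
}
ma_sd <- apply(returns, 2, roll_sd)

op <- par(mfrow = c(ncol(returns), 1), oma = c(5, 4, 4, 2), mar = c(0, 0, 0, 0))
for (i in seq_len(ncol(returns))) {
  # Make the y-range cover both lines so neither is clipped
  ylim <- range(returns[, i], ma_sd[, i], na.rm = TRUE)
  plot(returns[, i], type = "l", ylim = ylim,
       xaxt = if (i < ncol(returns)) "n" else "s")
  lines(ma_sd[, i], col = "red")
}
par(op)  # restore the previous graphics settings
```

Looping over the columns also means the same code works for any number of series.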

Actuarial survival analysis, divided into intervals

I'm trying to create an actuarial survival analysis in R (I'm following some worked examples). I think the best way to do this is using the survival package. So something like:
library(survival)
surv.test <- survfit(Surv(TIME, STATUS) ~ 1, data=test)
However, to get the correct answer I need to divide the TIME variable into 365-day intervals, and I can't quite work out how to do this so that it matches the given result.
As far as I can make out, there is no option within the survfit function that will do this. I went through several documentation examples and none of them were trying to create a stairstep type of plot (there is a type='interval' option, but it seems to do something different). So I guess I need to regroup my data before I apply the survival function?
Any ideas?
P.S: In SPSS this would be INTERVAL = THRU 10000 BY 365; in Stata intervals(365) ... connect(stairsteps)
I am guessing that you want to divide the TIME variable into intervals because you want to plot a Kaplan-Meier curve. In R, that isn't necessary; you can just call plot on the survfit object. For example:
s=survfit(Surv(futime, fustat)~rx, data=ovarian)
plot(s)
I think I understand your question a little better now. The reason you are getting a thick black line is that you have a lot of censoring, and a + is plotted at every point where an observation is censored. You can turn this off with mark.time=F. (You can see other options in ?plot.survfit.)
However, if you still want to aggregate by year, simply divide your follow-up time by 365 and round up with ceiling. Here is an example of aggregating at different time levels, with the censoring marks turned off.
par(mfrow=c(1,3))
plot(survfit(Surv(ceiling(futime), fustat)~rx, data=ovarian),col=c('blue','red'),main='Day',mark.time=F)
plot(survfit(Surv(ceiling(futime/30), fustat)~rx, data=ovarian),col=c('blue','red'),main='Month',mark.time=F)
plot(survfit(Surv(ceiling(futime/365), fustat)~rx, data=ovarian),col=c('blue','red'),main='Year',mark.time=F)
par(mfrow=c(1,1))
But I think that plotting the Kaplan-Meier curve without the censoring symbols will look very nice and provide more insight.
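
To see what the ceiling step does to the data, here is a small sketch on the ovarian dataset that ships with the survival package. The copy ov and its year column are names made up for this example. Note that this is still a Kaplan-Meier fit on grouped times, which may not match a classical actuarial life table exactly.

```r
library(survival)

# Work on a copy and map follow-up days to 1-based year intervals:
# days 1-365 -> 1, days 366-730 -> 2, and so on
ov <- ovarian
ov$year <- ceiling(ov$futime / 365)
table(ov$year)   # how many subjects end their follow-up in each year

# Kaplan-Meier fit on the grouped times, pooled over all subjects
fit_year <- survfit(Surv(year, fustat) ~ 1, data = ov)
summary(fit_year)

# Stairstep curve with one step per year, no censoring marks
plot(fit_year, mark.time = FALSE, conf.int = FALSE,
     xlab = "Years", ylab = "Cumulative survival")
```

Because all events within a year now share the same time value, the curve can only drop at whole-year positions.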
Hurray, I should be able to post the images now:
1) This is what the basic R survival plot looks like at the moment.
2) And this is what it should look like (SPSS example).
That was exactly what I was missing! Thanks!
Solution:
vas.surv <- survfit(Surv(ceiling(TIME/365), STATUS)~1, conf.type="none", data=vasectomy)
plot(vas.surv, ylim=c(0.975,1), mark.time=F, xlab="Years", ylab="Cumulative Survival")
A nice touch would be to display days on the x-axis instead of years (as in the SPSS example), but I'm not too bothered about this.
