R survfit() - separate plots for each level of IV

I'm very new to R and to survival analyses. I'm trying to plot survival curves for each level of a categorical IV. Importantly, I want them plotted on separate plots. Is there an easy way to do this? An obvious solution is to create separate data frames for the categories, but that seems rather cumbersome.
My code:
figure_site <- survfit(survObject~site, data = uis_data)
plot(figure_site)
site has 2 levels (0, 1) and I want two plots - one for site==0 and site==1.
Thanks!
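One possible approach that avoids splitting the data, as a minimal sketch: a fitted survfit object can be subscripted by stratum, so each curve can go on its own plot. Since uis_data and survObject are not shown, the example assumes the lung data set that ships with the survival package as a stand-in, with sex playing the role of site.

```r
library(survival)

# Stand-in for the question's model: lung and sex are assumptions,
# replace them with uis_data and site.
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# One panel per stratum, side by side
op <- par(mfrow = c(1, length(fit$strata)))
for (i in seq_along(fit$strata)) {
  # fit[i] extracts the curve for the i-th level of the grouping variable
  plot(fit[i],
       main = names(fit$strata)[i],
       xlab = "Time", ylab = "Survival probability")
}
par(op)  # restore the previous graphics settings
```

Dropping the par(mfrow = ...) call would instead draw each curve on a separate page or window. Another option is to refit per level with the subset argument of survfit, which gives the same curves.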

Related

is there a function in R to quickly calculate the difference between two geom_bin2d maps?

I have a large 2-variable dataset that may be classified into 2 groups using a third variable. Overplotting is an issue, so I've resorted to visualizing my data using bin2d and other similar approaches. I would like to calculate the difference between the binned counts of the two groups and visualize that as well (e.g., subtract one 2d histogram from the other).
example code:
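library(ggplot2)  # provides diamonds and geom_bin2d
library(dplyr)    # provides filter
df <- diamonds
df_color_H <- filter(df, color == "H")
df_color_E <- filter(df, color == "E")
ggplot(df_color_H) +
  geom_bin2d(aes(carat, price), bins = 40)
ggplot(df_color_E) +
  geom_bin2d(aes(carat, price), bins = 40)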
Ultimately, I want to visualize the difference between overlapping bins. I know the solution is likely a pre-processing step before bringing the data into ggplot, but I haven't found exactly what I'm looking for. I also don't need a sophisticated solution using KDEs or something like that.
Any suggestions would be welcome!
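A pre-processing step along those lines could look like the following sketch. It bins both groups on one shared grid with cut() and table(), subtracts the count tables, and plots the result with geom_tile(). The helper names (count2d, bin_mid, count_diff) are made up for illustration, and the 40 x 40 grid only approximates what geom_bin2d(bins = 40) would choose.

```r
library(ggplot2)

df <- diamonds

# Shared bin edges so that the two groups are counted on the same grid
bx <- seq(min(df$carat), max(df$carat), length.out = 41)
by <- seq(min(df$price), max(df$price), length.out = 41)

# Hypothetical helper: 2D table of counts for one group
count2d <- function(d) {
  table(carat_bin = cut(d$carat, bx, include.lowest = TRUE),
        price_bin = cut(d$price, by, include.lowest = TRUE))
}

# Bin-wise difference H minus E, converted to long format
diff_tab <- count2d(df[df$color == "H", ]) - count2d(df[df$color == "E", ])
diff_df <- as.data.frame(diff_tab, responseName = "count_diff")

# Map each bin label back to the midpoint of its interval
bin_mid <- function(b) (head(b, -1) + tail(b, -1)) / 2
diff_df$carat <- bin_mid(bx)[as.integer(diff_df$carat_bin)]
diff_df$price <- bin_mid(by)[as.integer(diff_df$price_bin)]

p <- ggplot(diff_df, aes(carat, price, fill = count_diff)) +
  geom_tile() +
  scale_fill_gradient2()  # diverging scale centred on zero
print(p)
```

Bins that are empty in both groups show up with a difference of zero; filtering those rows out before plotting is an option if they clutter the picture.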

Create ggplot2 density plot from binned data?

I would like to visualize a distribution using ggplot2 tools like density plots and ECDFs. The challenge is that I only have binned data available and not the individual samples. That is, each row in my data frame has columns bin,count rather than individual samples. However, the bins can be quite narrow e.g. with data spread over 500 bins.
Are there some reasonable solutions? My first thought was to expand each bin into individual samples by repeating the bin's upper bound count times. I am not sure of the best way to do that, nor whether it is especially inadvisable.
Tips welcome!
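One option that avoids expanding the bins, as a sketch under assumptions: pass the counts as weights to the density estimate and build the ECDF from the cumulative counts. The data frame binned below is simulated purely for illustration; only the column names bin and count come from the question.

```r
library(ggplot2)

# Simulated binned data (an assumption): 500 narrow bins with counts
set.seed(1)
binned <- data.frame(bin = seq(-3, 3, length.out = 500))
binned$count <- rpois(500, lambda = 100 * dnorm(binned$bin))

# Normalised weights for the density, cumulative proportion for the ECDF
binned$w <- binned$count / sum(binned$count)
binned$ecdf <- cumsum(binned$count) / sum(binned$count)

# Weighted kernel density: each bin position counts in proportion to its count
p_density <- ggplot(binned, aes(bin, weight = w)) + geom_density()

# ECDF as a step function over the bin positions
p_ecdf <- ggplot(binned, aes(bin, ecdf)) + geom_step()

print(p_density)
print(p_ecdf)
```

If expanded samples are still wanted, rep(binned$bin, times = binned$count) produces them, though with narrow bins the weighted version should give much the same picture at lower memory cost.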

Adding multiple lines to plot, without ggplot

I would like to plot multiple lines on the same plot, without using ggplot.
I have scores for different individuals across a set time period and wish to plot a line between yearly scores for each individual. Data is organised with each row representing an individual and each column an observed value in a given year.
Currently I am using a for loop, but am aware that this is often not efficient in R, and am interested if there are any more suitable approaches available within base R.
I will be working with up to 100,000 individuals.
Thanks.
Code:
df=data.frame(runif(10,0,100),runif(10,0,100),runif(10,0,100),runif(10,0,100))
df=data.frame(t(df))
Years=seq(1,10,1)
plot(1,type="n",xlab="Year",ylab="Score", xlim=c(1,10), ylim=c(0,100))
for(x in 1:4){lines(Years,df[x,])}
Efficiency is not much of a consideration when plotting since plotting to a device is a slow operation in itself. You can use matplot (which uses a loop internally). It's basically a more sophisticated version of your code wrapped in a function.
matplot(Years, t(df), xlab="Year", ylab="Score", type = "l")

Plot group in lattice, using different data sources

Using the lattice package in R, I would like to plot one row of 7 diagrams, all using the same Y-axis. The diagrams should be (vertical) line diagrams. The problem is that my data are in 7 separate data frames (each containing X and Y data), with slightly different limits on the Y-axis data.
Despite reading the tutorials, I can't get it right. What should my code look like? Is there even a clean solution for this in lattice?
You could combine all your data frames into one and then do something like
xyplot(Y~X|odf,data=combinedDF,layout=c(7,1))
where odf is an indicator column of the original data frame. This by default should use a common y scale.
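The combining step might look like this sketch. The seven data frames are simulated here (an assumption, since the real ones are not shown), and the names DF1 to DF7 are only placeholders.

```r
library(lattice)

# Simulated stand-ins for the seven separate data frames
set.seed(42)
dfs <- lapply(1:7, function(i) data.frame(X = 1:20, Y = rnorm(20, mean = i)))
names(dfs) <- paste0("DF", 1:7)

# Stack them, tagging every row with the name of its source data frame
combinedDF <- do.call(rbind, Map(function(d, nm) cbind(d, odf = nm),
                                 dfs, names(dfs)))
combinedDF$odf <- factor(combinedDF$odf, levels = names(dfs))

# One row of seven line panels sharing a common y scale
p <- xyplot(Y ~ X | odf, data = combinedDF, layout = c(7, 1), type = "l")
print(p)
```

Setting the factor levels explicitly keeps the panels in the original order rather than whatever order the names sort into.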
Apart from combining the data, you could create 7 separate plots, then print them.
p1 <- xyplot(Y~X,data=DF1,ylim=c(Y1,Y2))
p2 <- xyplot(Y~X,data=DF2,ylim=c(Y1,Y2))
etc.
To print:
print(p1,split=c(1,1,7,1),more=TRUE)
print(p2,split=c(2,1,7,1),more=TRUE)
...
print(p7,split=c(7,1,7,1),more=FALSE)
See ?print.trellis.
Of course, arranging single plots like this doesn't really use the features of lattice. You could just as easily do this with base graphics using layout or par(mfrow=c(1,7)) for example, and a common ylim.
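The base graphics variant could be sketched as follows, again with simulated data frames standing in for the real ones (an assumption). The common ylim is computed from the Y values of all seven data frames.

```r
# Simulated stand-ins for the seven separate data frames
set.seed(42)
dfs <- lapply(1:7, function(i) data.frame(X = 1:20, Y = rnorm(20, mean = i)))

# Common y limits covering every data frame
common_ylim <- range(unlist(lapply(dfs, function(d) d$Y)))

# One row of seven panels with tight margins
op <- par(mfrow = c(1, 7), mar = c(4, 2, 2, 0.5))
for (i in seq_along(dfs)) {
  plot(dfs[[i]]$X, dfs[[i]]$Y, type = "l", ylim = common_ylim,
       xlab = "X", ylab = "", main = paste("DF", i))
}
par(op)  # restore the previous graphics settings
```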

How to structure data for R?

So... newbie R user here. I have some observations that I'd like to record using R and be able to add to later.
The items are sorted by weights, and the number at each weight recorded. So far what I have looks like this:
weights <- c(rep(171.5, times=1), rep(171.6, times=2), rep(171.7, times=4), rep(171.8, times=18), rep(171.9, times=39), rep(172.0, times=36), rep(172.1, times=34), rep(172.2, times=25))
There will be a total of 500 items being observed.
I'm going to be taking additional observations over time to (hopefully) see how the distribution of weights changes with use/wear. I'd like to be able to make plots showing either stacked histograms or boxplots.
What would be the best way to format / store this data to facilitate this kind of use case? A matrix, dataframe, something else?
As other comments have suggested, the most versatile (and perhaps most useful) container for your data would be a data frame, which works directly with library(ggplot2) for your future plotting and graphing needs (such as box plots and various histograms).
Toy example
All the code below does is use your weights vector above to create a data frame with some dummy IDs and draw a box-and-whisker plot.
library(ggplot2)
IDs <- sample(LETTERS[1:5], length(weights), replace = TRUE)  # dummy ID values
# make a data frame from your original `weights` vector
df <- data.frame(ID = IDs, Weights = weights)
ggplot(data = df, aes(factor(ID), Weights)) + geom_boxplot()  # box plot
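For adding observations over time, one possible layout (a sketch, not the only option) is a long-format data frame with one row per item and a column saying which round of observations it belongs to. New rounds are then appended with rbind. The session column and the second round of data below are made up for illustration.

```r
library(ggplot2)

# First round: the counts per weight from the question
values <- seq(171.5, 172.2, by = 0.1)
counts <- c(1, 2, 4, 18, 39, 36, 34, 25)
session1 <- data.frame(session = "baseline",
                       weight = rep(values, times = counts))

# Hypothetical later round, simulated only to show the structure
set.seed(1)
session2 <- data.frame(session = "after_use",
                       weight = round(rnorm(nrow(session1), 171.9, 0.15), 1))

# Append the new round to the existing observations
obs <- rbind(session1, session2)

# One box per round of observations
p_box <- ggplot(obs, aes(session, weight)) + geom_boxplot()

# Stacked histogram, coloured by round
p_hist <- ggplot(obs, aes(weight, fill = session)) +
  geom_histogram(binwidth = 0.1)

print(p_box)
print(p_hist)
```

Storing a real date instead of a label in the session column would also allow plotting summaries against time later on.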
