Survival Curve in R with survfit


I wanted to plot a survival curve using the following data. I called the data file A.txt and the object A:
A <- read.table(text = "
Time Status Group
8 1 A
8 1 A
8 1 A
9 1 A
9 1 A
9 1 A
15 0 A
15 0 A
7 1 B
7 1 B
8 1 B
9 1 B
10 1 B
10 1 B
15 0 B
15 0 B", header = TRUE)
I tried to plot a survival curve using this code:
library(survival)
title(main="Trial for Survival Curve")
fit <- survfit(Surv(Time, Status) ~ Group, data = A)
par(col.lab="red")
legend(10, .9, c("A", "B"), pch=c(2,3),col=2:3)
plot(fit, lty=2:3, col=2:3,lwd=5:5, xlab='Time(Days)',
ylab='% Survival',mark.time=TRUE,mark=2:3)
I would like to put marks (a triangle for A and "+" for B) every time survival decreases, for instance at Day 7 and Day 8. I want these marks throughout the graph, but the code above only adds them at the end of the experiment.

First, I'd recommend rearranging the plotting calls. title() and legend() annotate the current plot, so they should come after plot(), not before:
par(col.lab="red")
plot(fit, lty=2:3, col=2:3,lwd=5:5, xlab='Time(Days)',
ylab='% Survival',mark.time=TRUE,mark=2:3)
title(main="Trial for Survival Curve")
legend(10, .9, c("A", "B"), pch=c(2,3),col=2:3)
You can add points to the survival plot with the points() function. However, there appears to be a small bug in the version of survival used here, which you can get around by defining firsty before calling points():
firsty <- 1 # Workaround for the bug
points(fit[1], col = 2, pch = 2) # Plots first group in fit
points(fit[2], col = 3, pch = 3) # Plots second group in fit
The points are plotted at the bottom of the "cliff" in the survival plot.
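To see exactly where those drops are, the Kaplan-Meier estimates for this small data set can also be computed by hand. The sketch below is a base-R illustration of the estimator, swapped in for survfit() rather than a description of how the survival package works internally; km_steps() is a hypothetical helper. Its numbers should agree with summary(fit), which lists the survival estimate at each event time.

```r
# Same data as in the question
A <- data.frame(
  Time   = c(8, 8, 8, 9, 9, 9, 15, 15, 7, 7, 8, 9, 10, 10, 15, 15),
  Status = c(1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0),
  Group  = rep(c("A", "B"), each = 8)
)

# Hypothetical helper: Kaplan-Meier survival at each distinct event time,
# i.e. the points where the step curve drops
km_steps <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  surv <- numeric(length(event_times))
  s <- 1
  for (i in seq_along(event_times)) {
    t <- event_times[i]
    n_risk  <- sum(time >= t)                # still under observation at t
    n_event <- sum(time == t & status == 1)  # deaths at t
    s <- s * (1 - n_event / n_risk)
    surv[i] <- s
  }
  data.frame(time = event_times, surv = surv)
}

steps <- lapply(split(A, A$Group), function(d) km_steps(d$Time, d$Status))

# Draw both step curves and mark every drop (triangle for A, "+" for B)
pchs <- c(A = 2, B = 3)
cols <- c(A = 2, B = 3)
plot(NA, xlim = c(0, max(A$Time)), ylim = c(0, 1),
     xlab = "Time(Days)", ylab = "Survival probability")
for (g in names(steps)) {
  d <- steps[[g]]
  lines(c(0, d$time, max(A$Time)), c(1, d$surv, tail(d$surv, 1)),
        type = "s", col = cols[[g]], lwd = 2)
  points(d$time, d$surv, pch = pchs[[g]], col = cols[[g]])
}
legend("topright", legend = names(steps), pch = pchs, col = cols, lty = 1)
```

For group B this gives drops at days 7, 8, 9 and 10; the censored observations at day 15 produce no drop, which is why mark.time=TRUE only marks the end of the curve.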

Related

Creating clusters based on a plot

I have a dataset like this:
Region  Year  Month  rate  residuals
1       2010  1      0.5   0.5
2       2010  1      4.0   0.5
The dataset continues; it has 15,000 observations.
I created a scatter plot:
plot(df$full.residuals, df$rate, main="Scatterplot",
xlab="Residuals", ylab="rate")
Now I'm stuck. Does anyone know how to create clusters in this plot?

First, I created some more random data points, because with only 2 points it would be hard to form clusters. You could use k-means (kmeans()) to create the clusters. Here I chose 2 clusters, which you can change if you want. With the factoextra package you can create nice visualizations like this:
library(factoextra)
set.seed(123)
df <- data.frame(rate = runif(20, 0, 1),
full.residuals = runif(20, 0, 1))
kmeans_cluster <- kmeans(scale(df), 2, nstart = 5)
kmeans_cluster$cluster
#> [1] 2 2 2 2 2 1 2 2 1 1 2 2 2 2 1 2 2 1 1 2
fviz_cluster(kmeans_cluster, data = df,
palette = c("#2E9FDF", "#00AFBB"),
geom = "point",
ellipse.type = "convex",
ggtheme = theme_bw())
Created on 2022-08-18 with reprex v2.0.2
I suggest having a look at this link for more information on using this package.
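If you'd rather not depend on factoextra, the same idea can be sketched in base R: cluster with kmeans() and colour the scatter plot by the cluster assignment. This is a minimal sketch on random stand-in data with the asker's column names, not the real 15,000 rows; the choice of 2 clusters is arbitrary.

```r
set.seed(123)
# Random stand-in data with the asker's column names
df <- data.frame(rate = runif(20, 0, 1),
                 full.residuals = runif(20, 0, 1))

# Scale the variables so both contribute equally, then run k-means
km <- kmeans(scale(df), centers = 2, nstart = 5)

# Colour each point by its cluster; residuals on x, rate on y
plot(df$full.residuals, df$rate, col = km$cluster, pch = 19,
     xlab = "Residuals", ylab = "rate",
     main = "Scatterplot with k-means clusters")

# Cluster means on the original (unscaled) axes, drawn as crosses
centres <- aggregate(df, by = list(cluster = km$cluster), FUN = mean)
points(centres$full.residuals, centres$rate,
       col = centres$cluster, pch = 4, cex = 2)
legend("topright", legend = paste("Cluster", 1:2), col = 1:2, pch = 19)
```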

R: "Animate" Points on a Scatter Plot

I am working with R. Suppose I have the following data frame:
my_data <- data.frame(
"col" = c("red","red","red","red","red","blue","blue","blue","blue","blue","green", "green", "green", "green","green"),
"x_cor" = c(1,2,5,6,7,4,9,1,0,1,4,4,7,8,2),
"y_cor" = c(2,3,4,5,9,5,8,1,3,9,11,5,7,9,1),
"frame_number" = c(1,2,3,4,5, 1,2,3,4,5, 1,2,3,4,5)
)
my_data$col = as.factor(my_data$col)
head(my_data)
col x_cor y_cor frame_number
1 red 1 2 1
2 red 2 3 2
3 red 5 4 3
4 red 6 5 4
5 red 7 9 5
6 blue 4 5 1
In R, is it possible to create a (two-dimensional) graph that will "animate" each colored point to a new position based on the "frame number"?
I started by following the instructions on this website: https://www.datanovia.com/en/blog/gganimate-how-to-create-plots-with-beautiful-animation-in-r/
First, I made a static graph:
library(ggplot2)
library(gganimate)
p <- ggplot(
my_data,
aes(x = x_cor, y=y_cor, colour = col)
)
Then, I tried to animate it:
p + transition_time(frame_number) +
labs(title = "frame_number: {frame_number}")
Unfortunately, this produced an empty plot and the following warnings:
There were 50 or more warnings (use warnings() to see the first 50)
1: Cannot get dimensions of plot table. Plot region might not be fixed
2: values must be length 1,
but FUN(X[[1]]) result is length 15
Can someone please show me how to fix this problem?
Thanks
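As written, p contains no geometry layer (no geom_point()), and a ggplot without a geom draws an empty panel, so adding + geom_point() to p is probably the first thing to try. Independently of gganimate, the "flip-book" idea behind the animation can be sketched in base R by redrawing one frame per frame_number with fixed axis limits. This is a swapped-in technique, not gganimate's API; draw_frame() is a hypothetical helper and the pause length is arbitrary.

```r
my_data <- data.frame(
  col = c(rep("red", 5), rep("blue", 5), rep("green", 5)),
  x_cor = c(1,2,5,6,7,4,9,1,0,1,4,4,7,8,2),
  y_cor = c(2,3,4,5,9,5,8,1,3,9,11,5,7,9,1),
  frame_number = rep(1:5, times = 3)
)

# Fixed limits so the points move while the axes stay put
xr <- range(my_data$x_cor)
yr <- range(my_data$y_cor)

# Hypothetical helper: draw every point belonging to one frame
draw_frame <- function(f) {
  d <- my_data[my_data$frame_number == f, ]
  # col is kept as character here, so it is read as colour names
  plot(d$x_cor, d$y_cor, col = d$col, pch = 19, cex = 2,
       xlim = xr, ylim = yr, xlab = "x_cor", ylab = "y_cor",
       main = paste("frame_number:", f))
  invisible(d)
}

for (f in sort(unique(my_data$frame_number))) {
  draw_frame(f)
  Sys.sleep(0.5)  # pause between frames on an interactive device
}
```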

How to make a boxplot with 3D array with ggplot?

I have a technical question.
Here are my observed data:
observed <- structure(c(4.06530084555243e-05, 4.34037362577724e-05, 5.25472735118296e-05,
5.75250282219017e-05, 5.33322813829422e-05, 4.31323519093776e-05,
2.93059438168564e-05, 3.2907253754896e-05, 3.93244409813805e-05,
4.44607200813546e-05, 4.28121839343577e-05, 4.41339340180233e-05,
2.45819615043229e-05, 2.77652788697063e-05, 3.471280169582e-05,
4.0759303004447e-05, 4.1444945573338e-05, 3.91053759171617e-05
), .Dim = c(6L, 3L))
After a simulation, I have this dataset:
simul <- structure(c(4.19400641566714e-05, 4.34037362577724e-05, 5.21778240776188e-05,
5.72766282640455e-05, 5.33322813829422e-05, 4.4984474595369e-05,
3.04758260711529e-05, 3.35466566427138e-05, 4.07527347018512e-05,
4.51672959887775e-05, 4.42496416020706e-05, 4.41339340180233e-05,
2.38725672336555e-05, 2.78960210968267e-05, 3.42390390339277e-05,
4.0759303004447e-05, 4.1444945573338e-05, 4.16181419135288e-05,
4.06530084555243e-05, 4.52163381730998e-05, 5.37744538705153e-05,
5.75250282219017e-05, 5.44384786782902e-05, 4.27640158845638e-05,
2.93059438168564e-05, 3.16988003284864e-05, 3.88757470111112e-05,
4.16839537839391e-05, 4.1923490779897e-05, 4.43697930071784e-05,
2.53312977844189e-05, 2.82780740113101e-05, 3.49483644305925e-05,
4.23308636691264e-05, 4.36574393087853e-05, 3.91053759171617e-05,
3.97856427517231e-05, 4.25485977213641e-05, 5.21380124071012e-05,
5.62879076217168e-05, 5.18161751345512e-05, 4.22404154190924e-05,
2.84842421189343e-05, 3.2907253754896e-05, 3.93244409813805e-05,
4.28921326811218e-05, 4.2391125283836e-05, 4.28233487269764e-05,
2.45819615043229e-05, 2.67311845213199e-05, 3.3715109777394e-05,
4.00991849427121e-05, 4.07259705233212e-05, 3.62825448554739e-05,
3.95854341194398e-05, 4.23930151174446e-05, 5.25472735118296e-05,
5.76202168197769e-05, 5.23957149070388e-05, 4.31323519093776e-05,
2.90350657890489e-05, 3.22693947104228e-05, 3.90988677457566e-05,
4.44607200813546e-05, 4.28121839343577e-05, 4.28542288317551e-05,
2.56149959419174e-05, 2.77652788697063e-05, 3.49302533009518e-05,
4.13777396322285e-05, 4.12908495437265e-05, 3.92084109551252e-05,
4.14887591359563e-05, 4.39273564362111e-05, 5.31197050290816e-05,
5.77484133948985e-05, 5.36319646972061e-05, 4.62472643466539e-05,
3.06756490605887e-05, 3.49917045844483e-05, 4.15936967740209e-05,
4.66221720234964e-05, 4.48785430220286e-05, 4.44766996381653e-05,
2.36916432633518e-05, 2.69248181080789e-05, 3.471280169582e-05,
3.94762090257435e-05, 4.17765202936009e-05, 3.8021359310749e-05
), .Dim = c(6L, 3L, 5L))
This is a three-dimensional array: the rows correspond to the months followed, the columns to the study areas, and the third dimension to the simulated values (5 per month and study area).
My question: is it possible, with ggplot, to make a multipanel graph (grid) with one panel per study area, showing boxplots of the simulated values (the 3rd dimension) with months on the x axis (i.e. 6 boxplots per panel)? I would also like to draw a line through the boxplots in each panel showing the observed values. Thank you!

I hope I understood correctly: for each study, make a boxplot for each month that summarizes the values from all 5 simulations.
First, I gave the array dimension names:
attributes(simul)$dimnames <- list(
month = month.abb[1:6],
study = letters[1:3],
simval = 1:5
)
Then I converted the named array to a tbl_cube, and from there into a tibble, so the data can be plotted with the usual tidyverse routine:
library(tidyverse)
library(magrittr) # for the %T>% "tee" pipe
# Note: in dplyr >= 1.0.0, as.tbl_cube() lives in the cubelyr package,
# so you may also need library(cubelyr)
as.tbl_cube(simul) %>%
as_tibble() %>%
rename('value' = simul) %>%
mutate(
study = factor(paste('Study', study)),
month = factor(month, levels = month.abb[1:6])
) %T>%
print %>%
ggplot(aes(x = month, y = value)) +
geom_boxplot(outlier.colour = 'red') +
facet_wrap(~ study, nrow = 1, scales = 'free_y') +
ggthemes::theme_few()
# # A tibble: 90 x 4
# month study simval value
# <fct> <fct> <int> <dbl>
# 1 Jan Study a 1 0.0000419
# 2 Feb Study a 1 0.0000434
# 3 Mar Study a 1 0.0000522
# 4 Apr Study a 1 0.0000573
# 5 May Study a 1 0.0000533
# 6 Jun Study a 1 0.0000450
# 7 Jan Study b 1 0.0000305
# 8 Feb Study b 1 0.0000335
# 9 Mar Study b 1 0.0000408
# 10 Apr Study b 1 0.0000452
# # ... with 80 more rows
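The plot above does not yet draw the observed values through the boxplots, which the question also asked for. A minimal base-R sketch of that overlay follows. It uses as.data.frame.table() instead of as.tbl_cube(), and random stand-in arrays with the same shape as observed and simul (6 months x 3 studies x 5 simulations), so swap in the real objects. In the ggplot2 version, the analogous step would be an extra geom_line() layer built from a long version of observed (with group = 1, since month is a factor).

```r
set.seed(1)
# Hypothetical stand-ins with the same shape as the question's objects:
# observed is 6 months x 3 studies, simul is 6 months x 3 studies x 5 runs
observed <- matrix(runif(18, 2e-5, 6e-5), nrow = 6, ncol = 3)
simul <- array(rep(observed, 5) * runif(90, 0.9, 1.1), dim = c(6, 3, 5))

dimnames(simul) <- list(month = month.abb[1:6], study = letters[1:3], simval = 1:5)
dimnames(observed) <- dimnames(simul)[1:2]

# Long format: one row per month x study x simulation run
long <- as.data.frame.table(simul, responseName = "value")

# One panel per study: boxplots over months, observed values as a red line
op <- par(mfrow = c(1, 3))
for (s in levels(long$study)) {
  d <- long[long$study == s, ]
  boxplot(value ~ month, data = d, main = paste("Study", s),
          xlab = "month", ylab = "value")
  lines(seq_len(6), observed[, s], type = "b", col = "red", lwd = 2, pch = 19)
}
par(op)
```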

How do you plot the first few values of a PCA

I've run a PCA on a moderately sized data set, but I only want to visualize some of the points from that analysis: they come from repeated observations, and I want to see how close the paired observations are to each other on the plot. I've arranged the data so that the first 18 individuals are the ones I want to plot, but I can't find a way to plot only those 18 points without running the analysis on just those 18 instead of the whole data set (43 individuals).
library(FactoMineR) # PCA()
library(factoextra) # fviz_pca_ind()
# My data file
TrialsMR<-read.csv("NER_Trials_Matrix_Retrials.csv", row.names = 1)
# I ran the PCA of all of my values (without the categorical variable in col 8)
R.pca <- PCA(TrialsMR[,-8], graph = FALSE)
# When I try to plot only the first 18 individuals with this method, I get an error
fviz_pca_ind(R.pca[1:18,],
labelsize = 4,
pointsize = 1,
col.ind = TrialsMR$Bands,
palette = c("red", "blue", "black", "cyan", "magenta", "yellow", "gray", "green3", "pink" ))
# This is the error
Error in R.pca[1:18, ] : incorrect number of dimensions
The 18 individuals are each paired up, so only using 9 colours shouldn't cause an error (I hope).
Could anyone help me plot just the first 18 points from a PCA of my whole data set?
My data frame is similar in structure to this:
TrialsMR
Trees Bushes Shrubs Bands
JOHN1 1 4 18 BLUE
JOHN2 2 6 25 BLUE
CARL1 1 3 12 GREEN
CARL2 2 4 15 GREEN
GREG1 1 1 15 RED
GREG2 3 11 26 RED
MIKE1 1 7 19 PINK
MIKE2 1 1 25 PINK
where each band corresponds to a specific individual that has been tested twice.

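Indexing the PCA result with R.pca[1:18, ] doesn't work, because the PCA object is a list, not a matrix. Instead, keep the PCA of the full data set and use the select.ind argument of fviz_pca_ind() to choose the individuals to draw, e.g.: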
data(iris) # test data
If you want the row names to reflect a grouping, so that points are easy to identify in the plot, rename the rows before running the PCA. For example, number setosa in the 100s, versicolor in the 200s and virginica in the 300s:
new_series <- c(101:150, 201:250, 301:350) # there are 50 of each
rownames(iris) <- new_series
R.pca <- prcomp(iris[,1:4], scale. = TRUE) # PCA on the four numeric columns
library(factoextra)
fviz_pca_ind(X= R.pca, labelsize = 4, pointsize = 1,
select.ind= list(name = new_series[1:120]), # 120 out of 150 selected
col.ind = iris$Species ,
palette = c("blue", "red", "green" ))
Always check the R documentation before using a new function. From the R documentation for fviz_pca {factoextra}:
X: an object of class PCA [FactoMineR]; prcomp and princomp [stats]; dudi and pca [ade4]; expOutput/epPCA [ExPosition].
select.ind, select.var: a selection of individuals/variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib.
For your particular dummy data, this should do:
R.pca <- prcomp(TrialsMR[,1:3], scale. = TRUE)
fviz_pca_ind(X= R.pca,
select.ind= list(name = row.names(TrialsMR)[1:4]), # 4 out of 8
pointsize = 1, labelsize = 4,
col.ind = TrialsMR$Bands,
palette = c("blue", "green" )) + ylim(-1,1)
Dummy Data:
TrialsMR <- read.table( text = "Trees Bushes Shrubs Bands
JOHN1 1 4 18 BLUE
JOHN2 2 6 25 BLUE
CARL1 1 3 12 GREEN
CARL2 2 4 15 GREEN
GREG1 1 1 15 RED
GREG2 3 11 26 RED
MIKE1 1 7 19 PINK
MIKE2 1 1 25 PINK", header = TRUE)
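The key point is that the PCA is fitted once on all individuals and only the plotting step is subset. Without factoextra, that can be sketched in base R by indexing the score matrix returned by prcomp(); keep = 1:4 stands in for the asker's 1:18, and the dummy data above is reused.

```r
TrialsMR <- read.table(text = "Trees Bushes Shrubs Bands
JOHN1 1 4 18 BLUE
JOHN2 2 6 25 BLUE
CARL1 1 3 12 GREEN
CARL2 2 4 15 GREEN
GREG1 1 1 15 RED
GREG2 3 11 26 RED
MIKE1 1 7 19 PINK
MIKE2 1 1 25 PINK", header = TRUE)

# PCA on all individuals (numeric columns only)
R.pca <- prcomp(TrialsMR[, 1:3], scale. = TRUE)

# Plot only the first few individuals (use 1:18 for the full data set)
keep <- 1:4
scores <- R.pca$x[keep, 1:2]
bands <- factor(TrialsMR$Bands)

# Axis limits come from all scores, so the subset keeps its place in the full PCA
plot(scores, col = as.integer(bands)[keep], pch = 19,
     xlim = range(R.pca$x[, 1]), ylim = range(R.pca$x[, 2]))
text(scores, labels = rownames(scores), pos = 3, cex = 0.8)
abline(h = 0, v = 0, lty = 3)
```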

Plot In R with Multiple Lines Based On A Particular Variable?

I have an accelerometer dataset with some number n of body-acceleration observations over time for each subject (30 subjects in total).
I want to plot these body-acceleration values on the y axis, with a different color for each subject and a simple index on the x axis. I tried this:
ggplot(data = filtered_data_walk, aes(x = seq_along(filtered_data_walk$'body-acceleration-mean-y-time'), y = filtered_data_walk$'body-acceleration-mean-y-time')) +
geom_line(aes(color = filtered_data_walk$subject))
But the problem is that the 30 lines aren't superimposed; instead, they run alongside each other. In other words, I end up with n1 + n2 + n3 + ... + n30 index points on the x axis, instead of max{n1, n2, ..., n30}. This is my first time posting, so I hope this makes sense (I know my formatting is bad).
One solution I thought of was to create a new variable that numbers the observations of each subject from 1 to n. So, for example, if I had 6 observations for subject1, 4 observations for subject2, and 9 observations for subject3, this new variable would be sequenced like:
1 2 3 4 5 6 1 2 3 4 1 2 3 4 5 6 7 8 9
Is there an easy way to do this? Please help, thanks.

Assuming your data is formatted as a data.frame or matrix with one column per subject, for a toy dataset like
x <- data.frame(replicate(5, rnorm(10)))
x
# X1 X2 X3 X4 X5
# 1 -1.36452272 -1.46446475 2.0444381 0.001585876 -1.1085990
# 2 -1.41303046 -0.14690269 1.6179084 -0.310162018 -1.5528733
# 3 -0.15319554 -0.18779791 -0.3005058 0.351619212 1.6282955
# 4 -0.38712167 -0.14867239 -1.0776359 0.106694311 -0.7065382
# 5 -0.50711166 -0.95992916 1.3522922 1.437085757 -0.7921355
# 6 -0.82377208 0.50423328 -0.5366513 -1.315263679 1.0604499
# 7 -0.01462037 -1.15213287 0.9910678 0.372623508 1.9002438
# 8 1.49721113 -0.84914197 0.2422053 0.337141898 1.2405208
# 9 1.95914245 -1.43041783 0.2190829 -1.797396822 0.4970690
# 10 -1.75726827 -0.04123615 -0.1660454 -1.071688768 -0.3331887
...you might be able to get there with something like
plot(x[,1], type='l', xlim=c(1, nrow(x)), ylim=c(min(x), max(x)))
for(i in 2:ncol(x)) lines(x[,i], col=i)
Of course, you could play with the formatting some more, e.g. with lty= and lwd=, or a color ramp of your own choosing.
If your data is in the format below...
x <- data.frame(id=c("A","A","A","B","B","B","B","C","C"), acc=rnorm(9))
x
# id acc
# 1 A 0.1796964
# 2 A 0.8770237
# 3 A -2.4413527
# 4 B 0.9379746
# 5 B -0.3416141
# 6 B -0.2921062
# 7 B 0.1440221
# 8 C -0.3248310
# 9 C -0.1058267
...you could get there with
maxn <- max(with(x, tapply(acc, id, length)))
ids <- sort(unique(x$id))
plot(x$acc[x$id==ids[1]], type='l', xlim=c(1,maxn), ylim=c(min(x$acc),max(x$acc)))
for(i in 2:length(ids)) lines(x$acc[x$id==ids[i]], col=i)
Hope this helps, and that I interpreted your problem correctly.

That's quick to do if you're OK with using dplyr: group_by() gives each subject its own counter, mutate() adds the counter itself, and then your ggplot call should work. Example with the iris dataset:
library(dplyr)
library(ggplot2)
group_by(iris, Species) %>%
mutate(index = seq_along(Petal.Length)) %>%
ggplot() + geom_line(aes(x=index, y=Petal.Length, color=Species))
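Without dplyr, the per-subject counter described in the question (1, 2, ..., n, restarting for each subject) can be built with base R's ave(). The sketch below reproduces the 6/4/9 example from the question; the column names subject and acc are hypothetical stand-ins for the real data.

```r
set.seed(1)
# Hypothetical long-format data: 6, 4 and 9 observations for three subjects
x <- data.frame(subject = rep(c("s1", "s2", "s3"), times = c(6, 4, 9)),
                acc = rnorm(19))

# Counter that restarts at 1 for each subject
x$index <- ave(seq_along(x$subject), x$subject, FUN = seq_along)
x$index
#> [1] 1 2 3 4 5 6 1 2 3 4 1 2 3 4 5 6 7 8 9

# One superimposed line per subject, x axis running from 1 to max(n)
plot(NA, xlim = c(1, max(x$index)), ylim = range(x$acc),
     xlab = "index", ylab = "acc")
subjects <- unique(x$subject)
for (i in seq_along(subjects)) {
  d <- x[x$subject == subjects[i], ]
  lines(d$index, d$acc, col = i)
}
```

With the counter in place, the original ggplot call can use aes(x = index, ...) instead of seq_along() over the whole column.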
