I have two questions, both about how to do things in R.
I am running an ordinal regression analysis in R. The dependent variable has three levels (0=no action; 1=warning; 2=sanction).
I use the lrm command in the rms package:
print( res1<- lrm(Y ~ x1+x2+x3+x4+x5+x6, y=TRUE, x=TRUE, data=mydata))
I simply couldn't make any sense of the information in ?predict.lrm. What I want to do is calculate the marginal effects of all explanatory variables for each level of the dependent variable. In Stata, this is very simple: mfx compute, predict(outcome(#1)); mfx compute, predict(outcome(#2)); and mfx compute, predict(outcome(#3)).
So my first question is: how do I generate marginal effects for each outcome in R? Please keep in mind that my skills in R are not advanced.
The second question is related to interaction effects, which I need to include in the same model:
print( res1<- lrm(Y ~ x1+x2+x3+x4+x5+x6+x5*x6, y=TRUE, x=TRUE, data=mydata))
If I knew the answer to the first question, I would have run the marginal effects with the interaction term included. Then I would have plotted the predicted values of the interaction term.
So the second question is: how do I plot the effects (predicted values) of variables in the interaction term?
Many thanks!
EDIT:
Small sample from my dataset (only one country)
dput(mydatasample)
structure(list(year = 1989:2014, country = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Canada", class = "factor"),
id = structure(1:26, .Label = c("CAN 1989", "CAN 1990", "CAN 1991",
"CAN 1992", "CAN 1993", "CAN 1994", "CAN 1995", "CAN 1996",
"CAN 1997", "CAN 1998", "CAN 1999", "CAN 2000", "CAN 2001",
"CAN 2002", "CAN 2003", "CAN 2004", "CAN 2005", "CAN 2006",
"CAN 2007", "CAN 2008", "CAN 2009", "CAN 2010", "CAN 2011",
"CAN 2012", "CAN 2013", "CAN 2014"), class = "factor"), stage1 = c(1L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 0L, 0L, 0L,
0L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L), x1 = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L), x2 = c(1L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 2L, 2L), x3 = c(9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 8L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L), x4 = c(31L, 31L, 31L, 31L,
31L, 30L, 30L, 30L, 31L, 30L, 29L, 30L, 28L, 28L, 28L, 27L,
29L, 29L, 29L, 28L, 25L, 24L, 23L, NA, NA, NA), x5 = structure(1:26, .Label = c("17,12528685",
"17,14022279", "17,15382785", "17,16610202", "17,17704534",
"17,18665779", "17,19493938", "17,20571103", "17,21628118",
"17,22493732", "17,23321101", "17,242041", "17,25213621",
"17,26110753", "17,27106985", "17,2810902", "17,29094924",
"17,29891768", "17,30861622", "17,31943819", "17,33088659",
"17,34202619", "17,35190237", "17,36381421", "17,37537139",
"17,38618117"), class = "factor"), x5.1 = c(0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 1L, 1L, 0L)), .Names = c("year", "country",
"id", "stage1", "x1", "x2", "x3", "x4", "x5", "x5.1"), class = "data.frame", row.names = c(NA,
-26L))
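A minimal sketch of one way to get per-outcome marginal effects, under stated assumptions: it swaps in MASS::polr (which fits the same proportional-odds model as lrm) because its predict method returns one probability column per outcome, it uses simulated stand-in data rather than the dataset above, and the helper name ame is hypothetical. It reports average marginal effects by finite differences, which is not identical to Stata's mfx default of evaluating at the means.

```r
library(MASS)  # polr(): proportional-odds logistic regression

set.seed(1)
# Simulated stand-in data (hypothetical; not the asker's dataset)
n  <- 300
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
latent <- 0.8 * x1 - 0.5 * x2 + rlogis(n)
Y  <- cut(latent, breaks = c(-Inf, -0.5, 1, Inf),
          labels = c("0", "1", "2"), ordered_result = TRUE)
sim <- data.frame(Y, x1, x2)

fit <- polr(Y ~ x1 + x2, data = sim, Hess = TRUE)

# Average marginal effect of a continuous predictor on P(Y = k),
# approximated by a small finite difference in that predictor.
ame <- function(fit, data, var, eps = 1e-4) {
  shifted <- data
  shifted[[var]] <- shifted[[var]] + eps
  p0 <- predict(fit, newdata = data,    type = "probs")
  p1 <- predict(fit, newdata = shifted, type = "probs")
  colMeans((p1 - p0) / eps)
}

me_x1 <- ame(fit, sim, "x1")
me_x1  # one effect per outcome level; the three effects sum to about zero
```

The same predict call, applied to a grid of values for the two interacting variables with the others held fixed, gives predicted probabilities that can be plotted to show the interaction.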
I am using distributed lag non-linear models. I ran a glm model with a cross-basis matrix from the dlnm package. When I tried to get the predictions, I got this error:
Error in crosspred(cbpm1, Tp1, by = 1, bylag = 1, at = speimin:speimax) :
coef/vcov not consistent with basis matrix. See help(crosspred).
This happens with lags 1, 2, and 3, but there is no error with lags 0, 4, and 5. I read a similar question at this link, but I still cannot figure out what is wrong with my own code. Any help would be much appreciated. Thanks.
The code is:
Dis <- ss$dis1
vkt <- equalknots(ss$T,nk=2)
lkt = logknots(1,nk=2)
vkpm <- equalknots(ss$spei3,nk=2)
lkpm <- logknots(1,nk=2)
speimin <- min(ss$spei3, na.rm = TRUE)
speimax <- max(ss$spei3, na.rm = TRUE)
cbt1 = crossbasis(ss$T, lag=1, argvar=list(fun="bs",degree=2,knots=vkt), arglag=list(knots=lkt))
cbpm1 <- crossbasis(ss$spei3, lag=1, argvar=list(fun="bs",degree=2,knots=vkpm), arglag=list(knots=lkpm))
Tp1 <- glm(Dis ~ cbt1 + cbpm1 + ns(RH,3)+ns(timeseries,2*5),
family=poisson(link=log),ss)
at=speimin:speimax
predsltp1 <- crosspred(cbpm1,Tp1,by=1,bylag=1,at=speimin:speimax)
Here are the libraries I load:
library(splines);library(class);library(stats);library(mda)
library(akima);library(gam);library(mgcv);library(foreign);library(som)
library(dlnm) #equalknots logknots crossbasis
library(splines) #ns
library(magrittr)
Here is the reproducible sample of my dataset:
a<-structure(list(job = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "all", class = "factor"),
age3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "00_05", class = "factor"),
sexA = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L,
2L, 1L, 2L, 2L, 1L), .Label = c("F", "M"), class = "factor"),
All = c(65L, 53L, 92L, 68L, 81L, 103L, 144L, 92L, 44L, 40L,
54L, 19L, 55L, 61L, 72L, 89L, 77L, 68L, 71L, 27L, 15L, 18L,
39L, 52L, 52L, 58L, 27L, 44L, 32L, 37L), dis1 = c(6L, 0L,
9L, 0L, 0L, 0L, 9L, 0L, 3L, 6L, 3L, 0L, 0L, 3L, 6L, 0L, 9L,
3L, 0L, 3L, 6L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L), dis2 = c(3L,
6L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 3L, 0L, 0L, 6L, 6L,
0L, 0L, 0L, 3L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
T = c(20.39032258, 20.39032258, 19.78387097, 19.78387097,
19.64193548, 19.64193548, 18.78709677, 18.78709677, 19.17419355,
19.17419355, 20.46774194, 21.63870968, 21.85806452, 21.85806452,
19.73448276, 19.73448276, 20.55357143, 20.55357143, 19.925,
29.12580645, 29.12580645, 29.39354839, 29.39354839, 28.96129032,
28.96129032, 27.36666667, 27.40333333, 27.40333333, 27.82333333,
27.82333333), RH = c(70.09677419, 70.09677419, 70.03225806,
70.03225806, 70.35483871, 70.35483871, 72.32258065, 72.32258065,
69.80645161, 69.80645161, 74.58064516, 77.58064516, 71.32258065,
71.32258065, 75.82758621, 75.82758621, 62.28571429, 62.28571429,
72.60714286, 77.61290323, 77.61290323, 75.06451613, 75.06451613,
75.61290323, 75.61290323, 76.03333333, 76.23333333, 76.23333333,
75.03333333, 75.03333333), PP = c(11.5, 11.5, 44.5, 44.5,
25.9, 25.9, 14, 14, 5, 5, 35.7, 34.1, 30.8, 30.8, 44.4, 44.4,
15.6, 15.6, 40.7, 184, 184, 137.1, 137.1, 377, 377, 110.5,
129.8, 129.8, 292, 292), spei3 = c(0.447495072, 0.447495072,
1.537295165, 1.537295165, 1.285067571, 1.285067571, 0.441010834,
0.441010834, 1.505630159, 1.505630159, 1.725831329, 1.075029338,
-1.227673724, -1.227673724, 0.329690702, 0.329690702, 0.724314874,
0.724314874, 1.228544608, 0.60782059, 0.60782059, 0.191804009,
0.191804009, 1.752145476, 1.752145476, 1.94554333, 1.139058482,
1.139058482, -0.554472376, -0.554472376), timeseries = 1:30), class = "data.frame", row.names = c(NA,
-30L))
I'm very inexperienced with R, but I'm required to use it for the statistics class I'm taking. I'm trying to make a dot plot using
library(lattice)
dotplot(Bio$SS,
main = "Plants by Number of Short Shoots",
xlab = "Number of Short Shoots",
ylab = "Number of Plants")
However, the graph doesn't provide a count for the y-value. It looks like this instead:
As you can see, there are no y-values given to the dot plot, even though it should be listing the number of plants with each value. When I made a histogram using a similar formula it worked fine:
hist(Bio$SS,
main = "Plants by Number of Short Shoots",
xlab = "Number of Short Shoots",
ylab = "Number of Plants",
col = "green")
Here is how that turned out:
This one properly provided the count for a y-value. How can I make the dotplot do the same?
Here's the data I'm using:
structure(list(ï..Block = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), Treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("NFCT", "NFNP", "SFCT", "SFNP"), class = "factor"),
Plant = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), Stem = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L), SS = c(4L, 2L, 3L, 2L,
1L, 2L, 5L, 5L, 4L, 4L, 5L, 3L, 3L, 2L, 4L, 2L, 6L, 3L, 10L,
2L, 5L, 2L, 6L, 2L, 4L), LS = c(4L, 7L, 1L, 7L, 7L, 6L, 5L,
5L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 4L, 1L, 4L, 4L, 4L,
2L, 4L, 1L), Leaves = c(30L, 30L, 13L, 32L, 32L, 35L, 33L,
34L, 27L, 23L, 21L, 20L, 25L, 24L, 25L, 25L, 24L, 25L, 29L,
20L, 20L, 22L, 25L, 23L, 13L), Inf. = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), TLength = c(10.5, 11.2, 6.2, 12.2, 11.3,
11.5, 11.9, 11.7, 10, 11.5, 10.9, 12.2, 12.6, 12.2, 12.1,
12, 6.5, 6.7, 13, 6.2, 7.6, 5.9, 7.7, 6, 5.6)), row.names = c(NA,
25L), class = "data.frame")
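A possible sketch of a count-based dot plot for the SS column, offered as an assumption about what is wanted: lattice's dotplot draws the values themselves rather than tallying them, so this uses base graphics instead. The vector ss is copied from the data above; the names ss and counts are illustrative.

```r
# SS column copied from the data posted above
ss <- c(4, 2, 3, 2, 1, 2, 5, 5, 4, 4, 5, 3, 3, 2, 4, 2, 6, 3, 10,
        2, 5, 2, 6, 2, 4)

# Number of plants at each short-shoot count
counts <- table(ss)
counts

# Stacked dot plot: one dot per plant, stacked above its SS value
stripchart(ss, method = "stack", offset = 0.5, at = 0, pch = 16,
           main = "Plants by Number of Short Shoots",
           xlab = "Number of Short Shoots")

# Alternative with a labelled count axis: one dot per distinct SS value
plot(as.numeric(names(counts)), as.vector(counts), pch = 16,
     main = "Plants by Number of Short Shoots",
     xlab = "Number of Short Shoots", ylab = "Number of Plants")
```

The first plot conveys the count by the height of each stack; the second puts the count on the y-axis, as the histogram does.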
I am trying to draw a scatterplot of two variables, split by two grouping variables:
ggplot(terr, aes(x = Killed, y = Terr..Attacks,group=Religion,Macro.Region)) +
geom_point() +
geom_smooth()
but I didn't get the result I wanted.
How can I create a scatterplot by groups?
terr=structure(list(Macro.Region = structure(c(5L, 4L, 4L, 3L, 4L,
6L, 1L, 2L, 4L, 3L, 6L, 5L, 4L, 4L, 3L, 4L, 6L, 1L, 2L, 4L, 3L,
6L), .Label = c("Arab Countries", "Asia", "Eastern Europe and post-Soviet",
"Latin America", "Sub-Saharan Africa", "Western States"), class = "factor"),
Killed = c(0L, 0L, 0L, 6L, 0L, 0L, 1L, 76L, 0L, 0L, 36L,
0L, 0L, 0L, 6L, 0L, 0L, 1L, 76L, 0L, 0L, 36L), Terr..Attacks = c(2L,
0L, 2L, 2L, 0L, 9L, 3L, 88L, 0L, 0L, 6L, 2L, 0L, 2L, 2L,
0L, 9L, 3L, 88L, 0L, 0L, 6L), Religion = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 1L), .Label = c("Christianity", "Islam"
), class = "factor"), GDP.capita = c(6813L, 26198L, 20677L,
9098L, NA, 49882L, 51846L, 4207L, 17508L, 18616L, 46301L,
6813L, 26198L, 20677L, 9098L, NA, 49882L, 51846L, 4207L,
17508L, 18616L, 46301L)), class = "data.frame", row.names = c(NA,
-22L))
ggplot(terr, aes(x = Killed, y = Terr..Attacks)) +
geom_point(alpha=1/4) +
facet_wrap(Religion ~ Macro.Region)
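As an alternative to faceting, a rough sketch of mapping the two grouping variables to colour and shape in a single panel. The data frame terr_demo is a hypothetical name holding a few rows taken from terr so that the example is self-contained; whether one panel or facets reads better depends on how many groups there are.

```r
library(ggplot2)

# A few rows taken from terr (same column names), for illustration only
terr_demo <- data.frame(
  Killed        = c(0, 6, 1, 76, 36, 0),
  Terr..Attacks = c(2, 2, 3, 88, 6, 9),
  Religion      = factor(c("Christianity", "Christianity", "Islam",
                           "Islam", "Christianity", "Christianity")),
  Macro.Region  = factor(c("Sub-Saharan Africa",
                           "Eastern Europe and post-Soviet",
                           "Arab Countries", "Asia",
                           "Western States", "Western States"))
)

# Each aesthetic must be named: colour for one grouping, shape for the other
p <- ggplot(terr_demo,
            aes(x = Killed, y = Terr..Attacks,
                colour = Religion, shape = Macro.Region)) +
  geom_point(size = 3)
p
```

In the original call, Macro.Region was passed to aes() without a name, so it was not mapped to any visual property.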
I am running nonlinear PCA in R, using the homals package. Here is a chunk of the code I am using as an example:
res1 <- homals(data = mydata, rank = 1, ndim = 9, level = "nominal")
res1 <- rescale(res1)
I want to generate 1000 bootstrap estimates of the eigenvalues in this analysis (with replacement), but I can't figure out the code. Does anyone have any suggestions?
Sample data:
dput(head(mydata, 30))
structure(list(`W age` = c(45L, 43L, 42L, 36L, 19L, 38L, 21L,
27L, 45L, 38L, 42L, 44L, 42L, 38L, 26L, 48L, 39L, 37L, 39L, 26L,
24L, 46L, 39L, 48L, 40L, 38L, 29L, 24L, 43L, 31L), `W education` = c(1L,
2L, 3L, 3L, 4L, 2L, 3L, 2L, 1L, 1L, 1L, 4L, 2L, 3L, 2L, 1L, 2L,
2L, 2L, 3L, 3L, 4L, 4L, 4L, 2L, 4L, 4L, 4L, 1L, 3L), `H education` = c(3L,
3L, 2L, 3L, 4L, 3L, 3L, 3L, 1L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 2L,
2L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 4L), `N children` = c(10L,
7L, 9L, 8L, 0L, 6L, 1L, 3L, 8L, 2L, 4L, 1L, 1L, 2L, 0L, 7L, 6L,
8L, 5L, 1L, 0L, 1L, 1L, 5L, 8L, 1L, 0L, 0L, 8L, 2L), `W religion` = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), `W employment` = c(1L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L), `H occupation` = c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 2L, 4L, 2L, 2L,
2L, 2L, 4L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 2L, 2L, 1L), `Standard of living` =
c(4L,
4L, 3L, 2L, 3L, 2L, 2L, 4L, 2L, 3L, 3L, 4L, 3L, 3L, 1L, 4L, 4L,
3L, 1L, 1L, 1L, 4L, 4L, 4L, 3L, 4L, 4L, 2L, 4L, 4L), Media = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Contraceptive = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("W age",
"W education", "H education", "N children", "W religion", "W employment",
"H occupation", "Standard of living", "Media", "Contraceptive"
), row.names = c(NA, 30L), class = "data.frame")
I was given the rescale function to use with the homals package, to do optimal scaling. Here is the function:
rescale <- function(res) {
# Rescale homals results to proper scaling
n <- nrow(res$objscores)
m <- length(res$catscores)
res$objscores <- (n * m)^0.5 * res$objscores
res$scoremat <- (n * m)^0.5 * res$scoremat
res$catscores <- lapply(res$catscores, FUN = function(x) (n * m)^0.5 * x)
res$cat.centroids <- lapply(res$cat.centroids, FUN = function(x) (n * m)^0.5 * x)
res$low.rank <- lapply(res$low.rank, FUN = function(x) n^0.5 * x)
res$loadings <- lapply(res$loadings, FUN = function(x) m^0.5 * x)
res$discrim <- lapply(res$discrim, FUN = function(x) (n * m)^0.5 * x)
res$eigenvalues <- n * res$eigenvalues
return(res)
}
The standard way to bootstrap in R is to use the boot package, which ships with R as a recommended package.
I am not very satisfied with the code that follows because it throws lots of warnings, but maybe this is due to the dataset I tested it with. I used the dataset and the 3rd example in help("homals").
I have run 10 bootstrap replicates only.
library(homals)
library(boot)
boot_eigen <- function(data, indices){
d <- data[indices, ]
res <- homals(d, active = c(rep(TRUE, 4), FALSE), sets = list(c(1,3,4),2,5))
res$eigenvalues
}
data(galo)
set.seed(7578) # Make the results reproducible
eig <- boot(galo, boot_eigen, R = 10)
eig
#
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = galo, statistic = boot_eigen, R = 10)
#
#
#Bootstrap Statistics :
# original bias std. error
#t1* 0.1874958 0.03547116 0.005511776
#t2* 0.2210821 -0.02478596 0.005741331
colMeans(eig$t)
#[1] 0.2229669 0.1962961
If this also doesn't run properly in your case, please say so and I will delete the answer.
EDIT.
In response to the discussion in the comments, I have changed the function boot_eigen: the call to homals now follows the code in the question, and rescale is called before returning.
boot_eigen <- function(data, indices){
d <- data[indices, ]
res <- homals(data = d, rank = 1, ndim = 9, level = "nominal")
res <- rescale(res)
res$eigenvalues
}
set.seed(7578) # Make the results reproducible
eig <- boot(mydata, boot_eigen, R = 10)
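To reach the 1000 replicates asked for in the question, R = 1000 can be passed to boot, and the replicates in the t component can then be summarised. The sketch below shows that pattern with a stand-in statistic (eigenvalues of a correlation matrix on the built-in mtcars data) so that it runs without homals; the names boot_eigen_demo and eig_demo are hypothetical.

```r
library(boot)

# Stand-in statistic (assumption): eigenvalues of a correlation matrix,
# used here only so the example runs without homals.
boot_eigen_demo <- function(data, indices) {
  d <- data[indices, ]
  eigen(cor(d), symmetric = TRUE, only.values = TRUE)$values
}

set.seed(7578)  # Make the results reproducible
eig_demo <- boot(mtcars[, c("mpg", "hp", "wt")], boot_eigen_demo, R = 1000)

dim(eig_demo$t)  # one row per replicate, one column per eigenvalue

# Percentile intervals for each eigenvalue, computed from the replicates
t(apply(eig_demo$t, 2, quantile, probs = c(0.025, 0.975)))

# The same kind of interval via boot.ci(), here for the first eigenvalue
boot.ci(eig_demo, type = "perc", index = 1)
```

With homals the statistic function is slower, so 1000 replicates may take a while.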
I have a line plot of some event at a hospital that I have been struggling with.
The challenges I haven't solved yet are: 1) sorting the lines on the plot so that the patient lines are ordered by Assessment date; 2) coloring the lines by the variable 'openCase'; and 3) removing the Discharge point (the blue square) for cases in the year 2014 (or after some other arbitrary cut-off date).
Any help would be appreciated.
Here is my sample data:
library(ggplot2)
library(plyr)
df <- data.frame(
date = seq(Sys.Date(), len= 156, by="5 day")[sample(156, 78)],
openCase = rep(0:1, 39),
patients = factor(rep(1:26, 3), labels = LETTERS)
)
df <- ddply(df, "patients", mutate, visit = order(date))
df$visit <- as.factor(df$visit)
levels(df$visit) <- c("Assessment (1)", "Treatment (2)", "Discharge (3)")
qplot(date, patients, data = df, geom = "line") +
geom_point(aes(colour = visit), size = 2, shape=0)
I'm aware that my example data is not perfect, as some of the assessment dates fall after the treatments and some of the discharge dates fall before the assessments, but that is part of the challenge: my base data is messed up.
What it looks like at the moment,
Update 2012-04-30 16:30:13 PDT
My data is delivered from a database and looks something like this:
df <- structure(list(date = structure(c(15965L, 15680L, 16135L, 15730L,
15920L, 15705L, 16110L, 15530L, 15575L, 15905L, 16140L, 15795L,
15955L, 15945L, 16205L, 15675L, 15525L, 15830L, 15625L, 15725L,
15855L, 15840L, 15615L, 15500L, 15780L, 15765L, 15610L, 15690L,
16080L, 15570L, 15685L, 16175L, 15740L, 15600L, 15985L, 15485L,
15605L, 16115L, 15535L, 15755L, 16145L, 16040L, 15970L, 16000L,
16075L, 15995L, 16010L, 15990L, 15665L, 15895L, 15865L, 16120L,
15880L, 15930L, 16055L, 15820L, 15650L, 16155L, 15700L, 15640L,
15505L, 15750L, 15800L, 15775L, 15825L, 15635L, 16150L, 15860L,
16100L, 15475L, 16050L, 15785L, 15495L, 15810L, 15805L, 15490L,
15460L, 16085L), class = "Date"), openCase = c(0L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L), patients = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L,
11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L,
16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 20L,
20L, 20L, 21L, 21L, 21L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L,
24L, 25L, 25L, 25L, 26L, 26L, 26L), .Label = c("A", "B", "C",
"D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P",
"Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"), class = "factor"),
visit = structure(c(2L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 1L, 3L,
1L, 2L, 2L, 1L, 3L, 2L, 1L, 3L, 1L, 2L, 3L, 3L, 2L, 1L, 3L,
2L, 1L, 3L, 1L, 2L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L,
3L, 2L, 1L, 2L, 3L, 3L, 1L, 2L, 1L, 3L, 2L, 2L, 3L, 1L, 3L,
2L, 1L, 3L, 2L, 1L, 1L, 2L, 3L, 3L, 1L, 2L, 2L, 3L, 1L, 1L,
3L, 2L, 1L, 3L, 2L, 2L, 1L, 3L), .Label = c("zym", "xov", "poi"
), class = "factor")), .Names = c("date", "openCase", "patients",
"visit"), row.names = c(NA, -78L), class = "data.frame")
The number of levels in visit, and their specific labels, will most likely change, so I would like code that ranks or sorts based on my existing data (visit) instead of generating new variables.
This gets you part of the way, starting from just after your initial definition of the data.
First, I think you want rank(date) rather than order(date) -- it made more sense to me, anyway.
df <- ddply(df, "patients", mutate, visit = rank(date))
df$visit <- as.factor(df$visit)
levels(df$visit) <- c("Assessment (1)", "Treatment (2)", "Discharge (3)")
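The difference between rank and order mentioned above can be seen on a small vector; the three dates here are made up for illustration.

```r
dates <- as.Date(c("2014-03-01", "2014-01-15", "2014-02-10"))

order(dates)  # 2 3 1: the indices that would sort the vector
rank(dates)   # 3 1 2: the position of each element once sorted
```

Only rank gives, for each row, which visit it is in chronological order, which is what the visit label needs.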
Reorder patients by minimum date value (= Assessment date):
df$patients <- reorder(df$patients,df$date,function(x) min(as.numeric(x)))
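A small illustration of what this reorder call does, using made-up patients and dates: the factor levels are rearranged by each patient's earliest date, and ggplot then draws the levels in that order.

```r
patients <- factor(c("A", "A", "B", "B", "C", "C"))
dates <- as.Date(c("2014-05-01", "2014-06-01",
                   "2014-01-10", "2014-09-01",
                   "2014-03-15", "2014-04-01"))

# Levels are sorted by the minimum date within each patient
reordered <- reorder(patients, dates, function(x) min(as.numeric(x)))
levels(reordered)  # "B" "C" "A"
```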
Create a new data set that drops the Discharge points dated after Jan 1 2014 (if you instead wanted to drop the Discharge point for cases that were assessed after a given date, you'd need to use ddply):
df2 <- subset(df,!(visit=="Discharge (3)" & date > as.Date("2014-01-01")))
As @Joran pointed out above, it's a bit hard to get two separate colour scales for different variables, but this sort-of works (you have to make openCase into a factor in order to combine it with the colour scale for visit):
ggplot(df, aes(date, patients)) + geom_line(aes(colour=factor(openCase))) +
geom_point(data=df2,aes(colour = visit), size = 2, shape=0)
Alternately (and I think this is prettier anyway), you could code openCase with line type:
ggplot(df, aes(date, patients)) + geom_line(aes(linetype=factor(openCase))) +
geom_point(data=df2,aes(colour = visit), size = 2, shape=0)
I'm still not sure I understand what is wrong with @Ben's answer, but I'll try adding one of my own, starting with the df given in the edit.
Create a new variable Visit (note the capital V) which is Assessment/Treatment/Discharge based on the ordering of the dates given. This is @Ben's code, just rewritten.
df <- ddply(df, "patients", mutate,
Visit = factor(rank(date),
levels = 1:3,
labels=c("Assessment (1)", "Treatment (2)", "Discharge (3)")))
I don't understand how this relates to the visit column in the data originally; in fact, the original visit column is not used hereafter:
> table(df$Visit, df$visit)
zym xov poi
Assessment (1) 16 7 3
Treatment (2) 3 16 7
Discharge (3) 7 3 16
Reorder the patients (again copying Ben):
df$patients <- reorder(df$patients,df$date,function(x) min(as.numeric(x)))
Determine the subset of points that should be shown (same idea as Ben, but different code):
df2 <- df[!((df$Visit == "Discharge (3)") & (df$date > as.Date("2014-01-01"))),]
To add something new, here is a way to make the lines different colors without affecting the legend:
ggplot(df, aes(date, patients)) +
geom_blank() +
geom_line(data = df[df$openCase == 0,], colour = "black") +
geom_line(data = df[df$openCase == 1,], colour = "red") +
geom_point(data = df2, aes(colour = Visit), size = 2, shape = 0)