Using lmer for multiple dependent variables - r

Here is an example of my dataset:
df <- data.frame(
id = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
collection_point = c(rep(c("Baseline", "Immediate", "3M"), each=28)),
intervention = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C",
"A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)),
scale_A = c(6.5, 7.0, 6.25, 6.0, NA, 7.5, 7.5,
8.0, 7.5, 6.75, 7.5, 6.75, 6.75, 6.5,
5.75, 6.75, 7.75, 7.5, 7.75, 7.25, 7.75,
7.25, 7.25, 5.75, 6.75, NA, 6.75, 7.5,
6.75, 7.0, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.25, 7.25, 7.25, 7.5, 6.5, 6.25,
6.25, 7.25, 7.5, 6.75, 7.25, 7.25, 7.5,
7.25, 7.5, 7.25, NA, 7.0, 7.5, 7.5,
6.75, 7.25, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.5, 7.5, 7.5, 7.5, 6.5, 5.75,
6.25, 6.75, 7.5, 7.25, 7.25, 7.5, 7.75,
7.75, 7.75, 7.5, NA, NA, NA, NA))
scale_B = c(5.0, 6.5, 6.25, 7.0, NA, 5.5, 6.5,
6.0, 7.5, 5.75, 6.5, 5.75, 7.75, 6.5,
6.75, 7.75, 7.75, 7.5, 7.75, 5.25, 7.75,
6.25, 6.25, 6.75, 5.75, NA, 6.75, 6.5,
7.75, 6.0, 7.5, 6.0, 7.5, 7.5, 6.5,
6.75, 6.25, 6.25, 6.25, 6.5, 6.5, 7.25,
7.25, 6.25, 6.5, 7.75, 6.25, 7.25, 6.5,
6.25, 6.5, 6.25, NA, 7.0, 6.5, 7.5,
7.75, 6.25, 7.5, 6.0, 7.5, 6.5, 6.5,
6.75, 6.5, 6.5, 6.5, 7.5, 7.5, 6.75,
7.25, 7.75, 6.5, 6.25, 7.25, 6.5, 6.75,
6.75, 6.75, 6.5, 5.5, NA, NA, 6.5))
scale_C = c(5.5, 5.0, 7.25, 7.0, 8.0, 5.5, 5.5,
8.0, 5.5, 7.75, 5.5, 7.75, 7.75, 7.5,
7.75, 7.75, 5.75, 5.5, 5.75, 5.25, 5.75,
5.25, 6.25, 7.75, 7.75, NA, 7.75, 5.5,
6.75, 6.0, 7.5, 5.0, 5.5, 5.5, 7.5,
5.75, 6.25, 5.25, 5.25, 5.5, 7.5, 7.25,
7.25, 6.25, 5.5, 7.75, 5.25, 5.25, 7.5,
5.25, 6.5, 5.25, 5.0, 5.0, 5.5, 5.5,
7.75, 6.25, 7.5, 5.0, 5.5, 5.5, 7.5,
5.75, 6.5, 5.5, 5.5, 5.5, 7.5, 7.75,
7.25, 7.75, 5.5, 5.25, 5.25, 5.5, 6.75,
5.75, 5.75, 5.5, 6.75, NA, 5.75, NA))
where,
id = participant
collection_point = times data was collected from participant (repeated measure)
intervention = group each participant was randomized to (fixed effect)
scale_A = questionnaire score that each participant completed at each data collection point (outcome)
Participants were randomized to one of three interventions and completed the same scales (scales A-C) at three different time points to determine any improvements over time.
I have used the code
mixed.lmer.A1<-lmer(scale_A~intervention+(collection_point|id), control =
lmerControl(check.nobs.vs.nRE = "ignore"), data = df)
but I would like to run MANOVA as all scales measure different aspects of a cohesive theme. However, I can't run
mixed.lmer.comb<-lmer(cbind(scale_A, scale_B, scale_C)~intervention+
(collection_point|id), control = lmerControl(check.nobs.vs.nRE = "ignore"),
data = df)
like I originally thought. It does work if I run using lm but that wouldn't be super meaningful as I need to account for collection_point as a repeated measure.
Is there a way I can run multiple dependent variables using lmer?

You can do this by converting the data to long format: there are a lot of ways to do this, e.g. reshape in base R or reshape2::melt, but I find tidyr::pivot_longer the easiest:
df_long <- tidyr::pivot_longer(df, cols = starts_with("scale_"),
names_to = "scales",
values_to = "value")
The fixed effects are 0 + scales + scales:intervention: we don't want an overall intercept, we want a scale-specific intercept, plus intervention effects for each scale.
The random effects are collection_point|scales/id: this allows the effect of collection point to vary across scales and across id (as in the original model).
mm <- lmer(value ~ 0 + scales + scales:intervention + (collection_point|scales/id),
data = df_long,
control = lmerControl(check.nobs.vs.nRE = "ignore"))
This model is technically correct, but gives a singular fit (as is not surprising since we are trying to estimate a variance across only three levels of scales); see ?isSingular, or the GLMM FAQ, for advice about to handle this.
This is not the only model we could set up; a maximal model would include more terms.
Some further comments:
One principle is that, since the elements of multivariate data generally have different units, we should not have any terms in the model that apply across scales (such as an overall intercept, or an overall effect of intervention); this might not apply in your particular case, I don't know
it is unusual (and often, although not always, wrong) to have a term varying in a random effect (collection_point in this case) that does not have a corresponding fixed effect — by doing so you are assuming that the average (population-level) effect of collection point is exactly zero, which is surprising unless (1) there's something special about the experimental design (e.g. observations were somehow randomized across collection points), (2) you have pre-processed the data in some way (e.g. explicitly transformed the data to have zero variance across collection points), (3) are setting up a null model for comparison.
I'm a little concerned about your need to override the check that you have fewer random effects estimated than observations; I haven't looked into this in detail, but that usually means your model is overfitting in some way. (Maybe it's because we're looking at a subset of the data, and this doesn't come up in the full data set?)
More here.

Related

Multiple bands per group using plot_model

Here is an example of my dataset:
df <- data.frame(
id = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
collection_point = c(rep(c("Baseline", "Immediate", "3M"), each=28)),
intervention = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C",
"A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)),
scale_A = c(6.5, 7.0, 6.25, 6.0, NA, 7.5, 7.5,
8.0, 7.5, 6.75, 7.5, 6.75, 6.75, 6.5,
5.75, 6.75, 7.75, 7.5, 7.75, 7.25, 7.75,
7.25, 7.25, 5.75, 6.75, NA, 6.75, 7.5,
6.75, 7.0, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.25, 7.25, 7.25, 7.5, 6.5, 6.25,
6.25, 7.25, 7.5, 6.75, 7.25, 7.25, 7.5,
7.25, 7.5, 7.25, NA, 7.0, 7.5, 7.5,
6.75, 7.25, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.5, 7.5, 7.5, 7.5, 6.5, 5.75,
6.25, 6.75, 7.5, 7.25, 7.25, 7.5, 7.75,
7.75, 7.75, 7.5, NA, NA, NA, NA))
where,
id = participant
collection_point = times data was collected from participant (repeated measure)
intervention = group each participant was randomized to (fixed effect)
scale_A = questionnaire score that each participant completed at each data collection point (outcome)
Participants were randomized to one of three interventions and completed the same scales (scales A-C) at three different time points to determine any improvements over time.
I have used the code
mixed.lmer.A1<-lmer(scale_A~intervention*collection_point+(collection_point|id), control =
lmerControl(check.nobs.vs.nRE = "ignore"), data = df)
I can use plot_model(mixed.lmer.A1) and use the function terms to select only the interaction effects (ex: terms = c("interventionB:collection_point3M") to create a forest plot. However, I think it would look much neater to only have the interventions on the y axis and have multiple bands that represent each collection_point. Desired output like this:
Any idea how I can do this? Thanks!
Here is one solution:
library(ggplot2)
library(lme4)
#> Loading required package: Matrix
library(sjPlot)
#> Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!
df <- data.frame(
id = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
collection_point = c(rep(c("Baseline", "Immediate", "3M"), each=28)),
intervention = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C",
"A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)),
scale_A = c(6.5, 7.0, 6.25, 6.0, NA, 7.5, 7.5,
8.0, 7.5, 6.75, 7.5, 6.75, 6.75, 6.5,
5.75, 6.75, 7.75, 7.5, 7.75, 7.25, 7.75,
7.25, 7.25, 5.75, 6.75, NA, 6.75, 7.5,
6.75, 7.0, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.25, 7.25, 7.25, 7.5, 6.5, 6.25,
6.25, 7.25, 7.5, 6.75, 7.25, 7.25, 7.5,
7.25, 7.5, 7.25, NA, 7.0, 7.5, 7.5,
6.75, 7.25, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.5, 7.5, 7.5, 7.5, 6.5, 5.75,
6.25, 6.75, 7.5, 7.25, 7.25, 7.5, 7.75,
7.75, 7.75, 7.5, NA, NA, NA, NA))
mixed.lmer.A1 <- lmer(scale_A~intervention*collection_point+(collection_point|id), control =
lmerControl(check.nobs.vs.nRE = "ignore"), data = df)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> unable to evaluate scaled gradient
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge: degenerate Hessian with 1 negative eigenvalues
plot_model(mixed.lmer.A1, type = "int") +
coord_flip()
Created on 2021-12-13 by the reprex package (v2.0.1)

Saving outputs as dataframe in loop

Here I have tried to test my data for presence of acf and select a trend test method for daily climate station data ordered into multiple columns. The code intends to work as such:
find monthly average
test the acf
Select trend test method
save result of trend test for each month in excel.
so far i have managed to write a code that does for each month. But, I encountered a problem saving the result as dataframe and export into excel(Error in Result[k, ] <- Outcome : incorrect number of subscripts on matrix). Can any one help me? I have attached a sample data and code I have written.
Data:
structure(list(Date = structure(c(5479, 5480, 5481, 5482, 5483,
5484, 5485, 5486, 5487, 5488, 5489, 5490, 5491, 5492, 5493, 5494,
5495, 5496, 5497, 5498, 5499, 5500, 5501, 5502, 5503, 5504, 5505,
5506, 5507, 5508, 5509, 5510, 5511, 5512, 5513, 5514), class = "Date"),
Adaba = c(6.7, 7.6, 4.9, 6.2, 7.8, 3.1, 4.5, 4.9, 4.2, 5.8,
6.7, 6.1, 5.7, 5.8, 6.4, 5.3, 5.1, 7.6, 7.1, 5.8, 6.7, 6.5,
8.9, 7.6, 7.6, 11.3, 9.5, 11.3, 7.8, 7.6, 6.7, 7.1, 7.6,
7.5, 6.7, 6.5), Bedessa = c(15.1, 14.1, 10.8, 9.9, 10.7,
10.7, 12.4, 13.5, 13, 11.4, 12.9, 13, 13.6, 13, 10.8, 11.9,
13, 10.8, 9.7, 10.8, 9.2, 8.7, 9.2, 10.9, 9.7, 8.8, 12, 10.8,
11.4, 10.3, 10.8, 14.1, 13.5, 13, 14.1, 15.5), Beletu = c(15.3,
14.9, 15.1, 15.7, 15.5, 15.3, 14.8, 15.3, 15.5, 15.2, 14.7,
15.8, 15.9, 14.6, 13.7, 15.2, 15.3, 15.7, 16.2, 15, 15.4,
12.5, 12.6, 12.9, 13.4, 13.2, 11.5, 11.6, 11.7, 12.5, 12.6,
12.6, 12.7, 12, 10.7, 11.8)), row.names = c(NA, 36L), class = "data.frame")`enter code here`
code:
Wabi <- read.csv("Tmin_17.csv",TRUE,",")
# makes the file as data frame
class(Wabi)
# this the package to identify the date type
library(xlsx)
library(lubridate)
library(dplyr)
library(modifiedmk)
# to make new columns of month and year from the original data
Wabi$Date <- as.Date(Wabi$Date, format("%m/%d/%Y"))
# add the package lubridate
Wabi$month <- lubridate::month(Wabi$Date)
Wabi$year <- lubridate::year(Wabi$Date)
# to view the change in the original data
head(Wabi)
N=34
Result <- matrix(nrow = 100,ncol = 2)
# this function is written to sum monthly values
for (k in 1:192){
for(j in 2:17) {
colnum <- colnames(Wabi[j])
Wabi_mon <- Wabi%>%group_by(year,month)%>%summarise_at(.vars = colnum,.funs = mean)
for (i in 1:12)
{
test = acf((Wabi_mon %>% group_by(month) %>% filter(month == i))[3],lag.max = 1)
Trendtest1 <- as.data.frame(mmky(as.data.frame((Wabi_mon %>% group_by(month) %>% filter(month == i))[3])[,]))
Trendtest2 <- as.data.frame(mkttest(as.data.frame((Wabi_mon %>% group_by(month) %>% filter(month == i))[3])[,]))
if (abs(test$acf[2])>abs(((-1-1.96*(N-1)^0.5))/N-1))
Outcome <- Trendtest1
else
Outcome <- Trendtest2
Result[k,] <- Outcome
}
}
}
Result <- data.frame(Result)
class(Result)
write.xlsx(Result,file = "tmin_trend.xlsx",sheetName = "Sheet1")

Kolmogorov-Smirnov test in R - For-loop

I have a problem with comparing two sets of curves by using the Kolmogorow-Smirnow-test.
What I would like the program to do, is to compare each variation of Curve 1 with each variation of Curve 2. To accomplish that, I have tried to build a for-loop that iterates through Curve 1, and within that loop another loop that iterates through Curve 2.
Unfortunately, when executing the code, I get an error message about
"not enough x-Data“
When I try running the test by comparing one variation of each curve manually, it works, so I think the problem is the combination of the two loops and the KS-test.
If anyone has experienced a similar error and was able to solve the issue, I would highly appreciate any advice on how to fix it. Thank you!
Example data.frames:
Kurve1 <- structure(list(Punkte = 1:21,
Trial.1 = c(105.5, 85.3, 63.1, 54.9, 42, 34.1, 30.7,
24.2, 20.1, 15.7, 14, 11, 9.3, 7.2, 6.6,
5.3, 4.2, 3.3, 2.6, 1.8, 0.9),
Trial.2 = c(103.8, 85.2, 64.3, 54.1, 41.8, 35.9, 29,
23.7, 20.2, 15.9, 13.5, 11, 9.3, 7.3, 6.4,
5.5, 4.3, 3.4, 2.5, 1.9, 0.9),
Trial.3 = c(104.8, 87.2, 64.9, 52.8, 40.8, 35.6, 29.1,
24.5, 20.4, 16.2, 13.7, 11.2, 9.2, 7.5,
6.4, 5.5, 4.2, 3.5, 2.5, 1.8, 0.9),
Trial.4 = c(106.9, 83.9, 67.1, 55.1, 44.1, 34.1, 29.3,
22.9, 19.4, 16.7, 13.6, 10.8, 9.4, 7.4,
6.1, 5.6, 4.4, 3.5, 2.4, 1.9, 0.9),
Trial.5 = c(104.8, 84.3, 68.7, 54.8, 45.3, 35.2, 28.9,
23.1, 20.1, 16.9, 13.3, 11, 9.6, 7.1, 6.3,
5.4, 4.5, 3.4, 2.3, 2, 0.9)),
class = "data.frame", row.names = c(NA, -21L))
Kurve2 <- structure(list(Punkte = 1:21,
Trial.1 = c(103.5, 81.2, 66.2, 54.5, 45.1, 39.1, 30.9,
27, 21.9, 19.3, 16.6, 14.9, 12.9, 11, 10.1,
9.2, 8, 7.1, 6.3, 6.2, 5),
Trial.2 = c(104, 81, 66.9, 55.2, 46, 38.7, 31.2, 27.3,
22.3, 20, 17.2, 15.2, 12.9, 11.1, 10.2,
9.1, 8, 7.1, 6.4, 5.9, 5),
Trial.3 = c(103.9, 81.9, 67.2, 53.8, 45.4, 38.5, 31.5,
26.8, 22.2, 19.8, 17.4, 15.1, 13, 10.9,
10.1, 9.2, 8.1, 7.1, 6.4, 6, 4.9),
Trial.4 = c(104.2, 84.1, 68.7, 55.4, 45.1, 36.3, 32,
26.9, 22.8, 19.8, 16.8, 14.8, 13.2, 10.9,
10.3, 9.1, 8.2, 7.2, 6.3, 6.1, 5),
Trial.5 = c(103.8, 83.2, 69.2, 55.7, 44.8, 36.4, 31.4,
26.7, 22.1, 18.9, 16.9, 14.4, 13, 11.1,
10.2, 9, 7.9, 7, 6.3, 6.1, 5.1)),
class = "data.frame", row.names = c(NA, -21L))
The code I used for the loop:
for(i in 1:ncol(Kurve1)){
for(j in 1:ncol(Kurve2)){
ks.test(Kurve1$Trial.[i], Kurve2$Trial.[j], alternative = "greater")
}
}
This will work:
for(i in 1:(ncol(Kurve1) - 2)){
for(j in (i + 1):(ncol(Kurve2) - 1)){
print(paste0("Trial.", i, " - Trial.", j))
ks_result <- ks.test(Kurve1[, paste0("Trial.", i)],
Kurve2[, paste0("Trial.", j)],
alternative="greater")
print(ks_result)
}
}
Explanation:
As it is doesn't make sense to run the KS test for the same column, and also doesn't make sense to run for both Trial.1 ~ Trial.2 and Trial.2 ~ Trial.1, etc., you have to run your outer for loop from 1 to the last but one ((ncol(Kurve1) - 2)) index for Trial.* columns, and you have to run your inner for loop from the next index as the outer loop has (i + 1) to the last index ((ncol(Kurve2) - 1)) for Trial.* columns.
You can not paste strings like Trial.[i], you have to use the paste function for that. As with that the Kurve1$paste0("Trial.", i) notation not working, you have to use the extract operator [ to get the column you need (Kurve1[, paste0("Trial.", i)])
As in a (nested) for loop the ks.test runs silently, a have added a print to be able to see the results. I have also added a line print(paste0("Trial.", i, " - Trial.", j)) to tag the actual result with the columns for which it belongs.

left_join does not merge all values

I'm merging two data.frames, dat1 and dat2, by temp and the merge is not providing all values for dat2. Why are values from dat2 not merging correctly?
Sample data
dat1 <- data.frame(temp = seq(0, 33.2, 0.1))
dat2 <- structure(list(temp = c(6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7,
7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8, 8.1, 8.2, 8.3,
8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6,
9.7, 9.8, 9.9, 10, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7,
10.8, 10.9, 11, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8,
11.9, 12, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9,
13, 13.1, 13.2), pprox = c(193.53, 626.8, 1055.04, 1478.24,
1896.41, 2309.55, 2717.64, 3120.69, 3518.7, 3911.66, 4299.58,
4682.45, 5060.26, 5433.03, 5800.74, 6163.39, 6520.99, 6873.53,
7221.01, 7563.43, 7900.78, 8233.07, 8560.3, 8882.46, 9199.56,
9511.59, 9818.55, 10120.44, 10417.27, 10709.03, 10995.71, 11277.33,
11553.88, 11825.36, 12091.78, 12353.13, 12609.41, 12860.63, 13106.78,
13347.87, 13583.89, 13814.86, 14040.76, 14261.61, 14477.41, 14688.14,
14893.83, 15094.47, 15290.05, 15480.59, 15666.09, 15846.55, 16021.96,
16192.34, 16357.68, 16517.98, 16673.26, 16823.51, 16968.73, 17108.93,
17244.1, 17374.25, 17499.38, 17619.5, 17734.6, 17844.68, 17949.76,
18049.82, 18144.87, 18234.91)), row.names = c(NA, 70L), class = "data.frame")
Merge
dat <- left_join(dat1, dat2, by = "temp")
Output
dat[65:70, ]
temp approx
65 6.4 626.80
66 6.5 1055.04
67 6.6 NA
68 6.7 1896.41
69 6.8 NA
70 6.9 2717.64
I converted the temp columns in both data frames to a factor, followed by left joining them together. It works!
dat1$temp <- as.factor(dat1$temp)
dat2$temp <- as.factor(dat2$temp)
dat <- left_join(dat1, dat2, by = "temp")
Hmm interestingly identical(dat2$temp[4],6.6 ) returns TRUE but identical(dat1$temp[67],6.6) returns FALSE.
Floating point issues are a known issues have a look at Why are these numbers not equal? or floating point issue in R? among many other similar posts.
If you set dat1 <- data.frame(temp = round(seq(0, 33.2, 0.1), 2)) it should fix this. Possibly check out ?all.equal as all.equal(dat1$temp[67],6.6 )
is TRUE

ggplot2 - factor colour and legend adjustment

I have written a short script to plot trends of measured radioactivity activities on 2 separate measurement devices. The script is shown below
pkgLoad <- function(x)
{
if (!require(x,character.only = TRUE))
{
install.packages(x,dep=TRUE, repos='http://star-www.st-andrews.ac.uk/cran/')
if(!require(x,character.only = TRUE)) stop("Package not found")
}
}
pkgLoad("ggplot2")
pkgLoad("XLConnect")
pkgLoad("reshape2")
#Load the workbook
wb<-loadWorkbook("CapintecQC.xlsx")
df_blue <-readWorksheet(wb, sheet = "Blue", startCol=1, endCol=6)
#sort date format
df_blue$Date <- as.Date(df_blue$Date , "%d/%m/%y")
df_blue[order(df_blue$Date ),]
df_gold <-readWorksheet(wb, sheet = "Gold", startCol=1, endCol=6)
df_gold$Date <- as.Date(df_gold$Date , "%d/%m/%y")
df_gold[order(df_gold$Date ),]
#Reference Cs-137 details
ref_activity <- 9.3
half_life <- 30.23
ref_date <- as.Date('06/01/08',format='%d/%m/%y')
blue_melt <- melt(df_blue[,c(1,2:6)], id="Date", value.name="Activity", variable.name="Isotope")
#Add new column to data frame with expected activity
df_gold["Exp_Act"] <- round(ref_activity*exp((-0.693/half_life)*as.numeric(difftime(df_gold$Date,ref_date))/365.25),3)
df_gold["Exp_Act_0.95"] <- 0.95 * df_gold$Exp_Act
df_gold["Exp_Act_1.05"] <- 1.05 * df_gold$Exp_Act
gold_melt <- melt(df_gold[,c(1,2:6)], id="Date", value.name="Activity", variable.name="Isotope")
p <- ggplot( NULL )+geom_point(data = gold_melt, aes(x=Date,y=Activity, col=Isotope)) + geom_ribbon(data = df_gold, aes(x = Date, ymin = Exp_Act_0.95, ymax = Exp_Act_1.05), fill='blue', alpha=0.2) + geom_point(data = blue_melt, aes(x=Date,y=Activity, col=Isotope), shape=2) + theme_bw()
print(p)
I am not very competent with R/ggplot2. I would like the final plot to show the measured activity for each radionuclide to be the same color for both devices (i.e Cs-137 in red, 99mTc in Blue). How can I do this as my graph plots different colours.
Also the legend is non-pleasing.
(i) The format for each nuclide, which is picked up from the excel header changes from Cs-137 to Cs.137. How can I have Cs-137, Tc-99m etc as headers?
(ii) Each radionuclide is duplicated in the legend - one for each device. Is it possible to show just the legend for the first data frame (df_gold) or better just have text in the legend, with the text color matched to the marker color in the plot?)
df_gold structure
structure(list(Date = structure(c(15708, 15709, 15712, 15713,
15714, 15715, 15716, 15719, 15720, 15721, 15722, 15723, 15726,
15727, 15729, 15730, 15733, 15734, 15735, 15736, 15740, 15741,
15743, 15747, 15748, 15749, 15750, 15751, 15754, 15755, 15756,
15757, 15758, 15761, 15762, 15764, 15765, 15768, 15769, 15770,
15771, 15772, 15775, 15776, 15777, 15779, 15782, 15783, 15784,
15785, 15786, 15789, 15790, 15791, 15792, 15797, 15798, 15799,
15800), class = "Date"), Cs..137 = c(8.2, 8.1, 8.1, 8.1, 8.1,
8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1,
8.1, 8.1, 8.1, 8.2, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8, 8.2,
8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1,
8.1, 8.1, 8.1, 8, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1, 8.1,
8.1, 8.1), In..111 = c(6.49, 6.47, 6.48, 6.43, 6.49, 6.51, 6.5,
6.47, 6.48, 6.4, 6.48, 6.48, 6.48, 6.49, 6.49, 6.47, 6.48, 6.48,
6.5, 6.47, 6.49, 6.55, 6.46, 6.49, 6.48, 6.48, 6.46, 6.48, 6.49,
6.44, 6.49, 6.46, 6.45, 6.46, 6.46, 6.43, 6.49, 6.47, 6.45, 6.43,
6.44, 6.44, 6.44, 6.46, 6.45, 6.47, 6.45, 6.43, 6.44, 6.47, 6.45,
6.46, 6.45, 6.46, 6.39, 6.46, 6.44, 6.42, 6.41), I..123 = c(6.97,
6.94, 6.96, 6.91, 6.92, 6.95, 6.93, 6.92, 6.93, 7, 6.97, 6.96,
6.96, 6.94, 6.98, 6.97, 6.95, 6.95, 6.94, 6.96, 6.97, 7.01, 6.92,
7, 6.98, 6.97, 6.91, 6.99, 6.95, 6.88, 6.96, 6.91, 6.91, 6.93,
6.94, 6.94, 6.97, 6.93, 6.93, 6.93, 6.96, 6.94, 6.94, 6.92, 6.93,
6.91, 6.93, 6.92, 6.92, 6.91, 6.91, 6.89, 6.92, 6.9, 6.9, 6.91,
6.91, 6.9, 6.9), I..131 = c(10.5, 10.5, 10.5, 10.5, 10.5, 10.5,
10.5, 10.5, 10.5, 10.8, 10.5, 10.6, 10.5, 10.5, 10.5, 10.5, 10.5,
10.5, 10.5, 10.5, 10.5, 10.6, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5,
10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.4,
10.5, 10.4, 10.5, 10.5, 10.5, 10.4, 10.5, 10.4, 10.4, 10.5, 10.4,
10.4, 10.4, 10.4, 10.4, 10.3, 10.5, 10.5, 10.5, 10.6), Tc..99m = c(15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15.1, 15, 15, 15.1, 15,
15, 15, 15, 15.1, 15, 15.1, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 14.9, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
14.9, 14.8, 14.9, 14.9, 14.9, 14.9, 15, 15, 14.8, 15, 15, 15,
15), Exp_Act = c(8.294, 8.293, 8.292, 8.291, 8.291, 8.29, 8.29,
8.288, 8.288, 8.287, 8.287, 8.286, 8.285, 8.284, 8.283, 8.283,
8.281, 8.28, 8.28, 8.279, 8.277, 8.277, 8.276, 8.274, 8.273,
8.273, 8.272, 8.272, 8.27, 8.27, 8.269, 8.269, 8.268, 8.266,
8.266, 8.265, 8.264, 8.263, 8.262, 8.262, 8.261, 8.261, 8.259,
8.259, 8.258, 8.257, 8.256, 8.255, 8.255, 8.254, 8.254, 8.252,
8.251, 8.251, 8.25, 8.248, 8.247, 8.247, 8.246), Exp_Act_0.95 = c(7.8793,
7.87835, 7.8774, 7.87645, 7.87645, 7.8755, 7.8755, 7.8736, 7.8736,
7.87265, 7.87265, 7.8717, 7.87075, 7.8698, 7.86885, 7.86885,
7.86695, 7.866, 7.866, 7.86505, 7.86315, 7.86315, 7.8622, 7.8603,
7.85935, 7.85935, 7.8584, 7.8584, 7.8565, 7.8565, 7.85555, 7.85555,
7.8546, 7.8527, 7.8527, 7.85175, 7.8508, 7.84985, 7.8489, 7.8489,
7.84795, 7.84795, 7.84605, 7.84605, 7.8451, 7.84415, 7.8432,
7.84225, 7.84225, 7.8413, 7.8413, 7.8394, 7.83845, 7.83845, 7.8375,
7.8356, 7.83465, 7.83465, 7.8337), Exp_Act_1.05 = c(8.7087, 8.70765,
8.7066, 8.70555, 8.70555, 8.7045, 8.7045, 8.7024, 8.7024, 8.70135,
8.70135, 8.7003, 8.69925, 8.6982, 8.69715, 8.69715, 8.69505,
8.694, 8.694, 8.69295, 8.69085, 8.69085, 8.6898, 8.6877, 8.68665,
8.68665, 8.6856, 8.6856, 8.6835, 8.6835, 8.68245, 8.68245, 8.6814,
8.6793, 8.6793, 8.67825, 8.6772, 8.67615, 8.6751, 8.6751, 8.67405,
8.67405, 8.67195, 8.67195, 8.6709, 8.66985, 8.6688, 8.66775,
8.66775, 8.6667, 8.6667, 8.6646, 8.66355, 8.66355, 8.6625, 8.6604,
8.65935, 8.65935, 8.6583)), row.names = c(NA, -59L), .Names = c("Date",
"Cs..137", "In..111", "I..123", "I..131", "Tc..99m", "Exp_Act",
"Exp_Act_0.95", "Exp_Act_1.05"), class = "data.frame")
df_blue structure
structure(list(Date = structure(c(15790, 15791, 15792, 15797,
15798, 15799, 15800), class = "Date"), Cs.137 = c(8.1, 8.2, 8.2,
8.2, 8.2, 8.2, 8.2), I.123 = c(6.82, 6.85, 6.91, 6.84, 6.82,
6.82, 6.83), I.131 = c(10.5, 10.6, 10.6, 10.5, 10.6, 10.6, 10.6
), In.111 = c(6.35, 6.45, 6.43, 6.37, 6.38, 6.4, 6.37), X99m.Tc = c(15,
15, 15.1, 15.1, 15.1, 15.1, 15.1)), .Names = c("Date", "Cs.137",
"I.123", "I.131", "In.111", "X99m.Tc"), row.names = c(NA, -7L
), class = "data.frame")
My approach would be to bind together both data frames and then add new column that contains name of device (gold or blue).
df<-rbind(gold_melt,blue_melt)
df$device<-rep(c("gold","blue"),c(nrow(gold_melt),nrow(blue_melt)))
With function recode() from library car change names of Isotope as they should be.
df$Isotope<-recode(df$Isotope,"c('Cs..137','Cs.137')='Cs-137';
c('I..123','I.123')='I-123';
c('I..131','I.131')='I-131';
c('In..111','In.111')='In-111'
;c('Tc..99m','X99m.Tc')='Tc-99m'")
Now you just need one call to geom_point() using new data frame. I added also shape=device to get different shapes for each device.
ggplot(NULL) +
geom_point(data=df,aes(x=Date,y=Activity, col=Isotope,shape=device))+
geom_ribbon(data = df_gold, aes(x = Date, ymin = Exp_Act_0.95, ymax = Exp_Act_1.05), fill='blue', alpha=0.2)
Just in case you'd want to "fuse" the legend together, then building up on Didzis' answer:
df <- transform(df, device = factor(device, levels=unique(device)),
grp = paste(Isotope, device, sep="_"))
require(RColorBrewer)
ggplot() + geom_point(data = df, aes(x = Date, y = Activity,
colour=grp, shape = grp, fill=grp)) +
geom_ribbon(data = df_gold, aes(x = Date, ymin = Exp_Act_0.95,
ymax = Exp_Act_1.05), fill='blue', alpha=0.2) +
scale_shape_manual("", values=rep(c(21,24), 5)) +
scale_fill_manual("", values=rep(brewer.pal(5, "Set1"), each=2)) +
scale_colour_manual("", values=rep(brewer.pal(5, "Set1"), each=2))

Resources