I have been using the code below to successfully modify the 'Zt', 'L', and 'A' slots of models fit using lme4 versions <1.0. I just updated to lme4 1.0-4 today and found that the model objects are different. Can anyone provide insight/guidance as to how to modify these slots in the new lmer model objects?
dat<-structure(list(pop1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 10L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 8L, 8L, 9L), pop2 = c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
5L, 6L, 7L, 8L, 9L, 10L, 6L, 7L, 8L, 9L, 10L, 7L, 8L, 9L, 10L,
8L, 9L, 10L, 9L, 10L, 10L), X = c(0.49136, 0.75587, 0.93952,
0.61278, 0.79934, 1.07918, 1.13354, 1.15836, 1.2014, 0.43136,
0.77815, 0.716, 0.93952, 1.13672, 1.16137, 1.18184, 1.21748,
0.65321, 0.86332, 1.04922, 1.19866, 1.20412, 1.22272, 1.24797,
0.89763, 1.08991, 1.19033, 1.15836, 1.17319, 1.18752, 0.64345,
0.93952, 0.98227, 1.01703, 1.07188, 0.78533, 0.94939, 0.99564,
1.06819, 0.64345, 0.716, 0.85126, -0.04576, 0.4624, 0.30103),
Y = c(0.491694, 0.394703, 0.113303, 0.156597, 0.450924, 0.487845,
0.21821, 0.129027, -0.131522, 0.35156, -0.116826, 0.18941,
0.306608, 0.258401, 0.008552, -0.024369, -0.305258, -0.013628,
0.215715, 0.13783, 0.467272, 0.088882, 0.084295, -0.172337,
-0.206725, -0.084339, -0.191651, -0.001586, -0.079501, -0.195094,
0.232045, 0.17102, 0.003742, -0.023688, -0.26085, 0.205326,
0.172809, 0.133219, -0.159054, 0.082231, 0.011025, -0.238611,
0.732679, 0.478058, 0.325698)), .Names = c("pop1", "pop2",
"X", "Y"), class = "data.frame", row.names = c(NA, -45L))
library(lme4) # lme4 >= 1.0 returns differently structured model objects
# Specify the model formula
lmer_mod <- as.formula("Y ~ X + (1|pop1)")
# Create the Zl and ZZ matrices
Zl <- lapply(c("pop1","pop2"), function(nm) Matrix:::fac2sparse(dat[[nm]], "d", drop=FALSE))
ZZ <- Reduce("+", Zl[-1], Zl[[1]])
# Fit lmer model to the data
mod <- lmer(lmer_mod, data = dat, REML = TRUE)
# Replace the following slots in the fitted model
# These slots don't exist in this form in the new lmerMod objects
mod@Zt <- ZZ
mod@A <- ZZ
mod@L <- Cholesky(tcrossprod(ZZ), LDL=FALSE, Imult=1)
# Refit the model to the same response data
Final.mod <- refit(mod, dat[,"Y"])
Any help or insight as to how to modify these slots will be greatly appreciated. In the meantime, I guess I will stick to using an older version of lme4 for these models.
Does this do what you want? (This follows ?modular pretty closely ...)
Create the Zl and ZZ matrices:
Zl <- lapply(c("pop1","pop2"),
function(nm) Matrix:::fac2sparse(dat[[nm]], "d", drop=FALSE))
ZZ <- Reduce("+", Zl[-1], Zl[[1]])
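To see what ZZ contains, here is a small dense base-R sketch on hypothetical toy pairs (not the question's data). `outer(..., "==")` stands in for `Matrix::fac2sparse()`: each indicator matrix has one row per population and one column per observation, so their sum puts a 1 in the row of each of the two populations in a pair. This is the usual multiple-membership construction; for lme4 itself you would keep the sparse `Matrix` version.

```r
# Hypothetical toy data (not the question's): each observation is a pair of populations
toy_pairs <- data.frame(pop1 = c(1, 1, 2), pop2 = c(2, 3, 3))
lev <- 1:3  # all population levels, shared by pop1 and pop2

# Dense stand-in for Matrix::fac2sparse(): rows = levels, columns = observations
indicator <- function(f) outer(lev, f, "==") * 1

Zl_toy <- lapply(c("pop1", "pop2"), function(nm) indicator(toy_pairs[[nm]]))
ZZ_toy <- Reduce("+", Zl_toy[-1], Zl_toy[[1]])  # same reduction as above
ZZ_toy
#      [,1] [,2] [,3]
# [1,]    1    1    0
# [2,]    1    0    1
# [3,]    0    1    1
```

Every column sums to 2, one entry per member of the pair. With the question's data, where both pop1 and pop2 take the values 1 to 10, the same construction gives a 10 x 45 matrix.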
Construct the random-effects data structures:
lf <- lFormula(Y ~ X + (1|pop1), data=dat)
Modify them:
lf$reTrms$Zt <- ZZ
Proceed through the remaining model-construction and fitting steps:
dfun <- do.call(mkLmerDevfun,lf) ## create dev fun from modified lf
opt <- optimizeLmer(dfun) ## fit the model
## make the results into a 'merMod' object
fit <- mkMerMod(environment(dfun), opt, lf$reTrms,
fr = lf$fr)
I want a barplot based on the number of occurrences of a string in a particular column of a dataset in R.
At the same time, I want to run a t-test and mark the significant p-values with stars on top of the bars; nonsignificant ones can be labelled ns.
My attempt has been:
barplot(prop.table(table(ttcluster_dataset$Phenotype)),col=clustercolor,border="black",xlab="Phenotypes",ylab="Percentage of Samples expressed",main="Sample wise Phenotype distribution",cex.names = 0.8)
The dataset column is:
ttcluster_dataset$Phenotype<-
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), .Label = c("Proneural (Cluster 1)", "Proneural (Cluster 2)", "Neural (Cluster 1)", "Neural (Cluster 2)",
"Classical (Cluster 1)", "Classical (Cluster 2)", "Mesenchymal (Cluster 1)",
"Mesenchymal (Cluster 2)"), class = "factor")
All suggestions will be appreciated.
A t-test is probably not what you want, since you are looking at counts and proportions in the two clusters. Your data are not really set up for either analysis, so first we need to split the column into two variables:
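Pheno.splt <- strsplit(as.character(ttcluster_dataset$Phenotype), " ")
# Each element looks like c("Proneural", "(Cluster", "1)"); keep the 1st and 3rd pieces
Pheno.mat <- do.call(rbind, Pheno.splt)[, c(1, 3)]
ttclust <- data.frame(Phenotype=Pheno.mat[, 1], Cluster=gsub(")", "", Pheno.mat[, 2], fixed=TRUE))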
str(ttclust)
# 'data.frame': 171 obs. of 2 variables:
# $ Phenotype: chr "Proneural" "Proneural" "Proneural" "Proneural" ...
# $ Cluster : chr "1" "1" "1" "1" ...
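Splitting on spaces relies on every label having exactly three space-separated parts. An alternative that does not depend on that, sketched here on a few hypothetical labels in the same format, extracts both pieces with `sub()` and regular expressions:

```r
# Hypothetical labels in the same format as the question's factor levels
labels_chr <- c("Proneural (Cluster 1)", "Neural (Cluster 1)",
                "Mesenchymal (Cluster 2)")

# Phenotype: drop the trailing " (Cluster n)"
phen <- sub(" \\(Cluster [0-9]+\\)$", "", labels_chr)
# Cluster: keep only the digits captured inside "(Cluster n)"
clus <- sub("^.*\\(Cluster ([0-9]+)\\)$", "\\1", labels_chr)

data.frame(Phenotype = phen, Cluster = clus)
```

With the real data, `labels_chr` would be `as.character(ttcluster_dataset$Phenotype)`.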
Now Phenotype and Cluster are separate columns in the data frame. There are multiple ways to do this; here we simply split each Phenotype label into three parts on the spaces between them, so ttclust is a data frame with two variables. Next, a summary table and a bar plot:
tbl <- xtabs(~Phenotype+Cluster, ttclust)
tbl
# Cluster
# Phenotype 1 2
# Classical 32 6
# Mesenchymal 44 10
# Neural 26 0
# Proneural 45 8
tbl.row <- prop.table(tbl, 1)
barplot(t(tbl.row), beside=TRUE)
At this point, a simple proportions test indicates that the percentage of Cluster 1 does not differ significantly across the four Phenotypes (p = 0.15):
prop.test(tbl)
#  4-sample test for equality of proportions without continuity correction
# data:  tbl
# X-squared = 5.2908, df = 3, p-value = 0.1517
# alternative hypothesis: two.sided
# sample estimates:
#    prop 1    prop 2    prop 3    prop 4
# 0.8421053 0.8148148 1.0000000 0.8490566
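Running `prop.test` on each Phenotype separately indicates that, within every Phenotype, the proportion in Cluster 1 differs significantly from 0.5, i.e. Cluster 1 is significantly more common than Cluster 2 in every case: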
for(i in 1:4) print(prop.test(t(tbl[i, ])))
# First test
#
# 1-sample proportions test with continuity correction
#
# data: t(tbl[i, ]), null probability 0.5
# X-squared = 16.447, df = 1, p-value = 5.002e-05
# alternative hypothesis: true p is not equal to 0.5
# 95 percent confidence interval:
# 0.6807208 0.9341311
# sample estimates:
# p
# 0.8421053
. . . .
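The question also asked for stars on top of the bars. A minimal base-R sketch of that, assuming (1) the per-Phenotype `prop.test` above is the test you want to annotate and (2) the common cut-offs *** p <= 0.001, ** p <= 0.01, * p <= 0.05, otherwise ns. The counts are typed in from the `xtabs()` table above so the block runs on its own; `tab`, `pvals` and `stars` are illustrative names.

```r
# Counts from the xtabs() table above (rows = Phenotype, columns = Cluster)
tab <- matrix(c(32, 44, 26, 45,   # Cluster 1
                6, 10, 0, 8),     # Cluster 2
              ncol = 2,
              dimnames = list(Phenotype = c("Classical", "Mesenchymal", "Neural", "Proneural"),
                              Cluster = c("1", "2")))

# Same per-Phenotype test as the loop above: is the Cluster 1 share different from 0.5?
pvals <- apply(tab, 1, function(x) prop.test(x[1], sum(x))$p.value)

# Assumed conventional cut-offs; anything above 0.05 is labelled "ns"
stars <- as.character(cut(pvals, breaks = c(0, 0.001, 0.01, 0.05, 1),
                          labels = c("***", "**", "*", "ns"),
                          include.lowest = TRUE))

tab.row <- prop.table(tab, 1)
# With beside = TRUE, barplot() returns a matrix of bar midpoints (one column per Phenotype)
mids <- barplot(t(tab.row), beside = TRUE, ylim = c(0, 1.2),
                legend.text = TRUE, xlab = "Phenotype", ylab = "Proportion")
# Place each label just above the taller bar of its group
text(x = colMeans(mids), y = apply(tab.row, 1, max), labels = stars, pos = 3)
```

With your own data you can pass the `tbl` from `xtabs()` directly; `apply()` walks over its rows in the same way.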
I would like to visualize two vectors (predikcia and testdata) covering all wrongly classified digits from my classification problem. Both vectors contain 76 values: the first (predikcia) holds the digits 0-9 that the classifier wrongly predicted, and the second (testdata) holds the digits they should have been. A basic plot of these vectors is not very informative: it does not show clearly which digits were misclassified and what they should have been classified as. Here is what the basic plot looks like (image not included).
Data:
classres <- data.frame(
predikcia = c(9L, 8L, 3L, 9L, 1L, 6L, 2L, 2L,
6L, 3L, 5L, 9L, 8L, 1L, 5L, 1L, 3L, 3L, 5L, 9L,
5L, 1L, 8L, 9L, 5L, 0L, 1L, 9L, 5L, 5L, 8L, 9L,
2L, 5L, 8L, 5L, 6L, 9L, 9L, 4L, 9L, 3L, 5L, 5L, 9L, 9L, 9L, 4L, 3L,
5L, 8L, 3L, 0L, 5L, 8L, 8L, 7L, 3L, 8L, 8L, 5L, 9L, 9L, 1L, 5L, 5L,
9L, 9L, 5L, 3L, 1L, 9L, 2L, 5L, 8L, 9L),
testdata = c(4L, 6L, 1L, 5L, 5L, 1L, 1L, 1L, 5L,
9L, 7L, 8L, 0L, 8L, 8L, 9L, 7L, 1L, 9L, 5L, 8L,
8L, 0L, 5L, 1L, 8L, 4L, 1L, 9L, 1L, 0L, 5L, 1L,
9L, 0L, 0L, 0L, 4L, 1L, 2L, 7L, 5L, 9L, 8L, 5L,
5L, 5L, 1L, 9L, 9L, 0L, 9L, 8L, 9L, 6L, 0L, 8L,
5L, 0L, 9L, 8L, 5L, 5L, 9L, 2L, 8L, 0L, 5L, 7L,
1L, 8L, 8L, 9L, 9L, 7L, 1L))
I'm assuming that predictions are either "correct" or "incorrect"; otherwise the graph would need more work.
First, the data with the predictions and the real values (only the first 20 rows, to keep it short). In this example they are integers, but I'm treating them as plain class labels whose numeric values carry no meaning.
classres <- data.frame(
predikcia = c(9L, 8L, 3L, 9L, 1L, 6L, 2L, 2L,
6L, 3L, 5L, 9L, 8L, 1L, 5L, 1L, 3L, 3L, 5L, 9L),
testdata = c(4L, 6L, 1L, 5L, 5L, 1L, 1L, 1L, 5L,
9L, 7L, 8L, 0L, 8L, 8L, 9L, 7L, 1L, 9L, 5L))
Then I create a data frame of counts. The `factor()` calls with `levels = 0:9` are important because I want every class from 0 to 9 to be available on both axes of the plot.
library(dplyr)
library(ggplot2)
dat.plot <- classres %>%
count(testdata, predikcia) %>%
mutate(
testdata = factor(testdata, levels = 0:9),
predikcia = factor(predikcia, levels = 0:9))
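For comparison, here is a base-R sketch of the same counting step with `table()`. One difference from the dplyr pipeline: there the counting happens before the factor conversion, so unobserved pairs are simply absent from dat.plot, whereas converting first (as here) yields all 100 cells, with zeros. Whether zero cells should be coloured in the heatmap is a design choice; `conf` and `conf.long` are illustrative names.

```r
# First 20 rows of the question's data, as in the answer
classres <- data.frame(
  predikcia = c(9L, 8L, 3L, 9L, 1L, 6L, 2L, 2L, 6L, 3L,
                5L, 9L, 8L, 1L, 5L, 1L, 3L, 3L, 5L, 9L),
  testdata  = c(4L, 6L, 1L, 5L, 5L, 1L, 1L, 1L, 5L, 9L,
                7L, 8L, 0L, 8L, 8L, 9L, 7L, 1L, 9L, 5L))

# Converting to factors *before* counting keeps all 10 x 10 combinations,
# including those that never occur (count 0)
conf <- table(Real = factor(classres$testdata, levels = 0:9),
              Predicted = factor(classres$predikcia, levels = 0:9))

# Long format playing the same role as dat.plot: one row per (Real, Predicted) pair
conf.long <- as.data.frame(conf, responseName = "n")
head(conf.long[conf.long$n > 0, ])
```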
Finally, I create a heatmap from the data, filling each cell according to its count and drawing a border around the cells where predictions are considered correct (this is why I need the goodclass data frame).
goodclass <- data.frame(
testdata = factor(0:9),
predikcia = factor(0:9)
)
dat.plot %>%
ggplot(aes(testdata, predikcia, fill = n)) +
geom_tile() +
scale_fill_gradient(low = "goldenrod", high = "darkorchid4") +
geom_tile(data = goodclass,
aes(testdata, predikcia, color = "Correct\npredictions"),
inherit.aes = FALSE, fill = NA, linewidth = 2) + # ggplot2 >= 3.4.0 uses linewidth (formerly size) for borders
scale_color_manual(values = c(`Correct\npredictions` = "limegreen")) +
labs(x = "Real class value", y = "Predicted class value",
fill = "count", color = "") +
coord_equal() +
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_line(color = "black", linewidth = 2)) # linewidth replaces size in ggplot2 >= 3.4.0
The result is a little hard on the eyes; it will probably need a bit more work to find nicer colors.
I have the following for loop:
for (Count in 1:19){
png(paste0(colnames(fdd$rawCounts)[Count], ".pdf"))
qplot(y = log2(fdd$rawCounts[,Count]), main = colnames(fdd$rawCounts)[Count])
dev.off()
}
It should simply plot some count data. Here is the head of the data:
structure(c(11L, 3L, 12L, 8L, 15L, 2L, 5L, 2L, 8L, 7L, 6L, 10L,
6L, 1L, 7L, 4L, 2L, 1L, 3L, 0L, 4L, 4L, 2L, 5L, 8L, 0L, 13L,
4L, 10L, 7L, 2L, 1L, 2L, 4L, 7L, 7L, 14L, 4L, 25L, 17L, 14L,
16L, 4L, 2L, 5L, 5L, 5L, 2L, 9L, 5L, 11L, 8L, 1L, 4L, 10L, 8L,
8L, 7L, 9L, 5L, 9L, 15L, 14L, 11L, 16L, 8L, 11L, 4L, 3L, 6L,
3L, 0L, 6L, 3L, 4L, 6L, 1L, 4L, 11L, 11L, 12L, 6L, 2L, 6L, 7L,
9L, 22L, 8L, 13L, 7L, 6L, 1L, 4L, 5L, 6L, 2L, 4L, 2L, 6L, 7L,
3L, 2L, 6L, 3L, 3L, 2L, 5L, 5L, 9L, 2L, 6L, 5L, 4L, 2L), .Dim = c(6L,
19L), .Dimnames = structure(list(feature = c("chr10:100000001-100000500",
"chr10:10000001-10000500", "chr10:1000001-1000500", "chr10:100000501-100001000",
"chr10:100001-100500", "chr10:100001001-100001500"), sample = c("K562_FAIRE_Acla_4hr_1",
"K562_FAIRE_Acla_4hr_2", "K562_FAIRE_Daun_4hr_1", "K562_FAIRE_Daun_4hr_2",
"K562_FAIRE_Etop_4hr_1", "K562_FAIRE_Etop_4hr_2", "K562_FAIRE_untreated",
"FAIRE.seq_K562_2MethylDoxo_A", "FAIRE.seq_K562_2MethylDoxo_B",
"FAIRE.seq_K562_Ctr_A", "FAIRE.seq_K562_Ctr_B", "FAIRE.seq_K562_Doxo_10uM_4hrs_A",
"FAIRE.seq_K562_Doxo_10uM_4hrs_B", "FAIRE.seq_K562_Epirubicin_A",
"FAIRE.seq_K562_Epirubicin_B", "FAIRE.seq_K562_MTX_40uM_4hrs_A",
"FAIRE.seq_K562_MTX_40uM_4hrs_B", "FAIRE.seq_K562_MTX_5uM_4hrs_A",
"FAIRE.seq_K562_MTX_5uM_4hrs_B")), .Names = c("feature", "sample"
)))
But when I run it, all I get is a variable called Count with the value 19L, whereas I expect 19 plots to be drawn. Why is this happening?
Thanks!
This works for me:
library(ggplot2)
for (Count in 1:19){
  pdf(paste0(colnames(df)[Count], ".pdf"))
  # inside a loop, ggplot objects are not auto-printed, so print() explicitly
  print(qplot(y = log2(df[, Count]), main = colnames(df)[Count]))
  dev.off()
}
A couple of changes:
I read in your data as df. It turns out to be a matrix, so I adjusted the subsetting accordingly.
I also wrapped the qplot call in print(). Inside a for loop (or a function), ggplot objects are not printed automatically, so without print() nothing is drawn on the device.
Finally, I switched png() to pdf(), since the file names built with paste0 suggest you wanted PDF files.
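The same open-plot-close pattern can be sketched with base graphics on a small hypothetical matrix written to a temporary directory. Two choices differ from the answer and are assumptions: the loop runs over `seq_len(ncol(counts))` instead of a hard-coded `1:19`, and a pseudocount of 1 is added before `log2()`, because the real data contain zeros and `log2(0)` is `-Inf`.

```r
# Hypothetical 6 x 2 count matrix standing in for fdd$rawCounts / df
counts <- matrix(c(11L, 3L, 12L, 8L, 15L, 2L,
                   5L, 2L, 8L, 7L, 0L, 10L),
                 nrow = 6, dimnames = list(NULL, c("sampleA", "sampleB")))

out_dir <- tempdir()
files <- character(0)
for (j in seq_len(ncol(counts))) {   # loop over however many columns there are
  f <- file.path(out_dir, paste0(colnames(counts)[j], ".pdf"))
  pdf(f)                             # open a PDF device for this column
  # Base graphics draw immediately; a ggplot object here would still need print()
  plot(log2(counts[, j] + 1), main = colnames(counts)[j], ylab = "log2(count + 1)")
  dev.off()                          # close the device so the file is written
  files <- c(files, f)
}
files
```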
The following figure (image not included) was generated with this code:
library(ggplot2)
library(reshape2)
dat <- structure(list(Type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), .Label = c("High_expression",
"KD.ip", "LG.ip", "LN.id", "LV.id", "LV.ip", "SP.id", "SP.ip"
), class = "factor"), ImmGen = structure(c(1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("Bcells",
"DendriticCells", "Macrophages", "Monocytes", "NKCells", "Neutrophils",
"StemCells", "StromalCells", "abTcells", "gdTCells"), class = "factor"),
Exp_06hr = c(7174.40482999999, 23058.74882, 39819.39133,
15846.46146, 8075.78226, 105239.11609, 7606.34563, 19513.57747,
7116.51211, 6978.64995, 498.36828, 732.01788, 621.51576,
546.63461, 529.1711, 545.17219, 477.54658, 1170.50303, 550.99528,
607.56707, 775.0691, 1269.50773, 2138.69883, 1561.74652,
601.9372, 5515.59896, 744.48716, 997.32859, 639.13126, 657.64581,
4165.29899, 5465.1883, 7773.25723, 5544.86758, 3461.13442,
8780.64899, 4380.00437, 8721.84871, 3674.62723, 3911.00108,
2932.76554, 5903.48407, 6179.81046, 3683.64539, 2744.59622,
6760.37307, 4097.14665, 6845.31988, 2872.77771, 2912.84262
), Exp_24hr = c(1596.9091, 4242.52354, 9984.68861, 3519.18627,
1602.92511, 12203.57109, 1656.19357, 3389.93866, 1617.35484,
1579.00309, 715.47289, 643.98371, 689.40412, 580.26036, 608.22853,
695.10737, 830.77947, 670.34899, 640.67908, 637.47464, 356.75713,
393.13449, 549.60095, 466.76064, 336.95453, 617.20976, 339.2476,
469.57407, 292.86365, 305.45178, 2604.07605, 4210.64843,
5797.13123, 3650.88447, 2275.03269, 6475.27485, 2604.70614,
4796.3314, 2411.09694, 2458.23237, 1498.21516, 1996.6875,
2927.82836, 1911.00463, 1523.57171, 2199.62297, 1541.82034,
2815.82184, 1608.46099, 1588.80561), ExpDiff_06_24hr = c(5577.49572999999,
18816.22528, 29834.70272, 12327.27519, 6472.85715, 93035.545,
5950.15206, 16123.63881, 5499.15727, 5399.64686, -217.10461,
88.03417, -67.88836, -33.62575, -79.05743, -149.93518, -353.23289,
500.15404, -89.6838, -29.9075700000001, 418.31197, 876.37324,
1589.09788, 1094.98588, 264.98267, 4898.3892, 405.23956,
527.75452, 346.26761, 352.19403, 1561.22294, 1254.53987,
1976.126, 1893.98311, 1186.10173, 2305.37414, 1775.29823,
3925.51731, 1263.53029, 1452.76871, 1434.55038, 3906.79657,
3251.9821, 1772.64076, 1221.02451, 4560.7501, 2555.32631,
4029.49804, 1264.31672, 1324.03701)), .Names = c("Type",
"ImmGen", "Exp_06hr", "Exp_24hr", "ExpDiff_06_24hr"), row.names = c(NA,
-50L), class = "data.frame")
dat.m <- melt(dat)
setwd("~/Desktop/")
pdf("myfig.pdf",width=30,height=20)
p <- ggplot(dat.m,aes(ImmGen,value)) +
geom_bar(aes(fill = variable),position = "dodge",stat="identity")+
facet_wrap(~Type)
p
dev.off();
How can I modify it so that, instead of wrapping the panels into a 2x3 grid as above, they are laid out as a 5x1 grid (one panel per row), with each row having its own y-axis scale?
Second, note that the blue bar (ExpDiff_06_24hr) can contain negative values. How can I show them so that those bars extend below 0 on the y-axis?
I think the subplots should be stacked in one column rather than in one row, for clarity. With some help from this answer (thanks to hrbrmstr for linking to it), and because I think this question deserves an answer, here is a solution. Negative values need no extra handling: with stat="identity", geom_bar() draws a negative value as a bar extending downward from 0.
dat$rank <- rank(dat$ExpDiff_06_24hr)
dat.m <- melt(dat, id = c("Type","ImmGen","rank"))
dat.t <- transform(dat.m, TyIm = factor(paste0(Type, ImmGen)))
dat.t <- transform(dat.t, TyIm = reorder(TyIm, rank(rank)))
p <- ggplot(dat.t, aes(TyIm,value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity")+
facet_wrap(~Type, ncol=1, scales="free") +
scale_x_discrete("ImmGen", breaks=dat.t$TyIm, labels=dat.t$ImmGen)
p
The result (image not included):
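The key trick in the answer is the combined Type/ImmGen factor: because each facet gets its own set of levels, `reorder()` can sort the bars within every facet, and `scale_x_discrete(labels = ...)` then shows only the ImmGen part. A minimal sketch on hypothetical toy data shows what `reorder()` does to the level order:

```r
# Hypothetical toy data: two facets (Type) that share ImmGen names
toy <- data.frame(Type   = c("A", "A", "B", "B"),
                  ImmGen = c("x", "y", "x", "y"),
                  diff   = c(3, -1, 2, 5))

# One level per Type/ImmGen combination, so "x" in A and "x" in B are ordered separately
toy$TyIm <- factor(paste0(toy$Type, toy$ImmGen))
# reorder() sorts the levels by the mean of diff within each level (ascending)
toy$TyIm <- reorder(toy$TyIm, toy$diff)

levels(toy$TyIm)
# [1] "Ay" "Bx" "Ax" "By"
```

Restricted to one facet, the global order is still sorted by value, which is why each panel comes out ordered.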
I've got what I know is a really easy question, but I'm stumped and seem to lack the vocabulary to seek out the answer effectively with the search bar.
I have a data frame full of numbers similar to this (though not of the same class)
Dat <- structure(c(9L, 9L, 3L, 3L, 2L, 9L, 10L, 5L, 6L, 2L, 4L, 6L,
10L, 2L, 9L, 0L, 1L, 8L, 9L, 7L, 7L, 4L, 4L, 3L, 4L, 7L, 7L,
1L, 0L, 3L, 6L, 10L, 8L, 3L, 0L, 7L, 7L, 1L, 2L, 8L, 5L, 7L,
7L, 8L, 2L, 1L, 10L, 3L, 0L, 2L, 7L, 0L, 0L, 7L, 9L, 8L, 9L,
0L, 4L, 4L, 5L, 6L, 6L, 2L, 4L, 1L, 6L, 2L, 4L, 7L, 5L, 2L, 7L,
4L, 8L, 3L, 3L, 2L, 5L, 1L, 1L, 3L, 8L, 0L, 1L, 8L, 8L, 1L, 1L,
0L, 4L, 4L, 4L, 5L, 6L, 9L, 5L, 2L, 6L, 3L), .Dim = c(10L, 10L
))
All I want to do is replace all values > 5 with a 1, and all values less than 5 with a 0. I've gotten as far as getting a frame with TRUE and FALSE, but can't seem to figure out how to replace things.
Datlog <- Dat > 5
Any help would be greatly appreciated. Thank you.
If I read your question correctly, you'll kick yourself for the answer:
(Dat > 5) * 1
In arithmetic, R treats TRUE and FALSE as 1 and 0 respectively. A more semantically explicit way to do this would be something like:
out <- as.numeric(Dat > 5)
dim(out) <- dim(Dat)
The two-step approach is needed here because as.numeric() drops the dim attribute of the original data.
One way to replace with different values would be to use factor:
out <- factor((Dat > 5), c(TRUE, FALSE), c("YES", "NO"))
dim(out) <- dim(Dat)
Another way would be basic subsetting and substitution:
out <- Dat
is_big <- Dat > 5   # mask computed once from Dat, so the first replacement cannot affect the second
out[is_big] <- 999
out[!is_big] <- 0
out
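Two further base-R options, sketched on a small hypothetical matrix `m`, keep the matrix shape in a single step, so no separate `dim()` call is needed:

```r
# Small hypothetical matrix in place of Dat
m <- matrix(c(9L, 3L, 5L, 10L, 0L, 6L), nrow = 2)

# ifelse() keeps the attributes of its test, including dim, so this stays a matrix
out1 <- ifelse(m > 5, 1, 0)

# Assigning into out2[] replaces the values but keeps the existing dim
out2 <- m
out2[] <- as.numeric(m > 5)

out1
#      [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    1    1
```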