How to plot with the splom function? - r

I'm trying to recreate this graph:
require(car)
scatterplotMatrix(~Week+Cases+Egg.Pr+Beef.Pr+Pork.Pr+Chicken.Pr+Cereal.Pr, reg.line = lm,
smooth = TRUE, spread = FALSE, span = 0.5, id.n = 0, diagonal = 'boxplot', data = Eggs)
using the lattice package,
I've been trying to make it work with this structure, but it just comes out wrong:
splom(Eggs, panel = function(Week, Cases, Egg.Pr, Beef.Pr, Pork.Pr, Chicken.Pr, Cereal.Pr) {
panel.xyplot(Week, Cases, Egg.Pr, Beef.Pr, Pork.Pr, Chicken.Pr, Cereal.Pr)
panel.lmline(Week, Cases, Egg.Pr, Beef.Pr, Pork.Pr, Chicken.Pr, Cereal.Pr)
panel.smooth(Week, Cases, Egg.Pr, Beef.Pr, Pork.Pr, Chicken.Pr, Cereal.Pr)
}, spread = FALSE, span = 0.5, id.n = 0, diagonal = 'boxplot',
data = Eggs)
Please help me out, and overexplain as much as possible.

With some guessing from context, and since you asked for overexplanation:
First, you need to refine your understanding of panel functions. They are supposed to be general purpose functions describing the plotting procedure, not specific to the data at hand. With that in mind, we can rewrite your first attempt as:
myPanel <- function(x, y, ...) {
    panel.xyplot(x, y, ...)
    panel.lmline(x, y, ...)
    panel.loess(x, y, ...)
}
Note that panel.smooth will not work, for reasons that are too complicated to get into (but basically, it's not grid-based, as lattice requires).
With this, you should be able to do
Eggs <- Duncan[-1] # placeholder to make example reproducible
splom(Eggs, panel = myPanel)
This leaves the diagonal, for which you need another panel function. Here's one suggestion which can be fine-tuned as per your requirements:
myDiagonal <- function(x, ...) {
    diag.panel.splom(x, ...)  # retain the default and build on it
    ycenter <- quantile(x, 0.25, na.rm = TRUE)
    panel.bwplot(x = x, y = rep(ycenter, length(x)),
                 box.width = 0.1 * diff(range(x, finite = TRUE)),
                 ...)
}
Combining, you can now do
splom(Eggs, panel = myPanel, diag.panel = myDiagonal)
The sprinkling of ...s is important, but I'm not going to go into why. For the rest, the documentation of the corresponding functions should help you figure out how to fine-tune.
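As an extra illustration (my own sketch, not part of the original answer): if you want the car-style span = 0.5 for the smoother, you can also hard-code it inside the panel function instead of relying on argument passing through ...:
library(lattice)

## Variant of myPanel with the loess span fixed at 0.5, mirroring the
## span = 0.5 used in the scatterplotMatrix() call in the question
myPanel2 <- function(x, y, ...) {
    panel.xyplot(x, y, ...)
    panel.lmline(x, y, ...)
    panel.loess(x, y, span = 0.5, ...)
}

splom(Eggs, panel = myPanel2, diag.panel = myDiagonal)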

Related

Weird characters appearing in the plot legend when using DoHeatmap

I was using Seurat to analyse single-cell RNA-seq data and managed to draw a heatmap with DoHeatmap() after clustering and marker selection, but got a bunch of random characters in the legend. They really are random, as they change every time the code is run. I was worried it was something related to my own dataset, so I tried the test Seurat object 'ifnb' but still got the same issue (see the red oval in the example plot).
[example plot: heatmap with stray characters in the legend]
I also tried loading the Seurat object into R in the terminal (via readRDS) and running the plotting function, but got the same issue there, so it's not an RStudio thing.
Here is the code I ran:
library(Seurat)
library(SeuratData)
library(patchwork)
library(dplyr)  # for %>%, group_by() and top_n() used below
InstallData("ifnb")
LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
features <- SelectIntegrationFeatures(object.list = ifnb.list)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
immune.combined <- IntegrateData(anchorset = immune.anchors)
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
DefaultAssay(immune.combined) <- 'RNA'
immune_markers <- FindAllMarkers(immune.combined, latent.vars = "stim", test.use = "MAST", assay = 'RNA')
immune_markers %>%
group_by(cluster) %>%
top_n(n = 10, wt = avg_log2FC) -> top10_immune
DoHeatmap(immune.combined, slot = 'data', features = top10_immune$gene, group.by = 'stim', assay = 'RNA')
Does anyone have any idea how to solve this issue other than reinstalling everything?
I have been having the same issue myself, and while I got around it by not needing the legend, I think you could use a similar approach and override the colour scale explicitly:
DoHeatmap(immune.combined, slot = 'data', features = top10_immune$gene,
          group.by = 'stim', assay = 'RNA') +
    scale_color_manual(values = my_colors,
                       limits = c('CTRL', 'STIM'))
Let me know if this works! It doesn't solve the source of the odd text values but it does the job! If you haven't already, I would recommend creating a forum question on the Seurat forums to see where these characters are coming from!
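Note that my_colors is not defined in the snippet above; it is just whatever palette you want. A minimal stand-in for the two groups could be (my assumption, adjust to taste):
library(ggplot2)  # for scale_color_manual()

# hypothetical palette for the two stim groups; any two valid R colours work
my_colors <- c("#1f77b4", "#ff7f0e")

DoHeatmap(immune.combined, slot = 'data', features = top10_immune$gene,
          group.by = 'stim', assay = 'RNA') +
    scale_color_manual(values = my_colors, limits = c('CTRL', 'STIM'))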
I ran into the same problem with Seurat 4.0. After upgrading to 4.1, it disappeared.

R gplots: Heatmap with side colours

I want to create a heatmap using the heatmap.2 function from the gplots package. This is a minimal example.
require(gplots)
# create symmetric matrix
x = matrix(rnorm(100), nrow=10)
diag(x) <- 1
x[upper.tri(x)] <- t(x)[upper.tri(x)]
colnames(x) <- rownames(x) <- letters[1:nrow(x)]
# create side colours
varcols = setNames(rainbow(nrow(x)), rownames(x))
# create heatmap
heatmap.2(x,
          symm = TRUE,
          trace = "none",
          revC = TRUE,  # <-- THIS IS THE PROBLEM
          ColSideColors = varcols,
          RowSideColors = varcols)
The problem is the side colors. x is a symmetric matrix, so columns and rows should get the same side colors. This works fine as long as revC = FALSE. However, when I use revC = TRUE the order of the colors is messed up. Sometimes - in small examples - it helps to reverse the ColSideColors, but that doesn't always work.
Am I doing anything wrong or is this a gplots bug?
For anyone else who comes across this problem, this is how I solved it:
thing = heatmap.2(my_matrix, ..., RowSideColors = row_cols, revC = FALSE)
ordinary_order = thing$rowInd
reversal = cbind(ordinary_order, rev(ordinary_order))
rev_col = row_cols[reversal[, 2]]
rev_col = rev_col[order(reversal[, 1])]
heatmap.2(my_matrix, RowSideColors = rev_col, revC = TRUE)
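Applied to the reproducible example from the question, that workaround looks roughly like this (my own adaptation of the steps above, so treat it as a sketch):
# first pass without revC, only to obtain the row ordering from the dendrogram
out <- heatmap.2(x, symm = TRUE, trace = "none", revC = FALSE,
                 ColSideColors = varcols, RowSideColors = varcols)
ord <- out$rowInd

# re-map the row side colours so they match the reversed row order
rev_cols <- varcols[rev(ord)][order(ord)]

heatmap.2(x, symm = TRUE, trace = "none", revC = TRUE,
          ColSideColors = varcols, RowSideColors = rev_cols)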

theta.sparse error with lorDIF

I was wondering whether anyone can help me out.
I am trying to run a DIF analysis on my data but keep getting a theta.sparse error, which I am unsure how to fix. I would really appreciate any help you can give me.
library(lordif)
dat<- read.csv2("OPSO.csv",header=TRUE)
datgender <- read.csv2("DATA.csv",header=TRUE)
group<-datgender$Gender
sink("outputDIFopso.txt")
gender.difopso <- lordif(dat, group, selection = NULL,
criterion = c("Chisqr", "R2", "Beta"),
pseudo.R2 = c("McFadden", "Nagelkerke", "CoxSnell"), alpha = 0.01,
beta.change = 0.1, R2.change = 0.02, maxIter = 10, minCell = 5,
minTheta = -4, maxTheta = 4, inc = 0.1, control = list(), model = "GRM",
anchor = NULL, MonteCarlo = FALSE, nr = 100)
print(gender.difopso)
summary(gender.difopso)
sink()
pdf("graphtestop.pdf")
plot(gender.difopso)
dev.off()
dev.off()
Error in lordif(dat, group, selection = NULL, criterion = c("Chisqr", :
object 'theta.sparse' not found
Thank you :)
You should check the output just before that error; it will probably say that no items were flagged for DIF. When that's the case, you can simply run the mirt function yourself and extract the theta and ipar objects as needed.
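For example, a minimal sketch of that fallback (assuming dat holds the item responses and the same graded response model as in the lordif call):
library(mirt)

# fit a graded response model to the pooled sample;
# with no DIF, a single model for everyone is sufficient
fit <- mirt(dat, model = 1, itemtype = "graded")

theta <- fscores(fit)                                      # person parameters
ipar  <- coef(fit, IRTpars = TRUE, simplify = TRUE)$items  # item parameters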
The author could add some case handling for when compare(flags, flags.matrix) is TRUE. At the very least, a warning seems to be missing for the case where no items show DIF, in the same way the code warns when all of them do:
if (ndif == ni) {
warning("all items got flagged for DIF - stopping\n")
}
but there is no corresponding handling when ndif == 0, even though compare(flags, flag.matrix) evaluates to TRUE.
The implication when all or none of the items show DIF is that you would get the same results (the same ICC plots, the same inferences, etc.) by fitting mirt to the combined sample (no DIF) or separate mirt models for each group (all DIF). So it is a valid time-saving shortcut to bypass lordif when it breaks down in this way.

R 'fdrtool' package: how to use t statistic

Can one use the t statistics from a Student's t-test directly in the fdrtool() function of the fdrtool package (ver. 1.2.12)? The paper (Strimmer K., BMC Bioinformatics 2008, 9:303) mentions this, but as far as I can see the statistic parameter only recognizes "normal", "correlation" and "pvalue". Is there a workaround for a non-statistician?
I think it's a typo.
I took a look at the source for the fdrtool function and found that the statistic argument first gets passed through match.arg and then to fdrtool:::get.nullmodel.
Then, lo and behold:
args(fdrtool:::get.nullmodel)
# function (statistic = c("normal", "correlation", "pvalue", "studentt"))
# NULL
and indeed there is a fully-implemented case in that function for the student t:
if (statistic == "studentt") {
    f0 = function(x, param, log = FALSE) {
        return(dt(x, df = param, log = log))
    }
    F0 = function(x, param) {
        return(pt(x, df = param))
    }
    iqr = function(param) {
        return(qt(0.75, df = param) - qt(0.25, df = param))
    }
    get.support = function() return(c(1, 1000))
}
Now, before I tell you how to access this option, I want to warn you that it's very possible it was disabled on purpose. I can't imagine why, because at first glance it seems like it should work fine. But if you're planning to use this in a research result you ought to document the fact that this was essentially a "hidden" option and that you had to do some hacking to access it. Moreover, I haven't actually tested this on my computer, so beware of typos.
Now, as for that hacking, the easiest way to get this to work would be to first simply type fdrtool into the R console. Then, copy and paste the output to a new R script (or use sink if you're fancy like that). The first few lines should look like:
function (x, statistic = c("normal", "correlation", "pvalue"),
plot = TRUE, color.figure = TRUE, verbose = TRUE, cutoff.method = c("fndr",
"pct0", "locfdr"), pct0 = 0.75)
{
statistic = match.arg(statistic)
...
Then all you have to do is change c("normal", "correlation", "pvalue") to c("normal", "correlation", "pvalue", "studentt"). That is, the first few lines should now look like
function (x, statistic = c("normal", "correlation", "pvalue", "studentt"),
plot = TRUE, color.figure = TRUE, verbose = TRUE, cutoff.method = c("fndr",
"pct0", "locfdr"), pct0 = 0.75)
{
statistic = match.arg(statistic)
...
Finally, reassign this function to fdrtool (don't worry, this won't break the underlying package, it will just act like a "mask" until you remove it with rm):
fdrtool <- function (x, statistic = c("normal", "correlation", "pvalue", "studentt"),
plot = TRUE, color.figure = TRUE, verbose = TRUE, cutoff.method = c("fndr",
"pct0", "locfdr"), pct0 = 0.75)
{
statistic = match.arg(statistic)
...
And run the whole thing or source the script. Then you should be good to go.
It turns out that the maintainer of the package, Korbinian Strimmer, disabled the t-score based option on purpose, because it has been used incorrectly too often.
Prof. Strimmer is a nice guy and responded to my help request quickly and very comprehensively. This is what he suggests: in practice, t-scores often do not follow a t-distribution but show over- or underdispersion instead, which is why it is better to use the normal option.
Before doing that, however, you have to center your data:
z.centered = z - median(z)
fdrtool(z.centered, statistic = "normal")
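A small self-contained illustration of that recipe (the t-scores below are simulated stand-ins, so this is only a sketch):
library(fdrtool)

set.seed(1)
# stand-in for observed t-statistics: mostly null, a few shifted
z <- c(rnorm(900), rnorm(100, mean = 3))

z.centered <- z - median(z)
res <- fdrtool(z.centered, statistic = "normal")

head(res$qval)  # tail-area FDR (q-values)
head(res$lfdr)  # local FDR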

In R 2.14.1, what does "wrong sign in 'by' argument" mean?

Here is my code
slidingwindowplotATGC = function(windowsize, inputseq)
{
    starts = seq(1, length(inputseq) - windowsize, by = windowsize)
    n = length(starts)
    chunkGs = numeric(n)
    chunkAs = numeric(n)
    chunkTs = numeric(n)
    chunkCs = numeric(n)
    for (i in 1:n) {
        chunk = windowsize[starts[i]:(starts[i] + 9999)]
        chunkG = sum("g" == chunk) / length(chunk)
        chunkA = sum("a" == chunk) / length(chunk)
        chunkT = sum("t" == chunk) / length(chunk)
        chunkC = sum("c" == chunk) / length(chunk)
        chunkGs[i] = chunkG
        chunkAs[i] = chunkA
        chunkTs[i] = chunkT
        chunkCs[i] = chunkC
    }
    plot(starts, chunkGs, type = "b",
         ylim = c(min(min(chunkAs), min(chunkTs), min(chunkCs), min(chunkGs)),
                  max(max(chunkAs), max(chunkTs), max(chunkCs), max(chunkGs))),
         col = "red")
    points(starts, chunkTs, col = "blue")
    points(starts, chunkAs, col = "green")
    points(starts, chunkCs)
}
I'm getting the following error message:
Error in seq.default(1, length(inputseq) - windowsize, by = windowsize) :
wrong sign in 'by' argument
which I never got before when running code of this sort. In fact, I re-ran old code that worked perfectly before, and this time I'm getting this error message, which doesn't seem to make any sense at all! I need help with this before I go completely insane... Maybe I'm just bad at this program, but it seems to me that it has a mind of its own... I was also getting an error message before regarding ylim, stating that it needed to be a finite value, which is what I was giving it? HELP!!!
Change
starts = seq(1, length(inputseq)-windowsize, by = windowsize)
to
starts = seq(1, nchar(inputseq)-windowsize, by = windowsize)
assuming inputseq is a single character string, such as
slidingwindowplotATGC(3, "ATAGACGATACGATACCCCGAGGGTAGGTA")
ETA: Aside from that difference, there are some very serious issues with how you are using character vectors. For example:
windowsize[starts[i]:(starts[i]+9999)]
Why does it look like you are selecting from windowsize, which is just an integer of your window size? Were you trying to select from inputseq?
Even if you were selecting from inputseq, the way to do that with a string is substr(inputseq, start, stop) (see the short sketch after these points).
Where does the starts[i]+9999 come from? Do you mean starts[i]+windowsize?
You should start over and carefully consider what you are trying to do, and learn the right tools to do it within R.
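Here is a small sketch of that substr() approach (my addition, not the answerer's code), before the fuller rewrite below:
inputseq   <- "aagaaaagatcaaagaccagccgccccaccccccagagccccccc"
windowsize <- 10

# non-overlapping windows, as in the original intent
starts <- seq(1, nchar(inputseq) - windowsize + 1, by = windowsize)
chunks <- sapply(starts, function(st) substr(inputseq, st, st + windowsize - 1))

# proportion of "g" in each window
sapply(strsplit(chunks, ""), function(ch) mean(ch == "g"))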
ETA: Here is a proposed rewrite of what you're trying to do (you'll need to install the zoo package first):
library(zoo)
slidingwindowplotATGC = function(windowsize, inputseq)
{
    print(nchar(inputseq) - windowsize)
    s = strsplit(inputseq, "")[[1]]
    starts = seq(1, nchar(inputseq) - windowsize, by = windowsize)
    n = length(starts)
    letters = c("a", "c", "g", "t")
    colors = c("green", "black", "red", "blue")
    counts = t(sapply(letters, function(l)
        rollapply(s, windowsize, function(x) mean(x == l))))
    plot(counts[1, ], type = "l", col = colors[1])
    for (i in 2:4) {
        points(counts[i, ], type = "l", col = colors[i])
    }
    print(counts)
}
slidingwindowplotATGC(10, "aagaaaagatcaaagaccagccgccccaccccccagagccccccc")
This should get you most of the way there. After that, you're on your own ;-)
A further condensation. You need to specify windowsize (width of the window) and by (periodicity of sampling) separately, although I think you want them to be the same (i.e. chop the sequence into exclusive chunks); if you want a sliding window you could use by = 1.
The error you're seeing above is most likely occurring because for some reason windowsize is greater than nchar(inputseq).
library(zoo)  # for rollapply()

slidingwindowplotATGC = function(windowsize, by, inputseq) {
    s = strsplit(inputseq, "")[[1]]
    colors = c("green", "black", "red", "blue")
    counts = rollapply(factor(s), width = windowsize, by = by, table)
    matplot(counts, type = "l", lty = 1, col = colors)
    counts
}
itest <- "aagaaaagatcaaagaccagccgccccaccccccagagccccccc"
slidingwindowplotATGC(10, 10, itest)  # windowsize and by both set to 10
You should also check Bioconductor -- it's extremely likely that there's efficient code in there somewhere for doing this sort of summary.
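For instance, the Biostrings package on Bioconductor has a sliding-window letter counter; something along these lines should work (my sketch, check the documentation for the exact interface):
library(Biostrings)

dna   <- DNAString("AAGAAAAGATCAAAGACCAGCCGCCCCACCCCCCAGAGCCCCCCC")
freqs <- letterFrequencyInSlidingView(dna, view.width = 10,
                                      letters = c("A", "C", "G", "T"),
                                      as.prob = TRUE)
matplot(freqs, type = "l", lty = 1, col = c("green", "black", "red", "blue"))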
