Profiling in RStudio: profvis() not giving desired outputs

I am trying to learn how to profile in R and need some help. I am using the profvis() function and following the examples both in Chapter 7 of Colin Gillespie's Efficient R Programming and on the RStudio support page for profiling: https://support.rstudio.com/hc/en-us/articles/218221837-Profiling-with-RStudio.
However, when I run the same code as their examples, I am not getting the same outputs. For example:
library("profvis")
profvis({
data(movies, package = "ggplot2movies") # Load data
movies = movies[movies$Comedy == 1,]
plot(movies$year, movies$rating)
model = loess(rating ~ year, data = movies) # loess regression line
j = order(movies$year)
lines(movies$year[j], model$fitted[j]) # Add line to the plot
})
I should be getting an output like this:
[Image: expected profvis output from Efficient R Programming]
but instead I am getting an output like this:
[Image: my profvis output]
In fact, this output isn't always consistent (for example, sometimes it reports that sources are not available), but it always has "profvis" at the bottom of the flame graph, which is not the case in any of the examples I've seen.
I have a similar problem when running the example from the RStudio support page:
library(profvis)
profvis({
data(diamonds, package = "ggplot2")
plot(price ~ carat, data = diamonds)
m <- lm(price ~ carat, data = diamonds)
abline(m, col = "red")
})
[Image: expected output from the RStudio support page]
But after running the code three times in a row, here are my three outputs, all with profvis in the flame graph:
[Image: output 1]
[Image: output 2]
[Image: output 3]
Also, one last note: I have fully updated R and RStudio and this did not help.
I am very confused as to why this is happening, and can't find any examples of this online. I hope I'm not missing anything obvious, and would greatly appreciate any help.
Thanks.
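profvis is a sampling profiler built on base R's Rprof(): it records the call stack every interval seconds (profvis() defaults to interval = 0.01), so expressions that finish in a fraction of a second yield only a handful of samples. That may be part of why the flame graph differs between runs. As a cross-check, assuming the variability is a sampling effect, you could profile a longer workload directly with Rprof() and summarise it with summaryRprof(). Repeating lm() on the built-in mtcars data is an illustrative stand-in here, not the book's example. profvis() itself also accepts a smaller interval, e.g. profvis({ ... }, interval = 0.005).

```r
# Sketch: profile a workload with base R's sampling profiler (Rprof),
# the mechanism profvis uses internally.
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.005)  # take a sample every 5 ms
for (i in 1:1000) {
  # Illustrative stand-in workload, repeated so the profiler
  # collects enough samples to be stable between runs
  fit <- lm(mpg ~ wt + hp, data = mtcars)
}
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)                 # functions ranked by total time
prof$sampling.time                  # seconds of profiled run time
```

If the ranking in by.total stays roughly the same across repeated runs, the sampling itself is stable for a workload of that length.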

Related

Prevent plot.gam from producing a figure

Say, I have a GAM that looks like this:
# Load library
library(mgcv)
# Load data
data(mtcars)
# Model for mpg
mpg.gam <- gam(mpg ~ s(hp) + s(wt), data = mtcars)
Now, I'd like to plot the GAM using ggplot2. So, I use plot.gam to produce all the information I need, like this:
foo <- plot(mpg.gam)
This also generates an unwanted figure. (Yes, I realise that I'm complaining that a plotting function plots something...) When using visreg in the same way, I'd simply specify plot = FALSE to suppress the figure, but plot.gam doesn't seem to have this option. My first thought was perhaps invisible would do the job (e.g., invisible(foo <- plot(mpg.gam))), but that didn't seem to work. Is there an easy way of doing this without outputting the unwanted figure to file?
Okay, so I finally figured it out 5 minutes after posting this. There is an option to select which term to plot (e.g., select = 1 is the first term, select = 2 is the second), although the default behaviour is to plot all terms. If, however, I use select = 0 it doesn't plot anything and doesn't give an error, yet returns exactly the same information. Check it out:
# Load library
library(mgcv)
# Load data
data(mtcars)
# Model for mpg
mpg.gam <- gam(mpg ~ s(hp) + s(wt), data = mtcars)
# Produces figures for all terms
foo1 <- plot(mpg.gam)
# Doesn't produce figures
foo2 <- plot(mpg.gam, select = 0)
# Compare objects
identical(foo1, foo2)
[1] TRUE
Bonza!
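Another option, which works for any function that plots as a side effect, is to send the figure to a null device: pdf(NULL) opens a PDF device that writes no file, so plot.gam() still returns its data list while nothing is displayed. The sketch below also collects the s(hp) curve into a data frame ready for ggplot2; it assumes each element of the returned list carries x and fit components, as the 1-D smooths in this model do. Note that fit is the centred partial effect of the smooth, not predicted mpg.

```r
library(mgcv)

data(mtcars)
mpg.gam <- gam(mpg ~ s(hp) + s(wt), data = mtcars)

# Draw into a device that produces no output, then close it
pdf(NULL)
foo3 <- plot(mpg.gam)
invisible(dev.off())

# One list element per smooth term: s(hp) and s(wt)
length(foo3)

# Collect the s(hp) curve into a data frame for ggplot2
hp_df <- data.frame(hp  = as.vector(foo3[[1]]$x),
                    fit = as.vector(foo3[[1]]$fit))
head(hp_df)
```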

How to control plot layout for lmerTest output results?

I am using lme4 and lmerTest to run a mixed model and then use backward variable elimination (step) for my model. This seems to work well. After running the 'step' function in lmerTest, I plot the final model. The 'plot' results appear similar to ggplot2 output.
I would like to change the layout of the plot. The obvious answer is to do it myself, creating the plot(s) from scratch with ggplot2. If possible, though, I would like to simply change the layout of the output so that each plot (i.e. each plotted variable in the final model) is in its own row.
See the code and plot below for my results. Note that the plot has three columns and I would like three rows. I have not provided sample data (let me know if I need to!).
library(lme4)
library(lmerTest)
# Full model
Female.Survival.model.1 <- lmer(Survival.Female ~ Location + Substrate + Location:Substrate + (1|Replicate), data = Transplant.Survival, REML = TRUE)
# lmerTest - backward stepwise elimination of fixed-effect terms
Female.Survival.model.ST <- step(Female.Survival.model.1, reduce.fixed = TRUE, reduce.random = FALSE, ddf = "Kenward-Roger" )
Female.Survival.model.ST
plot(Female.Survival.model.ST)
The function that creates these plots is called plotLSMEANS. You can look at the code for the function via lmerTest:::plotLSMEANS. The reason to look at the code is 1) to verify that, indeed, the plots are based on ggplot2 code and 2) to see if you can figure out what needs to be changed to get what you want.
In this case, it sounds like you want facet_wrap to use one column instead of three. I tested with the example from the help page for the lmerTest function step, and it looks like you can simply add a new facet_wrap layer to the plot.
library(ggplot2)
plot(Female.Survival.model.ST) +
facet_wrap(~namesforplots, scales = "free", ncol = 1)
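Adding facet_wrap() works because a ggplot object holds a single facet specification, so a facet added later replaces the earlier one. A small self-contained illustration on the built-in mtcars data, standing in for the lmerTest plot (which is assumed here to be an ordinary ggplot object):

```r
library(ggplot2)

# A plot faceted into three columns, like the default layout in the question
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_wrap(~cyl, ncol = 3)

# Adding a second facet_wrap() replaces the first: one panel per row
p1 <- p + facet_wrap(~cyl, scales = "free", ncol = 1)
print(p1)
```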
Try this: plot(difflsmeans(Female.Survival.model.ST$model, test.effs = "Location"))

R programming - Graphic edges too large error while using clustering.plot in EMA package

I'm an R programming beginner and I'm trying to use the clustering.plot method from the R package EMA. My clustering works fine and I can see the results. However, when I try to generate a heat map using clustering.plot, I get the error "Error in plot.new (): graphic edges too large". My code is below:
#Loading library
library(EMA)
library(colonCA)
#Some information about the data
data(colonCA)
summary(colonCA)
class(colonCA) #Expression set
#Extract expression matrix from colonCA
expr_mat <- exprs(colonCA)
#Applying average linkage clustering on colonCA data using Pearson correlation
expr_genes <- genes.selection(expr_mat, thres.num=100)
expr_sample <- clustering(expr_mat[expr_genes,],metric = "pearson",method = "average")
expr_gene <- clustering(data = t(expr_mat[expr_genes,]),metric = "pearson",method = "average")
expr_clust <- clustering.plot(tree = expr_sample,tree.sup=expr_gene,data=expr_mat[expr_genes,],title = "Heat map of clustering",trim.heatmap =1)
I do not get any error when it comes to actually executing the clustering process. Could someone help?
In your example, some of the rownames of expr_mat are very long (max(nchar(rownames(expr_mat))) is 271 characters). The clustering.plot function tries to make a margin large enough for all the names, but because the names are so long, there isn't room for anything else.
The really long names seem to contain long runs of periods. One way to condense them is to replace each run of two or more periods with a single one, so I would add this line right after extracting the expression matrix:
#Extract expression matrix from colonCA
expr_mat <- exprs(colonCA)
rownames(expr_mat)<-gsub("\\.{2,}","\\.", rownames(expr_mat))
Then you can run all the other commands and plot like normal.
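To check the effect of the substitution before plotting, you can compare name lengths before and after. The probe names below are made up for illustration; with the real data you would use rownames(expr_mat). If some names are still too long after collapsing the periods, strtrim() can cap them at a fixed width, and make.unique() keeps the shortened names distinct.

```r
# Hypothetical probe names with long runs of periods
long_names <- c("GENE1..........probe", "ABC...X", "short.name")

# Collapse every run of two or more periods into a single period
clean <- gsub("\\.{2,}", ".", long_names)
clean  # c("GENE1.probe", "ABC.X", "short.name")

# Longest name before and after
max(nchar(long_names))  # 20
max(nchar(clean))       # 11

# Optionally cap names at 8 characters, keeping them unique
capped <- make.unique(strtrim(clean, 8))
capped
```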

Using panel.mathdensity and panel.densityplot in lattice graphics to plot Bayesian prior and posterior

I am trying to plot a Bayesian prior and posterior distribution using lattice graphics. I would like to have both distributions in one panel, for direct comparison.
I've tried different solutions all day, including qqmath, but couldn't get any of them to work. Here's the attempt that has been most successful so far:
library(lattice)
# my data
d <- dgamma(seq(from = 0.00001, to = 0.01, by = 0.00001), shape = .1, scale = .01)
# my plot
densityplot(~d,
  plot.points = FALSE,
  panel = function(x, ...) {
    panel.densityplot(x, ...)
    panel.mathdensity(
      dmath = dgamma,
      args = list(shape = .1, scale = .01)
    )
  }
)
Even though the code runs through nicely, it doesn't do what I want it to. It plots the posterior (d) but not the prior.
I added stop("foo") as an argument to densityplot(...) so that execution would stop with an error, and searched online for the resulting error message:
Error in eval(substitute(groups), data, environment(formula)) : foo
But there are only a few results and they seem unrelated to me.
So, here's my question: Can anyone help me with this approach to achieve what I want?
I asked a similar question which leads to the same result. I got an answer and it was useful. You can find everything here
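For reference, here is a minimal sketch of a prior-plus-posterior panel. One thing worth checking in the original attempt: densityplot() expects random draws, whereas d holds values of dgamma() evaluated on a grid, so the curve it draws is the density of those density values rather than the posterior. The sketch uses simulated draws from a hypothetical Gamma(shape = 2, rate = 1) posterior and a hypothetical Gamma(shape = 1, rate = 0.5) prior. panel.mathdensity() evaluates the analytic prior over the panel's current x-range; if the prior still does not appear, its curve may fall outside the automatic y-limits, and setting ylim explicitly may help.

```r
library(lattice)

set.seed(42)
# Hypothetical posterior draws; in practice these would come from
# your sampler or from a conjugate update
post <- rgamma(2000, shape = 2, rate = 1)

p <- densityplot(~ post,
  plot.points = FALSE,
  xlab = "theta",
  panel = function(x, ...) {
    # Kernel density estimate of the posterior draws
    panel.densityplot(x, ...)
    # Analytic prior density over the panel's x-range
    panel.mathdensity(dmath = dgamma,
                      args = list(shape = 1, rate = 0.5),
                      n = 200, col = "red")
  })
print(p)
```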

Getting statistics for nodes from a regression tree in the party package

I am using the party package in R.
I would like to get various statistics (mean, median, etc) from various nodes of the resultant tree, but I cannot see how to do this. For example
library(party)
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(maxsurrogate = 3))
airct
plot(airct)
results in a tree with 4 terminal nodes. How would I get the mean airquality for each of those nodes?
I can't tell which variable in the node holds the air quality, but here is how to customize your tree plot:
innerWeights <- function(node){
grid.circle(gp = gpar(fill = "White", col = 1))
mainlab <- node$psplit$variableName
label <- paste(mainlab,paste('prediction=',round(node$prediction,2) ,sep= ''),sep= '\n')
grid.text( label= label,gp = gpar(col='red'))
}
plot(airct, inner_panel = innerWeights)
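Edit: to get statistics by node
library(gridExtra)
innerWeights <- function(node){
  # Test statistics for the split, rounded to 2 decimals
  # (round() replaces plyr's round_any(), which was never loaded)
  dat <- round(node$criterion$statistic, 2)
  grid.table(t(dat))
}
plot(airct, inner_panel = innerWeights)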
This is surprisingly harder than I thought. Try something like this:
a <- by(airq, where(airct), colMeans, na.rm = TRUE) # swap colMeans for any function you like
a
a$"3" #access at node three
a[["3"]] #same thing
You might find some other useful examples with ?`BinaryTree-class`.
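The same per-node grouping can also be done with base R's tapply() or aggregate(), using where() to get each observation's terminal node. A minimal sketch, assuming the party package is installed; the mean/median/count summary is just an example of "whatever function you desire".

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(maxsurrogate = 3))

# Terminal node id for every observation
nodes <- where(airct)

# Mean Ozone per terminal node (named by node id)
node_means <- tapply(airq$Ozone, nodes, mean)
node_means

# Several statistics at once with aggregate()
node_stats <- aggregate(Ozone ~ node,
  data = data.frame(Ozone = airq$Ozone, node = nodes),
  FUN = function(z) c(mean = mean(z), median = median(z), n = length(z)))
node_stats
```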
How to get there if you are lost in R-space (and the documentation does not help you immediately)
First, try str(airct). The output is a bit lengthy since the results are complex, but for simpler cases, e.g. a t-test, this is all you need.
Since print(airct), or simply airct, gives quite useful information, how does print work? Try class(airct) or check the documentation: the result is of class BinaryTree.
Ok, we could have seen this from the docs, and in this case the information on the BinaryTree page is good enough (see the examples on that page.)
But assume the author was lazy: then try getAnywhere(print.BinaryTree). Near the top you find y <- x@responses, so try airct@responses next.
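Those exploration steps look like this in code. BinaryTree is an S4 class, so its components are slots accessed with @ rather than $, and slotNames() lists them. The tree is refitted here so the snippet runs on its own.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(maxsurrogate = 3))

class(airct)       # the class name, BinaryTree
isS4(airct)        # TRUE: components are slots, accessed with @
slotNames(airct)   # names of all slots
getAnywhere("print.BinaryTree")  # source of the print method

# The slot the print method reads from
resp <- airct@responses
class(resp)
```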
You can also do this using the dplyr package.
First get which node each observation belongs to and store it in the dataframe.
airq$node <- where(airct)
Then use group_by to group the observations by node, and use summarise to calculate the mean of the Ozone measurement. You can swap mean out for whatever summary statistic function you like.
library(dplyr)
airq %>% group_by(node) %>% summarise(avg = mean(Ozone))
Which gives the following results.
  node      avg
 (int)    (dbl)
1    3 55.60000
2    5 18.47917
3    6 31.14286
4    8 81.63333
5    9 48.71429
