Mosaic plot and text values - r

I created structable from Titanic dataset and used mosaic function for it. Everything worked great, hovewer I also wanted to label each box from mosaic plot with quantity of titanic passangers given their Class, Survival and Sex. As it turns out, I am not able to do that. I know I need to use labeling_cells to achive that, hovewer i am not able to use it (and i wan't able to find any example) in combination with stuctable and below code.
library("vcd")
struct <- structable(~ Class + Survived + Sex, data = Titanic)
mosaic(struct, data = Titanic, shade = TRUE, direction = "v")

If I understand your question correctly, then the last example in ?labeling_cells is pretty close to what you want to do. Using your example, the labeling_cells() can be added afterwards provided that the viewport tree is not popped. The only aspect that is somewhat awkward is that the struct object has to be a regular table again for the labeling. I have to ask David, the main author, whether this could be handled automatically.
mosaic(struct, shade = TRUE, direction = "v", pop = FALSE)
labeling_cells(text = as.table(struct), margin = 0)(as.table(struct))

Fixed in upstream in vcd 1.4-4, but note that you can simply use
mosaic(struct, labeling = labeling_values)

Related

R: How does stat_density_ridges modify and plot data?

<Disclaimer(s) - (1) This is my first post, so please be gentle, specifically regarding formatting and (2) I did try to dig as much as I could on this topic before posting the question here>
I have a simple data vector containing returns of 40 portfolios on the same day:
Year Return
Now -17.39862061
Now -12.98954582
Now -12.98954582
Now -12.86928749
Now -12.37044334
Now -11.07007504
Now -10.68971539
Now -10.07578182
Now -9.984867096
Now -8.764036179
Now -8.698093414
Now -8.594026566
Now -8.193638802
Now -7.818599701
Now -7.622627735
Now -7.535216808
Now -7.391239166
Now -7.331315517
Now -5.58059597
Now -5.579797268
Now -4.525201797
Now -3.735909224
Now -2.687532902
Now -2.65363884
Now -2.177522898
Now -1.977644682
Now -1.353205681
Now -0.042584345
Now 0.096564181
Now 0.275416046
Now 0.638839543
Now 1.959529042
Now 3.715519428
Now 4.842819691
Now 5.475946426
Now 6.380955219
Now 6.535937309
Now 8.421762466
Now 8.556800842
Now 10.39185524
I am trying to plot these returns to compare versus other days (so the rest of my history e.g.). I tried to use stat_density_ridges as per the code block below
ggplot(data = data.plot, aes(x = Return, y = Year, fill = factor(..quantile..))) +
stat_density_ridges(geom = "density_ridges_gradient",calc_ecdf = TRUE,
quantiles = c(0.025, 0.5, 0.975),
quantile_lines = TRUE)
As you can see - the "year" in this case is the same i.e. there is no height parameter, yet I get a nice ridg(y) chart. While the chart is beautiful to behold, and very very awesome, I am at a loss to determine how the plotting function is computing the density in this case, specially the height.
This is the output chart I get (I have omitted the formatting code here since it doesn't make a difference to my question):
Portfolio Return Distribution Plots - US versus Europe
I tried digging into the code of the function itself, but came up with a total blank. The documentation didn't help (except perhaps give me a hint that the function plots continous distributions).
Any help, or guidance, or even a nudge in the right direction would be extremely helpful.

Add text to a ggpairs() scatterplot?

dumb but maddening question: How can I add text labels to my scatterplot points in a ggpairs(...) plot? ggpairs(...) is from the GGally library. The normal geom_text(...) function doesn't seem to be an option, as it take x,y arguments and ggpairs creates an NxN matrix of differently-styled plots.
Not showing data, but imagine I have a column called "ID" with id's of each point that's displayed in the scatterplots.
Happy to add data if it helps, but not sure it's necessary. And maybe the answer is simply that it isn't possible to add text labels to ggpairs(...)?
library(ggplot2)
library(GGally)
ggpairs(hwWrld[, c(2,6,4)], method = "pearson")
Note: Adding labels is for my personal reference. So no need to tell me it would look like an absolute mess. It will. I'm just looking to identify my outliers.
Thanks!
It is most certainly possible. Looking at the documentation for ?GGally::ggpairs there are three arguments, upper, lower and diag, which from the details of the documentations are
Upper and lower are lists that may contain the variables 'continuous', 'combo', 'discrete' and 'na'. Each element of thhe list may be a function or a string
... (more description)
If a function is supplied as an option to upper, lower, or diag, it should implement the function api of function(data, mapping, ...){#make ggplot2 plot}. If a specific function needs its parameters set, wrap(fn, param1 = val1, param2 = val2) the function with its parameters.
Thus a way to "make a label" would be to overwrite the default value of a plot. For example if we wanted to write "hello world" in the upper triangle we could do something like:
library(ggplot2)
library(GGally)
#' Plot continuous upper function, by adding text to the standard plot
#' text is placed straight in the middle, over anything already residing there!
continuous_upper_plot <- function(data, mapping, text, ...){
p <- ggally_cor(data, mapping, ...)
if(!is.data.frame(text))
text <- data.frame(text = text)
lims <- layer_scales(p)
p + geom_label(data = text, aes(x = mean(lims$x$range$range),
y = mean(lims$y$range$range),
label = text),
inherit.aes = FALSE)
}
ggpairs(iris, upper = list(continuous = wrap(continuous_upper_plot,
text = 'hello world')))
with the end result being:
There are 3 things to note here:
I've decided to add the text in the function itself. If your text is part of your existing data, simply using the mapping (aes) argument when calling the function will suffice. And this is likely also better, as you are looking to add text to specific points.
If you have any additional arguments to a function (outside data and mapping) you will need to use wrap to add these to the call.
The function documentation specifically says that arguments should be data, mapping rather than the standard for ggplot2 which is mapping, data. As such for any of the ggplot functions a small wrapper switching their positions will be necessary to overwrite the default arguments for ggpairs.

Can I arrange my study labels in a forest plot using study years after specifying the byvar to be something different? (Meta package-R)

Can I arrange all my study labels within subgroups in my forest plot with the year of publication after specifying that I want subgroups divided by a certain variable?
Here is the code I am currently using.
brugia.forest <- metaprop(event = no.positive, n = no.tested, studlab = studylabel, data = brugia, byvar = diagnostics, bylab = c("direct detection", "direct and indirect detection", "indirect detection"), print.byvar = F, sm = "PLO", method.tau = "REML", title = "", hakn = T)
I would like the studies within the "diagnostics" groups to be arranged from the oldest to the most recent and not alphabetically as is currently the case. I am using the meta package of R because of its user-friendliness and would like to continue using it (so, metafor suggestions may not be too helpful)
Thanks.
I would like to answer this question because the creator of the meta package, Dr. Guido Schwarzer was kind to answer the question via email. Here is the way forward:
By default, the forest function does not sort the studies at all,
instead the order of the dataset is used. One can therefore order the dataset before utilising it for the forestplot first.
Alternatively, one can use the 'sortvar' function to change the order of studies and specify the variable one wants to sort the studies by.
Hope this helps.
Another option is to use ggplot if you are familiar with the ggplot package. Gives a lot of flexibility in arranging and modifying the plots

Node labels on circular phylogenetic tree

I am trying to create circular phylogenetic tree. I have this part of code:
fit<- hclust(dist(Data[,-4]), method = "complete", members = NULL)
nclus= 3
color=c('red','blue','green')
color_list=rep(color,nclus/length(color))
clus=cutree(fit,nclus)
plot(as.phylo(fit),type='fan',tip.color=color_list[clus],label.offset=0.2,no.margin=TRUE, cex=0.70, show.node.label = TRUE)
And this is result:
Also I am trying to show label for each node and to color branches. Any suggestion how to do that?
Thanks!
When you say "color branches" I assume you mean color the edges. This seems to work, but I have to think there's a better way.
Using the built-in mtcars dataset here, since you did not provide your data.
plot.fan <- function(hc, nclus=3) {
palette <- c('red','blue','green','orange','black')[1:nclus]
clus <-cutree(hc,nclus)
X <- as.phylo(hc)
edge.clus <- sapply(1:nclus,function(i)max(which(X$edge[,2] %in% which(clus==i))))
order <- order(edge.clus)
edge.clus <- c(min(edge.clus),diff(sort(edge.clus)))
edge.clus <- rep(order,edge.clus)
plot(X,type='fan',
tip.color=palette[clus],edge.color=palette[edge.clus],
label.offset=0.2,no.margin=TRUE, cex=0.70)
}
fit <- hclust(dist(mtcars[,c("mpg","hp","wt","disp")]))
plot.fan(fit,3); plot.fan(fit,5)
Regarding "label the nodes", if you mean label the tips, it looks like you've already done that. If you want different labels, unfortunately, unlike plot.hclust(...) the labels=... argument is rejected. You could experiment with the tiplabels(....) function, but it does not seem to work very well with type="fan". The labels come from the row names of Data, so your best bet IMO is to change the row names prior to clustering.
If you actually mean label the nodes (the connection points between the edges, have a look at nodelabels(...). I don't provide a working example because I can't imagine what labels you would put there.

How to plot a large ctree() to avoid overlapping nodes

When I plotted the decision tree result from ctree() from party package, the font was too big and the box was also too big. They are overlapping other nodes.
Is there a way to customize the output from plot() so that the box and the font would be smaller ?
The short answer seems to be, no, you cannot change the font size, but there are some good other options.
I know of three possible solutions. First, you can change other parameters in the plot to make it more compact. Second, you can write it to a graphic file and view that file. Third, you can use an alternative implementation of ctree() in the partykit package, which is a newer package by some of the same authors.
Default Plot Example
library(party)
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
controls = ctree_control(maxsurrogate = 3))
plot(airct) #default plot, some crowding with N hidden on leafs
Simplified plot
# simpler version of plot
plot(airct, type="simple", # no terminal plots
inner_panel=node_inner(airct,
abbreviate = TRUE, # short variable names
pval = FALSE, # no p-values
id = FALSE), # no id of node
terminal_panel=node_terminal(airct,
abbreviate = TRUE,
digits = 1, # few digits on numbers
fill = c("white"), # make box white not grey
id = FALSE)
)
This is somewhat better and one might be able to improve it further. To figure out these details, I originally did class(airct) which returned "BinaryTree". Armed with this info, I started reading ?plot.BinaryTree
Write to a file
A second simple solution is to write the plot to a file and then view the file. You may need to play with the settings to find the best fit.
png("airct.png", res=80, height=800, width=1600)
plot(airct)
dev.off()
Plot with partykit package instead
Finally, you can use a newer and not-yet-finished re-implementation of the party package by some of the same authors. At this point (Dec 2012), the only function they have re-done is ctree(). This version allows you to change font size.
library(partykit)
airct <- ctree(Ozone ~ ., data = airq)
class(airct) # different class from before
# "constparty" "party"
plot(airct, gp = gpar(fontsize = 6), # font size changed to 6
inner_panel=node_inner,
ip_args=list(
abbreviate = TRUE,
id = FALSE)
)
Here I have left the leafs in their default setting because I have frankly never figured out how to get it to work the way I want. I suspect this has to do with the fact that the package is incomplete (as of Dec 2012). You can read about the plot method starting with ?plot.party
Another option (that doesn't change what you want but does potentially solve the underlying problem) is to change the size of the figure itself, as I learned in my class for my assignment.
Replace the r in the below:
{r}
with:
{r, fig.width=X, fig.height=Y}
where the X and Y need to be replaced by numbers chosen by you depending on what size you think works better.
This website, talks about doing this in more detail and universally throughout the document.

Resources