Flipping a mosaic plot in R

I have a mosaic plot, but I need to show the proportions of Countries relative to Roles, i.e. flip the chart. Is it possible to do this without transposing the table?
Thanks.

You can play with the argument sort, which determines the split order of the variables, and dir, which sets the split direction (horizontal vs. vertical). For example, both of these split by Roles first and then show the conditional proportions of Countries given Roles (either horizontally or vertically):
tab <- structure(c(12, 14, 23, 12, 26, 13), .Dim = c(3L, 2L),
.Dimnames = structure(list(
Countries = c("American", "European", "Japanese"),
Roles = c("student", "staff")),
.Names = c("Countries", "Roles")), class = "table")
mosaicplot(tab, sort = 2:1, dir = c("h", "v"))
mosaicplot(tab, sort = 2:1, dir = c("v", "h"))
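The tile sizes in these flipped plots can be checked numerically. This is a minimal base-R sketch using the same tab: when Roles is split first, the first split follows each Role's share of the total and the second split follows the proportions of Countries within each Role, which is what prop.table(tab, margin = 2) computes (each column sums to 1). The variable names role_share and cond are just illustrative.

```r
# Same table as above: Countries (rows) by Roles (columns)
tab <- structure(c(12, 14, 23, 12, 26, 13), .Dim = c(3L, 2L),
                 .Dimnames = list(Countries = c("American", "European", "Japanese"),
                                  Roles = c("student", "staff")),
                 class = "table")

# First split when Roles comes first: each Role's share of the total
role_share <- prop.table(margin.table(tab, 2))

# Second split: proportions of Countries within each Role (columns sum to 1)
cond <- prop.table(tab, margin = 2)
round(cond, 3)
```

These are the same proportions you would get by transposing the table and conditioning on rows, which is why sort = 2:1 makes the transpose unnecessary.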
Note that the mosaic() function in package vcd also comes with a formula-based interface and more display options.

Filter two tables with crosstalk

I am creating a Flexdashboard in R. I want the dashboard to contain both a table and a series of visualizations that are filtered through inputs.
As I need to deliver the dashboard locally (without a server running in the background), I can't use Shiny, so I rely on crosstalk.
I know that the crosstalk package provides limited front-end functionality. For instance, the documentation says that you can't aggregate a SharedData object.
Nonetheless, I am not sure whether I can use the same inputs to filter two different data frames.
For example, let's say I have:
Dataframe One: Contains original data
df1 <- structure(list(owner = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("John",
"Mark"), class = "factor"), hp = c(250, 120, 250, 100, 110),
car = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("benz",
"bmw"), class = "factor"), id = structure(1:5, .Label = c("car1",
"car2", "car3", "car4", "car5"), class = "factor")), .Names = c("owner",
"hp", "car", "id"), row.names = c(NA, -5L), class = "data.frame")
Dataframe Two: Contains aggregated data
df2 <- structure(list(car = structure(c(1L, 2L, 1L, 2L), .Label = c("benz",
"bmw"), class = "factor"), owner = structure(c(1L, 1L, 2L, 2L
), .Label = c("John", "Mark"), class = "factor"), freq = c(0L,
1L, 2L, 2L)), .Names = c("car", "owner", "freq"), row.names = c(NA,
-4L), class = "data.frame")
These two data frames share two columns with identical values, car and owner, and each also has additional columns of its own.
I could create two different objects:
library(crosstalk)
shared_df1 <- SharedData$new(df1)
shared_df2 <- SharedData$new(df2)
and then:
filter_select("owner", "Car owner:", shared_df1, ~ owner)
filter_select("owner", "Car owner:", shared_df2, ~ owner)
However, that would mean the user has to fill in two essentially identical inputs. Also, if the table is large, this would double the memory needed to use the dashboard.
Is it possible to work around this problem in crosstalk?
Ah, I recently ran into this too: there is another argument, SharedData$new(..., group = ), and the group argument seems to do the trick. I found this out by accident when I had two data frames and used group =.
If you make a SharedData object, it includes a data frame, a key to select rows by (preferably unique, but not necessarily), and a group name.
What I think happens is that crosstalk filters by the key for all SharedData objects in the same group. So as long as two data frames use the same key, you should be able to filter them together in one group.
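The key-based filtering described above can be mimicked in base R to see why a shared key column is enough. This is only an analogy, not crosstalk's implementation: a filter on owner yields a set of selected keys, and every data frame in the group keeps the rows whose key is in that set. The data frames are simplified character-column versions of df1 and df2, and apply_group_filter() is a hypothetical helper.

```r
# Simplified versions of df1 and df2 (character columns instead of factors)
df1 <- data.frame(owner = c("John", "Mark", "Mark", "Mark", "Mark"),
                  hp = c(250, 120, 250, 100, 110),
                  car = c("bmw", "bmw", "bmw", "benz", "benz"),
                  id = paste0("car", 1:5), stringsAsFactors = FALSE)
df2 <- data.frame(car = c("benz", "bmw", "benz", "bmw"),
                  owner = c("John", "John", "Mark", "Mark"),
                  freq = c(0L, 1L, 2L, 2L), stringsAsFactors = FALSE)

# Hypothetical helper: keep, in every member of the group, the rows whose
# key (here the owner column, like key = ~owner) is among the selected keys
apply_group_filter <- function(group, selected_keys, key_col) {
  lapply(group, function(d) d[d[[key_col]] %in% selected_keys, , drop = FALSE])
}

group <- list(df1 = df1, df2 = df2)
filtered <- apply_group_filter(group, selected_keys = "Mark", key_col = "owner")
nrow(filtered$df1)  # 4 rows owned by Mark
nrow(filtered$df2)  # 2 aggregated rows for Mark
```

One selection drives both data frames because they agree on what a key means; with different key columns the selected keys would not line up.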
This should work for your example.
---
title: "blabla"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    source_code: embed
    theme: cerulean
---
```{r}
library(plotly)
library(crosstalk)
library(tidyverse)
```
```{r Make dataset}
df1 <- structure(list(owner = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("John", "Mark"), class = "factor"), hp = c(250, 120, 250, 100, 110), car = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("benz", "bmw"), class = "factor"), id = structure(1:5, .Label = c("car1", "car2", "car3", "car4", "car5"), class = "factor")), .Names = c("owner", "hp", "car", "id"), row.names = c(NA, -5L), class = "data.frame")
df2 <- structure(list(car = structure(c(1L, 2L, 1L, 2L), .Label = c("benz",
"bmw"), class = "factor"), owner = structure(c(1L, 1L, 2L, 2L
), .Label = c("John", "Mark"), class = "factor"), freq = c(0L,
1L, 2L, 2L)), .Names = c("car", "owner", "freq"), row.names = c(NA,
-4L), class = "data.frame")
```
#
##
### Filters
```{r}
library(crosstalk)
# Notice the 'group = ' argument - this does the trick!
shared_df1 <- SharedData$new(df1, ~owner, group = "Choose owner")
shared_df2 <- SharedData$new(df2, ~owner, group = "Choose owner")
filter_select("owner", "Car owner:", shared_df1, ~owner)
# You don't need this second filter now
# filter_select("owner", "Car owner:", shared_df2, ~ owner)
```
### Plot1 with plotly
```{r}
plot_ly(shared_df1, x = ~id, y = ~hp, color = ~owner) %>% add_markers() %>% highlight("plotly_click")
```
### Plots with plotly
```{r}
plot_ly(shared_df2, x = ~owner, y = ~freq, color = ~car) %>% group_by(owner) %>% add_bars()
```
##
### Dataframe 1
```{r}
DT::datatable(shared_df1)
```
### Dataframe 2
```{r}
DT::datatable(shared_df2)
```
I spent some time on this trying to extract data from plot_ly() using plotly_data(), without luck, until I figured out the answer. That's why there are some very simple plots with plotly.
I also recently wanted to use one filter for two visualizations.
Brief description of my situation
I wanted one filter to filter both a boxplot and a table.
The source data was a data frame. I wanted to use some of its variables for the boxplot and also compute some statistics (such as mean, standard deviation, mode, and number of records).
Functions I needed to display the results: plotly::plot_ly(), DT::datatable(), crosstalk::bscols().
I found that there are three key points for solving this:
Key 1) The shared data must be created correctly.
In my case, I had to call crosstalk::SharedData$new() twice.
Shared data that works as the source for the visualizations can only be created once keys 2 and 3 are satisfied.
Key 2) When creating the shared data, use the same group argument for every instance, as "Lodewic Van Twillert" explained on 16 Mar 2018.
Key 3) Ensure that all SharedData instances refer conceptually to the same data points, and share the same keys.
Start by ensuring that the data frame has row names, even if they are just a character vector of numbers (like "1", "2", ...); when no key is given, SharedData$new() uses the row names as keys.
Reference for key 3: https://rstudio.github.io/crosstalk/using.html (I suggest mainly reading the "Grouping" section).
Summary of the steps I used to satisfy the key points above
Key 3) Satisfying the conditions of key 3 can be tricky.
The approach I chose builds one table (data frame) containing all the data, which is then used to create both shared data instances.
I applied some data manipulations to the original data frame (risk_scores_df), which added a new column.
I created a new data frame with the statistics.
I joined both data frames using
risk_scores_df <- dplyr::left_join... so the original data frame now contains all the prepared data.
I ran print(rownames(risk_scores_df)) to make sure that the updated data frame has row names.
Now I had one data frame containing all the data needed for both visualizations, satisfying the conditions of key 3.
Key 2) I simply added group = "sd1" to both crosstalk::SharedData$new() calls.
Key 1) This can also be tricky if the wrong approach is chosen.
Here, the key to creating correct shared data instances is to start from that one table with all the data and select only the rows and columns each shared data instance needs.
In my case, I ran the code in Option 1 to create the two shared data instances, but Option 2 works as well.
Option 1 (the needed rows and columns are selected inside crosstalk::SharedData$new())
rs_df_sd1 <- crosstalk::SharedData$new(
  risk_scores_df[, c(1, 2, 5)],
  group = "sd1"
)
rs_df_sd1a <- crosstalk::SharedData$new(
  risk_scores_df[risk_scores_df$NumRecords > 0 &
                   !is.na(risk_scores_df$NumRecords),
                 c(1, 6:11)],
  group = "sd1"
)
Option 2 (the needed rows and columns are first stored in separate variables)
sd1 <- risk_scores_df[, c(1, 2, 5)]
sd1a <- risk_scores_df[risk_scores_df$NumRecords > 0 &
                         !is.na(risk_scores_df$NumRecords),
                       c(1, 6:11)]
rs_df_sd1 <- crosstalk::SharedData$new(sd1, group = "sd1")
rs_df_sd1a <- crosstalk::SharedData$new(sd1a, group = "sd1")
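Row names matter here because neither option passes a key argument, so SharedData$new() falls back to the row names as keys. The base-R sketch below uses a small hypothetical stand-in for risk_scores_df (its columns and values are invented) to show that subsetting rows with [ keeps the original row names, so the keys of the filtered subset still refer to the same rows as in the unfiltered one, and that resetting the row names would silently break this.

```r
# Hypothetical stand-in for risk_scores_df; columns and values are invented
risk_scores_df <- data.frame(
  Account    = c("A1", "A2", "A3", "A4"),
  Score      = c(10, 20, 30, 40),
  NumRecords = c(3, 0, NA, 5),
  stringsAsFactors = FALSE
)

# Two subsets of the same data frame, as in Option 2
sd1  <- risk_scores_df[, c("Account", "Score")]
sd1a <- risk_scores_df[!is.na(risk_scores_df$NumRecords) &
                         risk_scores_df$NumRecords > 0, ]

rownames(sd1)                            # "1" "2" "3" "4"
rownames(sd1a)                           # "1" "4": original row names kept
all(rownames(sd1a) %in% rownames(sd1))   # TRUE: keys still line up

# Counter-example: resetting the row names renumbers them, so row "2" of
# the subset (Account A4) would be matched with row "2" of sd1 (Account A2)
bad <- sd1a
rownames(bad) <- NULL
rownames(bad)                            # "1" "2"
```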
Completing the solution
At this point I had created the shared data instances rs_df_sd1 and rs_df_sd1a, which can be used as the main sources for visualizations laid out with crosstalk::bscols() and filtered by a single crosstalk::filter_select().
Brief example:
box_n_jitter_chart1 <- plotly::plot_ly(rs_df_sd1) %>% add_trace(...
DT_table1 <- DT::datatable(rs_df_sd1a)
crosstalk::bscols(
widths = c(6, 12, NA),
crosstalk::filter_select(
id = "idAvgRisk",
label = "Account",
sharedData = rs_df_sd1,
group = ~Account,
multiple = FALSE
),
box_n_jitter_chart1,
DT_table1
)
Note: DT::datatable(), or a plotly table trace with cells = list(values = base::rbind(... (for cells, see e.g. https://plotly.com/r/reference/table/), can also be built from rs_df_sd1a$data(). However, the data() method (see e.g. https://rdrr.io/cran/crosstalk/man/SharedData.html#method-data) returns a plain data frame rather than the SharedData object, so such a table is no longer linked to the filter and will not work with crosstalk::bscols().

Using meta, metaprop, and forest to create forest plot graphics in R

I am using the metaprop command in R to do a meta-analysis of proportions. Some sample code is below.
library(meta)
m <- metaprop(4:1, c(10, 20, 30, 40))
forest(m)
I have several questions.
How do I force all of the text (proportion column, 95% CI column, fixed and random weights columns) to appear on the LEFT side of the plot rather than on the right?
I have another column of text identifiers for each study that I would like to add on the left as well. How can I stick this in as its own column?
I really need to plot these as percentages, not proportions. Is there a way to make this change?
Last, I need the bottom axis to go from 0 to 100% and have a label.
Edit
OK, I have figured out how to do almost everything. The only thing I still need help with is this: I have another vector of labels that I would like to add to the left side of the plot. It seems like the leftlabs or leftcols arguments should do this, but I can't get them to work. I would also like to add another column on the left, titled "Details", with a short sentence about each of the four studies.
forest(m, xlim = c(0,100), pscale = 100, weight = "random", leftcols = c("studlab", "event", "n", "effect", "ci", "w.random"), rightcols = FALSE, leftlabs = c("Study", "Number", "Total", "Prevalence (%)", "95% CI", "Weight"), xlab = "Prevalence (%)", addspace = TRUE, digits = 1, squaresize = 0.5, text.I2 = "I2", text.tau2 = "tau2")
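# Extra columns (such as a "Details" column) have to be in the data set
# passed via data = ..., and can then be named in leftcols.
# See the example at https://rdrr.io/cran/meta/man/forest.html, specifically: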
data(Olkin95)
meta1 <- metabin(event.e, n.e, event.c, n.c,
data=Olkin95, subset=c(41,47,51,59),
sm="RR", method="I",
studlab=paste(author, year))
#
# Specify column labels only for newly created variables
# 'year' and 'author' (which are part of dataset Olkin95)
#
forest(meta1,
leftcols=c("studlab", "event.e", "n.e", "event.c", "n.c",
"author", "year"),
leftlabs=c("Author", "Year of Publ"))

changing strip's color in lattice multipanel plot with 2 (or possibly more) factors

I've searched the forum and the web quite extensively but couldn't find anyone who had already presented my case, so here is the question.
My goal: how can I extend the example presented here to the case where I have more than one conditioning factor?
I've tried several ways to modify the which.panel variable of the strip.default function, but I couldn't solve my problem.
This is the code I'm using at the moment (with comments):
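# install any missing packages (require() checks only one package at a time)
for (pkg in c("plyr", "lattice")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library("plyr")
library("lattice")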
# dataframe structure (8 obs. of 6 variables)
data2 <- structure(list(
COD = structure(c(1L, 1L, 1L, 1L, 2L, 2L,2L, 2L),
.Label = c("A", "B"), class = "factor"),
SPEC = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
.Label = c("15/25-(15/06)", "15/26-(22/06)"), class = "factor"),
DATE = structure(c(16589, 16590, 16589, 16590, 16589, 16590, 16589, 16590), class = "Date"),
PM.BDG = c(1111.25, 1111.25, 1141.29, 1141.29, 671.26, 671.26, 707.99, 707.99),
PM = c(1033.14, 1038.4, 1181.48, 1181.48, 616.39, 616.39, 641.55, 641.55),
DELTA.PM = c(-78.12, -72.85, 40.19, 40.19, -54.87, -54.87, -66.44, -66.44)),
.Names = c("COD", "SPEC", "DATE", "PM.BDG", "PM", "DELTA.PM"),
row.names = c(NA, 8L), class = "data.frame")
# create a dataframe with a vector of colors
# based on the value of DELTA.PM for the last
# date available for each combination of COD and SPEC.
# Each color will be used for a specific panel, and it will be
# forestgreen if DELTA.PM is higher than zero, red otherwise.
listaPM <- ddply(data2, .(COD,SPEC), summarize, ifelse(DELTA.PM[DATE=="2015-06-04"]<0, "red", "forestgreen"))
names(listaPM) <- c("COD","SPEC","COLOR")
# set a personalized strip, with bg color based on listaPM$COLOR
# and text based on listaPM$COD and listaPM$SPEC
myStripStylePM <- function(which.panel, factor.levels, ...) {
panel.rect(0, 0, 1, 1,
col = listaPM[which.panel,3],
border = 1)
panel.text(x = 0.5, y = 0.5,
font=2,
lab = paste(listaPM[which.panel,1],listaPM[which.panel,2], sep=" - "),
col = "white")}
# prepare a xyplot function to plot that will be used later with dlply.
# Here I want to plot the values of PM.BDG and PM over time (DATE),
# conditioning them on the SPEC (week) and COD (code) factors.
graficoPM <- function(df) {
xyplot (PM.BDG + PM ~ DATE | SPEC + COD,
data=df,
type=c("l","g"),
col=c("black", "red"),
abline=c(h=0,v=0),
strip = myStripStylePM
)}
# create a trellis object that has a list of plots,
# based on different COD (codes)
grafico.PM <- dlply(data2, .(data2$COD), graficoPM)
# graphic output, 1st row should be COD "A",
# 2nd row should be COD "B", each panel is a different SPEC (week)
par(mfrow=c(2,1))
print(grafico.PM[[1]], position=c(0,0.5,1,1), more=TRUE)
print(grafico.PM[[2]], position=c(0,0,1,0.5))
As you can see, the first row of plots is correct: the text of the first strip is "A" (the first COD), the weeks (SPEC) are shown, and the color shows whether PM is above or below PM.BDG on the last date of the plot.
By contrast, the second row of plots just repeats the scheme of the first row: COD is always "A", and the second strip's background color in the second row is green, even though the red PM line is clearly well below the black PM.BDG line.
Although I'd like to keep my code, I'm pretty sure my goal could be achieved with a different strategy. If you can find a better way to use my dataframe, I'll be happy to study the code and see if it works with my data.
The problem is matching the current panel to the right row of listaPM. Because each dlply() call plots a different subset of the data, the indices in which.panel don't tell you which row of listaPM belongs to the current panel.
There is an undocumented feature which allows you to get the conditioning variable names to make the matching more robust. Here's how you would use it in your case.
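myStripStylePM <- function(which.panel, factor.levels, ...) {
  # levels of each conditioning variable (SPEC, COD) of the plot being drawn
  cp <- dimnames(trellis.last.object())
  # turn the current packet number into one level index per variable
  ci <- arrayInd(packet.number(), .dim = sapply(cp, length))
  # look up the level names, e.g. c(SPEC = "15/25-(15/06)", COD = "B")
  cv <- mapply(function(a, b) a[b], cp, as.vector(ci))
  # the single row of listaPM whose COD and SPEC match those levels
  idx <- which(apply(mapply(function(n, v) listaPM[, n] == v, names(cv), cv), 1, all))
  stopifnot(length(idx) == 1)
  panel.rect(0, 0, 1, 1, col = listaPM[idx, 3], border = 1)
  panel.text(x = 0.5, y = 0.5, font = 2,
             lab = paste(listaPM[idx, 1], listaPM[idx, 2], sep = " - "),
             col = "white")
}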
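The lookup inside the strip function can be tried outside lattice. This sketch hardcodes hypothetical conditioning levels for SPEC and COD (in formula order) and a listaPM equivalent whose colors were worked out by hand from DELTA.PM on 2015-06-04. It assumes, as the arrayInd() call above does, that packet numbers run with the first conditioning variable varying fastest.

```r
# Hypothetical stand-ins for dimnames(trellis.last.object()) and listaPM
cp <- list(SPEC = c("15/25-(15/06)", "15/26-(22/06)"), COD = c("A", "B"))
listaPM <- data.frame(COD   = c("A", "A", "B", "B"),
                      SPEC  = rep(cp$SPEC, times = 2),
                      COLOR = c("red", "forestgreen", "red", "red"),
                      stringsAsFactors = FALSE)

# Packet 3 in a 2 x 2 layout: first SPEC level, second COD level
ci <- arrayInd(3, .dim = sapply(cp, length))
cv <- mapply(function(a, b) a[b], cp, as.vector(ci))

# Same matching step as in myStripStylePM()
idx <- which(apply(mapply(function(n, v) listaPM[, n] == v, names(cv), cv), 1, all))
listaPM[idx, ]
```

Matching on level names rather than on positions is what makes the strip correct for the COD "B" row, whichever subset is being plotted.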
When run with the rest of your code, it produces the intended plot.

Drawing x-y plot with rhombus width and height controlled by xError and yError in R

I recently posted about drawing x-y plots as two Normal curves and have since realised I was making things too complicated. I have now managed to plot the points as ellipses, but this slightly overestimates the error, which would ideally be plotted as a rhombus.
The code I have to date is:
library(plotrix)  # draw.ellipse() comes from the plotrix package
plot(c(-5,10), c(-5,5), xlab = expression(Age), ylab = expression(value), type="n")
draw.ellipse(Age, value, a=Age_error, b=value_error, col="grey70")
This draws a grey ellipse around each point.
Is there some way to replace each ellipse with a rhombus whose height is 2 * value_error and whose width is 2 * Age_error?
My data frame is below:
structure(list(Age = c(1L, 2L, 4L), value = c(3, -2, 0.01), Age_error = c(2,
1.4, 3), value_error = c(0.5, 1, 2.1)), .Names = c("Age", "value",
"Age_error", "value_error"), class = "data.frame", row.names = c(NA,
-3L))
Many thanks
You can use the my.symbols and ms.polygon functions in the TeachingDemos package to draw the rhombuses:
library(TeachingDemos)
plot(c(-5,10), c(-5,5), xlab = expression(Age), ylab = expression(value),
type="n")
my.symbols( Age, value, ms.polygon, n=4, xsize=2*Age_error,
ysize=2*value_error, linesfun=polygon, col='grey' )
Leave out linesfun and col if you don't want the rhombuses filled.
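If you would rather not depend on TeachingDemos, the rhombuses can also be drawn with base R's polygon(). This is a minimal sketch, not the answer's method: rhombus_xy() is a hypothetical helper that returns the four vertices (left, top, right, bottom) of a rhombus centred on (x, y), so its width is 2 * ex and its height is 2 * ey.

```r
df <- data.frame(Age = c(1L, 2L, 4L), value = c(3, -2, 0.01),
                 Age_error = c(2, 1.4, 3), value_error = c(0.5, 1, 2.1))

# Hypothetical helper: vertices of a rhombus centred on (x, y),
# width 2 * ex (horizontal diagonal), height 2 * ey (vertical diagonal)
rhombus_xy <- function(x, y, ex, ey) {
  list(x = c(x - ex, x, x + ex, x),
       y = c(y, y + ey, y, y - ey))
}

plot(c(-5, 10), c(-5, 5), xlab = "Age", ylab = "value", type = "n")
for (i in seq_len(nrow(df))) {
  v <- with(df[i, ], rhombus_xy(Age, value, Age_error, value_error))
  polygon(v$x, v$y, col = "grey70", border = "black")  # filled rhombus
}
points(df$Age, df$value, pch = 19)  # centre points
```

Drop the col argument of polygon() to draw outlines only.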

How to set ylim for a xyplot of a zoo object (lattice)

I have a zoo object that looks like this:
z <- structure(c(6, 11, 3.6, 8.4, 8.9, 0, NA, 0.5, 7, NA, 9, NA),
.Dim = c(6L, 2L), .Dimnames = list(NULL, c("2234", "2234.1")), index = structure(c(-17746, -17745, -17744, -17743, -17742, -17741), class = "Date"),
class = "zoo")
I tried to use lattice to plot both columns at the same time in 2 different panels:
xyplot(z)
This gives me the same x axis for both panels but a different ylim. I want them to have the same ylim, so I tried xyplot(z, ylim=range(z[,1])), but it didn't do anything. After reading "Plot zoo Series with Lattice", I tried trellis.focus("panel", 2,1,ylim=range(z[,1])), also without any luck...
This is probably an easy thing to do but I am finding the lattice package very hard to use (at least to start with). Can anyone help?
Thanks!
Try xyplot(z, ylim=range(z, na.rm=TRUE)).
Two things matter here:
na.rm=TRUE makes range skip the NA values in z; without it the result is NA.
range(z) instead of range(z[,1]) covers all of the data, not just one column.
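The effect of na.rm can be checked directly. So that the sketch runs without zoo, it assumes a plain matrix holding the same values as the question's z; the NA handling of range() is the same.

```r
# Same values as the question's z: 6 dates x 2 series, with NAs in the second
m <- matrix(c(6, 11, 3.6, 8.4, 8.9, 0, NA, 0.5, 7, NA, 9, NA), ncol = 2,
            dimnames = list(NULL, c("2234", "2234.1")))

range(m)                # NA NA: any NA propagates
range(m, na.rm = TRUE)  # 0 11: one y range covering both columns
```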
require(lattice)
require(zoo)
z <- zoo(cbind(a=1:4,b=11:14), Sys.Date()+(1:4)*10)
xyplot(z, ylim=range(z, na.rm=TRUE))
Note: R version 2.13.0, zoo_1.6-5, lattice_0.19-26
xyplot.zoo accepts most xyplot arguments, so you can also ask for the same y scale in every panel:
xyplot(z, scales = list(y = list(relation = "same")))
or this variation:
xyplot(z, scales = list(y = list(relation = "same", alternating = FALSE)))
