Graph in R/ggplot2 with several variables

I need help with a graph in R with ggplot2: after trying several things, I still don't know how to do it. I have the following data frame:
df <- data.frame(
TITLE = c("GRADUATE IN TITLE 1", "GRADUATE IN TITLE 2", "GRADUATE IN TITLE 3",
"GRADUATE IN TITLE 4", "GRADUATE IN TITLE 5"),
X2011 = c(1, 2, 3, 4, 5),
X2012 = c(3, 4, 5, 1, 2),
X2013 = c(1, 2, 5, 3, 4),
X2014 = c(1, 3, 4, 2, 5),
X2015 = c(5, 1, 2, 4, 3)
)
What I want is a graph with all the "TITLE" values on the Y axis and each of the years (2011, 2012, 2013, ...) on the X axis. For each row (each "TITLE"), I want to draw a line across the plot so that at each year it crosses the position given by that year's column value for that "TITLE". That way, the lines go up or down depending on the values in each year column.

ggplot likes it when your data is "tidy", that is, each row should represent one point to be plotted. One easy way to convert your data to a more plotting-friendly format is to melt it using the reshape2 package.
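library(ggplot2)
# melt() stacks the X2011..X2015 columns into "variable" (the year) and "value", keeping TITLE as the id
ggplot(reshape2::melt(df, id.vars = "TITLE"), aes(variable, TITLE, group = value)) + geom_line()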
Then you can use the standard line geom and use group= to connect the points that share the same value across the different years.
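reshape2 has since been retired in favour of tidyr. A minimal sketch of the same reshape with tidyr::pivot_longer(), a swapped-in alternative to melt() rather than what the answer used, assuming tidyr is installed; the column names year and value are my own choice:

```r
library(ggplot2)
library(tidyr)

df <- data.frame(
  TITLE = paste("GRADUATE IN TITLE", 1:5),
  X2011 = c(1, 2, 3, 4, 5),
  X2012 = c(3, 4, 5, 1, 2),
  X2013 = c(1, 2, 5, 3, 4),
  X2014 = c(1, 3, 4, 2, 5),
  X2015 = c(5, 1, 2, 4, 3)
)

# Long ("tidy") format: one row per TITLE/year pair, with the year taken
# from the column name minus its "X" prefix
long <- pivot_longer(df, cols = starts_with("X"), names_to = "year",
                     names_prefix = "X", values_to = "value")

# Same mapping as the melt() answer: lines connect equal values across years
p <- ggplot(long, aes(year, TITLE, group = value)) + geom_line()
print(p)
```

Stripping the "X" prefix here means the X axis shows plain years instead of the X2011-style column names.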


Save multiple ggplots with different layout matrices

Currently I'm creating multiple plots with regional data and saving them to a PDF file. This works without problems, thanks to an SO post I found (use grid.arrange over multiple pages or marrangeGrob with a layout_matrix).
This is my code so far:
library(ggplot2)
library(gridExtra)
library(dplyr)
data <- data.frame(
region = c("region 1", "region 2", "region 3", rep("region 4", 2), rep("region 5", 2)),
countries = c("country 1", "country 2", "country 3", "country 4", "country 5", "country 6", "country 7"),
dummydata1 = c(rep(1, 7)),
dummydata2 = c(rep(2, 7))
)
criterias <- list()
criterias[[ 'region_1' ]] <- data %>% filter(region == 'region 1')
criterias[[ 'region_2' ]] <- data %>% filter(region == 'region 2')
criterias[[ 'region_3' ]] <- data %>% filter(region == 'region 3')
criterias[[ 'region_4' ]] <- data %>% filter(region == 'region 4')
criterias[[ 'region_5' ]] <- data %>% filter(region == 'region 5')
# This layout matrix should be used for the regional plots
# Don't worry about the strange numbering: some plots came later
# and it was easier to modify the matrix than all the other functions.
regionLayout <- rbind(
c(1,1,1,1,1,2),
c(NA,NA,3,3,NA,NA),  # NA marks an empty cell; c() would silently drop NULL
c(9,9,4,4,10,10),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7)
)
# This is just a dummy function
# The actual function creates several plots based on the real data
createRegionalPlots <- function (data, region) {
examplePlots <- list(ggplot() + ggtitle('Title (ggtext = plot 1)'),
ggplot() + ggtitle('Month (ggtext = plot 2)'),
ggplot() + ggtitle('Plot 1 (tile = 3)'),
ggplot() + ggtitle('Plot 2 (tile = 4)'),
ggplot() + ggtitle('Plot 3 (geom_bar = 5)'),
ggplot() + ggtitle('Plot 4 (geom_bar = 6)'),
ggplot() + ggtitle('Plot 5 (tile = 7)'),
ggplot() + ggtitle('Plot 6 (tile = 8)'))
}
# Found in https://stackoverflow.com/questions/43491685/
preparePage <- function(plots,layoutMatrix) {
# pdf(file = NULL) #invisible
par(mar=(c(5,5,5,5)))
plotsPerPage <- length(unique(na.omit(c(layoutMatrix))))
ml <- lapply(1:ceiling(length(plots)/plotsPerPage), function(page_IND){
ind <- (1 + ((page_IND - 1) * plotsPerPage )) : (page_IND * plotsPerPage)
grid.arrange(grobs = plots[ind], layout_matrix = layoutMatrix)
})
return(marrangeGrob(grobs=ml,nrow=1,ncol=1,top=NULL))
# dev.off() #invisible
}
# Here I'm running through all regions
regionalPlotList <- list()
for (region in names(criterias)) {
regionData <- criterias[[region]]
regionalPlots <- createRegionalPlots(data = regionData, region = region)
regionalPlotList <- do.call(c, list(regionalPlotList, regionalPlots))
}
# This leaves me with a list of 40 plots (5 regions x 8 plots)
allPlots <- preparePage(regionalPlotList, regionLayout)
ggsave("example.pdf",width = 297, height = 210, units = "mm", plot = allPlots)
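In a layout_matrix, empty cells are marked with NA. NULL cannot play that role, because c() drops NULL values entirely and rbind() then recycles the shortened row. A base-R check of what the second row of regionLayout becomes in each case (bad_row, good_row and the two layouts are illustrative names):

```r
# c() silently drops NULL, so this "row" has only two elements
bad_row <- c(NULL, NULL, 3, 3, NULL, NULL)
# NA keeps its position; gridExtra treats NA cells in layout_matrix as empty
good_row <- c(NA, NA, 3, 3, NA, NA)

# rbind() recycles the short row to length 6, so plot 3 spans the whole row
bad_layout <- rbind(c(1, 1, 1, 1, 1, 2), bad_row)
good_layout <- rbind(c(1, 1, 1, 1, 1, 2), good_row)

print(bad_layout[2, ])
print(good_layout[2, ])
```

The NA version matches the "##CC##" row used in the patchwork design further down.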
As said, this works perfectly and leaves me (using the current data) with a five page report, one per every region and with the required layout.
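The page slicing in preparePage() gives page p the indices (1 + (p - 1) * k) to p * k, with k plots per page. A small sketch with hypothetical counts (20 plots, 8 per page) comparing it with split(): when the number of plots is not a multiple of k, the formula runs past the last plot on the final page (indexing a list out of range yields NULL elements), while split() stops at it. With 40 plots and 8 per page, as in the question, both give the same five pages:

```r
plots_per_page <- 8   # hypothetical k
n_plots <- 20         # hypothetical, deliberately not a multiple of k

# The formula from preparePage(): page p covers (1 + (p - 1) * k):(p * k)
page_index <- function(p, k) (1 + (p - 1) * k):(p * k)
n_pages <- ceiling(n_plots / plots_per_page)

# Alternative: give each plot index its page number ceiling(i / k), then split
pages <- split(seq_len(n_plots), ceiling(seq_len(n_plots) / plots_per_page))

print(page_index(n_pages, plots_per_page))  # 17:24, indices 21-24 do not exist
print(pages[[n_pages]])                     # 17:20
```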
I have now been asked to add additional per-country plots at the end of the regional report, and these pages should have a different layout (and different plots).
Overestimating myself (and my knowledge of R and ggplot) once again, I thought this would be an easy job (which it probably is for everyone else, but I'm stuck).
So, I've created a list of new criteria and a function, including a new layout:
createCountryPlots <- function(data, country) {
exampleCountryPlots <- list(ggplot() + ggtitle('Title (ggtext = plot 1)'),
ggplot() + ggtitle('Month (ggtext = plot 2)'),
ggplot() + ggtitle('Plot 1 (bar = 3)'),
ggplot() + ggtitle('Plot 2 (pie = 4)'),
ggplot() + ggtitle('Plot 3 (geom_bar = 5)'),
ggplot() + ggtitle('Plot 4 (geom_bar = 6)')
)
}
countryLayout = rbind(
c(1, 1, 1, 1, 1, 2),
c(3, 3, 3, 4, 4, 4),
c(3, 3, 3, 4, 4, 4),
c(3, 3, 3, 4, 4, 4),
c(5, 5, 5, 6, 6, 6),
c(5, 5, 5, 6, 6, 6),
c(5, 5, 5, 6, 6, 6)
)
# prepare the data per country
countryCriterias <- list()
countryCriterias[[ 'country_1' ]] <- data %>% filter(countries == 'country 1')
countryCriterias[[ 'country_2' ]] <- data %>% filter(countries == 'country 2')
# Running through all selected countries
countryPlotList <- list()
for (country in names(countryCriterias)) {
countryData <- countryCriterias[[country]]
countryPlots <- createCountryPlots(data = countryData, country = country)
countryPlotList <- do.call(c, list(countryPlotList, countryPlots))
}
countryPlots <- preparePage(countryPlotList, countryLayout)
# Just saving the country plots works perfectly again
ggsave("example.pdf",width = 297, height = 210, units = "mm", plot = countryPlots)
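The repeated filter() calls can also be generated with base R's split(), which returns one data frame per distinct value of a column. A sketch on the question's dummy data; the gsub() renaming only reproduces the hand-written names such as region_1, and the final selection mirrors the two countries picked in the question:

```r
data <- data.frame(
  region = c("region 1", "region 2", "region 3", rep("region 4", 2), rep("region 5", 2)),
  countries = paste("country", 1:7),
  dummydata1 = rep(1, 7),
  dummydata2 = rep(2, 7)
)

# One data frame per region / per country, named by the grouping value
criterias <- split(data, data$region)
countryCriterias <- split(data, data$countries)

# Match the hand-written names ("region 1" -> "region_1")
names(criterias) <- gsub(" ", "_", names(criterias))
names(countryCriterias) <- gsub(" ", "_", names(countryCriterias))

# Keep only the selected countries
countryCriterias <- countryCriterias[c("country_1", "country_2")]
```

The existing for loops over names(criterias) and names(countryCriterias) work unchanged on these lists.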
Saving these plots in a separate file works without any problems, but I'm currently stuck on how to combine them in one single PDF while respecting the different layouts the pages should have.
I've tried several possibilities (e.g. grid.arrange and arrangeGrob), but I haven't been able to combine the plots into a single file.
Could anyone please enlighten me?
Edit:
Sorry if I didn't make myself clear enough. The result I should have at the end was shown here as an image (not included).
Thanks to the hint by @teunbrand to have a look at the patchwork package, I've found a solution to my problem.
It's in general almost the same as before, but instead of trying to arrange the plots first and then saving them, I "print" them directly to a PDF in the for loop.
library(patchwork)
# defining the layouts (simplified)
regionLayout <- "
AAAAAB
##CC##
DDEEFF
GGGHHH
GGGHHH"
countryLayout <- "
AAAAAB
CCCCDD
CCCCDD
EEEEFF
EEEEFF
"
# opening pdf
pdf('example5.pdf', pagecentre = FALSE, width = 29.7/2.54, height = 21/2.54)
par(mar = c(5, 5, 5, 5), oma = c(1, 1, 1, 1))
for (region in names(criterias)) {
regionData <- criterias[[region]]
regionalPlots <- createRegionalPlots(data = regionData, region = region)
# as regionalPlots is a list of plots, I'm using wrap_plots, which can take a dynamic
# number of plots
print(wrap_plots(regionalPlots, design = regionLayout))
}
# then the same for the country plots, with a different layout
countryPlotList <- list()
for (country in names(countryCriterias)) {
countryData <- countryCriterias[[country]]
countryPlots <- createCountryPlots(data = countryData, country = country)
print(wrap_plots(countryPlots, design = countryLayout))
}
dev.off()
And at the end I have my PDF with separate layouts...
Thank you all for your help!!!
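PS: It took me a while to find out why the PDF was always empty, before I realized that wrap_plots() only arranges the plots into a patchwork object but does not print them; inside a for loop nothing is printed automatically, hence the explicit print(). As said, I'm relatively new to R (did I mention that?)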
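A condensed sketch of the same idea: open one pdf() device, then print() one patchwork object per page, choosing the design per group of pages. make_plots(), write_pages(), the two small designs and the temporary file name are hypothetical stand-ins for the real plotting functions and layouts:

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in for createRegionalPlots()/createCountryPlots():
# n empty ggplots that only carry a title
make_plots <- function(n) {
  lapply(seq_len(n), function(i) ggplot() + ggtitle(paste("Plot", i)))
}

# Hypothetical helper: one page per element of page_list, all with the same design
write_pages <- function(page_list, design) {
  for (plots in page_list) {
    # wrap_plots() only builds the patchwork object; print() draws it
    # on the open device, starting a new page
    print(wrap_plots(plots, design = design))
  }
}

# Simplified designs: one letter per plot ("#" would mark an empty cell)
region_design <- "
AAB
CCD
"
country_design <- "
AB
CC
"

out <- tempfile(fileext = ".pdf")
pdf(out, width = 29.7 / 2.54, height = 21 / 2.54)  # A4 landscape, in inches
write_pages(list(make_plots(4), make_plots(4)), region_design)  # two "region" pages
write_pages(list(make_plots(3)), country_design)                # one "country" page
invisible(dev.off())
```

Because each print() starts a new page on the open device, pages with different designs can be mixed freely in one file.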

How to display Dataset labels inside a HoverTool in a Sankey diagram using Holoviews and Bokeh

I am using Holoviews to display a Sankey Diagram and would like to customize the information displayed when positioning a cursor over the diagram. However, I don't know how to display the correct labels.
Taking the 2nd example from the docs, I can add a custom HoverTool
import holoviews as hv
from holoviews import opts
from bokeh.models import HoverTool
hv.extension('bokeh')
nodes = ["PhD", "Career Outside Science", "Early Career Researcher", "Research Staff",
"Permanent Research Staff", "Professor", "Non-Academic Research"]
nodes = hv.Dataset(enumerate(nodes), 'index', 'label')
edges = [
(0, 1, 53), (0, 2, 47), (2, 6, 17), (2, 3, 30), (3, 1, 22.5), (3, 4, 3.5), (3, 6, 4.), (4, 5, 0.45)
]
value_dim = hv.Dimension('Percentage', unit='%')
careers = hv.Sankey((edges, nodes), ['From', 'To'], vdims=value_dim)
# this is my custom HoverTool
hover = HoverTool(
tooltips = [
("From", "@From"), # this displays the index: "0", "1" etc.
("To", "@To"), # how to display the label ("PhD", "Career Outside Science", ...)?
]
)
careers.opts(
opts.Sankey(labels='label', tools=[hover]))
Same as in the example shown in the docs, the HoverTool displays the index values for "From" and "To" (e.g. "0", "1") etc., which do not necessarily mean anything to the user.
Is there a way to display the associated label (e.g. "PhD", "Career Outside Science", ...) in the HoverTool syntax?
I am using Holoviews 1.11.2 and Bokeh 1.0.4.
The easiest way to do this is simply to provide the labels instead of the indices to the Sankey element:
nodes = ["PhD", "Career Outside Science", "Early Career Researcher", "Research Staff",
"Permanent Research Staff", "Professor", "Non-Academic Research"]
edges = [
(0, 1, 53), (0, 2, 47), (2, 6, 17), (2, 3, 30), (3, 1, 22.5), (3, 4, 3.5), (3, 6, 4.), (4, 5, 0.45)
]
# Replace the indices with the labels
edges = [(nodes[s], nodes[e], v) for s, e, v in edges]
value_dim = hv.Dimension('Percentage', unit='%')
careers = hv.Sankey(edges, ['From', 'To'], vdims=value_dim)
careers.opts(labels='index', tools=['hover'])
That said, I think your expectation that defining labels would make the edge hover use the label column of the nodes makes sense. Also, labels may not be unique, so the approach above is not generally applicable. I'll file an issue in HoloViews.

label certain points with textxy()

I am trying to draw a volcano plot in R using the plot function and the calibrate package, and I am trying to use the textxy function to label only certain points.
Here is some data:
Metabolites <- data.frame(
Metabolite = c("Glucose", "Galactose", "Creatine", "Lactose", "N-Acetylputrescine",
"Tyramine", "Adenine", "Glycine", "Erythritol", "Choline"),
Neg_pvalue = c(10, 8, 2, 1, 0.5, 0.7, 5, 3, 5.8, 4),
LogFC = c(4, -3, 2, -1, 0.5, 0.7, 1, -2, -4, -1),
padjust = c(1.453557e-19, 5.312771e-08, 4.983176e-02, 9.585447e-01, 2.449707e-01,
3.058580e-01, 4.223173e-02, 1.002379e-03, 4.466316e-27, 1.003879e-01))
Here is my code:
library(calibrate)
with(Metabolites, plot(LogFC, Neg_pvalue, pch=20, main="CNL", xlim=c(-5,6)))
with(subset(Metabolites, padjust <.05 ), points(LogFC, Neg_pvalue, pch=20, col="blue"))
with(subset(Metabolites, padjust <.05 & abs(LogFC) > 2), points(LogFC, Neg_pvalue, pch=20, col="red"))
Now here is the issue:
with(subset(Metabolites, padjust <.05 & abs(LogFC) > 2), textxy(LogFC, Neg_pvalue, labs=Metabolite[1:3], cex=.5, offset = 0.2))
If I plot this code, I get only the top 3 data points, as is indicated with the labs=Metabolite[1:3] part of the code. Alternatively, if I plot labs=Metabolite, then I get all labels.
If I want to label only Glycine, Lactose, and Erythritol (values of Metabolites$Metabolite), can I do this?
Also, say I wanted to keep my top 3 data points labeled (labs=Metabolite[1:3]), but also want to label other metabolites of interest, say Tyramine and N-Acetylputrescine too; how can I do this?
This seems to work by selecting the items that are in that set and using those character values as labels:
library(calibrate)
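# labs comes from the same subset as the coordinates, so each label stays with its point
# (a hard-coded c('Glycine', 'Lactose', 'Erythritol') would follow the typed order, not the row order)
with(subset(Metabolites, Metabolite %in% c( 'Glycine', 'Lactose', 'Erythritol' )),
textxy(LogFC, Neg_pvalue, labs=Metabolite, cex=.5, offset = 0.2))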
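For the second part of the question (keep the top three labelled and add a few more), one approach is to build the set of names first and take both coordinates and labels from one subset. A sketch assuming calibrate is installed; top3, extra and to_label are illustrative names, and "top 3" is read as the first three rows of the significant subset, as in the question's Metabolite[1:3]:

```r
library(calibrate)

Metabolites <- data.frame(
  Metabolite = c("Glucose", "Galactose", "Creatine", "Lactose", "N-Acetylputrescine",
                 "Tyramine", "Adenine", "Glycine", "Erythritol", "Choline"),
  Neg_pvalue = c(10, 8, 2, 1, 0.5, 0.7, 5, 3, 5.8, 4),
  LogFC = c(4, -3, 2, -1, 0.5, 0.7, 1, -2, -4, -1),
  padjust = c(1.453557e-19, 5.312771e-08, 4.983176e-02, 9.585447e-01, 2.449707e-01,
              3.058580e-01, 4.223173e-02, 1.002379e-03, 4.466316e-27, 1.003879e-01),
  stringsAsFactors = FALSE
)

sig <- subset(Metabolites, padjust < .05 & abs(LogFC) > 2)
top3 <- head(sig$Metabolite, 3)                # first three significant rows
extra <- c("Tyramine", "N-Acetylputrescine")   # other metabolites of interest
to_label <- subset(Metabolites, Metabolite %in% c(top3, extra))

with(Metabolites, plot(LogFC, Neg_pvalue, pch = 20, main = "CNL", xlim = c(-5, 6)))
# coordinates and labels come from the same rows, so they cannot get out of sync
with(to_label, textxy(LogFC, Neg_pvalue, labs = Metabolite, cex = .5, offset = 0.2))
```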

R Highcharter mosaic / marimekko chart

I tried to create a marimekko chart in R Highcharter, following this example:
http://jsfiddle.net/highcharts/h2np93k1/
I cannot seem to get the sortIndex of the treemap to work, my code is as follows:
parentid <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
sortIndex <- c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4)
child <- c("Alpha", "Alpha", "Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta")
childid <- c(100, 100, 100, 100, 100, 200, 200, 200, 200, 200)
colorid <- c(100, 100, 100, 100, 100, 200, 200, 200, 200, 200)
parent <- c("Parent 1", "Parent 2", "Parent 3", "Parent 4", "Parent 5", "Parent 1", "Parent 2", "Parent 3", "Parent 4", "Parent 5")
value <- c(10, 60, 70, 20, 90, 50, 30, 10, 90, 10)
data <- data.frame(parentid, sortIndex, child, childid, colorid, parent, value)
hctreemap2(data, group_vars=c("parentid", "childid"),
size_var="value",
color_var="colorid",
layoutAlgorithm='stripes',
alternateStartingDirection = T,
stacking="percent",
levelIsConstant = F,
sortIndex=sortIndex,
levels = list(
list(level=1, dataLabels = list(enabled=T, align='left', verticalAlign='top'), borderWidth=3),
list(level=2, dataLabels = list(enabled=T))))
Does anyone have any ideas?
I realize this question was several years ago, and Highcharter has changed since it was written.
This answer may not have worked in 2018, but it does work now.
At some point hctreemap and hctreemap2 were deprecated. The instructions from Highcharter are to use data_to_hierarchical to prepare the data and then use either hchart() or highchart() to create the treemap. However, this method will strip the sortIndex, so I don't think that's the route you would want to go.
Instead, I prepared the data by formatting it as it is in the JS link you provided and then graphed it.
The data:
library(highcharter)  # list_parse(), hchart()
library(dplyr)        # %>%, mutate()
library(purrr)        # map()
# collect hc colors (like they did in the example)
colrs <- getOption("highcharter.color_palette")
# two data frames, one for each level
# id isn't as important until you go beyond 2 levels
pars <- data.frame(id = unique(data$parent), sortIndex = unique(data$sortIndex))
kids <- data.frame(
name = data$child, parent = data$parent, sortIndex = data$sortIndex,
value = data$value, color = data$colorid / 100) %>% mutate(color = colrs[color])
# this assumes data is already sorted by sort order*
newData <- list() # for storing the data as hierarchical
invisible(map(1:nrow(pars),
function(j) {
p <- pars[j, ]$id # collect par id to find children
k <- kids[kids$parent == p, ] # isolate applicable children
pl <- list_parse(pars[j, ]) # make row a list
kl <- list_parse(k) # make each child row a list
newData <<- append(newData, pl) # add parent
newData <<- append(newData, kl) # add that parent's children
}))
Now the data is ready for plotting.
hchart(newData, type ="treemap", layoutAlgorithm = "stripes",
alternateStartingDirection = T)
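The `<<-` loop above can also be written without global assignment. A minimal base-R sketch, assuming the same two-level structure; rows_as_lists() is a hypothetical stand-in for highcharter::list_parse(), the input vectors are rebuilt from the question's data, and colors are left out:

```r
parent <- paste("Parent", rep(1:5, 2))
child <- rep(c("Alpha", "Beta"), each = 5)
sortIndex <- rep(0:4, 2)
value <- c(10, 60, 70, 20, 90, 50, 30, 10, 90, 10)

pars <- data.frame(id = unique(parent), sortIndex = unique(sortIndex))
kids <- data.frame(name = child, parent = parent, sortIndex = sortIndex, value = value)

# Turn each data-frame row into a named list (hypothetical stand-in for list_parse)
rows_as_lists <- function(df) {
  lapply(seq_len(nrow(df)), function(i) as.list(df[i, , drop = FALSE]))
}

# For each parent, emit the parent point followed by its children,
# then flatten the per-parent lists into one list of points
newData <- do.call(c, lapply(seq_len(nrow(pars)), function(j) {
  c(rows_as_lists(pars[j, , drop = FALSE]),
    rows_as_lists(kids[kids$parent == pars$id[j], , drop = FALSE]))
}))
```

The resulting list has the same parent-then-children order as the one built with map() and can be passed to hchart() in the same way.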

R ggplot2 grave accent in legend

I created a function to plot some data per city in a line graph. I want the user to be able to change the label of each city in the legend.
A simplified example:
library(ggplot2)
library(data.table)
example_plot <- function(plot_labs = c("Anvers", "Liège")){
graphics.off()
input <- data.table(x_axis = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
y_axis = c(5, 6, 4, 2, 8, 9, 3, 1, 7, 5),
City = c("Anvers", "Anvers", "Anvers", "Anvers", "Anvers",
"Liege", "Liege", "Liege", "Liege", "Liege"))
ggplot(data = input, aes(x = x_axis, y = y_axis, group = City, lty = City)) +
geom_line() + scale_linetype_manual(labels = plot_labs, breaks = c("Anvers",
"Liege"), values = 1:2)
}
My problem:
When I save the function as "example_plot.R", source it, and then call it at the command prompt with no argument, the accent in "Liège" does not display correctly:
example_plot()
If I call the function with the plot_labs argument, it displays correctly:
example_plot(plot_labs = c("Anvers", "Liège"))
What I find even stranger is that if I copy and paste the function's code into the console (instead of source("example_plot.R")), everything works fine.
Any idea why it behaves differently when the function is saved?
You're probably saving your source file in an encoding such as UTF-8 and then reopening or sourcing it as if it were Latin-1.
If you're using RStudio, check the menu items File > Save with Encoding and File > Reopen with Encoding, and make sure the character encodings match.
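To see what such a mismatch does to "Liège", the same bytes can be reinterpreted with iconv(). A small base-R sketch (no file involved; the variable names are illustrative) of reading UTF-8 bytes as Latin-1:

```r
# "Liège" as stored in a UTF-8 file: the è is two bytes, C3 A8
utf8_text <- enc2utf8("Li\u00e8ge")

# Reading those bytes as if they were Latin-1 turns the two bytes
# into two separate characters (Ã and ¨), the classic mojibake symptom
as_latin1 <- iconv(utf8_text, from = "latin1", to = "UTF-8")
print(as_latin1)
```

When sourcing a file saved as UTF-8, the encoding can also be stated explicitly with source("example_plot.R", encoding = "UTF-8").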
