ggplot2 equivalent of 'factorization or categorization' in googleVis in R - r

Due to static graph prepared by ggplot, we are shifting our graphs to googleVis with interactive charts. But when it comes to categorization we are facing many problems. Let me give example which will help you understand:
#dataframe
df = data.frame( x = sample(1:100), y = sample(1:100), cat = sample(c('a','b','c'), 100, replace=TRUE) )
ggplot2 provides parameter like alpha, colour, linetype, size which we can use with categories like shown below:
ggplot(df) + geom_line(aes(x = x, y = y, colour = cat))
Not just line chart, but majority of ggplot2 graphs provide categorization based on column values. Now I would like to do the same in googleVis, based on value df$cat I would like parameters to get changed or grouping of line or charts.
Note:
I have already tried dcast to make multiple columns based on category column and use those multiple columns as Y input, but that it not what I would like to do.
Can anyone help me regarding this?
Let me know if you need more information.

vrajs5 you are not alone! We struggled with this issue. In our case we wanted to fill bar charts like in ggplot. This is the solution. You need to add specifically named columns, linked to your variables, to your data table for googleVis to pick up.
In my fill example, these are called roles, but once you see my syntax you can abstract it to annotations and other cool features. Google has them all documented here (check out superheroes example!) but it was not obvious how it applied to r.
#mages has this documented on this webpage, which shows features not in demo(googleVis):
http://cran.r-project.org/web/packages/googleVis/vignettes/Using_Roles_via_googleVis.html
EXAMPLE ADDING NEW DIMENSIONS TO GOOGLEVIS CHARTS
# in this case
# How do we fill a bar chart showing bars depend on another variable?
# We wanted to show C in a different fill to other assets
suppressPackageStartupMessages(library(googleVis))
library(data.table) # You can use data frames if you don't like DT
test.dt = data.table(px = c("A","B","C"), py = c(1,4,9),
"py.style" = c('silver', 'silver', 'gold'))
# Add your modifier to your chart as a new variable e.g. py1.style
test <-gvisBarChart(test.dt,
xvar = "px",
yvar = c("py", "py.style"),
options = list(legend = 'none'))
plot(test)
We have shown py.style deterministically here, but you could code it to be dependent on your categories.
The secret is myvar.googleVis_thing_youneed linking the variable myvar to the googleVis feature.
RESULT BEFORE FILL (yvar = "py")
RESULT AFTER FILL (yvar = c("py", "py.style"))
Take a look at mages examples (code also on Github) and you will have cracked the "categorization based on column values" issue.

Related

Formatting changes affect only legend and not bar graph using swimplot and ggplot2 packages

Update- this issue was solved, updated code is at the end of the post.
I am trying to create a swimmer plot to visualize individual patient duration of treatment with a drug administered at multiple dose levels (DLs). Each patient will be be assigned to treatment with only one DL, but multiple patients can be assigned to a given DL (e.g. 3 patients at DL1, 3 patients and DL2, etc.). I would like to color code the bars in the swimmer plot according to DL.
I am using the swimplot package for R and have been following the guide located here (https://cran.r-project.org/web/packages/swimplot/vignettes/Introduction.to.swimplot.html).
This guide has been sufficient for most things I have tried, up until I tried to change the colors of the bars in the plot and corresponding legend. Following the section in that guide titled "Modifying Colours and shapes" under "Making the plots more aesthetically pleasing with ggplot manipulations", I was able to change the bar colors in the legend, but not the bars themselves.
Example here
I have been using the following code.
library(ggplot2)
library (swimplot)
library (gdata)
library (readxl)
ClinicalTrial.Arm <- read_excel("Swimmer_Test_Data1.xls")
ClinicalTrial.Arm <- as.data.frame(ClinicalTrial.Arm)
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',width=.85+ scale_fill_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))+ scale_color_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))
arm_plot
I have tried a number of things to fix this, but am quite new to R and don't think I really know enough to troubleshoot effectively. I have tried various syntax changes (e.g. removing quotation marks) and have tried using the geom bar command but wasn't sure how/what to map to X and Y (it also seems like I shouldn't need to do this).
I have also tried using the following code, but get an error.
Colors <- c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600")
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',width=.85, fill = Colors)+ scale_fill_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))+ scale_color_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (20): fill
Run `rlang::last_error()` to see where the error occurred.
Any help here would be greatly appreciated.
Solved! Updated, working code
library(ggplot2)
library (swimplot)
library (gdata)
library (readxl)
ClinicalTrial.Arm <- read_excel("Swimmer_Test_Data1.xls")
ClinicalTrial.Arm <- as.data.frame(ClinicalTrial.Arm)
Colors <- c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600")
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt', name_fill = "Arm", width=.85) + scale_fill_manual(name="Arm",values = Colors) +
scale_color_manual(name="Arm",values=Colors)
To make your code work you first have to map a variable on the fill aesthetic which using swimplot could be achieved via the name_fill argument:
Note: As I use the ClinicalTrial.Arm dataset from the swimplot package I adjusted your color palette to make it work with the three categories of the Arm column in this dataset.
library(ggplot2)
library(swimplot)
#pal <- c("DL1" = "#003f5c", "DL2" = "#374c80", "DL3" = "#7a5195", "DL4" = "#bc5090", "DL5" = "#ef5675", "DL6" = "#ff764a", "DL7" = "#ffa600")
pal <- c("Arm A" = "#003f5c", "Arm B" = "#bc5090", "Off Treatment" = "#ffa600")
swimmer_plot(df = ClinicalTrial.Arm, id = "id", end = "End_trt", name_fill = "Arm", width = .85) +
scale_fill_manual(name = "Arm", values = pal)

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I have applied DBSCAN algorithm on built-in dataset iris in R. But I am getting error when tried to visualise the output using the plot( ).
Following is my code.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscacn, not loading fpc since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
geom_point(mapping = aes(x=Species, y = cluster, colour = cluster),
position = "jitter") +
labs(title = "DBSCAN")
Here's the image it generates:
If you're looking for something else, please be more specific about what the final plot should look like.

Highlight a Specific Section of a Treemap using Treemap package in R

I am using the Treemap package in R to highlight the number of COVID outbreaks in different settings. I am making a number of different reports using R Markdown. Each one describes a different type of settings and I would like to highlight that setting in the treemap for each report, showing what proportion of total outbreaks occur in the setting in question. For example you I am currently working on the K-12 school report and would like to highlight the box representing that category in the figure.
I was previously using an exploded donut pie chart however there were two many subcategories and the graph became hard to read.
I am picturing a way to change the label or border on one specific box, ie. put a yellow border around the box or make the label yellow. I found a way to do both these things for all the boxes but not just one specific box. I made this image using the snipping tool to further illustrate what the desired outcome might look like. The code to generate the treemap can be found in the link below. It looks like this:
# library
library(treemap)
# Build Dataset
group <- c(rep("group-1",4),rep("group-2",2),rep("group-3",3))
subgroup <- paste("subgroup" , c(1,2,3,4,1,2,1,2,3), sep="-")
value <- c(13,5,22,12,11,7,3,1,23)
data <- data.frame(group,subgroup,value)
# treemap
treemap(data,
index=c("group","subgroup"),
vSize="value",
type="index"
)
This is the most straightforward information I can find about the package, this is where I took the sample image and code from: https://www.r-graph-gallery.com/236-custom-your-treemap.html
It looks like the treemap package doesn't have a built-in way to do this. But we can hack it by using the data frame returned by treemap() and adding a rectangle to the appropriate viewport.
# Plot the treemap and save the data used for plotting.
t = treemap(data,
index = c("group", "subgroup"),
vSize = "value",
type = "index"
)
# Add a rectangle around subgroup-2.
library(grid)
library(dplyr)
with(
# t$tm is a data frame with one row per rectangle. Filter to the group we
# want to highlight.
t$tm %>%
filter(group == "group-1",
subgroup == "subgroup-2"),
{
# Use grid.rect to add a rectangle on top of the treemap.
grid.rect(x = x0 + (w / 2),
y = y0 + (h / 2),
width = w,
height = h,
gp = gpar(col = "yellow", fill = NA, lwd = 4),
vp = "data")
}
)

Using multiple datasets for one graph

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.
As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is how you format the axes as they will not be automatically adjusted to fit any new data you input using points() and the like.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
#for same result each time
set.seed(1234)
#make data
set1<-data.frame("date1" = seq(1,10),
"temp1" = rnorm(10))
set2<-data.frame("date2" = seq(8,17),
"temp2" = rnorm(10, 1, 1))
#first attempt fails
#plot one
plot(set1$date1, set1$temp1, type = "b")
#add points - oops only three showed up bc the axes are all wrong
lines(set2$date2, set2$temp2, type = "b")
#second attempt
#adjust axes to fit everything (set to min and max of either dataset)
plot(set1$date1, set1$temp1,
xlim = c(min(set1$date1,set2$date2),max(set1$date1,set2$date2)),
ylim = c(min(set1$temp1,set2$temp2),max(set1$temp1,set2$temp2)),
type = "b")
#now add the other points
lines(set2$date2, set2$temp2, type = "b")
# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))

Dimple dPlot color x-axis bar values in R

I'm attempting to set manual colors for Dimple dPlot line values and having some trouble.
d1 <- dPlot(
x="Date",
y="Count",
groups = "Category",
data = AB_DateCategory,
type = 'line'
)
d1$xAxis(orderRule = "Date")
d1$yAxis(type = "addMeasureAxis")
d1$xAxis(
type = "addTimeAxis",
inputFormat = "%Y-%m-%d",
outputFormat = "%Y-%m-%d",
)
The plot comes out looking great, but I would like to manually set the "Category" colors. Right now, it's set to the defaults and I cannot seem to find a method of manually setting a scale.
I have been able to set the defaults using brewer.pal, but I want to match other colors in my report:
d1$defaultColors(brewer.pal(n=4,"Accent"))
Ideally, these are my four colors - the category values I'm grouping on are R, D, O and U.
("#377EB8", "#4DAF4A", "#E41A1C", "#984EA3"))
If I understand correctly, you want to make sure R is #377EB8, etc. To match R, D, O, U consistently to the colors especially across multiple charts, you will need to do something like this.
d1$defaultColors = "#!d3.scale.ordinal().range(['#377EB8', '#4DAF4A', '#E41A1C', '#984EA3']).domain(['R','D','O','U'])!#"
This is on my list of things to make easier.
Let me know if this doesn't work.
The issue with the accepted answer above is that defining an ordinal scale will not guarantee that specific colors are bound to specific categories R, D, O and U. The color mapping will change depending on the input data. To assign each color specifically you can use assignColor like this
d1$setTemplate(afterScript = '<script>
myChart.assignColor("R","#377EB8");
myChart.draw();
</script>')

Resources