Formatting changes affect only legend and not bar graph using swimplot and ggplot2 packages - r

Update: this issue was solved; the updated code is at the end of the post.
I am trying to create a swimmer plot to visualize individual patient duration of treatment with a drug administered at multiple dose levels (DLs). Each patient will be assigned to treatment with only one DL, but multiple patients can be assigned to a given DL (e.g. 3 patients at DL1, 3 patients at DL2, etc.). I would like to color code the bars in the swimmer plot according to DL.
I am using the swimplot package for R and have been following the guide located here (https://cran.r-project.org/web/packages/swimplot/vignettes/Introduction.to.swimplot.html).
This guide has been sufficient for most things I have tried, up until I tried to change the colors of the bars in the plot and corresponding legend. Following the section in that guide titled "Modifying Colours and shapes" under "Making the plots more aesthetically pleasing with ggplot manipulations", I was able to change the bar colors in the legend, but not the bars themselves.
I have been using the following code.
library(ggplot2)
library (swimplot)
library (gdata)
library (readxl)
ClinicalTrial.Arm <- read_excel("Swimmer_Test_Data1.xls")
ClinicalTrial.Arm <- as.data.frame(ClinicalTrial.Arm)
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',width=.85) +
scale_fill_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600")) +
scale_color_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))
arm_plot
I have tried a number of things to fix this, but I am quite new to R and don't think I know enough to troubleshoot effectively. I have tried various syntax changes (e.g. removing quotation marks) and have tried using geom_bar(), but wasn't sure how or what to map to x and y (it also seems like I shouldn't need to do this).
I have also tried using the following code, but get an error.
Colors <- c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600")
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt',width=.85, fill = Colors)+ scale_fill_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))+ scale_color_manual(name="Arm",values=c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600"))
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (20): fill
Run `rlang::last_error()` to see where the error occurred.
Any help here would be greatly appreciated.
Solved! Updated, working code
library(ggplot2)
library (swimplot)
library (gdata)
library (readxl)
ClinicalTrial.Arm <- read_excel("Swimmer_Test_Data1.xls")
ClinicalTrial.Arm <- as.data.frame(ClinicalTrial.Arm)
Colors <- c("DL1" ="#003f5c", "DL2"="#374c80","DL3"="#7a5195","DL4"="#bc5090","DL5"="#ef5675","DL6"="#ff764a","DL7"="#ffa600")
arm_plot <- swimmer_plot(df=ClinicalTrial.Arm,id='id',end='End_trt', name_fill = "Arm", width=.85) + scale_fill_manual(name="Arm",values = Colors) +
scale_color_manual(name="Arm",values=Colors)

To make your code work, you first have to map a variable to the fill aesthetic. With swimplot this is done via the name_fill argument of swimmer_plot():
Note: As I use the ClinicalTrial.Arm dataset from the swimplot package, I adjusted your color palette to match the three categories of the Arm column in that dataset.
library(ggplot2)
library(swimplot)
#pal <- c("DL1" = "#003f5c", "DL2" = "#374c80", "DL3" = "#7a5195", "DL4" = "#bc5090", "DL5" = "#ef5675", "DL6" = "#ff764a", "DL7" = "#ffa600")
pal <- c("Arm A" = "#003f5c", "Arm B" = "#bc5090", "Off Treatment" = "#ffa600")
swimmer_plot(df = ClinicalTrial.Arm, id = "id", end = "End_trt", name_fill = "Arm", width = .85) +
scale_fill_manual(name = "Arm", values = pal)
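A named palette only takes effect when its names match the values in the mapped column exactly, so it can help to check that before plotting. The sketch below uses a made-up data frame (`trial`, with a hypothetical `Arm` column holding dose levels) instead of the real spreadsheet, and only base R:

```r
# Hypothetical stand-in for the real data: one row per patient
trial <- data.frame(
  id = 1:6,
  End_trt = c(4, 7, 3, 9, 5, 8),
  Arm = c("DL1", "DL1", "DL2", "DL2", "DL3", "DL3")
)

# Named palette: the names must match the values in trial$Arm exactly
pal <- c("DL1" = "#003f5c", "DL2" = "#374c80", "DL3" = "#7a5195")

# Dose levels that occur in the data but have no colour in the palette
missing_levels <- setdiff(unique(trial$Arm), names(pal))
if (length(missing_levels) > 0) {
  warning("No colour defined for: ", paste(missing_levels, collapse = ", "))
}
```

If `missing_levels` is empty, `pal` can be passed to scale_fill_manual(values = pal) once the column has been mapped with name_fill = "Arm".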

Related

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I have applied DBSCAN algorithm on built-in dataset iris in R. But I am getting error when tried to visualise the output using the plot( ).
Following is my code.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscan, not loading fpc since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
geom_point(mapping = aes(x=Species, y = cluster, colour = cluster),
position = "jitter") +
labs(title = "DBSCAN")
Here's the image it generates:
If you're looking for something else, please be more specific about what the final plot should look like.
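If a base-graphics plot is preferred, one option is to colour a scatterplot matrix by cluster. This is only a sketch, assuming the dbscan package is installed; it calls dbscan::dbscan() explicitly so the result does not depend on which packages are attached. In the dbscan package, cluster 0 marks noise points, so adding 1 gives every point a valid colour index:

```r
# Run dbscan from the dbscan package explicitly (fpc is not needed)
data1 <- iris[, 1:4]
db <- dbscan::dbscan(data1, eps = 0.45, minPts = 5)

# Cluster 0 is noise; shift by 1 so noise is drawn in colour 1 (black)
cluster_col <- db$cluster + 1L

# Scatterplot matrix of the four measurements, coloured by cluster
pairs(data1, col = cluster_col, pch = 19, main = "DBSCAN")
```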

How to specify bin colors for plot_usmap?

I'm looking to create a heat map with a little more control over the color scale, specifically I want to have bins for ranges of values that will correspond to a specific color.
Below I provide some sample code to generate some data and make a plot. The issue seems to be how it maps the colors to the breaks, it is not a 1:1 correspondence, when I add more percentiles to the breaks it seems to stretch the colors.
It does not appear to be a large issue here, but when I apply this to the entire US data set I'm working with the color scheme really breaks down.
library(usmap)
library(ggplot2)
fips <- seq(45001,45091,2)
value <- rnorm(length(fips),3000,10000)
data <- data.frame(fips,value)
data$value[data$value<0]=0
plot_usmap(regions='counties',data=data,values="value",include="SC") +
scale_fill_stepsn(breaks=c(as.numeric(quantile(data$value,seq(.25,1,.25)))),
colors=c("blue","green","yellow","red"))
plot_usmap(regions='counties',data=data,values="value",include="SC") +
scale_fill_stepsn(breaks=c(as.numeric(quantile(data$value,seq(0,1,.1)))),
colors=c("blue","green","yellow","red"))
#data not provided for this bit
plot_usmap(regions='counties',data=datar,values="1969",exclude=c("AK","HI")) +
scale_fill_stepsn(breaks=c(as.numeric(quantile(datar$`1969`,seq(0,1,.1)))),
colours=c("blue","green","yellow","red"))
One way would be to manually bin the percentiles and then use the factor levels for your manual breaks and labels.
I've never used this high-level function from usmap, so I don't know how to deal with the warnings shown at the end of the code. I would personally recommend ggplot + geom_polygon or friends for more control.
library(usmap)
library(ggplot2)
fips <- seq(45001,45091,2)
value <- rnorm(length(fips),3000,10000)
mydat <- base::data.frame(fips,value)
mydat$value[mydat$value<0]=0
mydat$perc_cuts <- as.integer(cut(ecdf(mydat$value)(mydat$value), seq(0,1,.25)))
plot_usmap(regions='counties',
data=mydat,
values="perc_cuts",include="SC") +
scale_fill_stepsn(breaks= 1:4, limits = c(0,4), labels = seq(.25, 1, .25),
colors=c("blue","green","yellow","red"),
guide = guide_colorsteps(even.steps = FALSE))
#> Warning: Use of `map_df$x` is discouraged. Use `x` instead.
#> Warning: Use of `map_df$y` is discouraged. Use `y` instead.
#> Warning: Use of `map_df$group` is discouraged. Use `group` instead.
Created on 2020-06-27 by the reprex package (v0.3.0)
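The binning step can also be tried on its own, without the map. The sketch below uses only base R and simulated values similar to the question's; the names `quartile_bin`, `bin_labels` and `bin_colours` are made up for illustration. It turns each value into an empirical percentile and then into a labelled factor, so each bin corresponds to exactly one colour:

```r
set.seed(42)
# Simulated values, with negatives clamped to 0 as in the question
value <- pmax(rnorm(46, mean = 3000, sd = 10000), 0)

# Empirical percentile of each observation, in (0, 1]
pct <- ecdf(value)(value)

# Four quartile bins as a labelled factor
bin_labels <- c("0-25%", "25-50%", "50-75%", "75-100%")
quartile_bin <- cut(pct, breaks = seq(0, 1, 0.25), labels = bin_labels)

# One fixed colour per bin, in the same order as the factor levels
bin_colours <- setNames(c("blue", "green", "yellow", "red"), bin_labels)

# Counts per bin; tied values (here the zeros) share one percentile,
# so a bin can end up empty
table(quartile_bin)
```

In ggplot2 generally, a factor column like this pairs with a discrete scale such as scale_fill_manual(values = bin_colours), which keeps a strict 1:1 mapping between bins and colours.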

Displaying counts instead of "levels" using stat_density2d

My objective is to portray the locations with varying numbers of traffic conflicts in a road intersection. My data consists of all the conflicts that we observed in a given time period at an intersection coded into a .CSV file with the following fields "time of conflict", "TTC" (means Time to Collision), "Lat", "Lon" and "Conflict Type". I figured the best way to do so would be using the 'ggmap+stat_density2d' function in R. I am using the following code:
library(ggmap)
library(RColorBrewer)
df = read.csv(filename, header = TRUE)
int.map = get_map(location = c(mean.long, mean.lat), zoom = 20, maptype = "satellite")
int.map = ggmap(int.map, extent = "device", legend = "right")
int.map + stat_density2d(data = new_xdf, aes(x, y, fill = ..level.., alpha = ..level..),
geom = "polygon") +
scale_fill_gradientn(guide = "colourbar", colours = rev(brewer.pal(7, "Spectral")),
name = "Conflict Density")
The output is a very nice safety heat map that correctly portrays the conflict hotspots. My problem is that the legend shows the values of "levels" automatically calculated by the 'stat_density2d()' function. I tried searching for a way to display, say, the count of conflict points inside each level on the legend bar, but to no avail.
I did find the link below, which handles a similar question, but the problem is that it creates a new data frame (new_xdf) with many more points than the original data. The counts determined by that program therefore seem to be of no use to me, as I want the exact number of conflict points in my original data to be displayed in the legend bar.
How to find points within contours in R?
Thanks in advance.
Edit: Link to a sample data file
https://docs.google.com/spreadsheets/d/11vc3lOhzQ-tgEiAXe-MNw2v3fsAqnadweVrvBdNyNuo/edit?usp=sharing

Run points() after plot() on a dataframe

I'm new to R and want to plot specific points over an existing plot. I'm using the swiss data frame, which I visualize with plot(swiss).
After this, I want to add the outliers given by the Mahalanobis distance:
mu_hat <- apply(swiss, 2, mean); sigma_hat <- cov(swiss)
mahalanobis_distance <- mahalanobis(swiss, mu_hat, sigma_hat)
outliers <- swiss[names(mahalanobis_distance[mahalanobis_distance > 10]),]
points(outliers, pch = 'x', col = 'red')
but this last line has no effect, as the outlier points aren't added to the previous plot. I see that if I repeat this procedure on a pair of variables, say
plot(swiss[2:3])
points(outliers[2:3], pch = 'x', col = 'red')
the red points are added to the plot.
Question: is there any restriction on how the points() function can be used for a multivariate data frame?
Here's a solution using GGally::ggpairs. It's a little ugly as we need to modify the ggally_points function to specify the desired color scheme.
I've assumed that mu_hat = colMeans(swiss) and sigma_hat = cov(swiss).
library(dplyr)
library(GGally)
swiss %>%
bind_cols(distance = mahalanobis(swiss, colMeans(swiss), cov(swiss))) %>%
mutate(is_outlier = ifelse(distance > 10, "yes", "no")) %>%
ggpairs(columns = 1:6,
mapping = aes(color = is_outlier),
upper = list(continuous = function(data, mapping, ...) {
ggally_points(data = data, mapping = mapping) +
scale_colour_manual(values = c("black", "red"))
}),
lower = list(continuous = function(data, mapping, ...) {
ggally_points(data = data, mapping = mapping) +
scale_colour_manual(values = c("black", "red"))
}),
axisLabels = "internal")
Unfortunately this isn't possible the way you're currently doing things. When plotting a data frame, R produces many plots and aligns them. What you're actually seeing there is 6 by 6 = 36 individual plots which have all been aligned to look nice.
When you call points(), it places the points on the current plot, which doesn't really make sense when you have 36 plots, at least not in the way you want it to.
ggplot is a really powerful tool in R; it provides far greater customizability. For example, you could set up the data frame to include your outliers, label them as "outlier", and place them in each plot that you have set up as facets. The more you explore it, the more likely you are to find plots that suit your needs better.
Plotting a data frame in base R is a good exploratory tool. You could set up those outliers as a separate data frame and plot it, so you can see each of the 6 by 6 plots side by side and compare. It all depends on your goal. If your goal is to produce exactly what you've described, the ggplot2 package will help you create something more professional. As @Gregor suggested in the comments, looking up the function ggpairs from the GGally package would be a good place to start.
A quick google image search shows some funky plots akin to what you're after and then some!
Find it here
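Staying in base graphics, the outliers can also be marked in the call that draws the matrix instead of being added afterwards. This is a sketch, assuming the same threshold of 10 as in the question; pairs() (which plot() uses for a data frame) forwards col and pch to every panel, with one value per row:

```r
# Mahalanobis distance of each row from the column means
distance <- mahalanobis(swiss, colMeans(swiss), cov(swiss))
is_outlier <- distance > 10

# One colour and one symbol per row; pch 4 is a cross, pch 1 an open circle
pairs(swiss,
      col = ifelse(is_outlier, "red", "black"),
      pch = ifelse(is_outlier, 4, 1))
```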

ggplot2 equivalent of 'factorization or categorization' in googleVis in R

Because ggplot produces static graphs, we are moving our graphs to googleVis, which offers interactive charts. But when it comes to categorization we are facing many problems. Let me give an example that will help you understand:
#dataframe
df = data.frame( x = sample(1:100), y = sample(1:100), cat = sample(c('a','b','c'), 100, replace=TRUE) )
ggplot2 provides parameters like alpha, colour, linetype and size, which can be mapped to categories as shown below:
ggplot(df) + geom_line(aes(x = x, y = y, colour = cat))
Not just line charts, but the majority of ggplot2 graphs support categorization based on column values. Now I would like to do the same in googleVis: based on the value of df$cat, I would like parameters to change, or lines or charts to be grouped.
Note:
I have already tried dcast to make multiple columns based on the category column and use those columns as Y input, but that is not what I would like to do.
Can anyone help me regarding this?
Let me know if you need more information.
vrajs5, you are not alone! We struggled with this issue too. In our case we wanted to fill bar charts as in ggplot. The solution is to add specifically named columns, linked to your variables, to your data table for googleVis to pick up.
In my fill example these are called roles, but once you see the syntax you can extend it to annotations and other cool features. Google has them all documented here (check out the superheroes example!), but it was not obvious how they applied to R.
@mages has this documented on this webpage, which shows features not in demo(googleVis):
http://cran.r-project.org/web/packages/googleVis/vignettes/Using_Roles_via_googleVis.html
EXAMPLE ADDING NEW DIMENSIONS TO GOOGLEVIS CHARTS
# In this case:
# how do we make the fill of each bar depend on another variable?
# We wanted to show C in a different fill from the other assets
suppressPackageStartupMessages(library(googleVis))
library(data.table) # You can use data frames if you don't like DT
test.dt = data.table(px = c("A","B","C"), py = c(1,4,9),
"py.style" = c('silver', 'silver', 'gold'))
# Add your modifier to your chart as a new variable, e.g. py.style
test <-gvisBarChart(test.dt,
xvar = "px",
yvar = c("py", "py.style"),
options = list(legend = 'none'))
plot(test)
We have shown py.style deterministically here, but you could code it to be dependent on your categories.
The secret is myvar.googleVis_thing_youneed linking the variable myvar to the googleVis feature.
RESULT BEFORE FILL (yvar = "py")
RESULT AFTER FILL (yvar = c("py", "py.style"))
Take a look at mages' examples (code also on GitHub) and you will have cracked the "categorization based on column values" issue.
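To make the style column depend on a category rather than typing it in by hand, a lookup vector is one option. The sketch below is base R only; the `cat` column, the `style_by_cat` lookup and the colour choices are assumptions made for illustration:

```r
# Hypothetical data: one bar per asset, each asset belongs to a category
test.df <- data.frame(px = c("A", "B", "C"),
                      py = c(1, 4, 9),
                      cat = c("a", "a", "b"))

# Lookup from category to fill colour (colours chosen for illustration)
style_by_cat <- c(a = "silver", b = "gold")

# The role column is named <yvar>.style, here "py.style";
# as.character() guards against cat being stored as a factor
test.df$py.style <- unname(style_by_cat[as.character(test.df$cat)])
```

The resulting data frame can then be passed to gvisBarChart with yvar = c("py", "py.style"), as in the example above.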
