ggplot2 multi-variable scatterplot, Changing Labels and View in Margins - r

I am trying to create a scatterplot based on four values. My data is just lists of prices (BASIC,VALUE,DELUXE,ULTIMATE). I want VALUE and DELUXE to be the two axis (x,y) and then have the size and color of the points represent the data for the other two columns.
It is hard to set up a reproducible example, because it is only an issue when I get a lot of values listed. i have about 300 points, with about 30 different color/value labels(For ULTIMATE, and 20 size/value labels(For BASIC)
> gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1)
> plot(gg)
My code does this well, and lists the colors/size with the corresponding value on the side. This is great, but I would like to alter how that is displayed, so that it is not cut off. I would like to be able to "wrap" the values into more columns, or shrink the display size of those so that they fit.
Currently, this lists ULTIMATE in three columns, to the right of the plot area, but cuts off the top of the labels (it extends well above the plot area)
This lists BASIC size/value labels to the right of the plot area, below ULTIMATE labels, in one column, so about half are cut off at the bottom.
I can increase the margins with:
> gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1) +theme(plot.margin = unit(c(4,2,4,2), "cm"))
> plot(gg)
This gets more of it in, but creates lots of white area and a smaller view of the plot. I would like to be able to just increase the right margin if necessary, and "wrap" the labels in more columns extending to the right. (i.e. put ULTIMATE into 4 columns instead of 3, and put BASIC into 3-4 columns instead of 1 - So that they are shorter and don't run out the plot area.

There is some built in functionality I found to do the required operation. It lies in adding a guides() argument to the plot, specifying whether I am dealing with the color or size legend, and specifying the number of columns with "ncol = " (You can also specify rows). Giving it an order ranking allows you to rank these as well, so my resulting code was:
> gg <- ggplot(Table, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1) + guides(color = guide_legend(order = 0,ncol = 4),size = guide_legend(order = 1,ncol = 4))

Related

How can I ensure consistent axis lengths between plots with discrete variables in ggplot2?

I've been trying to standardise multiple bar plots so that the bars are all identical in width regardless of the number of bars. Note that this is over multiple distinct plots - faceting is not an option. It's easy enough to scale the plot area so that, for instance, a plot with 6 bars is 1.5* the width of a plot with 4 bars. This would work perfectly, except that each plot has an expanded x axis by default, which I would like to keep.
"The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables."
https://ggplot2.tidyverse.org/reference/scale_discrete.html
My problem is that I can't for the life of me work out what '0.6 units' actually means. I've manually measured the distance between the bars and the y axis in various design tools and gotten inconsistent answers, so I can't factor '0.6 units' into my calculations when working out what size the panel windows should be. Additionally I can't find any answers on how many 'units' long a discrete x axis is - I assumed at first it would be 1 unit per category but that doesn't fit with the visuals at all. I've included an image that hopefully shows what I mean - the two graphs
In this image, the top graph has a plot area exactly 1.5* that of the bottom graph. Seeing as it has 6 bars compared with 4, that would mean each bar is the same width, except that that extra space between the axis and the first bar messes this up. Setting expand = expansion(add = c(0, 0)) clears this up but results in not-so-pretty graphs. What I'd like is for the bars to be identical in width between the two plots, accounting for this extra space. I'm specifically looking for a general solution that I can use for future plots, not for the individual solution for this sample. As such, what I'd really like to know is how many 'units' long are these two x axes? Many thanks for any and all help!
Instead of using expansion for the axis, I would probably use the fact that categorical variables are actually plotted on the positive integers on Cartesian co-ordinates. This means that, provided you know the maximum number of columns you are going to use in your plots, you can set this as the range in coord_cartesian. There is a little arithmetic involved to keep the bars centred, but it should give consistent results.
We start with some reproducible data:
library(ggplot2)
set.seed(1)
df <- data.frame(group = letters[1:6], value = 100 * runif(6))
Now we set the value for the maximum number of bars we will need:
MAX_BARS <- 6
And the only thing "funny" about the plot code is the calculation of the x axis limits in coord_cartesian:
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
Now let us remove one factor level and run the exact same plot code:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
And again:
df <- df[-1,]
ggplot(df, aes(group, value)) +
geom_col() +
coord_cartesian(xlim = c(1 -(MAX_BARS - length(unique(df$group)))/2,
MAX_BARS - (MAX_BARS - length(unique(df$group)))/2))
You will see the bars remain constant width and centralized, yet the panel size remains fixed.
Created on 2021-11-06 by the reprex package (v2.0.0)

How to alter distances between plots in a 4 X 4 graph panel?

I am trying to create a graph panel with 8 graphs in total ( 4 x 4). Each graph corresponds to a different gene, whereby there are three lines ( one for control, one for UC disease and one for Crohns), representing the average change in expression comparing a first measurement and a second.
The code I am using to run each of the plots is;
s <- ggplot(X876, aes(x=Timepoint, y=value, group=Group)) +
geom_line(aes(color=Group), size=1)+
geom_point(aes(color=Group), size=2.5) +
labs(y="X876") + ylim(0.35, 0.55) +
theme_classic() +
scale_color_manual(values=c("darkmagenta", "deepskyblue4", "dimgrey"))
Using grid.arrange(l, m, n, o, p, q, r, s, nrow=4, nrow=4), creates a graph panel where the y axes names overlap.
I have seen on here about changing the plot margins via,
pl = replicate(3, ggplot(), FALSE)
grid.arrange(grobs = pl)
margin = theme(plot.margin = unit(c(2,2,2,2), "cm"))
grid.arrange(grobs = lapply(pl, "+", margin))
However, I am unsure how this can be applied to increase the vertical height between the plots on the top and bottom rows. For each of the graphs l, m, n, o, p, q, r, s do I need to include
+ theme(plot.margin=unit(c(t,r,b,l),"cm"))
and then run the grid.arrange(l, m, n, o, p, q, r, s, nrow=4, ncol=4)
Please could somebody suggest which values do I need to include for top (t), right(r), bottom (b), left(l) to only increase the distance (by about 3cms) between the top and bottom row? I am trying different values and I'm not getting a decent graph panel yet.
Thank-you
Probably the easiest way is to create your own theme based on the theme_classic theme and then modify the plotting margins (and anything else) the way that you prefer.
theme_new <- theme_classic() +
theme(plot.margin=unit(c(1,0,1,0), "cm")) # t,r,b,l
Then set the theme (will revert back to the default on starting a new R session).
theme_set(theme_new)
The alternative is to use grid.arrange and modify the margins using the grobs as you've already mentioned.
Once the panels have been arranged, you can then modify the top and bottom margins (or left and right) by specifying the vp argument of grid.arrange, which allows you to modify the viewport of multiple grobs on a single page. You can specify the height and width using the viewport function from the grid package.
For example, if you have a list of ggplot() grobs called g.list that contain your individual plots (l,m,n,o,p,q,r,s), then the following would reduce the height of the viewport by 90%, which effectively increases the top and bottom margins equally by 5%.
library(grid)
library(gridExtra)
grid.arrange(grobs = g.list, vp=viewport(height=0.9))
Without your data, I can't test it, especially to see if the y-axes labels overlap. And I don't know why you think increasing the top and bottom margins can solve that problem since the y-axes are, by default, on the left-hand side of the graph.
Anyway, I'll use the txhousing dataset from the ggplot2 package to see if I can reproduce your problem.
library(ggplot2)
data(txhousing)
theme_new <- theme_classic() +
theme(plot.margin=unit(c(0.1,0.1,0.1,0.1), "cm"), text=element_text(size=8))
theme_set(theme_new)
tx.list <- split(txhousing, txhousing$year)
g.list <- lapply(tx.list, function(data)
{
ggplot(data, aes(x=listings, y=sales)) +
geom_point(size=0.5)
} )
grid.arrange(grobs = g.list, vp=viewport(height=0.9))
I don't see any overlapping. And I don't see why increasing the top and bottom margins would make much difference.
The question was asked a couple of years ago, but I bumped into it only now and thought that I might share a quick and dirty tip for this, which works good enough in many cases.
In some situations the theme is already so complex that this trick might be the easiest way: adding a few \n's (newlines) to the x and y axis names, as this will affect the distances between the plots in the panel. I've learned this trick for a slightly different purpose from here (originally from here).
I'll use the same logic for the example dataset (in this case: Orange from R built-in data sets) as in the excellent code by the previous answerer.
library(ggplot2)
library(gridExtra)
or.list <- split(Orange, Orange$Tree)
g.list <- lapply(or.list, function(data)
{
ggplot(data, aes(x=age, y=circumference)) +
theme_classic() +
geom_point(size=0.5) +
scale_x_continuous(name = "Age\n\n") +
scale_y_continuous(name = "\n\n\nCircumference")
} )
grid.arrange(grobs = g.list)

Controling size/ scale of additional points in ggplot

I have a plot in which I am trying to highlight different points along the line. My plot looks like this:
library(ggplot2)
X<-c(seq(1:20))
Y<-c(2,4,6,3,5,8,6,5,4,3,2,4,6,6,9,8,9,5,4,3)
Col<-as.character(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5))
value<-c(NA,NA,NA,NA,10,NA,NA,NA,20,NA,NA,20,20,10,NA,NA,NA,10,NA,NA)
DF<-data.frame(X,Y,Col,value)
p<-ggplot(DF, aes(x=X, y=Y)) +geom_line(size=.02) +geom_point(col='black', size=.5)+ geom_jitter(aes(color = Col, size = value),position = position_jitter(height = .2, width = .2)) + scale_colour_manual(values=c("red", "violetred","orange",'blue','steelblue'))
p+guides(size = "none")
On top of the basic line plot, I have 5 different colored dot possibilities defined by the "Col" column. I also have 2 different size classes I'd like to represent using the "value" column. My issue is that the default sizes chosen to represent the "value" column is scaled so that the larger size appears too big in the plot in comparison to the smaller size.
Is there a way to manually set the scale so I can control the drawn size of the added points (hopefully make them closer is size to each other)? The final plot should be similar to the one here, only the value=10 points would be only slightly smaller than than the value=20 points.
I've tried using some of the "manual" options, but when I do it always plots the other "Y" column points, which for now are suppressed because of the NAs
In the same way you manually define the colours for your points using a scale_colour_ variant, you can define the sizes using scale_size.
In this case you can define the range of sizes to use. Adding + scale_size(range = c(4,6)) seems to give results that fit your description. You can tweak the size of, and the difference in size between the points by changing the numbers.

re-sizing ggplot geom_dotplot

I'm having trouble creating a figure with ggplot2. I am using geom_dotplot with center stacking to display my data which are discrete values for 4 categories.
For aesthetic reasons I want to customize the positions of the dots so that
reduce the empty space between dots along the y axis, (ie the dots are 1 value large)
The distributions fit and don't overlap
I've adjusted the bin and dotsize to achieve aesthetic goal 1, but that requires me to fiddle with the ylim() parameter to make sure that the groups fit in the plot. This results in a plot with more whitw space and few numbers on the y axis.
Question: Can anyone explain a way to resize the empty space on this plot?
My code is below:.
plot <- ggplot(figdata, aes(y=Counts, x=category, col=strain)) +
geom_dotplot(aes(fill=strain), dotsize=1, binwidth=.7,
binaxis= "y",stackdir ="centerwhole", stackratio=.7) +
ylim(18,59)
plot + scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_errorbar(stat="hline", yintercept="mean",
aes( ymax=..y..,ymin=..y.., group = category, width = 0.5),
color="black")
Which produces:
EDIT: Incorporating jitter will allow the all the data to fit, but I don't want to add noise to this data and would prefer to show it as discreet data.
adjusting the binwidth and dotsize to 0.3 as suggested below also fits all the data, however it leaves too much white space.
I think that I might have to transform my data so that the values are steps smaller than 1, in order to get everything to fit horizontally and dot sizes to big large enough to reduce white space.
I think the easiest way is using coord_cartesian:
plot + scale_color_manual(values=c("#E69F00", "#56B4E9")) +
geom_errorbar(stat="hline", yintercept="mean",
aes( ymax=..y..,ymin=..y.., group = category, width = 0.5),
color="black") +
coord_cartesian(ylim=c(17,40))
Which gives me this plot (with fake data that are not as neatly distributed as yours):

Extend X axis interval ggplot2

I am trying to plot data with lot's of X axis values. I am trying to not overlap my point with geom_point. I found lot's of discussions about "scale_x_continuous", "position = jitter or dodge" etc... and every time my problem is remaining because I need to keep my point aligned. Moreover, "scale_size_area" does not make it good.
EDIT: Generated data already melted at the end of the post.
I can not post image (Link to image), but to give the idea: I have 6 levels in my Y axis, and 400 levels in X axis. My points (shape = 1 = circle) are Y-levels aligned, and have different diameters depending on the value.
This is ok, but circles are overlapping.
plot <- ggplot(data, aes(x_variable_400_levels, y_variable_6_levels)) +
# value*100 because values are between 0 and 1 to have bigger circles
geom_point(shape = 1, size = data$value*100) +
# theme description
theme(
plot.title = element_text(lineheight=.8, face="bold", vjust=1),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=0.3)
)
So, my question is: Can I modify the interval between two values of the X axis in order to avoid the overlapping between circles? Jitter is not interesting here because the noise does not allow a good visualisation of data, including that when I tried to had only HORIZONTAL noise.
Any kind of solution, links or other tutorial to solve it will be appreciated.
EDIT : Generated data. Import with read.table, sep = "," and header = T. The point is that, I have very little circles and they are important too.
data <- read.table(text='"trf","sample","value"
36,"S1",0.143882104
38,"S1",0.025971979
47,"S1",0.016711593
56,"S1",0.027896069
67,"S1",0.025870577
93,"S1",0.07638307
100,"S1",0.022905895
102,"S1",0.019192547
104,"S1",0.018258923
107,"S1",0.005032219
114,"S1",0.028297368
123,"S1",0.007874848
131,"S1",0.024184004
36,"S2",0.115123666
38,"S2",0
47,"S2",0.00479275
56,"S2",0.029523128
67,"S2",0.030133055
93,"S2",0.044749246
100,"S2",0.032865979
102,"S2",0
104,"S2",0
107,"S2",0.013160255
114,"S2",0.052047248
123,"S2",0.007632445
131,"S2",0
36,"S3",0.179332128
38,"S3",0.046215267
47,"S3",0
56,"S3",0.070791832
67,"S3",0.050214857
93,"S3",0.074108014
100,"S3",0
102,"S3",0
104,"S3",0
107,"S3",0
114,"S3",0.081441849
123,"S3",0
131,"S3",0.100090456', header=T,sep=",")
I don't think changing the interval is the solution, as your x-axis is numeric. It would be more difficult to interpret if the space between for instance 1 and 2 is larger that the space between 9 and 10. And if you would change all intervals to the largest circle, the plot would be too wide. I also imagine it would be very cluttered if you have more data, which makes it harder to see patterns. Maybe a (faceted) barplot is the solution? Allows for horizontal and vertical comparison, small values are visible and values are easily extracted and compared. Here's a start:
p2 <- ggplot(data, aes(x=trf, y=value))+
geom_bar(stat="identity") +
facet_grid(sample~.) +
xlim(c(0,150)) + theme_bw()

Resources