Gap Y axis in ggplot - r

My ggplot has most Y values between 0 and 200, plus one value around 3000:
I want to "zoom" in on the bulk of the values while still showing the high value.
I wrote the following code:
library(ggplot2)
# Refer to columns by bare name inside aes(); the data frame is already supplied via `data`
Figure_2 <- ggplot(data = count_df, aes(x = ng, y = Number)) +
  geom_point(col = "darkmagenta") + ggtitle("start VS Number") +
  xlab(expression(paste("start " , mu, "l"))) + ylab("Number") +
  theme(plot.title = element_text(hjust = 0.5, color = "orange", size = 14,
                                  face = "bold.italic"),
        axis.title.x = element_text(color = "#993333", size = 10, face = "bold"),
        axis.title.y = element_text(color = "#993333", size = 10, face = "bold"))
Does anybody know how to achieve that?

One possible solution uses facet_grid. I do not have the OP's exact data, but the idea is to split the y-axis into ranges. The OP mentions two ranges for Number: 0-200 and ~3000.
Hence, we can divide Number by 2000 and round up to get a grouping factor: factor(ceiling(Number/2000)) has two levels, 1 for values up to 2000 and 2 for values above 2000 (up to 4000).
Let's take data similar to the OP's and try this approach:
# Data
count_df <- data.frame(ng = 1:30, Number = sample(200:220, 30, TRUE))
# Change one value high as 3000
count_df$Number[20] <- 3000
library(ggplot2)
ggplot(data = count_df, aes(x=ng, y=Number)) +
geom_point() +
facet_grid(factor(ceiling(Number/2000))~., scales = "free_y") +
ggtitle("start VS Number") +
xlab(expression(paste("start " , mu, "l")))
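To see what the grouping expression does on its own, here is a minimal sketch in base R. The three values are made up for illustration and are not the OP's data.

```r
# Hypothetical values: two "normal" points and one outlier
Number <- c(200, 215, 3000)

# Dividing by 2000 and rounding up puts values <= 2000 in group 1
# and values in (2000, 4000] in group 2
grp <- factor(ceiling(Number / 2000))

levels(grp)        # the two facet labels
as.integer(grp)    # group membership of each point
```

With scales = "free_y", each of these groups then gets its own y-axis range in facet_grid. The cut-off of 2000 is a choice that fits this data; a different outlier would need a different divisor.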

Related

change environment size on scale_colour_manual to assign colour to factors to use across multiple plots

I need to make 5 plots of bacterial species. Each plot contains a different number of species, somewhere between 30 and 90. I want each bacterium to have the same color in all plots, so I need to assign a fixed color to each name.
I tried to use scale_colour_manual to create a color set, but the environment it creates has only 16 colors. How can I increase the number of colors in that environment?
The code I am using can be reproduced as follows:
colour_genus <- stringi::stri_rand_strings(90, 5) #to be random names
nb.cols = nrow(colour_genus) #to set the length of my string
MyPalette = colorRampPalette(brewer.pal(12,"Set1"))(nb.cols) # the palette of choice
colGenus <- scale_color_manual(name = colour_genus, values = MyPalette)
The resulting object contains only 16 values, so when I try to apply it to a figure with 90 factor levels, it complains that I have only 16 values:
abundance <- runif(90, min = 10, max = 100)
my_data <- data.frame(colour_genus, abundance)
p <- ggplot(my_data, aes(x = colour_genus, y= abundance)) +
geom_bar(aes(color = colour_genus, fill = colour_genus), stat = "identity", position = "stack") +
labs(x = "", y = "Relative Abundance\n") +
theme(panel.background = element_blank())
p + theme(legend.text= element_text(size=7, face="bold"), axis.text.x = element_text(angle = 90)) + guides(fill=guide_legend(ncol=2)) + scale_fill_manual(values=colGenus)
The following error shows:
Error: Insufficient values in manual scale. 90 needed but only 16 provided.
Thank you very much for your help.
If you know all 90 bacteria names before plotting, you can build a named colour vector and pass it to scale_fill_manual:
library(ggplot2)
set.seed(123)
# random names; sorting is optional and only makes the output easier to follow
colour_genus <- sort(stringi::stri_rand_strings(90, 5))
MyPalette <- sample(colors(), length(colour_genus))
# named vector for scale_fill
names(MyPalette) <- colour_genus
# data
abundance <- runif(90, min = 10, max = 100)
my_data <- data.frame(colour_genus, abundance)
# two sets to show results
set1 <- my_data[20:30,]
set2 <- my_data[25:35,]
ggplot(set1, aes(x = colour_genus, y= abundance)) +
geom_col(aes(fill = colour_genus)) +
scale_fill_manual(values = MyPalette)
ggplot(set2, aes(x = colour_genus, y= abundance)) +
geom_col(aes(fill = colour_genus)) +
scale_fill_manual(values = MyPalette)
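As a side note on the question's code: nrow() returns NULL for a plain character vector, so length() is the way to count the names, and the "Set1" palette in RColorBrewer provides at most 9 colours. The sketch below assumes the RColorBrewer package is installed and uses made-up genus names. It builds an interpolated, named palette that could be passed to scale_fill_manual(values = ...) in the same way as the sampled colours above.

```r
library(RColorBrewer)

# Hypothetical genus names standing in for the real ones
colour_genus <- paste0("genus_", sprintf("%02d", 1:90))

# nrow() is NULL for a vector; length() gives the number of names
is.null(nrow(colour_genus))
nb.cols <- length(colour_genus)

# Interpolate the 9 colours of "Set1" up to one colour per name
MyPalette <- colorRampPalette(brewer.pal(9, "Set1"))(nb.cols)

# Naming the vector ties each colour to one genus,
# whichever subset of genera a given plot shows
names(MyPalette) <- colour_genus
head(MyPalette, 3)
```

Interpolating 90 colours from 9 makes neighbouring colours hard to tell apart, so sampling from colors() as in the answer may be easier to read. The naming step is the part that keeps colours consistent across plots.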

how to change geom_tile scale for very small values?

I have a data frame of pairwise comparisons in which each value is the similarity between two objects. A real object compared with random ones gives very small similarities, while random objects compared with each other give higher similarities. I want to put everything together and plot it as a heatmap. The problem is that the very small similarities I want to highlight get the same colour as the not-so-small ones from the random-versus-random comparisons. This is clearly a scale problem, but I don't know how to manage the colour scale. The following code generates a heatmap that shows the issue: the first column has a yellowish colour, which is fine, but so do other tiles whose values are much higher and not comparable. How can I colour the tiles according to the actual scale?
The code:
set.seed(131)
#number of comparisons in the original data: 1 value versus n=10
n <- 10
#generate real data (very small values)
fakeRealData <- runif(n, min=0.00000000000001, max=0.00000000000002)
#and create the data structure
realD <- cbind.data.frame(rowS=rep("fakeRealData", n), colS=paste("rnd", seq(1, n, by=1), sep=" "), Similarity=fakeRealData, stringsAsFactors=F)
#the same for random data, n=10 random comparisons make for a n by n matrix
rndN <- n*n
randomData <- data.frame(matrix(runif(rndN), nrow=n, ncol=n))
rowS <- vector()
#for each column of randomData
for (r in seq(1, n, by=1)) {
#create a vector of the first rowname, then the second, the third, etc etc which is long as the number of columns
rowS <- append(rowS, rep(paste("rnd", r, sep=" "), n))
}
#and create the random data structure
randomPVs <- cbind.data.frame(rowS=rowS, colS=rep(paste("rnd", seq(1, n, by=1), sep=" "), n), Similarity=unlist(randomData), stringsAsFactors=F)
#eventually put everything together
everything <- rbind.data.frame(randomPVs, realD)
#and finally plot the heatmap
heaT <- ggplot(everything, aes(rowS, colS, fill=Similarity)) +
geom_tile() +
scale_fill_distiller(palette = "YlGn", direction=2) +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
xlab("")+
ylab("")
plot(heaT)
Here are three approaches:
Add geom_text to your plot to show the values when color differences are small.
heaT <- ggplot(everything, aes(rowS, colS)) +
geom_tile(aes(fill=Similarity)) +
scale_fill_distiller(palette = "YlGn", direction=2) +
geom_text(aes(label = round(Similarity, 2))) +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("") +
ylab("")
Use the values argument of scale_fill_distiller to set a nonlinear scale. I added an extra break point at 0.01 to the otherwise linear scale to accentuate the difference between 0 and small nonzero numbers, and left the rest of the scale linear.
heaT <- ggplot(everything, aes(rowS, colS)) +
geom_tile(aes(fill=Similarity)) +
scale_fill_distiller(palette = "YlGn", direction=2,
values = c(0, 0.01, seq(0.05, 1, 0.05))) +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("") +
ylab("")
Transform your scale as Richard mentioned in the comments. Note that this will mess with the values in the legend, so either rename it or hide it.
heaT <- ggplot(everything, aes(rowS, colS)) +
geom_tile(aes(fill=Similarity)) +
scale_fill_distiller(palette = "YlGn", direction=2, trans = "log10",
name = "log10(Similarity)") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
xlab("")+
ylab("")
Try combinations of these approaches and see what you like.
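To illustrate why the linear scale hides the small values and what the log10 transformation changes, here is a small sketch in base R. The four similarity values are made up, and the rescaling to [0, 1] only mimics how a continuous fill scale positions values along its gradient. It is not ggplot2's internal code.

```r
# Hypothetical similarities: two tiny "real" values, two ordinary random ones
similarity <- c(1e-14, 2e-14, 0.1, 0.9)

# Position of each value on a linear colour gradient (0 = one end, 1 = the other)
linear_pos <- (similarity - min(similarity)) / diff(range(similarity))

# Position of each value after a log10 transformation
log_vals <- log10(similarity)
log_pos  <- (log_vals - min(log_vals)) / diff(range(log_vals))

round(linear_pos, 3)
round(log_pos, 3)
```

On the linear scale the two tiny values land on practically the same point of the gradient. After the transformation they are clearly separated from the random values, at the cost of compressing the differences among the larger ones.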

Need help on customizing my Odds Ratio (ggplot)!

I've been assigned to create an odds ratio plot with ggplot in R. The plot I'm supposed to reproduce is given below.
[Image: given plot]
My job is to write code that reproduces this plot exactly in R. I've done most of it. Here is my work.
[Image: my work]
Before jumping into my code, it is important to note that I am not yet using the correct values for boxOdds, boxCILow, and boxCIHigh, since I have not worked them out. I wanted to get the ggplot code right first so I can enter the right values as soon as I find them.
This is the code I used:
library(ggplot2)
boxLabels = c("Females/Males", "Student-Centered Prac. (+1)", "Instructor Quality (+1)", "Undecided / STM",
"non-STEM / STM", "Pre-med / STM", "Engineering / STM", "Std. test percentile (+10)",
"No previous calc / HS calc", "College calc / HS calc")
df <- data.frame(yAxis = length(boxLabels):1,
boxOdds =
c(2.23189, 1.315737, 1.22866, 0.8197413, 0.9802449, 0.9786673, 0.6559005, 0.5929812, 0.6923759, 1.3958275),
boxCILow =
c(.7543566,1.016,.9674772,.6463458,.9643047,.864922,.4965308,.3572142, 0.4523759, 1.2023275),
boxCIHigh =
c(6.603418,1.703902,1.560353,1.039654,.9964486,1.107371,.8664225,.9843584, 0.9323759, 1.5893275)
)
(p <- ggplot(df, aes(x = boxOdds, y = boxLabels)) +
geom_vline(aes(xintercept = 1), size = 0.75, linetype = 'dashed') +
geom_errorbarh(aes(xmax = boxCIHigh, xmin = boxCILow), size = .5, height =
0, color = 'gray50') +
geom_point(size = 3.5, color = 'orange') +
theme_bw() +
theme(panel.grid.minor = element_blank()) +
scale_x_continuous(breaks = seq(0,7,1) ) +
ylab('') +
xlab('Odds Ratio') +
annotate(geom = 'text', y =1.1, x = 3.5, label ='',
size = 3.5, hjust = 0) + ggtitle('Estimated Odds of Switching') +
theme(plot.title = element_text(hjust = 0.5, size = 30),
axis.title.x = (element_text(size = 15))) +
theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank())
)
p
Where I'm stuck:
Removing the small vertical lines at the beginning and end of each row's CI. I was not sure what they are called, so I had a hard time looking it up. SOLVED
Coloring specific rows in different colors.
Assigning the proper order of the variables on the y-axis. As you can see in my code (the boxLabels part), I put all the variables in the order of the given plot, but R ignored that order, so the variable at the very top is "Undecided / STM" instead of "Females/Males".
How do I decrease the space from 0 to 1? SOLVED
Any help would be appreciated!
First, you probably want ggstance::geom_pointrangeh. Second, you can define the colors right at the beginning: to color several rows alike, create a new variable group. Third, the ordering problem comes from your data, where you can assign factor labels to yAxis in the order you want. Fourth, remove coord_trans as suggested by @beetroot.
Assign factor labels
dat$yAxis <- factor(dat$yAxis, levels=10:1, labels=rev(boxLabels))
Create groups
dat$group <- 1
dat$group[which(dat$yAxis %in% c("Females/Males", "Undecided / STM", "non-STEM / STM",
"Pre-med / STM"))] <- 2
dat$group[which(dat$yAxis %in% c("Student-Centered Prac. (+1)",
"No previous calc / HS calc",
"College calc / HS calc"))] <- 3
Colors
colors <- c("#860fc2", "#fc691d", "black")
Plot
library(ggplot2)
library(ggstance)
ggplot(dat, aes(x=boxOdds, y=yAxis, color=as.factor(group))) +
geom_vline(aes(xintercept=1), size=0.75, linetype='dashed') +
geom_pointrangeh(aes(xmax=boxCIHigh, xmin=boxCILow), size=.5,
show.legend=FALSE) +
geom_point(size=3.5, show.legend=FALSE) +
theme_bw() +
scale_color_manual(values=colors)+
theme(panel.grid.minor=element_blank()) +
scale_x_continuous(breaks=seq(0,7,1), limits=c(0, max(dat[2:4]))) +
ylab('') +
xlab('Odds Ratio') +
annotate(geom='text', y =1.1, x=3.5, label ='',
size=3.5, hjust=0) + ggtitle('Estimated Odds of Switching') +
theme(plot.title=element_text(hjust=.5, size=20)) +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())
Gives
Data
dat <- structure(list(yAxis = 10:1, boxOdds = c(2.23189, 1.315737, 1.22866,
0.8197413, 0.9802449, 0.9786673, 0.6559005, 0.5929812, 0.6923759,
1.3958275), boxCILow = c(0.7543566, 1.016, 0.9674772, 0.6463458,
0.9643047, 0.864922, 0.4965308, 0.3572142, 0.4523759, 1.2023275
), boxCIHigh = c(6.603418, 1.703902, 1.560353, 1.039654, 0.9964486,
1.107371, 0.8664225, 0.9843584, 0.9323759, 1.5893275)), class = "data.frame", row.names = c(NA,
-10L))
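The ordering issue from the question can be reproduced without any plotting. The sketch below uses three of the labels from the question and only base R. It shows that a character vector turned into a factor gets alphabetical levels by default, and that explicit levels preserve the intended order.

```r
# Three of the labels from the question, in the desired top-to-bottom order
boxLabels <- c("Females/Males", "Undecided / STM", "College calc / HS calc")

# Default: levels are sorted alphabetically, so the input order is lost
levels(factor(boxLabels))

# Explicit levels keep the chosen order. ggplot2 draws the first level of a
# discrete y-axis at the bottom, so the order is reversed to get the first
# label at the top
yAxis <- factor(boxLabels, levels = rev(boxLabels))
levels(yAxis)
```

This is the same idea as the factor(dat$yAxis, levels=10:1, labels=rev(boxLabels)) line above, just starting from the labels instead of from the numeric positions.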

How can I plot 2 related variables on the same axis using ggplot? [duplicate]

Edit: This question has been marked as duplicated, but the responses here have been tried and did not work because the case in question is a line chart, not a bar chart. Applying those methods produces a chart with 5 lines, 1 for each year - not useful. Did anyone who voted to mark as duplicate actually try those approaches on the sample dataset supplied with this question? If so please post as an answer.
Original Question:
There's a feature in Excel pivot charts which allows multilevel categorical axes. I'm trying to find a way to do the same thing with ggplot (or any other plotting package in R).
Consider the following dataset:
set.seed(1)
df=data.frame(year=rep(2009:2013,each=4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
sales=40:59+rnorm(20,sd=5))
If this is imported to an Excel pivot table, it is straightforward to create the following chart:
Note how the x-axis has two levels, one for quarter and one for the grouping variable, year. Are multilevel axes possible with ggplot?
NB: There is a hack with facets that produces something similar, but this is not what I'm looking for.
library(ggplot2)
ggplot(df) +
geom_line(aes(x=quarter,y=sales,group=year))+
facet_grid(.~year,scales="free")
New labels are added using annotate(geom = "text", ...). Clipping of the labels drawn outside the panel is turned off with clip = "off" in coord_cartesian.
theme is used to add extra margin (plot.margin) and to remove (element_blank()) the x-axis title and text (axis.title.x, axis.text.x) and the vertical grid lines (panel.grid.major.x, panel.grid.minor.x).
library(ggplot2)
ggplot(data = df, aes(x = interaction(year, quarter, lex.order = TRUE),
y = sales, group = 1)) +
geom_line(colour = "blue") +
annotate(geom = "text", x = seq_len(nrow(df)), y = 34, label = df$quarter, size = 4) +
annotate(geom = "text", x = 2.5 + 4 * (0:4), y = 32, label = unique(df$year), size = 6) +
coord_cartesian(ylim = c(35, 65), expand = FALSE, clip = "off") +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
See also the nice answer by @eipi10 here: Axis labels on two lines with nested x variables (year below months)
The code suggested by Henrik does work and helped me a lot! I think the solution is very valuable. But please be aware that there is a small mistake in the first line of the code, which puts the data in the wrong order.
Instead of
... aes(x = interaction(year,quarter), ...
it should be
... aes(x = interaction(quarter,year), ...
The resulting graphic has the data in the right order.
P.S. I suggested an edit (which has been rejected so far). Because I am slightly short of reputation, I am not allowed to comment, which I would rather have done.
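The ordering point discussed in these two answers can be checked directly in base R. The sketch below uses a cut-down version of the question's data (two years, two quarters) and prints the level order that interaction() produces for each variant. The level order is what determines the left-to-right order on a discrete x-axis.

```r
# Cut-down version of the question's data
year    <- rep(2009:2010, each = 2)
quarter <- rep(c("Q1", "Q2"), 2)

# Default: the FIRST argument varies fastest, so quarters of different
# years are interleaved
levels(interaction(year, quarter))
# "2009.Q1" "2010.Q1" "2009.Q2" "2010.Q2"

# lex.order = TRUE: the LAST argument varies fastest, giving chronological order
levels(interaction(year, quarter, lex.order = TRUE))
# "2009.Q1" "2009.Q2" "2010.Q1" "2010.Q2"

# Swapping the arguments also gives chronological order
levels(interaction(quarter, year))
# "Q1.2009" "Q2.2009" "Q1.2010" "Q2.2010"
```

Both fixes lead to the same ordering of the data; they differ only in how the combined level names are written.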
User Tung had a great answer on this thread
library(tidyverse)
library(lubridate)
library(scales)
set.seed(123)
df <- tibble(
date = as.Date(41000:42000, origin = "1899-12-30"),
value = c(rnorm(500, 5), rnorm(501, 10))
)
# create year column for facet
df <- df %>%
mutate(year = as.factor(year(date)))
p <- ggplot(df, aes(date, value)) +
geom_line() +
geom_vline(xintercept = as.numeric(df$date[yday(df$date) == 1]), color = "grey60") +
scale_x_date(date_labels = "%b",
breaks = pretty_breaks(),
expand = c(0, 0)) +
# switch the facet strip label to the bottom
facet_grid(.~ year, space = 'free_x', scales = 'free_x', switch = 'x') +
labs(x = "") +
theme_classic(base_size = 14, base_family = 'mono') +
theme(panel.grid.minor.x = element_blank()) +
# remove facet spacing on x-direction
theme(panel.spacing.x = unit(0,"line")) +
# switch the facet strip label to outside
# remove background color
theme(strip.placement = 'outside',
strip.background.x = element_blank())
p

ggplot2 geom_mosaic with weight variable

I am trying to make a mosaic plot with ggplot2. I am using the bladdercancer data from the HSAUR3 package. I am looking to show the relationship between tumorsize and number, but I am not sure how to weight it. I know that the number in the sample with tumorsizes<=3cm is not the same as those with tumorsize>3cm. How do I incorporate that into my mosaic plot?
Here is what I did without weighting it.
library("ggplot2")
library("ggmosaic")
ggplot(data = bladdercancer, family=poisson()) +
geom_mosaic(aes(weight= 1 , x = product(tumorsize, number),
fill=factor(tumorsize)), na.rm=TRUE) +
labs(x="Number of tumors", title='Number of tumors vs Tumorsize') +
guides(fill=guide_legend(title = "Tumor Size"))
This may be late, but I am offering my suggestions since I am trying to do something similar. Below are two suggestions:
library(tidyverse)
library(ggmosaic)
First option:
bladdercancer %>%
group_by(tumorsize, number) %>%
# get frequencies/counts for each tumor size and for each number
summarise(n.cases = n()) %>%
ggplot() +
geom_mosaic(aes(weight = n.cases, x = product(number),
fill = factor(n.cases)), offset = 0) +
guides(fill=guide_legend(title = "Tumor Size")) +
labs(x="Number of tumors", title='Number of tumors vs Tumorsize') +
# remove background colour
theme_bw() +
theme(panel.grid.major = element_blank(),
# remove major and minor grids
panel.grid.minor = element_blank(),
# push title to the middle
plot.title = element_text(size = 10, hjust = .5))
Here the categories within each column represent the counts for each tumor size, e.g. number 1 appears 15 times for <=3cm and 5 times for >3cm. However, I am not able to separate partitions whose frequencies are the same, in this case numbers 3 and 4. Hence my option 2.
Second option:
ggplot(bladdercancer) +
geom_bar(aes(x = number, fill = tumorsize), position = "dodge")
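The counting step behind the first option can also be done in base R. The sketch below uses a small made-up data frame, not the HSAUR3 bladdercancer data, and only shows how to get one frequency per tumorsize/number combination.

```r
# Toy data (hypothetical values, NOT the bladdercancer data)
toy <- data.frame(
  tumorsize = c("<=3cm", "<=3cm", "<=3cm", ">3cm", ">3cm"),
  number    = c(1, 1, 2, 1, 2)
)

# One row per tumorsize/number combination, with its frequency in `Freq`
counts <- as.data.frame(table(tumorsize = toy$tumorsize, number = toy$number))
counts
```

A column like Freq is the kind of variable that could be mapped to the weight aesthetic of geom_mosaic, for example aes(weight = Freq, x = product(number), fill = tumorsize). That mapping is an untested suggestion under the assumption that ggmosaic is loaded, not a verified recipe.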
