I am a novice R user and am trying to create a plot using the likert function from the HH package. My problem seems to come from repeating category labels. It is easier to show the issue:
library(HH)
responses <- data.frame( Subtable= c(rep('Var1',5),rep('Var2',4),rep('Var3',3)),
Question=c('very low','low','average','high','very high', '<12', '12-14', '15+',
'missing', '<25','25+','missing'), Res1=as.numeric(c(0.05, 0.19, 0.38, 0.24, .07,
0.09, 0.73, 0.17, 0.02, 0.78, 0.20, 0.02)), Res2=as.numeric(c(0.19, 0.04, 0.39,
0.22, 0.06, 0.09, 0.50, 0.16, 0.02, 0.75, 0.46, 0.20)))
likert(Question ~ . | Subtable, responses,
scales=list(y=list(relation="free")), layout=c(1,3),
positive.order=TRUE,
between=list(y=0),
strip=FALSE, strip.left=strip.custom(bg="gray97"),
par.strip.text=list(cex=.6, lines=3),
main="Description of Sample",rightAxis=FALSE,
ylab=NULL, xlab='Percent')
Unfortunately, it creates strange blank spaces that shouldn't be there, as shown in the bottom panel of the following plot:
This seems to come from the repeated category 'missing'. My actual data has several repeats (e.g., 'no', 'other') and whenever they are included I get these extra spaces. If I run the same code but remove the repeated categories then it runs properly. In this case that means changing 'responses' in the code above to responses[! responses$Question %in% 'missing',].
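Because the real data has several repeated labels, it may be easier to find the repeats programmatically than to filter each one by name. The base-R sketch below lists every label that occurs in more than one row and drops all of them at once, generalising the 'missing' filter above. The names `repeated` and `no_repeats` are only illustrative.

```r
responses <- data.frame(
  Subtable = c(rep("Var1", 5), rep("Var2", 4), rep("Var3", 3)),
  Question = c("very low", "low", "average", "high", "very high",
               "<12", "12-14", "15+", "missing", "<25", "25+", "missing"),
  stringsAsFactors = FALSE
)

# Labels that appear in more than one row (here: across subtables)
repeated <- unique(responses$Question[duplicated(responses$Question)])
print(repeated)

# Drop every repeated label, not just 'missing'
no_repeats <- responses[!responses$Question %in% repeated, ]
print(nrow(no_repeats))
```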
Can someone tell me how to create the graph using all the categories, without getting the 'extra' spaces? Thanks for your help and patience.
-Z
R 3.0.2
HH 3.0-3
lattice 0.20-24
latticeExtra 0.6-26
Here is a solution using ggplot2 to create the graphic:
library(ggplot2)
responses <-
data.frame(Subtable = c(rep('Var1',5), rep('Var2',4), rep('Var3',3)),
Question = c('very low','low','average','high','very high',
'<12', '12-14', '15+', 'missing', '<25','25+',
'missing'),
Res1 = as.numeric(c(0.05, 0.19, 0.38, 0.24, .07, 0.09, 0.73,
0.17, 0.02, 0.78, 0.20, 0.02)),
Res2 = as.numeric(c(0.19, 0.04, 0.39, 0.22, 0.06, 0.09, 0.50,
0.16, 0.02, 0.75, 0.46, 0.20)),
stringsAsFactors = FALSE)
responses$Subtable <- factor(responses$Subtable, levels = paste0("Var", 1:3))
responses$Question <-
factor(responses$Question,
levels = c("missing", "25+","<25", "<12", "12-14", "15+",
"very low", "low", "average", "high", "very high"))
ggplot(responses) +
theme_bw() +
aes(x = 0, y = Question) +
# Map each series to a named level so the manual scale matches colours
# and legend labels by name rather than by alphabetical position
geom_errorbarh(aes(xmin = 0, xmax = Res1, color = "Res1")) +
geom_errorbarh(aes(xmin = -Res2, xmax = 0, color = "Res2")) +
facet_wrap( ~ Subtable, ncol = 1, scales = "free_y") +
scale_color_manual(name = "",
values = c(Res1 = "red", Res2 = "blue")) +
scale_x_continuous(breaks = c(-0.5, 0, 0.5),
labels = c("0.5", "0", "0.5")) +
ylab("") + xlab("Percent") +
theme(legend.position = "bottom")
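Instead of typing the factor levels by hand, they can be derived from the data. A minimal sketch, assuming each panel should list its categories from top to bottom in the order they appear in the data: on a discrete y axis the first level is drawn at the bottom, so reversing the data order and taking `unique()` gives that order, and the shared "missing" level becomes the first level, i.e. the bottom row of every panel it appears in. `question_levels` is an illustrative name.

```r
responses <- data.frame(
  Subtable = c(rep("Var1", 5), rep("Var2", 4), rep("Var3", 3)),
  Question = c("very low", "low", "average", "high", "very high",
               "<12", "12-14", "15+", "missing", "<25", "25+", "missing"),
  stringsAsFactors = FALSE
)

# rev() puts each panel's first data row at the top of the axis;
# unique() keeps one shared "missing" level, which ends up first (bottom)
question_levels <- unique(rev(responses$Question))
responses$Question <- factor(responses$Question, levels = question_levels)
print(levels(responses$Question))
```

Because a shared level has a single position across all panels, this only works cleanly when the repeated label sits in the same relative place (here, last) in every subtable; with `scales = "free_y"` each panel then drops the levels it does not use.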
This is the data frame and ggplot code I am using:
library(ggplot2)
library(ggpmisc) # provides stat_ma_line()
ROS<- c(0.03, 0.03, 0.03, 0.03, 0.07, 0.07, 0.07, 0.07, 0.07, 0.1, 0.1, 0.1)
wind<- c(0.84, 1.77, 3.5, 6.44, 0.84, 1.77, 3.5, 6.44, 7.55, 0.84, 1.77, 3.5)
rey <- c(31500,66375,131250,241500,31500,66375,131250,241500,283125,31500,66375,131250)
wind250_1 <- c(69.4,69.4,1,1,31.08,37.07,1,1,1,22.8,19.45,1)
lee250_1 <- c(79.84,125.56,93.34,94.42,33.78,49.6,38.95,40.9,39.32,24.2,32.95,27.46)
df<- data.frame(ROS,wind,rey,wind250_1,lee250_1)
ggplot() +
stat_ma_line(df, mapping=aes(rey, lee250_1), method="RMA",
range.y = "interval", range.x = "interval",
linewidth = 1,fill = "yellow") +
geom_point(df, mapping = aes(x = rey, lee250_1, colour=factor(ROS)),
size=3)+
xlab("Re") + ylab((expression(paste(tau~"windward"))))+
scale_x_continuous(trans='log10', label = scientific_10) + # scientific_10 is a user-defined label function not shown in the post
scale_y_continuous(trans='log10') +
scale_color_manual(values = c("#0072B2", "#000000","#E7B800","#CC79A7")) +
labs(colour = "ROS (m/s)") +
theme_bw()
When I plot using the variable "y = wind250_1", the code works with no problem. But when I try to use the variable "y = lee250_1", it gives "Error: Discrete value supplied to continuous scale". The variable is numeric (I checked the class), and here are a few things I tried that didn't work: using y = as.numeric(lee250_1) in the ggplot code, changing the names of the variables, and running the ggplot code without the scale_x_continuous(), scale_y_continuous(), and scale_color_manual() lines.
The error is probably related to stat_ma_line(), because when I plotted with geom_line() instead it worked, but I need to use stat_ma_line(). Any help on how to solve this error is very much appreciated!
You probably have too few points per group (you probably need more than 7 points per group), which is why you get an error. I added some fake data and now it works:
ROS<- c(0.03, 0.03, 0.03, 0.03, 0.07, 0.07, 0.07, 0.07, 0.07, 0.03, 0.03, 0.03, 0.03, 0.07, 0.07, 0.07, 0.07, 0.07)
rey <- c(31500,66375,131250,241500,31500,66375,131250,241500,131250, 31600,66475,131350,241600,31300,66575,132250,242500,283425)
lee250_1 <- c(79.84,125.56,93.34,94.42,33.78,49.6,24.2,32.95, 79.94,122.54,92.34,91.42,32.78,43.6,31.95,44.9,32.32,22.2)
library(ggplot2)
library(ggpmisc)
df<- data.frame(ROS,rey,lee250_1)
ggplot(df, aes(rey, lee250_1)) +
geom_point(aes(colour = factor(ROS))) +
stat_ma_line(method = "RMA",
range.y = "interval", range.x = "interval", fill = 'yellow', linewidth = 1) +
xlab("Re") + ylab((expression(paste(tau~"windward"))))+
scale_x_continuous(trans='log10') +
scale_y_continuous(trans='log10') +
scale_color_manual(values = c("#0072B2", "#000000","#E7B800","#CC79A7")) +
labs(colour = "ROS (m/s)") +
theme_bw()
Created on 2023-01-26 with reprex v2.0.2
It looks like the model fit returns NAs for RMA, and ggplot has problems with those. See the more detailed answer here: https://github.com/aphalo/ggpmisc/issues/36
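To see whether the RMA line itself can be estimated from a given set of points before handing them to stat_ma_line(), here is a minimal base-R sketch using the textbook reduced major axis formula (slope = sign(r) · sd(y) / sd(x), line through the means). It is not ggpmisc's internal code and gives only the point estimate, not the confidence band, so a finite result here does not rule out the NAs discussed in the linked issue. `rma_fit` is a hypothetical helper.

```r
# Reduced major axis (RMA) point estimate, textbook formula:
#   slope = sign(r) * sd(y) / sd(x); the line passes through (mean(x), mean(y))
rma_fit <- function(x, y) {
  ok <- is.finite(x) & is.finite(y)  # drop NA/NaN/Inf pairs
  x <- x[ok]
  y <- y[ok]
  slope <- sign(cor(x, y)) * sd(y) / sd(x)
  c(intercept = mean(y) - slope * mean(x), slope = slope, n = length(x))
}

rey <- c(31500, 66375, 131250, 241500, 31500, 66375, 131250, 241500,
         283125, 31500, 66375, 131250)
lee250_1 <- c(79.84, 125.56, 93.34, 94.42, 33.78, 49.6, 38.95, 40.9,
              39.32, 24.2, 32.95, 27.46)

# Both axes use log10 scales, so the stat is fitted to the log10 values
fit <- rma_fit(log10(rey), log10(lee250_1))
print(fit)
```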
I want to include the mean inside the boxplot, but the mean is apparently not located where it is supposed to be. If I calculate the mean from the data, it is 16.2, which corresponds to 1.2 on the log scale. I tried various things, e.g., placing the stat_summary() call before or after the scale transformation, but this does not work.
Help is much appreciated!
Yours,
Kristof
Data:
df <- tibble::tibble(value = c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16, 0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44, 0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47))
Code:
library(ggplot2)
library(dplyr)
library(scales)
df %>%
ggplot(aes(x = 0, y = value)) +
geom_boxplot(width = .12, outlier.color = NA) +
stat_summary(fun=mean, geom="point", shape=21, size=3, color="black", fill="grey") +
labs(
x = "",
y = "Particle counts (P/kg)"
) +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))
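The mean calculated by stat_summary() is the mean of log10(value), not of value, because scale_y_log10() transforms the data before any statistic is computed. Below I define a new function, my_mean, that undoes the log, averages on the original scale, and returns the result on the log10 scale, so the point lands at the true mean.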
library(ggplot2)
library(dplyr)
library(tibble)
library(scales)
df <- tibble(value = c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16,
0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44,
0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47))
# stat_summary() receives log10(value): back-transform, average on the
# original scale, then return the mean on the log10 scale again
my_mean <- function(x) {
log10(mean(10^x))
}
df %>%
ggplot(aes(x = 0, y = value)) +
geom_boxplot(width = .12, outlier.color = NA) +
stat_summary(fun=my_mean, geom="point", shape=21, size=3, color="black", fill="grey") +
labs(
x = "",
y = "Particle counts (P/kg)"
) +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))
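To see the size of the discrepancy with the question's numbers, a short base-R check compares the mean of the log10 values (what the plain `fun = mean` computes under the log scale) with log10 of the arithmetic mean (the 1.2 the question expects). The mean of the logs is the log of the geometric mean, which for positive data is never larger than the arithmetic mean, which is why the original point is drawn too low. The variable names are only for illustration.

```r
x <- c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16,
       0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44,
       0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47)

# What stat_summary(fun = mean) sees under scale_y_log10(): log10 values
mean_of_logs <- mean(log10(x))

# What the question expects: the arithmetic mean, placed on the log10 axis
log_of_mean <- log10(mean(x))

# my_mean() from the answer, applied to the already-transformed values
my_mean <- function(x) log10(mean(10^x))

print(c(mean_of_logs = mean_of_logs,
        log_of_mean = log_of_mean,
        my_mean = my_mean(log10(x))))
```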
I am trying to create a scatterplot using ggplot. Is there a way to stop my text labels from overlapping the trend line?
I was only able to stop the text labels from overlapping each other.
library(ggplot2)
rownames = c("dummy", "dummy", "dummy", "dummy", "dummy", "dummy","dummy", "dummy", "dummy", "dummy")
corr_truth = c(-0.39, -0.13, 0.28, -0.49, -0.14, 0.52, 0.43, 0.22, -0.29, -0.02)
corr_pred= c(-0.41, 0.01, 0.36, -0.38, -0.28, 0.44, 0.26, 0.24, -0.38, -0.23)
corr_complete = data.frame(rownames, corr_truth,corr_pred)
plot_corr_complete = ggplot(data = corr_complete, aes(corr_truth, corr_pred)) + geom_point() +
xlim(-0.5,0.7) +
ylim(-0.5,0.7) +
geom_text(label = corr_complete$rownames, nudge_x = 0.08, nudge_y = 0.005, check_overlap = TRUE) +
geom_smooth(method = "lm", se = FALSE, color = "black")
plot_corr_complete
Here is an example using ggrepel. ggrepel repels labels from the data points and from each other, but not from the fitted line, so I needed to add some padding so that the labels do not overlap the trend line.
library(tidyverse);library(ggrepel)
rownames = c("dummy", "dummy", "dummy", "dummy", "dummy", "dummy","dummy", "dummy", "dummy", "dummy")
corr_truth = c(-0.39, -0.13, 0.28, -0.49, -0.14, 0.52, 0.43, 0.22, -0.29, -0.02)
corr_pred= c(-0.41, 0.01, 0.36, -0.38, -0.28, 0.44, 0.26, 0.24, -0.38, -0.23)
corr_complete = data.frame(rownames, corr_truth,corr_pred)
plot_corr_complete = ggplot(data = corr_complete, aes(corr_truth, corr_pred)) + geom_point() +
xlim(-0.5,0.7) +
ylim(-0.5,0.7) +
geom_text_repel(label = corr_complete$rownames,point.padding = 0.2,
nudge_y = 0.005, nudge_x = 0.02) +
geom_smooth(method = "lm", se = FALSE, color = "black")
plot_corr_complete
The ggrepel package provides functions that keep text labels from overlapping.
Once you've installed the package, load it with library(ggrepel) before running the following code.
This revised code worked on my machine:
rownames = c("dummy", "dummy", "dummy", "dummy", "dummy", "dummy","dummy", "dummy", "dummy", "dummy")
corr_truth = c(-0.39, -0.13, 0.28, -0.49, -0.14, 0.52, 0.43, 0.22, -0.29, -0.02)
corr_pred= c(-0.41, 0.01, 0.36, -0.38, -0.28, 0.44, 0.26, 0.24, -0.38, -0.23)
corr_complete = data.frame(rownames, corr_truth,corr_pred)
plot_corr_complete = ggplot(data = corr_complete, aes(corr_truth, corr_pred, label = rownames)) + geom_point() +
xlim(-0.5,0.7) +
ylim(-0.5,0.7) +
geom_text_repel() +
geom_smooth(method = "lm", se = FALSE, color = "black")
plot_corr_complete
Hope this helps
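Since ggrepel does not know about the fitted line, another option is to place the labels yourself. The sketch below, in plain ggplot2, fits the same `lm()` line that `geom_smooth(method = "lm")` draws and puts each label on the far side of its point from that line. `label_y` and `offset` are hypothetical names, and the offset is a guess to tune: long labels on a steep line could still touch it.

```r
library(ggplot2)

corr_complete <- data.frame(
  rownames = rep("dummy", 10),
  corr_truth = c(-0.39, -0.13, 0.28, -0.49, -0.14, 0.52, 0.43, 0.22, -0.29, -0.02),
  corr_pred = c(-0.41, 0.01, 0.36, -0.38, -0.28, 0.44, 0.26, 0.24, -0.38, -0.23)
)

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(corr_pred ~ corr_truth, data = corr_complete)
res <- residuals(fit)

offset <- 0.04  # hypothetical vertical gap between a point and its label
# Points on or above the line get their label above; points below get it below
corr_complete$label_y <- corr_complete$corr_pred + ifelse(res >= 0, offset, -offset)

p <- ggplot(corr_complete, aes(corr_truth, corr_pred)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  geom_text(aes(y = label_y, label = rownames), size = 3) +
  xlim(-0.5, 0.7) +
  ylim(-0.5, 0.7)
print(p)
```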
Fairly specific question here, but it may help others who are having similar issues.
I have some simple data:
Y = c(0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.08, 0.09, 0.10, 0.13, 0.17, 0.17, 0.21, 0.22,
0.35, 0.47, 0.51, 0.53, 0.54, 0.65, 0.78)
X = c(0.45, 0.26, 0.35, 0.22, 0.37, 0.09, 0.27, 0.51, 0.39, 0.37, 0.37, 0.27, 0.51, 0.36,
0.44, 0.49, 0.63, 0.49, 0.71, 0.56, 0.67)
self1 = data.frame(X, Y)
I also have a simple custom ggplot theme:
plot.theme = theme(axis.text = element_text(size=26),
axis.title = element_text(size=28),
plot.title = element_text(size=36, margin=margin(0,0,20,0)),
panel.grid.minor = element_blank(),
plot.margin = unit(c(0.1,0.25,0.5,0.85), "cm"),
axis.title.y = element_text(margin=margin(0,15,0,0)),
panel.border = element_rect(color="black", fill=NA, linewidth=2),
axis.ticks = element_blank(),
legend.title = element_text(size=26),
legend.text = element_text(size=18))
When I plot a scatterplot of the data with marginal histograms:
bing = ggplot(self1, aes(x=X, y=Y)) + geom_point(size=3) +
geom_smooth(method = "lm", se=F, color="black") +
plot.theme +
ylab("Observed selfing rate") +
xlab("Observed crossing rate") +
geom_vline(xintercept = 0.42, linetype="longdash") +
geom_hline(yintercept = 0.25, linetype="longdash")
ggExtra::ggMarginal(bing, type = "histogram", bins=6, size=10)
Everything looks great, except that the "g" in "Observed crossing rate" is getting cut off at the bottom of the graph. I have tried fidgeting with every theme parameter I can think of, and I've also tried adjusting several of the arguments to ggMarginal, but I have yet to find the one I need to change to get everything to stay inside the plot area. Can anyone help me out? I suspect the issue ultimately lies with the way ggMarginal is auto-adjusting the sizes of various theme parameters, but that's just a hunch.
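If g is the object returned by ggMarginal(), you can set g$vp <- grid::viewport(height = 0.9, width = 0.9) before drawing it (with print() or grid::grid.draw()). This shrinks the whole figure slightly so that nothing is clipped at the edges.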
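A runnable version of that suggestion, using the question's data with the custom theme and reference lines left out for brevity. It relies on the object returned by ggMarginal() being a gtable (a grid grob), so its `vp` field can be set before drawing.

```r
library(ggplot2)
library(ggExtra)
library(grid)

self1 <- data.frame(
  X = c(0.45, 0.26, 0.35, 0.22, 0.37, 0.09, 0.27, 0.51, 0.39, 0.37, 0.37,
        0.27, 0.51, 0.36, 0.44, 0.49, 0.63, 0.49, 0.71, 0.56, 0.67),
  Y = c(0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.08, 0.09, 0.10, 0.13, 0.17,
        0.17, 0.21, 0.22, 0.35, 0.47, 0.51, 0.53, 0.54, 0.65, 0.78)
)

bing <- ggplot(self1, aes(x = X, y = Y)) +
  geom_point(size = 3) +
  ylab("Observed selfing rate") +
  xlab("Observed crossing rate")

g <- ggMarginal(bing, type = "histogram", bins = 6, size = 10)

# Draw the whole figure inside a viewport covering 90% of the page,
# leaving a border so descenders like the "g" in "crossing" are not cut off
g$vp <- viewport(height = 0.9, width = 0.9)
grid.newpage()
grid.draw(g)
```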
I haven't found a way to change the plot margins on the object returned by ggMarginal. So, until someone comes along with a better solution, you can modify the code in the ggMarginal function itself. Here's how:
Type ggMarginal in the console. This will print the code of ggMarginal. Paste this code into a script window. Give this function a new name, like my_ggMarginal = [all the ggMarginal code you just pasted in].
Find the following line inside this function:
p <- p + ggplot2::theme(plot.margin = grid::unit(c(0, 0,
0, 0), "null"))
and change it to this:
p <- p + ggplot2::theme(plot.margin = grid::unit(c(0, 0,
1, 0), "lines"))
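Run the code for the new function you just created so that my_ggMarginal will be available in your current workspace. Because the pasted code calls ggExtra's internal (unexported) helper functions, you will likely also need to run environment(my_ggMarginal) <- asNamespace("ggExtra") before using it.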
Run your new function on bing:
my_ggMarginal(bing, type = "histogram", bins=6, size=10)
I have been trying to minimize my use of Excel in favor of R, but am still stuck when it comes to displaying simple data cells, as is often needed in the last step of an analysis. The following example is one I would like to crack, as it would help me switch to R for this critical part of my workflow.
I would like to illustrate the following correlation matrix in R:
matrix_values <- c(
NA,1.54,1.63,1.15,0.75,0.78,1.04,1.2,0.94,0.89,
17.95,1.54,NA,1.92,1.03,0.78,0.89,0.97,0.86,1.27,
0.95,25.26,1.63,1.92,NA,0.75,0.64,0.61,0.9,0.88,
1.18,0.74,15.01,1.15,1.03,0.75,NA,1.09,1.03,0.93,
0.93,0.92,0.86,23.84,0.75,0.78,0.64,1.09,NA,1.2,
1.01,0.85,0.9,0.88,30.4,0.78,0.89,0.61,1.03,1.2,
NA,1.17,0.86,0.95,1.02,17.64,1.04,0.97,0.9,0.93,
1.01,1.17,NA,0.94,1.09,0.93,17.22,1.2,0.86,0.88,
0.93,0.85,0.86,0.94,NA,0.95,0.96,24.01,0.94,1.27,
1.18,0.92,0.9,0.95,1.09,0.95,NA,1.25,21.19,0.89,
0.95,0.74,0.86,0.88,1.02,0.93,0.96,1.25,NA,18.14)
cor_matrix <- matrix(matrix_values, ncol = 10, nrow = 11)
item_names <- c('Item1','Item2','Item3','Item4','Item5',
'Item6','Item7','Item8','Item9','Item10')
colnames(cor_matrix) <- item_names
rownames(cor_matrix) <- c(item_names, "Size")
The cells should be colored based on their rank (e.g., cells above the 95th percentile are completely green, cells below the 5th percentile are completely red). The last row should be illustrated by a horizontal bar representing the fraction of the maximum value.
I made the output that I would like to have in Excel:
Ideally, I would also like to highlight correlation groups (either manually or by script), like in the following illustration:
Your correlation matrix has several values greater than 1, which is not possible for correlations. But anyhow, try this:
library(reshape2)
dat <- melt(cor_matrix[-11, ])
library(ggplot2)
p <- ggplot(data = dat, aes(x = Var1, y = Var2)) +
geom_tile(aes(fill = value), colour = "white") +
geom_text(aes(label = sprintf("%1.2f",value)), vjust = 1) +
scale_fill_gradient(low = "white", high = "steelblue")
print(p)
Myaseen208 has a good start on the answer. I thought I'd fill in a few more pieces: getting color gradient in the red/green you specified, flipping the order of the y-axis, and cleaning up a few other points (gray background and legend).
library("reshape2")
library("ggplot2")
cor_dat <- melt(cor_matrix[-11,])
cor_dat$Var1 <- factor(cor_dat$Var1, levels=item_names)
cor_dat$Var2 <- factor(cor_dat$Var2, levels=rev(item_names))
cor_dat$pctile <- rank(cor_dat$value, na.last="keep")/sum(!is.na(cor_dat$value))
ggplot(data = cor_dat, aes(x = Var1, y = Var2)) +
geom_tile(aes(fill = pctile), colour = "white") +
geom_text(aes(label = sprintf("%1.1f",value)), vjust = 1) +
scale_fill_gradientn(colours=c("red","red","white","green","green"),
values=c(0,0.05,0.5,0.95,1),
guide = "none", na.value = "white") +
coord_equal() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.background = element_blank())
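To see which cells end up solid red or solid green, the percentile rule can be checked on toy data. In the sketch below, `pctile` is a hypothetical helper that repeats the answer's computation: a cell's rank among the non-missing cells divided by their count, with NA kept as NA. One caveat: scale_fill_gradientn() places its `values` stops along the rescaled range of the fill data, and the smallest pctile is slightly above 0, so the stops match percentiles only approximately. Adding limits = c(0, 1) to the scale should make them line up exactly.

```r
# Same percentile the answer computes: rank among the non-missing cells
# divided by how many non-missing cells there are (NA stays NA)
pctile <- function(v) rank(v, na.last = "keep") / sum(!is.na(v))

v <- c(NA, 1:30)  # toy data: one empty diagonal cell plus 30 distinct values
p <- pctile(v)

# The colour stops c(0, 0.05, 0.5, 0.95, 1) hold solid red up to 0.05 and
# solid green from 0.95; values in between are blended through white
band <- ifelse(p <= 0.05, "solid red",
               ifelse(p >= 0.95, "solid green", "graded"))
print(table(band, useNA = "ifany"))
```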
EDIT:
Now attempting to get the blue size bars at the bottom.
What makes the size bars harder is that they represent different data, in a completely different way, than the correlation matrix does. So I will first build that part separately and then work on putting the two together.
As with the correlation data, the size data is first extracted from the matrix and then turned into a data.frame with the useful values, including each value as a fraction of the maximum.
size_dat <- melt(cor_matrix[11,,drop=FALSE])
size_dat$Var2 <- factor(size_dat$Var2, levels=item_names)
size_dat$frac <- size_dat$value / max(size_dat$value)
ggplot(data=size_dat, aes(x=Var2, y=Var1)) +
geom_blank() +
geom_rect(aes(xmin = as.numeric(Var2) - 0.5,
xmax = as.numeric(Var2) - 0.5 + frac),
ymin = -Inf, ymax = Inf, fill="blue", color="white") +
coord_equal() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.background = element_blank())
The geom_rect call uses some tricks, such as using the numeric representation of the categorical (discrete) variable to position things carefully. Each item occupies the span from 0.5 below its position to 0.5 above it, so the left edge of the rectangle is 0.5 below the item's position and the right edge is frac to the right of that. Using -Inf and Inf for the y limits makes the rectangle extend to the edges of the plot. This gives:
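The edge arithmetic can be checked on its own. This base-R sketch uses the first three Size values from the question and shows that each bar starts at the left edge of its item's slot and that the largest item fills its slot exactly. The small data frame is only for illustration.

```r
# Three items with their Size values from the question
items <- factor(c("Item1", "Item2", "Item3"),
                levels = c("Item1", "Item2", "Item3"))
size <- c(17.95, 25.26, 15.01)

frac <- size / max(size)          # fraction of the largest size
xmin <- as.numeric(items) - 0.5   # left edge of each item's slot
xmax <- xmin + frac               # bar extends right by frac (at most 1)

print(data.frame(item = items, xmin, xmax))
```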
Now to try and put them together. The x scale is common, and the y scales can be made common (though disjoint). Playing with levels and orders is necessary. Also, I flipped x and y in the original (which is fine since it is symmetric). Since the data sets are extracted and formatted a little differently, I've renamed them.
cor_dat2 <- melt(cor_matrix[-nrow(cor_matrix), ])
cor_dat2$Var1 <- factor(cor_dat2$Var1, levels=rev(c(item_names, "Size")))
cor_dat2$Var2 <- factor(cor_dat2$Var2, levels=item_names)
cor_dat2$pctile <- rank(cor_dat2$value, na.last="keep")/sum(!is.na(cor_dat2$value))
size_dat2 <- melt(cor_matrix["Size",,drop=FALSE])
size_dat2$Var1 <- factor(size_dat2$Var1, levels=rev(c(item_names, "Size")))
size_dat2$Var2 <- factor(size_dat2$Var2, levels=item_names)
size_dat2$frac <- size_dat2$value / max(size_dat2$value)
ggplot(data = cor_dat2, aes(x = Var2, y = Var1)) +
geom_tile(aes(fill = pctile), colour = "white") +
geom_text(aes(label = sprintf("%1.1f",value))) +
geom_rect(data=size_dat2,
aes(xmin = as.numeric(Var2) - 0.5,
xmax = as.numeric(Var2) - 0.5 + frac,
ymin = as.numeric(Var1) - 0.5,
ymax = as.numeric(Var1) + 0.5),
fill="lightblue", color="white") +
geom_text(data=size_dat2,
aes(x=Var2, y=Var1, label=sprintf("%.0f", value))) +
scale_fill_gradientn(colours=c("red","red","white","green","green"),
values=c(0,0.05,0.5,0.95,1),
guide = "none", na.value = "white") +
scale_y_discrete(drop = FALSE) +
coord_equal() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.background = element_blank())
This final version does not assume a 10x10 correlation matrix with one additional row; the number of items can be anything. cor_matrix must have the right dimnames (with "Size" as the last row), and item_names must contain the list of items, but there do not have to be 10 of them.
Here is an approach using base graphics:
par(mar=c(1, 5, 5, 1))
plot.new()
plot.window(xlim=c(0, 10), ylim=c(0, 11))
quant_vals <- findInterval(cor_matrix[-11, ],
c(-Inf, quantile(cor_matrix[-11, ],
c(0.05, 0.25, 0.45, 0.55, 0.75, 0.95),
na.rm=TRUE),
Inf))
quant_vals[is.na(quant_vals)] <- 4
cols <- c('#ff0000', '#ff6666', '#ffaaaa', '#ffffff', '#aaffaa',
'#66ff66', '#00ff00')
colmat <- matrix(cols[quant_vals], ncol=10, nrow=10)
rasterImage(colmat, 0, 1, 10, 11, interpolate=FALSE)
for (i in seq_along(cor_matrix[11, ])) {
rect(i - 1, 0.1, i - 1 + cor_matrix[11, i]/max(cor_matrix[11, ]), 0.9,
col='lightsteelblue3')
}
text(col(cor_matrix) - 0.5, 11.5 - row(cor_matrix), cor_matrix, font=2)
rect(0, 1, 10, 11)
rect(0, 0, 10, 1)
axis(2, at=(11:1) - 0.5, labels=rownames(cor_matrix), tick=FALSE, las=2)
axis(3, at=(1:10) - 0.5, labels=colnames(cor_matrix), tick=FALSE, las=2)
rect(0, 8, 3, 11, lwd=2)
rect(4, 4, 7, 7, lwd=2)
rect(8, 1, 10, 3, lwd=2)
Data
cor_matrix <- structure(c(NA, 1.54, 1.63, 1.15, 0.75, 0.78, 1.04, 1.2, 0.94,
0.89, 17.95, 1.54, NA, 1.92, 1.03, 0.78, 0.89, 0.97, 0.86, 1.27,
0.95, 25.26, 1.63, 1.92, NA, 0.75, 0.64, 0.61, 0.9, 0.88, 1.18,
0.74, 15.01, 1.15, 1.03, 0.75, NA, 1.09, 1.03, 0.93, 0.93, 0.92,
0.86, 23.84, 0.75, 0.78, 0.64, 1.09, NA, 1.2, 1.01, 0.85, 0.9,
0.88, 30.4, 0.78, 0.89, 0.61, 1.03, 1.2, NA, 1.17, 0.86, 0.95,
1.02, 17.64, 1.04, 0.97, 0.9, 0.93, 1.01, 1.17, NA, 0.94, 1.09,
0.93, 17.22, 1.2, 0.86, 0.88, 0.93, 0.85, 0.86, 0.94, NA, 0.95,
0.96, 24.01, 0.94, 1.27, 1.18, 0.92, 0.9, 0.95, 1.09, 0.95, NA,
1.25, 21.19, 0.89, 0.95, 0.74, 0.86, 0.88, 1.02, 0.93, 0.96,
1.25, NA, 18.14), .Dim = 11:10)