For loop plot() vs ggplot() - ggplot() not working? - r

I am using Iris dataset in R.
I have the following code
for (i in colnames(iris[-5])) {
plot(iris$Species, iris[[i]],
xlab = 'Species',
ylab = `i`)
}
This prints four boxplots.
I want to do the same in ggplot with the following code
for (i in colnames(iris[-5])) {
print(iris %>%
ggplot(aes(x = Species)) +
geom_col(aes(y = i)))
}
When I do this in ggplot, the boxplots look messed up. Is it just my R or am I missing something?

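In this case, you can change aes(y = i) to aes_string(y = i) and, of course, use geom_boxplot instead of geom_col. Inside aes(), i is a character string rather than a column, so ggplot maps y to that constant text instead of the column's values.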
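As an alternative sketch, the same loop can be written with the .data pronoun, since aes_string() has been soft-deprecated since ggplot2 3.0.0. This assumes ggplot2 3.0.0 or later; the plots list is a hypothetical addition so the results can be inspected after the loop.

```r
library(ggplot2)

# Collect the plots in a named list as well as printing them.
plots <- list()
for (i in colnames(iris[-5])) {
  # .data[[i]] looks up the column whose name is stored in the string i.
  p <- ggplot(iris, aes(x = Species, y = .data[[i]])) +
    geom_boxplot() +
    ylab(i)
  plots[[i]] <- p
  print(p)
}
```

Each element of plots holds one boxplot, named after the column it shows.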

Related

For loops for ggplot2 not returning y axis labels

I am trying to plot multiple y values vs a single x value. I want the graphs to be in separate figures. The following code generates the graphs, but the y axes are labeled "taxonomy_metadata_combined_p21[,i]" and I want them to have the same labels as the column titles. I have tried several different things, but how can I change the y-axis label?
for(i in 2:ncol(taxonomy_metadata_combined_p21)) {
print(ggplot(taxonomy_metadata_combined_p21, aes(x = Txt_Sex, y = taxonomy_metadata_combined_p21[ , i])) +
geom_boxplot())+
ylab(colnames(taxonomy_metadata_combined_p21)[i])
}
Your ylab() was added after the print() function. If you print after adding ylab() it should work.
As a side note, it is not recommended to use y = taxonomy_metadata_combined_p21[ , i] as an aesthetic. The ggplot2 authors instead recommend using the .data pronoun when the column name is held in a variable.
Reprex with built-in data:
library(ggplot2)
df <- rev(iris)
for (i in 2:ncol(df)) {
p <- ggplot(df, aes(x = Species, y = .data[[colnames(df)[i]]])) +
geom_boxplot() +
ylab(colnames(df)[i])
print(p)
}

How to change the margins to be different for each graph when plotting multiple graphs in ggplot2?

I am trying to plot one variable against all other variables in a data set and view each graph at the same time. The code I am using to do this is:
theme_set(
theme_bw() +
theme(legend.position = "top")
)
healthTrain.gathered <- healthTrain %>%
as_tibble() %>%
gather(key = "variable", value = "value",
-CHD, -Population2010)
ggplot(healthTrain.gathered, aes(x = value, y = CHD)) +
geom_point() +
facet_wrap(~variable)
This code works great, except not all of the variables have the same range of x values, but each graph uses the margins of the variable with the largest range of x values. Is there a way to make each graph use the margins that are best fit for itself?
Example of what I am looking for:
plot(health$CHD, health$BPHIGH)
plot(health$CHD, health$COPD)
plot(health$CHD, health$STROKE)
Except I want to be able to see all of the graphs at the same time.
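One possible approach, sketched here on the built-in mtcars data rather than the asker's healthTrain data, is the scales argument of facet_wrap(): scales = "free_x" lets every panel choose its own x range. The column selection and the name long are assumptions made for the illustration, and pivot_longer() stands in for gather().

```r
library(ggplot2)
library(tidyr)

# Reshape to long format: one row per (car, variable) pair, keeping mpg as y.
long <- pivot_longer(mtcars[, c("mpg", "disp", "hp", "wt")],
                     cols = -mpg,
                     names_to = "variable",
                     values_to = "value")

# scales = "free_x" gives each facet its own x-axis range.
p <- ggplot(long, aes(x = value, y = mpg)) +
  geom_point() +
  facet_wrap(~variable, scales = "free_x")
print(p)
```

Using scales = "free" instead would free both axes, which is rarely wanted when every panel shares the same y variable.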

r facet_wrap not grouping properly with geom_point

I'm struggling with facet_wrap in R. It should be simple; however, the faceting variable is not being picked up. Here is what I'm running:
plot = ggplot(data = item.household.descr.count, mapping = aes(x=item.household.descr.count$freq, y = item.household.descr.count$descr, color = item.household.descr.count$age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
I colored the faceting variable to try to help illustrate what is going on. The plot should have only one color in each facet instead of what you see here. Does anyone know what is going on?
This error is caused by the fact that you are using $ and the data frame name to refer to your variables inside aes(). With ggplot() you should use only variable names in aes(), because the data frame is already named in data=.
plot = ggplot(data = item.household.descr.count,
mapping = aes(x=freq, y = descr, color = age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
Here is an example using diamonds dataset.
diamonds2<-diamonds[sample(nrow(diamonds),1000),]
ggplot(diamonds2,aes(diamonds2$carat,diamonds2$price,color=diamonds2$color))+geom_point()+
facet_wrap(~color)
ggplot(diamonds2,aes(carat,price,color=color))+geom_point()+
facet_wrap(~color)

How to obtain y-axis-labels in ggplot2? [duplicate]

I have created a function for creating a barchart using ggplot.
In my figure I want to overlay the plot with white horizontal bars at the position of the tick marks like in the plot below
p <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_bar(stat = 'identity')
# By inspection I found the y-tick positions to be c(50,100,150)
p + geom_hline(aes(yintercept = seq(50,150,50)), colour = 'white')
However, I would like to be able to change the data, so I can't use static positions for the lines like in the example. For example, I might change Sepal.Width to Sepal.Length in the example above.
Can you tell me how to:
1. get the tick positions from my ggplot; or
2. get the function that ggplot uses for tick positions so that I can use this to position my lines,
so I can do something like
tickpositions <- ggplot_tickpostion_fun(iris$Sepal.Width)
p + scale_y_continuous(breaks = tickpositions) +
geom_hline(aes(yintercept = tickpositions), colour = 'white')
A possible solution for (1) is to use ggplot_build to grab the content of the plot object. ggplot_build results in "[...] a panel object, which contain all information about [...] breaks".
ggplot_build(p)$layout$panel_ranges[[1]]$y.major_source
# [1] 0 50 100 150
See edit for pre-ggplot2 2.2.0 alternative.
Check out ggplot2::ggplot_build - it can show you lots of details about the plot object. You have to give it a plot object as input. I usually like to str() the result of ggplot_build to see all the values it contains.
For example, I see that there is a panel --> ranges --> y.major_source vector that seems to be what you're looking for. So to complete your example:
p <- ggplot() +
geom_bar(data = iris, aes(x = Species, y = Sepal.Width), stat = 'identity')
pb <- ggplot_build(p)
str(pb)
y.ticks <- pb$panel$ranges[[1]]$y.major_source
p + geom_hline(aes(yintercept = y.ticks), colour = 'white')
Note that I moved the data argument from the main ggplot function to inside geom_bar, so that geom_hline would not try to use the same dataset and throw errors when the number of rows in iris is not a multiple of the number of lines we're drawing. Another option would be to pass a data = data.frame() argument to geom_hline; I cannot comment on which one is a more correct solution, or if there's a nicer solution altogether. But the gist of my code still holds :)
For ggplot2 3.1.0 this worked for me:
ggplot_build(p)$layout$panel_params[[1]]$y.major_source
#[1] 0 50 100 150
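As the answers above show, the location of the breaks inside the ggplot_build() result has moved between ggplot2 versions. The following is a sketch only, assuming ggplot2 3.3.0 or later, where the breaks appear under panel_params[[1]]$y$breaks; this is internal structure and may change again. The names built, y_ticks and p2 are chosen for the illustration.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_col()  # equivalent to geom_bar(stat = "identity")

built <- ggplot_build(p)

# Assumption: ggplot2 >= 3.3.0 stores the y breaks here.
y_ticks <- built$layout$panel_params[[1]]$y$breaks
y_ticks <- y_ticks[!is.na(y_ticks)]  # breaks outside the limits can be NA

# yintercept is set outside aes(), so its length need not match the data.
p2 <- p + geom_hline(yintercept = y_ticks, colour = "white")
print(p2)
```

Because the positions are read from the built plot, swapping Sepal.Width for another column needs no manual change to the line positions.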
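You certainly can. Read the help file for the seq() function.
seq(from = min(x), to = max(x), length.out = 5)  # x stands for the plotted variable
and do something like this.
p <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_bar(stat = 'identity')
# yintercept is set outside aes() so its length need not match the data
p + geom_hline(yintercept = seq(from = min(iris$Sepal.Width), to = max(iris$Sepal.Width), length.out = 5), colour = 'white')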

How do I create a categorical scatterplot in R like boxplots?

Does anyone know how to create a scatterplot in R that looks like these plots from GraphPad Prism:
I tried using boxplots but they don't display the data the way I want it. These column scatterplots that graphpad can generate show the data better for me.
Any suggestions would be appreciated.
As @smillig mentioned, you can achieve this using ggplot2. The code below reproduces the plot that you are after pretty well, but be warned that it is quite tricky. First load the ggplot2 package and generate some data:
library(ggplot2)
dd = data.frame(values=runif(21), type = c("Control", "Treated", "Treated + A"))
Next change the default theme:
theme_set(theme_bw())
Now we build the plot.
Construct a base object - nothing is plotted:
g = ggplot(dd, aes(type, values))
Add on the points: adjust the default jitter and change glyph according to type:
g = g + geom_jitter(aes(pch=type), position=position_jitter(width=0.1))
Add on the "box": calculate where the box ends. In this case, I've chosen the average value. If you don't want the box, just omit this step.
# fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
g = g + stat_summary(fun = mean,
geom="bar", fill="white", colour="black")
Add on some error bars: calculate the upper/lower bounds and adjust the bar width:
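# 95% t-interval: mean +/- t quantile (n - 1 df) times sd / sqrt(n)
# fun.max/fun.min replace the deprecated fun.ymax/fun.ymin (ggplot2 >= 3.3.0)
g = g + stat_summary(
fun.max=function(i) mean(i) + qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
fun.min=function(i) mean(i) - qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
geom="errorbar", width=0.2)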
Display the plot
g
In my R code above I used stat_summary to calculate the values needed on the fly. You could also create separate data frames and use geom_errorbar and geom_bar.
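A minimal sketch of that separate-data-frame variant follows. It assumes a 95% t-based confidence interval for the error bars; ci_half_width and summ are hypothetical names introduced for the illustration, and set.seed() is there only to make the random data reproducible.

```r
library(ggplot2)

set.seed(1)  # only so the illustration is reproducible
dd <- data.frame(values = runif(21),
                 type = c("Control", "Treated", "Treated + A"))

# Hypothetical helper: half-width of a 95% t-based confidence interval.
ci_half_width <- function(x) {
  qt(0.975, length(x) - 1) * sd(x) / sqrt(length(x))
}

# One row per group holding the mean and the interval bounds.
means <- aggregate(values ~ type, data = dd, FUN = mean)
half <- aggregate(values ~ type, data = dd, FUN = ci_half_width)
summ <- data.frame(type = means$type,
                   avg = means$values,
                   lower = means$values - half$values,
                   upper = means$values + half$values)

# Each layer names its own data, so no aesthetics are inherited by mistake.
g <- ggplot() +
  geom_col(data = summ, aes(x = type, y = avg),
           fill = "white", colour = "black") +
  geom_errorbar(data = summ, aes(x = type, ymin = lower, ymax = upper),
                width = 0.2) +
  geom_jitter(data = dd, aes(x = type, y = values, shape = type),
              width = 0.1, height = 0) +
  theme_bw()
print(g)
```

Precomputing the summary makes the plotted numbers easy to check or reuse in a table, at the cost of keeping two data frames in sync.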
To use base R, have a look at my answer to this question.
If you don't mind using the ggplot2 package, there's an easy way to make similar graphics with geom_boxplot and geom_jitter. Using the mtcars example data:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter() + theme_bw()
which produces the following graphic:
The documentation can be seen here: http://had.co.nz/ggplot2/geom_boxplot.html
I recently faced the same problem and found my own solution, using ggplot2.
As an example, I created a subset of the chickwts dataset.
library(ggplot2)
library(dplyr)
data(chickwts)
Dataset <- chickwts %>%
filter(feed == "sunflower" | feed == "soybean")
Since it is not possible to change the dots to symbols in geom_dotplot(), I used geom_jitter() as follows:
Dataset %>%
ggplot(aes(feed, weight, fill = feed)) +
geom_jitter(aes(shape = feed, col = feed), size = 2.5, width = 0.1)+
stat_summary(fun = mean, geom = "crossbar", width = 0.7,
col = c("#9E0142","#3288BD")) +
scale_fill_manual(values = c("#9E0142","#3288BD")) +
scale_colour_manual(values = c("#9E0142","#3288BD")) +
theme_bw()
This is the final plot:
For more details, you can have a look at this post:
http://withheadintheclouds1.blogspot.com/2021/04/building-dot-plot-in-r-similar-to-those.html?m=1
