Problems with Creating Bar Plot In R - r

Image of Problematic Barplot
Hello everyone, this is my first time posting on Stack Exchange and I will start off by saying that I am a beginner at coding, (like, really a beginner).
I am having issues creating a barplot for one of my classes. (I have attached an image of this problematic barplot in this post).
As you will be able to see, there are two problems with it:
(1) The legend practically blocks out the whole fourth plot
and
(2) I just can't get one color per leaf shape (in other words, several leaf shapes are represented by the same color. I have 13 leaf shapes and would like 13 different colors, one per shape).
Lastly, here is the code I used to generate the plot:
barplot(shape_biome_table,beside=T,legend.text = T,col=c(1:13),
main="Leaf Shapes By Biome Type",
xlab="Leaf Shape",ylab="Frequency",las=1,
args.legend=list(x="topright"))
If someone can please help me in figuring out what needs to be done to solve these two issues I would be very appreciative. And, as I mentioned previously, I am not very well versed in coding jargon, so please try to make your explanation as easy as possible to understand.
Thank you very much!

R's native plotting support is a bit cumbersome. Perhaps the first thing is to try ggplot2. I've tried to make a guess at what your data looks like.
library(tidyverse)
library(ggplot2)
shape_biome_table <- tribble(
~leaf.shape, ~biome,
"Acicular", "Hawaiian Natives",
"Acicular", "Hawaiian Natives",
"Acuminate", "Hawaiian Natives",
"Aristate", "Mediterranean" )
ggplot(shape_biome_table, aes(x = leaf.shape, fill = leaf.shape)) +
geom_bar() +
facet_grid(. ~ biome)
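If you would rather stay with base graphics, here is a rough sketch of how both problems might be handled with barplot() itself. The counts below are made up, and I am assuming the 13 leaf shapes are the rows of your table and the 4 biomes are the columns. The colors repeat in your plot because col=c(1:13) picks from the default palette(), which only has 8 entries, so asking for 13 colors explicitly avoids the recycling. The legend is pushed into a widened right margin so it no longer covers the bars.

```r
# Hypothetical counts: 13 leaf shapes (rows) by 4 biomes (columns).
set.seed(1)
shape_biome_table <- matrix(
  rpois(13 * 4, lambda = 5), nrow = 13,
  dimnames = list(paste0("Shape", 1:13), paste0("Biome", 1:4))
)

# col = 1:13 indexes the default palette(), which has only 8 entries,
# so colors repeat. Request 13 distinct colors explicitly instead.
shape_cols <- rainbow(nrow(shape_biome_table))

# Widen the right margin and allow drawing outside the plot region,
# so the legend sits beside the bars rather than on top of them.
op <- par(mar = c(5, 4, 4, 9), xpd = TRUE)
barplot(shape_biome_table, beside = TRUE, col = shape_cols,
        main = "Leaf Shapes By Biome Type",
        xlab = "Biome", ylab = "Frequency", las = 1,
        legend.text = TRUE,
        args.legend = list(x = "topright", inset = c(-0.35, 0), cex = 0.8))
par(op)  # restore the previous graphics settings
```

The inset and margin values are guesses and will probably need adjusting for your real labels.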

Related

Change colors in r plot

I am currently trying to plot some data and can't manage to obtain a nice result. I have a set of 51 individuals, each with a specific value (Pn), split into 14 groups. The closest thing I end up with is this kind of plot. I obtain it with the simple code below, starting by ordering the Individuals by their values:
Individuals <- factor(Individuals,levels=Individuals[order(Pn)])
dotchart(Pn,label=Individuals,color=Groups)
The issue is that I only have 9 colors on this plot (so I lost information somehow) and I can't manage to find a way to apply manually one color per group.
I've also tried the ggplot2 package, having read that it can give nice-looking results. In that case I can't manage to order the Individuals properly (the previous sorting doesn't seem to have any effect here), and I end up with only different shades of blue for the groups, which is not an efficient way to represent the information in my data set. The plot I get is accessible here and I used the following code:
ggplot(data=gps)+geom_point(mapping=aes(x=Individuals, y=Pn, color=Groups))
I apologize if this question seems redundant, but I couldn't figure out a solution on my own, even following some answers given to others...
Thank you in advance!
EDIT: Using RColorBrewer as suggested below sorted out the issue with the colors when I use the ggplot2 package.
I believe you are looking for the scale_color_manual() function within ggplot2. You didn't provide a reproducible example, but try something along the lines of this:
ggplot(data=gps, mapping=aes(x=Individuals, y=Pn, color=Groups))+
geom_point() +
scale_color_manual(values = c('GROUP1' = 'color_value_1',
'GROUP2' = 'color_value_2',
'GROUP3' = 'color_value_3'))
Replace GROUPX with the values inside your Group column, and replace color_value_x with whatever colors you want to use.
A good resource for further learning about ggplot2 is chapter 3 of R For Data Science, which you can read here: http://r4ds.had.co.nz/data-visualisation.html
I can't be sure without looking at your data, but it looks like Groups may be a numeric value. Try this:
gps$Groups <- as.factor(gps$Groups)
library(RColorBrewer)
ggplot(data=gps)+
geom_point(mapping=aes(x=Individuals, y=Pn, color=Groups))+
scale_colour_brewer(palette = "Set1")
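Two things are worth noting for 14 groups. The "Set1" palette only provides 9 colors, so some groups may be left without one, and the ordering problem comes from reordering the standalone Individuals vector rather than the column inside gps. Here is a small sketch of one way around both, using made-up data in place of gps (the column names are taken from the question, the values are invented) and hcl.colors() from base R to generate one color per group:

```r
library(ggplot2)

# Hypothetical stand-in for gps: 51 individuals in 14 groups.
set.seed(1)
gps <- data.frame(
  Individuals = paste0("ind", 1:51),
  Pn          = runif(51),
  Groups      = sample(rep(1:14, length.out = 51))
)

# Treat the group id as categorical so ggplot2 uses a discrete scale.
gps$Groups <- factor(gps$Groups)

# One distinct color per group, named by group level.
group_cols <- setNames(hcl.colors(nlevels(gps$Groups), "Dark 3"),
                       levels(gps$Groups))

# reorder() sorts the x axis by Pn using the column ggplot2 actually sees.
p <- ggplot(gps, aes(x = reorder(Individuals, Pn), y = Pn, color = Groups)) +
  geom_point() +
  scale_color_manual(values = group_cols) +
  labs(x = "Individuals")
print(p)
```

Any vector of 14 color names would work in place of hcl.colors(); the "Dark 3" palette is just one choice.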

Breaking value axis using ggplot2 [duplicate]

This question already has answers here:
Using ggplot2, can I insert a break in the axis?
(10 answers)
Closed 3 years ago.
I have used Thinkcell, and one of its cool features is that it breaks very long y-axis to fit the graph. I am not sure whether we can do this with ggplot2. I am a beginner in ggplot2. So, I'd appreciate any thoughts.
For example:
Series <- c(1:6)
Values <- c(899, 543, 787, 35323, 121, 234)
df_val_break <- data.frame(Series, Values)
ggplot(data=df_val_break, aes(x=Series, y=Values)) +
geom_bar(stat="identity")
This creates a graph like this:
However, I want a graph that looks something like this:
However, it seems that broken axis is not supported in ggplot2 because it's misleading (Source: Using ggplot2, can I insert a break in the axis?). This thread suggests a couple of things--faceting and tables.
I like tables, but I don't like faceting because the values of my categorical variable "Series" are closely related. Moreover, I'd prefer Excel for drawing tables--it's fast.
I have two questions:
Question 1: One of the options I liked is at https://stats.stackexchange.com/questions/1764/what-are-alternatives-to-broken-axes.
I am unable to replicate a similar graph because of the scaling issue.
Question 2: This is a minor question just in case there were new packages introduced that might help us to do this. (The linked SO thread above is older than 5 years. ) Are there any other options on the table?
Update: I don't think my question is a duplicate for two reasons: a) I have already gone through the indicated thread and have referenced it here, explaining that I am looking for a solution that looks like the third graph in my post. Specifically, I am looking to plot both graphs (one with the shorter scale and the other at 1/20 scale) in one graph. I am unable to do this using ggplot2 because of the scale issue: either both sub-graphs get scaled to 1/nth, or one of them gets scaled to the normal range. I believe this version is much more relatable for a non-technical audience who don't understand log and inverse transformations.
I took a stab at this one. I'm a beginner, so I am not sure whether this can be improved further in terms of the placement of text. I struggled with fitting both the high growth rate series and the low growth rate series in one graph because of the different scales. So, I used faceting.
Here's the code:
ggplot(data = df_val_break,aes(x=Series,y=Values)) +
geom_bar(stat = "identity") +
facet_wrap(~Modified) +
geom_text(data = df_val_break[df_val_break$Modified=="HIGH_GROWTH",], aes(label = "x20 growth rate"),hjust=0.5, vjust=0)
ggsave("post.png")
Here's the output:
There are quite a few issues that I see:
a) High_growth rate graph has Series 2 and Series 6 on the x-axis, although we don't need them. I don't know how to turn them off.
b) geom_text overlaps with the bar. This looks a little annoying.
c) I believe that the graph is a little misleading, especially for the HIGH_GROWTH section, because its y-axis isn't scaled with LOW_GROWTH. I was originally thinking of showing two different y-axes--one scaled by 1/20 and the other unscaled.
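A possible way to deal with issues a) and b) is sketched here. The Modified column is not defined in the post, so the rule used to build it (values above 10000 are HIGH_GROWTH) is purely an assumption for illustration. Turning Series into a factor and freeing the facet scales should let each panel keep only its own series, and placing the text with a negative vjust moves it above the bar instead of over it.

```r
library(ggplot2)

Series <- c(1:6)
Values <- c(899, 543, 787, 35323, 121, 234)
# Series as a factor, so unused levels can be dropped per panel.
df_val_break <- data.frame(Series = factor(Series), Values)

# Assumption: anything above 10000 belongs in the high-growth panel.
df_val_break$Modified <- ifelse(df_val_break$Values > 10000,
                                "HIGH_GROWTH", "LOW_GROWTH")

p <- ggplot(df_val_break, aes(x = Series, y = Values)) +
  geom_col() +
  # Free scales: each panel shows only its own series and its own y range.
  facet_wrap(~Modified, scales = "free") +
  # Print each value just above its bar instead of on top of it.
  geom_text(aes(label = Values), vjust = -0.3) +
  # Leave some headroom above the tallest bar for the text.
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
print(p)
```

This does not address issue c): the two panels still have unrelated y-axes, so the reader has to look at the axis labels to compare them.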

Making one variable be shapes of different colors (ggplot2)

So right now I've got this plot:
my plot
(sorry it's not inline image, this is my first time on Stack Overflow and it wouldn't let me post images)
The plot is produced with this code:
ggplot(potassium.data,
aes(x=Experiment,y=value,
colour=Pedigree))+geom_jitter()+labs(title=element)
The problem is, there are 31 different maize pedigrees being plotted here, so it's difficult to distinguish the colors from each other. I was wondering if it's possible to make it so that the color and shape of the point are used to uniquely identify a pedigree, so that for example one pedigree is red squares, another is red circles, a third one is blue squares, a fourth is blue circles, and so on. This would make it far easier to distinguish the points. Anyone know how to do this?
I don't think that's possible. If you set the shape by pedigree, you will just end up with as many categories of shapes as you have colors now.
geom_label() and geom_text() would let you plot the cultivar id directly onto the plot, then maybe you could build a separate column for something equivalent to genus, so that the cultivars could be grouped somehow (maybe A, B, PH, etc). Then you could color by that "genus" column, which would make the plot look better:
ggplot(potassium.data,
aes(x=Experiment,y=value, label=Pedigree, colour = genus))+
geom_label(position = position_jitter())+
labs(title=element)
Ideally you would end up with a plot colored by the genus while only plotting the suffix digits currently in Pedigree.
I have to agree with Nathan and Joran, the plot is quite confusing by having so many different points and adding shapes into the mix is unlikely to help.
To answer your question, you should be able to use shape=Pedigree, but to make the graph more readable you could perhaps join the pedigrees from one experiment to the other with a geom_line, so the reader spends less time scanning.
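For what it's worth, the color-plus-shape combination from the question can be sketched by mapping Pedigree to both aesthetics and supplying the values by hand. The default shape scale only handles up to six categories, so manual values are needed for 31 pedigrees. The data, the pedigree names, and the particular colors and shapes below are all invented for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for potassium.data with 31 pedigrees.
set.seed(1)
pedigrees <- paste0("P", 1:31)
potassium.data <- data.frame(
  Experiment = rep(c("A", "B"), each = 31),
  value      = rnorm(62),
  Pedigree   = factor(rep(pedigrees, 2), levels = pedigrees)
)

# 8 colors x 4 shapes = 32 unique pairs, enough for 31 pedigrees.
cols   <- c("red", "blue", "darkgreen", "orange",
            "purple", "brown", "black", "deeppink")
shapes <- c(15, 16, 17, 18)  # square, circle, triangle, diamond
# Shapes vary fastest: red square, red circle, ..., then blue square, ...
combos <- expand.grid(shape = shapes, colour = cols,
                      stringsAsFactors = FALSE)[1:31, ]

p <- ggplot(potassium.data,
            aes(x = Experiment, y = value,
                colour = Pedigree, shape = Pedigree)) +
  geom_jitter() +
  scale_colour_manual(values = setNames(combos$colour, pedigrees)) +
  scale_shape_manual(values = setNames(combos$shape, pedigrees))
print(p)
```

Because both scales are tied to the same variable, ggplot2 should merge them into a single legend. Whether 31 color/shape pairs are actually easy to read is another matter, as the other answers point out.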

How to make the bg of a single legend transparent in a merged plot?

I don't have to mention I am new, and my problem has too many solutions. I tried about 12 different versions and couldn't solve it:
The example given is close to my desired plot I want to generate.
I took over a script from 2013, so I do not entirely understand how to change it in the way I would like:
What I want for plot 1 is its legend in the bottom right corner, without a title, with a transparent background, and with the labels "urban" and "non-urban" instead of 1 and 2. I am aware that legend.position="none" deletes all legends, but I was not able to find a solution that came even close to this. The legend is still not on the plot, not transparent, and has a title.
Unfortunately somehow the dots changed into squares in this process and I have no clue why. I didn't change geom_point.
Another flaw I want to change is to remove the top line over the central plot. But how?
And last but not least, I am not sure whether the function geom_smooth(method=lm) reflects the regression line + confidence interval, because the description says it adds a conditional mean, which is, AFAIK, not the same. Is my concern unnecessary?
Edit: shorter Version plot1 out of 3 merged plots:
library(ggplot2)
library(gridExtra)
set.seed(42)
DF <- data.frame(x=rnorm(100,mean=c(1,5)),y=rlnorm(100,meanlog=c(8,6)),group=1:2)
p1 <- ggplot(DF,aes(x=x,y=y,colour=factor(group))) + geom_point(shape=16) +
scale_x_continuous(expand=c(0.02,0)) +
scale_y_continuous(expand=c(0.02,0)) +
geom_rug() +
geom_smooth(method=lm,alpha=0.3) +
theme_bw()+
theme(legend.position=c(0.9,0.09),
legend.title=element_blank(),
plot.margin=unit(c(0,0,0,0),
"points"))
Thanks for any advice. I have been researching this topic for 2 weeks, and even though I thought I was studying psychology, I learned a lot ... but not enough in the short time to succeed. :/
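A tentative sketch of how the legend requests might be approached, based on the shortened example. Which group is "urban" and which is "non-urban" is an assumption. Keeping geom_rug and geom_smooth out of the legend is one likely way to get plain dots back, since the smoother adds a filled box to each legend key. On the last question: with method = lm, geom_smooth fits a linear regression and draws the fitted line with a 95% confidence band by default (se = TRUE, level = 0.95), and the fitted line of a regression is an estimate of the conditional mean.

```r
library(ggplot2)

set.seed(42)
DF <- data.frame(x = rnorm(100, mean = c(1, 5)),
                 y = rlnorm(100, meanlog = c(8, 6)),
                 group = 1:2)

p1 <- ggplot(DF, aes(x = x, y = y, colour = factor(group))) +
  geom_point(shape = 16) +
  # Keep the rug and the smoother out of the legend so only points appear.
  geom_rug(show.legend = FALSE) +
  # Fitted regression line; the shaded band is its 95% confidence interval.
  geom_smooth(method = lm, alpha = 0.3, show.legend = FALSE) +
  # Assumption: group 1 is "urban" and group 2 is "non-urban".
  scale_colour_discrete(labels = c("urban", "non-urban")) +
  theme_bw() +
  theme(legend.title = element_blank(),
        # fill = NA makes the legend box and the key boxes transparent.
        legend.background = element_rect(fill = NA, colour = NA),
        legend.key = element_rect(fill = NA, colour = NA))
print(p1)
```

The legend position is left out here. The legend.position=c(0.9,0.09) setting from the example can be added back to theme(); recent ggplot2 releases prefer legend.position = "inside" together with legend.position.inside for this.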

Varying axis labels formatter per facet in ggplot/R

I have a dataframe capturing several measures over time that I would like to visualize as a 3x1 facet. However, each measure has different units/scales that would benefit from custom transformations and labeling schemes.
So, my question is: If the units and scales are different across different facets, how can I specify a custom formatter or transformation (i.e., log10) to a particular axis within a facet?
For example, let's say I have the data:
library(reshape2)  # provides melt()
df = data.frame(dollars=10^rlnorm(50,0,1), counts=rpois(50, 100))
melted.df = melt(df, measure.var=c("dollars", "counts"))
How would one go about setting up a 2x1 facet showing dollars and counts over the index with labels=dollars and scale_y_continuous(trans = "log10", ...) for the df$dollars data?
Thank you!
As you discovered, there isn't an easy solution to this, but it comes up a lot. Since this sort of thing is asked so often, I find it helpful to explain why this is hard, and suggest a potential solution.
My experience has been that people coming to ggplot2 or lattice graphics fundamentally misunderstand the purpose of faceting (or trellising, in lattice). This feature was developed with a very specific idea in mind: the visualization of data across multiple groups that share a common scale. It comes from something called the principle of small multiples, espoused by Tufte and others.
Placing panels next to each other with very different scales is something that visual design experts will tend to avoid, because it can be at best misleading. (I'm not scolding you here, just explaining the rationale...)
But of course, once you have this great tool out in the open, you never know how folks are going to use it. So it gets stretched: the requests come in for the ability to allow the scales to vary by panel, and to set various aspects of the plot separately for each panel. And so faceting in ggplot2 has been expanded well beyond its original intent.
One consequence of this is that some things are difficult to implement simply due to the original design intent of the feature. This is likely one such instance.
Ok, enough explanation. Here's my solution.
The trick here is to recognize that you aren't plotting graphs that share a scale. To me, that means you shouldn't even be thinking of using faceting at all. Instead, make each plot separately, and arrange them together in one plot:
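library(ggplot2)
library(scales)     # dollar_format()
library(gridExtra)  # grid.arrange()
# One plot per measure, each with its own scale and labels
p1 <- ggplot(subset(melted.df,variable == 'dollars'),
             aes(x = value)) +
  facet_wrap(~variable) +
  geom_density() +
  scale_x_log10(labels = dollar_format())
p2 <- ggplot(subset(melted.df,variable == 'counts'),
             aes(x = value)) +
  facet_wrap(~variable) +
  geom_density()
# Stack the two plots in a single figure
grid.arrange(p1,p2)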
I've just guessed at what geom_* you wanted to use, and I'm sure this isn't really what you wanted to plot, but at least it illustrates the principle.
