Y scale on ggplot2 bar graph makes no sense

My bar graph has a weird Y Axis that skips around seemingly at random, from -1.7% to -10.1%, -10.3%, and then to -2%. You can see it below:
Here is my code:
library(ggplot2)
healthd = read.csv("R/states.csv")
states = healthd[[1]]
insuredChange = healthd[[4]]
ggplot(data = healthd, aes(x = states, y = insuredChange)) +
geom_bar(stat="identity") +
theme(axis.text.x=element_text(angle = 90, hjust = 1))
What's going on here? How do I fix it?
Also, how can I get the x axis labels to all be right justified on the same line?

First - what you present isn't a reproducible example and nobody wants to sign up to access your data to help you out...
In your code:
states = healthd[[1]]
and
insuredChange = healthd[[4]]
are assigning the columns to new objects in the global environment - they are not changing the names of the columns in your data.frame. When you use ggplot, it looks in your data.frame for columns with those names, which don't exist - hence the NULL statement.
healthd$states = healthd[[1]]
healthd$insuredChange = healthd[[4]]
will change it to something that should work - though I don't have the data so am not completely sure.
This should now generate the figure you want.
ggplot(healthd, aes(states, insuredChange)) +
geom_bar(stat="identity") +
theme(axis.text.x=element_text(angle = 90, hjust = 1))
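One more thing worth checking, and this is a guess since the data isn't available: a y axis that runs -1.7%, -10.1%, -10.3%, -2% is in alphabetical order, which is what ggplot2 does when the column holds text (or a factor) such as "-1.7%" rather than numbers. A minimal sketch, assuming that is the case; the three rows below are made up for illustration and only the column names come from the question:

```r
library(ggplot2)

# Hypothetical stand-in for states.csv: the percentages are stored as text
healthd <- data.frame(
  states = c("Alabama", "Alaska", "Arizona"),
  insuredChange = c("-1.7%", "-10.1%", "-2%"),
  stringsAsFactors = FALSE
)

# Strip the percent sign and convert to numeric so the y axis becomes continuous
healthd$insuredChange <- as.numeric(
  sub("%", "", as.character(healthd$insuredChange), fixed = TRUE)
)

p <- ggplot(healthd, aes(x = states, y = insuredChange)) +
  geom_bar(stat = "identity") +
  # hjust = 1 right-justifies the rotated labels against the axis,
  # vjust = 0.5 centres each label on its tick mark
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
# print(p) draws the plot
```

Running str(healthd) right after read.csv would show whether the column came in as character or factor instead of numeric.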

Related

R ggplot: How can I create conditional labeling for a continuous axis ticks

I would like to conditionally alter the color/face/etc of a continuous tick mark label using logic instead of hard coding. For example:
library(tidyverse)
library(viridis)
xx=rpois(1000,lambda = 40)
y=density(xx,n=3600,from=0)
ggplot(data.frame(x = y$x, y = y$y), aes(x, y)) +
geom_line() +
geom_segment(aes(xend = x, yend = 0, colour = y)) +
scale_color_viridis() +
labs(y='Density',x='Count',colour='Density')+
geom_vline(xintercept=40,color='red') +
scale_x_continuous(breaks=c(0,40,seq(25,100,25)),limits=c(0,100))+
theme(axis.text.x = element_text(face=c('plain','bold',rep('plain',4)),
color=ifelse(y$x==60,'red','black')))
So in my example above, the hard coding is in the face argument, and that works (I can do the same thing for color). It's not a big deal here since I only have a few breaks. In future scenarios, though, I may have significantly more breaks or need a less simple coloring scheme for my axes. Is there a more efficient solution to create this labeling based on conditional logic? My first attempt is in the color argument, but that does not work. The issue is identifying the object to use in the logical statement. Maybe there is a behind-the-scenes variable I can call for the breaks (similar to using ..level.. for a density plot). If that's the case, bonus points if you can teach me how to find it/figure that out on my own.

Quick thanks to Djork for teaching me some QnA etiquette...
I was able to solve this problem by defining my break points outside of ggplot and then using ifelse logic within the theme function to create my desired outcome. The updated (and working) example of my original code is below:
library(tidyverse)
library(viridis)
xx=rpois(1000,lambda = 40)
y=density(xx,n=3600,from=0)
med_x=median(xx)
breakers = c(seq(0,100,25),med_x)
ggplot(data.frame(x = y$x, y = y$y), aes(x, y)) +
geom_line() +
geom_segment(aes(xend = x, yend = 0, colour = y)) +
scale_color_viridis() +
labs(y='Density',x='Count',colour='Density')+
geom_vline(xintercept=40,color='red') +
scale_x_continuous(breaks=breakers,limits=c(0,100))+
theme(axis.text.x = element_text(face=ifelse(breakers==med_x,'bold','plain'),
color=ifelse(breakers==med_x,'red','black')))
I haven't tried this on more complicated logic yet but I assume that this approach will work across all logical formatting.
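For anyone reusing this, here is a trimmed-down sketch of the same pattern with the styling vectors computed before the plot call. The names highlight, label_face and label_colour are my own, not ggplot2 features. Two assumptions to be aware of: the style vectors only line up with the labels if they have the same length and order as the breaks that are actually drawn (breaks outside the limits are dropped), and recent ggplot2 versions warn that vectorized input to element_text() is not officially supported.

```r
library(ggplot2)

set.seed(1)
xx <- rpois(1000, lambda = 40)
dens <- density(xx, n = 512, from = 0)

# Define the breaks once, sorted, so every style vector can be derived from them
breakers <- sort(unique(c(seq(0, 100, 25), median(xx))))

# Any vectorised condition on the breaks works here
highlight <- breakers == median(xx)
label_face <- ifelse(highlight, "bold", "plain")
label_colour <- ifelse(highlight, "red", "black")

p <- ggplot(data.frame(x = dens$x, y = dens$y), aes(x, y)) +
  geom_line() +
  scale_x_continuous(breaks = breakers, limits = c(0, 100)) +
  theme(axis.text.x = element_text(face = label_face, colour = label_colour))
# print(p) draws the plot
```

Swapping the highlight line for something like breakers %% 50 == 0 changes which labels are emphasised without touching the plot code.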

Faulty legend in R with "color_scale_manual"

Can someone please tell me why my legend is not displaying correctly? (The point in the legend for Hypericin is filled green and not blue.)
Here is my code:
ggplot(df,aes(x=x, y=y))+
labs(list(title='MTT_darktox',x=expression('Concentration['*mu*'M]'),y='Survival[%]'))+
scale_x_continuous(breaks=seq(0,50,2.5))+
scale_y_continuous(breaks=seq(0,120,20))+
expand_limits(y=c(0,120))+
geom_point(data=df,shape = 21, size = 3, aes(colour='Hypericin'), fill='blue')+
geom_errorbar(data=df,aes(ymin=y-sd1, ymax=y+sd1),width = 0.8, colour='blue')+
geom_line(data=df,aes(colour='Hypericin'), size = 0.8)+
geom_point(data=df2,shape = 21, size = 3, aes(colour='#212'), fill='green')+
geom_errorbar(data=df2,aes(ymin=y-sd1, ymax=y+sd1),width = 0.8, colour='green')+
geom_line(data=df2,aes(colour='#212'), size = 0.8)+
scale_colour_manual(name='Batch_Nr', values=c('Hypericin'='blue','#212' ='green'))
Thank you!

It would definitely help to see some data for reproducibility.
Guessing the structure of your data results in something like this.
# create some fake data:
df <- data.frame(x = rep(1:10, 2),
y = c(1/1:10, 1/1:10 + 1),
error = c(rnorm(10, sd = 0.05), rnorm(10, sd = 0.1)),
group = rep(c("Hypericin", "#212"), each = 10))
Which can be plotted like this:
# plot the data:
library(ggplot2)
ggplot(df, aes(x = x, y = y, color = group)) +
geom_line() +
geom_point() +
geom_errorbar(aes(ymin = y - error, ymax = y + error)) +
scale_colour_manual(name='Batch_Nr',
values = c("Hypericin" = "blue", "#212" = "green"))
Which results in a plot like this:
Explanation
First of all, you don't need to add data = df in the individual geom functions if you already defined it in the first ggplot call.
Furthermore, ggplot likes tidy data best (a.k.a. the long format; read more about that here: http://vita.had.co.nz/papers/tidy-data.pdf). Thus, adding two datasets (df and df2) is possible, but merging them and keeping every variable in one dataset has the advantage that it's also easier for you to understand your data.
Your error (a green point instead of a blue one) came from this confusion. In line 6 you set fill = "blue" outside aes(), and you never map fill to a scale later (i.e., you don't specify something like scale_fill_manual(...)).
Does that give you what you want?
Lastly, for future questions, please make sure that you follow the MWE-principle (Minimal-Working-Example) to make the life of everyone trying to answer the question a bit easier: How to make a great R reproducible example?
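If you want to keep the filled shape 21 points from your original code, one option is to map fill to the group as well and give both scales the same name and values so that ggplot2 merges them into a single legend. This is only a sketch on the fake data structure from above (without the error bars); batch_cols is a name I made up:

```r
library(ggplot2)

# Same fake structure as above: one long data frame with a group column
df <- data.frame(x = rep(1:10, 2),
                 y = c(1/1:10, 1/1:10 + 1),
                 group = rep(c("Hypericin", "#212"), each = 10))

# One named vector reused for both the outline colour and the fill
batch_cols <- c("Hypericin" = "blue", "#212" = "green")

p <- ggplot(df, aes(x = x, y = y, colour = group, fill = group)) +
  geom_line(size = 0.8) +
  geom_point(shape = 21, size = 3) +
  scale_colour_manual(name = "Batch_Nr", values = batch_cols) +
  scale_fill_manual(name = "Batch_Nr", values = batch_cols)
# print(p) draws the plot
```

Because fill is now an aesthetic rather than a fixed value, the legend keys take their fill from the scale instead of from whichever layer was drawn last.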

Thank you very much for your help! I will consider the merging for future code.
Meanwhile, I found another solution to get what I wanted without changing everything (although probably not the cleanest way). I just added another line to override the legend appearance:
guides(colour= guide_legend(override.aes=list(linetype=c(1,1)
,shape=c(16,16))))
resulting in: (image: R plot new)

geom_dotplot Gaps Between scale_x_discrete

I am trying to make a histodot using geom_dotplot. For some reason, ggplot is giving me what appear to me to be arbitrary breaks in my data along the x axis. My data has been binned seq(0,22000000,500000), so I would like to see gaps in my data where they actually exist. However, I'm struggling to successfully list those breaks(). I wouldn't expect to see my first break until after $6,000,000 with a break until $10,000,000. Bonus points for teaching me why I can't use labels=scales::dollar on my scale_x_discrete.
Here is my data.
library(ggplot2)
data <- read.csv("data2.csv")
ggplot(data,aes(x=factor(constructioncost),fill=designstage)) +
geom_dotplot(binwidth=.8,method="histodot",dotsize=0.75,stackgroups=T) +
scale_x_discrete(breaks=seq(0,22000000,500000)) +
ylim(0,30)
Thanks for any and all help and please, let me know if you have any questions!

Treating the x axis as continuous instead of as a factor will give you what you need. However, as you experienced, the enormous range of your cost variable (0 to 21 million) makes ggplot2 choke when you try to treat it as continuous.
Because your values (other than 0) are all at least 500000, dividing the cost by 100000 will put things on a scale that ggplot2 can handle but also give you the variable spacing you want.
Note I had to play around with binwidth when I changed the scale of the data.
ggplot(data, aes(x = constructioncost/100000, fill = designstage)) +
geom_dotplot(binwidth = 3, method="histodot", stackgroups=T) +
scale_x_continuous(breaks=seq(0, 220, 5)) +
ylim(0,30)
Then you can change the labels to reflect the whole dollar amounts if you'd like. The number are so big you'll likely need to either add fewer or change the orientation (or both).
ggplot(data, aes(x = constructioncost/100000, fill = signsta)) +
geom_dotplot(binwidth = 3, method="histodot", stackgroups=T) +
scale_x_continuous(breaks = seq(0, 220, 10),
labels = scales::dollar(seq(0, 22000000, 1000000))) +
ylim(0,30) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
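A variation you might prefer: instead of typing out a matching labels vector, pass a function that converts the rescaled breaks back to dollars, so breaks and labels can't drift apart. This is a sketch on made-up data (only the column names come from the question), and dollar_labels is a helper name I invented. As for the bonus question, my understanding is that scales::dollar formats numbers, whereas a discrete scale hands its labeller the factor levels as character strings, so it has nothing numeric to format.

```r
library(ggplot2)

# Hypothetical data; only the column names come from the question
data <- data.frame(
  constructioncost = c(0, 500000, 500000, 1500000, 6000000, 10000000, 21000000),
  designstage = c("A", "A", "B", "B", "A", "B", "A")
)

# Break positions are in units of $100,000, so scale them back before formatting
dollar_labels <- function(x) scales::dollar(x * 100000)

p <- ggplot(data, aes(x = constructioncost / 100000, fill = designstage)) +
  geom_dotplot(binwidth = 3, method = "histodot", stackgroups = TRUE) +
  scale_x_continuous(breaks = seq(0, 220, 10), labels = dollar_labels) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# print(p) draws the plot
```

You will probably still need to tune binwidth for your real data, as noted above.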

Secondary / Dual axis - ggplot

I am opening this question for three reasons : First, to re-open the dual-axis discussion with ggplot. Second, to ask if there is a non-torturing generic approach to do that. And finally to ask for your help with respect to a work-around.
I realize that there are multiple discussions and questions on how to add a secondary axis to a ggplot. Those usually end up in one of two conclusions:
It's bad, don't do it: Hadley Wickham answered the same question here, concluding that it is not possible. He had a very good argument that "using separate y scales (not y-scales that are transformations of each other) are fundamentally flawed".
If you insist, over-complicate your life and use grids : for example here and here
However, here are some situations that I often face, in which the visualization would greatly benefit from dual-axis. I abstracted the concepts below.
The plot is wide, so duplicating the y-axis on the right side (or the x-axis on the top) would ease interpretation. (We've all stumbled across one of those plots where we need to hold a ruler up to the screen because the axis is too far away.)
I need to add a new axis that is a transformation of the original axis (e.g. percentages, quantiles, ...). (I am currently facing a problem with that. Reproducible example below.)
And finally, adding Grouping/Meta information: I stumble across that when using categorical data with multiple-level, (e.g.: Categories = {1,2,x,y,z}, which are "meta-divided" into letters and numerics.) Even though color-coding the meta-levels and adding a legend or even facetting solve the issue, things get a little bit simpler with a secondary axis, where the user won't need to match the color of the bars to that of the legend.
General question: Given the new extensibility features in ggplot2 2.0.0, is there a more robust, no-torture way to have a dual axis without using grids?
And one final comment: I absolutely agree that the wrong use of dual-axis can be dangerously misleading... But, isn't that the case for information visualization and data science in general?
Work-around question:
Currently, I need to have a percentage axis (2nd case). I used annotate and geom_hline as a workaround. However, I can't move the text outside the main plot. hjust also didn't seem to work for me.
Reproducible example:
library(ggplot2)
# Random values generation - with some manipulation :
maxVal = 500
value = sample(1:maxVal, size = 100, replace = T)
value[value < 400] = value[value < 400] * 0.2
value[value > 400] = value[value > 400] * 0.9
# Data Frame preparation :
labels = paste0(sample(letters[1:3], replace = T, size = length(value)), as.character(1:length(value)))
df = data.frame(sample = factor(labels, levels = labels), value = sort(value, decreasing = T))
# Plotting : Adding Percentages/Quantiles as lines
ggplot(data = df, aes(x = sample, y = value)) +
geom_bar(stat = "identity", fill = "grey90", aes(y = maxVal )) +
geom_bar(stat = "identity", fill = "#00bbd4") +
geom_hline(yintercept = c(0, maxVal)) + # Min and max values
geom_hline(yintercept = c(maxVal*0.25, maxVal*0.5, maxVal*0.75), alpha = 0.2) + # Marking the 25%, 50% and 75% values
annotate(geom = "text", x = rep(100,3), y = c(maxVal*0.25, maxVal*0.5, maxVal*0.75),
label = c("25%", "50%", "75%"), vjust = 0, hjust = 0.2) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(panel.background = element_blank()) +
theme(plot.background = element_blank()) +
theme(plot.margin = unit(rep(2,4), units = "lines"))

In response to #1
We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far
You can do this with the cowplot package:
library(cowplot)
# Assign your original plot to a variable first, e.g. gpv <- ggplot( ... )
ggdraw(switch_axis_position(gpv, axis="y", keep="y"))
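For what it's worth, ggplot2 2.2.0 and later (newer than the version discussed in the question) have a sec.axis argument on the continuous position scales, which covers cases 1 and 2 without grid work. A minimal sketch of the percentage axis, assuming such a version is installed; the data is a simplified stand-in for the question's example:

```r
library(ggplot2)

set.seed(42)
maxVal <- 500
# Simplified stand-in for the question's data: 20 sorted values up to maxVal
df <- data.frame(sample = factor(1:20),
                 value = sort(sample(1:maxVal, 20), decreasing = TRUE))

p <- ggplot(df, aes(x = sample, y = value)) +
  geom_bar(stat = "identity", fill = "#00bbd4") +
  scale_y_continuous(
    limits = c(0, maxVal),
    # The secondary axis is a one-to-one transformation of the primary one:
    # here, the value expressed as a percentage of maxVal
    sec.axis = sec_axis(~ . / maxVal * 100,
                        name = "Percent of max",
                        breaks = c(25, 50, 75, 100))
  )
# print(p) draws the plot
```

For case 1 (simply repeating the axis on the other side), sec.axis = dup_axis() does the same job. Only transformations of the primary axis are supported, which is in line with the argument quoted in the question.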

Overlaying geom_point layer on a geom_boxplot

Really struggling with this.
a & b are actual datasets in the real world, a being extremely large. I get an error ggplot2 doesn't know how to deal with data of class uneval. What I'm trying to do is overlay a single point from a second data set on to the boxplot to highlight how one particular sample compared to a universe.
Any idea what I'm doing wrong? How can I fix it?
a = data.frame(YTD.Retn=runif(1000,-10,10),sector="a")
a = rbind(a,data.frame(YTD.Retn=runif(1000,-10,10),sector="b"))
a = rbind(a,data.frame(YTD.Retn=runif(1000,-10,10),sector="c"))
a = rbind(a,data.frame(YTD.Retn=runif(1000,-10,10),sector="d"))
a = rbind(a,data.frame(YTD.Retn=runif(1000,-10,10),sector="e"))
a = rbind(a,data.frame(YTD.Retn=runif(1000,-10,10),sector="f"))
a = rbind(a,data.frame(YTD.Retn=runif(1000,-10,10),sector="g"))
b = data.frame(sector=c("a","b","c","d","e","f","g"),YTD.Retn=c(5,6,7,3,2,-1,-5))
p1 =ggplot(a,aes(factor(sector),YTD.Retn,fill=factor(sector))) + geom_boxplot() +
scale_fill_discrete(guide=F) +
geom_point(b,aes(factor(sector),YTD.Retn))
plot(p1)

You need to name the data argument when you pass it within a geom_...() call, because the first positional argument of a geom is mapping, not data. Naming arguments is good practice in general (if somewhat time-consuming).
p1 =ggplot(data = a, aes(x = factor(sector), y = YTD.Retn, fill=factor(sector))) +
geom_boxplot() +
scale_fill_discrete(guide = "none") +
geom_point(data = b, aes(x= factor(sector),y= YTD.Retn))
plot(p1)
