Bug in ggplot2? - r

I'm currently making some simple plots with ggplot2.
The graph looks good, but there is one tiny detail I can't fix.
When you look at the legend, it says "Low n" twice. One of them should be "High n".
Here is my code:
half_plot <- ggplot() +
ggtitle(plot_title) +
geom_line(data = plot_dataframe_SD1, mapping = aes(x = XValues, y = YValues_SD1, color = "blue")) +
geom_line(data = plot_dataframe_SD2, mapping = aes(x = XValues, y = YValues_SD2, color = "green")) +
xlim(1, 2) +
ylim(1, 7) +
xlab("Standard Deviation") +
ylab(AV_column_name) +
scale_fill_identity(name = 'the fill', guide = 'legend',labels = c('m1')) +
scale_colour_manual(name = 'Legend',
values =c('blue'='blue','green'='green'),
labels = c(paste("Low ", Mod_column_name), paste("High ", Mod_column_name)))
Here is the graph I get in my output:
So do you know how to fix this?
There is one more thing I'm curious about: I can't remember changing anything in this code, but I know that the legend worked just fine a few days ago. I saved pictures I made with this code, and they look alright.
Any further suggestions for improving the graph are also very welcome.

When asking questions, it helps if you provide a reproducible example that includes the data. With some sample data, there are a couple of ways to fix it.
Sample data
library(dplyr)
library(ggplot2)
plot_dataframe_SD1 = data.frame(XValues=seq(1,2,by=.2)) %>%
mutate(YValues_SD1=XValues*2)
plot_dataframe_SD2 = data.frame(XValues=seq(1,2,by=.2)) %>%
mutate(YValues_SD2=XValues*5)
The simplest way to modify your code is to supply the desired color label in the aesthetic.
Mod_column_name = 'n'
half_plot <- ggplot() +
# put the desired label name in the aesthetic
# link describing the bang bang operator (!!): https://www.r-bloggers.com/2019/07/bang-bang-how-to-program-with-dplyr/
geom_line(data=plot_dataframe_SD1,mapping=aes(x=XValues,y=YValues_SD1,color=!!paste('Low',Mod_column_name))) +
geom_line(data=plot_dataframe_SD2,mapping=aes(x=XValues,y=YValues_SD2,color=!!paste('High',Mod_column_name))) +
# name the colours after the labels, so each line keeps its own colour and label
# (positional labels would be matched to the alphabetically sorted breaks instead)
scale_color_manual(values=setNames(c('blue','green'),
c(paste('Low',Mod_column_name),paste('High',Mod_column_name))))
A more general approach is to join the dataframes and pivot the joined df to have a column with the SD values and another to specify how to separate the colors. This makes it easier to plot without having to make multiple calls to geom_line.
# Join the dfs, pivot the SD columns longer, and make a new column with your desired labels
joined_df = plot_dataframe_SD1 %>% full_join(plot_dataframe_SD2,by='XValues') %>%
tidyr::pivot_longer(cols=contains('YValues'),names_to='df_num',values_to='SD') %>%
mutate(label_name=if_else(df_num == 'YValues_SD1',paste('Low',Mod_column_name),paste('High',Mod_column_name)))
# Simplified plot
ggplot(data=joined_df,aes(x=XValues,y=SD,color=label_name)) +
geom_line() +
# named values keep colours matched to the labels stored in label_name
scale_color_manual(values=setNames(c('blue','green'),
c(paste('Low',Mod_column_name),paste('High',Mod_column_name))))
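One pitfall behind mismatched legend entries: in scale_color_manual(), unnamed `values` and `labels` are matched by position to the scale's breaks, which for character data default to alphabetical order ("High n" sorts before "Low n"). A minimal sketch on a hypothetical toy data frame (`toy`, not the poster's data): naming `values` ties each colour to its level, the legend text comes straight from the data, and `breaks` only controls the display order.

```r
library(ggplot2)

# Hypothetical toy data: two series distinguished by a label column
toy <- data.frame(
  x = rep(1:3, times = 2),
  y = c(2, 4, 6, 5, 10, 15),
  series = rep(c("Low n", "High n"), each = 3)
)

p <- ggplot(toy, aes(x = x, y = y, colour = series)) +
  geom_line() +
  # Named values: each colour is tied to a level, independent of sort order;
  # breaks only sets the order in which the legend lists the levels
  scale_colour_manual(name = "Legend",
                      values = c("Low n" = "blue", "High n" = "green"),
                      breaks = c("Low n", "High n"))

# Inspect the colours ggplot actually assigned to each row
ld <- layer_data(p)
print(unique(ld[, c("y", "colour")]))
```

Because the legend text comes from the `series` column itself, no separate `labels` argument is needed, so there is nothing left to fall out of sync.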

Related

R code of scatter plot for three variables

Hi, I am trying to code a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is, but if I understood your question correctly, I think you want something like this.
library(dplyr)
library(forcats)  # as_factor()
library(ggplot2)
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
@Maninder: there are a few things you need to look at.
First of all, ggplot()'s grammar of graphics works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
Your code is not working because it mixes up the layer calls and does not clearly specify which data the scatter layer and the line layer should each show.
(I) Use ggplot() + geom_point() for a scatter plot
The very first layer is always ggplot(). Think of it as your drawing canvas.
You describe a scatter plot of your data, but your code never adds one: its geom_point() layer only plots the means stored in Table_1.
For example:
# plotting the Antisocial data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set as a scatter plot, i.e. with a geom_point() layer.
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
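To see the difference the factor conversion makes, here is a small sketch on hypothetical data standing in for Antisocial (the values in `toy` are made up): a numeric Race column gets a continuous colour scale, whereas as.factor(Race) gets a discrete one. Inspecting the built scale via ggplot_build() is just one way to check this.

```r
library(ggplot2)

# Hypothetical stand-in for Antisocial: Race coded as 0/1 numbers
toy <- data.frame(YOI = c(90, 92, 94, 90, 92, 94),
                  ASB = c(1.5, 1.6, 1.7, 1.6, 1.6, 1.8),
                  Race = c(0, 0, 0, 1, 1, 1))

p_num <- ggplot(toy, aes(x = YOI, y = ASB, colour = Race)) + geom_point()
p_fac <- ggplot(toy, aes(x = YOI, y = ASB, colour = as.factor(Race))) + geom_point()

# The trained colour scale shows which kind of palette ggplot picked
scale_num <- ggplot_build(p_num)$plot$scales$get_scales("colour")
scale_fac <- ggplot_build(p_fac)$plot$scales$get_scales("colour")
class(scale_num)[1]
class(scale_fac)[1]
```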
(II) line plot
Analogously, for the line plot you would use:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I'll skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Until you are comfortable with how aes() is inherited, I recommend keeping the data= and aes() calls inside each geom_xxx(). This avoids confusion.
You may want to try geom_jitter() instead of geom_point() to get a better presentation of your dataset. The few points you see are actually many data points plotted on top of each other at the same positions (overplotting).
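A sketch of the geom_jitter() suggestion, assuming data with many repeated (YOI, ASB) pairs as in the question (the `toy` data frame below is randomly generated, not the real file). Setting `height = 0` keeps the ASB values exact and only spreads the points horizontally, by at most `width` in either direction.

```r
library(ggplot2)

# Hypothetical data: many identical (YOI, ASB) pairs cause overplotting
set.seed(42)
toy <- data.frame(YOI = rep(c(90, 92, 94), each = 50),
                  ASB = sample(1:3, 150, replace = TRUE),
                  Race = factor(rep(0:1, times = 75)))

# width spreads points horizontally; height = 0 keeps ASB values exact
p <- ggplot(toy, aes(x = YOI, y = ASB, colour = Race)) +
  geom_jitter(width = 0.3, height = 0, alpha = 0.5) +
  labs(colour = "Race")

# The layer data holds the jittered positions that are actually drawn
ld <- layer_data(p)
```

Semi-transparent points (`alpha`) complement the jitter: where many points still overlap, the darker areas show it.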
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully understand what you mean by that.
I take it that the mean ASB you calculated over the whole population (your Table_1) is your reference, and you would like to see how the Race groups compare with this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by making it thicker and dashed. You can play around with the colours, etc. to get the aesthetics you are aiming for.
Please note that for the grouped boxplot I coerced the integer variable YOI into a categorical factor. Boxplots use fill for the body (colour only sets the outline). In this setup, you also need to supply a group value to geom_line(): with a factor on the x axis, each x value would otherwise form its own group, and a group with a single point draws no line. I just set it to 1, which is arbitrary; in other contexts you can map another variable here.
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1),
size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
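If "mean ASB vs YOI grouped by Race" instead means one mean per combination of YOI and Race, the means can be computed with group_by() and summarise() and drawn as one line per Race. A minimal sketch, assuming columns named YOI, ASB and Race as in the question; the `toy` values are invented for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for Antisocial
toy <- data.frame(YOI = rep(c(90, 92, 94), times = 4),
                  Race = rep(c(0, 1), each = 6),
                  ASB = c(1, 2, 3, 2, 3, 4, 2, 2, 2, 4, 4, 4))

# One mean per (YOI, Race) pair
means <- toy %>%
  group_by(YOI, Race) %>%
  summarise(ASB_mean = mean(ASB), .groups = "drop")

# Mapping colour to a factor also splits geom_line() into one line per Race
p <- ggplot(means, aes(x = YOI, y = ASB_mean, colour = as.factor(Race))) +
  geom_point(size = 2) +
  geom_line() +
  labs(colour = "Race")
```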
Hope this gets you going!

R (ggplot) function to add an 'empty' row in an existing legend, split legend?

I'm currently working on a plot. It's all coming together, but I'm wondering about one thing. I tried googling this, but I couldn't find what I was looking for.
Right now, I've created a legend that I liked. With 'scale_color_manual' and 'scale_size_manual', I've created a combined legend that includes the thickness of the corresponding line and the color. I have put the code I used below.
scale_color_manual(name = "combined legend",
labels = c("Nederland totaal", "Noord-Holland", "Utrecht", "Noord-Brabant", "Zuid-Holland", "Gelderland", "Flevoland", "Overijssel", "Limburg", "Drenthe", "Zeeland", "Friesland", "Groningen"),
values=c("#000000", "#001EFF", "#2ECC71","#FF009D","#00FFCD",
"#FF8400", "#8514EB", "#EB1A14", "#FFE100",
"#FF00AA", "#00AAFF", "#16A085", "#B903FC")) +
scale_size_manual(name = "combined legend",
labels = c("Nederland totaal", "Noord-Holland", "Utrecht", "Noord-Brabant", "Zuid-Holland", "Gelderland", "Flevoland", "Overijssel", "Limburg", "Drenthe", "Zeeland", "Friesland", "Groningen"),
values = c(1.75, 0.8, 0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8))
This is what the legend looks like right now
My question is: is it possible to add a little space between the first legend item ("Nederland totaal") and the other lines?
I want it to look more like this
(I made this for clarification via word).
Is there a function to add some space between certain legend items? I hope somebody can help me :)
More detail:
The dataset I'm working with contains the average yearly housing prices per Dutch province from 2005 until 2019. I created a ggplot; the current plot looks like this now. It is basically a ggplot with the year on the horizontal axis and the average housing price on the vertical axis. I coloured the lines by province and added a thicker black line for the overall average of the Netherlands (which I also put in the province vector). I used geom_line, and all of the legend items are factors. I hope this is clear enough; if not, let me know.
Could this be applied to your data?
library(tidyverse)
tib <-
tibble(x = 1:3,
a = 1:3,
b = 1.5:3.5,
c = 2:4) %>%
pivot_longer(cols = a:c, names_to = "id", values_to = "val")
ggplot()+
geom_line(data = filter(tib, id == "a"), aes(x, val, linetype = id))+
geom_line(data = filter(tib, id != "a"), aes(x, val, colour = id))+
labs(linetype = "legend", colour = NULL)
Gives you:
One option is to use stat_summary.
This adds another aesthetic to the graph, and thus a second legend, separate from the colour legend.
library(dplyr)
library(ggplot2)
airquality %>%
mutate(Month=factor(Month)) %>%
ggplot(aes(x=Day, y=Temp, col=Month)) +
geom_line() +
stat_summary(aes(lwd="Nederland totaal"), fun=mean, geom="line", col="black") +
theme_classic() +
theme(legend.title = element_blank()) +
guides(lwd = guide_legend(order = 1))
An additional benefit of this method is that you don't have to calculate the yearly averages yourself and manually add them to your province variable.
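The stat_summary() approach applied to data shaped like the housing prices, as a sketch under assumptions: the long data frame `prices` and its columns year, province and price are hypothetical, and the numbers are invented. Here linetype (rather than lwd as above) is mapped to a constant string, which gives the national average its own legend; the `linewidth` parameter requires ggplot2 3.4.0 or later (older versions use `size`).

```r
library(ggplot2)

# Hypothetical long-format data: one price per province per year
prices <- data.frame(
  year = rep(2005:2007, times = 3),
  province = rep(c("Utrecht", "Zeeland", "Drenthe"), each = 3),
  price = c(300, 310, 320, 200, 210, 220, 250, 260, 270)
)

p <- ggplot(prices, aes(x = year, y = price)) +
  geom_line(aes(colour = province)) +
  # stat_summary() computes the mean per year across all provinces;
  # mapping linetype to a constant string creates a separate legend
  stat_summary(aes(linetype = "Nederland totaal"), fun = mean,
               geom = "line", colour = "black", linewidth = 1.75) +
  labs(colour = NULL, linetype = NULL) +
  guides(linetype = guide_legend(order = 1))

# Layer 2 holds the computed yearly means
summary_layer <- layer_data(p, 2)
```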

Add legend to ggplot2 line with point plot

I have a question about legends in ggplot2. I managed to plot two lines and two sets of points in the same graph and want to add a legend with the two colors used. This is the code I used:
P <- ggplot() +
geom_point(data = data_p, aes(x = V1, y = V2), shape = 0, col = "#56B4E9") +
geom_line(data = data_p, aes(x = V1, y = V2), col = "#56B4E9") +
geom_point(data = data_p, aes(x = V1, y = V3), shape = 1, col = "#009E73") +
geom_line(data = data_p, aes(x = V1, y = V3), col = "#009E73")
and the output is
I tried scale_color_manual, scale_shape_manual and scale_line_manual, but they don't work.
P + scale_color_manual(name = "group",values = c('#56B4E9' = '#56B4E9','#009E73' = '#009E73'),
breaks = c('#56B4E9','#009E73'),labels = c('B','H'))
I want it like this
Here is some sample data (data_p), in case it helps:
V1 V2 V3
5 0.49216 0.45148
10 0.3913 0.35751
15 0.32835 0.30361
I would approach this problem in two steps.
Generally, to get stuff in the guides, ggplot2 wants you to put "aesthetics" like colour inside the aes() function. I typically do this inside the ggplot() rather than individually for each "geom", especially if everything kind of makes sense in a single dataframe.
My first step would be to reshape your dataframe slightly. I would use the tidyr package (part of the tidyverse, like ggplot2; it is really nice for reformatting data and worth learning as you go) and do something like this:
library(ggplot2)
library(tidyr)  # also provides %>%
#key is the new variable that will be your color variable
#value is the numbers that had been in V2 and V3 that will now be your y-values
newdf <- data_p %>% tidyr::gather(key = "color", value = "yval", V2, V3)
#now, I would rewrite your plot slightly
P<-(newdf %>% ggplot(aes(x=V1,y=yval, colour=color))
#when you put the ggplot inside parentheses,
#you can add each new layer on its own line, starting with the "+"
+ geom_point()
+ geom_line()
+ scale_color_manual(name = "group", values=c("#56B4E9","#009E73"), labels=c("B","H"))
#theme classic is my preferred look in ggplot, usually
+ theme_classic()
)
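Building on that answer, here is a sketch that also gives the two series different shapes and the labels "B" and "H" from the question, using the sample data_p values posted above (the column names V1 to V3 are taken from the question's code). When the colour and shape scales share the same name, breaks and labels, ggplot2 merges them into a single legend. pivot_longer() is used here as the current replacement for gather().

```r
library(ggplot2)
library(tidyr)

# Sample data from the question
data_p <- data.frame(V1 = c(5, 10, 15),
                     V2 = c(0.49216, 0.3913, 0.32835),
                     V3 = c(0.45148, 0.35751, 0.30361))

# Long format: one row per (V1, series) pair
newdf <- pivot_longer(data_p, cols = c(V2, V3),
                      names_to = "series", values_to = "yval")

# Identical name, breaks and labels merge colour and shape into one legend
p <- ggplot(newdf, aes(x = V1, y = yval, colour = series, shape = series)) +
  geom_point(size = 3) +
  geom_line() +
  scale_colour_manual(name = "group", breaks = c("V2", "V3"),
                      values = c(V2 = "#56B4E9", V3 = "#009E73"),
                      labels = c("B", "H")) +
  scale_shape_manual(name = "group", breaks = c("V2", "V3"),
                     values = c(V2 = 0, V3 = 1),
                     labels = c("B", "H")) +
  theme_classic()

# Point layer data: one row per plotted point
pts <- layer_data(p, 1)
```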

ggplot conditionally change shape or shape fill using variable

I am trying to create a line chart that shows open symbols for data that is not detected and closed (filled) symbols for detected data. Here is some code to work with:
date <- c("1991-04-25","1991-04-26","1991-04-27","1991-04-28","1991-04-29","1991-04-25","1991-04-26","1991-04-27","1991-04-28","1991-04-29","1991-04-25","1991-04-26","1991-04-27","1991-04-28","1991-04-29")
Parameter <- c("TEA","TEA","TEA","TEA","TEA","COFFEE","COFFEE","COFFEE","COFFEE","COFFEE","WATER","WATER","WATER","WATER","WATER")
data <- c(5,4,7,3,6,4,6,8,6,3,7,8,7,6,7)
DetectYN <- c("Y","N","Y","Y","Y","N","Y","Y","Y","N","N","N","Y","Y","N")
df <- data.frame(date, Parameter,data, DetectYN)
df$date <- as.Date(df$date, "%Y-%m-%d" )
df$DetectYN <-as.character(df$DetectYN)
ggplot(df, aes(x=date, y=data)) +
geom_point(size=4, aes(shape = Parameter , colour= Parameter)) +
geom_line(aes(x=date, y=data,color = Parameter)) +
scale_shape_manual(values=ifelse(DetectYN == "Y",c(15,16,17),c(0,1,2)) , guide = "none")
This creates the following chart - nearly correct, except that my ifelse is not having the desired effect. I would like the DetectYN = "N" to be hollow (no fill) and I would like the DetectYN = "Y" to be filled. The existing symbols need to remain. Could anyone help me with this please?
This is a deceptively difficult problem!
This solution directly answers your question, and is hopefully of some use. However, I fear that it may become messy with large, complicated datasets.
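I added a column combining the two variables that should control the shape, then mapped shape to this new column, ordering the shape numbers to achieve the desired result (the values are matched to the combinations in alphabetical order: COFFEE N, COFFEE Y, TEA N, TEA Y, WATER N, WATER Y).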
df$shape <- paste(df$Parameter, df$DetectYN)
ggplot(df, aes(x=date, y=data, colour= Parameter)) +
geom_point(size=4, aes(shape=shape))+
geom_line() +
scale_shape_manual(values=c(0,15,1,16,2,17) , guide = "none")
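A variant of the same idea that does not depend on the alphabetical order of the combined column is to build a named `values` vector, so each combination is matched by name. This is a sketch on the question's data; `filled` and `shape_values` are names introduced here for illustration. It relies on shapes 15, 16 and 17 being the solid versions of the hollow shapes 0, 1 and 2.

```r
library(ggplot2)

# Data from the question, written more compactly
df <- data.frame(
  date = as.Date("1991-04-25") + rep(0:4, times = 3),
  Parameter = rep(c("TEA", "COFFEE", "WATER"), each = 5),
  data = c(5, 4, 7, 3, 6, 4, 6, 8, 6, 3, 7, 8, 7, 6, 7),
  DetectYN = c("Y","N","Y","Y","Y","N","Y","Y","Y","N","N","N","Y","Y","N")
)
df$shape <- paste(df$Parameter, df$DetectYN)

# Solid shape per parameter (15 square, 16 circle, 17 triangle);
# the hollow counterparts are 0, 1 and 2, i.e. 15 less
filled <- c(COFFEE = 15, TEA = 16, WATER = 17)
shape_values <- c(setNames(filled, paste(names(filled), "Y")),
                  setNames(filled - 15, paste(names(filled), "N")))

p <- ggplot(df, aes(x = date, y = data, colour = Parameter)) +
  geom_point(size = 4, aes(shape = shape)) +
  geom_line() +
  scale_shape_manual(values = shape_values, guide = "none")

# Point layer data, in the same row order as df
pts <- layer_data(p, 1)
```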

How to format the scatterplots of data series in R

I have been struggling to create a decent-looking scatterplot in R. I didn't think it would be so difficult.
After some research, ggplot seemed like a good choice because it allows plenty of formatting. However, I'm struggling to understand how it works.
I'd like to create a scatterplot of two data series, displaying the points with two different colours, and perhaps different shapes, and a legend with series names.
Here is my attempt, based on this:
year1 <- mpg[which(mpg$year==1999),]
year2 <- mpg[which(mpg$year==2008),]
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy,color="yellow")) +
geom_point(data = year2, aes(x=cty,y=hwy,color="green")) +
xlab('cty') +
ylab('hwy')
Now, this looks almost OK, but with non-matching colors (unless I suddenly became color-blind). Why is that?
Also, how can I add series names and change symbol shapes?
Don't build 2 different dataframes:
df <- mpg[which(mpg$year%in%c(1999,2008)),]
df$year<-as.factor(df$year)
ggplot() +
geom_point(data = df, aes(x=cty,y=hwy,color=year,shape=year)) +
xlab('cty') +
ylab('hwy')+
scale_color_manual(values=c("1999"="yellow","2008"="green"))+
scale_shape_manual(values=c(2,8))+
guides(colour = guide_legend("Year"),
shape = guide_legend("Year"))
This will work with the way you currently have it set-up:
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy), col = "yellow", shape=1) +
geom_point(data = year2, aes(x=cty,y=hwy), col="green", shape=2) +
xlab('cty') +
ylab('hwy')
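As for why the colours didn't match in the question: a constant such as "yellow" inside aes() is treated as a data value, i.e. a category that happens to be named "yellow", and ggplot colours that category with its default discrete palette. A minimal sketch with the question's mpg data shows that setting the colour outside aes(), as in the answer above, or adding scale_colour_identity() makes ggplot use the string as the colour itself.

```r
library(ggplot2)

year1 <- mpg[mpg$year == 1999, ]

# A constant string inside aes() becomes a category, coloured by the
# default discrete palette rather than used literally
p_mapped <- ggplot() +
  geom_point(data = year1, aes(x = cty, y = hwy, colour = "yellow"))

# scale_colour_identity() tells ggplot to use the value itself as the colour
p_identity <- p_mapped + scale_colour_identity()

col_mapped <- unique(layer_data(p_mapped)$colour)
col_identity <- unique(layer_data(p_identity)$colour)
```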
You want:
library(ggplot2)
ggplot(mpg, aes(cty, hwy, color=as.factor(year)))+geom_point()
