I have a work project where I've been given a spreadsheet with tons of data and I want to plot it using R to look for trends.
The issue I am having is that I cannot plot it correctly using ggplot because I want to place two variables on the Y-axis.
My goal is to plot "interest" and "awareness" on the Y-axis in different colors, say green and blue, and "admissions" on the X-axis.
Unfortunately, I am new to StackOverflow and cannot include my Excel spreadsheet, so I included a screenshot for reference.
Excel note - the actual spreadsheet has 381 titles
ggplot(data =data, aes(x = 'admissions', y =
'interest')) + geom_line()
Here is one way we could do it:
library(tidyverse)
df %>%
  # stack the two y-variables into name/value columns
  pivot_longer(
    c(interest, awareness)
  ) %>%
  # strip the "%" sign so the values are numeric
  mutate(value_number = parse_number(value)) %>%
  ggplot(aes(x = admissions, y = value_number, color = name, group = name)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  scale_color_manual(values = c("green", "blue")) +
  theme_bw()
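Since the spreadsheet itself is not available, here is a minimal self-contained sketch of the same pivot approach. The data frame below is made up purely for illustration, and it assumes that the "interest" and "awareness" columns arrive as text such as "12%", which is why parse_number() is needed. The column names are taken from the question and may differ in the real file.

```r
library(dplyr)
library(tidyr)
library(readr)
library(ggplot2)

# Hypothetical stand-in for the spreadsheet (values are invented)
df <- tibble(
  title = c("A", "B", "C", "D"),
  admissions = c(1200, 3400, 5600, 7800),
  interest = c("12%", "25%", "31%", "40%"),
  awareness = c("30%", "42%", "55%", "61%")
)

# One row per title and measure; "name" says which measure, "value" holds the text
df_long <- df %>%
  pivot_longer(c(interest, awareness)) %>%
  mutate(value_number = parse_number(value))

# Naming the colors ties each one to its measure regardless of sort order
p <- ggplot(df_long, aes(x = admissions, y = value_number, color = name, group = name)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  scale_color_manual(values = c(interest = "green", awareness = "blue")) +
  theme_bw()
```

Printing p draws the plot. Naming the entries in values is optional, but it avoids relying on the alphabetical order of the measure names.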
I am trying to plot one variable against all other variables in a data set and view each graph at the same time. The code I am using to do this is:
theme_set(
theme_bw() +
theme(legend.position = "top")
)
healthTrain.gathered <- healthTrain %>%
as_tibble() %>%
gather(key = "variable", value = "value",
-CHD, -Population2010)
ggplot(healthTrain.gathered, aes(x = value, y = CHD)) +
geom_point() +
facet_wrap(~variable)
This code works great, except that the variables do not all share the same range of x values, and every panel uses the axis limits of the variable with the largest range. Is there a way to make each panel use the limits that best fit its own data?
Example of what I am looking for:
plot(health$CHD, health$BPHIGH)
plot(health$CHD, health$COPD)
plot(health$CHD, health$STROKE)
Except I want to be able to see all of the graphs at the same time.
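One way to get per-panel x ranges is the scales argument of facet_wrap(). The sketch below uses the built-in mtcars data as a stand-in, because healthTrain is not available here; mpg plays the role of CHD, and the choice of columns is arbitrary.

```r
library(ggplot2)
library(tidyr)

# mtcars stands in for healthTrain; mpg plays the role of CHD
cars_long <- pivot_longer(
  mtcars[, c("mpg", "disp", "hp", "wt")],
  cols = -mpg, names_to = "variable", values_to = "value"
)

# scales = "free_x" lets each panel pick its own x limits;
# the y-axis stays shared because every panel plots the same outcome
p_free <- ggplot(cars_long, aes(x = value, y = mpg)) +
  geom_point() +
  facet_wrap(~variable, scales = "free_x")
```

Applied to the code in the question, this would mean changing the last line to facet_wrap(~variable, scales = "free_x"). Use scales = "free" if the y-axis should vary per panel as well.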
I use geom_bar in ggplot to visualize the purchase decisions of customers (3 factor levels: purchase, maybe, no purchase). The decisions are grouped by product group with facet_wrap.
ggplot(df, aes(x= status_purchase)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 90)) +
facet_wrap(~ product_group)
Not surprisingly, this works fine. Do I have any options to visualize another variable for the groups in facet_wrap (e.g. total expenses for each product group)? A bubble of the corresponding size placed in the upper right corner of each panel, or at least the sum of the expenses in the panel title, would be nice.
Thank you for your answers.
Philipp
In the absence of a specific example, let me demonstrate one way to do this that uses geom_text() to display summary data for a dataset that is separated into facets.
In this example, I'll use the txhousing dataset (which is part of ggplot2):
library(dplyr)
library(tidyr)
library(ggplot2)
df <- txhousing
df %>% ggplot(aes(x=month, y=sales)) + geom_col() +
facet_wrap(~year)
Let's say we wanted to display a red total of sales for a year in the upper right portion of each facet. The easiest way to do this is to first calculate our summary data in a separate dataset, then overlay that information according to the facets via geom_text().
df_summary <- df %>%
group_by(year) %>%
summarize(total = sum(sales, na.rm = TRUE))
df %>% ggplot(aes(x=month, y=sales)) + geom_col() +
facet_wrap(~year) +
geom_text(
data=df_summary, x=12, y=33000, aes(label=total),
hjust=1, color='red', size=3
)
I override the mapping for the x and y aesthetics in the geom_text() call. As long as the df_summary dataset contains a column called year, the data will be placed on the facets properly.
I hope you can apply a similar idea to your particular question.
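The question also mentions showing the sum in the panel title. A possible sketch of that variant, again with txhousing: compute the total per year, build a label column from it, and facet on that label instead of on year. The column names total and facet_label are introduced here only for illustration.

```r
library(dplyr)
library(ggplot2)

# Attach the yearly total to every row, then build a strip label from it
df_labeled <- txhousing %>%
  group_by(year) %>%
  mutate(total = sum(sales, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(facet_label = paste0(
    year, " (total: ", format(total, big.mark = ",", trim = TRUE), ")"
  ))

# Faceting on the label puts the total into each panel's strip;
# labels start with the year, so panels stay in chronological order
p_strip <- ggplot(df_labeled, aes(x = month, y = sales)) +
  geom_col() +
  facet_wrap(~facet_label)
```

This keeps the panels free of overlaid text, at the cost of longer strip labels.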
I'm currently working on plotting simple plots using ggplot2.
The graph looks good, but there is one tiny detail I can't fix.
When you look at the legend, it says "Low n" twice. One of them should be "High n".
Here is my code:
half_plot <- ggplot() +
ggtitle(plot_title) +
geom_line(data = plot_dataframe_SD1, mapping = aes(x = XValues, y = YValues_SD1, color = "blue")) +
geom_line(data = plot_dataframe_SD2, mapping = aes(x = XValues, y = YValues_SD2, color = "green")) +
xlim(1, 2) +
ylim(1, 7) +
xlab("Standard Deviation") +
ylab(AV_column_name) +
scale_fill_identity(name = 'the fill', guide = 'legend',labels = c('m1')) +
scale_colour_manual(name = 'Legend',
values =c('blue'='blue','green'='green'),
labels = c(paste("Low ", Mod_column_name), paste("High ", Mod_column_name)))
Here is the graph I get in my output:
So do you know how to fix this?
And there is one more thing that makes me curious: I can't remember changing anything in this code, but I know that the legend worked just fine a few days ago. I saved pictures I made with this code and they look all right.
Also if you have any further suggestions how to upgrade the graph, these suggestions are very welcome too.
When asking questions, it will help us if you provide a reproducible example including the data. With some sample data, there are a couple ways to fix it.
Sample data
library(dplyr)
plot_dataframe_SD1 = data.frame(XValues=seq(1,2,by=.2)) %>%
mutate(YValues_SD1=XValues*2)
plot_dataframe_SD2 = data.frame(XValues=seq(1,2,by=.2)) %>%
mutate(YValues_SD2=XValues*5)
The simplest way to modify your code is to supply the desired color label in the aesthetic.
library(ggplot2)
Mod_column_name = 'n'
low_label = paste('Low',Mod_column_name)
high_label = paste('High',Mod_column_name)
half_plot <- ggplot() +
# put the desired label name in the aesthetic
# link describing the bang bang operator (!!) https://www.r-bloggers.com/2019/07/bang-bang-how-to-program-with-dplyr/
geom_line(data=plot_dataframe_SD1,mapping=aes(x=XValues,y=YValues_SD1,color=!!low_label)) +
geom_line(data=plot_dataframe_SD2,mapping=aes(x=XValues,y=YValues_SD2,color=!!high_label)) +
# name the values so each color is tied to its label, not to alphabetical order
scale_color_manual(values=setNames(c('blue','green'),c(low_label,high_label)),
breaks=c(low_label,high_label))
A more general approach is to join the dataframes and pivot the joined df to have a column with the SD values and another to specify how to separate the colors. This makes it easier to plot without having to make multiple calls to geom_line.
# Join the dfs, pivot the SD columns longer, and make a new column with your desired labels
joined_df = plot_dataframe_SD1 %>% full_join(plot_dataframe_SD2,by='XValues') %>%
tidyr::pivot_longer(cols=contains('YValues'),names_to='df_num',values_to='SD') %>%
mutate(label_name=if_else(df_num == 'YValues_SD1',paste('Low',Mod_column_name),paste('High',Mod_column_name)))
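# Simplified plot
# name the values so each color is tied to its label, not to alphabetical order
ggplot(data=joined_df,aes(x=XValues,y=SD,color=label_name)) +
geom_line() +
scale_color_manual(values=setNames(c('blue','green'),
c(paste('Low',Mod_column_name),paste('High',Mod_column_name))),
breaks=c(paste('Low',Mod_column_name),paste('High',Mod_column_name)))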
I'm currently working on a plot. It's all coming together, but I'm wondering about one thing. I tried googling this, but I couldn't find what I was looking for.
Right now, I've created a legend that I liked. With 'scale_color_manual' and 'scale_size_manual', I've created a combined legend that includes the thickness of the corresponding line and the color. I have put the code I used below.
scale_color_manual(name = "combined legend",
labels = c("Nederland totaal", "Noord-Holland", "Utrecht", "Noord-Brabant", "Zuid-Holland", "Gelderland", "Flevoland", "Overijssel", "Limburg", "Drenthe", "Zeeland", "Friesland", "Groningen"),
values=c("#000000", "#001EFF", "#2ECC71","#FF009D","#00FFCD",
"#FF8400", "#8514EB", "#EB1A14", "#FFE100",
"#FF00AA", "#00AAFF", "#16A085", "#B903FC")) +
scale_size_manual(name = "combined legend",
labels = c("Nederland totaal", "Noord-Holland", "Utrecht", "Noord-Brabant", "Zuid-Holland", "Gelderland", "Flevoland", "Overijssel", "Limburg", "Drenthe", "Zeeland", "Friesland", "Groningen"),
values = c(1.75, 0.8, 0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8))
This is what the legend looks like right now
My question is: is it possible to create a little bit of space between the first legend entry ("Nederland totaal") and the other lines?
I want it to look more like this
(I made this in Word for clarification.)
Is there a function to add some space between certain legend items? I hope somebody could help me :))
More detailed:
The dataset I'm working with contains the average yearly housing prices per Dutch province for 2005 through 2019. I created a ggplot; the current plot looks like this now. It has the year on the horizontal axis and the average housing price on the vertical axis. The lines are colored by province, and I added a thicker black line for the overall average of the Netherlands (which I also put in the province vector). I used geom_line, and all of the legend items are factors. I hope this is clear enough; if not, let me know.
Could this be applied to your data?
library(tidyverse)
tib <-
tibble(x = 1:3,
a = 1:3,
b = 1.5:3.5,
c = 2:4) %>%
pivot_longer(cols = a:c, names_to = "id", values_to = "val")
ggplot()+
geom_line(data = filter(tib, id == "a"), aes(x, val, linetype = id))+
geom_line(data = filter(tib, id != "a"), aes(x, val, colour = id))+
labs(linetype = "legend", colour = NULL)
Gives you:
One option is to use stat_summary.
This will add another aesthetic to the graph, and thus another legend, far apart from the other one.
airquality %>%
mutate(Month=factor(Month)) %>%
ggplot(aes(x=Day, y=Temp, col=Month)) +
geom_line() +
stat_summary(aes(lwd="Nederland totaal"), fun=mean, geom="line", col="black") +
theme_classic() +
theme(legend.title = element_blank()) +
guides(lwd = guide_legend(order = 1))
But the benefit of this method is that you don't have to calculate the averages per year and manually add it to your Province variable.
I have been struggling in creating a decent looking scatterplot in R. I wouldn't think it was so difficult.
After some research, it seemed to me that ggplot would have been a choice allowing plenty of formatting. However, I'm struggling in understanding how it works.
I'd like to create a scatterplot of two data series, displaying the points with two different colours, and perhaps different shapes, and a legend with series names.
Here is my attempt, based on this:
year1 <- mpg[which(mpg$year==1999),]
year2 <- mpg[which(mpg$year==2008),]
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy,color="yellow")) +
geom_point(data = year2, aes(x=cty,y=hwy,color="green")) +
xlab('cty') +
ylab('hwy')
Now, this looks almost OK, but with non-matching colors (unless I suddenly became color-blind). Why is that?
Also, how can I add series names and change symbol shapes?
Don't build 2 different dataframes:
df <- mpg[which(mpg$year%in%c(1999,2008)),]
df$year<-as.factor(df$year)
ggplot() +
geom_point(data = df, aes(x=cty,y=hwy,color=year,shape=year)) +
xlab('cty') +
ylab('hwy')+
scale_color_manual(values=c("green","yellow"))+
scale_shape_manual(values=c(2,8))+
guides(colour = guide_legend("Year"),
shape = guide_legend("Year"))
This will work with the way you currently have it set up:
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy), col = "yellow", shape=1) +
geom_point(data = year2, aes(x=cty,y=hwy), col="green", shape=2) +
xlab('cty') +
ylab('hwy')
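As for why the colors in the question did not match: inside aes(), a string such as "yellow" is treated as a data value (a series label), not as a color, so ggplot2 assigns colors from its default palette. If you want to keep the two data frames and still get a legend with series names, one possible sketch is to put the series name inside aes() and assign the actual colors and shapes in manual scales. The series names "1999" and "2008" are simply chosen here as labels.

```r
library(ggplot2)

year1 <- mpg[mpg$year == 1999, ]
year2 <- mpg[mpg$year == 2008, ]

# Inside aes(), the strings are series names; the manual scales map them to colors and shapes
p_series <- ggplot() +
  geom_point(data = year1, aes(x = cty, y = hwy, color = "1999", shape = "1999")) +
  geom_point(data = year2, aes(x = cty, y = hwy, color = "2008", shape = "2008")) +
  scale_color_manual(name = "Year", values = c("1999" = "yellow", "2008" = "green")) +
  scale_shape_manual(name = "Year", values = c("1999" = 1, "2008" = 2)) +
  labs(x = "cty", y = "hwy")
```

Because both scales share the same name and the same labels, ggplot2 merges them into a single legend.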
You want:
library(ggplot2)
ggplot(mpg, aes(cty, hwy, color=as.factor(year)))+geom_point()