I have made a relatively simple boxplot with ggplot:
ggplot(l8tc.df_17_18,aes(x=landcover,y= tcw_17, group=landcover))+
geom_boxplot()+
geom_boxplot(aes(y= tcw_18),position_dodge(1))
A screenshot to get an idea of the data used:
This is the output:
I want the different boxplots to be next to each other and not in one vertical line. I have looked through all related questions and tried out a couple of options, however I could not find a solution so far.
I am still a ggplot beginner though.
Any ideas?
In this case you should reshape the data into long format by melting it, so that both tcw columns end up in a single value column.
require(reshape2)
require(tidyverse)
# format data
melted_data <- l8tc.df_17_18 %>%
select(landcover, tcw_17, tcw_18) %>%
melt('landcover', variable.name = 'tcw')
# plot
ggplot(melted_data, aes(x = as.factor(landcover), y = value)) + geom_boxplot(aes(fill = tcw))
The dodge should be applied automatically, but if you want to experiment, use geom_boxplot(aes(fill = tcw), position = position_dodge())
https://ggplot2.tidyverse.org/reference/position_dodge.html
You can also write it as a single pipeline, without creating an intermediate object:
l8tc.df_17_18 %>%
select(landcover, tcw_17, tcw_18) %>%
melt('landcover', variable.name = 'tcw') %>%
ggplot(aes(x = as.factor(landcover), y = value)) + geom_boxplot(aes(fill = tcw))
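The same reshape can also be done with tidyr's pivot_longer(), the tidyverse counterpart of melt(). This is only a minimal sketch: the real l8tc.df_17_18 is not shown, so `toy` below is a made-up stand-in with random values, and the names `toy`, `long` and `p` are assumptions for illustration.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for l8tc.df_17_18; values are random, for illustration only
set.seed(1)
toy <- data.frame(
  landcover = rep(1:3, each = 20),
  tcw_17    = rnorm(60),
  tcw_18    = rnorm(60, mean = 0.5)
)

# Wide -> long: one row per (landcover, tcw variable) observation
long <- pivot_longer(toy, c(tcw_17, tcw_18), names_to = "tcw", values_to = "value")

# Mapping fill to the tcw variable makes the boxes dodge side by side
p <- ggplot(long, aes(x = factor(landcover), y = value, fill = tcw)) +
  geom_boxplot()
print(p)
```

Each landcover class then gets two boxes next to each other, one per year.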
I made an upset plot using the ggupset package and added a break to the y axis with scale_y_break from the ggbreak package.
However, when I add scale_y_break, the combination matrix under the bar plot disappears.
Is there a way to combine the combination matrix of the plot made without scale_y_break with the bar plot portion of a plot made with scale_y_break? I can't seem to be able to access the grobs of these plots or use any other workaround. If anyone could help, I would greatly appreciate it!
Example with scale_x_upset and scale_y_break:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
I would like to combine the barplot portion of the plot created with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
with the combination matrix portion of the plot made with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)
Thanks!
I am new to R and have the following example code that I wish to apply for every column in my data.
data(economics, package="ggplot2")
economics$index <- 1:nrow(economics)
loessMod10 <- loess(uempmed ~ index, data=economics, span=0.10)
smoothed10 <- predict(loessMod10)
plot(economics$uempmed, x=economics$date, type="l", main="Loess Smoothing and Prediction", xlab="Date", ylab="Unemployment (Median)")
lines(smoothed10, x=economics$date, col="red")
Could someone please suggest how this would be possible?
It's possible to perform loess smoothing within ggplot.
library(data.table)
library(ggplot2)
df <- economics
gg.melt <- setDT(df) |> melt(id='date', variable.name = 'KPI')
ggplot(gg.melt, aes(x=date, y=value))+
geom_line()+
stat_smooth(method=loess, color='red', size=0.5, se=FALSE, method.args = list(span=0.1))+
facet_wrap(~KPI, scales = 'free_y')
As for combining everything on one plot, I don't see how you would do that, since the y-scales are so different. If the point is to see how the peaks line up, you could stack the facets vertically instead:
ggplot(gg.melt, aes(x=date, y=value))+
geom_line()+
stat_smooth(method=loess, color='red', size=0.5, se=FALSE, method.args = list(span=0.1))+
facet_grid(KPI~., scales = 'free_y')
There is also the dygraphs package which allows creation of dynamic graphics that can be saved to html:
gg.melt[, scaled:=scale(value, center = FALSE, scale=diff(range(value))), by=.(KPI)]
gg.melt[, pred:=predict(loess(scaled~as.integer(date), .SD, span=0.1)), by=.(KPI)]
gg.dt <- dcast(gg.melt, date~KPI, value.var = list('scaled', 'pred'))
library(dygraphs)
dygraph(gg.dt) |>
dyCrosshair(direction = 'vertical') |>
dyRangeSelector()
It's possible to create a dygraph(...) version of the second plot, where the different KPI are in different facets, but you have to use RMarkdown for that.
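If you would rather stay close to the base-graphics code from the question, the loess fit can be repeated for every column with a small loop. This is a rough sketch, not the only way to do it; the names `econ`, `value_cols` and `smoothed` are made up for the example.

```r
data(economics, package = "ggplot2")
econ <- as.data.frame(economics)
econ$index <- seq_len(nrow(econ))

# Columns to smooth: everything except the date and the helper index
value_cols <- setdiff(names(econ), c("date", "index"))

# Fit one loess model per column and collect the fitted values in a matrix
smoothed <- sapply(value_cols, function(col) {
  fit <- loess(reformulate("index", response = col), data = econ, span = 0.10)
  predict(fit)
})

# One base-graphics panel per column: raw series in black, loess fit in red
op <- par(mfrow = c(length(value_cols), 1), mar = c(2, 4, 2, 1))
for (col in value_cols) {
  plot(econ$date, econ[[col]], type = "l", main = col, xlab = "Date", ylab = col)
  lines(econ$date, smoothed[, col], col = "red")
}
par(op)
```

`smoothed` ends up with one column of fitted values per original column, so it can also be reused outside of plotting.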
You can reshape your data from wide to long, keeping date as the id column, and then use facet_wrap. Maybe you want something like this:
library(ggplot2)
library(reshape2)
library(dplyr)
economics %>%
melt(., "date") %>%
ggplot(., aes(date, value)) +
geom_line() +
facet_wrap(~variable, scales = "free")
Output:
Comment: All plots in one graph
If you mean all plots in one graph, you can give the variables a color like this:
economics %>%
melt(., "date") %>%
ggplot(., aes(date, value, color = variable)) +
geom_line() +
scale_y_log10()
Output:
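As an alternative to the log scale, ggplot2 ships a long version of this dataset, economics_long, whose value01 column holds each series rescaled to the range 0 to 1. A small sketch of putting all series on one comparable axis (this swaps the log scale for rescaling, so the shapes are comparable but the units are lost):

```r
library(ggplot2)

# economics_long ships with ggplot2; value01 is each series rescaled to [0, 1]
p <- ggplot(economics_long, aes(date, value01, colour = variable)) +
  geom_line()
print(p)
```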
Looking for some help with what I am assuming is a very simple task. From my data below, I want to create a stacked bar graph, with the fill = colnames(df_Consumers)[2,4]. I'm trying to get the x-axis to be df_Consumers$Month, the y-axis as df_Consumers$Referrals with the 2nd and 4th columns being the stacked bar graphs. I hope this makes sense. Apologies in advance if I am too vague. My ggplot code and data are below. Thanks in advance!
ggplot(df_Consumers, aes(x = Month, y = Referrals)) +
geom_col(aes(fill = df_Consumers[2, 4]))
ggplot likes long data frames. I'd suggest the following:
library(tidyverse)
df_Consumers %>%
select(-Referrals) %>%
pivot_longer(c(New.Consumers, No.Fill), names_to = "type", values_to = "value") %>%
ggplot() +
aes(x = Month, y = value, fill = type) +
geom_col()
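Since the original data is not shown, here is a self-contained sketch of the same idea with made-up numbers. The column layout (Month, New.Consumers, Referrals, No.Fill) is taken from the question; the values, and the assumption that Referrals equals New.Consumers plus No.Fill, are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for df_Consumers (the original data is not shown)
df_Consumers <- data.frame(
  Month         = factor(month.abb[1:4], levels = month.abb[1:4]),
  New.Consumers = c(12, 15, 9, 20),
  Referrals     = c(20, 22, 15, 31),
  No.Fill       = c(8, 7, 6, 11)
)

# Wide -> long: the two columns to stack become rows, labelled by `type`
long <- pivot_longer(df_Consumers, c(New.Consumers, No.Fill),
                     names_to = "type", values_to = "value")

# Under the assumption above, the stacked heights add up to Referrals
totals <- tapply(long$value, long$Month, sum)

p <- ggplot(long, aes(x = Month, y = value, fill = type)) +
  geom_col()
print(p)
```

If the assumption holds for the real data, the top of each stacked bar sits at the Referrals value, which is what the question asked for on the y-axis.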
More than a solution, I'd like to understand why something that should be quite easy actually is not.
[I am borrowing part of the code from a different post which touched on the issue but it ended up with a solution I didn't like]
library(ggplot2)
library(xts)
library(dplyr)
library(scales)
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.yearmon(tmp$dt)
tmp$status <- as.factor(tmp$status)
### Not good. Why?
ggplot(tmp, aes(x = yearmon, fill = status)) +
geom_bar() +
scale_x_yearmon()
### Almost good but long-winded and ticks not great
chartData <- tmp %>%
group_by(yearmon, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) +
geom_col() +
scale_x_yearmon()
The first plot is all wrong; the second is almost perfect (ticks on the X axis are not great but I can live with that). Isn't geom_bar() supposed to perform the count job I have to manually perform in the second chart?
FIRST CHART
SECOND CHART
My question is: why is the first chart so poor? There is a warning which is meant to suggest something ("position_stack requires non-overlapping x intervals") but I really fail to understand it.
Thanks.
MY PERSONAL ANSWER
This is what I learned (thanks so much to all of you!):
Even if there is a scale_#_yearmon or scale_#_date, unfortunately ggplot treats those object types as continuous numbers. That makes geom_bar unusable.
geom_histogram might do the trick, but you lose control over relevant parts of the aesthetics.
Bottom line: you need to group/sum before you chart.
Not sure (if you plan to use ggplot2) xts or lubridate are really that useful for what I was trying to achieve. I suspect for any continuous case - date-wise - they will be perfect.
All in, I ended with this which does perfectly what I am after (notice how there is no need for xts or lubridate):
library(ggplot2)
library(dplyr)
library(scales)
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))
tmp$status <- as.factor(tmp$status)
### GOOD
chartData <- tmp %>%
group_by(yearmon, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) +
geom_col() +
scale_x_date(labels = date_format("%h-%y"),
breaks = seq(from = min(chartData$yearmon),
to = max(chartData$yearmon), by = "month"))
FINAL OUTPUT
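The group_by() / summarise() / as.data.frame() step can be shortened with dplyr's count(). A minimal sketch on the same ten rows as above, rebuilt inline so the snippet runs on its own:

```r
library(dplyr)

tmp <- data.frame(
  dt = as.Date(c("2015-12-03", "2015-12-05", "2015-12-05", "2015-11-24",
                 "2015-10-17", "2015-12-18", "2016-06-30", "2016-05-21",
                 "2016-03-31", "2015-12-31")),
  status = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0))
)
# Collapse each date to the first day of its month
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))

# count() is shorthand for group_by() + summarise(n = n()) and returns ungrouped data
chartData <- count(tmp, yearmon, status, name = "count")
chartData
```

The resulting chartData can be passed to the same ggplot() call as before.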
You could also use aes(x=factor(yearmon), ...) as a shortcut fix.
The first plot goes wrong because ggplot2 does not know exactly what a yearmon is. As you can see here, internally it is just a number with labels.
> as.numeric(tmp$yearmon)
[1] 2015.917 2015.917 2015.917 2015.833 2015.750 2015.917 2016.417 2016.333 2016.167 2015.917
So when you plot without the previous aggregation, the bar is spread out. You need to assign appropriate binwidth using geom_histogram() like this:
ggplot(tmp, aes(x = yearmon, fill = status)) +
geom_histogram(binwidth = 1/12) +
scale_x_yearmon()
A binwidth of 1/12 corresponds to one month, because yearmon stores dates as fractional years.
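To see where the 1/12 comes from, it helps to look at the numbers behind a few yearmon values. A small sketch using zoo directly (xts loads zoo, which is where as.yearmon comes from); the variable name `ym` is made up:

```r
library(zoo)

ym <- as.yearmon(as.Date(c("2015-10-17", "2015-11-24", "2015-12-03")))

# yearmon is stored as year + (month - 1) / 12
as.numeric(ym)

# Consecutive months are therefore 1/12 apart,
# which is why binwidth = 1/12 gives one bin per month
diff(as.numeric(ym))
```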
For a plot after aggregation, as @ed_sans suggests, I also prefer lubridate, since I know better how to change the ticks and modify the axis labels with it.
library(lubridate) # provides floor_date() and the months() period helper
chartData <- tmp %>%
mutate(ym = floor_date(dt,"month")) %>%
group_by(ym, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = ym, y = count, fill = status)) +
geom_col() +
scale_x_date(labels = date_format("%Y-%m"),
breaks = as.Date("2015-09-01") +
months(seq(0, 10, by = 2)))
Here is a snapshot of data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each of the columns indicating the change. One graph for each of the columns.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This is not giving the result the way I want. Please help!!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Which will produce:
Edit:
To be extra fancy, e.g. to make the x-axis labels go from "Before" to "After", you can add scale_x_discrete(limits = c("Before", "After")) to the end of the ggplot call.
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives three miniplots within the image for each of the variables.
ggplot(rc2, aes(x=category, y=value)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.vars = "rowN")
ggplot(rest_change,aes(x = rowN, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable)
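For completeness, the same reshape can be written with tidyr's pivot_longer(). This is a sketch under assumptions: the "Before"/"After" row labels are borrowed from the first answer (the question does not say what the two rows mean), and the free y scales are a design choice so that the small POS values stay readable next to sales.

```r
library(tidyr)
library(ggplot2)

rest_change <- data.frame(
  restaurant_change_sales = c(3330.443, 3122.534),
  restaurant_change_labor = c(696.592, 624.841),
  restaurant_change_POS   = c(155.48, 139.27)
)

# Label the two rows; "Before"/"After" is an assumption borrowed from the first answer.
# Setting the factor levels fixes the x-axis order without an extra scale.
rest_change$change <- factor(c("Before", "After"), levels = c("Before", "After"))

# Wide -> long: one row per (change, measure) pair
long <- pivot_longer(rest_change, -change, names_to = "key", values_to = "value")

# One panel per measure, two bars per panel
p <- ggplot(long, aes(x = change, y = value)) +
  geom_col() +
  facet_wrap(~ key, scales = "free_y")
print(p)
```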