I have a dataset with monthly averages of interest rate and years, and I created a dummy variable to indicate the years before 2016 (coded 0) and after that (coded 1). I want to plot the interest rate over time, with one separate line per value of the dummy variable (one for the years before 2016 and one after it). My code is:
p <- ggplot(data = dataset_new,
mapping = aes(x = month(Dates, label = TRUE),
y = int_rate))+
geom_point()+
geom_line(aes(group = factor(dummy),
color = factor(dummy)))
p + theme(legend.background = element_rect(fill="lightblue",
size=0.5, linetype="solid"))
I would like to do two things next:
change the title of the legend from dummy to case study and
change the categories of the legend. What I mean is that now it writes 1 and 0, but I want to write (2017-2020) for the first and (2013-2016) for the second one.
Any help would be appreciated. Thanks in advance!
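To change the legend title (the lines are mapped to the color aesthetic, so the title is set through color, not fill):
+ labs(color = 'Case Study')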
To change the categories, I'd do it in the data:
dataset_new$case_study <- ifelse(dataset_new$dummy == 1, '(2017-2020)', '(2013-2016)')
And then in your ggplot call replace any instances of dummy with case_study.
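Putting both steps together, a minimal sketch might look like the following. The data frame here is a made-up stand-in for dataset_new, and month_num is a hypothetical column used in place of month(Dates, label = TRUE) so that the example does not depend on lubridate.

```r
library(ggplot2)

# Hypothetical stand-in for dataset_new: 12 monthly values for each period
dataset_new <- data.frame(
  month_num = rep(1:12, times = 2),
  int_rate  = c(seq(10, 12.2, length.out = 12), seq(12, 14.2, length.out = 12)),
  dummy     = rep(c(0, 1), each = 12)
)

# Recode the dummy into the labels the legend should show
dataset_new$case_study <- ifelse(dataset_new$dummy == 1, "(2017-2020)", "(2013-2016)")

p <- ggplot(dataset_new, aes(x = month_num, y = int_rate)) +
  geom_point() +
  geom_line(aes(group = case_study, color = case_study)) +
  labs(color = "Case Study")  # the title is keyed by the mapped aesthetic (color)
# print(p) draws the plot
```

Because the labels live in the data, no separate scale call is needed to rename the legend entries.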
I want to overlay 3 geom_bar to make clear an evolution over 3 years.
My data is as follows for each year:
Example : PerfDist2021 (my dataframe for 2021)
Districts  Perf
1          40
2          30
3          60
On the y-axis I have the performance (in %) and on the x-axis the district number (from 1 to 25, plus a 31st).
I made this:
ggplot(data=NULL, aes(Districts, Perf)) +
geom_bar(aes(fill = "2019"), data = PerfDist2019, stat="identity" ,alpha = 0.5, col="red") +
geom_bar(aes(fill = "2020" ), data = PerfDist2020, stat="identity", alpha = 0.5, col="green") +
geom_bar(aes(fill = "2021" ), data = PerfDist2021, stat="identity", alpha = 0.5, col="blue")
First, I can't see all my districts. I don't know how to make them all visible; it looks as if R drops some of them or is imprecise with my x-axis (see picture in link).
Second, I don't know how to change the color of the bars. With col=... I can only change the color of the bar outlines, and the data is not very readable this way.
Third, the colours blend together, so it is sometimes hard to distinguish the three years. I tried several combinations of colors and the result is always the same. Is there a way to avoid this mixing of colors?
Thank you for your help!
PS: Feel free to ask for any clarification!
fill is the aesthetic that controls the color of the bars. color controls colors of lines, in this case, the frames, as you noted. So you want to delete the col = arguments and add scale_fill_manual to associate which color each custom fill name should have.
Also, repeating geom_bar three times isn't ideal: with 12 years you wouldn't want 12 geom_bar calls. Instead, you can rbind your three datasets, add a column specifying which year each row comes from, and then tell ggplot to fill the bars according to that column.
Lastly, as @h45 said, you can change the type of Districts to factor to avoid the missing levels.
PerfDist = rbind(
cbind(PerfDist2019, Year = "2019"),
cbind(PerfDist2020, Year = "2020"),
cbind(PerfDist2021, Year = "2021"))
PerfDist$Districts <- factor(PerfDist$Districts, levels = 1:31)
ggplot(PerfDist, aes(Districts, Perf, fill = Year)) +
geom_bar(stat="identity", position = "identity", alpha = 0.5) +
scale_fill_manual(values = c("red", "green", "blue"))
That's it for the answer. As a note to improve your next questions, please read how to make a great R reproducible example.
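If the overlapping, semi-transparent bars remain hard to tell apart, one alternative (not part of the answer above) is to place the bars side by side with position = "dodge" so that the colours never blend. A minimal sketch, assuming made-up values for the three per-year data frames:

```r
library(ggplot2)

# Hypothetical per-year data frames with the same Districts/Perf columns
PerfDist2019 <- data.frame(Districts = c(1, 2, 31), Perf = c(35, 25, 50))
PerfDist2020 <- data.frame(Districts = c(1, 2, 31), Perf = c(38, 28, 55))
PerfDist2021 <- data.frame(Districts = c(1, 2, 31), Perf = c(40, 30, 60))

PerfDist <- rbind(
  cbind(PerfDist2019, Year = "2019"),
  cbind(PerfDist2020, Year = "2020"),
  cbind(PerfDist2021, Year = "2021")
)
# factor() keeps only the districts that occur, so 1, 2 and 31 each get a slot
PerfDist$Districts <- factor(PerfDist$Districts)

# geom_col() is geom_bar(stat = "identity"); dodging replaces the alpha overlay
p <- ggplot(PerfDist, aes(Districts, Perf, fill = Year)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("2019" = "red", "2020" = "green", "2021" = "blue"))
# print(p) draws the plot
```

Naming the entries of values ties each colour to a specific year instead of relying on the order of the levels.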
I have three vectors and a list of crimes. Each crime represents a row. On each row, each vector identifies the percentage change in the number of incidents of each type from the prior year.
Below is the reproducible example. Unfortunately, the df takes the first value and repeats it down the columns (this is my first sort-of reproducible example).
crime_vec = c('\tSTRONGARM - NO WEAPON', '$500 AND UNDER', 'ABUSE/NEGLECT: CARE FACILITY', 'AGG CRIM')
change15to16vec = as.double(825, -1.56, -66.67, -19.13)
change16to17vec = as.double(8.11, .96, 50, 4.84)
change17to18vec = as.double(-57.50, 1.29, 83.33, 28.72)
df = data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)
df
I need a graph that will take the correct data frame, show the crimes down the y axis and ALL 3 percentage change vectors on the x-axis in a dodged bar. The examples I've seen plot only two vectors. I've tried plot(), geom_bar, geom_col, but can only get one column to graph (occasionally).
Any suggestions for a remedy would help.
Not sure if this is what you are looking for:
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(-crime_vec) %>%
ggplot(aes(x = value, y = crime_vec, fill = as.factor(name))) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
xlab("Percentage Change") +
ylab("Crime") +
labs(fill = "Change from")
To use ggplot2 here, it's necessary to bring your data into a long format. geom_bar should then create your desired plot.
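The repeated first value in the question's data frame comes from as.double(), which converts only its first argument, so each change vector has length 1 and data.frame() recycles it down the column. A minimal sketch of the corrected setup, assuming c() was intended for each vector:

```r
library(tidyr)

crime_vec <- c("\tSTRONGARM - NO WEAPON", "$500 AND UNDER",
               "ABUSE/NEGLECT: CARE FACILITY", "AGG CRIM")
# c() builds a length-4 vector; as.double(825, -1.56, ...) keeps only 825
change15to16vec <- c(825, -1.56, -66.67, -19.13)
change16to17vec <- c(8.11, 0.96, 50, 4.84)
change17to18vec <- c(-57.50, 1.29, 83.33, 28.72)
df <- data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)

# One row per crime/period pair: columns crime_vec, name, value
long <- pivot_longer(df, -crime_vec)
```

The long data frame has the name and value columns that the plotting code above maps to fill and x.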
I tried to generate a line chart with 3 lines, one per year, but I can only generate 1 line with my code. What should I do?
Try this:
Without seeing your data it is guess work on my part.
I suspect YEAR is being treated as a continuous variable and to get distinct colours you need YEAR to be a discrete variable.
ggplot(data = crimessum2)+
geom_line(mapping = aes(x=HOUR, y = Numbers, col = factor(YEAR), group = YEAR))+
xlab("HOUR")+
ylab("Total Paid by Insurance in $$")+
ggtitle(" ")
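Since the original data was not posted, here is a self-contained sketch of the same idea. The crimessum2 values below are invented, and only the column names YEAR, HOUR and Numbers are taken from the code above.

```r
library(ggplot2)

# Hypothetical stand-in for crimessum2: counts per hour for three years
crimessum2 <- data.frame(
  YEAR    = rep(c(2016, 2017, 2018), each = 24),
  HOUR    = rep(0:23, times = 3),
  Numbers = c(100 + 0:23, 120 + 0:23, 140 + 0:23)
)

# factor(YEAR) gives a discrete colour scale; group = YEAR splits the lines
p <- ggplot(crimessum2) +
  geom_line(aes(x = HOUR, y = Numbers, col = factor(YEAR), group = YEAR)) +
  labs(colour = "YEAR")

# Count the distinct line groups ggplot2 will draw for the first layer
n_lines <- length(unique(layer_data(p, 1)$group))
```

Checking n_lines is a quick way to confirm that the grouping produced one line per year before looking at the plot itself.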
I'm trying to add a legend to a plot that I've created using ggplot. I load the data in from two csv files, each of which has two columns of 8 rows (not including the header).
I construct a data frame from each file which include a cumulative total, so the dataframe has three columns of data (bv, bin_count and bin_cumulative), 8 rows in each column and every value is an integer.
The two data sets are then plotted as follows. The display is fine but I can't figure out how to add a legend to the resulting plot as it seems the ggplot object itself should have a data source but I'm not sure how to build one where there are multiple columns with the same name.
library(ggplot2)
i2d <- data.frame(bv=c(0,1,2,3,4,5,6,7), bin_count=c(0,0,0,2,1,2,2,3), bin_cumulative=cumsum(c(0,0,0,2,1,2,2,3)))
i1d <- data.frame(bv=c(0,1,2,3,4,5,6,7), bin_count=c(0,1,1,2,3,2,0,1), bin_cumulative=cumsum(c(0,1,1,2,3,2,0,1)))
c_data_plot <- ggplot() +
geom_line(data = i1d, aes(x=i1d$bv, y=i1d$bin_cumulative), size=2, color="turquoise") +
geom_point(data = i1d, aes(x=i1d$bv, y=i1d$bin_cumulative), color="royalblue1", size=3) +
geom_line(data = i2d, aes(x=i2d$bv, y=i2d$bin_cumulative), size=2, color="tan1") +
geom_point(data = i2d, aes(x=i2d$bv, y=i2d$bin_cumulative), color="royalblue3", size=3) +
scale_x_continuous(name="Brightness", breaks=seq(0,8,1)) +
scale_y_continuous(name="Count", breaks=seq(0,12,1)) +
ggtitle("Combine plot of BV cumulative counts")
c_data_plot
I'm fairly new to R and would much appreciate any help.
Per comments, I've edited the code to reproduce the dataset after it's loaded into the dataframes.
Regarding producing a single data frame, I'd welcome advice on how to achieve that - I'm still struggling with how data frames work.
First, we organize the data by combining i1d and i2d. I've added a column data which stores the name of the original dataset.
# restructure data
i1d$data <- 'i1d'
i2d$data <- 'i2d'
i12d <- rbind.data.frame(i1d, i2d)
Then, we create the plot, using syntax that is more common to ggplot2:
# create plot
ggplot(i12d, aes(x = bv, y = bin_cumulative))+
geom_line(aes(colour = data), size = 2)+
geom_point(colour = 'royalblue', size = 3)+
scale_x_continuous(name="Brightness", breaks=seq(0,8,1)) +
scale_y_continuous(name="Count", breaks=seq(0,12,1)) +
ggtitle("Combine plot of BV cumulative counts")+
theme_bw()
If we specify x and y within the ggplot function, we do not need to keep rewriting them in the various geoms we add to the plot. After the first three lines, I copied and pasted what you had so that the formatting would match your expectation. I also added theme_bw because I think it's more visually appealing. We also specify colour inside aes using a variable (data) from our data.frame; mapping colour this way is what makes ggplot2 draw a legend.
If we want to take this a step further, we can use the scale_colour_manual function to specify the colors attributed to the different values of the data column in the data.frame i12d:
ggplot(i12d, aes(x = bv, y = bin_cumulative))+
geom_line(aes(colour = data), size = 2)+
geom_point(colour = 'royalblue', size = 3)+
scale_x_continuous(name="Brightness", breaks=seq(0,8,1)) +
scale_y_continuous(name="Count", breaks=seq(0,12,1)) +
ggtitle("Combine plot of BV cumulative counts")+
theme_bw()+
scale_colour_manual(values = c('i1d' = 'turquoise',
'i2d' = 'tan1'))
Is it possible to use multiple x-variables for a faceted ggplot boxplot? I am using facets in ggplot to stratify my analysis but would like to have two different classifications of price on the x-axis depending on the colour (E or F). I've made a test example using the diamonds data set where I have two different price classifications but can only figure out how to apply one at a time:
But I'd like to have a plot that considers both price classifications, depending on colour:
I know I could get a similar result using grobs, and probably by assigning the price category conditionally depending on the colour (E or F), but that seems a bit cumbersome. So, for simplicity, I'd like to do this using facets. Is that possible and, if so, how?
dat <- diamonds
# price grouping I
dat$priceI <- cut(diamonds$price,
breaks = c(0,5000,10000,Inf),
labels = c("0-4,999","5,000-9,999",">=10,000"),
right = FALSE)
# price grouping II
dat$priceII <- cut(diamonds$price,
breaks = c(0,1000,5000,10000,Inf),
labels = c("0-999", "1,000-4,999","5,000-9,999",">=10,000"),
right = FALSE)
ggplot(dat[(dat$color=="E" | dat$color=="F") &
(dat$cut=="Fair" | dat$cut=="Good"),],
aes(priceI,depth)) +
geom_boxplot() +
facet_grid(cut ~ color) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))