I realize this is a simple question, but I’m having trouble getting this graph to display right.
I have a dataset like:
pet     pet_counts
dog     22
cat     100
birds   2
I want to make a bar graph that has the x-axis labeled with each animal and the counts along the y-axis. When I specify labs(), it just changes the axis title but not the values below the tick marks.
I want the x-axis to say dog and the y-axis to show a count of 22, for example.
I have tried:
Graph <- ggplot(data = animals, aes(pet_counts)) + geom_bar(stat = "count") + labs(x = "pet")
I think you're looking for geom_col() instead of geom_bar():
library(dplyr)
library(ggplot2)
animals <- tibble(
  pet = c("dog", "cat", "birds"),
  pet_counts = c(22, 100, 2)
)
animals %>%
  ggplot(aes(x = pet, y = pet_counts)) +
  geom_col() +
  labs(
    x = "Pet",
    y = "Count"
  )
The labs() function is optional; it just changes the axis titles to something more readable.
The result:
The difference between geom_col() and geom_bar(), according to the documentation:
geom_bar() makes the height of the bar proportional to the number of cases in each group. If you want the heights of the bars to represent values in the data, use geom_col() instead.
Since you already have pet_counts, you should use geom_col().
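To see why the original attempt did not work, here is a minimal sketch (assuming the three-row data from the question, rebuilt here as a plain data.frame). With stat = "count", ggplot2 tallies how many rows share each x value, which is roughly what table() does in base R:

```r
# Data from the question, rebuilt as a base data.frame
animals <- data.frame(
  pet = c("dog", "cat", "birds"),
  pet_counts = c(22, 100, 2)
)

# Tally of rows per distinct value of pet_counts:
# each of 2, 22 and 100 occurs exactly once
tally <- table(animals$pet_counts)
print(tally)

# The plot itself is only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # geom_col() uses the y values as bar heights directly
  p <- ggplot(animals, aes(x = pet, y = pet_counts)) +
    geom_col()
}
```

So aes(pet_counts) with geom_bar() puts the counts on the x-axis and draws three bars of height 1, instead of one bar per pet.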
Related
I have three vectors and a list of crimes. Each crime represents a row. On each row, each vector identifies the percentage change in the number of incidents of each type from the prior year.
Below is the reproducible example. Unfortunately, the df takes the first value and repeats it down each column (this is my first sorta reproducible example).
crime_vec = c('\tSTRONGARM - NO WEAPON', '$500 AND UNDER', 'ABUSE/NEGLECT: CARE FACILITY', 'AGG CRIM')
change15to16vec = as.double(825, -1.56, -66.67, -19.13)
change16to17vec = as.double(8.11, .96, 50, 4.84)
change17to18vec = as.double(-57.50, 1.29, 83.33, 28.72)
df = data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)
df
I need a graph that will take the correct data frame, show the crimes down the y axis and ALL 3 percentage change vectors on the x-axis in a dodged bar. The examples I've seen plot only two vectors. I've tried plot(), geom_bar, geom_col, but can only get one column to graph (occasionally).
Any suggestions for a remedy would help.
Not sure if this is what you are looking for:
library(tidyr)
library(ggplot2)
df %>%
  pivot_longer(-crime_vec) %>%
  ggplot(aes(x = value, y = crime_vec, fill = as.factor(name))) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  xlab("Percentage Change") +
  ylab("Crime") +
  labs(fill = "Change from")
To plot this with ggplot2, you first need to bring your data into long format. geom_bar(stat = "identity") with position = "dodge" then creates your desired plot.
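The repeated-values problem mentioned in the question looks like it comes from as.double(): it converts only its first argument and ignores the rest, so each "vector" has length 1 and data.frame() recycles it. A minimal sketch of the fix, wrapping the values in c() (the pivot and plot at the end are only built if tidyr and ggplot2 are installed):

```r
# as.double() only uses its first argument, so this has length 1
length(as.double(825, -1.56, -66.67, -19.13))

# Build each column with c() so all four values are kept
crime_vec <- c("\tSTRONGARM - NO WEAPON", "$500 AND UNDER",
               "ABUSE/NEGLECT: CARE FACILITY", "AGG CRIM")
change15to16vec <- c(825, -1.56, -66.67, -19.13)
change16to17vec <- c(8.11, 0.96, 50, 4.84)
change17to18vec <- c(-57.50, 1.29, 83.33, 28.72)

df <- data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)
print(df)

# Optional: reshape to long format and draw the dodged bars
if (requireNamespace("tidyr", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  long <- tidyr::pivot_longer(df, -crime_vec)  # columns: crime_vec, name, value
  p <- ggplot2::ggplot(long, ggplot2::aes(x = value, y = crime_vec, fill = name)) +
    ggplot2::geom_col(position = "dodge")
}
```

With the data frame built this way, each crime has its own three percentage changes and the dodged bars differ per row.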
I have the following data:
I would like to generate a bar plot that shows the frequency of each value of Var1 for each run. I want the x axis to represent each run and the y axis to represent the frequency of each Var1 value. To do that, I wrote the following R script:
df <- read.csv("/home/nasser/Desktop/data.csv")
g <- ggplot(df) +
  geom_bar(aes(Run, Freq, fill = Var1, colour = Var1), position = "stack", stat = "identity")
The result that I got is:
The issue is that the x axis does not show each run separately (the axis should be 1, 2, .., etc) and the legend should show each value of Var1 separately and in a different color. Also, the bars are not clear, since it is difficult to see the frequency of each Var1 value. In other words, the generated plot is not the normal stacked bar like the one shown in this answer.
How to solve that?
You need to convert both variables to factors. Otherwise, R sees them as numerical and not categorical data.
df <- read.csv("/home/nasser/Desktop/data.csv")
g <- ggplot(df) +
  geom_bar(aes(factor(Run), Freq, fill = factor(Var1), colour = factor(Var1)),
           position = "stack", stat = "identity")
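Since the CSV is not shown, here is a small sketch with made-up data (the values below are hypothetical stand-ins for data.csv) showing the same idea with the conversion done once in the data frame rather than inside aes():

```r
# Hypothetical stand-in for data.csv: 3 runs, 2 distinct Var1 values
df <- data.frame(
  Run  = rep(1:3, each = 2),
  Var1 = rep(c(0, 1), times = 3),
  Freq = c(5, 3, 2, 6, 4, 4)
)

# Read from CSV, both columns would be numeric, so ggplot2 treats them as continuous
is.numeric(df$Run)

# Convert to factors so each run gets its own tick and each Var1 its own colour
df$Run  <- factor(df$Run)
df$Var1 <- factor(df$Var1)

# The plot is only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # geom_col() is the same as geom_bar(stat = "identity")
  g <- ggplot(df, aes(Run, Freq, fill = Var1)) +
    geom_col(position = "stack")
}
```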
I have successfully created a boxplot that displays the score of several neighborhoods of a city and have coloured them according to the district they belong to. The result looks like this:
library(ggplot2)
df = read.csv("http://pastebin.com/raw/rpPLwSXn")
ggplot(df, aes(x = neighbourhood, y = score, fill = district)) +
  geom_boxplot() +
  ggtitle("Neighbourhoods' score") +
  labs(x = "Neighbourhoods", y = "Score", fill = "District") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
It looks quite good, except that instead of sorting the neighbourhoods on the x axis alphabetically (neighbourhood column in the data frame), I would like them to be sorted according to the district they belong to (district variable in the data frame).
I've read that I could use factor to relevel the values in the neighbourhood column, but I haven't succeeded with that since the vector length is different (there are fewer districts than neighbourhoods).
I like the facet idea in Ulrik's answer - that will probably be the nicest visualization. To order the factor levels of the neighbourhood column the easiest way is probably like this:
# order the data frame as desired
df = df[order(df$district, df$neighbourhood), ]
# set the neighbourhood levels in the order they occur in the data frame
df$neighbourhood = factor(df$neighbourhood, levels = unique(df$neighbourhood))
After the levels are in the order you want, the axis will follow.
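A tiny sketch of what those two lines do, using made-up neighbourhoods and districts (hypothetical values, not the pastebin data):

```r
# Hypothetical toy data: four neighbourhoods in two districts
df <- data.frame(
  neighbourhood = c("Alder", "Birch", "Cedar", "Dale"),
  district      = c("South", "North", "South", "North"),
  score         = c(3, 5, 4, 2)
)

# Sort rows by district first, then by neighbourhood within each district
df <- df[order(df$district, df$neighbourhood), ]

# Levels follow the row order, so neighbourhoods are grouped by district
df$neighbourhood <- factor(df$neighbourhood, levels = unique(df$neighbourhood))

levels(df$neighbourhood)
# "Birch" "Dale" "Alder" "Cedar"
```

The number of districts does not matter here: the levels are taken from the neighbourhood column itself, only in a different order.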
I would facet on the district, along the lines of facet_wrap(~ district).
See ?facet_grid and ?facet_wrap
This has been something I've been experimenting with to find a fix for a while, but basically I was wondering if there is a quick way to "dodge" lineplots for two different data sets in ggplot2.
My code is currently:
#Example data
id <- c("A","A")
var <- c(1,10)
id_num <- c(1,1)
df1 <- data.frame(id,var,id_num)
id <- c("A","A")
var <- c(1,15)
id_num <- c(0.9,0.9)
df2 <- data.frame(id,var,id_num)
#Attempted plot
dodge <- position_dodge(width=0.5)
p <- ggplot(data = df1, aes(x = var, y = id)) +
  geom_line(aes(colour = "Group 1"), position = "dodge") +
  geom_line(data = df2, aes(x = var, y = id, colour = "Group 2"), position = "dodge") +
  scale_color_manual("", values = c("salmon", "skyblue2"))
p
Which produces:
Here the "Group 2" line is hiding all of the "Group 1" line which is not what I want. Instead, I want the "Group 2" line to be below the "Group 1" line. I've looked around and found this previous post: ggplot2 offset scatterplot points but I can't seem to adapt the code to get two geom_lines to dodge each other when using separate data frames.
I've been converting my y-variables to numeric and slightly offsetting them to get the desired output, but I was wondering if there was a faster/easier way to get the same result using the dodge functionality of ggplot or something else.
My workaround code is simply:
p <- ggplot(data = df1, aes(x = var, y = id_num)) +
  geom_line(aes(colour = "Group 1")) +
  geom_line(data = df2, aes(x = var, y = id_num, colour = "Group 2")) +
  scale_color_manual("", values = c("salmon", "skyblue2")) +
  scale_y_continuous(limits = c(0, 1))
p
Giving me my desired output of:
The numeric approach can be a little cumbersome when I try to expand it to fit my actual data. I have to convert my y-values to factors, change them to numeric and then merge the values onto the second data set, so a quicker way would be preferable. Thanks in advance for your help!
You actually have two issues here:
1. If the two lines are plotted using two layers of geom_line() (because you have two data frames), then each line "does not know" about the other. Therefore, they cannot dodge each other.
2. position_dodge() dodges in the horizontal direction. The standard example is a bar plot, where you place bars next to each other (instead of on top of each other). However, you want to dodge in the vertical direction.
Issue 1 is solved by combining the data frames into one as follows:
library(dplyr)
df_all <- bind_rows(Group1 = df1, Group2 = df2, .id = "group")
df_all
## Source: local data frame [4 x 4]
##
## group id var id_num
## (chr) (fctr) (dbl) (dbl)
## 1 Group1 A 1 1.0
## 2 Group1 A 10 1.0
## 3 Group2 A 1 0.9
## 4 Group2 A 15 0.9
Note how setting .id = "group" lets bind_rows() create a column group with the labels taken from the names that were used together with df1 and df2.
You can then plot both lines with a single geom_line():
library(ggplot2)
ggplot(data = df_all, aes(x = var, y = id, colour = group)) +
  geom_line(position = position_dodge(width = 0.5)) +
  scale_color_manual("", values = c("salmon", "skyblue2"))
I also used position_dodge() to show you issue 2 explicitly. If you look closely, you can see the red line stick out a little on the left side. This is the consequence of the two lines dodging each other (not very successfully) in the horizontal direction, when what you want is a vertical offset.
You can solve issue 2 by exchanging x and y coordinates. In that situation, dodging horizontally is the right thing to do:
ggplot(data = df_all, aes(y = var, x = id, colour = group)) +
  geom_line(position = position_dodge(width = 0.5)) +
  scale_color_manual("", values = c("salmon", "skyblue2"))
The last step is then to use coord_flip() to get the desired plot:
ggplot(data = df_all, aes(y = var, x = id, colour = group)) +
  geom_line(position = position_dodge(width = 0.5)) +
  scale_color_manual("", values = c("salmon", "skyblue2")) +
  coord_flip()
I have a data frame with (to simplify) judges, movies, and ratings (ratings are on a 1 star to 5 star scale):
d = data.frame(judge=c("alice","bob","alice"), movie=c("toy story", "inception", "inception"), rating=c(1,3,5))
I want to create a bar chart where the x-axis is the number of stars and the height of each bar is the number of ratings with that star.
If I do
ggplot(d, aes(rating)) + geom_bar()
this works fine, except that the bars aren't centered over each rating and the width of each bar isn't ideal.
If I do
ggplot(d, aes(factor(rating))) + geom_bar()
the order of the number of stars gets messed up on the x-axis. (On my Mac, at least; for some reason, the default ordering works on a Windows machine.) Here's what it looks like:
I tried
ggplot(d, aes(factor(rating, ordered=T, levels=-3:3))) + geom_bar()
but this doesn't seem to help.
How can I get my bar chart to look like the above picture, but with the correct ordering on the x-axis?
I'm not sure your sample data frame is representative of the images you put up. You mentioned your ratings are on a 1-5 scale, but your images show a -3 to 3 scale. With that said, I think this should get you going in the right direction:
Sample data:
d = data.frame(judge=sample(c("alice","bob","tony"), 100, replace = TRUE)
, movie=sample(c("toy story", "inception", "a league of their own"), 100, replace = TRUE)
, rating = sample(1:5, 100, replace = TRUE))
You were closest with this:
ggplot(d, aes(rating)) + geom_bar()
and by adjusting the default binwidth in geom_bar() we can make the bar widths more appropriate, while treating rating as a factor centers the bars over their labels:
ggplot(d, aes(x = factor(rating))) + geom_bar(binwidth = 1)
If you wanted to incorporate one of the other variables in the chart such as the movie, you can use fill:
ggplot(d, aes(x = factor(rating), fill = factor(movie))) + geom_bar(binwidth = 1)
It may make more sense to put the movies on the x axis and fill with the rating if you have a small number of movies to compare:
ggplot(d, aes(x = factor(movie), fill = factor(rating))) + geom_bar(binwidth = 1)
If this doesn't get you on your way, put up a more representative example of your dataset. I wasn't able to recreate the ordering problems, but that could be due to a difference in the sample data you posted and the data you are analyzing.
The ggplot website is also a great reference: http://had.co.nz/ggplot2/geom_bar.html
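Two things may be worth checking, shown as a rough sketch using the three-row data from the question. First, factor() turns any value that is not in levels into NA, so levels = -3:3 silently drops a rating of 5; giving the levels that match the data (1:5 here) fixes the axis order explicitly. Second, as far as I know, recent ggplot2 versions no longer accept binwidth in geom_bar(), and width is the parameter that controls bar width (the 0.9 below is just an illustrative choice):

```r
# Data from the question
d <- data.frame(
  judge  = c("alice", "bob", "alice"),
  movie  = c("toy story", "inception", "inception"),
  rating = c(1, 3, 5)
)

# Levels that do not cover the data: the rating 5 becomes NA
bad <- factor(d$rating, levels = -3:3)
is.na(bad)

# Levels matching the 1-5 scale: order on the axis is fixed by levels
good <- factor(d$rating, levels = 1:5)
levels(good)

# The plot is only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = factor(rating, levels = 1:5))) +
    geom_bar(width = 0.9) +            # bar width, relative to the spacing between categories
    scale_x_discrete(drop = FALSE) +   # keep empty star levels on the axis
    labs(x = "Stars", y = "Number of ratings")
}
```

Setting the levels yourself also avoids any platform-dependent default ordering, since the axis then follows the order you wrote.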