I have some data that I cannot share as a reproducible example. Each row is a customer, a year, and the quantity ordered. I need a histogram of these quantities split by year:
ggplot(cust.year %>%
mutate(yearly.cases = ceiling(yearly.cases)), aes(x = yearly.cases)) +
geom_histogram(binwidth = 1, fill = "black") +
xlab("Yearly Cases") +
facet_wrap("year")
The plots output shows the data like this:
I noticed some low values seem to be missing, and when I click Zoom they show up:
They also show up in Shiny, but the saved plot is the version with the missing bars. What's going on? I've tried adjusting xlim to no avail.
EDIT: A more extreme example...
I am currently creating some histograms in R using ggplot that have many bins and a large data set (850 000 elements).
As a result, the vertical lines of each bin fill the area under the histogram with the line colour because of their close proximity. I would like this area to be clear so I can plot another histogram on the same plot.
Ideally, I would like a histogram with the bin lines hidden where they overlap with another bin, so it looks similar to a line plot.
Below is the ggplot code I'm using:
ggplot(df, aes(x=eev)) +
geom_histogram(binwidth = 18,color="black") +
xlim(0,10000) +
scale_y_log10(name="Log of Counts", labels = scales::comma) +
xlab("Incident Energy in eV")
I can't really fiddle around with the bin size too much because I need the definition from the narrow bins.
I've had a look through the ggplot documentation but can't find what I'm after.
Cheers
Edit:
Following MrFlick's advice, I've made a reproducible example:
a<-runif(10000, 0, 10)
b<-seq(0,9.999, by = 1/1000)
var<-data.frame(a,b)
ggplot(var, aes(x=a)) +
geom_histogram(binwidth = 0.3, col = "black", fill = "#ffffff00")
This gives the following output
Histogram with bin lines
However I need the final histogram to look like this
Histogram without overlapping bin lines
I can't use geom_freqpoly as the data needs to be presented as a histogram.
Here is the current histogram for some of the real data
Cheers again.
Also, this is my first time posting on Stack Overflow, so apologies if my post layout is off.
Maybe using hist to generate the values then plotting in ggplot:
library(ggplot2)
set.seed(1)
x = hist(rchisq(1000, df = 4), 100)
df = data.frame(
x = rep(x$breaks, each=2),
y = c(0, rep(x$counts, each = 2), 0))
ggplot(df, aes(x,y)) +
geom_polygon(fill='grey80') +
geom_line(col='red')
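The same hist() idea can be applied to the reproducible data from the question. This is only a sketch, assuming bins 0.3 wide as in the question; the names `breaks`, `h`, and `outline` are illustrative and do not come from the original posts. It uses geom_path() so the outline is drawn in row order.

```r
library(ggplot2)

set.seed(1)
a <- runif(10000, 0, 10)  # same kind of data as in the question

# Bin with base R: breaks every 0.3 units, extended past 10 to cover all values
breaks <- seq(0, 10.2, by = 0.3)
h <- hist(a, breaks = breaks, plot = FALSE)

# Each break appears twice so the outline steps up/down at bin edges
# without drawing the inner vertical lines down to zero
outline <- data.frame(
  x = rep(h$breaks, each = 2),
  y = c(0, rep(h$counts, each = 2), 0)
)

p <- ggplot(outline, aes(x, y)) +
  geom_path(colour = "black") +
  labs(x = "a", y = "count")
p
```

Because only the outline is drawn, a second outline built the same way from another data set can be layered on the same plot without hiding the first.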
Setting a transparent colour such as #ffffff00 for the border (the last two hex digits set the opacity to zero) should do the trick. The fill colour (the interior of the histogram columns) is controlled separately with fill.
Example:
library(ggplot2)
library(dplyr)  # provides the %>% pipe
data.frame(x = rnorm(10000)) %>%
ggplot() +
geom_histogram(aes(x),
fill = 'blue',
binwidth = .025,
col='#ffffff00'
)
Note that while you can increase the border thickness of the columns with the size argument, setting size = 0 does not fully remove the border.
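A quick check of what the eight-digit hex code means, plus a hedged alternative: passing colour = NA is, as far as I know, another way to ask ggplot2 for no border at all. The variable names here (`rgba`, `df`, `p`) are illustrative only.

```r
library(ggplot2)

# Decode the hex colour: the fourth channel is alpha (0 = fully transparent)
rgba <- col2rgb("#ffffff00", alpha = TRUE)
rgba["alpha", 1]

set.seed(1)
df <- data.frame(x = rnorm(10000))

# Assumption: colour = NA suppresses the bar outline entirely
p <- ggplot(df, aes(x)) +
  geom_histogram(binwidth = 0.025, fill = "blue", colour = NA)
p
```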
I am trying to graph some data and will be making multiple box plots. As well as comparing within a boxplot, I want it to be easy to compare boxplots visually when they are next to each other, so I want all the graphs to have the same y limits (0-500), even though the data for some of them only go up to ~400. The code below produces the graph below. Even though I have set the maximum break to 500, the plot is cut off at 440; I would like to force it to extend to 500. How do I go about doing this?
ggplot(twohour, aes(x = Treatment, y = Total, fill = Treatment)) + scale_y_continuous(breaks = round(seq(min(0), max(500), by = 20),1)) + geom_boxplot()
You can force the range of your plot to be between 0 and 500 with the limits argument. Additionally, min(), max() and round() are not necessary in your context:
ggplot(twohour, aes(x = Treatment, y = Total, fill = Treatment)) +
scale_y_continuous(limits=c(0,500),
breaks = seq(0, 500, by = 20)) +
geom_boxplot()
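One caveat worth sketching: limits on a continuous scale turn out-of-range values into NA before the boxplot statistics are computed, whereas coord_cartesian() only zooms the view. With data that stay inside 0-500 the two give the same picture. The data frame below is a made-up stand-in for `twohour`, not the asker's data.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the asker's `twohour` data
twohour <- data.frame(
  Treatment = rep(c("A", "B", "C"), each = 20),
  Total = runif(60, min = 50, max = 440)
)

# Zoom the y axis to 0-500 without dropping any observations
p <- ggplot(twohour, aes(x = Treatment, y = Total, fill = Treatment)) +
  geom_boxplot() +
  scale_y_continuous(breaks = seq(0, 500, by = 20)) +
  coord_cartesian(ylim = c(0, 500))
p
```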
I have a Date column and a Value column. I searched the internet and tried everything I could think of, but the plot does not show me the trend line. I am totally confused about what is happening with my data. I have shared my code below:
ggplot(data = New, aes(x = OrderDate, y = TotalAmountWithGST))+
geom_line(color = "#00AFBB", size = 2) + scale_x_date(date_labels = "%b/%Y")
ggplot(x, aes(x = OrderDate, y = TotalAmountWithGST)) +
geom_line()+
theme_minimal()
I am trying to plot a line graph that shows a monthly trend, but somehow I am getting a graph that looks like a bar graph rather than a line graph.
You need to add a geom_smooth to your ggplot code.
It's hard to replicate a working example without sample data but that should get you on the right path.
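For illustration, here is a minimal sketch of adding geom_smooth() on top of the raw line. The data are simulated, since the asker's `New` data frame is not available; the column names are copied from the question, and the loess method is just one possible choice.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the asker's `New` data: one value per day for a year
New <- data.frame(
  OrderDate = seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
)
New$TotalAmountWithGST <- 1000 + 2 * seq_len(nrow(New)) +
  rnorm(nrow(New), sd = 200)

p <- ggplot(New, aes(x = OrderDate, y = TotalAmountWithGST)) +
  geom_line(colour = "grey60") +                      # raw, noisy series
  geom_smooth(method = "loess", formula = y ~ x,
              se = FALSE, colour = "#00AFBB") +       # smoothed trend
  scale_x_date(date_labels = "%b/%Y")
p
```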
The data can be downloaded here: https://docs.google.com/spreadsheets/d/1McbcquHdsdlEM_yPfBQHeX_CpUcARAm1I3VtASNsY3k/edit?usp=sharing
Here is my code
# load data
raw_data <- read.csv("Sleep vs reaction time (Responses) - Form Responses 1.csv")
library(ggplot2)
#histogram
qplot(x = Age, data = raw_data, xlim = c(13,43), geom = "histogram") + scale_x_continuous()
qplot(x = Age, data = raw_data, xlim = c(13,43), geom = "histogram") + scale_x_discrete()
I would like to draw a histogram by Age.
Age is a discrete value (a whole number), so I used scale_x_discrete to put gaps between the bars. However, the result looks like this,
which has empty space on the left side.
If I use scale_x_continuous(), the space on the left disappears, but so do the gaps between the bars.
I would like to get rid of the space on the left side, from 0 to 13, but keep the gaps between the bars. Please show me how.
Thank you.
My solution:
Thanks to Gregor, this is my solution:
raw_data$Age = factor(raw_data$Age) #convert Age column to factor
qplot(x = Age, data = raw_data, geom = "histogram") + scale_x_discrete()
Result:
You should let the class of your data determine whether the scale is discrete or continuous. ggplot doesn't have built-in support for an integer scale as something different from a numeric scale, so if you want a discrete scale you should convert your age data to factor (if it's not already):
raw_data$Age_factor = factor(raw_data$Age)
Then the defaults will give you what you want if you don't specify xlim.
qplot(x = Age_factor, data = raw_data, geom = "histogram")
This is a bit confusing, but it was actually your xlim = c(13, 43) that was shifting your graph to the right. On a discrete scale, 13 and 43 refer to the 13th and 43rd discrete levels, so by setting those xlim you were forcing your data to the right.
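A sketch of both routes with the ggplot() interface, in case qplot() is unavailable or rejects a histogram on a factor in your ggplot2 version (an assumption on my part). The ages are simulated; the real data come from the asker's CSV.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the Age column of the asker's raw_data
raw_data <- data.frame(Age = sample(13:43, 200, replace = TRUE))

# Discrete route: convert to factor and count each level with geom_bar()
raw_data$Age_factor <- factor(raw_data$Age)
p_discrete <- ggplot(raw_data, aes(x = Age_factor)) +
  geom_bar()

# Continuous route: one bin per year, with a white outline separating bars
p_continuous <- ggplot(raw_data, aes(x = Age)) +
  geom_histogram(binwidth = 1, colour = "white")

p_discrete
p_continuous
```

Neither plot sets xlim, so the axis starts at the first age that occurs in the data.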
I have a data frame with (to simplify) judges, movies, and ratings (ratings are on a 1 star to 5 star scale):
d = data.frame(judge=c("alice","bob","alice"), movie=c("toy story", "inception", "inception"), rating=c(1,3,5))
I want to create a bar chart where the x-axis is the number of stars and the height of each bar is the number of ratings with that star.
If I do
ggplot(d, aes(rating)) + geom_bar()
this works fine, except that the bars aren't centered over each rating and the width of each bar isn't ideal.
If I do
ggplot(d, aes(factor(rating))) + geom_bar()
the order of the number of stars gets messed up on the x-axis. (On my Mac, at least; for some reason, the default ordering works on a Windows machine.) Here's what it looks like:
I tried
ggplot(d, aes(factor(rating, ordered=T, levels=-3:3))) + geom_bar()
but this doesn't seem to help.
How can I get my bar chart to look like the above picture, but with the correct ordering on the x-axis?
I'm not sure your sample data frame is representative of the images you put up. You mentioned your ratings are on a 1-5 scale, but your images show a -3 to 3 scale. With that said, I think this should get you going in the right direction:
Sample data:
d = data.frame(judge=sample(c("alice","bob","tony"), 100, replace = TRUE)
, movie=sample(c("toy story", "inception", "a league of their own"), 100, replace = TRUE)
, rating = sample(1:5, 100, replace = TRUE))
You were closest with this:
ggplot(d, aes(rating)) + geom_bar()
By adjusting the default binwidth in geom_bar we can make the bar widths more appropriate, and treating rating as a factor centers the bars over their labels:
ggplot(d, aes(x = factor(rating))) + geom_bar(binwidth = 1)
If you wanted to incorporate one of the other variables in the chart such as the movie, you can use fill:
ggplot(d, aes(x = factor(rating), fill = factor(movie))) + geom_bar(binwidth = 1)
It may make more sense to put the movies on the x axis and fill with the rating if you have a small number of movies to compare:
ggplot(d, aes(x = factor(movie), fill = factor(rating))) + geom_bar(binwidth = 1)
If this doesn't get you on your way, put up a more representative example of your dataset. I wasn't able to recreate the ordering problems, but that could be due to a difference in the sample data you posted and the data you are analyzing.
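If the ordering problem does appear, one hedged option is to state the level order explicitly instead of relying on the default. The sketch below uses the three-row data frame from the question and the 1-5 scale it describes; `rating_f` is an illustrative name. It also uses width rather than binwidth, because recent ggplot2 releases, as far as I know, no longer accept binwidth in geom_bar().

```r
library(ggplot2)

d <- data.frame(
  judge  = c("alice", "bob", "alice"),
  movie  = c("toy story", "inception", "inception"),
  rating = c(1, 3, 5)
)

# Fix the order (and the full 1-5 range) explicitly via the factor levels
d$rating_f <- factor(d$rating, levels = 1:5)

p <- ggplot(d, aes(x = rating_f)) +
  geom_bar(width = 0.9) +            # width sets the bar width on a discrete axis
  scale_x_discrete(drop = FALSE) +   # keep ratings that never occur on the axis
  xlab("rating")
p
```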
The ggplot website is also a great reference: http://had.co.nz/ggplot2/geom_bar.html