I have created a graph in ggplot2 using zoo to create month bins. However, I want to be able to modify the graph so it looks like a standard ggplot graph. This means that the bins that aren't used are dropped and the bins that are used fill the entire bin space. Here is my code:
library(data.table)
library(ggplot2)
library(scales)
library(zoo)
testset <- data.table(Date=as.Date(c("2013-07-02","2013-08-03","2013-09-04","2013-10-05","2013-11-06","2013-07-03","2013-08-04","2013-09-05","2013-10-06","2013-11-07")),
Action = c("A","B","C","D","E","B","A","B","C","A","B","E","E","C","A"),
rating = runif(30))
The ggplot call is:
ggplot(testset, aes(as.yearmon(Date), fill=Action)) +
geom_bar(position = "dodge") +
scale_x_yearmon()
I'm not sure what I'm missing, but I'd like to find out! Thanks in advance!
To get a "standard-looking" plot, convert the data to a "standard" data type, which is a factor:
ggplot(testset, aes(as.factor(as.yearmon(Date)), fill=Action)) +
geom_bar(position='dodge')
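A minimal sketch of why the factor conversion drops the empty months. It assumes a small made-up data set (toy) with a gap between August and November, and it builds the month labels with base R's format() instead of zoo's as.yearmon() so that it runs without zoo. The plotting step is skipped if ggplot2 is not installed.

```r
# Made-up data with no observations in September or October
toy <- data.frame(
  Date   = as.Date(c("2013-07-02", "2013-07-03", "2013-08-03", "2013-11-06")),
  Action = c("A", "B", "A", "B")
)

# "%Y-%m" labels sort chronologically as plain text,
# so factor() puts the levels in calendar order
toy$Month <- factor(format(toy$Date, "%Y-%m"))

# Only months that occur in the data become levels
print(levels(toy$Month))

# A discrete axis gets one slot per level, so there are no empty bins
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(Month, fill = Action)) +
    geom_bar(position = "dodge")
  print(p)
}
```

With a continuous date axis the missing months would still take up space; on a factor axis they do not exist at all.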
Background
I have a dataframe, df, of athlete injuries:
df <- data.frame(number_of_injuries = c(1,2,3,4,5,6),
number_of_people = c(73,52,43,12,7,2),
stringsAsFactors=FALSE)
The Problem
I'd like to use ggplot2 to make a bar chart or histogram of this simple data using geom_bar or geom_histogram. Important point: I'm pretty novice with ggplot2.
I'd like something where the x-axis shows bins of the number of injuries (number_of_injuries), and the y-axis shows the counts in number_of_people. Like this (from Excel):
What I've tried
I know this is the most trivial dang ggplot issue, but I keep getting errors or weird results, like so:
ggplot(df, aes(number_of_injuries)) +
geom_bar(stat = "count")
Which yields:
I've been on the tidyverse reference website for an hour and I can't crack the code.
This can cause confusion from time to time. If you already have the counts, do not count the data again with geom_bar(stat = "count"): each value of number_of_injuries occurs exactly once in df, so every bar would have a height of 1. You want to plot the values as they are with geom_col:
ggplot(df, aes(x = number_of_injuries, y = number_of_people)) + geom_col()
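To make the difference concrete, here is a small sketch using the df from the question. The per_person data frame is a hypothetical expansion (one row per person) added only for illustration; it is the kind of raw data for which geom_bar() and its counting would be the right choice. The plots are skipped if ggplot2 is not installed.

```r
df <- data.frame(number_of_injuries = c(1, 2, 3, 4, 5, 6),
                 number_of_people   = c(73, 52, 43, 12, 7, 2))

# What stat = "count" computes: the number of rows per x value.
# Each value appears once in df, so every count is 1.
rows_per_value <- table(df$number_of_injuries)
print(rows_per_value)

# Hypothetical raw data: one row per person instead of one row per bin
per_person <- data.frame(
  number_of_injuries = rep(df$number_of_injuries, times = df$number_of_people)
)
# Counting the raw rows recovers the original number_of_people column
print(table(per_person$number_of_injuries))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Pre-counted data: plot the values as they are
  print(ggplot(df, aes(number_of_injuries, number_of_people)) + geom_col())
  # Raw data: let geom_bar() do the counting
  print(ggplot(per_person, aes(number_of_injuries)) + geom_bar())
}
```

Both plots should show the same bars; the only difference is whether the counting happened before or inside ggplot2.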
I would like to create a horizontal bar graph from my data.
The link to my data is here.
The code that I am using
library(ggplot2)
library(forcats) # provides fct_inorder()
ggplot(data=df , aes(x=fct_inorder(WorkSchedule),y=timing, fill=Value)) + geom_col() + coord_flip()
The output of the plot:
How can I change the x-axis to show the time from 04:00 to 03:45 (24h)?
I tried factor(Source) but it does not work.
UPDATE: How can I change the x-axis of this graph?
Many thanks
With the function lvls_reorder() from the forcats package, you can specify the order of the levels of your variable.
library(tidyverse) # forcats is included in tidyverse library
df <- df %>%
mutate(WorkSchedule = lvls_reorder(WorkSchedule, c(3, 2, 4, 5, 1)))
If you convert the variable Source to a factor, you can determine its order in the same way.
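For readers unsure what the index vector means, here is a base R sketch of the same idea: position i of the vector says which of the current levels should become level i. The labels below are made up, since the real WorkSchedule values are not shown in the question.

```r
# Hypothetical labels, for illustration only
f <- factor(c("evening", "morning", "night", "afternoon", "morning"))

# Default level order is alphabetical
print(levels(f))

# New level 1 is old level 3 ("morning"), new level 2 is old level 1, and so on
idx <- c(3, 1, 2, 4)
f2 <- factor(f, levels = levels(f)[idx])

print(levels(f2))
```

Only the order of the levels changes; the values stored in the vector stay the same, which is why the bars move but the data does not.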
I'm reading the book by Hadley Wickham about ggplot, but I have trouble plotting certain weights over time in a bar chart. Here is sample data:
dates <- c("20040101","20050101","20060101")
dates.f <- strptime(dates,format="%Y%m%d")
m <- rbind(c(0.2,0.5,0.15,0.1,0.05),c(0.5,0.1,0.1,0.2,0.1),c(0.2,0.2,0.2,0.2,0.2))
m <- cbind(dates.f,as.data.frame(m))
This data.frame has in the first column the dates and each row the corresponding weights. I would like to plot the weights for each year in a bar chart using the "fill" argument.
I'm able to plot the weights as bars using:
p <- ggplot(m,aes(dates.f))
p+geom_bar()
However, this is not exactly what I want. I would like to see in each bar the contribution of each weight. Moreover, I don't understand why I have the strange format on the x-axis, i.e. why there is "2004-07" and "2005-07" displayed.
Thanks for the help
Hope this is what you are looking for:
ggplot2 requires data in a long format.
require(reshape2)
m_molten <- melt(m, "dates.f")
Plotting itself is done by
ggplot(m_molten, aes(x=dates.f, y=value, fill=variable)) +
geom_bar(stat="identity")
You can add position="dodge" to geom_bar if you want them side by side.
EDIT
If you want yearly breaks only: convert m_molten$dates.f to date.
require(scales)
m_molten$dates.f <- as.Date(m_molten$dates.f)
ggplot(m_molten, aes(x=dates.f, y=value, fill=variable)) +
geom_bar(stat="identity") +
scale_x_date(labels = date_format("%y"), breaks = date_breaks("year"))
P.S.: See http://vita.had.co.nz/papers/tidy-data.pdf for Hadley's philosophy of tidy data.
To create the plot you need, you have to reshape your data from "wide" to "tall". There are many ways of doing this, including the reshape() function in base R (not recommended), reshape2 and tidyr.
In the tidyr package you have two functions to reshape data, gather() and spread().
The function gather() transforms from wide to tall. In this case, you have to gather your columns V1:V5.
Try this:
library("tidyr")
tidy_m <- gather(m, var, value, V1:V5)
ggplot(tidy_m,aes(x = dates.f, y=value, fill=var)) +
geom_bar(stat="identity")
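If it is unclear what "wide to tall" does to the table, the following base R sketch builds the same long layout by hand with stack(). Two things here are my own choices rather than part of the answers above: the dates are parsed with as.Date() instead of strptime(), and the objects are named wide and long.

```r
dates <- as.Date(c("20040101", "20050101", "20060101"), format = "%Y%m%d")
w <- rbind(c(0.2, 0.5, 0.15, 0.1, 0.05),
           c(0.5, 0.1, 0.1, 0.2, 0.1),
           c(0.2, 0.2, 0.2, 0.2, 0.2))

# Wide layout: one row per date, one column (V1..V5) per weight
wide <- data.frame(dates.f = dates, as.data.frame(w))

# stack() puts the five weight columns under each other,
# so the dates have to be repeated once per stacked column
long <- data.frame(dates.f = rep(wide$dates.f, times = 5),
                   stack(wide[, -1]))
names(long) <- c("dates.f", "value", "var")

print(head(long))

# tidyr equivalent (not run here):
# pivot_longer(wide, V1:V5, names_to = "var", values_to = "value")
```

The long table has one row per date and weight, which is the shape that fill = var needs.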
I made a grouped barchart in R using the ggplot package. I used the following code:
ggplot(completedDF,aes(year,value,fill=variable)) + geom_bar(position=position_dodge(),stat="identity")
And the graph looks like this:
The problem is that I want the 1999-2008 data to be at the end.
Is there any way to move it?
Thanks any help appreciated.
ggplot will follow the order of the levels in a factor. If you didn't set the order of the levels yourself, they are sorted alphabetically.
If you want your "1999-2008" level to be at the end, just reorder your factor using
completed$year <- factor(x=completed$year,
levels=c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))
For example :
library(ggplot2)
# Create a sample data set
set.seed(2014)
years_labels <- c( "1999-2008","1999-2002", "2002-2005", "2005-2008")
variable_labels <- c("pointChangeVector", "nonPointChangeVector",
"onRoadChangeVector", "nonRoadChangeVector")
years <- rbinom(n=1000, size=3,prob=0.3)
variables <- rbinom(n=1000, size=3,prob=0.3)
year <- factor(x=years , levels=0:3, labels=years_labels)
variable <- factor(x=variables , levels=0:3, labels=variable_labels)
completed <- data.frame( year, variable)
# Plot
ggplot(completed,aes(x=year, fill=variable)) + geom_bar(position=position_dodge())
# change the order
completed$year <- factor(x=completed$year,
levels=c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))
ggplot(completed,aes(x=year, fill=variable)) + geom_bar(position=position_dodge())
A further benefit of this approach is that your results will also come out in a sensible order in other functions, such as summary or plot.
Does it help?
Yeah, this is a real problem in ggplot. It sorts non-numeric values alphabetically instead of keeping the order in which they appear in the data.
The easiest way to solve it is to add scale_x_discrete in this way:
p <- ggplot(completedDF,aes(year,value,fill=variable))
p <- p + geom_bar(position=position_dodge(),stat="identity")
p <- p + scale_x_discrete(limits = c("1999-2002","2002-2005","2005-2008","1999-2008"))
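A short base R sketch of where the unwanted order comes from, using the four labels from the question. The names wanted and year_f are mine.

```r
year <- c("1999-2008", "1999-2002", "2002-2005", "2005-2008")

# Default: levels are the sorted unique values, so "1999-2008"
# lands in second place, right after "1999-2002"
print(levels(factor(year)))

# Explicit order, with the overall period last
wanted <- c("1999-2002", "2002-2005", "2005-2008", "1999-2008")
year_f <- factor(year, levels = wanted)

# The integer codes show the position each value will take on the axis
print(as.integer(year_f))
```

Setting the levels changes the data itself, so every later plot and summary uses the new order, whereas scale_x_discrete(limits = ...) only affects the one plot it is added to.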
I am interested in plotting the results from the following code which produces a frequency distribution table. I would like to graph the Freq column as a bar with the cum.Freq as a line both sharing the interval column as the x-axis.
library("qdap")
x <- c(1,2,3,2,4,2,5,4,6,7,8,9)
dist_tab(x)
I have been able to get the bar chart built using ggplot, but I want to take it further with the cum.Freq added as a secondary axis. I also want to add the percent and cum.percent values added as data labels. Any help is appreciated.
library("ggplot2")
ggplot(dist_tab(x), aes(x=interval)) + geom_bar(aes(y=Freq), stat="identity")
Not sure if I understand your question. Is this what you are looking for?
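library(reshape2) # provides melt()
df <- dist_tab(x)
# Long format: one row per interval and measure (Freq or cum.Freq)
df.melt <- melt(df, id.vars="interval", measure.vars=c("Freq", "cum.Freq"))
ggplot(df.melt, aes(x=interval, y=value, fill=variable)) +
geom_bar(stat="identity", position="dodge")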
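If the cumulative frequency should be a line on top of the bars rather than a second set of bars, something along these lines might work. This is only a sketch: it rebuilds the frequency table with base R's table() and cumsum() instead of qdap's dist_tab(), reusing the column names mentioned in the question, and it draws both series on one shared y-axis rather than a true secondary axis. The plot is skipped if ggplot2 is not installed.

```r
x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)

# Frequency table built by hand (stand-in for dist_tab(x))
tab <- table(x)
freq <- data.frame(
  interval = factor(names(tab), levels = names(tab)),
  Freq     = as.vector(tab)
)
freq$cum.Freq <- cumsum(freq$Freq)
freq$percent  <- round(100 * freq$Freq / sum(freq$Freq), 2)

print(freq)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(freq, aes(x = interval)) +
    geom_col(aes(y = Freq)) +
    # group = 1 joins the points across the discrete x values
    geom_line(aes(y = cum.Freq, group = 1)) +
    geom_point(aes(y = cum.Freq)) +
    # percent of each bar as a data label just above the bar
    geom_text(aes(y = Freq, label = percent), vjust = -0.5)
  print(p)
}
```

The group = 1 part matters: without it, geom_line() treats every interval as its own group and draws nothing.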