weird y-axis dodged bar chart - r

I want to compare the pageviews and sessions in each month.
So I have a data frame like the one below:
month time_interval sessions pageviews
1 < 10s 564622 577686
1 11 ~ 30 36575 84314
1 31~60 46547 127134
1 61~180 106056 408649
1 181~600 125891 839148
1 601~1800 99293 1143019
1 >1801 38534 1014548
2 < 10s 553552 566598
2 11 ~ 30 35440 82011
2 31~60 45558 124921
2 61~180 101529 390493
2 181~600 123027 820094
2 601~1800 98427 1137857
2 >1801 39178 1068057
3 < 10s 690598 706859
3 11 ~ 30 44409 102951
3 31~60 56585 156536
3 61~180 126382 492019
3 181~600 150267 1011472
3 601~1800 118928 1351807
3 >1801 45465 1195310
....
Now I want to draw a dodged bar chart.
Here is my code:
ggplot(data=mydata, aes(x=month,y=pageviews,fill=time_interval)) + geom_bar(stat="identity",position="dodge", colour="black")
I don't know why I get a weird graph like the one below:

Your pageviews data is a factor. It will work if you transform it to a numeric variable. Furthermore, you should reorder the levels of the factor time_interval:
mydata <- transform(mydata,
time_interval = factor(time_interval, levels =
c('< 10s','11 ~ 30','31~60', '61~180','181~600', '601~1800','>1801')),
pageviews = as.numeric(as.character(pageviews)))
The plot:
library(ggplot2)
ggplot(data = mydata, aes(x = month, y = pageviews, fill = time_interval)) +
geom_bar(stat = "identity", position = "dodge", colour = "black")
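To see why the conversion goes through as.character() first: calling as.numeric() directly on a factor returns the internal level codes, not the numbers you see printed. A minimal sketch, using a made-up vector pv (hypothetical, not the asker's data):

```r
# Hypothetical page-view counts that were read in as a factor
pv <- factor(c("577686", "84314", "127134"))

# as.numeric() on a factor returns the integer level codes
codes <- as.numeric(pv)

# Converting through character recovers the actual numbers
values <- as.numeric(as.character(pv))

print(codes)
print(values)
```

If the y-axis of a bar chart shows small integers or oddly ordered labels instead of the real magnitudes, checking str(mydata) for a factor column is a reasonable first step.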

Related

Error in ggplot2 when using both fill and group parameters in geom_bar

There seems to be a problem with R's ggplot2 library when I include both the fill and group parameters in a bar plot (geom_bar()). I've already tried looking for answers for several hours but couldn't find one that would help. This is actually my first post here.
To give a little background, I have a dataframe named smokement (short for smoke and mental health), a categorical variable named smoke100 (smoked in the past 100 days?) with "Yes" and "No", and another categorical variable named misnervs (frequency of feelings of nervousness) with 5 possible values: "All", "Most", "Some", "A little", and "None."
When I run this code, I get this result:
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, fill = smoke100)) +
facet_wrap(~misnervs, nrow = 1)
However, the result I want is to have all grouped bar plots display their respective proportions. By reading a bit of "R for Data Science" book I found out that I need to include y = ..prop.. and group = 1 in aes() to achieve it:
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., group = 1)) +
facet_wrap(~misnervs, nrow = 1)
Finally, I try to use the fill = smoke100 parameter in aes() to display this categorical variable in color, just like I did on the first code. But when I add this fill parameter, it doesn't work! The code runs, but it shows exactly the same output as the second code, as if the fill parameter this time was somehow ignored!
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., group = 1, fill = smoke100)) +
facet_wrap(~misnervs, nrow = 1)
Does anyone have an idea of why this happens, and how to solve it? My end goal is to display each value of smoke100 (the "Yes" and "No" bars) with colors and a legend at the right, just like on the first graph, while having each grouping level of "misnervs" display their respective proportions of smoke100 ("Yes", "No") levels, just like on the second graph.
EDIT:
> dim(smokement)
[1] 35471 6
> str(smokement)
'data.frame': 35471 obs. of 6 variables:
$ smoke100: Factor w/ 2 levels "Yes","No": 1 2 1 2 1 1 1 1 1 1 ...
$ misnervs: Factor w/ 5 levels "All","Most","Some",..: 3 4 5 4 1 5 3 3 5 5 ...
$ mishopls: Factor w/ 5 levels "All","Most","Some",..: 3 5 5 5 5 5 5 5 5 5 ...
$ misrstls: Factor w/ 5 levels "All","Most","Some",..: 3 5 5 3 1 5 3 5 1 5 ...
$ misdeprd: Factor w/ 5 levels "All","Most","Some",..: 5 5 5 5 4 5 5 5 5 5 ...
$ miswtles: Factor w/ 5 levels "All","Most","Some",..: 5 5 5 5 5 5 5 5 5 5 ...
> head(smokement)
smoke100 misnervs mishopls misrstls misdeprd miswtles
1 Yes Some Some Some None None
2 No A little None None None None
3 Yes None None None None None
4 No A little None Some None None
5 Yes All None All A little None
6 Yes None None None None None
As for the output without group = 1
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., fill = smoke100)) +
facet_wrap(~misnervs, nrow = 1)
Besides the solution offered here, the GGally package includes a stat_prop, which introduces a new by aesthetic to specify how the proportions should be calculated:
library(GGally)
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., fill = smoke100, by = misnervs), stat = "prop") +
facet_wrap(~misnervs, nrow = 1)
And just for reference, the same could be achieved without GGally by setting fill = factor(..x..):
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., fill = factor(..x..), group = 1)) +
facet_wrap(~misnervs, nrow = 1)
DATA
misnervs <- c("All", "Most", "Some", "A little", "None")
set.seed(123)
smokement <-
data.frame(
smoke100 = sample(c("Yes", "No"), 100, replace = TRUE),
misnervs = factor(sample(misnervs, 100, replace = TRUE), levels = misnervs)
)
I wasn't able to get what you wanted by tweaking your call to geom_bar*, but I think this gives you what you are looking for. As you didn't provide your input dataset (for understandable reasons), I've used the diamonds tibble in my code. The changes you need to make should be obvious.
*: I'm sure it can be done: I just wasn't able to work it out.
The idea behind my solution is to pre-compute the proportions you want to plot before the call to ggplot.
group_modify takes a grouped tibble and applies the specified function to each group in turn, before returning the modified (grouped) tibble.
diamonds %>%
group_by(cut) %>%
group_modify(
function(.x, .y)
.x %>%
group_by(color) %>%
summarise(Prop=n()/nrow(.))
) %>%
ggplot() +
geom_col(aes(x=color, y=Prop, fill=color)) +
facet_wrap(~cut)
Note the switch from geom_bar to geom_col: geom_bar uses row counts, geom_col uses values in the data.
As a rough-and-ready QC, here's the equivalent of your code that produces the "all grey" plot:
diamonds %>%
ggplot() +
geom_bar(aes(x=color, y=..prop.., fill=color, group=1)) +
facet_wrap(~cut)
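The same pre-computation can also be sketched in base R with table() and prop.table(). This is only an illustration under assumptions: the data frame toy below is made up to stand in for smokement, and the names props and props_long are hypothetical.

```r
# Small made-up data set standing in for smokement (hypothetical values)
set.seed(1)
toy <- data.frame(
  smoke100 = sample(c("Yes", "No"), 50, replace = TRUE),
  misnervs = sample(c("All", "Some", "None"), 50, replace = TRUE)
)

# Cross-tabulate, then divide each column (one misnervs level) by its total
counts <- table(toy$smoke100, toy$misnervs)
props  <- prop.table(counts, margin = 2)

# Long format: one row per (smoke100, misnervs) pair
props_long <- as.data.frame(props, responseName = "prop")
names(props_long)[1:2] <- c("smoke100", "misnervs")
print(props_long)
```

A data frame shaped like props_long could then be passed to geom_col(aes(x = smoke100, y = prop, fill = smoke100)) with facet_wrap(~misnervs), so the fill mapping no longer competes with group = 1.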

R - reshaped data from wide to long format, now want to use created timevar as factor

I am working with longitudinal data and assessing the utilization of a policy over 13 months. In order to get some barplots with the different months on my x-axis, I converted my data from wide format to long format.
So now, my dataset looks like this
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
I thought that after reshaping I could easily use my newly created "month" variable as a factor and plot some graphs. However, it does not work and tells me it's a list or an atomic vector. Transforming it into a factor did not work either, and I desperately need it as a factor.
Does anybody know how to turn it into a factor?
Thank you very much for your help!
EDIT.
The OP's graph code was posted in a comment. Here it is.
library(ggplot2)
ggplot(data, aes(x = hours, y = month)) + geom_density() + labs(title = 'Distribution of hours')
# Loading ggplot2
library(ggplot2)
# Placing example in dataframe
data <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
# Converting month to factor
data$month <- factor(data$month, levels = 1:12, labels = 1:12)
# Plotting grouping by id
ggplot(data, aes(x = month, y = hours, group = id, color = factor(id))) + geom_line()
# Plotting hour density by month
ggplot(data, aes(hours, color = month)) + geom_density()
The problem seems to be in the aes(). geom_density() only needs an x value; a y aesthetic doesn't make sense here. You want the density of the x values, so the vertical axis shows the estimated density, not some other variable from the dataset.
First, read in the data.
Indirekte_long <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
Now graph it.
library(ggplot2)
g <- ggplot(Indirekte_long, aes(hours))
g + geom_density() + labs(title = 'Distribution of hours')
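To make the "only x is needed" point concrete, here is a small base R sketch with the hours from the example data. It uses stats::density(), which I am assuming is close enough to what geom_density() computes for illustration purposes:

```r
hours <- c(13, 16, 20, 0, 0, 10)

# density() takes only the values themselves and returns a grid of
# x positions together with the estimated density at each position
d <- density(hours)

str(d[c("x", "y")])
```

The y component is produced by the estimator, which is why supplying your own y (such as month) has no sensible meaning for a density plot.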

Reasons that ggplot2 legend does not appear [duplicate]

I was attempting (unsuccessfully) to show a legend in my R ggplot2 graph which involves multiple plots. My data frame df and code is as follows:
Individuals Mod.2 Mod.1 Mod.3
1 2 -0.013473145 0.010859793 -0.08914021
2 3 -0.011109863 0.009503278 -0.09049672
3 4 -0.006465788 0.011304668 -0.08869533
4 5 0.010536718 0.009110458 -0.09088954
5 6 0.015501212 0.005929766 -0.09407023
6 7 0.014565584 0.005530390 -0.09446961
7 8 -0.009712516 0.012234843 -0.08776516
8 9 -0.011282278 0.006569570 -0.09343043
9 10 -0.011330579 0.003505439 -0.09649456
str(df)
'data.frame': 9 obs. of 4 variables:
$ Individuals : num 2 3 4 5 6 7 8 9 10
$ Mod.2 : num -0.01347 -0.01111 -0.00647 0.01054 0.0155 ...
$ Mod.1 : num 0.01086 0.0095 0.0113 0.00911 0.00593 ...
$ Mod.3 : num -0.0891 -0.0905 -0.0887 -0.0909 -0.0941 ...
ggplot(df, aes(df$Individuals)) +
geom_point(aes(y=df[,2]), colour="red") + geom_line(aes(y=df[,2]), colour="red") +
geom_point(aes(y=df[,3]), colour="lightgreen") + geom_line(aes(y=df[,3]), colour="lightgreen") +
geom_point(aes(y=df[,4]), colour="darkgreen") + geom_line(aes(y=df[,4]), colour="darkgreen") +
labs(title = "Modules", x = "Number of individuals", y = "Mode")
I looked up the following Stack Overflow threads, as well as Google searches:
Merging ggplot2 legend
ggplot2 legend not showing
`ggplot2` legend not showing label for added series
ggplot2 legend for geom_area/geom_ribbon not showing
ggplot and R: Two variables over time
ggplot legend not showing up in lift chart
Why ggplot2 legend not show in the graph
ggplot legend not showing up in lift chart.
This one was created 4 days ago
This made me realize that making legends appear is a recurring issue, despite the fact that legends usually appear automatically.
My first question is: what can cause a legend not to appear when using ggplot? The second is how to fix those causes. One cause appears to be related to multiple plots and the use of aes(), but I suspect there are other reasons.
colour = XYZ should be inside aes(), not outside:
geom_point(aes(data, colour=XYZ)) #------>legend
geom_point(aes(data),colour=XYZ) #------>no legend
Hope it helps; it took me a long time to figure this out.
You are going about the setting of colour in completely the wrong way. You have set colour to a constant character value in multiple layers, rather than mapping it to the value of a variable in a single layer.
This is largely because your data is not "tidy" (see the following)
head(df)
x a b c
1 1 -0.71149883 2.0886033 0.3468103
2 2 -0.71122304 -2.0777620 -1.0694651
3 3 -0.27155800 0.7772972 0.6080115
4 4 -0.82038851 -1.9212633 -0.8742432
5 5 -0.71397683 1.5796136 -0.1019847
6 6 -0.02283531 -1.2957267 -0.7817367
Instead, you should reshape your data first:
df <- data.frame(x=1:10, a=rnorm(10), b=rnorm(10), c=rnorm(10))
mdf <- reshape2::melt(df, id.vars = "x")
This produces a more suitable format:
head(mdf)
x variable value
1 1 a -0.71149883
2 2 a -0.71122304
3 3 a -0.27155800
4 4 a -0.82038851
5 5 a -0.71397683
6 6 a -0.02283531
This will make it much easier to use with ggplot2 in the intended way, where colour is mapped to the value of a variable:
ggplot(mdf, aes(x = x, y = value, colour = variable)) +
geom_point() +
geom_line()
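If you would rather not depend on reshape2, a similar reshape can be sketched in base R with stack(). The data frame wide below reuses the question's column names with a few rounded, illustrative values; the names long, value and module are my own choices:

```r
# A few rows with the question's column names (values rounded, illustrative)
wide <- data.frame(
  Individuals = c(2, 3, 4),
  Mod.1 = c(0.0109, 0.0095, 0.0113),
  Mod.2 = c(-0.0135, -0.0111, -0.0065),
  Mod.3 = c(-0.0891, -0.0905, -0.0887)
)

# stack() puts the chosen columns into one 'values' column and records
# the source column name in the factor 'ind'
long <- stack(wide[c("Mod.1", "Mod.2", "Mod.3")])

# Repeat the id column once per stacked column so rows stay aligned
long$Individuals <- rep(wide$Individuals, times = 3)
names(long)[1:2] <- c("value", "module")
print(long)
```

With the data in this shape, aes(x = Individuals, y = value, colour = module) maps colour to a variable, which is what makes ggplot2 draw a legend.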
ind = 1:10
my.df <- data.frame(ind, sample(-5:5, 10, replace = TRUE),
sample(-5:5, 10, replace = TRUE), sample(-5:5, 10, replace = TRUE))
df <- data.frame(rep(ind,3) ,c(my.df[,2],my.df[,3],my.df[,4]),
c(rep("mod.1",10),rep("mod.2",10),rep("mod.3",10)))
colnames(df) <- c("ind","value","mod")
Your data frame should look something like this:
ind value mod
1 5 mod.1
2 -5 mod.1
3 3 mod.1
4 2 mod.1
5 -2 mod.1
6 5 mod.1
Then all you have to do is:
ggplot(df, aes(x = ind, y = value, shape = mod, color = mod)) +
geom_line() + geom_point()
I had a similar problem with the title. Nevertheless, I found a way to show it: you can add a layer using
ggtitle ("Name of the title that you want to show")
example:
ggplot(data=mtcars,
mapping = aes(x=hp,
fill = factor(vs)))+
geom_histogram(bins = 9,
position = 'identity',
alpha = 0.8, show.legend = T)+
labs(title = 'Horse power',
fill = 'Vs Motor',
x = 'HP',
y = 'conteo',
subtitle = 'A',
caption = 'B')+
ggtitle("Horse power")

barplots in R comparing data from two columns

I have the following:
> ArkHouse2014 <- read.csv(file="C:/Rwork/ar14.csv", header=TRUE, sep=",")
> ArkHouse2014
DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349
What I would like to do is make a barplot (or series of barplots) to compare the totals in the second and third columns on the y-axis while the x-axis would display the information in the first column.
It seems like this should be very easy to do, but most of the information on making barplots that I can find has you make a table from the data and then barplot that, e.g.,
> table(ArkHouse2014$GOP)
2,936 3,258 3,508 3,573 3,581 3,588 3,638 3,830 3,899 3,951 4,133 4,166 4,319 4,330 4,345 4,391 4,396 4,588
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4,969 5,130 5,177 5,343 5,425 5,466 5,710 5,991 6,070 6,100 6,234 6,490 6,550 6,980 7,847 8,846
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I don't want the counts of how many have each total, I'd like to just represent the quantities visually. I feel pretty stupid not being able to figure this out, so thanks in advance for any advice you have to offer me.
Here's an option using libraries reshape2 and ggplot2:
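I first read your data. The commas are thousands separators, so rather than using dec = "," (which would read 3,951 as 3.951) I strip them and convert to numeric:
df <- read.table(header=TRUE, stringsAsFactors = FALSE, text="DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349")
# Remove the thousands separators, then convert to numeric
df$GOP <- as.numeric(gsub(",", "", df$GOP))
df$DEM <- as.numeric(gsub(",", "", df$DEM))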
Then reshape it to long format:
library(reshape2)
df_long <- melt(df, id.vars = "DISTRICT")
Then create a barplot using ggplot:
library(ggplot2)
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge")
or if you want the bars stacked:
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity")
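Since the question asked about barplot() specifically, here is a base R sketch of the same comparison. It assumes the vote totals use commas as thousands separators, uses only three of the districts, and the names ark, votes and mids are hypothetical:

```r
ark <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
DISTRICT GOP DEM
AR-60 3,951 4,001
AR-61 3,899 4,634
AR-62 5,130 4,319
")

# The commas are thousands separators: strip them before converting
ark$GOP <- as.numeric(gsub(",", "", ark$GOP))
ark$DEM <- as.numeric(gsub(",", "", ark$DEM))

# barplot() wants a matrix: one row per series, one column per district
votes <- t(as.matrix(ark[, c("GOP", "DEM")]))
colnames(votes) <- ark$DISTRICT

pdf(NULL)  # assumption: discard the drawing; remove to see the plot
mids <- barplot(votes, beside = TRUE, legend.text = TRUE,
                xlab = "District", ylab = "Votes")
dev.off()
```

Passing the matrix directly (instead of table(...)) is what makes barplot() draw the quantities themselves rather than counts of how often each total occurs; beside = FALSE would stack the two parties instead.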

Stacked Bar Chart in R using ggplot2

I am trying to create a stacked bar graph using ggplot2. Here is what my data looks like:
Date Full.Page.Ads Total.Pages
11/10/2015 3 24
12/10/2015 10 24
13/10/2015 15 24
This is the code which I am trying to use to visualize the data
library("ggplot2")
library("reshape2")
ns <- read.csv("NS.csv")
ggplot(data=ns, aes(x = Date, y = Total.Pages, fill=Full.Page.Ads)) + geom_bar(stat = "identity")
But for some reason, rather than making a proper stacked bar chart, it keeps making something like this: http://imgur.com/rqxEzjF.jpg. Instead, I would like the main bars to show Total.Pages and, inside those bars, the Full.Page.Ads portion in a different colour.
I am not sure what I am doing wrong here.
This is a sample of what I am aiming for: http://blog.visual.ly/wp-content/uploads/2012/08/StackedPercent.png
You have to melt your data before plotting. You also need to compute the difference between Total.Pages and Full.Page.Ads if you want your graph to represent the proportion of Full.Page.Ads with respect to the total number of pages. Something like:
ns <- data.frame(Date = c("11/10/2015", "12/10/2015", "13/10/2015"),
Full.Page.Ads = c(3, 10, 15),
Total.Pages = c(24, 24, 24))
# Date Full.Page.Ads Total.Pages
# 1 11/10/2015 3 24
# 2 12/10/2015 10 24
# 3 13/10/2015 15 24
ns$not.Full.Page.Ads <- ns$Total.Pages - ns$Full.Page.Ads
ns$Total.Pages <- NULL
ns2 <- melt(ns, id.vars = "Date")
# Date variable value
#1 11/10/2015 Full.Page.Ads 3
#2 12/10/2015 Full.Page.Ads 10
#3 13/10/2015 Full.Page.Ads 15
#4 11/10/2015 not.Full.Page.Ads 21
#5 12/10/2015 not.Full.Page.Ads 14
#6 13/10/2015 not.Full.Page.Ads 9
ggplot(data = ns2, aes(x = Date, y = value, fill = variable)) + geom_bar(stat = "identity")

Resources