Order Stacked Bars Plot R

I've tried to organize by factor levels; I've tried to organize my data, but nothing is working.
I want the stacked bars to be from either 1-5 or 5-1.
Data:
   Scale                  variable value
5  5 - Extremely valuable Q10A     17.8%
10 5 - Extremely valuable Q10B     18.9%
4  4                      Q10A     27.1%
9  4                      Q10B     31.4%
3  3                      Q10A     31.5%
8  3                      Q10B     32.4%
2  2                      Q10A     12.7%
7  2                      Q10B     8.8%
1  1 - No value at all    Q10A     11%
6  1 - No value at all    Q10B     8.6%
Code:
ggplot(breakstablemelt,aes(x=variable, y=value,fill=Scale))+
geom_bar(stat="identity")+
coord_flip()+
labs(title="title",
x="Q10",
y=NULL)
Organizing Data by Scale:
breakstablemelt=breakstablemelt[order(breakstablemelt$Scale,decreasing=T),]
Edit:
Factor Organization:
breakstablemelt$Scale<-factor(breakstablemelt$Scale, levels=breakstable$Scale)
breakstablemelt2=breakstablemelt %>% arrange(desc(Scale))
Graph output:
unordered stacked bar graph

Removed the percent symbols at the end of the Value column, and it fixed everything.
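For anyone hitting the same thing, here is a minimal sketch of that fix using the data from the question. With the trailing % sign the value column is text, so ggplot treats it as a discrete variable instead of as bar lengths. The explicit level order given for Scale is an assumption about the desired 1-5 ordering, not something from the original code.

```r
library(ggplot2)

# Data copied from the question; value is still text because of the "%" suffix
breakstablemelt <- data.frame(
  Scale = rep(c("5 - Extremely valuable", "4", "3", "2", "1 - No value at all"),
              each = 2),
  variable = rep(c("Q10A", "Q10B"), times = 5),
  value = c("17.8%", "18.9%", "27.1%", "31.4%", "31.5%",
            "32.4%", "12.7%", "8.8%", "11%", "8.6%"),
  stringsAsFactors = FALSE
)

# Strip the percent sign and convert to numeric so the bars get real lengths
breakstablemelt$value <- as.numeric(sub("%", "", breakstablemelt$value, fixed = TRUE))

# Assumed ordering: the stacking order follows the factor levels
breakstablemelt$Scale <- factor(
  breakstablemelt$Scale,
  levels = c("1 - No value at all", "2", "3", "4", "5 - Extremely valuable")
)

p <- ggplot(breakstablemelt, aes(x = variable, y = value, fill = Scale)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  coord_flip() +
  labs(title = "title", x = "Q10", y = NULL)
```

Reversing the levels vector (wrap it in rev()) should give the 5-1 order instead.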

Related

selective display of the groups text on a stacked ggplot2

I'm creating several stacked barplots using ggplot. I'm grouping my results by year and I want to sort my data by a factor variable that has many levels (around 30). I want to display my cumulative sums, but there are so many of them that the labels overlap.
My barplot looks OK for categories with big values, but I haven't managed to find a solution for categories that have small values. I tried setting different geom_text arguments. Now I would like to simply exclude the text for those categories from the barplot, but I don't know how.
ggplot(data=pivot, aes(x=YEAR, y=SUM, fill=GROUP))+
geom_bar(stat="identity")+
geom_text(aes(label=round(SUM)), vjust=1.6,
position = position_stack(), size=2.5)+
labs(x = "YEAR", y="Amount sold in EUR")
I think that my graphs look better with text over categories with bigger values so I want to include them in the final results but don't know how to select only a few for display.
My dataframe looks as follows:
> pivot
# A tibble: 86 x 3
# Groups:   value [31]
   value  Year     SUM
 1     1  2011  771.
 2     1  2012  999.
 3     1  2013 1479.
 4     1  2014  512.
 5     1  2015  677.
 6     3  2012    4.07
 7     4  2012    7.92
 8     4  2013    3.97
 9     4  2014   41.2
10     5  2011   12.0
# ... with 76 more rows
I would like to display text on the barplot for the values of SUM for category 1, as they are bigger, but not for categories 3, 4 and 5. In the final result I would be content with displaying text only for categories 1, 24 and 26, but I don't know how to select only them.
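One possible approach, sketched here under assumptions: build the label text in a separate column and leave it empty for every group that should stay unlabelled. The ten rows are the ones printed above; the column names GROUP and YEAR follow the plotting code rather than the printed tibble, and the LABEL column and the keep vector are names made up for this illustration.

```r
library(ggplot2)

# The ten rows shown in the question, with columns named as in the plotting code
pivot <- data.frame(
  GROUP = factor(c(1, 1, 1, 1, 1, 3, 4, 4, 4, 5)),
  YEAR  = c(2011, 2012, 2013, 2014, 2015, 2012, 2012, 2013, 2014, 2011),
  SUM   = c(771, 999, 1479, 512, 677, 4.07, 7.92, 3.97, 41.2, 12.0)
)

# Groups that should get a label (hypothetical helper vector)
keep <- c("1", "24", "26")

# Empty string for every other group, so geom_text draws nothing there
pivot$LABEL <- ifelse(pivot$GROUP %in% keep, as.character(round(pivot$SUM)), "")

p <- ggplot(pivot, aes(x = YEAR, y = SUM, fill = GROUP)) +
  geom_col() +
  geom_text(aes(label = LABEL),
            position = position_stack(vjust = 0.5), size = 2.5) +
  labs(x = "YEAR", y = "Amount sold in EUR")
```

The same idea works with a threshold instead of a list of groups, for example labelling only rows where SUM exceeds some cutoff.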

R: Plot Density Graph for data in tables with respect to Labels in tables

I have data in table form which looks like this in R:
V1 V2
1 19 -1539
2 7 -1507
3 3 -1446
4 7 -1427
5 8 -1401
6 2 -422
7 22 4178
8 5 4277
9 10 4303
10 18 4431
....200 million more lines to go
I would like to plot a density plot of the values in the second column with respect to the labels in the first column (i.e. each label has its own density curve on the same graph). But I don't know how. Any suggestions?
If I understood the question correctly, this would end up somewhat like a density heatmap in the end. (Considering there are 200 million observations total and V1 has fairly considerable range of variation)
For that I would try ggplot and stat_binhex:
df <- read.table(text="V1 V2
1 19 -1539
2 7 -1507
3 3 -1446
4 7 -1427
5 8 -1401
6 2 -422
7 22 4178
8 5 4277
9 10 4303
10 18 4431")
library(ggplot2)
ggplot(data=df,aes(V1,V2)) +
stat_binhex() +
scale_fill_gradient(low="red", high="steelblue") +
scale_y_continuous() +
theme_bw()
stat_binhex should work well with large data and has several parameters that will help with presentation, such as bins and binwidth (see ?stat_binhex).
OK, I figured it out myself.
ggplot(data, aes(x=V2, color=V1)) + geom_density(aes(group=V1))
should do it.
However, there are two things I need to make sure of first for it to run:
V1 is a factor
V2 is a numeric value
The data wasn't read in that way by read.table, so I have to do the following before using ggplot:
data$V1 = as.factor(data$V1)
data$V2 = as.numeric(as.character(data$V2))
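Here is a self-contained sketch of those steps. The data are simulated stand-ins (two labels, 50 values each), not the real table, and V2 is deliberately created as a factor to mimic the import problem.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in data (an assumption, not the asker's file)
data <- data.frame(
  V1 = rep(c(7, 19), each = 50),
  V2 = as.character(round(c(rnorm(50, -1500, 100), rnorm(50, 4200, 100)))),
  stringsAsFactors = TRUE
)

# V1 becomes a factor so each label gets its own curve
data$V1 <- as.factor(data$V1)
# Go through as.character() first: as.numeric() on a factor returns level codes
data$V2 <- as.numeric(as.character(data$V2))

p <- ggplot(data, aes(x = V2, colour = V1)) +
  geom_density()
```

Mapping colour to a factor already groups the data, so the extra aes(group=V1) is optional here.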

How to control colors and breaks in heatmap using ggplot?

I am trying to make a heatmap using ggplot2 package.
I have trouble controlling the colors and breaks on the heatmap.
I have 18 questions, 22 firms and the mean value of each firm's responses on a 1 to 5 scale.
Say I want the value ranges (0-1), (1-2), (2-3), (3-4) and (4-5) to be color coded, either with different colors (blue, green, red, yellow, purple) or on a gradient scale, with NA values shown in black.
In short: how do I choose colors and breaks?
I would also like to fix the order on the axis to "Question1, Question2...Question18".
Likewise for the firms. At the moment I believe the columns being of class "factor" is what causes this problem.
> head(mydf, 20)
Firm Question Value
1 1 Question1 3.6675482217047
2 1 Question2 3.74327628361858
3 1 Question3 <NA>
4 1 Question4 <NA>
5 1 Question5 <NA>
6 1 Question6 <NA>
7 1 Question7 0.352078239608802
8 1 Question8 3.04180471049169
9 1 Question9 3.9559090659924
10 1 Question10 <NA>
11 1 Question11 1
12 1 Question12 4.26591296778731
13 1 Question13 3.95256943635996
14 1 Question14 0.465686274509804
15 1 Question15 2.61764705882353
16 1 Question16 1.83333333333333
17 1 Question17 <NA>
18 1 Question18 0.225490196078431
19 2 Question1 3.85714285714286
20 2 Question2 4
> ggplot(mydf, aes(Question, Firm, fill=Value)) + geom_tile() + theme(axis.text.x = element_text(angle=330, hjust=0))
http://imgur.com/iM1aLXG Link to picture of my current plot.
The root of your problem appears to be that Value is a factor (or character vector) rather than a numeric vector. I infer this from the head() output, where missing values are printed as <NA>; R prints NA that way for factor and character columns, whereas numeric columns show a plain NA. The image you link to is ggplot's default behavior for coloring based on a factor; the default coloration for numeric is much closer to what you want.
You can check whether this is indeed the case with class(mydf$Value). If it is a factor, convert it to numeric with the following:
mydf$Value <-as.numeric(as.character(mydf$Value))
Your plotting code as written will now return a graph which looks like this:
You can play around with the exact visualization using the gradient scale, or add a manual scale.
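As for your other question, reordering that factor is quite simple. Adapted from R-bloggers (the index assumes the levels are currently in alphabetical order, i.e. Question1, Question10, ..., Question18, Question2, ..., Question9):
mydf$Question <- factor(mydf$Question, levels(mydf$Question)[c(1,11:18,2:10)])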
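To get the discrete color bands asked for in the question, one option is to bin Value with cut() and give each bin its own color. This is only a sketch: it uses a few of the rows from the head() output, the Bin column is a name introduced here, and the choice of closed-on-the-right intervals is an assumption.

```r
library(ggplot2)

# A few rows from head(mydf); Value starts as a factor to mimic the problem
mydf <- data.frame(
  Firm = 1,
  Question = paste0("Question", c(1, 2, 3, 7, 8, 11)),
  Value = c("3.6675482217047", "3.74327628361858", NA,
            "0.352078239608802", "3.04180471049169", "1"),
  stringsAsFactors = TRUE
)

mydf$Value <- as.numeric(as.character(mydf$Value))

# Spell out the level order instead of relying on alphabetical sorting
mydf$Question <- factor(mydf$Question, levels = paste0("Question", 1:18))

# Bin into [0,1], (1,2], (2,3], (3,4], (4,5]; NA stays NA
mydf$Bin <- cut(mydf$Value, breaks = 0:5, include.lowest = TRUE)

p <- ggplot(mydf, aes(Question, factor(Firm), fill = Bin)) +
  geom_tile() +
  scale_fill_manual(values = c("blue", "green", "red", "yellow", "purple"),
                    na.value = "black", drop = FALSE) +
  scale_x_discrete(drop = FALSE) +
  theme(axis.text.x = element_text(angle = 330, hjust = 0))
```

For a gradient instead of bands, keep fill=Value and use something like scale_fill_gradient(low = ..., high = ..., na.value = "black").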

R stacked percentage bar plot with percentage of binary factor and labels (with ggplot)

I want to produce a graphic that looks something like this:
My original data set looks something like this:
> bb[sample(nrow(bb), 20), ]
IMG QUANT FIX
25663 1 1 0
7936 2 2 0
23586 3 2 0
23017 2 2 1
31363 1 3 1
7886 2 2 0
23819 3 3 1
29838 2 2 1
8169 2 3 1
9870 2 3 0
31440 2 1 0
35564 3 1 0
24066 1 2 0
12020 3 2 0
6742 3 2 0
6189 2 3 0
26692 2 3 0
1387 3 2 0
31839 2 3 1
28637 3 2 0
So the idea is that the bars display the share of cases where FIX = 1, per factor QUANT and per
factor IMG.
I've aggregated my data set into percentages using plyr
library(plyr)
bb.perc <- ddply(bb,.(QUANT,IMG),summarise,FIX.PROP = sum(FIX) / length(FIX))
It does almost the right thing:
QUANT IMG FIX.PROP
1 1 1 0.52439024
2 1 2 0.19085366
3 1 3 0.13658537
4 2 1 0.20414201
5 2 2 0.53964497
6 2 3 0.09585799
7 3 1 0.29000000
8 3 2 0.13000000
9 3 3 0.40705882
But now if I make a graph, it doesn't account for the FIX==0 cases, i.e. all bars have the same height, namely 100%, which isn't what I want. Note how the individual QUANT subframes don't add up to 100%:
> sum(bb.perc[1:3,]$FIX.PROP)
[1] 0.8518293
> sum(bb.perc[4:6,]$FIX.PROP)
[1] 0.839645
> sum(bb.perc[7:9,]$FIX.PROP)
[1] 0.8270588
The best I could do with R is to display counts:
# Take only the positive samples
bb.pos <- bb[bb$FIX == 1,]
# Plot the counts (percent() comes from the scales package)
library(scales)
ggplot(bb.pos,aes(factor(QUANT),fill=factor(IMG))) + geom_bar() +
scale_y_continuous(labels=percent)
And results in:
This is also not what I want:
The percentage scale is way off. I need a way to pass the 100% point to the
percent function, but I have no idea how.
It lacks the labels.
There are a great deal of similar questions on SO already, but I seem to lack
the sufficient amount of intelligence (or understanding of R) to extrapolate
from them to a solution to my particular problem.
Thanks for any pointers!
EDIT: Sven Hohenstein provided an answer already, but here's how I ended up doing it myself as well:
> ggplot(bb.perc,aes(x=factor(QUANT),y=FIX.PROP,label=paste(round(FIX.PROP*100),
"%"),fill=factor(IMG)))+ geom_bar(stat="identity") + geom_text(position="stack",
aes(ymax=1),vjust=5) + scale_y_continuous(labels = percent)
Using the bb.perc that I defined further up using plyr. This one has the
advantage that the percentages are computed locally per column, and not
globally.
Thanks everyone for the help. The following two questions and their respective
answers helped me greatly in getting it right:
Stacked Bar Graph Labels with ggplot2
Adding labels to ggplot bar chart
What I did wrong initially was pass the position = "fill" parameter to
geom_bar(), which rescales every stack to 100% and so made all the bars the same height!
This is a way to generate the plot:
ggplot(bb[bb$FIX == 1, ],aes(x = factor(QUANT), fill = factor(IMG),
y = (..count..)/sum(..count..))) +
geom_bar() +
stat_bin(geom = "text",
aes(label = paste(round((..count..)/sum(..count..)*100), "%")),
vjust = 5) +
scale_y_continuous(labels = percent)
Change the value of the vjust parameter to adjust the vertical position of the labels.
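For readers on a current ggplot2 release, where the ..count.. notation is deprecated in favour of after_stat() and stat_bin() no longer accepts a discrete x, here is a sketch of the per-column variant built from precomputed proportions. It uses only the 20 sampled rows printed in the question and base R's aggregate() in place of plyr; treat it as one way to do it, not as the method used in the answers above.

```r
library(ggplot2)

# The 20 sampled rows shown in the question
bb <- data.frame(
  IMG   = c(1, 2, 3, 2, 1, 2, 3, 2, 2, 2, 2, 3, 1, 3, 3, 2, 2, 3, 2, 3),
  QUANT = c(1, 2, 2, 2, 3, 2, 3, 2, 3, 3, 1, 1, 2, 2, 2, 3, 3, 2, 3, 2),
  FIX   = c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)
)

# The mean of a 0/1 column is the proportion of FIX == 1 per QUANT/IMG cell
bb.perc <- aggregate(FIX ~ QUANT + IMG, data = bb, FUN = mean)
names(bb.perc)[names(bb.perc) == "FIX"] <- "FIX.PROP"
bb.perc$LABEL <- paste0(round(bb.perc$FIX.PROP * 100), "%")

p <- ggplot(bb.perc, aes(x = factor(QUANT), y = FIX.PROP, fill = factor(IMG))) +
  geom_col() +
  geom_text(aes(label = LABEL), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
```

position_stack(vjust = 0.5) centres each label inside its segment, which avoids hand-tuning vjust.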

How to subset data for additional geoms while using facets in ggplot2?

I want additional 'geoms' to apply only to a subset of the initial data. I would like this subset to be taken from each of the units created by facets=~.
My attempts at subsetting either the data or the plotted variables lead to subsetting of the whole data set rather than of the units created by 'facets=~', and in two different ways (apparently dependent on the sorting of the data).
This difficulty appears with any 'geom' while using 'facets'.
library(ggplot2)
test.data<-data.frame(factor=rep(c("small", "big"), each=9),
x=c(c(1,2,3,3,3,2,1,1,1), 2*c(1,2,3,3,3,2,1,1,1)),
y=c(c(1,1,1,2,3,3,3,2,1), 2*c(1,1,1,2,3,3,3,2,1)))
factor x y
1 small 1 1
2 small 2 1
3 small 3 1
4 small 3 2
5 small 3 3
6 small 2 3
7 small 1 3
8 small 1 2
9 small 1 1
10 big 2 2
11 big 4 2
12 big 6 2
13 big 6 4
14 big 6 6
15 big 4 6
16 big 2 6
17 big 2 4
18 big 2 2
qplot(data=test.data,
x=x,
y=y,
geom="polygon",
facets=~factor)+
geom_polygon(data=test.data[c(2,3,4,5,6,2),],
aes(x=x,
y=y),
fill=I("red"))
qplot(data=test.data,
x=x,
y=y,
geom="polygon",
facets=~factor)+
geom_polygon(aes(x=x[c(2,3,4,5,6,2)],
y=y[c(2,3,4,5,6,2)]),
fill=I("red"))
The answer is to subset the data in a first step.
library(ggplot2)
library(plyr)
test.data<-data.frame(factor=rep(c("small", "big"), each=9),
x=c(c(1,2,3,3,3,2,1,1,1), 2*c(1,2,3,3,3,2,1,1,1)),
y=c(c(1,1,1,2,3,3,3,2,1), 2*c(1,1,1,2,3,3,3,2,1)))
subset.test<-ddply(.data=test.data,
.variables="factor",
function(data){
data[c(2,3,4,5,6,2),]})
qplot(data=test.data,
x=x,
y=y,
geom="polygon",
facets=~factor)+
geom_polygon(data=subset.test,
fill=I("red"))
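The same per-facet subsetting can be sketched without plyr, using split() and lapply() from base R, and with ggplot() plus facet_wrap() in place of qplot(), which is deprecated in recent ggplot2 versions. The row indices are the ones from the question; everything else is just one way to write it.

```r
library(ggplot2)

test.data <- data.frame(
  factor = rep(c("small", "big"), each = 9),
  x = c(c(1, 2, 3, 3, 3, 2, 1, 1, 1), 2 * c(1, 2, 3, 3, 3, 2, 1, 1, 1)),
  y = c(c(1, 1, 1, 2, 3, 3, 3, 2, 1), 2 * c(1, 1, 1, 2, 3, 3, 3, 2, 1))
)

# Take the same rows from each facet group, then stack the pieces back together
pieces <- lapply(split(test.data, test.data$factor),
                 function(d) d[c(2, 3, 4, 5, 6, 2), ])
subset.test <- do.call(rbind, pieces)

p <- ggplot(test.data, aes(x = x, y = y)) +
  geom_polygon() +
  geom_polygon(data = subset.test, fill = "red") +
  facet_wrap(~factor)
```

Because subset.test keeps the factor column, the red polygon is drawn in the matching panel of each facet.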
