Transform a ggplot stacked bar into a pie chart or alternative - r

I am having trouble deciding how to graph the data I have.
It consists of overlapping quantities that represent a population, hence my decision to use a stacked bar.
These represent six population divisions ("groups"), where group 1 and group 2 are the main division. Groups 4 to 6 are subgroups of group 2, and these are in turn subgroups of each other. A simple diagram is below:
Note: groups 1 and 2 together make up the entire population, i.e. group 1 + group 2 = 100%.
I want all of this information in one chart, but I don't know which kind of chart to use or how to implement it.
So far I have the one below, which is wrong because Group 1 is included in the main bar.
require(ggplot2)
require(reshape)
tab <- data.frame(
set=c("XXX","XXX","XXX","XXX","XXX","XXX"),
group=c("1","6","5","4","3","2"),
rate=as.numeric(c(10000,20000,50000,55000,75000,100000))
)
dat <- melt(tab)
dat$time <- factor(dat$group,levels=dat$group)
ggplot(dat,aes(x=set)) +
geom_bar(aes(weight=value,fill=group),position="fill",color="#7F7F7F") +
scale_fill_brewer("Groups", palette="OrRd")
What do you guys suggest to visualize it? I want to use R and ggplot for consistency and uniformity with the other graphs I have made already.

Using facets, you can split your plot into two panels:
# changed value of set for group 1
tab <- data.frame(
set=c("UUU","XXX","XXX","XXX","XXX","XXX"),
group=c("1","6","5","4","3","2"),
rate=as.numeric(c(10000,20000,50000,55000,75000,100000))
)
# explicitly defined id.vars
dat <- melt(tab, id.vars=c('set','group'))
dat$time <- factor(dat$group,levels=dat$group)
# added facet_wrap, in geom_bar aes changed weight to y,
# added stat="identity", changed position="stack"
ggplot(dat,aes(x=set)) +
geom_bar(aes(y=value,fill=group),position="stack", stat="identity", color="#7F7F7F") +
scale_fill_brewer("Groups", palette="OrRd") +
facet_wrap(~set, scales="free_x")

My guess is what you need is a treemap. Please correct me if I misunderstood your question.
Here is a link on Treemapping.
If a treemap is what you need, you can use either the portfolio package or googleVis.
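A treemap works best when the areas do not overlap, so nested groups would first have to be split into exclusive pieces. If you would rather stay in ggplot2 and keep the pie-chart look from the title, a nested-ring (sunburst-style) chart is another option. This technique is swapped in and does not come from either answer above. It is only a sketch, and it rests on a few assumptions. It treats groups 3 to 6 as lying entirely inside group 2, since their counts are smaller than group 2's. It draws each of those groups on its own ring, starting where group 2 starts. It reuses the counts from the question's `tab`. The column names `count`, `ring`, `ymin` and `ymax` are illustrative only.

```r
library(ggplot2)

# Counts from the question's tab; group 1 + group 2 = the whole population
tab <- data.frame(
  group = c("1", "2", "3", "4", "5", "6"),
  count = c(10000, 100000, 75000, 55000, 50000, 20000)
)
total <- sum(tab$count[tab$group %in% c("1", "2")])

# Assumption: groups 3-6 are subsets of group 2, so each one gets its own
# ring (ring 1 = innermost) and starts where group 2 starts
tab$ring <- c(1, 1, 2, 3, 4, 5)
start_2 <- tab$count[tab$group == "1"]
tab$ymin <- ifelse(tab$group == "1", 0, start_2)
tab$ymax <- tab$ymin + tab$count

p <- ggplot(tab) +
  geom_rect(aes(xmin = ring, xmax = ring + 1, ymin = ymin, ymax = ymax, fill = group),
            colour = "#7F7F7F") +
  scale_fill_brewer("Groups", palette = "OrRd") +
  scale_y_continuous(limits = c(0, total)) +  # one full turn = the whole population
  coord_polar(theta = "y") +
  theme_void()
p
```

Each arc's length is that group's share of the whole population. Groups 1 and 2 fill the inner ring, and the outer rings show how much of group 2 each subgroup covers.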

Related

Making a line graph with certain X + Y values expressed differently with lines of 33 user IDs in R

I'm trying to put ActivityDate on the X Axis, and Calories on the Y Axis, relating to how 33 different users ranged in their calorie burnings daily. I'm new to ggplot and visualizations as you can tell, so I'd appreciate the most basic solution that I can understand. Thank you so much.
I tried several iterations of this code, and none of the resulting visualizations were quite right. Here are a couple of my attempts:
##first and foremost:
install.packages("tidyverse")
install.packages("here")
library(tidyverse)
library(here)
Attempt 1 Bar Graph
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=Id, color=ActivityDate))
##Probably not the best for stakeholders, but having the bars a little closer together might help, so I tried to identify the unique IDs. Perhaps the bars are so thin because the IDs are long, non-sequential numbers, so the axis reserves space for all the unused numbers in between.
Attempt 2 Bar Graph
UId <- unique("Id")
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=UId, color=ActivityDate))
##Facepalm, definitely not what I was looking for at all, but that was my effort to solve the above problem.
Attempt 3 Bar Graph
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=ActivityDate, fill=Id)) + theme(axis.text.x = element_text(angle=45))
##The fill doesn't work, and I don't know what "count" on the y-axis refers to in this case; apart from those two issues, this could be useful.
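Both bar-chart problems above probably come from `Id` being numeric. A numeric x goes on a continuous axis, which leaves room for every value between the long IDs. A numeric fill is also continuous, so `geom_bar()` cannot split the bars by it, and recent ggplot2 versions warn that the fill aesthetic was dropped. The "count" on the y-axis is just the number of rows in each bar, because `geom_bar()` counts rows. Below is a minimal sketch with hypothetical data standing in for `trimmed_dactivity`. The `demo` data frame and its IDs are made up. Using `x = factor(Id)` in the same way would give evenly spaced bars for attempt 1.

```r
library(ggplot2)

# Hypothetical stand-in for trimmed_dactivity: 3 made-up, non-sequential IDs x 4 days
demo <- expand.grid(
  Id = c(1000000001, 4000000007, 9000000003),
  ActivityDate = seq(as.Date("2016-04-12"), length.out = 4, by = "day")
)

# factor(Id) makes the IDs discrete, so the fill can split each bar by user.
# geom_bar() counts rows, so the y-axis ("count") is the number of records per date.
p <- ggplot(demo, aes(x = ActivityDate, fill = factor(Id))) +
  geom_bar() +
  labs(fill = "Id", y = "Number of records")
p
```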
##Finally, I switched to a line graph
Attempt 4 Line Graph
ggplot(data=trimmed_dactivity) + geom_line(mapping=aes(x=ActivityDate, y=Calories)) + theme(axis.text.x = element_text(angle=45))
##Now what I get is separate lines going up and down, and what I want is 33 separate lines representing unique Id numbers to travel along the x axis for time, and rise in the y axis for calories. Of course I'm not sure how to do that...
Any help with what I'm missing on this journey here?
what I want is 33 separate lines representing unique Id numbers…
It sounds like you want a spaghetti plot. To make one, map Id to color (or to group if you don’t want each id to be colored differently).
library(ggplot2)
ggplot(fakedata, aes(ActivityDate, Calories)) +
geom_line(aes(color = factor(Id)), show.legend = FALSE)
Example data (create this before running the plot code above):
set.seed(13)
fakedata <- expand.grid(
Id = 1:33,
ActivityDate = seq(as.Date("2016-04-13"), length.out = 10, by = "day")
)
fakedata$Calories <- round(rnorm(330, 2500, 500))
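The `group` aesthetic mentioned above gives one line per user without 33 separate colours. The sketch below also makes a hypothetical assumption: that `ActivityDate` was read in as text such as "4/12/2016". Parsing it with `as.Date()` keeps the x-axis in time order instead of alphabetical order. The `activity` data frame and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: ActivityDate stored as month/day/year text
activity <- data.frame(
  Id = rep(c(101, 102), each = 3),
  ActivityDate = rep(c("4/12/2016", "4/13/2016", "4/14/2016"), times = 2),
  Calories = c(1985, 1797, 1776, 2220, 2100, 2345)
)
# Parse the text into real dates so the x-axis is ordered in time
activity$ActivityDate <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")

# group = Id draws one line per user, all in the same colour
p <- ggplot(activity, aes(ActivityDate, Calories, group = Id)) +
  geom_line(alpha = 0.5)
p
```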

Combine smallest elements in one category 'Other' in a pie chart R

I am trying to plot a pie chart with only 3 segments: the 2 largest elements, and then all the smaller elements combined into one category, "Other". Is there a way to do this directly from a charting package, or do I need to manipulate the data set to combine the smallest values into one? And is there a quick function to do that? Thanks for your answers.
A quick way of doing this is by slightly reformatting your data set. Since you didn't provide an example, I borrowed one from the R Graph Gallery.
Consider a simple data frame with 5 categories (A to E):
library(tidyverse)
data <- data.frame(
group=LETTERS[1:5],
value=c(13,7,9,21,2)
)
We can use rank() and ifelse() to format this data into 3 groups: the two with the largest values and 'Other':
plotting_data <- data %>%
mutate(rank = rank(-value),
group = ifelse(rank <= 2, group, 'Other'))
And then simply use this data set for creating the pie chart:
ggplot(plotting_data, aes(x="", y=value, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0)
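On the "quick function" part: in the tidyverse, `forcats::fct_lump_n()` (forcats 0.5.0 or later) does the lumping. Summing afterwards gives exactly one row per slice. With the `rank()` approach, 'Other' is still made up of several stacked rows that happen to share a colour. Here is a sketch using the same example data. The name `pie_data` is just illustrative.

```r
library(dplyr)
library(forcats)
library(ggplot2)

data <- data.frame(
  group = LETTERS[1:5],
  value = c(13, 7, 9, 21, 2)
)

# fct_lump_n() keeps the n levels with the largest total weight (w = value)
# and relabels the rest as "Other"; count(wt = value) then sums the values,
# leaving one row (one slice) per category
pie_data <- data %>%
  mutate(group = fct_lump_n(group, n = 2, w = value, other_level = "Other")) %>%
  count(group, wt = value, name = "total")

p <- ggplot(pie_data, aes(x = "", y = total, fill = group)) +
  geom_col(width = 1) +
  coord_polar("y", start = 0) +
  theme_void()
p
```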

R: relative frequency categorical data in ggplot2

I'm working in RStudio.
With ggplot2, I'm trying to form a plot where I have frequencies of a categorical variable (number of shares purchased), per category (there are 5 categories). For example, members of category A might buy 1 share more frequently than members of category D.
I now have a count plot. However, because one category is much bigger than the others, you don't get a good idea about the n shares in the other categories.
The code of the count plot is as follows:
#ABS. DISTRIBUTION SHARES/CATEGORY
ggplot(dat, aes(x=Number_share, fill=category)) +
geom_histogram(binwidth=.5, alpha=.5, position="dodge")
This results in this graph: https://imgur.com/a/e4k94
Therefore, I am planning to make a plot where, instead of an absolute count, you have a distribution relative to their category.
I calculated the relative frequencies of each category:
library(MASS)
categories = dat$category
categories.freq = table(categories)
categories.relfreq = categories.freq / nrow(dat)
cbind(categories.relfreq)
categories.relfreq
Beauvent 1 0.002708692
Beauvent 2 0.015020931
E&B 0.037182960
Ecopower 1 0.042107855
Ecopower 2 0.029549372
Ecopower 3 0.873183945
I don't know how to make a plot where the frequency of a share number acquisition is relative to the category, instead of absolute. Can anybody help me with this?
I think what you are looking for is this
ggplot(dat, aes(x=Number_share, fill=category)) +
geom_bar(position="fill")
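This stacks the categories on top of each other, and position="fill" rescales every bar to a height of 1. The y-axis then shows each category's share of the purchases at that number of shares, rather than absolute counts.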
I found that this problem is very similar: Histogram with weights in R
Basically, it's because the default of a histogram is to use counts on the y-axis, while I want densities, as with hist(freq=FALSE), or in the case of ggplot: geom_histogram(aes(y = ..density..)).
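If the goal is the distribution of `Number_share` within each category, one option is to compute the proportions yourself and plot them with `geom_col()`. This is a sketch with made-up data. The `dat` below is hypothetical and only includes three of the categories. Each category's bars sum to 1, so a small category is no longer dwarfed by a large one.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for dat: one row per purchase
set.seed(42)
dat <- data.frame(
  category = sample(c("Beauvent 1", "E&B", "Ecopower 3"), 500,
                    replace = TRUE, prob = c(0.05, 0.15, 0.80)),
  Number_share = sample(1:5, 500, replace = TRUE)
)

# Proportion of each Number_share value within its own category
rel <- dat %>%
  count(category, Number_share) %>%
  group_by(category) %>%
  mutate(rel_freq = n / sum(n)) %>%
  ungroup()

p <- ggplot(rel, aes(x = factor(Number_share), y = rel_freq, fill = category)) +
  geom_col(position = "dodge") +
  labs(x = "Number of shares", y = "Relative frequency within category")
p
```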

Plot multiple histograms in one using ggplot2 in R

I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.
My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).
Date A B C D
01-01-2011 11 0 11 1
08-01-2011 12 0 3 3
15-01-2011 9 0 2 6
I want the Dates on the x axis and the counts for each category plotted in different colors against a common y axis.
I am able to plot just one of the categories at a time, but am not able to find an example like mine.
This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.
ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") +
labs(title = "Number in Category A") +
ylab("Number") +
xlab("Date") +
theme(axis.text.x = element_text(angle = 90))
Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.
I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data.
This is some of the help I looked at:
Creating a histogram with multiple data series using multhist in R
http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/
I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.
Any help would be appreciated.
Thanks in advance!
You're not really plotting histograms, you're just plotting a bar chart that looks kind of like a histogram. I personally think this is a good case for faceting:
library(ggplot2)
library(reshape2) # for melt()
melt_df <- melt(df)
head(melt_df) # so you can see it
ggplot(melt_df, aes(Date,value,fill=Date)) +
geom_col() + # geom_col() uses value as the bar height; a bare geom_bar() would try to count rows
facet_wrap(~ variable)
However, I think in general, that changes over time are much better represented by a line chart:
ggplot(melt_df,aes(Date,value,group=variable,color=variable)) + geom_line()
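Below is a sketch of the dodged bar version asked about in the question, built from the three example rows. It assumes the dates are day-month-year and parses them with `as.Date()`. Left as text, they would sort alphabetically, which puts 01-02-2011 before 08-01-2011. `geom_col()` uses `value` as the bar height, so no binning (and no bin width) is involved.

```r
library(ggplot2)
library(reshape2)  # for melt()

# The three example rows from the question
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9), B = c(0, 0, 0), C = c(11, 3, 2), D = c(1, 3, 6)
)
# Assumption: dates are day-month-year; parse them so they sort chronologically
df$Date <- as.Date(df$Date, format = "%d-%m-%Y")

melt_df <- melt(df, id.vars = "Date")  # long format: Date, variable, value

p <- ggplot(melt_df, aes(x = Date, y = value, fill = variable)) +
  geom_col(position = "dodge") +  # categories side by side, not stacked
  labs(x = "Date", y = "Number", fill = "Category")
p
```

Bar width is set with `width`, which is measured in days on a Date axis. There is no `space` argument like the one in base `barplot()`.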

varying axis values in facet_wrap

I am working with a Danish dataset on immigrants by country of origin and age group. I transformed the data so I can see the top countries of origin for each age group.
I am plotting it using facet_wrap. What I would like to do is, since different age groups come from quite different areas, to show a different set of values for one axis in each facet. For example, those that are between 0 and 10 years old come from countries x,y and z, while those 10-20 years of age come from countries q, r, z and so on.
In my current version, it shows the entire set of values, including countries that are not in the top 10. I would like to show just the top ten countries of origin for each facet, in effect having different axis labels for each. (And, if it is possible, sorting by high to low for each facet).
Here is what I have so far:
library(ggplot2)
library(reshape)
###load and inspect data
load(url('http://dl.dropbox.com/u/7446674/dk_census.rda'))
head(dk_census)
###reshape for plotting--keep just a few age groups
dk_census.m <- melt(dk_census[dk_census$Age %in% c('0-9 år', '10-19 år','20-29 år','30-39 år'),c(1,2,4)])
###get top 10 observations for each age group, store in data frame
top10 <- by(dk_census.m[order(dk_census.m$Age,-dk_census.m$value),], dk_census.m$Age, head, n=10)
top10.df<-do.call("rbind", as.list(top10))
top10.df
###plot
ggplot(data=top10.df, aes(x=as.factor(Country), y=value)) +
geom_bar(stat="identity")+
coord_flip() +
facet_wrap(~Age)+
labs(title="Immigrants By Country by Age",x="Country of Origin",y="Population")
One option (that I actually strongly suspect you won't be happy with) is this:
p <- ggplot(data=top10.df, aes(x=Country, y=value)) +
geom_bar(stat="identity")+
coord_flip() +
facet_wrap(~Age)+
labs(title="Immigrants By Country by Age",x="Country of Origin",y="Population")
library(plyr) # for dlply() and .()
pp <- dlply(.data=top10.df,.(Age),function(x) {x$Country <- reorder(x$Country,x$value); p %+% x})
library(gridExtra)
do.call(grid.arrange,pp)
(Edited to sort each graph.)
Keep in mind that the only reason faceting exists is to plot multiple panels that share a common scale. So when you start asking to facet on some variable, but have the scales be different (oh, and also sort them separately on each panel as well) what you're doing is really no longer faceting. It's just making four different plots and arranging them together.
Using lattice (here I use `latticeExtra` for the ggplot2-like theme), you can set `relation = "free"` between panels. I am also using `abbreviate = TRUE` to shorten long labels.
library(latticeExtra)
barchart(value~ Country|Age,data=top10.df,layout=c(2,2),
horizontal=T,
par.strip.text =list(cex=2),
scales=list(y=list(relation='free',cex=1.5,abbreviate=T,
labels=levels(factor(top10.df$Country)))),
# ,cex=1.5,abbreviate=F),
par.settings = ggplot2like(),axis=axis.grid,
main="Immigrants By Country by Age",
ylab="Country of Origin",
xlab="Population")
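Newer ggplot2 versions (3.3.0 or later, for horizontal `geom_col()` without `coord_flip()`) can get close to this in a single faceted plot. `facet_wrap(scales = "free_y")` lets each panel keep only its own countries, while the population axis stays shared. The tidytext functions `reorder_within()` and `scale_y_reordered()` sort the bars within each panel. This technique differs from the answers above. Because the original data may no longer be downloadable, the `top10.df` below is a hypothetical stand-in with three countries per age group.

```r
library(ggplot2)
library(tidytext)  # reorder_within(), scale_y_reordered()

# Hypothetical stand-in for top10.df
top10.df <- data.frame(
  Age     = rep(c("0-9 år", "10-19 år"), each = 3),
  Country = c("Turkey", "Iraq", "Poland", "Poland", "Somalia", "Turkey"),
  value   = c(500, 300, 200, 800, 400, 100)
)

# Pair each country with its age group and order the pairs by value, so the
# same country can sit at a different position in each panel
top10.df$Country_ord <- reorder_within(top10.df$Country, top10.df$value, top10.df$Age)

p <- ggplot(top10.df, aes(x = value, y = Country_ord)) +
  geom_col() +
  scale_y_reordered() +                  # drop the "___<Age>" suffix from the labels
  facet_wrap(~Age, scales = "free_y") +  # each panel shows only its own countries
  labs(title = "Immigrants By Country by Age", x = "Population", y = "Country of Origin")
p
```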
