I am trying to create a pie chart, but the value labels overlap each other.
Sample Data:
City_Area Age Spending
A 0-15 100
A 15-30 400
B 0-15 200
B 15-30 300
Here is my code:
CA = filter(City_Area == 'A') %>% group_by(City_Area,Age,Spending)
ggplot(CA, aes(x="",y = Spending, fill = Age)) + geom_bar(stat='identity')+ coord_polar("y") + theme_void() + geom_text(aes(label = scales::percent(round((..count..)/sum(..count..),2)),y= ((..count..)/sum(..count..))), stat="count",position=position_stack(0.5))
Here is the plot without coord_polar:
And here is the result using Rui Barradas's code:
The data preparation code seems to be wrong and so does the plotting code.
First, prepare the data. The main thing to do is to get rid of the dollar signs. I will do that with sub.
library(dplyr)
library(ggplot2)
CA2 <- CA %>%
mutate(Spending = as.numeric(sub("\\$", "", Spending))) %>%
filter(City_Area == 'A')
In the question there is a group_by line but for this example it is not needed.
Now the plot.
ggplot(CA2, aes(x = "", y = Spending, fill = Age)) +
geom_bar(stat = 'identity') +
coord_polar("y") +
theme_void() +
geom_text(aes(label = scales::percent(Spending/sum(Spending), accuracy = 1)),
position = position_stack(0.5))
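To see why the labels in the question overlapped, it helps to compare what stat = "count" computes with what the labels should show. The following is a rough base-R sketch of that arithmetic, not part of the original answer; the vectors are retyped from the sample data for City_Area A and the variable names are illustrative:

```r
# Spending per age group for City_Area "A", retyped from the sample data
spending <- c("0-15" = 100, "15-30" = 400)

# stat = "count" counts rows, not Spending: each age group has one row
row_counts <- c("0-15" = 1, "15-30" = 1)
count_share <- row_counts / sum(row_counts)          # 0.5 and 0.5

# The share each label should actually show
spending_share <- spending / sum(spending)           # 0.2 and 0.8
labels <- paste0(round(100 * spending_share), "%")

# Segment midpoints when the groups are stacked in the order listed
# (ggplot2 may stack in the opposite order; the scale mismatch is the same)
mid_count    <- cumsum(count_share) - count_share / 2   # stays below 1
mid_spending <- cumsum(spending) - spending / 2          # on the 0..500 axis

print(labels)
print(rbind(mid_count, mid_spending))
```

With count-based positions both labels sit below 1 on an axis that runs to 500, so after coord_polar they land almost on top of each other. Spending-based positions put each label in the middle of its slice.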
Data.
CA <- read.table(text = "
City_Area Age Spending
A 0-15 100$
A 15-30 400$
B 0-15 200$
B 15-30 300$
", header = TRUE)
I want to create a stacked bar chart in R such that it shows the sum of levels of a feature over time. The feature is of type factor, "char", with levels A, B, H, N, P, U, W. Date feature is type date.
Example data from "chart_df":
char date
w 2022-04-09
w 2022-04-07
b 2022-04-06
n 2022-04-05
b 2022-04-03
b 2022-04-03
I'm a total beginner. I've tried y = count(), sum(), and summarize() with no luck. I even tried grouping by month in the hope that it would clean things up, but it didn't help. I've used this as my guide: https://r-graph-gallery.com/136-stacked-area-chart.html
I can't figure out how to count the number of each char for a given date (for example, "b" would have 2 for 2022-04-03). Below is where I'm at so far, but it looks awful:
library(tidyverse)
library(plotly)
library(ggplot2)
library(viridis)
library(hrbrthemes)
p <- chart_df %>%
ggplot( aes(x=date, y = frequency(char), fill=char, text=char)) +
geom_area() +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
theme_ipsum() +
theme(legend.position="top")
# Turn it interactive
p <- ggplotly(p, tooltip="text")
p
I'd like to create a nice, clear and understandable stacked bar chart showing amounts of char for each day over time. Thank you.
One option would be to use stat="count" in geom_area (and drop the y aes):
library(ggplot2)
library(plotly)
library(viridis)
library(hrbrthemes)
chart_df$date <- as.Date(chart_df$date)
p <- ggplot(chart_df, aes(x = date, fill = char, text = char)) +
geom_area(stat = "count") +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position = "none") +
theme_ipsum() +
theme(legend.position = "top")
ggplotly(p, tooltip = "text")
Or as a second option you could compute the counts manually using e.g. dplyr::count:
library(dplyr)
chart_df_agg <- chart_df %>%
count(date, char, name = "count")
p <- ggplot(chart_df_agg, aes(x = date, y = count, fill = char, text = char)) +
geom_area() +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position = "none") +
theme_ipsum() +
theme(legend.position = "top")
ggplotly(p, tooltip = "text")
DATA
chart_df <- data.frame(
stringsAsFactors = FALSE,
char = c("w", "w", "b", "n", "b", "b"),
date = c(
"2022-04-09", "2022-04-07",
"2022-04-06", "2022-04-05", "2022-04-03", "2022-04-03"
)
)
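The counting step behind both options can be checked without plotting. This is a small base-R sketch using the DATA above; table() is swapped in for dplyr::count purely for illustration, and the object names tab and counts_long are hypothetical:

```r
chart_df <- data.frame(
  char = c("w", "w", "b", "n", "b", "b"),
  date = c("2022-04-09", "2022-04-07", "2022-04-06",
           "2022-04-05", "2022-04-03", "2022-04-03"),
  stringsAsFactors = FALSE
)

# Cross-tabulate date by char; each cell is the number of rows for that pair
tab <- table(date = chart_df$date, char = chart_df$char)
print(tab)

# Long format, one row per date/char pair, usable with aes(x = date, y = count)
counts_long <- as.data.frame(tab, responseName = "count",
                             stringsAsFactors = FALSE)
counts_long$date <- as.Date(counts_long$date)

# Show only the pairs that actually occur
print(counts_long[counts_long$count > 0, ])
```

Unlike dplyr::count, table() also keeps the date/char pairs that never occur, with a count of 0.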
Thanks, everyone. The "y = count" tip was super helpful.
I figured it out using the lubridate library (handy for working with dates):
s <- chart_df_agg %>%
ggplot(aes(x= chart_df_agg$`year(chart_df$date)`, y = char_count,
fill=char, text=char)) +
geom_area(size= 0.1, colour="black") +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
theme_ipsum() +
theme(legend.position="top")
# Turn it interactive
s <- ggplotly(s, tooltip="text")
s
I have a data frame that has columns Region, Date, and Deaths, and I've imported the package "maps" and its 50 state map.
All of the examples I've seen ask me to merge() the data with the map. However, when I do this merging, I manage to end up with an object of over 4 million rows.
The daily data is in melted8 and then melted9.
Because of the huge size of the merge() result, the animate() step takes a long time to run; in fact, I cut it short after 10 minutes. I do not know whether my ggplot() call is correct, but the resulting object is also huge (240 MB).
Is there a more reasonably-sized object I could give to ggplot(), and am I giving ggplot() the right instructions?
# a sample
melted8[sample(nrow(melted8), 5), ]
region date deaths
<chr> <int> <dbl>
arizona 214 7.2815030
missouri 287 0.0000000
arkansas 160 0.3313668
mississippi 53 0.0000000
new jersey 300 0.7880939
library(ggplot2)
library(gganimate)
library(maps)
us.map <- map_data("state") #50 state map from library(maps)
melted9 <- merge(us.map, melted8, by="region", all.x=T)
d <- ggplot(melted9) +
geom_polygon(aes(long,lat, group = group), color='white', fill=NA, data=us.map) +
geom_polygon(aes(long,lat, group = group, fill = deaths), color = "white") +
scale_fill_gradient(low = "gray65", high = "red") +
labs(title = "Deaths per Day") +
ease_aes("linear")
a <- animate(d, duration = 30, nframes = nrow(melted9)/50, end_pause = 5)
a
You don't have to merge the dataset with the map data if you use geom_map instead of geom_polygon.
See if this is faster for you:
layer_type.GeomMap <- function(x) 'point' # must run this line first
melted8 %>%
ggplot(aes(fill = deaths, map_id = region)) +
geom_map(map = us.map) +
expand_limits(x = us.map$long, y = us.map$lat) +
coord_fixed() +
scale_fill_gradient(low = "gray65", high = "red") +
theme(legend.position = "bottom") +
labs(title = "Deaths per Day: {closest_state}",
x = "lon", y = "lat") +
transition_states(date)
Dataset used (simulating 7 days of records for each state):
library(dplyr)
set.seed(123)
melted8 <- data.frame(region = unique(us.map$region)) %>%
mutate(date = list(seq(1, 7))) %>%
tidyr::unnest(cols = c(date)) %>%
group_by(region) %>%
mutate(deaths = abs(rnorm(n()))) %>%
ungroup()
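The size of the merged object in the question follows from how merge() pairs rows: every polygon vertex of a state is repeated once per date. A toy illustration in base R, where the two-region "map" and the three days of data are made up for the example:

```r
# Made-up map: 2 regions with 4 polygon vertices each (8 rows)
toy_map <- data.frame(
  region = rep(c("east", "west"), each = 4),
  long   = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat    = c(0, 0, 1, 1, 0, 0, 1, 1)
)

# Made-up daily data: 2 regions x 3 days (6 rows)
toy_data <- data.frame(
  region = rep(c("east", "west"), times = 3),
  date   = rep(1:3, each = 2),
  deaths = c(0.5, 1.2, 0.0, 0.7, 2.1, 0.3)
)

merged <- merge(toy_map, toy_data, by = "region", all.x = TRUE)

# Each of the 8 vertex rows is matched with all 3 days of its region
cat("map rows:", nrow(toy_map),
    "| data rows:", nrow(toy_data),
    "| merged rows:", nrow(merged), "\n")
```

With geom_map the daily data keeps its own size (one row per region and date), and the polygon vertices are stored only once in the map argument.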
This is where I get my dataset and c
board_game_original<- read.csv("https://raw.githubusercontent.com/bryandmartin/STAT302/master/docs/Projects/project1_bgdataviz/board_game_raw.csv")
#tidy up the column of mechanic and category with cSplit function
library(splitstackshape)
mechanic <- board_game$mechanic
board_game_tidy <- cSplit(board_game,splitCols=c("mechanic","category"), sep = ",", direction = "long")
I am trying to make the graph more organized by ordering the bars by their values on the y-axis. I tried using the reorder function, but it still does not work. Does anyone have any suggestions? I am quite new to R and hope to learn more!
library(ggplot2)
average_complexity <- board_game_tidy %>%
filter(yearpublished >= 1950, users_rated >= 25, average_complexity>0 ) %>%
select(average_complexity)
category_complexity_graph <- ggplot(data=board_game_tidy, aes(x = reorder(category, -average_complexity), y = average_complexity, na.rm = TRUE)) +
geom_bar(stat = "identity", na.rm = TRUE, color="white",fill="sky blue") +
ylim(0,5) +
theme_bw() +
ggtitle("Which category of board games has the highest level of average complexity") +
xlab("category of board games") +
ylab("average complexity of the board game") +
theme(axis.text.x = element_text(size=5, angle = 45)) +
theme(plot.title = element_text(hjust = 0.5))
category_complexity_graph
Here's the graph I plot:
"Category" is a categorical variable and "average complexity" is a continuous variable.
I was trying to answer the question "which category has the highest average complexity?" but this graph looks messy and any suggestion of cleaning it up would be appreciated as well! Thank you all
Maybe this is what you are looking for. The issue is not the reordering but the data preparation. Put differently, reordering by the average does not give you a nice plot because you have multiple observations per category and, more importantly, a different number of observations per category. When you draw a bar plot from this dataset, all these observations get stacked, i.e. your plot shows the sum of the average complexities. Hence, to achieve your desired result you first have to summarise your dataset by category. After doing so, your reordering code works and gives you a nice plot.
However, I would suggest flipping the axes, which makes the labels easier to read:
board_game_original<- read.csv("https://raw.githubusercontent.com/bryandmartin/STAT302/master/docs/Projects/project1_bgdataviz/board_game_raw.csv")
#tidy up the column of mechanic and category with cSplit function
library(splitstackshape)
board_game <- board_game_original
mechanic <- board_game$mechanic
board_game_tidy <- cSplit(board_game,splitCols=c("mechanic","category"), sep = ",", direction = "long")
library(ggplot2)
library(dplyr)
# Summarise your dataset
board_game_tidy1 <- board_game_tidy %>%
as_tibble() %>%
filter(yearpublished >= 1950, users_rated >= 25, average_complexity > 0, !is.na(category)) %>%
group_by(category) %>%
summarise(n = n(), average_complexity = mean(average_complexity, na.rm = TRUE))
ggplot(data=board_game_tidy1, aes(x = reorder(category, average_complexity), y = average_complexity, na.rm = TRUE)) +
geom_bar(stat = "identity", na.rm = TRUE, color="white",fill="sky blue") +
ylim(0,5) +
theme_bw() +
ggtitle("Which category of board games has the highest level of average complexity") +
xlab("category of board games") +
ylab("average complexity of the board game") +
#theme(axis.text.x = element_text(size=5, angle = 45)) +
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip()
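The difference between ordering by the mean and the height of the stacked bars can be seen on a toy example. This base-R sketch uses made-up categories and values, not the board game data:

```r
# Made-up data: "A" has four low-complexity games, "B" has one high-complexity game
games <- data.frame(
  category = c("A", "A", "A", "A", "B"),
  average_complexity = c(2, 2, 2, 2, 4)
)

# reorder() sorts the factor levels by the per-category mean (its default FUN)
by_mean <- reorder(games$category, games$average_complexity)
print(levels(by_mean))

# A bar chart with stat = "identity" on the raw rows stacks them,
# so the bar heights are per-category sums, not means
stacked_height <- tapply(games$average_complexity, games$category, sum)
category_mean  <- tapply(games$average_complexity, games$category, mean)
print(rbind(stacked_height, category_mean))
```

Here "B" has the higher mean but the shorter stacked bar, which is why the reordered raw data still looks unordered. After summarising to one row per category, bar height and ordering value coincide.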
I can't find the answer looking in other group bar plot conversations. Each rename (or site name) should add up to 100% but the bars add up to more than that. I am wondering if I have my data set up incorrectly.
I also want to add error bars, but maybe once I get the replicates correct I can figure that out.
testData <- read.csv("composition.csv")
testData$id <- as.factor(testData$rename)
testDataMelt <- reshape2::melt(testData, rename.vars = "rename")
ggplot(testDataMelt,
aes(x = rename, y =value, group = replicate, fill = replicate)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Lake") +
ylab("% of Sediment Mass") +
labs(fill = "") +
scale_fill_grey()
As suggested by @PoGibas, here is an example that summarizes your data before passing it to ggplot.
Because I do not have your data in an easy-to-use format, I'll make some fake data for 3 sites; gravel, sand, silt and clay sum up to 100% for each row, as in your original data.
set.seed(2018)
df <- data.frame(rename = c("HOG", "MAR", "MO BH"),
gravel = sample(20:40, 9),
sand = sample(40:50, 9),
silt = sample(0:10, 9))
df$clay = as.integer(100 - rowSums(df[,2:4]))
Here is a solution with data.table (this package needs far more advertising) for computing the means and standard errors (to be used for error bars).
library(ggplot2)
library(data.table) # for aggregations
# Convert to data.table object and
# calculate the means and standard errors of each variable per site.
setDT(df)
testDataMelt <- melt(df, id.vars = "rename")
testDataMelt_agg <- testDataMelt[, .(mean = mean(value),
se = sd(value) / sqrt(.N)), # standard error = sd / sqrt(n)
by = .(rename, variable)]
# The mean percent of sediments sum up to 100% for each site.
# We are ready to make the graph.
ggplot(testDataMelt_agg,
aes(x = rename, y = mean, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +
# Add error bars (here +/- 1.96 SE)
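geom_errorbar(aes(ymax = mean + 1.96*se,
ymin = mean - 1.96*se),
# match the default dodge width of the bars so the error bars line up
position = position_dodge(width = 0.9), width = 0.25) +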
xlab("Lake") +
ylab("% of Sediment Mass") +
labs(fill = "") +
scale_fill_grey()
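The standard error of a mean is the standard deviation divided by the square root of the number of observations. As a cross-check that does not need data.table, here is a base-R sketch on the same kind of fake data; std_error is a hypothetical helper name, and the random values may differ between R versions:

```r
set.seed(2018)
df <- data.frame(
  rename = rep(c("HOG", "MAR", "MO BH"), times = 3),
  gravel = sample(20:40, 9),
  sand   = sample(40:50, 9),
  silt   = sample(0:10, 9)
)
df$clay <- 100 - rowSums(df[, c("gravel", "sand", "silt")])

# Standard error of the mean: sd / sqrt(n)
std_error <- function(x) sd(x) / sqrt(length(x))

# Per-site mean and standard error of each sediment fraction
site_mean <- aggregate(cbind(gravel, sand, silt, clay) ~ rename,
                       data = df, FUN = mean)
site_se   <- aggregate(cbind(gravel, sand, silt, clay) ~ rename,
                       data = df, FUN = std_error)

# The mean percentages add up to 100 for every site
print(cbind(site_mean, total = rowSums(site_mean[, -1])))
print(site_se)
```

Because every row sums to 100, the per-site means do as well, which is the property the grouped bars should reflect.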
I have the following R code, where I transform the data and then order it by a specific column:
df2 <- df %>%
group_by(V2, news) %>%
tally() %>%
complete(news, fill = list(n = 0)) %>%
mutate(percentage = n / sum(n) * 100)
df22 <- df2[order(df2$news, -df2$percentage),]
I want to apply the ordered data "df22" in ggplot:
ggplot(df22, aes(x = V2, y = percentage, fill = factor(news, labels = c("Read","Otherwise")))) +
geom_bar(stat = "identity", position = "fill", width = .7) +
coord_flip() + guides(fill = guide_legend(title = "Online News")) +
scale_fill_grey(start = .1, end = .6) + xlab("Country") + ylab("Share")
Unfortunately, ggplot still returns a plot without this order:
Does anyone know what is wrong with my code? This is not the same as ordering a bar chart with a single value per bar, as in "Reorder bars in geom_bar ggplot2". I am trying to order the chart by a specific category of a factor. In particular, I want to see the countries with the largest share of Read news first.
Here is the data:
V2 news n percentage
1 United States News Read 1583 1.845139
2 Netherlands News Read 1536 1.790356
3 Germany News Read 1417 1.651650
4 Singapore News Read 1335 1.556071
5 United States Otherwise 581 0.6772114
6 Netherlands Otherwise 350 0.4079587
7 Germany Otherwise 623 0.7261665
8 Singapore Otherwise 635 0.7401536
I used the following R code:
df2 <- df %>%
group_by(V2, news) %>%
tally() %>%
complete(news, fill = list(n = 114)) %>%
mutate(percentage = n / sum(n) * 100)
df2 <- df2[order(df2$news, -df2$percentage),]
df2 <- df2 %>% group_by(news, percentage) %>% arrange(desc(percentage))
df2$V2 <- factor(df2$V2, levels = unique(df2$V2))
ggplot(df2, aes(x = V2, y = percentage, fill = news))+
geom_bar(stat = "identity", position = "stack") +
guides(fill = guide_legend(title = "Online News")) +
coord_flip() +
scale_x_discrete(limits = rev(levels(df2$V2)))
Everything works, except that some countries break the order, and I do not understand why. Here is the picture:
Following the hints I received, I used dplyr's arrange() instead:
df4 <- arrange(df2, news, desc(percentage))
Here is the result:
Here's what I have; I hope this is useful. As @Axeman mentioned, the trick is to reorder the labels as factors. Further, coord_flip() displays the labels in the opposite order, so scale_x_discrete() is needed to reverse them.
I am using the small sample you provided.
library(ggplot2)
library(dplyr)
df <- read.csv("data.csv")
df <- arrange(df, news, desc(Percentage))
df$V2 <- factor(df$V2, levels = unique(df$V2))
ggplot(df, aes(x = V2, y = Percentage, fill = news))+
geom_bar(stat = "identity", position = "stack") +
guides(fill = guide_legend(title = "Online News")) +
coord_flip() +
scale_x_discrete(limits = rev(levels(df$V2)))
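The ordering in this approach comes entirely from the factor levels. Here is a base-R sketch with the eight rows from the question, retyped in shuffled order; order() stands in for dplyr::arrange, and news_df and sorted are hypothetical names:

```r
news_df <- data.frame(
  V2 = c("Germany", "United States", "Singapore", "Netherlands",
         "Germany", "United States", "Singapore", "Netherlands"),
  news = rep(c("News Read", "Otherwise"), each = 4),
  percentage = c(1.651650, 1.845139, 1.556071, 1.790356,
                 0.7261665, 0.6772114, 0.7401536, 0.4079587),
  stringsAsFactors = FALSE
)

# Sort by news category, then by descending percentage within each category
sorted <- news_df[order(news_df$news, -news_df$percentage), ]

# unique() keeps the first appearance of each country, i.e. the "News Read" order
sorted$V2 <- factor(sorted$V2, levels = unique(sorted$V2))
print(levels(sorted$V2))

# coord_flip() draws the first level at the bottom, hence the reversed limits
print(rev(levels(sorted$V2)))
```

Note that this orders countries by the "News Read" percentage itself; ordering by the share of Read within each country would need that share computed first.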