r ggplot stack identity

r ggplot stack identity - r

enter image description here
i want to make this image..
my data is difficult to disclose so i make a arbitrary data.. TnT...
data
no total outcome
1 800 40
2 700 30
3 650 27
4 600 25
5 500 20
i tried..
ggplot(data, aes(x=no, y=total))+
your textgeom_bar(stat="identity")
your textgeom_bar(stat="identity")+
your textlabs(x="No", y="Total")+
your textscale_y_continuous(breaks=seq(900,100)) +
your texttheme_minimal()
i want make
enter image description here
black bar is total, grey bar is outcome..
help me plz..TnT..
i write this topic using papago, so... sentence can be awkward...I ask for your understand.!!

For visualizing the chart that you want, you need to pivot your data.
You can pivot(make wide data long) with reshape2::melt()
df <- data.frame(
no = 1:5,
total = c(800,700,650,600,500),
outcome = c(40,30,27,25,20)
)
require(ggplot2)
require(reshape2)
df_long <- melt(df,
id.vars='no',
measure.vars=c('total','outcome'))
ggplot(df_long, aes(x=no, y=value, fill=variable))+
geom_col()+
scale_fill_manual(values = c('black','grey')) +
scale_y_continuous(breaks=seq(0,900,100))+
theme_minimal() +
labs(x='No', y='Total')
Note that geom_bar(stat='identity') is equal to geom_col().

Related

ggplot2 - How to plot length of time using geom_bar?

I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph that looks like this:
which was taken from an answer to this question. Note that the dates are in julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=T))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there's a few issues with this plot:
Because the bars are stacked, the values on the x-axis are aggregated and end up too high - out of the 1-365 scale that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such graph?

I don't think this is something you can do with a geom_bar or geom_col. A more general approach would be to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit
plotdata <- mydat %>%
dplyr::mutate(Crop = factor(Crop)) %>%
tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to="period") %>%
tidyr::separate(period, c("Type","Event")) %>%
tidyr::pivot_wider(names_from=Event, values_from=value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so we have one row per rectangle that we want to draw and we've also make Crop a factor. We can then plot it like this
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The part that's a bit messy here that we are using a discrete scale for y but geom_rect prefers numeric values, so since the values are factors now, we use the numeric values for the factors to create ymin and ymax positions. Then we need to replace the y axis with the names of the levels of the factor.
If you also wanted to get the month names on the x axis you could do something like
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),by="month")
# then add this to you plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))

How to plot top 5 most frequent variables by region in R

I am looking to do a plot to look into the most common occuring FINAL_CALL_TYPE in my dataset by BOROUGH in NYC. I have a dataset with over 3 million obs. I broke this down into a sample of 2000, but have refined it even more to just the incident type and the borough it occured in.
Essentially, I want to create a plot that will visualize to the 5 most common call types in each borough, with the count of how many of each call types there was in each borough.
Below is a brief look of how my data looks with just Call Type and Borough
> head(df)
FINAL_CALL_TYPE BOROUGH
1804978 INJURY BRONX
1613888 INJMAJ BROOKLYN
294874 INJURY BROOKLYN
1028374 DRUG BROOKLYN
1974030 INJURY MANHATTAN
795815 CVAC BRONX
This shows how many unique values there are
> str(df)
'data.frame': 2000 obs. of 2 variables:
$ FINAL_CALL_TYPE: Factor w/ 139 levels "ABDPFC","ABDPFT",..: 50 48 50 34 50 25 17 138 28 28 ...
$ BOROUGH : Factor w/ 5 levels "BRONX","BROOKLYN",..: 1 2 2 2 3 1 4 2 4 4 ...
This is the code that I have tried
> ggplot(df, aes(x=BOROUGH, y=FINAL_CALL_TYPE)) +
+ geom_bar(stat = 'identity') +
+ facet_grid(~BOROUGH)
and below is the result
I have tried a few suggestions accross this community, but I have not found any that shows how to perform the action with 2 columns.
It would be much appreciated if there is someone who know a solution for this.
Thanks!

If I understand correctly, you can use tidyverse to doo something like:
df <- df %>%
group_by(BOROUGH, FINAL_CALL) %>%
summarise(count = n()) %>%
top_n(n = 5, wt = count)
then plot
ggplot(df, aes(x = FINAL_CALL, y = count) +
geom_col() +
facet(~BOROUGH, scales = "free")

creating the barplot
The first part of your problem is to create the barplot. With geom_bar you only need to supply the x variable, as the y-axis is the count of observations of that variable. You can then use the facet option to separate that count into different panels for another grouping variable.
library(ggplot2)
ggplot(data = diamonds, aes(x = color)) +
geom_bar() +
facet_grid(.~cut)
filtering to top 5 observations
The second part of your problem, limiting the data to only the top five in each group is slightly more complex. An easy way to do this is to first tally the data which will create a column n that has the count of observations. By adding the sort option we can filter the data to the first five rows in each group. tally, like summarize, automatically removes the last group.
In the ggplot call I now use geom_col instead of geom_bar and I explicitly specify that the y-variable is n (n is created by tally).
geom_bar plots the count of observations per x-variable, geom_col plots a y-variable value for each value of the x-variable.
scales = "free_x" removes values from the x-axis that are present in one cut panel but not another.
library(tidyverse)
df <- diamonds %>%
group_by(cut, color) %>%
tally(sort = TRUE) %>%
filter(row_number() <= 5)
ggplot(data = df, aes(x = color, y = n)) +
geom_col() +
facet_grid(.~cut, scales = "free_x")

grouped barplot: order x-axis & keep constant bar width, in case of missing levels

Here is my script (example inspired from here and using the reorder option from here):
library(ggplot2)
Animals <- read.table(
header=TRUE, text='Category Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 10
8 Improved Unclear 25
9 Improved Bla 10
10 Decline Hello 30')
fig <- ggplot(Animals, aes(x=reorder(Animals$Reason, -Animals$Species), y=Species, fill = Category)) +
geom_bar(stat="identity", position = "dodge")
This gives the following output plot:
What I would like is to order my barplot only on condition 'Decline', and all the 'Improved' would not be inserted in the middle. Here is what I would like to get (after some svg editing):
So now all the whole 'Decline' condition is sorted and the 'Improved' condition comes after. Besides, ideally, the bars would all be at the same width, even if the condition is not represented for the value (e.g. "Bla" has no "Decline" value).
Any idea on how I could do that without having to play with SVG editors? Many thanks!

First let's fill your data.frame with missing combinations like this.
library(dplyr)
Animals2 <- expand.grid(Category=unique(Animals$Category), Reason=unique(Animals$Reason)) %>% data.frame %>% left_join(Animals)
Then you can create an ordering variable for the x-scale:
myorder <- Animals2 %>% filter(Category=="Decline") %>% arrange(desc(Species)) %>% .$Reason %>% as.character
An then plot:
ggplot(Animals2, aes(x=Reason, y=Species, fill = Category)) +
geom_bar(stat="identity", position = "dodge") + scale_x_discrete(limits=myorder)

Define new data frame with all combinations of "Category" and "Reason", merge with data of "Species" from data frame "Animals". Adapt ggplot by correct scale_x_discrete:
Animals3 <- expand.grid(Category=unique(Animals$Category),Reason=unique(Animals$Reason))
Animals3 <- merge(Animals3,Animals,by=c("Category","Reason"),all.x=TRUE)
Animals3[is.na(Animals3)] <- 0
Animals3 <- Animals3[order(Animals3$Category,-Animals3$Species),]
ggplot(Animals3, aes(x=Animals3$Reason, y=Species, fill = Category)) + geom_bar(stat="identity", position = "dodge") + scale_x_discrete(limits=as.character(Animals3[Animals3$Category=="Decline","Reason"]))

To achieve something like that I would adjust the data frame when working with ggplot. Add the missing categories with a value of zero.
Animals <- rbind(Animals,
data.frame(Category = c("Improved", "Decline"),
Reason = c("Hello", "Bla"),
Species = c(0,0)
)
)

Along the same lines as the answer from user Alex, a less manual way of adding the categories might be
d <- with(Animals, expand.grid(unique(Category), unique(Reason)))
names(d) <- names(Animals)[1:2]
Animals <- merge(d, Animals, all.x=TRUE)
Animals$Species[is.na(Animals$Species)] <- 0

ggplot2: facets: different axis limits and free space

I want to display two dimensions in my data, (1) reporting entity in different facets and (2) country associated to the data point on the x-axis. The problem is that the country dimension includes a "total", which is a lot higher than all of the individual values, so I would need an own axis limit for that.
My solution was to try another facetting dimension, but I could not get it working and looking nicely at the same time. Consider the following dummy data:
id <- c(1,1,1,1,1,1,2,2,2,2,2,2)
country <- c("US","US","UK","World","World","World","US","US","UK","World","World","World")
value <- c(150,40,100,1000,1100,1500,5,10,20,150,200,120)
# + some other dimensions
mydat <- data.frame(id,country,value)
id country value
1 1 US 150
2 1 US 40
3 1 UK 100
4 1 World 1000
5 1 World 1100
6 1 World 1500
7 2 US 5
8 2 US 10
9 2 UK 20
10 2 World 150
11 2 World 200
12 2 World 120
If I use a facet grid to display a world total, the axis limit is forced for the other countries as well:
mydat$breakdown <- mydat$country == "World"
ggplot(mydat) + aes(x=country,y=value) + geom_point() +
facet_grid(id ~ breakdown,scales = "free",space = "free_x") +
theme(strip.text.x = element_blank() , strip.background = element_blank(),
plot.margin = unit( c(0,0,0,0) , units = "lines" ) )
(the last part of the plot is just to remove the additional strip).
If I use a facet wrap, it does give me different axis limits for each plot, but then I cannot pass the space = "free_x" argument, meaning that the single column for the total will consume the same space as the entire country overview, which looks ugly for data sets with many countries:
ggplot(mydat) + aes(x=country,y=value) + geom_point() +
facet_wrap(id ~ breakdown,scales = "free")
There are several threads here which ask similar questions, but none of the answers helped me to achieve this yet.
Different axis limits per facet in ggplot2
Is it yet possible to have different axis breaks / limits for individual facets in ggplot with free scale?
Setting individual axis limits with facet_wrap and scales = "free" in ggplot2

Maybe try gridExtra::grid.arrange or cowplot::plot_grid:
lst <- split(mydat, list(mydat$breakdown, mydat$id))
plots <- lapply(seq(lst), function(x) {ggplot(lst[[x]]) +
aes(x=country,y=value) +
geom_point() +
ggtitle(names(lst)[x]) + labs(x=NULL, y=NULL)
})
do.call(gridExtra::grid.arrange,
c(plots, list(ncol=2, widths=c(2/3, 1/3)),
left="Value", bottom="country"))

Generating a histogram and density plot from binned data

I've binned some data and currently have a dataframe that consists of two columns, one that specifies a bin range and another that specifies the frequency like this:-
> head(data)
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16
I want to plot a histogram and density plot using this but I can't seem to find a way of doing so without having to generate new bins etc. Using this solution here I tried to do the following:-
p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")
but it crashes. Anyone know of how to deal with this?
Thank you

the problem is that ggplot doesnt understand the data the way you input it, you need to reshape it like so (I am not a regex-master, so surely there are better ways to do is):
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(stringr)
library(splitstackshape)
library(ggplot2)
# extract the numbers out,
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")
# split the data using the , into to columns:
# one for the start-point and one for the end-point
df <- cSplit(df, "binRange")
# plot it, you actually dont need the second column
ggplot(df, aes(x = binRange_1, y = Frequency, width = 0.025)) +
geom_bar(stat = "identity", breaks=seq(0,0.125, by=0.025))
or if you don't want the data to be interpreted numerically, you can just simply do the following:
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")
you won't be able to plot a density-plot with your data, given its not continous but rather categorical, thats why I actually prefer the second way of showing it,

You can try
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_col()

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

r ggplot stack identity - r

Related

ggplot2 - How to plot length of time using geom_bar?

How to plot top 5 most frequent variables by region in R

grouped barplot: order x-axis & keep constant bar width, in case of missing levels

ggplot2: facets: different axis limits and free space

Generating a histogram and density plot from binned data

Categories

Resources