R grouped bar chart for my scenario - r

I have 1000 categorical observations collected over 5 years, which I can simulate as
senerio <- as.integer(runif(1000, min = 1, max = (4+1)))
The cases are the numbers 1, 2, 3 and 4, all in a single column: the first 181 integers are for year1, the next 211 for year2, the next 205 for year3, the next 185 for year4, and the last 218 for year5. I want to draw a grouped bar chart with year on the x-axis (with the cases 1, 2, 3, 4 as sub-bars within each year) and the frequency of occurrence on the y-axis.
I want to know how many 1s there are in year1, year2, year3, year4 and year5, and likewise how many 2s, 3s and 4s there are in each year.
My MWE, which does not produce the chart I want:
barplot(senerio, legend = c("1","2","3","4"), beside=TRUE)
This is how I want the grouped chart to look (image not shown).

Using ggplot2 is the likely solution. First, though, you will need to declare the years in your data. Below is a verbose example that manually creates a data frame and a years column, followed by a quick ggplot example. I'm not 100% sure I nailed your expected output. However, this is a common question, so this should give you a start for exploring similar questions.
library(tidyverse)
senerio <- as.integer(runif(1000, min = 1, max = (4+1)))
senerio <- data.frame(senerio)
colnames(senerio) <- "value"
senerio$value <- as.factor(senerio$value)
senerio$years <- 0
senerio$years[1:181] <- 1
senerio$years[182:392] <- 2
senerio$years[393:597] <- 3
senerio$years[598:782] <- 4
senerio$years[783:1000] <- 5
ggplot(senerio,aes(years,fill=value)) + geom_bar(position=position_dodge())
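To also get the actual numbers the question asks for (how many 1s, 2s, 3s and 4s fall in each year), a cross-tabulation should be enough. This is a minimal base R sketch rather than part of the answer above; the seed, the sample() call and the names value, years and counts are assumptions made for illustration.

```r
set.seed(1)  # assumption: fixed seed so the counts are reproducible

# Simulated cases 1..4, standing in for the real column
value <- sample(1:4, 1000, replace = TRUE)

# Year sizes taken from the question: 181, 211, 205, 185, 218 (sum = 1000)
years <- rep(1:5, times = c(181, 211, 205, 185, 218))

# Rows are the cases, columns are the years
counts <- table(value = value, year = years)
counts

# The same table can feed a base R grouped bar chart directly
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        xlab = "Year", ylab = "Frequency")
```

Each column of counts holds the frequencies for one year, so counts["1", "3"] would be the number of 1s in year 3.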

Use ggplot2:
`
library(ggplot2)
dat1 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(13.53, 16.81, 16.24, 17.42)
)
ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge())
`
position is the important argument for the visual you need: position_dodge() places the bars side by side instead of stacking them.

Related

Equally distributed bar chart in ggplot2

What I want to do
My dataset consists of several cases (id) with different outcomes (outcome) for a given number of repeated measures (cycle). Each cycle should be counted as 1 (val) or be visualized with equal length.
The plot I want to end up with is a stacked bar chart, where each cycle of each case has the same length. The sequence of cycles must be continuous. The sequence of the outcomes depends on the corresponding cycles.
My Problem
The sample code below produces a bar chart that sums up the cycles (although being a factor). However, using the val column instead of cycle messes with the sequence of the outcomes, which must not change.
# setup
library(ggplot2)
library(dplyr)
set.seed(0)
# test data
data.frame(
cycle=factor(rep(1:8,2),levels=1:8),
val=1,
id=factor(rep(1:2,each=8)),
outcome=factor(paste("Outcome",sample(1:8,16,T)),levels=paste("Outcome",1:8))) %>%
# plot
ggplot(.,aes(id,cycle,fill=outcome))+
geom_bar(stat="identity",position=position_stack(reverse=T),width=0.99)+
coord_flip()
My Question
Is it possible to make cycles count as 1 for each id, keeping the outcome sequence?
Thank you in advance!
The Plots
This is what I get when using the above code:
This is what I get, when using val instead of cycle:
The goal is to keep the outcome sequence, while counting each cycle as 1 or making them appear of the same length for each id.
As far as I understand it, you could achieve your desired result using geom_tile:
library(ggplot2)
set.seed(0)
dat <- data.frame(
cycle = factor(rep(1:8, 2), levels = 1:8),
val = 1,
id = factor(rep(1:2, each = 8)),
outcome = factor(paste("Outcome", sample(1:8, 16, T)), levels = paste("Outcome", 1:8))
)
ggplot(dat, aes(cycle, id, fill = outcome)) +
geom_tile()

How to reorder bars in barplot using ggplot 2 [duplicate]

This question already has answers here:
Order Bars in ggplot2 bar graph
I want to arrange my bars in this particular order of beetle number: 0, then 1-5, 6-10, 11-15 and Above 15. I also want to place Village first and then Municipality. The plots should also be arranged by the age of the building: Under 5 years first, then 5-10 years, followed by Above 10 years.
ggplot(g,aes(x=Locality.Division))+
geom_bar(aes(fill=Number.of.Beetle),position="dodge")+
facet_wrap(~Building.Age)
#> Error in ggplot(g, aes(x = Locality.Division)): could not find function "ggplot"
Created on 2021-05-30 by the reprex package (v2.0.0)
The order of the bars is determined by the order of the factor levels of the variable.
You have the Number.of.Beetle variable in your data as a character variable. ggplot() converts this to a factor variable with factor(), which by default sorts character variables alphabetically. To specify a different order, convert the variable to a factor yourself before plotting:
library(dplyr) # provides mutate()
g <- mutate(g,
Number.of.Beetle = factor(Number.of.Beetle, levels = c("1-5", "6-10", "11-15", "15+"))
)
If the order is shown backwards, then also use forcats::fct_rev() to reverse the order:
g <- mutate(g,
Number.of.Beetle = forcats::fct_rev(factor(Number.of.Beetle, levels = c("1-5", "6-10", "11-15", "15+")))
)
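To see why the explicit levels matter, here is a small base R sketch comparing the default level order with a hand-specified one. The vector x and the label spellings ("0", "Above 15") are assumptions taken from the wording of the question, not from the real data; the levels you pass must match the strings in your column exactly, otherwise those values become NA.

```r
# Hypothetical labels, spelled as in the question text
x <- c("6-10", "0", "Above 15", "1-5", "11-15", "1-5")

# Default: levels are sorted as strings (the exact order depends on the
# locale), not by the numeric ranges they stand for
levels(factor(x))

# Explicit: levels appear in exactly the order given
wanted <- c("0", "1-5", "6-10", "11-15", "Above 15")
f <- factor(x, levels = wanted)
levels(f)

# A label that is not listed in `levels` silently turns into NA
g <- factor(c("1-5", "15+"), levels = wanted)
is.na(g)
```

Checking anyNA() on the new factor after the conversion is a cheap way to catch a mismatch between the labels in the data and the levels you typed.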
I hope the following helps to get you started. You did not provide a minimal reproducible example, thus, I simulate some data. I also adapted the variable names.
A key strategy to control the order of variables is making them a factor. I do this when plotting.
Note: number of beetles is quasi-sorted given the values used. Here you could also work with a factor, if needed.
library(ggplot2)
set.seed(666) # fix random picks for replicability
# simulate data of 30 buildings
df <- data.frame(
Building = 1:30
, Building.Age = sample(x = c("U5","5-10","A10"), size = 30, replace = TRUE)
, Nbr.Beetle = sample(x = c("1-5","6-10","11-15","15+"), size = 30, replace = TRUE)
, Locality = sample(x = c("A","B","C"), size = 30, replace = TRUE))
# plot my example
ggplot(data = df, aes(x=Locality)) +
geom_bar(aes(fill=Nbr.Beetle),position="dodge") +
# --------------------- control the sequence of panels by forcing level sequence of factor
facet_wrap(. ~ factor( Building.Age, levels = c("U5","5-10","A10") ) )
This yields:

"Heatbars" for visualizing consecutive missing data days?

I am trying to visualize large chunks of consecutive missing data side-by-side on ranges of 3, 5 and 10 years sampled daily. Hopefully using ggplot2 since I already have some aesthetics functions done.
I imagined this would come from a barplot or maybe some heatmap variation, but I am not too sure how to use them with time-series data.
I chose a black/white list of bars because I think it makes it easier to observe (1) where large chunks of missing data lie and (2) whether they occur at different moments in time (which would be important for choosing which stations to use, etc.), while (3) keeping it relatively easy to compare many bars, which would not be true of the more conventional line plots for time series.
This is a draft of what I had in mind.
Here is some example data for 5 stations (in practice this could be up to over 80):
#Data from 5 different stations sampled daily.
df <- cbind(seq(as.Date(("2010/01/01")),by="day",length.out=365*5),data.frame(matrix(rnorm(365*5*5),365*5,5)))
colnames(df) <- c("timestamp","st1","st2","st3","st4","st5")
#Add varying ranges of missing consecutive amount of days to observe result on visualization.
df[1:50,"st1"] <- NA # 50
df[51:200,"st2"] <- NA # 150
df[1:400,"st3"] <- NA # 400
df[501:1300,"st5"] <- NA # 800
Here's a rough stab at it...Alter the scales and theme elements to your liking...
library(ggplot2)
library(scales)
library(reshape2)
melt(df, id.vars = "timestamp") -> k
k$value <- ifelse(is.na(k$value), "NA", "Not NA")
ggplot(data = k) +
geom_point(aes(x = timestamp, y = variable, fill = value, colour = value), shape =22) +
scale_x_date() +
theme_bw()
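If you also want the gaps as numbers (start, end and length of each run of missing days), rle() on is.na() can give you that. This is only a sketch alongside the plot above: na_runs is a hypothetical helper name, and the two-station data frame is a cut-down version of the example data with an arbitrary seed.

```r
set.seed(42)  # assumption: arbitrary seed
n <- 365 * 5
df <- data.frame(
  timestamp = seq(as.Date("2010-01-01"), by = "day", length.out = n),
  st1 = rnorm(n),
  st2 = rnorm(n)
)
df[1:50, "st1"] <- NA    # 50 missing days
df[51:200, "st2"] <- NA  # 150 missing days

# Hypothetical helper: one row per run of consecutive missing days
na_runs <- function(x, dates) {
  r <- rle(is.na(x))             # runs of TRUE (missing) / FALSE (present)
  end <- cumsum(r$lengths)       # index of the last day of each run
  start <- end - r$lengths + 1   # index of the first day of each run
  runs <- data.frame(start = dates[start], end = dates[end], days = r$lengths)
  runs[r$values, ]               # keep only the runs that are missing
}

gaps_st1 <- na_runs(df$st1, df$timestamp)
gaps_st2 <- na_runs(df$st2, df$timestamp)
gaps_st2
```

The resulting data frames could also be drawn as rectangles (one per gap) instead of one point per day, which may scale better to 80 stations.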

R Setting Y Axis to Count Distinct in ggplot2

I have a data frame that contains 4 variables: an ID number (chr), a degree type (factor w/ 2 levels of Grad and Undergrad), a degree year (chr with year), and Employment Record Type (factor w/ 6 levels).
I would like to display this data as a count of the unique ID numbers by year as a stacked area plot of the 6 Employment Record Types. So, count of # of ID numbers on the y-axis, degree year on the x-axis, the value of x being number of IDs for that year, and the fill will handle the Record Type. I am using ggplot2 in RStudio.
I used the following code, but the y axis does not count distinct IDs:
ggplot(AlumJobStatusCopy, aes(x=Degree.Year, y=Entity.ID,
fill=Employment.Data.Type)) + geom_freqpoly() +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
I also tried setting y = Entity.ID to y = ..count.. and that did not work either. I have searched for solutions as it seems to be a problem with how I am writing the aes code.
I also tried the following code based on examples of similar plots:
ggplot(AlumJobStatusCopy, aes(interval)) +
geom_area(aes(x=Degree.Year, y = Entity.ID,
fill = Employment.Data.Type)) +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
This does not even seem to work. I've read the documentation and am at my wit's end.
EDIT:
After figuring out the answer to the problem, I realized that I was not actually using the correct values for my Year variable. A count tells me nothing as I am trying to display the rise in a lack of records and the decline in current records.
My Dataset:
Year, int, 1960-2015
Current Record, num: % of total records that are current
No Record, num: % of total records that are not current
Ergo each Year value has two corresponding percent values. I am now using 2 lines instead of an area plot since the Y axis has distinct values instead of a count function, but I would still like the area under the curves filled. I tried using Melt to convert the data from wide to long, but was still unable to fill both lines. Filling is just for aesthetic purposes as I would like to use a gradient for each with 1 fill being slightly lighter than the other.
Here is my current code:
ggplot(Alum, aes(Year)) +
geom_line(aes(y = Percent.Records, colour = "Percent.Records")) +
geom_line(aes(y = Percent.No.Records, colour = "Percent.No.Records")) +
scale_y_continuous(labels = percent) + ylab('Percent of Total Records') +
ggtitle("Active, Living Alumni Employment Record") +
scale_x_continuous(breaks=seq(1960, 2014, by=5))
I cannot post an image yet.
I think you're missing a step where you summarize the data to get the quantities to plot on the y-axis. Here's an example with some toy data similar to how you describe yours:
# Make toy data with three levels of employment type
set.seed(1)
df <- data.frame(Entity.ID = rep(LETTERS[1:10], 3), Degree.Year = rep(seq(1990, 1992), each=10),
Degree.Type = sample(c("grad", "undergrad"), 30, replace=TRUE),
Employment.Data.Type = sample(as.character(1:3), 30, replace=TRUE))
# Here's the part you're missing, where you summarize for plotting
library(dplyr)
dfsum <- df %>%
group_by(Degree.Year, Employment.Data.Type) %>%
tally()
# Now plot that, using the sums as your y values
library(ggplot2)
ggplot(dfsum, aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
The result could use some fine-tuning, but I think it's what you mean. Here, the bars are of equal height because each year in the toy data includes an equal number of IDs; if the count of IDs varied, so would the total bar height.
If you don't want to add objects to your workspace, just do the summing in the call to ggplot():
ggplot(tally(group_by(df, Degree.Year, Employment.Data.Type)),
aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
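One caveat: tally() counts rows, so an ID that appears twice in the same year and employment type is counted twice. If the IDs can repeat and you really need a count of distinct IDs, dplyr's n_distinct() is one way to get it. A minimal sketch with made-up data (the six rows and the name dfdistinct are assumptions for illustration):

```r
library(dplyr)

# Toy data in which IDs "A" and "C" each appear twice in the same group
df <- data.frame(
  Entity.ID = c("A", "A", "B", "C", "C", "D"),
  Degree.Year = c(1990, 1990, 1990, 1991, 1991, 1991),
  Employment.Data.Type = c("1", "1", "2", "1", "1", "1")
)

# Count distinct IDs per year and employment type
dfdistinct <- df %>%
  group_by(Degree.Year, Employment.Data.Type) %>%
  summarise(n_ids = n_distinct(Entity.ID), .groups = "drop")

dfdistinct
```

Here the group for 1990 and type "1" gets n_ids = 1, whereas tally() would report 2 for the same rows. The summary can then be plotted with y = n_ids in the same way as above.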

R stacked area chart - ignore NA and retain full x-axis

I have a decadal time series from 1700 to 1900 (21 time slices), and for each decade I have 7 categories that represent a quantity; see here
As you can see, only 5 of the decades actually have data.
I can plot a nice little stacked area chart in R, with the help of this very nice example, which retains only the 5 time slices that have data.
My problem is that I want an x-axis that retains all 21 time slices but still plots a stacked area chart using only the 5 time slices. The idea is that the stacked areas will still only be plotted against the correct year but simply connect up to the next point, 10 ticks down the x-axis, ignoring the no-data in between. I can achieve something in Excel but I don't like it.
My reasoning is that I want to plot lines on top of the stacked area that are much more complete, for example from 1700 to 1850, or 1800 to 1900, for visual comparison purposes.
This post suggests how to connect dots in a line chart when you want to ignore NAs, but it doesn't work for me in this instance.
a <- 1700:1900
b <- a[seq(1, length(a), 10)]
df <- data.frame("Year"=b,replicate(7,sample(1:21)))
rows <- c(2:10,11:15,17,19,21)
df[rows,2:8] <- NA
df
thanks a lot
If you wish to convert Year to a factor, along the lines of the code below:
# Transform the data to long
library(reshape2)
df <- melt(data = df, na.rm = FALSE, id.vars = "Year")
df$Year <- as.factor(df$Year)
# Chart
require(ggplot2)
ggplot(df, aes(Year, value)) +
geom_area(aes(colour = variable, fill= variable), position = 'stack')
It will generate the chart below:
I wasn't sure whether you are interested in mapping all of the X variables. I assumed that this is the case, so I reshaped your data. Presumably, it is wiser not to convert Year to a factor. The code below:
a <- 1700:1900
b <- a[seq(1, length(a), 10)]
df <- data.frame("Year"=b,replicate(7,sample(1:21)))
rows <- c(2:10,11:15,17,19,21)
df[rows,2:8] <- NA
# Transform the data to long
library(reshape2)
df <- melt(data = df, na.rm = FALSE, id.vars = "Year")
# Leave it as int.
# df$Year <- as.factor(df$Year)
# Chart
require(ggplot2)
ggplot(df, aes(Year, value)) +
geom_area(aes(colour = variable, fill= variable), position = 'stack')
would generate a much more meaningful chart:
Potentially, if you decide to use years as factors, you may group them and have one category for a number of missing years so that the x-axis is more readable. I would say it's a matter of presentation to a great extent.
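Another option, if Year stays numeric, is to drop the NA rows yourself before plotting and fix the axis limits, so the areas connect across the decades that have data while the axis still runs from 1700 to 1900. This is a rough sketch under those assumptions: the seed, the base R reshaping (used here instead of melt) and the names years, long and p are mine, not from the question.

```r
library(ggplot2)

set.seed(1)  # assumption: arbitrary seed
years <- seq(1700, 1900, by = 10)  # the 21 decadal time slices
df <- data.frame(Year = years, replicate(7, sample(1:21)))
df[c(2:15, 17, 19, 21), 2:8] <- NA  # same rows blanked as in the question

# Wide to long in base R: one row per Year and category
long <- data.frame(
  Year = rep(df$Year, times = 7),
  variable = rep(names(df)[2:8], each = nrow(df)),
  value = unlist(df[2:8], use.names = FALSE)
)

# Keep only the decades that actually have data
long <- long[!is.na(long$value), ]

# Numeric x-axis with fixed limits, so empty decades still appear on the axis
p <- ggplot(long, aes(Year, value, fill = variable)) +
  geom_area(position = "stack") +
  scale_x_continuous(limits = c(1700, 1900), breaks = years)
p
```

Because the x-axis is continuous, any extra lines covering, say, 1700 to 1850 can be added as further layers on the same scale.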

Resources