R stacked area chart - ignore NA and retain full x-axis - r

I have a decadal time series from 1700 to 1900 (21 time slices), and for each decade I have 7 categories that each represent a quantity; see here
As you can see, only 5 of the decades actually have data.
I can plot a nice little stacked area chart in R, with the help of this very nice example, which retains only the 5 time slices that have data.
My problem is that I want an x-axis that retains all 21 time slices but still plots a stacked area chart using only the 5 time slices that have data. The idea is that the stacked areas are still plotted against the correct year but simply connect to the next data point, up to 10 ticks down the x-axis, ignoring the missing data in between. I can achieve something similar in Excel, but I don't like it.
My reasoning is that I want to plot lines on top of the stacked area that are much more complete, for example from 1700 to 1850, or 1800 to 1900, for visual comparison purposes.
This post suggests how to connect dots in a line chart when you want to ignore NAs, but it doesn't work for me in this instance.
a <- 1700:1900
b <- a[seq(1, length(a), 10)]
df <- data.frame("Year"=b,replicate(7,sample(1:21)))
rows <- c(2:10,11:15,17,19,21)
df[rows,2:8] <- NA
df
thanks a lot

If you wish to convert your year to a factor, along the lines of the code below:
# Transform the data to long
library(reshape2)
df <- melt(data = df, na.rm = FALSE, id.vars = "Year")
df$Year <- as.factor(df$Year)
# Chart
require(ggplot2)
ggplot(df, aes(Year, value)) +
geom_area(aes(colour = variable, fill= variable), position = 'stack')
It will generate the chart below:
I wasn't sure if you are interested in mapping all of the X variables. I was thinking that this is the case so I reshaped your data. Presumably, it is wiser not to change the Year to factor. The code below:
a <- 1700:1900
b <- a[seq(1, length(a), 10)]
df <- data.frame("Year"=b,replicate(7,sample(1:21)))
rows <- c(2:10,11:15,17,19,21)
df[rows,2:8] <- NA
# Transform the data to long
library(reshape2)
df <- melt(data = df, na.rm = FALSE, id.vars = "Year")
# Leave it as int.
# df$Year <- as.factor(df$Year)
# Chart
require(ggplot2)
ggplot(df, aes(Year, value)) +
geom_area(aes(colour = variable, fill= variable), position = 'stack')
would generate a much more meaningful chart:
Potentially, if you decide to use years as factors, you could group them and have one category for a run of missing years so that the x-axis is more readable. I would say it's a matter of presentation to a great extent.
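If the goal is to keep all 21 decades on the axis while the areas simply bridge the gaps, one option is to drop the NA rows before plotting and set the axis breaks and limits explicitly. The following is only a sketch under that assumption; the `overlay` data frame is a made-up stand-in for the more complete series mentioned in the question.

```r
library(ggplot2)
library(reshape2)

set.seed(1)
years <- seq(1700, 1900, by = 10)            # all 21 decades
df <- data.frame(Year = years, replicate(7, sample(1:21)))
df[c(2:15, 17, 19, 21), 2:8] <- NA           # same NA rows as in the question

# Drop the NA rows so the areas connect directly between decades with data
long <- melt(df, id.vars = "Year", na.rm = TRUE)

# Hypothetical overlay series covering 1700-1850 (illustrative values only)
overlay <- data.frame(Year = seq(1700, 1850, by = 10),
                      value = seq(40, 100, length.out = 16))

p <- ggplot(long, aes(Year, value, fill = variable)) +
  geom_area(position = "stack") +
  geom_line(data = overlay, aes(Year, value), inherit.aes = FALSE) +
  # Keep Year numeric and force every decade onto the axis
  scale_x_continuous(breaks = years, limits = range(years))
print(p)
```

Because Year stays numeric, the spacing between decades with data remains proportional to the time elapsed, which is what makes the overlaid line comparable.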

Related

Adding legend to ggplot curves plotted on the same axis [duplicate]

This question already has answers here:
Add legend to ggplot2 line plot
(4 answers)
Closed 4 months ago.
I have a graph that I'm trying to add a legend to but I can't find any answers.
Here's what the graph looks like
I made a dataframe containing my x-axis as a column and several other columns containing y values that I graphed against x (fixed) in order to get these curves. I want a legend to appear on the side saying column 1, ..., column 11, with each entry matching the color of its curve.
How do I do this? I feel like I'm missing something obvious
Here's what my code looks like:(sorry for the pic. I keep getting errors that my code is not formatted correctly even though I'm using the code button)
interval is just 2:100 and aaaa etc... is a vector the same length as interval.
As Peter says, you will need to convert your data into "long" format. Here is an example using reshape2::melt:
library(reshape2)
library(ggplot2)
n <- 20
df <- data.frame(x = seq(n))
tmp <- as.data.frame(do.call("cbind", lapply(seq(5), FUN = function(x){rnorm(n)})))
names(tmp) <- paste0("aaaa", letters[1:5])
df <- cbind(df, tmp)
head(df)
df2 <- melt(df, id.vars = "x")
head(df2)
ggplot(data = df2) + aes(x = x, y = value, color = variable) +
geom_point() +
geom_line()

How to change x axis (in order to be scaled) of a boxplot in R

I'm new to R and I'm having some trouble solving this problem.
I have the following table/dataframe:
I am trying to generate a boxplot like this one:
However, I want the x-axis to be scaled according to the labels 1000, 2000, 5000, etc.
So I want the distance between 1000 and 2000 to be different from the distance between 50000 and 100000, since the actual distances are not the same.
Is it possible to do that in R?
Thank you everyone and have a nice day!
Maybe try converting the data set into this format, i.e. with the x values as integers in a column rather than as header titles?
# packages
library(ggplot2)
library(reshape2)
# data in ideal format
dt <- data.frame(x=rep(c(1,10,100), each=5),
y=runif(15))
# data that we have. Use reshape2::dcast to get data in to this format
dt$id <- rep(1:5, 3)
dt_orig <- dcast(dt, id~x, value.var = "y")
dt_orig$id <- NULL
names(dt_orig) <- paste0("X", names(dt_orig))
# lets get back to dt, the ideal format :)
# melt puts it in (variable, value) form. Need to configure variable column
dt2 <- melt(dt_orig)
# firstly, remove X from the string
dt2$variable <- gsub("X", "", dt2$variable)
# almost there, can see we have a character, but need it to be an integer
class(dt2$variable)
dt2$variable <- as.integer(dt2$variable)
# boxplot with variable X axis
ggplot(dt2, aes(x=variable, y=value, group=variable)) + geom_boxplot() + theme_minimal()
Base way of re-shaping data: https://www.statmethods.net/management/reshape.html
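For reference, the same wide-to-long step can probably be done without reshape2 by using base R's `stack()`. This is a minimal sketch that assumes the wide data has columns named `X1`, `X10`, `X100` as in the example above; the `long` name is just illustrative.

```r
set.seed(42)
# Wide data, one column per x value (same shape as dt_orig above)
dt_orig <- data.frame(X1 = runif(5), X10 = runif(5), X100 = runif(5))

# stack() returns two columns: values (numeric) and ind (factor of column names)
long <- stack(dt_orig)

# Strip the leading "X" and convert the label to an integer x position
long$x <- as.integer(sub("^X", "", as.character(long$ind)))

str(long)
```

The resulting `x` column can then be mapped to the x aesthetic (with `group = x`) in the same way as `variable` in the ggplot call above.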

R group bar chart for my senerio

I have 1000 categorical observations collected over 5 years, which I can simulate as
senerio <- as.integer(runif(1000, min = 1, max = (4+1)))
The cases are the numbers 1, 2, 3 and 4, all within one column: the first 181 integers are for year1, the next 211 for year2, the next 205 for year3, the next 185 for year4, and the last 218 for year5. I want to draw a grouped bar chart with year on the x-axis (with the cases 1, 2, 3, 4 as sub-bars within each year) and the frequency of occurrence on the y-axis.
I want to know how many 1's in year1, year2, year3, year4 and also know how many 2s,3s,4s in each year.
My MWE, which does not produce the chart I want:
barplot(senerio, legend = c("1","2","3","4"),beside=TRUE)
This is how I want the grouped chart to look:
Using ggplot is the likely solution. First, though, you will need to declare the years in your data. Below is a verbose example that manually creates a dataframe and a years column, followed by a quick ggplot example. I'm not 100% sure I nailed your expected output. However, this is a common question, so this should give you a start for exploring similar questions.
library(tidyverse)
senerio <- as.integer(runif(1000, min = 1, max = (4+1)))
senerio <- data.frame(senerio)
colnames(senerio) <- "value"
senerio$value <- as.factor(senerio$value)
senerio$years <- 0
senerio$years[1:181] <- 1
senerio$years[182:392] <- 2
senerio$years[393:597] <- 3
senerio$years[598:782] <- 4
senerio$years[783:1000] <- 5
ggplot(senerio,aes(years,fill=value)) + geom_bar(position=position_dodge())
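To get the actual counts the question asks for (how many 1s, 2s, 3s and 4s in each year), a cross-tabulation may be enough. The sketch below assumes the block sizes given in the question and uses `sample()` to simulate the data; `year_sizes`, `years` and `counts` are illustrative names.

```r
set.seed(123)
value <- sample(1:4, 1000, replace = TRUE)   # simulated cases 1-4

# Block sizes per year as described in the question (they sum to 1000)
year_sizes <- c(181, 211, 205, 185, 218)
years <- rep(seq_along(year_sizes), times = year_sizes)

# Rows are years, columns are cases; each cell is a frequency
counts <- table(year = years, case = value)
print(counts)

# Base-graphics grouped bars: one group per year, one bar per case
barplot(t(counts), beside = TRUE, legend.text = TRUE,
        xlab = "Year", ylab = "Frequency")
```

Using `rep()` with `times` avoids hard-coding the index ranges for each year.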
Use ggplot2:
library(ggplot2)
dat1 <- data.frame(
gender = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(13.53, 16.81, 16.24, 17.42)
)
# fill must refer to a column that exists in dat1 (gender)
ggplot(data=dat1, aes(x=time, y=total_bill, fill=gender)) +
geom_bar(stat="identity", position=position_dodge())
The position argument is the important property for the grouped layout you need.

"Heatbars" for visualizing consecutive missing data days?

I am trying to visualize large chunks of consecutive missing data side-by-side on ranges of 3, 5 and 10 years sampled daily. Hopefully using ggplot2 since I already have some aesthetics functions done.
I imagined this would come from a barplot or maybe some heatmap variation, but I am not too sure how to use them with time-series data.
I chose a black/white set of bars because I think it makes it easier to see (1) where large chunks of missing data lie and (2) whether they occur at different moments in time (which would be important when choosing which stations to use, etc.), while (3) keeping many bars relatively easy to read, which would not be true of more conventional line plots for time series.
This is a draft of what I had in mind.
Here is some example data for 5 stations (in practice this could be up to over 80):
#Data from 5 different stations sampled daily.
df <- cbind(seq(as.Date(("2010/01/01")),by="day",length.out=365*5),data.frame(matrix(rnorm(365*5*5),365*5,5)))
colnames(df) <- c("timestamp","st1","st2","st3","st4","st5")
#Add varying ranges of missing consecutive amount of days to observe result on visualization.
df[1:50,"st1"] <- NA # 50
df[51:200,"st2"] <- NA # 150
df[1:400,"st3"] <- NA # 400
df[501:1300,"st5"] <- NA # 800
Here's a rough stab at it...Alter the scales and theme elements to your liking...
library(ggplot2)
library(scales)
library(reshape2)
melt(df, id.vars = "timestamp") -> k
k$value <- ifelse(is.na(k$value), "NA", "Not NA")
ggplot(data = k) +
geom_point(aes(x = timestamp, y = variable, fill = value, colour = value), shape =22) +
scale_x_date() +
theme_bw()
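Alongside the plot, it may help to quantify the gaps. As a rough sketch, `rle()` on `is.na()` gives the lengths of consecutive missing runs per station; the data below mirrors part of the example in the question, and `longest_gap` is an illustrative name.

```r
set.seed(1)
n <- 365 * 5
df <- data.frame(timestamp = seq(as.Date("2010-01-01"), by = "day", length.out = n),
                 matrix(rnorm(n * 5), n, 5))
colnames(df) <- c("timestamp", paste0("st", 1:5))
df[1:50, "st1"] <- NA       # 50 consecutive missing days
df[501:1300, "st5"] <- NA   # 800 consecutive missing days

# For each station, find the longest run of consecutive NA days
longest_gap <- sapply(df[-1], function(x) {
  runs <- rle(is.na(x))
  max(c(0, runs$lengths[runs$values]))   # 0 if the station has no NA at all
})
print(longest_gap)
```

The same `rle()` output could be used to rank stations or to filter out those whose longest gap exceeds some threshold.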

Plotting a line graph with multiple lines

I am trying to plot a line graph with multiple lines in different colors, but not having much luck. My data set consists of 10 states and the voting turnout rates for each state from 9 elections (so the states are listed in the left column, and each subsequent column is an election year from 1980-2012 with the voting turnout rate for each of the 10 states). I would like to have a graph with the year on the X axis and the voting turnout rate on the Y axis, with a line for each state.
I found this previous answer (Plotting multiple lines from a data frame in R) to a similar question but cannot seem to replicate it using my data. Any ideas/suggestions would be immensely appreciated!
Use tidyr::gather or reshape::melt to transform the data to a long form.
## Simulate data
d <- data.frame(state=letters[1:10],
'1980'=runif(10,0,100),
'1981'=runif(10,0,100),
'1982'=runif(10,0,100))
library(dplyr)
library(tidyr)
library(ggplot2)
## Transform to a long df
e <- d %>% gather(., key, value, -state) %>%
mutate(year = as.numeric(substr(as.character(key), 2, 5))) %>%
select(-key)
## Plot
ggplot(data=e,aes(x=year,y=value,color=state)) +
geom_point() +
geom_line()
Please include your data, or sample data, in your question so that we can answer your question directly and help you get to the root of the problem. Pasting your data is simplified by using dput().
Here's another solution to your problem, using scoa's sample data and the reshape2 package instead of the tidyr package:
# Sample data
d <- data.frame(state = letters[1:10],
'1980' = runif(10,0,100),
'1981' = runif(10,0,100),
'1982' = runif(10,0,100))
library(reshape2)
library(ggplot2)
# Melt data and remove X introduced into year name
melt.d <- melt(d, id = "state")
melt.d[["variable"]] <- gsub("X", "", melt.d[["variable"]])
# Plot melted data
ggplot(data = melt.d,
aes(x = variable,
y = value,
group = state,
color = state)) +
geom_point() +
geom_line()
Produces:
Note that I left out the as.numeric() conversion for year from scoa's example, and this is why the graph above does not include the extra x-axis ticks that scoa's does.
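As a further variation, newer versions of tidyr offer `pivot_longer()`, which can strip the `X` prefix and convert the year in one step. This is a sketch assuming tidyr 1.1.0 or later (for `names_transform`) and the same simulated data; the `turnout` column name is made up for illustration.

```r
library(tidyr)
library(ggplot2)

set.seed(1)
# data.frame() turns the names '1980', '1981', '1982' into X1980, X1981, X1982
d <- data.frame(state = letters[1:10],
                '1980' = runif(10, 0, 100),
                '1981' = runif(10, 0, 100),
                '1982' = runif(10, 0, 100))

long <- pivot_longer(d,
                     cols = -state,
                     names_to = "year",
                     names_prefix = "X",                       # drop the leading X
                     names_transform = list(year = as.integer),
                     values_to = "turnout")

p <- ggplot(long, aes(x = year, y = turnout, color = state)) +
  geom_point() +
  geom_line()
print(p)
```

With `year` stored as an integer, the x-axis is continuous, so unevenly spaced election years would be placed at their true positions.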
