Dynamically Set X limits on time plot - r

I am wondering how to dynamically set the x axis limits of a time series plot containing two time series with different dates. I have developed the following code to provide a reproducible example of my problem.
#Dummy Data
Data1 <- data.frame(Date = c("4/24/1995","6/23/1995","2/12/1996","4/14/1997","9/13/1998"), Area_2D = c(20,11,5,25,50))
Data2 <- data.frame(Date = c("6/23/1995","4/14/1996","11/3/1997","11/6/1997","4/15/1998"), Area_2D = c(13,15,18,25,19))
Data3 <- data.frame(Date = c("4/24/1995","6/23/1995","2/12/1996","4/14/1996","9/13/1998"), Area_2D = c(20,25,28,30,35))
Data4 <- data.frame(Date = c("6/23/1995","4/14/1996","11/3/1997","11/6/1997","4/15/1998"), Area_2D = c(13,15,18,25,19))
#Convert date column as date
Data1$Date <- as.Date(Data1$Date,"%m/%d/%Y")
Data2$Date <- as.Date(Data2$Date,"%m/%d/%Y")
Data3$Date <- as.Date(Data3$Date,"%m/%d/%Y")
Data4$Date <- as.Date(Data4$Date,"%m/%d/%Y")
#PLOT THE DATA
max_y1 <- max(Data1$Area_2D)
# Define colors to be used for cars, trucks, suvs
plot_colors <- c("blue","red")
plot(Data1$Date,Data1$Area_2D, col=plot_colors[1],
ylim=c(0,max_y1), xlim=c(min_x1,max_x1),pch=16, xlab="Date",ylab="Area", type="o")
par(new=T)
plot(Data2$Date,Data2$Area_2D, col=plot_colors[2],
ylim=c(0,max_y1), xlim=c(min_x1,max_x1),pch=16, xlab="Date",ylab="Area", type="o")
The main problem I see with the code above is there are two different x axis on the plot, one for Data1 and another for Data2. I want to have a single x axis spanning the date range determined by the dates in Data1 and Data2.
My questions is:
How do i dynamically create an x axis for both series? (i.e select the minimum and maximum date from the data frames 'Data1' and 'Data2')

The solution is to combine the data into one data.frame, and base the x-axis on that. This approach works very well with the ggplot2 plotting package. First we merge the data and add an ID column, which specifies to which dataset it belongs. I use letters here:
Data1$ID = 'A'
Data2$ID = 'B'
merged_data = rbind(Data1, Data2)
And then create the plot using ggplot2, where the color denotes which dataset it belongs to (can easily be changed to different colors):
library(ggplot2)
ggplot(merged_data, aes(x = Date, y = Area_2D, color = ID)) +
geom_point() + geom_line()
Note that you get one uniform x-axis here. In this case this is fine, but if the timeseries do not overlap, this might be problematic. In that case we can use multiple sub-plots, known as facets in ggplot2:
ggplot(merged_data, aes(x = Date, y = Area_2D)) +
geom_point() + geom_line() + facet_wrap(~ ID, scales = 'free_x')
Now each facet has it's own x-axis, i.e. one for each sub-dataset. What approach is most valid depends on the specific situation.

Related

How to incorporate data into plot which was constructed in ggplot2 using data from another file (R)?

Using a dataset, I have created the following plot:
I'm trying to create the following plot:
Specifically, I am trying to incorporate Twitter names over the first image. To do this, I have a dataset with each name in and a value that corresponds to a point on the axes. A snippet looks something like:
Name Score
#tedcruz 0.108
#RealBenCarson 0.119
Does anyone know how I can plot this data (from one CSV file) over my original graph (which is constructed from data in a different CSV file)? The reason that I am confused is because in ggplot2, you specify the data you want to use at the start, so I am not sure how to incorporate other data.
Thank you.
The question you ask about ggplot combining source of data to plot different element is answered in this post here
Now, I don't know for sure how this is going to apply to your specific data. Here I want to show you an example that might help you to go forward.
Imagine we have two data.frames (see bellow) and we want to obtain a plot similar to the one you presented.
data1 <- data.frame(list(
x=seq(-4, 4, 0.1),
y=dnorm(x = seq(-4, 4, 0.1))))
data2 <- data.frame(list(
"name"=c("name1", "name2"),
"Score" = c(-1, 1)))
The first step is to find the "y" coordinates of the names in the second data.frame (data2). To do this I added a y column to data2. y is defined here as a range of points from the may value of y to the min value of y with some space for aesthetics.
range_y = max(data1$y) - min(data1$y)
space_y = range_y * 0.05
data2$y <- seq(from = max(data1$y)-space, to = min(data1$y)+space, length.out = nrow(data2))
Then we can use ggplot() to plot data1 and data2 following some plot designs. For the current example I did this:
library(ggplot2)
p <- ggplot(data=data1, aes(x=x, y=y)) +
geom_point() + # for the data1 just plot the points
geom_pointrange(data=data2, aes(x=Score, y=y, xmin=Score-0.5, xmax=Score+0.5)) +
geom_text(data = data2, aes(x = Score, y = y+(range_y*0.05), label=name))
p
which gave this following plot:

GGplot2: plotting values per year of specific columns in a dataframe

I am trying to plot values on the y axis against years on the x axis with ggplot2.
This is the dataset: https://drive.google.com/file/d/1nJYtXPrxD0xvq6rBz2NXlm4Epi52rceM/view?usp=sharing
I want to plot the values of specific countries.
It won't work by just specifying year as the x axis and a country's values on the y axis. I'm reading I need to melt the data frame, so I did that, but it's now in a format that doesn't seem convenient to get the job done.
I'm assuming I haven't correctly melted, but I have a hard time finding what I need to specifically do.
What I did beforehand is manually transpose the data and make the years a column, as well as all the countries.
This is the dataset transposed:
https://drive.google.com/file/d/131wNlubMqVEG9tID7qp-Wr8TLli9KO2Q/view?usp=sharing
Here's how I melted:
inv_melt.data <- melt(investments_t.data, id.vars="Year")
ggplot() +
geom_line(aes(x=Year, y=value), data = inv_melt.data)
The plot shows the aggregated values of all countries per year, but I want them per country in such a manner that I can also select to plot certain countries only.
How do I utilize melt in such a manner? Could someone walk me through this?
There are no columns named "Year" in the linked to data set, there are columns per year. So it need to be melted by "country" and then the "variable" edited with sub.
inv_melt.data <- reshape2::melt(investments_t.data, id.vars="country")
inv_melt.data$variable <- as.integer(sub("^X", "", inv_melt.data$variable))
ggplot(inv_melt.data, aes(variable, value, color = country)) +
geom_line(show.legend = FALSE)
Edit.
The following code keeps only some countries, filtering out the ones with more missing values.
i <- sapply(investments_t.data[-1], function(x) sum(is.na(x)) == 0)
i <- c(1, which(i))
inv_melt.data <- reshape2::melt(investments_t.data[i], id.vars = "Year")
ggplot(inv_melt.data, aes(Year, value, color = variable)) +
geom_line(show.legend = FALSE)

Create barplot to represent time series in ggplot2

I have a basic dataframe with 3 columns: (i) a date (when a sample was taken); (ii) a site location and (iii) a binary variable indicating what the condition was when sampling (e.g. wet versus dry).
Some reproducible data:
df <- data.frame(Date = rep(seq(as.Date("2010-01-01"), as.Date("2010-12-01"), by="months"),times=2))
df$Site <- c(rep("Site.A",times = 12),rep("Site.B",times = 12))
df$Condition<- as.factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0))
What I would like to do is use ggplot to create a bar chart indicating the condition of each site (y axis) over time (x axis) - the condition indicated by a different colour. I am guessing some kind of flipped barplot would be the way to do this, but I cannot figure out how to tell ggplot2 to recognise the values chronologically, rather than summed for each condition. This is my attempt so far which clearly doesn't do what I need it to.
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()
So I have 2 questions. Firstly, how do I tell ggplot to recognise changes in condition over time and not just group each condition in a traditional stacked bar chart?
Secondly, it seems ggplot converts the date to a numerical value, how would I reformat the x-axis to show a time period, e.g. in a month-year format? I have tried doing this via the scale_x_date function, but get an error message.
labDates <- seq(from = (head(df$Date, 1)),
to = (tail(df$Date, 1)), by = "1 months")
Datelabels <-format(labDates,"%b %y")
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()+
scale_x_date(labels = Datelabels, breaks=labDates)
I have also tried converting sampling times to factors and displaying these instead. Below I have done this by changing each sampling period to a letter (in my own code, the factor levels are in a month-year format - I put letters here for simplicity). But I cannot format the axis to place each level of the factor as a tick mark. Either a date or factor solution for this second question would be great!
df$Factor <- as.factor(unique(df$Date))
levels(df$Factor) <- list(A = "2010-01-01", B = "2010-02-01",
C = "2010-03-01", D = "2010-04-01", E = "2010-05-01",
`F` = "2010-06-01", G = "2010-07-01", H = "2010-08-01",
I = "2010-09-01", J = "2010-10-01", K= "2010-11-01", L = "2010-12-01")
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()+
scale_y_discrete(breaks=as.numeric(unique(df$Date)),
labels=levels(df$Factor))
Thank you in advance!
It doesn't really make sense to use geom_bar() considering you do not want to summarise the data and require the visualisation over "time"
I would rather use geom_line() and increase the line thickness if you want to portray a bar chart.
library(tidyr)
library(dplyr)
library(ggplot2)
library(scales)
library(lubridate)
df <- data.frame(Date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), by="months"),times=2))
df$Site <- c(rep("Site.A",times = 12),rep("Site.B",times = 12))
df$Condition<- as.factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0))
df$Date <- ymd(df$Date)
ggplot(df) +
geom_line(aes(y=Site,x=Date,color=Condition),size=10)+
scale_x_date(labels = date_format("%b-%y"))
Note using coord_flip() also does not work, I think this causes the Date issue, see below threads:
how to use coord_carteisan and coord_flip together in ggplot2
In ggplot2, coord_flip and free scales don't work together

R plot two series of means with 95% confidence intervals

I am trying to plot the following data
factor <- as.factor(c(1,2,3))
V1_mean <- c(100,200,300)
V2_mean <- c(350,150,60)
V1_stderr <- c(5,9,3)
V2_stderr <- c(12,9,10)
plot <- data.frame(factor,V1_mean,V2_mean,V1_stderr,V2_stderr)
I want to create a plot with factor on the x-axis, value on the y-axis and seperate lines for V1 and V2 (hence the points are the values of V1_mean on one line and V2_mean on the other). I would also like to add error bars for these means based on V1_stderr and V2_stderr
Many thanks
I'm not sure regarding your desired output, but here's a possible solution.
First of all, I wouldn't call your data plot as this is a stored function in R which is being commonly used
Second of all, when you want to plot two lines in ggplot you'll usually have to tide your data using functions such as melt (from reshape2 package) or gather (from tidyr package).
Here's an a possible approach
library(ggplot2)
library(reshape2)
dat <- data.frame(factor, V1_mean, V2_mean, V1_stderr, V2_stderr)
mdat <- cbind(melt(dat[1:3], "factor"), melt(dat[c(1, 4:5)], "factor"))
names(mdat) <- make.names(names(mdat), unique = TRUE)
ggplot(mdat, aes(factor, value, color = variable)) +
geom_point(aes(group = variable)) + # You can also add `geom_point(aes(group = variable)) + ` if you want to see the actual points
geom_errorbar(aes(ymin = value - value.1, ymax = value + value.1))

Subset of data included in more than one ggplot facet

I have a population and a sample of that population. I've made a few plots comparing them using ggplot2 and its faceting option, but it occurred to me that having the sample in its own facet will distort the population plots (however slightly). Is there a way to facet the plots so that all records are in the population plot, and just the sampled records in the second plot?
Matt,
If I understood your question properly - you want to have a faceted plot where one panel contains all of your data, and the subsequent facets contain only a subset of that first plot?
There's probably a cleaner way to do this, but you can create a new data.frame object with the appropriate faceting variable that corresponds to each subset. Consider:
library(ggplot2)
df <- data.frame(x = rnorm(100), y = rnorm(100), sub = sample(letters[1:5], 100, TRUE))
df2 <- rbind(
cbind(df, faceter = "Whole Sample")
, cbind(df[df$sub == "a" ,], faceter = "Subset A")
#other subsets go here...
)
qplot(x,y, data = df2) + facet_wrap(~ faceter)
Let me know if I've misunderstood your question.
-Chase

Resources