all.
I read several previous posts on Stack Overflow and went through the documentation of zoo and ggplot2, but didn't find a suitable answer.
Say I have a zoo object called 'data'. The original data in the flat file are as follows:
Date,Quote1,Quote2,Quote3,Quote4,Quote5
18/07/2008,42.36,44.53,28.4302,44.3,42
21/07/2008,43.14,44.87,28.6186,44.83,43.27
22/07/2008,43.26,44.85,28.6056,44.86,42.84
23/07/2008,44.74,45.61,29.7558,45.69,#N/A
24/07/2008,43.99,45.14,29.2944,45.19,#N/A
25/07/2008,43.18,45.33,29.4569,45.46,43.65
28/07/2008,43.45,44.72,28.5016,44.89,43.31
29/07/2008,43.49,44.8,28.1247,44.88,42.85
30/07/2008,44.55,45.54,28.0727,45.58,43.67
31/07/2008,43.36,45.5,27.9818,45.63,43.91
01/08/2008,43.34,44.75,28.0792,44.69,43.04
Now, I want to plot the time series of these five financial products on a single line graph so that I can compare their evolution.
I would like to use ggplot2.
Would anyone be kind enough to give me some hints?
If data is your zoo object then try this (and see ?autoplot.zoo for more info):
p <- autoplot(data, facet = NULL)
p
or perhaps this since I don't think the automatic varying of linetype looks so good with this many series in the same panel:
p + aes(linetype = NULL)
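For a self-contained version, the flat file can be read straight into a zoo object with read.zoo() and then passed to autoplot(). This is only a minimal sketch, assuming the zoo and ggplot2 packages are installed. It uses a shortened copy of the question's data (six rows, three columns), and the names Lines and z are illustrative.

```r
library(zoo)
library(ggplot2)

# Shortened copy of the question's data (illustrative subset)
Lines <- "Date,Quote1,Quote2,Quote5
18/07/2008,42.36,44.53,42
21/07/2008,43.14,44.87,43.27
22/07/2008,43.26,44.85,42.84
23/07/2008,44.74,45.61,#N/A
24/07/2008,43.99,45.14,#N/A
25/07/2008,43.18,45.33,43.65"

# comment.char = "" keeps read.table from treating "#N/A" as a comment;
# na.strings turns it into NA; format parses the day/month/year index
z <- read.zoo(text = Lines, header = TRUE, sep = ",", format = "%d/%m/%Y",
              na.strings = "#N/A", comment.char = "")

# facets = NULL puts all series in one panel; dropping linetype leaves
# colour as the only visual distinction between series
p <- autoplot(z, facets = NULL) + aes(linetype = NULL)
print(p)
```

The gap in Quote5 shows up as a break in that line, because the two missing values are kept as NA rather than dropped.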
Here is one way to do it:
df <- read.csv(text = "Date,Quote1,Quote2,Quote3,Quote4,Quote5
18/07/2008,42.36,44.53,28.4302,44.3,42
21/07/2008,43.14,44.87,28.6186,44.83,43.27
22/07/2008,43.26,44.85,28.6056,44.86,42.84
23/07/2008,44.74,45.61,29.7558,45.69,#N/A
24/07/2008,43.99,45.14,29.2944,45.19,#N/A
25/07/2008,43.18,45.33,29.4569,45.46,43.65
28/07/2008,43.45,44.72,28.5016,44.89,43.31
29/07/2008,43.49,44.8,28.1247,44.88,42.85
30/07/2008,44.55,45.54,28.0727,45.58,43.67
31/07/2008,43.36,45.5,27.9818,45.63,43.91
01/08/2008,43.34,44.75,28.0792,44.69,43.04", na.strings = "#N/A")
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
Create a zoo object:
library(zoo)
dat <- zoo(df[-1], df$Date)
Transform the object to a data frame for ggplot2:
df_new <- data.frame(value = as.vector(dat),
time = time(dat),
quote = rep(names(dat), each = nrow(dat)))
Plot:
library(ggplot2)
ggplot(df_new, aes(y = value, x = time, colour = quote)) + geom_line()
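As a possible shortcut for the transformation step, zoo ships a fortify method that does the wide-to-long conversion itself. The sketch below assumes zoo and ggplot2 are installed and uses a tiny made-up subset of the quotes; df_long is an illustrative name.

```r
library(zoo)
library(ggplot2)

# Small illustrative zoo object: two series, three dates
dates <- as.Date(c("2008-07-18", "2008-07-21", "2008-07-22"))
dat <- zoo(cbind(Quote1 = c(42.36, 43.14, 43.26),
                 Quote2 = c(44.53, 44.87, 44.85)),
           dates)

# melt = TRUE stacks the columns into a long data frame with the
# columns Index (time), Series (column name) and Value
df_long <- fortify.zoo(dat, melt = TRUE)
print(df_long)

p <- ggplot(df_long, aes(x = Index, y = Value, colour = Series)) +
  geom_line()
print(p)
```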
Here's another slightly different method, using melt from reshape
# Read your data and format date (as proposed by Sven)
df <- read.csv(text = "Date,Quote1,Quote2,Quote3,Quote4,Quote5
18/07/2008,42.36,44.53,28.4302,44.3,42
21/07/2008,43.14,44.87,28.6186,44.83,43.27
22/07/2008,43.26,44.85,28.6056,44.86,42.84
23/07/2008,44.74,45.61,29.7558,45.69,#N/A
24/07/2008,43.99,45.14,29.2944,45.19,#N/A
25/07/2008,43.18,45.33,29.4569,45.46,43.65
28/07/2008,43.45,44.72,28.5016,44.89,43.31
29/07/2008,43.49,44.8,28.1247,44.88,42.85
30/07/2008,44.55,45.54,28.0727,45.58,43.67
31/07/2008,43.36,45.5,27.9818,45.63,43.91
01/08/2008,43.34,44.75,28.0792,44.69,43.04", na.strings = "#N/A")
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
library(reshape)
library(ggplot2)
# reshape your data with melt, keeping Date as the id column
melted <- melt(df, id.vars = "Date")
# plot with ggplot
ggplot(melted, aes(y = value, x = Date, colour = variable)) + geom_line()
I have a dataset that looks like this:
  Country Time Value
1 USA 1999-Q1 292929
2 USA 1999-Q2 392023
3 USA 1999-Q3 9392992
4 ...
and so on. Now I would like to plot this data frame with Time on the x-axis and Value on the y-axis. The problem is that I don't know how to plot Time, because it is not given as month/day/year. If it were, I would just use as.Date() with format = "%m%d%y". I am not allowed to change the quarterly labels, so they should appear on the plot as they are. How can I do this?
Thank you in advance!
Assuming DF shown in the Note at the end, convert the Time column to yearqtr class which directly represents year and quarter (as opposed to using Date class) and use scale_x_yearqtr. See ?scale_x_yearqtr for more information.
library(ggplot2)
library(zoo)
fmt <- "%Y-Q%q"
DF$Time <- as.yearqtr(DF$Time, format = fmt)
ggplot(DF, aes(Time, Value, col = Country)) +
geom_point() +
geom_line() +
scale_x_yearqtr(format = fmt)
(continued after graphics)
It would also be possible to convert it to a wide form zoo object with one column per country and then use autoplot. Using DF from the Note below:
fmt <- "%Y-Q%q"
z <- read.zoo(DF, split = "Country", index = "Time",
FUN = as.yearqtr, format = fmt)
autoplot(z) + scale_x_yearqtr(format = fmt)
Note
Lines <- "
Country Time Value
1 USA 1999-Q1 292929
2 USA 1999-Q2 392023
3 USA 1999-Q3 9392992"
DF <- read.table(text = Lines)
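To see what the yearqtr class does with these labels, here is a small sketch, assuming only that the zoo package is installed. The vector q is an illustrative name.

```r
library(zoo)

fmt <- "%Y-Q%q"
q <- as.yearqtr(c("1999-Q1", "1999-Q2", "1999-Q3"), format = fmt)

# Formatting with the same pattern reproduces the original labels,
# which is what scale_x_yearqtr(format = fmt) relies on for the axis
print(format(q, fmt))

# Internally a yearqtr is the year plus (quarter - 1) / 4
print(as.numeric(q))

# Converting to Date gives the first day of each quarter
print(as.Date(q))
```

Because the values are ordered numbers rather than strings, the points are spaced correctly along the axis while the labels keep the quarterly names.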
Using ggplot2:
library(ggplot2)
ggplot(df, aes(Time, Value, fill = Country)) + geom_col()
I know other people have already answered, but I think this more general answer should also be here.
as.Date() only parses the part of the string that matches the format and ignores the rest. I tried it on your data frame (I called it df), and it ran without error:
> as.Date(df$Time, format = "%Y")
[1] "1999-11-28" "1999-11-28" "1999-11-28"
Note that only the year is read: the month and day are filled in from the current date, so all three quarters map to the same date. Whether you want to use plot() or ggplot2 doesn't matter; either way, this is how you can turn the Time column into dates for the axis.
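If the quarter needs to be kept without any extra package, one option is to build the first day of each quarter by hand and put the original labels on the axis. This is a base R sketch under the assumption that every label has exactly the form YYYY-Qn. The names year, quarter and first_day are illustrative.

```r
Time  <- c("1999-Q1", "1999-Q2", "1999-Q3")
Value <- c(292929, 392023, 9392992)

# Characters 1-4 hold the year, character 7 holds the quarter number
year    <- as.integer(substr(Time, 1, 4))
quarter <- as.integer(substr(Time, 7, 7))

# Quarter 1 starts in month 1, quarter 2 in month 4, and so on
first_day <- as.Date(sprintf("%d-%02d-01", year, 3 * (quarter - 1) + 1))
print(first_day)

# Suppress the default axis and label the ticks with the original strings
plot(first_day, Value, type = "o", xaxt = "n", xlab = "Time")
axis(1, at = as.numeric(first_day), labels = Time)
```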
I have a time series data with multiple stocks. I would like to plot them in one plot in R.
I tried an existing answer in this website but I got an error. Here is my code:
library(quantmod)
library(TSclust)
library(ggplot2)
# download financial data
symbols = c('ASX', 'AZN', 'BP', 'AAPL')
start = as.Date("2014-01-01")
until = as.Date("2014-12-31")
stocks = lapply(symbols, function(symbol) {
adjust = getSymbols(symbol,src='yahoo', from = start, to = until, auto.assign = FALSE)[, 6]
names(adjust) = symbol
adjust
})
I tried the following from an existing answer:
qplot(symbols, value, data = as.data.frame(stocks), geom = "line", group = variable) +
facet_grid(variable ~ ., scale = "free_y")
I got the following error:
Error: At least one layer must contain all faceting variables: variable.
Plot is missing variable
Layer 1 is missing variable
I would like a plot similar to the following:
While Len Greski's answer has a great explanation and solution, I thought I'd provide an answer with a more 'standard' approach. Maybe some users will find it simpler.
library(quantmod)
library(ggplot2)
symbols <- c("ASX", "AZN", "BP", "AAPL")
start <- as.Date("2014-01-01")
until <- as.Date("2014-12-31")
# import data into an environment
e <- new.env()
getSymbols(symbols, src = "yahoo", from = start, to = until, env = e)
# extract the adjusted close and merge into one xts object
stocks <- do.call(merge, lapply(e, Ad))
# Remove the ".Adjusted" suffix from each symbol column name
colnames(stocks) <- gsub(".Adjusted", "", colnames(stocks), fixed = TRUE)
# convert the xts object to a long data frame
stocks_df <- fortify(stocks, melt = TRUE)
# plot the data
qplot(Index, Value, data = stocks_df, geom = "line", group = Series) +
facet_grid(Series ~ ., scales = "free_y")
The error messages in the original code are caused by the fact that there is no column called variable in the data that is passed to qplot(). Additionally, in order to produce the desired chart, we need to extract the dates from the xts objects generated by quantmod so we can use them as the x axis variable in the chart.
With some adjustments to place the appropriate variables from the stock data into the qplot() specification we can produce the required chart.
We modify the code to read the list of stocks as follows:
1. Convert the xts objects to objects of type data.frame
2. Rename columns to eliminate ticker symbols so we can rbind() into a single data frame in a subsequent step
3. Extract the rownames() into a data frame column
Having made these changes, the stocks object contains a list of data frames, one per stock ticker.
symbols = c('ASX', 'AZN', 'BP', 'AAPL')
start = as.Date("2014-01-01")
until = as.Date("2014-12-31")
stocks = lapply(symbols, function(symbol) {
aStock = as.data.frame(getSymbols(symbol,src='yahoo', from = start, to = until,
auto.assign = FALSE))
colnames(aStock) <- c("Open","High","Low","Close","Volume","Adjusted")
aStock$Symbol <- symbol
aStock$Date <- rownames(aStock)
aStock
})
Next, we use do.call() with rbind() to combine the data into a single data frame that we'll use with qplot().
stocksDf <- do.call(rbind,stocks)
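The stacking step can be tried offline with made-up data. In the sketch below, make_stock() is a hypothetical helper standing in for the download, and the symbols and prices are invented. The point is only that identical column names let rbind() stack the list, while the Symbol column keeps track of where each row came from.

```r
# Hypothetical stand-in for one downloaded stock: same columns for every symbol
make_stock <- function(symbol, close) {
  data.frame(Date = as.Date("2014-01-02") + 0:2,
             Close = close,
             Symbol = symbol)
}

stocks <- list(make_stock("AAA", c(10, 11, 12)),
               make_stock("BBB", c(50, 49, 51)))

# do.call() passes the list elements as separate arguments to rbind()
stocksDf <- do.call(rbind, stocks)
print(stocksDf)
print(table(stocksDf$Symbol))
```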
Finally, we use qplot() with Date and Close as the x and y variables, and facet_grid() with Symbol to generate the facets.
qplot(Date, Close, data = stocksDf, geom = "line", group = Symbol) +
facet_grid(Symbol ~ ., scales = "free_y")
...and the initial output:
Having generated the chart, we'll make some adjustments to clean up the x axis labels. On the default chart they are unintelligible because there are 251 different character values, and we need to rescale the axis to print fewer labels.
First, we convert the character-based dates with as.Date(). Second, we use the ggeasy package to adjust the content on the x axis.
stocks = lapply(symbols, function(symbol) {
aStock = as.data.frame(getSymbols(symbol,src='yahoo', from = start, to = until,
auto.assign = FALSE))
colnames(aStock) <- c("Open","High","Low","Close","Volume","Adjusted")
aStock$Symbol <- symbol
aStock$Date <- as.Date(rownames(aStock),"%Y-%m-%d")
aStock
})
stocksDf <- do.call(rbind,stocks)
library(ggeasy)
qplot(Date, Close, data = stocksDf, geom = "line", group = Symbol) +
facet_grid(Symbol ~ ., scales = "free_y") +
scale_x_date(date_breaks = "14 days") +
easy_rotate_x_labels(angle = 45, side = "right")
...and the revised output:
NOTE: to chart the Adjusted Closing price, simply change the y variable in the qplot() function to Adjusted.
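Since qplot() is deprecated in recent ggplot2 releases (3.4.0 onward), the same chart can also be written with ggplot(). This is a sketch on synthetic prices so that it runs without a network connection. The symbols AAA and BBB and the random walks are made up, and the label rotation uses theme() instead of ggeasy.

```r
library(ggplot2)

# Synthetic stand-in for stocksDf: two random walks over three months
set.seed(1)
dates <- seq(as.Date("2014-01-01"), as.Date("2014-03-31"), by = "day")
n <- length(dates)
stocksDf <- data.frame(
  Date   = rep(dates, times = 2),
  Close  = c(100 + cumsum(rnorm(n)), 20 + cumsum(rnorm(n))),
  Symbol = rep(c("AAA", "BBB"), each = n)
)

p <- ggplot(stocksDf, aes(x = Date, y = Close, group = Symbol)) +
  geom_line() +
  # one panel per symbol, each with its own y scale
  facet_grid(Symbol ~ ., scales = "free_y") +
  scale_x_date(date_breaks = "14 days", date_labels = "%d %b") +
  # rotate the date labels so they do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)
```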
I'm working on automating some basic experiential analysis using R. My script is currently set up as follows and generates the plot shown below.
data <- list()
for (experiment in experiments) {
path = paste('../out/', experiment, '/', plot, '.csv', sep="")
data[[experiment]] <- read.csv(path, header=F)
}
df <- data.frame(Year=1:40,
'current'=colMeans(data[['current']]),
'vip'=colMeans(data[['vip']]),
'vipbonus'=colMeans(data[['vipbonus']]))
df <- melt(df, id.vars = 'Year', variable.name = 'Series')
plotted <- ggplot(df, aes(Year, value)) +
geom_line(aes(colour = Series)) +
labs(y = ylabel, title = title)
file = paste(plot, '.png', sep="")
ggsave(filename = file, plot = plotted)
While this is close to what we want the final product to look like, the series labels need to be updated. Ideally we want them to be something like "VIP, no bonus", "VIP, with bonus" and so forth, but using labels like that as column names in the data frame is not valid R (invalid characters are automatically replaced with . even with backticks). Since these experiments are a work in progress, we also know that we are going to need more series labels in the future, so we don't want to lose ggplot's ability to set the colors for us automatically.
How can I set the series labels to be appropriate for humans?
The OP explained that he is currently working on automating some basic experiential analysis, part of which is the relabeling of the series. The OP also showed some code that is used to prepare the data to be plotted.
Based on the additional information supplied in comments, I believe the overall processing could be streamlined which will address the series labeling issue as well.
Some preparations
# used for creating file paths
experiments <- c("current", "vip", "vipbonus")
# used for labeling the series
exp_labels <- c("Current", "VIP, no bonus", "VIP, with bonus")
plot <- "dataset1" # e.g.
paths <- paste0(file.path("../out", experiments, plot), ".csv")
paths
#[1] "../out/current/dataset1.csv" "../out/vip/dataset1.csv" "../out/vipbonus/dataset1.csv"
Read data
library(data.table) #version 1.10.4 used here
# read all files into one large data.table
# add running count in column "Series" to identify the source of each row
DT <- rbindlist(lapply(paths, fread, header = FALSE), idcol = "Series")
# rename file chunks = Series, use predefined labels
DT[, Series := factor(Series, labels = exp_labels)]
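The labeling step can be checked without the CSV files. The sketch below assumes the data.table package is installed and replaces the fread() results with three small made-up tables; chunks is an illustrative name.

```r
library(data.table)

exp_labels <- c("Current", "VIP, no bonus", "VIP, with bonus")

# Made-up stand-ins for the three files, two rows and two columns each
chunks <- list(data.table(V1 = 1:2,  V2 = 3:4),
               data.table(V1 = 5:6,  V2 = 7:8),
               data.table(V1 = 9:10, V2 = 11:12))

# For an unnamed list, idcol numbers the rows by list position: 1, 2, 3
DT <- rbindlist(chunks, idcol = "Series")

# The i-th label is attached to the i-th list element
DT[, Series := factor(Series, labels = exp_labels)]
print(DT)
```

The order of exp_labels therefore has to match the order of paths, since the labels are assigned by position.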
Reshape and aggregate by groups
# reshape from wide to long
molten <- melt(DT, id.vars = "Series")
# compute means by Series and Year = variable
aggregated <- molten[, .(value = mean(value)), by = .(Series, variable)]
# take factor level number of "variable" as Year
aggregated[, Year := as.integer(variable)]
Note that aggregation is done in long format (after melt()) to save typing the same command for each column.
Create chart & save to disk
library(ggplot2)
ggplot(aggregated, aes(Year, value)) +
geom_line(aes(colour = Series)) +
labs(y = "ylabel", title = "title")
file = paste(plot, '.png', sep="")
ggsave(filename = file) # by default, the last plot is saved
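Another option is to leave the data untouched and rename the series only in the legend, by giving the colour scale a labelling function. This is a sketch on made-up numbers, assuming ggplot2 is installed. The lookup vector pretty_names and the helper relabel() are illustrative names, not part of ggplot2.

```r
library(ggplot2)

# Made-up long data with the machine-friendly series names
df <- data.frame(Year   = rep(1:3, times = 3),
                 Series = rep(c("current", "vip", "vipbonus"), each = 3),
                 value  = c(1, 2, 3, 2, 3, 4, 3, 4, 5))

# Lookup from internal name to display label
pretty_names <- c(current  = "Current",
                  vip      = "VIP, no bonus",
                  vipbonus = "VIP, with bonus")

# Series without an entry in the lookup keep their internal name
relabel <- function(x) {
  unname(ifelse(x %in% names(pretty_names), pretty_names[x], x))
}

p <- ggplot(df, aes(Year, value, colour = Series)) +
  geom_line() +
  scale_colour_discrete(labels = relabel)
print(p)
```

Colours are still chosen automatically, and a series added later appears in the legend under its raw name until an entry is added to the lookup.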
While this may not be an ideal approach, what worked for us was to update the relevant series labels after the melt command was performed:
df$Series <- as.character(df$Series)
df$Series[df$Series == "current"] <- "Current"
df$Series[df$Series == "vip"] <- "VIP, no bonus"
df$Series[df$Series == "vipbonus"] <- "VIP, with bonus"
Which results in plots like the following:
You can try this:
library(tidyverse)
df <- df %>% dplyr::mutate(Series = as.character(Series),
Series = fct_recode(Series,
"Current" = "current",
"VIP, no bonus" = "vip",
"VIP, with bonus" = "vipbonus"))
I am wondering how to dynamically set the x axis limits of a time series plot containing two time series with different dates. I have developed the following code to provide a reproducible example of my problem.
#Dummy Data
Data1 <- data.frame(Date = c("4/24/1995","6/23/1995","2/12/1996","4/14/1997","9/13/1998"), Area_2D = c(20,11,5,25,50))
Data2 <- data.frame(Date = c("6/23/1995","4/14/1996","11/3/1997","11/6/1997","4/15/1998"), Area_2D = c(13,15,18,25,19))
Data3 <- data.frame(Date = c("4/24/1995","6/23/1995","2/12/1996","4/14/1996","9/13/1998"), Area_2D = c(20,25,28,30,35))
Data4 <- data.frame(Date = c("6/23/1995","4/14/1996","11/3/1997","11/6/1997","4/15/1998"), Area_2D = c(13,15,18,25,19))
#Convert date column as date
Data1$Date <- as.Date(Data1$Date,"%m/%d/%Y")
Data2$Date <- as.Date(Data2$Date,"%m/%d/%Y")
Data3$Date <- as.Date(Data3$Date,"%m/%d/%Y")
Data4$Date <- as.Date(Data4$Date,"%m/%d/%Y")
#PLOT THE DATA
max_y1 <- max(Data1$Area_2D)
# Define colors to be used for the two series
plot_colors <- c("blue","red")
plot(Data1$Date,Data1$Area_2D, col=plot_colors[1],
ylim=c(0,max_y1), xlim=c(min_x1,max_x1),pch=16, xlab="Date",ylab="Area", type="o")
par(new=T)
plot(Data2$Date,Data2$Area_2D, col=plot_colors[2],
ylim=c(0,max_y1), xlim=c(min_x1,max_x1),pch=16, xlab="Date",ylab="Area", type="o")
The main problem I see with the code above is that there are two different x axes on the plot, one for Data1 and another for Data2. I want to have a single x axis spanning the date range determined by the dates in Data1 and Data2.
My question is:
How do I dynamically create an x axis for both series (i.e. select the minimum and maximum date from the data frames 'Data1' and 'Data2')?
The solution is to combine the data into one data.frame, and base the x-axis on that. This approach works very well with the ggplot2 plotting package. First we merge the data and add an ID column, which specifies to which dataset it belongs. I use letters here:
Data1$ID = 'A'
Data2$ID = 'B'
merged_data = rbind(Data1, Data2)
And then create the plot using ggplot2, where the color denotes which dataset it belongs to (can easily be changed to different colors):
library(ggplot2)
ggplot(merged_data, aes(x = Date, y = Area_2D, color = ID)) +
geom_point() + geom_line()
Note that you get one uniform x-axis here. In this case this is fine, but if the timeseries do not overlap, this might be problematic. In that case we can use multiple sub-plots, known as facets in ggplot2:
ggplot(merged_data, aes(x = Date, y = Area_2D)) +
geom_point() + geom_line() + facet_wrap(~ ID, scales = 'free_x')
Now each facet has its own x-axis, i.e. one for each sub-dataset. Which approach is best depends on the specific situation.
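For completeness, the same idea can be sketched in base graphics: compute the limits from both data frames at once, draw the first series with plot(), and add the second with lines() so that only one set of axes is drawn. The names x_range and y_max are illustrative.

```r
Data1 <- data.frame(
  Date = as.Date(c("4/24/1995", "6/23/1995", "2/12/1996", "4/14/1997", "9/13/1998"),
                 "%m/%d/%Y"),
  Area_2D = c(20, 11, 5, 25, 50))
Data2 <- data.frame(
  Date = as.Date(c("6/23/1995", "4/14/1996", "11/3/1997", "11/6/1997", "4/15/1998"),
                 "%m/%d/%Y"),
  Area_2D = c(13, 15, 18, 25, 19))

# Limits taken over both series, so neither is clipped
x_range <- range(c(Data1$Date, Data2$Date))
y_max   <- max(Data1$Area_2D, Data2$Area_2D)

plot(Data1$Date, Data1$Area_2D, type = "o", pch = 16, col = "blue",
     xlim = x_range, ylim = c(0, y_max), xlab = "Date", ylab = "Area")
# lines() adds to the existing plot instead of starting a new one,
# so there is no need for par(new = TRUE) and no second axis
lines(Data2$Date, Data2$Area_2D, type = "o", pch = 16, col = "red")
```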
What is the best way to construct a barplot to compare two sets of data?
e.g. dataset:
Number <- c(1,2,3,4)
Yresult <- c(1233,223,2223,4455)
Xresult <- c(1223,334,4421,0)
nyx <- data.frame(Number, Yresult, Xresult)
What I want is Number across X and bars beside each other representing the individual X and Y values
It is better to reshape your data into long format. You can do that with, for example, the melt function from the reshape2 package. Alternatives are reshape from base R, melt from data.table (an extended implementation of reshape2's melt) and gather from tidyr.
Using your dataset:
# load needed libraries
library(reshape2)
library(ggplot2)
# reshape your data into long format
nyxlong <- melt(nyx, id=c("Number"))
# make the plot
ggplot(nyxlong) +
geom_bar(aes(x = Number, y = value, fill = variable),
stat="identity", position = "dodge", width = 0.7) +
scale_fill_manual("Result\n", values = c("red","blue"),
labels = c(" Yresult", " Xresult")) +
labs(x="\nNumber",y="Result\n") +
theme_bw(base_size = 14)
which gives the following barchart:
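The tidyr route mentioned above can be sketched as follows, using pivot_longer(), which tidyr now recommends over gather(). This assumes the tidyr and ggplot2 packages are installed. Number is converted to a factor here so that each value gets its own axis position, which is a choice made for this sketch.

```r
library(tidyr)
library(ggplot2)

nyx <- data.frame(Number  = c(1, 2, 3, 4),
                  Yresult = c(1233, 223, 2223, 4455),
                  Xresult = c(1223, 334, 4421, 0))

# One row per Number/variable pair instead of one column per variable
nyxlong <- pivot_longer(nyx, cols = c(Yresult, Xresult),
                        names_to = "variable", values_to = "value")
print(nyxlong)

# geom_col() is the shorthand for geom_bar(stat = "identity")
p <- ggplot(nyxlong, aes(x = factor(Number), y = value, fill = variable)) +
  geom_col(position = "dodge", width = 0.7) +
  labs(x = "Number", y = "Result", fill = "Result")
print(p)
```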