Simple way to shorten time period of graph in ggplot - r

Let's consider data:
library(ggplot2)
library(quantmod)
start <- as.Date("2013-01-01")
end <- as.Date("2016-10-01")
# Apple stock
getSymbols("AAPL", src = "yahoo", from = start, to = end)
And plot:
autoplot(Cl(AAPL))
My question is: is there a simple way to shorten the time period of my plot? Say, for example, I want my plot to run from '2013-01-01' to '2014-01-01'. Of course I could do it by changing my start and end variables (defined at the very beginning) and re-downloading the data set, but I find that solution inefficient. Is there a simpler way to do it?

There are two approaches: one is to specify limits to the plotting routine, and the other is to subset the data itself. Since the first is already illustrated by another answer, we will focus on the second:
# xts supports .../... notation
apple <- Cl(AAPL)['2013-01-01/2014-01-01']
# this will extract all rows for 2013
apple <- Cl(AAPL)['2013']
# window function
apple <- window(Cl(AAPL), start = "2013-01-01", end = "2014-01-01")
Having defined apple, we can autoplot it.
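For example, after any of the three subsetting approaches above:
autoplot(apple)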

You can add an xlim = argument to autoplot:
autoplot(Cl(AAPL),
         xlim = as.Date(c("2014-01-01", "2016-04-01")))
You can also use the + operator if you prefer:
autoplot(Cl(AAPL)) +
  xlim(as.Date(c("2014-01-01", "2016-04-01")))
See help(autoplot.zoo) for more.
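Note that xlim() drops observations outside the range before plotting; if you only want to zoom the view without removing the underlying data, coord_cartesian() works too:
autoplot(Cl(AAPL)) +
  coord_cartesian(xlim = as.Date(c("2014-01-01", "2016-04-01")))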

Highlight a period in graph with R

Is there an easy way to highlight the period in R?
Right now I am trying to use the following, but it does not work for me:
library(PerformanceAnalytics)
period <- c("2018-02/2019-01")
chart.TimeSeries(btc_xts,
                 period.areas = "2018-02/2019-01",
                 period.color = "lightgrey")
As a result, I see the historical graph, but I do not see the highlighting. The outcome is attached. Could anyone help? Thank you!
[attached screenshot: the result of the action, with no highlighting]
I've never used the library, but reading from the documentation of period.areas:
period.areas: these are shaded areas described by start and end dates in a vector of xts date ranges, e.g., c("1926-10::1927-11", "1929-08::1933-03")
The code should be :
library(PerformanceAnalytics)
period <- c("2018-02::2019-01")
chart.TimeSeries(btc_xts,
                 period.areas = period,
                 period.color = "lightgrey")
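Following the vector format in the doc excerpt above, you should also be able to shade several periods at once; a small sketch (the date ranges here are made up for illustration):
periods <- c("2018-02::2018-06", "2018-10::2019-01")
chart.TimeSeries(btc_xts,
                 period.areas = periods,
                 period.color = "lightgrey")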

Plotting in ggplot after converting to data.frame with a single column?

I'm trying to convert some simple data into a form I thought ggplot2 would accept.
I snag some simple stock data and now I just want to plot it; later I want to plot, say, a 10-day moving average or a 30-day historical volatility series to go with it, which is why I'm using ggplot.
I thought it would work something like this line of pseudocode
ggplot(maindata)+geom_line(moving average)+geom_line(30dayvol)
library(quantmod)
library(ggplot2)
start = as.Date("2008-01-01")
end = as.Date("2019-02-13")
start
tickers = c("AMD")
getSymbols(tickers, src = 'yahoo', from = start, to = end)
closing_prices = as.data.frame(AMD$AMD.Close)
ggplot(closing_prices, aes(y='AMD.Close'))
But I can't even get this to work. The problem, of course, appears to be that I don't have an x-axis. How do I tell ggplot to use the index column as the x-axis? Can this not work? Do I have to create a new "date" or "day" column?
This line for instance using the Regular R plot function works just fine
plot.ts(closing_prices)
This works without requiring me to enter a hard x-axis and produces a graph; however, I haven't figured out how to layer other lines onto this same graph. Evidently ggplot is better, so I tried that.
Any advice?
as.Date(rownames(df)) will take the rownames and parse them as dates. You also need to specify a geom_line():
library(quantmod)
library(ggplot2)
start = as.Date("2008-01-01")
end = as.Date("2019-02-13")
start
tickers = c("AMD")
getSymbols(tickers, src = 'yahoo', from = start, to = end)
closing_prices = as.data.frame(AMD$AMD.Close)
ggplot(closing_prices, aes(x = as.Date(rownames(closing_prices)), y = AMD.Close)) +
  geom_line()
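To layer on the 10-day moving average the question mentions, here is a sketch using zoo::rollmean (the ma10 column name is mine, introduced for illustration):
library(zoo)  # for rollmean
closing_prices$dates <- as.Date(rownames(closing_prices))
closing_prices$ma10 <- rollmean(closing_prices$AMD.Close, k = 10, fill = NA, align = "right")
ggplot(closing_prices, aes(x = dates)) +
  geom_line(aes(y = AMD.Close)) +
  geom_line(aes(y = ma10), colour = "blue")  # 10-day moving average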
Edit
Thought it would be easier to explain in the answers as opposed to the comments.
ggplot and dplyr have two methods of evaluation: standard and non-standard evaluation. This is why ggplot has both aes() and aes_(), the former being non-standard evaluation and the latter being standard evaluation. In addition there is also aes_string(), which is also standard evaluation.
How are these different?
It's easy to see when we explore all the methods:
library(dplyr)  # for mutate and %>%
# Cleaner to read: define every operation in one step
# Non-standard evaluation
closing_prices %>%
  mutate(dates = as.Date(rownames(.))) %>%
  ggplot() +
  geom_line(aes(x = dates, y = AMD.Close))
# Standard evaluation
closing_prices %>%
  mutate(dates = as.Date(rownames(.))) %>%
  ggplot() +
  geom_line(aes_(x = quote(dates), y = quote(AMD.Close)))
closing_prices %>%
  mutate(dates = as.Date(rownames(.))) %>%
  ggplot() +
  geom_line(aes_string(x = "dates", y = "AMD.Close"))
Why are there so many different ways of doing the same thing? In most cases it's fine to use non-standard evaluation. However, if we want to wrap these plots in functions and dynamically change the column to plot based on function parameters passed as strings, it is helpful to plot using aes_() or aes_string(), as in the sketch below.
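For instance, a hypothetical wrapper (plot_column is a name I'm introducing here; it assumes dplyr and ggplot2 are loaded):
plot_column <- function(df, colname) {
  # colname is passed as a string, so aes_string can use it directly
  df %>%
    mutate(dates = as.Date(rownames(df))) %>%
    ggplot() +
    geom_line(aes_string(x = "dates", y = colname))
}
plot_column(closing_prices, "AMD.Close")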

Plotting Basic Time Series Data in R - Not Plotting Correctly

I'm trying to plot some time series data. My plot looks like the following:
[attached plot: the date axis does not display correctly]
I'm uncertain as to why it displays the date as such. I'm using R Markdown in RStudio. Below is my code:
agemployment<-read.csv("Employment-Level1.csv", header=TRUE)
Tried to change the class of Date:
as.Date(as.character(agemployment$Date),format="%m%d%Y")
That did nothing. Rest of code here:
attach(agemployment)
View(agemployment)
head(agemployment)
agemployment <- ts(agemployment, frequency = 12, start = c(2008, 1))
plot(agemployment, col = "black", main = "Agriculture Employment Level",
     ylab = "Total Employment Level (Thousands)", ylim = c(0, 250), lwd = 2,
     xaxs = "i", yaxs = "i", lty = 1)
This produces the above plot. I'm uncertain what I'm doing wrong. I would appreciate any help. Thank you!
EDIT:
[The data was attached as an image.]
I suspect your issues are somehow driven by attach; generally, attaching data frames is not good practice. The following super-simple code worked for me:
# small dataset from your example, I use package readr to load it as data frame
df = readr::read_csv("DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265")
# use a name other than ts to avoid confusion with the ts() function
employment_ts <- ts(data = df$Employment, frequency = 12, start = c(2008, 1))
plot(employment_ts)
Using the file generated reproducibly in the Note at the end, read the file into a zoo object, making the index of class "yearmon" (representing year and month without day). Then plot it.
library(zoo)
z <- read.csv.zoo("Employment-Level1.csv", format = "%m/%d/%Y", FUN = as.yearmon)
plot(z)
or
library(ggplot2)
autoplot(z) + scale_x_yearmon()
If you wanted to convert z to a ts object or data frame:
tt <- as.ts(z)
DF <- fortify.zoo(z)
Note
Lines <- "DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265"
cat(Lines, file = "Employment-Level1.csv") # write out file
Realize that by providing an image in the question, you force everyone who answers to retype your data, so in the future please provide the input data in a reproducible form, as we have done here.

Advise a Chemist: Automate/Streamline his Voltammetry Data Graphing Code

I am a chemist dealing with a significant amount of voltammetry data recently. Let me be very clear and give some research information. I run scans from a starting voltage to an ending voltage on solid-state conductive films. These scans are saved as .txt files (naming scheme: run#.txt) in a single folder. I am looking at how conductance changes as temperature changes. The LINEST line plotting current v. voltage at a given temperature gives me a line with slope = conductance. Once I have the conductances (slopes) for each scan, I plot conductance v. temperature to see the temperature-dependent conductance characteristics. I had been doing this in Excel, but have found quicker ways to get the job done using R. I am brand new to R (RStudio) and recognize that my coding is not the best. Without doubt, this process can be streamlined and sped up, which would help immensely. This is how I am performing the process currently:
# Set working directory with folder containing all .txt files for inspection
# Add all .txt files to the global environment
allruns<-list.files(pattern=".txt")
for(i in 1:length(allruns))assign(allruns[i],read.table(allruns[i]))
Since the voltage column (a 1x1000 matrix) is the same for all runs and is in column V1 of each .txt file, I assign x to be the voltage column from the first file:
x<-run1.txt$V1
All currents (these change as voltage changes) are found in the V2 column of the .txt files, so I assign a y# to each. These are entered one at a time:
y1<-run1.txt$V2
y2<-run2.txt$V2
y3<-run3.txt$V2
# ...
yn<-runn.txt$V2
This gives me the equation for each LINEST (one per scan, plotted with abline later). Again, entered one at a time:
run1<-lm(y1~x)
run2<-lm(y2~x)
run3<-lm(y3~x)
# ...
runn<-lm(yn~x)
To obtain a single graph with all the LINEST lines (one for each scan) on the same plot, without the data points showing up, I have been using this pattern of code to first get all data points onto a single plot in separate series:
plot(x, y1, col = "transparent", main = "LSV Solid Film",
     xlab = "potential(V)", ylab = "current(A)",
     xlim = rev(range(x)), ylim = range(c(y3, yn)))
par(new = TRUE)
plot(x, y2, col = "transparent", main = "LSV Solid Film",
     xlab = "potential(V)", ylab = "current(A)",
     xlim = rev(range(x)), ylim = range(c(y3, yn)))
par(new = TRUE)
plot(x, y3, col = "transparent", main = "LSV Solid Film",
     xlab = "potential(V)", ylab = "current(A)",
     xlim = rev(range(x)), ylim = range(c(y1, yn)))
# ...
par(new = TRUE)
plot(x, yn, col = "transparent", main = "LSV Solid Film",
     xlab = "potential(V)", ylab = "current(A)",
     xlim = rev(range(x)), ylim = range(c(y1, yn)))
#To obtain all LINEST lines (one for each scan, on the single graph):
abline(run1, col = "", lwd = 1)  # fill in a colour of your choice
abline(run2, col = "", lwd = 1)
abline(run3, col = "", lwd = 1)
# ...
abline(runn, col = "", lwd = 1)
# Then to get each LINEST equation:
summary(run1)
summary(run2)
summary(run3)
# ...
summary(runn)
Each time I use summary(), I copy the slope and paste it into an Excel sheet, along with the corresponding scan temperature, which I have recorded separately. I then graph the conductance v. temp points for the film as an X-Y scatter with smooth lines to give the temperature-dependent conductance curve. This gives me a single LINEST-lines plot in R and the conductance v. temp plot in Excel.
This technique is actually MUCH quicker than doing it all in Excel, but it could be done much more quickly and efficiently! Also, if I need to change something, this entire process needs to be re-executed with whatever change is necessary. This process takes me maybe 5 hours in Excel and 1.5 hours in R (maybe I am too slow). Nonetheless, any tips to help automate/streamline this further are greatly appreciated.
There are plenty of questions about operating on data in lists; storing a list of matrices or a list of data.frames is fast, and code that operates cleanly on one can be applied to the remaining n-1 very easily.
(Note: the way I'm showing it here is one technique: maintaining everything in well-compartmentalized lists. Others will suggest, very justifiably, that combining things into a single data.frame and adding a group variable (to identify from which file/experiment the data originated) will help with more advanced multi-experiment regression or combined plotting, such as with ggplot2. I'm not going to go into that latter technique here, not yet.)
The for(...) assign(..., read.csv(...)) pattern has long been decried; you have the important part done, so this is relatively easy:
allruns <- sapply(list.files(pattern = "*.txt"), read.table, simplify = FALSE)
(The use of sapply(..., simplify = FALSE) is similar to lapply(...), but it has the nice side effect of naming the individual list elements with, in this case, each filename. That may not be critical here but is quite handy elsewhere.)
Extracting your invariant and variable data is simple enough:
allLMs <- lapply(allruns, function(mdl) lm(V2 ~ V1, data = mdl))
I'm using each table's V1 here instead of a once-extracted x. Though you might wonder why, I argue for keeping it this way for two reasons: (1) JUST IN CASE the V1 variable is ever even one row different, this will save you; and (2) it is very easy to construct the model like this.
At this point, each object within allLMs is an lm object, meaning we might do:
summary(allLMs[[1]])
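Or, staying with the list idiom, summarize all of them at once:
allSummaries <- lapply(allLMs, summary)
allSummaries[[1]]  # inspect the first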
Plotting: I think I understand why you are using par(new = TRUE), and I have to laugh ... I had been deep in R for a while before I started using that technique. What I think you need is actually much simpler:
xlim <- rev(range(allruns[[1]]$V1))
ylim <- range(sapply(allruns, `[[`, "V2"))
# this next plot just sets the box and axes, no points
plot(NA, type = "n", xlim = xlim, ylim = ylim,
     xlab = "potential(V)", ylab = "current(A)")
# no need to plot points with "transparent" ...
ign <- sapply(allLMs, abline, col = "red")  # pick a colour; add other abline options as needed
Copying all models into Excel, again, using lists:
out <- do.call(rbind, lapply(allLMs, function(m) summary(m)$coefficients[, 1]))
This will now be a single matrix with one row per model and the coefficients in two columns (intercept and slope). (Feel free to use similar techniques to extract the other model summary attributes, including the std. error, t value, or Pr(>|t|) (in $coefficients); or $r.squared, $adj.r.squared, etc.)
write.table(out, file = "clipboard", sep = "\t", col.names = NA)  # "clipboard" works on Windows
and paste into Excel. (Or, better yet, save it to a CSV file and import that, since you might want to keep it around.)
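The same pattern pulls out any other attribute; for example, the R-squared for every model:
rsq <- sapply(allLMs, function(m) summary(m)$r.squared)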
One of the tricks to using lists for this is to persevere: keep things in lists as long as you can, so that you don't have to deal with models individually. One mantra is that if you do it once, you shouldn't have to type it again; just loop/apply/map/whatever. Don't extract too much from the lists before you have to.
Note: r2evans' answer provides good general advice and doesn't require heavy package dependencies. But it probably doesn't hurt to see alternative strategies.
The tidyverse can be quite handy for this sort of thing. Here's a dummy example for illustration:
library(tidyverse)
# creating dummy data files
dummy <- function(T) {
  V <- seq(-5, 5, length = 20)
  I <- jitter(T * V + T, factor = 1)
  write.table(data.frame(V = V, I = I),
              file = paste0(T, ".txt"),
              row.names = FALSE)
}
purrr::walk(300:320, dummy)
# reading
lf <- list.files(pattern = "\\.txt")
read_one <- function(f, ...) {
  cbind(T = as.numeric(gsub("\\.txt", "", f)), read.table(f, ...))
}
m <- purrr::map_df(lf, read_one, header = TRUE, .id = "id")
head(m)
ggplot(m, aes(V, I, group = T)) +
  facet_wrap(~ T) +
  geom_point() +
  geom_smooth(se = FALSE)
models <- m %>%
  split(.$T) %>%
  map(~ lm(I ~ V, data = .))
coefs <- models %>% map_df(broom::tidy, .id = "T")
ggplot(coefs, aes(as.numeric(T), estimate)) +
  geom_line() +
  facet_wrap(~ term, scales = "free")

How to create an animation of geospatial / temporal data

I have a set of data which contains around 150,000 observations of 800 subjects. Each observation has: subject ID, latitude, longitude, and the time that the subject was at those coordinates. The data covers a 24-hour period.
If I plot all the data at once I just get a blob. Is anyone able to give me some tips as to how I can animate this data so that I can observe the paths of the subjects as a function of time?
I've read the spacetime vignette but I'm not entirely sure it will do what I want. At this point I'm spending a whole lot of time googling but not really coming up with anything that meets my needs.
Any tips and pointers greatly appreciated!
Here is my first use of the animation package. It was easier than I anticipated, and saveHTML in particular is really amazing. Here is my scenario (though I think my R code will make it clearer):
I generate some data.
I plot a basic plot for all persons as a background plot.
I reshape the data to a wide format so that I can plot an arrow between the present and next position for each person.
I loop over hours to generate many plots. I put the loop within the powerful saveHTML function.
You get an html file with a nice animation. I show one intermediate plot here.
Here is my code:
library(animation)
library(ggplot2)
library(grid)
## creating some data of hours
N.hour <- 24
dat <- data.frame(person = rep(paste0('p', 1:3), N.hour),
                  lat = sample(1:10, 3 * N.hour, rep = TRUE),
                  long = sample(1:10, 3 * N.hour, rep = TRUE),
                  time = rep(1:N.hour, each = 3))
## the base plot with all persons
base <- ggplot() +
  geom_point(data = dat, aes(x = lat, y = long, colour = person),
             size = 5) +
  theme(legend.position = "none")
## reshape data to lat and long formats
library(plyr)
dat.segs <- ddply(dat, .(person), function(x) {
  dd <- do.call(rbind,
                lapply(seq(N.hour - 1),
                       function(y) c(y, x[x$time %in% c(y, y + 1), ]$lat,
                                     x[x$time %in% c(y, y + 1), ]$long)))
  dd
})
colnames(dat.segs) <- c('person', 'path', 'x1', 'x2', 'y1', 'y2')
## create the animation
oopt <- ani.options(interval = 0.5)
saveHTML({
  print(base)
  interval = ani.options("interval")
  for (hour in seq(N.hour - 1)) {
    ## a segment for each time step
    tn <- geom_segment(aes(x = x1, y = y1, xend = x2,
                           yend = y2, colour = person),
                       arrow = arrow(), inherit.aes = FALSE,
                       data = subset(dat.segs, path == hour))
    print(base <- base + tn)
    ani.pause()
  }
}, img.name = "plots", imgdir = "plots_dir",
   htmlfile = "random.html", autobrowse = FALSE,
   title = "Demo of animated lat/long for different persons",
   outdir = getwd())
Your question is a bit vague, but I will share how I have done this kind of animation in the past.
Create a function that plots all the subject locations for one time slice:
plot_time = function(dataset, time_id) {
  # make a plot with your favorite plotting package (e.g. ggplot2);
  # the columns time, longitude, latitude, id are assumed names here
  slice <- dataset[dataset$time == time_id, ]
  p <- ggplot(slice, aes(longitude, latitude, colour = factor(id))) + geom_point()
  # save it to disk under a regular name: frame001.png, frame002.png, ...
  ggsave(sprintf("frame%03d.png", time_id), p)
  invisible(p)
}
Call this function on each of your timeslices, e.g. using lapply:
lapply(start_time_id:stop_time_id, plot_time, dataset = dataset)  # each id is passed to time_id
leading to a set of graphics files on the hard drive called frame001 to framexxx.
Use a tool to render those frames into a movie, for example ffmpeg.
This is a general workflow, which has already been implemented in the animation package (thanks for reminding me, @mdsummer). You can probably leverage that package to get your animation.
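For example, a minimal sketch using the animation package's saveVideo (this assumes ffmpeg is installed, relies on plot_time returning its plot so it can be printed, and subjects.mp4 is a made-up file name):
library(animation)
saveVideo({
  for (i in start_time_id:stop_time_id) {
    print(plot_time(dataset, i))  # saveVideo captures each printed frame
  }
}, video.name = "subjects.mp4")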
