Suppose I have a vector of numbers 1:12 and want to plot them over a period of time ranging from Jan. 2013 to Dec. 2013. I used the following code to generate the data and the plot:
dates<-seq(as.Date("2013/1/1"), by = "month", length.out = 12)
n<-seq(1:12)
df<-cbind(dates,n)
plot(df)
However, some problems come up with this code. First, I could not find an option in the first seq call to generate only month and year, without the day. Second, all dates in df become serial numbers, even after wrapping dates in as.Date inside cbind. Finally, the x axis in the plot is not in a time format, as a result of the first two problems.
Just use
plot(dates,n)
without cbinding it. cbind creates a matrix (see class(df)). A matrix can hold only one type, so in the process the dates are coerced to plain numbers.
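To see the coercion directly, and to cover the month-and-year part of the question, here is a minimal sketch in base R. The names m and d are only for illustration. Note that format() returns character labels, so it is meant for display (for example axis labels), not for positions on the axis.

```r
dates <- seq(as.Date("2013-01-01"), by = "month", length.out = 12)
n <- 1:12

m <- cbind(dates, n)                    # matrix: every column must share one type
class(m)                                # "matrix" "array" (just "matrix" before R 4.0)
m[1, "dates"]                           # 15706, i.e. days since 1970-01-01

d <- data.frame(dates = dates, n = n)   # data frame: each column keeps its class
class(d$dates)                          # "Date"

# Month and year only, as text labels (month names depend on the locale)
format(dates[1:3], "%b %Y")
```

A data frame is therefore the safer container whenever a Date column has to survive.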
For nicer and easier to customize plots use
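library(ggplot2)
# qplot() is deprecated in recent ggplot2 releases, so spell the plot out with ggplot()
ggplot(data.frame(dates, n), aes(dates, n)) + geom_point() + xlab("") + ylab("my y lab")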
df<-data.frame(dates=dates,n=n)
plot(df$dates, df$n, axes=FALSE)
axis(1, labels=format(df$dates, "%b %Y"), at=df$dates)
axis(2)
# This is my code. I need help improving the x-axis resolution so that it is monthly (Jan 2017, Feb 2017, ..., Dec 2020)
ggplot()+
geom_line(data=IDA_DATA,
aes(y=final1,x= Date,colour="darkblue"),size=1 )+
geom_line(data=IDA_DATA,
aes(y=fpmc2,x= Date,colour="red"),
size=1) +
scale_color_discrete(name = "Y series", labels = c("Adjusted Trend", "Long Term Comp"))+
theme(legend.position = c(.85, .85))+
labs(y="PM10 Conc (ug/m3)")
I am assuming your Date variable is an actual date type (i.e. stored internally as a number), not a text string.
If your data is continuous (measurements on multiple days per month), you can use the breaks parameter in scale_x_date() (the date counterpart of scale_x_continuous()) to specify each break you want. Your breaks will also have to be in date format, not text strings, so you'll need something like lubridate's dmy() to convert a vector of all the first-day-of-the-month dates. That can be tedious if your data spans many years, but I can't think of how else you can force all the month labels to show up on a continuous axis.
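Typing the breaks by hand may not be necessary. As a rough sketch, assuming Date really is of class Date: seq() can generate the first day of every month, and scale_x_date() takes that vector through its breaks argument. The data frame toy and its values are made up for illustration, and the plot is only drawn if ggplot2 is installed.

```r
# Hypothetical stand-in for IDA_DATA: one value per day over two years
set.seed(1)
toy <- data.frame(Date = seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day"))
toy$final1 <- cumsum(rnorm(nrow(toy)))

# First day of every month covered by the data
# (this relies on the data starting on the 1st; otherwise start seq() from a first-of-month date)
month_breaks <- seq(min(toy$Date), max(toy$Date), by = "month")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = Date, y = final1)) +
    geom_line() +
    scale_x_date(breaks = month_breaks, date_labels = "%b %Y") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
  print(p)
}
```

If the exact break positions do not matter, scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") should give a similar result without building the vector at all.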
If your data can be summarised as a single x value for each entire month, you can convert the Date into a month-year string or factor variable and use that as a discrete x axis. A discrete axis automatically shows all the x values, so you won't have to create all the breaks. This should be relatively easy to achieve.
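A minimal sketch of that monthly summary in base R, again with a made-up data frame toy standing in for IDA_DATA. The choice of mean() as the summary is an assumption; use whatever statistic suits the data.

```r
# Hypothetical daily data
toy <- data.frame(Date = seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day"))
toy$final1 <- seq_len(nrow(toy))

# Year-month key; "%Y-%m" sorts chronologically even as plain text
toy$month <- format(toy$Date, "%Y-%m")

# One mean value per month
monthly <- aggregate(final1 ~ month, data = toy, FUN = mean)
nrow(monthly)   # 12
```

With ggplot2, monthly can then be plotted with aes(x = month, y = final1, group = 1) and geom_line(); the group = 1 is needed so that a line is drawn across a discrete axis.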
I'm trying to change the x-axis labels on a ts plot from the default (e.g. year.samplenumber) to an actual date. I had already searched in other threads, but the solution I found isn't quite working for me.
mm17
date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860
mm17.ts<-ts(mm17.perday[,2], frequency=365, start=c(2015, 14))
cols<-c("red", "green", "orange", "purple", "blue")
dates<-as.Date(mm17[,1])
ts.plot(mm17.ts, col=cols[1], xaxt="n")
axis(1, dates, format(dates, "%m %d"), cex.axis = .7)
As you can see the axis command isn't working for some reason.
In general, ts class is not a good fit for daily data. It is more suitable for monthly, quarterly and annual data.
For daily data, it would be easier to just convert it to zoo and plot:
library(zoo)
z <- read.zoo(mm17, format = "%Y/%m/%d")
plot(z$fullband, col = "red")
Note
We assume the mm17 is given as shown below.
Lines <- " date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860"
mm17 <- read.table(text = Lines)
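If adding a package is not an option, much the same plot can be sketched in base R by converting the column to Date and drawing the axis with axis.Date(). This assumes mm17 has the layout shown above. It is an alternative to the zoo route, not a description of what read.zoo does internally.

```r
Lines <- " date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860"
mm17 <- read.table(text = Lines)

# Text -> Date, using the same format string as the read.zoo call
d <- as.Date(mm17$date, format = "%Y/%m/%d")

# Suppress the default x axis, then draw one tick per day with date labels
plot(d, mm17$fullband, type = "l", col = "red",
     xaxt = "n", xlab = "", ylab = "fullband")
axis.Date(1, at = d, format = "%m/%d", cex.axis = 0.7)
```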
What's happening here is a mismatch between the underlying numeric values of the dates as plotted by ts.plot and the dates vector. The x-axis dates in the output of ts.plot literally have magnitudes of 2015.1, 2015.2 etc. However, the underlying numeric values of the dates in the dates vector are the number of days from January 1, 1970 to the given date (dates in R are actually numeric values with a Date class attached). For example:
dates
[1] "2015-01-14" "2015-01-15" "2015-01-16" "2015-01-17" "2015-01-18" "2015-01-19"
as.numeric(dates)
[1] 16449 16450 16451 16452 16453 16454
x=16449
class(x)="Date"
x
[1] "2015-01-14"
You can also see this with the following code. We expand the x-axis range to include the numeric values listed above. Note how you can see one of your date labels way out on the right end of the plot at 16,449, while the data values are plotted near the left side at 2015:
ts.plot(mm17.ts, col=cols[1], xlim=c(0, 16455))
axis(1, dates, format(dates, "%m %d"), cex.axis = .8, col.axis="red")
axis(1, 2015, 2015, cex.axis = .8, col.axis="red")
So, let's change the at argument in the axis function so that we get the date labels placed at the correct locations. We'll use a couple of functions from the lubridate package to help with this. Also, note that to remove the default x-axis labels, ts.plot requires that xaxt (and other graphical parameters) be passed as a list using the gpars argument (see the ts.plot help for more on this):
library(lubridate)
ts.plot(mm17.ts, col=cols[1], gpars=list(xaxt="n"))
axis(1, at=year(dates) + yday(dates)/365.24, labels=format(dates, "%m %d"), cex.axis = .7)
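Dividing by 365.24 is an approximation of where ts puts each day. Since the series was built with frequency=365, one hedged alternative is to ask the ts object for its own x positions with time(), which avoids the extra package. The sketch below rebuilds a small series from the six fullband values shown in the question, which is an assumption because mm17.perday itself is not shown, and it presumes one observation per day with no gaps.

```r
fullband <- c(109.0873, 110.1434, 109.8811, 110.4814, 110.1513, 110.3266)
dates <- seq(as.Date("2015-01-14"), by = "day", length.out = 6)
mm17.ts <- ts(fullband, frequency = 365, start = c(2015, 14))

# Decimal-year position of each observation, as used by ts.plot
at <- as.numeric(time(mm17.ts))

ts.plot(mm17.ts, col = "red", gpars = list(xaxt = "n"))
axis(1, at = at, labels = format(dates, "%m %d"), cex.axis = 0.7)
```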
What is the smartest way to manipulate POSIX for use in ggplot axis?
I am trying to create a function for plotting many graphs (One per day) spanning a period of weeks, using POSIX time for the x axis.
To do so, I create an additional integer column DF$Day with the day, which I pass into the function. Then I create a subset using that day, which I plot using ggplot2. I figured out how to use scale_x_datetime to format the POSIX x axis. Basically, I have it show the hours and minutes only, omitting the date.
Here is my question: How can I set the limits for each individual graph in hours of the day?
Below is some working, reproducible code to give an idea. It creates the first day, shows it for 3 seconds and then proceeds to create the second day. But each day's limits are chosen based on the range of the time variable. How can I make the range, for instance, the whole day (0h - 24h)?
library(ggplot2)
library(scales) # for date_format()
DF <- data.frame(matrix(ncol = 0, nrow = 4))
DF$time <- as.POSIXct(c("2010-01-01 02:01:00", "2010-01-01 18:10:00", "2010-01-02 04:20:00", "2010-01-02 13:30:00"))
DF$observation <- c(1,2,1,2)
DF$Day <- c(1,1,2,2)
for (Individual_Day in 1:2) {
Day_subset <- DF[DF$Day == as.integer(Individual_Day),]
print(ggplot( data=Day_subset, aes_string( x="time", y="observation") ) + geom_point() +
scale_x_datetime( breaks=("2 hour"), minor_breaks=("1 hour"), labels=date_format("%H:%M")))
Sys.sleep(3) }
Well, here's one way.
# ...
for (Individual_Day in 1:2) {
Day_subset <- DF[DF$Day == as.integer(Individual_Day),]
lower <- with(Day_subset,as.POSIXct(strftime(min(time),"%Y-%m-%d")))
upper <- with(Day_subset,as.POSIXct(strftime(as.Date(max(time))+1,"%Y-%m-%d"))-1)
limits = c(lower,upper)
print(ggplot( data=Day_subset, aes( x=time, y=observation) ) +
geom_point() +
scale_x_datetime( breaks=("2 hour"),
minor_breaks=("1 hour"),
labels=date_format("%H:%M"),
limits=limits)
)
}
The calculation for lower takes the minimum time in the subset and coerces it to character with only the date part (i.e., strips away the time part). Converting back to POSIXct generates the beginning of that day.
The calculation for upper is a little more complicated. You have to convert the maximum time to a Date value and add 1 (i.e., 1 day), then convert to character (strip off the time part), convert back to POSIXct, and subtract 1 (i.e., 1 second). This generates 23:59:59 on the end day.
Huge amount of work for such a small thing. I hope someone else posts a simpler way to do this...
:)
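One possibly simpler variant, sketched in base R: trunc() can cut a POSIXct value back to midnight, and adding a day's worth of seconds minus one gives the end of the day. The time zone is set to UTC here as an assumption so that every day is exactly 24 hours; in a zone with daylight saving time some days are not.

```r
time <- as.POSIXct(c("2010-01-01 02:01:00", "2010-01-01 18:10:00"), tz = "UTC")

lower <- as.POSIXct(trunc(min(time), units = "days"))   # midnight at the start of the day
upper <- lower + 24 * 60 * 60 - 1                       # 23:59:59 on the same day
limits <- c(lower, upper)

format(limits, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

The resulting limits vector can be passed to scale_x_datetime(limits = limits) in the same way as in the loop above.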
I would like to ask you something about the plot in R. I would be very happy if someone could help me!!!
I wrote an implementation of the Heston model in R. It produces a vector of option prices; let's call it H1.
Each option price (each element of the vector H1) corresponds to one day. The vector is very long (3556 elements). I wanted to plot it and analyse the graph, so I used the function plot(.....). I wanted the dates on the x axis and the option prices on the y axis, so I used axis(1, z) (where z is the vector containing all 3556 dates) and axis(2, H1) (where H1 contains all 3556 option prices).
The problem is that all the dates and all the option prices end up on my graph :/ It looks very bad, and nobody can see anything clearly because of the huge number of dates on the x axis and option prices on the y axis. How can I reduce the number of dates and option prices shown? I mean, how can I label them at some interval?
If this is not clear, please let me know and I will send the whole code.
Thank you very much!!!!!!!!!!!!!!!!!!!!!!!!!! :)
How about using the ggplot2 library for plotting along with plyr::ddply() and cut() to reduce it to 20 (or whatever) intervals?
prices<-10+runif(3000)*3+(1:3000)/3000
dates<-as.Date(1:3000,origin="1980-01-01")
df<-data.frame(prices,dates)
#required libraries
require(plyr) # library for ddply call
require(ggplot2)# plotting library
#call explained below
plotdata<-ddply(df,.(date_group=cut(dates,20)),summarise,avg_price=mean(prices))
ggplot(plotdata) + # base plot
geom_line(aes(date_group,avg_price,group=1)) + # line
geom_smooth(aes(date_group,avg_price,group=1)) + # smooth fit with CI
theme(axis.text.x = element_text(angle = 90, hjust = 1)) # rotate x axis labels
#explanation of ddply function
ddply( #call ddply function, takes a dataframe, returns a dataframe
df, #input data data.frame 'df'
.(date_group=cut(dates,20)), #summarise by cut() - cuts the date into 20 blocks
summarise, #tell to summarise
avg_price=mean(prices) #for each value of the cut (each group), average the prices
)
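If the goal is only to thin out the axis labels rather than to average the prices, a base R sketch along these lines may be enough. It reuses the simulated prices and dates from above (not the real H1 and z), suppresses the default x axis, and then places one tick per year. The yearly spacing is an arbitrary choice.

```r
set.seed(1)
prices <- 10 + runif(3000) * 3 + (1:3000) / 3000
dates  <- as.Date(1:3000, origin = "1980-01-01")

# Plot every point, but let R choose the y ticks and draw no x axis yet
plot(dates, prices, type = "l", xaxt = "n", xlab = "", ylab = "Option price")

# One tick per year instead of one per observation
yearly <- seq(as.Date("1981-01-01"), max(dates), by = "year")
axis.Date(1, at = yearly, format = "%Y")
```

Leaving out the axis(2, H1) call lets plot() pick a handful of evenly spaced price labels on its own.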
The data are a series of dates and times.
date time
2010-01-01 09:04:43
2010-01-01 10:53:59
2010-01-01 10:57:18
2010-01-01 10:59:30
2010-01-01 11:00:44
…
My goal was to represent a scatterplot with the date on the horizontal axis (x) and the time on the vertical axis (y). I guess I could also add a color intensity if there are more than one time for the same date.
It was quite easy to create a histogram of dates.
mydata <- read.table("mydata.txt", header=TRUE, sep=" ")
mydatahist <- hist(as.Date(mydata$date), breaks = "weeks", freq=TRUE, plot=FALSE)
barplot(mydatahist$counts, border=NA, col="#ccaaaa")
I haven't figured out yet how to create a scatterplot where the axes are date and/or time.
I would also like to be able to have axes that are not necessarily linear dates (YYYY-MM-DD), but based on month and day such as MM-DD (so that different years accumulate), or even cycling over weeks.
Any help, RTFM URI slapping or hints is welcome.
The ggplot2 package handles dates and times quite easily.
Create some date and time data:
dates <- as.POSIXct(as.Date("2011/01/01") + sample(0:365, 100, replace=TRUE))
times <- as.POSIXct(runif(100, 0, 24*60*60), origin="2011/01/01")
df <- data.frame(
dates = dates,
times = times
)
Then get some ggplot2 magic. ggplot will automatically deal with dates, but to get the time axis formatted properly use scale_y_datetime():
library(ggplot2)
library(scales)
ggplot(df, aes(x=dates, y=times)) +
geom_point() +
scale_y_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) +
theme(axis.text.x=element_text(angle=90))
Regarding the last part of your question, on grouping by week, etc.: to achieve this you may have to pre-summarize the data into the buckets that you want. You can possibly use plyr for this and then pass the resulting data to ggplot.
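As a rough illustration of such buckets without plyr, base R's cut() and format() can derive the grouping variables directly from a Date vector. The variable names below are made up for the example.

```r
set.seed(42)
dates <- as.Date("2011-01-01") + sample(0:365, 100, replace = TRUE)

week_start <- cut(dates, breaks = "week")   # factor labelled by the first day of each week
per_week   <- table(week_start)             # number of observations per week

month_day  <- format(dates, "%m-%d")        # drops the year, so different years overlay
weekday    <- weekdays(dates)               # day names, for a weekly cycle
```

Any of these columns can be added to the data frame and mapped to an axis or used as a grouping variable.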
I'd start by reading about as.POSIXct, strptime, strftime, and difftime. These and related functions should allow you to extract the desired subsets of your data. The formatting is a little tricky, so play with the examples in the help files.
And, once your dates are converted to a POSIX class, as.numeric() will convert them all to numeric values, hence easy to sort, plot, etc.
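For instance, a small sketch of that numeric route, using two of the timestamps from the question: as.numeric() gives seconds since 1970-01-01, and the remainder after dividing by the seconds in a day gives the time of day. The modulo trick assumes the times are in UTC; for other time zones the offset would need handling.

```r
t <- as.POSIXct(c("2010-01-01 09:04:43", "2010-01-01 10:53:59"), tz = "UTC")

secs <- as.numeric(t)                     # seconds since 1970-01-01 00:00:00 UTC
hours_of_day <- (secs %% 86400) / 3600    # time of day in decimal hours
day <- as.Date(t)                         # date part only

# Date on the x axis, time of day on the y axis
plot(day, hours_of_day, ylim = c(0, 24), xlab = "Date", ylab = "Hour of day")
```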
Edit: Andre's suggestion to play w/ ggplot to simplify your axis specifications is a good one.