Is it possible to have a datetime scale not consider weekends as part of the time continuum? For instance, if I am plotting stock prices over a 2 week period with a line geometry, I do not want to plot a 2 day period of flattness during the weekend. I would like friday to connect with Monday.
I imagine that there's a better way, but you could always just use an index for the plot and then assign the dates as labels afterwards:
p <- qplot(1:3, 1:3, geom='line')
p + scale_x_continuous("", breaks=1:3, labels = as.Date(c("2010-06-03", "2010-06-04", "2010-06-07")))
Related
I am plotting sales over years of variable x over the years.
I already have a variable "quarters" indicating quarter 1,2,3 or 4.
With the following code i plotted unit sales over the years:
ggplot(data = hoogvliet_tess ,aes(x=date, y=UnitSales)) +
geom_line() + ggtitle("UnitSales x over the years")
Is there a possibility to extend the plot with for example vertical lines indicating the quarters of every year?
With kind regard,
#This is my code, I need help in improving the x axis resolution that is monthly (an 2017, Feb 2017.....Dec 2020)
ggplot()+
geom_line(data=IDA_DATA,
aes(y=final1,x= Date,colour="darkblue"),size=1 )+
geom_line(data=IDA_DATA,
aes(y=fpmc2,x= Date,colour="red"),
size=1) +
scale_color_discrete(name = "Y series", labels = c("Adjusted Trend", "Long Term Comp"))+
theme(legend.position = c(.85, .85))+
labs(y="PM10 Conc (ug/m3)")
I am assuming your Date variable is an actual date type (ie. numeric).
If your data is continuous (measurements on multiple days per month), you can use the breaks parameter in scale_x_continuous() to specify each break you want. Your breaks will also have to be in date format, not text strings, so you'll need dmy() to convert a vector of all the first-day-of-the-month dates. That can be tedious if your data spans many years, but I can't think of how else you can force all the month labels to show up on a continuous axis.
If your data can be summarised as a single x value for each entire month, you can convert the Date into a month-year string or factor variable and use that as a discrete x axis. Discrete axis automatically show all the x values, so you won't have to create all the breaks. This should be relatively easy to achieve.
I have a continuous variable y measured on different dates. I need to make boxplots with a box showing the distribution of y for each 5 year interval.
Sample data:
rdob <- as.Date(dob, format= "%m/%d/%y")
ggplot(data = data, aes(x=rdob, y=ageyear)) + geom_boxplot()
#Warning message:
#Continuous x aesthetic -- did you forget aes(group=...)?
This image is the first one I tried. What I want is a box for every five year interval, instead of a box for every year.
Here is a way to pull out the year in base R:
format(as.Date("2008-11-03", format="%Y-%m-%d"), "%Y")
Simply wrap your date vector in a format() and add the "%Y". To get this to be integer, you can use as.integer.
You could also take a look at the year function in the lubridate package which will make this extraction a little bit more straightforward.
One method to get 5 year intervals is to use cut to create a factor variable that creates levels at selected break points. Unless you have dozens of years your best bet would be to set the break points manually:
df$myTimeInterval <- cut(df$years, breaks=c(1995, 2000, 2005, 2010, 2015))
Here's an example taking Dave2e's suggestion of using cut on date intervals along with ggplot's group aesthetic mapping:
library(ggplot2)
n <- 1000
## Randomly sample birth dates and dummy up an effect that trends upward with DOB
dobs <- sample(seq(as.Date('1970/01/01'), Sys.Date(), by="day"), n)
effect <- rnorm(n) + as.numeric(as.POSIXct(dobs)) / as.numeric(as.POSIXct(Sys.Date()))
data <- data.frame(dob=dobs, effect=effect)
## boxplot w/ DOB binned to 5 year intervals
ggplot(data=data, aes(x=dob, y=effect)) + geom_boxplot(aes(group=cut(dob, "5 year")))
library(lubridate)
year=year(rdob)
I'm trying to calculate w/w growth rates entirely in R. I could use excel, or preprocess with ruby, but that's not the point.
data.frame example
date gpv type
1 2013-04-01 12900 back office
2 2013-04-02 16232 back office
3 2013-04-03 10035 back office
I want to do this factored by 'type' and I need to wrap up the Date type column into weeks. And then calculate the week over week growth.
I think I need to do ddply to group by week - with a custom function that determines if a date is in a given week or not?
Then, after that, use diff and find the growth b/w weeks divided by the previous week.
Then I'll plot week/week growths, or use a data.frame to export it.
This was closed but had same useful ideas.
UPDATE: answer with ggplot:
All the same as below, just use this instead of plot
ggplot(data.frame(week=seq(length(gr)), gr), aes(x=week,y=gr*100)) + geom_point() + geom_smooth(method='loess') + coord_cartesian(xlim = c(.95, 10.05)) + scale_x_discrete() + ggtitle('week over week growth rate, from Apr 1') + ylab('growth rate %')
(old, correct answer but using only plot)
Well, I think this is it:
df_net <- ddply(df_all, .(date), summarise, gpv=sum(gpv)) # df_all has my daily data.
df_net$week_num <- strftime(df_net$date, "%U") #get the week # to 'group by' in ddply
df_weekly <- ddply(df_net, .(week_num), summarize, gpv=sum(gov))
gr <- diff(df_weekly$gpv)/df_weekly$gpv[-length(df_weekly$gpv)] #seems correct, but this I don't understand via: http://stackoverflow.com/questions/15356121/how-to-identify-the-virality-growth-rate-in-time-series-data-using-r
plot(gr, type='l', xlab='week #', ylab='growth rate percent', main='Week/Week Growth Rate')
Any better solutions out there?
For the last part, if you want to calculate the growth rate you can take logs and then use diff, with the default parameters lag = 1 (previos week) and difference = 1 (first difference):
df_weekly_log <- log(df_weekly)
gr <- diff(df_weekly_log , lag = 1, differences = 1)
The later is an approximation, valid for small differences.
Hope it helps.
The data are a series of dates and times.
date time
2010-01-01 09:04:43
2010-01-01 10:53:59
2010-01-01 10:57:18
2010-01-01 10:59:30
2010-01-01 11:00:44
…
My goal was to represent a scatterplot with the date on the horizontal axis (x) and the time on the vertical axis (y). I guess I could also add a color intensity if there are more than one time for the same date.
It was quite easy to create an histogram of dates.
mydata <- read.table("mydata.txt", header=TRUE, sep=" ")
mydatahist <- hist(as.Date(mydata$day), breaks = "weeks", freq=TRUE, plot=FALSE)
barplot(mydatahist$counts, border=NA, col="#ccaaaa")
I haven't figured out yet how to create a scatterplot where the axis are date and/or time.
I would like also to be able to have axis not necessary with linear dates YYYY-MM-DD, but also based on months such as MM-DD (so different years accumulate), or even with a rotation on weeks.
Any help, RTFM URI slapping or hints is welcome.
The ggplot2 package handles dates and times quite easily.
Create some date and time data:
dates <- as.POSIXct(as.Date("2011/01/01") + sample(0:365, 100, replace=TRUE))
times <- as.POSIXct(runif(100, 0, 24*60*60), origin="2011/01/01")
df <- data.frame(
dates = dates,
times = times
)
Then get some ggplot2 magic. ggplot will automatically deal with dates, but to get the time axis formatted properly use scale_y_datetime():
library(ggplot2)
library(scales)
ggplot(df, aes(x=dates, y=times)) +
geom_point() +
scale_y_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) +
theme(axis.text.x=element_text(angle=90))
Regarding the last part of your question, on grouping by week, etc: To achieve this you may have to pre-summarize the data into the buckets that you want. You can use possibly use plyr for this and then pass the resulting data to ggplot.
I'd start by reading about as.POSIXct, strptime, strftime, and difftime. These and related functions should allow you to extract the desired subsets of your data. The formatting is a little tricky, so play with the examples in the help files.
And, once your dates are converted to a POSIX class, as.numeric() will convert them all to numeric values, hence easy to sort, plot, etc.
Edit: Andre's suggestion to play w/ ggplot to simplify your axis specifications is a good one.