I have plotted water meter averages for different dates. I want to colour the averages that were measured on weekends. How do I do this, please?
plot <- ggplot(DF, aes(Date, Measurement)) +
geom_point() +
ggtitle('Water Meter Averages') +
xlab('Day No') +
ylab('Measurement in Cubic Feet')
Date <- c("2018-06-25", "2018-06-26", "2018-06-27", "2018-06-28", "2018-06-29", "2018-06-30", "2018-07-01")
Measurement <- c("1","3","5","2","4","5","7")
DF <- data.frame(Date, Measurement)
"2018-06-30" and "2018-07-01" are weekend dates, with the corresponding values 5 and 7. How can I adapt my ggplot code so that R recognizes these dates as weekends and colours the points for these dates on my plot?
First, make sure your data values are actually coded as date/time values in R and not strings or factors. Then you can do
# Make sure class(DF$Date)=="Date"
DF <- data.frame(Date=as.Date(Date), Measurement)
ggplot(DF, aes(Date, Measurement, color=weekdays(Date) %in% c("Saturday","Sunday")))+geom_point() +
ggtitle('Water Meter Averages') +
xlab('Day No') +
ylab('Measurement in Cubic Feet') +
scale_color_discrete(name="Is Weekend")
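One caveat with the approach above is that weekdays() returns day names in the current locale, so the comparison with "Saturday" and "Sunday" can fail on a non-English system. A minimal sketch of a locale-independent variant, assuming the same sample data (the IsWeekend column name and the two colours are illustrative choices, and Measurement is entered as numbers rather than strings so the y-axis is continuous):

```r
library(ggplot2)

Date <- as.Date(c("2018-06-25", "2018-06-26", "2018-06-27", "2018-06-28",
                  "2018-06-29", "2018-06-30", "2018-07-01"))
Measurement <- c(1, 3, 5, 2, 4, 5, 7)  # numeric, not character
DF <- data.frame(Date, Measurement)

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday), regardless of locale
DF$IsWeekend <- as.POSIXlt(DF$Date)$wday %in% c(0, 6)

p <- ggplot(DF, aes(Date, Measurement, colour = IsWeekend)) +
  geom_point() +
  scale_colour_manual(name = "Is Weekend",
                      values = c(`FALSE` = "grey40", `TRUE` = "red"))
p
```

Storing the flag as a column also makes it easy to check which rows were classified as weekends before plotting.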
I feel like I've tried about everything...If I transform the months into factors, I get 16 thousand NA's. As my code is I get the plot to come out, but with the months out of order.
I got the original code here: https://www.r-graph-gallery.com/283-the-hourly-heatmap.html
I've edited it to fit my data, but my months come out of order.
My months are numbers in the csv file (int in r), then changing them to abbreviations makes them characters.
SoilT.data<-read.csv(file="Transect 1 Soil Temp RStudio Number month.csv")
library(ggplot2)
library(dplyr)
library(viridis)
library(ggExtra)
library(lubridate)
df <-SoilT.data %>% select(Lower.Panel,Day,Hourly,Month,Year)
df <- transform(df, MonthAbb = month.abb[Month])
Panel.Area <-unique(df$Lower.Panel)
p <-ggplot(df,aes(Day,Hourly,fill=Lower.Panel))+geom_tile(color= "white",size=0.1)+scale_fill_viridis(name="Hrly Temps",option ="C")
p <-p + facet_grid(Year~MonthAbb)
p <-p + scale_y_continuous(trans = "reverse", breaks = unique(df$Hourly))
p <-p + scale_x_continuous(breaks =c(1,10,20,31))
p <-p + labs(title= paste("Hourly Temperature - Lower Panel",Panel.Area),x="Day", y="Hourly")
p <-p + theme(legend.position = "bottom")+theme(plot.title=element_text(size =14))+theme(axis.text.y=element_text(size=6)) +theme(strip.background =element_rect(colour="white"))+theme(plot.title=element_text(hjust=0))+theme(axis.ticks=element_blank())+theme(axis.text=element_text(size=7))+theme(legend.title=element_text(size=8))+theme(legend.text=element_text(size=6))+removeGrid()
p
You should construct MonthAbb as a factor. That way you can specify the ordering of the levels attribute, which most plotting functions will honor when it comes time to plot.
df <- transform(df, MonthAbb = factor(month.abb[Month], levels = month.abb))
Factor vectors are actually integers that plotting functions use as indices into the levels attribute. The levels are set at the time of creation, either explicitly or by the default (alphabetical) ordering, which is what your heat map code was using.
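To make the difference concrete, here is a small base R sketch comparing the default level order with an explicit calendar order. The Month vector is a made-up stand-in for the integer column read from the CSV.

```r
Month <- c(3, 1, 12, 2)  # hypothetical month numbers, as read from a CSV

# Default: levels are the sorted unique values, i.e. alphabetical
default_order <- factor(month.abb[Month])

# Explicit: levels follow the calendar, Jan through Dec
calendar_order <- factor(month.abb[Month], levels = month.abb)

levels(default_order)       # "Dec" "Feb" "Jan" "Mar"
levels(calendar_order)      # all twelve abbreviations in calendar order
as.integer(calendar_order)  # 3 1 12 2: the integer codes index into the levels
```

Because the codes of calendar_order equal the original month numbers, facets and axes built from it follow the calendar rather than the alphabet.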
I have a dataframe which has 12 columns (one for each month of the year) and an id. Each record in this dataframe corresponds to the transaction amount(in dollars) a customer has made over the course of last twelve months. I want to plot these columns as series. And I also want to plot all the customers in the dataframe. The x-axis will be the month index and y-axis will be dollar value. So basically for each customer I need a line or series chart on the same graph.
Code for generating random data
a <- data.frame(id = seq(1,1000,1))
b <- data.frame(replicate(12,sample(1000:100000,1000,rep=TRUE)))
df <- cbind(a,b)
This is what I tried, but it's not what I want:
library(reshape2)
library(ggplot2)
df_lg <- melt(df, id = 'id') # convert from wide to tall
ggplot(data=df_lg,
aes(x=variable, y=value, colour=variable)) +
geom_line()
Any ideas how to do this?
Just add group to your aesthetics: both colour and group should be mapped to the id variable you want in the legend.
ggplot(data=df_lg,
aes(x=variable, y=value, colour=id, group = id)) +
geom_line()
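With 1000 customers and a numeric id, the legend becomes a continuous colour bar and the x-axis shows the column names X1 to X12. The following is a minimal sketch of one way to tidy that up, assuming a small sample of customers; n_customers, the wide/long object names and the month column are illustrative, and the reshaping is done in base R instead of reshape2.

```r
library(ggplot2)

set.seed(42)
n_customers <- 5  # small hypothetical sample so individual lines stay readable
wide <- data.frame(id = seq_len(n_customers),
                   replicate(12, sample(1000:100000, n_customers, replace = TRUE)))

# Stack the twelve month columns (X1..X12) into long format.
# unlist() walks the columns in order, so ids cycle fastest and months slowest.
long <- data.frame(
  id    = factor(rep(wide$id, times = 12)),
  month = rep(1:12, each = n_customers),
  value = unlist(wide[, -1], use.names = FALSE)
)

# group = id draws one line per customer; a factor id gives a discrete legend
p <- ggplot(long, aes(x = month, y = value, group = id, colour = id)) +
  geom_line() +
  scale_x_continuous(breaks = 1:12)
p
```

For the full set of 1000 customers a legend is unlikely to be readable, so dropping the colour mapping and keeping only group = id may be the more practical choice.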
I have about 20 years of daily data in a time series. It has columns Date, rainfall and other data.
I am trying to plot rainfall vs. time. I want 20 line plots in different colours on one graph, with a legend that shows the years. I tried the following code, but it does not give me the desired result. Any suggestions to fix my issue would be most welcome.
library(ggplot2)
library(seas)
data(mscdata)
p<-ggplot(data=mscdata,aes(x=date,y=precip,group=year,color=year))
p+geom_line()+scale_x_date(labels=date_format("%m"),breaks=date_breaks("1 months"))
It doesn't look great, but here's a method. We first coerce the data into dates in the same year:
mscdata$dayofyear <- as.Date(format(mscdata$date, "%j"), format = "%j")
Then we plot:
library(ggplot2)
library(scales)
p <- ggplot(data = mscdata, aes(x = dayofyear, y = precip, group = year, color = year))
p + geom_line() +
scale_x_date(labels = date_format("%m"), breaks = date_breaks("1 months"))
While I agree with @Jaap that this may not be the best way to depict these data, try the following:
mscdata$doy <- as.numeric(strftime(mscdata$date, format="%j"))
ggplot(data=mscdata,aes(x=doy,y=precip,group=year)) +
geom_line(aes(color=year))
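Both answers above rely on the "%j" format, which gives the day of the year. A short base R sketch of what that conversion produces, using a few made-up dates instead of mscdata:

```r
dates <- as.Date(c("2000-03-01", "2001-03-01", "2001-12-31"))

doy  <- as.integer(format(dates, "%j"))  # day of year, 1 to 366
year <- as.integer(format(dates, "%Y"))

data.frame(dates, year, doy)
# 1 March is day 61 in the leap year 2000 but day 60 in 2001
```

The leap-year shift means that the same calendar date can differ by one day between years on the shared x-axis, which is usually negligible for a 20-year overview.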
Although the given answers are good answers to your question as it stands, I don't think they will solve your problem. I think you should be looking at a different way to present the data. @Jaap already suggested using facets. Take for example this approach:
#first add a month column to your dataframe
mscdata$month <- format(mscdata$date, "%m")
#then plot it using boxplot with year on the X-axis and month as facet.
p1 <- ggplot(data = mscdata, aes(x = year, y = precip, group=year))
p1 + geom_boxplot(outlier.shape = 3) + facet_wrap(~month)
This will give you a graph per month, showing the rainfall per year next to each other. Because I use a boxplot, the peaks in rainfall show up as points ('normal' rain events are inside the box).
Another possible approach would be to use stat_summary.
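As a rough illustration of the stat_summary idea, the sketch below plots the monthly mean per year as one line per year. It uses made-up data instead of mscdata, and it assumes ggplot2 3.3.0 or later, where the summary function is passed as fun (older versions used fun.y).

```r
library(ggplot2)

# Hypothetical two-year daily series standing in for mscdata
rain <- data.frame(date = seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day"))
rain$precip <- seq_along(rain$date) %% 7         # made-up values
rain$year   <- factor(format(rain$date, "%Y"))
rain$month  <- as.integer(format(rain$date, "%m"))

# stat_summary collapses all days of a month to their mean, per year
p <- ggplot(rain, aes(x = month, y = precip, colour = year, group = year)) +
  stat_summary(fun = mean, geom = "line") +
  scale_x_continuous(breaks = 1:12)
p

# The same summary as a table, computed in base R
monthly <- aggregate(precip ~ year + month, data = rain, FUN = mean)
```

Summarising first trades the daily peaks for a much less cluttered plot, so it suits a comparison of seasonal patterns better than a search for extreme events.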
I have two problems handling my time variable in GNU R!
Firstly, I cannot recode the time data (downloadable here) from factor (or character) with as.POSIXlt or with as.Date without an error message like this:
character string is not in a standard unambiguous format
I then tried to convert my time data with:
dates <- strptime(time, "%Y-%m-%j")
which only gives me:
NA
Secondly, the reason I wanted (had) to convert my time data is that I want to plot it with ggplot2 and adjust my scale_x_continuous (as described here) so that it only labels every 50 years (i.e. 1250-01-01, 1300-01-01, etc.) on the x-axis; otherwise the x-axis is too busy (see graph below).
This is the code I use:
library(ggplot2)
library(scales)
library(reshape)
df <- read.csv(file="https://dl.dropboxusercontent.com/u/109495328/time.csv")
attach(df)
dates <- as.character(time)
population <- factor(Number_Humans)
ggplot(df, aes(x = dates, y = population)) + geom_line(aes(group=1), colour="#000099") + theme(axis.text.x=element_text(angle=90)) + xlab("Time in Years (A.D.)")
You need to remove the quotation marks in the date column, then you can convert it to date format:
df <- read.csv(file="https://dl.dropboxusercontent.com/u/109495328/time.csv")
df$time <- gsub('\"', "", as.character(df$time), fixed=TRUE)
# %d is the day of the month; %j (day of the year) does not match year-month-day strings
df$time <- as.Date(df$time, "%Y-%m-%d")
ggplot(df, aes(x = time, y = Number_Humans)) +
geom_line(colour="#000099") +
theme(axis.text.x=element_text(angle=90)) +
xlab("Time in Years (A.D.)")
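The answer above covers the conversion; the second part of the question (a label only every 50 years) can be handled once the column is a real Date. A minimal base R sketch, assuming year-month-day strings wrapped in literal quote characters; the raw values and the year_breaks name are made up for illustration:

```r
# Hypothetical raw values that still carry literal quote characters
raw <- c('"1250-01-01"', '"1275-01-01"', '"1400-01-01"')

clean <- gsub('"', "", raw, fixed = TRUE)    # strip the quotes
time  <- as.Date(clean, format = "%Y-%m-%d") # year-month-day

# One break every 50 years across the range of the data
year_breaks <- seq(as.Date("1250-01-01"), as.Date("1400-01-01"), by = "50 years")
format(year_breaks, "%Y")  # "1250" "1300" "1350" "1400"
```

A vector like year_breaks can then be passed to scale_x_date(breaks = year_breaks, date_labels = "%Y") in the plot, in place of scale_x_continuous.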
Say we have the following simple data-frame of date-value pairs, where some dates are missing in the sequence (i.e. Jan 12 thru Jan 14). When I plot the points, it shows these missing dates on the x-axis, but there are no points corresponding to those dates. I want to prevent these missing dates from showing up in the x-axis, so that the point sequence has no breaks. Any suggestions on how to do this? Thanks!
dts <- c(as.Date( c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16')))
df <- data.frame(dt = dts, val = seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point() +
scale_x_date(date_labels = '%d%b', date_breaks = '1 day') # older ggplot2 versions: format = '%d%b', major = 'days'
I made a package that does this. It's called bdscale and it's on CRAN and github. Shameless plug.
To replicate your example:
> library(bdscale)
> library(ggplot2)
> library(scales)
> dts <- as.Date( c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
> ggplot(df, aes(x=dt, y=val)) + geom_point() +
scale_x_bd(business.dates=dts, labels=date_format('%d%b'))
But what you probably want is to load known valid dates, then plot your data using the valid dates on the x-axis:
> nyse <- bdscale::yahoo('SPY') # get valid dates from SPY prices
> dts <- as.Date('2011-01-10') + 1:10
> df <- data.frame(dt=dts, val=seq_along(dts))
> ggplot(df, aes(x=dt, y=val)) + geom_point() +
scale_x_bd(business.dates=nyse, labels=date_format('%d%b'), max.major.breaks=10)
Warning message:
Removed 3 rows containing missing values (geom_point).
The warning is telling you that it removed three dates:
15th = Saturday
16th = Sunday
17th = MLK Day
Turn the date data into a factor then. At the moment, ggplot is interpreting the data in the sense you have told it the data are in - a continuous date scale. You don't want that scale, you want a categorical scale:
require(ggplot2)
dts <- as.Date( c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
df <- data.frame(dt = dts, val = seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point() +
scale_x_date(date_labels = '%d%b', date_breaks = '1 day') # older ggplot2 versions: format = '%d%b', major = 'days'
versus
df <- data.frame(dt = factor(format(dts, format = '%d%b')),
val = seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()
which produces:
Is that what you wanted?
The first question is: why do you want to do that? There is no point in showing a coordinate-based plot if your axes are not coordinates. If you really want to do this, you can convert to a factor. Be careful with the order, though:
dts <- c(as.Date( c('31-10-2011', '01-11-2011', '02-11-2011',
'05-11-2011'),format="%d-%m-%Y"))
dtsf <- format(dts, format= '%d%b')
df <- data.frame(dt=ordered(dtsf,levels=dtsf),val=seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()
With factors you have to be careful, because the level order does not follow the data unless you set it yourself (for example with an ordered factor). Since factor levels are sorted alphabetically by default, you can get into trouble with some date formats. If you don't take the order into account, you get:
df <- data.frame(dt=factor(dtsf),val=seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()
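The ordering problem can be seen without plotting anything. The sketch below uses a numeric day-month label ("%d-%m") instead of '%d%b' purely to keep the output independent of the locale, and assumes the dates are already in chronological order.

```r
dts <- as.Date(c("2011-10-31", "2011-11-01", "2011-11-02", "2011-11-05"))
lab <- format(dts, "%d-%m")

# Default levels are sorted as text, so 31 October ends up last
levels(factor(lab))                        # "01-11" "02-11" "05-11" "31-10"

# Taking the levels from the data keeps the chronological order
levels(factor(lab, levels = unique(lab)))  # "31-10" "01-11" "02-11" "05-11"
```

If the dates are not sorted to begin with, sorting dts before formatting gives the same chronological levels.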