I created a boxplot with ggplot with the following data.frame:
library(lubridate)
library(ggplot2)
library(plotly)  # ggplotly() is provided by the plotly package
df <- data.frame(
time = c("00:43:20", "00:44:30","00:45:40"),
sex = c("m","m","m")
)
df$sex <- factor(df$sex)
df$time <- lubridate::hms(df$time)
Now I created my boxplot with ggplot
g <- ggplot(df) +
geom_boxplot(aes(sex, time)) +
scale_y_time()
Everything looks fine, so now I make it interactive with ggplotly():
plotly::ggplotly(g)
But when I hover over the boxplot, I just see seconds, not the lubridate format.
How can I manage to see the data as shown on the y-axis?
The problem is fairly involved, as far as I understand it. The main issue seems to be that lubridate stores times as Period objects. ggplot2 treats them as seconds, and scale_y_time() only changes how the axis labels are displayed, so plotly shows the underlying seconds as well.
As far as I can tell, the workaround is to convert the time to a numeric number of minutes. Note that the fractional part is then decimal, so 0.5 minutes means 30 seconds, not 50.
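To make the decimal-minute point concrete, here is a small sketch using the three times from the question. The `mmss` strings are only an illustration for reading the values back; they are an assumption of this example, not something plotly displays on its own.

```r
library(lubridate)

times <- c("00:43:20", "00:44:30", "00:45:40")

# Total number of seconds represented by each Period
secs <- period_to_seconds(hms(times))

# Decimal minutes: the fraction is a share of a minute, not a count of seconds
mins <- secs / 60

# Optional: turn the seconds back into "MM:SS" strings for labelling
mmss <- sprintf("%02d:%02d", as.integer(secs %/% 60), as.integer(secs %% 60))

print(data.frame(times, secs, mins, mmss))
```

Here 44.5 minutes corresponds to 44 minutes and 30 seconds, which is the mismatch to keep in mind when reading the hover values.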
1st option with ggplot:
library(plotly)
library(ggplot2)
library(lubridate)
# calculate time as minutes elapsed and store it as numeric
mins <- as.numeric(lubridate::hms(df$time) - hms("00:00:00"))/60
df$sex <- factor(df$sex)
df$time <- mins
g <- ggplot2::ggplot(df) +
ggplot2::geom_boxplot(aes(sex, time))
plotly::ggplotly(g)
2nd option, with plotly directly. This only covers the time data; I am not sure whether sex can be added as x or whether a second trace is needed, and some cosmetic work remains. In any case, ggplot gives practically the same result:
plotly::plot_ly(y = ~mins, type = "box")
Possibly there is a better solution - I just could not figure it out in the last 2 hours ;(
Related
I have a very simple question here. I have a dataset from 2009-2012. I want to plot the data with facets. I have created the faceted plot as follows.
R-code
ggplot(al02428400,aes(x=date,y=as.numeric(Discharge)))+geom_line()+ylab("Discharge(cfs)")+facet_wrap(~Year,scales=("free_x"))+theme_bw()
The output of the above R code is as follows:
On the x-axis I only want to show the month. By default it shows month and year. Is there any way I can get rid of the year?
The fully reproducible code is as follows:
library(ggplot2)
url <- "http://nwis.waterdata.usgs.gov/usa/nwis/uv/?cb_00060=on&cb_00065=on&format=rdb&period=&begin_date=2009-01-01&end_date=2012-12-31&site_no=02428400"
download.file(url,destfile="Data load for stations/data/alabamariver-at-monroeville-2009.txt")
al02428400 <- read.table("Data load for stations/data/alabamariver-at-monroeville-2009.txt",header=T,skip=1,sep="\t")
head(al02428400)
sapply(al02428400,class)
al02428400 <- al02428400[-1,]
names(al02428400)<- c("Agency","SiteNo","Datetime", "TZ","Discharge","Status","Gageheight","gstatus")
al02428400$date <- strptime(al02428400$Datetime, format="%Y-%m-%d %H:%M")
al02428400$Discharge <- as.numeric(as.character(al02428400$Discharge))
al02428400$Year <- as.numeric(format(al02428400$date, "%Y"))
ggplot(al02428400,aes(x=date,y=as.numeric(Discharge)))+geom_line()+ylab("Discharge(cfs)")+facet_wrap(~Year,scales=("free_x"))+theme_bw()
Thanks.
As your x values are date-times, you can use scale_x_datetime() to change the format of the labels. The scales library is needed to get better formatting of breaks and labels.
library(scales)
+scale_x_datetime(labels = date_format("%b"))
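As a self-contained sketch of the same idea, the following uses made-up daily data (the `dat` data frame and its random discharge values are assumptions for illustration, not the USGS data from the question) and the `date_labels` argument of scale_x_datetime() to show abbreviated month names only.

```r
library(ggplot2)

set.seed(1)

# Made-up daily series covering 2009-2012, stored as POSIXct
dat <- data.frame(
  date = seq(as.POSIXct("2009-01-01", tz = "UTC"),
             as.POSIXct("2012-12-31", tz = "UTC"),
             by = "day")
)
dat$Discharge <- runif(nrow(dat), min = 100, max = 1000)
dat$Year <- as.numeric(format(dat$date, "%Y"))

# One panel per year; "%b" prints the abbreviated month name without the year
p <- ggplot(dat, aes(x = date, y = Discharge)) +
  geom_line() +
  ylab("Discharge (cfs)") +
  facet_wrap(~Year, scales = "free_x") +
  scale_x_datetime(date_labels = "%b") +
  theme_bw()

print(p)
```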
For me, what worked was
library(scales)
+ scale_x_date(date_labels = "%b-%d-%Y")
I have a basic dataframe with 3 columns: (i) a date (when a sample was taken); (ii) a site location and (iii) a binary variable indicating what the condition was when sampling (e.g. wet versus dry).
Some reproducible data:
df <- data.frame(Date = rep(seq(as.Date("2010-01-01"), as.Date("2010-12-01"), by="months"),times=2))
df$Site <- c(rep("Site.A",times = 12),rep("Site.B",times = 12))
df$Condition<- as.factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0))
What I would like to do is use ggplot to create a bar chart indicating the condition of each site (y axis) over time (x axis) - the condition indicated by a different colour. I am guessing some kind of flipped barplot would be the way to do this, but I cannot figure out how to tell ggplot2 to recognise the values chronologically, rather than summed for each condition. This is my attempt so far which clearly doesn't do what I need it to.
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()
So I have 2 questions. Firstly, how do I tell ggplot to recognise changes in condition over time and not just group each condition in a traditional stacked bar chart?
Secondly, it seems ggplot converts the date to a numerical value, how would I reformat the x-axis to show a time period, e.g. in a month-year format? I have tried doing this via the scale_x_date function, but get an error message.
labDates <- seq(from = (head(df$Date, 1)),
to = (tail(df$Date, 1)), by = "1 months")
Datelabels <-format(labDates,"%b %y")
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()+
scale_x_date(labels = Datelabels, breaks=labDates)
I have also tried converting sampling times to factors and displaying these instead. Below I have done this by changing each sampling period to a letter (in my own code, the factor levels are in a month-year format - I put letters here for simplicity). But I cannot format the axis to place each level of the factor as a tick mark. Either a date or factor solution for this second question would be great!
df$Factor <- as.factor(unique(df$Date))
levels(df$Factor) <- list(A = "2010-01-01", B = "2010-02-01",
C = "2010-03-01", D = "2010-04-01", E = "2010-05-01",
`F` = "2010-06-01", G = "2010-07-01", H = "2010-08-01",
I = "2010-09-01", J = "2010-10-01", K= "2010-11-01", L = "2010-12-01")
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()+
scale_y_discrete(breaks=as.numeric(unique(df$Date)),
labels=levels(df$Factor))
Thank you in advance!
It doesn't really make sense to use geom_bar() here, since you do not want to summarise the data and need to show the condition over time.
I would rather use geom_line() and increase the line thickness if you want to portray a bar chart.
library(tidyr)
library(dplyr)
library(ggplot2)
library(scales)
library(lubridate)
df <- data.frame(Date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), by="months"),times=2))
df$Site <- c(rep("Site.A",times = 12),rep("Site.B",times = 12))
df$Condition<- as.factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0))
df$Date <- ymd(df$Date)
ggplot(df) +
geom_line(aes(y=Site,x=Date,color=Condition),size=10)+
scale_x_date(labels = date_format("%b-%y"))
Note that using coord_flip() does not work here either; I think it is what causes the Date issue. See the threads below:
how to use coord_carteisan and coord_flip together in ggplot2
In ggplot2, coord_flip and free scales don't work together
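A possible alternative to thick lines, sketched here as an assumption rather than as the method of the answer above, is geom_tile() with Date on x and Site on y. It needs no coord_flip(), so scale_x_date() can be applied directly.

```r
library(ggplot2)

# Same data as in the question
df <- data.frame(
  Date = rep(seq(as.Date("2010-01-01"), as.Date("2010-12-01"), by = "month"),
             times = 2)
)
df$Site <- rep(c("Site.A", "Site.B"), each = 12)
df$Condition <- factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
                         0,0,0,0,0,1,1,0,0,0,0,0))

# One tile per site and month, coloured by condition
p <- ggplot(df, aes(x = Date, y = Site, fill = Condition)) +
  geom_tile(height = 0.5) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %y")

print(p)
```

Because months differ in length, the tiles may show small gaps; the tile width can be adjusted if that matters.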
I have time-series data covering four years. Now I want to plot the same data year-wise and do a comparative analysis. The dummy data is as follows:
library(xts)
library(ggplot2)
timeindex <- seq(as.POSIXct('2016-01-01'),as.POSIXct('2016-12-31 23:59:59'), by = "1 mins")
dataframe <- data.frame(year1=rnorm(length(timeindex),100,10),year2=rnorm(length(timeindex),150,7),
year3=rnorm(length(timeindex),200,3),
year4=rnorm(length(timeindex),350,4))
xts_df <- xts(dataframe,timeindex)
Now, when I use ggplot, it takes too long to plot all the series using the following line:
visualize_dataframe_all_columns(xts_df)
The above function is defined as:
visualize_dataframe_all_columns <- function(xts_data) {
library(RColorBrewer)# to increase no. of colors
library(plotly)
dframe <- data.frame(timeindex=index(xts_data),coredata(xts_data))
df_long <- reshape2::melt(dframe,id.vars = "timeindex")
colourCount = length(unique(df_long$variable))
getPalette = colorRampPalette(brewer.pal(8, "Dark2"))(colourCount) # brewer.pal(8, "Dark2") or brewer.pal(9, "Set1")
g <- ggplot(df_long,aes(timeindex,value,col=variable,group=variable))
g <- g + geom_line() + scale_colour_manual(values=getPalette)
ggplotly(g)
}
Problems with above approach are:
It takes long time to plot. Can I reduce the plot time?
It is very difficult to zoom into the plot using plotly. Is there a better way?
Are there any better approaches to visualize this data?
I faced more or less the same problem with data at a 10-minute frequency. The question, however, is whether it makes sense to plot minute data for a whole year: the human eye cannot distinguish points at that density.
I would create a daily xts from that data and plot that for the year, and modify the function so that it plots the minute data only for a chosen period of time.
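The two suggestions can be sketched as follows. This is a minimal illustration using base R's aggregate() instead of xts, with one week of made-up minute data in two columns; the names `dframe`, `daily` and `one_day` are assumptions of this example.

```r
set.seed(42)

# One week of made-up minute-level data
timeindex <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2016-01-07 23:59:00", tz = "UTC"),
                 by = "1 min")
dframe <- data.frame(
  timeindex = timeindex,
  year1 = rnorm(length(timeindex), 100, 10),
  year2 = rnorm(length(timeindex), 150, 7)
)

# 1) Daily means: far fewer points to draw for a whole-year overview
dframe$day <- as.Date(dframe$timeindex, tz = "UTC")
daily <- aggregate(cbind(year1, year2) ~ day, data = dframe, FUN = mean)

# 2) Minute-level detail only for a chosen window (here a single day)
from <- as.POSIXct("2016-01-03", tz = "UTC")
to   <- as.POSIXct("2016-01-04", tz = "UTC")
one_day <- dframe[dframe$timeindex >= from & dframe$timeindex < to, ]

print(daily)
print(nrow(one_day))
```

The daily table can be passed to the plotting function for the yearly view, and the window subset for zoomed-in inspection.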
The data are a series of dates and times.
date time
2010-01-01 09:04:43
2010-01-01 10:53:59
2010-01-01 10:57:18
2010-01-01 10:59:30
2010-01-01 11:00:44
…
My goal was to represent a scatterplot with the date on the horizontal axis (x) and the time on the vertical axis (y). I guess I could also add a color intensity if there are more than one time for the same date.
It was quite easy to create a histogram of dates.
mydata <- read.table("mydata.txt", header=TRUE, sep=" ")
mydatahist <- hist(as.Date(mydata$date), breaks = "weeks", freq=TRUE, plot=FALSE)
barplot(mydatahist$counts, border=NA, col="#ccaaaa")
I haven't figured out yet how to create a scatterplot where the axes are date and/or time.
I would also like to be able to use axes that are not necessarily linear dates (YYYY-MM-DD), but are based on month and day such as MM-DD (so that different years accumulate), or that even wrap around by week.
Any help, RTFM URI slapping or hints is welcome.
The ggplot2 package handles dates and times quite easily.
Create some date and time data:
dates <- as.POSIXct(as.Date("2011/01/01") + sample(0:365, 100, replace=TRUE))
times <- as.POSIXct(runif(100, 0, 24*60*60), origin="2011/01/01")
df <- data.frame(
dates = dates,
times = times
)
Then get some ggplot2 magic. ggplot will automatically deal with dates, but to get the time axis formatted properly use scale_y_datetime():
library(ggplot2)
library(scales)
ggplot(df, aes(x=dates, y=times)) +
geom_point() +
scale_y_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) +
theme(axis.text.x=element_text(angle=90))
Regarding the last part of your question, on grouping by week, etc.: to achieve this you may have to pre-summarize the data into the buckets that you want. You can possibly use plyr for this and then pass the resulting data to ggplot.
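A minimal sketch of such pre-summarizing, using base R's format() and table() rather than plyr. The four timestamps are made up for illustration (two are taken from the sample in the question, two are invented so that more than one year is present).

```r
# Made-up timestamps spanning two years
stamps <- as.POSIXct(c("2010-01-01 09:04:43",
                       "2010-01-01 10:53:59",
                       "2011-01-01 10:57:18",
                       "2011-03-15 10:59:30"), tz = "UTC")
df <- data.frame(stamp = stamps)

# Bucket by month-day, so the same calendar day in different years accumulates
df$month_day <- format(df$stamp, "%m-%d")

# Bucket by ISO weekday number (1 = Monday, ..., 7 = Sunday)
df$weekday <- format(df$stamp, "%u")

# Time of day as decimal hours, usable as a y value
df$hour <- as.numeric(format(df$stamp, "%H")) +
  as.numeric(format(df$stamp, "%M")) / 60

# Number of events per month-day bucket
counts <- as.data.frame(table(month_day = df$month_day))
print(counts)
```

The `counts` table, or `df` with its bucket columns, can then be handed to ggplot in place of the raw timestamps.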
I'd start by reading about as.POSIXct, strptime, strftime, and difftime. These and related functions should allow you to extract the desired subsets of your data. The formatting is a little tricky, so play with the examples in the help files.
And, once your dates are converted to a POSIX class, as.numeric() will convert them all to numeric values, hence easy to sort, plot, etc.
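As a small sketch of those functions working together, using two timestamps from the sample data in the question (the variable names are assumptions of this example):

```r
raw <- c("2010-01-01 09:04:43", "2010-01-01 10:53:59")

# Parse the text, then store it as POSIXct (seconds since 1970-01-01 UTC)
parsed <- as.POSIXct(strptime(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# Numeric form: easy to sort, subtract and plot
secs <- as.numeric(parsed)

# Elapsed time between the two observations, in minutes
gap <- difftime(parsed[2], parsed[1], units = "mins")

# Back to text, keeping only hours and minutes
hhmm <- strftime(parsed, format = "%H:%M", tz = "UTC")

print(secs)
print(gap)
print(hhmm)
```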
Edit: Andre's suggestion to play w/ ggplot to simplify your axis specifications is a good one.
I'm a graphic designer who is trying to use R to create graphs that are too complicated for Excel. I'm specifically trying to create an irregular time series step chart. I've had no problems creating a regular time series chart, but for some reason, the irregular dates are throwing everything off.
I'm starting with a basic text file with two columns of data:
01-04-1940 4
05-29-1963 35
12-02-2002 24
I've loaded the data using
d <- read.delim("file.txt", header = TRUE)
and I've converted the first column to the Date class using
d$date <- as.Date(d$date, format = "%m-%d-%Y")
But at this point, I can't find any more information anywhere on how to proceed. I've seen the R package "ITS", but I cannot find any documentation on it beyond technical descriptions of the classes involved.
I'd much appreciate it if someone with some experience in R could point out the few lines of code I need to create this graph. Thanks!
ggplot deals quite nicely with data in date format. Here are some suggestions:
library(ggplot2)
d <- data.frame(
date = c("01-04-1940", "05-29-1963", "12-02-2002"),
value = c(4, 35, 24)
)
d$date <- as.Date(d$date, format = "%m-%d-%Y")
ggplot(d, aes(x=date, y=value)) + geom_step(colour="blue")
ggplot(d, aes(x=date, y=value)) + geom_line(colour="red")
I would use xts/zoo. They both handle irregular time series easily.
library(zoo)
z <- zoo(d[,2], d[,1])
plot(z)
plot(z, type="s")