I always struggle with this, so I think it is finally time to ask for some help.
I tried to make a reproducible example, but for some reason I cannot get my x$monthday into the %m-%d format.
x <- data.frame(seq(as.POSIXct('2012-10-01'), as.POSIXct('2015-03-01'), by = "day"))
names(x) <- "date"
x$month <- months(x$date)
x$monthday <- as.POSIXct(x$date, format = "%m-%d")
x1 <- x[x$month == 'October' | x$month == 'November' | x$month == 'December' |
          x$month == 'January' | x$month == 'February', ]
y <- 1:nrow(x1)
x2 <- cbind(x1, y)
x3 <- aggregate(list(y = x2$y), list(monthday = x2$monthday), mean)
plot(x3$monthday, x3$y)
The dates are in %m/%d format and the time series runs from October to March.
R orders the axis neatly from January to December, which leaves a big gap in the middle because my data run from October to March.
How can I make my x axis run from October to March?
Thank you very much in advance.
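A note on the %m-%d problem in the question: the format argument of as.POSIXct() tells R how to parse input strings, so it cannot change how an existing date-time is displayed; format() is what produces an "%m-%d" string. Such strings sort alphabetically (January first), which is exactly the ordering problem described above, so one option, sketched here in base R, is a factor whose levels run through the season. The names d, monthday and season_levels are illustrative, not from the question.

```r
# Two example date-times (illustrative values)
d <- as.POSIXct(c("2012-10-01", "2013-02-14"), tz = "UTC")

# format() turns a date-time into a character string such as "10-01"
monthday <- format(d, "%m-%d")

# Every month-day from 1 October to 31 March, in season order.
# 1999-10 to 2000-03 is used because 2000 is a leap year (keeps "02-29").
season_levels <- format(seq(as.Date("1999-10-01"), as.Date("2000-03-31"), by = "day"),
                        "%m-%d")

# As a factor, "10-01" now sorts before "02-14"
monthday_f <- factor(monthday, levels = season_levels)
```

Keep in mind that plot() draws box plots when x is a factor, so for a point or line chart you may prefer plotting as.integer(monthday_f) with custom axis labels.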
library(dplyr)
library(ggplot2)
library(lubridate)
# Fake data
dat <- data.frame(date = seq(as.POSIXct('2012-10-01'), as.POSIXct('2015-03-01'), by = "day"))
set.seed(23)
dat$temperature <- cumsum(rnorm(nrow(dat)))
# Subset to October - February (the same months the question keeps)
dat <- dat[months(dat$date) %in% month.name[c(1:2, 10:12)], ]
# Calculate mean daily temperature
dat <- dat %>%
  group_by(Month = month(date), Day = day(date)) %>%
  summarise(dailyMeanTemp = mean(temperature)) %>%
  mutate(newDate = as.Date(ifelse(Month %in% 10:12,
                                  paste0("2014-", Month, "-", Day),
                                  paste0("2015-", Month, "-", Day))))
The mutate() call above assigns a fake year (2014 for October to December, 2015 for the other months) only so that the values stay in Date format and sort from October onwards. There is probably a better way to do it (perhaps a function in the zoo or xts packages), but this seems to work.
ggplot(dat, aes(newDate, dailyMeanTemp)) +
geom_line() + geom_point() +
labs(y="Mean Temperature", x="Month")
Or, in base graphics:
plot(dat$newDate, dat$dailyMeanTemp)
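The fake-year trick can also be packaged as a small base R function. This is a sketch, not the answer's code: to_season_date() is a hypothetical helper that maps any date onto a dummy season from 1 October 1999 to 30 September 2000 (2000 is a leap year, so 29 February survives), after which ordinary date sorting runs from October onwards. It reads month and day with format(), which works for both Date and POSIXct input.

```r
# Hypothetical helper: move every date into a dummy season year so that
# October-December (dummy year 1999) sort before January-September (2000).
to_season_date <- function(d) {
  md <- format(d, "%m-%d")              # e.g. "10-01"; works for Date and POSIXct
  m  <- as.integer(substr(md, 1, 2))    # month number 1..12
  as.Date(paste(ifelse(m >= 10, 1999, 2000), md, sep = "-"))
}

d <- as.Date(c("2013-01-15", "2012-10-01", "2014-12-31", "2012-02-29"))
s <- to_season_date(d)
sort(s)
```

After aggregating by month-day, the result can be used directly as the x variable in plot() or ggplot(), where it behaves as an ordinary Date axis.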
Related
Let's say I have a list of dates from March 1st to July 15th:
daterange = as.data.frame(seq(as.Date("2020-3-1"), as.Date("2020-7-15"), "days"))
I want to group the dates into days 1-15 and days 16-30/31 of each month. So the dates in March would be split into two groups, Mar 1-15 and Mar 16-31, and likewise for every other month.
I know the lubridate package can group dates by week, but I don't know how to set a custom range.
Thanks
We can create a logical vector on the day of the month (day > 15) as well as a group on yearmon:
library(dplyr)
library(zoo)
library(lubridate)
library(stringr)
daterange2 <- daterange %>%
  setNames('Date') %>%
  group_by(yearmon = as.yearmon(Date),
           Daygroup = (day(Date) > 15) + 1) %>%
  mutate(Label = str_c(format(Date, '%b'),
                       str_c(min(day(Date)), max(day(Date)), sep = '-'),
                       sep = ' '))
Using base R, you can create two groups in each month by pasting the month abbreviation of each date together with 1 (days 1-15) or 2 (day 16 onwards).
newdaterange <- transform(setNames(daterange, "date"),
                          group = paste0(format(date, "%b"), '-group-',
                                         ifelse(as.integer(format(date, "%d")) > 15, 2, 1)))
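A base R variant that also produces readable labels like the dplyr answer's "Mar 1-15" could look like the sketch below. It groups on year-month plus half, so the same month in different years never merges, and then takes the first and last day actually present in each group with ave(). The data frame name dates_df is illustrative, and the %b month abbreviations depend on your locale.

```r
dates_df <- data.frame(date = seq(as.Date("2020-03-01"), as.Date("2020-07-15"), by = "day"))

day  <- as.integer(format(dates_df$date, "%d"))
half <- ifelse(day > 15, 2L, 1L)          # 1 = days 1-15, 2 = day 16 to month end

# Group key such as "2020-03-1"; including the year keeps years apart
dates_df$group <- paste(format(dates_df$date, "%Y-%m"), half, sep = "-")

# First and last day present in each group (July stops at the 15th here)
first_day <- ave(day, dates_df$group, FUN = min)
last_day  <- ave(day, dates_df$group, FUN = max)
dates_df$label <- paste0(format(dates_df$date, "%b"), " ", first_day, "-", last_day)

table(dates_df$group)
```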
I am trying to use lubridate to process time series data from my temperature sensors. I would ultimately like a plot with time on the x axis and temperature on the y axis. I have been using parse_date_time() to create a new date variable, but all I get is NA.
temps<-temps %>% as_tibble() %>%
mutate(date = parse_date_time(Date.Time..GMT..0500, "mdYHM"))
temps
The problem is that you used a capital Y, but the year part contains only two digits, so you should use a lowercase y, i.e.
library(dplyr)
library(lubridate)
temps <- temps %>% as_tibble() %>%
  mutate(date = parse_date_time(Date.Time..GMT..0500, "mdyHM"))
To produce a simple plot, here is some basic code:
library(ggplot2)
ggplot(temps) +
  aes(x = date, y = TempF) +
  geom_line()
For further details on the plot itself, I suggest having a look at the ggplot2 documentation.
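The same two-digit versus four-digit year distinction exists in base R's strptime() format codes, which as.POSIXct() uses. A minimal base R sketch, assuming timestamps shaped like the sample data in this thread (month/day/two-digit year, then hour:minute) and treating them as UTC for simplicity, although the real sensor column is labelled GMT-05:00:

```r
# Timestamps in the same shape as the sensor file: m/d/yy H:M
stamps <- c("6/18/18 12:57", "6/18/18 13:57", "6/18/18 14:57")

# %y = two-digit year (00-68 become 20xx), %Y would expect a four-digit year
parsed <- as.POSIXct(stamps, format = "%m/%d/%y %H:%M", tz = "UTC")
parsed
```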
With my sample data, it worked:
library(lubridate)
temps <- data.frame(
Date.Time..GMT..0500 = c("6/18/18 12:57", "6/18/18 13:57", "6/18/18 14:57"),
var = c(1,2,3)
)
parse_date_time(temps$Date.Time..GMT..0500, "mdYHM")
# [1] "2018-06-18 12:57:00 UTC" "2018-06-18 13:57:00 UTC" "2018-06-18 14:57:00 UTC"
I have a huge data set containing bacteria samples (4 types of bacteria) from 10 water resources from 2010 until 2019. Some values are missing, so they should be excluded from the plot and the analysis.
I want to plot a time series for each type of bacteria for each resource for all years.
What is the best way to do that?
library("ggplot2")
BactData <- read.csv('Råvannsdata_Bergen_2010_2018a.csv', sep = '\t', header = TRUE)
summary(BactData,na.rm = TRUE)
df$Date = as.Date( df$Date, '%d/%m/%Y')
#require(ggplot2)
ggplot( data = df, aes( Date,BactData$Svartediket_CB )) + geom_line()
#plot(BactData$Svartediket_CB,col='brown')
plot(BactData$Svartediket_CP,col='cyan')
plot(BactData$Svartediket_EC,col='magenta')
plot(BactData$Svartediket_IE,col='darkviolet')
Using plot() is not satisfactory because the x axis shows just index numbers, not dates. I tried to use ggplot but got an error message. I am a beginner in R.
Error message
Error in df$Date : object of type 'closure' is not subsettable
The data is a CSV file with a tab delimiter.
This will do the trick
BactData <- read.csv('Råvannsdata_Bergen_2010_2018a.csv', sep = '\t', header = TRUE, stringsAsFactors = FALSE)
colnames(BactData)[1] <- "Date"
library(lubridate)
BactData$Date = dmy(BactData$Date) # converts strings to date class
ggplot(data = BactData, aes(Date, Svartediket_CB )) + geom_line()
You can filter for any year using dplyr with lubridate. For example, 2017:
library(dplyr)
BactData %>% filter(year(Date) == 2017) %>%
ggplot(aes(Date, Svartediket_CB )) + geom_line()
Or for two years
library(dplyr)
BactData %>% filter(year(Date) == 2017 | year(Date) == 2018) %>%
ggplot(aes(Date, Svartediket_CB )) + geom_line()
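The answer plots one column at a time. To get every bacteria type for every resource in one figure, one option is to reshape the data to long format first. A minimal base R sketch, assuming (as the column name Svartediket_CB suggests) that each value column is named <resource>_<bacteria> with a single underscore; the small data frame wide and its values are made up for illustration:

```r
# Made-up data in the same shape as BactData: Date plus <resource>_<bacteria> columns
wide <- data.frame(
  Date = as.Date(c("2017-01-05", "2017-02-05", "2017-03-05")),
  Svartediket_CB = c(1, NA, 3),
  Svartediket_EC = c(0, 2, 1)
)

# Wide -> long: one row per Date x value column
value_cols <- setdiff(names(wide), "Date")
long <- data.frame(
  Date  = rep(wide$Date, times = length(value_cols)),
  var   = rep(value_cols, each = nrow(wide)),
  value = unlist(wide[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split "Svartediket_CB" into resource and bacteria type
parts <- strsplit(long$var, "_", fixed = TRUE)
long$resource <- vapply(parts, `[`, character(1), 1)
long$bacteria <- vapply(parts, `[`, character(1), 2)

# Drop missing measurements before plotting or analysing
long <- long[!is.na(long$value), ]
long
```

With ggplot2 loaded, the long table could then be drawn with ggplot(long, aes(Date, value)) + geom_line() + facet_grid(bacteria ~ resource, scales = "free_y"), giving one panel per bacteria/resource pair.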
I retrieved tweets for various hashtags, each with a different tracking period. For example, Hashtag1 was tracked for 6 days, Hashtag2 for 4 days and Hashtag3 for 2 days. How can I normalize each hashtag? How can I divide them into equal quarters? Thanks in advance. Here is the code:
library(streamR)
library(rjson)
setwd("/Users/Desktop")
Tweets = parseTweets("Hashtag1.json")
table(Tweets$created_at)
dated_Tweets <- as.POSIXct(Tweets$created_at, format = "%a %b %d %H:%M:%S +0000 %Y")
hist(dated_Tweets, breaks = "hours", freq = TRUE, xlab = "dated_Tweets",
     main = "Distribution of tweets", col = "blue")
I think your main stumbling block is converting date-times to 6-hour bins. You can do this with format.POSIXct() and cut(). Here is a suggestion, complete with a histogram. There are many ways to draw the histograms; you may prefer a table instead.
library(magrittr)
library(ggplot2)
## create some tweet times
hash1 <- lubridate::ymd("20170101") + lubridate::seconds(runif(100, 0, 10*86400))
hash2 <- lubridate::ymd("20170101") + lubridate::seconds(runif(100, 0, 31*86400))
hash3 <- lubridate::ymd("20170101") + lubridate::seconds(runif(300, 0, 5*86400))
## bin the hour of day into 6 h intervals: [0,6), [6,12), [12,18), [18,24)
bin6h <- function(times) {
  format(times, "%H") %>%
    as.numeric() %>%
    cut(breaks = c(0, 6, 12, 18, 24), right = FALSE)
}
hTags <- rbind(data.frame(tag = "#1", bins = bin6h(hash1)),
               data.frame(tag = "#2", bins = bin6h(hash2)),
               data.frame(tag = "#3", bins = bin6h(hash3)))
## share of each hashtag's tweets per bin, side by side
ggplot(data = hTags, aes(x = bins, fill = tag)) +
  geom_bar(position = "dodge", aes(y = after_stat(prop), group = tag))
You can use the chron package and work only with the hours, converting them into bins as described in https://stackoverflow.com/a/37666558/7418254
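The question also asks how to compare hashtags tracked for different lengths of time. One possible approach (an assumption, not taken from the answers above) is to cut each hashtag's own tracking window into four equal quarters and report the share of its tweets in each quarter, alongside tweets per tracked day. tracking_quarter() is a hypothetical helper, and the window and tweet times below are made up:

```r
# Hypothetical helper: quarter (1-4) of the window [start, end] each time falls in
tracking_quarter <- function(times, start, end) {
  span <- as.numeric(difftime(end, start, units = "secs"))
  frac <- as.numeric(difftime(times, start, units = "secs")) / span
  # floor(frac * 4) gives 0..3 inside the window; pmin puts t == end in quarter 4
  pmin(floor(frac * 4), 3) + 1
}

start <- as.POSIXct("2017-01-01 00:00:00", tz = "UTC")
end   <- start + 6 * 86400                 # e.g. a hashtag tracked for 6 days
times <- start + c(0, 1, 2, 3.5, 5.9, 6) * 86400

q <- tracking_quarter(times, start, end)
# Share of this hashtag's tweets per quarter (sums to 1), comparable across hashtags
share <- table(factor(q, levels = 1:4)) / length(q)
# Tweets per tracked day, another way to compare windows of different length
per_day <- length(times) / as.numeric(difftime(end, start, units = "days"))
share
```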
I have multiple observations of rainfall for the same station over about 14 years. The data frame looks something like this:
df (starting 01/01/2000)
v1    v2   v3    v4   v5   v6   ...  v20
1     1    2     4    8    9    ...
1.4   4    3.8   ...
1.5   3    1.6   ...
1.6   8    ...
...
(until 31/12/2013, i.e. 5114 observations in total)
where v1, v2, ..., v20 are rainfall simulations for the same point. I want to plot monthly box plots showing the combined range of quantiles and the median when all the observations are taken together.
I can plot a box plot for single monthly values using:
df$month<-factor(month.name,levels=month.name)
library(reshape2)
df.long<-melt(df,id.vars="month")
ggplot(df.long,aes(month,value))+geom_boxplot()
but in this problem, since the data are daily and there are multiple observations per day, I don't know where to start.
Sample data:
df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))
In case you want to work with a zoo object:
date<-seq(as.POSIXct("2000-01-01 00:00:00","GMT"),as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min")
If you want, you can also convert it to a zoo object:
library(zoo)
x <- zoo(df, order.by = seq(as.POSIXct("2000-01-01 00:00:00", "GMT"), as.POSIXct("2013-12-31 00:00:00", "GMT"), by = "1440 min"))
I am not familiar with zoo, so I converted your sample to a data frame. Your idea of using melt() is the right one. After that, you need to aggregate the rain amount by month; it is worth looking up aggregate() and the other options. Here I used dplyr and tidyr to arrange the sample data. I hope this helps you move forward.
### zoo to data frame, by Joshua Ulrich
### http://stackoverflow.com/questions/14064097/r-convert-between-zoo-object-and-data-frame-results-inconsistent-for-different
zoo.to.data.frame <- function(x, index.name="Date") {
stopifnot(is.zoo(x))
xn <- if(is.null(dim(x))) deparse(substitute(x)) else colnames(x)
setNames(data.frame(index(x), x, row.names=NULL), c(index.name,xn))
}
### to data frame (x is the zoo object built in the question)
library(zoo)
foo <- zoo.to.data.frame(x)
str(foo)
library(dplyr)
library(tidyr)
library(reshape2)  # for melt()
### wide to long data frame, aggregate rain amount by Date
ana <- foo %>%
melt(., id.vars = "Date") %>%
group_by(Date) %>%
summarize(rain = sum(value))
### Aggregate rain amount by year and month
bob <- ana %>%
separate(Date, c("year", "month", "date")) %>%
group_by(year, month) %>%
summarize(rain = sum(rain))
### Drawing a ggplot figure
ggplot(data = bob, aes(x = month, y = rain)) +
geom_boxplot()
I just found an easier way to do it; however, your answer really helped, jazzuro.
# install.packages("reshape2")  # only needed once
library(dplyr)
library(reshape2)
library(zoo)      # zoo() and as.yearmon()
library(ggplot2)
df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))
x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"),
as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"))
v<-aggregate(x, as.yearmon, mean)
months<- rep(1:12,14)
lol<-data.frame(v,months)
df.m <- melt(lol, id.var = "months")
View(df.m)
p <- ggplot(df.m, aes(factor(months), value))
p + geom_boxplot(aes(fill = factor(months)))
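For readers without zoo or reshape2, here is a base R sketch of the same idea: average each simulation within each year-month, then pool all simulations and years into one box per calendar month. The data are made up (three simulated series over two years instead of the real 14 years), and mean is used to mirror aggregate(x, as.yearmon, mean) in the answer; swap in sum for monthly totals.

```r
# Made-up daily data: 3 simulated series over 2 years
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day")
sims  <- data.frame(v1 = rexp(length(dates)), v2 = rexp(length(dates)),
                    v3 = rexp(length(dates)))

# Monthly mean of each simulation: one row per year-month, one column per simulation
ym <- format(dates, "%Y-%m")
monthly <- aggregate(sims, by = list(ym = ym), FUN = mean)

# Wide -> long so every simulation and year feeds the same calendar-month box
long <- data.frame(
  month = rep(as.integer(substr(monthly$ym, 6, 7)), times = ncol(sims)),
  value = unlist(monthly[names(sims)], use.names = FALSE)
)
long$month <- factor(long$month, levels = 1:12, labels = month.abb)

boxplot(value ~ month, data = long, xlab = "Month", ylab = "Mean daily rainfall")
```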