time count
2017-03-08 19:33 1
2017-03-23 22:11 1
2017-03-30 3:30 10
2017-03-09 19:33 13
2017-03-23 22:11 1
2017-03-31 3:30 1
.....
This data shows how fast consumers write comments,
so I want to make a plot from which I can easily see how fast comments are being posted.
For example:
On the X axis, the time series starts from 2017-03-08,
and for each fixed interval (seconds or minutes) there is a bar,
so when comments are written quickly, the bars are dense,
and as time goes on and the speed drops, the bars are sparse.
How can I make this?
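library(data.table)
# tdiff = seconds since the previous comment on the same title (0 for the first one)
cc5 <- dt[, tdiff := difftime(cc, shift(cc, fill = cc[1L]), units = "secs"), by = title]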
Using this code, I can make a difftime column.
I have one more problem: the time column is character type,
so I tried to change it to Date type using as.Date, but it didn't work,
so I changed it to POSIXct type instead.
I think I need a date-time type to put the time series on the X axis.
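A minimal sketch of that conversion, using three of the sample rows above and assuming the timestamps are in UTC (the real time zone isn't stated). as.POSIXct() with an explicit format keeps the time of day, while as.Date() keeps only the calendar date, so every comment from the same day would land on the same x position.

```r
# Sample timestamps from the question, still stored as character.
time_chr <- c("2017-03-08 19:33", "2017-03-23 22:11", "2017-03-30 3:30")

# Parse to POSIXct with an explicit format; tz = "UTC" is an assumption.
time_ct <- as.POSIXct(time_chr, format = "%Y-%m-%d %H:%M", tz = "UTC")

# as.Date() on the parsed values drops the hours and minutes.
time_d <- as.Date(time_ct)

print(time_ct)
print(time_d)
```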
I'm not 100% sure I understand the result you want,
but generally when I want to put dates on the x-axis, I go to "Understanding dates and plotting a histogram with ggplot2 in R"
and use Gauden's Code v1. If you have successfully converted the character column to POSIXct, as.Date() should work fine.
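A minimal sketch of the bar plot the question describes, in base R rather than the linked ggplot2 code. It assumes the six sample rows stand in for the full data, that timestamps are UTC, and that one-day intervals are wanted; cut() also accepts breaks such as "hour" or "min" for finer bars. Each bar is the total count in its interval, so busy periods show up as tall bars and quiet periods as short or empty ones.

```r
# The six sample rows from the question (the full data has more).
comments <- data.frame(
  time  = c("2017-03-08 19:33", "2017-03-23 22:11", "2017-03-30 3:30",
            "2017-03-09 19:33", "2017-03-23 22:11", "2017-03-31 3:30"),
  count = c(1, 1, 10, 13, 1, 1)
)
comments$time <- as.POSIXct(comments$time, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Assign each row to a fixed-width interval; the factor has one level per
# day from the first to the last timestamp, including days with no rows.
comments$bin <- cut(comments$time, breaks = "day")

# Sum the counts per interval; xtabs() keeps empty levels as 0.
per_bin <- xtabs(count ~ bin, data = comments)

barplot(per_bin, las = 2, cex.names = 0.6,
        xlab = "", ylab = "comments per day")
```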
Related
I'm trying to create a simple time series plot in R with the following data (it's in tbl format):
Date sales
<date> <dbl>
1 2010-02-05 1105572.
2 2010-09-03 1048761.
3 2010-11-19 1002791.
4 2010-12-24 1798476.
5 2011-02-04 1025625.
6 2011-11-18 1031977.
When I use the following command: plot(by_date$Date, by_date$sales, type = 'l'), the resulting graph skips the individual dates I want it to display and only shows the year on the x-axis.
I've checked the format of the date column using class(by_date$Date), and it does show up as 'Date'. I've also checked other threads here, and the closest one to answering my query is the one here. I tried that approach, but it didn't work for me, and neither did plotting with ggplot or converting the data to a data frame. Please help, thanks.
With ggplot2, this should work:
library(ggplot2)
ggplot(by_date, aes(Date, sales)) + geom_line()
You can use scale_x_date to format your x-axis as you want.
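For example, a sketch that puts a tick at every observed date, assuming the six rows printed above are the whole series (sales as displayed, i.e. rounded). breaks and date_labels are standard scale_x_date() arguments; the angled labels only keep adjacent dates from overlapping.

```r
library(ggplot2)

# The rows printed in the question (sales rounded as displayed).
by_date <- data.frame(
  Date  = as.Date(c("2010-02-05", "2010-09-03", "2010-11-19",
                    "2010-12-24", "2011-02-04", "2011-11-18")),
  sales = c(1105572, 1048761, 1002791, 1798476, 1025625, 1031977)
)

p <- ggplot(by_date, aes(Date, sales)) +
  geom_line() +
  # One break per observed date, labelled as YYYY-MM-DD.
  scale_x_date(breaks = by_date$Date, date_labels = "%Y-%m-%d") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```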
What is the correct way to deal with datetimes in ggplot?
I have data at several different dates and I would like to facet each date by the same time of day, e.g. between 1:30PM and 1:35PM, and plot the points between this time frame, how can I achieve this?
My data looks like:
datetime col1
2015-01-02 00:00:01 20
... ...
2015-01-02 11:59:59 34
2015-02-19 00:00:03 12
... ...
2015-02-19 11:59:58 27
I find myself often wanting to ggplot time series using datetime objects as the x-axis but I don't know how to use times only when dates aren't of interest.
The lubridate package will do the trick. There are functions you could use, specifically floor_date() or ceiling_date(), to transform your datetime vector.
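A sketch of that idea, assuming a data frame df with the question's datetime and col1 columns (the rows below are made up) and UTC timestamps. Subtracting floor_date(datetime, "day"), i.e. midnight of the same day, leaves the time of day in hours, which can then be filtered to 1:30PM-1:35PM and faceted by date.

```r
library(lubridate)
library(ggplot2)

# Made-up rows in the shape of the question's data.
df <- data.frame(
  datetime = as.POSIXct(c("2015-01-02 13:29:50", "2015-01-02 13:31:10",
                          "2015-01-02 13:34:59", "2015-02-19 13:30:00",
                          "2015-02-19 13:33:20", "2015-02-19 13:40:00"),
                        tz = "UTC"),
  col1 = c(20, 25, 34, 12, 18, 27)
)

# Midnight of each row's own day; the difference is the time of day.
midnight <- floor_date(df$datetime, unit = "day")
df$day       <- as.Date(midnight)
df$tod_hours <- as.numeric(difftime(df$datetime, midnight, units = "hours"))

# Keep 1:30PM to 1:35PM on every date (13.5 to 13.5 + 5/60 hours).
in_slot <- subset(df, tod_hours >= 13.5 & tod_hours <= 13.5 + 5 / 60)

p <- ggplot(in_slot, aes(tod_hours, col1)) +
  geom_point() +
  facet_wrap(~ day) +
  labs(x = "hours since midnight")
print(p)
```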
I always use the chron package for times. It completely disregards dates and stores your time numerically as a fraction of a day (e.g. 1:30PM is stored as 0.5625, because it's 13.5 of the day's 24 hours). That allows you to perform math on times, which is great for a lot of reasons, including calculating the average time, the time between two points, etc.
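A small sketch of chron times arithmetic, assuming chron is installed: the numeric value is the fraction of the day, so multiplying by 24 gives hours into the day, and times can be subtracted directly.

```r
library(chron)

# chron "times" objects hold the time of day only, with no date.
t1 <- times("13:30:00")
t2 <- times("13:35:00")

print(as.numeric(t1))       # stored as a fraction of a 24-hour day
print(as.numeric(t1) * 24)  # converted to hours into the day
print(t2 - t1)              # arithmetic on times works directly
```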
For specific help with your plot you'll need to share a sample data frame in an easily copy-able format, and show the code you've tried so far.
This is a question I'd asked previously regarding the chron package, and it also gives an idea of how to share your data/ask a question that's easier for folks to reproduce and therefore answer:
Clear labeling of times class data on horizontal barplot/geom_segment
I am trying to create a plot of weekly data. Though this is not the exact problem I am having, it illustrates it well. Imagine you want to plot 1, 2, ..., 7 for 7 weeks starting Jan 1 2015. The plot should just be a line that trends upward, but instead I get 7 different lines. I tried the code below (and some other code, to no avail). Help would be greatly appreciated.
startDate = "2015-01-01"
endDate = "2015-02-19"
y=c(1,2,3,4,5,6,7)
tsy=ts(y,start=as.Date(startDate),end=as.Date(endDate))
plot(tsy)
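The problem is that ts() treats your two Dates as plain day counts: start and end are 49 days apart, so the series gets 50 observations and your 7 values are recycled to fill them, which is why the plot shows the same 1-7 ramp repeated.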
Instead use:
plot(y)
lines(y)
Also, create a date column from the start date and weekly spacing you gave, so that the data becomes a proper time series. With the dates on the x-axis, you can easily see how your variable changes over time.
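A minimal base-R sketch of that suggestion, using the start date and weekly spacing from the question: seq() with by = "week" builds the seven dates, and plotting y against them gives one upward line with dates on the x-axis.

```r
startDate <- as.Date("2015-01-01")
y <- c(1, 2, 3, 4, 5, 6, 7)

# One date per observation, 7 days apart.
weeks <- seq(startDate, by = "week", length.out = length(y))

# type = "l" joins the points into a single line.
plot(weeks, y, type = "l", xlab = "week starting", ylab = "y")
```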
To make your life easier, I think your first step should be to create an xts time series object (install and load the xts package); then it is a piece of cake to plot, subset or do whatever you like with the series.
Build your vector of dates as a sequence with a start date and a weekly step:
seq(as.Date("2015-01-01"), by = "week", length.out = 7)
and your data vector: 1:7
After library(xts), a one-liner builds and plots the above time series object:
plot(xts(1:7, order.by = seq(as.Date("2015-01-01"), by = "week", length.out = 7)))
There must be a very easy way to do this but I don't know what it is...
As the title says, how can I plot every second time step of a time series in R? For example, I have half-hourly data, but I only want to plot the data on the hour, e.g. I have
10:00 0
10:30 1
11:00 2
11:30 3
12:00 4
I just want to plot
10:00 0
11:00 2
12:00 4
Something like
plot(x[seq_along(x) %% 2 == 1])
?
Edit: I don't know how you are plotting your data set above, but however you're doing it, you can subset your data as follows
hourlydata <- fulldata[seq(nrow(fulldata)) %% 2 == 1, ]
If you give more details someone might tell you how to figure out which time values are hourly rather than relying (as here) on the fact that they are the odd-numbered rows ...
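One way to do that, sketched under the assumption that the time column holds "HH:MM" strings like the example (fulldata, time and value are hypothetical names): parse the times and keep the rows whose minute is zero, which does not depend on row order.

```r
fulldata <- data.frame(
  time  = c("10:00", "10:30", "11:00", "11:30", "12:00"),
  value = 0:4
)

# strptime() returns POSIXlt; $min is the minute field (today's date is
# filled in, which doesn't matter here).
parsed <- strptime(fulldata$time, format = "%H:%M", tz = "UTC")
hourlydata <- fulldata[parsed$min == 0, ]

plot(seq_len(nrow(hourlydata)), hourlydata$value, type = "b",
     xaxt = "n", xlab = "time", ylab = "value")
axis(1, at = seq_len(nrow(hourlydata)), labels = hourlydata$time)
```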
Slightly less verbose and not quite as clear as Ben's solution but you can use vector recycling and indexing using a boolean to achieve this (as long as you're just interested in every other observation).
# Extract the data you want (assuming you want to keep
# the first observation, skip the second, and so on)
newdat <- x[c(TRUE, FALSE)]
plot(newdat)
I wish to make a probability distribution of some time series data. My data is in the following format
00:00, 3
01:00, 50
05:00, 13
10:00, 34
17:00, 80
21:00, 100
The time column has some missing values that R will have to interpolate. I want to get a nice smooth curve to highlight the busy periods. I have tried with ts, density and plot but these don't produce what I'm after. For example,
data1 <- read.csv(file="c:\\abc\\ts.csv", head=FALSE, sep=",")
data1$V1 <- strptime(data1$V1, format="%H:%M")
plot(data1$V2, density(data1$V1), type="l")
But this draws the lines in a jumbled order, and as a probability distribution rather than the smooth curve I want.
I think you are definitely after the zoo package, which has several functions for dealing with NAs: see na.aggregate, na.approx and na.locf.
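As a rough sketch of the interpolation step, assuming the hours in the example (0, 1, 5, ...) are what's in the file: put the values on a full hourly grid by merging with a zero-width zoo series, so the missing hours become NA, then fill them with na.approx() (linear); na.spline() gives a smoother fill.

```r
library(zoo)

# Hours and values from the question's example.
hours <- c(0, 1, 5, 10, 17, 21)
vals  <- c(3, 50, 13, 34, 80, 100)
z <- zoo(vals, order.by = hours)

# Merging with a zero-width series on 0..21 adds the missing hours as NA.
grid <- merge(z, zoo(, as.numeric(0:21)))

# Linear interpolation between the observed hours.
filled <- na.approx(grid)

plot(filled, xlab = "hour of day", ylab = "value")
```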
You made it a little harder than you might realize. I'll make it easier for now by adding a date in front of your times.
Also, I added a variable "texinp" and a textConnection() statement so you can cut/paste the following code and run it directly. The data is loaded into variable texinp and is read by the read.zoo statement in a similar way to reading a .csv file. For now, this will allow you to plot things and gives you an idea of how to read .csv files using read.zoo.
library(zoo)
library(chron)
texinp <- "
Time, Mydata
2011-02-06 00:00, 3
2011-02-06 01:00, 50
2011-02-06 05:00, 13
2011-02-06 10:00, 34
2011-02-06 17:00, 80
2011-02-06 21:00, 100"
myd.zoo <- read.zoo(textConnection(texinp), header=TRUE, FUN = as.chron, sep=",")
myd.zoo
plot(myd.zoo)
Your question mentions "busy periods". I may be wrong, but I'm assuming that the value of 100 at 21:00 is the "busiest period". If that's true, then you don't need a density plot, and the plot above is what you're after.
Let me know if I'm wrong.
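If a smooth probability-distribution curve is still wanted, one option (not part of the answer above) is a kernel density of the hours weighted by the values. This sketch assumes the example hours and values, a hand-picked bandwidth of 2 hours, and ignores that 23:00 and 00:00 are adjacent on the clock.

```r
hours <- c(0, 1, 5, 10, 17, 21)
vals  <- c(3, 50, 13, 34, 80, 100)

# density() expects weights that sum to 1; bw is set by hand here.
d <- density(hours, weights = vals / sum(vals), bw = 2)

plot(d, main = "Weighted density of activity by hour", xlab = "hour of day")
```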