I have a dataset currently sorted by date and time. It has a column called 'day' that is just the day of the month in numeric form, i.e. 1-31.
I have a 14-day stretch that I want to plot; however, it runs from the 30th of one month to the 13th of the next.
When I try to plot it, the x axis is ordered 1-13, 30, 31.
How can I plot the x axis in the order in which it appears in the data frame?
Thanks.
Make sample data with columns day and value.
df<-data.frame(day=c(30,31,1,2,3,4,5,6,7,8),value=rnorm(10))
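If the day column contains just the day numbers, you can convert them to a factor and set the levels to the order in which the values first appear (unique() keeps the levels valid when a day occurs in more than one row).
library(ggplot2)
ggplot(df,aes(factor(day,levels=unique(df$day)),value,group=1))+geom_line()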
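With several rows per day (for example, one per hour), the factor levels have to be deduplicated, because factor() rejects repeated levels. The following is a minimal sketch, assuming the rows are already in chronological order and the stretch is shorter than a month; the data frame `obs` and the column `day_f` are made up for illustration.

```r
# Hypothetical data: two readings per day, spanning a month boundary
obs <- data.frame(day = rep(c(30, 31, 1, 2, 3), each = 2),
                  value = seq_len(10))

# unique() keeps the first occurrence of each day, preserving row order
obs$day_f <- factor(obs$day, levels = unique(obs$day))

levels(obs$day_f)
# [1] "30" "31" "1"  "2"  "3"
```

Mapping `day_f` to the x aesthetic then keeps the axis in data order rather than numeric order.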
Related
I have data written in a specific format. To simplify, here is an example I made:
df<-data.frame(date=c(2012034,2012044,2012051,2012063,2012074),
math=c(100,100,23,46,78))
2012034 means the 4th week of March 2012; likewise, 2012044 means the 4th week of April 2012. I am trying to turn the values of date into something ordered in time. I need this because, when I leave them as plain numbers, the x axis of the scatter plot looks really odd.
My goal is this:
Find the oldest date in the date column and label it 1; in this case, 2012034 becomes 1. Next, take the second-oldest date and calculate how many weeks have passed since the oldest one. The second-oldest date is 2012044, which is 5 weeks after 2012034, so it becomes 1+5=6. In the same way, I want to number every date by how many weeks have passed since the oldest date.
One way to do it is to also specify the day of the week and subtract it at the end, i.e.
as.Date(paste0(df$date, '-1'), '%Y%m%U-%u') - 1
#[1] "2012-03-22" "2012-04-22" "2012-05-01" "2012-06-15" "2012-07-22"
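To get from dates to the week numbering the question asks for, one possible sketch in base R is shown here. It does not rely on the format string above; instead it assumes that week w of a month starts on day 1 + 7 * (w - 1), and that partial weeks are rounded up, which is what makes the second date come out as 6. Both conventions, and the names `start` and `week_index`, are assumptions.

```r
df <- data.frame(date = c(2012034, 2012044, 2012051, 2012063, 2012074),
                 math = c(100, 100, 23, 46, 78))

# Split YYYYMMW into year, month and week-of-month
year  <- as.integer(df$date %/% 1000)
month <- as.integer((df$date %/% 10) %% 100)
week  <- df$date %% 10

# Assumption: week w of a month starts on day 1 + 7 * (w - 1)
start <- as.Date(sprintf("%d-%02d-01", year, month)) + 7 * (week - 1)

# Weeks elapsed since the earliest date (rounded up), counted from 1
days_since <- as.numeric(difftime(start, min(start), units = "days"))
df$week_index <- ceiling(days_since / 7) + 1
df$week_index
# [1]  1  6  7 14 19
```

Plotting `math` against `week_index` (or against `start` directly) gives an x axis that is evenly spaced in time.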
I am new to ML and time series, so this question may be very basic.
I have the following data frame:
week: 1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,4,... (weeks 1 to 145)
numOrder: 120,110,100,...
There is no set frequency, i.e. the number of records per week can be the same or different.
How do I convert this data frame to a time series object?
A simple tm=ts(dataframe name) gives an object of class "mts", "ts", "matrix" with week as column 1 and numOrder as column 2. Calling plot(tm[,2]) draws a time series, but the x axis does not show time as weeks (1,2,3).
Please advise how to convert this data frame to a time series object.
There is a data frame like this:
The first two columns in the df describe the start date (month and year) and the end date (month and year). Column names describe every single month and year of a certain time period.
I need a function or loop that inserts "1" or "0" in each cell: "1" when the date given by the column name falls within the period described by the first two columns, and "0" if not.
I would appreciate any help.
You want to do two different things: (a) create a dummy variable and (b) check whether a particular date falls in an interval.
Making a dummy variable is the easier of the two; in base R you can use ifelse. For example, with the iris data frame:
iris$dummy <- ifelse(iris$Sepal.Width > 2.5, 1, 0)
Working with dates is more complicated. In this answer we will use the lubridate library. First you need to convert all those dates from the 'Month Year' format to something that R can understand. For example, for February you could do:
library(lubridate)
new_format_february_2016 <- interval(ymd('2016-02-01'), ymd('2016-03-01') - dseconds(1))
#[1] 2016-02-01 UTC--2016-02-29 23:59:59 UTC
This is February: the interval of time from the 1st of February to one second before the 1st of March. You can do the same with your start date column and your end date column.
To compare two intervals of time (that is, to see whether a particular month falls into one of your other intervals) you can do:
int_overlaps(new_format_february_2016, other_interval)
If this returns TRUE, the two intervals (one particular month and another one) overlap. This is not the same as one being inside the other, but in your case it will work. Using this you can iterate over the columns and rows and build your dummy variable.
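The iteration can be sketched as follows. Since the original data frame is not shown, the layout here is hypothetical: `start` and `end` hold "YYYY-MM" strings and `months` lists the month columns to fill. The sketch also swaps in plain base R date comparison for lubridate intervals, representing each month by its first day, which gives the same answer when all periods are whole months.

```r
# Hypothetical layout: start/end as "YYYY-MM" strings, one row per period
periods <- data.frame(start = c("2016-01", "2016-03"),
                      end   = c("2016-02", "2016-04"),
                      stringsAsFactors = FALSE)
months <- c("2016-01", "2016-02", "2016-03", "2016-04")

# Represent every month by its first day so months compare as dates
to_date <- function(ym) as.Date(paste0(ym, "-01"))

# For each month column, flag the rows whose [start, end] period contains it
dummies <- sapply(months, function(m) {
  as.integer(to_date(m) >= to_date(periods$start) &
             to_date(m) <= to_date(periods$end))
})

result <- cbind(periods, dummies)
result
```

Here the first row gets 1 under 2016-01 and 2016-02 and 0 elsewhere, and the second row gets 1 under 2016-03 and 2016-04.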
Before doing so, though, I would recommend cleaning your data, as your current format is complicated to work with. To get the full power of R's vector types, you ideally want one row per observation and one variable per column. This does not seem to be the case with your data frame. Take a look at the chapter 'Tidy data' of 'R for Data Science', especially the subsection on spreading and gathering:
Tidy data
Imagine an intra-day set of data, e.g. hourly intervals. Thanks to Google and Joshua's valuable answers to other people, I managed to create new columns in the xts object carrying DAILY Open/High/Low/Close values. These are daily values applied to intra-day intervals, so all rows of the same day have the same value in a given column. Since the HLC values are look-ahead biased, I want to move them to the next day. Let's focus on just one column called Prev.Day.Close.
Actual status:
My Prev.Day.Close column carries the proper values for the current day. All "2010-01-01 ??:??" rows have the same value: the Close of the 2010-01-01 trading session. So at the moment it is not the PREVIOUS day, as the column name says.
What I need:
Lag the Prev.Day.Close column to the NEXT DAY OF THE SET.
I cannot lag it using lag() because that works on a row (not day) basis. Nor can it be a fixed calendar-day shift like:
C <- ave(x$Close, .indexday(x), FUN = last)
index(C) <- index(C) + 86400
x$Prev.Day.Close <- C
This solution ignores the actual data in the set. For example, it adds new rows, because the original data set has gaps on weekends and holidays. Moreover, two given days may not have the same number of intervals (rows), so the shifted data will not fit.
Desired result:
All rows of the first day in the set have NA in Prev.Day.Close because there is no previous day to get data from.
All rows of the second day have the same value in Prev.Day.Close - Any of the values I actually have in Prev.Day.Close of previous day.
The same for every following day.
If I understand correctly, here's one way to do it:
require(xts)
# sample data
dt <- .POSIXct(seq(1, 86400*4, 3600), tz="UTC")-1
x <- xts(seq_along(dt), dt)
# get the last value for each calendar day
daily.last <- apply.daily(x, last)
# merge the last value of the day with the original data set
y <- merge(x, daily.last)
# now lag the last value of the day and carry the NA forward
# y$daily.last <- na.locf(lag(y$daily.last))
y$daily.last <- lag(y$daily.last)
y$daily.last <- na.locf(y$daily.last)
Basically, you want to get the end of day values, merge them with the original data, then lag them. That will align the previous end of day values with the beginning of the day.
I have a data set with sales by date, where date is not unique and not all dates are represented: my data set has dates (the date of the sale), quantity, and totalprice. This is an irregular time series.
What I'd like is a vector of sales by date, with every date represented exactly once, and quantities and totalprice summed by date, with zeros where there are no sales.
I have part of this now; I can make a sequence containing all dates:
first_date=as.Date(min(dates))
last_date=as.Date(max(dates))
all_dates=seq(first_date, by=1, to=last_date)
And I can aggregate the sales data by sale date:
quantitybydate=aggregate(quantity, by=list(as.Date(dates)), sum)
But I'm not sure what to do next. If this were Python I'd loop through one of the date arrays, setting or getting the related quantity. But this being R, I suspect there's a better way.
Make a data frame with all_dates as a column, then merge it with quantitybydate, using the grouping column that aggregate created (Group.1) as by.y and setting all.x=TRUE. Then replace the NAs with 0.
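As a minimal sketch of those steps, using made-up sales records (the values of `dates` and `quantity` below are purely illustrative, and `full` is a hypothetical name for the result):

```r
# Hypothetical sales records: repeated dates and gaps between them
dates    <- c("2020-01-01", "2020-01-01", "2020-01-03", "2020-01-05")
quantity <- c(2, 3, 1, 4)

# Every calendar date from first to last sale
all_dates <- seq(as.Date(min(dates)), as.Date(max(dates)), by = 1)

# Sum by sale date; aggregate() names its columns Group.1 (date) and x (sum)
quantitybydate <- aggregate(quantity, by = list(as.Date(dates)), sum)

# Left join onto the full date sequence, then turn missing days into zeros
full <- merge(data.frame(date = all_dates), quantitybydate,
              by.x = "date", by.y = "Group.1", all.x = TRUE)
full$x[is.na(full$x)] <- 0
full$x
# [1] 5 0 1 0 4
```

The same merge works for totalprice: aggregate it by date in the same way and fill its NAs with 0.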