R chart by date - r

I've got transactional data from a SQL query which I turn into a data frame. The first column of the df contains UNIX timestamps (format="%Y/%d/%m %H:%M") which I would like to use to create a graphics plot using par to display 1 unique lineplot per date. At the moment I am fumbling around with splitting column 1 and comparing with previous row to look for a change then assigning a dummy indicator to use in my plot command.
Thanks,
Will

Somewhat hard to answer without any example data but I'll take a shot.
I'm guessing your date looks like this: "2009-03-04 17:45"
It's probably being read as character. You can verify the class of each column of your data frame by running str(data.frame)
Using package stringr, you can just read the y/d/m and convert that to a Date class like so:
library(stringr)
date <- "2009-03-04 17:45"
# Keep "09-03-04", swap "-" for "/", then parse as year/day/month
date <- as.Date(str_replace_all(str_sub(date, 3, 10), "-", "/"), "%y/%d/%m")
You can then use date as a group in ggplot2 to plot one line per date. You could also create separate panels (one per date) using + facet_wrap(~date) in your ggplot call.
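Since the question asks about par, here is a minimal base R sketch of the same grouping idea: derive a Date from each timestamp, split the rows by that date, and draw one line plot per panel. The data frame df and its columns ts and value are made up for illustration, and the format string assumes the year/day/month layout described in the question.

```r
# Hypothetical data: timestamps as text in "%Y/%d/%m %H:%M" order, plus a value.
df <- data.frame(
  ts    = c("2009/03/04 09:00", "2009/03/04 12:00", "2009/03/04 17:45",
            "2009/04/04 09:00", "2009/04/04 12:00", "2009/04/04 17:45"),
  value = c(10, 14, 12, 7, 9, 13),
  stringsAsFactors = FALSE
)

# Parse the full timestamp, then keep only the calendar date for grouping.
# tz = "UTC" is an assumption made to keep the example reproducible.
df$time <- as.POSIXct(df$ts, format = "%Y/%d/%m %H:%M", tz = "UTC")
df$date <- as.Date(df$time)

# One list element per date; no row-by-row comparison needed.
by_date <- split(df, df$date)

# One panel per date, stacked vertically.
old_par <- par(mfrow = c(length(by_date), 1))
for (d in names(by_date)) {
  with(by_date[[d]], plot(time, value, type = "l", main = d))
}
par(old_par)
```

split() does the work of the dummy indicator: each distinct date gets its own subset, so the plotting loop does not need to detect where the date changes.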

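Start by getting just the date part from your timestamp:
SELECT *, DATE(timestampcolumn) AS thedate FROM yourtable;
Convert the date column to a factor (the column has to be named, otherwise transform adds a new column instead of replacing thedate):
mydf <- transform(mydf, thedate = as.factor(thedate))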
Plot it with e.g. xyplot, one panel per date (the formula reads y ~ x | group):
library(lattice)
xyplot(vary ~ varx | thedate, data = mydf)

Related

R: How to fill in values in a new column using the values of another column

I have a dataset in R with a column called event_date.
The values look like this:
31-Dec-18
30-Dec-18
28-Dec-18
And so on.
I want to create a new column called date where I separate out the day of the event. So it looks like:
31
30
28
I'm pretty new to working with R, so I'm wondering whether a for loop is the way to go, or if there's a more efficient way I don't know about.
If the dates are of type character, capture the digits before the first hyphen:
df$date <- sub("^(\\d+)-.*$", "\\1", df$event_date)
Otherwise you can look into converting the column to a Date object in R.
If the days are two digit, then substr would be faster
df$day <- substr(df$event_date, 1, 2)
Or convert to Date class and extract the day
df$day <- format(as.Date(df$event_date, "%d-%b-%y"), "%d")
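To compare the approaches side by side, here is a small sketch on made-up data shaped like the question's values. The data frame and the variable names are only for illustration.

```r
# Hypothetical sample matching the question's layout (day-month abbreviation-year).
df <- data.frame(event_date = c("31-Dec-18", "30-Dec-18", "28-Dec-18"),
                 stringsAsFactors = FALSE)

# Regex: capture everything before the first hyphen.
day_regex <- sub("^([^-]+)-.*$", "\\1", df$event_date)

# Fixed positions: valid only when the day is always two characters wide.
day_substr <- substr(df$event_date, 1, 2)

# Date parsing: "%b" depends on the locale's month abbreviations,
# so "Dec" is assumed to parse in an English locale.
day_date <- format(as.Date(df$event_date, format = "%d-%b-%y"), "%d")

print(day_regex)
print(day_substr)
```

All three are vectorised, so no for loop is needed; the Date route is the only one that also validates that the text is a real date.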

How to sort 'mm-yyyy' format in R?

Date
01-2018
02-2018
01-2019
02-2019
I tried using arrange(df, Date)
It gets arranged as
01-2018
01-2019
02-2018
02-2019
Here is one base R option. We can try ordering the data frame using an on the fly date based on the text strings.
df <- data.frame(Date=c("01-2018", "02-2018", "01-2019", "02-2019"),
stringsAsFactors=FALSE)
df[order(as.Date(paste0("01-", df$Date), format="%d-%m-%Y")), ]
[1] "01-2018" "02-2018" "01-2019" "02-2019"
Note that I form a complete date by arbitrarily using the first of the month, for each text date, using as.Date with the correct format mask to generate a bona fide date.
For best results, consider storing your dates in a proper date column, or, if you must use text, use an ISO format which would at least sort properly.
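As a rough illustration of the ISO suggestion, the sketch below rewrites "mm-yyyy" as "yyyy-mm" and then sorts the text directly. The column name iso is made up for the example, and the rows are shuffled on purpose so the effect of sorting is visible.

```r
# Same sample values as in the question, deliberately out of order.
df <- data.frame(Date = c("02-2019", "01-2018", "02-2018", "01-2019"),
                 stringsAsFactors = FALSE)

# Rewrite "mm-yyyy" as "yyyy-mm" so that text order matches time order.
df$iso <- sub("^(\\d{2})-(\\d{4})$", "\\2-\\1", df$Date)

# Plain text sorting now works; drop = FALSE keeps the result a data frame.
sorted <- df[order(df$iso), , drop = FALSE]
print(sorted$Date)
```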

Research panel analysis in R

I am a newbie to Stack Overflow, stats and R, so apologies for the simple nature of my question/request for advice:
I am completing analysis of a large data set comprising 2 files: a txt file containing internal temperature data and a second SPSS data file.
To kick off, I have exported the SPSS data into CSV format and stripped it back to contain just the few columns I think I need: house type and occupant type. I have imported all the temperature data and merged the two using a common identifier.
So now I have a merged data frame containing all the data I need (to begin with) to start completing some analysis.
First question: I have year, date and time as separate columns. However, the time column has imported with an incorrect date in front of it, "30/12/1899". How can I delete the date part of all observations from this column, but retain the time?
Second question: Similar to above, the date column shows the correct date, but has a time following it which is not correct (every observation shows 00:00:00). How can I delete all the times from this column?
Third question: How can I combine the correct time with the correct date, to end up with DD/MM/YYYY HH:MM:SS?
Fourth question: Should I create subsets of the merged data to facilitate the analysis, i.e. each house type (separate subsets) vs temp, time and occupant type?
Dates can be brought in as they are, instead of as factors, via the parameter as.is = TRUE, i.e.
data <- read.csv(file.choose(), as.is = TRUE)
I would try reading the CSV file again and then working with the date-time column. With as.is = TRUE it will come in as character (chr), and you'll need to convert it to POSIXct, well I do anyway. To view help on a function, type a question mark followed by the function name, i.e. ?as.POSIXct.
# str(data) shows something like this on my system:
#   Date.Time: chr "2018/08/04 10:10:00", ...   # current format is '%Y/%m/%d %H:%M:%S'
# The format argument has to describe the text as it is now, not the format you want.
# To display it as '%d/%m/%Y %H:%M:%S' afterwards, use format(data$Date, '%d/%m/%Y %H:%M:%S').
# tz = '' means the current time zone; you should probably read up on time zones.
# Finally, on the left side of the assignment <- I am creating a new column, Date.
# You could overwrite the old column, Date.Time, but it can't hurt to learn how to delete
# a column.
data$Date <- as.POSIXct(data$Date.Time, tz = '', format = '%Y/%m/%d %H:%M:%S')
# Now remove the original column. The minus sign in -Date.Time drops Date.Time; if you leave the
# minus out, the result will contain only Date.Time and no other columns.
data <- subset(data, select = -Date.Time)
Try this first, and I will look into removing the time within a date field. I have an idea, but I'd rather see if this helps with the problem first.
Though if you do want to merge the Year, Month and Day columns, you could try something like this. It seems like a logical thing to do, and you can always keep the original columns and delete them later; it's not hurting anything.
# paste() already separates its arguments with a single space
data$YMD <- paste(data$Year, data$Month, data$Day)
Also, while you are at it, install a package called dplyr, written by the same guy that did ggplot2, Hadley....
install.packages("dplyr")
# Then add it to the top of your file like ggplot2.
library(dplyr)
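For the first three questions, one possible base R sketch follows. It assumes both columns were read as character (as with as.is = TRUE) and that they look the way the question describes; the column names Date and Time and the sample values are hypothetical.

```r
# Hypothetical values shaped like the ones described in the question.
data <- data.frame(
  Date = c("04/08/2018 00:00:00", "05/08/2018 00:00:00"),
  Time = c("30/12/1899 10:10:00", "30/12/1899 23:45:30"),
  stringsAsFactors = FALSE
)

# Question 1: drop the spurious date in front of the time (everything up to the space).
data$Time <- sub("^\\S+\\s+", "", data$Time)

# Question 2: drop the meaningless 00:00:00 after the date (everything from the space on).
data$Date <- sub("\\s+.*$", "", data$Date)

# Question 3: glue the two together and parse into a single date-time.
# tz = "UTC" is an assumption; use the zone the temperatures were recorded in.
data$DateTime <- as.POSIXct(paste(data$Date, data$Time),
                            format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Display in DD/MM/YYYY HH:MM:SS form.
print(format(data$DateTime, "%d/%m/%Y %H:%M:%S"))
```

Keeping DateTime as POSIXct and only formatting it for display means it can still be sorted, subset by time of day, or plotted on a time axis.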

Time series of counts of records with ggplot

I have a series of records of events (in this case deaths). They are now in a data frame, with one column containing the date as class Date and other columns containing details as factors (e.g. where the death occurred). The records are sorted into date order.
I am trying to use ggplot to plot time series of the numbers of deaths. But I get various error messages with different approaches. I thought the minimum that would create a bar chart of numbers of deaths by date would be something like:
F1 <- ggplot(DeathsSorted.df, aes('Date of death'))
F1 + geom_bar()
But all that produces is a greyed out block with no bars.
What's worse is that this code seemed to work before I updated to the latest versions of RStudio and R.
This works fine:
df <- data.frame(date=as.Date(c("2017-09-08","2017-09-09","2017-09-08",
"2017-09-10","2017-09-08","2017-09-10","2017-09-01","2017-09-11")) )
F1 <- ggplot(df, aes(x=date))
F1 + geom_bar()
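You just had a typo: the column name should not be between quotes. Quoting it makes ggplot treat it as a constant string rather than a column. If the name contains spaces, as yours does, wrap it in backticks instead: aes(`Date of death`).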
Next time, post some fake data (like my df), so people can help you better/more easily.
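If you want to inspect the numbers behind the bars, the per-date counts can also be computed directly. This is a base R sketch on the same made-up dates as the answer above; the column name n is chosen only for the example.

```r
# Same made-up dates as in the answer above.
df <- data.frame(date = as.Date(c("2017-09-08", "2017-09-09", "2017-09-08",
                                  "2017-09-10", "2017-09-08", "2017-09-10",
                                  "2017-09-01", "2017-09-11")))

# Count the records on each date (the tally geom_bar() draws by default).
counts <- as.data.frame(table(date = df$date), stringsAsFactors = FALSE)

# table() turns the dates into text labels, so convert them back to Date.
counts$date <- as.Date(counts$date)
names(counts)[2] <- "n"

print(counts)
```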

Having difficulty with the start argument for ts( ). Losing date formatting

First, new to programming.
I built a table with 3 columns and I want to evaluate it as a time series, so I'm playing around with the ts() function. The first column of my table is DATE, created with as.Date in the format "yyyy-mm-dd". I have one observation per variable per day. I applied ts() to the table with start=1 (first observation?), checked head(df), and the DATE column now shows a loose sequence of numbers that I can't identify (12591, 12592, 12593, 12594, 12597, 12598).
Could it be that as.Date is messing things up?
The line I use is:
ts(dy2, start=1, frequency= 1)
I've also been playing with the deltat argument. In the help file it suggests 1/12 for monthly data. Naturally, I tried 1/365 (for daily data), but have yet to be successful.
As suggested by G. Grothendieck you can use the zoo package. Try this:
library(zoo)
dates <- as.Date(dy2[,1], format = "%Y-%m-%d")
# one zoo series per variable, indexed by the dates
x1 <- zoo(dy2[,2], dates)
plot(x1)
x2 <- zoo(dy2[,3], dates)
plot(x2)
If this does not work, please provide further details about your data as requested by MrFlick. For example, print the output of dput(dy2) or at least head(dy2).
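As for the numbers in the question, they are most likely the dates themselves: R stores a Date as the number of days since 1970-01-01, and ts() keeps only that underlying number. A quick base R check, using the values quoted in the question:

```r
# The values reported in the question.
raw <- c(12591, 12592, 12593, 12594, 12597, 12598)

# Interpret them as day counts from R's Date origin.
dates <- as.Date(raw, origin = "1970-01-01")
print(dates)

# Going the other way: stripping the Date class gives the same numbers back.
print(as.numeric(dates))
```

The gap between 12594 and 12597 would then simply be two days with no observations.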
