Looking for advice on creating Tidy data from the start - r

I have a data set that will be growing. It is categorical observations (i.e., 1=yes, 2=no) by date and hour. Is the following an acceptable method of formatting for import to R or is there a better way?

I would use a template like this:
Keeping the date in its own column makes the file much easier to read and import into R. YYYY-MM-DD is also the format that as.Date() expects by default. Writing date and hour together in one column is possible, but it is tedious and makes the data harder to scan. As was mentioned in the comments above, each observation should be on a separate row. Once you save the data as a CSV file, it imports into R easily.
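To make the layout concrete, here is a minimal sketch of what such a file could look like and how it might be read. The column names (date, hour, observation) and the sample rows are made up for illustration; they are not from the original template.

```r
# Hypothetical tidy layout: one row per observation, one column per variable.
csv_text <- "date,hour,observation
2019-03-01,8,1
2019-03-01,9,2
2019-03-02,8,1"

# colClasses asks read.csv to parse the 'date' column as Date on import.
obs <- read.csv(text = csv_text, colClasses = c(date = "Date"))

# Recode the 1/2 codes as a labelled factor (1 = yes, 2 = no).
obs$observation <- factor(obs$observation, levels = c(1, 2),
                          labels = c("yes", "no"))

str(obs)
```

With a real file you would pass the file path instead of text = csv_text. New rows can be appended to the CSV as the data set grows without changing the import code.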
Good luck.

Related

Reading dates as dates and not characters

The data I have been working with reads everything just fine, **except** for the date column. It always reads it as characters instead.
This would be fine except that, when I have lots of dates (like over 400 of them), then you can see something like this on a scatterplot:
Scatter Plot
In essence, I have two questions.
The first: apart from using as.Date, which is fine when I need a temporary fix, how do I make R read the date column as actual dates permanently? That is, is there a way to have the column read as dates when I am using read.csv or read.excel?
The second: when graphing, as in the graph I have included here, how can I show only some of the labels so that the axis is not so cramped? I still want all the data, but I really do not want all those labels.
I was hoping to add the data file, but I am unaware of how to add excel/csv files on this website and my data set is quite long (n = 491). I do have 9 columns, 1 of which is the date column. The others are numbers or actual letters (the latter of which is in fact a character). I can add maybe a few rows just to help out.
Some of the data set
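One possible approach to both questions, sketched in base R. The data below is synthetic (491 daily rows with made-up column names Date and value), since the original file is not available, and it assumes the dates are stored as YYYY-MM-DD text. For Excel files, the readxl package's read_excel() has a col_types argument that can play a similar role, but that is not shown here.

```r
# Build a stand-in CSV with 491 rows (synthetic data, for illustration only).
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 491)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Date = dates, value = seq_along(dates)),
          csv_path, row.names = FALSE)

# Question 1: have read.csv parse the column as Date at import time.
dat <- read.csv(csv_path, colClasses = c(Date = "Date"))

# Question 2: plot every point but label only one tick every three months.
ticks <- seq(min(dat$Date), max(dat$Date), by = "3 months")
plot(dat$Date, dat$value, xaxt = "n", xlab = "Date", ylab = "value")
axis.Date(1, at = ticks, format = "%b %Y")
```

If the dates in the file use another layout (for example 31/12/2020), colClasses = "Date" will not parse them; in that case read the column as character and convert it once with as.Date(x, format = ...) right after import.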

Date / Time calculations

I'm trying to calculate the difference between 2 dates / times. My problem is that each date and time is in a separate column (see screenshot). Following is the formula I have been using:
=IF(RC[-1]-RC[-4] =0,"",RC[-1]-RC[-4])
This worked until the 2 date columns weren't the same day.
I'm having trouble combining the dates and times within the formula. I could write a macro to do this, or I could combine each date and time pairing into one column if that makes it easier. I'd rather not combine them, as separate columns are easier for the user base.
Any help or suggestions would be greatly appreciated. Thanks in advance for your help....
First concatenate the date and time:
=concatenate(text(A2,"mm/dd/yyyy")&" "&text(B2,"hh:mm:ss"))
then subtract the combined values.
Otherwise, look at this. You can subtract the dates and times directly without adding any extra columns (multiplying by 24 converts the result from days to hours):
=(CONCATENATE(TEXT(C2,"mm/dd/yyyy")&" "&TEXT(D2,"hh:mm:ss AM/PM"))-CONCATENATE(TEXT(A2,"mm/dd/yyyy")&" "&TEXT(B2,"hh:mm:ss AM/PM")))*24

Apply commands to variables in datasets in a list?

I have a list that contains 475 datasets with 14 identical columns. The "timestamp" column gives a date and time, but the formatting is not consistent from one dataset to the next. I need to get the formatting uniform across all datasets, but can't figure out how to apply a command to each "timestamp" variable.
I'm relatively new to R and feel like I'm missing something obvious... Help?
It's tough to know whether this will do the trick without access to the data. Try the lubridate package. Its parsing functions accept character timestamps with a variety of separators and return POSIXct values, as long as the components appear in the expected order. You'll have to loop over all 475 of your datasets. Here's a guess at a solution with the lubridate function ymd_hms():
library(lubridate)
# Parse the timestamp column of every dataset in the list
for (i in seq_along(files)) {
  files[[i]]$timestamp <- ymd_hms(files[[i]]$timestamp)
}
This converts every timestamp to a date-time (POSIXct) value that prints as "2018-11-28 17:08:00 UTC", for example. See this cheatsheet for more formats.
Thanks for the info. I was after the basic code to implement any command on a variable (within a list within a list) and the date issue was one of several things I needed to mess with. The for loop did the trick. Thanks!
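For the general pattern of applying a command to one variable in every dataset of a list, lapply is a common alternative to the for loop. The sketch below uses base R only; the list name datasets, the two sample data frames, and the two candidate formats are assumptions for illustration, and it assumes the format is consistent within each dataset even though it differs between them.

```r
# Two small stand-in datasets whose timestamp formats differ (made-up data).
datasets <- list(
  a = data.frame(timestamp = c("2018-11-28 17:08:00", "2018-11-29 09:15:30"),
                 value = 1:2, stringsAsFactors = FALSE),
  b = data.frame(timestamp = c("11/28/2018 17:08", "11/29/2018 09:15"),
                 value = 3:4, stringsAsFactors = FALSE)
)

# Candidate formats, tried in order for each dataset.
formats <- c("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M")

# lapply runs the function on each data frame and returns a new list.
datasets <- lapply(datasets, function(d) {
  d$timestamp <- as.POSIXct(d$timestamp, tz = "UTC", tryFormats = formats)
  d
})

str(datasets$b$timestamp)
```

Any other per-column command can go inside the anonymous function in the same way, as long as the function returns the modified data frame.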

Difference in Days Between Two Date Columns in a Dataframe with Different Date Formats

Just looking for help working with some dates in R. Code for a simple data frame is below, with one column of start dates and one column of end dates. I would like to create a new column with the difference in days between each set of dates - start date and end date. Also, the dates are in different formats, so is there an easy way to convert all dates to a similar format? I've been reading about the lubridate package but haven't found anything yet on this particular situation that is easy for me to quickly learn as an R newbie. It would be great to link the answer to the dplyr pipeline as well, if possible, to calculate average number of days, etc.
Start.date<-c("05-May-15", "10-June-15", "July-12-2015")
End.date<-c("12-July-15", "2015-Aug-15", "Sept-12-2015")
Dates.df<-data.frame(Start.date,End.date)
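One way this could be approached in base R is to try a short list of formats in turn and keep the first one that parses each value. The helper parse_mixed_date below is hypothetical (not from any package), the format list is a guess based on the three layouts in the sample data, and month-name parsing assumes an English locale. "Sept" is rewritten to "Sep" first because strptime does not recognise it as a month abbreviation.

```r
Start.date <- c("05-May-15", "10-June-15", "July-12-2015")
End.date <- c("12-July-15", "2015-Aug-15", "Sept-12-2015")
Dates.df <- data.frame(Start.date, End.date, stringsAsFactors = FALSE)

# Hypothetical helper: try each format in order, filling only unparsed values.
parse_mixed_date <- function(x, formats) {
  x <- sub("\\bSept\\b", "Sep", as.character(x))
  out <- rep(as.Date(NA), length(x))
  for (f in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

# Order matters: day-first is tried before year-first so that "05-May-15"
# is not misread as year 5.
formats <- c("%d-%b-%y", "%b-%d-%Y", "%Y-%b-%d")

Dates.df$Start <- parse_mixed_date(Dates.df$Start.date, formats)
Dates.df$End <- parse_mixed_date(Dates.df$End.date, formats)
Dates.df$Days <- as.numeric(difftime(Dates.df$End, Dates.df$Start, units = "days"))

mean(Dates.df$Days)
```

Once Start and End are real Date columns, the same subtraction can be written inside a dplyr mutate() call and the mean inside summarise(), if you prefer a pipeline.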

Transforming Date Column after reading csv file

I have read a csv file in R. After reading it, I need to transform the Date column into a Date object and the Time column into a time object. How do I do it? The file is in memory at this point.
Also, how do I get the classes of all columns in a file? I tried lapply and sapply. They print the names of the columns to the console but do not say anything about class.
You're going to need the strptime function. The exact code depends on the time format in the .csv file; you can find the details at the link below.
So if your date-time is given like this:
2014/01/04 12:30:36
Then your code will look something like this:
strptime(data$column_name, format="%Y/%m/%d %H:%M:%S")
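To find the class of a single column, use the class() function. To get the classes of all columns at once, apply it to the data frame with sapply(data, class).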
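Putting those pieces together, a minimal sketch might look like this. The column names (Date, Time, value) and the two sample rows are assumptions for illustration. Base R has no time-only class, so the sketch combines date and time into one POSIXct column; it also uses as.POSIXct rather than strptime directly, because strptime returns POSIXlt, which is awkward to store in a data frame.

```r
# Stand-in for a CSV that has already been read (made-up rows).
df <- read.csv(text = "Date,Time,value
2014/01/04,12:30:36,1
2014/01/05,08:05:00,2", stringsAsFactors = FALSE)

# Combine date and time into a single date-time column.
df$DateTime <- as.POSIXct(paste(df$Date, df$Time),
                          format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Convert the Date column itself to class Date.
df$Date <- as.Date(df$Date, format = "%Y/%m/%d")

# class() can return more than one value (POSIXct columns have two),
# so keep only the first class of each column.
classes <- sapply(df, function(col) class(col)[1])
print(classes)
```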
These tools can be discovered fairly easily with a little bit of research. Next time put in a little bit more effort before asking your question.
Hope this helps.
