Apply commands to variables in datasets in a list? - r

I have a list that contains 475 datasets with 14 identical columns. The "timestamp" column gives a date and time, but the formatting is not consistent from one dataset to the next. I need to get the formatting uniform across all datasets, but can't figure out how to apply a command to each "timestamp" variable.
I'm relatively new to R and feel like I'm missing something obvious... Help?
enter image description here

It's tough to know whether this will do the trick without access to the data. Try using the package lubridate. It can output different formats but it will accept any POSIXct and POSIXt. You'll have to loop over all of your 475 datasets. Here's a guess at a solution with the lubridate function ymd_hms():
library(lubridate)
for (i in 1:length(files)){
files[[i]]$timestamp <- ymd_hms(files[[i]]$timestamp)
}
This will format all of the timestamps as "2018-11-28 17:08:00", for example. See this cheatsheet for more formats.

Thanks for the info. I was after the basic code to implement any command on a variable (within a list within a list) and the date issue was one of several things I needed to mess with. The for loop did the trick. Thanks!

Related

Looking for advice on creating Tidy data from the start

I have a data set that will be growing. It is categorical observations (i.e., 1=yes, 2=no) by date and hour. Is the following an acceptable method of formatting for import to R or is there a better way?
I would use a template like this:
Using one column for the date makes it much easier to read/import into R. Also, the YYYY-MM-DD is the default format in R for date columns. Trying to write date and hour together in one column could be done but seems like it could be tedious and not as easy to see what is going on in the data. As was mentioned in the comments above, each observation should be on a separate row. Once you save the data as a csv, it will be easily imported into R.
Good luck.

Exporting data from R that is in a list generated by a function

So I've used the decompose function and I want to export all the lists it generates not just the plot it creates. I tried converting the lists into either a matrix or data frame but then that gets rid of the date header and year columns so if someone knows how to convert it and keep the list formatting that would solve my issue I think.
Anyway, The closest I've got to being able to do this keeping the list format is by doing
capture.output(decompose, file = "filename.csv")
As you can see from the image attached though:
Sometimes the months arent all together in a row which is really not helpful or what I want. It also just puts it in one column and I'm having to go into the excel after and do the text to column option which is going to get old really quickly.
Any help would be greatly appriciated. I'm really new to R so apologise if there is an obvious fix I'm missing.

Converting data between classes in R: large CSV file, prices recorded like "$10.00"

So full disclosure, I am new to R and programming in general. Because of that, it is very hard for me to search when I have problems because I am not even sure what keywords to use. I am learning, and all I am hoping for y'all to do is point me in the right direction.
I have a very large csv file that I imported into R. Around 2 million observations (don't worry, I am not planning on using all 2 million). The only problem is that the people recording the data formatted the file to record to prices as "$10.00". Because of this, R recognizes the data has a factor, and also treats each individual price as a separate variable because of the dollar sign. I would like to reformat this column as a numeric variable.
I am sure there is some way to go about reformatting this in R, the only problem is I am not sure which functions I need. Sorry for the very basic question, I have just hit a wall a figured I would reach out.
Any and all help is much appreciated!
Thank you!
We could also use sub
as.numeric(sub('\\D+', '', x))
#[1] 10.00 11.24 15.22
data
x<-c("$10.00","$11.24","$15.22")
Suppose that your data looks like this:
x<-c("$10.00","$11.24","$15.22")
You can use the substring function to trim the initial dollar sign (which will still leave you with strings) and then use as.numeric to turn it to a numeric vector.
newx<-as.numeric(substring(x,2))
will produce a vector named newx with value
c(10.00,11.24,15.22)
We tell the substring to start at the 2nd character (strings in R are 1-indexed), and then cast to numeric.
In your data frame (suppose it is called df), you can replace the column like
df$MoneyColumn <- as.numeric(substring(df$MoneyColumn,2))

Strangeness with filtering in R and showing summary of filtered data

I have a data frame loaded using the CSV Library in R, like
mySheet <- read.csv("Table.csv", sep=";")
I now can print a summary on that mySheet object
summary(mySheet)
and it will show me a summary for each column, for example, one column named Diagnose has the unique values RCM, UCM, HCM and it shows the number of occurences of each of these values.
I now filter by a diagnose, like
subSheet <- mySheet[mySheet$Diagnose=='UCM',]
which seems to be working, when I just type subSheet in the console it will print only the rows where the value has been matched with 'UCM'
However, if I do a summary on that subSheet, like
summary(subSheet)
it still 'knows' about the other two possibilities RCM and HCM and prints those having a value of 0. However, I expected that the new created object will NOT know about the possible values of the original mySheet I initially loaded.
Is there any way to get rid of those other possible values after filtering? I also tried subset but this one just seems to be some kind of shortcut to '[' for the interactive mode... I also tried DROP=TRUE as option, but this one didn't change the game.
Totally mind squeezing :D Any help is highly appreciated!
What you are dealing with here are factors from reading the csv file. You can get subSheet to forget the missing factors with
subSheet$Diagnose <- droplevels(subSheet$Diagnose)
or
subSheet$Diagnose <- subSheet$Diagnose[ , drop=TRUE]
just before you do summary(subSheet).
Personally I dislike factors, as they cause me too many problems, and I only convert strings to factors when I really need to. So I would have started with something like
mySheet <- read.csv("Table.csv", sep=";", stringsAsFactors=FALSE)

Operating with time intervals like 08:00-08:15

I would like to import a time-series where the first field indicates a period:
08:00-08:15
08:15-08:30
08:30-08:45
Does R have any features to do this neatly?
Thanks!
Update:
The most promising solution I found, as suggested by Godeke was the cron package and using substring() to extract the start of the interval.
I'm still working on related issues, so I'll update with the solution when I get there.
CRAN shows a package that is actively updated called "chron" that handles dates. You might want to check that and some of the other modules found here: http://cran.r-project.org/web/views/TimeSeries.html
xts and zoo handle irregular time series data on top of that. I'm not familiar with these packages, but a quick look over indicates you should be able to use them fairly easily by splitting on the hyphen and loading into the structures they provide.
So you're given a character vector like c("08:00-08:15",08:15-08:30) and you want to convert to an internal R data type for consistency? Check out the help files for POSIXt and strftime.
How about a function like this:
importTimes <- function(t){
t <- strsplit(t,"-")
return(lapply(t,strptime,format="%H:%M:%S"))
}
This will take a character vector like you described, and return a list of the same length, each element of which is a POSIXt 2-vector giving the start and end times (on today's date). If you want you could add a paste("1970-01-01",x) somewhere inside the function to standardize the date you're looking at if it's an issue.
Does that help at all?

Resources