For loop to convert variables to Date format - r

I have two data frames of identical dimensions and column names.
I want to convert the dates in both, currently stored as characters, to dates. Is there any way to automate this using a for loop? I thought of something similar to the following script:
names <- c("old.df", "new.df")
# use Date format
for (i in names) {
i$Date <- as.Date(i$Date, "%d/%m/%Y")
i$Datetime <- as.Date(i$Datetime, "%d/%m/%Y %h:%m:%s.000")
i$ClickDatetime <- as.Date(i$ClickDatetime, "%d/%m/%Y %h:%m:%s.000")
}
This actually doesn't work and returns the following error message:
Error in i$Date : $ operator is invalid for atomic vectors
I don't think I can use the i object in this way. I'm wondering if there is a nice workaround you usually use to achieve the same goal in similar conditions.

Correct, it won't work like that because R sees i as the string, not the dataframe named by the string. Something like this should work:
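df_list <- list(old_df, new_df)
# use Date format; index into the list so the changes are stored back
for (i in seq_along(df_list)) {
  df_list[[i]]$Date <- as.Date(df_list[[i]]$Date, "%d/%m/%Y")
  # %H:%M:%OS reads hours, minutes and (fractional) seconds
  df_list[[i]]$Datetime <- as.Date(df_list[[i]]$Datetime, "%d/%m/%Y %H:%M:%OS")
  df_list[[i]]$ClickDatetime <- as.Date(df_list[[i]]$ClickDatetime, "%d/%m/%Y %H:%M:%OS")
}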
old_df <- df_list[[1]]
new_df <- df_list[[2]]
There are lots of ways to do this.
With only two dataframes, converting each one individually might be just as good an option. With many dataframes with identical columns, you could stack them with rbind (adding an identifier column to record which row belongs to which df), apply your changes, and then split them apart again. Or put them in a list and write a function that can be used with lapply.
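The list-plus-lapply option mentioned above could look like the following minimal sketch. The helper name convert_dates and the toy data are assumptions made for illustration; the column names and the "%d/%m/%Y" layout come from the question.

```r
# Toy stand-ins for old.df and new.df (hypothetical data)
old_df <- data.frame(Date = c("01/02/2020", "15/03/2020"),
                     Datetime = c("01/02/2020 10:15:30.000", "15/03/2020 23:59:59.000"),
                     stringsAsFactors = FALSE)
new_df <- data.frame(Date = "20/04/2021",
                     Datetime = "20/04/2021 08:00:00.000",
                     stringsAsFactors = FALSE)

# Hypothetical helper: converts the character columns of one data frame
convert_dates <- function(df) {
  df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
  # as.Date keeps only the calendar date; the time of day is dropped
  df$Datetime <- as.Date(df$Datetime, format = "%d/%m/%Y %H:%M:%OS")
  df
}

# A named list keeps track of which result belongs to which data frame
df_list <- lapply(list(old_df = old_df, new_df = new_df), convert_dates)
str(df_list$old_df)
```

If the time of day matters, as.POSIXct with the same format string would keep it; as.Date is used here only because the question asks for dates.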

Related

Automatically name the elements of a list after importing using lapply

I have a list of dataframes which I imported using
setwd("C:path")
fnames <- list.files()
csv <- lapply(fnames, read.csv, header = T, sep=";")
I will need to do this multiple times, creating more lists. I would like to keep all the dataframes available separately (i.e. I don't want or need to combine them); I simply used the above code to import them all quickly. But accessing them now is a little cumbersome and not intuitive (to me at least). Rather than having to use [[1]] to access the first element, is there a way that I could amend the first bit of code so that I can name the elements in the list, for example based on a Date which is a variable in each of the dataframes in the list? The dates are stored as chr in the format "dd-mm-yyyy", so I could potentially just name the dataframes using dd-mm from the Date variable.
You can extract the required data from the 1st value in the Date column of each dataframe and assign it as name of the list.
names(csv) <- sapply(csv, function(x) substr(x$Date[1], 1, 5))
Or extract the data using regex.
names(csv) <- sapply(csv, function(x) sub("(\\w+-\\w+).*", "\\1", x$Date[1]))
We can use
names(csv) <- sapply(csv, function(x) substring(x$Date[1], 1, 5))
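As a small worked example of the naming approach above, here is a sketch on a toy list. The two data frames are made-up stand-ins for the imported files; only the Date column and its "dd-mm-yyyy" layout are taken from the question.

```r
# Toy list standing in for the imported csv list (hypothetical data)
csv <- list(
  data.frame(Date = c("03-01-2021", "03-01-2021"), value = 1:2),
  data.frame(Date = c("17-02-2021", "17-02-2021"), value = 3:4)
)

# Name each element after the dd-mm part of its first Date value
names(csv) <- sapply(csv, function(x) substr(x$Date[1], 1, 5))

names(csv)        # "03-01" "17-02"
csv[["17-02"]]    # access by name instead of by position
```

Names built this way are only unique if no two files share the same day and month; duplicated names would make name-based access ambiguous.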

R for loop question: "Error: $ operator is invalid for atomic vectors" when looping over dataframes [duplicate]

I am trying to reformat a string variable as date in R for several datasets that contain the same variable. When I run my code on only one dataframe, it works, but when I try to iterate over several dataframes using a for loop, I receive the error: Error: $ operator is invalid for atomic vectors. Here is my code:
# dataframes are df1, df2, and df3, all containing the column "date" in character format
list <- list(df1, df2, df3)
for (i in seq_along(list)) {
i$date <- as.Date(i$date, "%Y-%m-%d")
}
This results in the error mentioned above. I have tried with and without seq_along() and get the same error. When I run the following on only one dataframe, the code works:
df1$date <- as.Date(df1$date, "%Y-%m-%d")
Can someone please suggest a fix? Thank you
i is an integer, not something you can $-index on; perhaps you meant list[[i]]?
BTW, naming a variable the same as a base (and very-frequently used) function is a really bad idea and bad practice. I first recommend renaming it:
mylist <- list(df1, df2, df3)
for (i in seq_along(mylist)) {
mylist[[i]]$date <- as.Date(mylist[[i]]$date, "%Y-%m-%d")
}
Alternatively, you can use lapply to apply a function to each element, and save it back into the original list.
mylist <- lapply(mylist, function(L) {
L$date <- as.Date(L$date, "%Y-%m-%d")
L
})
One can shorten this a little (as reminded by @Onyambu) with
mylist <- lapply(mylist, transform , date = as.Date(date,"%Y-%m-%d"))
If you're familiar with the tidyverse dialect, transform is the base R equivalent of mutate. (If not, then ignore this note :-)
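If the converted columns should end up back in df1, df2 and so on rather than only in the list, one possible follow-up is to name the list elements and copy them back with list2env. This is a sketch on made-up data, not part of the answer above; only the date column and its format are taken from the question.

```r
# Toy data frames (hypothetical data)
df1 <- data.frame(date = c("2021-01-05", "2021-01-06"))
df2 <- data.frame(date = "2021-02-10")

# A named list, so each result can be matched back to its source object
mylist <- list(df1 = df1, df2 = df2)
mylist <- lapply(mylist, transform, date = as.Date(date, "%Y-%m-%d"))

# Optional: overwrite df1 and df2 with the converted versions
list2env(mylist, envir = environment())

class(df1$date)
```

Keeping the data frames in the list is usually simpler; copying them back is only worth it when later code refers to the individual names.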

Creating a function to iterate over tibble elements within a large list in R

I am trying to create a function to automatize some basic formatting I need to do before I combine multiple datasets into an xts. I can do a bulk read of the files and create a large list of tibbles. But I'm having a hard time creating a function to iterate over that list. When I read individual files into a df, I have been running the following:
df<-df[!(duplicated(df$DateTime)),]
dfx<-xts(df[,-1], order.by = as.POSIXct(df$DateTime, format="%d-%b-%y %H:%M:%S"))
Then I do a merge.xts of all of the 'dfx' objects. One issue I have with the data is that the DateTime does not always match up between files, and the above method gives me a large xts with NAs, which I prefer to another type of merge/rbind. I would like to create a function to do this over and over, especially because reading each file into separate data frames and then merging is grueling now that I have to combine 10+. All of my attempts have been unsuccessful and now I am just stuck. :/ Any help would be appreciated!
If it is a list, we can use lapply to loop over the list and use an anonymous function call to apply the function
lst2 <- lapply(lst1, function(df) {
df<-df[!(duplicated(df$DateTime)),]
xts(df[,-1], order.by = as.POSIXct(df$DateTime, format="%d-%b-%y %H:%M:%S"))
})
and then use Reduce to do the merge
Reduce(merge, lst2)
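The same lapply-then-Reduce pattern can be tried without the xts package. The sketch below swaps in plain data frames and base merge() with all = TRUE, which is only an approximation of merge.xts; the toy data and the column names a and b are assumptions.

```r
# Make %b parse English month abbreviations regardless of the system locale
Sys.setlocale("LC_TIME", "C")

# Toy inputs standing in for the list of tibbles (hypothetical data)
lst1 <- list(
  data.frame(DateTime = c("01-Jan-21 00:00:00", "01-Jan-21 00:00:00", "01-Jan-21 01:00:00"),
             a = c(1, 1, 2)),
  data.frame(DateTime = c("01-Jan-21 01:00:00", "01-Jan-21 02:00:00"),
             b = c(10, 20))
)

# Step 1: drop duplicated timestamps and parse DateTime in every element
lst2 <- lapply(lst1, function(df) {
  df <- df[!duplicated(df$DateTime), ]
  df$DateTime <- as.POSIXct(df$DateTime, format = "%d-%b-%y %H:%M:%S", tz = "UTC")
  df
})

# Step 2: fold the list into one table; all = TRUE keeps unmatched rows as NA
merged <- Reduce(function(x, y) merge(x, y, by = "DateTime", all = TRUE), lst2)
merged
```

Reduce applies the two-argument merge pairwise from left to right, so the same call works for any number of list elements.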

Function to update dataframe name stored in variable

I have to convert some dates to character format for a project I am working on. To make the code cleaner, I wanted to write a function that takes the name of the dataframe (and possibly the column name, though in this example it doesn't change, so it can be hard-coded) and applies the format, rather than repeating the full line for each dataframe whose column I am formatting.
Is this possible to do? I have done a lot of googling and can't seem to find an answer.
kpidataRM$Period <- format(kpidataRM$Period, "%b-%y")
kpidataAFM$Period <- format(kpidataAFM$Period, "%b-%y")
kpidataNATIONAL$Period <- format(kpidataNATIONAL$Period, "%b-%y")
kpidataHOD$Period <- format(kpidataHOD$Period, "%b-%y")
To answer your specific question, you could create a very simple function like this:
# Your function here takes as input the dataframe name (df) and formats the predefined column (Period)
new_function <- function(df){
df$Period <- format(df$Period, "%b-%y")
return(df)
}
and then run
df1 <- new_function(df1)
df2 <- new_function(df2)
for each of your dataframes (in your example df1 would be kpidataRM for instance). If you would like to include the column as a variable as well in your function you can write it like this:
# Your function here takes as input the dataframe name (df) and column name (col) and formats it.
new_function2 <- function(df, col){
df[[col]] <- format(df[[col]], "%b-%y")
return(df)
}
However, I would say that this is not the best approach in this case, as you only seem to want to format a set of columns from a set of dataframes in a specific way. What I would propose instead, exactly as Roland suggested, is to make a list of dataframes and iterate through each element. A simple example would look like this:
# Push all your dataframes in a list (dflist)
dflist <- list(df1,df2)
# Apply in this list a function that changes the column format (lapply)
dflist <- lapply(dflist, function(x){x[["Period"]] <- format(x[["Period"]], "%b-%y"); x})
Hope this works for you.
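The two ideas above (a helper with a column argument, and a list processed with lapply) can also be combined, since lapply forwards extra arguments to the function it calls. A minimal sketch, using made-up data in place of the real kpidata frames:

```r
# Toy data frames standing in for kpidataRM and kpidataAFM (hypothetical data)
kpidataRM  <- data.frame(Period = as.Date(c("2021-01-01", "2021-02-01")), n = 1:2)
kpidataAFM <- data.frame(Period = as.Date("2021-03-01"), n = 3L)

# Same helper as above: formats one column of one data frame
new_function2 <- function(df, col) {
  df[[col]] <- format(df[[col]], "%b-%y")
  df
}

# Named list, so results stay identifiable; col is forwarded by lapply
dflist <- list(RM = kpidataRM, AFM = kpidataAFM)
dflist <- lapply(dflist, new_function2, col = "Period")

str(dflist$RM)
```

The month abbreviations produced by "%b" depend on the session locale, so the exact strings may differ between machines.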

Convert all POSIXct/POSIXlt columns to Date

I am trying to load multiple dataframes in R, and all of them have at least one column (dateCreated) in POSIXct format. Some dataframes have multiple POSIXct columns. I want to convert all POSIXct columns to dates.
a<-sapply(i, is.POSIXct)
i[a]<-lapply(i[a], as.Date)
I don't find a function like is.POSIXct implemented yet. Any help?
You can try:
i[] <- lapply(i, function(x) if(inherits(x, "POSIXct")) as.Date(x) else x)
Notice how assigning into i[] keeps i a data frame, and how the inherits() check inside lapply lets us skip the first sapply.
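Since the title also mentions POSIXlt, a variation that tests for the shared parent class "POSIXt" would catch both kinds of column. A minimal sketch on a made-up data frame (the column dateClosed is an assumption added for illustration):

```r
# Toy data frame with one POSIXct and one POSIXlt column (hypothetical data)
i <- data.frame(id = 1:2)
i$dateCreated <- as.POSIXct(c("2021-01-05 10:00:00", "2021-01-06 11:30:00"), tz = "UTC")
i$dateClosed  <- as.POSIXlt(c("2021-02-01 09:00:00", "2021-02-02 17:45:00"), tz = "UTC")

# POSIXct and POSIXlt both inherit from "POSIXt"
i[] <- lapply(i, function(x) if (inherits(x, "POSIXt")) as.Date(x) else x)

# First class of each column after the conversion
sapply(i, function(x) class(x)[1])
```

Note that as.Date on a POSIXct value uses UTC by default, so timestamps close to midnight in another time zone may land on a different calendar day unless tz is passed explicitly.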
