I am trying to load multiple dataframes in R, and every dataframe has at least one column (dateCreated) in POSIXct format. Some dataframes have several POSIXct columns. I want to convert all POSIXct columns to dates.
a<-sapply(i, is.POSIXct)
i[a]<-lapply(i[a], as.Date)
I can't find an existing function like is.POSIXct. Any help?
You can try:
i[] <- lapply(i, function(x) if(inherits(x, "POSIXct")) as.Date(x) else x)
Notice how assigning to i[] replaces the columns in place and keeps i a data frame, so the separate sapply step is not needed.
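As a minimal sketch of the same idea on a small, made-up data frame (the data frame df and the column dateUpdated are hypothetical; only dateCreated comes from the question). The time zone is passed explicitly because the resulting calendar date depends on which zone the instant is read in:

```r
# Hypothetical example data; only dateCreated is named in the question.
df <- data.frame(
  id = 1:2,
  dateCreated = as.POSIXct(c("2020-01-15 10:30:00", "2020-06-01 23:59:59"), tz = "UTC"),
  dateUpdated = as.POSIXct(c("2020-02-01 00:00:00", "2020-07-04 12:00:00"), tz = "UTC")
)

# Convert every POSIXct column to Date and leave the other columns untouched.
# Assigning into df[] keeps the result a data frame instead of a plain list.
df[] <- lapply(df, function(x) {
  if (inherits(x, "POSIXct")) as.Date(x, tz = "UTC") else x
})

# Show the (first) class of each column after the conversion.
sapply(df, function(x) class(x)[1])
```

inherits() is used instead of a class(x) == "POSIXct" comparison because a POSIXct column carries two classes, POSIXct and POSIXt.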
Well, first things first: I'm still a noob and am learning R. I have a dataset with 0.9 million rows and 36 columns. One of these columns, let's say DATE, holds dates as strings, and another column, let's say TZ, holds time zones, also as strings.
What I'm trying to do is combine these two columns into one of type POSIXlt, which has date, time, and time zone. Here's my code for trying to get a vector of all the converted dates:
# Let's suppose my data exist in a variable "data" with dates in "DATE" column and timezones in "TZ"
indices <- NULL
dates <- NULL
zones <- unique (data$TZ)
for(i in seq_along(zones)){
indices <<- which(data$TZ==zones[i])
dates <<- c(dates, as.POSIXlt(data$DATE[indices], format = "%m/%d/%Y %H:%M:%S", tz = zones[i]))
}
Now, although there are ~1 million observations, it seems to do the job in 3-4 seconds. But it only "seems" to: the result I get is a list with NAs.
It does work when I try to convert a group individually, i.e., store result for every iteration in a different variable, or not run a for loop and do each iteration manually, storing each result in a different variable and, in the end, concatenate it all using c() function.
What am I doing wrong?
For anyone who might stumble here, I figured it out.
You can't use c() on a POSIXlt object, as it will convert the result to the local time zone. (That is not the reason for the NAs, but it's helpful to know.)
POSIXlt is stored as a list of components such as mday, zone, etc., which is why its value cannot be used as a single data frame element. Instead of POSIXlt, we can use POSIXct, which is internally represented as seconds since 1970-01-01.
Since we'll be replacing a data frame column with dates, it's easier to convert each result into a tibble using dplyr::as_tibble() and then use rbind() to combine the different results.
The NAs were introduced because of lexical scoping in R. I used dates <<- c(dates, as.POSIXlt(data$DATE[indices], format = "%m/%d/%Y %H:%M:%S", tz = zones[i])), due to which the value of i in zones[i] was NA or unknown.
So, the correct working code is:
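library(dplyr)  # provides as_tibble(), arrange(), mutate(), select() and %>%
# Sorted so that the group order matches arrange(data, TZ) below
zones <- sort(unique(data$TZ))
dates <- NULL
for (i in seq_along(zones)) {
  indices <- which(data$TZ == zones[i])
  dts <- as.POSIXct(data$DATE[indices], format = "%m/%d/%Y %H%M", tz = zones[i])
  dates <- rbind(dates, as_tibble(dts))
}
# Further, to combine the dates into the data frame
data <- arrange(data, TZ) %>% mutate(DATEandTime = dates$value) %>% select(-c("DATE", "TZ"))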
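A different way to get the same kind of result, sketched here under assumptions rather than taken from the answer: preallocate one POSIXct vector and fill it by index, one time zone at a time. This is not the rbind approach above; it avoids sorting the data frame because every value is written back to its own row. The sample rows, the instant column, and the zone names are made up for illustration, and the format string follows the one in the question.

```r
# Hypothetical data; the column names DATE and TZ follow the question.
data <- data.frame(
  DATE = c("01/02/2020 10:00:00", "01/02/2020 10:00:00", "03/04/2021 18:30:00"),
  TZ   = c("America/New_York", "Europe/Paris", "America/New_York"),
  stringsAsFactors = FALSE
)

# Preallocate a POSIXct vector of NAs. POSIXct stores instants as seconds
# since 1970-01-01 UTC, so one vector can hold times parsed in different zones.
instants <- as.POSIXct(rep(NA_real_, nrow(data)), origin = "1970-01-01", tz = "UTC")

for (z in unique(data$TZ)) {
  idx <- which(data$TZ == z)
  # Parse the strings of this group in their own zone and write them back
  # by index, which preserves the original row order.
  instants[idx] <- as.POSIXct(data$DATE[idx], format = "%m/%d/%Y %H:%M:%S", tz = z)
}

data$instant <- instants
# All instants are displayed in UTC here; the per-row zone stays in data$TZ.
format(data$instant, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

A single POSIXct vector has only one tzone attribute, so the per-row zone cannot be stored in the vector itself; keeping the TZ column alongside it is one way to retain that information.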
How to avoid R converting dates to numeric in a for loop? This is related to the question "disabling mapply automatically converting Dates to numeric", which shows the same behavior for mapply.
date <- c('2008-02-20','2009-10-05')
date <- as.Date(date, format = '%Y-%m-%d')
date
[1] "2008-02-20" "2009-10-05"
for (i in date) print(i)
[1] 13929
[1] 14522
Edit
I have reopened this question because the duplicate, "Looping over a datetime object results in a numeric iterator", asks why R loops convert date and datetime objects to numeric, whereas this question asks how to avoid that behavior. The answer here solves that problem directly, unlike the accepted and other answers in the duplicate, which correctly answer that other question.
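The for loop iterates over the underlying values of the sequence and drops attributes such as the class. A Date vector is a numeric vector (days since 1970-01-01) with class "Date", so each element arrives in the loop as a plain number. Wrapping the vector in as.list protects it from this, because each list element is itself a Date: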
for (d in as.list(date)) print(d)
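Two ways to keep the Date class inside a loop, as a small sketch (the variable names dates and seen are illustrative): iterating over a list, as above, or iterating over positions and subsetting, since subsetting a Date vector keeps its class.

```r
dates <- as.Date(c("2008-02-20", "2009-10-05"))

# Option 1: iterate over a list, so each element is still a Date.
for (d in as.list(dates)) print(class(d))

# Option 2: iterate over positions and subset the vector inside the loop.
seen <- character(0)
for (k in seq_along(dates)) {
  seen <- c(seen, format(dates[k], "%Y/%m/%d"))
}
seen
```

The second form is often convenient when the loop also needs the position k, for example to write a result back into another vector.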
I am trying to reformat a string variable as date in R for several datasets that contain the same variable. When I run my code on only one dataframe, it works, but when I try to iterate over several dataframes using a for loop, I receive the error: Error: $ operator is invalid for atomic vectors. Here is my code:
# dataframes are df1, df2, and df3, all containing the column "date" in character format
list <- list(df1, df2, df3)
for (i in seq_along(list)) {
i$date <- as.Date(i$date, "%Y-%m-%d")
}
This results in the error mentioned above. I have tried with and without seq_along() and get the same error. When I run the following on only one dataframe, the code works:
df1$date <- as.Date(df1$date, "%Y-%m-%d")
Can someone please suggest a fix? Thank you
i is an integer, not something you can index with $. Perhaps you meant list[[i]]?
BTW, naming a variable the same as a base (and very-frequently used) function is a really bad idea and bad practice. I first recommend renaming it:
mylist <- list(df1, df2, df3)
for (i in seq_along(mylist)) {
mylist[[i]]$date <- as.Date(mylist[[i]]$date, "%Y-%m-%d")
}
Alternatively, you can use lapply to apply a function to each element, and save it back into the original list.
mylist <- lapply(mylist, function(L) {
L$date <- as.Date(L$date, "%Y-%m-%d")
L
})
One can shorten this a little (as reminded by @Onyambu) with
mylist <- lapply(mylist, transform , date = as.Date(date,"%Y-%m-%d"))
If you're familiar with the tidyverse dialect, transform is the base R equivalent of mutate. (If not, then ignore this note :-)
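One point that is easy to miss with the list-based approaches: the converted data frames live in the list, and the original df1, df2 objects are not modified. A small sketch with made-up data (the contents of df1 and df2 are hypothetical) that names the list elements so they can be retrieved afterwards:

```r
# Hypothetical inputs, each with a character column "date".
df1 <- data.frame(date = c("2021-01-01", "2021-01-02"), stringsAsFactors = FALSE)
df2 <- data.frame(date = "2022-05-17", stringsAsFactors = FALSE)

# A named list makes it easy to pull individual results back out.
mylist <- list(df1 = df1, df2 = df2)
mylist <- lapply(mylist, function(d) {
  d$date <- as.Date(d$date, format = "%Y-%m-%d")
  d
})

class(mylist$df1$date)  # the converted copy inside the list
class(df1$date)         # the original object, still character
```

If the standalone objects are needed again, they can be reassigned explicitly, for example df1 <- mylist$df1.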
So here is a column of dates that I want to convert to a Date in R. It's a Julian date where the first 4 digits are the year and the last three are the day number counted from the 1st of that year. There are numerous 0s, which the system outputs when there is no date associated. The function I wrote for it is given below.
julian=function(x)
{
if(x>0)
{
x=as.character(x)
year=substr(x,1,4)
days=as.integer(substr(x,5,7))-1
return(as.Date(paste("01/01/",year),format="%m/%d/%Y")+days)
}
return(NULL)
}
I would like to apply this across the column. I tried using lapply, but every time I try to convert the list to a dataframe, I get random integers. Any help would be great, thanks!
sapply (or unlisting the result from lapply) converts your dates to numeric values. If your column is called df$date, you could do the following:
df$new_date <- do.call(c, lapply(df$date,julian))
Hope this helps!
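One caveat with the line above: c() drops NULL entries, so when the column contains 0s (for which julian returns NULL) the combined result is shorter than the column. A possible vectorized alternative is sketched below; the helper name julian_to_date is made up for this example, and it assumes the codes are whole numbers of the form YYYYDDD. It returns NA for missing or non-positive codes so the output always has the same length as the input.

```r
# Hypothetical helper, not from the original post.
julian_to_date <- function(x) {
  x <- as.integer(x)
  out <- rep(as.Date(NA), length(x))
  ok <- !is.na(x) & x > 0
  year <- x[ok] %/% 1000L   # first four digits
  day  <- x[ok] %% 1000L    # last three digits, where 1 means January 1st
  # sprintf() returns a zero-length result when no code is valid.
  out[ok] <- as.Date(sprintf("%04d-01-01", year)) + (day - 1L)
  out
}

res <- julian_to_date(c(2015001, 0, 2016366))
res
```

Because the function works on the whole vector at once, it can be assigned directly, for example df$new_date <- julian_to_date(df$date), with no lapply or do.call involved.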
I have two data frames of identical dimensions and column names.
On both of them, I want to convert the dates currently stored as characters to dates. Is there any way to automate this using a for loop? I thought of something similar to the following script:
names <- c("old.df", "new.df")
# use Date format
for (i in names) {
i$Date <- as.Date(i$Date, "%d/%m/%Y")
i$Datetime <- as.Date(i$Datetime, "%d/%m/%Y %h:%m:%s.000")
i$ClickDatetime <- as.Date(i$ClickDatetime, "%d/%m/%Y %h:%m:%s.000")
}
This actually doesn't work and returns the following error message:
Error in i$Date : $ operator is invalid for atomic vectors
I don't think I can use the i object in this way. I'm wondering if there is a nice workaround you usually use to achieve the same goal in similar conditions.
Correct, it won't work like that because R sees i as the string, not the dataframe named by the string. Something like this should work:
df_list <- list(old.df, new.df)
# Convert each column, then write the modified copy back into the list
for (k in seq_along(df_list)) {
  df <- df_list[[k]]
  df$Date <- as.Date(df$Date, "%d/%m/%Y")
  # %H:%M:%OS reads hours, minutes and (fractional) seconds
  df$Datetime <- as.Date(df$Datetime, "%d/%m/%Y %H:%M:%OS")
  df$ClickDatetime <- as.Date(df$ClickDatetime, "%d/%m/%Y %H:%M:%OS")
  df_list[[k]] <- df
}
old.df <- df_list[[1]]
new.df <- df_list[[2]]
There are lots of ways to do this.
With only two dataframes, doing each one individually might be just as good an option. With many dataframes with identical columns, you could stack them up with rbind (putting in an identifier column to tell you which row belongs to which df), apply your changes, and then split them apart again. Or put them in a list and build a function which can be used with lapply.
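The last option, a function used with lapply, could look roughly like the sketch below. The function name convert_dates and the sample rows are made up for illustration, and it assumes the datetime strings look like "25/12/2020 13:45:10.000". Unlike the question's code, the datetime column is parsed with as.POSIXct here so that the time of day is kept; as.Date would keep only the date part.

```r
# Hypothetical inputs with the column names used in the question.
old.df <- data.frame(Date = "25/12/2020", Datetime = "25/12/2020 13:45:10.000",
                     stringsAsFactors = FALSE)
new.df <- data.frame(Date = "01/02/2021", Datetime = "01/02/2021 08:05:00.000",
                     stringsAsFactors = FALSE)

# One function that holds all conversions, so every data frame is treated alike.
convert_dates <- function(df) {
  df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
  # The time zone is an assumption; replace "UTC" with the zone of your data.
  df$Datetime <- as.POSIXct(df$Datetime, format = "%d/%m/%Y %H:%M:%OS", tz = "UTC")
  df
}

dfs <- lapply(list(old.df = old.df, new.df = new.df), convert_dates)
str(dfs$old.df)
```

Keeping the conversions in one function means a format change only has to be made in one place, however many data frames are in the list.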