How to not convert POSIXct to numeric in R loop [duplicate] - r

I can initialize a data.frame via
df <- data.frame(a=numeric(), b=character())
But how do I define a column of type POSIXct?
df <- data.frame(a=numeric(), b=character(), c=POSIXct())
won't work.

You can try
df <- data.frame(a=numeric(), b=character(), c=as.POSIXct(character()))
Similarly, you can create a POSIXct column of NAs in a data frame with > 0 rows by creating a new column with as.POSIXct(NA).
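A minimal sketch of both initializations, assuming base R only. The names df_empty and df_na are made up for illustration and do not come from the original post.

```r
# Empty data frame whose third column is already of class POSIXct
df_empty <- data.frame(a = numeric(), b = character(),
                       c = as.POSIXct(character()))
str(df_empty)

# Data frame that already has rows: add a POSIXct column filled with NA
df_na <- data.frame(a = c(1, 2, 3))
df_na$c <- as.POSIXct(NA)   # the single NA is recycled to every row
class(df_na$c)
```

str() and class() are only there to inspect the column types; they are not needed for the initialization itself.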

An additional tip on the initialization above: if you then use rbind() to add rows to this empty data frame, you may encounter an error like the following with this pattern:
oneDF <- rbind(oneDF,twoDF,stringsAsFactors=FALSE)
Error in as.POSIXct.default(value) :
do not know how to convert 'value' to class "POSIXct"
I finally discovered that removing the stringsAsFactors=FALSE argument allowed the POSIXct value (both the numeric time and the time zone) to transfer to the target data frame:
oneDF <- rbind(oneDF,twoDF)
Examining the result:
unclass(oneDF$mytime)
[1] 1282089600
attr(,"tzone")
[1] "GMT"
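The working pattern can be sketched as follows. The post does not show how oneDF and twoDF were built, so the id column and the sample timestamp here are assumptions; only the column name mytime comes from the output above.

```r
# Target: empty data frame with a POSIXct column (structure is assumed)
oneDF <- data.frame(id = integer(),
                    mytime = as.POSIXct(character(), tz = "GMT"))

# One new row carrying a POSIXct value
twoDF <- data.frame(id = 1L,
                    mytime = as.POSIXct("2010-08-18 00:00:00", tz = "GMT"))

# Plain rbind() without extra arguments
oneDF <- rbind(oneDF, twoDF)

class(oneDF$mytime)    # check that the column is still POSIXct
unclass(oneDF$mytime)  # underlying seconds since the epoch
```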

Related

Convert character dates in r (weird format)

I have columns that are named "X1.1.21", "X12.31.20" etc.
I can get rid of all the "X"s by using the substring function:
names(df) <- substring(names(df), 2)
I've been trying many different methods to change "1.1.21" into a date format in R, but I'm having no luck so far. How can I go about this?
R doesn't like column names that start with numbers (hence the X in front of them). However, you can force R to keep column names that start with a number by passing check.names = FALSE when reading the data.
If you want date-formatted column names, you can use:
df <- data.frame(X1.1.21 = rnorm(5), X12.31.20 = rnorm(5))
names(df) <- as.Date(names(df), 'X%m.%d.%y')
names(df)
#[1] "2021-01-01" "2020-12-31"
However, note that they look like dates but are still of type 'character':
class(names(df))
#[1] "character"
So if you are going to use the column names in date calculations, you need to convert them to Date first:
as.Date(names(df))
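If the names are kept as a separate vector rather than written back into the data frame, the conversion can be sketched like this. The variable names raw_names, stripped and dates are illustrative only.

```r
# Column names as R produces them when reading the file
raw_names <- c("X1.1.21", "X12.31.20")

# Drop the leading "X" regardless of how long the rest of the name is
stripped <- sub("^X", "", raw_names)

# Parse month.day.two-digit-year into real Date objects
dates <- as.Date(stripped, format = "%m.%d.%y")

class(dates)   # a real Date vector, not character
diff(dates)    # date arithmetic now works
```

Using sub() avoids hard-coding a character range, which matters because "X1.1.21" and "X12.31.20" have different lengths.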

How to subtract datetimes and store them in a separate column?

I am working with a CSV file that has a column named "statistics_lastLocatedTime" (shown in a screenshot in the original post).
I would like to subtract the second row of "statistics_lastLocatedTime" from the first row, the third row from the second row, and so on to the last row, store all these differences in a separate column, and then combine that column with the other related columns, as in the code below:
##select related features
data <- read.csv("D:/smart tech/store/2016-10-11.csv")
(columns <- data[with(data, macAddress == "7c:11:be:ce:df:1d" ),
c(2,10,11,38,39,48,50) ])
write.csv(columns, file = "updated.csv", row.names = FALSE)
## take time difference
date_data <- read.csv("D:/R/data/updated.csv")
(dates <- date_data[1:40, c(2)])
NROW(dates)
for (i in 1:NROW(dates)) {
j <- i+1
r1 <- strptime(paste(dates[i]),"%Y-%m-%d %H:%M:%S")
r2 <- strptime(paste(dates[j]),"%Y-%m-%d %H:%M:%S")
diff <- as.numeric(difftime(r1,r2))
print (diff)
}
## combine time difference with other related columns
combine <- cbind(columns, diff)
combine
The problem is that I can compute the row differences, but I cannot store them as a column and combine that column with the other related columns. Please help. Thanks in advance.
This is a four-liner:
1. Define a custom class 'myDate' and a converter function for your custom datetime format, as per "Specify custom Date format for colClasses argument in read.table/read.csv".
2. Read in the datetimes as actual datetimes; there is no need to convert them repeatedly later.
3. Use the vectorized diff() function on your date column (it sees the column's class and dispatches to the method for POSIXct). No for-loops are needed:
setClass('myDate') # this is not strictly necessary
setAs('character', 'myDate', function(from) {
  # %M is minutes; adjust the format and time zone to match your data
  as.POSIXct(from, format = '%d-%m-%y %H:%M', tz = 'UTC')
})
data <- read.csv("D:/smart tech/store/2016-10-11.csv",
                 colClasses = c('character','myDate','myDate','numeric','numeric','integer','factor'))
# ...
data$date_diff <- c(NA, diff(data$statistics_lastLocatedTime))
Note that diff() produces a result one element shorter than the vector it was applied to. Hence we have to pad it (e.g. with a leading NA, or whatever you prefer).
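The padding step can be checked on a small example. The three timestamps below are invented for illustration, and the format string is assumed to match the day-month-year layout used elsewhere in this thread.

```r
# Hypothetical sample data standing in for the CSV column
df <- data.frame(
  statistics_lastLocatedTime = as.POSIXct(
    c("11-10-16 09:00", "11-10-16 09:05", "11-10-16 09:20"),
    format = "%d-%m-%y %H:%M", tz = "UTC")
)

# diff() on POSIXct returns a difftime that is one element shorter
gaps <- diff(df$statistics_lastLocatedTime)

# Fix the unit explicitly, then pad with a leading NA so lengths match
df$date_diff <- c(NA, as.numeric(gaps, units = "secs"))
df
```

Converting with an explicit unit avoids depending on the unit that difftime picks automatically for the data at hand.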
Alternatively, consider assigning the diff column directly using vapply. There is also no need for the separate date_data data frame, as all operations can be run on the columns data frame. Note the change in the time format to match the format actually present in the data:
columns$diff <- vapply(seq_len(nrow(columns)), function(i) {
  r1 <- strptime(paste(columns$statistics_lastLocatedTime[i]), "%d-%m-%y %H:%M")
  r2 <- strptime(paste(columns$statistics_lastLocatedTime[i + 1]), "%d-%m-%y %H:%M")
  # fix the units so every row is in seconds; the last row has no successor and yields NA
  as.numeric(difftime(r1, r2, units = "secs"))
}, numeric(1))

Converting data frame column from character to numeric

I have a data frame that I construct as such:
> yyz <- data.frame(a = c("1","2","n/a"), b = c(1,2,"n/a"))
> apply(yyz, 2, class)
a b
"character" "character"
I am attempting to convert the last column to numeric while still maintaining the first column as a character. I tried this:
> yyz$b <- as.numeric(as.character(yyz$b))
> yyz
a b
1 1
2 2
n/a NA
But when I run the apply class it is showing me that they are both character classes.
> apply(yyz, 2, class)
a b
"character" "character"
Am I setting up the data frame wrong? Or is it the way R is interpreting the data frame?
If we need only one column to be numeric:
yyz$b <- as.numeric(as.character(yyz$b))
But if all the columns need to be changed to numeric, use lapply to loop over the columns, converting each to character first in case the columns are factors:
yyz[] <- lapply(yyz, function(x) as.numeric(as.character(x)))
Both columns in the OP's post are non-numeric because of the string "n/a": they are factors under stringsAsFactors=TRUE (the default before R 4.0.0) and character otherwise. This can be avoided when reading a file by passing na.strings = "n/a" to read.table/read.csv; with data.frame, stringsAsFactors=FALSE gives character columns instead of factors.
Regarding the use of apply: it converts the data frame to a matrix, and a matrix can hold only a single type, so every column is reported as character. To check the class of each column, we need
lapply(yyz, class)
Or
sapply(yyz, class)
Or check
str(yyz)
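The difference between apply and sapply can be seen in a short sketch. stringsAsFactors = FALSE is set explicitly here so the result does not depend on the R version, and suppressWarnings() only hides the expected "NAs introduced by coercion" warning.

```r
yyz <- data.frame(a = c("1", "2", "n/a"), b = c(1, 2, "n/a"),
                  stringsAsFactors = FALSE)

# Convert only column b; "n/a" cannot be parsed and becomes NA
yyz$b <- suppressWarnings(as.numeric(yyz$b))

# apply() goes through as.matrix(), so every column is reported as character
by_apply <- apply(yyz, 2, class)

# sapply() looks at each column of the data frame separately
by_sapply <- sapply(yyz, class)

by_apply
by_sapply
```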

Use dplyr::mutate and lubridate::force_tz based on arguments from data frame columns

I am trying to use lubridate::force_tz to add timezone information to timestamps (date+time) formatted as strings (as.character()). Both are stored as two columns in a data frame:
require(lubridate)
require(dplyr)
row1<-c(as.character(now()),"Etc/UTC")
row2<-c(as.character(now()+5),"America/Chicago")
df<-as.data.frame(rbind(row1,row2))
names(df)<-c("dt","tz")
x<-force_tz(as.POSIXct(as.character(now())),"Etc/UTC") #works
df<-df%>%mutate(newDT=force_tz(as.POSIXct(dt),tz)) #fails
I get: Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "c('matrix', 'character')"
Following Stibu's comments, I tried an (un-R-like) iterative approach:
for (i in seq(from=1,to=length(df$dt))){
timestamp<-as.character(df[i,1])
tz<-as.character(df[i,2])
print(tz)
newdt<-force_tz(as.POSIXct(timestamp),tz)
df[i,3]<-newdt
print(attr(df[i,3],"tzone"))
df$timezone<-attr(df[i,3],"tzone")
}
This extracts the values correctly, but seems to get stuck with setting the value of the tz to the first value encountered - weirdly:
[1] "Etc/UTC"
[1] "Etc/UTC"
[1] "America/Chicago"
[1] "Etc/UTC"
I would have expected the last printout to result in "America/Chicago"
The df then looks like:
> df
dt tz newDT timezone
1 2016-04-13 23:07:45 Etc/UTC 2016-04-13 23:07:45 Etc/UTC
2 2016-04-13 23:07:50 America/Chicago 2016-04-14 04:07:50 Etc/UTC
There are actually two issues in your code, which I will discuss separately below.
dplyr works with data frames
Your df is a matrix, not a data frame. But mutate() (and functions from dplyr in general) works with data frames. The error message simply tells you that mutate() does not know what to do with a matrix.
You can solve this by converting df to a data frame:
df <- as.data.frame(df)
names(df)<-c("dt","tz")
A remark regarding names(): This function can be used to get/set the column names of a data frame. For matrices, the corresponding function is colnames(). You used names() on a matrix, which did not set the column names of the matrix. Therefore, the names of the data frame are also not set after conversion.
You could also create a data frame from the start as follows:
df <- data.frame(dt = as.character(c(now(), now() + 5)),
tz = c("Etc/UTC", "America/Chicago"),
stringsAsFactors = FALSE)
Note that you need to define the contents column-wise, not row-wise as you did.
If you use the data frame df, there will be no error from mutate().
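The remark about names() versus colnames() can be checked with a small sketch. The timestamps are copied from the question's output; the object names m, m_named and df_fixed are made up for illustration.

```r
# Row-wise construction gives a character matrix, not a data frame
m <- rbind(row1 = c("2016-04-13 23:07:45", "Etc/UTC"),
           row2 = c("2016-04-13 23:07:50", "America/Chicago"))
is.matrix(m)

# names() on a matrix does not set its column names
m_named <- m
names(m_named) <- c("dt", "tz")
colnames(m_named)   # still NULL

# colnames() does, and the names survive the conversion
colnames(m) <- c("dt", "tz")
df_fixed <- as.data.frame(m, stringsAsFactors = FALSE)
names(df_fixed)
```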
One time zone per vector
Unfortunately, there is a second issue. What you want to do simply cannot be done. The reason is the following.
Let's convert the first column of df to POSIXct with time zone CET:
ts <- as.POSIXct(df$dt, tz = "CET")
ts
## [1] "2016-04-13 14:42:26 CEST" "2016-04-13 14:42:31 CEST"
Let's try to do the same with two time zones:
ts <- as.POSIXct(df$dt, tz = c("CET", "UTC"))
## Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
## invalid 'tz' value
This does not work. The reason is that there is a single time zone per vector and not a time zone per element in the vector. Look at the attributes of ts:
attributes(ts)
## $class
## [1] "POSIXct" "POSIXt"
##
## $tzone
## [1] "CET"
The time zone is set as an attribute of the entire vector and it is not a property of each element.
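Since a POSIXct vector carries a single tzone attribute, one possible workaround (a sketch, not part of the original answer) is to interpret each string in its own zone and store the resulting instants in one vector that is displayed in UTC. The sample values are taken from the question's output, and the sketch assumes the system's time zone database knows "America/Chicago".

```r
dt <- c("2016-04-13 23:07:45", "2016-04-13 23:07:50")
tz <- c("Etc/UTC", "America/Chicago")

# Parse element by element, each string in its own zone,
# keeping only the seconds since the epoch
secs <- mapply(function(d, z) as.numeric(as.POSIXct(d, tz = z)),
               dt, tz, USE.NAMES = FALSE)

# Rebuild one POSIXct vector; it has exactly one time zone attribute
utc <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(utc, "%Y-%m-%d %H:%M:%S")
attr(utc, "tzone")
```

The original zone of each row is not stored in the POSIXct vector, so it has to stay in its own column (tz in the question) if it is needed later.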
