I am working with a CSV file that has a column named "statistics_lastLocatedTime", as shown in the CSV file image.
I would like to subtract the second row of "statistics_lastLocatedTime" from the first row, the third row from the second row, and so on until the last row, then store all these differences in a separate column and combine that column with the other related columns, as shown in the code below:
##select related features
data <- read.csv("D:/smart tech/store/2016-10-11.csv")
(columns <- data[with(data, macAddress == "7c:11:be:ce:df:1d" ),
c(2,10,11,38,39,48,50) ])
write.csv(columns, file = "updated.csv", row.names = FALSE)
## take time difference
date_data <- read.csv("D:/R/data/updated.csv")
(dates <- date_data[1:40, c(2)])
NROW(dates)
for (i in 1:NROW(dates)) {
  j <- i+1
  r1 <- strptime(paste(dates[i]),"%Y-%m-%d %H:%M:%S")
  r2 <- strptime(paste(dates[j]),"%Y-%m-%d %H:%M:%S")
  diff <- as.numeric(difftime(r1,r2))
  print (diff)
}
## combine time difference with other related columns
combine <- cbind(columns, diff)
combine
Now the problem is that I am able to get the differences between the rows, but I am not able to store these values as a column and then combine that column with the other related columns. Please help me. Thanks in advance.
This is a four-liner:
1. Define a custom class 'myDate' and a converter function for your custom datetime format, as per Specify custom Date format for colClasses argument in read.table/read.csv.
2. Read the datetimes in as actual datetimes; there is no need to repeatedly convert them later.
3. Simply use the vectorized diff() on your date column (it sees the column's class and automatically dispatches the diff method for POSIXct datetimes). No need for for-loops:
setClass('myDate') # this is not strictly necessary
setAs('character','myDate', function(from) {
  as.POSIXct(from, format='%d-%m-%y %H:%M', tz='UTC') # or whatever timezone; the format matches the data shown in the question
})
data <- read.csv("D:/smart tech/store/2016-10-11.csv",
colClasses=c('character','myDate','myDate','numeric','numeric','integer','factor'))
# ...
data$date_diff <- c(NA, diff(data$statistics_lastLocatedTime))
Note that diff() produces a result one element shorter than the vector we diff'ed, so we have to pad it (e.g. with a leading NA, or whatever you want).
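As a minimal, self-contained sketch of the same idea (the timestamps below are made up, in the %d-%m-%y %H:%M format assumed above):
tt <- as.POSIXct(c("11-10-16 10:15", "11-10-16 10:17", "11-10-16 10:25"),
                 format = "%d-%m-%y %H:%M", tz = "UTC")
c(NA, as.numeric(diff(tt), units = "secs"))   # pad with a leading NA
#[1]  NA 120 480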
Consider directly assigning the diff column using vapply. There is also no need for the separate date_data data frame, since all operations can be run on the columns data frame. Notice too the change in time format, to align with the format currently in the data frame:
columns$diff <- vapply(seq(nrow(columns)), function(i){
  r1 <- strptime(paste(columns$statistics_lastLocatedTime[i]), "%d-%m-%y %H:%M")
  r2 <- strptime(paste(columns$statistics_lastLocatedTime[i+1]), "%d-%m-%y %H:%M")
  # fix the units so every element is comparable; the last element comes out NA
  # because there is no following row to subtract
  as.numeric(difftime(r1, r2, units = "secs"))
}, numeric(1))
I have columns that are named "X1.1.21", "X12.31.20" etc.
I can get rid of all the "X"s by using the substring function:
names(df) <- substring(names(df), 2, 8)
I've been trying many different methods to change "1.1.21" into a date format in R, but I'm having no luck so far. How can I go about this?
R doesn't like column names that start with a number (hence the X in front of them). However, you can still force R to allow such names by passing check.names = FALSE when reading the data.
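For example (the file name here is just a placeholder):
df <- read.csv("my_file.csv", check.names = FALSE)   # keeps names like "1.1.21" as-is, no leading X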
If you want to include the date format in the column names, you can use:
df <- data.frame(X1.1.21 = rnorm(5), X12.31.20 = rnorm(5))
names(df) <- as.Date(names(df), 'X%m.%d.%y')
names(df)
#[1] "2021-01-01" "2020-12-31"
However, note that they look like dates but are still of type 'character':
class(names(df))
#[1] "character"
So if you are going to use the column names for some date calculation, you need to change them to the date type first:
as.Date(names(df))
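For instance, a quick sketch of such a calculation, using the df built above:
diff(as.Date(names(df)))
#Time difference of -1 days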
Hope you can help me out!
For all of the dates in a column, I would like to get a range for each date: the 14 days leading up to it. So for example, if the first date in my column is 29-04-2021, I would like to get the dates from 15-04-2021 until 29-04-2021. I found the function seq that does this, but to apply it to all the values in my column I need to put the seq function in a for loop.
This is what I tried, but the output is only the last row and the date format changed. This is my code (test_IIVAC$`Vacdate 1` is my column with the dates):
df <- data.frame()
for(i in 1:length(test_IIVAC$`Vacdate 1`)){
  te <- as.Date(seq(test_IIVAC$`Vacdate 1`[i]-14, test_IIVAC$`Vacdate 1`[i], by = "day"))
  df1 <- rbind(df, te)
}
Can anyone help me out with getting the ranges of all the dates in the column and placing them in one data frame in Date format? The desired output is shown in the linked image.
Thanks a bunch!
You can use any of the apply-family functions to generate the sequences of date values and add them as a new list column in the original data frame:
test_IIVAC$dates <- lapply(test_IIVAC$`Vacdate 1`, function(x) seq(x - 14, x, by = 'day'))
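A small self-contained sketch of that idea, plus one way to flatten the list column into a single long data frame in Date format (the example dates are made up, and `Vacdate 1` stands in for your real column):
test_IIVAC <- data.frame(`Vacdate 1` = as.Date(c("2021-04-29", "2021-05-10")),
                         check.names = FALSE)
test_IIVAC$dates <- lapply(test_IIVAC$`Vacdate 1`, function(x) seq(x - 14, x, by = "day"))

# one row per day, keeping the original date alongside each day of its range
ranges <- data.frame(
  vacdate = rep(test_IIVAC$`Vacdate 1`, each = 15),                  # 14 days back + the day itself
  day     = as.Date(unlist(test_IIVAC$dates), origin = "1970-01-01")
)
head(ranges)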
A lot of my work involves unioning new datasets to old, but often the standardized "date" name I have in the master dataset won't match up to the date name in the new raw data (which may be "Date", "Day", "Time.Period", etc...). To make life easier, I'd like to create a custom function that will:
1. Detect the date columns in the new and old datasets.
2. Standardize the column name to "date" (oftentimes the raw new data will come in with the date column named "Date" or "Day" or "Time Period", etc.).
Here are a couple datasets to play with:
Dates_A <- seq(from = as.Date("2017-01-01"), to = as.Date("2017-12-31"), by = "day")
Dates_B <- seq(from = as.Date("2017-01-01"), to = as.Date("2017-12-31"), by = "day")
Numbers <- rnorm(365)
df_a <- data.frame(Dates_A, Numbers)
df_b <- data.frame(Dates_B, Numbers)
My first inclination is to try a for-loop that searches for the class of the columns by index and automatically renames any with Class = Date to "date", but ideally I'd also like the function to solve for the examples below, where the class of the date column may be character or factor.
Dates_C <- as.character(Dates_B)
df_c <- data.frame(Dates_C, Numbers)
df_d <- data.frame(Dates_C, Numbers, stringsAsFactors = FALSE)
If you have any ideas or can point me in the right direction, I'd really appreciate it!
Based on the description, we could check whether a particular column is of class Date, get a logical index, and assign the name of that column to 'date':
is.date <- function(x) inherits(x, 'Date')
names(df_a)[sapply(df_a, is.date)] <- 'date'
This assumes that there is only a single 'date' column in the dataset. If there are multiple 'date' columns, use make.unique in order to avoid duplicate column names:
names(df_a) <- make.unique(names(df_a))
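For instance, on a vector with a repeated name, make.unique behaves like this:
make.unique(c("date", "Numbers", "date"))
#[1] "date"    "Numbers" "date.1"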
akrun's solution works for columns of class Date, but not for columns of class factor or character as you ask about at the end of the question, so maybe the following can be of use to you.
library(lubridate)
checkDates <- function(x) {
op <- options(warn = -1) # needed to keep stderr clean
on.exit(options(op)) # reset to original value
!all(is.na(ymd(x)))
}
names(df_c)[sapply(df_c, checkDates)] <- 'date'
names(df_d)[sapply(df_d, checkDates)] <- 'date'
Note that you can take inspiration from both solutions and combine them into one function: if inherits() returns TRUE, all done; else try ymd().
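A rough sketch of such a combined function (untested against real data; the helper names are made up):
library(lubridate)

is_date_like <- function(x) {
  if (inherits(x, 'Date')) return(TRUE)                  # already a Date: done
  if (!is.character(x) && !is.factor(x)) return(FALSE)   # don't try to parse e.g. numerics
  !all(is.na(suppressWarnings(ymd(as.character(x)))))    # otherwise, see whether ymd() can parse it
}

standardize_date_name <- function(df) {
  names(df)[vapply(df, is_date_like, logical(1))] <- 'date'
  names(df) <- make.unique(names(df))   # guard against several detected date columns
  df
}

df_a <- standardize_date_name(df_a)   # Date column
df_c <- standardize_date_name(df_c)   # character/factor column holding dates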
I have this sample code to create a new data frame 'new_data' from the existing data frame 'my_data'.
new_data = NULL
n = 10  # this number corresponds to the number of rows in my_data
conditions = c("Bas_A", "Bas_T", "Oper_A", "Oper_T")  # these strings correspond to the target column names in my_data
for (cond in conditions){
  for (i in 1:n){
    new_data <- rbind(new_data, c(cond, my_data$cond[i]))
  }
}
The problem is that my_data$cond (where cond is a variable, and not the column name) is not accepted.
How can I call a column of a data frame by using, after the dollar sign, a variable value?
To access a column, use:
my_data[ , cond]
or
my_data[[cond]]
The ith row can be accessed with:
my_data[i, ]
Combine both to obtain the desired value:
my_data[i, cond]
or
my_data[[cond]][i]
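Applied to the loop from the question, a minimal sketch (with a small made-up my_data so that it runs on its own) could look like this:
my_data <- data.frame(Bas_A = rnorm(10), Bas_T = rnorm(10),
                      Oper_A = rnorm(10), Oper_T = rnorm(10))   # toy data

new_data <- NULL
conditions <- c("Bas_A", "Bas_T", "Oper_A", "Oper_T")
for (cond in conditions){
  for (i in seq_len(nrow(my_data))){
    # my_data[[cond]][i] looks the column up by the string stored in cond
    new_data <- rbind(new_data, data.frame(condition = cond, value = my_data[[cond]][i]))
  }
}
head(new_data)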
I guess you need get().
For example, get(x, list), where list is the list and x is the variable (it can hold a string), is equivalent to list$x.
The difference is that in get(x, list), x can be a variable, while with $ it cannot.
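A tiny sketch with toy data (get() accepts a list or data frame as its second argument):
my_data <- data.frame(Bas_A = 1:3)   # toy data
cond <- "Bas_A"
get(cond, my_data)   # same values as my_data$Bas_A, but cond is looked up as a variable
#[1] 1 2 3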
$ works on whole columns, not on individual elements; operations on a column are vectorized. For example, the code
corrections$BookDate = as.Date(corrections$BookDate, format = "%m/%d/%Y")
converts the contents of the BookDate column of the corrections table from strings to Date objects in a single vectorized assignment.
Do the following (using [[ ]] so that cond is evaluated as a variable) and it will fix your problem:
new_data <- rbind(new_data, c(cond, my_data[[cond]]))
I am trying to understand why R behaves differently with the aggregate function. I wanted to average 15-minute data to hourly data. For this, I passed the 15-minute data, together with a pre-designed "hour" array (the same date repeated four times per hour, derived from the original POSIXct array), to the aggregate function.
After some time, I realized that the function was behaving oddly (well, probably the data was odd, but why?) when I handed over the date array with
strftime(data.15min$posix, format="%Y-%m-%d %H")
However, if I handed over the data with
cut(data.15min$posix, "1 hour")
the data was averaged correctly.
Below, a minimal example is embedded, including a sample of the data.
I would be happy to understand what I did wrong.
Thanks in advance!
d <- 3
bla <- read.table("test_daten.dat",header=TRUE,sep=",")
data.15min <- NULL
data.15min$posix <- as.POSIXct(bla$dates,tz="UTC")
data.15min$o3 <- bla$o3
hourtimes <- unique(as.POSIXct(paste(strftime(data.15min$posix, format="%Y-%m-%d %H"),":00:00",sep=""),tz="Universal"))
agg.mean <- function (xx, yy, rm.na = T)
# xx: parameter that determines the aggregation: list(xx), e.g. hour etc.
# yy: parameter that will be aggregated
{
  aa <- yy
  out.mean <- aggregate(aa, list(xx), FUN = mean, na.rm = rm.na)
  out.mean <- out.mean[,2]
}
#############
data.o3.hour.mean <- round(agg.mean(strftime(data.15min$posix, format="%m/%d/%y %H"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(data.date.hour_mean,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Wrong
##############
data.o3.hour.mean <- round(agg.mean(cut(data.15min$posix, "1 hour"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(data.date.hour_mean,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Correct
Data:
Download data
Too long for a comment.
The reason your results look different is that aggregate(...) sorts the results by your grouping variable(s). In the first case,
strftime(data.15min$posix, format="%m/%d/%y %H")
is a character vector of poorly formatted dates (they do not sort chronologically). So the first row corresponds to the "date" "01/01/96 00".
In your second case,
cut(data.15min$posix, "1 hour")
generates actual POSIXct dates, which sort properly. So the first row corresponds to the date: 1995-11-04 13:00:00.
If you had used
strftime(data.15min$posix, format="%Y-%m-%d %H")
in your first case, you would have gotten the same result as using cut(...).
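A small illustration of the sorting difference, with two made-up timestamps:
x <- as.POSIXct(c("1995-12-31 23:00:00", "1996-01-01 00:00:00"), tz = "UTC")
sort(strftime(x, format = "%m/%d/%y %H"))   # "01/01/96 00" "12/31/95 23"   -> the later hour sorts first
sort(strftime(x, format = "%Y-%m-%d %H"))   # "1995-12-31 23" "1996-01-01 00" -> chronological
cut(x, "1 hour")                            # a factor whose levels are already in chronological order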