replacing NA's in a Large POSIXct with Sys.time()

I have a large POSIXct vector of around 70,000 elements.
resolutionDate <- c(as.POSIXct(data$Resolution.Date, format = '%b %d, %Y'))
The code above changes values such as Jun 5, 2018 3:21 PM to 2018-06-05.
However, some values are NA, and I would like to replace all of them with Sys.time(), i.e. today's date.
I tried using replace() like so:
replace(resolutionDate, if(resolutionData == "NA"), Sys.time())
But it did not work.
How can I do this?

Something like this?
# generate time vector
a <- as.POSIXct(1:70000,origin="1970-01-01")
# replace the 5th with a NA value and show first 10 elements
a[5] <- NA
a[1:10]
# replace all na values with the current system time
a[is.na(a)] <- Sys.time()
# show result
a[1:10]
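
The replace() call in the question fails for two reasons: comparing with the string "NA" never finds a missing value (any comparison with NA yields NA), and replace() expects an index or logical vector as its second argument, not an if statement. As a minimal sketch, assuming a small made-up vector in place of the real resolutionDate, the same fix can also be written with replace() and is.na():

```r
# Hypothetical stand-in for the real data: the second element is missing
resolutionDate <- as.POSIXct(c("2018-06-05", NA, "2019-01-12"), tz = "UTC")

# is.na() builds the logical index that replace() expects
filled <- replace(resolutionDate, is.na(resolutionDate), Sys.time())

anyNA(filled)   # checks whether any missing values remain
class(filled)   # the result keeps its date-time class
```

Note that replace() returns a modified copy, so the result has to be assigned back if resolutionDate itself should change.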

Related

How to change specific dates in POSIXct/POSIXt format to NA

I have imported an SPSS file, which contains several date/time variables of the following class:
[1] "POSIXct" "POSIXt"
The user-defined missing value for these variables is 8888-08-08 00:00:00. How can I convert this value to NA for the set of relevant date/time variables in R?
I tried running df$datetime[df$datetime == "8888-08-08"] <- NA as well as df$datetime[df$datetime == as.Date("8888-08-08")] <- NA to no avail.

Since these columns are POSIXct, compare against a POSIXct value of the same type and assign NA to the matches:
df$datetime[df$datetime == as.POSIXct("8888-08-08 00:00:00")] <- NA
Example data:
set.seed(24)
df <- data.frame(datetime = sample(c(Sys.time(), Sys.time() + 1:5,
as.POSIXct("8888-08-08 00:00:00")), 20, replace = TRUE))
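
The question mentions a set of date/time variables rather than a single column. A minimal sketch, assuming a hypothetical data frame with two POSIXct columns (dt_a, dt_b) and timestamps stored in UTC (the time zone of the imported values is an assumption; the sentinel has to be built in the same zone as the data), that applies the same replacement to every POSIXct column:

```r
# Sentinel used as the user-defined missing value (time zone assumed to be UTC)
sentinel <- as.POSIXct("8888-08-08 00:00:00", tz = "UTC")

# Hypothetical example data: two date/time columns and one ordinary column
df <- data.frame(
  id   = 1:3,
  dt_a = as.POSIXct(c("2020-01-01 10:00:00", "8888-08-08 00:00:00",
                      "2020-01-03 12:30:00"), tz = "UTC"),
  dt_b = as.POSIXct(c("8888-08-08 00:00:00", "2021-05-05 08:00:00",
                      "8888-08-08 00:00:00"), tz = "UTC")
)

# Find the POSIXct columns, then set the sentinel to NA in each of them
is_dt <- vapply(df, inherits, logical(1), what = "POSIXct")
df[is_dt] <- lapply(df[is_dt], function(x) {
  x[!is.na(x) & x == sentinel] <- NA
  x
})

colSums(is.na(df))  # number of NAs per column after the replacement
```

The !is.na(x) guard keeps values that were already missing from producing NA entries in the logical index.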

How to change syntax of column in R?

I have df1:
ID Time
1 16:00:00
2 14:30:00
3 9:23:00
4 10:00:00
5 23:59:00
and would like to change the current 'character' column 'Time' into an 'integer' as below:
ID Time
1 1600
2 1430
3 923
4 1000
5 2359

We could replace the :'s, make numeric, divide by 100, and convert to integer like this:
df1$Time = as.integer(as.numeric(gsub(':', '', df1$Time))/100)
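
To check this one-liner against the five rows from the question, here is a small sketch that rebuilds df1 from the sample values and runs the steps separately (the intermediate name digits is only for illustration):

```r
# Rebuild the question's sample data
df1 <- data.frame(
  ID   = 1:5,
  Time = c("16:00:00", "14:30:00", "9:23:00", "10:00:00", "23:59:00"),
  stringsAsFactors = FALSE
)

digits <- gsub(":", "", df1$Time, fixed = TRUE)   # "160000", "143000", "92300", ...
df1$Time <- as.integer(as.numeric(digits) / 100)  # drop the two seconds digits

df1$Time
# 1600 1430  923 1000 2359
```

Because as.integer() truncates, any non-zero seconds are simply discarded rather than rounded.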

You could also parse the string with as.POSIXct().
Functions to manipulate objects of classes "POSIXlt" and "POSIXct" representing calendar dates and times.
R documentation, as.POSIXct()
So in the case of row 1: as.POSIXct("16:00:00", format = "%H:%M:%S")
Calling as.numeric() directly on that result returns seconds since 1970-01-01 rather than 1600, so format the time as "%H%M" first and then convert:
as.integer(format(as.POSIXct("16:00:00", format = "%H:%M:%S"), "%H%M")) gives 1600.

df1 <- data.frame(Time = "16:00:00")
# Take the hour and minute digits by position (assumes zero-padded times such as "09:23:00")
df1[, "Time"] <- as.numeric(paste0(substr(df1[, "Time"], 1, 2), substr(df1[, "Time"], 4, 5)))
print(df1)
# Time
# 1 1600

There are many ways to process this, but here's one example:
library(dplyr)
df1 <- mutate(df1, Time = gsub(":", "", Time)) # remove the colons
df1 <- mutate(df1, Time = as.numeric(Time) / 100) # coerce to numeric, divide by 100 to drop the seconds digits

Conditional subset of data from list base on date R

I have several .csv files containing hourly data. Each file represents data from a point in space. The start and end dates are different in each file.
The data can be read into R using:
lstf1<- list.files(pattern=".csv")
lst2<- lapply(lstf1,function(x) read.csv(x,header = TRUE,stringsAsFactors=FALSE,sep = ",",fill=TRUE, dec = ".",quote = "\""))
head(lst2[[800]])
datetime precip code
1 2003-12-30 00:00:00 NA M
2 2003-12-30 01:00:00 NA M
3 2003-12-30 02:00:00 NA M
4 2003-12-30 03:00:00 NA M
5 2003-12-30 04:00:00 NA M
6 2003-12-30 05:00:00 NA M
datetime is in YYYY-MM-DD HH:MM:SS format, precip is the data value, and code can be ignored.
For each data frame (df) in lst2 I want to select data for the period 2015-04-01 to 2015-11-30 based on the following conditions:
1) If precip in a df contains all NAs within this period, delete it (do not select).
2) If precip is not all NAs, select it.
The desired output (lst3) contains the subsetted data for the period 2015-04-01 to 2015-11-30.
All data frames in lst3 should have equal length, with days and hours without precip denoted as NA.
Then I can write the files in lst3 to my directory using something like:
sapply(names(lst2),function (x) write.csv(lst3[[x]],file = paste0(names(lst2[x]), ".csv"),row.names = FALSE))
The link to a sample file can be found here (~200 KB)

It's a little hard to understand exactly what you are trying to do, but this example (using dplyr, which has nice filter syntax) on the file you provided should get you close:
library(dplyr)
df <- read.csv ("L112FN0M.262.csv")
df$datetime <- as.POSIXct(df$datetime, format="%d/%m/%Y %H:%M")
# Get the required date range and delete the NAs
df.sub <- filter(df, !is.na(precip),
datetime >= as.POSIXct("2015-04-01"),
datetime < as.POSIXct("2015-12-01"))
# Check if the subset has any rows left (it will be empty if it was full of NA for precip)
if (nrow(df.sub) > 0) {
df.result <- filter(df, datetime >= as.POSIXct("2015-04-01"),
datetime < as.POSIXct("2015-12-01"))
# Then add df.result to your list of data frames...
} # else, don't add it to your list
I think you are saying that you want to retain NAs in the data frame if there are also valid precip values--you only want to discard if there are NAs for the entire period. If you just want to strip all NAs, then just use the first filter statement and you are done. You obviously don't need to use POSIXct if you've already got your dates encoded correctly another way.
EDIT: w/ function wrapper so you can use lapply:
library(dplyr)
# Get some example data
df <- read.csv ("L112FN0M.262.csv")
df$datetime <- as.POSIXct(df$datetime, format="%d/%m/%Y %H:%M")
dfnull <- df
dfnull$precip <- NA
# list of 3 input data frames to test, 2nd one has precip all NA
df.list <- list(df, dfnull, df)
# Function to do the filtering; returns list of data frames to keep or null
filterprecip <- function(d) {
if (nrow(filter(d, !is.na(precip), datetime >= as.POSIXct("2015-04-01"), datetime < as.POSIXct("2015-12-01"))) >
0) {
return(filter(d, datetime >= as.POSIXct("2015-04-01"), datetime < as.POSIXct("2015-12-01")))
}
}
# Function to remove NULLS in returned list
# (Credit to Hadley Wickham: http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8102.html)
compact <- function(x) Filter(Negate(is.null), x)
# Filter the list
results <- compact(lapply(df.list, filterprecip))
# Check that you got a list of 2 data frames in the right date range
str(results)

Based on what you've written, it sounds like you're just interested in subsetting your list of files if data exists in the precip column for this specific date range.
valuesExist <- function(df, start = "2015-04-01 00:00:00", end = "2015-11-30 23:59:59") {
# rows inside the period (datetime is compared as zero-padded text)
sub.df <- df[df$datetime >= start & df$datetime <= end, ]
# FALSE when every precip value in the period is NA, TRUE otherwise
if (sum(is.na(sub.df$precip)) == nrow(sub.df)) {return(FALSE)} else {return(TRUE)}
}
lst2.bool <- sapply(lst2, valuesExist)
lst2 <- lst2[lst2.bool]
lst3 <- lapply(lst2, function(x) {x[x$datetime >= "2015-04-01 00:00:00" & x$datetime <= "2015-11-30 23:59:59", ]})
sapply(names(lst2), function (x) write.csv(lst3[[x]], file = paste0(names(lst2[x]), ".csv"), row.names = FALSE))
If you want to have a dynamic start and end time, toss a variable with these values into the valuesExist function and replace the string timestamps in the lst3 assignment with that same variable.
If you wanted to combine the two lapply loops into one, be my guest, but I prefer having a boolean variable when I'm subsetting.
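
The question also asks that every data frame in lst3 have the same length, with hours that have no precip shown as NA. One way to sketch this, assuming datetime has already been parsed to POSIXct in UTC (the time zone, the toy data, and the helper name pad_to_period are assumptions made for illustration), is to build the full hourly sequence for the period and left-join each data frame onto it with merge():

```r
# Full hourly grid for the period of interest (UTC is an assumption)
grid <- data.frame(
  datetime = seq(as.POSIXct("2015-04-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2015-11-30 23:00:00", tz = "UTC"),
                 by = "hour")
)

# Hypothetical helper: keep only rows in the period and pad missing hours with NA
pad_to_period <- function(d) {
  merge(grid, d, by = "datetime", all.x = TRUE)
}

# Toy input covering only two hours of the period, plus one row outside it
toy <- data.frame(
  datetime = as.POSIXct(c("2015-03-31 23:00:00", "2015-04-01 00:00:00",
                          "2015-04-01 02:00:00"), tz = "UTC"),
  precip = c(0.5, 1.2, 0.0)
)

padded <- pad_to_period(toy)
nrow(padded) == nrow(grid)  # one row per hour in the grid
head(padded, 3)
```

After dropping the all-NA data frames, lst3 could then be built with something like lapply(lst2, pad_to_period), so that every element has one row per hour of the period.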

Conditional subsetting of data frame based on HH:MM:SS formatted column

So I have a large df with a column called "session" that is in the format
HH:MM:SS (e.g. 0:35:24 for 35 mins and 24 secs).
I want to create a subset of the df based on a condition like > 2 mins or < 90 mins from the "session" column.
I tried to first convert the column format into Date:
df$session <- as.Date(df$session, "%h/%m/%s")
I was going to then use the subset() to create my conditional subset but the above code generates a column of NAs.
subset.morethan2min <-subset(df, CONDITION)
where CONDITION is df$session >2 mins?
How should I manipulate the "session" column in order to be able to subset on a condition as described?
Sorry very new to R so welcome any suggestions.
Thanks!
UPDATE:
I converted the session column to POSIXct, then used the minute() function from the lubridate package to get numerical values for the minute component. Not a neat solution, but it seems to work for my needs right now. I would still welcome a neater solution though.
df$sessionPOSIX <- as.POSIXct(strptime(df$session, "%H:%M:%S"))
df$minute <- minute(df$sessionPOSIX)
subset.morethan2min <- subset(df, minute > 2)

A date is not the same as a period. The easiest way to handle periods is to use the lubridate package:
library(lubridate)
df$session <- hms(df$session)
df.morethan2min <- subset(df, df$session > period(2, 'minute'))
hms() converts your duration stamps into period objects, and period() creates a period object of the specified length for comparison.
As an aside, there are numerous other ways to subset data frames, including the [ operator and functions like filter() in the dplyr package, but that's beyond what you need for your current purposes.

Probably simpler ways to do this, but here's one solution:
set.seed(1234)
tDF <- data.frame(
Val = rnorm(100),
Session = paste0(
sample(0:23,100,replace=TRUE),
":",
sample(0:59,100,replace=TRUE),
":",
sample(0:59,100,replace=TRUE),
collapse=NULL),
stringsAsFactors=FALSE
)
##
toSec <- function(hms){
Long <- as.POSIXct(
paste0(
"2013-01-01 ",
hms),
format="%Y-%m-%d %H:%M:%S",
tz="America/New_York")
3600*as.numeric(substr(Long,12,13))+
60*as.numeric(substr(Long,15,16))+
as.numeric(substr(Long,18,19))
}
##
tDF <- cbind(
tDF,
Seconds = toSec(tDF$Session),
Minutes = toSec(tDF$Session)/60
)
##
> head(tDF)
Val Session Seconds Minutes
1 -1.2070657 15:21:41 55301 921.6833
2 0.2774292 12:58:24 46704 778.4000
3 1.0844412 7:32:45 27165 452.7500
4 -2.3456977 18:26:46 66406 1106.7667
5 0.4291247 12:56:34 46594 776.5667
6 0.5060559 17:27:11 62831 1047.1833
Then you can subset your data easily with subset(tDF, Minutes > some_number).
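
A shorter route, sketched here under the assumption that session is a character column in H:MM:SS form (the helper name to_minutes and the toy data are made up for illustration), is to split on the colons and do the arithmetic directly. This avoids attaching an arbitrary date and time zone to what is really a duration:

```r
# Hypothetical helper: convert "H:MM:SS" strings to minutes
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  # hours * 60 + minutes + seconds / 60 for each element
  vapply(parts, function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)), numeric(1))
}

# Toy data standing in for the real df
df <- data.frame(
  user    = c("a", "b", "c", "d"),
  session = c("0:01:30", "0:35:24", "1:45:00", "0:02:01"),
  stringsAsFactors = FALSE
)

df$minutes <- to_minutes(df$session)

# Sessions longer than 2 minutes and shorter than 90 minutes
subset(df, minutes > 2 & minutes < 90)
```

Unlike extracting only the minute component, this keeps the hours, so a session of 1:01:00 counts as 61 minutes rather than 1.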
