Convert Factor to only Time in R - r

I have a data frame with a factor column, as shown below:
df
ColA
14:59:33.0000000
15:59:33.0000000
16:59:33.0000000
17:59:33.0000000
ColA is a factor. Can we convert the values to time only?
Expected output
df
ColA
14:59:33
15:59:33
16:59:33
17:59:33

Using strptime and format.
format(strptime(v, "%T"), "%T")
# [1] "14:59:33" "15:59:33" "16:59:33" "17:59:33"
Data
v <- structure(1:4, .Label = c("14:59:33.0020000", "15:59:33.0000000",
"16:59:33.0000000", "17:59:33.0000000"), class = "factor")

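As a rough sketch of how the strptime/format approach could be applied to the data frame column itself: the data frame below is rebuilt from the question, and the sub() variant is an alternative that the answer does not mention, included only for comparison.

```r
# Rebuild the question's data frame; ColA is a factor
df <- data.frame(ColA = factor(c("14:59:33.0000000", "15:59:33.0000000",
                                 "16:59:33.0000000", "17:59:33.0000000")))

# Variant 1: parse the leading HH:MM:SS and format it back as character
via_strptime <- format(strptime(as.character(df$ColA), "%H:%M:%S"), "%H:%M:%S")

# Variant 2 (assumption: plain string handling is acceptable): strip the
# decimal point and everything after it
via_sub <- sub("\\..*$", "", as.character(df$ColA))

# Store the result; note that it is a character vector, not a time class
df$ColA <- via_strptime
```

Both variants drop the fractional seconds rather than rounding them, so a value such as 14:59:33.9 would still become 14:59:33.
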
We can use as.ITime
library(data.table)
as.ITime(as.character(v))
#[1] "14:59:33" "15:59:33" "16:59:33" "17:59:33"
data
v <- structure(1:4, .Label = c("14:59:33.0020000", "15:59:33.0000000",
"16:59:33.0000000", "17:59:33.0000000"), class = "factor")

Related

Subset data frame by rows containing the system date

I would like to subset a data frame by selecting only the rows with the current system date.
For example, I have this data frame:
df = data.frame("var" = c("A", "A", "B", "B"),
"date" = c("2020-03-01", "2020-03-17",
"2020-03-01", "2020-03-17"))
df$date = as.POSIXct(df$date, format = "%Y-%m-%d")
If today is 2020-03-17, I would like to subset the rows that contain only the current date.
I have tried the following:
df_today = df[which(df$date == Sys.Date()),]
Which gives the warning:
Warning message: In which(df$date == Sys.Date()) :
Incompatible methods ("Ops.POSIXt", "Ops.Date") for "=="
I have also tried:
df[which(df$date == as.POSIXct(Sys.Date())),]
Which returns an empty data frame. What I found works is coercing the date column to character and then subsetting the rows this way:
df$date = as.character(df$date)
df[which(df$date == as.character(Sys.Date())),]
This works, but I would like to know where I went wrong in my previous attempts, and whether there is a better way than converting back and forth between character and POSIXct.
Thank you in advance for any input!
Class "Date" is not the same as class "POSIXct". Convert the POSIXct column to Date first, using the local time zone from Sys.timezone():
df[as.Date(df$date, tz=Sys.timezone()) == Sys.Date(),]
# var date
# 2 A 2020-03-17
# 4 B 2020-03-17
Data used
df <- structure(list(var = structure(c(1L, 1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), date = structure(c(1583017200, 1584399600,
1583017200, 1584399600), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = c(NA,
-4L), class = "data.frame")
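To illustrate why the second attempt in the question returns nothing, here is a small sketch. It assumes the data were recorded in Central European Time (the "Europe/Berlin" zone is an assumption, chosen because it matches the epoch values in the data above), and it uses a fixed date in place of Sys.Date() so the result is reproducible.

```r
# Fixed stand-in for Sys.Date() so the example is reproducible
today <- as.Date("2020-03-17")

# Assumption: the data were recorded in Central European Time
tz_local <- "Europe/Berlin"
df <- data.frame(
  var  = c("A", "A", "B", "B"),
  date = as.POSIXct(c("2020-03-01", "2020-03-17", "2020-03-01", "2020-03-17"),
                    format = "%Y-%m-%d", tz = tz_local)
)

# as.POSIXct() on a Date yields midnight UTC, which is one hour after
# local midnight in this time zone, so == never matches
offset_secs <- as.numeric(as.POSIXct(today)) - as.numeric(df$date[2])

# Comparing Date with Date, using the column's own time zone, works
df_today <- df[as.Date(df$date, tz = tz_local) == today, ]
```

Here offset_secs is 3600, the one-hour gap between local midnight and midnight UTC, and df_today holds the two rows dated 2020-03-17.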
With dplyr, convert the column to Date in the local time zone and then filter:
library(dplyr)
df$date = as.Date(df$date, tz = Sys.timezone())
df %>% filter(date == Sys.Date())

Add date of character class with another date

Can we add a date of character class to another date (a lag by a specific amount)? I want to reduce each value by 05:00:00.
df
Date
12:48:36
12:48:37
13:48:36
Required data frame
df
Date
07:48:36
07:48:37
08:48:36
df <- structure(list(Date = structure(1:3, .Label = c("12:48:36", "12:48:37",
"13:48:36"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
You could use as.ITime from data.table
library(data.table)
setDT(df)
df[, Date := as.ITime(Date) - as.ITime('05:00:00')]
df
# Date
# 1: 07:48:36
# 2: 07:48:37
# 3: 08:48:36
Edit: If Date is stored as a factor (as in this example), convert it to character first, before doing the subtraction:
df[, Date := as.character(Date)]
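If data.table is not available, the same subtraction could be sketched in base R. This is an alternative technique, not the answer's method: it parses the times with strptime in UTC and subtracts the offset in seconds.

```r
# Times stored as a factor, as in the question
df <- data.frame(Date = factor(c("12:48:36", "12:48:37", "13:48:36")))

# Parse in UTC so that no daylight-saving shift can interfere
parsed <- strptime(as.character(df$Date), "%H:%M:%S", tz = "UTC")

# Subtract 5 hours (expressed in seconds) and format back to HH:MM:SS
df$Date <- format(parsed - 5 * 3600, "%H:%M:%S")
```

The result is a character column. A time earlier than 05:00:00 would wrap around to the previous day (for example 03:00:00 becomes 22:00:00), which may or may not be what is wanted.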

How do I aggregate data in R in a way that returns the entire row that satisfies the aggregation condition? [no dplyr]

I have data that looks like this:
ID FACTOR_VAR INT_VAR
1 CAT 1
1 DOG 0
I want to aggregate by ID such that the resulting dataframe contains the entire row that satisfies my aggregate condition. So if I aggregate by the max of INT_VAR, I want to return the whole first row:
ID FACTOR_VAR INT_VAR
1 CAT 1
The following will not work because FACTOR_VAR is a factor:
new_data <- aggregate(data[,c("ID", "FACTOR_VAR", "INT_VAR")], by=list(data$ID), FUN=max)
How can I do this? I know dplyr has a group by function, but unfortunately I am working on a computer for which downloading packages takes a long time. So I'm looking for a way to do this with just vanilla R.
If you want to keep all the columns, use ave instead:
subset(df, as.logical(ave(INT_VAR, ID, FUN = function(x) x == max(x))))
You can use aggregate for this. If you want to retain all the columns, merge can be used with it.
merge(aggregate(INT_VAR ~ ID, data = df, max), df, all.x = TRUE)
# ID INT_VAR FACTOR_VAR
#1 1 1 CAT
data
df <- structure(list(ID = c(1L, 1L), FACTOR_VAR = structure(1:2, .Label = c("CAT", "DOG"), class = "factor"), INT_VAR = 1:0), class = "data.frame", row.names = c(NA,-2L))
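The two-row example has only one ID, so here is a slightly larger sketch of the base R ave approach. The extra rows are made up for illustration and are not from the question.

```r
# Hypothetical data with two groups (rows beyond the first two are invented)
df <- data.frame(
  ID         = c(1L, 1L, 2L, 2L, 2L),
  FACTOR_VAR = factor(c("CAT", "DOG", "CAT", "BIRD", "DOG")),
  INT_VAR    = c(1L, 0L, 3L, 7L, 5L)
)

# ave() returns the per-ID maximum aligned with the original rows,
# so the comparison flags every row that attains its group's maximum
keep <- df$INT_VAR == ave(df$INT_VAR, df$ID, FUN = max)
res_all <- df[keep, ]

# If exactly one row per ID is wanted even when there are ties,
# sort by descending INT_VAR within ID and keep the first row of each ID
ord <- df[order(df$ID, -df$INT_VAR), ]
res_one <- ord[!duplicated(ord$ID), ]
```

With this data both results contain the rows for CAT (ID 1) and BIRD (ID 2). They differ only when several rows in a group share the maximum.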
We can do this in dplyr
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(INT_VAR == max(INT_VAR))
Or using data.table
library(data.table)
setDT(df)[, .SD[INT_VAR == max(INT_VAR)], by = ID]

Convert monthly data from one type to another

I have a column of monthly dates in the following format (month number, _, year):
Date
9_2018
1_2013
12_2014
etc.
I want to convert this date format to a date of the following form (year, month):
New_Date
201809
201301
201412
How can I do this?
We can use zoo::as.yearmon to convert the date and then use format to get data in the required format.
format(zoo::as.yearmon(df$Date, "%m_%Y"), "%Y%m")
#[1] "201809" "201301" "201412"
Or it can be done in base R by pasting an arbitrary day onto the month-year value we have.
format(as.Date(paste0("1_", df$Date), "%d_%m_%Y"), "%Y%m")
data
df <- structure(list(Date = structure(c(3L, 1L, 2L), .Label = c("1_2013",
"12_2014", "9_2018"), class = "factor")),
class = "data.frame",row.names = c(NA, -3L))
Splitting at the "_" and then padding the month with zeros:
library(stringr)
parts <- strsplit(as.character(df$Date), "_")
mon <- sapply(parts, FUN="[", 1)
mon <- str_pad(mon, width=2, pad="0")
year <- sapply(parts, FUN="[", 2)
paste0(year,mon)
#[1] "201809" "201301" "201412"
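The split-and-pad idea can also be sketched without stringr, using sprintf for the zero padding. The vector below is copied from the question; the variable names are chosen for illustration.

```r
# Month_year strings from the question
Date <- c("9_2018", "1_2013", "12_2014")

# Split each value at "_" and bind the pieces into a two-column matrix:
# column 1 is the month, column 2 is the year
parts <- do.call(rbind, strsplit(Date, "_", fixed = TRUE))

# %02d pads the month to two digits
New_Date <- sprintf("%s%02d", parts[, 2], as.integer(parts[, 1]))
```

New_Date is a character vector. Wrap it in as.integer() if a numeric key such as 201809 is needed.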

Replace month abbreviation with number

For example: df$Date
SN Date
1 07-Mar-2019
2 06-Feb-2019
How do I set a condition to replace "Mar" with "03" and "Feb" with "02" in df$Date?
So that the output will be:
SN Date
1 07-03-2019
2 06-02-2019
Can anyone help? Thank you.
You can use as.Date. You can read about different formats at ?strptime
df$Date <- as.Date(df$Date, "%d-%b-%Y")
df
# SN Date
#1 1 2019-03-07
#2 2 2019-02-06
Or, if you don't want to worry about the format, use dmy from lubridate:
df$Date <- lubridate::dmy(df$Date)
Or the anydate function from anytime:
df$Date <- anytime::anydate(df$Date)
To get output exactly in the same format as shown, we can do
df$Date <- format(as.Date(df$Date, "%d-%b-%Y"), "%d-%m-%Y")
df
# SN Date
#1 1 07-03-2019
#2 2 06-02-2019
data
df <- structure(list(SN = 1:2, Date = structure(2:1, .Label = c("06-Feb-2019",
"07-Mar-2019"), class = "factor")), class = "data.frame", row.names = c(NA, -2L))

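One caveat with "%b" is that it matches month abbreviations in the current locale, so "Mar" may not parse on a non-English system. A locale-independent sketch, assuming the abbreviations are always the English ones in R's built-in month.abb constant, could look like this:

```r
# Dates from the question, as plain character strings
x <- c("07-Mar-2019", "06-Feb-2019")

# Split into day, month abbreviation and year
parts <- strsplit(x, "-", fixed = TRUE)

# Look up each abbreviation's position in month.abb ("Jan", "Feb", ...)
# and rebuild the string with a zero-padded month number
out <- vapply(parts, function(p) {
  sprintf("%s-%02d-%s", p[1], match(p[2], month.abb), p[3])
}, character(1))
```

This keeps the values as character strings in dd-mm-yyyy form and never goes through a Date object.
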