Extract AM/PM from Time in R [duplicate] - r

Hi, I have a sample data frame like this:
Time <- c('0:00', '1:00', '2:00', '13:00', '14:00')
df <- data.frame(Time)
I would like to create another column, "AMPM", based on the "Time" column. "AMPM" should show whether the time is AM or PM.
The final output should look like this:
Time AMPM
1 0:00 AM
2 1:01 AM
3 2:09 AM
4 13:52 PM
5 14:06 PM
6 15:33 PM
7 16:27 PM
8 21:40 PM

You can remove everything after the colon, convert the result to an integer, and assign 'PM' to all values greater than 11 and 'AM' otherwise.
df <- data.frame(Time = c('0:00', '1:00', '2:00', '13:00', '14:00'))
df$AMPM <- ifelse(as.integer(sub(':.*', '', df$Time)) > 11, 'PM', 'AM')
# Without ifelse
# df$AMPM <- c('AM', 'PM')[(as.integer(sub(':.*', '', df$Time)) > 11) + 1]
df
# Time AMPM
#1 0:00 AM
#2 1:00 AM
#3 2:00 AM
#4 13:00 PM
#5 14:00 PM
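As an alternative to the regular expression, the hour can also be read by parsing the string as a time of day. The following is only a sketch: the extra '12:30' value and the variable names hour and ampm are illustrative assumptions, not part of the original question.

```r
# Sample times from the question, plus a hypothetical noon value
Time <- c("0:00", "1:00", "2:00", "12:30", "13:00", "14:00")

# Parse each string as hours and minutes; tz = "UTC" keeps the result
# independent of the local time zone
hour <- as.POSIXlt(Time, format = "%H:%M", tz = "UTC")$hour

# Hours 0-11 are AM, hours 12-23 are PM
ampm <- ifelse(hour >= 12, "PM", "AM")

data.frame(Time, AMPM = ampm)
```

Parsing has the side effect of returning NA for malformed strings, which may make bad input easier to spot than with the sub() approach.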

Related

using if else statements to manipulate dates [duplicate]

I am trying to write an if/else statement that adds a zero in front of a value if it is less than 10 and otherwise leaves it as is. I want all of my dates to have two digits. Please assist.
if(df$col < 10){
paste '0'
else df$col
}
I was trying to break it down into different columns
EventID SampleDate SampleTime
130466 3/19/2008 12:30:00
131392 4/30/2008 08:45:00
131658 5/14/2008 10:00:00
117770 6/11/2008 08:45:00
118680 7/23/2008 09:15:00
118903 8/6/2008 09:00:00
SampleDatech year month day2
3/19/2008 2008 3 19
4/30/2008 2008 4 30
5/14/2008 2008 5 14
6/11/2008 2008 6 11
7/23/2008 2008 7 23
8/6/2008 2008 8 6
If you are trying to output just the day with a leading zero to a new column, you can use a combination of strftime and as.Date.
df$day = strftime(as.Date(df$SampleDate, "%m/%d/%Y"), "%d")
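Or, if you want to keep the whole date but add leading zeros to the month and day, you can do this. The input format must use %Y (four-digit year); %y would read only the first two digits of 2008.
df$NewDate = strftime(as.Date(df$SampleDate, "%m/%d/%Y"), "%m/%d/%Y")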
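If the day has already been split out as a number, as in the question's day2 column, zero-padding can also be done directly with sprintf. This is a minimal sketch; the vector below copies the day values from the question, and the names day and day_padded are illustrative.

```r
# Day values as they appear in the question's "day2" column
day <- c(19L, 30L, 14L, 11L, 23L, 6L)

# "%02d" pads an integer with leading zeros to a width of two characters
day_padded <- sprintf("%02d", day)

day_padded
```

The result is a character vector, so it is suited to display or string building rather than further arithmetic.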

month.abb[] is resulting in incorrect results

I have the following data set. I am trying to split the date_1 field into month and day, and then convert the month number to a month name.
date_1,no_of_births_1
1/1,1482
2/2,1213
3/23,1220
4/4,1319
5/11,1262
6/18,1271
I am using month.abb[] to convert the month number to a name. But instead of returning the correct month name for each month number, it produces the wrong values.
For example, month.abb[2] appears to give Apr instead of Feb:
date_1 no_of_births_1 V1 V2 month
1 1/1 1482 1 1 Jan
2 2/2 1213 2 2 Apr
3 3/23 1220 3 23 May
4 4/4 1319 4 4 Jun
5 5/11 1262 5 11 Jul
6 6/18 1271 6 18 Aug
Below is the code I am using:
birthday<-read.csv("Birthday_s.csv",header = TRUE)
birthday$date_1<-as.character(birthday$date_1)
#split the data
listx<-sapply(birthday$date_1,function(x) strsplit(x,"/"))
library(base)
#convert to data frame
mat<-as.data.frame(matrix(unlist(listx),ncol = 2, byrow = TRUE))
#combine birthday and mat
birthday2<-cbind(birthday,mat)
#convert month number to month name
birthday2$month<-sapply(birthday2$V1, function(x) month.abb[as.numeric(x)])
When I run your code, I get the correct months. However, your code is more complicated than necessary. Here are two ways to extract month and day from date_1:
First, when you read the data, use stringsAsFactors=FALSE, which prevents strings from getting converted to factors.
birthday <- read.csv("Birthday_s.csv",header = TRUE, stringsAsFactors=FALSE)
Extract month and days using date functions:
library(lubridate)
birthday$month = month(as.POSIXct(birthday$date_1, format="%m/%d"), abbr=TRUE, label=TRUE)
birthday$day = day(as.POSIXct(birthday$date_1, format="%m/%d"))
Extract month and days using Regular Expressions:
birthday$month = month.abb[as.numeric(gsub("([0-9]{1,2}).*", "\\1", birthday$date_1))]
birthday$day = as.numeric(gsub(".*/([0-9]{1,2}$)", "\\1", birthday$date_1))
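One plausible explanation for the shifted months in the question is that the month column ended up as a factor, which is why stringsAsFactors=FALSE is suggested above. This is an assumption, since the answer could not reproduce the problem. Calling as.numeric on a factor returns its internal level codes rather than the numbers shown in its labels. A small sketch with made-up values:

```r
# Hypothetical month numbers stored as a factor (the default for strings
# in data frames before R 4.0.0)
m <- factor(c("1", "2", "3", "10"))

# as.numeric() on a factor gives the level codes, which depend on how the
# labels sort as strings, not the numbers the labels spell out
codes <- as.numeric(m)

# Converting through character first recovers the intended numbers
month_num <- as.numeric(as.character(m))
months_ok <- month.abb[month_num]

months_ok
```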

How to split two numerical values into two columns? [duplicate]

I have tried my best using the split function and others, but to no avail.
We can use read.table/read.csv with the sep option.
read.table(text=as.character(df1$datetime), sep=' ',
col.names=c('date', 'time'),
header=FALSE, stringsAsFactors=FALSE)
# date time
#1 01/01/2011 0:00
#2 01/01/2011 1:00
#3 01/01/2011 2:00
#4 01/01/2011 3:00
#5 01/01/2011 4:00
Or with tidyr
library(tidyr)
separate(df1, datetime, into= c('date', 'time'), sep=' ')
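The question's data frame is not shown, so the sketch below uses a hypothetical df1 with a single datetime column to illustrate a third option in base R, strsplit. The column names date and time follow the answer above; everything else is assumed.

```r
# Hypothetical input; the original data frame is not shown in the question
df1 <- data.frame(
  datetime = c("01/01/2011 0:00", "01/01/2011 1:00"),
  stringsAsFactors = FALSE
)

# Split each string on the single space and stack the pieces row by row
parts <- do.call(rbind, strsplit(df1$datetime, " ", fixed = TRUE))

df1$date <- parts[, 1]
df1$time <- parts[, 2]

df1
```

This relies on every value containing exactly one space; rows with a different number of pieces would need separate handling.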

Removing multiple data entries based on a total number of entries per day

I start with a data frame titled 'dat' in R that looks like the following:
datetime lat long id extra step
1 8/9/2014 13:00 31.34767 -81.39117 36 1 31.38946
2 8/9/2014 17:00 31.34767 -81.39150 36 1 11155.67502
3 8/9/2014 23:00 31.30683 -81.28433 36 1 206.33342
4 8/10/2014 5:00 31.30867 -81.28400 36 1 11152.88177
What I need to do is find out which days have fewer than 3 entries and remove all entries associated with those days from the original data.
I initially did this by the following:
library(plyr)
datetime<-dat$datetime
###strip the time down to only have the date no hh:mm:ss
date<- strptime(datetime, format = "%m/%d/%Y")
### bind the date to the old data
dat2<-cbind(date, dat)
### count using just the date so you can ID which days have fewer than 3 points
datecount<- count(dat2, "date")
datecount<- subset(datecount, datecount$freq < 3)
This ends up producing the following:
row.names date freq
1 49 2014-09-26 1
2 50 2014-09-27 2
3 135 2014-12-21 2
Which is great, but I cannot figure out how to remove the entries from these days with less than three entries from the original 'dat' because this is a compressed version of the original data frame.
So to try and deal with this I have come up with another way of looking at the problem. I will use the strptime and cbind from above:
datetime<-dat$datetime
###strip the time down to only have the date no hh:mm:ss
date<- strptime(datetime, format = "%m/%d/%Y")
### bind the date to the old data
dat2<-cbind(date, dat)
I will also use the column titled "extra". I would like to create a new column that sums the values in "extra" by the simplified strptime dates, and to apply this sum to every entry from that date, like the following:
date datetime lat long id extra extra_sum
1 2014-08-09 8/9/2014 13:00 31.34767 -81.39117 36 1 3
2 2014-08-09 8/9/2014 17:00 31.34767 -81.39150 36 1 3
3 2014-08-09 8/9/2014 23:00 31.30683 -81.28433 36 1 3
4 2014-08-10 8/10/2014 5:00 31.30867 -81.28400 36 1 4
5 2014-08-10 8/10/2014 13:00 31.34533 -81.39317 36 1 4
6 2014-08-10 8/10/2014 17:00 31.34517 -81.39317 36 1 4
7 2014-08-10 8/10/2014 23:00 31.34483 -81.39283 36 1 4
8 2014-08-11 8/11/2014 5:00 31.30600 -81.28317 36 1 2
9 2014-08-11 8/11/2014 13:00 31.34433 -81.39300 36 1 2
The code that creates the "extra_sum" column is what I am struggling with.
After creating this I can simply subset my data to all entries that have a value >2. Any help figuring out how to use my initial methodology or this new one to remove days with fewer than 3 entries from my initial data set would be much appreciated!
The plyr way.
library(plyr)
datetime <- dat$datetime
###strip the time down to only have the date no hh:mm:ss
date <- strptime(datetime, format = "%m/%d/%Y")
### bind the date to the old data
dat2 <-cbind(date, dat)
dat3 <- ddply(dat2, .(date), function(df) {
  # keep the rows for this date only if there are at least 3 of them
  if (nrow(df) >= 3) {
    return(df)
  } else {
    return(NULL)
  }
})
I recommend using the data.table package
library(data.table)
dat<-data.table(dat)
dat$Date<-as.Date(as.character(dat$datetime), format = "%m/%d/%Y")
dat_sum<-dat[, .N, by = Date ]
dat_3plus<-dat_sum[N>=3]
dat<-dat[Date%in%dat_3plus$Date]
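The "extra_sum" column described in the question can also be built in base R with ave, which returns one value per original row. The following is a sketch on a cut-down, hypothetical version of dat (only the datetime and extra columns); the name dat_kept is illustrative.

```r
# Reduced, hypothetical version of the question's data
dat <- data.frame(
  datetime = c("8/9/2014 13:00", "8/9/2014 17:00", "8/9/2014 23:00",
               "8/10/2014 5:00", "8/10/2014 13:00"),
  extra = 1,
  stringsAsFactors = FALSE
)

# Keep only the calendar date; the trailing time is ignored by as.Date()
dat$date <- as.Date(dat$datetime, format = "%m/%d/%Y")

# ave() computes the per-date sum and repeats it on every row of that date
dat$extra_sum <- ave(dat$extra, format(dat$date), FUN = sum)

# Drop all rows belonging to days with fewer than 3 entries
dat_kept <- dat[dat$extra_sum >= 3, ]

dat_kept
```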

Adding the values of second column based on date and time of first column

I have a data frame with 2 variables. The first column, "X", holds the date and time in the format dd/mm/yyyy hh:mm; the second column, "Y", holds electricity meter readings taken every 5 minutes. I want to sum the values for each half hour. For instance:
X Y
13/12/2014 12:00 1
13/12/2014 12:05 2
13/12/2014 12:10 1
13/12/2014 12:15 2
13/12/2014 12:20 2
13/12/2014 12:25 1
In the end, I want to present a result like this:
13/12/2014 12:00 9
13/12/2014 12:30 12
and so on...
Here's an alternative approach which actually takes X into account (as per the OP's comment).
First, we will make sure X is of proper POSIXct format so we could manipulate it correctly (I'm using the data.table package here for convenience)
library(data.table)
setDT(df)[, X := as.POSIXct(X, format = "%d/%m/%Y %R")]
Then we aggregate by a running count of the rows whose minutes are 00 or 30, summing Y and taking the first value of X in each group. I've made a more complicated data set in order to illustrate more complicated scenarios (see below).
df[order(X), .(X = X[1L], Y = sum(Y)), by = cumsum(format(X, "%M") %in% c("00", "30"))]
# cumsum X Y
# 1: 0 2014-12-13 12:10:00 6
# 2: 1 2014-12-13 12:30:00 6
# 3: 2 2014-12-13 13:00:00 3
Data
df <- read.table(text = "X Y
'13/12/2014 12:10' 1
'13/12/2014 12:15' 2
'13/12/2014 12:20' 2
'13/12/2014 12:25' 1
'13/12/2014 12:30' 1
'13/12/2014 12:35' 1
'13/12/2014 12:40' 1
'13/12/2014 12:45' 1
'13/12/2014 12:50' 1
'13/12/2014 12:55' 1
'13/12/2014 13:00' 1
'13/12/2014 13:05' 1
'13/12/2014 13:10' 1", header = TRUE)
Some explanations
The by expression:
format(X, "%M") gets the minutes out of X (see ?strptime)
Next step is check if they match 00 or 30 (using %in%)
cumsum turns these matches into running group ids, which we aggregate by when we put the expression in the by argument (see ?data.table)
The j expression:
(X = X[1L], Y = sum(Y)) simply gets the first value of X and the sum of Y for each group.
The i expression:
I've added order(X) to make sure the data set is properly ordered by date (one of the main reasons I converted X to proper POSIXct format)
For a better understanding on how data.table works, see some tutorials here
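To see how the grouping key in the by expression behaves on its own, here is a small base R sketch. The minutes vector is made up for illustration and stands in for format(X, "%M").

```r
# Hypothetical minute values, as format(X, "%M") would return them
minutes <- c("10", "15", "20", "25", "30", "35", "00", "05")

# TRUE wherever a new half hour starts
is_boundary <- minutes %in% c("00", "30")

# The running total increases by one at each boundary, so every row
# between two boundaries shares the same group id
group_id <- cumsum(is_boundary)

group_id
```

Rows before the first boundary get id 0, which matches the first group in the output above.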
t1 <- tapply(df$Y, as.numeric(as.POSIXct(df$X, format = '%d/%m/%Y %H:%M')) %/% 1800, sum)
data.frame(time = as.POSIXct(as.numeric(names(t1))*1800 + 1800, origin = '1970-01-01'), t1)
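t1 groups the values using integer division of the epoch seconds by 1800 (30 minutes); the + 1800 in the second line labels each group with the end of its half hour.
Assuming your data frame is df, you can try the following, which relies on the rows being in time order with exactly six readings per half hour: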
unname(tapply(df$Y, (seq_along(df$Y)-1) %/% 6, sum))
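For comparison, here is a base R sketch that labels each group with the start of its half hour, as in the output the question asks for. The first six rows are the question's sample data; the two 12:30 and 12:35 readings are invented for illustration, and parsing in UTC is an assumption so that flooring the epoch seconds lines up with clock half hours.

```r
# First six rows from the question plus two hypothetical later readings
df <- data.frame(
  X = c("13/12/2014 12:00", "13/12/2014 12:05", "13/12/2014 12:10",
        "13/12/2014 12:15", "13/12/2014 12:20", "13/12/2014 12:25",
        "13/12/2014 12:30", "13/12/2014 12:35"),
  Y = c(1, 2, 1, 2, 2, 1, 4, 5),
  stringsAsFactors = FALSE
)

# Parse the timestamps (UTC is assumed here)
ts <- as.POSIXct(df$X, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Round each timestamp down to the start of its 1800-second window
bin_start <- as.POSIXct(floor(as.numeric(ts) / 1800) * 1800,
                        origin = "1970-01-01", tz = "UTC")

# Sum Y within each half-hour window
res <- aggregate(list(Y = df$Y), by = list(X = bin_start), FUN = sum)

res
```

Unlike the seq_along version, this does not depend on row order or on every window containing six readings.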
