I need to replace awkward strings in R, specifically the times that are in a weird format. The data looks like this:
Date | Time | AmbientTemp
2000-01-01 | 11:00 a | 25
2000-01-01 | 11:30 a | 25.5
2000-01-01 | 11:00 p | 20
2000-01-01 | 11:30 p | 19.5
The a and p mean AM and PM respectively (obviously).
lubridate and base R cannot convert these dates to a correct format. Thus, I turned to the cumbersome str_replace_all function (from package stringr) to convert ALL my times in a large dataframe: >130000 records.
Example functions:
uploadDat$Time = str_replace_all(uploadDat$Time,"11:00 a","11:00")
uploadDat$Time = str_replace_all(uploadDat$Time,"11:00 p","23:00")
I changed the class of the times using as.character() before applying stringr's functions.
The result is perfect except for the 11 o'clock times (like above), which are converted as follows:
Date | Time | AmbientTemp
2000-01-01 | 101:00 | 25
2000-01-01 | 101:30 | 25.5
2000-01-01 | 113:00 | 20
2000-01-01 | 113:30 | 19.5
Why are these specific times converted incorrectly?
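We can paste "m" onto the end of each time so that it ends in "am" or "pm", parse it with as.POSIXct using %I (12-hour clock) and %p (AM/PM indicator), and then format the result as a 24-hour time (df here stands for the data frame, uploadDat in the question):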
format(as.POSIXct(paste0(df$Time, "m"), format = "%I:%M %p"), "%T")
#[1] "11:00:00" "11:30:00" "23:00:00" "23:30:00"
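As for why the 11 o'clock times came out wrong: the question does not show the other replacement calls, but output such as 101:00 is what you would get if a replacement for 1 o'clock (for example "1:00 a" to "01:00", which is an assumption here) ran before the one for 11 o'clock, because "1:00 a" also matches inside "11:00 a". A minimal sketch of that effect, using base gsub with fixed = TRUE in place of str_replace_all (the two should behave the same for these literal patterns):

```r
times <- c("11:00 a", "11:30 a", "11:00 p", "11:30 p")

# Hypothetical earlier replacements for the 1 o'clock times (assumed, not shown
# in the question). Each pattern also occurs inside an 11 o'clock string.
broken <- times
broken <- gsub("1:00 a", "01:00", broken, fixed = TRUE)
broken <- gsub("1:30 a", "01:30", broken, fixed = TRUE)
broken <- gsub("1:00 p", "13:00", broken, fixed = TRUE)
broken <- gsub("1:30 p", "13:30", broken, fixed = TRUE)
print(broken)

# Anchoring the pattern with ^ and $ makes it match whole values only,
# so the 1 o'clock replacement leaves the 11 o'clock times alone.
anchored <- sub("^1:00 a$", "01:00", times)
print(anchored)
```

If the string-replacement route is kept, anchoring every pattern this way (or replacing the longer patterns first) should avoid the partial matches.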
I have a sample df below (with date formatted into as.Date):
| date |
--------------
| 2020-03-03 |
| 2020-06-30 |
| 2020-01-23 |
| 2020-02-10 |
| 2020-11-29 |
I am trying to add a column according to a cut-off date of 2020-05-01 and expect to get this table:
| date | cutoff |
------------------------
| 2020-03-03 | prior |
| 2020-06-30 | later |
| 2020-01-23 | prior |
| 2020-02-10 | prior |
| 2020-11-29 | later |
I used dplyr and called the mutate to create a column and initially applied case_when:
df %>%
mutate(cutoff = case_when(
date < 2020-05-01 ~ "prior",
"later"
))
The code above created a cutoff column with only "later" values.
I also tried the ifelse:
df <- with(df, ifelse(date < 2020-05-01, "prior", "later"))
The code above replaced the values in the date column with NA values.
I gave a different code a try:
df %>%
mutate(cutoff = case_when(date < 2020-05-01 ~ "prior",
TRUE ~ "later"))
but the result was the same as the first code I tried.
I also tried converting the date to POSIXct format, but each snippet above still produced the same output.
First convert the date column to Date class with ymd, then use ifelse and compare against a proper date rather than the unquoted 2020-05-01:
library(lubridate)
library(dplyr)
df %>%
mutate(date = ymd(date),
cutoff = ifelse(date < ymd("2020-05-01"), "prior", "later"))
date cutoff
1 2020-03-03 prior
2 2020-06-30 later
3 2020-01-23 prior
4 2020-02-10 prior
5 2020-11-29 later
data:
df <- structure(list(date = c("2020-03-03", "2020-06-30", "2020-01-23",
"2020-02-10", "2020-11-29")), class = "data.frame", row.names = c(NA,
-5L))
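The reason the original attempts returned only "later" is that an unquoted 2020-05-01 is evaluated by R as arithmetic (2020 minus 5 minus 1, i.e. 2014), and a Date compared with a number is compared by its day count since 1970-01-01. A small base R sketch of the same fix, assuming the sample data above and using as.Date in place of ymd:

```r
df <- data.frame(date = c("2020-03-03", "2020-06-30", "2020-01-23",
                          "2020-02-10", "2020-11-29"))
df$date <- as.Date(df$date)

# Unquoted, this is plain subtraction, not a date
unquoted <- 2020-05-01
print(unquoted)

# Every date in 2020 is more than 18000 days after 1970-01-01,
# so the comparison with 2014 is FALSE for all rows
print(df$date < unquoted)

# Comparing against a real Date gives the intended result
df$cutoff <- ifelse(df$date < as.Date("2020-05-01"), "prior", "later")
print(df)
```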
I am trying to format my data for the spatially-explicit capture-recapture model (secr), for which we need an occasion column. I am taking a 90-day period of the overall data set for each year, with each row being a separate record of an animal on one of the camera traps. Let's say that the first day is Feb 1st, 2019. It should get '1' in the 'occasion' column. The last day, May 1st, should get '90' in that column.
However, here's a catch: there wasn't a capture on every date of that time period, and some days there were multiple captures. So, the dates in the 'dt' column may go like this:
2019-02-01
2019-02-04
2019-02-05
2019-02-06
2019-02-07
2019-02-07
2019-02-07
I want to create an 'occasion' column so that my final table could have columns like this:
2019-02-01 | 1
2019-02-04 | 4
2019-02-05 | 5
2019-02-06 | 6
2019-02-07 | 7
2019-02-07 | 7
2019-02-07 | 7
I have gone two ways about this, but neither was successful. Firstly, I tried this:
data_new = data_old %>%
arrange(dt) %>%
mutate(occasion = as.numeric(factor(dt)))
Which gave me a table that looked like this:
2019-02-01 | 1
2019-02-04 | 2
2019-02-05 | 3
2019-02-06 | 4
2019-02-07 | 5
2019-02-07 | 5
2019-02-07 | 5
So, the numbers for identical dates were identical, just how I wanted, but it didn't skip the number if the corresponding date was missing. I tried something more complicated:
First, I got the start dates for each of the 90-day periods per year.
mydttemp <- as.POSIXct("2014-02-01")
mydates = seq.POSIXt(from = mydttemp, length.out = 7, by = "1 year")
The final product for the list 'mydates' looked like this:
"2014-02-01 +11"
"2015-02-01 +10"
"2016-02-01 +10"
"2017-02-01 +10"
"2018-02-01 +10"
"2019-02-01 +10"
"2020-02-01 +10"
Second, I made the 90-day period for each year. I use a loop and the object fileNumber that goes through each year in the list (hence, the mydates[fileNumber] expression).
mydate = seq.POSIXt(from = mydates[fileNumber], length.out = 90, by = "1 day")
mydateseq = seq(as.character(mydate))
Finally, I feed these into the same part of my code, and it looks like this:
for (m in mydateseq) {
data_new = data_old %>%
arrange(dt) %>%
mutate(occasion = if_else(dt %in% mydate,
true = mydateseq[m],
false = NA_real_)
}
The idea was that if a date in the 'dt' column was found in the created list, the corresponding number for that date would be put into the new column. But that just gave me a column full of NAs. Any ideas?
Thank you in advance.
Subtract the first date from the dates and add 1:
# test data
s <- c("2019-02-01", "2019-02-04", "2019-02-05", "2019-02-06",
"2019-02-07", "2019-02-07", "2019-02-07")
d <- as.Date(s)
as.integer(d - d[1] + 1L)
## [1] 1 4 5 6 7 7 7
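Note that d[1] is only the right reference if there was a capture on the first day of the period. A variant that subtracts the known period start instead, assuming a hypothetical variable period_start that holds that date for the year in question:

```r
# Captures that begin a few days into the period (no record on day 1)
d <- as.Date(c("2019-02-04", "2019-02-05", "2019-02-07", "2019-02-07"))

# Assumed first day of the 90-day period for this year
period_start <- as.Date("2019-02-01")

# Days elapsed since the start, plus 1 so that the first day is occasion 1
occasion <- as.integer(d - period_start) + 1L
print(occasion)

# The last day of a 90-day period gets occasion 90
last_day <- period_start + 89
print(last_day)
print(as.integer(last_day - period_start) + 1L)
```

For several years, the same subtraction could be applied per year with that year's start date.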
I have observation data which only give hour and minute mark, the data looks like this:
- date | time | value
- 2019-01-01 | 00:00 | 14
- 2019-01-01 | 00:30 | 23
- 2019-01-01 | 01:00 | 32
- 2019-01-01 | 01:30 | 41
- 2019-01-02 | 00:00 | 41
- 2019-01-02 | 00:30 | 32
- 2019-01-02 | 01:00 | 23
- 2019-01-02 | 01:30 | 14
- ....
I successfully converted the date column to Date class using
data$date <- as.Date(data$date, "%Y/%m/%d")
but when I try to convert the time, I run into a problem. I tried this:
data$time <- strptime(data$time, "%H:%M")
This gives me the time with the current date attached, e.g. "2019-03-14 00:00:00", which is not what I'm looking for, since the date is wrong.
I also tried using:
trydata$jam <- timestamp(trydata$jam, "%H:%M")
This gives me the result:
%H:%M00:00 ------##
What is the best way to do this? I also want to extract the data in certain duration of time (like from 10:00 to 13:00)
Thank You
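One way to approach this (a sketch, not the only option) is to keep the time as a zero-padded "HH:MM" string for filtering, and to build a full date-time only when one is needed. The data frame below is made-up sample data in the same layout as the question, and the UTC time zone is an assumption:

```r
data <- data.frame(
  date  = c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01"),
  time  = c("09:30", "10:00", "12:30", "13:30"),
  value = c(14, 23, 32, 41),
  stringsAsFactors = FALSE
)

# Combine date and time into one POSIXct column, so the time is attached
# to the correct date instead of today's date
data$datetime <- as.POSIXct(paste(data$date, data$time),
                            format = "%Y-%m-%d %H:%M", tz = "UTC")

# Zero-padded "HH:MM" strings sort in clock order, so a plain string
# comparison selects the 10:00 to 13:00 window
window <- data[data$time >= "10:00" & data$time <= "13:00", ]
print(window)

# Extract just the time part back out of a POSIXct value
print(format(data$datetime, "%H:%M"))
```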
I have a table:
Name| Start | Finish |
----|-----------|-----------|
A |2015-01-22 |2015-02-04 |
B |2015-01-02 |2015-01-10 |
A |2015-01-22 |2015-02-14 |
B |2015-01-02 |2015-02-10 |
I need to break periods by months. If a period starts in one month and ends in the next one, then I need to split it into two periods. If a period starts and ends in the same month, then it should stay as it is. Let's assume a period cannot contain more than one 1st day of the month. In other words, each line can be split into no more than two lines. Finish (end of the period) is always later than Start.
That's what I want to get:
Name| Start | Finish |
----|-----------|-----------|
A |2015-01-22 |2015-01-31 |
A |2015-02-01 |2015-02-04 |
A |2015-01-22 |2015-01-31 |
A |2015-02-01 |2015-02-14 |
B |2015-01-02 |2015-01-10 |
B |2015-01-02 |2015-01-31 |
B |2015-02-01 |2015-02-10 |
The order of the output rows doesn't matter.
Here is the code for the table:
Name = c("A", "B", "A", "B")
Start = c(as.Date("2015-01-22"), as.Date("2015-01-02"), as.Date("2015-01-22"), as.Date("2015-01-02"))
Finish = c(as.Date("2015-02-04"), as.Date("2015-01-10"), as.Date("2015-02-14"), as.Date("2015-02-10"))
df = data.frame(Name, Start, Finish)
Any suggestion how it can be done?
The question has been changed. Originally the Name column uniquely identified the row but the changed version of the question no longer has that. The answer here has been modified accordingly so that now we identify rows by row number, i.e. 1:nrow(df), rather than df$Name in the second argument to by. Otherwise, code is unchanged.
Use by to split the data frame by row giving single rows and operating on each one with the anonymous function. It calculates the end-of-month (eom) for the Start and if the Finish is greater outputs a two-row data frame and otherwise returns the same data frame. Put it all together with rbind.
library(zoo)
do.call("rbind", by(df, 1:nrow(df), function(x) with(x, {
eom <- as.Date(as.yearmon(Start), frac = 1)
if (eom < Finish)
data.frame(Name, Start = c(Start, eom+1), Finish = c(eom, Finish))
else x
})))
giving:
Name Start Finish
1.1 A 2015-01-22 2015-01-31
1.2 A 2015-02-01 2015-02-04
2 B 2015-01-02 2015-01-10
3.1 A 2015-01-22 2015-01-31
3.2 A 2015-02-01 2015-02-14
4.1 B 2015-01-02 2015-01-31
4.2 B 2015-02-01 2015-02-10
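If you would rather not depend on zoo, the end-of-month step can also be sketched in base R. The helper names end_of_month and split_row below are made up for this illustration; the logic is otherwise the same as above and relies on the same assumption that a period crosses at most one month boundary:

```r
df <- data.frame(
  Name   = c("A", "B", "A", "B"),
  Start  = as.Date(c("2015-01-22", "2015-01-02", "2015-01-22", "2015-01-02")),
  Finish = as.Date(c("2015-02-04", "2015-01-10", "2015-02-14", "2015-02-10")),
  stringsAsFactors = FALSE
)

# Last day of the month containing d (d is a single Date)
end_of_month <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))
  # Step from the 1st to the 1st of the next month, then back one day
  seq(first, by = "month", length.out = 2)[2] - 1
}

# Return one row unchanged, or two rows if the period crosses a month end
split_row <- function(i) {
  row <- df[i, ]
  eom <- end_of_month(row$Start)
  if (row$Finish > eom) {
    data.frame(Name = row$Name,
               Start = c(row$Start, eom + 1),
               Finish = c(eom, row$Finish),
               stringsAsFactors = FALSE)
  } else {
    row
  }
}

result <- do.call(rbind, lapply(seq_len(nrow(df)), split_row))
rownames(result) <- NULL
print(result)
```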
Here's another approach in base R:
idx <- with(df, format(Finish, "%Y-%m") > format(Start, "%Y-%m"))
rbind(df[!idx,],
transform(df[idx,], Finish = as.Date(paste0(format(Finish, "%Y-%m"), "-01"))-1),
transform(df[idx,], Start = as.Date(paste0(format(Finish, "%Y-%m"), "-01"))))
# Name Start Finish
#2 B 2015-01-02 2015-01-10
#1 A 2015-01-22 2015-01-31
#3 A 2015-01-22 2015-01-31
#4 B 2015-01-02 2015-01-31
#11 A 2015-02-01 2015-02-04
#31 A 2015-02-01 2015-02-14
#41 B 2015-02-01 2015-02-10
Edit:
This answers the original question:
require(dplyr)
require(zoo)
df %>%
filter(Finish>as.Date(as.yearmon(Start),frac=1)) %>%
group_by(Name) %>%
do(rbind(.,c(.$Name,
paste(as.Date(as.yearmon(.$Start),frac=1)+1),
.$Finish))) %>%
mutate(Finish:=ifelse(as.Date(as.yearmon(Start),frac=1)<Finish,
paste(as.Date(as.yearmon(Start),frac=1)),Finish))
Output:
Name Start Finish
1 A 2015-01-22 2015-01-31
2 A 2015-02-01 2015-02-04
3 B 2015-03-02 2015-03-31
4 B 2015-04-01 2015-04-10
Sample data:
require(data.table)
df <- fread("Name Start Finish
A 2015-01-22 2015-02-01
B 2015-03-02 2015-04-01")
I would like to get a column that has the earliest date in each row from multiple date columns.
My dataset is like this.
df = data.frame( x_date = as.Date( c("2016-1-3", "2016-3-5", "2016-5-5")) , y_date = as.Date( c("2016-2-2", "2016-3-1", "2016-4-4")), z_date = as.Date(c("2016-3-2", "2016-1-1", "2016-7-1")) )
+---+-----------+------------+-----------+
| | x_date | y_date | z_date |
+---+-----------+------------+-----------+
|1 | 2016-01-03 | 2016-02-02 |2016-03-02 |
|2 | 2016-03-05 | 2016-03-01 |2016-01-01 |
|3 | 2016-05-05 | 2016-04-04 |2016-07-01 |
+---+-----------+------------+-----------+
I would like to get something like the following column.
+---+---------------+
| | earliest_date |
+---+---------------+
|1 | 2016-01-03 |
|2 | 2016-01-01 |
|3 | 2016-04-04 |
+---+---------------+
This is my code, but it outputs the single earliest date across all columns and rows:
library(dplyr)
df %>% dplyr::mutate(earliest_date = min(x_date, y_date, z_date))
One option is pmin
df %>%
mutate(earliest_date = pmin(x_date, y_date, z_date))
# x_date y_date z_date earliest_date
#1 2016-01-03 2016-02-02 2016-03-02 2016-01-03
#2 2016-03-05 2016-03-01 2016-01-01 2016-01-01
#3 2016-05-05 2016-04-04 2016-07-01 2016-04-04
If we need only the single column, then transmute is the option
df %>%
transmute(earliest_date = pmin(x_date, y_date,z_date))
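The difference between min and pmin, and the effect of missing values, can be seen on a small example. The vectors below are made up for illustration:

```r
x <- as.Date(c("2016-01-03", "2016-03-05", NA))
y <- as.Date(c("2016-02-02", "2016-03-01", "2016-04-04"))

# min() collapses all of its arguments into one overall minimum
overall <- min(x, y, na.rm = TRUE)
print(overall)

# pmin() works element by element, returning one value per position;
# by default a missing value in any input gives NA at that position
elementwise <- pmin(x, y)
print(elementwise)

# With na.rm = TRUE a missing value is skipped instead of propagated
rowwise <- pmin(x, y, na.rm = TRUE)
print(rowwise)
```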
If you instead want the earliest date in each column, with one output row per original column, reshape the data to long format first:
library(reshape2)
melt(df) %>% group_by(variable) %>% summarize(earliest_date = min(value))
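You can apply min row-wise with apply. Note that apply first coerces the data frame to a character matrix, so the dates are compared as strings (this works because yyyy-mm-dd strings sort chronologically) and the result is character rather than Date: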
apply(df, 1, min)
#[1] "2016-01-03" "2016-01-01" "2016-04-04"
Or you can also use pmin with do.call
do.call(pmin, df)
#[1] "2016-01-03" "2016-01-01" "2016-04-04"