Making data the same structure of timestamp - r

My time data are in this format:
datatimedf = data.frame(day_time = c('Apr 2005', '1992', "2004", "Jan 2001", "2015"))
I would like to add Jan in rows which only have year.
How is it possible to make it?
An example of expected output is this:
datatimedf = data.frame(day_time = c('Apr 2005', 'Jan 1992', "Jan 2004", "Jan 2001", "Jan 2015"))
What I have for only one row is this:
x[2,1] <- sub("^", "Jan ", x[2,1])
but how can I make it to the whole dataframe?

Here is a quick way to do it using dplyr:
library(dplyr)
# make sure the column is character rather than factor (the default before R 4.0)
datatimedf$day_time <- as.character(datatimedf$day_time)
datatimedf <- datatimedf %>%
  mutate(day_time = ifelse(nchar(day_time) == 4, paste("Jan", day_time), day_time))
#> day_time
#> 1 Apr 2005
#> 2 Jan 1992
#> 3 Jan 2004
#> 4 Jan 2001
#> 5 Jan 2015
For each row, it checks whether the string is exactly 4 characters long and, if so, prepends "Jan"; otherwise it keeps the original value. The length check is specific to this input (any other 4-character string would also get "Jan" prepended), but it should get you started if you want to make it more generic and able to handle more types of input.
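As one possible way to make the check a little stricter, the sketch below uses base R only and tests for "exactly four digits" with a regular expression instead of the string length. The variable name year_only is just an illustrative choice, and the sketch assumes the column is character, not factor.

```r
datatimedf <- data.frame(
  day_time = c("Apr 2005", "1992", "2004", "Jan 2001", "2015"),
  stringsAsFactors = FALSE
)

# TRUE for rows that consist of exactly four digits (a bare year)
year_only <- grepl("^[0-9]{4}$", datatimedf$day_time)

# prepend "Jan " only to those rows, leave the others untouched
datatimedf$day_time[year_only] <- paste("Jan", datatimedf$day_time[year_only])

datatimedf
```

Because the assignment is done through a logical index, rows that already carry a month are never modified.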

Related

How can I overwrite the following values for a certain number of months?

I am trying to build a function that keeps values constant for a certain number of months (rows) in a time series. I already have a function which keeps values constant as long as the following rows are NAs. I would like to change the function (or write a new one) so that it keeps the following rows constant for a given number of months.
This is my function:
na_locf_until = function(x, n) {
# in time series data, fill in na's until indicated n
l <- cumsum(! is.na(x))
c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > (n+1), 0) + 1]
}
Example:
htm <- data.frame (Date = c("Jan 2001", "Feb 2001", "Mar 2001", "Apr 2001", "May 2001", "Jun 2001", "Jul 2001", "Aug 2001", "Aug 2001"),
prc = c(34,35,38,24,22,18,30,32,38),
buy = c(1, 1, 1, 0, 0, 1, 0, 0, 0),
htm_prc = c(34,34,38,38,22,18,18,32,38))
The binary column buy indicates a buy in Jan 2001. The function should keep, in a new column (or the same one), the value 34 constant for e.g. this month and the next month if the binary variable was 1. I struggle because I do not want the htm_prc value in Feb 2001 to be 35. Column htm_prc shows my desired outcome.
Maybe my function works as an inspiration.
Thanks in advance!!
Perhaps this helps
library(dplyr)
library(data.table)
htm %>%
mutate(grp = rleid(buy|lag(buy))) %>%
group_by(grp) %>%
mutate(grp2 =as.integer(gl(n(), 2, n()))) %>%
group_by(grp2, .add = TRUE) %>%
mutate(htm_prc2 = if(1 %in% buy) first(prc) else prc) %>%
ungroup %>%
select(-grp, -grp2)
Output
# A tibble: 9 × 5
Date prc buy htm_prc htm_prc2
<chr> <dbl> <dbl> <dbl> <dbl>
1 Jan 2001 34 1 34 34
2 Feb 2001 35 1 34 34
3 Mar 2001 38 1 38 38
4 Apr 2001 24 0 38 38
5 May 2001 22 0 22 22
6 Jun 2001 18 1 18 18
7 Jul 2001 30 0 18 18
8 Aug 2001 32 0 32 32
9 Aug 2001 38 0 38 38
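For comparison, here is a minimal base R sketch of the same idea written as an explicit loop. It assumes the rule implied by the desired htm_prc column: when buy is 1, the price of that row is held for n_hold rows in total, and buy signals that fall inside a holding window are ignored. The function name hold_price and the argument n_hold are hypothetical, and the sketch assumes buy contains no NAs.

```r
hold_price <- function(prc, buy, n_hold = 2) {
  out <- prc
  i <- 1
  while (i <= length(prc)) {
    if (buy[i] == 1) {
      # hold the current price for n_hold rows (or until the series ends)
      last <- min(i + n_hold - 1, length(prc))
      out[i:last] <- prc[i]
      # skip the rows inside the holding window, ignoring their buy flags
      i <- last + 1
    } else {
      i <- i + 1
    }
  }
  out
}

prc <- c(34, 35, 38, 24, 22, 18, 30, 32, 38)
buy <- c(1, 1, 1, 0, 0, 1, 0, 0, 0)
htm_prc2 <- hold_price(prc, buy, n_hold = 2)
htm_prc2
```

With n_hold = 2 this reproduces the desired htm_prc column from the question; other values of n_hold extend the holding window accordingly.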

How can I get the middle date of three dates in R?

I have a datatable with three date columns x, y and z and I am trying to create a new column (new_col) that is the middle date of the three dates in each row once ranked from earliest to latest, i.e., I want the date between the min and max date – please see table below:
x              y              z              new_col
1st Jan 2005   4th May 1998   2nd Mar 2009   1st Jan 2005
9th May 2010   14th Feb 2003  9th Jan 2008   9th Jan 2008
7th Sept 2002  8th Dec 2010   23rd May 2012  8th Dec 2010
So, for rows 1, 2, and 3 I would like the dates from column x, z, and y, respectively. How can I go about this in R? I have used pmin and pmax but I can't isolate the date in the middle
Thanks in advance!
The approach below coerces the character date strings to numeric type Date (as there is no arithmetic with character dates), finds the position of the "middle" date in each row, and returns the corresponding character string, which eventually becomes new_col.
This can be implemented using apply() on each row using an appropriate function:
df$new_col <- apply(df, 1L, function(x) x[order(lubridate::dmy(x))][2L])
df
x y z new_col
1 1st Jan 2005 4th May 1998 2nd Mar 2009 1st Jan 2005
2 9th May 2010 14th Feb 2003 9th Jan 2008 9th Jan 2008
3 7th Sept 2002 8th Dec 2010 23rd May 2012 8th Dec 2010
Note
This returns the expected result. new_col is a character date string.
However, if the OP intends to continue working with type Date, e.g. doing more arithmetic, I recommend following Ben's example: coerce the whole data.frame to type Date and stick to it.
First make sure all your dates are "Date" type, you can use dmy from lubridate for this (assumes your data frame is called df):
library(lubridate)
df[] <- lapply(df, dmy)
Next, sort each row in chronological order, and take the middle column (column 2) to be the new_col:
df$new_col <- as.Date(t(apply(df, 1, sort))[,2])
Finally, if you want the result to be displayed in same text format (e.g., "1st Jan 2005" instead of "2005-01-01") then you can use a custom function based on this answer:
library(dplyr)
date_to_text <- function(dates){
dayy <- day(dates)
suff <- case_when(dayy %in% c(11,12,13) ~ "th",
dayy %% 10 == 1 ~ 'st',
dayy %% 10 == 2 ~ 'nd',
dayy %% 10 == 3 ~'rd',
TRUE ~ "th")
paste0(dayy, suff, " ", format(dates, "%b %Y"))
}
df[] <- lapply(df, date_to_text)
Output
x y z new_col
1 1st Jan 2005 4th May 1998 2nd Mar 2009 1st Jan 2005
2 9th May 2010 14th Feb 2003 9th Jan 2008 9th Jan 2008
3 7th Sep 2002 8th Dec 2010 23rd May 2012 8th Dec 2010
Data
df <- structure(list(x = c("1st Jan 2005", "9th May 2010", "7th Sept 2002"
), y = c("4th May 1998", "14th Feb 2003", "8th Dec 2010"), z = c("2nd Mar 2009",
"9th Jan 2008", "23rd May 2012")), class = "data.frame", row.names = c(NA,
-3L))
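Since the question mentions pmin and pmax, here is a hedged sketch of a vectorised alternative that avoids apply(): for three values, the middle one equals the sum of all three minus the smallest and the largest. To keep the sketch free of parsing concerns it assumes the three columns are already of type Date (written here as ISO strings); the variable names nx, ny, nz and middle are illustrative.

```r
x <- as.Date(c("2005-01-01", "2010-05-09", "2002-09-07"))
y <- as.Date(c("1998-05-04", "2003-02-14", "2010-12-08"))
z <- as.Date(c("2009-03-02", "2008-01-09", "2012-05-23"))

# work on the underlying day counts, because Dates cannot be added to each other
nx <- as.numeric(x)
ny <- as.numeric(y)
nz <- as.numeric(z)

# middle = total - min - max, computed row by row without a loop
middle <- as.Date(nx + ny + nz - pmin(nx, ny, nz) - pmax(nx, ny, nz),
                  origin = "1970-01-01")
middle
```

This only works for exactly three columns; for more columns, sorting each row as in the answers above is the more general route.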

How to split an existing column into three and then append it to the data set?

Good morning, everyone! I am a beginner with R, and I was given an assignment. It looks as follows:
"Split the “eventdate” column in the ACLED dataset into separate day, month, and year columns that are then appended to the ACLED dataset."
We are working with strsplit() and paste(), but I suspect this is not enough.
The eventdate column in ACLED looks like this:
"01 August 2020"
I was trying to do it like this:
strsplit(brazil_acled$event_date, " ")
and then use the paste() function to append it. But I still do not understand how to create three columns out of splitting text in the existing data set.
I am really new with R and I am with students that are advanced. I am having a difficult time, and I appreciate any help.
Thank you!
Note that I need to do this without using loops.
Splitting the dates gives you a list that you want to rbind first before appending to the data frame.
S <- strsplit(as.character(ACLED$event_date), " ")
S
# [[1]]
# [1] "18" "September" "2020"
#
# [[2]]
# [1] "19" "September" "2020"
#
# [[3]]
# [1] "20" "September" "2020"
#
# [[4]]
# [1] "21" "September" "2020"
#
# [[5]]
# [1] "22" "September" "2020"
#
# [[6]]
# [1] "23" "September" "2020"
For rbinding multiple elements together we need do.call(rbind, ...) and perhaps want to assign appropriate column names.
R <- do.call(rbind, S)
colnames(R) <- c("day", "month", "year")
R
# day month year
# [1,] "18" "September" "2020"
# [2,] "19" "September" "2020"
# [3,] "20" "September" "2020"
# [4,] "21" "September" "2020"
# [5,] "22" "September" "2020"
# [6,] "23" "September" "2020"
Finally just cbind the result to the original data frame.
ACLED <- cbind(ACLED, R)
ACLED
# event_date sth_else day month year
# 1 18 September 2020 1.3709584 18 September 2020
# 2 19 September 2020 -0.5646982 19 September 2020
# 3 20 September 2020 0.3631284 20 September 2020
# 4 21 September 2020 0.6328626 21 September 2020
# 5 22 September 2020 0.4042683 22 September 2020
# 6 23 September 2020 -0.1061245 23 September 2020
You may also do this in one single step.
cbind(ACLED, `colnames<-`(do.call(rbind, strsplit(ACLED$event_date, " ")),
c("day", "month", "year")))
Note:
Maybe you need the split date parts as integers. In this case you may modify R in the following way before cbinding it to ACLED.
R[,2] <- match(R[,2], month.name) ## using constant built into R
mode(R) <- "integer"
R
# day month year
# [1,] 18 9 2020
# [2,] 19 9 2020
# [3,] 20 9 2020
# [4,] 21 9 2020
# [5,] 22 9 2020
# [6,] 23 9 2020
Example data:
ACLED <- structure(list(event_date = c("18 September 2020", "19 September 2020",
"20 September 2020", "21 September 2020", "22 September 2020",
"23 September 2020"), sth_else = c(1.37095844714667, -0.564698171396089,
0.363128411337339, 0.63286260496104, 0.404268323140999, -0.106124516091484
)), class = "data.frame", row.names = c(NA, -6L))
Sorry, paste is of no use here, since it is more or less the opposite of strsplit. Here is another possibility using strsplit that you may find clearer.
You had the basics right, and you recognized what you needed: some tool to take the three results from strsplit and put them in the right columns.
May I suggest sapply. Although it is not exactly simple, you can quickly see that "[", 1 is R's way of saying "grab the first piece" (and "[", n grabs the nth).
# fake data
brazil_acled <- structure(list(event_date = c("18 September 2020", "19 September 2020",
"20 September 2020", "21 September 2020", "22 September 2020",
"23 September 2020")), class = "data.frame", row.names = c(NA, -6L))
brazil_acled
#> event_date
#> 1 18 September 2020
#> 2 19 September 2020
#> 3 20 September 2020
#> 4 21 September 2020
#> 5 22 September 2020
#> 6 23 September 2020
brazil_acled$day <- sapply(strsplit(brazil_acled$event_date, " "), "[", 1)
brazil_acled$month <- sapply(strsplit(brazil_acled$event_date, " "), "[", 2)
brazil_acled$year <- sapply(strsplit(brazil_acled$event_date, " "), "[", 3)
brazil_acled
#> event_date day month year
#> 1 18 September 2020 18 September 2020
#> 2 19 September 2020 19 September 2020
#> 3 20 September 2020 20 September 2020
#> 4 21 September 2020 21 September 2020
#> 5 22 September 2020 22 September 2020
#> 6 23 September 2020 23 September 2020
paste would be useful to paste them back together in a different order...
brazil_acled$YMD <- paste(brazil_acled$year, brazil_acled$month, brazil_acled$day, sep = "-")
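A small variation on the sapply idea, sketched under the assumption that every date has exactly three space-separated parts: split the column only once and use vapply, which fails with an error if a piece is not a single character string. The helper name pick is hypothetical.

```r
brazil_acled <- data.frame(
  event_date = c("18 September 2020", "19 September 2020", "20 September 2020"),
  stringsAsFactors = FALSE
)

# split once and reuse the result for all three columns
parts <- strsplit(brazil_acled$event_date, " ", fixed = TRUE)

# pick(i) returns the i-th piece of every split date as a character vector
pick <- function(i) vapply(parts, function(p) p[i], character(1))

brazil_acled$day   <- pick(1)
brazil_acled$month <- pick(2)
brazil_acled$year  <- pick(3)
brazil_acled
```

No explicit loop is written here; vapply does the iteration internally, just as sapply does.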

Removing String from the column in R

After executing the R code, the values I got in the column of dataframe are:
25 July 2012 bet
22 June 2015 bet
09 April 2015 be
14 November 2016
I want only the dates, How can I remove "bet", "be" from the values?
I am using the below code to extract the above values from the text document:
coalesce((substr((stringr::str_match(text, "ISDA Master Agreement dated as of (.) ")[, 2]),1,16)),(substr((stringr::str_match(text, "ISDA Master Agreement dated as of (.) ")[, 2]),1,13)))
If I swap the coalesce arguments, then the 4th value gets truncated.
I am OK with the code, but while cleaning, how should I remove the "bet" and "be"?
I am far away from being a regex expert, but here goes a tidyverse way of doing what you want:
library(tidyverse, verbose = F)
df <- tibble::tribble(
~V1, ~V2,
1L, "25 July 2012 bet",
2L, "22 June 2015 bet",
3L, "09 April 2015 be",
4L, "14 November 2016"
)
df %>%
mutate(V2 = str_replace(V2, pattern = "[:space:]be.*", replacement = ""))
#> # A tibble: 4 x 2
#> V1 V2
#> <int> <chr>
#> 1 1 25 July 2012
#> 2 2 22 June 2015
#> 3 3 09 April 2015
#> 4 4 14 November 2016
Created on 2020-02-21 by the reprex package (v0.3.0)
We can use sub to remove the whitespace followed by "be" and everything after it:
sub("\\s+be.*", "", c("25 July 2012 bet", "09 April 2015 be"))
#[1] "25 July 2012" "09 April 2015"
If you use lubridate you can strip away the excess text after the date:
library(lubridate)
test_strings <- c("25 July 2012 bet", "09 April 2015 be")
dmy(test_strings)
#[1] "2012-07-25" "2015-04-09"
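Another option, sketched here in base R, is to keep the part that looks like a date rather than remove the part that does not. The pattern below assumes every value starts with "day month-name year" (one or two digits, a word, four digits); that layout is an assumption based on the four sample values.

```r
x <- c("25 July 2012 bet", "22 June 2015 bet", "09 April 2015 be", "14 November 2016")

# regexpr() locates the first match, regmatches() extracts it
dates <- regmatches(x, regexpr("^[0-9]{1,2} [A-Za-z]+ [0-9]{4}", x))
dates
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input if some values do not follow the assumed layout.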

change of unit of analysis for panel data in R

I am looking at foreign powers intervening in civil wars, using RStudio. The unit of analysis in my first dataset is the conflict-year, while in the second it is the conflict-month. I need both at the conflict-year level so that I can merge them.
Is there any command that allows you to do the opposite of expanding rows?
It's hard to give you specifics without a sample of your data so we know what the structure is. I'm assuming your month-level dataset stores the month as a character string that includes a year. You should be able to extract the year with separate from the tidyr package:
library(tidyverse)
month <- c("June 2015", "July 2015", "September 2016", "August 2016", "March 2014")
conflict <- c("A", "B", "C", "D", "E")
my.data <- data.frame(month, conflict)
my.data
month conflict
1 June 2015 A
2 July 2015 B
3 September 2016 C
4 August 2016 D
5 March 2014 E
my.data <- my.data %>%
separate(month, c("month", "year"), sep = " ")
my.data
month year conflict
1 June 2015 A
2 July 2015 B
3 September 2016 C
4 August 2016 D
5 March 2014 E
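Once a year column exists, the month-level rows still have to be collapsed to one row per conflict and year. The sketch below shows one way to do that in base R with aggregate(). The data frame monthly and its events column are hypothetical, since the real data structure is not known; sum is also only an assumption about how the monthly values should be combined.

```r
# hypothetical month-level data: one row per conflict and month
monthly <- data.frame(
  month    = c("June 2015", "July 2015", "March 2016", "August 2016", "September 2016"),
  conflict = c("A", "A", "A", "B", "B"),
  events   = c(1, 2, 4, 5, 1),
  stringsAsFactors = FALSE
)

# drop everything up to the last space, leaving the year
monthly$year <- as.integer(sub("^.* ", "", monthly$month))

# collapse to one row per conflict-year, summing the monthly values
yearly <- aggregate(events ~ conflict + year, data = monthly, FUN = sum)
yearly
```

The resulting conflict-year table can then be joined to the other dataset, for example with merge(yearly, other_data, by = c("conflict", "year")), where other_data stands for the year-level dataset.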
