Read an ASCII file with multiple responses per respondent


I am trying to read some data from the Roper Center into R for analysis. The older data sometimes comes only in ASCII format: a plain file of numbers, sometimes with no spaces or delimiters. Each respondent also spans several rows. Here is an example:
0001 01 06722121 101632 3113581R50106 050110M323
0001 0202089917300208991744 100154109020B73013.22 1O
0001 039049MON FEB 8 1999 05:30pm 1 8 0208991830 6:30PM 05071
0001 04 5 51
0001 052206 32 1 21 111
0001 06 1122223413323 1122160921080711122112 11
0001 0722221205111223241121212220612111111122 21 2222
0002 01 09318035 001582 2123551R00106 0501I333
0002 0202089917320208991746 50074616080B42014.20 1O
0002 039039MON FEB 8 1999 05:31pm 1 8 0208991831 6:31PM 05041
0002 04 2 61
0002 05 206 32 3 11 121
0002 06 1245545554555 1152080614031221121131 11
0002 0752321202112112322112434410722131242122 21 122222
I changed some of the numbers in there, so hopefully I didn't break anything; I think you need a subscription to the Roper Center to get the real data.
I need to extract several elements for each respondent and put them into columns. I'll be doing this many times, so code that only works for this one case is not practical.
I have been using the readr package in R so far, but now that there are many rows per person it's becoming more complicated, and I wondered if anyone knew of a fast way to handle this with an R package or a simple function.
A good example would be to get all of the weights in this sample. They occur in columns 13-15 of the first row for each person.

Cool solution: your files come with a dictionary of fixed widths, right? In that case, use readr::read_fwf
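A minimal sketch of the read_fwf route for the weights example, assuming the layout described in the question (respondent id in columns 1-4, card number in columns 6-7, weight in columns 13-15 of card 01). The toy records and the column names uid, card and weight are made up for illustration; real positions should come from the codebook.

```r
library(readr)

# Invented toy records: id in cols 1-4, card number in cols 6-7,
# and (on card 01 only) the weight in cols 13-15.
toy <- c(
  "0001 01 0672212 101632",
  "0001 02 0208991 730020",
  "0002 01 0931803 500158",
  "0002 02 0208991 732020"
)
path <- tempfile(fileext = ".txt")
writeLines(toy, path)

# Read only the three fields of interest, everything as character first.
cards <- read_fwf(
  path,
  col_positions = fwf_positions(
    start = c(1, 6, 13),
    end   = c(4, 7, 15),
    col_names = c("uid", "card", "weight")
  ),
  col_types = "ccc"
)

# Keep the first card of each respondent, which is where the weight lives.
weights <- cards[cards$card == "01", c("uid", "weight")]
weights$weight <- as.integer(weights$weight)
```

Since read_fwf returns one row per physical line, each card type can be read with its own set of positions and the pieces joined on uid afterwards.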
Ugly solution below. Will probably choke if you have a lot of data, and might (no, will) fail to separate some variables.
x is the path to your ASCII file.
library(dplyr)
library(readr)
library(stringr) # provides str_sub
x <- read_lines(x)
x <- tibble(
  uid = str_sub(x, 1, 4), # careful here, assuming UIDs are 4-length
  txt = str_sub(x, 8) # careful here too
)
# one row per respondent: glue all of their lines together, then split on whitespace
x <- lapply(unique(x$uid), function(y) {
  paste0(x$txt[ x$uid == y], collapse = " ") %>%
    strsplit("\\s+") %>%
    unlist %>%
    matrix(ncol = length(.)) %>%
    as.data.frame
}) %>%
  bind_rows %>%
  write_csv("whatever.csv")
You can now reimport the data with neat variable names and set the correct column types:
x <- read_csv("whatever.csv", skip = 1, # skip the V1, V2, ... header written above
  col_names = c(
    # column names
  ),
  col_types = "cccciiii -- etc.")

Related

Refining my code for data frame extraction from excel

Looking for advice on refining my code and also trimming to a date range.
The spreadsheet itself is pulled from another system and so the structure of the excel cannot be changed. When you pull the data it basically starts at E2, with the first date column in F2, and the first item in E3. The data will continue to populate to the right for as long as it goes on for. I have replicated the structure below.
And I want it to look like:
I have come up with the below, which works, but I was looking for advice on refining it down to fewer individual steps.
In the below code:
#1 = extracting data
#2 = pulling the dates out
#3 = formatting from Excel number to an actual date
#4 = grabbing the item names
#5 = transposing data and skipping some parts
#6 = adding in dates to the row names
library(readxl)
#1
gtb <- data.frame(read_excel("C:/example.xlsx",
  sheet = "Sheet1"))
#2
dfdate <- gtb[1, -c(1,2,3,4,5)]
#3
dfdate <- format(as.Date(as.numeric(dfdate),
  origin = "1899-12-30"), "%d/%m/%Y")
#4
rownames(gtb) <- gtb[,1]
#5
gtb <- as.data.frame(t(gtb[, -c(1,2,3,4,5)]))
#6
rownames(gtb) <- dfdate
After the row names have been added the structure is such that I am happy to start creating the visuals where needed.
thanks for your advice
David

Here is one suggestion, I don't really have easy access to your data, but I am including code to remove those columns as you do, based on their names, which can be nicer than removing by index.
library(dplyr) # %>%, rename, select, mutate
library(tibble)
library(lubridate)
df <- read.table( text=
"Item_Code 01/01/2018 01/02/2018 01/03/2018 01/04/2018
Item 99 51 60 69
Item2 42 47 88 2
Item3 36 81 42 48
",header=TRUE, check.names=FALSE) %>%
  rename( `Item Code` = Item_Code )
x <- df %>% select( -matches("Code \\d|Internal Code") ) %>%
  column_to_rownames("Item Code") %>%
  t %>% as.data.frame %>%
  rownames_to_column("Item Code") %>%
  mutate( `Item Code` = dmy(`Item Code`) )
x
Output:
> x
Item Code Item Item2 Item3
1 2018-01-01 99 42 36
2 2018-02-01 51 47 81
3 2018-03-01 60 88 42
4 2018-04-01 69 2 48
I went back and forth a bit on this solution, but it is worth showing how to remove columns with a regex on their names, since you are removing several similarly named columns.
The t trick, which you also use, works because there is only one other column that would cause problems, as others have commented, and it can be temporarily stowed away as row names. If that weren't the case, you'd be looking at a more complex solution involving pivot_wider and pivot_longer, or splitting the data.frame and transposing only one of the halves.
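A rough sketch of the pivot_longer / pivot_wider route, assuming tidyr 1.0.0 or later and a cut-down copy of the example table above. The intermediate column names date and value are arbitrary choices, not anything from the original spreadsheet.

```r
library(tidyr)

# Cut-down copy of the example table: one row per item, one column per date.
df <- data.frame(
  `Item Code` = c("Item", "Item2", "Item3"),
  `01/01/2018` = c(99, 42, 36),
  `01/02/2018` = c(51, 47, 81),
  check.names = FALSE
)

# Stack the date columns into rows, then spread the items out into columns.
long <- pivot_longer(df, cols = -`Item Code`,
                     names_to = "date", values_to = "value")
wide <- pivot_wider(long, names_from = `Item Code`, values_from = value)

# The column headers were text, so convert them to real dates.
wide$date <- as.Date(wide$date, format = "%d/%m/%Y")
```

Unlike t(), this never passes through a character matrix, so extra identifier columns could be kept alongside `Item Code` without coercing the numbers to text.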

Separate character variable into two columns

I have scraped some data from a URL to analyse cycling results. Unfortunately the name column contains both the rider's name and the team name in one field. I would like to separate the two. Here's the code (the last part doesn't work):
#get url
stradebianchi_2020 <- read_html("https://www.procyclingstats.com/race/strade-bianche/2020/result")
#scrape table
results_2020 <- stradebianchi_2020%>%
html_nodes("td")%>%
html_text()
#transpose scraped data into dataframe
results_stradebianchi_2020 <- as.data.frame(t(matrix(results_2020, 8, byrow = F)))
#rename
names(results_stradebianchi_2020) <- c("rank", "#", "name", "age", "team", "UCI point", "PCS points", "time")
#split rider from team
separate(data = results_stradebianchi_2020, col = name, into = c("left", "right"), sep = " ")
I think the best option is to get the team variable name and use that name to remove it from the 'name' column.
All suggestions are welcome!

I think your request is wrongly formulated. You want to remove team from name.
That's how you should do it in my opinion:
results_stradebianchi_2020 %>%
mutate(name = stringr::str_remove(name, team))
Write this instead of your line with separate.
In this case separate is not an optimal solution for you because the separation character is not clearly defined.
Also, I would advise you to remove the initial blanks from name with stringr::str_trim(name)
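A small sketch of that suggestion on made-up rows (the two riders and the exact way name and team are glued together are assumptions about what the scrape returns). It wraps the team in stringr::fixed() so that the team name is matched literally rather than as a regular expression, which matters if a team name ever contains characters such as ".", "+" or parentheses.

```r
library(stringr)

# Hypothetical rows mimicking the scraped table: `name` has the team appended.
results <- data.frame(
  name = c(" van Aert WoutTeam Jumbo-Visma", "Formolo DavideUAE-Team Emirates"),
  team = c("Team Jumbo-Visma", "UAE-Team Emirates")
)

# Remove each row's own team from its name (literal match), then trim blanks.
results$name <- str_trim(str_remove(results$name, fixed(results$team)))
```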

You could do this in base R with gsub and replace in the name column the pattern of team column with "", i.e. nothing. We use apply() with MARGIN=1 to go through the data frame row by row. Finally we use trimws to clean from whitespace (where we change to whitespace="[\\h\\v]" for better matching the spaces).
res <- transform(results_stradebianchi_2020,
name=trimws(apply(results_stradebianchi_2020, 1, function(x)
gsub(x["team"], "", x["name"])), whitespace="[\\h\\v]"))
head(res)
# rank X. name age team UCI.point PCS.points time
# 1 1 201 van Aert Wout 25 Team Jumbo-Visma 300 200 4:58:564:58:56
# 2 2 234 Formolo Davide 27 UAE-Team Emirates 250 150 0:300:30
# 3 3 87 Schachmann Maximilian 26 BORA - hansgrohe 215 120 0:320:32
# 4 4 111 Bettiol Alberto 26 EF Pro Cycling 175 100 1:311:31
# 5 5 44 Fuglsang Jakob 35 Astana Pro Team 120 90 2:552:55
# 6 6 7 Štybar Zdenek 34 Deceuninck - Quick Step 115 80 3:593:59

How to use tidyr in RStudio to separate a column with numbers and characters?

So I am using tidyr in Rstudio and I am trying to separate the data in the 'player' column (attached below) into 4 individual columns: 'number', 'name','position' and 'school'. I tried using the separate() function, but can't get the number to separate and can't use a str_sub because some numbers are double digits. Does anyone know how to separate this column to the appropriate 4 columns?

A method using a series of separate calls.
# Example data
df <- data.frame(
player = c('11Vita VeaDT | Washington',
'16Clelin FerrellEDGE | Clemson',
"17K'Lavon ChaissonEdge | LSU",
'15Cody FordOT | Oklahoma',
'20Derrius GuiceRB',
'1Joe BurrowQB | LSU'))
The steps are:
1. separate school using |
2. separate number using the boundary between digits and letters
3. separate position using the boundary between lowercase and capital letters, which falls at the end of the name
4. clean up by trimming white space around the text
library(dplyr)
library(tidyr)
df %>%
  separate(player, into = c('player', 'school'), '\\|') %>%
  separate(player, into = c('number', 'player'), '(?<=[0-9])(?=[A-Za-z])') %>%
  separate(player, into = c('name', 'position'), '(?<=[a-z])(?=[A-Z])') %>%
  mutate_if(is.character, trimws)
# Results
  number             name position     school
1     11         Vita Vea       DT Washington
2     16   Clelin Ferrell     EDGE    Clemson
3     17 K'Lavon Chaisson     Edge        LSU
4     15        Cody Ford       OT   Oklahoma
5     20    Derrius Guice       RB       <NA>
6      1       Joe Burrow       QB        LSU

Using paste for Dynamic addition

I have a report that i need to do on a quarterly basis that involves adding various components of revenue together to formulate a trailing 12 month and trailing 24 month total.
rather than retyping a bunch of column names to add each column together on a rolling basis i was hoping to create a function where i could declare variables for the trailing months so i can sum them together easier.
my data frame all_rel contains all the data i need to sum together. it contains the following fields (unfortunately i just inherited this report and it isn't exactly in tidy format)
Total_Processing_Revenue
Ancillary_Revenue
in the data frame i have T24 months of these data points within separate columns
the script that someone put together that i inherited uses the following to add the columns together:
all_rel$anci_rev_cy_ytd = all_rel$X201701Ancillary_Revenue+all_rel$X201702Ancillary_Revenue+all_rel$X201703Ancillary_Revenue+...+all_rel$X201712Ancillary_Revenue
i was hoping to do something with paste but can't seem to get it to work
dfname <- 'all_rel$X'
revmonth1 <- '01'
revmonth2 <- '02'
revmonth3 <- '03'
revmonth4 <- '04'
revmonth5 <- '05'
revmonth6 <- '06'
revmonth7 <- '07'
revmonth8 <- '08'
revmonth9 <- '09'
revmonth10 <- '10'
revmonth11 <- '11'
revmonth12 <- '12'
cy <- '2017'
py <- '2016'
rev1 <- 'Total_Processing_Revenue'
rev2 <- 'Ancillary_Revenue'
all_rel$anci_rev_py_ytd = paste(dfname,py,revmonth1,rev2, sep ='')+paste(dfname,py,revmonth2,rev2, sep ='')+...paste(dfname,py,revmonth12,rev2, sep ='')
when i try to sum these fields together i get a "non-numeric argument to binary operator" error. Is there something else i can do instead of what i've been trying to do?
paste(dfname,py,revmonth1,rev2, sep ='') returns "all_rel$X201601Ancillary_Revenue"
is there a way that I can tell R that the reason why I'm pasting these names is to reference the data within them rather than the text I'm pasting?
i'm fairly new to R (i've been learning on the fly to try to make my life easier).
ultimately i need to figure out how to convert this mess to a tidy data format where each of the revenue columns has a month and year but i was hoping to use this issue to understand how to use substitution logic to better automate processes. Maybe i just worded my searches incorrectly but i was struggling to find the exact issue i'm trying to solve.
Any help is greatly appreciated.
::edit::
added dput(head)
structure(list(Chain = c("000001", "000029", "000060", "000064","000076", "000079"), X201601Net_Revenue = c(-2.92, 25005.14,55787.59, 3996.69, 14229.41, 3455.85),X201601Total_Processing_Revenue = c(0,16140.48, 23238.89, 3574.17, 4093.51, 641.1), X201601Ancillary_Revenue = c(-2.92,8864.66, 32548.7, 422.52, 10135.9, 2814.75), X201602Net_Revenue = c(0,41918.84, 56696.34, 4789.57, 13113.2, 5211.27), X201602Total_Processing_Revenue = c(0,13253.19, 24733.04, 4395.69, 4102.79, 546.68), X201602Ancillary_Revenue = c(0,28665.65, 31963.3, 393.88, 9010.41, 4664.59), X201603Net_Revenue = c(0,23843.76, 62494.51, 5262.87, 20551.79, 7646.75), X201603Total_Processing_Revenue = c(0,15037.39, 27523.19,4792.63,4805.61,2134.72)),.Names=c("Chain","X201601Net_Revenue","X201601Total_Processing_Revenue","X201601Ancillary_Revenue","X201602Net_Revenue","X201602Total_Processing_Revenue","X201602Ancillary_Revenue","X201603Net_Revenue", "X201603Total_Processing_Revenue"), row.names = c(NA,6L), class = "data.frame")

Here's how to tidy your data (calling your data dd):
library(tidyr)
library(dplyr)
gather(dd, key = key, value = value, -Chain) %>%
mutate(year = substr(key, start = 2, 5),
month = substr(key, 6, 7),
metric = substr(key, 8, nchar(key))) %>%
select(-key) %>%
spread(key = metric, value = value)
# Chain year month Ancillary_Revenue Net_Revenue Total_Processing_Revenue
# 1 000001 2016 01 -2.92 -2.92 0.00
# 2 000001 2016 02 0.00 0.00 0.00
# 3 000001 2016 03 NA 0.00 0.00
# 4 000029 2016 01 8864.66 25005.14 16140.48
# 5 000029 2016 02 28665.65 41918.84 13253.19
# 6 000029 2016 03 NA 23843.76 15037.39
# 7 000060 2016 01 32548.70 55787.59 23238.89
# 8 000060 2016 02 31963.30 56696.34 24733.04
# 9 000060 2016 03 NA 62494.51 27523.19
# 10 000064 2016 01 422.52 3996.69 3574.17
# 11 000064 2016 02 393.88 4789.57 4395.69
# 12 000064 2016 03 NA 5262.87 4792.63
# 13 000076 2016 01 10135.90 14229.41 4093.51
# 14 000076 2016 02 9010.41 13113.20 4102.79
# 15 000076 2016 03 NA 20551.79 4805.61
# 16 000079 2016 01 2814.75 3455.85 641.10
# 17 000079 2016 02 4664.59 5211.27 546.68
# 18 000079 2016 03 NA 7646.75 2134.72
With that done, you can use whatever grouped operations you want - sums, rolling sums or averages, etc. You might be interested in the yearmon class provided in the zoo package, this question on rolling sums by group, and of course the R-FAQ on grouped sums.
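For the literal question of turning pasted names into column references, one possible sketch in base R: build the column names as strings and use them to index the data frame, instead of pasting "all_rel$" into the name. The two-row, two-month all_rel below is a cut-down stand-in for the real data, and only the Ancillary_Revenue columns are summed.

```r
# Cut-down stand-in for the real all_rel (values taken from the dput above).
all_rel <- data.frame(
  Chain = c("000001", "000029"),
  X201601Ancillary_Revenue = c(-2.92, 8864.66),
  X201602Ancillary_Revenue = c(0, 28665.65),
  X201601Net_Revenue = c(-2.92, 25005.14)
)

py <- "2016"
months <- sprintf("%02d", 1:2)   # use 1:12 for a full year
rev2 <- "Ancillary_Revenue"

# Character vector of column names, e.g. "X201601Ancillary_Revenue".
cols <- paste0("X", py, months, rev2)

# Indexing with the names returns the columns themselves; rowSums adds them up.
all_rel$anci_rev_py_ytd <- rowSums(all_rel[cols])
```

A single column can be fetched the same way with all_rel[[cols[1]]]. The "non-numeric argument to binary operator" error comes from adding the pasted strings themselves, which are text rather than the columns they name.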

R - how to loop through multiple regex rules instead of an OR statement?

I have a text field containing dates in several different formats, along with their ids. I need to extract all the date strings using regex.
df <- data.frame(id=1:8,text=c("deficit based on wage statement 7/14/ to 7/17/2015.",
"Deficit Due: $1205.73 -$879.63= $326.10 x 70%=$228.2.",
"Deficit Due for 12 wks pd - 7/14/15 thru 10/5/15;Deficit due to wage,",
"statement: 4/22/15 thru 5/12/15,depos transcript 7/10/15 for 7/8/15 depos,",
"difference owed for 4/25/15-5/22/15 10-29-99 Feb. 25, 2009,",
"tpd 4:30:2015 - 5:22:2015--09/26/99, 7-14 1.3.99, 1.3.1999,",
"Medical TREATMENT DATES: 6/30/2015 - 30/06/2015 09/26/1999,",
"4/25/15-5/22/15,Medical 2010-01-29 **2010-01-30 February25,2009, February 25, 2009"))
So far, I have created regex using multiple OR statements.
#Different string patterns
#all day formats
day<-c(31:1,"01","02","03","04","05","06","07","08","09")
day_p<-paste(day,collapse = "|")
day_p <- paste0("(",day_p,")")
#all month formats
month<-c(12:1,"01","02","03","04","05","06","07","08","09")
month_p<-paste(month,collapse="|")
month_p <- paste0("(",month_p,")")
#all year 4 digit formats
year<-"\\d{4}"
year_p<-paste(year,collapse="|")
year_p <- paste0("(",year_p,")")
#all year 2 digit formats
year_i<-"\\d{2}"
year_i_p<-paste(year_i,collapse="|")
year_i_p <- paste0("(",year_i_p,")")
#all separator symbols
symbol_p<-paste(c("\\.","\\|","\\/","\\-","\\:","\\,"),collapse="|")
symbol_p <- paste0("(",symbol_p,")")
patterns<-paste0("(",month_p,symbol_p,day_p,symbol_p,year_p,")","|",
"(",day_p,symbol_p,month_p,symbol_p,year_p,")","|",
"(",year_p,symbol_p,month_p,symbol_p,day_p,")","|",
"(",month_p,symbol_p,day_p,symbol_p,year_i_p,")","|",
"(",day_p,symbol_p,month_p,symbol_p,year_i_p,")","|",
"(",year_i_p,symbol_p,month_p,symbol_p,day_p,")","|",
"(",month_p,"\\-",day_p,")","|",
"(",day_p,"\\-",month_p,")","|",
"(",month_p,"\\/",day_p,")","|",
"(",day_p,"\\/",month_p,")")
#String extraction
extract= str_extract_all(df$text,patterns)
Is there an approach to put all the regex rules in a data frame, name each rule and do a string extraction?
#regex patterns in a data frame
df_patterns<-data.frame(pattern=c(paste0("(",month_p,symbol_p,day_p,symbol_p,year_p,")"),
paste0("(",day_p,symbol_p,month_p,symbol_p,year_p,")")),
rule=c(1,2))
The output data frame should include the extraction values and the rule which triggered its extraction.
#output data frame
output<-data.frame(id=c(1,1,2,3,3),string=c("7/14","7/17/2015",NA,"7/14/15","10/5/15"),rule=c(9,1,NA,2,3))

stringr has a function called str_match_all that can extract all matches as well as return the capture groups that matched in separate columns. This is convenient for this question since you can name the capture groups and associate them to each column of output from str_match_all:
#Different string patterns
#all day formats
day_p <- "[0-3]?[0-9]"
#all month formats
month_p <- "[0-1]?[0-9]"
#all year 4 digit formats
year_p <- "\\d{4}"
#all year 2 digit formats
year_i_p <- "\\d{2}"
#all separator symbols
symbol_p <- "[-/:.]"
# Patterns to match structured as combination of capture groups
patterns<-paste0("(",month_p,symbol_p,day_p,symbol_p,year_p,")","|",
"(",day_p,symbol_p,month_p,symbol_p,year_p,")","|",
"(",year_p,symbol_p,month_p,symbol_p,day_p,")","|",
"(",month_p,symbol_p,day_p,symbol_p,year_i_p,")","|",
"(",day_p,symbol_p,month_p,symbol_p,year_i_p,")","|",
"(",year_i_p,symbol_p,month_p,symbol_p,day_p,")","|",
"(",month_p,"[-]",day_p,")","|",
"(",day_p,"[-]",month_p,")","|",
"(",month_p,"[/]",day_p,")","|",
"(",day_p,"[/]",month_p,")","|",
"(", "\\w+[.]?[\\s]?\\d+[,]\\s?",year_p,")")
# Name the capture groups
rule_names = c("MDYYYY", "DMYYYY",
"YYYYMD", "MDYY",
"DMYY", "YYMD",
"MD_dash", "DM_dash",
"MD_slash", "DM_slash",
"MDYYYY_word")
library(dplyr)
library(tidyr)
library(purrr)
library(stringr) # provides str_match_all
df$text %>%
  str_match_all(patterns) %>%
  map2(df$id, function(x, y){
    # keep ids without any match by adding a single all-NA row
    if(nrow(x) == 0){
      x = rbind(x, NA)
    }
    data.frame(id = y, x)
  }) %>%
  do.call(rbind, .) %>%
  # X1 is the full match; X2:X12 are the eleven capture groups
  mutate_at(vars(X2:X12), ~ ifelse(!is.na(.), 1, NA)) %>%
  setNames(c("id", "string", rule_names)) %>%
  gather(rule, value, -id, -string) %>%
  na.omit() %>%
  select(-value) %>%
  arrange(id)
Notes:
This final part does all the work. str_match_all returns a list with each element a character matrix of matches and capture groups for each df$text value.
map2 binds the id's with the character matrices, so that each row refers to an id + match combination. The if statement checks if an element has no match and rbinds an NA value if it is the case. This allows id to have at least one row to bind to.
mutate_at converts each of the "capture_group" columns to dummy variables indicating whether "this capture group has a match"
Rename capture group columns with rule_names and transform all dummy into one single categorical variable.
Important note is that there is no way of knowing whether "5/6/2015" is MDYYYY or DMYYYY format, so in this case, you will have to order patterns to have one of them take precedence (e.g. if MDYYYY is before DMYYYY in patterns, MDYYYY will match first for "5/6/2015")
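The precedence point can be checked on a two-rule pattern. This is only a sketch with the MDYYYY and DMYYYY groups written out by hand from the building blocks above; the two test strings are chosen for illustration.

```r
library(stringr)

# Two capture groups, MDYYYY first, DMYYYY second.
mdy <- "([0-1]?[0-9][-/:.][0-3]?[0-9][-/:.]\\d{4})"
dmy <- "([0-3]?[0-9][-/:.][0-1]?[0-9][-/:.]\\d{4})"
both <- paste0(mdy, "|", dmy)

# Column 1 is the full match, column 2 the MDYYYY group, column 3 the DMYYYY group.
ambiguous   <- str_match("5/6/2015", both)
unambiguous <- str_match("30/06/2015", both)

ambiguous[1, 2]    # filled: the first alternative wins for an ambiguous date
ambiguous[1, 3]    # NA
unambiguous[1, 3]  # filled: 30 cannot be a month, so only DMYYYY fits
```

Swapping the order in paste0() would send "5/6/2015" to the DMYYYY group instead, which is the only lever available for dates that fit both formats.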
Result:
id string rule
1 1 7/17/2015 MDYYYY
2 1 7/14 MD_slash
3 3 7/14/15 MDYY
4 3 10/5/15 MDYY
5 4 4/22/15 MDYY
6 4 5/12/15 MDYY
7 4 7/10/15 MDYY
8 4 7/8/15 MDYY
9 5 4/25/15 MDYY
10 5 5/22/15 MDYY
11 5 10-29-99 MDYY
12 5 Feb. 25, 2009 MDYYYY_word
13 6 4:30:2015 MDYYYY
14 6 5:22:2015 MDYYYY
15 6 1.3.1999 MDYYYY
16 6 09/26/99 MDYY
17 6 1.3.99 MDYY
18 6 7-14 MD_dash
19 7 6/30/2015 MDYYYY
20 7 09/26/1999 MDYYYY
21 7 30/06/2015 DMYYYY
22 8 2010-01-29 YYYYMD
23 8 2010-01-30 YYYYMD
24 8 4/25/15 MDYY
25 8 5/22/15 MDYY
26 8 February25,2009 MDYYYY_word
27 8 February 25, 2009 MDYYYY_word

Answer
Brief
Correct me if I'm wrong, but I believe R does support PCRE regex (in base R functions such as gregexpr, by passing perl = TRUE). That being the case, you can use the following regex to catch any date in the formats you specified.
Code
See this regex in use here
(?(DEFINE)
(?# Definitions )
(?<day>[12]\d|3[01]|0?[1-9])
(?<month>1[0-2]|0?[1-9])
(?<year>\d+)
(?<separator>[.|\/:,-])
(?# Date formats )
(?<mdy>(?&month)(?<mdy_1>(?&separator))(?&day)(?&mdy_1)(?&year))
(?<dmy>(?&day)(?<dmy_1>(?&separator))(?&month)(?&dmy_1)(?&year))
(?<ymd>(?&year)(?<ymd_1>(?&separator))(?&month)(?&ymd_1)(?&day))
(?<md>(?&month)(?<md_1>(?&separator))(?&day)(?&md_1)?)
(?<dm>(?&day)(?<dm_1>(?&separator))(?&month)(?&dm_1)?)
(?# Date )
(?<date>(?&mdy)|(?&dmy)|(?&ymd)|(?&md)|(?&dm))
)
(?<=\b|\s)(?&date)(?=\b|\s)
Explanation
The define block specifies all our definitions for what constitutes a day, month, year, separator. It also defines our date formats (mdy, dmy, ymd, md, dm). Finally it defines our date group which is a simple OR between all our date formats.
The final regex simply specifies that the preceding or following tokens should be word boundary characters \b or whitespace character \s (whitespace added here in the case of the last character being a word boundary character, it will catch the final character as well - you can test this with the first match by removing the |\s in the final regex to see the result).
Please note that this assumes the days of a month can go to 31 (a more specific check would result in a very lengthy regex and seems pointless when you can validate it through code).
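A cut-down sketch of how such a pattern might be run from R, assuming base R's gregexpr with perl = TRUE. Only the day and month definitions are kept, and only the month/day/four-digit-year format with "/" is matched, so this is not the full regex above; the pattern is built without whitespace because extended mode is not switched on.

```r
# Reusable sub-patterns declared once in a PCRE DEFINE block (never matched directly).
defs <- paste0(
  "(?(DEFINE)",
  "(?<day>[12]\\d|3[01]|0?[1-9])",
  "(?<month>1[0-2]|0?[1-9])",
  ")"
)

# (?&month) and (?&day) call the definitions above as subroutines.
pattern <- paste0(defs, "\\b(?&month)/(?&day)/\\d{4}\\b")

text <- "Medical TREATMENT DATES: 6/30/2015 - 09/26/1999,"
dates <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
dates
```

Every backslash has to be doubled inside an R string, so the full pattern above would need the same treatment before being passed to gregexpr.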
Results
Input
deficit based on wage statement 7/14/ to 7/17/2015.
Deficit Due: $1205.73 -$879.63= $326.10 x 70%=$228.2.
Deficit Due for 12 wks pd - 7/14/15 thru 10/5/15;Deficit due to wage,
statement: 4/22/15 thru 5/12/15,depos transcript 7/10/15 for 7/8/15 depos,
difference owed for 4/25/15-5/22/15 10-29-99 Feb. 25, 2009,
tpd 4:30:2015 - 5:22:2015--09/26/99, 7-14 1.3.99, 1.3.1999,
Medical TREATMENT DATES: 6/30/2015 - 30/06/2015 09/26/1999,
4/25/15-5/22/15,Medical 2010-01-29 **2010-01-30 February25,2009, February 25, 2009
Output
7/14/
7/17/2015
7/14/15
10/5/15
4/22/15
5/12/15
7/10/15
7/8/15
4/25/15
5/22/15
10-29-99
4:30:2015
5:22:2015
09/26/99
7-14
1.3.99
1.3.1999
6/30/2015
30/06/2015
09/26/1999
4/25/15
5/22/15
2010-01-29
2010-01-30
Edits
Code
See this code in use here
(?(DEFINE)
(?# Definitions )
(?<day>[12]\d|3[01]|0?[1-9])
(?<month>1[0-2]|0?[1-9])
(?<year>\d+)
(?<separator>[.|\/:,-])
(?# Date formats )
(?<f_mdy>(?&month)(?<mdy_1>(?&separator))(?&day)(?&mdy_1)(?&year))
(?<f_dmy>(?&day)(?<dmy_1>(?&separator))(?&month)(?&dmy_1)(?&year))
(?<f_ymd>(?&year)(?<ymd_1>(?&separator))(?&month)(?&ymd_1)(?&day))
(?<f_md>(?&month)(?<md_1>(?&separator))(?&day)(?&md_1)?)
(?<f_dm>(?&day)(?<dm_1>(?&separator))(?&month)(?&dm_1)?)
(?<f_Mdy>(?:jan(?:uary|\.)?|feb(?:ruary|\.)?|mar(?:ch|\.)?|apr(?:il|\.)?|may|jun(?:e|\.)?|jul(?:y|\.)?|aug(?:ust|\.)?|sep(?:tember|\.)?|oct(?:ober|\.)?|nov(?:ember|\.)?|dec(?:ember|\.)?)\s*(?&day)(?:\s*(?&separator)|(?&separator)\s*|\s+)(?&year))
)
(?<=\b|\s)(?:(?<mdy>(?&f_mdy))|(?<dmy>(?&f_dmy))|(?<ymd>(?&f_ymd))|(?<md>(?&f_md))|(?<dm>(?&f_dm))|(?<Mdy>(?&f_Mdy)))(?=\b|\s)
This will set captures into named capture groups. If you look at the output in the link, you'll see named groups with the content it matched.
