R Regex for Positive Look-Around to Match Following


I have a data frame in R. I want to keep a row if "woman" is
the first word in a sentence, or
the second word in a sentence, or
the third word in a sentence and preceded by the word "no," "not," or "never."
phrases_with_woman <- structure(list(phrase = c("woman get degree", "woman obtain justice",
"session woman vote for member", "woman have to end", "woman have no existence",
"woman lose right", "woman be much", "woman mix at dance", "woman vote as member",
"woman have power", "woman act only", "she be woman", "no committee woman passed vote")), row.names = c(NA,
-13L), class = "data.frame")
In the above example, I want to be able to match with all rows except for "she be woman."
This is my code so far. I have a positive look-around ("(?<=woman\\s)\\w+") that seems to be on the right track, but it matches too many preceding words. I tried using {1} to match just one preceding word, but this syntax didn't work.
matches <- phrases_with_woman %>%
filter(str_detect(phrase, "^woman|(?<=woman\\s)\\w+"))
Help is appreciated.

Each condition can be written as one alternative in the pattern. The last condition needs two alternatives, assuming that no/not/never can be either the first or the second word.
library(dplyr)
pat <- "^(woman|\\w+ woman|\\w+ (no|not|never) woman|(no|not|never) \\w+ woman)\\b"
phrases_with_woman %>%
filter(grepl(pat, phrase))
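To see which rows the pattern keeps, it can be checked against the question's phrases directly. This is only a quick sketch in base R; the vector name phrases and the variables keep and dropped are introduced here for illustration.

```r
# The 13 phrases from the question, as a plain character vector
phrases <- c("woman get degree", "woman obtain justice",
             "session woman vote for member", "woman have to end",
             "woman have no existence", "woman lose right", "woman be much",
             "woman mix at dance", "woman vote as member", "woman have power",
             "woman act only", "she be woman", "no committee woman passed vote")

# Same pattern as in the answer above
pat <- "^(woman|\\w+ woman|\\w+ (no|not|never) woman|(no|not|never) \\w+ woman)\\b"

keep <- grepl(pat, phrases)   # TRUE for rows that satisfy one of the alternatives
dropped <- phrases[!keep]     # rows that would be filtered out
print(dropped)
```

Only "she be woman" should end up in dropped, which is the row the question wants excluded.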

I haven't come up with a regex solution but here is a workaround.
library(dplyr)
library(stringr)
phrases_with_woman %>%
filter(str_detect(word(phrase, 1, 2), "\\bwoman\\b") |
(word(phrase, 3) == "woman" & str_detect(word(phrase, 1, 2), "\\b(no|not|never)\\b")))
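The same word-position logic can also be sketched without stringr by splitting each phrase on whitespace. This assumes words are separated by spaces only; keep_phrase is a hypothetical helper, and the last test phrase is made up to show a third-word "woman" without a negation.

```r
# Hypothetical helper: TRUE if "woman" is word 1 or 2, or word 3 after no/not/never
keep_phrase <- function(phrase) {
  w <- strsplit(phrase, "\\s+")[[1]]
  in_first_two <- "woman" %in% head(w, 2)
  third_negated <- length(w) >= 3 && w[3] == "woman" &&
    any(w[1:2] %in% c("no", "not", "never"))
  in_first_two || third_negated
}

phrases <- c("woman get degree", "session woman vote for member",
             "she be woman", "no committee woman passed vote",
             "big old woman sang")  # last phrase is invented for the demo

keep <- vapply(phrases, keep_phrase, logical(1), USE.NAMES = FALSE)
print(data.frame(phrase = phrases, keep = keep))
```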
# phrase
# 1 woman get degree
# 2 woman obtain justice
# 3 session woman vote for member
# 4 woman have to end
# 5 woman have no existence
# 6 woman lose right
# 7 woman be much
# 8 woman mix at dance
# 9 woman vote as member
# 10 woman have power
# 11 woman act only
# 12 no committee woman passed vote

Related

R - If column contains a string from vector, append flag into another column

My Data
I have a vector of words, like the below. This is an oversimplification, my real vector is over 600 words:
myvec <- c("cat", "dog", "bird")
I have a dataframe with the below structure:
structure(list(id = c(1, 2, 3), onetext= c("cat furry pink british",
"dog cat fight", "bird cat issues"), cop= c("Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014",
"Dogs have soft fur and tails so do cats Do cats like to chase their tails",
"A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
), text3 = c("On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.",
"there are many fights going on and this is just an example text",
"Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L))
My issue
For each keyword in my vector myvec, I need to check the columns onetext, cop, and text3. If I find the keyword in any of those 3 columns, I need to append the keyword to a new column.
My original dataset is quite large (the last column is the longest), so doing multiple nested loops (which is what I tried) is not ideal.
EDIT: Note that as long as the word appears once in that row, that's enough and should be listed. All keywords should be listed.
How could I do this? I'm using tidyverse, so my dataset is actually a tibble.
Similar Posts (but not quite)
The following posts are somewhat similar, but not quite:
If Column Contains String then enter value for that row
R Column Check if Contains Value from Another Column
Add new column if range of columns contains string in R

Update:
If a list is preferred: Using str_extract_all:
df %>%
transmute(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}"))
gives:
new_colonetext new_colcop new_coltext3
<list> <list> <list>
1 <chr [1]> <NULL> <chr [2]>
2 <chr [2]> <chr [2]> <NULL>
3 <chr [2]> <chr [4]> <chr [5]>
Here is how you could achieve the result:
Create a pattern from the vector.
Use mutate() with across() to check the needed columns.
If the pattern is detected, extract the matches into a new column.
myvec <- c("cat", "dog", "bird")
pattern <- paste(myvec, collapse="|")
library(dplyr)
library(tidyr)
library(stringr)  # str_detect(), str_extract_all()
df %>%
mutate(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) %>%
unite(topic, starts_with('new'), na.rm = TRUE, sep = ',')
id onetext cop text3 topic
<dbl> <chr> <chr> <chr> <chr>
1 1 cat furry pink british Little Grey Cat is the nickname given to a kitten of the British Shorthai~ On October 4th the first single topic blog devoted to the little grey cat was lau~ "cat,NULL,c(\"cat\", \"cat\")"
2 2 dog cat fight Dogs have soft fur and tails so do cats Do cats like to chase their tails there are many fights going on and this is just an example text "c(\"dog\", \"cat\"),c(\"cat\", \"cat\"),~
3 3 bird cat issues A cat and bird can coexist in a home but you will have to take certain me~ Some cats will not care about a pet bird at all while others will make it its lif~ "c(\"bird\", \"cat\"),c(\"cat\", \"bird\"~
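If one comma-separated string of distinct keywords per row is preferred over list columns, a base R sketch along these lines may help. It assumes case-insensitive substring matching (so "cats" counts as "cat") and uses shortened versions of the question's texts; text_cols, combined, and the topic column are names chosen here.

```r
myvec <- c("cat", "dog", "bird")

# Shortened stand-in for the question's data (two text columns only)
df <- data.frame(
  id = 1:3,
  onetext = c("cat furry pink british", "dog cat fight", "bird cat issues"),
  cop = c("Little Grey Cat is the nickname given to a kitten",
          "Dogs have soft fur and tails so do cats",
          "A cat and bird can coexist in a home"),
  stringsAsFactors = FALSE
)

text_cols <- c("onetext", "cop")

# Paste the text columns of each row together and lower-case them once
combined <- tolower(do.call(paste, df[text_cols]))

# For each row, keep every keyword that occurs at least once
df$topic <- vapply(combined, function(txt) {
  hit <- vapply(myvec, grepl, logical(1), x = txt, fixed = TRUE)
  paste(myvec[hit], collapse = ",")
}, character(1), USE.NAMES = FALSE)

print(df[c("id", "topic")])
```

Each keyword is listed once per row, in the order it appears in myvec. With 600 keywords this still loops over keywords per row, so it is a starting point rather than a tuned solution.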

keeping the best string matched by fuzzy matching in R

I have two data frames in R. One (df.word) holds the phrases I want to match, with their synonyms in another column; the other (df.string) holds the strings I want to match, along with codes. The strings are complicated, but to keep it simple, say we have:
df.word <- data.frame(label = c('warm wet', 'warm dry', 'cold wet'),
synonym = c('hot and drizzling\nsunny and raining','sunny and clear sky\ndry sunny day', 'cold winds and raining\nsnowing'))
df.string <- data.frame(day = c(1,2,3,4),
weather = c('there would be some drizzling at dawn but we will have a hot day', 'today there are cold winds and a bit of raining or snowing at night', 'a sunny and clear sky is what we have today', 'a warm dry day'))
I want to create df.string$extract in which I want to have the best match available for the string.
a column like this
df.string$extract <- c('warm wet', 'cold wet', 'warm dry', 'warm dry')
thanks in advance for anyone helping.

There are a few points that I did not quite understand in your question; however, I am proposing a solution for your question. Check whether it will work for you.
I assume that you want to find the best-matching labels for the weather texts. If so, you can use stringsim function from library(stringdist) in the following way.
First Note: If you clean the \n in your data, the result will be more accurate. So, I clean them for this example, but if you want you can keep them.
Second Note: You can change the similarity measure by choosing a different method. Here I used cosine similarity, which is a relatively good starting point. To see the alternative methods, check the function's help page:
?stringsim
The cleaned data is as follows:
df.word <- data.frame(
label = c("warm wet", "warm dry", "cold wet"),
synonym = c(
"hot and drizzling sunny and raining",
"sunny and clear sky dry sunny day",
"cold winds and raining snowing"
)
)
df.string <- data.frame(
day = c(1, 2, 3, 4),
weather = c(
"there would be some drizzling at dawn but we will have a hot day",
"today there are cold winds and a bit of raining or snowing at night",
"a sunny and clear sky is what we have today",
"a warm dry day"
)
)
Install the library and load it
install.packages('stringdist')
library(stringdist)
Create an n x m matrix that contains the similarity score of each weather text against each synonym group. The rows correspond to the weather texts and the columns to the synonym groups.
match.scores <- sapply( ## Create a nested loop with sapply
seq_along(df.word$synonym), ## Loop for each synonym as 'i'
function(i) {
sapply(
seq_along(df.string$weather), ## Loop for each weather as 'j'
function(j) {
stringsim(df.word$synonym[i], df.string$weather[j], ## Check similarity
method = "cosine", ## Method cosine
q = 2 ## Size of the q -gram: 2
)
}
)
}
)
match.scores
[,1] [,2] [,3]
[1,] 0.3657341 0.1919924 0.24629819
[2,] 0.6067799 0.2548236 0.73552828
[3,] 0.3333974 0.6300619 0.21791793
[4,] 0.1460593 0.4485426 0.03688556
Get the best match across each row, that is, for each weather text, find the label with the highest matching score, and add these labels to the data frame.
ranked.match <- apply(match.scores, 1, which.max)
df.string$extract <- df.word$label[ranked.match]
df.string
day weather extract
1 1 there would be some drizzling at dawn but we will have a hot day warm wet
2 2 today there are cold winds and a bit of raining or snowing at night cold wet
3 3 a sunny and clear sky is what we have today warm dry
4 4 a warm dry day warm dry
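Since the best match is always returned even when every score is low, it may be worth keeping the winning score next to the label. The sketch below rebuilds the score matrix printed above by hand (so it runs without stringdist) and uses base R's max.col; the 0.4 cutoff is an arbitrary assumption for illustration, not a recommended value.

```r
# Scores copied from the printed match.scores (rows: weather texts, columns: synonym groups)
match.scores <- matrix(
  c(0.3657341, 0.1919924, 0.24629819,
    0.6067799, 0.2548236, 0.73552828,
    0.3333974, 0.6300619, 0.21791793,
    0.1460593, 0.4485426, 0.03688556),
  nrow = 4, byrow = TRUE
)
labels <- c("warm wet", "warm dry", "cold wet")

# Column index of the highest score in each row
best.col <- max.col(match.scores, ties.method = "first")
# The score that won in each row
best.score <- match.scores[cbind(seq_len(nrow(match.scores)), best.col)]

result <- data.frame(day = 1:4,
                     extract = labels[best.col],
                     score = best.score,
                     stringsAsFactors = FALSE)

# Flag weak matches using an assumed, arbitrary threshold
result$weak <- result$score < 0.4
print(result)
```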

R extract multiple variables from column

I'm new to R so my apologies if this is unclear.
My data contains 1,000 observations of 3 variable columns: (a) person, (b) vignette, (c) response. The vignette column contains demographic information presented in a paragraph, including age (20, 80), sex (male, female), employment (employed, not employed, retired), etc. Each person received a vignette that randomly presented one of the values for age (20 or 80), sex (male or female), employment (employed, not employed, retired), etc.
(e.g., Person #1 received: A(n) 20 year old male is unemployed. Person #2 received: A(n) 80 year old female is retired. Person #3 received: A(n) 20 year old male is unemployed... Person #1,000 received: A(n) 20 year old female is employed.)
I'm trying to use tidyr::extract on (b) vignette to extract the demographic information and create several new variable columns labeled "age", "sex", "employment", etc. So far, I've only been able to extract "age" using this code:
tidyr::extract(data, vignette, c("age"), "([20:80]+)")
I want to extract all of the demographic information and create variable columns for (b) age, (c) sex, (d) employment, etc. My goal is to have 1,000 observation rows with several variable columns like this:
(a) person, (b) age, (c) sex, (d) employment (e) response
Person #1 20 Male unemployed Very Likely
Person #2 80 Female retired Somewhat Likely
Person #3 20 Male unemployed Very Unlikely
...
Person #1,000 20 Female employed Neither Likely nor Unlikely
Vignette Example:
structure(list(Response_ID = "R_86Tm81WUuyFBZhH", Vignette = "A(n) 18 year-old Hispanic woman uses heroin several times a week. This person is receiving welfare, is employed and has no previous criminal conviction for drug possession. - Based on this description, how likely or unlikely is it that this person has a drug addiction?", Response = "Very Likely"), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
I appreciate any guidance or help!

I made up some regexes to pull out your info. Experience shows that you're going to spend many hours tweaking the regexes before you get anything reasonably satisfactory. E.g., you won't pull the employment status correctly out of a sentence like "Neither she nor her boyfriend are employed".
library(dplyr)   # %>%, select()
library(tibble)  # add_row()
library(tidyr)   # extract()
raw <- structure(list(Response_ID = "R_86Tm81WUuyFBZhH",
Vignette = "A(n) 18 year-old Hispanic woman uses heroin several times a week. This person is receiving welfare, is employed and has no previous criminal conviction for drug possession. - Based on this description, how likely or unlikely is it that this person has a drug addiction?",
Response = "Very Likely"), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
raw2 <- raw %>%
add_row(Response_ID = "R_xesrew",
Vignette = "A 22 year-old White boy drinks bleach. He is unemployed",
Response = "Unlikely")
rzlt <- raw2 %>%
tidyr::extract(Vignette, "Age", "(?ix) (\\d+) \\s* year\\-old", remove = FALSE) %>%
tidyr::extract(Vignette, "Race", "(?ix) (hispanic|white|asian|black|native \\s* american)", remove = FALSE) %>%
tidyr::extract(Vignette, "Job", "(?ix) (not \\s+ employed|unemployed|employed|jobless)", remove = FALSE) %>%
tidyr::extract(Vignette, "Sex", "(?ix) (female|male|woman|man|boy|girl)", remove = FALSE) %>%
select(- Vignette)
Gives
# A tibble: 2 x 6
Response_ID Sex Job Race Age Response
<chr> <chr> <chr> <chr> <chr> <chr>
1 R_86Tm81WUuyFBZhH woman employed Hispanic 18 Very Likely
2 R_xesrew boy unemployed White 22 Unlikely
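For comparison, the same kind of extraction can be sketched in base R with regexpr() and regmatches(). This is an illustrative variant, not the answer's method: first_match is a hypothetical helper, the vignettes are shortened from the examples above, and the word boundaries are an added assumption to avoid matching inside longer words.

```r
vignettes <- c(
  "A(n) 18 year-old Hispanic woman uses heroin several times a week. This person is receiving welfare, is employed and has no previous criminal conviction.",
  "A 22 year-old White boy drinks bleach. He is unemployed"
)

# Hypothetical helper: first match of `pattern` in each string, NA when there is none
first_match <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE, ignore.case = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

# Digits immediately followed by "year-old" (lookahead keeps only the number)
age <- first_match("\\d+(?=\\s*year-old)", vignettes)
# Longer alternatives come first so "unemployed" is not reported as "employed"
job <- first_match("\\b(not\\s+employed|unemployed|employed|jobless)\\b", vignettes)

print(data.frame(age = age, job = job))
```

Returning NA for vignettes with no match makes it easier to spot texts the patterns fail on.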
Save your work
library(readr)
write_csv(rzlt, "myResponses.csv")
Alternatively
library(openxlsx)
openxlsx::write.xlsx(rzlt, "myResponses.xlsx", asTable = TRUE)

better and easy way to find who spoke top 10 anger words from conversation text

I have a dataframe that contains variable 'AgentID', 'Type', 'Date', and 'Text' and a subset is as follows:
structure(list(AgentID = c("AA0101", "AA0101", "AA0101", "AA0101",
"AA0101"), Type = c("PS", "PS", "PS", "PS", "PS"), Date = c("4/1/2019", "4/1/2019", "4/1/2019", "4/1/2019", "4/1/2019"), Text = c("I am on social security XXXX and I understand it can not be garnished by Paypal credit because it's federally protected.I owe paypal {$3600.00} I would like them to cancel this please.",
"My XXXX account is being reported late 6 times for XXXX per each loan I was under the impression that I was paying one loan but it's split into three so one payment = 3 or one missed payment would be three missed on my credit,. \n\nMy account is being reported wrong by all credit bureaus because I was in forbearance at the time that these late payments have been reported Section 623 ( a ) ( 2 ) States : If at any time a person who regularly and in the ordinary course of business furnishes information to one or more CRAs determines that the information provided is not complete or accurate, the furnisher must promptly provide complete and accurate information to the CRA. In addition, the furnisher must notify all CRAs that received the information of any corrections, and must thereafter report only the complete and accurate information. \n\nIn this case, I was in forbearance during that tie and document attached proves this. By law, credit need to be reported as of this time with all information and documentation",
"A few weeks ago I started to care for my credit and trying to build it up since I have never used my credit in the past, while checking my I discover some derogatory remarks in my XXXX credit report stating the amount owed of {$1900.00} to XXXX from XX/XX/2015 and another one owed to XXXX for {$1700.00} I would like to address this immediately and either pay off this debt or get this negative remark remove from my report.",
"I disputed this XXXX account with all three credit bureaus, the reported that it was closed in XXXX, now its reflecting closed XXXX once I paid the {$120.00} which I dont believe I owed this amount since it was an fee for a company trying to take money out of my account without my permission, I was charged the fee and my account was closed. I have notified all 3 bureaus to have this removed but they keep saying its correct. One bureau is showing XXXX closed and the other on shows XXXX according to XXXX XXXX, XXXX shows a XXXX, this account has been on my report for seven years",
"On XX/XX/XXXX I went on XXXX XXXX and noticed my score had gone down, went to check out why and seen something from XXXX XXXX and enhanced recovery company ... I also seen that it had come from XXXX and XXXX dated XX/XX/XXXX, XX/XX/XXXX, and XX/XX/XXXX ... I didnt have neither one before, I called and it the rep said it had come from an address Im XXXX XXXX, Florida I have never lived in Florida ever ... .I have also never had XXXX XXXX nor XXXX XXXX ... I need this taken off because it if affecting my credit score ... This is obviously identify theft and fraud..I have never received bills from here which proves that is was not done by me, I havent received any notifications ... if it was not for me checking my score I wouldnt have known nothing of this" )), row.names = c(NA, 5L), class = "data.frame")
First, I found out the top 10 anger words using the following:
library(tm)
library(tidytext)
library(tidyverse)
library(sentimentr)
library(wordcloud)
library(ggplot2)
CS <- function(txt){
MC <- Corpus(VectorSource(txt))
SW <- stopwords('english')
MC <- tm_map(MC, tolower)
MC<- tm_map(MC,removePunctuation)
MC <- tm_map(MC, removeNumbers)
MC <- tm_map(MC, removeWords, SW)
MC <- tm_map(MC, stripWhitespace)
myTDM <- as.matrix(TermDocumentMatrix(MC))
v <- sort(rowSums(myTDM), decreasing=TRUE)
FM <- data.frame(word = names(v), freq=v)
row.names(FM) <- NULL
FM <- FM %>%
mutate(word = tolower(word)) %>%
filter(str_count(word, "x") <= 1)
return(FM)
}
DF <- CS(df$Text)
# using nrc
nrc <- get_sentiments("nrc")
# create final dataset
DF_nrc = DF %>% inner_join(nrc)
And then I created a vector of the top 10 anger words as follows:
TAW <- DF_nrc %>%
filter(sentiment=="anger") %>%
group_by(word) %>%
summarize(freq = mean(freq)) %>%
arrange(desc(freq)) %>%
top_n(10) %>%
select(word)
Next, I want to find which agents spoke these words most frequently and rank them, but I am not sure how to do that. Should I search for the words one by one and group the results by agent, or is there a better way? The result I am looking for is something like the following:
AgentID Words_Spoken Rank
A0001 theft, dispute, money 1
A0001 theft, fraud, 2
.......

If you are more of a dplyr/tidyverse person, you can take an approach using some dplyr verbs, after converting your text data to a tidy format.
First, let's set up some example data with several speakers, one of whom speaks no anger words. You can use unnest_tokens() to take care of most of your text cleaning steps with its defaults, such as splitting tokens, removing punctuation, etc. Then remove stopwords using anti_join(). I show using inner_join() to find the anger words as a separate step, but you could join these up into one big pipe if you like.
library(tidyverse)
library(tidytext)
my_df <- tibble(AgentID = c("AA0101", "AA0101", "AA0102", "AA0103"),
Text = c("I want to report a theft and there has been fraud.",
"I have taken great offense when there was theft and also poison. It is distressing.",
"I only experience soft, fluffy, happy feelings.",
"I have a dispute with the hateful scorpion, and also, I would like to report a fraud."))
my_df
#> # A tibble: 4 x 2
#> AgentID Text
#> <chr> <chr>
#> 1 AA0101 I want to report a theft and there has been fraud.
#> 2 AA0101 I have taken great offense when there was theft and also poison.…
#> 3 AA0102 I only experience soft, fluffy, happy feelings.
#> 4 AA0103 I have a dispute with the hateful scorpion, and also, I would li…
tidy_words <- my_df %>%
unnest_tokens(word, Text) %>%
anti_join(get_stopwords())
#> Joining, by = "word"
anger_words <- tidy_words %>%
inner_join(get_sentiments("nrc") %>%
filter(sentiment == "anger"))
#> Joining, by = "word"
anger_words
#> # A tibble: 10 x 3
#> AgentID word sentiment
#> <chr> <chr> <chr>
#> 1 AA0101 theft anger
#> 2 AA0101 fraud anger
#> 3 AA0101 offense anger
#> 4 AA0101 theft anger
#> 5 AA0101 poison anger
#> 6 AA0101 distressing anger
#> 7 AA0103 dispute anger
#> 8 AA0103 hateful anger
#> 9 AA0103 scorpion anger
#> 10 AA0103 fraud anger
Now you know which anger words each person used, and the next step is to count them up and rank people. The dplyr package has fantastic support for exactly this kind of work. First you want to group_by() the person identifier, then calculate a couple of summarized quantities:
the total number of words (so you can arrange by this)
a pasted-together string of the words used
Afterwards, arrange by the number of words and make a new column that gives you the rank.
anger_words %>%
group_by(AgentID) %>%
summarise(TotalWords = n(),
WordsSpoken = paste0(word, collapse = ", ")) %>%
arrange(-TotalWords) %>%
mutate(Rank = row_number())
#> # A tibble: 2 x 4
#> AgentID TotalWords WordsSpoken Rank
#> <chr> <int> <chr> <int>
#> 1 AA0101 6 theft, fraud, offense, theft, poison, distressi… 1
#> 2 AA0103 4 dispute, hateful, scorpion, fraud 2
Do notice that with this approach, you don't have a zero entry for the person who spoke no anger words; they get dropped at the inner_join(). If you want them in the final data set, you will likely need to join back up with an earlier dataset and use replace_na().
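A minimal sketch of that join-back step, assuming the per-agent counts have already been computed: the agents and anger_counts tibbles below are typed in by hand from the example above, and all_agents is a name chosen here.

```r
library(dplyr)
library(tidyr)

# Every agent in the original data, including the one with no anger words
agents <- tibble(AgentID = c("AA0101", "AA0102", "AA0103"))

# Counts as produced by the summarise() step above (typed in by hand here)
anger_counts <- tibble(AgentID = c("AA0101", "AA0103"),
                       TotalWords = c(6L, 4L))

all_agents <- agents %>%
  left_join(anger_counts, by = "AgentID") %>%        # unmatched agents get NA
  mutate(TotalWords = replace_na(TotalWords, 0L)) %>% # turn NA into a zero count
  arrange(desc(TotalWords)) %>%
  mutate(Rank = row_number())

print(all_agents)
```

Agent AA0102 now appears with a count of 0 and the last rank instead of being dropped.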
Created on 2019-09-11 by the reprex package (v0.3.0)

Not the most elegant solution, but here's how you could count the words based on the line number:
library(stringr)
# write a new data.frame retaining the AgentID and Date from the original table
new.data <- data.frame(Agent = df$AgentID, Date = df$Date)
# using a for-loop to go through every row of text in the df provided.
for(i in seq(nrow(new.data))){ # i represent row number of the original df
# build a temporary object (e101) that:
## checks whether the text in row i, df[i, "Text"], contains each TAW$word, using stringr::str_detect
## loops str_detect over TAW$word with sapply, giving one TRUE/FALSE per word
## keeps only the matching words via TAW$word[...]
e101 <- TAW$word[sapply(TAW$word, function(x) str_detect(df[i, "Text"], x))]
# write the number of returned words in e101 as a corresponding value in new data.frame
new.data[i, "number_of_TAW"] <- length(e101)
# concatenate the returned words in e101 as a corresponding value in new data.frame
new.data[i, "Words_Spoken"] <- ifelse(length(e101)==0, "", paste(e101, collapse=","))
}
new.data
# Agent Date number_of_TAW Words_Spoken
# 1 AA0101 4/1/2019 0
# 2 AA0101 4/1/2019 0
# 3 AA0101 4/1/2019 2 derogatory,remove
# 4 AA0101 4/1/2019 3 fee,money,remove
# 5 AA0101 4/1/2019 1 theft

180 nested conditions in a separate file to create a new id variable for each row in the my dataframe

I need to identify 180 short sentences written by experiment participants and assign each sentence a serial number in a new column. I have 180 conditions in a separate file. All the texts are in Hebrew, but I attach examples in English that can be understood.
I'm adding example of seven lines from 180-line experiment data. There are 181 different conditions. Each has its own serial number. So I also add small 6-conditions example that match this participant data:
data_participant <- data.frame("text" = c("I put a binder on a high shelf",
"My friend and me are eating chocolate",
"I wake up with superhero powers",
"Low wooden table with cubes",
"The most handsome man in camopas invites me out",
"My mother tells me she loves me and protects me",
"My laptop drops and breaks"),
"trial" = (1:7) )
data_condition <- data.frame("condition_a" = c("wooden table" , "eating" , "loves",
"binder", "handsome", "superhero"),
"condition_b" = c("cubes", "chocolate", "protects me",
"shelf","campos", "powers"),
"condition_c" = c("0", "0", "0", "0", "me out", "0"),
"i.d." = (1:6) )
I decided to use ifelse function and a nested conditions strategy and to write 181 lines of code. For each condition one line. It's also cumbersome because it requires moving from English to Hebrew. But after 30 lines I started getting an error message:
contextstack overflow
The error is reported at line 147, which means after 33 conditions.
In the example, there are at most 3 keywords per condition, but in the full data there are conditions with 5 or 6 keywords. (The reason for this is the diversity in the participants' verbal formulations.) Therefore, the original table of conditions has 7 columns: one for the i.d. number, and the rest for the keywords that identify the same condition, combined with the "or" operator.
data <- mutate(data, script_id = ifelse((grepl( "wooden table" ,data$imagery))|(grepl( "cubes" ,data$imagery))
,"1",
ifelse((grepl( "eating" ,data$imagery))|(grepl( "chocolate" ,data$imagery))
,"2",
ifelse((grepl( "loves" ,data$imagery))|(grepl( "protect me" ,data$imagery))
,"3",
ifelse((grepl( "binder" ,data$imagery))|(grepl( "shelf" ,data$imagery))
,"4",
ifelse( (grepl("handsome" ,data$imagery)) |(grepl( "campus" ,data$imagery) )|(grepl( "me out" ,data$imagery))
,"5",
ifelse((grepl("superhero", data$imagery)) | (grepl( "powers" , data$imagery ))
,"6",
"181")))))))
# I expect the output will be new column in the participant data frame
# with the corresponding ID number for each text.
# I managed to get it when I made 33 conditions rows. And then I started
# to get an error message contextstack overflow.
final_output <- data.frame("text" = c("I put a binder on a high shelf", "My friend and me are eating chocolate",
"I wake up with superhero powers", "Low wooden table with cubes",
"The most handsome man in camopas invites me out",
"My mother tells me she loves me and protects me",
"My laptop drops and breaks"),
"trial" = (1:7),
"i.d." = c(4, 2, 6, 1, 5, 3, 181) )

Here's an approach using fuzzyjoin::regex_left_join.
library(dplyr)
library(tidyr)  # gather()
data_condition_long <- data_condition %>%
gather(col, text_match, -`i.d.`) %>%
filter(text_match != 0) %>%
arrange(`i.d.`)
data_participant %>%
fuzzyjoin::regex_left_join(data_condition_long %>% select(-col),
by = c("text" = "text_match")) %>%
mutate(`i.d.` = if_else(is.na(`i.d.`), 181L, `i.d.`)) %>%
# if `i.d.` is doubles instead of integers, use this:
# mutate(`i.d.` = if_else(is.na(`i.d.`), 181, `i.d.`)) %>%
group_by(trial) %>%
slice(1) %>%
ungroup() %>%
select(-text_match)
# A tibble: 7 x 3
text trial i.d.
<fct> <int> <int>
1 I put a binder on a high shelf 1 4
2 My friend and me are eating chocolate 2 2
3 I wake up with superhero powers 3 6
4 Low wooden table with cubes 4 1
5 The most handsome man in camopas invites me out 5 5
6 My mother tells me she loves me and protects me 6 3
7 My laptop drops and breaks 7 181
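If installing fuzzyjoin is not an option, the same idea can be sketched in base R by building one "or" pattern per condition and looping over the conditions instead of nesting ifelse() calls. This assumes the keywords contain no regex metacharacters; the id column stands in for i.d., and script_id and patterns are names chosen here.

```r
data_participant <- data.frame(
  text = c("I put a binder on a high shelf",
           "My friend and me are eating chocolate",
           "I wake up with superhero powers",
           "Low wooden table with cubes",
           "The most handsome man in camopas invites me out",
           "My mother tells me she loves me and protects me",
           "My laptop drops and breaks"),
  trial = 1:7, stringsAsFactors = FALSE
)
data_condition <- data.frame(
  condition_a = c("wooden table", "eating", "loves", "binder", "handsome", "superhero"),
  condition_b = c("cubes", "chocolate", "protects me", "shelf", "campos", "powers"),
  condition_c = c("0", "0", "0", "0", "me out", "0"),
  id = 1:6, stringsAsFactors = FALSE
)

# One regex per condition: its keywords joined with "|", skipping the "0" placeholders
keyword_cols <- c("condition_a", "condition_b", "condition_c")
patterns <- apply(data_condition[keyword_cols], 1,
                  function(kw) paste(kw[kw != "0"], collapse = "|"))

# Start with the fallback id, then overwrite where a condition matches.
# Going through the conditions in reverse lets earlier conditions win ties,
# which mirrors the precedence of the nested ifelse() version.
script_id <- rep(181L, nrow(data_participant))
for (k in rev(seq_along(patterns))) {
  hit <- grepl(patterns[k], data_participant$text)
  script_id[hit] <- data_condition$id[k]
}
data_participant$script_id <- script_id
print(data_participant)
```

Because the loop length depends only on the number of rows in the condition table, adding conditions does not deepen any nesting, so the contextstack limit is not involved.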
