Problem statement: I'm creating a dynamic application in which users select inputs that are passed into a URL to filter data. A user can select one or several values. I'm using knitr::combine_words(Selected_Input, before = ",", and = "", sep = ",") to get the values single-quoted and comma-separated, but I run into a problem when the user selects a single value (as described below):
#User selecting multiple values
Selected_Input <- c("Apple","Banana","Cherry")
knitr::combine_words(Selected_Input, before = ",", and = "", sep = ",")
Result: 'Apple','Banana','Cherry' which works for my code.
But when the user selects a single value:
#User selecting single value
Selected_Input <- c("Apple")
knitr::combine_words(Selected_Input, before = ",", and = "", sep = ",")
Result: ,Apple, which doesn't work, because the value should be single-quoted.
I'm using this knitr::combine_words call inside paste0 to build a dynamic URL, so I'm looking for an approach that works inside paste0.
If I use cat() inside paste0, the quoted values are printed to the console instead of being inserted into the URL:
vector <- c("apple", "banana", "cherry")
out <- paste(sQuote(vector, FALSE), collapse=", ")
cat(out, "\n")
#> 'apple', 'banana', 'cherry'
cat(toString(sQuote(vector, FALSE)))
paste0("url",cat(toString(sQuote(vector, FALSE))),"url")
Result: 'apple', 'banana', 'cherry'[1] "urlurl"
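For what it's worth, the "urlurl" result above is expected: cat() prints its arguments as a side effect and returns NULL, so paste0 has nothing to insert. A minimal sketch that keeps the quoted values as a string instead (quote_csv is a hypothetical helper name, not part of knitr or base R):

```r
# Hypothetical helper: single-quote each value and join with commas.
# It returns a character string rather than printing it.
quote_csv <- function(x) paste(sQuote(x, FALSE), collapse = ",")

# cat() returns NULL, which is why paste0("url", cat(...), "url") gives "urlurl".
cat_value <- cat("")
is.null(cat_value)  # TRUE

# Because quote_csv() returns a string, it can be used directly inside paste0().
many <- paste0("url", quote_csv(c("Apple", "Banana", "Cherry")), "url")
one  <- paste0("url", quote_csv("Apple"), "url")

many  # "url'Apple','Banana','Cherry'url"
one   # "url'Apple'url"
```

The same function handles one value or several, so no special case is needed for a single selection.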
What about:
fruits <- c("apple", "banana", "cherry")
all_fruit_in_one <- paste0(paste0("'", fruits, "'"), collapse = ", ")
cat(all_fruit_in_one)
Output:
'apple', 'banana', 'cherry'
Another option is sQuote, which the documentation describes as follows:
"Single or double quote text by combining with appropriate single or double left and right quotation marks."
vector <- c("apple", "banana", "cherry")
out <- paste(sQuote(vector, FALSE), collapse=", ")
cat(out, "\n")
#> 'apple', 'banana', 'cherry'
Created on 2022-07-08 by the reprex package (v2.0.1)
I think it was just because of a typo in your code, i.e., it should be before = "'" instead of before = ",".
> Selected_Input <- c("Apple","Banana","Cherry")
> knitr::combine_words(Selected_Input, before = "'", and = "", sep = ",")
'Apple','Banana','Cherry'
> Selected_Input <- c("Apple")
> knitr::combine_words(Selected_Input, before = "'", and = "", sep = ",")
'Apple'
Use sprintf to insert the quotes and then use toString (assuming that comma with a space is acceptable as the separator). Optionally cat or print the result depending on exactly what you want; however, simply entering it into the console will print it.
toString(sprintf("'%s'", fruits))
## [1] "'apple', 'banana', 'cherry'"
toString(sprintf("'%s'", fruits[1]))
## [1] "'apple'"
This can also be expressed in terms of pipes:
fruits |> sprintf(fmt = "'%s'") |> toString()
## [1] "'apple', 'banana', 'cherry'"
Note
The input in reproducible form is assumed to be:
fruits <- c("apple", "banana", "cherry")
Using R in Databricks.
I have the following sample list of possible text entries.
extract <- c("codeine", "tramadol", "fentanyl", "morphine")
I want to check whether more than one of these appears in a string (example below) and return a binary output in a new column.
Example = ("codeine with fentanyl oral")
The output for this example would be 1.
I have tried the following with only partial success:
df$testvar1 <- +(str_count(df$medname, fixed(extract))> 1)
also tried
df$testvar2 <- cSplit_e(df$medname, split.col = "String", sep = " ", type = "factor", mode = "binary", fixed = TRUE, fill = 0)
and also
df$testvar3 <- str_extract_all(df$medname, paste(extract, collapse = " "))
Combine your extract with |.
+(stringr::str_count(Example, paste(extract, collapse = "|"))> 1)
# [1] 1
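To apply the same idea to a whole column, the pattern can be built once and the matches counted per row. The sketch below uses base R (gregexpr plus regmatches) in place of str_count so that it runs without extra packages; the data frame and its medname values are made-up sample data.

```r
extract <- c("codeine", "tramadol", "fentanyl", "morphine")

# Made-up sample data for illustration only.
df <- data.frame(
  medname = c("codeine with fentanyl oral", "morphine tablet", "paracetamol"),
  stringsAsFactors = FALSE
)

# One alternation pattern: codeine|tramadol|fentanyl|morphine
pattern <- paste(extract, collapse = "|")

# Count the matches in each string, then flag rows with more than one match.
n_matches <- lengths(regmatches(df$medname, gregexpr(pattern, df$medname)))
df$testvar <- +(n_matches > 1)

df$testvar  # 1 0 0
```

Note that this counts matches, not distinct drugs, so a string that repeats the same name twice would also be flagged.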
I tried the following and it worked for my code
df$testvar <- sapply(df$medname, function(x) str_extract(x, paste(extract, collapse="|")))
So, I am writing a function that, among many other things, is supposed to keep only the first sentence from each paragraph of a text and preserve the paragraph structure (i.e. each sentence is in its own line). Here is the code that I have so far:
text_shortener <- function(input_text) {
  lapply(input_text, function(x)str_split(x, "\\.", simplify = T)[1])
  first.sentences <- unlist(lapply(input_text, function(x)str_split(x, "\\.", simplify = T)[1]))
  no.spaces <- gsub(pattern = "(?<=[\\s])\\s*|^\\s+|\\s+$", replacement = "", x = first.sentences, perl = TRUE)
  stopwords <- c("the", "really", "truly", "very", "The", "Really", "Truly", "Very")
  x <- unlist(strsplit(no.spaces, " "))
  no.stopwords <- paste(x[!x %in% stopwords], collapse = " ")
  final.text <- gsub(pattern = "(?<=\\w{5})\\w+", replacement = ".", x = no.stopwords, perl=TRUE)
  return(final.text)
}
All of the functions are working as they should, but the one part I can't figure out is how to get the output to print onto separate lines. When I run the function with a vector of text (I was using some text from Moby Dick as a test), this is what I get:
> text_shortener(Moby_Dick)
[1] "Call me Ishma. It is a way I have of drivi. off splee., and regul. circu. This is my subst. for pisto. and ball"
What I want is for the output of this function to look like this:
[1] "Call me Ishma."
[2] "It is a way I have of drivi. off splee., and regul. circu."
[3] "This is my subst. for pisto. and ball"
I am relatively new to R and this is giving me a real headache, so any help would be much appreciated! Thank you!
Looking at your output, it seems like splitting on a period followed by a capital letter is what you need.
You could accomplish that with strsplit() and split the string up like so:
strsplit("Call me Ishma. It is drivi. off splee., and regul. circu. This is my subst. for pisto.","\\. (?=[A-Z])", perl=T)
That finds each place where a period is followed by a space and a capital letter and splits the string there.
Edit: You could add it to the end of your function like so:
text_shortener <- function(input_text) {
  lapply(input_text, function(x)str_split(x, "\\.", simplify = T)[1])
  first.sentences <- unlist(lapply(input_text, function(x)str_split(x, "\\.", simplify = T)[1]))
  no.spaces <- gsub(pattern = "(?<=[\\s])\\s*|^\\s+|\\s+$", replacement = "", x = first.sentences, perl = TRUE)
  stopwords <- c("the", "really", "truly", "very", "The", "Really", "Truly", "Very")
  x <- unlist(strsplit(no.spaces, " "))
  no.stopwords <- paste(x[!x %in% stopwords], collapse = " ")
  trim.text <- gsub(pattern = "(?<=\\w{5})\\w+", replacement = ".", x = no.stopwords, perl=TRUE)
  final.text <- strsplit(trim.text, "\\. (?=[A-Z])", perl=T)
  return(final.text)
}
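Two details may matter here: strsplit() returns a list, and splitting on "\\. " removes the period that ends each sentence. A small variation, offered as a sketch rather than a drop-in fix, splits only on the space (using a lookbehind for the period) and unlists the result so that it is a plain character vector. The input string is the output shown in the question.

```r
trimmed <- "Call me Ishma. It is a way I have of drivi. off splee., and regul. circu. This is my subst. for pisto. and ball"

# Split on a space that is preceded by a period and followed by a capital letter.
# The lookbehind and lookahead are not consumed, so each sentence keeps its period.
sentences <- unlist(strsplit(trimmed, "(?<=\\.) (?=[A-Z])", perl = TRUE))

sentences
# [1] "Call me Ishma."
# [2] "It is a way I have of drivi. off splee., and regul. circu."
# [3] "This is my subst. for pisto. and ball"
```

Like the original pattern, this relies on sentences starting with a capital letter, so a truncated word followed by a capitalised word mid-sentence would also trigger a split.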
I'm trying to split a string by multiple criteria and store the splitting criteria for each split.
I have been trying to use the stringr::str_split function but cannot pass more than one splitting criterion to it.
For example if I have the following string:
data = "Julie (title) : This is the text Julie has: said. Extra sentence one. Extra sentence 2 and so on. Rt Hon Ellen: This is the text Ellen has said in response to Julie. TITLE OF SECTION Julie: More words from Julie."
and splitting criteria:
names = c("Julie:", "Ellen:")
I would like an output like this:
data.frame(Names = c("Julie:", "Ellen:","Julie:"),
text = c(" This is the text Julie has: said. Extra sentence one. Extra sentence 2 and so on. ", "This is the text Ellen has said in response to Julie.","More words from Julie."))
I have seen your comments on coatless's answer and created sample data that probably reflects what you described. One way would be the following. I first create a data frame. I split the string into sentences using unnest_tokens(). Then I split each sentence into speaker and text using separate(). Finally, I replace the NAs with the name of the most recent speaker using na.locf(). I hope this will help you to some extent.
library(tidyverse)
library(tidytext)
library(zoo)
so <- tibble(text = "Ana: I went to school today. I learned text mining. Bob: That is great! Ana: I know what to do: practice.")
unnest_tokens(so, output = sentence,
input = text,
token = "sentences") %>%
separate(col = sentence, into = c("person", "sentence"), sep = ": ",
extra = "merge", fill = "left") %>%
mutate(person = na.locf(person))
# A tibble: 4 x 2
#   person sentence
#   <chr>  <chr>
# 1 ana    i went to school today.
# 2 ana    i learned text mining.
# 3 bob    that is great!
# 4 ana    i know what to do: practice.
Long-winded inefficient Base R solution:
# Store a vector of the names:
text_names <- c("Julie", "Ellen")
# Create a dataframe of the patterns:
pattern_search <- data.frame(name_search = c(paste0(text_names, ":"),
paste0(text_names, " :"),
paste0(text_names, ".* :")),
stringsAsFactors = F)
# Split the text into sentences:
split_text <- data.frame(sentences = trimws(unlist(strsplit(df$Text, "[.]")), "both"), stringsAsFactors = F)
# Extract the names, store them in a vector:
names_in_order <- gsub("[[:punct:]]|\\s+.*",
"",
regmatches(grep(paste0(pattern_search$name_search, collapse = "|"),
split_text$sentences, value = T),
regexpr(paste0(pattern_search$name_search, collapse = "|"),
grep(paste0(pattern_search$name_search, collapse = "|"),
split_text$sentences, value = T))))
# Store a logical vector denoting which elements the names should go:
split_text$who_said_this <- grepl(paste0(pattern_search$name_search, collapse = "|"),
split_text$sentences)
# Replace all occurrences of TRUE with the elements of the vector of names:
split_text$who_said_this[which(split_text$who_said_this == TRUE)] <- names_in_order
# Replace FALSE with NA values:
split_text$who_said_this[which(split_text$who_said_this == "FALSE")] <- NA
# Store a vector whose values denote the number of times dialogue changes between the names:
split_text$speech_group_no <- ave(split_text$who_said_this,
split_text$who_said_this,
FUN = seq.int)
# Apply a function to fill NA values with the non-NA value above it:
split_text <- data.frame(lapply(split_text, function(x){na.omit(x)[cumsum(!is.na(x))]}),
stringsAsFactors = F)
# Row-wise concatenate the dataframe by group:
split_text <- aggregate(list(sentences = c(split_text$sentences)),
list(speech_group_no = paste0(split_text$who_said_this, " - ", split_text$speech_group_no)),
paste0,
sep = ". ")
# Flatten list vector into a character vector and clean up punctuation:
split_text$sentences <- gsub(" [,] ", " ", sapply(split_text$sentences, toString))
# Order the dialogue:
split_text <- split_text[match(split_text$speech_group_no,
paste(names_in_order, ave(names_in_order, names_in_order, FUN = seq.int), sep = " - ")),]
Data:
df <- structure(
list(Text = "Julie (title) : This is the text Julie has: said. Extra sentence one. Extra sentence 2 and so on. Rt Hon Ellen: This is the text Ellen has said in response to Julie. TITLE OF SECTION Julie: More words from Julie."),
class = "data.frame",
row.names = c(NA,-1L)
)
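A shorter base R sketch of the same task, under the assumption that every speaker marker is a known name, optionally followed by a parenthetical such as "(title)", and then a colon. The names txt, speakers, markers and result are illustrative. gregexpr() locates the markers, regmatches() extracts them, and regmatches(..., invert = TRUE) returns the text between them.

```r
txt <- "Julie (title) : This is the text Julie has: said. Extra sentence one. Extra sentence 2 and so on. Rt Hon Ellen: This is the text Ellen has said in response to Julie. TITLE OF SECTION Julie: More words from Julie."
speakers <- c("Julie", "Ellen")

# A name, an optional "(...)" part, optional spaces, then a colon.
pattern <- paste0("(", paste(speakers, collapse = "|"), ")(\\s*\\([^)]*\\))?\\s*:")
m <- gregexpr(pattern, txt, perl = TRUE)

markers <- regmatches(txt, m)[[1]]                 # the matched speaker markers
pieces  <- regmatches(txt, m, invert = TRUE)[[1]]  # the text around the markers
pieces  <- tail(pieces, length(markers))           # keep the text following each marker

result <- data.frame(
  Names = paste0(sub("^(\\w+).*$", "\\1", markers), ":"),  # normalise to "Name:"
  text  = trimws(pieces),
  stringsAsFactors = FALSE
)
result$Names  # "Julie:" "Ellen:" "Julie:"
```

Text that sits directly before a marker, such as "Rt Hon" or "TITLE OF SECTION", stays attached to the end of the previous speaker's text in this sketch and would need separate cleanup.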
I need the resulting string to contain double quotation marks, hence the need for an escape character. Preferably this would be solved in base R, without extra packages.
I have tried sQuote, shQuote and noquote. They only change the quotation marks, not the escape character.
My list:
power <- "test"
myList <- list (
"power" = power)
I subset the content using:
myList
myList$power
Expected outcome (a string with following content):
" \"power\": \"test\" "
Using package glue:
library(glue)
glue(' "{names(myList)}": "{myList}" ')
"power": "test"
Another option using shQuote
paste(shQuote(names(myList), type = "cmd"),
shQuote(unlist(myList), type = "cmd"),
sep = ": ")
# [1] "\"power\": \"test\""
I'm not sure I understand what you expect. Is this what you want?
myList <- list (
"power" = "test"
)
stringr::str_remove_all(
as.character(jsonlite::toJSON(myList, auto_unbox = TRUE)),
"[\\{|\\}]")
# [1] "\"power\":\"test\""
If you want some spaces:
x <- stringr::str_remove_all(
as.character(jsonlite::toJSON(myList, auto_unbox = TRUE)),
"[\\{|\\}]")
paste0(" ", x, " ")
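Since the question asked for base R, here is a minimal sketch with sprintf(). It also illustrates that the backslashes are only how print() displays a string containing double quotes; they are not stored in the string itself, which cat() and nchar() make visible.

```r
myList <- list(power = "test")

# Escape the double quotes inside the format string; %s is filled with name and value.
out <- sprintf(" \"%s\": \"%s\" ", names(myList), unlist(myList, use.names = FALSE))

print(out)      # [1] " \"power\": \"test\" "   (escaped display)
cat(out, "\n")  #  "power": "test"              (actual characters)
nchar(out)      # 17, so the backslashes are not counted
```

For a list with several entries, sprintf() is vectorised and returns one such string per element.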
text = c("Hello abc01","Go to abc02")
value = c(0,1)
symbol=c("abc01","abc02")
df1 = data.frame(text)
df2 = data.frame(symbol,value)
I want to replace the symbols contained as text in df1 with the corresponding value in df2, to obtain: 'Hello 0', 'Go to 1'.
Typically for string-replacement I used gsub(pattern, replacement, x)
Ex: If I want to replace "abc01" and "abc02" with "OK":
df1 = apply(df1,2,function(x) gsub("abc[0-9]{2}","OK",x))
My idea is to use a function as the replacement:
apply(df1, 2, function(x) gsub("(abc)", Support(KKK), x))
in which I would do the substitution, but I don't know how to pass the matched strings (abc01, abc02) as the argument KKK.
Here is an idea (not as slick as the one in the comments). It replaces the last word of each df1$text with the df2$value of the matching df2$symbol:
sapply(df1$text, function(i)
gsub(paste(df2$symbol, collapse = '|'),
df2$value[match(sub('^.* ([[:alnum:]]+)$', '\\1', i), df2$symbol)], i))
#[1] "Hello 0" "Go to 1"
P.S. I borrowed the sub('^.* ([[:alnum:]]+)$', '\\1', i) from here
df1[["text"]] <- stringi::stri_replace_all_fixed(text, symbol, value, vectorize_all = FALSE)
Thanks Jota for the solution.
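If stringi is not available, the same lookup-and-replace can be sketched in base R by looping over the rows of df2 and calling gsub() with fixed = TRUE for each symbol. This assumes no symbol is a substring of another (for example abc01 and abc011), in which case the order of replacement would matter.

```r
df1 <- data.frame(text = c("Hello abc01", "Go to abc02"), stringsAsFactors = FALSE)
df2 <- data.frame(symbol = c("abc01", "abc02"), value = c(0, 1), stringsAsFactors = FALSE)

replaced <- df1$text
for (i in seq_len(nrow(df2))) {
  # fixed = TRUE treats the symbol as a literal string, not as a regular expression.
  replaced <- gsub(df2$symbol[i], as.character(df2$value[i]), replaced, fixed = TRUE)
}
df1$text <- replaced

df1$text  # "Hello 0" "Go to 1"
```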