I have a list of names such as
lastname1, Abc-Def
lastname2, Abc
I am trying to find a regex to initialize the given names (that come after the comma ,) so it gives me:
lastname1, A.-D.
lastname2, A.
The closest I got: https://regex101.com/r/nKtPCq/2/
(.*), ([A-zÀ-ú])\w*-?([A-zÀ-ú])+
In R, instead of regex you could also do this if you want:
str1 = "lastname1, Abc-Def"
str2 = "lastname2, Abc"
initialize = function(nameString) {
  namesList = strsplit(nameString, ", ")
  splitLast = strsplit(namesList[[1]][2], "-")
  initials = paste(substr(splitLast[[1]], 1, 1), ".", sep="", collapse="-")
  paste(namesList[[1]][1], ", ", initials, sep="")
}
print(initialize(str1)) # "lastname1, A.-D."
print(initialize(str2)) # "lastname2, A."
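If a single regex is still preferred, the same result can be had with gsub and a lookahead. This is only a minimal sketch, not the pattern from the question: it assumes the given names are everything after the last comma in the string, and the helper name initialize_regex is made up for illustration.

```r
# Hypothetical helper: abbreviate every word that comes after the last comma.
initialize_regex <- function(nameString) {
  # (\p{L})\p{L}*  one letter (captured) followed by the rest of the word
  # (?=[^,]*$)     only when no comma follows, i.e. we are past "lastname,"
  gsub("(\\p{L})\\p{L}*(?=[^,]*$)", "\\1.", nameString, perl = TRUE)
}

initialize_regex(c("lastname1, Abc-Def", "lastname2, Abc"))
# [1] "lastname1, A.-D." "lastname2, A."
```

Unlike the strsplit version, this one is vectorized over the input, but a string without any comma would be abbreviated in full, so it relies on the "lastname, given names" layout.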
Problem statement: I'm creating a dynamic application in which the user selects inputs that are passed into a URL to filter data. The user can select a single value or multiple values. I'm using knitr::combine_words(Selected_Input, before = ",", and = "", sep = ",") to get them single-quoted and comma-separated, but I run into an issue when the user selects a single value (as described below):
#User selecting multiple values
Selected_Input <- c("Apple","Banana","Cherry")
knitr::combine_words(Selected_Input, before = ",", and = "", sep = ",")
Result: 'Apple','Banana','Cherry' which works for my code.
But when user selects single value
#User selecting single value
Selected_Input <- c("Apple")
knitr::combine_words(Selected_Input, before = ",", and = "", sep = ",")
Result: ,Apple, which doesn't work, as it should be single-quoted.
I'm using this knitr::combine_words inside paste0 to create a dynamic URL. So I'm looking for a way which works inside paste0.
If I use cat() inside paste0, the output doesn't work in my code: the URL doesn't fall into place.
vector <- c("apple", "banana", "cherry")
out <- paste(sQuote(vector, FALSE), collapse=", ")
cat(out, "\n")
#> 'apple', 'banana', 'cherry'
cat(toString(sQuote(vector, FALSE)))
paste0("url",cat(toString(sQuote(vector, FALSE))),"url")
Result: 'apple', 'banana', 'cherry'[1] "urlurl"
What about:
fruits <- c("apple", "banana", "cherry")
all_fruit_in_one <- paste0(paste0("'", fruits, "'"), collapse = ", ")
cat(all_fruit_in_one)
Output:
'apple', 'banana', 'cherry'
Another option using sQuote:
Single or double quote text by combining with appropriate single or
double left and right quotation marks.
vector <- c("apple", "banana", "cherry")
out <- paste(sQuote(vector, FALSE), collapse=", ")
cat(out, "\n")
#> 'apple', 'banana', 'cherry'
Created on 2022-07-08 by the reprex package (v2.0.1)
I think it was just because of a typo in your code, i.e., it should be before = "'" instead of before = ",".
> Selected_Input <- c("Apple","Banana","Cherry")
> knitr::combine_words(Selected_Input, before = "'", and = "", sep = ",")
'Apple','Banana','Cherry'
> Selected_Input <- c("Apple")
> knitr::combine_words(Selected_Input, before = "'", and = "", sep = ",")
'Apple'
Use sprintf to insert the quotes and then use toString (assuming that comma with a space is acceptable as the separator). Optionally cat or print the result depending on exactly what you want; however, simply entering it into the console will print it.
toString(sprintf("'%s'", fruits))
## [1] "'apple', 'banana', 'cherry'"
toString(sprintf("'%s'", fruits[1]))
## [1] "'apple'"
This can also be expressed in terms of pipes:
fruits |> sprintf(fmt = "'%s'") |> toString()
## [1] "'apple', 'banana', 'cherry'"
Note
The input in reproducible form is assumed to be:
fruits <- c("apple", "banana", "cherry")
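Since the question asks for something that works inside paste0, here is a minimal sketch of that step. cat() prints its argument and returns NULL, which is why paste0("url", cat(...), "url") gave "urlurl"; a function that returns the string avoids this. The helper name quote_csv and the URL are placeholders invented for the example, and the separator is a bare comma to match the format in the question.

```r
# Hypothetical helper: wrap each value in single quotes and join with commas.
quote_csv <- function(values) paste(sprintf("'%s'", values), collapse = ",")

base_url <- "https://example.com/data?filter="  # placeholder, not from the question

paste0(base_url, quote_csv(c("Apple", "Banana", "Cherry")))
# [1] "https://example.com/data?filter='Apple','Banana','Cherry'"

paste0(base_url, quote_csv("Apple"))
# [1] "https://example.com/data?filter='Apple'"
```

The single-value case needs no special handling because sprintf and paste treat a length-one vector the same way as a longer one.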
So, I am writing a function that, among many other things, is supposed to keep only the first sentence from each paragraph of a text and preserve the paragraph structure (i.e. each sentence is in its own line). Here is the code that I have so far:
text_shortener <- function(input_text) {
  lapply(input_text, function(x)str_split(x, "\\.", simplify = T)[1])
  first.sentences <- unlist(lapply(input_text, function(x)str_split(x, "\\.", simplify = T)[1]))
  no.spaces <- gsub(pattern = "(?<=[\\s])\\s*|^\\s+|\\s+$", replacement = "", x = first.sentences, perl = TRUE)
  stopwords <- c("the", "really", "truly", "very", "The", "Really", "Truly", "Very")
  x <- unlist(strsplit(no.spaces, " "))
  no.stopwords <- paste(x[!x %in% stopwords], collapse = " ")
  final.text <- gsub(pattern = "(?<=\\w{5})\\w+", replacement = ".", x = no.stopwords, perl=TRUE)
  return(final.text)
}
All of the functions are working as they should, but the one part I can't figure out is how to get the output to print onto separate lines. When I run the function with a vector of text (I was using some text from Moby Dick as a test), this is what I get:
> text_shortener(Moby_Dick)
[1] "Call me Ishma. It is a way I have of drivi. off splee., and regul. circu. This is my subst. for pisto. and ball"
What I want is for the output of this function to look like this:
[1] "Call me Ishma."
[2] "It is a way I have of drivi. off splee., and regul. circu."
[3] "This is my subst. for pisto. and ball"
I am relatively new to R and this is giving me a real headache, so any help would be much appreciated! Thank you!
Looking at your output, it seems like splitting on a period followed by a capital letter is what you need.
You could accomplish that with strsplit() and split the string up like so:
strsplit("Call me Ishma. It is drivi. off splee., and regul. circu. This is my subst. for pisto.","\\. (?=[A-Z])", perl=T)
That finds instances where a period is followed by a space and a capital letter and splits the string there.
Edit: You could add it to the end of your function like so:
text_shortener <- function(input_text) {
  lapply(input_text, function(x)str_split(x, "\\.", simplify = T)[1])
  first.sentences <- unlist(lapply(input_text, function(x)str_split(x, "\\.", simplify = T)[1]))
  no.spaces <- gsub(pattern = "(?<=[\\s])\\s*|^\\s+|\\s+$", replacement = "", x = first.sentences, perl = TRUE)
  stopwords <- c("the", "really", "truly", "very", "The", "Really", "Truly", "Very")
  x <- unlist(strsplit(no.spaces, " "))
  no.stopwords <- paste(x[!x %in% stopwords], collapse = " ")
  trim.text <- gsub(pattern = "(?<=\\w{5})\\w+", replacement = ".", x = no.stopwords, perl=TRUE)
  final.text <- strsplit(trim.text, "\\. (?=[A-Z])", perl=T)
  return(final.text)
}
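Two details of that split are worth noting: strsplit returns a list, and the pattern "\\. (?=[A-Z])" consumes the period at each split point. The sketch below is a variation, not the code from the answer: it splits on the space only (using a lookbehind for the period) and takes the first list element to get a plain character vector. The variable names are made up for the example.

```r
trimmed <- "Call me Ishma. It is drivi. off splee., and regul. circu. This is my subst. for pisto."

# Split on a space that is preceded by a period and followed by a capital
# letter; the period stays attached to the sentence before it.
# [[1]] extracts the character vector from the one-element list.
sentences <- strsplit(trimmed, "(?<=\\.) (?=[A-Z])", perl = TRUE)[[1]]

sentences
# [1] "Call me Ishma."
# [2] "It is drivi. off splee., and regul. circu."
# [3] "This is my subst. for pisto."
```

Like the original pattern, this assumes every new sentence starts with a capital letter and that abbreviated words inside a sentence are followed by lowercase text.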
I am not very familiar with regex in R.
In a column, I am trying to extract the words before the // and after the || symbol, i.e. this is what I have in my column:
qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11
This is what I want:
qtaro_269; qtaro_353; qtaro_375; qtaro_11
I found this: Extract character before and after "/" and this: Extract string before "|". However I don't know how to adjust it to my input. Any hint is much appreciated.
EDIT:
a qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11
b
c qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11
What about the following?
# Split by "||"
x2 <- unlist(strsplit(x, "\\|\\|"))
[1] "qtaro_269//qtaro_269" "qtaro_353//qtaro_353" "qtaro_375//qtaro_375" "qtaro_11//qtaro_11"
# Remove everything before and including "//"
gsub(".+//", "", x2)
[1] "qtaro_269" "qtaro_353" "qtaro_375" "qtaro_11"
And if you want it as one string with ; for separation:
paste(gsub(".+//", "", x2), collapse = "; ")
[1] "qtaro_269; qtaro_353; qtaro_375; qtaro_11"
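Because the EDIT in the question shows a whole column, including an empty row, a vectorized variant may be more convenient than splitting one string. This is a sketch under assumptions: the data frame df and its column names id and V2 are invented here, and every entry is assumed to have the form name//name joined by ||.

```r
# Made-up example data modelled on the EDIT in the question.
df <- data.frame(
  id = c("a", "b", "c"),
  V2 = c("qtaro_269//qtaro_269||qtaro_353//qtaro_353",
         "",
         "qtaro_375//qtaro_375||qtaro_11//qtaro_11"),
  stringsAsFactors = FALSE
)

# Step 1: drop "//" and everything after it up to the next "|".
# Step 2: turn the remaining "||" separators into "; ".
df$V2_clean <- gsub("\\|\\|", "; ", gsub("//[^|]*", "", df$V2))

df$V2_clean
# [1] "qtaro_269; qtaro_353" ""                     "qtaro_375; qtaro_11"
```

Both gsub calls work element-wise, so empty cells simply stay empty.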
This is how I solved it. For sure not the most intelligent and elegant way, so suggestions to improve it are welcome.
df <-unlist(lapply(strsplit(df[[2]],split="\\|\\|"), FUN = paste, collapse = "; "))
df <-unlist(lapply(strsplit(df[[2]],split="\\/\\/"), FUN = paste, collapse = "; "))
df <- sapply(strsplit(df$V2, "; ", fixed = TRUE), function(x) paste(unique(x), collapse = "; "))
Suppose I build mystring in a for loop in R as follows:
mystring = ""
colorIndex = 17
for(i in 1:ncol(myTable)){
mystring = paste(mystring, paste("$('td:eq(",i, ")', nRow).attr('title', full_text);", sep = ""))
mystring = paste(mystring, paste("$('td:eq(",i,")', nRow).css('cursor', 'pointer');", sep = ""))
mystring = paste(mystring, "if(aData[",colorIndex,"] == 0){
$(nRow).css('background-color','#f8f8ff')
}else if(aData[",colorIndex,"]==1){
$(nRow).css('background-color','#9EFAC5')
}else{
$(nRow).css('background-color','#FAF99E')
};", sep ="")
}
Now, suppose my table had 60 columns. I'm trying to figure out the easiest way to do this. Do I need to make one large string, with a special character and then grep out the character? How to iterate over the i efficiently is throwing me. However, given how slow R is with strings, I would prefer not to do this in a loop.
You don't need a loop at all because paste is vectorized:
i <- 1:ncol(myTable)
yourstring <-
paste(
paste0(
paste0(" ", "$('td:eq(",i, ")', nRow).attr('title', full_text);"),
" ",
paste0("$('td:eq(",i,")', nRow).css('cursor', 'pointer');"),
"if(aData[",colorIndex,"] == 0){
$(nRow).css('background-color','#f8f8ff')
}else if(aData[",colorIndex,"]==1){
$(nRow).css('background-color','#9EFAC5')
}else{
$(nRow).css('background-color','#FAF99E')
};"
),
collapse = "")
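To see the vectorization on its own, here is a stripped-down sketch using just one of the JavaScript fragments. The value 1:3 stands in for 1:ncol(myTable), and the name cursor_js is made up for the example.

```r
i <- 1:3  # stands in for 1:ncol(myTable)

# paste0 recycles the constant pieces and builds one string per element of i.
cursor_js <- paste0("$('td:eq(", i, ")', nRow).css('cursor', 'pointer');")
length(cursor_js)
# [1] 3

# collapse then joins the per-column strings into a single string.
all_js <- paste(cursor_js, collapse = " ")
all_js
# [1] "$('td:eq(1)', nRow).css('cursor', 'pointer'); $('td:eq(2)', nRow).css('cursor', 'pointer'); $('td:eq(3)', nRow).css('cursor', 'pointer');"
```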
Maybe you could use glue for this, because it makes things look cleaner, and put combinations in a data frame in advance, such that you don't need the loop:
myTable <- iris
mystring <- "your string with some glue-elements in it: i = {paste_df$i} and colorIndex = {paste_df$colorIndex}"
paste_df <- data.frame(i = seq_len(ncol(myTable)), colorIndex = 17)
string <- glue::glue(mystring)
# or, a little messy but the same, with paste0:
string <- paste0("your string with some glue-elements in it: i = ",
paste_df$i, " and colorIndex = ", paste_df$colorIndex)
# and in the end, collapse the string:
paste0(string, collapse = "")
I have the following string:
data_string = c("Aa_Bbbbb_0_ID1",
"Aa_Bbbbb_0_ID2",
"Aa_Bbbbb_0_ID3",
"Ccccc_D_EEE_0_ID1")
I just want to trim each string to get these results:
"Aa_Bbbbb"
"Aa_Bbbbb"
"Aa_Bbbbb"
"Ccccc_D_EEE"
So basically, I'm looking for a function that takes data_string, a separator, and a split position:
remove_tail(data_string, sep = '_', del = 2)
and removes only the tail, from the 2nd-to-last separator to the end of the string (without splitting the whole string).
Try below:
# split on "_" then paste back removing last 2
sapply(strsplit(data_string, "_", fixed = TRUE),
function(i) paste(head(i, -2), collapse = "_"))
We can make our own function:
# custom function
remove_tail <- function(x, sep = "_", del = 2){
  sapply(strsplit(x, split = sep, fixed = TRUE),
         function(i) paste(head(i, -del), collapse = sep))
}
remove_tail(data_string, sep = '_', del = 2)
# [1] "Aa_Bbbbb" "Aa_Bbbbb" "Aa_Bbbbb" "Ccccc_D_EEE"
Using gsub
gsub("_0_.*","",data_string)
We can also use sub to match the _ followed by one or more digits (\\d+) and the rest of the characters, and replace it with an empty string ("").
sub("_\\d+.*", "", data_string)
#[1] "Aa_Bbbbb" "Aa_Bbbbb" "Aa_Bbbbb" "Ccccc_D_EEE"
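The two regex answers above rely on the tail starting with _0 or a digit. A regex that counts separators from the end, as the question asks, could look like the sketch below. The function name remove_tail_regex is hypothetical, and the pattern is built with sprintf on the assumption that sep is a single character with no special meaning in a regex (such as "_").

```r
# Hypothetical helper: drop the last `del` separator-delimited fields.
# Assumes `sep` is one literal character that is not a regex metacharacter.
remove_tail_regex <- function(x, sep = "_", del = 2) {
  # For sep = "_" and del = 2 this builds "(_[^_]*){2}$":
  # exactly two "_field" groups anchored at the end of the string.
  pattern <- sprintf("(%s[^%s]*){%d}$", sep, sep, del)
  sub(pattern, "", x)
}

data_string <- c("Aa_Bbbbb_0_ID1", "Aa_Bbbbb_0_ID2",
                 "Aa_Bbbbb_0_ID3", "Ccccc_D_EEE_0_ID1")
remove_tail_regex(data_string, sep = "_", del = 2)
# [1] "Aa_Bbbbb"    "Aa_Bbbbb"    "Aa_Bbbbb"    "Ccccc_D_EEE"
```

Strings with fewer than del separators are returned unchanged, because the pattern does not match them.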