How to add single quotes around multiple strings - r

strings <- c("apple", "banana", "029")
> strings
[1] "apple" "banana" "029"
I would like to add single quotes around each element of strings and separate the elements with commas. My desired output is this:
desired_strings <- "'apple','banana','029'"
> desired_strings
[1] "'apple','banana','029'"
My attempt:
a <- "'"
paste0(mapply(paste0, a, strings, a), ",")
[1] "'apple'," "'banana'," "'029',"
However, this is not quite right.

You can use sQuote() and then collapse to a single string with paste():
paste(sQuote(strings, q = FALSE), collapse = ",")
[1] "'apple','banana','029'"

Using sprintf():
toString(sprintf("'%s'", strings))
# [1] "'apple', 'banana', '029'"
or, to avoid the space that toString() inserts after each comma:
paste(sprintf("'%s'", strings), collapse=",")
# [1] "'apple','banana','029'"


how to extract part of a string matching pattern with separation in r

I'm trying to extract part of a file name that matches a set of letters with variable length. The file names consist of several parameters separated by "_", but they vary in the number of parts. I'm trying to pull some of the parameters out to use separately.
Example file names:
a = "Vel_Mag_ft_modelExisting_350cfs_blah3.tif"
b = "Depth_modelDesign_11000cfs_blah2.tif"
I'm trying to pull out the parts that start with "model" so I end up with
"modelExisting"
"modelDesign"
The filenames are stored as a variable in a data.frame.
I've tried
library(tidyverse)
tibble(files = c(a, b)) %>%
  mutate(attempt1 = str_extract(files, "model"),
         attempt2 = str_match(str_split(files, "_"), "model"))
but just ended up with the "model" in all cases and not the "model...." that I need.
The pieces I need are a consistent number of pieces from the end, but I couldn't figure out how to specify that either. I tried
str_split(files, "_")[-3]
but this threw an error that it must be size 480 or 1 not size 479
We can create a function that captures the word before a _ followed by one or more digits and then, in the replacement, specify the backreference (\\1) of the captured group:
f1 <- function(x) sub(".*_([[:alpha:]]+)_\\d+.*", "\\1", x)
Testing:
> f1(a)
[1] "modelExisting"
> f1(b)
[1] "modelDesign"
We can use strsplit or regmatches as shown below:
> s <- c("Vel_Mag_ft_modelExisting_350cfs_blah3.tif", "Depth_modelDesign_11000cfs_blah2.tif")
> lapply(strsplit(s, "_"), function(x) x[which(grepl("^\\d+", x)) - 1])
[[1]]
[1] "modelExisting"
[[2]]
[1] "modelDesign"
> regmatches(s, gregexpr("[[:alpha:]]+(?=_\\d+)", s, perl = TRUE))
[[1]]
[1] "modelExisting"
[[2]]
[1] "modelDesign"

Extract a string that spans across multiple lines - stringr

I need to extract a string that spans across multiple lines on an object.
The object:
> text <- paste("abc \nd \ne")
> cat(text)
abc
d
e
With str_extract_all I can extract all the text between ‘a’ and ‘c’, for example.
> str_extract_all(text, "a.*c")
[[1]]
[1] "abc"
Using the function ‘regex’ and the argument ‘multiline’ set to TRUE, I can extract a string across multiple lines. In this case, I can extract the first character of multiple lines.
> str_extract_all(text, regex("^."))
[[1]]
[1] "a"
> str_extract_all(text, regex("^.", multiline = TRUE))
[[1]]
[1] "a" "d" "e"
But when I try to extract "every character between a and d" (a regex that spans multiple lines), the output is character(0).
> str_extract_all(text, regex("a.*d", multiline = TRUE))
[[1]]
character(0)
The desired output is:
“abcd”
How to get it with stringr?
dplyr:
library(dplyr)
library(stringr)
data.frame(text) %>%
  mutate(new = lapply(str_extract_all(text, "(?!e)\\w"), paste0, collapse = ""))
text new
1 abc \nd \ne abcd
Here we use the character class \\w, which does not include the new line metacharacter \n. The negative lookahead (?!e) makes sure the e is not matched.
base R:
unlist(lapply(str_extract_all(text, "(?!e)\\w"), paste0, collapse = ""))
[1] "abcd"
str_remove_all(text, "\\s\\ne?")
[1] "abcd"
OR
paste0(trimws(strsplit(text, "\\ne?")[[1]]), collapse="")
[1] "abcd"
The answers above remove line breaks, so a two-step approach can work to get the desired output 'abcd'.
1 - Use str_remove_all or gsub to remove the line breaks (in this case, also removing blank spaces).
2 - Use str_extract_all to get the desired output ('abcd' in this case).
> text %>%
+ str_remove_all("\\s\\n") %>%
+ str_extract_all("a.*d")
[[1]]
[1] "abcd"
Short regex reference:
\n - new line (line feed)
\s - any whitespace
\r - carriage return
Update:
In base R to get the desired output abcd:
text <- gsub("[\r\n]|[[:blank:]]", "", text)
substr(text,1, nchar(text)-1)
[1] "abcd"
First answer:
We can use gsub:
gsub("[\r\n]|[[:blank:]]", "", text)
[1] "abcde"

How to apply the function substr to each element of a string

I have this string of words
string <- c("chair-desk-tree-table-computer-mousse")
I want to retrieve the first three characters of each word and store them in an object like this:
newstring ==> [1] "cha-des-tre-tab-com-mou"
> newstring <- substring( strsplit(string, "-")[[1]], 1, 3)
> newstring <- paste0(newstring, collapse = "-")
> newstring
[1] "cha-des-tre-tab-com-mou"
Using gsub with a regex lookbehind to match (and remove) the lower case letters that follow the first three lower case letters of each word:
gsub("(?<=\\b[a-z]{3})[a-z]+", "", string, perl = TRUE)
[1] "cha-des-tre-tab-com-mou"
With the edited example string (which also contains alphanumeric pieces), the character class becomes [[:alnum:]]:
> string <- c(string, "K29-E665-I1190")
> gsub("(?<=\\b[[:alnum:]]{3})[[:alnum:]]+", "", string, perl = TRUE)
[1] "cha-des-tre-tab-com-mou" "K29-E66-I11"

Using Regex to edit a column in R [duplicate]

I've got a column people$food that has entries like chocolate or apple-orange-strawberry.
I want to split people$food by - and get the first entry from the split.
In python, the solution would be food.split('-')[0], but I can't find an equivalent for R.
If you need to extract the first (or nth) entry from each split, use:
word <- c('apple-orange-strawberry','chocolate')
sapply(strsplit(word,"-"), `[`, 1)
#[1] "apple" "chocolate"
Or faster and more explicitly:
vapply(strsplit(word,"-"), `[`, 1, FUN.VALUE=character(1))
#[1] "apple" "chocolate"
Both bits of code will cope with selecting whichever position you want from the split list, and will handle cases where the index is out of range:
vapply(strsplit(word,"-"), `[`, 2, FUN.VALUE=character(1))
#[1] "orange" NA
For example
word <- 'apple-orange-strawberry'
strsplit(word, "-")[[1]][1]
[1] "apple"
or, equivalently
unlist(strsplit(word, "-"))[1]
Essentially the idea is that split gives a list as a result, whose elements have to be accessed either by slicing (the former case) or by unlisting (the latter).
If you want to apply the method to an entire column:
first.word <- function(my.string){
  unlist(strsplit(my.string, "-"))[1]
}
words <- c('apple-orange-strawberry', 'orange-juice')
sapply(words, first.word)
apple-orange-strawberry orange-juice
                "apple"     "orange"
I would use sub() instead. Since you want the first "word" before the split, we can simply remove everything after the first - and that's what we're left with.
sub("-.*", "", people$food)
Here's an example -
x <- c("apple", "banana-raspberry-cherry", "orange-berry", "tomato-apple")
sub("-.*", "", x)
# [1] "apple" "banana" "orange" "tomato"
Otherwise, if you want to use strsplit() you can round up the first elements with vapply()
vapply(strsplit(x, "-", fixed = TRUE), "[", "", 1)
# [1] "apple" "banana" "orange" "tomato"
I would suggest using head rather than [ in R.
word <- c('apple-orange-strawberry','chocolate')
sapply(strsplit(word, "-"), head, 1)
# [1] "apple" "chocolate"
dplyr/magrittr approach:
library(magrittr)
library(dplyr)
word = c('apple-orange-strawberry', 'chocolate')
strsplit(word, "-") %>% sapply(extract2, 1)
# [1] "apple" "chocolate"
Using str_remove() to delete everything after the pattern:
df <- data.frame(words = c('apple-orange-strawberry', 'chocolate'))
mutate(df, short = stringr::str_remove(words, "-.*")) # mutate method
stringr::str_remove(df$words, "-.*") # str_remove example
stringr::str_replace(df$words, "-.*", "") # str_replace method
stringr::str_split_fixed(df$words, "-", n=2)[,1] # str_split method similar to original question's Python code
tidyr::separate(df, words, into = c("short", NA)) # using the separate function
                    words     short
1 apple-orange-strawberry     apple
2               chocolate chocolate
stringr 1.5.0 introduced str_split_i to do this easily:
library(stringr)
str_split_i(c('apple-orange-strawberry','chocolate'), "-", 1)
[1] "apple" "chocolate"
The third argument represents the index you want to extract. Also new is that you can use negative values to index from the right:
str_split_i(c('apple-orange-strawberry','chocolate'), "-", -1)
[1] "strawberry" "chocolate"

Grep to subset in R

How can I grep all the gene names starting only with "Gm" from data1[,7]?
I tried data2[grep("^Gm", data2$Genes), ]; but that extracts the entire rows whose entries start with "Gm".
data1[,7] <-
[1] "Ighmbp2,Mrpl21,Cpt1a,Mtl5,Gal,Ppp6r3,Gm23940,Lrp5"
[2] "Gm5852,Gm5773,Tdpoz4,Tdpoz3,Gm9116,Gm9117,Tdpoz5"
[3] "Arhgap15,Gm22867"
One option would be to split the string by , with strsplit() (the output is a list, so lapply can be used) and then keep only the words that begin with "Gm" using grep() (^ denotes the beginning of the string):
lapply(strsplit(Genes, ','), function(x) grep('^Gm', x, value=TRUE))
#[[1]]
#[1] "Gm23940"
#[[2]]
#[1] "Gm5852" "Gm5773" "Gm9116" "Gm9117"
#[[3]]
#[1] "Gm22867"
Or you could extract the words with stri_extract_all_regex from stringi:
library(stringi)
stri_extract_all_regex(Genes, 'Gm[[:alnum:]]+')
Or, if you need it as a vector, you can use unlist on the above output, or use gsub to remove the words that don't begin with "Gm" (\\b(?!Gm)\\w+\\b) and the commas, then use scan:
scan(text = gsub('\\b(?!Gm)\\w+\\b|,', ' ', Genes, perl = TRUE),
     what = '', quiet = TRUE)
#[1] "Gm23940" "Gm5852" "Gm5773" "Gm9116" "Gm9117" "Gm22867"
Update
If you need to remove all the words starting with Gm
scan(text=gsub('\\bGm\\w+\\b|,', ' ', Genes, perl=TRUE),
what='', quiet=TRUE)
# [1] "Ighmbp2" "Mrpl21" "Cpt1a" "Mtl5" "Gal" "Ppp6r3"
# [7] "Lrp5" "Tdpoz4" "Tdpoz3" "Tdpoz5" "Arhgap15"
data
Genes <- c("Ighmbp2,Mrpl21,Cpt1a,Mtl5,Gal,Ppp6r3,Gm23940,Lrp5",
"Gm5852,Gm5773,Tdpoz4,Tdpoz3,Gm9116,Gm9117,Tdpoz5",
"Arhgap15,Gm22867")
