R gsub add leading line break

I need to add a leading line break "\n" to a list of axis label names in R. I cannot work out how to do this with gsub. For example, I need "Q1\n/\n15" to read "\nQ1\n/\n15". Neither Google nor the help commands are leading me to the answer. Any advice?
Thanks in advance.

There are about 4 answers in the comments (as of this writing), so I'll just summarize them in a proper answer.
examp <- "Q1\n/\n15"
paste("\n", examp, sep="")
gsub("^(.)","\n\\1",examp)
sprintf("\n%s", examp)
gsub("^", "\n", examp)
all of which give
[1] "\nQ1\n/\n15"
All of these are properly vectorized: if examp <- c("Q1\n/\n15", "Q1\n/\n16"), each returns [1] "\nQ1\n/\n15" "\nQ1\n/\n16".
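As a quick check of the vectorization claim, the sketch below runs all four calls on the two-element vector mentioned above and compares the results. The names `labels`, `results`, and `all_equal` are made up for illustration.

```r
# Two labels, both taken from the example above
labels <- c("Q1\n/\n15", "Q1\n/\n16")

# Apply each of the four approaches to the whole vector
results <- list(
  paste      = paste("\n", labels, sep = ""),
  gsub_group = gsub("^(.)", "\n\\1", labels),
  sprintf    = sprintf("\n%s", labels),
  gsub_caret = gsub("^", "\n", labels)
)

# TRUE when every approach returned the same character vector as paste()
all_equal <- all(vapply(results, identical, logical(1), results$paste))
print(all_equal)
```

For axis labels, paste or sprintf is probably the most readable choice, since no regular expression is involved.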


Extract character string in middle of string with R

I have character strings which look something like this:
a <- c("miRNA__hsa-mir-521-3p.iso.t5:", "miRNA__hsa-mir-947b.ref.t5:")
I want to extract the middle portion only, e.g. hsa-mir-521-3p and hsa-mir-947b.
I have tried the following so far:
a1 <- substr(a, 8,21)
[1] "hsa-mir-521-3p" "hsa-mir-947b.r"
This obviously does not work because my desired substrings have varying lengths.
a2 <- sub('miRNA__', '', a)
[1] "hsa-mir-521-3p.iso.t5:" "hsa-mir-947b.ref.t5:"
This works to remove the upstream string (“miRNA__”), but I still need to remove the downstream string.
Could someone please advise what else I could try or if there is a simpler way to achieve this? I am still learning how to code with R. Thank you very much!
You haven't clearly defined the "middle portion" but based on the data shared we can extract everything between the last underscore ("_") and a dot (".").
sub('.*_(.*?)\\..*', '\\1', a)
#[1] "hsa-mir-521-3p" "hsa-mir-947b"
You can try the following regex:
> gsub(".*_|\\..*","",a)
[1] "hsa-mir-521-3p" "hsa-mir-947b"
which removes the left-most (.*_) and right-most (\\..*) parts, therefore keeping the middle part.
We could also use trimws from base R
trimws(a, whitespace = '.*_|\\..*')
#[1] "hsa-mir-521-3p" "hsa-mir-947b"
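For readers who prefer to build on the asker's own attempt, here is a minimal two-step sketch: strip the known prefix, then strip everything from the first dot onwards. It assumes the middle portion never contains a dot, which holds for the sample data but may not hold in general. The names `no_prefix` and `middle` are illustrative.

```r
a <- c("miRNA__hsa-mir-521-3p.iso.t5:", "miRNA__hsa-mir-947b.ref.t5:")

# Step 1: drop the fixed prefix (anchored so it only matches at the start)
no_prefix <- sub("^miRNA__", "", a)

# Step 2: drop everything from the first literal dot to the end of the string
# (assumption: the wanted middle part contains no dot)
middle <- sub("\\..*$", "", no_prefix)
print(middle)
```

Splitting the work into two calls is slower than a single regex but makes each step easy to inspect while learning.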

How to create a sequence of strings with different numbers in R

I just can't figure out how to create a vector in which the strings are constant but the numbers are not. For example:
c("raster[1]","raster[2]","raster[3]")
I'd like to use something like seq(raster[1],raster[99], by=1), but this does not work.
Thanks in advance.
The sprintf function should also work:
rasters <- sprintf("raster[%s]", 1:99)
head(rasters)
[1] "raster[1]" "raster[2]" "raster[3]" "raster[4]" "raster[5]" "raster[6]"
As suggested by Richard Scriven, %d is more efficient than %s. So, if you were working with a longer sequence, it would be more appropriate to use:
rasters <- sprintf("raster[%d]", 1:99)
We can do
paste0("raster[", 1:6, "]")
# [1] "raster[1]" "raster[2]" "raster[3]" "raster[4]" "raster[5]" "raster[6]"
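To see that the sprintf and paste0 approaches agree, a small sketch follows. The variable `n` and the result names are assumptions for illustration; `seq_len()` is used here because it always yields integers, which is what `%d` expects.

```r
n <- 99  # upper bound taken from the question

# sprintf with an integer format specifier
by_sprintf <- sprintf("raster[%d]", seq_len(n))

# paste0 gluing the constant pieces around the numbers
by_paste0 <- paste0("raster[", seq_len(n), "]")

# Both approaches should produce the same character vector
print(identical(by_sprintf, by_paste0))
print(head(by_sprintf, 3))
```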

How do I remove a specific sign like a comma partially from a data set

I have a data set like this:
Quest_main=c("quest2,","quest5,","quest4,","quest12,","quest4,","quest5,quest7")
And I would like to remove the trailing comma, for example from "quest2," so that it becomes "quest2", but not the comma inside "quest5,quest7". I think I have to use substr or ifelse, but I am not sure. The final result should be this when I call up Quest_main:
"quest2" "quest5" "quest4" "quest12" "quest4" "quest5,quest7"
Thanks!
All you need is
gsub(",$","",Quest_main)
The $ signifies the end of a string: for full explanation, see the (long and complicated) ?regexp, or a more general introduction to regular expressions, or search for the tags [r] [regex] on Stack Overflow.
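To make the role of `$` concrete, the sketch below contrasts an unanchored pattern with the anchored one on the question's data. The names `no_commas` and `trailing_only` are illustrative only.

```r
Quest_main <- c("quest2,", "quest5,", "quest4,", "quest12,", "quest4,", "quest5,quest7")

# Unanchored: removes every comma, including the one that should stay
no_commas <- gsub(",", "", Quest_main)

# Anchored with $: removes a comma only when it is the last character
trailing_only <- gsub(",$", "", Quest_main)

print(no_commas[6])      # the inner comma is gone
print(trailing_only[6])  # the inner comma is kept
```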
If you insist on doing it with substr() and ifelse(), you can:
nc <- nchar(Quest_main)
lastchar <- substr(Quest_main,nc,nc)
ifelse(lastchar==",",substr(Quest_main,1,nc-1),
Quest_main)
With substring and ifelse:
ifelse(substring(Quest_main,nchar(Quest_main))==',',substring(Quest_main,1,nchar(Quest_main)-1),Quest_main)
Here's an alternative approach (just for general knowledge) using negative lookahead
gsub("(,)(?!\\w)", "", Quest_main, perl = TRUE)
## [1] "quest2" "quest5" "quest4" "quest12" "quest4" "quest5,quest7"
This approach is more general in case you want to delete commas not only from end of the word, but specify other conditions too
A more general solution is stringi's stri_trim_right, which works in cases where Ben's or Jealie's solutions fail, for example when there are several commas at the end of the string that you want to remove:
Quest_main <- c("quest2,,,," ,"quest5,quest7,,,,")
Quest_main
#[1] "quest2,,,," "quest5,quest7,,,,"
library(stringi)
stri_trim_right(Quest_main, pattern = "[^,]")
#[1] "quest2" "quest5,quest7"
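If adding a package is not desirable, a base R pattern with a `+` quantifier appears to cover the same multi-comma case. This is a sketch that does not come from the answers above; `trimmed` is an illustrative name.

```r
Quest_main <- c("quest2,,,,", "quest5,quest7,,,,")

# ",+$" matches one or more commas immediately before the end of the string
trimmed <- sub(",+$", "", Quest_main)
print(trimmed)
```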

R extract text until, and not including x

I have a bunch of strings of mixed length, all with a year embedded. I am trying to extract just the text part, that is, everything until the number starts, and am having problems with lookahead assertions, assuming that is the proper way to do such extractions.
Here is what I have (returns no match):
>grep("\\b.(?=\\d{4})","foo_1234_bar",perl=T,value=T)
In the example I am looking to extract just foo but there may be several, and of mixed lengths, separated by _ before the year portion.
Look-aheads may be overkill here. Use the underscore and the 4 digits as the structure, combined with a non-greedy quantifier to prevent the 'dot' from gobbling up everything:
/(.+?)_\d{4}/
The first matching group ($1) holds 'foo'.
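The pattern above is written in Perl-style notation rather than as an R call. One possible translation into R is sketched below, assuming `sub()` with `perl = TRUE` and a back-reference in place of `$1`. The second test string and the name `text_part` are made up for illustration.

```r
# First element is from the question; the second is a hypothetical multi-part case
x <- c("foo_1234_bar", "foo_baz_2015_x")

# Lazy group captures the text before the first "_" followed by four digits;
# the trailing ".*$" consumes the rest so only the group is kept
text_part <- sub("^(.+?)_\\d{4}.*$", "\\1", x, perl = TRUE)
print(text_part)
```

Note that `sub()` returns a string unchanged when the pattern does not match, so inputs without a four-digit year would pass through as they are.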
This will grab everything up until the first digit
x <- c("asdfas_1987asdf", "asd_das_12")
regmatches(x, regexpr("^[^[:digit:]]*", x))
#[1] "asdfas_" "asd_das_"
Another approach: I often find that strsplit is faster than regex searching, though not always (and this does use a slight bit of regex):
x <- c("asdfas_1987asdf", "asd_das_12") #shamelessly stealing Dason's example
sapply(strsplit(x, "[0-9]+"), "[[", 1)
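Splitting on digits, like the first-digit approach, keeps the underscore that precedes the number, whereas the question asks for the text only. A hedged variant that also drops that separator is sketched below; `text_only` is an illustrative name, and the pattern assumes the text part itself contains no digits.

```r
x <- c("asdfas_1987asdf", "asd_das_12")

# Remove an optional underscore, the first digit, and everything after it
# (assumption: no digits occur inside the text part)
text_only <- sub("_?[0-9].*$", "", x)
print(text_only)
```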

Extracting specified word from a vector using R

I have a text, e.g.
text<- "i am happy today :):)"
I want to extract :) from the text vector and report its frequency.
Here's one idea, which would be easy to generalize:
text<- c("i was happy yesterday :):)",
"i am happy today :)",
"will i be happy tomorrow?")
(nchar(text) - nchar(gsub(":)", "", text))) / 2
# [1] 2 1 0
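One way the idea might be generalized is to divide by the length of the pattern instead of a hard-coded 2. The helper name `count_fixed` is hypothetical, and `fixed = TRUE` is an added assumption so the pattern is treated as a literal string rather than a regular expression.

```r
# count_fixed() is a hypothetical helper, not part of base R
count_fixed <- function(x, pattern) {
  stopifnot(nchar(pattern) > 0)
  # Delete every literal occurrence of the pattern
  removed <- gsub(pattern, "", x, fixed = TRUE)
  # Characters removed divided by pattern length gives the occurrence count
  (nchar(x) - nchar(removed)) / nchar(pattern)
}

text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")
counts <- count_fixed(text, ":)")
print(counts)
```

This counts non-overlapping occurrences, which is usually what is wanted for tokens such as emoticons.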
I assume you only want the count, or do you also want to remove :) from the string?
For the count you can do:
length(gregexpr(":)",text)[[1]])
which gives 2. A more generalized solution for a vector of strings is:
sapply(gregexpr(":)",text),length)
Edit:
Josh O'Brien pointed out that this also returns 1 if there is no :), since gregexpr returns -1 in that case and a vector holding -1 still has length 1. To fix this you can use:
sapply(gregexpr(":)",text),function(x)sum(x>0))
Which does become slightly less pretty.
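A possibly tidier variant, offered as a sketch rather than as part of the original answer, extracts the matches with `regmatches()` and counts them with `lengths()`. An element without a match yields an empty character vector, so its count is 0 without special handling. `fixed = TRUE` is an added assumption to treat ":)" literally.

```r
text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

# One character vector of matches per input string (empty when nothing matches)
matches <- regmatches(text, gregexpr(":)", text, fixed = TRUE))

# Number of matches per input string
counts <- lengths(matches)
print(counts)
```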
This does the trick but might not be the most direct way:
mytext<- "i am happy today :):)"
# The following line inserts semicolons to split on
myTextSub<-gsub(":)", ";:);", mytext)
# Then split and unlist
myTextSplit <- unlist(strsplit(myTextSub, ";"))
# Then see how many times the smiley turns up
length(grep(":)", myTextSplit))
EDIT
To handle vectors of text with length > 1, don't unlist:
mytext<- rep("i am happy today :):)",2)
myTextSub<-gsub(":\\)", ";:\\);", mytext)
myTextSplit <- strsplit(myTextSub, ";")
sapply(myTextSplit,function(x){
length(grep(":)", x))
})
But I like the other answers better.
