How can I get the last n characters from a string in R?
Is there a function like SQL's RIGHT?
I'm not aware of anything in base R, but it's straightforward to write a function that does this using substr and nchar:
x <- "some text in a string"
substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))
}
substrRight(x, 6)
[1] "string"
substrRight(x, 8)
[1] "a string"
This is vectorised, as @mdsumner points out. Consider:
x <- c("some text in a string", "I really need to learn how to count")
substrRight(x, 6)
[1] "string" " count"
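Because substr recycles its start and stop arguments along x, the helper also appears to accept a different n for each element. A minimal sketch, reusing substrRight from above (the vector of lengths and the "abc" input are my own illustration, not part of the original answer):

```r
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

x <- c("some text in a string", "I really need to learn how to count")

# One n per element: last 6 characters of the first string, last 5 of the second
substrRight(x, c(6, 5))
# [1] "string" "count"

# If n exceeds the string length, the start position falls below 1
# and the whole string comes back
substrRight("abc", 10)
# [1] "abc"
```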
If you don't mind using the stringr package, str_sub is handy because you can use negatives to count backward:
x <- "some text in a string"
str_sub(x,-6,-1)
[1] "string"
Or, as Max points out in a comment to this answer,
str_sub(x, start= -6)
[1] "string"
Use the stri_sub function from the stringi package.
To get a substring counted from the end, use negative indices.
See the examples below:
stri_sub("abcde",1,3)
[1] "abc"
stri_sub("abcde",1,1)
[1] "a"
stri_sub("abcde",-3,-1)
[1] "cde"
You can install this package from GitHub: https://github.com/Rexamine/stringi
It is now also available on CRAN; simply type
install.packages("stringi")
to install it.
str = 'This is an example'
n = 7
result = substr(str,(nchar(str)+1)-n,nchar(str))
print(result)
[1] "example"
Another reasonably straightforward way is to use regular expressions and sub:
sub('.*(?=.$)', '', string, perl=TRUE)
This says "get rid of everything that is followed by one final character". To grab more characters off the end, put that many dots in the lookahead assertion:
sub('.*(?=.{2}$)', '', string, perl=TRUE)
where .{2} means .., or "any two characters", so the pattern reads "get rid of everything that is followed by the final two characters".
sub('.*(?=.{3}$)', '', string, perl=TRUE)
for three characters, etc. You can set the number of characters to grab with a variable, but you'll have to paste the variable's value into the regular expression string:
n = 3
sub(paste('.*(?=.{', n, '}$)', sep=''), '', string, perl=TRUE)
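The pasted pattern can also be wrapped in a small function. This is only a sketch: the name last_n_regex is hypothetical, and it builds the pattern with sprintf rather than paste.

```r
# Hypothetical helper: keep the last n characters using a lookahead regex
last_n_regex <- function(string, n) {
  # Build e.g. ".*(?=.{6}$)": everything that is followed by the final n characters
  pattern <- sprintf(".*(?=.{%d}$)", as.integer(n))
  sub(pattern, "", string, perl = TRUE)
}

last_n_regex("some text in a string", 6)
# [1] "string"

# Strings shorter than n do not match the pattern and are returned unchanged
last_n_regex("abc", 6)
# [1] "abc"
```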
UPDATE: as noted by mdsumner, the original code is already vectorised because substr is. Should have been more careful.
And if you want a vectorised version (based on Andrie's code)
substrRight <- function(x, n){
sapply(x, function(xx)
substr(xx, (nchar(xx)-n+1), nchar(xx))
)
}
> substrRight(c("12345","ABCDE"),2)
12345 ABCDE
"45" "DE"
Note that I have changed (nchar(x)-n) to (nchar(x)-n+1) to get n characters.
A simple base R solution using the substring() function (who knew this function even existed?):
RIGHT = function(x,n){
substring(x,nchar(x)-n+1)
}
substring() does essentially the same job as substr(), but its end argument (last) defaults to 1,000,000, so only the start position has to be supplied.
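The default can be checked directly, and since substring() recycles its first argument, the same helper can return several suffix lengths in one call. A small sketch (passing a vector for n is my own illustration):

```r
# The default for `last` is a large constant, so only `first` has to be given
formals(substring)$last
# [1] 1000000

RIGHT <- function(x, n) substring(x, nchar(x) - n + 1)

# substring() recycles `first`, so several lengths can be requested at once
RIGHT("Hello World!", c(2, 8))
# [1] "d!"       "o World!"
```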
Examples:
> RIGHT('Hello World!',2)
[1] "d!"
> RIGHT('Hello World!',8)
[1] "o World!"
Try this:
x <- "some text in a string"
n <- 5
substr(x, nchar(x)-n, nchar(x))
It should give:
[1] "string"
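Note that substr includes both the start and the stop position, so starting at nchar(x)-n returns n + 1 characters; that is why n is 5 here although "string" has 6 letters. A quick check, as a sketch:

```r
x <- "some text in a string"
n <- 5

# Start at nchar(x) - n: the range is inclusive, so n + 1 characters come back
substr(x, nchar(x) - n, nchar(x))
# [1] "string"

# Start at nchar(x) - n + 1: exactly n characters come back
substr(x, nchar(x) - n + 1, nchar(x))
# [1] "tring"
```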
An alternative to substr is to split the string into a list of single characters and process that:
N <- 2
sapply(strsplit(x, ""), function(x, n) paste(tail(x, n), collapse = ""), N)
I use substr too, but in a different way. I want to extract the last 6 characters of "Give me your food." Here are the steps:
(1) Split the characters
splits <- strsplit("Give me your food.", split = "")
(2) Extract the last 6 characters
tail(splits[[1]], n=6)
Output:
[1] " " "f" "o" "o" "d" "."
Each character can then be accessed by index: store the result, e.g. last6 <- tail(splits[[1]], n=6), and use last6[i], where i runs from 1 to 6.
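If a single string is wanted rather than a vector of characters, the pieces can be glued back together with paste. A minimal sketch (the variable name last6 is only for illustration):

```r
splits <- strsplit("Give me your food.", split = "")
last6 <- tail(splits[[1]], n = 6)

# Individual characters are available by index
last6[2]
# [1] "f"

# Collapse the six characters back into one string
paste(last6, collapse = "")
# [1] " food."
```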
Someone above posted a similar solution, but I find it easier to think of it as below:
> text <- "some text in a string" # we want only the last word, "string", which has 6 letters
> n <- 5 # the character at the start position is included, so n is one less than the 6 letters we want
> substr(x=text, start=nchar(text)-n, stop=nchar(text))
This will bring the last characters as desired.
For those coming from Microsoft Excel or Google Sheets, you will have seen functions like LEFT(), RIGHT(), and MID(). I have created a package called forstringr, and its development version is currently on GitHub.
if(!require("devtools")){
install.packages("devtools")
}
devtools::install_github("gbganalyst/forstringr")
library(forstringr)
str_left(): counts from the left and extracts n characters
str_right(): counts from the right and extracts n characters
str_mid(): extracts characters from the middle
Examples:
x <- "some text in a string"
str_left(x, 4)
[1] "some"
str_right(x, 6)
[1] "string"
str_mid(x, 6, 4)
[1] "text"
I used the following code to get the last character of a string.
substr(stringOfInterest, nchar(stringOfInterest), nchar(stringOfInterest))
You can adjust the start position, nchar(stringOfInterest), to get the last few characters instead.
A small modification of @Andrie's solution also gives the complement:
substrR <- function(x, n) {
if(n > 0) substr(x, (nchar(x)-n+1), nchar(x)) else substr(x, 1, (nchar(x)+n))
}
x <- "moSvmC20F.5.rda"
substrR(x,-4)
[1] "moSvmC20F.5"
That was what I was looking for. The same idea extends to the left side:
substrL <- function(x, n){
if(n > 0) substr(x, 1, n) else substr(x, -n+1, nchar(x))
}
substrL(substrR(x,-4),-2)
[1] "SvmC20F.5"
In case a range of characters needs to be picked:
# For example, to get the date part from the string
substrRightRange <- function(x, m, n){substr(x, nchar(x)-m+1, nchar(x)-m+n)}
value <- "REGNDATE:20170526RN"
substrRightRange(value, 10, 8)
[1] "20170526"
Is it possible to extract words from a string starting with $ in R?
x <- c("$abc", "abc", "$123", "456")
desired results
(case 1)
[1] "$abc", "$123"
or even better (case 2)
[1] "$abc"
Thanks
We can use str_detect from stringr
library(stringr)
x[str_detect(x, "^\\$[A-Za-z]")]
#[1] "$abc" "$AC-DC"
data
x <- c("$abc", "abc", "$123", "456", "$AC-DC", "A-Z")
The startsWith function (base R) returns, for each element, TRUE if the value starts with the string given as a parameter and FALSE otherwise, so you could do something like this:
x[startsWith(x,"$")]
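startsWith compares the prefix literally, so "$" needs no escaping here. For case 2 (a dollar sign followed by a letter), one possible base R sketch combines it with grepl; the combination is my own illustration, not part of the original answer:

```r
x <- c("$abc", "abc", "$123", "456")

# Case 1: every element that starts with a literal "$"
x[startsWith(x, "$")]
# [1] "$abc" "$123"

# Case 2: additionally require the second character to be a letter
x[startsWith(x, "$") & grepl("^.[A-Za-z]", x)]
# [1] "$abc"
```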
This is Python code, but it gives an idea of how the same thing can work in R; the logic is the same:
L = ["$abc", "abc", "$123", "456"]
for i in L:
    if i.startswith("$"):
        print(i)
I created a list named L, then used a for loop to go through the strings in the list one by one and print each one that starts with $.
Using grep:
x <- c("$abc", "abc", "$123", "456", "$AC-DC", "A-Z")
grep("^\\$[A-Za-z]", x, value=TRUE)
#[1] "$abc" "$AC-DC"
^ means starts with.
\\$ means search for literal $.
[A-Za-z] means any letter.
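To see why the escaping matters, it may help to compare the escaped and unescaped dollar sign side by side. A small sketch using the same x as above:

```r
x <- c("$abc", "abc", "$123", "456", "$AC-DC", "A-Z")

# Unescaped, $ is the end-of-string anchor, so every element matches
grepl("$", x)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE

# Escaped, it matches only a literal dollar sign
grepl("\\$", x)
# [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE

# fixed = TRUE switches regex interpretation off, so no escaping is needed
grepl("$", x, fixed = TRUE)
# [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
```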
I have to extract parts of a string in R based on a symbol and a word. I have a name such as
s <-"++can+you+please-help +me"
and the output would be:
"+ can" "+you" "+please" "-help" "+me"
where each word is shown with the symbol that precedes it. I've tried the strsplit and sub functions, but I'm struggling to get the output that I want. Can you please help me? Thanks!
Do
library(stringi)
result = unlist(stri_match_all(regex = "\\W\\w+",str = s))
Result
> result
[1] "+can" "+you" "+please" "-help" "+me"
No symbols
If you only want the words (no symbols), do:
result = unlist(stri_match_all(regex = "\\w+",str = s))
result
[1] "can" "you" "please" "help" "me"
Here is one option using base R
regmatches(s, gregexpr("[[:punct:]]\\w+", s))[[1]]
#[1] "+can" "+you" "+please" "-help" "+me"
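Here gregexpr finds the position of every match and regmatches pulls the matched text out. If the symbol and the word are needed separately afterwards, the tokens can be split with substr and substring. A sketch (the names tokens, symbols and words are only for illustration):

```r
s <- "++can+you+please-help +me"

# One punctuation character followed by one or more word characters
tokens <- regmatches(s, gregexpr("[[:punct:]]\\w+", s))[[1]]

# First character of each token is the symbol, the rest is the word
symbols <- substr(tokens, 1, 1)
words <- substring(tokens, 2)

symbols
# [1] "+" "+" "+" "-" "+"
words
# [1] "can"    "you"    "please" "help"   "me"
```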
I am trying to combine two strings that contain quotation marks ("") into a character vector, for example with R's paste function, so that I can pass the result as the argument x of writeFormula in the openxlsx package.
An example is like this
paste('HYPERLINK("file)',':///"&path!$C$1&TRIM(MID(CELL("filename",B',sep="")
and I hope that it should produce the result like this
HYPERLINK("file:///"&path!$C$1&TRIM(MID(CELL("filename",B
but it actually produces the result with a backslash in front of the ":
[1] "HYPERLINK(\"file):///\"&path!$C$1&TRIM(MID(CELL(\"filename\",B"
I have searched for many potential solutions, such as replacing paste with cat or wrapping paste in noquote, but then the output is not a character vector. Functions like toString or as.character convert these results to strings, but the backslashes come back as well.
I'd really appreciate any help with this. Thanks.
There are no backslashes in p. The backslashes you see are just how R displays a quote (so that you know that the quote is part of the string and not the ending delimiter) but are not in the string itself.
p <- paste0('HYPERLINK("file)', ':///"&path!$C$1&TRIM(MID(CELL("filename",B')
p
## [1] "HYPERLINK(\"file):///\"&path!$C$1&TRIM(MID(CELL(\"filename\",B"
# no backslashes are found in p
grepl("\\", p, fixed = TRUE)
## [1] FALSE
noquote(p), cat(p, "\n") or writeLines(p) can be used to display the string without the backslash escapes:
noquote(p)
## [1] HYPERLINK("file):///"&path!$C$1&TRIM(MID(CELL("filename",B
cat(p, "\n")
## HYPERLINK("file):///"&path!$C$1&TRIM(MID(CELL("filename",B
writeLines(p)
## HYPERLINK("file):///"&path!$C$1&TRIM(MID(CELL("filename",B
One can see the individual characters separated by spaces like this. We see that there are no backslashes:
do.call(cat, c(strsplit(p, ""), "\n"))
## H Y P E R L I N K ( " f i l e ) : / / / " & p a t h ! $ C $ 1 & T R I M ( M I D ( C E L L ( " f i l e n a m e " , B
As another example, p2 here contains one double quote and consists of a single character, not two:
p2 <- '"'
p2
## [1] "\""
nchar(p2)
## [1] 1
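Another way to convince yourself is to look at the character codes. In this sketch utf8ToInt converts the string to code points; 34 is the double quote and 92 would be a backslash:

```r
p <- paste0('HYPERLINK("file)', ':///"&path!$C$1&TRIM(MID(CELL("filename",B')
codes <- utf8ToInt(p)

# Code point 92 (backslash) does not occur anywhere in p
any(codes == 92L)
# [1] FALSE

# Code point 34 (double quote) occurs four times
sum(codes == 34L)
# [1] 4
```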
In R (3.2.2):
print('\*') #triggers an error
print('\\*') #prints '\\*' (two backslashes)
How do I print '\*' (1 backslash)? I've read FAQ 7.3.7 and various answers here about printing literal backslashes, but they all seem to say 'use two backslashes', which doesn't work.
In case it matters, what I really want to do is
str <- sprintf("complex string \* %s %s %s",other,complex,strings),
so I can't use
cat('\\*') which does produce '\*'
I think you are after message or cat, which prints to the screen
R> message("\\*")
\*
R> cat('\\*')
\*
The print function returns an object (note the [1])
R> print('\\*')
[1] "\\*"
which you can then pass. So
## Doesn't work
R> m1 = message("\\*")
\*
R> m1
NULL
## Good to go
R> m2 = print('\\*')
[1] "\\*"
R> m2
[1] "\\*"
Hence to use with sprintf, we have
str <- sprintf("complex string \\* %s","test")
message(str)
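To confirm that the string built by sprintf holds a single backslash, compare its length and its raw output with what print shows. A sketch, with placeholder values standing in for the asker's other, complex and strings:

```r
# Placeholder values; the real ones come from the asker's code
other <- "a"
complex <- "b"
strings <- "c"

msg <- sprintf("complex string \\* %s %s %s", other, complex, strings)

# "\\*" in source code is two characters: one backslash and one asterisk
nchar("\\*")
# [1] 2

# writeLines shows the actual contents; print shows the escaped form
writeLines(msg)
# complex string \* a b c
print(msg)
# [1] "complex string \\* a b c"
```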