How to use grep to match exactly with space? - r

I have a list as following:
S = "Alicia Chang"
N=c("Alicia Chang", "Heather May", "Alicia Chang J")
I want to use grep to return only the first one. How can I do that? When I use grep(S, N), it returns both the first and the third element. When I use grep(^S$, N), it gives me an error.

We need to use paste to create the pattern for grep.
grep(paste0('^', S, '$'), N)
#[1] 1
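As a small sketch of why this works: ^ and $ are regular-expression anchors, so they have to be part of the pattern string, while S is an R variable whose value is pasted in between them. If the goal is only an exact comparison, plain equality may be simpler, because it never interprets characters in S as regex syntax. The example reuses S and N from the question.

```r
S <- "Alicia Chang"
N <- c("Alicia Chang", "Heather May", "Alicia Chang J")

# Anchored regular expression: ^ and $ pin the match to the whole string
idx_regex <- grep(paste0("^", S, "$"), N)
idx_regex
#[1] 1

# Plain equality gives the same index without treating S as a regex
idx_equal <- which(N == S)
idx_equal
#[1] 1

# Values instead of indices
N[N == S]
#[1] "Alicia Chang"
```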

Related

Find the line of the match character

I would like to find the rows in which a given string occurs.
The expected answer would be the 4 row numbers whose symbol contains the string BTC.
library(stringr)
library(quantmod)
symbols <- stockSymbols()
symbols <- symbols[,1]
u <- symbols
a <- "BTC"
str_detect(a, u)
table(str_detect(a, u))
We could use grepl with which:
which(grepl(a, u))
You could either use the tidyverse way, using the filter() function:
filter(dataset, column == "BTC")
Or using the grep() function from base R:
grep("BTC", dataset$column)
That will give you the index (i.e. the position) of what you are looking for.
Another base R option might be which + regexpr (but I think grep or grepl is obviously more efficient and straightforward)
which(regexpr(a, u)>0)
You can use grep to get the index where pattern a occurs.
#Index
grep(a, u)
#[1] 3437
#Value
grep(a, u, value = TRUE)
#[1] "EBTC"
Using stringr :
library(stringr)
#Index
str_which(u, a)
#Value
str_subset(u, a)
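To compare the base R options side by side without downloading the real symbol list, here is a minimal sketch on a made-up vector. The symbols in u below are hypothetical stand-ins, not the output of stockSymbols().

```r
# Hypothetical stand-in for the real symbol vector
u <- c("AAPL", "BTCS", "EBTC", "MSFT", "GBTC")
a <- "BTC"

# Positions of the elements that contain the pattern
idx <- grep(a, u)
idx
#[1] 2 3 5

# Same positions via a logical vector
idx2 <- which(grepl(a, u))

# The matching values themselves
vals <- grep(a, u, value = TRUE)
vals
#[1] "BTCS" "EBTC" "GBTC"

# fixed = TRUE treats the pattern as a literal string, not a regex
idx_fixed <- grep(a, u, fixed = TRUE)

# Exact equality finds nothing here, since no symbol is exactly "BTC"
exact <- which(u == a)
```

Note that stringr functions take the vector first and the pattern second, so the call in the question would need to be str_detect(u, a) rather than str_detect(a, u).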

Substitute/Placeholder Variable in R Syntax

I have the following problem:
I need the same syntax over and over again for different variable-sets.
They all have the same "core" name (for example: variable_1) and different suffixes like:
variable_1_a, variable_1_b, variable_1_c, variable_1_d, variable_1_e, ...
Since the syntax is long and I need to run it for (for example) variables _2, _3, _4, _5 and so on, I was wondering whether there is some form of placeholder expression I could define with the "core" name I want to run it with each time, instead of copy-pasting the whole syntax and replacing every "variable_1" with the next core name.
For example, saving the core name in a term !XY! (the "!" just marks it as something atypical) and using that term throughout the syntax with "_a", "_b", "_c" attached:
!XY!_a, !XY!_b, !XY!_c, !XY!_d, !XY!_e, ...
I played around with saving the core-name in an element called XY and pasting it with the endings:
XY <- "variable_1"
paste0(as.character(XY),"_a")
"variable_1_a"
OR
as.symbol(paste0(as.character(XY),"_a"))
variable_1_a
Of course that looks horribly long, but I would accept that if the result could also be used like a variable, for example to read or write it. Instead, writing to it results in an error:
as.symbol(paste0(as.character(XY),"_a")) <- "test"
Error in as.symbol(paste0(as.character(XY),"_a")) <- "test" :
could not find function "as.symbol<-"
It would be a huge time-saver if there is a chance to write one syntax to fit all procedures!
Thx a lot for your ideas!
Let's assume you have 5 variables ("variable_1", "variable_2" etc) and 4 letters ("_a", "_b" etc).
We can use outer like this:
n <- 1:5
l <- letters[1:4]
c(outer(n, l, function(x, y) paste("variable", x, y, sep = "_")))
#Or a bit shorter :
#paste0("variable_", c(outer(n, l, paste, sep = "_")))
#[1] "variable_1_a" "variable_2_a" "variable_3_a" "variable_4_a"
#[5] "variable_5_a" "variable_1_b" "variable_2_b" "variable_3_b"
#[9] "variable_4_b" "variable_5_b" "variable_1_c" "variable_2_c"
#[13] "variable_3_c" "variable_4_c" "variable_5_c" "variable_1_d"
#[17] "variable_2_d" "variable_3_d" "variable_4_d" "variable_5_d"
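The names built this way are only strings. To read or write an object through such a string, base R offers assign() and get(). The following is a minimal sketch, assuming the variables live in the current environment; the suffixes and the value "test" are just for illustration. A named list is often easier to loop over than separate objects, so that variant is shown too.

```r
XY <- "variable_1"
suffixes <- c("a", "b", "c")

# Build the full names from the core name
var_names <- paste(XY, suffixes, sep = "_")

# Write: assign() creates or overwrites the object named by the string
assign(var_names[1], "test")

# Read: get() looks the object up by its name
get(var_names[1])
#[1] "test"

# Alternative: keep the values in one named list instead of separate objects
vals <- setNames(vector("list", length(var_names)), var_names)
vals[[paste0(XY, "_a")]] <- "test"
vals[["variable_1_a"]]
#[1] "test"
```

Wrapping the repeated syntax in a function that takes the core name as an argument, and calling it once per core name, avoids the copy-pasting altogether.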

Write variable length strings in single console line

I use RStudio. Within a loop, I want to display in a single console line a string of variable length. I am using cat(). If I use \n, different lines are written (not what I want):
A <- c("AAAAA","BBB","C")
for (i in 1:3){cat(A[i],"\n"); Sys.sleep(1)}
AAAAA
BBB
C
The use of \r works well when names are of the (nearly) same length, but in this case, the result is again not what I want:
for (i in 1:3){cat(A[i],"\r"); Sys.sleep(1)}
C B A
whereas only the string "C" should remain when the loop is finished.
I have also tried deleting many spaces with \b, but the length difference is large and many times the information is written one line above the current console line.
Is there a simple way to do this? (base R preferred)
Edit: What I want is that, in a single line, first the string "AAAAA" appears. After one second, only the string "BBB" should appear (not "BBB A"). After one second, only the string "C" should appear (not "C B A").
Your current method works if you first pad all the strings to the length of the longest one:
A <- c("AAAAA","BBB","C")
max_length = max(nchar(A))
A_filled = stringr::str_pad(A, max_length, side = "right")
for (i in 1:3){cat(A_filled[i],"\r"); Sys.sleep(1)}
To pad the strings in base R you can use sprintf:
max_length = max(nchar(A))
pad_format = paste0("%-", max_length, "s")
A_filled = sprintf(pad_format, A)
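Putting the base R pieces together, a complete loop might look like the sketch below. The function name show_in_place and its delay argument are hypothetical, and flush.console() is added on the assumption that the console buffers output. How "\r" is rendered can also depend on the console in use.

```r
# Hypothetical helper: show each string in turn on the same console line
show_in_place <- function(x, delay = 1) {
  # Left-justify every string to the width of the longest one,
  # so shorter strings overwrite the leftovers of longer ones
  pad_format <- paste0("%-", max(nchar(x)), "s")
  x_filled <- sprintf(pad_format, x)
  for (s in x_filled) {
    cat("\r", s, sep = "")  # return to the line start, then overwrite
    flush.console()         # push the output out immediately
    Sys.sleep(delay)
  }
  cat("\n")
  invisible(x_filled)
}

A <- c("AAAAA", "BBB", "C")
padded <- show_in_place(A, delay = 0.2)
```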
I assume you want each string shown in turn. This is a base R solution:
A <- c("AAAAA","BBB","C")
x <-formatC(A, width = -max(nchar(A)))
for (i in 1:3){cat("\r",x[i]); Sys.sleep(1)}
If you just want the strings one after another on the same line, a simple cat should work fine:
> for (i in 1:3){cat(A[i], " "); Sys.sleep(1)}
AAAAA BBB C
> for (i in 1:3){cat(A[i]); Sys.sleep(1)}
AAAAABBBC

A better way to extract functions from an R script?

Say I have a file "myfuncs.R" with a few functions in it:
A <- function(x) x
B <- function(y) y
C <- function(z) z
I want to place all the functions contained within "myfuncs.R" into their own files, named appropriately. I have a simple Bash-shell script to extract functions and place them in separate files:
split -p "function\(" myfuncs.R tmpfunc
grep "function(" tmpfunc* | awk '{
# strip first-instances of function assignment
sub("<-", " ")
sub("=", " ")
sub(":", " ") # and colon introduced by grep
mv=$1
mvto=sprintf("func_%s.R",$2)
print "mv", mv, mvto
}' | sh
leaving me with:
func_A.R
func_B.R
func_C.R
But, this script has obvious limitations. For example, it will misbehave when function 'A' has a nested function:
A <- function(x){
Aa <- function(x){x}
return(Aa)
}
and outright fails if the whole function is on a single line.
Does anyone know of a more robust, and less error-prone method to do this?
Source your functions and then call package.skeleton().
It writes a separate file for each function into the R/ directory of the package skeleton it creates.
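If a full package skeleton is more than needed, a similar effect can be sketched with base R alone: evaluate the script in a private environment and dump() each top-level function to its own file. The helper name split_functions and the func_ prefix are assumptions that mirror the question's naming. Note that this runs any top-level code in the script, and that dump() writes the deparsed function, so formatting and comments may differ from the original source.

```r
# Hypothetical helper: write each top-level function in `path` to its own file
split_functions <- function(path, out_dir = tempdir()) {
  env <- new.env()
  # Evaluate the script privately so nothing leaks into the workspace
  sys.source(path, envir = env)
  # Keep only the objects that are functions
  fun_names <- Filter(function(nm) is.function(get(nm, envir = env)), ls(env))
  for (nm in fun_names) {
    dump(nm, file = file.path(out_dir, sprintf("func_%s.R", nm)), envir = env)
  }
  invisible(fun_names)
}

# Usage on a temporary copy of the example file from the question
src <- tempfile(fileext = ".R")
writeLines(c("A <- function(x) x",
             "B <- function(y) y",
             "C <- function(z) z"), src)
out <- tempfile("funcs")
dir.create(out)
split_functions(src, out)
created <- list.files(out)
```

Because only top-level objects are written, a function nested inside A stays inside func_A.R, and single-line definitions are handled like any other.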

grep at the beginning of the string with fixed =T in R?

How to grep with fixed=T, but only at the beginning of the string?
grep("a.", c("a.b", "cac", "sss", "ca.f"), fixed = T)
# 1 4
I would like to get only the first occurrence.
[Edit: the string to match is not known in advance, and can be anything. "a." is just for the sake of example]
Thanks.
[Edit: I sort of solved it now, but any other ideas are highly welcome. I will accept as an answer any alternative solution.
s <- "a."
res <- grep(s, c("a.b", "cac", "sss", "ca.f"), fixed = T, value = T)
res[substring(res, 1, nchar(s)) == s]
]
If you want to match an exact string (string 1) at the beginning of the string (string 2), then just subset your string 2 to be the same length as string 1 and use ==, should be fairly fast.
Actually, Greg -and you- have mentioned the cleanest solution already. I would even drop the grep altogether:
> name <- "a#"
> string <- c("a#b", "cac", "sss", "ca#f")
> string[substring(string, 1, nchar(name)) == name]
[1] "a#b"
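In R 3.3.0 and later, base R also provides startsWith(), which performs this fixed prefix comparison directly. A short sketch on the question's data:

```r
name <- "a."
string <- c("a.b", "cac", "sss", "ca.f")

# Logical vector: TRUE where the element begins with the literal prefix
hits <- startsWith(string, name)
hits
#[1]  TRUE FALSE FALSE FALSE

# Indices, as grep() would return them
which(hits)
#[1] 1

# Matching values, as grep(..., value = TRUE) would return them
string[hits]
#[1] "a.b"
```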
But if you really insist on grep, you can use Dwin's approach, or the following mind-boggling solution:
specialgrep <- function(x,y,...){
grep(
paste("^",
gsub("([].^+?|[#\\-])","\\\\\\1",x)
,sep=""),
y,...)
}
> specialgrep(name,string,value=T)
[1] "a#b"
I may have forgotten to include some characters in the gsub. Be sure to keep the ] symbol first and the - last in the character set, otherwise you'll get errors. Or just forget about it and use your own solution. This one is just for fun's sake :-)
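For a somewhat more complete take on the same idea, the sketch below escapes the usual regex metacharacters before anchoring the pattern with ^. The helper name escape_regex is hypothetical, and perl = TRUE is assumed so that the backslash escapes inside the character class are interpreted as written.

```r
# Hypothetical helper: backslash-escape regex metacharacters in x
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

strings <- c("a.b", "cac", "sss", "ca.f")

# The literal prefix "a." becomes the regex a\. and is anchored at the start
pattern <- paste0("^", escape_regex("a."))
res <- grep(pattern, strings)
res
#[1] 1

# Characters that are not metacharacters are left unchanged
escape_regex("a#")
#[1] "a#"
```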
Do you want to use fixed=T because of the . in the pattern? In that case you can just escape the dot. This would work:
grep("^a\\.", c("a.b", "cac", "sss", "ca.f"))
If you only want the focus on the first two characters, then only present that much information to grep:
> grep("a.", substr(c("a.b", "cac", "sss", "ca.f"), 1,2) ,fixed=TRUE)
[1] 1
You could easily wrap it into a function:
> checktwo <- function (patt,vec) { grep(patt, substr(vec, 1,nchar(patt)) ,fixed=TRUE) }
> checktwo("a.", c("a.b", "cac", "sss", "ca.f") )
[1] 1
I think Dr. G had the key to the solution in his answer, but didn't explicitly call it out: "^" in the pattern specifies "at the beginning of the string". ("$" means at the end of the string)
So a "^a." pattern means "at the beginning of the string, look for an 'a' followed by any one character [the '.']"; his escaped version, "^a\\.", requires that character to be a literal dot.
Or you could just use "^a" as the pattern unless you don't want to match the one character string containing only "a".
Does that help?
Jeffrey
