Misunderstanding sub function - r

I'm working on a Shiny app that loops through an HTML file, replacing each instance of a phrase with a different phrase depending on its position.
That is, the first time "aa" comes, I put "bluh",
the second time "aa" comes, I put "gfgf".
I have a table of all the 2nd phrases in order.
I think I'm misunderstanding the sub function documentation:
The two *sub functions differ only in that sub replaces only the first
occurrence of a pattern whereas gsub replaces all occurrences.
But here is a minimal reproducible example:
tt <- c("aa", "aa","bb","aa")
sub("aa","test",tt)
# [1] "test" "test" "bb" "test"
gsub("aa","test",tt)
# [1] "test" "test" "bb" "test"
tt
# [1] "aa" "aa" "bb" "aa"
I expected
sub("aa","test",tt)
# [1] "test" "aa" "bb" "aa"
so that I could loop through and go:
og.list <- c("aa","cat","aa","cat","aa")
repl.list <- c("the","is","happy")
for(i in 1:3){
og.list <- sub("aa",repl.list[i], og.list)
}
Instead, every "aa" becomes "the". I thought that was what gsub did, but both sub and gsub behave this way.
Thank you.

I think you might want just this:
og.list[og.list == "aa"] <- repl.list
#[1] "the" "cat" "is" "cat" "happy"
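To see why sub appeared to act like gsub in the question, it may help to note that both functions are vectorized: each element of a character vector is treated as a separate string, and "first occurrence" means first within each element. A minimal sketch, using a hypothetical vector tt2 that is not part of the original question:

```r
# tt2 is a hypothetical example; each element is a separate string
tt2 <- c("aa aa", "aa", "bb")

# sub() replaces only the first match inside each element
first_only <- sub("aa", "test", tt2)
print(first_only)

# gsub() replaces every match inside each element
all_matches <- gsub("aa", "test", tt2)
print(all_matches)
```

The two results differ only for the first element, the one string that contains "aa" more than once.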

Thank you Wiktor^.
I now understand that I would need to separate each item into its own string and then sub.
og.list <- c("aa","cat","aa","cat","aa")
repl.list <- c("the","is","happy")
og.index <- grep("aa",og.list)
for (i in seq_along(og.index)) {
  curr.index <- og.index[i]
  og.list[curr.index] <- sub("aa", repl.list[i], og.list[curr.index])
}
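If the HTML content is held as one long string rather than a vector of separate items, the original loop idea does work, because sub then consumes one "aa" per call. The sketch below assumes a hypothetical single string named html and shows two options; the first assumes that no replacement itself contains "aa".

```r
# Hypothetical single string standing in for the HTML content
html <- "aa cat aa cat aa"
repl.list <- c("the", "is", "happy")

# Option 1: repeated sub(); each call replaces the first remaining "aa"
looped <- html
for (r in repl.list) {
  looped <- sub("aa", r, looped, fixed = TRUE)
}
print(looped)

# Option 2: locate all matches once, then assign the replacements in order
replaced <- html
m <- gregexpr("aa", replaced, fixed = TRUE)
regmatches(replaced, m) <- list(repl.list)
print(replaced)
```

Option 2 does not rescan text that has already been replaced, so it may be the safer choice when a replacement could contain the search phrase.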

Related

How to generate AA,AB,AC...,BA,BB,BC...,CA,CB,CC,...,ZZ series in R?

I have to generate an alphabet series (AA,AB,AC...,BA,BB,BC...,CA,CB,CC,...,ZZ) similar to the default column names of an excel file.
I tried it by using the combination function as follows,
combn(LETTERS,2)
However, this did not match with my requirement.
One option is to nest two sapply calls:
as.vector(sapply(LETTERS,function(x) sapply(LETTERS , function(y) paste0(x,y))))
An alternative would be outer
out <- c(outer(LETTERS, LETTERS, paste0))
head(sort(out))
# [1] "AA" "AB" "AC" "AD" "AE" "AF"
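Note that outer fills its result column by column, which is why the answer above sorts the output. As a sketch of another way to get the series directly in AA, AB, AC order without sorting, rep can build the two letter positions explicitly (the name series is just for illustration):

```r
# each = 26 changes the first letter slowly; times = 26 cycles the second letter
series <- paste0(rep(LETTERS, each = 26), rep(LETTERS, times = 26))
head(series)
length(series)
```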

Multiple gsub() expressions in R

I'm trying to clean a column of data from a data frame with many gsub commands.
Some examples would be:
df$col1<-gsub("-00070", "-0070", df$col1)
df$col1<-gsub("-00063", "-0063",df$col1)
df$col1<-gsub("F4", "FA", df$col1)
...
Looking at the column after running these lines of code, it looks like some of the changes have taken, but some have not. Moreover, if I run the block of code with the gsub() commands more changes start taking effect the more I run the block.
I'm very confused by this behavior, any information is appreciated.
There's probably a better way, but you could always use Map
new <- 1:3
old <- letters[1:3]
to.change <- letters[1:10]
Map(function(x, y) to.change <<- gsub(x, y, to.change), old, new)
to.change
# [1] "1" "2" "3" "d" "e" "f" "g" "h" "i" "j"
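The Map call above relies on <<- to modify to.change from inside the function. A possible alternative, sketched here with the same example data, is to thread the vector through Reduce so that each substitution is applied to the result of the previous one. The names new_vals and result are hypothetical, and fixed = TRUE is an assumption that the patterns are literal text rather than regular expressions.

```r
old <- letters[1:3]
new_vals <- as.character(1:3)   # replacements, in the same order as old
to.change <- letters[1:10]

# Apply the substitutions one after another, carrying the vector along
result <- Reduce(
  function(acc, i) gsub(old[i], new_vals[i], acc, fixed = TRUE),
  seq_along(old),
  to.change
)
print(result)
```

Because the substitutions run sequentially, their order can matter when one pattern overlaps with the output of another.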

How do I paste string columns in data.frame [duplicate]

Suppose we have:
mydf <- data.frame(a= LETTERS, b = LETTERS, c =LETTERS)
Now we want to add a new column, containing a concatenation of all columns.
So that rows in the new column read "AAA", "BBB", ...
In my mind the following should work?
mydf[,"Concat"] <- apply(mydf, 1, paste0)
In addition to @akrun's answer, here is a short explanation of why your code didn't work.
What you are passing to paste0 in your code are vectors and here is the behavior of paste and paste0 with vectors:
> paste0(c("A","A","A"))
[1] "A" "A" "A"
Indeed, to concatenate a vector, you need to use argument collapse:
> paste0(c("A","A","A"), collapse="")
[1] "AAA"
Consequently, your code should have been:
> apply(mydf, 1, paste0, collapse="")
[1] "AAA" "BBB" "CCC" "DDD" "EEE" "FFF" "GGG" "HHH" "III" "JJJ" "KKK" "LLL" "MMM" "NNN" "OOO" "PPP" "QQQ" "RRR" "SSS" "TTT" "UUU" "VVV"
[23] "WWW" "XXX" "YYY" "ZZZ"
We can use do.call with paste0 for faster execution
mydf[, "Concat"] <- do.call(paste0, mydf)
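The do.call version works because a data frame is a list of columns, so do.call passes each column to paste0 as a separate argument and the pasting happens element-wise across rows. A small sketch with a hypothetical two-row data frame named small:

```r
# Hypothetical two-row data frame for illustration
small <- data.frame(a = c("A", "B"), b = c("A", "B"), c = c("A", "B"))

# do.call() expands the columns into separate arguments of paste0()
via_do_call <- do.call(paste0, small)

# Equivalent call written out by hand
explicit <- paste0(small$a, small$b, small$c)

print(via_do_call)
print(explicit)
```

One caveat: because the columns are passed by name, a column called collapse would be taken as the collapse argument of paste0 rather than as data.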

Generate all possible permutations (or n-tuples)

I'd like to create a data.frame of all possible permutations of 10 variables that can be either 1 or 2
2*2*2*2*2*2*2*2*2*2 = 1024 # possible
1,1,1,1,1,1,1,1,1,1
1,2,1,1,1,1,1,1,1,1
1,2,2,1,1,1,1,1,1,1
1,2,2,2,1,1,1,1,1,1
...
Is there a "quick" way to do this in R?
How about this:
tmp = expand.grid(1:2,1:2,1:2,1:2,1:2,1:2,1:2,1:2,1:2,1:2)
or this (thanks Tyler):
x <- list(1:2)
tmp = expand.grid(rep(x, 10))
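As a quick sanity check on the expand.grid approach, the sketch below confirms that the result has one row per combination and that no row is repeated. The variable name grid is hypothetical.

```r
# Ten variables, each taking the values 1 or 2
grid <- expand.grid(rep(list(1:2), 10))

dim(grid)            # rows = combinations, columns = variables
nrow(unique(grid))   # number of distinct rows
```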
Some people have asked the question regarding letters, such as here. The expand.grid solution is usually given, but I find this to be much simpler:
library(magrittr)  # provides the %>% pipe
sapply(LETTERS[1:3], function(x){paste0(x, LETTERS[1:3])}) %>% c()
# [1] "AA" "AB" "AC" "BA" "BB" "BC" "CA" "CB" "CC"

In R, how can a string be split without using a separator

I am trying the split method, and I want to get the second element of a string that contains only 2 elements. The length of the string is 2.
Example:
string = "AC"
The result should be a split after the first letter ("A"), so that I get:
res =
     [,1] [,2]
[1,] "A"  "C"
I tried it with split, but I have no idea how to split after the first element.
strsplit() will do what you want (if I understand your question). You need to split on "" to split the string into its individual characters. Here is an example showing how to do what you want on a vector of strings:
strs <- rep("AC", 3) ## your string repeated 3 times
Next, split each of the three strings:
sstrs <- strsplit(strs, "")
which produces
> sstrs
[[1]]
[1] "A" "C"
[[2]]
[1] "A" "C"
[[3]]
[1] "A" "C"
This is a list, so we can process it with lapply() or sapply(). We need to subset each element of sstrs to select the second element. For this we apply the [ function:
sapply(sstrs, `[`, 2)
which produces:
> sapply(sstrs, `[`, 2)
[1] "C" "C" "C"
If all you have is one string, then
strsplit("AC", "")[[1]][2]
which gives:
> strsplit("AC", "")[[1]][2]
[1] "C"
split isn't used for this kind of string manipulation. What you're looking for is strsplit, which in your case would be used something like this:
strsplit(string,"",fixed = TRUE)
You may not need fixed = TRUE, but it's a habit of mine as I tend to avoid regular expressions. You seem to indicate that you want the result to be something like a matrix. strsplit will return a list, so you'll want something like this:
strsplit(string,"",fixed = TRUE)[[1]]
and then pass the result to matrix.
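A minimal sketch of that last step, assuming a one-row matrix is the desired shape; the names res and res_many are hypothetical. For several strings of equal length, the list returned by strsplit can be row-bound instead.

```r
string <- "AC"

# Single string: take the first list element and shape it as a 1 x 2 matrix
res <- matrix(strsplit(string, "", fixed = TRUE)[[1]], nrow = 1)
print(res)

# Several strings of equal length: one row per string
strs <- rep("AC", 3)
res_many <- do.call(rbind, strsplit(strs, "", fixed = TRUE))
print(res_many)
```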
If you are sure that it's always a two-character string (check with all(nchar(x)==2)) and you only want the second character, then you could use sub or substr:
x <- c("ab", "12")
sub(".", "", x)
# [1] "b" "2"
substr(x, 2, 2)
# [1] "b" "2"
