R: Apostrophes in recode()

I am using the recode() function in the car package to recode an integer variable in a data frame. I am trying to recode one of the values to a string that contains a single apostrophe ('). However, this does not work. I imagine it is because the apostrophe prematurely ends the quoted value. So I tried to escape it with \', but that doesn't work either.
I would prefer to continue using recode() but if that is not an option, alternatives are welcome.
A working example:
# Load the car and dplyr packages
library(car)
library(dplyr)
# Set up df
a <- seq(1:3)
b <- rep(9,3)
df <- cbind(a,b) %>% as.data.frame(.)
# Below works because none of the recoding includes an apostrophe:
recode(df$a, "1 = 'foo'; 2 = 'bar'; 3 = 'foobar'")
# Below doesn't work due to apostrophe in foofoo's:
recode(df$a, "1 = 'foo'; 2 = 'bar'; 3 = 'foofoo's'")
# Escaping doesn't fix it:
recode(df$a, "1 = 'foo'; 2 = 'bar'; 3 = 'foofoo\'s'")

We could wrap the values in escaped double quotes instead, so that the apostrophe no longer ends the value
recode(df$a, "1 = \"foo\"; 2 = \"bar\"; 3 = \"foofoo's\"")
#[1] "foo" "bar" "foofoo's"
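The reason \' does not help is that R resolves the escape while reading the outer double-quoted string, so recode() never sees a backslash. A minimal base R sketch of this, assuming recode() parses each right-hand side as R code (the variable names are only illustrative):

```r
# "\'" inside a double-quoted string is just an apostrophe,
# so both specifications are the very same string.
with_escape    <- "3 = 'foofoo\'s'"
without_escape <- "3 = 'foofoo's'"
same_string <- identical(with_escape, without_escape)
same_string
# [1] TRUE

# A value wrapped in escaped double quotes is a valid R string literal
# even though it contains an apostrophe.
value <- eval(parse(text = "\"foofoo's\""))
value
# [1] "foofoo's"

# The single-quoted version is not valid R code and fails to parse.
bad <- try(parse(text = "'foofoo's'"), silent = TRUE)
inherits(bad, "try-error")
# [1] TRUE
```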
A base R alternative would be to use the df$a values as numeric index to replace those values
df$a <- c("foo", "bar", "foobar's")[df$a]
df$a
#[1] "foo" "bar" "foobar's"
Now suppose the values are not numeric and not sequential.
set.seed(24)
v1 <- sample(LETTERS[1:3], 10, replace=TRUE)
v1
#[1] "A" "A" "C" "B" "B" "C" "A" "C" "C" "A"
as.vector(setNames(c("foo", "bar", "foobar's"), LETTERS[1:3])[v1])
#[1] "foo" "foo" "foobar's" "bar" "bar" "foobar's"
#[7] "foo" "foobar's" "foobar's" "foo"
Here, we replace "A" with "foo", "B" with "bar" and "C" with "foobar's". To do that, create a named key/value vector to replace values in 'v1'.
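One thing to watch with a named key/value vector is that values missing from the key come back as NA. A small sketch of keeping the original value in that case; the vector v2 and the value "D" are made up for illustration:

```r
# Named key/value vector: names are the old values, elements the new ones.
key <- c(A = "foo", B = "bar", C = "foobar's")

# Hypothetical input with a value ("D") that has no entry in the key.
v2 <- c("A", "D", "C")

# Lookup by name; unmatched names give NA.
recoded <- unname(key[v2])

# Fall back to the original value where nothing matched.
unmatched <- is.na(recoded)
recoded[unmatched] <- v2[unmatched]
recoded
# [1] "foo"      "D"        "foobar's"
```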

Related

Working with names and values of objects in a list in R using loops

How do you retrieve the names of the objects of a list in a loop? I want to do something like this:
lst = list(a = c(1,2), b = 1)
for(x in lst){
#print the name of the object x in the list
# print the multiplication of the values
}
Desired results:
"a"
2 4
"b"
2
In Python one can use dictionary and with the below code get the desired results:
lst = {"a":[1,2,3], "b":1}
for key, value in lst.items():
    print(key)
    print(value * 2)
But since R has no dictionary data structure, I am trying to achieve this using lists, and I don't know how to return the objects' names. Any help would be appreciated.
We can get the names directly
names(lst)
[1] "a" "b"
Or, if we want to print in a loop, we can loop over the sequence or the names of the list, print each name, and then print the list element extracted by that name, multiplied by 2
for(nm in names(lst)) {
print(nm)
print(lst[[nm]] * 2)
}
[1] "a"
[1] 2 4
[1] "b"
[1] 2
Or another option is iwalk
library(purrr)
iwalk(lst, ~ {print(.y); print(.x * 2)})
[1] "a"
[1] 2 4
[1] "b"
[1] 2
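For completeness, the "loop over the sequence" variant mentioned above can be sketched with seq_along(), and lapply() keeps the names if the doubled values are needed later rather than printed. This is only a sketch using the same lst as in the question:

```r
lst <- list(a = c(1, 2), b = 1)

# Loop over positions; names(lst)[i] gives the name, lst[[i]] the value.
for (i in seq_along(lst)) {
  print(names(lst)[i])
  print(lst[[i]] * 2)
}
# [1] "a"
# [1] 2 4
# [1] "b"
# [1] 2

# lapply() returns a list that keeps the original names.
doubled <- lapply(lst, function(x) x * 2)
names(doubled)
# [1] "a" "b"
```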

how to merge a list with a datatable while keeping the list format as output

I have a list in R whose elements are character vectors.
Example:
library(data.table)
library(stringr)
## -- reference data
tr_data <- data.table(code = c("S00000170","K00000178","S00000164","S00000167"), name = c("A","B","C","D"))
## -- mylist to join a reference
data <- c("S00000170,K00000178,S00000164","K00000178,S00000167")
mylist <- str_split(data, ',')
mylist
# [[1]]
# [1] "S00000170" "K00000178" "S00000164"
#
# [[2]]
# [1] "K00000178" "S00000167"
I would like to merge mylist with tr_data and keep the list format
## -- my output
mylist_name
# [[1]]
# [1] "A" "B" "C"
#
# [[2]]
# [1] "B" "D"
I know I could use a for loop but is there a better and faster way to do this?
You can use match, i.e.
lapply(mylist, function(i)tr_data$name[match(i, tr_data$code)])
#[[1]]
#[1] "A" "B" "C"
#[[2]]
#[1] "B" "D"
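The same lookup can also be written with a named vector instead of match(). The sketch below uses a plain data.frame and strsplit() so that it runs without data.table or stringr; that substitution is an assumption made for self-containment, not part of the original answer:

```r
# Reference table (plain data.frame stand-in for the data.table above).
tr_data <- data.frame(
  code = c("S00000170", "K00000178", "S00000164", "S00000167"),
  name = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

data <- c("S00000170,K00000178,S00000164", "K00000178,S00000167")
mylist <- strsplit(data, ",", fixed = TRUE)

# Named vector: names are the codes, values are the names to return.
lookup <- setNames(tr_data$name, tr_data$code)

# Index the lookup by code for each list element; drop the names afterwards.
mylist_name <- lapply(mylist, function(codes) unname(lookup[codes]))
mylist_name
```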
Another option is to convert data into a data.table, set a key/index, then perform a join (stri_split_fixed comes from the stringi package)
DT <- data.table(code=unlist(l <- stri_split_fixed(data, ',')),
g=rep(seq_along(l), lengths(l)))
setindex(DT, code)
setkey(tr_data, code)
tr_data[DT, on=.(code)][,
.(.(name)) , g]$V1
Or
setindex(DT, code)
setkey(tr_data, code)
DT[tr_data, on=.(code), name := name][,
.(.(name)), g]$V1

How to Apply String Vector to Logical Vector

I would like to keep the elements of a string vector wherever a logical vector of the same length is TRUE, and use an empty string everywhere else.
For example, I would like to combine:
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")
to produce:
c("A", "", "C")
I know that:
my_string[my_logical]
gives:
"A" "C"
but I can't figure out how to return a vector of the same length. My first thought was to simply multiply the vectors together, but that raises the error "non-numeric argument to binary operator."
Another option with replace
replace(my_string, !my_logical, "")
#[1] "A" "" "C"
What about:
my_logical <- c(TRUE, FALSE, TRUE)
my_string <- c("A", "B", "C")
my_replace <- ifelse(my_logical==TRUE,my_string,'')
my_replace
[1] "A" "" "C"
Edit, thanks @www:
ifelse(my_logical, my_string, "")
Maybe:
my_string[ !my_logical ] <- ""
my_string
# [1] "A" "" "C"
Of course, this overwrites the existing object.
Use ifelse to add NA when my_logical equals FALSE (TRUE otherwise). Use this to subset.
new <- my_string[ifelse(!my_logical, NA, TRUE)]
new
[1] "A" NA "C"
If you want "" instead of NA, do this next.
new[is.na(new)] <- ""
new
[1] "A" "" "C"
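The answers above all come down to blanking the positions where the logical vector is FALSE. A small sketch that wraps this in a helper without overwriting the input; the function name keep_where and its fill argument are hypothetical:

```r
# Hypothetical helper: keep strings where `keep` is TRUE, use `fill` elsewhere.
keep_where <- function(strings, keep, fill = "") {
  stopifnot(length(strings) == length(keep))
  out <- strings          # work on a copy, the input stays untouched
  out[!keep] <- fill
  out
}

my_logical <- c(TRUE, FALSE, TRUE)
my_string  <- c("A", "B", "C")

result <- keep_where(my_string, my_logical)
result
# [1] "A" ""  "C"
```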

R: trim consecutive trailing and leading special characters from set of strings

I have a list of character vectors, all of equal length. Example data:
> a = list('**aaa', 'bb*bb', 'cccc*')
> a = sapply(a, strsplit, '')
> a
[[1]]
[1] "*" "*" "a" "a" "a"
[[2]]
[1] "b" "b" "*" "b" "b"
[[3]]
[1] "c" "c" "c" "c" "*"
I would like to identify the indices of all leading and trailing consecutive occurrences of the character *. Then I would like to remove these indices from all three vectors in the list. By trailing and leading consecutive characters I mean e.g. either only a single occurrence as in the third one (cccc*) or multiple consecutive ones as in the first one (**aaa).
After the removal, all three character vectors should still have the same length.
So the first two and the last character should be removed from all three vectors.
[[1]]
[1] "a" "a"
[[2]]
[1] "*" "b"
[[3]]
[1] "c" "c"
Note that the second vector of the desired result will still have a leading *. It only became the first character after the operation, so it should stay in.
I tried using which to identify the indices (sapply(a, function(x)which(x=='*'))) but this would still require some code to detect the trailing ones.
Any ideas for a simple solution?
I would replace the leading and trailing stars with NA:
aa <- lapply(setNames(a,seq_along(a)), function(x) {
star = x=="*"
toNA = cumsum(!star) == 0 | rev(cumsum(rev(!star))) == 0
replace(x, toNA, NA)
})
Store in a data.frame:
DF <- do.call(data.frame, c(aa, list(stringsAsFactors=FALSE)) )
Omit all rows with NA:
res <- na.omit(DF)
# X1 X2 X3
# 3 a * c
# 4 a b c
If you hate data.frames and want your list back: lapply(res,I) or c(unclass(res)), which gives
$X1
[1] "a" "a"
$X2
[1] "*" "b"
$X3
[1] "c" "c"
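The same idea can be sketched without the NA and data.frame detour by stacking the vectors into a matrix and keeping only the columns that are not a leading or trailing star in any row. This is an alternative formulation rather than the answer's own method, and the names m, edge and keep are illustrative:

```r
a <- lapply(c("**aaa", "bb*bb", "cccc*"), function(s) strsplit(s, "")[[1]])

# One row per vector, one column per character position.
m <- do.call(rbind, a)
star <- m == "*"

# For each row, flag the positions that belong to a leading or trailing run of stars.
edge <- t(apply(star, 1, function(s) {
  cumsum(!s) == 0 | rev(cumsum(rev(!s))) == 0
}))

# Keep a position only if no row flags it.
keep <- colSums(edge) == 0

result <- lapply(seq_len(nrow(m)), function(i) m[i, keep])
result
```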
First off, as Richard Scriven asked in his comment on your question, your output is not the same as what you asked for. You ask for removal of leading and trailing characters, but your desired output is just the 3rd and 4th elements of each character vector.
This would be easily achievable by something like
a <- list('**aaa', 'bb*bb', 'cccc*')
alist = sapply(a, strsplit, '')
lapply(alist, function(x) x[3:4])
Now for an answer as you asked it:
IMHO, sapply() isn't necessary here.
You need a function of the grep family to operate directly on your characters, which all share a help page in R opened by ?grep.
I would propose gsub() and a bit of Regular Expressions for your problem:
a <- list('**aaa', 'bb*bb', 'cccc*')
b <- gsub(pattern = "^(\\*)*", x = a, replacement = "")
c <- gsub(pattern = "(\\*)*$", x = b, replacement = "")
> c
[1] "aaa" "bb*bb" "cccc"
This is doable in one regex, but then I think you need a backreference for the stuff in between, and I didn't get that to work.
If you are familiar with the magrittr package and its excellent pipe operator, you can do this more elegantly:
library(magrittr)
gsub(pattern = "^(\\*)*", x = a, replacement = "") %>%
gsub(pattern = "(\\*)*$", x = ., replacement = "")
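As for doing it in one regex: a backreference should not be needed, because an alternation of the two anchored patterns lets gsub() remove both ends in a single pass. A minimal sketch on the same example data:

```r
a <- list("**aaa", "bb*bb", "cccc*")

# "^\\*+" matches a run of stars at the start, "\\*+$" a run at the end;
# gsub() replaces every match of either alternative.
trimmed <- gsub("^\\*+|\\*+$", "", unlist(a))
trimmed
# [1] "aaa"   "bb*bb" "cccc"
```

Stars in the middle of a string, as in "bb*bb", are left alone because neither alternative can match there.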

In R, how can a string be split without using a separator

I am trying the split method, and I want the second element of a string that contains only 2 elements. The length of the string is 2.
Example:
string= "AC"
The result should be a split after the first letter ("A"), so that I get:
res =
     [,1] [,2]
[1,] "A"  "C"
I tried it with split, but I have no idea how to split after the first element.
strsplit() will do what you want (if I understand your question). You need to split on "" to split the string into its elements. Here is an example showing how to do what you want on a vector of strings:
strs <- rep("AC", 3) ## your string repeated 3 times
next, split each of the three strings
sstrs <- strsplit(strs, "")
which produces
> sstrs
[[1]]
[1] "A" "C"
[[2]]
[1] "A" "C"
[[3]]
[1] "A" "C"
This is a list, so we can process it with lapply() or sapply(). We need to subset each element of sstrs to select the second element. For this we apply the [ function:
sapply(sstrs, `[`, 2)
which produces:
> sapply(sstrs, `[`, 2)
[1] "C" "C" "C"
If all you have is one string, then
strsplit("AC", "")[[1]][2]
which gives:
> strsplit("AC", "")[[1]][2]
[1] "C"
split isn't used for this kind of string manipulation. What you're looking for is strsplit, which in your case would be used something like this:
strsplit(string,"",fixed = TRUE)
You may not need fixed = TRUE, but it's a habit of mine as I tend to avoid regular expressions. You seem to indicate that you want the result to be something like a matrix. strsplit will return a list, so you'll want something like this:
strsplit(string,"",fixed = TRUE)[[1]]
and then pass the result to matrix.
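A short sketch of that last step, assuming a one-row matrix is what the question's res output is meant to be; the vector strs is made up to show the same idea for several strings at once:

```r
string <- "AC"

# Split into single characters, then shape them into a 1 x 2 matrix.
chars <- strsplit(string, "", fixed = TRUE)[[1]]
res <- matrix(chars, nrow = 1)
res
#      [,1] [,2]
# [1,] "A"  "C"

# For several two-character strings, bind the pieces row by row.
strs <- c("AC", "BD", "EF")
res_all <- do.call(rbind, strsplit(strs, "", fixed = TRUE))
res_all[, 2]
# [1] "C" "D" "F"
```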
If you are sure that it's always a two-character string (check it with all(nchar(x)==2)) and you only want the second character, then you could use sub or substr:
x <- c("ab", "12")
sub(".", "", x)
# [1] "b" "2"
substr(x, 2, 2)
# [1] "b" "2"

Resources