Add letter at the beginning of string - r

I am trying to rename some columns in my data using the str_c function (from the "stringr" package). The column names are as follows:
> x
a b c d
I need to replace "c" and "d" with "Aa" and "Ab". Rather than typing the full column names in my command, I use the following structure:
colnames(x[,3:4])<-str_c(colnames(x[,1:2]), "A")
However, the result shows the "A" at the end of the name, not at the beginning. How can I put the "A" at the beginning to get the following?
> x
a b Aa Ab

If I understand your question correctly, and this is about the displayed column names, then you should swap the order of the arguments in your expression. This
colnames(x[,3:4])<-str_c(colnames(x[,1:2]), "A")
should be
colnames(x)[3:4]<-str_c("A", colnames(x)[1:2])
Note that the index goes on the vector of names (colnames(x)[3:4]) rather than on the data (x[,3:4]); assigning to colnames(x[,3:4]) leaves the names of x itself unchanged. Also note that column names with a leading digit won't work with the $ operator unless the name is quoted. So "A" works fine with $, but "1" needs to be quoted.
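As a minimal sketch of the prefixing step, assuming x is a small data frame (the values below are made up for illustration), base R's paste0 can stand in for str_c so that no package is needed:

```r
# Hypothetical data frame with the four columns from the question
x <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# Index the vector of names, not the data, so the assignment changes x itself.
# stringr::str_c("A", colnames(x)[1:2]) would build the same strings.
colnames(x)[3:4] <- paste0("A", colnames(x)[1:2])

colnames(x)
# [1] "a"  "b"  "Aa" "Ab"
```

The prefix comes first simply because it is the first argument; both paste0 and str_c concatenate their arguments in the order given.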

Related

Remove/select some objects with a sequence number at the end of their name in R

How to quickly remove (rm) or select (ls) the numbered object names with a sequence in R?
In the example below, I want to delete only a1-a3.
a<-c(1:3)
a1<-c(4:6)
a2<-c(7:9)
a3<-c(10:12)
ab1c<-c(paste0("x", 1:3))
ab1c1<-c(paste0("x", 4:6))
ab1c2<-c(paste0("x", 7:9))
ab2c1<-c(1:5)
ab2c2<-c(2:6)
I know that to remove some objects quickly we use something like:
rm(list=ls(pattern="^a"))
rm(list=ls(pattern="1$"))
But, the pattern with ^ selects all variables with the initial character "a" and the pattern with $ selects all variables with "1" as the last character in their name, while I want to delete only some of them.
ls(pattern="^a")
[1] "a" "a1" "a2" "a3" "ab1c" "ab1c1" "ab1c2" "ab2c1" "ab2c2"
ls(pattern="1$")
[1] "a1" "ab1c1" "ab2c1"
#what I want:
[1] "a1" "a2" "a3"
I am new to R and only learned, maybe two days ago, that my issue is most likely related to regular expressions (regex). I am starting to learn regex, but have had no luck finding out how to select objects whose names contain consecutive sequence numbers. I tried something like pattern="1:3", but obviously it does not work.
Please suggest how to select object names that contain a numeric sequence pattern. If there is no such expression for numeric characters in object names, please suggest the quickest way to remove these objects from the global environment.
Thank you.
Edit (after the comment from Wiktor Stribiżew):
I also want to know whether we can combine several criteria in the pattern.
For example, I want to select only "ab1c1" and "ab1c2". Could we combine criteria like "^a", "b1" and [1-2]$? I am also asking because sometimes the number is in the middle of the name (and sometimes I want to select by a middle character, too).
I assume you want to remove those variables that start with a and then can have 1, 2 or 3, and then nothing else (i.e. end of string will follow).
Then, you may simply use ^a[1-3]$.
If you want to find the variable names that start with a and end with 1, 2 or 3, use ^a.*[1-3]$.
See an R test:
> a<-c(1:3)
> a1<-c(4:6)
> a2<-c(7:9)
> a3<-c(10:12)
> ab1c<-c(paste0("x", 1:3))
> ab1c1<-c(paste0("x", 4:6))
> ab2c2<-c(paste0("x", 7:9))
> ls(pattern="^a")
[1] "a" "a1" "a2" "a3" "ab1c" "ab1c1" "ab2c2"
> rm(list=ls(pattern="^a[1-3]$"))
> ls(pattern="^a")
[1] "a" "ab1c" "ab1c1" "ab2c2"
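The patterns can also be tried on a plain character vector before anything is removed. The sketch below assumes the object names from the question; the last pattern, for the "ab1c1"/"ab1c2" case raised in the edit, is one possible way to combine the criteria and is not taken from the answer above:

```r
# Object names from the question, held in a vector so nothing is deleted
nms <- c("a", "a1", "a2", "a3", "ab1c", "ab1c1", "ab1c2", "ab2c1", "ab2c2")

# "a" followed by exactly one digit 1-3 and nothing else
exact <- grep("^a[1-3]$", nms, value = TRUE)
# "a1" "a2" "a3"

# Starts with "a" and ends with 1, 2 or 3, anything in between
loose <- grep("^a.*[1-3]$", nms, value = TRUE)
# "a1" "a2" "a3" "ab1c1" "ab1c2" "ab2c1" "ab2c2"

# Assumed combination: starts with "ab1", ends with 1 or 2
combined <- grep("^ab1.*[1-2]$", nms, value = TRUE)
# "ab1c1" "ab1c2"
```

Once a pattern selects the intended names, the same string can be passed to ls(pattern = ...) inside rm(list = ...).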

How do I make my string split case insensitive?

Below is a piece of code that splits a large piece of text called "lines" into multiples strings. It splits whenever it detects an ending punctuation (such as . or ?) but it excludes all periods that immediately follow an abbreviation such as Mr.
lines<-unlist(strsplit(lines, paste("(?<=(?<!", abbr,")[\\.\\?\\!])[\\s”’]"), perl = T))
All of the abbreviations are stored in a vector called "abbr" and they are all capitalized (Mr., Mrs. as opposed to mr., mrs.). The problem I have with my code is that I want it to be case insensitive and detect abbreviations in the text that aren't capitalized and I want to accomplish this without simply adding lower case versions of each abbreviation to the abbr vector.
strsplit itself does not offer a case-insensitivity argument, but you can build an equivalent (if regex-inefficient) pattern with
abbr <- "SomeText"
abbr1 <- strsplit(abbr, "")
abbr1
# [[1]]
# [1] "S" "o" "m" "e" "T" "e" "x" "t"
abbr2 <- paste(sprintf("[%s%s]", toupper(abbr1[[1]]), tolower(abbr1[[1]])), collapse = "")
abbr2
# [1] "[Ss][Oo][Mm][Ee][Tt][Ee][Xx][Tt]"
and use abbr2 in place of abbr in your code above.
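The same idea can be wrapped in a small helper and applied to a whole vector of abbreviations. This is only a sketch: ci_pattern is a hypothetical name, and it assumes the abbreviations consist of plain letters (characters that are special inside a bracket expression, such as ] or ^, would need escaping). Since the question already uses perl = TRUE, the inline (?i) flag of Perl-compatible regular expressions is shown as a possible alternative:

```r
# Hypothetical helper: turn a literal string into a pattern that matches
# it in any letter case, e.g. "Mr" -> "[Mm][Rr]"
ci_pattern <- function(s) {
  chars <- strsplit(s, "")[[1]]
  paste(sprintf("[%s%s]", toupper(chars), tolower(chars)), collapse = "")
}

abbr <- c("Mr", "Mrs")  # assumed example values, letters only
abbr_ci <- vapply(abbr, ci_pattern, character(1), USE.NAMES = FALSE)
# "[Mm][Rr]" "[Mm][Rr][Ss]"

# Alternative with perl = TRUE: (?i) makes the rest of the pattern
# case-insensitive, so both " X " and " x " act as separators here
parts <- strsplit("one X two x three", "(?i) x ", perl = TRUE)[[1]]
# "one" "two" "three"
```

Whether (?i) fits into the lookbehind-based pattern from the question would need to be checked against the actual text and abbreviation list.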

Dollar operator as function argument for sapply not working as expected

I have the following list
test_list=list(list(a=1,b=2),list(a=3,b=4))
and I want to extract all elements with list element name a.
I can do this via
sapply(test_list,`[[`,"a")
which gives me the correct result
#[1] 1 3
When I try the same with R's dollar operator $, I get NULL
sapply(test_list,`$`,"a")
#[[1]]
#NULL
#
#[[2]]
#NULL
However, if I use it on a single element of test_list it works as expected
`$`(test_list[[1]],"a")
#[1] 1
Am I missing something obvious here?
evaluation vs. none
[[ evaluates its argument whereas $ does not. L[[a]] gets the component of L whose name is held in the variable a. $ just passes the argument name itself as a character string so L$a finds the "a" component of L. a is not regarded as a variable holding the component name -- just a character string.
Below, L[[b]] returns the component of L named "a" because the variable b has the value "a", whereas L$b returns the component of L named "b" because with that syntax b is not regarded as a variable but as a character string which is itself passed.
L <- list(a = 1, b = 2)
b <- "a"
L[[b]] # same as L[["a"]] since b holds a
## [1] 1
L$b # same as L[["b"]] since b is regarded as a character string to be passed
## [1] 2
sapply
Now that we understand the key difference between $ and [[, consider this example to see what is going on with sapply. We have made each element of test_list into a "foo" object and defined our own $.foo and [[.foo methods, which simply show what R is passing to the method via the name argument:
foo_list <- test_list
class(foo_list[[1]]) <- class(foo_list[[2]]) <- "foo"
"$.foo" <- "[[.foo" <- function(x, name) print(name)
result <- sapply(foo_list, "$", "a")
## [1] "..."
## [1] "..."
result2 <- sapply(foo_list, "[[", "a")
## [1] "a"
## [1] "a"
What is happening in the first case is that sapply is calling whatever$... and ... is not evaluated, so it looks for a list component that is literally named "...". Of course, there is no such component, so whatever$... is NULL, hence the NULLs shown in the output in the question. In the second case whatever[[...]] evaluates to whatever[["a"]], hence the observed result.
From what I've been able to determine it's a combination of two things.
First, the second element of $ is matched but not evaluated so it cannot be a variable.
Secondly, when arguments are passed to functions they are assigned to the corresponding variables in the function call. When passed to sapply, "a" is assigned to a variable and therefore will no longer work with $. We can see this occurring by running
sapply("a", print)
[1] "a"
a
"a"
This can lead to peculiar results like this
sapply(test_list, function(x, a) {`$`(x, a)})
[1] 1 3
Where despite a being a variable (which hasn't even been assigned) $ matches it to the names of the elements in the list.
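In practice, two ways around this follow from the explanations above: keep using [[, which evaluates its argument, or wrap $ in an anonymous function so that the component name appears literally in the call. A minimal sketch using the test_list from the question:

```r
test_list <- list(list(a = 1, b = 2), list(a = 3, b = 4))

# `[[` evaluates its second argument, so the string "a" arrives intact
by_bracket <- sapply(test_list, `[[`, "a")

# Here `a` is written literally after $, so no evaluation is needed
by_dollar <- sapply(test_list, function(x) x$a)

by_bracket
# [1] 1 3
by_dollar
# [1] 1 3
```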

R: return column using get() and paste()

Why does get() in combination with paste() work for dataframes but not for columns within a dataframe? How can I make it work?
ab<-12
get(paste("a","b",sep=""))
# gives: [1] 12
ab<-data.frame(a=1:3,b=3:5)
ab$a
#gives: [1] 1 2 3
get(paste("a","b",sep=""))
# gives the whole dataframe
get(paste("ab$","a",sep=""))
# gives: Error in get(paste("ab$", "a", sep = "")) : object 'ab$a' not found
Columns in dataframes are not first class objects. Their "names" are really indexing values for list-extraction. Despite the understandable confusion caused by the existence of the names function, they are not true R-names, i.e. unquoted tokens or symbols, in the list of R objects. See the ?is.symbol help page. The get function takes a character value, and then looks for it in the workspace and returns it for further processing.
> ab<-data.frame(a=1:3,b=3:5)
> ab$a
[1] 1 2 3
> get(paste("a","b",sep=""))
a b
1 1 3
2 2 4
3 3 5
>
> # and this would be the way to get the 'a' column of the ab object
get(paste("ab",sep=""))[['a']]
If there were a named object target with the value "a", then you could also do:
target <- "a"
get(paste("ab",sep=""))[[target]] # notice no quotes around target
# because `target` is a _real_ R name
It doesn't work because get() interprets the string it's passed as referring to an object named "ab$a" (not as referring to the element named "a" of the object named "ab"). Here's probably the best way to see what that means:
ab<-data.frame(a=1:3,b=3:5)
`ab$a` <- letters[1:3]
get("ab$a")
# [1] "a" "b" "c"
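Putting the two answers together, one way to fetch a column when both the object name and the column name are built as strings is to call get() on the object name only and then index the result with [[. A small sketch, where obj_name and col_name are illustrative variable names:

```r
ab <- data.frame(a = 1:3, b = 3:5)

obj_name <- paste("a", "b", sep = "")  # "ab", the name of the data frame
col_name <- "a"                        # the column to extract

# get() resolves the object; [[ ]] then extracts the column by name
col <- get(obj_name)[[col_name]]
col
# [1] 1 2 3
```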

How to split a string in r by a delimiter and discard the last two items?

I have a string separated by _ and I want to get rid of the last two elements. For example, from A_B_C_D I want to return A_B, and from A_B_C_D_E I want A_B_C. I have tried str_split_fixed from stringr:
my_string <- "A_B_C_D"
x <- str_split_fixed(my_string,"_",3)
but it returns "A" "B" "C_D" instead of "A_B" "C" "D"; otherwise I could have done head(x,-2) to get A_B.
Is there a better way than the following?
paste(head(unlist(strsplit(my_string,"_")),-2),collapse="_")
How about using a regex:
sub('(_[A-Z]){2}$', '', 'A_B_C_D')
Here the number 2 is the number of trailing elements you want to drop. Note that [A-Z] assumes each element is a single uppercase letter.
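If the elements can be longer than one character, the same approach can be loosened by matching "an underscore followed by any run of non-underscore characters". The helper below is a sketch under that assumption; drop_last is a hypothetical name, not a stringr or base function:

```r
# Hypothetical helper: remove the last n "_"-separated elements of s
drop_last <- function(s, n = 2) {
  # (_[^_]*){n}$ matches the final n elements together with their separators
  sub(sprintf("(_[^_]*){%d}$", n), "", s)
}

drop_last("A_B_C_D")
# [1] "A_B"
drop_last("A_B_C_D_E")
# [1] "A_B_C"
drop_last("alpha_beta_gamma_delta")
# [1] "alpha_beta"
```

Because sub is vectorized, the helper also accepts a character vector of strings.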
