R: capitalize Roman numerals only in a string

If I have a vector with the following:
people <- c("PERSON I", "PERSON II", "PERSON III", "PERSON IV")
To turn them into title case, I used the following:
people <- str_to_title(people)
Now I have the following
> people
[1] "Person I" "Person Ii" "Person Iii" "Person Iv"
What do I do to capitalize only the Roman numerals, like this?
"Person I" "Person II" "Person III" "Person IV"
Or is there a way to convert the all-caps into the last vector without using str_to_title?

Here is a base R option using substr, sub, and paste:
people <- c("PERSON I", "PERSON II", "PERSON III", "PERSON IV")
people <- paste0(substr(people, 1, 1), tolower(sub("^\\S(\\S+).*$", "\\1", people)),
" ", sub("^.*?(\\S+)$", "\\1", people))
people
[1] "Person I" "Person II" "Person III" "Person IV"

If it's always the same format, i.e. two words separated by a space, you can use the following:
sapply(strsplit(people,' '), function(X){paste(str_to_title(X[1]), X[2]) })
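If the format is less regular, one possible base R sketch is to title-case everything first and then re-capitalize a trailing Roman numeral. This assumes the numeral is the last word and consists only of the letters I, V, X, L, C, D, M; an ordinary last word built from those letters (for example "Mix") would be upper-cased too. The helper name title_keep_roman is hypothetical, not something from the answers above.

```r
# Hypothetical helper: title-case a string but keep a trailing Roman numeral upper-case.
title_keep_roman <- function(x) {
  # Lower-case everything, then upper-case the first letter of every word.
  titled <- gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)
  # Upper-case the last word if it consists only of Roman-numeral letters.
  gsub("\\b([ivxlcdm]+)$", "\\U\\1", titled, ignore.case = TRUE, perl = TRUE)
}

people <- c("PERSON I", "PERSON II", "PERSON III", "PERSON IV")
result <- title_keep_roman(people)
print(result)
```

The \\U in the replacement only works with perl = TRUE, which is why both calls set it.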

Related

How to order an array of character according to the numbers that it contains

I would like to order the following vector of chr:
x=c("class 1", "class 2", "class 4", "class 7", "class 5", "class 3", "class 6",
"class 10", "class 9", "class 11", "class 8", "class 12", "class 21")
according to the numbers that appear in the characters. E.g., in this case, the desired result is:
class 1, class 2, class 3, class 4, class 5, class 6, class 7, class 8, class 9, class 10, class 11
class 12, class 21
I tried with:
x[order(x)]
but obtaining a different result:
> x[order(x)]
[1] "class 1" "class 10" "class 11" "class 12" "class 2" "class 21" "class 3"
[8] "class 4" "class 5" "class 6" "class 7" "class 8" "class 9"
As mentioned, it is sorting alphabetically, and not considering the numeric value contained within the string.
There are a number of options to address this:
library(stringr)
str_sort(x, numeric = TRUE)
[1] "class 1" "class 2" "class 3" "class 4" "class 5" "class 6" "class 7" "class 8" "class 9" "class 10" "class 11" "class 12" "class 21"
Or
library(gtools)
mixedsort(x)
[1] "class 1" "class 2" "class 3" "class 4" "class 5" "class 6" "class 7" "class 8" "class 9" "class 10" "class 11" "class 12" "class 21"
Or without using another package, strip away "class" and use the numeric result to sort:
values <- as.numeric(gsub("class", "", x))
x[order(values)]
[1] "class 1" "class 2" "class 3" "class 4" "class 5" "class 6" "class 7" "class 8" "class 9" "class 10" "class 11" "class 12" "class 21"
That's because x is a character vector, so its elements (strings) are ordered alphabetically. Extract the numbers from the strings and convert them to a numeric type:
y <- as.integer(substr(x, 7, 8))
# y is in the same order as x
# sort the integers numerically, then look up where each sorted value sits in y
# match(sort(y), y) returns those positions (the same as order(y) when the values are unique)
x[match(sort(y), y)]
# Output is:
# [1] "class 1" "class 2" "class 3" "class 4" "class 5" "class 6" "class 7" "class 8" "class 9" "class 10" "class 11" "class 12"
# [13] "class 21"
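If the prefix is not always "class", or the number does not start at a fixed character position, a slightly more general base R sketch is to strip every non-digit character before sorting. This assumes each string contains exactly one run of digits; strings with several numbers would have their digits glued together.

```r
x <- c("class 1", "class 2", "class 4", "class 7", "class 5", "class 3", "class 6",
       "class 10", "class 9", "class 11", "class 8", "class 12", "class 21")

# Drop every non-digit character, then convert what is left to a number.
num <- as.numeric(gsub("\\D", "", x))

# order() gives the permutation that sorts num; apply it to the original strings.
sorted <- x[order(num)]
print(sorted)
```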

Insert or replace multiple matches of the same string with a running counter

I have a RIS (text) file that looks roughly like this:
mylist <- c("TI - a", "AU - b", "ER -", " ",
"TI - c", "AU - d", "ER -", " ",
"TI - e", "AU - f", "ER -")
I would like to insert a running ID tag as follows
mylist_with_ids <- c("TI - a", "AU - b", "ID - 1", "ER -", " ",
"TI - c", "AU - d", "ID - 2", "ER -", " ",
"TI - e", "AU - f", "ID - 3", "ER -")
My original approach was to write a stringr::str_replace loop, where I generate the ID list beforehand.
cc_id_replace <- paste0("ID - ", 1:3, "\nER -")
for (i in 1:3) {
mylist_with_ids <- str_replace(mylist, "^ER -", cc_id_replace[i])
}
Of course, this doesn't work for more than one reason. What might be a better way?
(There exist many regex and multiple array questions, but I couldn't figure out an answer so far.)
You can try:
list[list == "ER -"] <- paste0("ID - ", seq_along(which(list == "ER -")), "\nER -")
I think run-length encoding can be used here.
(BTW: I don't like using list as a variable name, since it is such a frequently used R function. While R knows well which you mean when referenced, it is feasible that this can be fooled, and troubleshooting that will be problematic. So I've named it mylist here.)
mylist <- c("TI - a", "AU - b", "ER -", " ",
"TI - c", "AU - d", "ER -", " ",
"TI - e", "AU - f", "ER -")
non_ER_runs <- rle(mylist == "ER -")
non_ER_runs
# Run Length Encoding
# lengths: int [1:6] 2 1 3 1 3 1
# values : logi [1:6] FALSE TRUE FALSE TRUE FALSE TRUE
RLE tells us how many are in each category. For us, the category is "matches and does not match". The $values vector here tells us that the first elements do not match (FALSE), and there are two of them. The second batch does match (TRUE) and is one long. Etc.
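To make the run-length idea concrete, here is a small stand-alone sketch on a made-up logical vector (flags is illustrative, not the RIS data). It shows how the run lengths turn into start and end positions with cumsum, and that inverse.rle restores the original vector.

```r
# Illustrative input: TRUE marks a "match", FALSE a "non-match".
flags <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)

runs <- rle(flags)
print(runs$lengths)  # length of each run
print(runs$values)   # the value that is repeated in each run

# The end position of each run is the cumulative sum of the lengths;
# each start is one past the previous end.
ends   <- cumsum(runs$lengths)
starts <- c(1, head(ends, -1) + 1)

# inverse.rle() rebuilds the original vector from the encoding.
restored <- inverse.rle(runs)
```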
inds <- cumsum(non_ER_runs$lengths)
newlist <- mapply(function(a,b) mylist[a:b], c(1, 1+head(inds, n=-1)), inds)
newlist
# [[1]]
# [1] "TI - a" "AU - b"
# [[2]]
# [1] "ER -"
# [[3]]
# [1] " " "TI - c" "AU - d"
# [[4]]
# [1] "ER -"
# [[5]]
# [1] " " "TI - e" "AU - f"
# [[6]]
# [1] "ER -"
Okay, so we've broken each batch into its own vector. Using the return from rle again, we can choose just the elements where we want to append something:
newlist[ non_ER_runs$values ]
# [[1]]
# [1] "ER -"
# [[2]]
# [1] "ER -"
# [[3]]
# [1] "ER -"
Map(function(vec, id) c(id, vec),  # put the ID tag in front of "ER -", as in the desired output
newlist[ non_ER_runs$values ],
sprintf("ID - %i", seq_along(newlist[ non_ER_runs$values ])))
# [[1]]
# [1] "ID - 1" "ER -"
# [[2]]
# [1] "ID - 2" "ER -"
# [[3]]
# [1] "ID - 3" "ER -"
Now it's just a matter of replacing the list elements with the new elements, then unlisting it.
newlist[ non_ER_runs$values ] <-
Map(function(vec, id) c(id, vec),
newlist[ non_ER_runs$values ],
sprintf("ID - %i", seq_along(newlist[ non_ER_runs$values ])))
newlist <- unlist(newlist)
newlist
# [1] "TI - a" "AU - b" "ID - 1" "ER -" " "
# [6] "TI - c" "AU - d" "ID - 2" "ER -" " "
# [11] "TI - e" "AU - f" "ID - 3" "ER -"
ris <- c("TI - a", "AU - b", "ER -", " ",
"TI - c", "AU - d", "ER -", " ",
"TI - e", "AU - f", "ER -")
Another suggestion using dirty for loops ;)
1. Find the positions before which an ID element should be inserted (here using a bit of regex). Use the pos vector to generate the right number of IDs:
pos <- grep("^ER", ris)
ids <- paste0("ID - ", seq_along(pos))
2. Loop through all positions, insert, paste, repeat (and update pos):
for (i in seq_along(pos)) {
ris <- c(ris[1:(pos[i]-1)], ids[i], ris[pos[i]:length(ris)] )
pos <- pos + 1  # every remaining position shifts by one after an insert
}
ris
Returns:
[1] "TI - a" "AU - b" "ID - 1" "ER -"
[5] " " "TI - c" "AU - d" "ID - 2"
[9] "ER -" " " "TI - e" "AU - f"
[13] "ID - 3" "ER -"
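For what it's worth, the same insertion can also be sketched without a loop by computing up front where each ID will land in the longer output vector. This is only one way to do it, and it assumes every record ends with a line that is exactly "ER -"; the names out and new_pos are made up for the example.

```r
mylist <- c("TI - a", "AU - b", "ER -", " ",
            "TI - c", "AU - d", "ER -", " ",
            "TI - e", "AU - f", "ER -")

# Positions of the end-of-record markers in the original vector.
pos <- which(mylist == "ER -")

# Each ID takes the slot its "ER -" would have had in the output; that slot is
# the original position shifted by the number of IDs inserted before it.
new_pos <- pos + seq_along(pos) - 1

out <- character(length(mylist) + length(pos))
out[new_pos]  <- paste0("ID - ", seq_along(pos))  # fill in the ID lines
out[-new_pos] <- mylist                           # everything else keeps its order
print(out)
```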

Convert a dataframe to large character

I have a dataframe but need to convert it to a large character. Here is an example of the dataframe structure:
texts <- c("TEXT 1", "TEXT 2", "TEXT 3")
data <- data.frame(texts)
I need this structure:
[1] "TEXT 1" "TEXT 2" "TEXT 3"
I already tried using as.character(), but it does not work: it collapses all the rows into a single string.
You can transpose and concatenate, i.e.
c(t(data))
#[1] "TEXT 1" "TEXT 2" "TEXT 3"
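If the data frame has a single column, as in the example, pulling that column out directly is probably the simplest route. A minimal sketch, assuming the column is called texts as in the question (stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version):

```r
texts <- c("TEXT 1", "TEXT 2", "TEXT 3")
data <- data.frame(texts, stringsAsFactors = FALSE)

# Extract the column as a plain character vector.
by_name  <- data$texts
by_index <- as.character(data[[1]])

# unlist() flattens all columns into one vector; use.names = FALSE drops the
# automatically generated names (texts1, texts2, ...).
flattened <- unlist(data, use.names = FALSE)
print(flattened)
```

Calling as.character() on the whole data frame treats it as a list of columns, which is why it returns one string per column rather than one per row.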

Concatenate two lists of strings in R

Here is my sample:
a = c("a","b","c")
b = c("1","2","3")
I need to concatenate a and b automatically. The result should be "a 1","a 2","a 3","b 1","b 2","b 3","c 1","c 2","c 3".
For now, I am using the paste function:
paste(a[1],b[1])
I need an automatic way to do this. Besides writing a loop, is there any easier way to achieve this?
c(outer(a, b, paste))
# [1] "a 1" "b 1" "c 1" "a 2" "b 2" "c 2" "a 3" "b 3" "c 3"
Other options are :
paste(rep(a, each = length(b)), b)
or :
with(expand.grid(b,a),paste(Var2,Var1))
You can do:
c(sapply(a, function(x) {paste(x,b)}))
[1] "a 1" "a 2" "a 3" "b 1" "b 2" "b 3" "c 1" "c 2" "c 3"
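Note that the answers above do not all return the combinations in the same order. The sketch below, using only base R, puts the two orderings side by side: rep(..., each = ) keeps each element of a together (the order asked for in the question), while c(outer(...)) reads the result matrix column by column, so b varies slowest. The names a_major and b_major are just labels for this example.

```r
a <- c("a", "b", "c")
b <- c("1", "2", "3")

# All b values for the first a, then all b values for the second a, and so on.
a_major <- paste(rep(a, each = length(b)), rep(b, times = length(a)))

# outer() builds a length(a) x length(b) matrix; c() flattens it column-wise.
b_major <- c(outer(a, b, paste))

# Transposing before flattening gives the a-major order again.
a_major_outer <- c(t(outer(a, b, paste)))

print(a_major)
print(b_major)
```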

Paste together two character vectors of different lengths

I have two different character vectors in R, that I want to combine to use for column names:
groups <- c("Group A", "Group B")
label <- c("Time","Min","Mean","Max")
When I try using paste I get the result:
> paste(groups,label)
[1] "Group A Time" "Group B Min" "Group A Mean" "Group B Max"
Is there a simple function or setting that can paste these together to get the following output?
[1] "Group A Time" "Group A Min" "Group A Mean" "Group A Max" "Group B Time"
[6] "Group B Min" "Group B Mean" "Group B Max"
Probably outer helps your work. Try this:
> c(t(outer(groups, label, paste)))
[1] "Group A Time" "Group A Min" "Group A Mean" "Group A Max" "Group B Time" "Group B Min"
[7] "Group B Mean" "Group B Max"
Use outer (this returns a 2 x 4 matrix of all combinations):
outer(groups, label, FUN = paste)
Since groups has only two elements, I would do
c(paste(groups[1],label),paste(groups[2],label))
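The manual version above can be generalized to any number of groups with lapply. This is a small sketch, assuming the same groups and label vectors as in the question; col_names is a made-up name for the result.

```r
groups <- c("Group A", "Group B")
label  <- c("Time", "Min", "Mean", "Max")

# For each group, paste it onto every label, then flatten the list of results.
col_names <- unlist(lapply(groups, function(g) paste(g, label)))
print(col_names)
```

Because each group is handled in turn, all labels for "Group A" come before those for "Group B", which is the order requested in the question.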
