Compressing a string in R [duplicate]

This question already has an answer here:
Collapse vector to string of characters with respective numbers of consequtive occurences
(1 answer)
Closed 4 years ago.
I have a large sequence of strings containing only the following characters
"M", "D", "A"
such as:
"M" "M" "A" "A" "D" "D" "M" "D" "A"
and I would like to compress it to:
M2A2D2M1D1A1
in R. Googling has led me to this (a Java solution), but before implementing it myself I wanted to check whether something ready-made already exists. Thanks!

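The R function rle() (run-length encoding) is your friend.
# Random test vector of 20 symbols
testVector <- sample(c("M", "D", "A"), 20, replace = TRUE)
# rle() returns the run values and their lengths
res <- rle(testVector)
# Pair each value with its run length and join everything into one string
compressedString <- paste(res$values, res$lengths, collapse = "", sep = "")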
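As a quick check, the same approach can be applied to the exact vector from the question instead of a random sample. This is only a sketch; the variable names x, runs, compressed and restored are made up for illustration. It also shows that base R's inverse.rle() recovers the original vector from the rle object.

```r
# Input taken from the question
x <- c("M", "M", "A", "A", "D", "D", "M", "D", "A")

# Run-length encode: runs$values holds the symbols, runs$lengths the counts
runs <- rle(x)

# Glue each symbol to its count and collapse into a single string
compressed <- paste0(runs$values, runs$lengths, collapse = "")
compressed
# [1] "M2A2D2M1D1A1"

# The rle object (not the string) can be expanded back to the original vector
restored <- inverse.rle(runs)
identical(restored, x)
# [1] TRUE
```

Note that the compressed string alone is only unambiguous here because the symbols are single letters and never digits.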

Related

Two Column R Dataframe to Named List [duplicate]

This question already has answers here:
Named List To/From Data.Frame
(4 answers)
Closed 2 years ago.
I am trying to convert a two-column data frame to a named list. There are several solutions on Stack Overflow where every value in the first column becomes a 'name', but I want to group the values in column 2 under each distinct value in column 1.
For example, the list should look like the following:
# Create a Named list of keywords associated with each file.
fileKeywords <- list(fooBar.R = c("A","B","C"),
driver.R = c("A","F","G"))
Where I can retrieve all keywords for "fooBar.R" using:
# Get the keywords for a named file
fileKeywords[["fooBar.R"]]
My data frame looks like:
df <- read.table(header = TRUE, text = "
file keyWord
'fooBar.R' 'A'
'fooBar.R' 'B'
'fooBar.R' 'C'
'driver.R' 'A'
'driver.R' 'F'
'driver.R' 'G'
")
I'm sure there is a simple solution that I am missing.
You could use unstack:
as.list(unstack(rev(df)))
$driver.R
[1] "A" "F" "G"
$fooBar.R
[1] "A" "B" "C"
This is equivalent to as.list(unstack(df, keyWord~file))
We can use stack in base R to go from the named list to a two-column data frame:
stack(fileKeywords)[2:1]
To go the opposite way, from the data frame to a named list, we can use tapply:
with(df, tapply(keyWord, file, FUN = I))
Output:
#$driver.R
#[1] "A" "F" "G"
#$fooBar.R
#[1] "A" "B" "C"
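Another base R option, not mentioned in the answers above, is split(), which groups one vector by the values of another and returns a plain named list. The sketch below rebuilds the data frame from the question with data.frame() and sets stringsAsFactors = FALSE so the keywords stay character in any R version.

```r
# Same data as in the question, built directly as a data frame
df <- data.frame(
  file    = c("fooBar.R", "fooBar.R", "fooBar.R", "driver.R", "driver.R", "driver.R"),
  keyWord = c("A", "B", "C", "A", "F", "G"),
  stringsAsFactors = FALSE
)

# Group the keywords by file name; the result is a named list
fileKeywords <- split(df$keyWord, df$file)

# Get the keywords for a named file
fileKeywords[["fooBar.R"]]
# [1] "A" "B" "C"
```

Unlike unstack(), split() always returns a list, even when every group has the same number of elements.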

Split a string into a vector of single character strings [duplicate]

This question already has answers here:
Split a character vector into individual characters? (opposite of paste or stringr::str_c)
(4 answers)
Closed 4 years ago.
I have a string like "abcde" and want it split into a vector like
> c("a", "b", "c", "d", "e")
[1] "a" "b" "c" "d" "e"
I found one way that I'll post as an answer but I am hoping someone else has a simpler way to do this, either in base R or using a package.
Using the stringr package:
library(stringr)
x <- "abcde"
# Split on the empty pattern into nchar(x) pieces, then flatten the matrix
as.vector(str_split_fixed(x, pattern = "", n = nchar(x)))
[1] "a" "b" "c" "d" "e"
str_split_fixed produces a matrix that has to be coerced into a vector.
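For comparison, base R can do the same without a package. A minimal sketch, assuming a single input string named x as in the question: strsplit() with an empty split pattern breaks the string into characters and returns a list with one element per input string, so [[1]] extracts the vector.

```r
x <- "abcde"

# Splitting on "" yields one piece per character; take the first list element
chars <- strsplit(x, "")[[1]]
chars
# [1] "a" "b" "c" "d" "e"
```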

R - List manipulation element concatenation

Assume I have a list with 5 elements:
list <- list("A", "B", "C", "D", c("E", "F"))
I am trying to return this to a simple character vector using purrr with the need to combine list elements that have two strings into one, separated by a delimiter such as '-'. The output should look like this:
chr [1:5] "A" "B" "C" "D" "E-F"
I've tried many approaches, including paste, paste0, and str_c. Where I am getting hung up is that map seems to apply the function to each individual string of a list element rather than to the group of strings in that element (when there is more than one). The closest I've gotten is:
list2 <- unlist(map(list, str_flatten))
str(list2)
This returns:
chr [1:5] "A" "B" "C" "D" "EF"
where I need a hyphen between E and F:
chr [1:5] "A" "B" "C" "D" "E-F"
When I try to pass an extra argument to str_flatten(), such as str_flatten(list, collapse = "-"), it doesn't work. The big problem is that I can't figure out which argument to pass to str_flatten so that the two strings of a given list element are joined with the delimiter.
You almost had it. Try
library(purrr)
library(stringr)
unlist(map(lst, str_flatten, collapse = "-"))
#[1] "A" "B" "C" "D" "E-F"
You could also use map_chr
map_chr(lst, str_flatten, collapse = "-")
Without additional packages, and with thanks to @G.Grothendieck, you could do
sapply(lst, paste, collapse = "-")
data
lst <- list("A", "B", "C", "D", c("E", "F"))
We can also use map_chr and paste.
library(purrr)
lst <- list("A", "B", "C", "D", c("E", "F"))
map_chr(lst, ~paste(.x, collapse = "-"))
# [1] "A" "B" "C" "D" "E-F"
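A stricter base R variant of the sapply() answer is vapply(), which declares up front that every element must collapse to exactly one string and raises an error otherwise. This is a sketch using the same lst as above; the name joined is made up for illustration.

```r
lst <- list("A", "B", "C", "D", c("E", "F"))

# character(1) asserts that each list element yields a single string
joined <- vapply(lst, paste, character(1), collapse = "-")
joined
# [1] "A" "B" "C" "D" "E-F"
```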

Replace text values in a vector [duplicate]

This question already has answers here:
Dictionary style replace multiple items
(11 answers)
Closed 5 years ago.
Here's my data :
dataset <- c("h", "H", "homme", "masculin", "f", "femme", "épouse")
How can I replace text values in the vector, like this:
"femme" -> "f"
"épouse" ->"f"
"Homme"-> "h"
"masculin" -> "h"
What I tried for "femme" -> "f"
test_out <- sapply(dataset, switch,
"f"="femme")
test_out
Expected result :
"h" "h" "h" "masculin" "f" "f" "f"
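Try gsub with regular expressions. The inner call maps every value starting with h, H, or m to "h"; the outer call then maps every value that does not start with h to "f":
dataset <- gsub("^((?!h).*)$", "f", gsub("^((h|H|m).*)$", "h", dataset), perl = TRUE)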
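An alternative to the regular expressions above is a dictionary-style lookup with a named character vector, in the spirit of the linked duplicate. This is a sketch: the names lookup and recoded are made up, and the mapping lists every value that occurs in the example data. Any value missing from the lookup would come back as NA, which makes unmapped inputs easy to spot.

```r
dataset <- c("h", "H", "homme", "masculin", "f", "femme", "épouse")

# Names are the original values, elements are their replacements
lookup <- c(
  "h" = "h", "H" = "h", "homme" = "h", "masculin" = "h",
  "f" = "f", "femme" = "f", "épouse" = "f"
)

# Index the lookup by the data; unname() drops the carried-over names
recoded <- unname(lookup[dataset])
recoded
# [1] "h" "h" "h" "h" "f" "f" "f"
```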

Return the position of a specific vector element based on its name [duplicate]

This question already has answers here:
Convert letters to numbers
(5 answers)
Closed 5 years ago.
I need to return the position of an element in a vector, given the element's name. Let's say I have a vector of letters:
myLetters=letters[1:26]
> myLetters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
and what I intend to do is create or find a function that returns the element's position when called, for example:
myFunction(myLetters["b"])
[1] 2
myFunction(myLetters["z"])
[1] 26
In summary, I need a way to refer to Excel columns by their letters (A, B, C, later maybe even AA or further) and get the column number.
If you want to refer to Excel column names, you could create a reference vector with all column names of up to three letters:
eg1 <- expand.grid(LETTERS, LETTERS)
eg2 <- expand.grid(LETTERS, LETTERS, LETTERS)
excelcols <- c(LETTERS, paste0(eg1[[2]], eg1[[1]]), paste0(eg2[[3]], eg2[[2]], eg2[[1]]))
After which you can use which:
> which(excelcols == 'A')
[1] 1
> which(excelcols == 'AB')
[1] 28
> which(excelcols == 'ABC')
[1] 731
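The lookup vector can also be avoided by treating the column name as a base-26 number whose digits A to Z stand for 1 to 26. The function below is a sketch under that assumption; excel_col_to_num is a hypothetical name, not an existing R function, and it handles one column name at a time.

```r
# Hypothetical helper: convert an Excel column name such as "AB" to its number
excel_col_to_num <- function(col) {
  # Split into single letters and map each to 1..26
  chars  <- strsplit(toupper(col), "")[[1]]
  digits <- match(chars, LETTERS)
  # Leftmost letter is the most significant base-26 digit
  powers <- rev(seq_along(digits)) - 1
  sum(digits * 26^powers)
}

excel_col_to_num("A")
# [1] 1
excel_col_to_num("AB")
# [1] 28
excel_col_to_num("ABC")
# [1] 731
```

These results agree with the which() lookups above, and the function works for names of any length without building a large reference vector.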
If you need to find the number of times a specific letter occurs, then the following should work:
myLetters <- c("a", "a", "b")
myFunction <- function(myLetters, findLetter) {
  # Count the positions where the element equals findLetter
  length(which(myLetters == findLetter))
}
Let's find how many times "a" occurs in myLetters:
myFunction(myLetters, "a")
# [1] 2
