How to split a string into its character vector? [duplicate] - r

In R I have a string like this:
'hello'
How do I convert it to a character vector like this:
[1] "h" "e" "l" "l" "o"

With stringr:
stringr::str_split("hello","")[[1]]
[1] "h" "e" "l" "l" "o"
Found another possible solution, although this is probably the worst approach:
substring("hello", seq(1,nchar("hello")), seq(1,nchar("hello")))
[1] "h" "e" "l" "l" "o"

While this might not be the most performant solution, this works as expected:
> unlist(strsplit('hello', ''))
[1] "h" "e" "l" "l" "o"
See the documentation of unlist and strsplit for further options.
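As a minimal sketch, the base R approaches above can be compared on the same input. The variable names (`s`, `idx`, `chars_strsplit`, `chars_substring`) are only illustrative and do not come from the answers.

```r
s <- "hello"

# strsplit() returns a list with one element per input string,
# so take the first element (or unlist) to get a plain character vector.
chars_strsplit <- strsplit(s, "")[[1]]

# substring() is vectorised over its start and stop positions,
# so passing 1..n for both extracts one character at a time.
idx <- seq_len(nchar(s))
chars_substring <- substring(s, idx, idx)

# Both approaches should give the same vector.
identical(chars_strsplit, chars_substring)
```

Note that `strsplit()` keeps one list element per input string, which matters once the input has more than one element.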

Related

Unexpected behaviour when indexing vector with y[length(x) + 1:length(y)] [duplicate]

I have a very specific problem, where I need to use the length of some shorter array, to subset a longer array. I have presented a toy example below. I do not understand why it does not work to just add 1 when indexing, and I don't understand why it returns the long array filled with NA.
x <- letters[1:5]
x
# [1] "a" "b" "c" "d" "e"
y <- letters[1:10]
y
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
y[length(x): length(y)]
# [1] "e" "f" "g" "h" "i" "j"
y[length(x) + 1: length(y)]
# [1] "f" "g" "h" "i" "j" NA NA NA NA NA
y[(length(x) + 1): length(y)]
# [1] "f" "g" "h" "i" "j"
Using y[length(x): length(y)] almost solves my problem, but the result is one element too long: I don't want 'e' returned, so I have to start one index further to the right. I thought I could solve this with y[length(x) + 1: length(y)], but for some reason that gives me a vector of the same length as y, padded with NA at the end. I found that adding parentheses solved the problem, but I don't understand why, or what happens when I leave them out. Could someone explain?
The colon operator has higher precedence than addition, so length(x) + 1:length(y) is evaluated as length(x) + (1:length(y)), that is 5 + 1:10, which gives the indices 6 to 15. Indices 11 to 15 do not exist in y, hence the NA values. The parentheses tell R to compute length(x) + 1 first and use it as the start of the sequence.
So, as you mention, the following should work:
y[(length(x) + 1): length(y)]
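The precedence rule can be checked directly by looking at the index vectors themselves. This is a small sketch using the question's `x` and `y`; the names `idx_wrong` and `idx_right` are made up for illustration, and the `tail()` call is an alternative not mentioned in the answer.

```r
x <- letters[1:5]
y <- letters[1:10]

# `:` binds tighter than `+`, so this is length(x) + (1:length(y)), i.e. 6:15.
idx_wrong <- length(x) + 1:length(y)

# Parentheses force the addition first, giving 6:10.
idx_right <- (length(x) + 1):length(y)

y[idx_wrong]  # positions 11 to 15 do not exist in y, so they come back as NA
y[idx_right]

# Alternative that avoids building the sequence: drop the first length(x) elements.
tail(y, -length(x))
```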

Splitting character vector in my data frame by "|" not working

Working on Tidy Tuesday's data set horror_movies.csv and I cannot see how to split the genres column. I tried:
fieldList <- strsplit(df$genres, "|")
Here is a sample of the output:
[1] "D" "r" "a" "m" "a" "|" " " "H" "o" "r" "r" "o" "r" "|" " " "S" "c" "i" "-" "F" "i"
[22] "|" " " "T" "h" "r" "i" "l" "l" "e" "r"
For some reason this splits my elements into individual characters. Here is a glimpse of this column so you can see how it is structured in the data frame:
$ genres <chr> "Drama| Horror| Thriller", "Horror", "Horror", "Comedy| Horror…
Is the | character special in R? What am I missing?
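In R code '|' is the logical operator 'OR', but that is not the issue here. strsplit() treats its split argument as a regular expression by default, and in a regular expression | means alternation. The pattern "|" therefore matches the empty string, so the text is split between every character.
To split on a literal |, set fixed=TRUE (it is FALSE by default):
fieldList <- strsplit(df$genres, "|", fixed=TRUE)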
Below is the documentation of the above function strsplit:
https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/strsplit
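A short sketch of the fix on made-up data: the `genres` vector below only imitates the glimpse shown in the question and is not the real data set. It also trims the space that follows each `|`, and shows an escaped-regex form as an assumed equivalent.

```r
# Hypothetical stand-in for df$genres, modelled on the glimpse in the question.
genres <- c("Drama| Horror| Thriller", "Horror", "Comedy| Horror")

# fixed = TRUE treats "|" as a literal character rather than a regular expression.
parts <- strsplit(genres, "|", fixed = TRUE)

# The sample data has a space after each "|", so trim the pieces.
parts <- lapply(parts, trimws)

# Regex form: escape the pipe and absorb any whitespace that follows it.
parts_regex <- strsplit(genres, "\\|\\s*")

identical(parts, parts_regex)
```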

return number of specific element of vector based of its name [duplicate]

I need to return the position of an element in a vector based on the element's value. Let's say I have a vector of letters:
myLetters=letters[1:26]
> myLetters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
and what I intend to do is create or find a function that returns the position of the element when called, for example:
myFunction(myLetters["b"])
[1] 2
myFunction(myLetters["z"])
[1] 26
In summary, I need a way to refer to Excel columns by their letters (A, B, C, later maybe even AA or further) and get the column number.
If you want to refer to Excel column names, you could create a reference vector with all Excel column names of up to three letters:
eg1 <- expand.grid(LETTERS, LETTERS)
eg2 <- expand.grid(LETTERS, LETTERS, LETTERS)
excelcols <- c(LETTERS, paste0(eg1[[2]], eg1[[1]]), paste0(eg2[[3]], eg2[[2]], eg2[[1]]))
After which you can use which:
> which(excelcols == 'A')
[1] 1
> which(excelcols == 'AB')
[1] 28
> which(excelcols == 'ABC')
[1] 731
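As an alternative to a lookup vector, the column number can be computed arithmetically by reading the label as a base-26 number with A = 1 through Z = 26. The helper `excel_col_to_num` below is hypothetical (it is not part of the answer or of any package) and is only a sketch.

```r
# Hypothetical helper: convert Excel column labels ("A", "AB", "ABC") to numbers
# by treating the letters as digits of a base-26 number where A = 1, ..., Z = 26.
excel_col_to_num <- function(col) {
  vapply(toupper(col), function(label) {
    digits <- match(strsplit(label, "")[[1]], LETTERS)
    # Anything that is not made only of letters yields NA.
    if (length(digits) == 0 || anyNA(digits)) return(NA_real_)
    # The leftmost letter carries the highest power of 26.
    sum(digits * 26^rev(seq_along(digits) - 1))
  }, numeric(1), USE.NAMES = FALSE)
}

excel_col_to_num(c("A", "Z", "AB", "ABC"))
# [1]   1  26  28 731
```

The results agree with the `which()` lookups above, and the function does not need a precomputed table.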
If you need to find the number of times a specific letter occurs, then the following should work:
myLetters = c("a","a", "b")
myFunction = function(myLetters, findLetter){
length(which(myLetters==findLetter))
}
Let's find how many times "a" occurs in myLetters:
myFunction(myLetters, "a")
# [1] 2

Use sample() without replacement multiple times with increasing sample size in R

I want to take "random" samples from a vector called data, with increasing size and without replacement.
To illustrate, data looks like this, for example:
data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
What I need is a series of sample vectors of increasing size (starting with size=2 and growing by 2, for example), where each vector keeps the elements of the previous one and adds new elements that have not been drawn before. Everything should be stored in a list, so that the result looks something like this:
sample_1<-c("s","d")
sample_2<-c("s","d","a","f")
sample_3<-c("s","d","a","f","m","n")
sample_4<-c("s","d","a","f","m","n","l","c")
sample_5<-c("s","d","a","f","m","n","l","c","j","x")
sample_6<-c("s","d","a","f","m","n","l","c","j","x","v","k")
sample_7<-c("s","d","a","f","m","n","l","c","j","x","v","k","g","b")
sample_8<-c("s","d","a","f","m","n","l","c","j","x","v","k","g","b","h")
samples<-list(sample_1,sample_2,sample_3,sample_4,sample_5,sample_6,sample_7,sample_8)
What I have so far is:
samples<-sapply(seq(from=2, to=length(data), by=2), function(i) sample(data,size=i,replace=F),simplify=F,USE.NAMES=T )
What I cannot get to work is increasing the sample size while keeping the samples from the previous steps, and having a last list element that contains all observations.
Is something like this possible?
I'm not sure whether I understood you correctly, but perhaps you only need to scramble the data once:
data = letters
data_random = sample(data)
sapply(seq(from=2, to=length(data), by=2),
function (x) data_random[1:x],
simplify = FALSE)
After your comments on the other answer, I think I understand what you want to achieve. Extending my previous code, I end up with:
data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123)
nbitems=ceiling(length(data)/2) # number of steps when growing by 2 (rounded up so it is a whole number)
results=vector("list",nbitems)
results[[1]] <- sample(data,2) # get first sample
for (i in 2:nbitems) { # Loop for each result
samplesavail <- data[!data %in% results[[i-1]]] # Reduce the samples available
results[[i]] <- c(results[[i-1]], sample( samplesavail, min( length(samplesavail), 2) ) ) # concatenate a new sample, size depends on step and remaining samples available.
}
Hope this matches your intended use:
> results
[[1]]
[1] "n" "f"
[[2]]
[1] "n" "f" "a" "g"
[[3]]
[1] "n" "f" "a" "g" "m" "v"
[[4]]
[1] "n" "f" "a" "g" "m" "v" "x" "l"
[[5]]
[1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j"
[[6]]
[1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h"
[[7]]
[1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h" "d" "s"
[[8]]
[1] "n" "f" "a" "g" "m" "v" "x" "l" "b" "j" "k" "h" "d" "s" "c"
Previous approach:
If I understood you correctly (though I am far from sure):
data<-c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")
set.seed(123) # fix the seed for repro of answer, remove in real case
nbitems=ceiling(length(data)/2) # number of entries we should have when stepping by 2 (rounded up to a whole number)
results=vector("list",nbitems) # preallocate the list (as we'll start from the end)
results[[nbitems]] = sample(data,length(data)) # shuffle all the data
for (i in nbitems:2) {
results[[i-1]] <- results[[i]][1:(length(results[[i]]) - 2)] # at each iteration, drop the last 2 entries.
}
This gives a single entry as the first result.
I just noticed this is the same idea as @sbstn's answer, but with a more complicated backward approach; posting it in case it has some value.
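The shuffle-once idea can also be sketched so that the last list element always holds every observation, even when the length of data is odd. This is only one possible reading of the requirement; the names `shuffled` and `sizes` are introduced here for illustration, and the sketch assumes data has at least two elements.

```r
data <- c("a","s","d","f","g","h","j","k","l","x","c","v","b","n","m")

set.seed(123)  # only for reproducibility; remove for real use
shuffled <- sample(data)

# Prefix lengths 2, 4, ..., plus the full length in case it is odd.
sizes <- unique(c(seq(2, length(data), by = 2), length(data)))

# Each sample is a prefix of the shuffled vector, so it contains the previous one.
samples <- lapply(sizes, function(k) shuffled[seq_len(k)])

lengths(samples)
# [1]  2  4  6  8 10 12 14 15
```

Because every element is a prefix of the same permutation, no value is drawn twice and earlier samples are preserved in later ones.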

Scan without spaces in R?

How do I scan for individual characters in a .txt file in R? From my understanding, scan uses whitespace as a separator, but what do I do if I want whitespace itself to be scanned?
For example, if I scan the string "Hello World", how do I get H,e,l,l,o, ,W,o,r,l,d?
strsplit would also be your friend here:
test <- readLines(textConnection("Hello world
Line two"))
> strsplit(test,"")
[[1]]
[1] "H" "e" "l" "l" "o" " " "w" "o" "r" "l" "d"
[[2]]
[1] "L" "i" "n" "e" " " "t" "w" "o"
And unlisted, as suggested by @Thilo:
> unlist(strsplit(test,""))
[1] "H" "e" "l" "l" "o" " " "w" "o" "r" "l" "d" "L" "i" "n" "e" " " "t" "w" "o"
I would take a two-step approach: first read the file as plain text with readLines, and then split each line into a vector of characters:
lines <- readLines("test.txt")
characterlist <- lapply(lines, function(x) substring(x, 1:nchar(x), 1:nchar(x)))
Note that this approach does not return a well-formed matrix or data.frame, but a list.
Depending on what you want to do, there might be a few different modifications:
unlist(characterlist)
gives you a single vector of all characters in order. If your text file is so well behaved that every line has exactly the same number of characters, you can use sapply instead of lapply (lapply has no simplify argument, while sapply simplifies by default) and you should get a matrix of your characters, with one column per line.
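A minimal sketch of the equal-length case, assuming `sapply()` is used in place of `lapply()` since it simplifies its result by default. The two-element `lines` vector is a made-up stand-in for `readLines("test.txt")`.

```r
# Stand-in for readLines("test.txt"); both lines have the same number of characters.
lines <- c("Hello", "World")

# sapply() simplifies equal-length results into a matrix with one column per line.
char_matrix <- sapply(lines, function(x) strsplit(x, "")[[1]], USE.NAMES = FALSE)

dim(char_matrix)   # rows = characters per line, columns = lines
char_matrix[, 1]   # characters of the first line
```

If the lines differ in length, `sapply()` falls back to returning a list, so the matrix form should not be relied on for arbitrary files.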
