I have a nested list (("H" "E" "L" "L" "O") ("T" "H" "I" "S") ("I" "S") ("A") ("T" "E" "S" "T")). I want to replace every occurrence of one string in the list with another string using substitute, but it does not work. My code is:
(substitute "H" "W" paragraph)
paragraph is the name of the nested list.
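SUBSTITUTE only examines the top-level elements of a sequence, and it takes the new item before the old one. Use SUBST, which walks the whole tree. SUBST applies the test to every subtree as well as to the leaves, so the test must accept conses; EQUAL does, whereas STRING= signals an error when it is handed a cons:
(setq new-list (subst "W" "H" old-list :test #'equal))
If you want to modify the list destructively, use NSUBST and assign the result back to the original variable:
(setq old-list (nsubst "W" "H" old-list :test #'equal))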
I am attempting to create a 5000 word vector composed of 500 blocks of 10 words. One block is drawn from sampling with replacement from a fixed list of animals, and this block is to alternate with a fixed list of foods. The following code yields one iteration of what I need:
anim <- data.frame(cbind(stim = list.sample(animals$WORD, 10, replace = TRUE), cond = "animal"))
food <- data.frame(cbind(stim = list.sample(foods$WORD, 10, replace = TRUE), cond = "food"))
both <- data.frame(rbind(anim, food))
This yields output as follows:
I just cannot figure out how to repeat this procedure 499 more times to create the total vector I need. I will be running semantic distances between clusters to determine whether I can autosegment the boundaries between foods and animals. I attempted a repeat loop, to no avail.
Thanks for any ideas!
Since you did not provide any reproducible data, we will assume that LETTERS are food and letters are animals. This line of code generates the vector you specified. Here we are only using batches of 5 to illustrate the process:
result <- as.vector(replicate(5, c(sample(LETTERS, 5, replace=TRUE), sample(letters, 5, replace=TRUE))))
result
# [1] "H" "O" "T" "K" "J" "m" "c" "s" "u" "c" "P" "Y" "V" "U" "Y" "p" "u" "q" "k" "l" "B" "H" "U" "F" "K" "h" "v" "g"
# [29] "c" "d" "X" "F" "R" "N" "U" "v" "t" "u" "q" "x" "N" "E" "G" "Q" "L" "d" "a" "v" "e" "a"
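The same idea can be extended to keep the condition label next to each word, as in the data frame from the question. The following is only a sketch under stated assumptions: `animals` and `foods` are short hypothetical stand-ins for `animals$WORD` and `foods$WORD`, base `sample` is used in place of `list.sample`, and 250 animal/food pairs are assumed so that the animal blocks alone add up to 5000 words (adjust `n_pairs` if the total should be counted differently).

```r
set.seed(1)  # only to make the sketch reproducible

# Hypothetical stand-ins for animals$WORD and foods$WORD
animals <- c("cat", "dog", "horse", "cow", "sheep")
foods   <- c("bread", "rice", "apple", "cheese", "soup")

block_size <- 10   # words per block
n_pairs    <- 250  # assumed number of animal/food block pairs

# Each replicate() call returns one animal block followed by one food block;
# as.vector() flattens the resulting matrix column by column, preserving order.
stim <- as.vector(replicate(
  n_pairs,
  c(sample(animals, block_size, replace = TRUE),
    sample(foods,   block_size, replace = TRUE))
))

# Matching condition labels: 10 x "animal", 10 x "food", repeated per pair
cond <- rep(rep(c("animal", "food"), each = block_size), times = n_pairs)

both <- data.frame(stim = stim, cond = cond, stringsAsFactors = FALSE)
head(both, 12)
```

Building the labels with `rep` rather than inside the loop keeps the sampling step and the bookkeeping separate, which makes it easy to check that every block has the intended condition.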
I found this function in R that can generate the "power set" for a set of elements:
f <- function(set) {
  n <- length(set)
  masks <- 2^(1:n-1)  # (1:n) - 1, i.e. 2^0, 2^1, ..., 2^(n-1)
  lapply( 1:2^n-1, function(u) set[ bitwAnd(u, masks) != 0 ] )
}
results <- f(LETTERS[1:5])
results <- sapply(results, paste, collapse = " ")
In a previous question (Subsetting Elements in a "Hypothetical" List), I learned how to interact with very large "power sets" that the computer can not load into memory. For example - suppose I wanted to make the "power set" for all 26 letters in the English alphabet (this set would contain 2^26 = 67108864 elements). I could find out the "13626980"th element in this list without actually generating the list (since it would be impossible to generate/store such a big list):
LETTERS[bitwAnd(13626980, 2^(1:26-1)) != 0]
[1] "C" "F" "G" "J" "K" "L" "N" "O" "P" "Q" "R" "S" "T" "W" "X"
I had the following question : Is it possible to do the "opposite" of this task?
For example, given the sequence of letters ("C" "F" "G" "J" "K" "L" "N" "O" "P" "Q" "R" "S" "T" "W" "X"), can some function determine that it corresponds to the number 13626980? Is there some hypothetical function like:
#input
> hypothetical_function(c("C", "F", "G", "J", "K", "L", "N", "O", "P", "Q", "R", "S", "T", "W", "X"))
#output
13626980
Is this possible?
Thank you!
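One possible way to invert the lookup follows directly from the bit-mask construction: each letter at position k in the alphabet contributes 2^(k-1) to the index, so summing those powers recovers the number. The sketch below is not taken from any posted answer; `subset_to_index` and its `universe` argument are hypothetical names chosen for illustration.

```r
# Map a subset of `universe` back to its index in the bit-mask enumeration
# (the same 0-based index that is passed to bitwAnd in the question).
subset_to_index <- function(subset, universe = LETTERS) {
  pos <- match(subset, universe)          # 1-based position of each element
  if (anyNA(pos)) {
    stop("subset contains elements that are not in universe")
  }
  sum(2^(unique(pos) - 1))                # element k sets bit k - 1
}

letters_in <- c("C", "F", "G", "J", "K", "L", "N", "O",
                "P", "Q", "R", "S", "T", "W", "X")

idx <- subset_to_index(letters_in)
idx
# [1] 13626980

# Round trip: the index decodes back to the same letters
LETTERS[bitwAnd(idx, 2^(0:25)) != 0]
```

Note that `bitwAnd` works on 32-bit integers, so the decoding step as written only applies to sets of up to about 31 elements; the sum itself is ordinary double arithmetic.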
I can't seem to figure out a seemingly simple task:
I have phonemic transcriptions of speech. To count the phonemes I want to split the strings into single phonemic segments. Unfortunately, the characters used for the phonemes are not 100% different from each other. For example, a long /i/ sound is transcribed iː (NB: ː is not a colon but a special char!) whereas a short /i/ sound may occasionally be transcribed i. This double use of the i in two distinct phonemes causes a problem in the split operation:
Test data:
test1 <- "dɪd ɛnɪbɒdi liːv ðeə glɑːsɪz hɪə lɑːst wiːk sʌmbədi dɪd"
A vector of all phonemes:
phonemes <- c("ɪə","eɪ","ʊə","ɔɪ","aɪ","eə","aʊ","əʊ", # diphthongs
"iː","uː","ɜː","ɔː","ɑː", # long vowels
"ə","ɪ", "ɛ","ɒ","ʌ","æ","i","ʊ", # short vowels
"j","w", # semi-vowels
"r","l", # approximants
"n","m","ŋ", # nasals
"f","v","θ","ð","s","z","ʃ","ʒ","h", # fricatives
"ʧ","ʤ", # affricates
"p","b","t","d","k","g") # plosives
The alternation pattern:
phonemes_pattern <- paste0("(", paste0(phonemes, collapse = "|"), ")")
The splitting operation:
str_split(gsub(" ", "", test1), paste0("(?<=", phonemes_pattern, ")"))
[[1]]
[1] "d" "ɪ" "d" "ɛ" "n" "ɪ" "b" "ɒ" "d" "i" "l" "i" "ː" "v" "ð" "eə" "g" "l" "ɑː" "s" "ɪ" "z" "h" "ɪ" "ə" "l" "ɑː" "s" "t"
[30] "w" "i" "ː" "k" "s" "ʌ" "m" "b" "ə" "d" "i" "d" "ɪ" "d" ""
The result is correct except for the long /i/ sound, where the two characters i and ː are also separated. Can anybody help with this?
Why not extract the phonemes instead?
phonemes_pattern <- paste0(phonemes, collapse = "|")
stringr::str_extract_all(test1, phonemes_pattern)[[1]]
#[1] "d" "ɪ" "d" "ɛ" "n" "ɪ" "b" "ɒ" "d" "i" "l"
#[12] "iː" "v" "ð" "eə" "g" "l" "ɑː" "s" "ɪ" "z" "h"
#[23] "ɪə" "l" "ɑː" "s" "t" "w" "iː" "k" "s" "ʌ" "m"
#[34] "b" "ə" "d" "i" "d" "ɪ" "d"
Or in base R :
regmatches(test1, gregexpr(phonemes_pattern, test1))[[1]]
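Extraction with an alternation pattern depends on which alternative the regex engine tries first: in engines that take the first matching alternative (ICU in stringr, or base R with `perl = TRUE`), a short token listed before a longer one that starts with it will win. The phonemes vector above already lists diphthongs and long vowels before short vowels. A small sketch of making that ordering explicit, using ASCII stand-ins ("i:" with an ordinary colon instead of iː) purely for illustration and assuming the tokens contain no regex metacharacters:

```r
# Hypothetical toy inventory; "i" is deliberately listed before "i:"
tokens <- c("i", "i:", "d", "l", "v")
text   <- "dili:v"

# Naive pattern: alternatives in the given order
naive_pattern <- paste0(tokens, collapse = "|")
naive <- regmatches(text, gregexpr(naive_pattern, text, perl = TRUE))[[1]]
naive    # "i:" is never matched as a unit

# Sort tokens longest-first so multi-character tokens are tried first
ordered_tokens  <- tokens[order(nchar(tokens), decreasing = TRUE)]
ordered_pattern <- paste0(ordered_tokens, collapse = "|")
ordered <- regmatches(text, gregexpr(ordered_pattern, text, perl = TRUE))[[1]]
ordered  # "d" "i" "l" "i:" "v"

# Counting the segments is then a one-liner
table(ordered)
```

Sorting by length is a cheap safeguard if the inventory is ever extended or reordered by hand.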
Just changing the lookbehind to a lookahead keeps iː together:
# using the unchanged phonemes vector
phonemes_pattern <- paste0(phonemes, collapse = "|")
str_split(gsub(" ", "", test1), paste0("(?=", phonemes_pattern, ")"))
In R I have a string like this:
'hello'
How do I convert it to a character vector like that:
[1] "h" "e" "l" "l" "o"
With stringr:
stringr::str_split("hello","")[[1]]
[1] "h" "e" "l" "l" "o"
Found another possible solution, although this is probably the worst approach:
substring("hello", seq(1,nchar("hello")), seq(1,nchar("hello")))
[1] "h" "e" "l" "l" "o"
While this might not be the most performant solution, this works as expected:
> unlist(strsplit('hello', ''))
[1] "h" "e" "l" "l" "o"
See the documentation of unlist and strsplit for further options.
How do I scan for individual characters in a .txt file in R? From my understanding, scan uses whitespace as a separator, but what if I want whitespace itself to be one of the characters that is read?
For example, if I scan the string "Hello World", how do I get H,e,l,l,o, ,W,o,r,l,d?
strsplit would also be your friend here:
test <- readLines(textConnection("Hello world
Line two"))
strsplit(test,"")
> strsplit(test,"")
[[1]]
[1] "H" "e" "l" "l" "o" " " "w" "o" "r" "l" "d"
[[2]]
[1] "L" "i" "n" "e" " " "t" "w" "o"
And unlisted as suggested by @Thilo...
> unlist(strsplit(test,""))
[1] "H" "e" "l" "l" "o" " " "w" "o" "r" "l" "d" "L" "i" "n" "e" " " "t" "w" "o"
I would take a two-step approach: first read the file as plain text with readLines, then split each line into a vector of characters:
lines <- readLines("test.txt")
characterlist <- lapply(lines, function(x) substring(x, 1:nchar(x), 1:nchar(x)))
Note that this approach does not return a well formed matrix or data.frame, but a list.
Depending on what you want to do, there might be a few different modifications:
unlist(characterlist)
gives you a single vector of all characters in sequence. If your text file is so well behaved that every line has exactly the same number of characters, you can use sapply instead of lapply (lapply has no simplify argument) and will get a matrix of your characters, with one column per line.
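For the equal-length case, a minimal sketch of the matrix variant is shown below. It reads from a `textConnection` instead of a file so that it is self-contained; the two three-character lines are made-up sample input.

```r
# Made-up input with the same number of characters on every line
lines <- readLines(textConnection("abc\ndef"))

# sapply simplifies equal-length results into a matrix (one column per line)
chars <- sapply(
  lines,
  function(x) substring(x, 1:nchar(x), 1:nchar(x)),
  USE.NAMES = FALSE
)
chars

# Transpose if one row per line is more convenient
by_row <- t(chars)
by_row
```

If the lines differ in length, `sapply` falls back to returning a list, so it is worth checking `is.matrix(chars)` before relying on the shape.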