words = "Fine and Cloudy"
vowels = 'aeiouAEIOU'
my_words = [words.index(c) for c in words if c not in vowels and words.count(c) == 1]
print(my_words)
I just don't understand why the output is [0, 9, 10, 14]. How does this simple code work?
Can somebody explain in detail? Please!
Thanks in advance.
It prints the position of those letters in the string "Fine and Cloudy" that are not vowels and that occur exactly once. Those are the "F" (position 0), "C" (position 9), "l" (position 10), and the "y" (position 14).
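To make the answer above concrete, here is a minimal sketch that unrolls the list comprehension into an explicit loop. The names `positions`, `is_consonant`, `occurs_once` and `kept` are introduced only for illustration and are not part of the original code. Note that `words.index(c)` returns the position of the first occurrence of `c`, which is unambiguous here because every character that gets through the filter occurs exactly once.

```python
words = "Fine and Cloudy"
vowels = "aeiouAEIOU"

# Same logic as the list comprehension, written out step by step.
positions = []
for c in words:
    is_consonant = c not in vowels      # note: a space also passes this test
    occurs_once = words.count(c) == 1   # 'n', 'd' and ' ' each occur twice
    if is_consonant and occurs_once:
        # str.index returns the position of the first occurrence of c
        positions.append(words.index(c))

# Look the characters back up to see which ones were kept.
kept = [words[i] for i in positions]
print(positions)  # [0, 9, 10, 14]
print(kept)       # ['F', 'C', 'l', 'y']
```

The space is not a vowel, so it is only excluded because it appears twice; in a string with a single space, its position would be printed as well.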
I have a vector that contains series of texts and numbers, like:
t <- c("A", 1:3, "A", 1:4, "A", 1:3)
t
#> [1] "A" "1" "2" "3" "A" "1" "2" "3" "4" "A" "1" "2" "3"
Created on 2022-08-06 by the reprex package (v2.0.1)
That is, the actual data is taken from a pdf, with the data frame collapsed into a single column vector, and the wrap length is uneven for some reason (probably because of the cell merging).
To process this data efficiently, I want to know the length from "A" to next "A" or end. In this example the answer would be 3, 4, 3 (Edit: sorry for a simple mistake, it would be 4, 5, 4).
I have tried many different methods but can't find one that works. Does anyone know of a better way?
An alternative using rle (run-length encoding):
with(rle(t == "A"), subset(lengths, !values))
#> [1] 3 4 3
You want the number of elements
(1) between adjacent "A"s;
(2) from the last "A" (excluding it) to the end.
We can use either of the following:
diff(c(which(t == "A"), length(t) + 1)) - 1
#[1] 3 4 3
diff(which(c(t, "A") == "A")) - 1
#[1] 3 4 3
Essentially we pad an "A" at the end to turn (2) into (1). If the last element of t happens to be an "A", the last value in the result will be 0.
Extension:
If you further want to know the number of elements from the beginning to the first "A" (excluding it), we can pad a leading "A":
diff(c(0, which(t == "A"), length(t) + 1)) - 1
#[1] 0 3 4 3
diff(which(c("A", t, "A") == "A")) - 1
#[1] 0 3 4 3
Here, the first value is 0, because the first element of t happens to be an "A".
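For readers comparing this with Python, the same padding idea can be sketched as follows. This is only an illustrative translation, not part of the R answer: `gap_lengths`, `marker` and `pad_front` are hypothetical names, and positions are 0-based, so the virtual leading marker sits at -1 instead of 0.

```python
def gap_lengths(items, marker="A", pad_front=False):
    """Count the elements between consecutive markers, treating the end
    (and optionally the start) of the list as if a marker sat there."""
    # 0-based positions of the markers, plus a virtual marker past the end
    positions = [i for i, v in enumerate(items) if v == marker] + [len(items)]
    if pad_front:
        positions = [-1] + positions  # virtual marker before the first element
    # difference of neighbouring positions, minus one for the marker itself
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


t = ["A", "1", "2", "3", "A", "1", "2", "3", "4", "A", "1", "2", "3"]
print(gap_lengths(t))                  # [3, 4, 3]
print(gap_lengths(t, pad_front=True))  # [0, 3, 4, 3]
```

As in the R version, a list that ends with the marker yields a trailing 0.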
I am looking to generate multiple, random lists of letters in which letters in a certain position repeat.
Here is an example of the "2-back" rule. It is a list of random letters, but 25% of the time the letter matches the letter that came 2 before it. Letters that match are marked 1 and those that do not are marked 0:
R 0
Y 0
M 0
Y 1
L 0
C 0
F 0
G 0
S 0
G 1
I want to be able to specify the length of the list as well as the percentage of letters that do actually follow the pattern and repeat. The rest of the letters would be randomly generated.
I don't know if I have explained the problem well. Can someone potentially help me with this?
Since this process is state dependent I think this might be a rare case where it's OK to use a for/while loop, which is normally very slow compared to the alternatives. You still want to pre-allocate the vectors by creating them at their total length to start with. You can do something like:
total_length = 20
back_prob = 0.25
two_back_seq = character(total_length)
types = character(total_length)
# Generate the random results used to determine if we sample randomly
# or repeat the 2-back element
rolls = runif(total_length)
current_index = 1
while (current_index <= total_length) {
  if ((rolls[current_index] > back_prob) || (current_index < 3)) {
    two_back_seq[current_index] = sample(letters, 1)
    types[current_index] = "random"
  } else {
    two_back_seq[current_index] = two_back_seq[current_index - 2]
    types[current_index] = "two back"
  }
  current_index = current_index + 1
}
An example result:
> two_back_seq
[1] "u" "x" "u" "b" "x" "w" "t" "w" "o" "b" "o" "l" "m" "m" "m" "u" "d" "t" "k" "t"
> types
[1] "random" "random" "two back" "random" "random" "random" "random" "two back" "random" "random" "two back" "random" "random"
[14] "random" "two back" "random" "random" "random" "random" "random"
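The same state-dependent loop can be sketched in Python for comparison. This is a rough translation under stated assumptions rather than the answer's own code: the function name `two_back_sequence` and the `seed` parameter are made up for illustration, and the random draw is made inside the loop instead of being precomputed as in the `rolls` vector.

```python
import random
import string


def two_back_sequence(total_length, back_prob, seed=None):
    """Return (letters, types): each letter repeats the one two positions
    earlier with probability back_prob, otherwise it is drawn at random."""
    rng = random.Random(seed)  # local generator so results can be reproduced
    letters, types = [], []
    for i in range(total_length):
        # The first two positions have nothing two steps back to copy.
        if i >= 2 and rng.random() <= back_prob:
            letters.append(letters[i - 2])
            types.append("two back")
        else:
            letters.append(rng.choice(string.ascii_lowercase))
            types.append("random")
    return letters, types


seq, kinds = two_back_sequence(20, 0.25, seed=42)
print(seq)
print(kinds)
```

As in the R output above, a letter labelled "random" can still match the letter two positions earlier by chance, so the observed match rate will usually sit slightly above `back_prob`.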
Let's say you want to make the pattern for a single letter, a. Start by generating some letters, using prob to set the relative frequencies of the letters:
x <- sample(letters[1:5], 500, prob=c(.01,.09,.2,.3,.3), replace=TRUE)
Now insert extra letters to make the pattern at intervals:
x <- ifelse(x[-1:-3]=="a", "a", x[seq(1,length(x)-3)])
Does that help?
(To check which elements are equal to a I used which(x == "a"))
I am trying to convert a snippet of Python code to R, but I don't know how to make it happen.
In Python we can do:
## dictionary
a_list = {'red':23, 'black':12,'white':4,'orange':79}
## sort by key
dict(sorted(a_list.items()))
{'black': 12, 'orange': 79, 'red': 23, 'white': 4}
## sort by values
sorted(a_list.items(), key=lambda x: x[1])
[('white', 4), ('black', 12), ('red', 23), ('orange', 79)]
For this question, in R, I have:
a_list <- list(red=23, black=12, white = 4, orange=79)
I want to sort this list in two ways, such that the output is:
output 1 (sorted by keys): list(black=12, orange=79, red=23, white = 4)
output 2 (sorted by values): list(white = 4,black=12, red=23,orange=79)
How can I do this?
One option for the first case is to use order on the names of a_list:
a_list[order(names(a_list))]
#$black
#[1] 12
#$orange
#[1] 79
#$red
#[1] 23
#$white
#[1] 4
For the second case, as the list elements are all of length 1, unlist them and order on the result:
a_list[order(unlist(a_list))]
#$white
#[1] 4
#$black
#[1] 12
#$red
#[1] 23
#$orange
#[1] 79
I have a numeric vector that I've imported from Excel that is formatted in a "weird" way. For example, 12.000 stands for 12000. I want to convert all values that have decimals to whole numbers (in this example by multiplying by 1000, since R reads 12.000 as 12 and what I really want is 12000). I've tried converting the vector to character and then manipulating it to add zeros. I don't think this is the best way, but what I'm trying looks like this:
vec <- c(12.000, 5.300, 5.000, 33.400, 340, 3200)
vec <- as.character(vec)
> vec
[1] "12" "5.3" "5" "33.4" "340" "3200"
x <- "([0-9]{1})"
xx <- "([0-9]{2})"
x.x <- "([0-9]{1}\\.[0-9]{1})"
xx.x <- "([0-9]{2}\\.[0-9]{1})"
I created these regular expressions so that I could write a condition: if grep(x, vec) matches, then I do paste0("000", vec) for the elements of vec that match. My idea is to do this for all possible cases, which are: add "000" if x or xx matches, and add "00" if x.x or xx.x matches.
Does anyone have an idea of what I could do? Is there a simpler approach?
Thank you!!
You need to read the vector as character in the first place. If you read it as numeric, R will interpret it as a number and drop the decimal point and the trailing zeros.
df <- read.csv(text= "Index, Vec
1, 12.00
2, 5.3
3, 5
4, 33.4
5, 340
6, 3200",
colClasses = c("numeric", "character"))
isDot <- grepl("\\.", df$Vec)
df$Vec[isDot] <- as.numeric(df$Vec[isDot])*1000
df$Vec <- as.numeric(df$Vec)
I'm trying to devise an approach to recoding items in a vector based on whether or not they occur AFTER a certain value in that vector. I've got an intact dataset (a time series grouped by subject) that contains a column indicating the month of initial exposure by subject (this column has NA for lack of exposure and "G" for the month exposure occurred). Once the subject has been "exposed", I need the vector for that subject to indicate that he/she has been exposed until the end of the observation period for that subject. Here's a stripped-down example and a solution that works in some cases, but not in every case I need it to:
x2 <- c("G", NA, NA, NA, NA)
solution <- c(rep(1, length(x2)- length(rep("G", (length(x2)+1 )- which(x2=="G")))), rep("G", (length(x2)+1 )- which(x2=="G")))
In this case the solution looks like this:
> solution
[1] "G" "G" "G" "G" "G"
That said, the solution breaks when confronted with a vector that does not include any "G"s:
x2 <- c(NA, NA, NA, NA, NA)
solution <- c(rep(1, length(x2)- length(rep("G", (length(x2)+1 )- which(x2=="G")))), rep("G", (length(x2)+1 )- which(x2=="G")))
Error in rep("G", (length(x2) + 1) - which(x2 == "G")) :
invalid 'times' argument
So, at the end of the day, the solution vector needs to:
1) be of the same length as the original vector (x2 in this case) AND
2) contain the value "G" in every position AFTER the initial "G" in the original vector
One more thing, I need the solution to be in some form that I can pass to plyr over a grouping factor (as I need to recode many vectors grouped by factor over a large dataset).
Thank you all very much in advance!
Chris
This works too:
x2 <- c(NA,"G", NA, NA, NA, NA)
ifelse(seq_along(x2)>=match('G',x2),'G',x2)
I think this question has been asked before, and I am trying to dig up the old question.
repG <- function(x, start) {
  hits <- grep(paste0("^", start, "$"), x)
  if (length(hits) > 0) x[hits[1]:length(x)] <- start
  x  # returned unchanged when 'start' never occurs (the original version returned NULL)
}
grep("^G$", tvec)
#[1] 6 7 8 9 10 11 12