I am writing code that, like a lottery, should produce a list of 6 numbers like this: (20, 45, 11, 16, 09, + 12).
The numbers have to be from 1 to 50.
How do I make it print only the last line, with all 5 randomly chosen values plus the additional number in one list?
import random
a = list(range(1,50))
b = random.randint(1, 20)
temp = []
for i in range(5):
    temp.append(random.choice(a))
    print(temp, "+", b)
output:
[8] + 12
[8, 30] + 12
[8, 30, 42] + 12
[8, 30, 42, 21] + 12
**[8, 30, 42, 21, 14] + 12**
I am not familiar with how the lottery works. I assume that all numbers, including the additional number, are drawn from the same set without replacement. If that is true, this code should do the trick for you:
import random
numbers = list(range(1, 51)) # include 50
random.shuffle(numbers)
print(numbers[:6]) # pick the first 6 numbers (5 + 1 additional number) from the randomized list
If your additional number comes from a separate set, do this:
import random
numbers = list(range(1, 51)) # include 50
random.shuffle(numbers)
# pick the first 5 from the randomized list:
picks = numbers[:5]
# select from remaining numbers those that are less than or equal to 20:
picks.append(random.choice([n for n in numbers[5:] if n <= 20]))
print(picks)
Expanding on the previous answer, based on your request to exclude previously picked numbers when generating the last number:
import random
numbers = list(range(1, 51)) # create a list containing 1 to 50
random.shuffle(numbers) # randomise the list
picks = numbers[:5] # pick the first 5 numbers from the randomised list
# select the last number from the remaining numbers in the list
picks.append(random.choice([n for n in numbers if n not in picks]))
print(picks)
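The same idea can also be sketched with random.sample, which draws without replacement in one call. This is only a minimal sketch under the assumption that the additional number comes from the same 1 to 50 pool as the other five; the function name draw_lottery and its parameters are made up for illustration.

```python
import random

def draw_lottery(main_count=5, low=1, high=50):
    """Return (main numbers, additional number), all distinct."""
    # random.sample draws without replacement, so no value can repeat.
    picks = random.sample(range(low, high + 1), main_count + 1)
    # The first main_count values are the main numbers, the last is the extra one.
    return picks[:main_count], picks[main_count]

main, bonus = draw_lottery()
print(main, "+", bonus)  # printed once, after all numbers are drawn
```

Because the print call sits outside any loop, only the final line is shown.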
Let's say I have a dataframe.
  x_coord y_coord   u
1      12      16 100
2      17      16 105
3      22      12  95
4      27      12  98
I want to calculate the product of u for pairs of rows that satisfy several conditions on the other columns. I have done this with nested loops:
prod_pairs <- NULL
prod_pairs <- matrix(nrow = 4, ncol = 1)
for (i in 1:4) {
  for (j in 1:4) {
    if (i != j & data$y_coord[i] == data$y_coord[j] & data$x_coord[i] - data$x_coord[j] == -5) {
      prod_pairs[i] <- data$u[i] * data$u[j]
      break
    }
  }
}
My actual dataset is much larger, and I am repeating this multiple times with other columns in place of u and other values in the 3rd condition of the if statement (it's -5 here, so I will repeat with +5, -10, +10, etc.).
The nested loops are quite slow and I've been trying to vectorize this but to no avail. Is there a way I can speed it up?
Also, I want to try to create a function so I can input other columns and values in the 3rd condition of the if statement. I was trying to combine vectorization with a function that can do this but could not make it work.
How would I go about doing this?
Thanks.
One approach might be to join the table to itself, requiring that the join be equal on y_coord and offset by the chosen amount (-5, 5, 10, etc.) on x_coord. This function does that, and also lets you pass a different u column (default is "u") and a different offset, xdiff (default is -5):
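library(data.table)
get_paired_products <- function(df, ucol = "u", xdiff = -5) {
  # self-join: same y_coord, and x_coord equal to the other row's x_coord - xdiff
  result = df[df[, x2 := x_coord - xdiff], on = .(y_coord, x_coord = x2), nomatch = 0]
  result[, prod := get(ucol) * get(paste0("i.", ucol))][, .(row_a = row, row_b = i.row, prod)]
}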
If input is:
df = data.table(
  x_coord = c(12, 17, 22, 27),
  y_coord = c(16, 16, 12, 12),
  u = c(100, 105, 95, 98),
  row = c(1, 2, 3, 4)
)
Then output is:
> get_paired_products(df)
row_a row_b prod
1: 2 1 10500
2: 4 3 9310
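The lookup behind the self-join can also be sketched without any library, here in Python as a language-neutral illustration. It assumes that each (x_coord, y_coord) position occurs at most once; the function name paired_products and the list-of-dicts layout are hypothetical. It follows the orientation of the question's loop, where row i is paired with the row j for which x_i - x_j equals xdiff.

```python
def paired_products(rows, ucol="u", xdiff=-5):
    """rows: list of dicts with keys x_coord, y_coord and the column ucol."""
    # Index rows by position so each partner lookup is a single dict access
    # (assumes positions are unique; duplicates would overwrite each other).
    by_position = {(r["x_coord"], r["y_coord"]): i
                   for i, r in enumerate(rows, start=1)}
    result = []
    for i, r in enumerate(rows, start=1):
        # The partner j has the same y_coord and x_j = x_i - xdiff.
        j = by_position.get((r["x_coord"] - xdiff, r["y_coord"]))
        if j is not None:
            result.append((i, j, r[ucol] * rows[j - 1][ucol]))
    return result

rows = [
    {"x_coord": 12, "y_coord": 16, "u": 100},
    {"x_coord": 17, "y_coord": 16, "u": 105},
    {"x_coord": 22, "y_coord": 12, "u": 95},
    {"x_coord": 27, "y_coord": 12, "u": 98},
]
print(paired_products(rows))  # [(1, 2, 10500), (3, 4, 9310)]
```

This replaces the quadratic nested loop with one pass over the rows, which is the same reason the join is faster.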
I'd like to create a randomized data frame in R where the values of the 2nd column = 1st column + number and the next value of the 1st column = 2nd column + number
It would look something like this:
I've tried doing this:
Sal = rnorm(150,mean=18,sd=1.7)
H2O = Sal + 0.2358
d = data.frame(Sal = rep(Sal,1), H2O = rep(H2O,1))
df = d[order(d$Sal,d$H2O),]
df
But it doesn't really work since the next number doesn't "build" upon the previous number.
How could I do this? Should I use a loop instead? My R experience is fairly limited (as you can probably tell)
Thank you in advance!
I think cumsum is what you are looking for (and then filling the rows of your data frame):
set.seed(42)
n_rows <- 5
rnd_numbers <- rnorm(n_rows * 2, mean = 18, sd = 17)
entries <- cumsum(rnd_numbers)
df <- data.frame(matrix(entries, nrow = n_rows, byrow = TRUE))
colnames(df) <- c('Sal', 'H2O')
df
Sal H2O
1 41.30629 49.70642
2 73.87961 102.63827
3 127.51083 143.70672
4 187.40259 203.79339
5 256.10659 273.04045
You can check that the result follows the structure you described above by having a look at the underlying random numbers:
rnd_numbers
[1] 41.306294 8.400131 24.173183 28.758664 24.872561 16.195883
[7] 43.695874 16.390796 52.313203 16.933860
For your question, looking at the desired output data.frame: if you read the values from left to right and top to bottom, you will see that the increments relative to the first value can be written as
c(0,1,2,3,....,2*nr-1)*number
where nr is the number of rows of your desired output. In this sense, the only thing you need to do is create this sequence of increments and add it to the initial value.
You can try the code below
set.seed(1)
Sal <- rnorm(1, mean = 18, sd = 1.7)
number <- 0.2358
nr <- 10
df <- setNames(
  data.frame(
    matrix(Sal + (seq(2 * nr) - 1) * number,
      ncol = 2,
      byrow = TRUE
    )
  ),
  c("Sal", "H2O")
)
which gives
> df
Sal H2O
1 16.93503 17.17083
2 17.40663 17.64243
3 17.87823 18.11403
4 18.34983 18.58563
5 18.82143 19.05723
6 19.29303 19.52883
7 19.76463 20.00043
8 20.23623 20.47203
9 20.70783 20.94363
10 21.17943 21.41523
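The increment formula above can be checked with a short Python sketch. The function name alternating_table and the sample values are only illustrative; the point is that the k-th value, read row by row, equals the first value plus k times the fixed number.

```python
def alternating_table(first, number, nr):
    """Return nr rows of (col1, col2), each value one step above the previous."""
    # Increments are 0, 1, ..., 2*nr - 1 times the fixed step.
    values = [first + k * number for k in range(2 * nr)]
    # Fill row by row: consecutive values land in column 1, then column 2.
    return [(values[2 * i], values[2 * i + 1]) for i in range(nr)]

for row in alternating_table(10.0, 0.5, 3):
    print(row)
```

With a first value of 10.0 and a step of 0.5, the rows are (10.0, 10.5), (11.0, 11.5) and (12.0, 12.5).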
Here's an approach using Reduce:
set.seed(123)
Start <- rnorm(1,14,1)
Values <- Reduce(`+`,rnorm(19, 0.25, 0.125),init = Start, accumulate = TRUE)
Matrix <- matrix(Values, ncol = 2, byrow = TRUE)
Result <- setNames(as.data.frame(Matrix),c("Salt","H2O"))
Result
Salt H2O
1 13.43952 13.66075
2 14.10559 14.36440
3 14.63057 15.09495
4 15.40256 15.49443
5 15.65857 15.85287
6 16.25588 16.55085
7 16.85095 17.11478
8 17.29530 17.76867
9 18.08090 18.08507
10 18.42274 18.61364
Reduce's first argument is a function with two arguments, in our case +. We set init = to be the starting value, and then a random value generated by rnorm is added. That new value is used as the starting value for the next number. We use accumulate = TRUE to keep all the values.
From here, we can use matrix() to reshape the vector of values into an n x 2 matrix, filled row by row. Then we can convert it to a data.frame and add the column names.
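For comparison, the accumulate step has a close analogue in Python's itertools.accumulate, sketched below. The function name running_table is made up, and the random draws will not reproduce the R output above because the two languages use different random number generators.

```python
import random
from itertools import accumulate

def running_table(start, steps):
    """Accumulate steps onto start and pair consecutive values into rows."""
    # accumulate(..., initial=start) yields start, start + s1, start + s1 + s2, ...
    values = list(accumulate(steps, initial=start))
    # Pair values row by row; a trailing unpaired value is dropped.
    return [tuple(values[i:i + 2]) for i in range(0, len(values) - 1, 2)]

random.seed(123)
start = random.gauss(14, 1)
steps = [random.gauss(0.25, 0.125) for _ in range(19)]
for salt, h2o in running_table(start, steps):
    print(round(salt, 5), round(h2o, 5))
```

One starting value plus 19 steps gives 20 values, which is 10 rows of two columns.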
I'm working with proteome data and would like to show the peptide expression in the order of the actual protein sequence. Currently the peptides are ordered according to their usage in quantification (= random).
I suppose you can do this using regular expressions / stringr & rebus (preferably), but I couldn't figure out how.
Here is a data example. Many thanks for your help!
peptides <- data.frame(peptide = c(1,2,3,4),
sequence = c("PRDPDPASRTH", "MTLGRRLACLF", "RRARPHAWP", "APNFVMSAAH"),
log2quant = c(21, 12, 17, 18))
protein_sequence <- c("MTLGRRLACLFLACVLPALLLGGTALASEIVGGRRARPHAWPFMVSLQLRGGHFCGATLIAPNFVMSAAHCVANVNVRAVRVVLGAHNLSRREPTRQVFAVQRIFENGYDPVNLLNDIVILQLNGSATINANVQVAQLPAQGRRLGNGVQCLAMGWGLLGRNRGIASVLQELNVTVVTSLCRRSNVCTLVRGRQAGVCFGDSGSPLVCNGLIHGIASFVRGGCASGLYPDAFAPVAQFVNWIDSIIQRSEDNPCPHPRDPDPASRTH")
expected_result <- data.frame(peptide = c(1,2,3,4),
sequence = c("PRDPDPASRTH", "MTLGRRLACLF", "RRARPHAWP", "APNFVMSAAH"),
log2quant = c(21, 12, 17, 18),
order = c(4, 1, 2, 3))
I copied the sequence from UniProt (it's the ELANE protein). The rest of the data comes from mass spectrometry results.
Would be great to find a solution for this, many thanks!
We can use str_locate from stringr to get the start (or end) location of each pattern in the string protein_sequence, and use rank to get its order.
peptides$order <- rank(stringr::str_locate(protein_sequence,peptides$sequence)[, 1])
peptides
# peptide sequence log2quant order
#1 1 PRDPDPASRTH 21 4
#2 2 MTLGRRLACLF 12 1
#3 3 RRARPHAWP 17 2
#4 4 APNFVMSAAH 18 3
Make sure that peptides$sequence is character and not factor before using it in str_locate.
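The locate-then-rank idea can be sketched in Python as well. The function name peptide_order and the toy strings are hypothetical. Unlike R's rank, which averages ties, this sketch assigns ordinal ranks, and it assumes every peptide occurs in the protein (str.index raises ValueError otherwise).

```python
def peptide_order(protein, peptides):
    """Rank peptides by the position of their first occurrence in protein."""
    # str.index returns the 0-based start of the first match.
    positions = [protein.index(p) for p in peptides]
    # Sort peptide indices by position, then assign ranks 1, 2, 3, ...
    by_position = sorted(range(len(peptides)), key=lambda i: positions[i])
    order = [0] * len(peptides)
    for rank, i in enumerate(by_position, start=1):
        order[i] = rank
    return order

# Toy example: "HIJ" sits last, "ABC" first, "DE" in between.
print(peptide_order("ABCDEFGHIJ", ["HIJ", "ABC", "DE"]))  # [3, 1, 2]
```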
I have derived all the start and stop positions within a DNA string. Now I would like to map each start position to a stop position (both are vectors) and then use these positions to extract the corresponding substrings from the DNA sequence. But I am unable to loop through both vectors efficiently to achieve this, especially as they are not of the same length.
I have tried different versions of loops (for, ifelse) but I am not quite able to wrap my head around a solution yet.
Here is an example of one of my several attempts at solving this problem.
new = data.frame()
for (i in start_pos){
  for (j in stop_pos){
    while (j>i){
      new[j,1]=i
      new[j,2]=j
    }
  }
}
Here is an example of my desired result:
start = c(1, 5, 7, 9, 15) and stop = c(4, 13, 20, 30, 40, 50). My desired result would ideally be a data frame of two columns mapping each start to its stop position. I only want to add rows to df where the stop value is greater than its corresponding start value (multiple start values can share the same stop value as long as this criterion is fulfilled), as shown in my example below:
first row of df = (1, 4)
second row of df = (5, 13)
third row of df = (7, 13)
fourth row of df = (9, 13)
fifth row of df = (15, 20)
Here's a fairly simple solution - it's probably good not to over-complicate things unless you're sure you need the extra complexity. The starts and stops already seem to be matched up, you just might have more of one than the other, so you can find the length of the shortest vector and only use that many items from start and stop:
start = c(1, 5, 15)
stop = c(4, 13, 20, 30, 40, 50)
min_length = min(length(start), length(stop))
df = data.frame(
  start = start[1:min_length],
  stop = stop[1:min_length]
)
EDIT: after reading some of your comments here, it looks like your problem actually is more complicated than it first seemed (coming up with examples that demonstrate the level of complexity you need, without being overly complex, is always tricky). If you want to match each start with the next stop that's greater than the start, you can do:
# Slightly modified example: multiple starts
# that can be matched with one stop
start = c(1, 5, 8)
stop = c(4, 13, 20, 30, 40, 50)
df2 = data.frame(
  start = start,
  stop = sapply(start, function(s) { min(stop[stop > s]) })
)
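Matching each start with the next greater stop can also be done with a binary search instead of scanning all stops. Below is a minimal Python sketch of that variant using the standard bisect module; the function name match_starts_to_stops is made up, and starts with no greater stop are simply skipped, which is an assumption about the desired behavior.

```python
from bisect import bisect_right

def match_starts_to_stops(starts, stops):
    """Pair each start with the smallest stop strictly greater than it."""
    stops = sorted(stops)
    pairs = []
    for s in starts:
        # bisect_right gives the index of the first stop greater than s.
        k = bisect_right(stops, s)
        if k < len(stops):  # skip starts that have no later stop
            pairs.append((s, stops[k]))
    return pairs

print(match_starts_to_stops([1, 5, 7, 9, 15], [4, 13, 20, 30, 40, 50]))
# [(1, 4), (5, 13), (7, 13), (9, 13), (15, 20)]
```

This reproduces the mapping asked for in the question, including several starts sharing the stop 13.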
Here is a possible tidyverse solution:
library(purrr)
library(plyr)
library(dplyr)
map2 is used to pair up the values of the two vectors (start and stop). Each pair becomes one vector, and the results are then unlisted and combined into a data.frame object.
EDIT:
With the updated condition, we can do something like:
start1= c(118,220, 255)
stop1 =c(115,210,260)
res<-purrr::map2(start1[1:length(stop1)],stop1,function(x,y) c(x,y[y>x]))
res[unlist(lapply(res,function(x) length(x)>1))]
# [[1]]
# [1] 255 260
ORIGINAL:
plyr::ldply(purrr::map2(start[1:length(stop)],stop,function(x,y) c(x,y)),unlist) %>%
setNames(nm=c("start","stop")) %>%
mutate(newCol=paste0("(",start,",",stop,")"))
# start stop newCol
#1 1 4 (1,4)
#2 5 13 (5,13)
#3 15 20 (15,20)
#4 NA 30 (NA,30)
#5 NA 40 (NA,40)
#6 NA 50 (NA,50)
Alternative: A clever way is shown by @Marius. The key is to have corresponding lengths.
plyr::ldply(purrr::map2(start,stop[1:length(start)],function(x,y) c(x,y)),unlist) %>%
setNames(nm=c("start","stop")) %>%
mutate(newCol=paste0("(",start,",",stop,")"))
start stop newCol
1 1 4 (1,4)
2 5 13 (5,13)
3 15 20 (15,20)
Sometimes we want not only the rows that satisfy a condition but also their adjacent rows, for comparison. I want to get each matching row together with the n rows above it and/or the n rows below it.
To be more specific, suppose the condition gives me rows 3, 6 and 7.
I want rows 1,2,3,4,5 to compare with row 3,
rows 4,5,6,7,8 to compare with row 6,
and rows 5,6,7,8,9 to compare with row 7.
After eliminating duplicates, I get rows 1,2,3,4,5,6,7,8,9.
I know this could be handled by writing a function myself, and it is not hard.
I am just wondering whether there is a package that focuses on the neighbors of rows, since many estimations need not only the rows but also their neighbors.
I have found a way (writing the function myself), but if anyone knows of a package aimed at this, please tell me.
Extend each number to a vector centered on that value with a radius of 2, handling the boundary cases. By default the first row is 1 and the upper bound is assumed not to be reached:
getAdjacentNum = function(x,lowerbound = 1,upperbound = x+2){
  start = x - 2
  if (x - lowerbound == 1)
    start = x - 1
  if (x == lowerbound)
    start = x
  end = x + 2
  if (upperbound - x == 1)
    end = x + 1
  if (upperbound == x)
    end = x
  result = seq(from = start,to = end,by = 1)
  return(result)
}
Vectorize this function:
getAdjacentNumV = Vectorize(getAdjacentNum,SIMPLIFY = FALSE)
Combine the results and remove the duplicates:
getAdjacentIndex = function(index){
unique(unlist(getAdjacentNumV(index)))
}
example:
getAdjacentIndex(c(1,5,29))
[1] 1 2 3 4 5 6 7 27 28 29 30 31
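The same neighbor expansion can be written more compactly by clamping each window to the bounds, sketched here in Python. The function name adjacent_indices and its parameters (n for the window radius, lower and upper for the bounds) are illustrative and generalize the fixed radius of 2 used above.

```python
def adjacent_indices(indices, n=2, lower=1, upper=None):
    """Return the sorted, de-duplicated indices within n rows of any given index."""
    keep = set()
    for i in indices:
        lo = max(lower, i - n)  # clamp at the first row
        # Clamp at the last row only when an upper bound is given.
        hi = i + n if upper is None else min(upper, i + n)
        keep.update(range(lo, hi + 1))
    return sorted(keep)

print(adjacent_indices([1, 5, 29]))
# [1, 2, 3, 4, 5, 6, 7, 27, 28, 29, 30, 31]
```

Using a set removes the duplicates as the windows are collected, so no separate unique step is needed.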