Choose a subsample of random numbers - r

I will play in the Brazilian Lottery with my friends. I requested every one of them to choose seven numbers. I create a variable for all of them.
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
Then, I created a list with all of the numbers, and a list with unique numbers (in order to avoid repeated numbers)
list = c(carol, davi, pestana, valerio, victor, diuli, cynara)
list2 = unique(list)
Finally, I made a sample() for the list2
sample(list2, 7)
After that, I was wondering. Is it possible for me not to use the unique and not have repeated numbers? Because for instance, that way, repeated numbers have the same probability of appearing, when in fact, they have more (for instance, seven appeared three times).

How about this:
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
list = c(carol, davi, pestana, valerio, victor)
l <- c(unlist(list))
nums <- table(l)
probs <- nums/sum(nums)
sample(names(probs), 7, prob = probs, replace=FALSE)
#> [1] "4" "33" "44" "11" "29" "52" "8"
Created on 2022-12-14 by the reprex package (v2.0.1)
Using the prob argument, you can make some values more likely to show up than others.

Related

Operation with a column vector without using a loop in R

I want to perform an operation over a vector without using a loop. The operation is the following:
This is how I am coding in R
meanx <- mean(rankx)
Numerador <- (rankx[] - meanx)*(rankx[+1] - meanx)
This is the input:
> dput(rankx)
c(15, 11, 12, 30, 58, 14, 41, 10, 57, 32, 28, 52, 61, 18, 54,
37, 19, 7, 29, 66, 5, 47, 25, 6, 50, 65, 62, 23, 40, 63, 42,
64, 38, 56, 45, 17, 8, 59, 55, 67, 24, 60, 2, 35, 44, 20, 3,
39, 4, 31, 26, 51, 21, 22, 53, 33, 46, 9, 16, 36, 13, 27, 34,
48, 1, 49, 43)
For example for the first case it will be: (15 - mean(rankx))(11 - mean(rankx))
For the next: (11 - mean(rankx))(12 - mean(rankx))
I am not sure how to refer to the second element and my error is in rankx[+1]
Any idea in how to solve this operation without using a loop?
You can use dplyr::lead
rankx[+1] is equivalent to rankx[1], which is 15.
If you want a copy of rankx that's displaced by one unit, use dplyr::lead(rankx) - like this:
rankx <- c(15, 11, 12, 30, 58, 14, 41)
dplyr::lead(rankx)
#> [1] 11 12 30 58 14 41 NA
meanx <- mean(rankx)
Numerador <- (rankx - meanx)*(dplyr::lead(rankx) - meanx)
Numerador
#> [1] 161.30612 205.87755 -57.40816 133.16327 -381.12245 -179.55102 NA
Created on 2021-04-20 by the reprex package (v1.0.0)

Remove community boxes in igraph

I have created a simple minimum spanning tree and now have a data frame with columns 'from', 'to' and 'distance'.
Based on this, I found communities using the Louvain method, which I plotted. As far as I understand it, for clustering and plotting I need only the columns from and to, and the distance is not used.
How can I keep the communities I found, ideally each in a different color, but remove the box around the communities?
library(igraph)
from <- c(14, 25, 18, 19, 29, 23, 24, 36, 5, 22, 21, 29, 18, 26, 2, 45, 8, 7, 36, 42, 3, 23, 13, 13, 20, 15, 13, 7, 28, 9, 6, 37, 8, 4, 15, 27, 10, 2, 39, 1, 43, 21, 14, 4, 14, 8, 9, 40, 31, 1)
to <- c(16, 26, 27, 20, 32, 34, 35, 39, 6, 32, 35, 30, 22, 28, 45, 46, 48, 12, 38, 43, 42, 24, 27, 25, 30, 20, 50, 29, 34, 49, 40, 39, 11, 41, 46, 47, 50, 16, 46, 40, 44, 31, 17, 40, 44, 23, 33, 42, 33, 1)
distance <- c(0.3177487, 0.3908324, 0.4804059, 0.4914682, 0.5610357, 0.6061082, 0.6357532, 0.6638961, 0.7269725, 0.8136463, 0.8605391, 0.8665838, 0.8755252, 0.8908454, 0.9411793, 0.9850834, 1.0641603, 1.0721154, 1.0790506, 1.1410964, 1.1925349, 1.2115428, 1.2165045, 1.2359032, 1.2580204, 1.2725243, 1.2843610, 1.2906908, 1.3070725, 1.3397053, 1.3598817, 1.3690732, 1.3744088, 1.3972220, 1.4472312, 1.4574936, 1.4654772, 1.4689660, 1.5999424, 1.6014316, 1.6305410, 1.6450413, 1.6929959, 1.7597620, 1.8113320, 2.0380866, 3.0789517, 4.0105981, 5.1212614, 0.0000000)
mst <- cbind.data.frame(from, to, distance)
g <- graph.data.frame(mst[, 1:2], directed = FALSE)
lou <- cluster_louvain(g)
set.seed(1)
plot(lou, g, vertex.label = NA, vertex.size=5)
The blobs around the groups can be turned off like this:
plot(lou, g, vertex.label = NA, vertex.size=5, mark.groups = NULL)
Do you want this?
plot(lou, g, vertex.label = NA, vertex.size = 5, mark.border = NA)

R rewriting a for loop

I've got a loop in my code that I would like to rewrite so running the code takes a little less time to compete. I know you allways have to avoid loops in the code but I can't think of an another way to accomplice my goal.
So I've got a dataset "df_1531" containing a lot of data that I need to cut into pieces by using subset() (if anyone knows a better way, let me know ;) ). I've got a vector with 21 variable names on which I like assign a subset of df_1531. Furthermore the script contains 22 variables with constrains (shift_XY_time).
So, this is my code now...
# list containing different slots
shift_time_list<- c(startdate, shift_1m_time, shift_1a_time, shift_1n_time,
shift_2m_time, shift_2a_time, shift_2n_time,
shift_3m_time, shift_3a_time, shift_3n_time,
shift_4m_time, shift_4a_time, shift_4n_time,
shift_5m_time, shift_5a_time, shift_5n_time,
shift_6m_time, shift_6a_time, shift_6n_time,
shift_7m_time, shift_7a_time, shift_7n_time)
# List with subset names
shift_sub_list <- c("shift_1m_sub", "shift_1a_sub", "shift_1n_sub",
"shift_2m_sub", "shift_2a_sub", "shift_2n_sub",
"shift_3m_sub", "shift_3a_sub", "shift_3n_sub",
"shift_4m_sub", "shift_4a_sub", "shift_4n_sub",
"shift_5m_sub", "shift_5a_sub", "shift_5n_sub",
"shift_6m_sub", "shift_6a_sub", "shift_6n_sub",
"shift_7m_sub", "shift_7a_sub", "shift_7n_sub")
# The actual loop that I'd like to rewrite
for (i in 1:21) {
assign(shift_sub_list[i], subset(df_1531, df_1531$'PLS FFM' >= shift_time_list[i] & df_1531$'PLS FFM' < shift_time_list[i+1]))
}
Running the loop takes approximately 6 or 7 seconds. So, if anyone knows a better/cleaner or quicker way to write my code, I desperately like to hear your suggestion/opinion.
**Reproducible example **
mydata <- cars
dput(cars)
structure(list(speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11,
12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16,
16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20,
22, 23, 24, 24, 24, 24, 25), dist = c(2, 10, 4, 22, 16, 10, 18,
26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80,
20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32,
48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)), class = "data.frame", row.names = c(NA,
-50L))
dist_interval_list <- c( 0, 5, 10, 15,
20, 25, 30, 35,
40, 45, 50, 55,
60, 65, 70, 75,
80, 85, 90, 95,
100, 105, 110, 115, 120)
var_name_list <- c("var_name_1a", "var_name_1b", "var_name_1c", "var_name_1d",
"var_name_2a", "var_name_2b", "var_name_2c", "var_name_2d",
"var_name_3a", "var_name_3b", "var_name_3c", "var_name_3d",
"var_name_4a", "var_name_4b", "var_name_4c", "var_name_4d",
"var_name_5a", "var_name_5b", "var_name_5c", "var_name_5d",
"var_name_6a", "var_name_6b", "var_name_6c", "var_name_6d")
for (i in 1:24){
assign(var_name_list[i], subset(mydata,
mydata$dist >= dist_interval_list[i] &
mydata$dist < dist_interval_list[i+1]))
}
Starting with the 'reproducible' part and the information that the final aim is to summarize another column, it is possible to exploit the fact that the intervals are non-overlapping and simply use the cut function.
library(tidyverse)
mydata %>%
mutate(interval = cut(dist, breaks = dist_interval_list)) %>%
group_by(interval) %>%
summarise(sum = sum(speed))
This should be much faster and will also help you not to get lost in a messy environment full of variables (which are actually part of your data). You want to keep all your data in a single data frame as long as possible;) You probably want to follow with something like purrrlyr::invoke_rows at the final modeling step, if your function does not work with data frames.

How can I extract a vector from a list of text files

I have many different text files with the same structure (900*600 pixels). Now I would like to extract 900*600 vectors each containing one data point from each text file.
For example I would like to have a vector from the position (x1,y1) with all the data points from all the text files.
Here you can see my code I have in order to generate a list with all the text files.
file.list = list.files(pattern="*.txt", full.names=T)
df = data.frame( files= sapply(file.list, FUN = function(x)readChar(x, file.info(x)$size)), stringsAsFactors=FALSE)
Now "df" is a list containing all the text files.
How can I extract now the different vectors with values from all the files?
This is my code so far. I need to define somehow a function (FUN).
files = lapply(df, FUN, header = F, sep="\t", skip = 2, stringsAsFactors = F)
I prepared a dummy data set.
a = matrix(c(15, 12, 37, 21, 37, 26, 33, 33, 27, 38, 32, 21, 24, 18,
20, 14, 32, 56, 16, 7, 23, 14, 34, 42), nrow = 3, ncol = 4)
b = matrix(c(14, 18, 34, 26, 37, 26, 32, 36, 21, 39, 32, 21, 22, 18,
20, 16, 42, 50, 16, 7, 23, 12, 36, 40), nrow = 3, ncol = 4)
c = matrix(c(10, 12, 34, 29, 31, 26, 30, 30, 20, 38, 36, 21, 29, 18,
20, 10, 32, 59, 16, 1, 23, 10, 39, 49), nrow = 3, ncol = 4)
file.list = list(a,b,c)
Here every variable corresponds to one textfile (listed in file.list). And instead of a 900*600 matrix there are 3*4 matrices.
Accordingly to your suggestions I implemented the the functions the following way.
cmbn = expand.grid(1:3, 1:4)
flen = length(file.list)
lapply(1:(nrow(cmbn)),function(t,lst,cmbn){
return(sapply(1:flen,function(i,t1,lst1,cmbn1){
return(lst1[[i]][cmbn1$Var1[t1],cmbn1$Var2[t1]])},t,lst,cmbn))}
,file.list,cmbn)
This should work for you:
It will take two loops. Not sure if this is the most optimized solution.
cmbn is the data.frame of co-ordinates.
cmbn = expand.grid(1:3,1:4)
#or `expand.grid(1:900,1:600)` in your case
flen = length(file.list)
lst will take file.list
lapply(1:(nrow(cmbn)),function(t,lst,cmbn)
{return(sapply(1:flen,function(i,t1,lst1,cmbn1){
return(lst1[[i]][cmbn1$Var1[t1],cmbn1$Var2[t1]])},t,lst,cmbn))
},file.list,cmbn)

Shuffle deck of cards without built-in random functon

My friend suggested me to try to solve this problem before interview, but I have no idea on how to approach it.
I need to write a code to shuffle a deck of 52 cards without using a built-in standard random function.
Update
Thanks to Yifei Wu, his answer was very helpful.
Here is a link for my github project where I executed the given algorithm
https://github.com/Dantsj16/Shuffle-Without-Random.git
Your question does not say it must be a random shuffle of 52 cards. There is such a thing as a perfect shuffle, where a riffle shuffle is done with the top card remaining on the top and every other card comes from the other half of the deck. Many magicians and card sharks can do this shuffle as desired. It is well known that eight perfect shuffles in a row of a standard 52-card deck returns the cards to their original order, if the top card remains on top for each shuffle.
Here are 8 perfect shuffles in python Note that this shuffle is done differently than an actual manual shuffle would be done, to simplify the code.
In [1]: d0=[x for x in range(1,53)] # the card deck
In [2]: print(d0)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [3]: d1=d0[::2]+d0[1::2] # a perfect shuffle
In [4]: print(d1)
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52]
In [5]: d2=d1[::2]+d1[1::2]
In [6]: d3=d2[::2]+d2[1::2]
In [7]: d4=d3[::2]+d3[1::2]
In [8]: d5=d4[::2]+d4[1::2]
In [9]: d6=d5[::2]+d5[1::2]
In [10]: d7=d6[::2]+d6[1::2]
In [11]: d8=d7[::2]+d7[1::2]
In [12]: print(d8)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [13]: print(d0 == d8)
True
If you want the perfect shuffle as done by hand, use
d1=[None]*52
d1[::2]=d0[:26]
d1[1::2]=d0[26:]
This gives, for d1,
[1, 27, 2, 28, 3, 29, 4, 30, 5, 31, 6, 32, 7, 33, 8, 34, 9, 35, 10, 36, 11, 37, 12, 38, 13, 39, 14, 40, 15, 41, 16, 42, 17, 43, 18, 44, 19, 45, 20, 46, 21, 47, 22, 48, 23, 49, 24, 50, 25, 51, 26, 52]
Let me know if you really need a random shuffle. I can adapt my Borland Delphi code into python if you need it.

Resources