Random Assignment of Groups - r

How do I randomly assign a group of people into four treatment groups and a control group, given that I have a list of their names on an excel document?

Get the randomizr package
install.packages("randomizr")
library(randomizr)
Use complete random assignment, which holds the number of units assigned to each condition fixed across randomizations (unlike sample() with replace = TRUE):
Z <- complete_ra(N = 100, num_arms = 5)
table(Z)

If you have 100 names (number them as such) then you can assign them to one of 5 groups with
split(1:100, sample(1:5, 100, replace = TRUE))
split(x, f) splits x into groups according to f, for which I've used sample to sample 100 occurrences of the numbers 1 to 5 (with replacement).
Take these numbered names from your list.
(Note: you didn't specify equal groups).
Alternatively, the caret package can handle this quite nicely for you: https://topepo.github.io/caret/data-splitting.html
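If equal-sized groups are wanted without extra packages, one possible base R sketch is to shuffle a vector of group labels. The names and condition labels below are placeholders, not taken from the question; in practice the names would come from the Excel sheet.

```r
# Placeholder names standing in for the column read from the Excel sheet
people <- paste("Person", 1:100)

# Hypothetical condition labels: four treatments plus one control
conditions <- c("control", "treat_1", "treat_2", "treat_3", "treat_4")

set.seed(2024)  # arbitrary seed, only so the result can be reproduced

# Repeat the labels to cover every person, then shuffle them
labels <- sample(rep(conditions, length.out = length(people)))

assignment <- data.frame(name = people, group = labels)
table(assignment$group)  # 20 people in each of the five groups
```

Because the labels are only shuffled, never resampled, the group sizes stay fixed at 20 each.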

Related

Dividing one dataframe into many with names in R

I have some large data frames that are big enough to push the limits of R on my machine; e.g., the one on which I'm currently working is 2 columns by 70 million rows. The contents aren't important, but just in case, column 1 is a string and column 2 is an integer.
What I would like to do is split that data frame into n parts (say, 20, but preferably something that could change on a case-by-case basis) so that I can work on each of the smaller data frames one at a time. That means that (a) the result has to produce things that are named (e.g., "newdf_1", "newdf_2", ... "newdf_20" or something), and (b) each line in the original data frame needs to be in one (and only one) of the new "sub" data frames. The order does not matter, but doing it sequentially by rows makes sense to me.
Once I do the work, I will start to recombine them (using rbind()) one pair at a time.
I've looked at split(), but from what I can tell, it is designed to work with factors (which I don't have).
Any ideas?
You can create a new column and split the data frame based on that column. The column does not need to be a factor, but it needs to be a data type that the split function can convert to a factor.
# Number of groups
N <- 20
dat$group <- 1:nrow(dat) %% N
# Add 1 to group
dat$group <- dat$group + 1
# Split the dat by group
dat_list <- split(dat, f = ~group)
# Set the name of the list
names(dat_list) <- paste0("newdf_", 1:N)
Data
set.seed(123)
# Create example data frame
dat <- data.frame(
A = sample(letters, size = 70000000, replace = TRUE),
B = rpois(70000000, lambda = 1)
)
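The modulo approach above deals rows out round-robin. If contiguous, sequential blocks are preferred (as the question suggests), a minimal sketch could look like the following. The tiny data frame is a stand-in for the real one, and do.call(rbind, ...) is shown as one way to recombine everything in a single call rather than pair by pair.

```r
# Small stand-in for the large data frame
dat <- data.frame(A = letters[1:10], B = 1:10)
N <- 3  # number of parts

# Contiguous group ids: rows are cut into N blocks of near-equal size
group <- ceiling(seq_len(nrow(dat)) * N / nrow(dat))
dat_list <- split(dat, group)
names(dat_list) <- paste0("newdf_", seq_along(dat_list))

# ... work on each piece here, e.g. dat_list[["newdf_1"]] ...

# Recombine all pieces in one call, in the original row order
recombined <- do.call(rbind, unname(dat_list))
```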
Here's a tidyverse-based solution. Try using read_csv_chunked().
library(tidyverse)
# practice data
tibble(string = sample(letters, 1e6, replace = TRUE),
       value = rnorm(1e6)) %>%
  write_csv("test.csv")
# here's the solution
partial_data <- read_csv_chunked("test.csv",
DataFrameCallback$new(function(x, pos) filter(x, string == "a")),
chunk_size = 1000)
You can wrap the call to read_csv_chunked in a function where you change the string that you subset on.
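A rough sketch of such a wrapper is shown below. The function name read_matching and its arguments are made up for illustration, and it assumes the file has a column called string, as in the practice data. Base subsetting is used inside the callback so that only readr is needed.

```r
library(readr)

# Hypothetical helper: keep only rows whose `string` column equals `target`,
# reading the file in chunks so the whole file is never held in memory at once
read_matching <- function(path, target, chunk_size = 1000) {
  read_csv_chunked(
    path,
    DataFrameCallback$new(function(x, pos) x[x$string == target, ]),
    chunk_size = chunk_size
  )
}

# Tiny demonstration file (a stand-in for test.csv)
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(string = c("a", "b", "a", "c"), value = 1:4),
          demo_path, row.names = FALSE)

only_a <- read_matching(demo_path, "a", chunk_size = 2)
```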
This is more or less a repeat of this question:
How to read only lines that fulfil a condition from a csv into R?

R: How To Generate 15 Random Numbers from 1:10 With Strange Conditions

How do I generate 15 random numbers from 1:10, making sure that A) each number appears at least once and B) no number appears three times?
I know how to use the normal sample function with replace = True and False. But this question is totally different, argh....
sample(1:10, 15, replace = TRUE)
You could do this with...
sample(c(1:10, sample(10, 5)))
The sequence 1:10 is combined with five random numbers from 1:10. The outer sample shuffles the whole lot.
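As a quick check of why this satisfies both conditions, the counts can be tabulated. This is only a sketch; the seed is arbitrary and used solely for reproducibility.

```r
set.seed(42)  # arbitrary seed

# 1:10 guarantees every number once; sample(10, 5) adds five distinct extras
draw <- sample(c(1:10, sample(10, 5)))

# Count occurrences of each number, including any that might be missing
counts <- table(factor(draw, levels = 1:10))

length(draw)      # 15 values in total
all(counts >= 1)  # condition A: every number appears at least once
all(counts <= 2)  # condition B: nothing appears three times
```

The inner sample(10, 5) draws without replacement, so each number can gain at most one extra copy.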

Repeat Column in R with different arrangement of rows each time

I have a vector in R, called v, such that v <- c(1,3,4,5,2).
I am trying to create 50 repetitions of this vector, but would like to re-arrange the indices randomly and non-repeating for each repetition, i.e.,
reps <- c(1,3,4,5,2, 1,2,3,4,5, ...) etc.
I tried:
reps <- rep(sample(1:5, 5, replace = FALSE), 50)
but this just repeats the same sample 50 times.
Maybe we can use
c(replicate(50, sample(v)))
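To see what this does step by step, here is a small sketch (the seed is arbitrary). replicate() re-evaluates sample(v) on every repetition and returns a matrix with one permutation per column, which c() then flattens column by column.

```r
v <- c(1, 3, 4, 5, 2)
set.seed(1)  # arbitrary seed

# Each column of m is a fresh random permutation of v
m <- replicate(50, sample(v))
dim(m)  # 5 rows, 50 columns

# Flatten column by column into one long vector of 250 values
reps <- c(m)

# Every block of five is a rearrangement of v with no repeats inside it
all(apply(m, 2, function(col) setequal(col, v)))
```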

Creating a sparse matrix in r with a set number of integer values per row

I'm trying to create a sparse matrix, where each row has a maximum of n entries that are each integers within a certain range, which I could then use as an adjacency matrix for social network analysis. For example, an 80x80 matrix where each row has 10 or fewer entries that are integers from 1-4. The goal is to represent the sort of data you would get from a social networking survey in which respondents were selecting values between 1 and 4 to indicate their relationship with up to 10 of the possibilities/columns in the survey.
I can create a sparse matrix using the "rsparsematrix" function, and using the density command can approximate the required number of responses, but I can't control the number of responses per row and would have to do additional processing to convert the random values to integers within my desired range.
eg: I could start with something like
M1<-rsparsematrix(80, 80, density = .1, symmetric = FALSE)
A more promising approach (from https://www.r-bloggers.com/casting-a-wide-and-sparse-matrix-in-r/) would be to generate the values and then use "transform" to convert them into a matrix. This allows me to control the integer values, but still doesn't get the limited number of responses per row.
Example code from the blog follows:
set.seed(11)
N = 10
data = data.frame(
row = sample(1:3, N, replace = TRUE),
col = sample(LETTERS, N, replace = TRUE),
value = sample(1:3, N, replace = TRUE))
data = transform(data,
row = factor(row),
col = factor(col))
This could be tweaked to give the required 80x80 matrix, but it doesn't solve the problem of limiting the responses per row and, in the case of duplicate entries in the same row/column combination, will result in out-of-range values, since it resolves duplicate entries by taking the sum.
Any suggestions would be most appreciated.
As a bonus question, how would you then create random rows of null responses? For example within the 80*80 matrix, how might you introduce 40 random rows with no values? As in the description above, this would correspond to missing survey data.
You can try to build the sparse matrix up using the row (i), column (j) and value (x) components. This involves sampling subject to your row and value constraints.
# constraints
values <- 1:4
maxValuesPerRow <- 10
nrow <- 80
ncol <- 80
# sample values : how many values should each row get but <= 10 values
set.seed(1)
nValuesForEachRow <- sample(maxValuesPerRow, nrow, replace=TRUE)
# create matrix
library(Matrix)
i <- rep(seq_len(nrow), nValuesForEachRow) # row
j <- unlist(lapply(nValuesForEachRow, sample, x=seq_len(ncol))) # which columns
x <- sample(values, sum(nValuesForEachRow), replace=TRUE) # values
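# pass dims explicitly so the result is 80x80 even if the last column is never sampled
sm <- sparseMatrix(i=i, j=j, x=x, dims=c(nrow, ncol))
# check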
dim(sm)
table(rowSums(sm>0))
table(as.vector(sm))
# Note: you can't just sample all the columns in one go like below, as this can give duplicate columns within a row, hence the lapply loop above.
j <- sample(seq_len(ncol), sum(nValuesForEachRow), replace=TRUE)
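For the bonus question, one possible sketch along the same lines is to set the per-row count to zero for 40 randomly chosen rows before building the matrix. The variable names here (n_row, n_per_row, empty_rows) are chosen for illustration, and dims is passed explicitly so that empty trailing rows or columns are not dropped.

```r
library(Matrix)

set.seed(1)  # arbitrary seed
n_row <- 80
n_col <- 80

# Between 1 and 10 responses per row to start with
n_per_row <- sample(10, n_row, replace = TRUE)

# Pick 40 rows at random and give them no responses at all
empty_rows <- sample(n_row, 40)
n_per_row[empty_rows] <- 0

i <- rep(seq_len(n_row), n_per_row)                           # row indices
j <- unlist(lapply(n_per_row, function(n) sample(n_col, n)))  # distinct columns per row
x <- sample(1:4, sum(n_per_row), replace = TRUE)              # values 1 to 4

sm <- sparseMatrix(i = i, j = j, x = x, dims = c(n_row, n_col))

sum(rowSums(sm > 0) == 0)  # number of rows with no responses
```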
The code below will do what you want. It generates your random sparse matrix, rounds it to whole numbers, then for every row that has more than 10 entries, randomly makes some entries NA until only 10 remain. It then makes all the non NA entries a random number between 1 and 4.
library(Matrix)
M1<-as.data.frame(as.matrix((rsparsematrix(80, 80, density = .1, symmetric = FALSE))))
M1 <- as.data.frame(apply(M1,1,round))
M1<-as.data.frame(sapply(M1,function(x) ifelse(x==0,NA,x)))
rows<-which(apply(M1,1,function(x) sum(!(is.na(x)))) >10)
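# for rows with more than 10 entries, keep 10 at random and blank the rest
for(i in rows)
{
  toNA<-setdiff(which(!(is.na(M1[i,]))),sample(which(!(is.na(M1[i,]))),10,replace=FALSE))
  M1[i,toNA] <- NA
}
# replace every remaining entry with a random integer from 1 to 4
for(i in 1:nrow(M1))
{
  M1[i,which(!(is.na(M1[i,])))] <- sample(1:4,length(M1[i,which(!(is.na(M1[i,])))]),replace=TRUE)
}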

Random generation of numbers using R

I have some data that involves zebu (beef animals) that are labeled 1-40. I need to divide them into 4 groups of 10 each. I need to choose them randomly to remove any bias and I need to use R and Excel. Thank you please help.
There are ways of doing this that require less code, but here's a verbose example that lets me explain what's happening.
Here's the dataset I'll be using since I don't know exactly how your data look.
beef <-
data.frame(number = 1:40, weight = round(rnorm(40, mean = 2000, sd = 500)))
Because your animals are numbered from 1 to 40, you can create a new dataframe that contains those numbers with a random group number (1 to 4) as the second column.
num_group <- (data.frame(
number = 1:40,
group =
sample(
x = 1:4,
size = 40,
replace = TRUE
)
))
Join the two dataframes together and you have your answer.
merge(beef, num_group)
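Note that sampling with replace = TRUE does not guarantee exactly 10 animals per group. If the groups must be equal, as the question asks, one possible variation is to shuffle a vector that already contains each group number 10 times. This is a sketch with an arbitrary seed, not part of the answer above.

```r
set.seed(2024)  # arbitrary seed

beef <- data.frame(number = 1:40)

# rep() builds ten 1s, ten 2s, ten 3s and ten 4s; sample() shuffles them
beef$group <- sample(rep(1:4, each = 10))

table(beef$group)  # each of the four groups has exactly 10 animals
```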
To shuffle the data in Excel, follow this tip:
Create a new column in your data and fill it with RAND().
This generates a random number in each row of that column; sort by the random-number column and your data will be shuffled.
Then load the data into R, take 10 rows at a time, and assign a group to each batch.
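The last step might look like the sketch below. The data frame here is only a stand-in for the sheet after it has been shuffled in Excel and loaded into R; how it is loaded is left open.

```r
# Stand-in for the already-shuffled sheet (in practice, imported from Excel)
shuffled <- data.frame(number = sample(1:40))

# Rows 1-10 become group 1, rows 11-20 group 2, and so on
shuffled$group <- rep(1:4, each = 10)

table(shuffled$group)
```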
