All unique samples from a data set in R

Here's my data:
z <- c("COP","CHK","BP","BHI","CVX")
If I do
sample(z, 3, replace=FALSE)
this gives me one random sample of 3 from my data set.
I want to find all possible unique samples of 3 from my data set. In this case there will be 10 outcomes.
But how do I write R code for it?
Please help.

We can use combn to get the unique combinations
t(combn(z, 3))
If the order of the output should be random, shuffle z first:
t(combn(sample(z), 3))
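
As a quick check on the answer above, here is a minimal sketch that confirms the number of combinations and then draws a few of them at random. The names all_samples and picked, and the choice of 4 rows, are illustrative assumptions, not part of the original answer.

```r
z <- c("COP", "CHK", "BP", "BHI", "CVX")

# Each column of combn() is one combination; t() puts one combination per row.
all_samples <- t(combn(z, 3))

nrow(all_samples)     # number of unique samples of size 3: 10
choose(length(z), 3)  # the same count, computed directly: 10

# Pick 4 of the combinations at random (4 is an arbitrary choice).
picked <- all_samples[sample(nrow(all_samples), 4), , drop = FALSE]
```

Indexing the rows with sample() is one way to get a random subset of the combinations rather than all of them.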

Related

Sorry for a dumb question. How to create a random sample of size e.g. 10 individuals in R

How do I create a random sample df of 10 PEOPLE in R?
You can simply use sample. The first argument is the vector of values to pick from, and the second is how many values to pick. By default it samples without replacement.
sample(1:100, 10)
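
Since the question asks for a data frame of 10 people, the same idea can be applied to row indices. This is a sketch only: the data frame people and its columns id and age are made up for illustration.

```r
set.seed(1)  # optional, makes the draws reproducible

# Hypothetical data frame of 100 people; the column names are invented.
people <- data.frame(id = 1:100, age = sample(18:80, 100, replace = TRUE))

# Draw 10 distinct row numbers and keep those rows.
sample_df <- people[sample(nrow(people), 10), ]
```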

R code to generate random pairs of rows and do simulation

I have a data matrix in R with 45 rows. Each row represents the values of an individual sample. I need to do a trial simulation: I want to pair up samples randomly and calculate their differences. I want a large sampling (maybe 10000) from all the possible permutations and combinations.
This is how I have managed to do it so far:
My data matrix ("data") has 45 rows and 2 columns. I selected 45 rows randomly and subtracted from another randomly generated 45 rows.
n1<-(data[sample(nrow(data),size=45,replace=F),])-(data[sample(nrow(data),size=45,replace=F),])
This gave me a random set of differences of size 45.
I made 50 such vectors (n1 to n50) and did rbind, which gave me a big data matrix containing random differences.
Of course, many rows in the first random set and the second random set were the same and cancelled out. I removed them with the following code:
row_sub = apply(new, 1, function(row) all(row !=0 ))
new.remove.zero<-new[row_sub,]
But is there a cleaner way to do this? A simpler way to generate all possible random pairs of rows, calculate their differences, and bind them together as a new matrix?
Thanks in advance.
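
One possible approach, offered as a sketch rather than a definitive answer, is to enumerate every pair of distinct rows with combn and subtract them in one step, so no row is ever paired with itself. The matrix below is random stand-in data with the dimensions described in the question; the names pairs, all_diffs, idx and sim_diffs are assumptions.

```r
set.seed(42)
# Stand-in for the 45 x 2 matrix described above (random values, for illustration).
data <- matrix(rnorm(45 * 2), nrow = 45, ncol = 2)

# All unordered pairs of distinct rows, as a 2 x choose(45, 2) index matrix.
pairs <- combn(nrow(data), 2)

# Differences for every pair, one pair per row of the result.
all_diffs <- data[pairs[1, ], , drop = FALSE] - data[pairs[2, ], , drop = FALSE]
dim(all_diffs)  # 990 rows, 2 columns

# A large sample of pair differences. With only 990 distinct pairs,
# 10000 draws must repeat pairs, hence replace = TRUE.
idx <- sample(nrow(all_diffs), 10000, replace = TRUE)
sim_diffs <- all_diffs[idx, , drop = FALSE]
```

Because each pair consists of two different rows, the zero-difference rows caused by pairing a row with itself do not arise, and the filtering step is not needed for that purpose.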

Clustering in R

I used hclust to cluster my data and cutree to set the number of clusters to 3. Is there any way that I can examine each of the clusters? By examine I mean list the cases/observations that are in, e.g., the first cluster. I tried all the basic functions that I know, such as summary() and list(), but they do not seem relevant. Is there a function that can do this?
If not, the cutree function returns the group/cluster that each of my observations belongs to, something like this:
1,3,1,2,3,3,1
which indicates that my first observation belongs to group 1, the second belongs to group 3, and so on.
I am thinking about how to extract the positions where, e.g., group = 1, so that it returns 1, 3 and 7, since observations 1, 3 and 7 belong to group 1.
Or do I need to use a loop to collect all the observations that belong to, e.g., group 1?
Is my question clear?
Does this help to get started?
nclust <- 10
cutreeout <- cutree(hclustOutput, nclust)
Add them as a new column to your dataframe
mydata$cluster <- cutreeout
How many observations are in each cluster?
table(mydata$cluster)
Then you can do more stuff to interpret your clusters, and/or study subsets of your data.
This is a hint, not the answer. Here's an example of Hierarchical Clustering in R. You can try using functions such as table() and ggplot() to see the observations per cluster.
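
To extract the positions directly, as the question asks, no loop is needed. The sketch below uses the example assignment from the question in place of real cutree output; the names groups and members are illustrative.

```r
# The example assignment from the question, in the form cutree() returns
# (a vector with one cluster number per observation).
groups <- c(1, 3, 1, 2, 3, 3, 1)

which(groups == 1)  # positions of the observations in cluster 1: 1 3 7

# Members of every cluster at once, as a list named by cluster number.
members <- split(seq_along(groups), groups)
members[["3"]]      # 2 5 6
```

With a data frame, the same logical test can select the rows, for example mydata[cutreeout == 1, ].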

missing values for each participant in the study

I am working in R. What I want to do is make a table or a graph that shows, for each participant, their missing values. I have 4700+ participants, and for each question there are between 20 and 40 missing values. I would like to represent the missing values in such a way that I can see which people did not answer the questions and possibly see whether there is a pattern in the missing values. I have done the following:
Count of complete cases in a data frame named 'mydata'
sum(complete.cases(mydata))
Count of incomplete cases
sum(!complete.cases(mydata$Variable1))
Which cases (row numbers) are incomplete?
which(!complete.cases(mydata$Variable1))
I then got a list of numbers that I am not quite sure how to interpret. At first I thought these were the patient numbers, but then I noticed that this is not the case.
I also tried making subsets with only the missing values, but then I literally only see how many missing values there are, not who they are from.
Could somebody help me? Thanks!
Zas
If there is a column that identifies each row in the data.frame mydata, say patient numbers in patient_no, then you can easily find the patient numbers of the participants with missing values:
> mydata <- data.frame(patient_no = 1:5, variable1 = c(NA,NA,1,2,3))
> mydata[!complete.cases(mydata$variable1),'patient_no']
[1] 1 2
If you want to look at the pattern of which participants skipped each question, this might be useful:
Assumption: every column except column 1 corresponds to a question.
> lapply(mydata[,-1],function(x){mydata[!complete.cases(x),'patient_no']})
Remember that R automatically numbers the observations in your data set. For example, if your data has 20 observations (20 rows), R labels them 1 to 20; these row numbers are not part of your original data. The output of which(!complete.cases(mydata$Variable1)) consists of those row numbers: the rows of your data set in which Variable1 is missing.
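
To see who the missing values come from and how many each person has, one option is to count NAs per row and per column. This is a minimal sketch on toy data: the column names patient_no, q1 and q2 are assumptions, and the patient numbers are deliberately different from the row numbers.

```r
# Toy data; patient_no and the question columns are hypothetical names.
mydata <- data.frame(
  patient_no = c(101, 102, 103, 104, 105),
  q1 = c(NA, NA, 1, 2, 3),
  q2 = c(5, NA, 4, NA, 2)
)

questions <- mydata[, -1]  # every column except the identifier

# Number of unanswered questions per participant.
n_missing <- rowSums(is.na(questions))
data.frame(patient_no = mydata$patient_no, n_missing)

# Number of missing answers per question.
colSums(is.na(questions))

# Participants with at least one missing answer.
mydata$patient_no[n_missing > 0]  # 101 102 104
```

A table of n_missing against patient_no can then be sorted or plotted to look for participants who skipped many questions.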

How to save data in a dataframe?

Sorry, really beginner question: I want to generate a data frame with random data. I want my data frame to be 10 rows by 20 columns, where each row contains data from a random sample generated by rnorm. How do I do this?
Producing a matrix may be easier, but this can be converted to a dataframe:
rownum <- 10
colnum <- 20
yourdf <- as.data.frame(matrix(rnorm(rownum * colnum), nrow=rownum))
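
If each row should come from its own call to rnorm, a variant with replicate makes that explicit. This is a sketch under that reading of the question; the name rows_df and the seed are arbitrary.

```r
set.seed(123)  # optional, for reproducibility

# replicate() returns one sample per column, so transpose to get one per row.
rows_df <- as.data.frame(t(replicate(10, rnorm(20))))
dim(rows_df)  # 10 20
```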
