Is there a way to print random values - r

In R, can we print a random value at any given point? For example, unique(iris$Species) shows 3 categories. Can we print any one of those categories, chosen at random, at any given time?

Use sample() from base R
sample(unique(iris$Species),1)
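A minimal sketch of how that call behaves, assuming the built-in iris data set. The names species and one are only illustrative, and the seed is there solely to make the draw repeatable:

```r
set.seed(123)                      # optional: only needed for a reproducible draw

species <- unique(iris$Species)    # factor holding the 3 distinct species
one <- sample(species, 1)          # draw a single element uniformly at random

print(as.character(one))           # as.character() drops the factor levels from the output
```

Calling sample(species, 1) again gives a fresh draw each time, so the printed category can change from call to call.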

Related

Translating a for-loop to perhaps an apply through a list

I have an R code question that has kept me from completing several tasks for the last year, but I am relatively new to R. I am trying to loop over a list to create two variables with a specified correlation structure. I have been able to "cobble" this together with a "for" loop. To further complicate matters, I need to be able to put the correlation number into a data frame twice.
For my ultimate usage, I am concerned about the speed, efficiency, and long-term effectiveness of my code.
library(mvtnorm)
n = 100
d = NULL
col = c(0, .3, .5)
for (j in 1:length(col)) {
  X.corr = matrix(c(1, col[j], col[j], 1), nrow=2, ncol=2)
  x = rmvnorm(n, mean=c(0,0), sigma=X.corr)
  x1 = x[,1]
  x2 = x[,2]
}
d = rbind(d, c(j))
Let me describe my code so my logic is clear. This is part of a larger simulation. I am trying to draw 2 correlated variables with rmvnorm from the mvtnorm package at 3 different correlation levels, one per pass, using 100 observations [toy data to get the coding correct]. d starts out empty (NULL) and is built up into a data frame.
The 3 correlation levels occur as follows: pass 1 uses correlation 0 to create the variables, and then other code will run; pass 2 uses correlation .3 to create 2 new variables, and then other code will run; pass 3 uses correlation .5 to create 2 new variables, and then other code will run. Within my larger code, the for-loop gets the job done. The last line puts the number of the correlation into the data frame. I realize that, as presented here, it will only put 1 number into this data frame, but when it is incorporated into my larger code it works as desired by putting 3 different numbers in a single column (1=0, 2=.3, and 3=.5).
To reiterate, the for-loop gets the job done, but I believe there is a better way, perhaps something in the apply family. I do not know how to construct this and still access which correlation is being used. Would someone help me develop this little piece of code? Thank you.
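One possible way to express this with lapply is sketched below; it is not a definitive rewrite. It assumes the mvtnorm package is installed. The names rhos, sims and the column rho are made up for illustration (rhos replaces col so that base::col() is not masked), and the seed is only there for reproducibility. Each pass returns its draws together with the correlation that produced them, so the correlation stays accessible afterwards:

```r
library(mvtnorm)

set.seed(1)                  # assumption: fixed seed, only for reproducibility
n <- 100
rhos <- c(0, 0.3, 0.5)       # illustrative name for the correlation levels

# One list element per correlation level; the correlation travels with the data
sims <- lapply(rhos, function(rho) {
  sigma <- matrix(c(1, rho, rho, 1), nrow = 2, ncol = 2)
  x <- rmvnorm(n, mean = c(0, 0), sigma = sigma)
  data.frame(rho = rho, x1 = x[, 1], x2 = x[, 2])
})

# Stack the three passes into one data frame (3 * n rows)
d <- do.call(rbind, sims)
```

Any per-pass code from the larger simulation could go inside the anonymous function. Whether this is faster than the for-loop would need to be measured; the main gain is that nothing is grown inside a loop.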

Looping through a dataset in R and count occurrences of variables

I found an interesting data-set from a psychology study (data-set is called WearingTShirt), and I would like to replicate the results. I would need to summarize two variables into a single variable. This is what I have written:
# Create empty variable
PinkAndRed = 0
# Count instances of people wearing both pink and red and add 1
for i in WearingTShirt:
    PinkAndRed+1 if:
        WearingTShirt$PINKSHIRT==1 OR WearingTShirt$REDSHIRT==1
# Add variable to dataset
WearingTShirt$PinkAndRed
I don't have much R experience (I have mostly written Python).
Your code is closer to Python than to R. The equivalent R code for what you want to do is:
PinkAndRed = rep(0, dim(WearingTShirt)[1])
for (i in 1:dim(WearingTShirt)[1]) {
  if ((WearingTShirt$PINKSHIRT[i] == 1) || (WearingTShirt$REDSHIRT[i] == 1)) {
    PinkAndRed[i] = 1
  }
}
WearingTShirt = cbind(WearingTShirt, PinkAndRed)
You need to review the basics of R. There are countless small differences between R and Python, such as the parentheses around loop and condition expressions, or how the length of a loop is set (in the code above, dim gives the dimensions of the dataset and [1] selects the number of rows)...
Update:
Thanks to the comments, I've realized it is not clear whether you want a cumulative sum of the individuals with pink and red shirts, or a variable that is 1 when the shirt is pink or red and 0 otherwise.
The code above builds a single indicator variable that covers both pink and red shirts.
If you want the sum, you must use the cumsum function, as noted in the comments.
I would not choose to loop, but:
WearingTShirt$PinkAndRed <- ifelse(WearingTShirt$PINKSHIRT == 1 |
                                   WearingTShirt$REDSHIRT == 1, 1, 0)
PinkAndRed sounds more like PinkOrRed based on example given.
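To make the difference between the indicator and the count concrete, here is a small sketch on made-up data. The four-row data frame is a hypothetical stand-in for the real WearingTShirt data set, and PinkOrRed and n_pink_or_red are illustrative names:

```r
# Hypothetical stand-in for the real data set (values are invented)
WearingTShirt <- data.frame(
  PINKSHIRT = c(1, 0, 0, 1),
  REDSHIRT  = c(0, 0, 1, 1)
)

# Row-wise indicator: 1 if the person wears pink or red, 0 otherwise
WearingTShirt$PinkOrRed <- as.integer(
  WearingTShirt$PINKSHIRT == 1 | WearingTShirt$REDSHIRT == 1
)

# Single total: how many people wear pink or red
n_pink_or_red <- sum(WearingTShirt$PinkOrRed)
```

With this toy input the indicator is 1, 0, 1, 1 and the total is 3. The vectorized | compares whole columns at once, which is why no loop is needed.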

R: how to divide a vector of values into fixed number of groups, based on smallest distance?

I think I have a rather simple problem, but I can't figure out the best approach. I have a vector with 30 different values. Now I need to divide the vector into 10 groups in such a way that the mean within-group variance is as small as possible. The size of the groups is not important; it can be anything between one and 21.
Example. Let's say I have vector of six values, that I have to split into three groups:
Myvector <- c(0.88,0.79,0.78,0.62,0.60,0.58)
Obviously the solution would be:
Group1 <-c(0.88)
Group2 <-c(0.79,0.78)
Group3 <-c(0.62,0.60,0.58)
Is there a function that gives the same outcome as the example and that I can use for my vector with 30 values?
Many thanks in advance.
It sounds like you want to do k-means clustering. Something like this would work:
kmeans(Myvector, 3, algorithm="Lloyd")
Note that I changed the default algorithm to match your desired output. If you read the ?kmeans help page, you will see that there are different algorithms for computing the clusters, because it is not a trivial computational problem. They do not necessarily guarantee optimality.
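As a rough sketch of how to get the groups back out, the snippet below uses the six-value example from the question. It keeps the default algorithm and uses nstart to try several random starts, which is a different choice from the algorithm switch above; the seed, the nstart value of 25, and the names km and groups are assumptions made for illustration:

```r
Myvector <- c(0.88, 0.79, 0.78, 0.62, 0.60, 0.58)

set.seed(42)                                       # assumption: seed only for reproducibility
km <- kmeans(Myvector, centers = 3, nstart = 25)   # keep the best of 25 random starts

# Turn the cluster labels into a list with one vector per group
groups <- split(Myvector, km$cluster)
print(groups)

# Total within-group sum of squares for the chosen partition
print(km$tot.withinss)
```

The cluster numbers themselves are arbitrary, so the groups may come out in any order. Multiple starts reduce the chance of a poor local optimum but still do not guarantee the best possible partition.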

Sequence manipulation

I have a matrix that is equivalent to a 96-well plate commonly used in microbiology. In that matrix I randomized 12 treatments, 8 times each. I printed a kind of guide in order to follow the pattern easily in the lab, and then after the measurements I merged the randomized plate with the data.
library(reshape2)  # assumed source of melt(), given the Var1/Var2 column names
cipPlate <- c(rep(c(seq(0,50,5),"E"),8)); cipPlate
rcipPlate <- array(sample(cipPlate),dim=c(8,12),dimnames=dimna); rcipPlate
platecCIP <- melt(rcipPlate); platecCIP
WellCIP <- paste(platecCIP$Var1,platecSTR$Var2,sep=''); WellCIP
bbCIP <- data.frame(Well=WellCIP,ID=platecCIP$value); bbCIP
That works fine, except that the numbers in the sequence created for this are characters instead of integers. So when I try to use ggplot2 to plot this with the measurements, instead of ordering the x-axis as (0, 5, 10, 15, ..., 50) it goes (0, 10, 15, ..., 45, 5, 50).
Is there a way to avoid this, or to make the numbers inside this sequence represent the actual numbers as integers instead of characters?
BTW: sorry for the clumsy code. I'm not an expert, and it works well enough that I can use it further.
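One possible way around the ordering problem, sketched under assumptions: because "E" is mixed in with the numbers, the vector cannot be purely numeric, but it can be stored as a factor whose levels are listed in the intended order. ggplot2 orders a discrete axis by factor levels. The ids vector below is a made-up sample of the ID column, and lv and f are illustrative names:

```r
# Made-up sample of treatment labels, stored as character like the ID column
ids <- c("0", "10", "5", "50", "E", "15")

# Levels in the intended plotting order: 0, 5, ..., 50, then "E"
lv <- c(seq(0, 50, 5), "E")

# Factor with explicit levels instead of alphabetical ordering
f <- factor(ids, levels = lv)

print(levels(f))
print(sort(f))    # sorts by level order, not alphabetically
```

Applied to the data frame in the question, this would amount to something like bbCIP$ID <- factor(bbCIP$ID, levels = lv) before plotting. If the "E" wells are dropped first, as.numeric() on the remaining labels would give true numbers instead.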

dealing with data table with redundant rows

The title is not precise, but I could not come up with other words that summarize exactly what I am going to ask.
I have a table of the following form:
value (0<v<1)    # of events
0.5677           100000
0.5688           5000
0.1111           6000
...              ...
0.5688           200000
0.1111           35000
Here are some of the things I would like to do with this table: draw the histogram, compute the mean value, fit the distribution, etc. So far, I could only figure out how to do this with vectors like
v=(0.5677,...,0.5688,...,0.1111,...)
but not with tables.
Since the number of possible values is huge, being almost continuous, I guess making a new table would not be very efficient, so doing this without modifying the original table or making another table would be highly desirable. But if it has to be done that way, that's okay. Thanks in advance.
Appendix: What I want to figure out is how to treat this table as a usual data vector.
If I had the following vector representing exactly the same data as above:
v = (0.5677, ..., 0.5677, 0.5688, ..., 0.5688, 0.1111, ..., 0.1111, ...)
     -------------------  -------------------  -------------------
     (100000 times)       (5000+200000 times)  (6000+35000 times)
then I would just need to apply basic functions like plot, mean, etc. to get what I want. I hope this makes my question clearer.
Your data consist of a value and a count for that value so you are looking for functions that will use the count to weight the value. Type ?weighted.mean to get information on a function that will compute the mean for weighted (grouped) data. For density plots, you want to use the weights= argument in the density() function. For the histogram, you just need to use cut() to combine values into a small number of groups and then use aggregate() to sum the counts for all the values in the group. You will find a variety of weighted statistical measures in package Hmisc (wtd.mean, wtd.var, wtd.quantile, etc).
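A short sketch of those steps on the five rows visible in the question. The data frame tab and its column names, the bin width of 0.1, and the bandwidth of 0.05 are all assumptions chosen for illustration:

```r
# The rows shown in the question (the elided rows are left out)
tab <- data.frame(
  value = c(0.5677, 0.5688, 0.1111, 0.5688, 0.1111),
  count = c(100000, 5000, 6000, 200000, 35000)
)

# Mean of the values, weighted by how often each value occurred
m <- weighted.mean(tab$value, tab$count)

# Weighted density estimate; the weights are normalised to sum to 1
# (bw = 0.05 is an arbitrary bandwidth picked for this sketch)
dens <- density(tab$value, weights = tab$count / sum(tab$count), bw = 0.05)

# Histogram-style counts: bin the values, then sum the counts per bin
tab$bin <- cut(tab$value, breaks = seq(0, 1, by = 0.1))
binned <- aggregate(count ~ bin, data = tab, FUN = sum)
print(binned)
```

The binned counts can then be drawn with barplot(binned$count, names.arg = binned$bin). This also merges the duplicated values (such as the two 0.5688 rows) without expanding the table into a long vector.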
