I'm having some trouble randomly sampling 1 column out of a group. I have over 300 columns, and over 500 rows. I am attempting to sample 1 column out of the first 15, and then move on to sample 1 column from the next 15, etc... until there are no more.
For the basic first sample, I used:
sample(DATA[,1:15],1)
But it only outputs a single number. If I change the size to 535 (the number of rows), it grabs 535 random numbers in total from columns 1:15.
I referenced the link below, which covers a similar problem, but the accepted answer is exactly what I tried and it doesn't work for me:
R: random sample of columns excluding one column
Any suggestions?
sample(1:15, 1) returns a single integer. Use that integer to index a column of the data frame, rather than sampling from the data frame itself as you did:
DATA[,sample(1:15,1)]
This randomly selects one column out of columns 1 to 15 and returns it, which is the output you want.
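To extend this to every block of 15 columns (what the question ultimately asks for), one sketch is to pick one random offset per block and index once. Note that sample(x, 1) on a length-one vector samples from 1:x instead of returning x, so sample.int() on the block width is safer. The data frame here is a stand-in with the dimensions from the question:

```r
set.seed(42)  # for reproducibility of the example
# Stand-in for the real data: 535 rows x 300 columns
DATA <- as.data.frame(matrix(rnorm(535 * 300), ncol = 300))

starts <- seq(1, ncol(DATA), by = 15)        # first column of each block
picked <- sapply(starts, function(s) {
  width <- min(15, ncol(DATA) - s + 1)       # last block may be shorter
  s - 1 + sample.int(width, 1)               # one random column in the block
})
result <- DATA[, picked]                     # one sampled column per block
ncol(result)  # 20 blocks of 15 -> 20 columns
```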
Found my answer pretty quickly:
DATA[,sample(1:15,1)]
I am working with R and am faced with the following combinatorial problem. The initial situation is a data frame with 512 rows containing all possible triple combinations of the digits 1 to 8:
expand.grid(rep(list(1:8), 3))
Now I would like to sample 420 rows from this data frame so that the frequency of each digit in each column is as similar as possible.
A randomly produced sample would look like this and, depending on chance, has quite uneven frequencies:
expand.grid(rep(list(1:8), 3)) %>%
filter(row_number() %in% sample(1:nrow(.), 420))
Does some sort of constraint exist in order to obtain frequencies that are as equal as possible?
Edit:
However, the result doesn't have to be random. Is there a way to filter 420 rows with maximally equal frequencies?
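There is no built-in constraint argument for sample(), but one hedged sketch is a best-of-many heuristic: draw many candidate samples and keep the one whose per-column digit counts deviate least from the ideal 420 / 8 = 52.5. This is only a heuristic, not an exact optimizer; since 52.5 is not an integer, the best achievable counts are mixtures of 52 and 53.

```r
set.seed(1)
grid <- expand.grid(rep(list(1:8), 3))   # all 512 triples of digits 1..8

# Sum of squared deviations of each column's digit counts from 420/8
imbalance <- function(rows) {
  sum(sapply(grid[rows, ], function(col)
    sum((table(factor(col, levels = 1:8)) - 420 / 8)^2)))
}

best <- NULL
best_score <- Inf
for (i in 1:200) {                       # try 200 random candidates
  cand <- sample(nrow(grid), 420)
  s <- imbalance(cand)
  if (s < best_score) { best <- cand; best_score <- s }
}
balanced <- grid[best, ]                 # the most balanced of the candidates
```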
I would like to create a column of numbers using R code. I want the numbers one through five to repeat 200 times in a row each, and then the number 6 to repeat 125 times in a row. How can I code this? I tried just coding
New <- c(1:6,each=200)
hoping it would just stop once it had filled all 1125 available rows. But I just get an error message instead.
Thanks for your help!
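For reference, the tool this question is reaching for is rep() with a times vector giving each value's run length:

```r
# 1..5 repeated 200 times each, then 6 repeated 125 times
New <- rep(1:6, times = c(rep(200, 5), 125))
length(New)  # 1125
```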
I have a number of dataframes, each with different numbers of rows. I want to break them all into smaller dataframes that have no more than 50 rows each, for example.
So, if I had a dataframe with 107 rows, I want to output the following:
A dataframe containing rows 1-50
A dataframe containing rows 51-100
A dataframe containing rows 101-107
I have read many examples using the split() function, but I have not been able to find any usage of split(), or any other solution, that does not pre-define the number of data frames to split into, scramble the order of the data, or introduce other problems.
This seems like such a simple task that I am surprised that I have not been able to find a solution.
Try:
split(df,(seq_len(nrow(df))-1) %/% 50)
What do the first 50 rows have in common? Integer division (%/%) of the row index minus one by 50 gives 0 for all of them; rows 51-100 give 1, and so on. So (seq_len(nrow(df))-1) %/% 50 produces the group each row belongs to, which split() uses to break the data frame apart.
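A quick check on the 107-row example from the question:

```r
df <- data.frame(x = seq_len(107))
chunks <- split(df, (seq_len(nrow(df)) - 1) %/% 50)
sapply(chunks, nrow)  # 50 50 7, in original row order
```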
I looked everywhere but did not find an answer to my question. I am having trouble making a contingency table. I have data with many columns, say 1, 2 and 3. Column 1 has about 100 different values, column 2 has 20, and column 3 has two possible values: 0 and 1. First I keep only the rows with value 1 in column 3 (data<-data[Column3==1,]). After subsetting, only around 20 different values remain in column 1 and 5 in column 2. However, when I build a contingency table its size is 100x20, not 20x5, and it contains a lot of zeros (they correspond to combinations of column 1 and column 2 that only occur where column 3 is 0). I would be grateful for any help, thanks.
I guess all three of your variables are factors, and factors keep their unused levels after subsetting. Convert them to character with as.character() before applying table(), and only the values actually present will be tabulated.
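A minimal sketch of the problem and the fix (the column names and values here are invented for illustration): subsetting a data frame does not drop unused factor levels, so table() still tabulates all of them. Converting to character, or alternatively calling droplevels(), removes the empty rows and columns:

```r
set.seed(7)
df <- data.frame(
  Column1 = factor(sample(letters[1:10], 60, replace = TRUE)),
  Column2 = factor(sample(LETTERS[1:6],  60, replace = TRUE)),
  Column3 = sample(0:1, 60, replace = TRUE)
)
sub <- df[df$Column3 == 1, ]

dim(table(sub$Column1, sub$Column2))  # counts every original level, even empty ones
dim(table(as.character(sub$Column1),  # only the values actually present
          as.character(sub$Column2)))
```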
I think this will be relatively elementary, but I cannot for the life of me figure it out.
Imagine a dataset with 108 rows, made up of two readings for each of 54 clones. Pretty much, I need to condense the dataset by clone (column 2), averaging the cells in columns [6:653] whilst keeping the information in columns 1, 2, 3 and 654 (which are identical between the two readings).
I have a pretty small dataset: 108 rows and 654 columns, which I would like to whittle down. The rows consist of 54 different tree clones (column 2), each with two readings (column 4) (54 * 2 = 108). I would like to average the two readings for each clone, reducing my dataset to 54 rows. Just FYI, the first 5 columns are characters and the next 648 are numeric. I would also like to remove columns 4 and 5 from the new dataset, leaving a dataset of 54 x 652, but this is optional.
I believe a (plyr) function or something similar will do the trick, but I can't make it work. I've tried a bunch of things, but it just won't play ball.
Thanks in advance.
For the average you can use mean(). To leave out a row or column, index with a negative number:
table[-x, ]  # deletes row x
table[ ,-x]  # deletes column x
(x can be a single number or a vector, e.g. x <- 1:3 for the first three rows/columns.)
If you provide more information I think others will also help.
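A base-R sketch of the group-wise averaging with aggregate(), using a toy stand-in for the real data (all column names here are invented; in the real data the numeric block is columns 6:653 and the grouping column is column 2):

```r
set.seed(1)
# Toy stand-in: 4 clones x 2 readings, 2 numeric measurement columns
df <- data.frame(site    = "A",
                 clone   = rep(paste0("clone", 1:4), each = 2),
                 block   = "B1",
                 reading = rep(1:2, times = 4),
                 note    = "x",
                 m1 = rnorm(8), m2 = rnorm(8))

# Average every measurement column within each clone
avg <- aggregate(df[, c("m1", "m2")], by = list(clone = df$clone), FUN = mean)

# Re-attach the per-clone constant character columns, dropping the
# reading/note columns (the question's optional columns 4 and 5)
meta <- df[!duplicated(df$clone), c("site", "clone", "block")]
out  <- merge(meta, avg, by = "clone")
nrow(out)  # 4: one row per clone
```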