Looping across 10 columns at a time in R

I have a dataframe with 1000 columns. I am trying to loop over 10 columns at a time and use the seqdef() function from the TraMineR package to do sequence alignment across the data in those columns. Hence, I want to apply this function to columns 1-10 in the first go-around, and columns 11-20 in the second go-around, all the way up to 1000.
This is the code I am using.
library(TraMineR)
by(df[, 1:10], seqdef(df))
However, this only loops over the first 10 and then stops. How do I loop it across chunks of 10 columns?

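You could do something like this using mapply, passing it the first and last column of each chunk. Setting SIMPLIFY = FALSE keeps the result as a plain list with one seqdef() result per chunk, instead of letting mapply try to collapse the results into a matrix.
colpicks <- seq(10, 1000, by = 10)
mapply(function(start, stop) seqdef(df[, start:stop]), colpicks - 9, colpicks, SIMPLIFY = FALSE)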
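
The same chunking idea can be sketched with lapply over the starting column of each chunk. This is only an illustration under assumptions: the data frame is randomly generated toy data, and process_chunk is a hypothetical placeholder (it just returns the chunk's dimensions) that you would swap for seqdef().

```r
# Toy stand-in for the 1000-column data frame from the question.
set.seed(42)
df <- as.data.frame(
  matrix(sample(letters[1:4], 5 * 1000, replace = TRUE), nrow = 5)
)

chunk_size <- 10
# First column of every chunk: 1, 11, 21, ..., 991.
starts <- seq(1, ncol(df), by = chunk_size)

# Hypothetical placeholder; replace the body with seqdef(chunk).
process_chunk <- function(chunk) dim(chunk)

results <- lapply(starts, function(s) {
  # min() guards the last chunk if ncol(df) is not a multiple of chunk_size.
  cols <- s:min(s + chunk_size - 1, ncol(df))
  process_chunk(df[, cols, drop = FALSE])
})

length(results)  # one result per chunk of 10 columns
```

Working from the start indices rather than hard-coding 1000 means the same code should still behave if the number of columns changes.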

Related

splitting a dataframe into several smaller dataframes with an equal number of columns

If I have a dataframe in R:
dim(df)
[1] 9 705936
and I want to divide it into 28 parts by splitting it on the columns, and still have all nine rows when I am finished in each smaller dataframe. How do I do that? Because I have managed to screw this up every which way that I've tried and I'm all out of patience. It seems like this should be a one line command but I can't get it.
Update II after testing with fake dataframe with 705936 columns (removed previous answer):
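The correct answer is to use split.default, which splits a data frame by columns when given one group label per column. Here each of the 28 parts gets 705936 / 28 = 25212 columns: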
split.default(df, rep(1:28, each = 25212))
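
A scaled-down sketch of the same call may make the behaviour easier to check. The 9 x 56 data frame below is made up for illustration; the variable names n_parts and per_part are my own.

```r
# Small stand-in: 9 rows, 56 columns (V1..V56), to be split into 28 parts.
df <- as.data.frame(matrix(seq_len(9 * 56), nrow = 9))

n_parts <- 28
per_part <- ncol(df) / n_parts  # 2 columns per part here

# The column count must divide evenly for equal-sized parts.
stopifnot(per_part == round(per_part))

# One group label per column: 1, 1, 2, 2, ..., 28, 28.
parts <- split.default(df, rep(seq_len(n_parts), each = per_part))

length(parts)      # number of smaller data frames
dim(parts[[1]])    # every part keeps all 9 rows
names(parts[[1]])  # the first part holds the first two columns
```

split.default treats the data frame as a list of columns, which is why the grouping vector needs one entry per column rather than one per row.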

Using FOR LOOP over Multiple Columns of MATRIX and keeping FIRST column constant in RStudio

I am running the Automatic Variance Ratio (AVR) test on my dataset in R. My dataset contains 6 indices, i.e. 6 columns excluding the date column. For this test, I need a for loop that keeps the first column (the date column) fixed and moves through the index columns one at a time, starting from the 2nd column. I am new to R, so I don't know exactly what to do or how to do it. Currently, my code runs the test for the 2nd column only and does not loop over the columns after it. Any help would be appreciated.
A standard way to loop through the columns of a dataframe is with lapply. If your dataframe is df with 7 columns and you want to loop through columns 2 through 7 and your function is Av.VR() then
output_list <- lapply(df[,2:7], function(x) Av.VR(x))
should yield a list of outputs for each column.
Note I have no experience using the function Av.VR().
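
If the function also needs the date column on every pass, one possible pattern is to loop over the names of the index columns and pass the date column each time. This is a sketch under assumptions: the data is invented, and run_test is a hypothetical placeholder, not the real AVR test function.

```r
# Toy data: a date column plus two index columns.
df <- data.frame(
  date = as.Date("2020-01-01") + 0:4,
  idx1 = c(1, 2, 3, 4, 5),
  idx2 = c(2, 4, 6, 8, 10)
)

# Hypothetical placeholder for the real test; it only summarises its inputs.
run_test <- function(dates, values) {
  c(n = length(dates), mean = mean(values))
}

# Every column except the first (date) column.
index_cols <- names(df)[-1]

# The date column is passed unchanged on each iteration.
output_list <- lapply(index_cols, function(nm) run_test(df$date, df[[nm]]))
names(output_list) <- index_cols

output_list$idx2
```

Naming the list by column makes it easier to tell afterwards which result belongs to which index.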

for-loop with multiple columns

I have a matrix with 10 columns and 1000 rows of data. I have a vector with 1000 rows of data. I want to subtract that vector from every column separately, using a loop. What is the best way to do this in R?
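
A minimal sketch of one way to do this, using randomly generated data in place of the real matrix and vector. It shows the explicit loop that was asked for, plus two loop-free alternatives that should give the same result because R recycles the vector down each column.

```r
set.seed(1)
m <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)  # 1000 rows, 10 columns
v <- rnorm(1000)                                       # one value per row

# Explicit loop: subtract v from each column in turn.
looped <- m
for (j in seq_len(ncol(m))) {
  looped[, j] <- m[, j] - v
}

# Loop-free: matrices are stored column by column, so a vector of
# length nrow(m) is recycled once per column.
vectorised <- m - v

# The same operation with the row alignment spelled out (MARGIN = 1).
swept <- sweep(m, 1, v, FUN = "-")

all.equal(looped, vectorised)
all.equal(looped, swept)
```

The recycling shortcut relies on the vector length matching the number of rows; sweep() states that alignment explicitly, which can be safer when the shapes are less obvious.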

Get all possible combinations on a large dataset in R

I have a large dataset with more than 10 million records and 20 variables. I need to get every possible combination of 11 of these 20 variables, along with the frequency of each combination.
I have tried count() from the plyr package and the table() function, but neither can produce all possible combinations, since the number of combinations is very high (greater than 2^32) and the data itself is huge.
Assume following dataset having 5 variables and 6 observations -
And I want all possible combinations of first three variables where frequencies are greater than 0.
Is there any other function to achieve this? I am just interested in combinations whose frequency is non-zero.
Thanks!
OK. I think I have an idea of what you require. If you want the count of rows for each combination of N categories in your table, you can do so with the data.table package. It gives you the count of every combination that actually exists in the table. Simply list the required categories in the by argument:
library(data.table)
# 10 million rows: one numeric value plus three categorical variables
DT <- data.table(val = rnorm(1e7), cat1 = sample.int(10, 1e7, replace = TRUE), cat2 = sample.int(10, 1e7, replace = TRUE), cat3 = sample.int(10, 1e7, replace = TRUE))
DT_count <- DT[, .N, by = .(cat1, cat2, cat3)]
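
To see on a small scale that only observed combinations come back, here is a sketch in base R using aggregate(). Note that this swaps in a different technique from the data.table call above, and that the 6-row, 5-variable table is invented for illustration, since the example data from the question is not shown. For 10 million rows, data.table is likely the better fit.

```r
# Made-up toy data: 5 variables, 6 observations.
toy <- data.frame(
  a = c(1, 1, 2, 2, 1, 2),
  b = c("x", "x", "y", "y", "x", "x"),
  c = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
  d = 1:6,
  e = letters[1:6]
)

# Add a column of ones and sum it within each (a, b, c) group.
# Only combinations that occur in the data produce a row.
counts <- aggregate(
  n ~ a + b + c,
  data = cbind(toy[c("a", "b", "c")], n = 1L),
  FUN = sum
)

counts
```

Of the 2 x 2 x 2 = 8 possible combinations of a, b and c, only the ones present in toy appear in counts, so no zero-frequency rows are generated.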

Split a dataframe into any number of smaller dataframes with no more than N number of rows

I have a number of dataframes, each with different numbers of rows. I want to break them all into smaller dataframes that have no more than 50 rows each, for example.
So, if I had a dataframe with 107 rows, I want to output the following:
A dataframe containing rows 1-50
A dataframe containing rows 51-100
A dataframe containing rows 101-107
I have been reading many examples that use the split() function, but I have not been able to find a use of split(), or any other solution, that does not pre-define the number of dataframes to split into, scramble the order of the data, or introduce other problems.
This seems like such a simple task that I am surprised that I have not been able to find a solution.
Try:
split(df,(seq_len(nrow(df))-1) %/% 50)
What do the first 50 rows have in common? If you take the integer division (%/%) of the row index (minus one) by 50, they all give 0 as the result. As you can guess, rows 51-100 give 1, and so on. The expression (seq_len(nrow(df))-1) %/% 50 simply gives the group each row is split into.
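
The 107-row case from the question can be checked with a small sketch. The data frame here is made up, and max_rows is a name introduced for illustration.

```r
# Toy data frame with 107 rows, as in the question's example.
df <- data.frame(id = 1:107, value = seq(0.5, by = 0.5, length.out = 107))

max_rows <- 50

# Rows 1-50 map to group 0, rows 51-100 to group 1, rows 101-107 to group 2.
groups <- (seq_len(nrow(df)) - 1) %/% max_rows

chunks <- split(df, groups)

# Number of rows in each resulting data frame.
sizes <- vapply(chunks, nrow, integer(1))
sizes

# Row order is preserved: stacking the chunks gives back the original ids.
identical(do.call(rbind, chunks)$id, df$id)
```

Because the groups are derived from the row index, the number of output data frames follows from nrow(df) and does not have to be specified up front.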
