Create different subsets of variables in a matrix with no repetition in R

I have a matrix of about 20 different variables. My goal is to create different combinations of these variables (no repetition, each with at least 3 variables) and store each one in a new data frame. I tried combn and expand.grid, but they take as input all the values belonging to each variable, whereas I want them to use only the "name" of the variable, not its values.
Thanks in advance
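A minimal sketch, assuming the variables are the named columns of a matrix (the 4-column matrix `m` below is a hypothetical stand-in for the real ~20 variables): run `combn()` on `colnames()` rather than on the matrix itself, so it enumerates variable names instead of values, then subset the matrix by each set of names.

```r
# Hypothetical example matrix with 4 named columns (stand-in for the real variables)
m <- matrix(1:20, nrow = 5, dimnames = list(NULL, c("A", "B", "C", "D")))
vars <- colnames(m)

# For every subset size from 3 up to the number of variables, combn()
# enumerates combinations of the column *names* (no repetition)
name_sets <- unlist(
  lapply(3:length(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# Build one data frame per combination, named after its variables
subsets <- lapply(name_sets, function(cols) as.data.frame(m[, cols, drop = FALSE]))
names(subsets) <- vapply(name_sets, paste, character(1), collapse = "_")

names(subsets)
```

With 20 variables this yields sum(choose(20, 3:20)) = 1,048,365 data frames, so it may be more practical to keep only `name_sets` and build each data frame on demand with `m[, cols, drop = FALSE]`.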

Related

Transform factor variable to numeric R

I have tried multiple things, so I'll ask my question here.
I have a dataset containing 5 columns. The first one lists countries (text), the second the year (integer), and columns 3-5 are my variables, which are currently factors.
I want to run a regression with my 3 variables, which is not possible right now because (I guess) my variables are not numeric/integer. I tried to transform them to numeric directly, but that only gave me ranks. I also tried first transforming them to characters and then to integer/numeric (I tried both), but that also only turned my 3 variables into ranks. I used transform and as.integer, creating a new dataset:
x<-transform(GDPall, HardWork = as.integer(HardWork), FamilyImportance = as.integer(FamilyImportance), GDPWorker = as.integer(GDPWorker))
How can I transform my 3 variables into a class which allows me to run my regression?
Thank you in advance!
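The "ranks" are the factor's internal level codes, which is what `as.integer()` and `as.numeric()` return when applied to a factor. A common fix is to convert through the labels with `as.numeric(as.character(f))`. The sketch below assumes the factor labels are plain numbers (such as "0.52"); the data frame is a hypothetical stand-in for `GDPall`, and `to_num` is a hypothetical helper.

```r
# Hypothetical stand-in for GDPall: numbers that ended up stored as factors
GDPall <- data.frame(
  Country          = c("A", "B", "C", "D"),
  Year             = c(2000L, 2000L, 2000L, 2000L),
  HardWork         = factor(c("0.52", "0.31", "0.47", "0.40")),
  FamilyImportance = factor(c("3", "1", "2", "2")),
  GDPWorker        = factor(c("51000", "42000", "38000", "45000"))
)

# as.integer() on a factor returns the level codes, not the labels:
as.integer(GDPall$HardWork)   # 4 1 3 2 (position of each label in the sorted levels)

# Hypothetical helper: convert via the labels instead of the codes
to_num <- function(f) as.numeric(as.character(f))

x <- transform(GDPall,
               HardWork         = to_num(HardWork),
               FamilyImportance = to_num(FamilyImportance),
               GDPWorker        = to_num(GDPWorker))

# The converted columns are numeric, so the regression can now be fitted
fit <- lm(GDPWorker ~ HardWork + FamilyImportance, data = x)
```

`as.numeric(levels(f))[f]` gives the same result and is slightly more efficient for long factors. If a label is not a plain number (for example "51,000" with a thousands separator), the conversion returns NA with a warning, so it is worth checking for NAs afterwards.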

Impute different types of variables with MICE

I am trying to perform imputation on a dataset with 69 columns and over 50,000 rows. It contains different types of variables:
1) binary columns (values 0 and 1)
2) categorical columns
3) continuous numerical columns
Now I want to perform imputation, and I know that my columns have a high level of multicollinearity.
Do I have to split my dataset into 3 subsets (one for each of the column types 1), 2), and 3)), or should I perform imputation on the whole dataset?
The problem is that the mice package has different methods for each of these types. And if I run it three separate times, do I have to take the whole dataset into account or only that specific part?
You can pass your whole dataset to mice at once (you can actually specify which method to use for each variable separately).
I am citing from the mice reference:
Parameter 'method'
Can be either a single string, or a vector of strings with length length(blocks), specifying the imputation method to be used for each column in data. If specified as a single string, the same method will be used for all blocks. The default imputation method (when no argument is specified) depends on the measurement level of the target column, as regulated by the defaultMethod argument. Columns that need not be imputed have the empty method "". See details.
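A minimal sketch of that approach, assuming the mice package is installed and using a small hypothetical data frame with one column of each type. Storing the binary column as a factor matters: with the default defaultMethod, numeric columns get "pmm", two-level factors get "logreg", and unordered factors with more levels get "polyreg". `make.method()` returns those defaults as a named vector you can edit before calling `mice()`.

```r
library(mice)

set.seed(1)
n <- 200
# Hypothetical data with the three column types from the question
df <- data.frame(
  bin  = factor(sample(c(0, 1), n, replace = TRUE)),          # binary, stored as a factor
  cat  = factor(sample(c("a", "b", "c"), n, replace = TRUE)), # unordered categorical
  cont = rnorm(n)                                             # continuous
)
# Introduce some missing values in each column
df$bin[sample(n, 20)]  <- NA
df$cat[sample(n, 20)]  <- NA
df$cont[sample(n, 20)] <- NA

# Default method per column, chosen from each column's type
meth <- make.method(df)
meth   # bin: "logreg", cat: "polyreg", cont: "pmm"

# One call on the whole dataset; every column can use the others as predictors
imp <- mice(df, m = 5, method = meth, seed = 123, printFlag = FALSE)
completed <- complete(imp, 1)   # first of the 5 completed datasets
```

Any entry of `meth` can be overwritten by name (for example `meth["cont"] <- "norm"`) before the call, and setting an entry to "" skips imputing that column.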

Transpose/Reshape Data in R

I have a data set in a wide format, consisting of two rows, one with the variable names and one with the corresponding values. The variables represent characteristics of individuals from a sample of size 1000. For instance, I have 1000 variables for the size of each individual, then 1000 variables for the height, then 1000 variables for the weight, etc. Now I would like to run simple regressions (say, weight on calorie consumption); the only way I can think of doing this is to declare a vector that contains the 1000 observations of each variable, for instance:
regressor1=c(mydata$height0, mydata$height1, mydata$height2, mydata$height3, ... mydata$height1000)
But given that I have a few dozen variables, each containing 1000 observations, this will become cumbersome. Is there a way to do this with a loop?
I have also thought about the reshape options in R, but again this would put me in a position where I have to type 1000 variable names a few dozen times.
Thank you for your help.
Here is how I would go about your issue. t() will transpose the data for you from many columns to many rows.
Note: t() is meant for matrices (a data frame is coerced to a matrix first); I simply converted to a data frame to show that my example will work with your data.
# Many columns, 2 rows
x <- as.data.frame(matrix(1:2000, nrow = 2, ncol = 1000))
# 2 columns, many rows (x is overwritten with the transposed matrix)
x <- t(x)
Based on your comments, you are looking to generate vectors.
If you have transposed:
regressor1 <- x[,1]
regressor2 <- x[,2]
If you have not transposed (x is still a data frame, so unlist() turns each row into a vector):
regressor1 <- unlist(x[1, ])
regressor2 <- unlist(x[2, ])
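If the column names follow the pattern from the question (a variable stem followed by an individual index, such as height0, height1, ...), the wide row can be reshaped into one column per variable without typing any names. A minimal base-R sketch, assuming the names are the data frame's column names and the values sit in a single row; the tiny `mydata` below (3 individuals) is hypothetical.

```r
# Hypothetical one-row wide data: 3 individuals, 3 variables each
mydata <- as.data.frame(t(c(
  height0   = 170,  height1   = 182,  height2   = 165,
  weight0   = 68,   weight1   = 80,   weight2   = 59,
  calories0 = 2200, calories1 = 2700, calories2 = 1900
)))

vals <- unlist(mydata[1, ])                          # named numeric vector
stem <- sub("[0-9]+$", "", names(vals))              # "height", "weight", "calories"
id   <- as.integer(sub("^[^0-9]+", "", names(vals))) # individual index 0, 1, 2

# Cross-tabulate: one row per individual, one column per variable stem
long <- as.data.frame(tapply(vals, list(id = id, variable = stem), identity))

fit <- lm(weight ~ calories, data = long)
```

With the real data, `long` would have 1000 rows and one column per variable, so each regression becomes a single `lm()` call on named columns.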

Selecting different elements of an R dataframe (one for each row, but possibly different columns) without using loops

Say I have a data.frame of arbitrary dimensions (n by p). I want to extract a vector of length n from that data.frame, one element in the vector per row in the data.frame. However, the column in which each element lies may vary by row. Is there a way to do this without loops?
For example, say I have the following 3x3 data frame, called DATA:
X Y Z
1 17 43
3 4 2
6 9 0
I want to extract one scalar value from DATA per row. I have a vector, call it column.list, c(1,3,1) (arbitrarily chosen in this case), which gives the column index of the elements I want: the kth element of column.list is the column index for row k in DATA. How do I do this without loops? I want to avoid loops because I am using this repeatedly in a simulation study that will take a lot of running time even without loops, and the number of rows might be 100,000 or so. Much appreciated!
You can do this by indexing your data.frame with a matrix. The first column indicates the row, the second indicates the column. So if you do
column.list <- c(1,3,1)
DATA[cbind(1:nrow(DATA), column.list)]
You will get
[1] 1 2 6
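as desired. If the data frame has columns of different classes, all the values are coerced to the most accommodating common type (e.g. character if any column is character), because the data frame is converted to a matrix before indexing.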
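A self-contained version of the example above. The extra step, converting DATA to a matrix once, is a suggestion for the repeated calls in a simulation: indexing a data frame with a matrix calls `as.matrix()` on the whole data frame every time, whereas indexing a precomputed matrix skips that work.

```r
DATA <- data.frame(X = c(1, 3, 6), Y = c(17, 4, 9), Z = c(43, 2, 0))
column.list <- c(1, 3, 1)

# Two-column index matrix: row i is paired with column column.list[i]
idx <- cbind(seq_len(nrow(DATA)), column.list)
DATA[idx]

# For repeated use, convert once and index the matrix directly
M <- as.matrix(DATA)
M[idx]
```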

R: Function: generate and save multiple matrices based on multiple conditions

I am a new R user and an inexperienced coder, and I have a data-handling problem. Hopefully someone can help:
I have a data.frame with 3 columns (firm, year, class) and about 50,000 rows. For every firm, I want to generate and store a (class x year) matrix whose elements are the class counts. Each matrix would be automatically named something like firm.name and stored so that I can use it afterwards for computations. Ideally, I'd be able to replace the simple class counts with a function of the values in columns 4 and 5 (backward and forward citations).
I am looking at 40 firms, 30 years, and about 1500 classes (so many firm-year-class counts are zero).
I realise I can get most of what I need (for counts) simply by using table(class, year, firm), as these columns have the same length. However, I don't know how to store or access the matrices this function generates...
Any help would be greatly appreciated!
Simon
So, your question is how to deal with a table object?
Example:
# note the assignment operator
mytable <- with(ChickWeight, table(cut(weight, c(0,100,200,Inf)), Diet, Chick))
#access the data for the first chick
mytable[,,1]
#turn the table object into a data.frame
as.data.frame(mytable)
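Applied to the question's layout, a minimal sketch: the data frame `d` is hypothetical, with a made-up `fwd` column standing in for the forward-citation counts. `table()` builds the class x year x firm array of counts, indexing the third dimension by name gives one firm's matrix, and `xtabs()` with a left-hand side sums that column instead of counting rows.

```r
# Hypothetical data: firm, year, class, plus a forward-citation count
d <- data.frame(
  firm  = rep(c("FirmA", "FirmB"), each = 6),
  year  = rep(2001:2003, times = 4),
  class = c("c1", "c1", "c2", "c1", "c2", "c2",
            "c1", "c1", "c1", "c2", "c2", "c2"),
  fwd   = 1:12
)

# class x year x firm array of counts (zero cells included)
counts <- with(d, table(class, year, firm))

# One class x year matrix per firm, kept in a list named by firm
firms <- dimnames(counts)$firm
mats  <- lapply(setNames(firms, firms), function(f) counts[, , f])
mats[["FirmA"]]

# Same layout, but each cell sums fwd instead of counting rows
cites <- xtabs(fwd ~ class + year + firm, data = d)
cites[, , "FirmA"]
```

Keeping the matrices in a named list avoids creating 40 separate objects; if separate objects are really needed, `list2env(mats, envir = globalenv())` would create one per firm.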
