get all the combinations of 10 factor variables - r

I have 10 factor variables, and I want to get all the possible unique combinations of their levels.
My dataframe has the following data variables:
And I want to get the output formatted as below:

unique(dataframe_name)
This command returns the unique rows of the data frame, that is, every combination of values that actually occurs in it. To store the result:
unique_data <- unique(dataframe_name)
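Note that unique() only reports combinations that are present in the data. If the goal is every possible combination of the factor levels, whether observed or not, a minimal sketch using base R's expand.grid() might look like the following. The data frame `df` and its columns `a` and `b` are made up for illustration; with 10 factors the same call applies, but the number of rows is the product of all the level counts and can become very large.

```r
# Hypothetical example data: two factor columns instead of ten
df <- data.frame(
  a = factor(c("x", "y", "x")),
  b = factor(c("p", "p", "q"))
)

# Combinations that actually occur in the data
observed <- unique(df)

# Every possible combination of the levels of each factor column
all_combos <- expand.grid(lapply(df, levels))

print(observed)
print(all_combos)
```

Here `observed` contains the three combinations found in `df`, while `all_combos` also contains the unobserved pair (y, q).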

Related

Create full data frame from possible combinations of grouping variables

I apologize if this has been asked before, but I could not find the answer I needed when there are three grouping variables.
I need to fill a dataframe with all possible combinations of the grouping variables, inserting NAs for the non-grouping values when a combination does not appear. Say there is a dataframe with three grouping variables: Year, Geography, and Grouping:
Year <- rep(2008:2019, each = 50)
Geography <- rep(1:60, each = 10)
Grouping <- rep(1:4, each = 150)
value <- rnorm(600, mean = 0, sd = 1)
df <- data.frame(Year, Geography, Grouping, value)
But the dataframe is missing some random observations like so:
df2 <- df[-c(15, 60, 150, 510), ]
How would one go about changing the dataframe back into a length of 600 (which is the length it would be if all possible combinations of three grouping variables were present), but inserting NAs where the value would be if the combinations were in the dataframe? Note that all unique observations for each grouping variable are present in the dataset at some point.
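One possible approach, not taken from the original thread, is to build the full grid of grouping values with expand.grid() and left-join the incomplete data onto it with merge(); combinations with no match then get NA. The sketch below uses a small made-up data frame (`full`, `partial`) with two grouping variables to keep the output readable; a third grouping variable would be added to the grid and to `by` in the same way. Keep in mind that expand.grid() crosses every value with every other, so the grid has as many rows as the product of the unique counts.

```r
# Hypothetical complete data: 2 years x 3 geographies
full <- data.frame(
  Year = rep(2008:2009, each = 3),
  Geography = rep(1:3, times = 2),
  value = c(0.5, 1.2, -0.3, 0.8, -1.1, 0.4)
)

# Drop two observations to simulate the missing rows
partial <- full[-c(2, 6), ]

# Full grid of grouping combinations, built from the values still present
grid <- expand.grid(
  Year = unique(partial$Year),
  Geography = unique(partial$Geography)
)

# Left join: keep every grid row, fill unmatched values with NA
filled <- merge(grid, partial, by = c("Year", "Geography"), all.x = TRUE)
print(filled)
```

The result has one row per combination again, with NA in `value` for the two rows that were dropped.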

Extract certain amount of factor levels in R

I have a data frame with a column that has more than 100 factor levels.
I want to extract rows so that the column has just 50 factor levels, to reduce the calculation time.
How can I randomly extract a certain number of factor levels?
You can use sample to draw a random sample of the factor's levels and then use %in% to select the matching rows of your data.frame.
ReducedFactors = sample(levels(df$MyFactor), 50)
df[which(df$MyFactor %in% ReducedFactors ), ]
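As a runnable illustration of the same idea, here is a small sketch on made-up data (26 levels reduced to 5 rather than 100 to 50). One detail worth knowing: subsetting rows does not remove the unused levels from the factor, so droplevels() is applied afterwards. The column `x` and the seed are assumptions for the example.

```r
set.seed(1)  # only to make the example reproducible

# Hypothetical data: a factor with 26 levels, 4 rows per level
df <- data.frame(
  MyFactor = factor(rep(letters, each = 4)),
  x = seq_len(26 * 4)
)

# Randomly choose 5 of the levels (no repeats, sample() is without replacement)
ReducedFactors <- sample(levels(df$MyFactor), 5)

# Keep only the rows belonging to the chosen levels
reduced <- df[df$MyFactor %in% ReducedFactors, ]

# The factor still remembers all 26 levels until unused ones are dropped
reduced$MyFactor <- droplevels(reduced$MyFactor)
print(nlevels(reduced$MyFactor))
```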

how to ignore factors or levels while using summary function in R

I am working on a dataset which has a column with only 2 possible values i.e. 0 and 1. I applied as.factor() to this column and it created two levels for me.
dr$col <- as.factor(dr$col)
Now when I do summary(dataset) it gives me occurrences of those values instead of mean/max/min etc. values.
summary(dr)
col
0:12
1:34
How can I tell the summary function to ignore the factor for that column and calculate aggregate values, as it does for the other numeric columns?
Let's assume you have the following:
vec <- c(1, 1, 1, 1, 0, 0, 0)
vecf <- as.factor(vec)
Then the following will give you the desired result:
summary(as.numeric(as.character(vecf)))
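The intermediate as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes (1, 2, ...) rather than the original values. The short sketch below shows the difference on the same vector; the variable names `codes` and `values` are just for illustration.

```r
vec <- c(1, 1, 1, 1, 0, 0, 0)
vecf <- as.factor(vec)

# Direct conversion yields level codes: "0" becomes 1 and "1" becomes 2
codes <- as.numeric(vecf)

# Going through character recovers the original 0/1 values
values <- as.numeric(as.character(vecf))

print(codes)
print(values)
print(summary(values))
```

For a data frame column, the same conversion can be written as dr$col <- as.numeric(as.character(dr$col)) before calling summary(dr).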

keep most common factor levels in R

I used the "dummies" package to create 42 dummy variables for the 42 levels of a factor variable in my data-frame. Now I only want to keep the 5 dummies that represent the five most common factor levels. I used:
counts <- colSums(dummy_variables)
rank <- sort(counts)
to figure out what those levels are, but now I want to be able to reference the most common ones and keep them in my data frame. I am somewhat new to R - I just can't figure out the syntax to do this.
Pick the 5 variables with the highest counts, and then subset to only those columns.
rank <- sort(counts)[(length(counts)-4):length(counts)]
dummy_variables <- dummy_variables[names(dummy_variables) %in% names(rank)]
Or in one line as the commenter suggested,
dummy_variables[names(dummy_variables) %in% names(tail(sort(colSums(dummy_variables)),5))]
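For a self-contained illustration, here is a sketch of the same selection on a tiny made-up data frame of dummy columns, keeping the top 2 instead of the top 5. It assumes dummy_variables is a data frame; if it were a matrix, colnames() would be needed instead of names(). Sorting in decreasing order and indexing by name is an alternative to the %in% approach above that also orders the kept columns by frequency.

```r
# Hypothetical dummy columns for a factor with three levels
dummy_variables <- data.frame(
  a = c(1, 0, 0, 0, 0, 0),
  b = c(0, 1, 1, 0, 0, 0),
  c = c(0, 0, 0, 1, 1, 1)
)

# Column totals give the frequency of each level
counts <- colSums(dummy_variables)

# Names of the 2 most common levels, most frequent first
top <- names(sort(counts, decreasing = TRUE))[1:2]

# Keep only those columns; drop = FALSE preserves the data frame
kept <- dummy_variables[, top, drop = FALSE]
print(names(kept))
```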

Are there any packages that actually sample randomly without including the same data?

I have tried sample(), and srswor()/srswr() from the sampling package, but none of these will select x unique factor levels from my vector of factors. Just as often as not, they return the same factor level twice among however many random samples I ask for. Is there a package or script that can randomly select factor levels such that no two are the same?
To sample from the factor levels you can simply do:
sample(levels(factor_variable), 10)
This randomly samples 10 levels from the unique levels of factor_variable. Because sample() draws without replacement by default, no level is returned twice.
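The duplicates in the question most likely come from sampling the factor vector itself, where the same level appears in many elements. The following sketch contrasts the two on a small made-up factor; `factor_variable`, the seed, and the sample size of 3 are assumptions for the example.

```r
set.seed(42)  # only to make the example reproducible

# Hypothetical factor in which level "a" appears several times
factor_variable <- factor(c("a", "a", "a", "b", "b", "c", "d", "e"))

# Sampling the elements: distinct positions, but levels can repeat
from_values <- sample(factor_variable, 3)

# Sampling the levels: each level can be drawn at most once
from_levels <- sample(levels(factor_variable), 3)

print(from_values)
print(from_levels)
```

If the vector is not a factor, sample(unique(x), n) gives the same guarantee.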
