Eliminate rows with a particular factor level - r

I have a factor with a great many levels and would like to eliminate 2 of them. I know that I could use subset to extract particular levels of a factor to form a new dataset. Since I have many levels, I would like to eliminate particular levels instead, however.
In the following example, how do I eliminate the fish level?
catsDogsFish=c(rep("cat", 5), rep("dog", 5), rep("fish", 5))
happyInside=c(7,4,9,7,8, 2,4,7,3,3,9,8,9,10,10)
happyPets=data.frame(catsDogsFish, happyInside)
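One possible approach, sketched on the data from the question: filter with a negated `%in%` so that several unwanted levels can be listed at once, then call `droplevels()` to remove the now-empty levels. The names `unwanted` and `noFish` are made up for this illustration, and `factor()` is called explicitly because `data.frame()` does not convert strings to factors by default in R 4.0 and later.

```r
# Same data as in the question, with the grouping column made a factor explicitly
catsDogsFish <- factor(c(rep("cat", 5), rep("dog", 5), rep("fish", 5)))
happyInside <- c(7, 4, 9, 7, 8, 2, 4, 7, 3, 3, 9, 8, 9, 10, 10)
happyPets <- data.frame(catsDogsFish, happyInside)

# Levels to eliminate (hypothetical name; list as many as needed)
unwanted <- c("fish")

# Keep every row whose level is NOT in the unwanted set
noFish <- happyPets[!(happyPets$catsDogsFish %in% unwanted), ]

# Row filtering alone keeps "fish" as an empty level; droplevels() removes it
noFish <- droplevels(noFish)
levels(noFish$catsDogsFish)
```

Without the `droplevels()` step, functions such as `table()` would still report the removed level with a count of zero.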

Related

R Subset by Factor level

I am attempting to formulate code that will subset by factor level, not factor name, so that minimal editing is required to run the code with other data sets. The number of factor levels is always 4, yet the names of those levels may change depending on the data set.
Specifically, the names of the SET1$Bearing levels may change. For example, "001°" may be "002°" in the next data set. Presently I use this code...
SET1Bearing1<-SET1[SET1$Bearing=="001°",]
Yet this only subsets by the name of the factor. Is there a way to subset by factor level regardless of the factor name?
SET1Bearing1<-SET1[SET1$Bearing == levels(SET1$Bearing)[4],]
Should work
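A small self-contained sketch of the positional idea, using a made-up stand-in for SET1 (the column `Value` and the level labels are assumptions, not the asker's data). Levels are looked up by position with `levels()`, so the code does not depend on what the levels are called. `split()` is shown as an alternative that returns all subsets at once.

```r
# Hypothetical stand-in for SET1; only the Bearing column matters here
SET1 <- data.frame(
  Bearing = factor(c("001", "090", "180", "270", "001", "270")),
  Value = 1:6
)

# Levels are sorted alphabetically by default, so position 4 is "270" here
bearingLevels <- levels(SET1$Bearing)
SET1Bearing4 <- SET1[SET1$Bearing == bearingLevels[4], ]

# Alternative: one data frame per level, in level order
byBearing <- split(SET1, SET1$Bearing)
```

Note that the position refers to the order of `levels()`, not the order in which values appear in the data.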

Add vector of numbers of same combination of factor levels to a dataframe

I need to know how many levels of a certain factor occur with each unique combination of levels of the other factors in the data frame. In other words, for this data, how many sites have crop1 vs. how many sites have crop2, then how many have crop1 on soil a, and so on ad infinitum. I want numbers for each level of interaction (no interaction, i.e. crops only; crops*soil). This is pretty easy with aggregate if I just want to answer one question at a time. But with nine factors it gets pretty tedious to find the numbers and then merge them back into the data frame, as I've done below.
df <- data.frame(crop = c(1,1,2,2,2,2,2,2,2,1),
site=c(LETTERS[1:7],"A","A","A"),
soil=c('a','a','b','b','b','c','c','c','c','c'))
Add numbers of sites with the same crop, same soil, same crop x soil
#... 1st for crop
f<-(unique(df[c("site","crop")]))
f<-(aggregate(numeric(nrow(f)), f[c("crop")], length))
names(f)[names(f)=="x"]<-"sites_w_same_cr"
df<-merge(df,f,by="crop")
#..2nd for soil
f<-(unique(df[c("site","soil")]))
f<-(aggregate(numeric(nrow(f)), f[c("soil")], length))
names(f)[names(f)=="x"]<-"sites_w_same_sl"
df<-merge(df,f,by="soil")
#..3rd for soil*crop
f<-(unique(df[c("site","crop","soil")]))
f<-(aggregate(numeric(nrow(f)), f[c("crop","soil")], length))
names(f)[names(f)=="x"]<-"sites_w_same_cr.sl"
df<-merge(df,f,by=c("crop","soil"))
How do I keep doing this for more factors, in other words put the answer to "How many sites grow crop 1, on soil a, are irrigated, receive fertilizer, ...?" in new columns on the same data frame? The column names I gave for the merged columns are for clarity; they could simply be some combination of the factors like "crop", "crop.soil" etc. that could be generated from the factors themselves. This answer shows how to get the levels of all factors at once, but not how to get the length of each one or each interaction. Thanks!
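One way this might be generalized, as a sketch rather than a definitive answer: loop over every combination of factor columns with `combn()` and, for each combination, count the distinct sites per group with `tapply()`. The helper `count_sites`, the vector `factors`, and the `sites_` column prefix are names invented for this example. The counts are written straight back as columns, so no `merge()` is needed and the row order is unchanged.

```r
df <- data.frame(crop = c(1, 1, 2, 2, 2, 2, 2, 2, 2, 1),
                 site = c(LETTERS[1:7], "A", "A", "A"),
                 soil = c("a", "a", "b", "b", "b", "c", "c", "c", "c", "c"))

# Factor columns to combine; extend this vector with further factors
factors <- c("crop", "soil")

# For each row, the number of distinct sites sharing that row's
# combination of values in the columns named by `vars`
count_sites <- function(data, vars) {
  groups <- interaction(data[vars], drop = TRUE)
  n <- tapply(data$site, groups, function(s) length(unique(s)))
  as.vector(n[as.character(groups)])
}

# Every single factor, every pair, and so on up to all factors together
for (k in seq_along(factors)) {
  for (vars in combn(factors, k, simplify = FALSE)) {
    new_col <- paste0("sites_", paste(vars, collapse = "."))
    df[[new_col]] <- count_sites(df, vars)
  }
}
```

With nine factors this produces 511 columns (every non-empty subset), so in practice the outer loop could be capped at a smaller `k`.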

R - How to convert factor into ordered factor without loss of information

I have a factor like this one : factor
This factor has 4 levels (the four possible responses for the survey) but people only reply with 2 of the 4 choices.
I want to convert it into an ordered factor without loss of information, that is, without losing the unused choices. I used the "ordered" function, but the new ordered factor has lost the 2 unused levels : ordered factor
I would like to emphasize that I have a lot of factors to convert, so I don't want a manual solution.
Thank you for your help.
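A possible sketch, assuming the existing level order is already the intended ranking: pass the factor's own levels back to `factor()` together with `ordered = TRUE`, so the unused levels are stated explicitly and cannot be dropped. The data frame `survey`, its columns, and the helper `to_ordered` are hypothetical names for illustration.

```r
# Hypothetical survey answers: four possible responses, only two actually used
choices <- c("strongly disagree", "disagree", "agree", "strongly agree")
answers <- factor(c("agree", "disagree", "agree"), levels = choices)
survey <- data.frame(q1 = answers, q2 = rev(answers), id = 1:3)

# Supplying the levels explicitly keeps the unused ones
to_ordered <- function(f) factor(f, levels = levels(f), ordered = TRUE)

# Apply to every factor column at once, leaving other columns untouched
is_fac <- vapply(survey, is.factor, logical(1))
survey[is_fac] <- lapply(survey[is_fac], to_ordered)
```

If the levels are not yet in the right order, the `levels` argument is the place to supply the intended ranking.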

keep most common factor levels in R

I used the "dummies" package to create 42 dummy variables for the 42 levels of a factor variable in my data-frame. Now I only want to keep the 5 dummies that represent the five most common factor levels. I used:
counts <- colSums(dummy_variables)
rank <- sort(counts)
to figure out what those levels are, but now I want to be able to reference the most common ones and keep them in my data frame. I am somewhat new to R - I just can't figure out the syntax to do this.
Pick out the top 5 variables, and then subset to only those columns.
rank <- sort(counts)[(length(counts)-4):length(counts)]
dummy_variables <- dummy_variables[names(dummy_variables) %in% names(rank)]
Or in one line as the commenter suggested,
dummy_variables[names(dummy_variables) %in% names(tail(sort(colSums(dummy_variables)),5))]
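A variant sketch that indexes columns by name with `colnames()`, which should behave the same whether `dummy_variables` is a data frame or a plain matrix (an assumption here, since the question does not say which it is). The small 0/1 matrix and the names `n_ones`, `top` and `top_dummies` are made up for the example.

```r
# Hypothetical dummy matrix: 10 rows, 8 columns, column j holds n_ones[j] ones
n_ones <- c(3, 9, 1, 7, 5, 8, 2, 6)
dummy_variables <- sapply(n_ones, function(k) rep(c(1, 0), c(k, 10 - k)))
colnames(dummy_variables) <- paste0("level", seq_along(n_ones))

# Names of the 5 columns with the largest counts, most common first
top <- head(names(sort(colSums(dummy_variables), decreasing = TRUE)), 5)

# Keep only those columns; drop = FALSE preserves the two-dimensional shape
top_dummies <- dummy_variables[, top, drop = FALSE]
```

Unlike the `%in%` version, this returns the columns ordered from most to least common rather than in their original order.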

Are there any packages that actually sample randomly without including the same data?

I have tried sample(), and srswor()/srswr() from the sampling package, but none of these will select x unique factor levels from my vector of factors. Just as often as not, they return two factor levels that are the same among however many random samples I ask for. Is there a package or script that can randomly select factor levels, but where no two are the same?
To sample from the factor levels you can simply do:
sample(levels(factor_variable), 10)
This randomly samples 10 of the unique levels in factor_variable, without replacement.
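A short sketch of why this avoids repeats, using a made-up `factor_variable`: sampling from the factor itself draws observations, where the same level can appear many times, whereas `levels()` returns each level once and `sample()` draws without replacement by default. The names `picked` and `rows` are assumptions for the illustration.

```r
set.seed(42)  # only to make the sketch reproducible

# Hypothetical factor: 12 levels, each repeated a different number of times
factor_variable <- factor(rep(letters[1:12], times = 1:12))

# Draw 5 distinct levels (replace = FALSE is the default)
picked <- sample(levels(factor_variable), 5)

# Optionally, mark the observations that belong to the picked levels
rows <- factor_variable %in% picked
```

Asking for more levels than exist raises an error, which is a useful guard against silently repeating levels.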
