How to clean/reconstruct factor in R [duplicate] - r

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
dropping factor levels in a subsetted data frame in R
I have a data frame with a factor column, and I would like to use subset to extract only part of its data. But the extracted data frame's factor column still has the same levels, even though some levels no longer have any values. This affects my later steps (such as visualization with ggplot).
The following is sample code.
d<-data.frame(c1=factor(c(1,1,2,3)),c2=c("a","b","c","d"))
d<-subset(d,c1 %in% c(1,2))
d$c1
The column c1 still has 3 levels (1, 2, 3), but I'd like it to have only (1, 2), because there's no value for level 3. Then the visualization won't draw anything for level 3.
How can I achieve that? Thanks.

Use droplevels:
d <- droplevels(d)
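To make the effect visible, here is a minimal sketch that reuses the sample data from the question and checks the levels before and after. The variable names levels_before and levels_after are only there for illustration.

```r
# Sample data from the question
d <- data.frame(c1 = factor(c(1, 1, 2, 3)), c2 = c("a", "b", "c", "d"))
d <- subset(d, c1 %in% c(1, 2))

# subset() keeps the unused level "3"
levels_before <- levels(d$c1)

# droplevels() removes unused levels from every factor column
d <- droplevels(d)
levels_after <- levels(d$c1)

print(levels_before)
print(levels_after)
```

For a single column, re-creating the factor with d$c1 <- factor(d$c1) should have the same effect.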

Related

plot 3 continuous variables against each other in one plot [duplicate]

This question already has answers here:
Plot each column against each column
(2 answers)
Plot all pairs of variables in R data frame based on column type [duplicate]
(1 answer)
Closed 17 days ago.
I have the output of 3 different algorithms as continuous vectors. Instead of comparing their correlations one by one, I would like to plot them all simultaneously in the same plot, but in different panels. The data frame looks like this (but contains >10k ids):
df <- data.frame(id = 1:5,
                 feature1 = runif(5),
                 feature2 = runif(5, min = 3, max = 5),
                 feature3 = runif(5, min = 5, max = 8))
Ideally, the resulting plot should look something like this:
I am fairly sure that there is some simple tidyr function that expands my data frame in such a way that I can simply use ggplot2 in combination with facet_grid, but I searched and couldn't find anything.
Any help is much appreciated!
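No answer was captured for this question. One possible approach, sketched here with base R's pairs() rather than the tidyr/facet_grid route the question had in mind, is a scatterplot matrix with one panel per pair of features. The set.seed call and the features variable are additions for illustration only.

```r
set.seed(1)  # only to make the random example reproducible

df <- data.frame(id = 1:5,
                 feature1 = runif(5),
                 feature2 = runif(5, min = 3, max = 5),
                 feature3 = runif(5, min = 5, max = 8))

# Keep only the columns to compare (drop the id column)
features <- df[, c("feature1", "feature2", "feature3")]

# One scatterplot panel for every pair of columns
pairs(features)
```

With more than 10k ids the panels may get crowded, so a smaller point symbol (for example pch = ".") might be worth trying.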

How can I separate categorical data from continuous data in a dataset in R? [duplicate]

This question already has answers here:
Selecting only numeric columns from a data frame
(12 answers)
Closed 1 year ago.
I have a dataset and I need to plot histograms for all the continuous data, which I know how to do. However, I can't use a loop because there are categorical columns too, and trying to create histograms for them raises an error. Is there a way to separate the continuous data from the categorical data? If worst comes to worst, I can just remove the categorical features manually, but I would like to know if there's a way to do this automatically for future reference.
You can use the dplyr package. The example below selects all factor columns; swap is.factor for is.numeric to keep the continuous columns instead.
library(dplyr)
data <- data %>%
  select_if(is.factor)
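Since the question is about the continuous columns, here is a small base R sketch that keeps only the numeric columns and loops over them. The data frame and its column names (age, income, group) are made up for the example.

```r
# Hypothetical data: two continuous columns and one categorical column
data <- data.frame(age = c(23, 35, 41, 52),
                   income = c(18000, 42000, 61000, 75000),
                   group = factor(c("x", "y", "x", "y")))

# Logical vector marking the numeric columns, used to subset the data frame
numeric_cols <- data[sapply(data, is.numeric)]

# The loop is now safe because only numeric columns remain
for (col in names(numeric_cols)) {
  hist(numeric_cols[[col]], main = col, xlab = col)
}
```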

Calculate frequency, separate and transpose a column that has two factor variables in R

This is my data: https://www.dropbox.com/s/msf0ro8saav7wbl/data1.txt?dl=0 (dataA). I want to extract "Habitat" into a frequency table so that I can compute statistics such as mean and variance, and also draw plots such as boxplots using ggplot2.
I tried the solution from the duplicate question "R: How to get common counts (frequency) of levels of two factor variables by ID Variable (as new data frame)", but I don't think it solves my problem.
Here's the easiest way to get a data.frame with frequencies using table. I'm using t to transpose and as.data.frame.matrix to transform it into a data.frame.
as.data.frame.matrix(t(table(data1)))
         A B C
Adult    1 2 1
Juvenile 2 0 0
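The linked file isn't reproduced here, so the following sketch uses a small made-up two-column data frame whose counts happen to match the output above. The column names Habitat and Stage and the values are assumptions, not the asker's actual data.

```r
# Hypothetical stand-in for data1: two factor-like columns
data1 <- data.frame(Habitat = c("A", "B", "B", "C", "A", "A"),
                    Stage = c("Adult", "Adult", "Adult", "Adult",
                              "Juvenile", "Juvenile"))

# table() cross-tabulates the two columns, t() swaps rows and columns,
# and as.data.frame.matrix() keeps the wide layout as a data frame
freq <- as.data.frame.matrix(t(table(data1)))
print(freq)
```

Each column of freq is then an ordinary numeric vector, so something like mean(freq$A) or var(freq$B) works directly.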

Categorising data within a range in R [duplicate]

This question already has answers here:
Convert continuous numeric values to discrete categories defined by intervals
(2 answers)
Closed 6 years ago.
I have a data frame in R that has a personal ID, an income and some other variables. I would like to add a new column to this data that categorises people by the income group they fall into (0-24,999, 25,000-49,999, 50,000-74,999, 75,000-99,999, etc).
I then want to be able to make frequency tables of this data compared with some of the other variables (eg: weekly hours worked, age).
I should be fine to figure out the latter of these problems, but I am having trouble figuring out how to categorise my data. Any help would be greatly appreciated.
Thank you.
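We can use cut or findInterval to group the "Variable". Setting include.lowest = TRUE keeps a value of exactly 0 in the first group instead of turning it into NA:
gr <- cut(df1$Variable, breaks = c(0, 24999, 49999, 74999, 99999, Inf), include.lowest = TRUE)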
Then, use table to get the frequency count
table(gr, df1$age)
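As a variation, here is a sketch that uses round break points with right = FALSE, so each interval includes its lower bound and the groups line up with the brackets in the question. The data (incomes, hours) and the label strings are invented for the example.

```r
# Hypothetical incomes and weekly hours worked
incomes <- c(0, 12000, 24999, 25000, 60000, 99999, 150000)
hours <- c(10, 20, 40, 40, 38, 45, 50)

# right = FALSE gives intervals [0, 25000), [25000, 50000), and so on
group <- cut(incomes,
             breaks = c(0, 25000, 50000, 75000, 100000, Inf),
             labels = c("0-24,999", "25,000-49,999", "50,000-74,999",
                        "75,000-99,999", "100,000+"),
             right = FALSE)

# Frequency table of income group against another variable
counts <- table(group, hours)
print(group)
```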

Ghost factor levels in R [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
dropping factor levels in a subsetted data frame in R
I have subsetted away observations with a certain factor level. When checking whether this has been done with summary() the levels were still listed, but with zero observations. Shouldn't they disappear during the subsetting?
Subsetting doesn't drop empty levels, and this is a feature. Think of the factor levels as defining the possible categories of a thing. If you take only a subset of these things, the possible categories don't change; your subset just doesn't contain any members of some of them.
If you want to drop these empty levels, see ?droplevels.
To make the extra levels disappear, use drop=TRUE when subsetting:
newfactor <- oldfactor[indices, drop=TRUE]
Incidentally, one reason this is not the default is that factors with different levels cannot be compared. So if you want to compare your factors with the original vector, or perhaps a different subset of the vector, you'd need to keep the extra levels.
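A short sketch of both points, using a made-up factor: drop = TRUE removes the unused level, and comparing the result against a factor that still has the full level set raises an error. The names kept, dropped and cmp are illustrative only.

```r
oldfactor <- factor(c("a", "b", "c", "a"))
indices <- oldfactor != "c"

kept <- oldfactor[indices]                  # levels stay "a" "b" "c"
dropped <- oldfactor[indices, drop = TRUE]  # levels become "a" "b"

# Same level set on both sides: element-wise comparison works
same_levels <- kept == oldfactor[1:3]

# Different level sets: == stops with an error, captured here as a message
cmp <- tryCatch(dropped == oldfactor[1:3],
                error = function(e) conditionMessage(e))
print(cmp)
```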
