Combine factors into single list - r

I have some data about the titles of faculty members at our college.
I want to find all the titles and list only the unique ones, e.g. Professor, Assistant Teaching Professor, Instructor.
For instance, "Distinguished Professor" and "professor" should map to the same factor level, "professor".
In short, it is like combining two factor levels into one.

If I have a factor with 5 levels, I can merge levels simply by renaming them:
set.seed(23)
x <- factor(sample(letters[1:5], 20, replace = TRUE))
x
# [1] c b b d e c e e e e e d b b e a c c e d
# Levels: a b c d e
levels(x)[3] <- "a"  # rename level "c" to the existing level "a", which merges the two
x
# [1] a b b d e a e e e e e d b b e a a a e d
# Levels: a b d e
The number of levels will automatically be reduced, as shown.
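Applied to the titles in the question, the same idea might look like the sketch below. The vector titles and its values are made up for illustration; the only thing relied on is that assigning one name to several levels merges them.

```r
# Hypothetical sample of titles; not the asker's real data
titles <- factor(c("Distinguished Professor", "professor",
                   "Instructor", "Assistant Teaching Professor"))

# Give both professor variants the same level name; R merges duplicate levels
to_merge <- levels(titles) %in% c("Distinguished Professor", "professor")
levels(titles)[to_merge] <- "Professor"

levels(titles)
nlevels(titles)
```

After the assignment the factor has three levels instead of four, and levels(titles) is the list of unique titles.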

Related

Remove NA from Table Display

I have the following table that I generated using the table(data$a, data$b) function
         a   b   c  NA
d    0  45  42  63   0
e    0  12  45  63   0
f    0  95  65  21   0
NA   0   0   0   0   0
How can I remove the columns with " " and NA?
Here is a reproducible example
a b
a d
a d
a d
a d
a d
a d
a d
a d
a d
a d
a d
a d
b d
b d
b e
b e
b e
b e
c e
c e
c e
c e
c e
c e
c e
c e
c e
c e
c f
c f
c f
c f
c f
c f
c f
c f
c f
c f
c f
Note that there are no "" or NAs in the set, but they still appear in the table
In this table, both of the variables are factors.
Thank you!
It is possible that the NAs are the character string "NA" rather than a real NA; otherwise table(), with its default useNA = "no", would have dropped them. One option is to change the values "" and "NA" to NA:
df1[df1 == "NA" | df1 == ""] <- NA
This assumes a two-column data frame in which all columns are of class character.
Update
If the dataset contained "NA" or "" when it was read, the columns are factors that still carry those values as unused levels. One option is to apply droplevels() and then table():
table(droplevels(df1))
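A small sketch of why droplevels() helps here, assuming the columns are factors that kept "" and "NA" as levels no row uses. The data frame df1 below is a hypothetical stand-in for the asker's data.

```r
# Hypothetical data: factors with leftover, unused levels "" and "NA"
df1 <- data.frame(
  a = factor(c("a", "b", "c"), levels = c("", "a", "b", "c", "NA")),
  b = factor(c("d", "e", "f"), levels = c("d", "e", "f", "NA"))
)

dim(table(df1))              # unused levels still show up as all-zero rows and columns
tab <- table(droplevels(df1))
dim(tab)                     # only the levels that actually occur remain
```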
If the table is stored in a variable called mytable, you could try the following:
bad_cols <- colnames(mytable) %in% c("NA", "", NA)
mytable <- mytable[, !bad_cols, drop = FALSE]
This first flags the columns whose name is NA, "NA" or "", then excludes them by subsetting and saves the result back into mytable.

Referencing the index object's name in a loop in R

How can I get R to use the actual value of i when naming columns and lists inside a for loop?
For example, using the following data:
z <- data.frame(x = c(1, 2, 3, 4, 5), y = c("a", "b", "c", "d", "e"))
When I reference i inside the loop to create the columns, the column is literally named "i":
a_final <- NULL
for(i in z$x){
print(data.frame(i = z$y))
}
I'd like the columns to be named by the value of i in each iteration instead.
I'd like the results to look something like:
1 2 3 4 5
a a a a a
b b b b b
c c c c c
d d d d d
e e e e e
You could create a square matrix filled with z$y, with nrow(z) rows and nrow(z) columns, and convert it into a data frame.
as.data.frame(matrix(z$y, ncol = nrow(z), nrow = nrow(z)))
# V1 V2 V3 V4 V5
#1 a a a a a
#2 b b b b b
#3 c c c c c
#4 d d d d d
#5 e e e e e
We can also use replicate
as.data.frame(replicate(nrow(z), z$y))
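Both versions above produce the default column names V1 to V5. To name the columns after the values of z$x, as the question asks, one possible follow-up step is to assign the names afterwards. This is a sketch, with z rebuilt here using the letters shown in the expected output.

```r
# z as in the question (letters taken from the expected output)
z <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c("a", "b", "c", "d", "e"),
                stringsAsFactors = FALSE)

res <- as.data.frame(replicate(nrow(z), z$y), stringsAsFactors = FALSE)

# Use the values of z$x as column names instead of V1..V5
names(res) <- as.character(z$x)
res
```

Inside a loop the same effect can be had by building the column first and then setting its name, since data.frame(i = ...) always uses the literal name i.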

Sorting cells in rows to match respective column headers in R

I'm trying to rearrange the cell values in each row so that every value sits under the column header with the same name.
What it looks like:
A B C D E
---------
D
A C
C E
B D E
E
What I want it to look like:
A B C D E
---------
      D
A   C
    C   E
  B   D E
        E
Any suggestions are much appreciated. Cheers.
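One possible approach is sketched below. It assumes that blank cells are empty strings and that every non-blank value equals one of the column names; the data frame d is a hypothetical reconstruction of the example above, not the asker's actual data.

```r
# Hypothetical input: values packed to the left, blanks stored as ""
d <- data.frame(A = c("D", "A", "C", "B", "E"),
                B = c("",  "C", "E", "D", ""),
                C = c("",  "",  "",  "E", ""),
                D = "",
                E = "",
                stringsAsFactors = FALSE)

# For each row, keep a header's name where the row contains it, else ""
aligned <- t(apply(d, 1, function(row) ifelse(names(d) %in% row, names(d), "")))
aligned <- as.data.frame(aligned, stringsAsFactors = FALSE)
names(aligned) <- names(d)
aligned
```

Each row is rebuilt from the headers rather than sorted, so a value that matches no header would be dropped silently.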

Join dataframes including mutual pairs

I want to join two dataframes by two columns they have in common but I do not want mutual pairs to be considered as duplicates.
Sample dataframes look like:
>df
letter1 letter2 value
d e 1
c d 2
c e 4
>dc
letter1 letter2
a e
c a
c d
c e
d a
d c
d e
e a
I want to join them by the first two columns, leaving in the third column the value in df$value and NA if the row does not exist in df. I have tried:
s <- join(dc,df, by = c("letter1","letter2"))
>s
letter1 letter2 value
a e NA
c a NA
c d 2
c e 4
d a NA
d c 2
d e 1
e a NA
Here, the pair d c is considered the same as c d and the value in the third column is the same. What I want is d c being considered as non-present in df, so their row value is NA. My desired output is:
>s
letter1 letter2 value
a e NA
c a NA
c d 2
c e 4
d a NA
d c NA
d e 1
e a NA
How can I join the dataframes so mutual pairs are considered different combinations?
UPDATE: I am sorry but I have just realized there was a problem with my input dataframes and that the join line I was trying actually works. I will accept the first answer that also works to give credit to the author.
We can use apply to sort the two letters within each row:
df[1:2] <- t(apply(df[1:2], 1, sort))
dc[1:2] <- t(apply(dc[1:2], 1, sort))
and then do the join.
You could use merge instead of join:
merge(dc,df, by = c("letter1","letter2"),all=TRUE)
# Creating the data frames
df <- data.frame(letter1 = c("d", "c", "c"),
                 letter2 = c("e", "d", "e"),
                 value = c(1, 2, 4))
dc <- data.frame(letter1 = c("a", "c", "c", "c", "d", "d", "d", "e"),
                 letter2 = c("e", "a", "d", "e", "a", "c", "e", "a"))
# Merging the data frames
dout <- merge(df, dc, by = c("letter1", "letter2"), all = TRUE)
# Outcome
letter1 letter2 value
1 c d 2
2 c e 4
3 c a NA
4 d e 1
5 d a NA
6 d c NA
7 a e NA
8 e a NA
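As a quick check that merge() treats the pairs c d and d c as different keys, here is a sketch using the sample data from the question. It uses all.x = TRUE, which is my choice here rather than something from the answers above, so that exactly the rows of dc are kept.

```r
df <- data.frame(letter1 = c("d", "c", "c"),
                 letter2 = c("e", "d", "e"),
                 value = c(1, 2, 4),
                 stringsAsFactors = FALSE)
dc <- data.frame(letter1 = c("a", "c", "c", "c", "d", "d", "d", "e"),
                 letter2 = c("e", "a", "d", "e", "a", "c", "e", "a"),
                 stringsAsFactors = FALSE)

# Left join: keep every row of dc, fill value with NA where df has no match
s <- merge(dc, df, by = c("letter1", "letter2"), all.x = TRUE)

s$value[s$letter1 == "c" & s$letter2 == "d"]  # matched in df
s$value[s$letter1 == "d" & s$letter2 == "c"]  # no match, so NA
```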

R making a list of factors in a dataframe column

Suppose I have a column in a dataframe:
A A A A B B B B B B B B B C C D D E E E E E E E E E F F F F F F F F F
How can I make a list of the factors within that column?
ie:
A B C D E F
Thank you for your help.
levels(factor(df$col))
OR:
unique(df$col)
Or even:
names(table(df$col))
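The three options are not quite interchangeable. A short sketch on a made-up character column (df and col are placeholder names) shows the difference in ordering:

```r
# Hypothetical column whose values are not in alphabetical order
df <- data.frame(col = c("B", "A", "A", "C"), stringsAsFactors = FALSE)

lv <- levels(factor(df$col))  # sorted levels
un <- unique(df$col)          # order of first appearance
tb <- names(table(df$col))    # sorted, same as the factor levels
```

If df$col is already a factor, unique() returns a factor rather than a character vector, so as.character(unique(df$col)) may be needed.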
