create levels for data frame column explicitly - r

Quite often, data frames are created from some raw data, and the factor levels of these "downstream" data frames may not be aligned. Is the following the correct way to create a level x, which may exist in another data frame with the same signature?
df <- data.frame(
c1 = c("a", "a", "b", "c")
)
df
str(df)
df$c1 <- factor(as.character(df$c1), ordered = FALSE, levels = c("c", "a", "b", "x"))
df
str(df)
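A minimal sketch of how explicit levels behave, assuming a hypothetical second data frame df_b that actually contains "x" and a shared vector shared_levels (both names are illustrations, not part of the question). Passing the same levels vector to factor() in each data frame gives them identical level sets, and a level with no observations simply shows a count of zero.

```r
# Hypothetical shared level set, reused for every "downstream" data frame
shared_levels <- c("c", "a", "b", "x")

df_a <- data.frame(c1 = c("a", "a", "b", "c"))  # never contains "x"
df_b <- data.frame(c1 = c("x", "a", "c"))       # assumed data frame that does

# as.character() guards against the column already being a factor
df_a$c1 <- factor(as.character(df_a$c1), levels = shared_levels)
df_b$c1 <- factor(as.character(df_b$c1), levels = shared_levels)

levels(df_a$c1)   # "x" is a level even though it is unobserved
table(df_a$c1)    # the count for "x" is 0
identical(levels(df_a$c1), levels(df_b$c1))
```

One thing to watch: any value that is not listed in levels becomes NA, so the shared vector has to cover every value that can occur.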

Related

How to sort a data frame on multiple variables of which the names are given in vectors using a base R function?

I have a data frame like the one below:
df <- data.frame(v1 = c("A", "B", "A", "A", "B", "B", "B", "B", "A", "A", "A", "A"),
v2 = c("X", "Y", "X", "Y", "Z", "X", "X", "Y", "X", "Y", "Z", "Z"),
v3 = c(2, 1, 3, 1, 1, 2, 1, 2, 1, 2, 2, 1))
In this data frame v1 and v2 are so-called grouping variables (character vectors in this case), within which I'd like to order my counter variable v3 ascending using base R function(s). There's no requirement for the order in which the grouping variables are sorted (both ascending and descending would be OK). Now in this special case that would be easy:
df <- df[order(df$v1, df$v2, df$v3),]
Or alternatively:
df <- df[do.call(what = order, args = df),]
What I'd like is a more general solution for any data frame with n grouping variables whose names are contained in one vector, with the name of the counter variable contained in another. The reason is that these names are passed in a call to a user-defined function and can therefore vary.
grouping_vars <- c("v1", "v2", ..., "vn") #not actual code. Data frame contains *n* variables.
counter <- "vi" #not actual code. One of them, the i-th, is the counter variable.
Again, I'd like to make use of a base R function here (most likely order) and not a solution from data.table or the tidyverse, for example.
Your code is almost there. Just use [] after df to extract the grouping and counter columns for ordering.
df[do.call(what = order, args = df[,c(grouping_vars, counter)]), ]
PeterD: I added a comma in front of the vector that contains the selected columns to be explicit about the selection of columns of data frame df.
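A minimal sketch of the same idea wrapped in a function, since the question mentions a user-defined function. The name sort_by_vars and its argument names are hypothetical. Two small additions that are not in the answer above: drop = FALSE keeps the selection a data frame even when only one column is chosen, and unname() stops a column that happens to be called decreasing or method from being matched to an argument of order().

```r
# Hypothetical helper: sort `data` by the grouping columns, then by the counter
sort_by_vars <- function(data, grouping_vars, counter) {
  cols <- c(grouping_vars, counter)
  # Unnamed list of key columns, passed positionally to order()
  keys <- unname(as.list(data[, cols, drop = FALSE]))
  data[do.call(order, keys), , drop = FALSE]
}

df <- data.frame(v1 = c("A", "B", "A", "A", "B", "B", "B", "B", "A", "A", "A", "A"),
                 v2 = c("X", "Y", "X", "Y", "Z", "X", "X", "Y", "X", "Y", "Z", "Z"),
                 v3 = c(2, 1, 3, 1, 1, 2, 1, 2, 1, 2, 2, 1))

sorted <- sort_by_vars(df, grouping_vars = c("v1", "v2"), counter = "v3")
head(sorted)
```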

How to use fct_relevel with mutate_at syntax

I want to relevel the factors in a dataset, however I'm really struggling with the fct_relevel syntax and using it with mutate_at. I get a series of errors about my data not being a factor.
The solution must allow me to relevel multiple factors (the actual dataset has 20-odd factors to relevel in different ways)
This answer seems like it should work, but I'm clearly not picking up the syntax properly. Where am I going wrong?
Here's an example:
library(tidyverse)
dat <- tibble (x1 = c("b", "b", "a", "c", "b"),
x2 = c("c", "b", "c", "a", "a"),
y = c(10, 5, 12, 3, 4)) %>%
mutate_at(.vars = vars(x1:x2), factor)
I'm definitely dealing with factors
sapply(dat, class)
But I can't relevel x1; I receive the following error: f must be a factor (or character vector)
dat %>% fct_relevel(x1, "c", "b", "a")
And this is what I ideally want to be able to do
dat2 <- dat %>%
mutate_at(.vars = vars (x1:x2),
.funs = fct_relevel("c", "b", "a"))
At the moment that final step is giving me the following errors:
Error: Can't create call to non-callable object
Call rlang::last_error() to see a backtrace
In addition: Warning message:
Unknown levels in f: b, a
I'd be really grateful for anyone pointing out what I'm sure is an obvious mistake.
This should work
library(dplyr)
library(forcats)
dat <- dat %>% mutate_at(vars(x1:x2), ~fct_relevel(., c("c", "b", "a")))
dat$x1
#[1] b b a c b
#Levels: c b a
dat$x2
#[1] c b c a a
#Levels: c b a
We can specify it with
library(forcats)
dat <- dat %>%
mutate_at(.vars = vars (x1:x2),
fct_relevel, c("c", "b", "a"))
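For context on the two errors in the question: dat %>% fct_relevel(x1, ...) pipes the whole tibble in as the first argument, so fct_relevel() receives a data frame instead of a factor, and .funs = fct_relevel("c", "b", "a") calls the function immediately with "c" as the factor, which is why "b" and "a" are reported as unknown levels. The sketch below shows the same relevelling written with across(), assuming dplyr 1.0.0 or later is installed; it is an alternative spelling, not the method used in the answers above.

```r
library(dplyr)
library(forcats)

dat <- tibble(x1 = c("b", "b", "a", "c", "b"),
              x2 = c("c", "b", "c", "a", "a"),
              y  = c(10, 5, 12, 3, 4))

# Convert x1 and x2 to factors and move the levels into the order c, b, a
dat2 <- dat %>%
  mutate(across(x1:x2, ~ fct_relevel(factor(.x), "c", "b", "a")))

levels(dat2$x1)
levels(dat2$x2)
```

A single column can be handled the same way inside mutate(), for example mutate(x1 = fct_relevel(x1, "c", "b", "a")).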

Subsetting data from a dataframe and taking specific values from the subsetted values

I want to check whether values (in the example below, "letters") in one data frame appear in another data frame. If they do, I want a value (in the example below, "ranking") that is specific to that value in the first data frame to be added to the second data frame... What I have now is the following:
Df1 <- data.frame(c("A", "C", "E"), c(1:3))
colnames(Df1) <- c("letters", "ranking")
Df2 <- data.frame(c("A", "B", "C", "D", "E"))
colnames(Df2) <- c("letters")
Df2$rank <- ifelse(Df2$letters %in% Df1$letters, 1, 0)
However... Instead of getting a '1' when the letters overlap, I want to get the specific 'ranking' number from Df1.
Thanks!
What you're looking for is called a merge:
merge(Df2, Df1, by="letters", all.x=TRUE)
Also, fun fact, you can create a dataframe and name the columns at the same time (and you'll usually want to "turn off" strings as factors):
df1 <- data.frame(
letters = c("a", "b", "c"),
ranking = 1:3,
stringsAsFactors = FALSE)
The dplyr package is best for this.
library(dplyr)
Df2 <- Df2 %>%
left_join(Df1,by = "letters")
This keeps "D" and shows NA for its ranking, if you want to keep it.
Otherwise you can use semi_join:
DF2 <- Df2 %>%
semi_join(Df1, by = "letters")
This only keeps the rows they have in common (the intersection).
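A base R alternative that adds the column in place, sketched here with match(); this is not taken from the answers above. match() returns, for each letter in Df2, its position in Df1$letters (or NA when it is absent), and that position is then used to pick the ranking.

```r
Df1 <- data.frame(letters = c("A", "C", "E"), ranking = 1:3)
Df2 <- data.frame(letters = c("A", "B", "C", "D", "E"))

# Position of each Df2 letter in Df1, NA where there is no match
idx <- match(Df2$letters, Df1$letters)
Df2$rank <- Df1$ranking[idx]

Df2$rank
# [1]  1 NA  2 NA  3
```

Unlike merge(), which sorts the result by the key column by default, this keeps the rows of Df2 in their original order.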

Subset a Data Frame Based on All Combinations and Sub-combinations of Factor Variables

I need to subset a data.frame based on all combinations and sub-combinations of multiple columns of factor variables. Additionally, the number of factor-variable columns may change, so the method needs to be flexible in accepting different numbers of attributes. I can figure out how to create the combinations of variables in a simple example but don't have a good way to subset the data.frame efficiently. Any thoughts?
library(data.table)
#set up an example data.table
a <- c("a", "b", "b", "b", "e")
b <- c("b", "c", "b", "b", "f")
c <- c("c", "d", "b", "b", "g")
df <- data.table(a = a, b = b, c = c)
#build a data.frame of unique combos to subset on
df_unique <- df[!duplicated(df), ]
df_combos <- data.table()
for(i in 1:ncol(df_unique)){
for(x in 1:ncol(df_unique)){
df_sub <- df_unique[,i:x, with = FALSE]
df_combos <- rbind(df_combos, df_sub, fill = T)
}
}
df_combos <- df_combos[!duplicated(df_combos), ]
rm(df_unique)
#create a loop to build the subsets
combos_out <- data.table()
for(i in 1:nrow(df_combos)){
df_combos_sub <- df_combos[i, ]
df_combos_sub <- df_combos_sub[,which(unlist(lapply(df_combos_sub, function(x)!all(is.na(x))))),with=FALSE]
df_sub <- merge(df, df_combos_sub, by = colnames(df_combos_sub))
#interesting code here that performs analysis on the subsets
}

Reorder / arrange bars in a plot(table) while keeping value names

I would like to plot the result of table in a decreasing order, but if I sort the table before plotting it the plot does not show the value names anymore.
a <- data.frame(var = c("A", "A", "B", "B", "B", "B", "B", "C", "D", "D", "D"))
plot(table(a))
plot(sort(table(a)))
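We get the counts with table ('tbl'), order the elements, and assign them back into 'tbl' with tbl[] so that it keeps the same structure, then plot. Because tbl[] <- only replaces the values, the labels are reordered with the same index so that each bar keeps its own name. In the OP's code, the sort or order converts the table class to matrix.
tbl <- table(a)
ord <- order(tbl)
tbl[] <- tbl[ord] # reorder the counts; the table structure is kept
dimnames(tbl)[[1]] <- dimnames(tbl)[[1]][ord] # reorder the labels to match
plot(tbl)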
