How to use fct_relevel with mutate_at syntax - r

I want to relevel the factors in a dataset, but I'm really struggling with the fct_relevel syntax when using it with mutate_at. I get a series of errors about my data not being a factor.
The solution must allow me to relevel multiple factors (the actual dataset has 20-odd factors to relevel in different ways)
This answer seems like it should work, but I'm clearly not picking up the syntax properly. Where am I going wrong?
Here's an example:
library(tidyverse)
dat <- tibble (x1 = c("b", "b", "a", "c", "b"),
x2 = c("c", "b", "c", "a", "a"),
y = c(10, 5, 12, 3, 4)) %>%
mutate_at(.vars = vars(x1:x2), factor)
I'm definitely dealing with factors
sapply(dat, class)
But I can't relevel x1. I receive the following error: f must be a factor (or character vector)
dat %>% fct_relevel(x1, "c", "b", "a")
And this is what I ideally want to be able to do
dat2 <- dat %>%
mutate_at(.vars = vars (x1:x2),
.funs = fct_relevel("c", "b", "a"))
At the moment that final set is giving me the following errors:
Error: Can't create call to non-callable object
Call rlang::last_error() to see a backtrace
In addition: Warning message:
Unknown levels in f: b, a
I'd be really grateful for anyone pointing out what I'm sure is an obvious mistake.

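The first call fails because the pipe passes the whole tibble dat to fct_relevel as its first argument, and a tibble is not a factor. In the second, fct_relevel("c", "b", "a") is evaluated immediately with "c" as the factor (hence the warning about unknown levels b and a), so mutate_at receives a factor rather than a function. Wrapping the call in a formula (~), with . standing in for each column, should work: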
library(dplyr)
library(forcats)
dat <- dat %>% mutate_at(vars(x1:x2), ~fct_relevel(., c("c", "b", "a")))
dat$x1
#[1] b b a c b
#Levels: c b a
dat$x2
#[1] c b c a a
#Levels: c b a

Alternatively, we can pass the function itself and supply the levels as an extra argument:
library(forcats)
dat <- dat %>%
mutate_at(.vars = vars (x1:x2),
fct_relevel, c("c", "b", "a"))
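For newer dplyr versions (1.0.0 and later), where mutate_at is superseded by across(), the same idea can be sketched as below. This is a minimal example assuming the toy data from the question; the columns are converted with factor() inside the lambda so the block runs on its own.

```r
library(dplyr)
library(forcats)

# Toy data from the question (character columns, converted below)
dat <- tibble(
  x1 = c("b", "b", "a", "c", "b"),
  x2 = c("c", "b", "c", "a", "a"),
  y  = c(10, 5, 12, 3, 4)
)

# across() applies the lambda to each selected column;
# .x is the column currently being processed
dat2 <- dat %>%
  mutate(across(x1:x2, ~ fct_relevel(factor(.x), "c", "b", "a")))

levels(dat2$x1)
levels(dat2$x2)
```

Columns that need a different order can be handled with a second across() call inside the same mutate().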

Related

R How to remap letters in a string

I’d be grateful for suggestions as to how to remap letters in strings in a map-specified way.
Suppose, for instance, I want to change all As to Bs, all Bs to Ds, and all Ds to Fs. If I do it like this, it doesn’t do what I want since it applies the transformations successively:
"abc" %>% str_replace_all(c(a = "b", b = "d", d = "f"))
Here’s a way I can do what I want, but it feels a bit clunky.
f <- function (str) str_c( c(a = "b", b = "d", c = "c", d = "f") %>% .[ strsplit(str, "")[[1]] ], collapse = "" )
"abc" %>% map_chr(f)
Better ideas would be much appreciated.
James.
P.S. Forgot to specify. Sometimes I want to replace a letter with multiple letters, e.g., replace all As with the string ZZZ.
P.P.S. Ideally, this would be able to handle vectors of strings too, e.g., c("abc", "gersgaesg", etc.)
We could use chartr in base R
chartr("abc", "bdf", "abbbce")
#[1] "bdddfe"
Or, as a package solution, mgsub also handles patterns and replacements longer than one character
library(mgsub)
mgsub("abbbce", c("a", "b", "c"), c("b", "d", "f"))
#[1] "bdddfe"
mgsub("abbbce", c("a", "b", "c"), c("ba", "ZZZ", "f"))
#[1] "baZZZZZZZZZfe"
Maybe this is more elegant? It will also return warnings when values aren't found.
library(plyr)
library(tidyverse)
mappings <- c(a = "b", b = "d", d = "f")
str_split("abc", pattern = "") %>%
unlist() %>%
mapvalues(from = names(mappings), to = mappings) %>%
str_c(collapse = "")
# The following `from` values were not present in `x`: d
# [1] "bdc"
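If avoiding extra packages is preferred, here is a minimal base R sketch of the same lookup idea. The helper name remap_chars is hypothetical (not from any package); it assumes single-character keys, allows replacements of any length, works on a vector of strings, and applies all mappings simultaneously rather than successively.

```r
# Hypothetical helper: remap single characters via a named lookup vector.
# Characters without an entry in `mapping` are left unchanged.
remap_chars <- function(strings, mapping) {
  vapply(strsplit(strings, "", fixed = TRUE), function(chars) {
    hit <- chars %in% names(mapping)
    chars[hit] <- unname(mapping[chars[hit]])
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

mappings <- c(a = "ZZZ", b = "d", d = "f")
remap_chars(c("abc", "add"), mappings)
# [1] "ZZZdc" "ZZZff"
```

Because each character is looked up once in the original string, a "b" that becomes "d" is not then turned into "f".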

Counting number of elements in a character column by levels of a factor column in a dataframe

I am a beginner in R. I have a dataframe in which there are two factor columns. One column is a company column, second is a product column. There are several missing values in product column and so I want to count the number of values in product column for each company (or each level of the company variable). I tried table, and count function in plyr package but they only seem to work with numeric variables. Please help!
Let's say the data frame looks like this:
df <- data.frame(company= c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D"), product = c(1, 1, 2, 3, 4, 3, 3, NA, NA, NA))
So the output I am looking for is -
A 2
B 2
C 3
D 2
Thanks in advance!!
A dplyr solution:
library(dplyr)
df %>%
filter(!is.na(product)) %>%
group_by(company) %>%
count()
# A tibble: 4 × 2
#     comp     n
#   <fctr> <int>
# 1      A     2
# 2      B     2
# 3      C     3
# 4      D     1
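A variation on the same idea, sketched under the assumption that a company whose products are all missing should still appear with a count of 0. Summing !is.na() inside summarise() keeps every group, whereas filtering first drops such groups. The column names comp and prod follow the output above, and the extra company "E" is hypothetical, added only to show the difference.

```r
library(dplyr)

# "E" is a made-up company with no recorded product
df <- data.frame(
  comp = c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D", "E"),
  prod = c(1, 1, 2, 3, 4, 3, 3, 1, NA, NA, NA)
)

# Count non-missing products per company without dropping any company
counts <- df %>%
  group_by(comp) %>%
  summarise(n = sum(!is.na(prod)))

counts
```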
We can use rowsum from base R
with(df, rowsum(+!is.na(prod), comp))
Assuming your df is :
CASE 1) As given in the question
Data for df:
options(stringsAsFactors = F)
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D","D" )
prod <- c(1,1,2,3,4,3,3,1,NA,NA)
df <- data.frame(comp=comp,prod=prod)
Program:
df$prodflag <- !is.na(df$prod)
tapply(df$prodflag , df$comp,sum)
Output:
> tapply(df$prodflag , df$comp,sum)
A B C D
2 2 3 1
#########################################################################
CASE 2) If stringsAsFactors is on and prod holds characters, so that even the NAs are quoted strings ("NA") and stored as factor levels, you can do:
Data:
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D","D" )
prod <- c("a","a","b","c","d","c","c","a","NA","NA")
df <- data.frame(comp=comp,prod=prod,stringsAsFactors = T)
Solution:
df$prodflag <- as.numeric(!as.character(df$prod)=="NA")
tapply(df$prodflag , df$comp,sum)
#########################################################################
CASE 3) If prod is a character and stringsAsFactors is on, but the NAs are not quoted, you can do:
Data:
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D","D" )
prod <- c("a","a","b","c","d","c","c","a",NA,NA)
df <- data.frame(comp=comp,prod=prod,stringsAsFactors = T)
Solution:
df$prodflag <- as.numeric(!is.na(df$prod))
tapply(df$prodflag , df$comp,sum)
Moral of the story: we should understand our data first, and then we can choose the logic that best suits our need.

Counting on dataframe in R

I have a data frame like
A B
A E
B E
B C
..
I want to convert it to two dataframes
One counts how many times A, B, C, ... appear in the first column, and the other counts how many times A, B, C, ... appear in the second column.
A 5
B 4
...
Could you give me some suggestions?
Thanks
Try plyr library:
library(plyr)
myDataFrame <- as.data.frame(cbind( c("A", "A", "B", "B", "B", "C"), c("B", "E", "E", "C", "C", "E") ))
count(myDataFrame[,1]) ##prints counts of first column
count(myDataFrame[,2]) ##prints counts of second column
We can use lapply to loop over the columns, get the frequency with table, convert to data.frame and if needed as separate datasets, use list2env (not recommended)
list2env(setNames(lapply(df1, function(x)
as.data.frame(table(x))), paste0("df", 1:2)), envir=.GlobalEnv)
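A minimal sketch of the same lapply/table idea that keeps the results in a named list instead of writing them to the global environment. The data frame df1 is not defined in the answer above, so the one below is an assumption that reuses the example values from the plyr answer.

```r
# Assumed example data (same values as the plyr answer)
df1 <- data.frame(
  V1 = c("A", "A", "B", "B", "B", "C"),
  V2 = c("B", "E", "E", "C", "C", "E")
)

# One frequency table per column, each converted to a data frame
# with columns `x` (the value) and `n` (its count)
counts <- lapply(df1, function(x) as.data.frame(table(x), responseName = "n"))

counts$V1  # counts for the first column
counts$V2  # counts for the second column
```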
Alternatively, you could use the dplyr library:
library("dplyr")
df<- as.data.frame(cbind( c("A", "A", "B", "B", "B", "C"), c("B", "E", "E", "C", "C", "E") ))
names(df)<-c("V1","V2")
df <- as_tibble(df) # tbl_df() is deprecated in current dplyr
df %>% group_by(V1) %>% summarise(c1 = n()) ## for column 1
df %>% group_by(V2) %>% summarise(c1 = n()) ## for column 2

Preserving zero length groups with aggregate

I just noticed that aggregate drops empty groups from the result. How can I solve this? For example:
xx <- c("a", "b", "d", "a", "d", "a")
xx <- factor(xx, levels = c("a", "b", "c", "d"))
y <- rnorm(60, 5, 1)
z <- matrix(y, 6, 10)
aggregate(z, by = list(groups = xx), sum)
xx is a factor variable with 4 levels, but the result has just 3 rows, and I would like a row of zeros for the "c" level. I would like the same behavior as table(xx), which gives frequencies even for levels with no observations.
We can create another data.frame with just the levels of 'xx' and then merge with the aggregate. The output will have all the 'groups' while the row corresponding to the missing level for the other columns will be NA.
merge(data.frame(groups=levels(xx)),
aggregate(z, by = list(groups = xx), sum), all.x=TRUE)
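Since the question asks for zeros rather than NA in the row for the empty level, the merge result can be post-processed. A minimal sketch, assuming the value columns never legitimately contain NA (set.seed is added only to make the example reproducible):

```r
set.seed(1)
xx <- factor(c("a", "b", "d", "a", "d", "a"), levels = c("a", "b", "c", "d"))
z  <- matrix(rnorm(60, 5, 1), 6, 10)

# Left-join the full set of levels onto the aggregated sums
res <- merge(data.frame(groups = levels(xx)),
             aggregate(z, by = list(groups = xx), sum),
             all.x = TRUE)

# Levels with no observations come back as NA; replace them with 0
res[is.na(res)] <- 0
res
```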
Another option might be to convert to 'long' format with melt and then use dcast with fun.aggregate as 'sum' and drop=FALSE
library(data.table)
dcast(melt(data.table(groups=xx, z), id.var='groups'),
groups~variable, value.var='value', sum, drop=FALSE)

Select and count the number of duplicate items with two different outcome values?

Long-time follower, thanks so much for all your help over the years! I have a question that might have an easy answer, but I failed in googling it, and trying various subsetting and bracket notation also fell short. I'm betting someone here has encountered a similar problem.
I have a long-form data set with a set of duplicate ids. I also have a third variable that might be different for the duplicate. By example, if you recreate my data set:
x <- c("a", "a", "b", "c", "c", "d", "d", "d")
y <- c("z", "z", "z", "y", "y", "y", "x", "x")
z <- c(10, 20, 10, 10, 10, 10, 10, 20)
df <- cbind(x, y, z)
df <- as.data.frame(df)
names(df) <- c("id1", "id2", "var1")
df
I want to select the rows in which an id2 value has BOTH a 10 and a 20 in var1 when connected to the same id1. For example, 'z' has two observations connected to id1 'a' with two different var1 values (a '10' and a '20').
I want to select these cases, as well as count how many cases like this are in the overall data set. Thanks in advance!
One way is with ddply from the plyr package. Something like this:
> library(plyr)
> ddply(df, c('id2', 'id1'), function(x) if(length(unique(x$var1))==2) x)
id1 id2 var1
1 d x 10
2 d x 20
3 a z 10
4 a z 20
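A dplyr sketch of the same selection that also counts the matching cases. It assumes var1 is stored as numeric (in the question's data frame, built via cbind, it is character, so the comparison values would need quoting there) and tests explicitly for both 10 and 20 rather than for two distinct values.

```r
library(dplyr)

df <- data.frame(
  id1  = c("a", "a", "b", "c", "c", "d", "d", "d"),
  id2  = c("z", "z", "z", "y", "y", "y", "x", "x"),
  var1 = c(10, 20, 10, 10, 10, 10, 10, 20)
)

# Keep only the id1/id2 pairs whose var1 values include both 10 and 20
both <- df %>%
  group_by(id1, id2) %>%
  filter(all(c(10, 20) %in% var1)) %>%
  ungroup()

# Number of id1/id2 pairs that qualify
n_cases <- both %>% distinct(id1, id2) %>% nrow()

both
n_cases
```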
