Using fct_relevel over a list of variables using map_at - r

I have a bunch of factor variables that have the same levels, and I want them all reordered similarly using fct_relevel from the forcats package. Many of the variable names start with the same characters ("Q11A" to "Q11X", "Q12A" to "Q12X", "Q13A" to "Q13X", etc.). I wanted to use the starts_with function from dplyr to shorten the task. The following code didn't give me an error, but it didn't do anything either. Is there anything I'm doing wrong?
library(dplyr)
library(purrr)
library(forcats)
library(tibble)
#Setting up dataframe
f1 <- factor(c("a", "b", "c", "d"))
f2 <- factor(c("a", "b", "c", "d"))
f3 <- factor(c("a", "b", "c", "d"))
f4 <- factor(c("a", "b", "c", "d"))
f5 <- factor(c("a", "b", "c", "d"))
df <- tibble(f1, f2, f3, f4, f5)
levels(df$f1)
[1] "a" "b" "c" "d"
#Attempting to move level "c" up before "a" and "b".
df <- map_at(df, starts_with("f"), fct_relevel, "c")
levels(df$f1)
[1] "a" "b" "c" "d" #Didn't work
#If I just re-level for one variable:
fct_relevel(df$f1, "c")
[1] a b c d
Levels: c a b d
#That worked.

I think you're looking for mutate_at, with the selection helper wrapped in vars():
df <- mutate_at(df, vars(starts_with("f")), fct_relevel, "c")
df$f1
[1] a b c d
Levels: c a b d
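In dplyr 1.0.0 and later, mutate_at is superseded by across(). The following is a minimal sketch of the same releveling written that way; the small data frame and its extra column `other` are made up for illustration and are not part of the question.

```r
library(dplyr)
library(forcats)

# Illustrative data: two factor columns whose names start with "f",
# plus one column that should be left untouched.
df <- data.frame(
  f1 = factor(c("a", "b", "c", "d")),
  f2 = factor(c("a", "b", "c", "d")),
  other = 1:4
)

# Apply fct_relevel() to every column selected by starts_with("f"),
# moving level "c" to the front in each one.
df <- mutate(df, across(starts_with("f"), ~ fct_relevel(.x, "c")))

levels(df$f1)
levels(df$f2)
```

Because across() is evaluated inside mutate(), starts_with() has a data frame to select from, which is the context a selection helper needs.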

Related

Can the "c" statement be used along with the "which" statement?

I am using the R programming language. I am interested in seeing whether the "c" statement can be used along with the "which" statement in R. For example, consider the following code (var1 and var2 are both "Factor" variables):
my_file
var1 var2
1 A AA
2 B CC
3 D CC
4 C AA
5 A BB
output <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 != c("AA", "CC")), ]
But this does not seem to be working.
I can run each of these conditions individually, e.g.
output <- my_file[which(my_file$var1 == "A" | my_file$var1 == "B" | my_file$var1 == "C"), ]
output1 <- output[which(output$var2 == "AA" | output$var2 == "CC" ), ]
But I would like to run them in a more "compact" form, e.g.:
output <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 != c("AA", "CC")), ]
Can someone please tell me what I am doing wrong?
Thanks
When you compare my_file$var1 == c("A", "B", "C"), the comparison takes place element by element, but because the two vectors have different lengths, the shorter one is recycled (with a warning, because the recycling is incomplete). The comparison actually performed is
c("A", "B", "D", "C", "A") == c("A", "B", "C", "A", "B")
which gives
c(TRUE, TRUE, FALSE, FALSE, FALSE)
and which() then converts that to c(1, 2).
The reason it works when you use one letter at a time is that the single element is repeated 5 times: my_file$var1 == "A" leads to c("A", "B", "D", "C", "A") == c("A", "A", "A", "A", "A") and gives the result you expect.
@deschen is right, you should use %in%:
output <- my_file[which(my_file$var1 %in% c("A", "B", "C") & !my_file$var2 %in% c("AA", "CC")), ]
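The recycling described above can be checked directly. This is a small sketch that rebuilds the example data from the question (the data frame construction is an assumption about how my_file was created) and compares == with %in%.

```r
# Rebuild the example data; both columns are factors, as in the question.
my_file <- data.frame(
  var1 = factor(c("A", "B", "D", "C", "A")),
  var2 = factor(c("AA", "CC", "CC", "AA", "BB"))
)

# == recycles the right-hand side, so var1 is compared against
# "A", "B", "C", "A", "B" position by position (R also warns about
# the incomplete recycling; the warning is suppressed here).
recycled <- suppressWarnings(my_file$var1 == c("A", "B", "C"))
recycled

# %in% instead asks, for each element, whether it occurs anywhere in the set.
member <- my_file$var1 %in% c("A", "B", "C")
member

# Combine both conditions with logical indexing.
output <- my_file[member & !(my_file$var2 %in% c("AA", "CC")), ]
output
```

Only the fifth row (var1 "A", var2 "BB") satisfies both conditions.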
As @deschen says in a comment, you should use %in% rather than ==. You can also (1) get rid of the which() (logical indexing works just as well here as indexing by position) and (2) use subset to avoid re-typing my_file.
output <- subset(my_file, var1 %in% c("A", "B", "C") &
!(var2 %in% c("AA", "CC")))
Alternatively, if you like the tidyverse, this would be:
library(dplyr)
output <- my_file %>% dplyr::filter(var1 %in% c("A", "B", "C"),
!(var2 %in% c("AA", "CC")))
(comma-separated conditions in filter() work the same as &).

How to combine multiple vectors such that elements of each vector are distributed as equally as possible?

Let's say I have two or more vectors with two or more elements (single factor) each, e.g.
v1 = c("a", "a", "a")
v2 = c("b", "b")
What I want to do is to merge all vectors and distribute the elements for each group as equally as possible.
For the simple example above there would be a single solution:
c("a", "b", "a", "b", "a")
If v1 = c("a", "a", "a", "a") any of these
c("a", "b", "a", "b", "a", "a")
c("a", "b", "a", "a", "b", "a")
c("a", "a", "b", "a", "b", "a")
would be the best solution. Is there a built-in function that can do this? Any ideas how to implement it?
This would work for two vectors.
v1 = c("a", "a", "a")
v2 = c("b", "b")
distribute_equally <- function(v1, v2) {
v3 <- c(v1, v2)
tab <- sort(table(v3))
c(rep(names(tab), min(tab)), rep(names(tab)[2], diff(range(tab))))
}
distribute_equally(v1, v2)
#[1] "b" "a" "b" "a" "a"
distribute_equally(c('a', 'a'), c('b', 'b'))
#[1] "a" "b" "a" "b"
Thinking of the problem in terms of experimental design optimization, we can get a general solution using the MaxProQQ function in the MaxPro package.
Each position in the merged vector can be thought of as coming from a discrete quantitative factor, and the factors from your v1, v2, etc. can be thought of as qualitative factors. Here's some example code (MaxProQQ takes integer factors instead of characters, but you can convert it afterward):
library(MaxPro)
set.seed(1)
v1 <- rep(1, sample.int(10, 1))
v2 <- rep(2, sample.int(10, 1))
v3 <- rep(3, sample.int(10, 1))
v4 <- rep(4, sample.int(10, 1))
vComb <- c(v1, v2, v3, v4)
vMerge1234 <- MaxProQQ(cbind(1:length(vComb), sample(vComb, length(vComb))), p_nom = 1)$Design
vMerge1234 <- vMerge1234[order(vMerge1234[,1]),][,2]
> vMerge1234
[1] 4 3 4 2 4 3 4 1 2 4 3 4 2 4 3 1 4 3 2 4 1 3 4
Generate 100 samples, say, without replacement from c(v1, v2) giving m which is 5x100 with one column per sample. Then find the column for which the sum of the variances of the frequencies over each group is minimized. If there are more than two vectors just concatenate them in the line marked ## and the rest of the code stays the same.
set.seed(123)
v1 = c("a", "a", "a")
v2 = c("b", "b")
v <- c(v1, v2) ##
m <- replicate(100, sample(v))
varsum <- apply(m, 2, function(x) {
f <- factor(x, levels = unique(v))
sum(tapply(f, v, function(x) var(table(x))))
})
m[, which.min(varsum)]
## [1] "a" "a" "b" "b" "a"
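A deterministic alternative, offered only as a sketch and not taken from the answers above: give the i-th of n elements in each vector the fractional position (i - 0.5) / n, then sort all elements by that position. The function name interleave_evenly is hypothetical.

```r
# Hypothetical helper: spread the elements of each input vector evenly
# over the merged result by sorting on fractional positions.
interleave_evenly <- function(...) {
  vecs <- list(...)
  # The i-th of n elements gets position (i - 0.5) / n, a value in (0, 1).
  pos <- unlist(lapply(vecs, function(v) (seq_along(v) - 0.5) / length(v)))
  vals <- unlist(vecs)
  # order() keeps tied positions in their original order.
  vals[order(pos)]
}

interleave_evenly(c("a", "a", "a"), c("b", "b"))
# [1] "a" "b" "a" "b" "a"

interleave_evenly(c("a", "a", "a", "a"), c("b", "b"))
# [1] "a" "b" "a" "a" "b" "a"
```

It accepts any number of vectors. Both results match solutions listed in the question, but this sketch makes no claim of optimality under any particular evenness criterion.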

Create new vector from row index of two matching columns

I have a data frame:
a <- c(1,2,3,4,5,6)
b <- c(1,2,1,2,1,4)
c <- c("A", "B", "C", "D", "E", "F")
df <- data.frame(a,b,c)
What I want to do, is create another vector d, which contains the value of c in the row of a which matches each value of b
So my new vector would look like this:
d <- c("A", "B", "A", "B", "A", "D")
As an example, the final value of b is 4, which matches with the 4th row of a, so the value of d is the 4th row of c, which is "D".
If a and b are both vectors with integer values you can use them directly as indices.
d <- c[b[a]]
d
[1] "A" "B" "A" "B" "A" "D"
If a is a regular integer sequence along c, you can simply index c by b.
c[b]
[1] "A" "B" "A" "B" "A" "D"
Another option is to convert to factor and use it as:
factor(a, labels = c)[b]
#[1] A B A B A D
OR
as.character(factor(a, labels = c)[b])
#[1] "A" "B" "A" "B" "A" "D"
data
a <- c(1,2,3,4,5,6)
b <- c(1,2,1,2,1,4)
c <- c("A", "B", "C", "D", "E", "F")
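The answers above rely on a being the sequence 1, 2, ..., 6. When a holds arbitrary keys, a lookup with match() does the same job. A brief sketch; the second data set (a2, b2, c2) is invented for illustration.

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(1, 2, 1, 2, 1, 4)
c <- c("A", "B", "C", "D", "E", "F")

# match(b, a) gives, for each value of b, the row of a holding that value.
d <- c[match(b, a)]
d
# [1] "A" "B" "A" "B" "A" "D"

# Invented example in which the keys are not 1..n and not sorted.
a2 <- c(30, 10, 20)
c2 <- c("X", "Y", "Z")
b2 <- c(10, 20, 30, 10)
d2 <- c2[match(b2, a2)]
d2
# [1] "Y" "Z" "X" "Y"
```

Values of b that do not occur in a produce NA in the result.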

Counting on dataframe in R

I have a data frame like
A B
A E
B E
B C
..
I want to convert it to two dataframes
One counts how many times A, B, C, ... appear in the first column, and the other counts how many times A, B, C, ... appear in the second column.
A 5
B 4
...
Could you give me some suggestions?
Thanks
Try plyr library:
library(plyr)
myDataFrame <- as.data.frame(cbind( c("A", "A", "B", "B", "B", "C"), c("B", "E", "E", "C", "C", "E") ))
count(myDataFrame[,1]) ##prints counts of first column
count(myDataFrame[,2]) ##prints counts of second column
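We can use lapply to loop over the columns, get the frequencies with table, and convert each result to a data.frame. If separate datasets are really needed, use list2env (not recommended). Here df1 is the original data frame; note that the code below overwrites it with the first result.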
list2env(setNames(lapply(df1, function(x)
as.data.frame(table(x))), paste0("df", 1:2)), envir=.GlobalEnv)
Alternatively, you could also use the dplyr library:
library("dplyr")
df <- as.data.frame(cbind( c("A", "A", "B", "B", "B", "C"), c("B", "E", "E", "C", "C", "E") ))
names(df) <- c("V1", "V2")
df <- tibble::as_tibble(df) ## tbl_df() is deprecated
df %>% group_by(V1) %>% summarise(c1 = n()) ## for column 1
df %>% group_by(V2) %>% summarise(c1 = n()) ## for column 2
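For comparison, here is a base R sketch that keeps the per-column counts together in a named list instead of creating separate objects in the global environment. The column names V1 and V2 follow the dplyr example; the list name counts is made up.

```r
df <- data.frame(
  V1 = c("A", "A", "B", "B", "B", "C"),
  V2 = c("B", "E", "E", "C", "C", "E")
)

# One small data frame of counts per column, stored in a named list.
counts <- lapply(df, function(col) {
  as.data.frame(table(value = col), stringsAsFactors = FALSE)
})

counts$V1
counts$V2
```

Each element has a value column and a Freq column, so counts$V1 and counts$V2 play the role of the two requested data frames.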

Order a numeric vector by length in R

I've got two numeric vectors that I want to order by the length of their observations, i.e., the number of times each observation appears.
For example:
x <- c("a", "a", "a", "b", "b", "b", "b", "c", "e", "e")
Here, b occurs four times, a three times, e two and c one time. I'd like my result in this order.
ans <- c("b", "b", "b", "b", "a", "a", "a", "e", "e", "c")
I´ve tried this:
x <- x[order(-length(x))] # and some similar lines.
Thanks
Using rle you can get the values and their run lengths. Order the lengths, then use the values to recreate the vector in the new order. This assumes equal values are adjacent, as they are here:
xx <- c('a', 'a', 'a', 'b', 'b', 'b','b', 'c', 'e', 'e')
rr <- rle(xx)
ord <- order(rr$lengths,decreasing=TRUE)
rep(rr$values[ord],rr$lengths[ord])
## [1] "b" "b" "b" "b" "a" "a" "a" "e" "e" "c"
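You may also use ave when calculating the lengths. Pass seq_along(x) as the first argument so the counts stay numeric; with the character vector x itself they are coerced to strings, which sort incorrectly once a count reaches 10.
x[order(ave(seq_along(x), x, FUN = length), decreasing = TRUE)]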
# [1] "b" "b" "b" "b" "a" "a" "a" "e" "e" "c"
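If equal values are not adjacent in the input, a lookup into table() is another option. This is a sketch that was not part of the answers above; the shuffled input is made up to show that the result does not depend on the original order.

```r
# Same values as in the question, but shuffled.
x <- c("e", "a", "b", "c", "b", "a", "b", "e", "a", "b")

# Frequency of each distinct value.
counts <- table(x)

# Sort by descending frequency; the second key keeps equal values together
# and breaks ties between values that share the same frequency.
sorted <- x[order(-as.integer(counts[x]), x)]
sorted
# [1] "b" "b" "b" "b" "a" "a" "a" "e" "e" "c"
```

Indexing counts by x looks up the frequency of every element by name, which gives order() a numeric key of the same length as x.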
