Order by column using infix operator in R

This is possibly a very simple question, but I couldn't find an answer. I'm trying to apply abs to my matrix and then order it by the first column (descending).
As separate statements it looks like this:
pcaRotaMat <- abs(pcaImportance$rotation)
temp <- pcaRotaMat[order(-pcaRotaMat[,1]),]
However, when I try to do the same with the pipe (infix) operator %>%, I get the following error:
t <- pcaImprtance$rotation %>% abs() %>% order(-[,1],)
Error: unexpected '[' in "t <- pcaImprtance$rotation %>% abs() %>% order(["
Your help will be appreciated.

Option 1: if you are comfortable with something more verbose, wrap the sorting in a small function:
sort_fn = function(x) {
  # sort the rows by the first column, descending
  x[order(-x[, 1]), ]
}
t <- pcaImprtance$rotation %>% abs() %>% sort_fn
Option 2: if you don't want to create a sorting function, index the piped matrix directly:
t <- pcaImprtance$rotation %>% abs %>% .[order(-.[, 1]), ]
Here "." is the placeholder for the piped matrix. I would also not recommend assigning to a variable named t, since t() is the base R function that transposes matrices.

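A minimal, self-contained sketch of the helper-function and placeholder approaches. Since pcaImportance is not shown, it assumes the matrix comes from prcomp() and uses the built-in USArrests data as a stand-in (scale. = TRUE is an arbitrary choice). It also shows order(..., decreasing = TRUE), base R's other way of sorting in descending order, and checks that all three versions agree.

```r
library(magrittr)  # %>% and the "." placeholder

# Stand-in for pcaImportance: a PCA of the built-in USArrests data
pcaImportance <- prcomp(USArrests, scale. = TRUE)

# Option 1: a small helper that sorts rows by column 1, descending
sort_fn <- function(x) {
  x[order(-x[, 1]), ]
}
sorted1 <- pcaImportance$rotation %>% abs() %>% sort_fn()

# Option 2: inline, with "." standing for the piped matrix
sorted2 <- pcaImportance$rotation %>% abs() %>% .[order(-.[, 1]), ]

# Same ordering using decreasing = TRUE instead of negating the column
sorted3 <- pcaImportance$rotation %>% abs() %>%
  .[order(.[, 1], decreasing = TRUE), ]

print(sorted1)
```

Negating the column only works for numeric data; decreasing = TRUE also works for character columns.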

Get all combinations of a character vector

I am trying to write a function to dynamically group_by every combination of a character vector.
This is how I set up my list:
stuff <- c("type", "country", "color")
stuff_ListStr <- do.call("c", lapply(seq_along(stuff), function(i) combn(stuff, i, FUN = list)))
stuff_ListChar <- sapply(stuff_ListStr, paste, collapse = ", ")
stuff_ListSym <- lapply(stuff_ListChar, as.symbol)
Then I threw it into a loop.
b <- list()
for (each in stuff_ListSym) {
a <- answers_wfh %>%
group_by(!!each) %>%
summarize(n=n())
b <- append(b, a)
}
So essentially I want to replicate this
... group_by(type),
... group_by(country),
... group_by(type, country),
... and the rest of the combinations. Then I want to put all the summaries into one list (a list of tibbles/lists).
It's totally failing. This is my error message:
Error: Column `type, country` is unknown.
Not only that, b is not giving me what I want. It is already a list of length 12 when I expected only 2 elements before it failed: one tibble grouped by 'type' and a second grouped by 'country'.
I'm new to R in general but thought tidy eval was really cool and wanted to try. Any tips here?
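I think you have a problem with standard vs non-standard evaluation. Each element of stuff_ListSym is a single symbol such as `type, country`, which dplyr looks up as one column name, and !! unquotes only one expression. To unquote several column names at once, loop over the character vectors in stuff_ListStr, convert each to symbols with rlang::syms and splice them with !!!. Also wrap each result in list(), so that append() adds one tibble instead of splicing in the tibble's columns:
b <- list()
for (each in stuff_ListStr) {
  a <- answers_wfh %>%
    group_by(!!!rlang::syms(each)) %>%
    summarize(n = n())
  b <- append(b, list(a))
}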
I think lapply would be better than for in your situation, since you want to end up with a list.
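A minimal sketch of the lapply version, run on a small made-up tibble because answers_wfh is not shown (its columns and values here are assumptions). Each combination stays a character vector and is spliced with !!!syms(); the for-loop variant with append(b2, list(a)) is included to show that it adds exactly one tibble per combination.

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for answers_wfh (the real data is not shown)
answers_wfh <- tibble(
  type    = c("a", "a", "b", "b"),
  country = c("US", "UK", "US", "US"),
  color   = c("red", "red", "blue", "red")
)

stuff <- c("type", "country", "color")
# Every non-empty combination, kept as character vectors (not pasted)
stuff_ListStr <- do.call("c", lapply(seq_along(stuff),
                                     function(i) combn(stuff, i, FUN = list)))

# One summary tibble per combination; lapply returns the list directly
b <- lapply(stuff_ListStr, function(cols) {
  answers_wfh %>%
    group_by(!!!syms(cols)) %>%
    summarize(n = n(), .groups = "drop")
})

# for-loop variant: list(a) makes append() add the tibble as one element
b2 <- list()
for (cols in stuff_ListStr) {
  a <- answers_wfh %>% count(!!!syms(cols))
  b2 <- append(b2, list(a))
}
```

With three names there are 3 + 3 + 1 = 7 combinations, so both b and b2 hold seven tibbles.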
Since you use variable names as arguments of functions, you might be more comfortable with data.table than dplyr. If you want the equivalent data.table implementation:
library(data.table)
setDT(answers_wfh)
lapply(stuff_ListStr, function(g) answers_wfh[, .(n = .N), by = g])
You can have a look at this blog post I wrote on the subject of SE vs NSE in dplyr and data.table
I think stuff_ListStr is enough to get what you want. You could use group_by_at, which accepts a character vector.
library(dplyr)
library(rlang)
purrr::map(stuff_ListStr, ~answers_wfh %>% group_by_at(.x) %>% summarize(n=n()))
A better option is count, but count does not accept character vectors, so we need some non-standard evaluation: convert the names to symbols with syms() and splice them with !!!.
purrr::map(stuff_ListStr, ~answers_wfh %>% count(!!!syms(.x)))
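In dplyr 1.0.0 and later the scoped verbs such as group_by_at() are superseded in favour of across() (or pick() in 1.1). A minimal sketch, assuming a recent dplyr, of the same map() call with group_by(across(all_of(.x))), which takes each character vector directly without converting it to symbols. The toy answers_wfh below is hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical toy stand-in for answers_wfh
answers_wfh <- tibble(type = c("a", "a", "b"), country = c("US", "UK", "US"))
stuff_ListStr <- list("type", "country", c("type", "country"))

# all_of() selects the columns named in a character vector
res <- map(stuff_ListStr, ~ answers_wfh %>%
  group_by(across(all_of(.x))) %>%
  summarize(n = n(), .groups = "drop"))
```

all_of() raises an error if a name is missing from the data, which catches typos in the column list early.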

Problem with mutate keyword and functions in R

I have a problem using mutate; please see the code block below.
output1 <- mytibble %>%
mutate(newfield = FND(mytibble$ndoc))
output1
where FND is a function that applies a filter to a large (5 GB) tibble:
FND <- function(n){
result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
return(paste(unique(result$somefield),collapse=" "))
}
I want to execute the FND function for each row of output1, but it is executed only once.
Don't use $ inside dplyr pipes; it is very rarely needed there. You can change your FND function to:
library(dplyr)
FND <- function(n){
LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
unique %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble.
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply:
mytibble$newfield <- sapply(mytibble$ndoc, FND)
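A minimal sketch on tiny made-up stand-ins for LARGETIBBLE and mytibble (both hypothetical). Calling FND(ndoc) directly would pass the whole ndoc column as n, so id == n would compare the two vectors element by element (recycling the shorter one) instead of filtering once per row; map_chr() and sapply() call FND once per value.

```r
library(dplyr)
library(purrr)

# Hypothetical small stand-ins for the real 5 GB LARGETIBBLE and mytibble
LARGETIBBLE <- tibble(id = c(1, 1, 2, 3), somefield = c("x", "y", "x", "z"))
mytibble    <- tibble(ndoc = c(1, 2, 3))

FND <- function(n) {
  LARGETIBBLE %>%
    filter(id == n) %>%      # rows for this single id
    pull(somefield) %>%
    unique() %>%
    paste(collapse = " ")    # one string per id
}

# map_chr() calls FND once per ndoc value and returns a character vector
output1 <- mytibble %>% mutate(newfield = map_chr(ndoc, FND))

# sapply() gives the same values in base R
newfield_sapply <- sapply(mytibble$ndoc, FND)
```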
FND(mytibble$ndoc) is base R data frame style. When you use functions such as mutate on a tibble, there is no need to specify the name of the tibble, only that of the column: the pipe %>% passes the tibble to mutate, which looks the column names up in it. Thus your example would be:
output1 <- mytibble %>%
mutate(newfield = FND(ndoc))
FND <- function(n){
result <- LARGETIBBLE %>% filter(id == n)
return(paste(unique(result$somefield),collapse=" "))
}
This should work in theory; however, I do not know whether your function FND will work. Try it, and if it doesn't, please give a practical example with data and describe what you are trying to achieve.

dplyr mutate inside for loop - Issue

I am performing Data Analysis and cleaning in R using tidyverse.
I have a data frame with 23 columns containing the values 'No', 'Steady', 'Up' and 'Down'.
I want to change all the values in these 23 columns to 0 for 'No' or 'Steady' and to 1 otherwise.
I created a vector named keys holding all my column names, and then used a for loop with ifelse statements and mutate.
Please have a look at the code below:
# Column names are kept in the vector keys
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
Then I used the following code to get the desired result:
for (col in keys){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1)) }
I expected this to make the changes I need, but nothing happens (no error message and no desired result).
After researching further, I executed the following code:
for (col in keys){
print(col)}
It prints the elements of keys as character strings, e.g. "metformin".
So I thought this might be the issue, and used the code below to cast the keys to symbols:
keys_new = sym(keys)
Then I ran the same code again:
for (col in keys_new){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1))}
It gives me the following error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
I also tried to write a function to get the desired result, but that didn't work either:
change = function(name){
Dataset = Dataset %>%
mutate(name = ifelse(name %in% c('No','Steady'),0,1),
name = as.factor(name))
return(Dataset)}
for (col in keys){
change(col)}
This didn't do anything either (no error message and no desired result).
When I use keys_new in this code:
for (col in keys_new){
change(col)}
I get the same error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
Any guidance would be appreciated.
If Dataset contains only these columns, there's no need to loop or keep track of column names. You can use mutate_all:
Dataset %>%
mutate_all(~ifelse(. %in% c('No','Steady'), 0, 1))
Another way, thanks to Rui Barradas:
Dataset %>%
mutate_all(~as.integer(!. %in% c('No','Steady')))
There's a simpler way using mutate_at and case_when.
Dataset %>% mutate_at(keys, ~case_when(. %in% c("No", "Steady") ~ 0, TRUE ~ 1))
mutate_at will only mutate the columns listed in keys. case_when then replaces each value according to the first condition it matches: 0 for "No" or "Steady", 1 otherwise.
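A minimal sketch on a made-up Dataset with two of the drug columns plus an id column (all values hypothetical), showing that mutate_at() leaves columns outside keys untouched. The across() version is the dplyr 1.0+ replacement for mutate_at(); it returns integers rather than doubles.

```r
library(dplyr)

# Hypothetical toy data: two drug columns plus an id that must stay unchanged
Dataset <- tibble(
  id = 1:4,
  metformin = c("No", "Steady", "Up", "Down"),
  `glyburide-metformin` = c("Up", "No", "No", "Steady")
)
keys <- c("metformin", "glyburide-metformin")

# mutate_at() only touches the columns named in keys
res_at <- Dataset %>%
  mutate_at(keys, ~ case_when(. %in% c("No", "Steady") ~ 0, TRUE ~ 1))

# Same recoding with across(); !x %in% y parses as !(x %in% y)
res_across <- Dataset %>%
  mutate(across(all_of(keys), ~ as.integer(!.x %in% c("No", "Steady"))))
```

%in% is case-sensitive, so the strings in c("No", "Steady") must match the data exactly.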
This answer shows how to use mutate in a for loop.
I don't have your data, so I made my own: I turned the keys into a tibble with enframe, spread it into columns so that each column holds its row number as a value, and then checked whether each value is greater than 10.
To use a column name stored in a variable, unquote it with !! and use := instead of = inside mutate.
library(dplyr)   # mutate(), %>%
library(tibble)  # enframe()
library(tidyr)   # spread()
df <- enframe(c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
) %>% spread(key = value,value = name)
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
for (col in keys){
  # !!col := names the output column; .data[[col]] reads its current values
  df = df %>%
    mutate(!!col := ifelse(.data[[col]] > 10, 0, 100))
}
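A minimal sketch on a made-up two-column Dataset (values hypothetical) of why the question's loop appeared to do nothing: mutate(col = ...) creates a new column literally named col and tests the string "metformin" rather than the column's values. Unquoting with !!col := and reading the column with .data[[col]] recodes the intended column instead.

```r
library(dplyr)

# Hypothetical toy data using two of the drug columns
Dataset <- tibble(metformin = c("No", "Up"), insulin = c("Steady", "Down"))
col <- "metformin"

# Wrong: adds a column named "col"; col %in% ... checks the string "metformin"
wrong <- Dataset %>% mutate(col = ifelse(col %in% c("No", "Steady"), 0, 1))

# Right: !!col := names the output after the value of col,
# and .data[[col]] reads the column whose name is stored in col
right <- Dataset %>%
  mutate(!!col := ifelse(.data[[col]] %in% c("No", "Steady"), 0, 1))

# The same pattern inside the question's loop
keys <- c("metformin", "insulin")
for (col in keys) {
  Dataset <- Dataset %>%
    mutate(!!col := ifelse(.data[[col]] %in% c("No", "Steady"), 0, 1))
}
```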

Use input of purrr's map function to create a named list as output in R

I am using the map function of the purrr package in R, which returns a list. Now I would like the output to be a named list based on the input. An example is given below.
input <- c("a", "b", "c")
output <- purrr::map(input, function(x) {paste0("test-", x)})
From this I would like to access elements of the list using:
output$a
Or
output$b
We just need to name the list
names(output) <- input
and then extract the elements based on the name
output$a
#[1] "test-a"
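In base R, without purrr, one alternative is sapply() with simplify = FALSE: for a character input it uses the input values as names by default (USE.NAMES = TRUE), and simplify = FALSE keeps the result a list. lapply() followed by setNames() is equivalent. A minimal sketch:

```r
input <- c("a", "b", "c")

# Character input is used for the names; simplify = FALSE keeps a list
output <- sapply(input, function(x) paste0("test-", x), simplify = FALSE)

# Equivalent: lapply() plus setNames()
output2 <- setNames(lapply(input, function(x) paste0("test-", x)), input)

output$a
```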
If this needs to be done using the tidyverse:
library(tidyverse)
output <- map(input, ~paste0('test-', .)) %>%
setNames(input)
Update
As of 2020, the answer from mihagazvoda describes the correct approach: simply call set_names before applying map.
c("a", "b", "c") %>%
purrr::set_names() %>%
purrr::map(~paste0('test-', .))
Outdated answer
The accepted solution works, but it repeats an argument (input), which may cause errors and interrupts the flow when piping with %>%.
An alternative solution is to make a bit more use of the %>% operator:
1:5 %>% { set_names(map(., ~ .x + 3), .) } %>% print # ... or something else
This takes the argument from the pipe but still lacks some elegance. An alternative is a small helper function such as:
map_named = function(x, ...) map(x, ...) %>% set_names(x)
1:5 %>% map_named(~ .x + 1)
This looks prettier and more elegant, and would be my preferred solution.
Finally, we could even overwrite purrr::map so that it returns a named list whenever the argument is a character or integer vector.
map = function(x, ...){
  # name the output after x only for character or integer input
  if (is.integer(x) || is.character(x)) {
    purrr::map(x, ...) %>% set_names(x)
  } else {
    purrr::map(x, ...)
  }
}
1 : 5 %>% map(~ .x + 1)
However, the optimal solution would be for purrr to implement such behaviour out of the box.
The recommended solution:
c("a", "b", "c") %>%
purrr::set_names() %>%
purrr::map(~paste0('test-', .))
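A quick check of the recommended pipeline, assuming purrr is installed: map() keeps the names that set_names() attaches, and so do the typed variants such as map_chr(), which returns a named character vector instead of a list.

```r
library(magrittr)  # %>%
library(purrr)     # set_names(), map(), map_chr()

output <- c("a", "b", "c") %>%
  set_names() %>%                 # names each element after itself
  map(~ paste0("test-", .x))      # a named list

output$a

# map_chr() keeps the names too, but returns a character vector
output_chr <- c("a", "b", "c") %>%
  set_names() %>%
  map_chr(~ paste0("test-", .x))
```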

dplyr::filter used with a function on string representation of factor

I have a data frame with some 20 columns and some 10^7 rows. One of the columns is an id column that is a factor. I want to filter the rows by properties of the string representation of the factor's levels. The code below achieves this, but seems rather inelegant to me. In particular, it seems to me that creating a vector of the relevant ids should not be necessary.
Any suggestions for streamlining this?
library(dplyr)
library(tidyr)
library(gdata)
dat <- data.frame(id=factor(c("xxx-nld", "xxx-jap", "yyy-aus", "zzz-ita")))
europ.id <- function(id) {
ctry.code <- substring(id, nchar(id)-2)
ctry.code %in% c("nld", "ita")
}
ids <- levels(dat$id)
europ.ids <- subset(ids, europ.id(ids))
datx <- dat %>% filter(id %in% europ.ids) %>% drop.levels
Docendo Discimus gave the right answer in the comments. To explain it, first look at the error I kept getting in my various attempts:
> dat %>% filter(europ.id(id))
Error in nchar(id) : 'nchar()' requires a character vector
Calls: %>% ... filter_impl -> .Call -> europ.id -> substring -> nchar
Then note that his solution works because grepl applies as.character to its argument if needed (from the manual page: "a character vector where matches are sought, or an object which can be coerced by as.character to a character vector"). This implicit application of as.character also happens if you use %in%. Since this solution is also perfectly performant, we can do the following:
dat %>% filter(europ.id(as.character(id))) %>% droplevels
Or, to make it read a bit nicer, update the function to:
europ.id <- function(id) {
ids <- as.character(id)
ctry.code <- substring(ids, nchar(ids)-2)
ctry.code %in% c("nld", "ita")
}
and use
dat %>% filter(europ.id(id)) %>% droplevels
which reads exactly like what I was looking for.
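A self-contained version of the final approach, using the toy dat from the question. The grepl() line is a hedged reconstruction of a regex-based filter in the spirit of the comment, not its exact code; it works on the factor directly because grepl() converts non-character input with as.character().

```r
library(dplyr)

dat <- data.frame(id = factor(c("xxx-nld", "xxx-jap", "yyy-aus", "zzz-ita")))

# nchar() errors on a factor, so convert to character first
europ.id <- function(id) {
  ids <- as.character(id)
  ctry.code <- substring(ids, nchar(ids) - 2)
  ctry.code %in% c("nld", "ita")
}

datx <- dat %>% filter(europ.id(id)) %>% droplevels()

# Regex alternative: match ids ending in "-nld" or "-ita"
datx2 <- dat %>% filter(grepl("-(nld|ita)$", id)) %>% droplevels()
```

droplevels() removes the now-unused levels, so datx$id keeps only "xxx-nld" and "zzz-ita".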
