Looping through column names as objects - R

I'm trying to loop through the column names of a data frame.
I have a dataset, testset, which has about 100 columns. To simplify, I will name these columns alphabetically: a, b, c, and so on.
I made a function to recode the values in testset into other values:
mutatefun<-function(i){
i1<-rlang::quo_name(enquo(i))
output<-testset%>%mutate(!! i1:=case_when({{i}} %in% c(1)~0,
{{i}} %in% c(2)~50,
{{i}} %in% c(3)~100,
TRUE~NA_real_))
return(output)
}
When I ran mutatefun(a), it worked. Now I want to apply this function to multiple columns in testset.
With the help of the post "R: looping through column names", I created varlist and ran this code:
varlist<-c("a","c","e","k")
for(i in varlist){
output<-mutatefun({{i}})}
However, it didn't return the expected result. Instead, it returned column a as NA. I think it reads i as just a character string rather than as a column of testset.
Would you let me know how to solve this problem?

Try using purrr::map_dfc with transmute, converting each name to a symbol with sym() and unquoting it with !!:
library(dplyr)
purrr::map_dfc(varlist, ~{
var <- sym(.x)
transmute(testset, !!.x := case_when(!!var == 1~0,
!!var == 2~50,
!!var == 3~100,
TRUE~ NA_real_))
})
If you also want to keep the columns that are not in varlist (they will be placed after the recoded ones), you can add
%>% bind_cols(testset %>% select(-all_of(varlist)))
at the end of the chain.
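For comparison, here is a minimal sketch of two other ways to do the same recoding, assuming dplyr 1.0 or later and a small hypothetical stand-in for testset. across() with all_of() takes the character vector varlist directly. In the loop version, .data[[col]] looks up a column from its name as a string, and each iteration updates the same data frame; the loop in the question started again from testset every time, so only the last column would have been kept. The helper name recode_scale is made up for this example.

```r
library(dplyr)

# Hypothetical stand-in for testset (the real one has about 100 columns)
testset <- tibble(
  a = c(1, 2, 3, 4),
  b = c(3, 2, 1, 1),
  c = c(2, 2, 3, 1),
  e = c(1, 1, 1, 3),
  k = c(3, 3, 2, 9)
)
varlist <- c("a", "c", "e", "k")

# Recode one vector: 1 -> 0, 2 -> 50, 3 -> 100, anything else -> NA
recode_scale <- function(x) {
  case_when(x == 1 ~ 0,
            x == 2 ~ 50,
            x == 3 ~ 100,
            TRUE ~ NA_real_)
}

# Option 1: across() + all_of() select columns from a character vector
out_across <- testset %>% mutate(across(all_of(varlist), recode_scale))

# Option 2: a loop that keeps updating the same data frame;
# .data[[col]] reads the column named by the string col,
# and !!col := writes the result back under that same name
out_loop <- testset
for (col in varlist) {
  out_loop <- out_loop %>% mutate(!!col := recode_scale(.data[[col]]))
}
```

Columns not listed in varlist (here b) are left untouched by both versions.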

Related

Four arguments passed to 'for' which requires three arguments for a 'for' loop

I am trying to add a new column, currency, to the data frame myfile.
The contents of that column are conditional: if the year column fulfills a condition, the new column gets one value, otherwise another value.
When I tried if/else without the loop, it complained that the condition has length > 1, so I guessed that if/else can't work on a vector with multiple elements and that I could use a for loop instead. But then this error showed:
myfile$currency <- myfile %>% for (i in year) {if(year>2000){print("Latest")}else{"Oldest"}}
Error in for (. in i) year : 4 arguments passed to 'for' which requires 3
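The error comes from the pipe: %>% passes myfile as an extra first argument to for, which only takes three. Since ifelse() is vectorised, you don't need a loop at all; use it inside mutate() (see the dplyr documentation):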
library(dplyr)
myfile <- myfile %>%
mutate(
currency = ifelse(year > 2000, "latest", "oldest")
)
If you have more conditions, see case_when.
Or you can do something like this:
myfile$currency[myfile$year > 2000] <- "latest"
myfile$currency[myfile$year <= 2000] <- "oldest"
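Both answers work because comparisons such as year > 2000 are vectorised: they return one TRUE/FALSE per row, so ifelse() picks a label for every row at once. A minimal sketch on a hypothetical myfile, also showing case_when() for more than two categories (the 2010 cut-off and the "recent" label are made up for illustration):

```r
library(dplyr)

# Hypothetical stand-in for myfile
myfile <- data.frame(year = c(1995, 2000, 2005, 2018))

# year > 2000 is a logical vector, so ifelse() picks a label per row
myfile <- myfile %>%
  mutate(currency = ifelse(year > 2000, "latest", "oldest"))

# case_when() checks conditions in order; the first TRUE one wins
# (the 2010 cut-off and the "recent" label are illustrative only)
myfile <- myfile %>%
  mutate(era = case_when(year > 2010 ~ "latest",
                         year > 2000 ~ "recent",
                         TRUE ~ "oldest"))
```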

Get all combinations of a character vector

I am trying to write a function to dynamically group_by every combination of a character vector.
This is how I set up my list:
stuff <- c("type", "country", "color")
stuff_ListStr <- do.call("c", lapply(seq_along(stuff), function(i) combn(stuff, i, FUN = list)))
stuff_ListChar <- sapply(stuff_ListStr, paste, collapse = ", ")
stuff_ListSym <- lapply(stuff_ListChar, as.symbol)
Then I threw it into a loop.
b <- list()
for (each in stuff_ListSym) {
a <- answers_wfh %>%
group_by(!!each) %>%
summarize(n=n())
b <- append(b, a)
}
So essentially I want to replicate this
... group_by(type),
... group_by(country),
... group_by(type, country),
... and the rest of the combinations. Then I want to put all the summaries into one list (a list of tibbles/lists).
It's totally failing. This is my error message:
Error: Column `type, country` is unknown.
Not only that, b is not what I want: it's already a list of length 12 when I expected only 2 elements before it failed, one tibble grouped by 'type' and a second grouped by 'country'.
I'm new to R in general but thought tidy eval was really cool and wanted to try. Any tips here?
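I think the problem is in how the grouping variables are built. Pasting the names together and calling as.symbol() gives a single symbol named `type, country`, which is not a column, so !! alone cannot fix it. Keep each combination as a character vector instead (stuff_ListStr) and use !!! with rlang::syms() to unquote several variables at once. Also wrap each result in list() when appending: otherwise append() adds the tibble's columns one at a time, which is why b is longer than you expected.
b <- list()
for (each in stuff_ListStr) {
  a <- answers_wfh %>%
    group_by(!!!rlang::syms(each)) %>%
    summarize(n = n())
  b <- append(b, list(a))
}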
I think lapply would suit your situation better than for, since you want to end up with a list.
Since you use variable names as arguments of functions, you might be more comfortable with data.table than dplyr. If you want the equivalent data.table implementation:
library(data.table)
setDT(answers_wfh)
lapply(stuff_ListStr, function(g) answers_wfh[, .(n = .N), by = g])
You can have a look at this blog post I wrote on the subject of SE vs NSE in dplyr and data.table
I think stuff_ListStr is enough to get what you want. You could use group_by_at, which accepts a character vector.
library(dplyr)
library(rlang)
purrr::map(stuff_ListStr, ~answers_wfh %>% group_by_at(.x) %>% summarize(n=n()))
A better option is count(), but count() does not accept character vectors, so some non-standard evaluation is needed: convert the names with syms() and splice them in with !!!.
purrr::map(stuff_ListStr, ~answers_wfh %>% count(!!!syms(.x)))
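Putting the pieces together, here is a minimal sketch assuming a small hypothetical answers_wfh. Each combination stays a character vector (combn(..., simplify = FALSE) instead of pasting names into "type, country"), !!!rlang::syms() splices it into count(), and purrr::map() collects one tibble per combination. Naming the list elements is optional.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for answers_wfh
answers_wfh <- tibble(
  type    = c("a", "a", "b", "b", "b"),
  country = c("US", "UK", "US", "US", "UK"),
  color   = c("red", "red", "blue", "red", "blue")
)
stuff <- c("type", "country", "color")

# All non-empty combinations, each kept as a separate character vector
combos <- unlist(
  lapply(seq_along(stuff), function(i) combn(stuff, i, simplify = FALSE)),
  recursive = FALSE
)

# One count table per combination; syms() + !!! splice the names
# into count() as separate grouping variables
counts <- map(combos, function(g) answers_wfh %>% count(!!!rlang::syms(g)))
names(counts) <- map_chr(combos, paste, collapse = "_")
```

With three variables this gives 2^3 - 1 = 7 tables, e.g. counts$type and counts$type_country.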

R - lapply - getting data frames back out of lists?

I have the same problem as the asker of "returning from list to data.frame after lapply".
Whilst the answers there solved his specific problem, no one actually answered his original question about how to get data frames out of a list.
I have a list of data frames:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
And I want to filter, replace, etc. on all of them.
So my function is:
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
And I use lapply to run the function on them all like this:
a = lapply(dfPreList, DoThis)
As the other post stated, these data frames are now stuck in this list (a), and I need a for loop to get them out, which just cannot be the correct way of doing it.
This is my current working way of applying the function to the dataframes and then getting them out:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
dfPreListstr= list('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
a = lapply(dfPreList, DoThis)
for( i in seq_along(dfPreList)){
assign(dfPreListstr[[i]], as.data.frame(a[i]))
}
Is there a way of doing this without having to rely on for loops and string names of the dataframes? I.e. a one-liner with the lapply?
Many thanks for your help
You can give the list elements names and then use list2env(), which copies each element into the global environment as a separate object named after it.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
list2env(a, .GlobalEnv)
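A minimal sketch of the list2env() route on hypothetical toy data. mget() builds a list that is already named after the objects, so the names don't have to be typed a second time. adorn_totals() is left out so the example needs only dplyr, and envir = environment() writes back into the current environment, which is the global environment when this runs at top level.

```r
library(dplyr)

# Hypothetical toy versions of two of the data frames
yearlyFunding <- data.frame(year = 2013:2019, amount = c(1, NA, 3, 4, NA, 6, 7))
yearlyPubs    <- data.frame(year = 2014:2018, pubs = c(NA, 2, 3, NA, 5))

# Same steps as DoThis in the question, minus janitor::adorn_totals()
DoThis <- function(x) {
  x %>%
    filter(year >= 2015 & year <= 2018) %>%
    replace(is.na(.), 0)
}

# mget() returns a named list: list(yearlyFunding = ..., yearlyPubs = ...)
dfs <- lapply(mget(c("yearlyFunding", "yearlyPubs")), DoThis)

# Overwrite each original object with its processed version
invisible(list2env(dfs, envir = environment()))
```

Keeping the results in the named list dfs and using dfs$yearlyPubs is often simpler than creating separate objects at all.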
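Another way would be to unlist each element of the list and convert the content back into a data frame.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
# Use the dimensions of the processed data frames: filtering and adorn_totals() change the row count
yearlyFunding <- data.frame(matrix(unlist(a$yearlyFunding), nrow = nrow(a$yearlyFunding), ncol = ncol(a$yearlyFunding)))
yearlyPubs <- data.frame(matrix(unlist(a$yearlyPubs), nrow = nrow(a$yearlyPubs), ncol = ncol(a$yearlyPubs)))
yearlyAuthors <- data.frame(matrix(unlist(a$yearlyAuthors), nrow = nrow(a$yearlyAuthors), ncol = ncol(a$yearlyAuthors)))
Since unlist() returns a vector, we first build a matrix and then convert it to a data frame. Be aware that this coerces every column to a single type (character, if any column contains text, such as the "Total" label added by adorn_totals()) and drops the column names. Because each element of a is already a data frame, yearlyFunding <- a$yearlyFunding extracts it without either problem.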

dplyr mutate inside for loop - Issue

I am performing data analysis and cleaning in R using the tidyverse.
I have a data frame with 23 columns containing the values 'NO', 'STEADY', 'UP' and 'down'.
I want to change all the values in these 23 columns to 0 for 'NO' or 'STEADY' and to 1 otherwise.
What I did: I created a vector named keys holding all my column names, and then used a for loop with ifelse statements and mutate.
Please have a look at the code below:
# Column names are kept in the list by name keys
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
After that, I used the following code to get the desired result:
for (col in keys){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1)) }
I expected it to make the changes I need, but nothing happens (no error message and no desired result).
After researching further, I ran the following code:
for (col in keys){
print(col)}
It prints the elements of keys as character strings, like "metformin".
So I thought this might be the issue, and used the code below to cast keys to symbols:
keys_new = sym(keys)
Then I ran the same loop again:
for (col in keys_new){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1))}
It gives me the following error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
I also tried to write a function to get the desired result, but that didn't work either:
change = function(name){
Dataset = Dataset %>%
mutate(name = ifelse(name %in% c('No','Steady'),0,1),
name = as.factor(name))
return(Dataset)}
for (col in keys){
change(col)}
This didn't do anything either (no error message and no desired result).
When I used keys_new in this code:
for (col in keys_new){
change(col)}
I got the same error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
Please guide me.
There's no need to loop or keep track of column names if every column of Dataset should be recoded. You can use mutate_all:
Dataset %>%
mutate_all(~ifelse(. %in% c('No','Steady'), 0, 1))
Another way, thanks to Rui Barradas:
Dataset %>%
mutate_all(~as.integer(!. %in% c('No','Steady')))
There's a simpler way using mutate_at and case_when.
Dataset %>% mutate_at(keys, ~case_when(. %in% c("NO", "STEADY") ~ 0, TRUE ~ 1))
mutate_at() mutates only the columns named in keys; case_when() then replaces each value according to its condition (here 0 for "NO" or "STEADY", 1 for everything else).
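In dplyr 1.0 and later, mutate_at() and mutate_all() are superseded by across(). A minimal sketch on a hypothetical two-column miniature of Dataset (the helper name to_flag is made up). Because the question spells the values both as 'NO'/'STEADY' and 'No'/'Steady', it compares upper-cased values so either spelling is caught; that is an assumption about the real data.

```r
library(dplyr)

# Hypothetical miniature of Dataset: two drug columns plus one other column
Dataset <- tibble(
  id = 1:4,
  metformin = c("No", "Steady", "Up", "Down"),
  `glyburide-metformin` = c("Down", "No", "Steady", "Up")
)
keys <- c("metformin", "glyburide-metformin")

# 0 for No/Steady, 1 otherwise; toupper() makes the match
# insensitive to 'No' vs 'NO' spelling
to_flag <- function(x) as.integer(!toupper(x) %in% c("NO", "STEADY"))

# across() + all_of() is the dplyr >= 1.0 replacement for mutate_at(keys, ...)
Dataset <- Dataset %>% mutate(across(all_of(keys), to_flag))
```

Only the columns in keys change; id is left as it was.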
This answer shows how to make mutate work inside a for loop.
I don't have your data, so I made my own: I turned keys into a tibble with enframe(), spread it into one column per name with the row number as each column's value, and then checked whether each value is greater than 10.
To use a column name stored in a variable as the target of mutate(), unquote it with !! and use := instead of =.
library(dplyr)
library(tibble)
library(tidyr)
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
# Toy data: one row, one column per name in keys, holding its position in keys
df <- enframe(keys) %>% spread(key = value, value = name)
for (col in keys){
  # !!col := names the output column; .data[[col]] reads the column whose name is in col
  df = df %>%
    mutate(!!col := ifelse(.data[[col]] > 10, 0, 100))
}

How to build subset query using a loop in R?

I'm trying to subset a big table across several columns, keeping all the rows where State_2009, State_2010, State_2011, etc. do not equal the value "Unknown".
My instinct was to do something like this (coming from a JS background), where I either build the query in a loop or continually subset the data in a loop, referencing the year as a variable.
mysubset <- data
for(i in 2009:2016){
mysubset <- subset(mysubset, paste("State_",i," != Unknown",sep=""))
}
But this doesn't work, at least partly because paste returns a string rather than a logical condition, which gives me the error 'subset' must be logical.
Is there a better way to do this?
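Using dplyr's filter_() function, which accepts the condition as a string, should get you the correct output. (filter_() is deprecated in current dplyr versions; there, filter(.data[[paste0("State_", i)]] != "Unknown") does the same job inside the loop.)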
library(dplyr)
mysubset <- data
for(i in 2009:2016)
{
mysubset <- mysubset %>%
filter_(paste("State_",i," != \"Unknown\"", sep = ""))
}
To add to Matt's answer, you could also do it like this:
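cols <- paste0("State_", 2009:2016)
# Row numbers of every "Unknown" cell in the State_ columns
inds <- which(mysubset[, cols] == "Unknown", arr.ind = TRUE)[, 1]
# Drop those rows; guard the empty case, because x[-integer(0), ] would drop every row
if (length(inds) > 0) mysubset <- mysubset[-unique(inds), ]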
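The same filter can be written without a loop or negative indexing. A minimal sketch on a hypothetical miniature of the table (only three year columns here): data[, cols] == "Unknown" gives a logical matrix, rowSums() counts the "Unknown" cells per row, and rows with a count of zero are kept, which also behaves correctly when no row contains "Unknown". The dplyr version assumes dplyr 1.0.4 or later for if_all().

```r
library(dplyr)

# Hypothetical miniature of the big table (only three year columns here)
data <- data.frame(
  id = 1:4,
  State_2009 = c("CA", "Unknown", "NY", "TX"),
  State_2010 = c("CA", "NY", "Unknown", "TX"),
  State_2011 = c("CA", "NY", "NY", "TX"),
  stringsAsFactors = FALSE
)
cols <- paste0("State_", 2009:2011)

# Base R: logical matrix of "Unknown" cells, counted per row
# (na.rm = TRUE means NA cells are not counted as "Unknown")
unknown_per_row <- rowSums(data[, cols] == "Unknown", na.rm = TRUE)
base_subset <- data[unknown_per_row == 0, ]

# dplyr >= 1.0.4: keep rows where every selected column passes the test
dplyr_subset <- data %>% filter(if_all(all_of(cols), ~ .x != "Unknown"))
```

For the real table, cols would be paste0("State_", 2009:2016).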

Resources