Context: I have a large dataset (CoreData) with an accompanying datafile (CoreValues) that contains the code and values for each variable within the dataset.
Problem: I want to use a loop to assign each variable within the dataset (CoreData) the correct value labels (from the CoreValues data).
What I've tried so far:
I have created a character vector that identifies which variables within my main data (CoreData) have values that need to be added:
Core_VarwithValueLabels<- unique(CoreValues$Abbreviation)
I have tried a for loop using the vector created , to create vectors for both the label and level arguments that feed into the factor() function.
for (i in Core_VarwithValueLabels){
assign(paste0(i, 'Labels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Description) %>%
unique() %>%
unlist()
)
assign(paste0(i, 'Levels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Code) %>%
unique() %>%
unlist()
)
CoreData[i] <- factor(CoreData[i], levels=paste0(i, 'Levels'), labels = paste0(i, 'Labels'))
}
This creates the correct label and level vectors, however, they are not being picked up properly within the factor function.
Question: Can you help me identify how to get my factor function to work within this loop or if there is a more appropriate method?
Sample data:
CoreValues:
example data from CoreValues
CoreData:
example data from CoreData
UPDATE: RESOLVED
I have now resolved this by using the get() function within my factor() function as it uses the strings I've created with paste0() and find the vector of that name.
for (i in Core_VarwithValueLabels){
assign(paste0(i, 'Labels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Description) %>%
unique() %>%
unlist()
)
assign(paste0(i, 'Levels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Code) %>%
unique() %>%
unlist()
)
CoreData[i] <- factor(CoreData[i], levels=get(paste0(i, 'Levels')), labels = get(paste0(i, 'Labels')))
}
(I am new in R)
Trying to change variables data type of df members to factors based on condition if their names available in a list to_factors_list.
I have tried some code using mutate(across()) but it's giving errors.
Data prep.:
library(tidyverse)
# tidytuesday himalayan data
members <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
# creating list of names
to_factors_list <- members %>%
map_df(~(data.frame(n_distinct = n_distinct(.x))),
.id = "var_name") %>%
filter(n_distinct < 15) %>%
select(var_name) %>% pull()
to_factors_list
############### output ###############
'season''sex''hired''success''solo''oxygen_used''died''death_cause''injured''injury_type'
Getting error in below code attempts:
members %>%
mutate(across(~.x %in% to_factors_list, factor))
members %>%
mutate_if( ~.x %in% to_factors_list, factor)
I am not sure what's wrong and how can I make this work ?
In base R, this can be done with lapply
members[to_factors_list] <- lapply(members[to_factors_list], factor)
The correct syntax is:
members %>% mutate(across(to_factors_list, factor))
Or if you prefer an older-version dplyr syntax:
members %>% mutate_at(vars(to_factors_list), factor)
I got a problem with the use of MUTATE, please check the next code block.
output1 <- mytibble %>%
mutate(newfield = FND(mytibble$ndoc))
output1
Where FND function is a FILTER applied to a large file (5GB):
FND <- function(n){
result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
return(paste(unique(result$somefield),collapse=" "))
}
I want to execute FND function for each row of output1 tibble, but it just executes one time.
Never use $ in dplyr pipes, very rarely they are used. You can change your FND function to :
library(dplyr)
FND <- function(n){
LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
unique %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble.
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply :
mytibble$newfield <- sapply(mytibble$ndoc, FND)
FND(mytibble$ndoc) is more suitable for data frames. When you use functions such as mutate on a tibble, there is no need to specify the name of the tibble, only that of the column. The symbols %>% are already making sure that only data from the tibble is used. Thus your example would be:
output1 <- mytibble %>%
mutate(newfield = FND(ndoc))
FND <- function(n){
result <- LARGETIBBLE %>% filter(id == n)
return(paste(unique(result$somefield),collapse=" "))
}
This would be theoretically, however I do not know if your function FND will work, maybe try it and if not, give some practical example with data and what you are trying to achieve.
I am performing Data Analysis and cleaning in R using tidyverse.
I have a Data Frame with 23 columns containing values 'NO','STEADY','UP' and 'down'.
I want to change all the values in these 23 columns to 0 in case of 'NO','STEADY' and 1 in other case.
What i did is, i created a list by name keys in which i have kept all my columns, After that i am using for loop, ifelse statements and mutate.
Please have a look at the code below
# Column names are kept in the list by name keys
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
After that, i used following code to get the desired result :
for (col in keys){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1)) }
I was expecting that, it will do the changes that i require, but nothing happens after this. (NO ERROR MESSAGE AND NO DESIRED RESULT)
After that, i researched further and executed following code
for (col in keys){
print(col)}
It gives me elements of list as characters like - "metformin"
So, i thought - may be this is the issue. Hence, i used the below code to caste the keys as symbols :
keys_new = sym(keys)
After that i again ran the same code:
for (col in keys_new){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1))}
It gives me following Error -
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
After all this. I also tried to create a function to get the desired results, but that too didn't worked:
change = function(name){
Dataset = Dataset %>%
mutate(name = ifelse(name %in% c('No','Steady'),0,1),
name = as.factor(name))
return(Dataset)}
for (col in keys){
change(col)}
This didn't perform any action. (NO ERROR MESSAGE AND NO DESIRED RESULT)
When keys_new is placed in this code:
for (col in keys_new){
change(col)}
I got the same Error :
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
PLEASE GUIDE
There's no need to loop or keep track of column names. You can use mutate_all -
Dataset %>%
mutate_all(~ifelse(. %in% c('No','Steady'), 0, 1))
Another way, thanks to Rui Barradas -
Dataset %>%
mutate_all(~as.integer(!. %in% c('No','Steady')))
There's a simpler way using mutate_at and case_when.
Dataset %>% mutate_at(keys, ~case_when(. %in% c("NO", "STEADY") ~ 0, TRUE ~ 1))
mutate_at will only mutate the columns specified in the keys variable. case_when then lets you replace one value by another by some condition.
This answer for using mutate through forloop.
I don't have your data, so i tried to make my own data, i changed the keys into a tibble using enframe then spread it into columns and used the row number as a value for each column, then check if the value is higher than 10 or not.
To use the column name in mutate you have to use !! and := in the mutate function
df <- enframe(c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
) %>% spread(key = value,value = name)
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
for (col in keys){
df = df %>%
mutate(!!as.character(col) := ifelse( df[col] > 10,0,100) )
}
I'm new to R and have a problem
I am trying to reformat some data, and in the process I would like to rename the columns of the new data set.
here is how I have tried to do this:
first the .csv file is read in, lets say case1_case2.csv
then the name of the .csv file is broken up into two parts
each part is assigned to a vector
so it ends up being like this:
xName=case1
yName=case2
After I have put my data into new columns I would like to rename each column to be case1 and case2
to do this I tried using the rename function in R but instead of renaming to case1 and case2 the columns get renamed to xName and yName.
here is my code:
for ( n in 1:length(dirNames) ){
inFile <- read.csv(dirNames[n], header=TRUE, fileEncoding="UTF-8-BOM")
xName <- sub("_.*","",dirNames[n])
yName <- sub(".*[_]([^.]+)[.].*", "\\1", dirNames[n])
xValues <- inFile %>% select(which(str_detect(names(inFile), xName))) %>% stack() %>% rename( xName = values ) %>% subset( select = xName)
yValues <- inFile %>% select(which(!str_detect(names(inFile), xName))) %>% stack() %>% rename(yName = values, Organisms=ind)
finalForm <- cbind(xValues, yValues) %>% filter(complete.cases(.))
}
how can I make sure that the variables xName and yName are expanded inside of the rename() function
thanks.
You didn't provide a reproducible example, so I'll just demonstrate the idea in general. The rename function is part of the dplyr package.
You need to "unquote" the variable that contains the string you want to use as the new column name. The unquote operator is !! and you'll need to use the special := assignment operator to make unquoting on the left hand side allowed.
library(tidyverse)
df <- data_frame(x = 1:3)
y <- "Foo"
df %>% rename(y=x) # Not what you want - need to unquote y
df %>% rename(!!y = x) # Gives error - need to use :=
df %>% rename(!!y := x) # Correct