How to use a List Item in fct_recode - r

I want to rename factor levels with fct_recode by using items I created beforehand.
I first create some labels and save them into a list:
#Creating the Labels:
LabelsWithN <- c(
sprintf("Man(%s)", FreqGender["Man","Freq"]),
sprintf("Woman(%s)", FreqGender["Woman","Freq"]),
sprintf("Non-Binary(%s)", FreqGender["Non-Binary","Freq"]),
sprintf("Other(%s)", FreqGender["Other","Freq"]),
sprintf("Prefer Not To Disclose(%s)", FreqGender["Prefer not to disclose","Freq"])
)
This creates a character vector with items like "Man(105)", "Woman(51)" etc.
Now I want to relabel the factor levels in the original DataSet (i.e. "Man" --> "Man(105)") in order to label a graph. I want to use either the list item (i.e., LabelsWithN[1]) or directly the function creating the string (i.e., sprintf("Man(%s)", FreqGender["Man","Freq"])).
I then try to enter either the list item or the function into fct_recode:
#Using the Labels:
DataSet %>%
mutate(`Gender. What_is_your_ge.._` = fct_recode(`Gender. What_is_your_ge.._`, LabelsWithN[1] = "Man", sprintf("Woman(%s)", FreqGender["Woman","Freq"]) = "Woman")) %>%
#This is just the code for the graph:
ggplot(aes(x = `Gender. What_is_your_ge.._` , y = `Age. How_old_are_you?`, main = "Age Distribution By Gender")) +
geom_boxplot() +
xlab("Gender (n)") +
ylab("Age")
However, this yields:
"Unexpected '=' in:
"DataSet %>%
mutate(`Gender. What_is_your_ge.._` = fct_recode(`Gender. What_is_your_ge.._`, LabelsWithN[1] ="
It doesn't matter if I use the function or the list item.
The vector is a factor and the list is filled with characters. If I manipulate the code to rename the factor "man" to "cat" ("cat" = "Man") the code works fine.
How can I address the list item/enter the function into fct_recode so that it works?
Also, can somebody explain to me what the problem here is? If I print out LabelsWithN[1] I get the correct string printed out.
Thank you and Bw,
Jan

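Perhaps this helps: store the new = old pairs in a named character vector, then splice that vector into fct_recode with !!!.
library(forcats)
x <- factor(c("apple", "bear", "banana", "dear")) # what you want to recode
levels <- c(fruit = "apple", fruit = "banana") # how you want to recode it (new name = old level)
x <- fct_recode(x, !!!levels) # recoding it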
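As for why the original call fails: in an R function call, the name on the left of `=` has to be written literally (a symbol or a quoted string), so an expression such as LabelsWithN[1] or sprintf(...) in that position is a syntax error before fct_recode ever runs. Below is a minimal sketch of the same splicing idea applied to computed labels. The `gender` vector and its counts are made-up stand-ins for the DataSet column and FreqGender, which are not shown in the question.

```r
library(forcats)

# Hypothetical stand-in for the gender column of DataSet
gender <- factor(c("Man", "Woman", "Man", "Other"))

# Count each level, then build labels of the form "Man(2)"
counts <- table(gender)
new_labels <- sprintf("%s(%s)", names(counts), as.vector(counts))

# fct_recode expects new = old pairs: the names are the new labels,
# the values are the existing levels
recode_map <- setNames(names(counts), new_labels)

# Splice the named vector into the call
gender_labelled <- fct_recode(gender, !!!recode_map)
levels(gender_labelled)
```

Because the names are attached with setNames() rather than typed into the call, they can come from any expression, including a pre-built vector like LabelsWithN.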

Related

Change column values depending on other column in R

I have a problem with my data frame.
I have a dataframe with 2 columns, 'word' and 'word_categories'. I created different variables which include the different words, e.g. 'noun' which includes all the nouns of the word column. I now want to change the labels in the word_categories column to the corresponding variable. So if the word in the word column is included in the object 'noun', I want the word_categories column to display 'noun'.
df <- read.csv("palm.csv")
noun <- c("house", ...)
adj <- c("hard", ...)
...
The data frame looks like the following. It includes other columns but they are fine.
word word_categories
house
car
hard
...
I now want to check whether each word is in any of the created objects and, if so, put the corresponding label in the word_categories column. So for 'house' the column should show noun, for 'hard' it should show adjective. If the word is in none of the objects, it should show nothing or 'NA'.
I tried it with the following:
palm$word_categories <- ifelse(palm$word == noun, "noun",
ifelse(palm$word == adj, "adjective", ""))
This, however, doesn't work at all and I have 7 Objects in total so the statement becomes ridiculously long. How do I do it properly?
If the dataframe is called palm (you first call it df but later you use palm) and noun and adj are vectors as you define above, I would do:
library(dplyr)
palm <- palm %>%
mutate(word_categories = case_when(word %in% noun ~ "noun",
word %in% adj ~ "adjective",
TRUE ~ NA_character_))
One way would be to create a named vector of your noun/adjective dictionaries to select each element. The name would be the word and the corresponding data would be noun, adjective etc. You didn't really supply any data so I made some up.
df <- data.frame(
stringsAsFactors = FALSE,
word = c("dog", "short", "bird", "cat", "short", "man")
)
nounName <- c('dog', 'cat', 'bird')
adjName <- c('quick', 'brown', 'short')
noun <- rep('noun', length(nounName))
adj <- rep('adjective', length(adjName))
names(noun) <- nounName
names(adj) <- adjName
partsofspeech <- c(noun, adj)
df$word_categories <- partsofspeech[df$word]
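With seven dictionaries, building each named vector by hand gets repetitive. The sketch below shows one possible way to build the same kind of lookup vector from a single named list using base R's stack(). The `categories` list and its contents are invented for illustration.

```r
# Hypothetical dictionaries; the list names become the category labels
categories <- list(
  noun = c("dog", "cat", "bird"),
  adjective = c("quick", "brown", "short")
)

# stack() turns the list into a two-column data frame (values, ind)
long <- stack(categories)

# Named vector: names are the words, values are the categories
lookup <- setNames(as.character(long$ind), long$values)

df <- data.frame(
  word = c("dog", "short", "bird", "cat", "short", "man"),
  stringsAsFactors = FALSE
)

# Words that appear in no dictionary come back as NA
df$word_categories <- unname(lookup[df$word])
df
```

Adding an eighth category then only means adding one more element to the list.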

How to run a function (many times) that changes variable (tibble) in global env

I'm a newbie in R, so please have some patience and... tips are most welcome.
My goal is to create a tibble that holds a "Full Name" (of a person, who may have 2 to 4 names) and his/her gender. I must start from a tibble that contains typical male and female names.
Below I present a minimum working example.
My problem: I can call get_name() multiple times (in a for loop with 10,000 iterations!) and get the right answer, but I am looking for a more 'elegant' way of doing it. replicate() unfortunately returns a vector, which makes it unusable here.
My doubts: I know the code has a few issues, such as the if statement being evaluated on every call (which is redundant), but I can't find another way to do it. Any suggestions?
Any other suggestions about code structure are also welcome.
Thank you very much in advance for your help.
library(tidyverse)
# Dummy name list
unit_names <- tribble(
~Women, ~Man,
"fem1", "male1",
"fem2", "male2",
"fem3", "male3",
"fem4", "male4",
"fem5", "male5",
"fem6", NA,
"fem7", NA
)
set.seed(12345) # seed for test
# Create a tibble with the full names
full_name <- tibble("Full Name" = character(), "Gender" = character() )
get_name <- function() {
# Get the Number of 'Unit-names' to compose a 'Full-name'
nbr_names <- sample(2:4, 1, replace = TRUE)
# Randomize the Gender
gender <- sample(c("Women", "Man"), 1, replace = TRUE)
if (gender == "Women") {
lim_names <- sum( !is.na(unit_names$"Women"))
} else {
lim_names <- sum( !is.na(unit_names$"Man"))
}
# Sample the Fem/Man List names (may have duplicate)
sample(unlist(unit_names[1:lim_names, gender]), nbr_names, replace = TRUE) %>%
# Form a Full-name
paste ( . , collapse = " ") %>%
# Add it to the tibble (INCLUDE the Gender)
add_row(full_name, "Full Name" = . , "Gender" = gender)
}
# How can I make 10k of this?
full_name <- get_name()
If you pass a larger number than 1 to sample this problem becomes easier to vectorise.
One thing that currently makes your problem much harder is the layout of your unit_names table: you are effectively treating male and female names as individually paired, but they clearly aren’t: hence they shouldn’t be in columns of the same table. Use a list of two vectors, for instance:
unit_names = list(
Women = c("fem1", "fem2", "fem3", "fem4", "fem5", "fem6", "fem7"),
Men = c("male1", "male2", "male3", "male4", "male5")
)
Then you can generate random names to your heart’s delight:
generate_names = function (n, unit_names) {
name_length = sample(2 : 4, n, replace = TRUE)
genders = sample(c('Women', 'Men'), n, replace = TRUE)
names = Map(sample, unit_names[genders], name_length, replace = TRUE) %>%
lapply(paste, collapse = ' ') %>%
unlist()
tibble(`Full name` = names, Gender = genders)
}
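For readers who want to try the idea without loading any packages, here is a rough base-R variant of the same approach. The function name `generate_names_base` is made up for this sketch, and it returns a data frame rather than a tibble.

```r
unit_names <- list(
  Women = c("fem1", "fem2", "fem3", "fem4", "fem5", "fem6", "fem7"),
  Men = c("male1", "male2", "male3", "male4", "male5")
)

# Hypothetical base-R counterpart of generate_names()
generate_names_base <- function(n, unit_names) {
  # Draw all name lengths and genders in one go
  name_length <- sample(2:4, n, replace = TRUE)
  genders <- sample(names(unit_names), n, replace = TRUE)

  # For each person, sample unit names (duplicates allowed) and join them
  full <- vapply(seq_len(n), function(i) {
    picked <- sample(unit_names[[genders[i]]], name_length[i], replace = TRUE)
    paste(picked, collapse = " ")
  }, character(1))

  data.frame(`Full name` = full, Gender = genders,
             check.names = FALSE, stringsAsFactors = FALSE)
}

set.seed(12345)
res <- generate_names_base(10, unit_names)
head(res)
```

The function takes the name list as an argument and returns a new table, so nothing in the global environment is modified.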
A note on style: unlike your function, the above doesn’t use any global variables. Furthermore, don’t "quote" variable names (you do this in unit_names$"Women" and for the arguments of add_row). R allows this, but it is arguably a mistake in the language specification: these are not strings, they’re variable names, and making them look like strings is misleading. You don’t quote your other variable names, after all. You do need to backtick-quote the `Full name` column name, since it contains a space. However, the use of backticks, rather than quotes, signifies that this is a variable name.
I am not 100% sure what you are trying to achieve, but if I understood correctly, did you try mutate from dplyr? For example:
result= mutate(data.frame,
concated_column = paste(column1, column2, column3, column4, sep = '_'))
With a little help from Konrad Rudolph, I arrived at the elegant (vectorized and fast) solution I was looking for. map2 does the necessary trick.
Here is the full working example if someone needs it.
(Just a side note: I kept the initial conversion from tibble to list because the data arrives to me as a tibble.)
Once again, thanks to Konrad.
library(tidyverse)
# Dummy name list
unit_names <- tribble(
~Women, ~Men,
"fem1", "male1",
"fem2", "male2",
"fem3", "male3",
"fem4", "male4",
"fem5", "male5",
"fem6", NA,
"fem7", NA
)
name_list <- list(
Women = unit_names$Women[!is.na(unit_names$Women)],
Men = unit_names$Men[!is.na(unit_names$Men)]
)
generate_names = function (n, name_list) {
name_length = sample(2 : 4, n, replace = TRUE)
genders = sample(c('Women', 'Men'), n, replace = TRUE)
#names = lapply(name_list[genders], sample, name_length) %>%
# replace = TRUE allows duplicate unit names, as in the original get_name()
names = map2(name_list[genders], name_length, sample, replace = TRUE) %>%
lapply(paste, collapse = ' ') %>%
unlist()
tibble(`Full name` = names, Gender = genders)
}
full_name <- generate_names(10000, name_list)

Reorder panels in facet_wrap/facet_grid based on another factor, with multiple occurrences

Consider this example. I want to create a custom label for my panels by joining two columns into a string.
The panels created through faceting are ordered alphabetically, but actually, I want them to be ordered by src, so SRC01 should come first, then SRC02, etc.
library(tidyverse)
tibble::tibble(
src = rep(c("SRC03", "SRC04", "SRC01", "SRC02"), 2),
data = runif(8)
) %>%
mutate(
foo = case_when(src %in% c("SRC01", "SRC02") ~ "foo", TRUE ~ "bar"),
label = paste(foo, src)
) %>%
ggplot(aes(x = data)) +
geom_density() +
facet_wrap(~label)
Created on 2019-05-22 by the reprex package (v0.3.0)
I know that this order depends on the order of underlying factor levels, but this question shows how to manually specify the levels, which I do not want (there are many more SRC values and I don't want to type all of them…).
I found a solution using fct_reorder, in which I could specify:
mutate(label = fct_reorder(label, src, .fun = identity))
But this only works when there is one line per src/label combination. If I add data (i.e., more than one data point per combination), it fails with:
Error: `fun` must return a single value per group
What would be the most succinct way to achieve what I need?
You can use the numeric part of src, and then use reorder():
tibble::tibble(
src = rep(c("SRC03", "SRC04", "SRC01", "SRC02"), 2),
data = runif(8)
) %>%
mutate(
foo = case_when(src %in% c("SRC01", "SRC02") ~ "foo", TRUE ~ "bar"),
label = paste(foo, src)
) %>%
mutate(label_order = as.numeric(str_extract(src, "\\d+"))) %>%
# use str_extract() to find the "01" inside "SRC01", turn it to numeric.
ggplot(aes(x = data)) +
geom_density() +
facet_wrap(~reorder(label, label_order))
# use reorder() to change the ordering based on the numbers
A note about str_extract(): it works on your example because
str_extract("SRC01", "\\d+") gives "01", which as.numeric() turns into 1. But
str_extract("2SRC01", "\\d+") would return "2", which is probably not what you want.
Luckily, there are plenty of ways to use regex to extract what you need.
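If the src values do not contain a convenient number, another option (not part of the answer above, just a sketch) is to set the factor levels of label directly from the sort order of src. This sidesteps the "single value per group" requirement, because no summary function is involved.

```r
# Same toy data as in the question, built with base R only
src <- rep(c("SRC03", "SRC04", "SRC01", "SRC02"), 2)
foo <- ifelse(src %in% c("SRC01", "SRC02"), "foo", "bar")
label <- paste(foo, src)

# Sort the labels by src, then keep the first occurrence of each label;
# that sequence becomes the level order used by the facets
label_f <- factor(label, levels = unique(label[order(src)]))
levels(label_f)
```

Inside the pipeline this would correspond to something like mutate(label = factor(label, levels = unique(label[order(src)]))) before the call to facet_wrap(~label).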

how to use dataframe name inside for loop to save different ggplot2 plots in R

I have a data frame (all.table) that I have subset into 3 different tables (A1.table, B25.table, and C48.table):
all.table = read.table(file.path(input_file_name), header=T, sep = "\t")
A1.table = subset(all.table, ID == "A1")
B25.table = subset(all.table, ID == "B25")
C48.table = subset(all.table, ID == "C48")
I want to generate each graph type for all 4 tables:
for (i in list(all.table, A1.table, B25.table, C48.table)){
ggplot(i, aes(x=Position, fill=Frequency)) + #other plot options
ggsave(file.path(full_output_path, "uniqueFileName.pdf"))
#additional plots
#additional saves
}
My problem comes in the ggsave command: how do I generate the 'uniqueFileName.pdf'? I would like to name the files along the lines of all.table.graph1.pdf, all.table.graph2.pdf and A1.table.graph1.pdf, A1.table.graph2.pdf, etc.
My question is how do I turn the name of the iterator i into a string, and add that string to a '.graph1.pdf' string?
Coming from a python background this seems like it should be rather simple. I am not very versed in R (as is likely obvious from this question) and anything resembling an answer I have found seems incredibly over complicated.
This is a workflow that uses the tidyverse suite of functions. iwalk is similar to lapply in base, but it requires a function that takes 2 arguments, and it automatically inputs the names of the list as the 2nd argument.
The short answer for what you want is paste0, which lets you combine strings.
library(tidyverse)
all.table %>%
filter(ID %in% c("A1", "B25", "C48")) %>% # only needed if there are more IDs than the 3 explicitly listed
split(., .$ID) %>% # creates the list of data frames
c(list(all.table = all.table), .) %>% # adds "all.table" as a list element
iwalk(function(df, label) {
ggplot(df, aes(x = Position, fill = Frequency)) +
...
ggsave(file.path(full_output_path, paste0(label, ".graph1.pdf")))
})
Figured out a solution by looking for a python dictionary equivalent:
all.table = read.table(file.path(input_file_name), header=T, sep = "\t")
A1.table = subset(all.table, ID == "A1")
B25.table = subset(all.table, ID == "B25")
C48.table = subset(all.table, ID == "C48")
#Generate a named list of tables
list_of_tables = list(all = all.table, A1 = A1.table, B25 = B25.table, C48 = C48.table)
for (i in seq_along(list_of_tables)) {
# store the plot so it can be passed to ggsave explicitly
p <- ggplot(list_of_tables[[i]], aes(x=Frequency, fill=Category)) # + more options
ggsave(file.path(full_output_path, paste0(names(list_of_tables)[i], ".graph1.pdf")), plot = p)
}
I'm not sure if there is a downside to not using other libraries (ie tidyverse), but this seems like the simplest answer?
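Sticking with base R is fine for this. As a small sketch of the naming part on its own, the loop below iterates over the names of the list rather than over positions, which is close to looping over dictionary keys in Python. The toy tables, `graph_types`, and `out_dir` are placeholders, and the plotting call is left as a comment.

```r
# Hypothetical stand-ins for all.table and its subsets
list_of_tables <- list(
  all.table = data.frame(Position = 1:3),
  A1.table = data.frame(Position = 1:2)
)
graph_types <- c("graph1", "graph2")
out_dir <- tempdir() # assumption: replace with full_output_path

out_files <- character(0)
for (table_name in names(list_of_tables)) {
  current <- list_of_tables[[table_name]] # the data frame for this name
  for (graph in graph_types) {
    out_file <- file.path(out_dir, paste0(table_name, ".", graph, ".pdf"))
    # build a plot p from `current` here, then: ggsave(out_file, plot = p)
    out_files <- c(out_files, basename(out_file))
  }
}
out_files
```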

Combining two character variables into a character list in R to be used in ggplot

I am making over 200 stacked bar graphs to illustrate plant species change over time. I need to make a named character list similar to this:
species_color_code <- c("ALDE" = "CORAL", "BRJA" = "BLUE",
"BRSQ2" = "POWDERBLUE")
This list is used in ggplot to assign each plant species a specific color.
I have already assigned a random color to each species name (i.e ALDE, BRJA, etc) in a previous step and saved to a .csv file for future use. I have just over 400 plant species that each have a random color assigned, so making the above list by hand is going to be time consuming.
My problem is that I have not figured out how to pull the species name and color name from the CSV file, add a "=" and then place them all into a c() function to make the correct list for ggplot.
> species_color_file
Species Color_Samples
1 PASM lightsalmon2
2 PSSP6 darkturquoise
3 AGCR snow2
4 ELLAL antiquewhite4
5 ELTR7 tomato1
I have looped through each row of the .csv file and found each species name and corresponding color. No matter how I paste(), c(), etc., I can't make them match what is needed for the plot.
species_color_codes <- as.character(list())
for(j in 1:nrow(species_color_file)){
species_color_names <- paste(species_color_file$Species[j],
species_color_file$Color_Samples[j], sep = "=", collapse = "")
species_color_codes <- c(species_color_codes, species_color_names,
sep = ",")
}
What I get:
> species_color_codes
sep sep
"PASM=lightsalmon2" "," "PSSP6=darkturquoise" ","
sep sep
"AGCR=snow2" "," "ELLAL=antiquewhite4" ","
What I need:
> species_color_codes <- c("ALDE" = "CORAL", "BRJA" = "BLUE", "BRSQ2" = "POWDERBLUE")
> species_color_codes
ALDE BRJA BRSQ2
"CORAL" "BLUE" "POWDERBLUE"
Thanks for any help!
Try this:
species_color_codes <- species_color_file$Color_Samples
names(species_color_codes) <- species_color_file$Species
You are attempting to create a named vector - that is, a vector where each element has a name. In your first example, you don't need the quotation around the species, as these will just be the character names.
Example:
> species_color_codes <- c(PASM = "lightsalmon2",
+ PSSP6 = "darkturquoise",
+ AGCR = "snow2",
+ ELLAL = "antiquewhite4",
+ ELTR7 = "tomato1")
> species_color_codes
PASM PSSP6 AGCR ELLAL ELTR7
"lightsalmon2" "darkturquoise" "snow2" "antiquewhite4" "tomato1"
You've stored the names in a data file and then read them back in. Thus, all you need to do is store the colors in a new object and then assign the species names to the names property of that object.
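The two assignment lines can also be written in one step with setNames(). The sketch below uses a small made-up data frame in place of the CSV. The as.character() calls are a precaution in case the columns were read in as factors.

```r
# Hypothetical stand-in for the contents of the CSV file
species_color_file <- data.frame(
  Species = c("PASM", "PSSP6", "AGCR"),
  Color_Samples = c("lightsalmon2", "darkturquoise", "snow2"),
  stringsAsFactors = FALSE
)

# Values are the colors, names are the species codes
species_color_codes <- setNames(
  as.character(species_color_file$Color_Samples),
  as.character(species_color_file$Species)
)
species_color_codes
```

A named vector like this can then be passed to ggplot2 via scale_fill_manual(values = species_color_codes), which matches colors to species by name.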

Resources