flextable: bold the maximum value in each row - r

I'm trying to bold the maximum value in every row with this code:
mtcars %>%
flextable() %>%
bold(max(.[1:32,]))
and got this error: Error in .[1:32, ] : incorrect number of dimensions
I also tried, with inspiration from conditionally bold values in flextable
mtcars %>%
flextable() %>%
bold(~ max(.), 1)
Error in eval(as.call(f[[2]]), envir = data) : object '.' not found
Removing the . in max(.) doesn't make any difference.

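To bold the max of every row, you can loop over the rows. My original answer (kept at the end) overwrote the attributes of the flextable object directly.
Edit:
As mentioned in the comments, it is not recommended to change the object structure by hand, so this version calls bold() for each row instead: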
library(magrittr)
mtcars %>%
flextable::flextable() -> bold_flex2
for(i in seq_len(nrow(mtcars)))
{
bold_flex2 %<>% flextable::bold(i, which.max(mtcars[i,]))
}
bold_flex2
Old answer (modifies the object's internals directly, not recommended):
library(magrittr)
mtcars %>%
flextable::flextable() -> bold_flex
for(i in seq_len(nrow(mtcars)))
{
bold_flex$body$styles$text$bold$data[i, which.max(mtcars[i,])] <- TRUE
}
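The row-wise positions that the loops above look up one at a time can also be computed in a single step. The following is only a sketch: it assumes every column is numeric (true for mtcars) and uses base R's max.col() with ties.method = "first" so that ties resolve to the leftmost column. The names max_cols and ft are made up for illustration, and the flextable part only runs if the package is installed.

```r
# Column index of the largest value in each row (leftmost column wins on ties)
max_cols <- max.col(as.matrix(mtcars), ties.method = "first")

# Apply the bold formatting through the public bold() function, one row at a time
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- flextable::flextable(mtcars)
  for (r in seq_along(max_cols)) {
    ft <- flextable::bold(ft, i = r, j = max_cols[r])
  }
  # print `ft` to view the table
}
```

Computing the indices first also makes it easy to check them (for example with names(mtcars)[max_cols]) before any formatting is applied.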

To bold the maximum value, I believe you have to handle each column separately.
I used two of the columns here, and bolded only the cell in the column that the condition refers to.
library(tidyverse)
library(flextable)
data(mtcars)
mtcars %>%
flextable() %>%
bold(~mpg == max(mpg), 1) %>%
bold(~drat == max(drat), 5)
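If the goal is the maximum of every column rather than only two of them, the same idea can be wrapped in a loop. This is a minimal sketch, assuming all columns are numeric; col_max_rows and ft_cols are hypothetical names, and row indices are passed to bold() instead of a formula so the positions can be inspected first.

```r
# For each column, the row positions that hold that column's maximum
col_max_rows <- lapply(mtcars, function(x) which(x == max(x)))

if (requireNamespace("flextable", quietly = TRUE)) {
  ft_cols <- flextable::flextable(mtcars)
  for (j in seq_along(col_max_rows)) {
    # Bold every row that ties for the maximum of column j
    ft_cols <- flextable::bold(ft_cols, i = col_max_rows[[j]], j = j)
  }
}
```

Note that this bolds column maxima, which is a different result from the row-wise maximum asked about in the question.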

Related

create a row gap or break the sections in the report using r2rtf package

Could you please help with getting the row gap or break line between the sections displayed in the report? I am using the r2rtf package along with tidyverse.
For example using mtcars I have a column rowname I want to display the data with gap between these rownames
mtcars$rowname <- rownames(mtcars)
mtcars %>%
rtf_body() %>%
rtf_encode() %>%
write_rtf('cars.rtf')
This can be handled in the data manipulation step with dplyr, before the data reaches r2rtf.
You just need to add \n at the end of each value.
mtcars %>%
mutate(across(everything(), function(x) paste(as.character(x), "\n"))) %>%
rtf_body() %>%
rtf_encode() %>%
write_rtf('cars.rtf')
I'm not really familiar with the r2rtf package, but you could try the text_space_after argument of rtf_body:
library(r2rtf)
mtcars$rowname <- row.names(mtcars)
mtcars[1:5, c("rowname", "mpg", "cyl")] |>
rtf_body(text_space_after = 200) |>
rtf_encode() |>
write_rtf('cars.rtf')
Created on 2022-10-03 with reprex v2.0.2

Using a for loop in R to assign value labels

Context: I have a large dataset (CoreData) with an accompanying datafile (CoreValues) that contains the code and values for each variable within the dataset.
Problem: I want to use a loop to assign each variable within the dataset (CoreData) the correct value labels (from the CoreValues data).
What I've tried so far:
I have created a character vector that identifies which variables within my main data (CoreData) have values that need to be added:
Core_VarwithValueLabels<- unique(CoreValues$Abbreviation)
I have tried a for loop over that vector to create the vectors for both the labels and levels arguments that feed into the factor() function.
for (i in Core_VarwithValueLabels){
assign(paste0(i, 'Labels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Description) %>%
unique() %>%
unlist()
)
assign(paste0(i, 'Levels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Code) %>%
unique() %>%
unlist()
)
CoreData[i] <- factor(CoreData[i], levels=paste0(i, 'Levels'), labels = paste0(i, 'Labels'))
}
This creates the correct label and level vectors, however, they are not being picked up properly within the factor function.
Question: Can you help me identify how to get my factor function to work within this loop or if there is a more appropriate method?
Sample data:
CoreValues:
example data from CoreValues
CoreData:
example data from CoreData
UPDATE: RESOLVED
I have now resolved this by using the get() function within my factor() call: get() takes the string I've created with paste0() and finds the vector of that name.
for (i in Core_VarwithValueLabels){
assign(paste0(i, 'Labels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Description) %>%
unique() %>%
unlist()
)
assign(paste0(i, 'Levels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Code) %>%
unique() %>%
unlist()
)
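# [[i]] extracts the column as a vector; CoreData[i] would pass a one-column data frame to factor()
CoreData[[i]] <- factor(CoreData[[i]], levels = get(paste0(i, 'Levels')), labels = get(paste0(i, 'Labels')))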
}
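The same relabelling can be done without assign() and get() by keeping the lookup for each variable in a local object. The sketch below uses small made-up stand-ins for CoreValues and CoreData (the real data are not shown in the question), and assumes the columns are named Abbreviation, Code and Description as in the code above.

```r
# Hypothetical stand-ins for the real CoreValues and CoreData
CoreValues <- data.frame(
  Abbreviation = c("Sex", "Sex", "Region", "Region", "Region"),
  Code         = c(1, 2, 1, 2, 3),
  Description  = c("Male", "Female", "North", "South", "West"),
  stringsAsFactors = FALSE
)
CoreData <- data.frame(Sex = c(2, 1, 2), Region = c(3, 3, 1), Age = c(34, 51, 28))

for (v in unique(CoreValues$Abbreviation)) {
  # Code/Description pairs for this variable only
  lookup <- unique(CoreValues[CoreValues$Abbreviation == v, c("Code", "Description")])
  # [[v]] extracts the column as a vector, which is what factor() expects
  CoreData[[v]] <- factor(CoreData[[v]],
                          levels = lookup$Code,
                          labels = lookup$Description)
}
str(CoreData)
```

Columns that have no entry in CoreValues (Age in this toy example) are left untouched.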

How to change variable to factor based on its name in some list by using across?

(I am new to R.)
I am trying to convert variables of the data frame members to factors if their names appear in a list, to_factors_list.
I have tried some code using mutate(across()), but it gives errors.
Data prep.:
library(tidyverse)
# tidytuesday himalayan data
members <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
# creating list of names
to_factors_list <- members %>%
map_df(~(data.frame(n_distinct = n_distinct(.x))),
.id = "var_name") %>%
filter(n_distinct < 15) %>%
select(var_name) %>% pull()
to_factors_list
############### output ###############
'season' 'sex' 'hired' 'success' 'solo' 'oxygen_used' 'died' 'death_cause' 'injured' 'injury_type'
Getting error in below code attempts:
members %>%
mutate(across(~.x %in% to_factors_list, factor))
members %>%
mutate_if( ~.x %in% to_factors_list, factor)
I am not sure what's wrong and how can I make this work ?
In base R, this can be done with lapply
members[to_factors_list] <- lapply(members[to_factors_list], factor)
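The correct syntax passes the column names to across() as a selection; wrapping the vector in all_of() avoids the ambiguity warning in recent dplyr versions:
members %>% mutate(across(all_of(to_factors_list), factor))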
Or if you prefer an older-version dplyr syntax:
members %>% mutate_at(vars(to_factors_list), factor)
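The attempts in the question fail because across() and mutate_if() expect a column selection or a predicate on a column, whereas ~.x %in% to_factors_list compares the values inside each column with the names. A small sketch on a made-up tibble (toy and toy_fct are hypothetical names, not the members data) shows a selection by name with all_of():

```r
library(dplyr)

# Hypothetical miniature of the members data
toy <- tibble(
  season = c("Spring", "Autumn", "Spring"),
  sex    = c("M", "F", "M"),
  age    = c(31, 45, 28)
)
to_factors_list <- c("season", "sex")

# Select columns by name, then apply factor() to each selected column
toy_fct <- toy %>% mutate(across(all_of(to_factors_list), factor))
str(toy_fct)
```

all_of() raises an error if a listed name is missing from the data; any_of() would skip missing names silently instead.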

Error while using unnest_tokens() while passing a function to the token

Error in unnest_tokens.data.frame(., entity, text, token = tokenize_scispacy_entities, :
Expected output of tokenizing function to be a list of length 100
The unnest_tokens() works well for a sample of few observations but fails on the entire dataset.
https://github.com/dgrtwo/cord19
Reproducible example:
library(dplyr)
library(cord19)
library(tidyverse)
library(tidytext)
library(spacyr)
Install the model from here - https://github.com/allenai/scispacy
spacy_initialize("en_core_sci_sm")
tokenize_scispacy_entities <- function(text) {
spacy_extract_entity(text) %>%
group_by(doc_id) %>%
nest() %>%
pull(data) %>%
map("text") %>%
map(str_to_lower)
}
paragraph_entities <- cord19_paragraphs %>%
select(paper_id, text) %>%
sample_n(10) %>%
unnest_tokens(entity, text, token = tokenize_scispacy_entities)
I faced the same problem. I don't know the reason, but after I filtered out empty and very short abstracts, everything seemed to work fine.
abstract_entities <- article_data %>%
filter(nchar(abstract) > 30) %>%
select(paper_id, title, abstract) %>%
sample_n(1000) %>%
unnest_tokens(entity, abstract, token = tokenize_scispacy_entities)
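One possible explanation, consistent with the error message but not confirmed in this thread, is that documents with no recognised entities do not appear in the extracted entity table at all, so the tokenizer returns a list shorter than its input. The sketch below pads such documents with an empty character vector. It is pure base R and rests on two assumptions: that the entity table has doc_id and text columns, and that unnamed input documents are labelled "text1", "text2", and so on. The helper entities_per_doc() is a hypothetical name, and whether unnest_tokens() accepts empty elements has not been verified here.

```r
# Return one list element per input document, in input order,
# with character(0) for documents that produced no entities
entities_per_doc <- function(entity_df, n_docs) {
  doc_ids <- paste0("text", seq_len(n_docs))   # assumed doc_id labels
  by_doc <- split(
    tolower(entity_df$text),
    factor(entity_df$doc_id, levels = doc_ids)  # unused levels are kept as empty groups
  )
  unname(by_doc)
}

# Hypothetical extraction result: the second of three documents had no entities
found <- data.frame(
  doc_id = c("text1", "text3", "text3"),
  text   = c("Virus", "Lung", "ACE2"),
  stringsAsFactors = FALSE
)
padded <- entities_per_doc(found, n_docs = 3)
length(padded)
```

This would also explain why filtering out empty and very short texts makes the error disappear: those are the documents most likely to yield no entities.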

R Dplyr top_n does not work when used within function

My dplyr function looks like this
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
interp(~top_n(5,wt = "QoL"))
}
I added the interp argument, as I thought the problem was due to lazyeval
However this is not the case.
Using the function below (no interp for top_n), I get a result, however I do not see the top 5 results as desired.
Reading other stackoverflow posts, I understand that this has to do with ungroup, but not sure how to implement this.
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
top_n(5,wt = "QoL")
}
Any ideas?
My solution: remove the quotes around QoL in top_n() and add an additional argument to arrange():
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
top_n(5,QoL) %>%
arrange(globalsegment,desc(QoL))
}
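In current dplyr (1.0 or later) the same function can be written without lazyeval, summarise_() or top_n(), which have since been superseded. The following is a sketch rather than a drop-in replacement: the toy data frame is invented for illustration, mean(SVM_LABEL_QOL == "QoL") stands in for sum(... %in% 'QoL')/n() (equivalent once the NA rows are filtered out), and with_ties = FALSE is a choice made here so that exactly five rows per segment come back.

```r
library(dplyr)

convert_to_top5_df <- function(df) {
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    group_by(globalsegment, Account) %>%
    # Share of 'QoL' labels per account
    summarise(QoL = round(mean(SVM_LABEL_QOL == "QoL"), 2), .groups = "drop") %>%
    group_by(globalsegment) %>%
    # Top five accounts within each segment
    slice_max(QoL, n = 5, with_ties = FALSE) %>%
    ungroup() %>%
    arrange(globalsegment, desc(QoL))
}

# Hypothetical data: 2 segments x 7 accounts, 10 labelled rows per account
toy <- data.frame(
  globalsegment = rep(c("A", "B"), each = 70),
  Account       = rep(paste0("acc", 1:7), times = 20),
  SVM_LABEL_QOL = ifelse(seq_len(140) %% 3 == 0, "QoL", "Other"),
  stringsAsFactors = FALSE
)
top5 <- convert_to_top5_df(toy)
```

Grouping by globalsegment before slice_max() makes the per-segment behaviour explicit, instead of relying on the grouping that summarise() leaves behind.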
If anyone's got a more efficient way, please share
