I have this automated script that produces a table with frequencies of "thetarget" tokens by year:
library(quanteda)
library(tidyverse)  # provides map(), %>%, filter() and str_detect()
vec <- c("Apple", "Google")
out <- map(vec, ~
df %>%
filter(str_detect(collectionName, .x)) %>%
filter(str_detect(Year, paste(years, collapse = "|"))) %>%
corpus(text_field = "text") %>%
tokens() %>%
tokens_select(thetarget) %>%
dfm() %>%
dfm_group(groups = "Year") %>%
convert(to = "data.frame")
)
names(out) <- sub("^(...).*\\s+(\\S)$", "\\1\\2", vec)
Using
View(out$Apple)
Produces the corresponding table.
I am trying to automate the export of these tables as a PDF or JPEG, with each file named after its table ("Apple", for example).
Is there a way to do this?
TIA
I can't run your code chunk (I get a problem with the function convert; which library is it from?), but that isn't a problem.
There are many solutions; for example, you can use the gt or flextable packages for this task (both support many output types).
First of all, install webshot and PhantomJS.
After that, install the other packages (gt and flextable). See the examples:
(gt)
tab_1 <-
gtcars %>%
dplyr::select(model, year, hp, trq) %>%
dplyr::slice(1:5) %>%
gt(rowname_col = "model") %>%
tab_stubhead(label = "car") %>%
gtsave("tab_1.png", expand = 10, path = "********")
(flextable)
ft <- flextable(head(mtcars))
save_as_image(x = ft, path = "********\\image_name.png")
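To connect this back to the question, the list `out` can be looped over so that each table is written to a file named after its list element. This is a minimal sketch, assuming flextable is installed and can render PNGs on your system. The two small data frames are made-up stand-ins for the real tables, and `out_dir` is a hypothetical output folder:

```r
library(flextable)

# Stand-in for the question's `out`: a named list of data frames (toy values)
out <- list(
  Apple  = data.frame(doc_id = c("2019", "2020"), iphone = c(3, 5)),
  Google = data.frame(doc_id = c("2019", "2020"), iphone = c(1, 2))
)

# One output path per list element, named after the element
out_dir <- tempdir()  # hypothetical location; point this at your own folder
paths <- file.path(out_dir, paste0(names(out), ".png"))

# Render each table with flextable and write it to its own file
for (i in seq_along(out)) {
  ft <- autofit(flextable(out[[i]]))
  save_as_image(ft, path = paths[i])
}
```

The same loop should work with gt by swapping the two flextable calls for `gtsave(gt(out[[i]]), filename = paths[i])`.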
I can print my data with gt table and create hyperlinks for different cells, as shown below by knitting an HTML file:
library(dplyr)
library(tibble)  # rownames_to_column()
library(purrr)   # map()
library(gt)
raw_dat <- mtcars[1:15, ] %>% rownames_to_column(var = "id") %>% select(id, mpg)
df <- tibble(
name = raw_dat$id,
link = 'https://www.cars.com//')
df %>%
mutate(link = sprintf('<p><a href="%s">%s</a></p>', link, raw_dat$mpg),
link = map(link, gt::html)) %>%
gt
I want to print the same output to a PowerPoint slide and preserve the links. I understand that I cannot use 'ph_with' with an object of class "c('gt_tbl', 'list')", so I was wondering what else I can do to print a table with hyperlinks in PowerPoint.
library(officer)
NEW <- read_pptx("Presentation1.pptx") %>%
add_slide(., layout = "Blank", master = "Office Theme") %>%
ph_with(raw_dat,
location = ph_location(left = 0, top = 0, width = 13.33, height = 7.5))
This is a solution with flextable that is supported by officer.
library(tibble)
library(dplyr)
library(flextable)
library(officer)
raw_dat <- mtcars[1:15, ] %>% rownames_to_column(var = "id") %>% select(id, mpg)
df <- tibble(name = raw_dat$id, link = 'https://www.cars.com//')
ft <- flextable(df) %>%
mk_par(j = "link", value = as_paragraph(hyperlink_text(x = name, url = link))) %>%
autofit()
read_pptx() %>%
add_slide() %>%
ph_with(ft, location = ph_location_type()) %>%
print(target = "toto.pptx")
PS: I am not sure this answers the question, since the slide is generated with officer, not officedown, but your example is not an R Markdown example, so I assumed you wanted to use officer.
While working on a task, I ran into a problem that I don't know how to solve.
I have a data frame with 2 columns:
date
value
It has 1303 rows in total.
Each year has 12 values (one per month), except the last year, which has only 7.
I need to create a 'drilldown' style chart using the 'highcharter' library, but I don't know how to do it efficiently.
The solution I came up with is not very efficient; I show it below so you can see what I mean.
# Load packages
library(tidyverse)
library(highcharter)
library(lubridate)
# Load dataset
df <- read.csv('example.csv')
# Prepare df to use
dfDD <- tibble(name = year(df$date),
y = round(df$value, digits = 2),
drilldown = name)
# Create a data frame to use in 'drilldown' (for each year)
df1913 <- df %>%
filter(year(date) == 1913) %>%
data.frame()
df1914 <- df %>%
filter(year(date) == 1914) %>%
data.frame()
# Create a drilldown chart using Highcharter library
highchart() %>%
hc_chart(type = "column") %>%
hc_title(text = "Example Drilldown") %>%
hc_xAxis(type = "category") %>%
hc_legend(enabled = FALSE) %>%
hc_plotOptions(series = list(borderWidth = 2,
dataLabels = list(enabled = TRUE))) %>%
hc_add_series(data = dfDD,
name = "Mean",
colorByPoint = TRUE) %>%
hc_drilldown(allowPointDrilldown = TRUE,
series = list(list(id = 1913,
data = list_parse2(df1913)),
list(id = 1914,
data = list_parse2(df1914))))
Looking at my solution, I realized that to complete the chart I would have to create a subset of values for every year by hand. I tried to find a more efficient solution using a 'for' loop, but so far I can't get it to work.
Is there a more efficient way to create this chart using a loop?
If it can be done without loops, I would also like to know.
Thank you for reading my question; I hope I explained myself well.
Using split and purrr::imap, you could split your data by year and loop over the resulting list to convert your data to the nested list object required by hc_drilldown. Note: it's important to make the id numeric and to pass an unnamed list.
library(tidyverse)
library(highcharter)
library(lubridate)
series <- split(df, year(df$date)) %>%
purrr::imap(function(x, y) list(id = as.numeric(y), data = list_parse2(x)))
# Unname list
names(series) <- NULL
highchart() %>%
hc_chart(type = "column") %>%
hc_title(text = "Example Drilldown") %>%
hc_xAxis(type = "category") %>%
hc_legend(enabled = FALSE) %>%
hc_plotOptions(series = list(borderWidth = 2,
dataLabels = list(enabled = TRUE))) %>%
hc_add_series(data = dfDD,
name = "Mean",
colorByPoint = TRUE) %>%
hc_drilldown(allowPointDrilldown = TRUE,
series = series)
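To see what the split/imap step produces, here is a small self-contained sketch on made-up data (24 monthly values over two years, not the asker's file). It also assumes the top-level series is meant to hold one bar per year showing the yearly mean, which the series name "Mean" suggests but the question does not state outright:

```r
library(dplyr)
library(lubridate)
library(purrr)
library(highcharter)

# Toy stand-in for the question's data: monthly values for 1913 and 1914
df <- tibble(
  date  = seq(as.Date("1913-01-01"), by = "month", length.out = 24),
  value = as.numeric(1:24)
)

# Top level: one row per year (assumed to be the yearly mean),
# with `drilldown` holding the id of the matching drilldown series
dfDD <- df %>%
  group_by(name = year(date)) %>%
  summarise(y = round(mean(value), 2)) %>%
  mutate(drilldown = name)

# Drilldown level: one list entry per year, then drop the list names
series <- df %>%
  mutate(date = format(date, "%Y-%m")) %>%
  split(year(df$date)) %>%
  imap(function(x, y) list(id = as.numeric(y), data = list_parse2(x))) %>%
  unname()

str(series[[1]], max.level = 1)
```

Both objects can then be passed to the pipeline above unchanged, `dfDD` to hc_add_series and `series` to hc_drilldown.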
I'd like to generate small ggplot graphics and use them inline in a reactable. I can generate the plots needed (1 per row), but when inserting them into the table, they come up as broken images. I imagine it has something to do with the file paths?
Note: This will ultimately be used in a shiny app where the graphics are generated dynamically. I am doing this in reactable because I plan to combine this with the expandable rows feature.
Here is some reproducible code
library(tidyverse)
library(palmerpenguins)
library(reactable)
#get list of species names
species <- penguins %>% select(species ) %>%
distinct() %>% pull()
# make a simple df
plot_data <- penguins %>%
group_by(species) %>%
summarize(mean = mean(bill_length_mm, na.rm=T))
# for loop to generate and save plots
for(i in species){
tmp_plot <-
plot_data %>%
filter(species == i) %>%
ggplot(aes(y=species, x=mean))+
geom_col()+
theme_void()
file_name <- i %>% janitor::make_clean_names()
ggsave(plot = tmp_plot,
filename = paste0("plots/",file_name,".png"),
width=4,
height=.5)
}
#make a table
species %>%
as.data.frame() %>%
rename("species" = 1) %>%
mutate(plot = paste0("<img src='plots/", janitor::make_clean_names(species),".png' />")) %>%
reactable(
columns = list(
plot = colDef(html = TRUE,
resizable = TRUE,
show=T)
))
Here is what I get: every cell in the plot column shows a broken-image icon instead of the chart.
Just as a check, doing this loads the graphic: magick::image_read("plots/adelie.png") so I am not sure what I am missing.
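One likely cause is that the widget is rendered from a different location than the working directory, so the relative path 'plots/...' no longer resolves even though the file exists on disk. A possible workaround, sketched here under that assumption, is to embed each image in the tag itself as a base64 data URI with knitr::image_uri(). The single-row data frame is a toy stand-in for one species:

```r
library(ggplot2)

# Toy plot standing in for one of the per-species bars
p <- ggplot(data.frame(species = "Adelie", mean = 38.8),
            aes(x = mean, y = species)) +
  geom_col() +
  theme_void()

# Save to a temporary PNG, then encode the file as a base64 data URI
png_file <- tempfile(fileext = ".png")
ggsave(png_file, plot = p, width = 4, height = 0.5)
uri <- knitr::image_uri(png_file)

# HTML for the table cell; the image data travels inside the tag itself
img_tag <- sprintf("<img src='%s' height='24' />", uri)
```

The resulting `img_tag` strings can go into the `plot` column in place of the path-based tags, with `colDef(html = TRUE)` kept as it is.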
Problem
I would like to produce a good looking table which has ggplots within the cells of one column. One key element is that I would like to create a pdf output of this table eventually.
What I have tried so far
Hopefully the example below is understandable. Essentially, I found that I can achieve what I want using the gt package. The problem is that this creates an HTML widget, which then has to be exported as a PDF using PhantomJS and webshot.
library(dplyr)
library(purrr)
library(gt)
library(ggplot2)
dat = tibble(
RowLabel = letters[1:5],
Numeric = seq(100,500,100)
) %>%
mutate(
plotData = RowLabel %>% map(function(pos){
tibble(y=runif(10)*100) %>%
arrange(desc(y)) %>%
mutate(x=row_number())
}),
plot_obj = plotData %>% map(function(df){
df %>%
ggplot(aes(x=x,y=y))+
geom_col()
}),
plot_grob = plot_obj %>% map(cowplot::as_grob)
)
tab = dat %>%
select(RowLabel, Numeric) %>%
mutate(
ggplot = NA
) %>%
gt() %>%
text_transform(
locations = cells_body(vars(ggplot)),
fn = function(x) {
dat$plot_obj %>%
map(ggplot_image, height = px(50))
}
)
tab
What do I want
I would like an output which is similar to the above example. However, I would like a solution which does not require me to use html widgets and can be saved directly as a pdf without the use of other programs. Is this possible to do using ggplot? I have started to learn more about grids/grobs/gtables etc but have not made any meaningful progress.
Thanks in advance!
Perhaps you could tweak the gtsave() function to suit? E.g.
library(dplyr)
library(purrr)
library(gt)
library(ggplot2)
dat = tibble(
RowLabel = letters[1:5],
Numeric = seq(100,500,100)
) %>%
mutate(
plotData = RowLabel %>% map(function(pos){
tibble(y=runif(10)*100) %>%
arrange(desc(y)) %>%
mutate(x=row_number())
}),
plot_obj = plotData %>% map(function(df){
df %>%
ggplot(aes(x=x,y=y))+
geom_col()
}),
plot_grob = plot_obj %>% map(cowplot::as_grob)
)
tab = dat %>%
select(RowLabel, Numeric) %>%
mutate(
ggplot = NA
) %>%
gt() %>%
text_transform(
locations = cells_body(vars(ggplot)),
fn = function(x) {
dat$plot_obj %>%
map(ggplot_image, height = px(50))
}
)
tab %>%
gt::gtsave(filename = "test.pdf", vwidth = 180, vheight = 250)
(R v4.0.3 / gt v0.2.2)
This is code for a Sankey diagram made with networkD3. I've had success with Sankeys before, and I'm aiming to create something like this: https://susan-wilson.shinyapps.io/2016FederalElectionPreferences/ (although it will be a little more wild because of the nature of the Senate preference system). However, I can't work out what my issue is: the code runs without error and then I get a blank viewer.
The Source and Target nodes are zero indexed, and they are consecutive integers. I know that I could have just imported ACT, but this is just an interim test and I plan to use the whole data set later. This code is only a toy example and doesn't plot all the preference flows either.
I'm pretty sure I'm just making a dumb mistake, but I'd be super grateful if someone could point it out to me.
library(tidyverse)
library(data.table)
library(networkD3)
rm(list = ls())
#Download the data from here: https://results.aec.gov.au/20499/Website/External/SenateDopDownload-20499.zip
files <- list.files("~whereveryousavedit/SenateDopDownload-20499", pattern = ".csv", full.names = T)
SenatePreferences <- lapply(files, fread)
SenatePreferences <- rbindlist(SenatePreferences)
ACT <- SenatePreferences %>%
filter(State == "ACT")
# one node for each politician, while they're still in.
ACT <- ACT %>%
mutate(NameNode = paste(Surname, GivenNm, Count),
Name = paste(Surname, GivenNm)) %>%
group_by(Name) %>%
mutate(Status= case_when(
Status %>% lag() == "Excluded" ~ "Excluded in a previous round",
Status %>% lag() == "Excluded in a previous round" ~ "Excluded in a previous round",
TRUE ~ Status)) %>%
ungroup() %>%
filter(Status !="Excluded in a previous round") %>%
mutate(Node = c(0:(n()-1)))
# For each count i, the source is the round i node, the target is the equivalent node in round i + 1.
ACT <- ACT %>%
mutate(Source = Node) %>%
group_by(Name) %>%
mutate(Target = Source %>% lead()) %>%
ungroup() %>%
filter(!is.na(Target))
ACT_Sankey <- list(Nodes = ACT %>%
select(NameNode) %>% data.frame(),
Links = ACT %>%
select(Source, Target, VoteTransferred, Name) %>% data.frame()
)
sankeyNetwork(Links = ACT_Sankey$Links , Nodes = ACT_Sankey$Nodes, Source = 'Source',
Target = 'Target', Value = 'VoteTransferred', NodeID = 'NameNode',LinkGroup = 'Name',
fontSize = 12)
Here's a working version of what you seem to be trying to do with your code above, though I doubt the result is what you actually want to do...
library(tidyverse)
library(networkD3)
url <- "https://results.aec.gov.au/20499/Website/External/SenateDopDownload-20499.zip"
mytempfile <- tempfile(fileext = ".zip")
download.file(url = url, destfile = mytempfile)
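mytempdir <- file.path(tempdir(), "SenateDopDownload")  # dedicated subfolder, so the unlink() below does not delete the session temp dir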
unzip(mytempfile, exdir = mytempdir)
unlink(mytempfile)
SenatePreferences <-
list.files(mytempdir, pattern = ".csv", full.names = TRUE) %>%
map_dfr(read_csv)
unlink(mytempdir, recursive = TRUE)
cleaned <-
SenatePreferences %>%
as_tibble() %>%
filter(State == "ACT") %>%
filter(!Surname %in% c("Exhausted", "Gain/Loss")) %>%
mutate(Name = paste(Surname, GivenNm)) %>%
mutate(NameNode = paste(Name, Count)) %>%
select(NameNode, Name, Ticket, round = Count, Status, VoteTransferred) %>%
group_by(Name) %>%
arrange(round) %>%
filter(row_number() <= min(which(Status == "Excluded" | row_number() == n()))) %>%
ungroup() %>%
mutate(Node = row_number() - 1)
links <-
cleaned %>%
mutate(Source = Node) %>%
group_by(Name) %>%
mutate(Target = Source %>% lead()) %>%
ungroup() %>%
filter(!is.na(Source) & !is.na(Target)) %>%
select(Source, Target, Name, VoteTransferred)
nodes <-
cleaned %>%
select(NameNode, Name, Node)
sankeyNetwork(Links = links, Nodes = nodes, Source = 'Source',
Target = 'Target', Value = 'VoteTransferred', NodeID = 'NameNode',
LinkGroup = 'Name', fontSize = 12)
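A likely reason the original attempt rendered a blank viewer is that its Nodes data frame was taken from ACT after the rows without a Target had been filtered out, so the zero-based Source/Target values no longer matched row positions in Nodes. The version above avoids this by building `nodes` from `cleaned` before the links are filtered. A quick sanity check along these lines can catch the mismatch before plotting; `links_match_nodes` is a hypothetical helper name, and the data are toy values:

```r
# Toy nodes and links in the shape sankeyNetwork() expects
nodes <- data.frame(NameNode = c("A 1", "A 2", "B 1", "B 2"))
links <- data.frame(Source = c(0, 2), Target = c(1, 3),
                    VoteTransferred = c(10, 5))

# Hypothetical helper: TRUE when every link endpoint is a valid
# zero-based row index into `nodes`
links_match_nodes <- function(links, nodes) {
  idx <- c(links$Source, links$Target)
  all(!is.na(idx)) && all(idx >= 0) && all(idx <= nrow(nodes) - 1)
}

links_match_nodes(links, nodes)                       # all indices in range
links_match_nodes(links, nodes[1:3, , drop = FALSE])  # index 3 has no node row
```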