How to set a "formatted value" in googleVis?

I am using googleVis and shiny to (automatically) create an organizational chart.
Similar to this question:
Google Visualization: Organizational Chart with fields named the same, I want to use formatted values in googleVis so that I can create fields in an organizational chart which have the same name. I suspect it has something to do with roles, but I cannot figure out the correct syntax.
The help page for gvisOrgChart mentions formatted values but does not say how to set them:
"You can specify a formatted value to show on the chart instead, but the unformatted value is still used as the ID."
## modified example from help page
library(googleVis)
Regions[7,1] = Regions[8,1] # artificially create duplicated name in another parent node
Org <- gvisOrgChart(Regions)
plot(Org)
In the above example the duplicated name (Mexico) is only shown once in the chart. I want both of them to be drawn (one under the Europe parent node and one under the America parent node).
Thank you for your help
cateraner

After talking to one of the developers of the googleVis package, I now have the solution to the problem. The formatted value contains extra quotation marks, which have to be removed before the text is usable as HTML.
## modified example from help page
library(googleVis)
# add new entry
levels(Regions$Region) = c(levels(Regions$Region), "{v: 'Germany.2', f: 'Germany'}")
Regions[8,1] = "{v: 'Germany.2', f: 'Germany'}"
Org <- gvisOrgChart(Regions)
# remove extra quotation marks
Org$html$chart <- gsub("\"\\{v", "\\{v", Org$html$chart)
Org$html$chart <- gsub("\\}\"", "\\}", Org$html$chart)
plot(Org)
In the resulting graph you have "Germany" twice, one under the node "America" and one under "Europe". In the same way you could add HTML formatting to your text (color, font, etc.).
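For example, a minimal sketch of the HTML idea (the red font tag and the 'Germany.3' ID in row 7 are illustrative assumptions, not from the thread):
# add an HTML-formatted entry: red label, distinct ID
levels(Regions$Region) <- c(levels(Regions$Region),
                            "{v: 'Germany.3', f: '<font color=\"red\">Germany</font>'}")
Regions[7, 1] <- "{v: 'Germany.3', f: '<font color=\"red\">Germany</font>'}"
Org <- gvisOrgChart(Regions)
# remove the extra quotation marks, as above
Org$html$chart <- gsub("\"\\{v", "\\{v", Org$html$chart)
Org$html$chart <- gsub("\\}\"", "\\}", Org$html$chart)
plot(Org)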
Thanks to Markus Gesmann for helping me with that.

Related

Looping variables in the parameters of the YAML header of an R Markdown file and automatically outputting a PDF for each variable

I am applying for junior data analyst positions and have come to the realization that I will be sending out a lot of cover letters.
To (somewhat) ease the pain and suffering that this will entail, I want to automate the parts of the cover letter that are suited for automation and will be using R Markdown to (hopefully) achieve this.
For the purposes of this question, let's say that the parts I am looking to automate are the position applied for and the company looking to hire someone for that position, to be used in the header of the cover letter.
These are the steps I envision in my mind's eye:
Gather the positions of interest and the corresponding companies in an Excel spreadsheet. This gives an Excel sheet with two columns holding the variables position and company, respectively.
Read the Excel file into the R Markdown as a data frame/tibble (let's call this jobs).
Define two parameters in the YAML header of the .Rmd file to look something like this:
---
output: pdf_document
params:
  position: jobs$position[i]
  company: jobs$company[i]
---
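(Worth noting, as the edit below bears out: YAML values like jobs$position[i] are read as literal strings, not evaluated R code, so the header can only hold placeholder defaults that get overridden at render time, e.g.:)
---
output: pdf_document
params:
  position: "Position"
  company: "Company"
---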
The heading of the cover letter would then look something like this:
"Application for the position as r params$position at r params$company"
To summarize: In order to not have to change the values of the parameters manually for each cover letter, I would like to read an Excel file with the position titles and company names, loop these through the parameters in the YAML header, and then have R Markdown output a PDF for each pair of position and company (and ideally have the name of each PDF include the position title and company name for easier identification when sending the letters out). Is that possible? (Note: the position titles and company names do not necessarily have to be stored in an Excel file; that's just how I've collected them.)
Hopefully, the above makes clear what I am trying to achieve.
Any nudges in the right direction are greatly appreciated!
EDIT (11 July 2021):
I have partly arrived at an answer to this.
The trick is to define a function that includes the rmarkdown::render function. This function can then be included in a nested for-loop to produce the desired PDF files.
Again, assuming that I want to automate the position and the company, I defined the rendering function as follows (in a script separate from the "main" .Rmd file containing the text [named "loop_test.Rmd" here]):
render_function <- function(position, company){
  rmarkdown::render(
    # Name of the 'main' .Rmd file
    'loop_test.Rmd',
    # What should the output PDF files be called?
    output_file = paste0(position, '-', company, '.pdf'),
    # Define the parameters that are used in the 'main' .Rmd file
    params = list(position = position, company = company),
    envir = parent.frame()
  )
}
Then, use the function in a for-loop:
for (position in positions$position) {
  for (company in positions$company) {
    render_function(position, company)
  }
}
Where the Excel file containing the relevant positions is called positions with two variables called position and company.
I tested this method using 3 "observations" for a position and a company, respectively ("Company 1", "Company 2" and "Company 3" and "Position 1", "Position 2" and "Position 3"). One problem with the above method is that it produces 3^2 = 9 reports. For example, Position 1 is used in letters for Company 1, Company 2 and Company 3. I obviously only want to match outputs for Company 1 and Position 1. Does anyone have any idea on how to achieve this? This is quite unproblematic for two variables with only three observations, but my intent is to use several additional parameters. The number of companies (i.e. "observations") is, unfortunately, also highly likely to be quite numerous before I can end my search... With, say, 5-6 parameters and 20 companies, the number of reports output will obviously become ridiculous.
As said, I am almost there, but any nudges in the right direction for how to restrict the output to only "match" the company with the position would be highly appreciated.
You can iterate over the rows like below.
for(i in 1:nrow(positions)) {
  render_function(positions$position[i], positions$company[i])
}
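A compact alternative to the explicit loop (a sketch under the same assumptions about the positions data frame) is to let Map() pair the two columns row by row:
# Map() walks the two columns in parallel, so each position is
# rendered only with the company from the same row
invisible(Map(render_function, positions$position, positions$company))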

how to remove "Modal state sequence" field from seqmsplot

Hi, I would like to make a plot of the sequence of modal states (seqmsplot in TraMineR), but because I am making a figure which consists of some other plots, I would like to remove the built-in subtitle which says "Modal state sequence ..." because this is affecting the height of the y-axis. Does anyone know how I can remove this in seqmsplot and use main="Modal state sequence" instead?
Below is a picture (from TraMineR website) which shows which part I would like to remove.
Up to TraMineR v 2.2-2, the display of the info about the frequency of the sequence of modal states was hard coded.
The development version 2.3-3 (available on R-Forge), which will eventually become 2.2-3 on CRAN, offers a new argument, info, in the plot method plot.stslist.modst invoked by seqmsplot.
With this new version, you can suppress the info with seqmsplot(..., info=FALSE). For example, using the mvad data:
data(mvad)
m.seq <- seqdef(mvad[,17:86], xtstep=3)
seqmsplot(m.seq, info=FALSE)
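And to add back the plain title the question asked for (assuming main is passed through to the plot, as in the other TraMineR plotting functions):
# suppress the built-in info line and supply a custom title instead
seqmsplot(m.seq, info = FALSE, main = "Modal state sequence")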

Place line break within a single cell in R data frame

I am working on a Shiny app that collects some user inputs, performs some data processing and finally sends an email to the user with the required output.
While creating the table that I need to send to the user, there is one cell within the table where I need to include line breaks.
You can refer to the image below for the desired output:
I have tried using the R code below:
a <- data.frame("SrNo" = 1, "Description" = "Name: John \n Age: 45")
I have also tried other variations of line breaks, such as \\n (with two backslashes) and \r\n.
However, I only get the following output, with a space rather than a line break.
I have also tried implementing the solution provided at the following link, although without much success.
line break within cell for huxtable table
Would be great if you can help me place the line break within the Description column of the data frame.
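One common approach, sketched here under the assumption that the emailed table is rendered as HTML (the <br/> tag and the knitr::kable call are illustrative suggestions, not from the thread):
library(knitr)
# embed an HTML line break in the cell and stop kable from escaping it
a <- data.frame("SrNo" = 1, "Description" = "Name: John<br/>Age: 45")
kable(a, format = "html", escape = FALSE)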

Loop multiple webpages in R

Sorry, this might be too involved a question to ask here. I'm trying to reproduce the Hack Session for the NYTimes Dialect Map Visualisation, located here. I'm OK in the beginning, but then I run into a problem when I try to scrape multiple pages.
To save people from having to reproduce info from the slides, this is what I have so far:
Load the required packages and create the URL addresses:
library(RCurl)
library(XML)
mainURL <- 'http://www4.uwm.edu/FLL/linguistics/dialect/staticmaps/'
stateURL <- 'states.html'
url <- paste0(mainURL, stateURL)
Download and Parse
tmp <- getURL(url)
tmp <- htmlTreeParse(tmp, useInternalNodes = TRUE)
Extract page addresses and save to subURL
subURL <- unlist(xpathSApply(tmp, '//a[@href]', xmlAttrs))
Remove pages that aren't state's names
subURL <- subURL[-(1:4)]
The problem begins for me on slide 24 in the original. The slides say that the next step is to loop over the list of states and read the body of each question. Of course, we also need to save the name of each state in the process. The loop is initialized with the following code:
survey <- vector(length(subURL), mode = "list")
i = 1
stateNames <- rep('', length(subURL))
Underneath this code, the slide says that survey is a list where information about every state is saved. I'm a little puzzled here about how that is the case, since survey is indeed a list with a length of 51, but every element is NULL. I'm also puzzled by what the i is doing here (and this becomes important later). Still, I can follow what the code is doing, and I assumed that the list would get populated later.
It's really the next slide where I get confused. There, it is shown how the URL contains the name of each state, using Alaska as an example:
Create URL for the first state and assign to suburl
suburl <- subURL[1]
Remove state_ from suburl
stateName <- gsub('state_','',suburl)
Remove .html from stateName
stateName <- gsub('.html','',stateName)
So far, so good. I can do this for each state individually. However, I can't figure out how to turn this into a loop that would apply to all the states. The slide only has the following code:
stateNames[i] <- stateName
This is where I am stuck. The previous slide assigned 1 to i, so the only thing this does is get the name for Alaska (AK), but every other element is "" (as one would expect, given how stateNames was defined previously).
I did try the following:
stateNames <- gsub('state_','',subURL)
stateNames <- gsub('.html','',stateNames)
This doesn't quite work, because the length of this vector is 51, but the length of the one shown above is only 1. (Later, I want each state to have its own name, not for all the states to share the same vector of 51 names.) Moreover, I didn't know what to do with the stateNames[i] <- stateName command.
Anyways, I kept working through to the end (both with the original and the modification), hoping that things would eventually right themselves (and at times I got the same as what was on the presentation), but eventually things just broke. I think there is an additional problem later on in the slides (an object is subsetted that didn't exist before), but I'm guessing it stems from a problem that occurs much earlier.
Anyways, I know this is a pretty involved question, so I apologize if it is inappropriate for this site. I'm just stuck.
I believe I got this to work. See the gist or see here for the solution.
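For reference, a minimal sketch of the loop the slides appear to be building towards, assembled from the pieces quoted above (an assumption about the approach, not a copy of the gist):
# loop over every state page, saving each state's name and parsed page body
for (i in seq_along(subURL)) {
  suburl <- subURL[i]
  # strip 'state_' and '.html' to recover the state name
  stateName <- gsub('state_', '', suburl)
  stateName <- gsub('.html', '', stateName)
  stateNames[i] <- stateName
  # download and parse this state's page into the survey list
  page <- getURL(paste0(mainURL, suburl))
  survey[[i]] <- htmlTreeParse(page, useInternalNodes = TRUE)
}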

Creating a dataset from an XML file in R statistics

I am trying to download an XML file of journal article records and create a dataset for further interrogation in R. I'm completely new to XML and quite a novice at R. I cobbled together some code using bits of code from two sources:
GoogleScholarXScraper
and
Extracting records from pubMed
library(RCurl)
library(XML)
library(stringr)
#Search terms
SearchString<-"cancer+small+cell+non+lung+survival+plastic"
mySearch<-str_c("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=",SearchString,"&usehistory=y",sep="",collapse=NULL)
#Search
pub.esearch<-getURL(mySearch)
#Extract QueryKey and WebEnv
pub.esearch<-xmlTreeParse(pub.esearch,asText=TRUE)
key<-as.numeric(xmlValue(pub.esearch[["doc"]][["eSearchResult"]][["QueryKey"]]))
env<-xmlValue(pub.esearch[["doc"]][["eSearchResult"]][["WebEnv"]])
#Fetch Records
myFetch<-str_c("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&WebEnv=",env,"&retmode=xml&query_key=",key)
pub.efetch<-getURL(myFetch)
myxml<-xmlTreeParse(pub.efetch,asText=TRUE,useInternalNodes=TRUE)
#Create dataset of article characteristics #This doesn't work
pub.data<-NULL
pub.data<-data.frame(
journal <- xpathSApply(myxml,"//PubmedArticle/MedlineCitation/MedlineJournalInfo/MedlineTA", xmlValue),
abstract<- xpathSApply(myxml,"//PubmedArticle/MedlineCitation/Article/Abstract/AbstractText",xmlValue),
affiliation<-xpathSApply(myxml,"//PubmedArticle/MedlineCitation/Article/Affiliation", xmlValue),
year<-xpathSApply(myxml,"//PubmedArticle/MedlineCitation/Article/Journal/JournalIssue/PubDate/Year", xmlValue)
,stringsAsFactors=FALSE)
The main problem I seem to have is that my returned XML file is not completely uniformly structured. For example, some references have a node structure like this:
- <Abstract>
<AbstractText>The Wilms' tumor gene... </AbstractText>
Whilst some have labels and are like this
- <Abstract>
<AbstractText Label="BACKGROUND & AIMS" NlmCategory="OBJECTIVE">Some background text.</AbstractText>
<AbstractText Label="METHODS" NlmCategory="METHODS"> Some text on methods.</AbstractText>
When I extract the 'AbstractText' I am hoping to get 24 rows of data back (there are 24 records when I run this made-up search today), but xpathSApply returns all labels within 'AbstractText' as individual elements of my data frame. Is there a way to collapse the XML structure in this instance / ignore the labels? Is there a way to make xpathSApply return 'NA' when nothing is found at the end of a path? I am aware of xmlToDataFrame, which sounds like it should fit the bill, but whenever I try to use it, it doesn't seem to give me anything sensible.
Thanks for your help
I am unsure which one you want; however:
xpathSApply(myxml,"//*/AbstractText[@Label]")
will get the nodes with labels (keeping all attributes etc).
xpathSApply(myxml,"//*/AbstractText[not(@Label)]",xmlValue)
will get the nodes without labels.
EDIT:
test<-xpathApply(myxml,"//*/Abstract",xmlValue)
> length(test)
[1] 24
may give you what you want
EDIT:
To get affiliation, year, etc. padded with NAs:
dumfun <- function(x, xstr){
  res <- xpathSApply(x, xstr, xmlValue)
  if(length(res) == 0){
    out <- NA
  } else {
    out <- res
  }
  out
}
xpathSApply(myxml,"//*/Article",dumfun,xstr='./Affiliation')
xpathSApply(myxml,"//*/Article",dumfun,xstr='./Journal/JournalIssue/PubDate/Year')
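To tie this back to the data frame the question started from, a short sketch using the same helper (the column names are illustrative):
# one value per Article, padded with NA where a node is missing, so columns align
affiliation <- xpathSApply(myxml, "//*/Article", dumfun, xstr = './Affiliation')
year <- xpathSApply(myxml, "//*/Article", dumfun, xstr = './Journal/JournalIssue/PubDate/Year')
pub.data <- data.frame(affiliation, year, stringsAsFactors = FALSE)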
