Retrieve information from activities in processmapR

I have a problem with processmapR.
Take the event log "patients" from the package eventdataR as an example.
We can plot the business process with process_map(patients).
Now I want to find out which patient IDs have passed through a node just by selecting that node in the "Viewer".
How can I do it?
library(eventdataR)
library(processmapR)
process_map(patients)

The graph generated by process_map is static, so you can't get much information out of it by clicking on it.
You can, however, add any tooltip you want to any node of the graph.
Call process_map with the option render=FALSE, save the result to a variable, and then change it as you wish. For instance:
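library(DiagrammeR)  # provides render_graph()
gr <- process_map(patients, render=FALSE)  # return the graph object instead of drawing it
gr$nodes_df$tooltip[1] <- "Just a test!"   # overwrite the tooltip of the first node
render_graph(gr)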
The code above is for illustration only and is not meant to be a good way of doing this job.
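To get from a test tooltip to actual patient IDs, one option is to build one string of IDs per activity first. This is a minimal sketch in base R on a toy data frame: the column names patient and handling are assumptions standing in for the case and activity columns of the event log, and matching the resulting strings to the rows of gr$nodes_df is left open because the node label format is not shown here.

```r
# Toy stand-in for an event log (hypothetical data, not the real "patients" log).
events <- data.frame(
  patient  = c("1", "1", "2", "3"),
  handling = c("Registration", "Triage", "Registration", "Registration")
)

# One comma-separated string of distinct patient IDs per activity.
ids_per_activity <- tapply(
  events$patient,
  events$handling,
  function(p) paste(unique(p), collapse = ", ")
)

print(ids_per_activity)
```

Each element of ids_per_activity could then be assigned to the tooltip of the node that represents the same activity.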

Related

how to remove "Modal state sequence" field from seqmsplot

Hi, I would like to plot the sequence of modal states (seqmsplot in TraMineR). Because I am making a figure that consists of several other plots, I would like to remove the built-in subtitle that says "Modal state sequence ...", since it affects the height of the y-axis. Does anyone know how I can remove this in seqmsplot and use main="Modal state sequence" instead?
Below is a picture (from TraMineR website) which shows which part I would like to remove.
Up to TraMineR v 2.2-2, the display of the info about the frequency of the sequence of modal states was hard-coded.
The development version 2.3-3 (available on R-Forge), which will eventually become 2.2-3 on CRAN, offers a new argument info in the plot method plot.stslist.modst invoked by seqmsplot.
With this new version, you can suppress the info with seqmsplot(..., info=FALSE). For example, using the mvad data:
library(TraMineR)
data(mvad)
m.seq <- seqdef(mvad[,17:86], xtstep=3)  # build the state sequence object
seqmsplot(m.seq, info=FALSE)             # plot without the built-in info line

How do I scrape this text from a 2004 Wayback machine site/why is the code I'm running wrong?

Note: I haven't asked a question here before and am still not sure how to make this legible, so let me know about anything confusing or any tips for making this more readable.
I'm trying to download user information from the 2004/06 to 2004/09 Internet Archive captures of makeoutclub.com (a wacky, now-defunct social network targeted toward alternative music fans, which was created in ~2000, making it one of the oldest profile-based social networks on the Internet) using R,* specifically the Rcrawler package.
So far, I've been able to use the package to get the usernames and profile links into a data frame, using xpath to identify the elements I want, but somehow it doesn't work for either the location or the interests sections of the profiles, both of which are plain text rather than other elements in the HTML. For an idea of the site/data I'm talking about, here's the page I've been testing my xpath on: https://web.archive.org/web/20040805155243/http://www.makeoutclub.com/03/profile/html/boys/2.html
I have been testing my xpath expressions with Rcrawler's ContentScraper function, which extracts the set of elements matching the specified xpath from one specific page of the site you need to crawl. Here is my functioning expression, which identifies the usernames and links on the specific page I'm using and returns a vector:
testwaybacktable <- ContentScraper(Url = "https://web.archive.org/web/20040805155243/http://www.makeoutclub.com/03/profile/html/boys/2.html", XpathPatterns = c("//tr[1]/td/font/a[1]/@href", "//tr[1]/td/font/a[1]"), ManyPerPattern = TRUE)
And here is the bad one, where I'm testing the "location", which ends up returning an empty vector:
testwaybacklocations <- ContentScraper(Url = "https://web.archive.org/web/20040805155243/http://www.makeoutclub.com/03/profile/html/boys/2.html", XpathPatterns = "//td/table/tbody/tr[1]/td/font/text()[2]", ManyPerPattern = TRUE)
And the other bad one, this one looking for the text under "interests":
testwaybackint <- ContentScraper(Url = "https://web.archive.org/web/20040805155243/http://www.makeoutclub.com/03/profile/html/boys/2.html", XpathPatterns = "//td/table/tbody/tr[2]/td/font/text()", ManyPerPattern = TRUE)
The xpath expressions I'm using here seem to select the right elements when I search for them in Chrome's Inspect tool, but the program doesn't seem to read them. I have also tried selecting only one element for each field, and it still produced an empty vector. I know that this tool can read text on this webpage (I tested another random piece of text), but somehow I'm getting nothing when I run this test.
Is there something wrong with my xpath expression? Should I be using different tools to do this?
Thanks for your patience!
*This is for a digital humanities project that will hopefully use some NLP to analyze language, especially around gender and sexuality, in dialogue with some NLP analysis of the lyrics of the most popular bands on the site.
A late answer, but maybe it will help nonetheless. I am also not sure about the whole TOS question, but I think that's yours to figure out. Long story short ... I will just try to address the technical aspects of your problem ;)
I am not familiar with the Rcrawler package. I usually use rvest for web scraping and I think it is a good choice. To achieve the desired output you would have to use something like
# parameters
url <- your_url
xpath_pattern <- your_pattern
# get the data
wp <- xml2::read_html(url)
# extract whatever you need
res <- rvest::html_nodes(wp,xpath=xpath_pattern)
I think it is not possible to use a vector with multiple elements as the pattern argument, but you can run html_nodes for each pattern you want to extract separately.
I think the first two URLs/patterns should work this way. The pattern in your last URL seems to be wrong somehow. If you want to extract the text inside the tables, it should probably be something like "//tr[2]/td/font/text()[2]"
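Running html_nodes once per pattern can be sketched as below. The HTML string is a hypothetical stand-in, not the real page markup, and the pattern names are made up for illustration. The last pattern also shows one possible cause of the empty results, which is an assumption and not confirmed for this page: browsers add tbody elements to tables in the Inspect view, while the parser behind xml2 only sees what is in the raw HTML, so an xpath that goes through tbody can match nothing.

```r
library(xml2)
library(rvest)

# Hypothetical stand-in for one profile table; not the real page markup.
page <- read_html(
  "<html><body><table>
     <tr><td><font><a href='/profile/1'>some_user</a></font></td></tr>
     <tr><td><font><b>interests</b>records and zines</font></td></tr>
   </table></body></html>"
)

patterns <- c(
  link      = "//tr[1]/td/font/a[1]/@href",
  name      = "//tr[1]/td/font/a[1]",
  interests = "//tr[2]/td/font/text()",
  via_tbody = "//table/tbody/tr[2]/td/font/text()"  # no tbody in the raw HTML
)

# html_nodes() takes one xpath at a time, so loop over the patterns.
results <- lapply(patterns, function(p) html_text(html_nodes(page, xpath = p)))

str(results)
```

If the via_tbody entry comes back empty while the others do not, dropping tbody from the expression is worth trying on the real page.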

Execute multiple sets of lines from another R file

I asked this before, but maybe I didn't ask precisely enough.
From my master R file I want to run other, quite long R files. At first glance that's easy to accomplish with source().
The point is that they are so long that I don't want to run all of each file, just certain parts of it. Someone on my former post showed me this hidden gem, but both solutions run from point A to point B.
What I want is to run another file from my file, starting at line x and running to line x+z, then skip a certain number of rows, and then continue to run the same file from line y to y+z.
The solution in the link I attached works and is great, but I can't skip rows (this coding is above my skill) without creating another function and setting more start and end points.
Is it possible to call something like source(df.R, excludeLine(1:6, 20, 30:end))?
Slightly modifying this excellent answer should work:
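sourcePartial <- function(fn, startTag1='#from here1', endTag1='#to here1',
                          startTag2='#from here2', endTag2='#to here2') {
  # read the file as a character vector, one element per line
  lines <- scan(fn, what=character(), sep="\n", quiet=TRUE)
  # locate the tag lines that delimit the two blocks
  st1 <- grep(startTag1, lines)
  en1 <- grep(endTag1, lines)
  st2 <- grep(startTag2, lines)
  en2 <- grep(endTag2, lines)
  # source only the lines strictly between each pair of tags
  tc <- textConnection(lines[c((st1+1):(en1-1), (st2+1):(en2-1))])
  source(tc)
  close(tc)
}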
But really, just have a go yourself next time and you might learn...
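If tagging the file is not an option, the same idea can be driven by line numbers. This is a minimal sketch, not a tested replacement for the function above: source_lines and its keep argument are hypothetical names, and it assumes the selected lines form complete R expressions once the skipped lines are removed.

```r
# Run only the given line numbers of a file (hypothetical helper).
source_lines <- function(file, keep, envir = parent.frame()) {
  lines <- readLines(file)
  keep <- keep[keep >= 1 & keep <= length(lines)]  # ignore out-of-range numbers
  exprs <- parse(text = lines[keep], keep.source = FALSE)
  for (e in exprs) eval(e, envir = envir)
  invisible(NULL)
}

# Small demonstration on a temporary script.
script <- tempfile(fileext = ".R")
writeLines(c("a <- 1", "b <- 2", "stop('this line is skipped')", "d <- a + b"), script)

env <- new.env()
source_lines(script, keep = c(1:2, 4), envir = env)
print(get("d", envir = env))
```

To exclude lines instead, pass something like setdiff(seq_along(readLines(file)), lines_to_skip) as keep.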

How to set a "formatted value" in googleVis?

I am using googleVis and shiny to (automatically) create an organizational chart.
Similar to this question:
Google Visualization: Organizational Chart with fields named the same, I want to use formatted values in googleVis to be able to create fields in an organizational chart that have the same name. I suspect it has something to do with roles, but I cannot figure out the correct syntax.
The help page for gvisOrgChart mentions formatted values but does not say how to set them:
"You can specify a formatted value to show on the chart instead, but the unformatted value is still used as the ID."
## modified example from help page
library(googleVis)
Regions[7,1] = Regions[8,1] # artificially create duplicated name in another parent node
Org <- gvisOrgChart(Regions)
plot(Org)
In the above example the duplicated name (Mexico) is only shown once in the chart. I want both of them to be drawn (One in the Europe and one in the America parent node).
Thank you for your help
cateraner
After talking to one of the developers of the googleVis package, I now have the solution to the problem. The formatted value ends up wrapped in extra quotation marks, which have to be removed before the text is usable as HTML.
## modified example from help page
library(googleVis)
# add new entry
levels(Regions$Region) = c(levels(Regions$Region), "{v: 'Germany.2', f: 'Germany'}")
Regions[8,1] = "{v: 'Germany.2', f: 'Germany'}"
Org <- gvisOrgChart(Regions)
# remove extra quotation marks
Org$html$chart <- gsub("\"\\{v", "\\{v", Org$html$chart)
Org$html$chart <- gsub("\\}\"", "\\}", Org$html$chart)
plot(Org)
In the resulting graph "Germany" appears twice, once under the node "America" and once under "Europe". In the same way you could add HTML formatting to your text (color, font, etc.).
Thanks to Markus Gesmann for helping me with that.
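What the two gsub calls do can be checked on a plain string. This is a small sketch in base R: chart_js is a made-up fragment standing in for the generated chart code, and fixed=TRUE is used here only to avoid regex escaping, which is a variation on the calls above rather than the original code.

```r
# Hypothetical fragment of generated chart code with a quoted {v: ..., f: ...} value.
chart_js <- "[\"{v: 'Germany.2', f: 'Germany'}\", \"America\", 10]"

# Drop the quotation mark before "{v" and the one after "}".
fixed <- gsub("\"{v", "{v", chart_js, fixed = TRUE)
fixed <- gsub("}\"", "}", fixed, fixed = TRUE)

cat(fixed, "\n")
```

After the replacement the value is an object literal with an ID (v) and a display text (f) instead of a plain string, while ordinary quoted strings such as "America" are left alone.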

R scripting in SPSS Modeler 16: change default "rowCount=1000" for modelerData

When applying an R Transform field operation node in SPSS Modeler, the system automatically adds the following code at the top of every script of mine to interface with the R add-on:
while(ibmspsscfdata.HasMoreData()){
modelerDataModel <- ibmspsscfdatamodel.GetDataModel()
modelerData <- ibmspsscfdata.GetData(rowCount=1000,missing=NA,rDate="None",logicalFields=FALSE)
Please note "rowCount=1000". When I process a table with more than 1000 rows (which is very common), errors occur.
I am looking for a way to change the default setting, or any other way to process tables with more than 1000 rows!
I've tried adding this at the beginning of my code and it works just fine:
while (ibmspsscfdata.HasMoreData()) {
  modelerData <- rbind(modelerData, ibmspsscfdata.GetData(rowCount=1000, missing=NA, rDate="None", logicalFields=FALSE))
}
Note that this will consume a lot of memory with "big data", and that the parameters of the .GetData() function should be set according to the "Read Data Options" in the node settings.
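On the memory point, calling rbind inside the loop copies the growing data frame on every pass. A common alternative is to collect the chunks in a list and bind them once at the end. The sketch below runs outside Modeler: make_chunk_reader, has_more_data and get_data are hypothetical stand-ins for the ibmspsscfdata functions, not part of any IBM API.

```r
# Hypothetical chunked reader that hands out a data frame 1000 rows at a time.
make_chunk_reader <- function(df, row_count = 1000) {
  pos <- 0
  list(
    has_more_data = function() pos < nrow(df),
    get_data = function() {
      idx <- seq(pos + 1, min(pos + row_count, nrow(df)))
      pos <<- max(idx)              # remember how far we have read
      df[idx, , drop = FALSE]
    }
  )
}

reader <- make_chunk_reader(data.frame(id = 1:2500, x = sqrt(1:2500)))

# Collect the chunks first, then bind them in a single call.
chunks <- list()
while (reader$has_more_data()) {
  chunks[[length(chunks) + 1]] <- reader$get_data()
}
all_rows <- do.call(rbind, chunks)

print(dim(all_rows))
```

Inside Modeler the loop body would call the real data functions instead of the stand-ins.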
