bupaR - changing the metric the process map displays? - r

I've gone through their documentation (here & here), but I don't see a simple way to tweak this. For instance if I wanted the lines to show time spent in minutes and the bubbles to show the distinct # of cases that appeared there instead of absolute/relative frequency - ex: when looking at incident case data.
Do I have to render the process map myself via DiagrammeR? I was trying to find some a similar example in DiagrammeR but I couldn't find much.
Any tips / good examples I could reference? I suppose its because this is quite new. I did find this older article but I wasn't sure how to connect the dots. Is it worth continuing my analysis in bupaR or should I leverage a different process map generating library?

This wasn't possible so far, but it is now due to a recent change on the github repo. You can now have frequencies on nodes and times on arrow (or vice versa).
What you do is, first install the github version of the package:
devtools::install_github("gertjanssenswillen/processmapr")
then, use the new type_edges and type_nodes arguments, in the same way as you would use the type argument before.
data %>% process_map(type_nodes = frequency(value = "absolute_case"), type_edges = performance(FUN = mean, units = "min"))

Related

Intersection and difference of PostGIS data using R

I am an absolute beginner in PostgreSQL and PostGIS (databases in general) but have a fairly good working experience in R. I have two multi-polygon data sets of vulnerable areas of India from two different sources - one is around 12gb and it's in .gdb format (let's call it mygdb) and the other is a shapefile around 2gb (let's call it myshp). I want to compare the two sets of vulnerability maps and generate some state-wise measures of fit using intersection (I), difference (D), and union (U) between the maps.
I would like to make use of PostGIS functionalities (via R) as neither R (crashes!) nor qgis (too slow) is efficient for this. To start with, I have uploaded both data sets in my PostGIS database. I used ogr2ogr in R to upload mygdb. But I am kind of stuck at this point. My idea is to split both polygon files by states and then apply other functions to get I, U and D. From my search, I think I can use sf functions like st_split, st_intersect, st_difference, and st_union. However, even after splitting, I would imagine that the file sizes will be still too large for r to process, so my questions are
Is my approach the best way forward?
How can I use sf::st_ functions (e.g. st_split, st_intersection) without importing the data from database into R
There are some useful answers to previous relevant questions, like this one for example. But I find it hard to put the steps together from different links and any help with a dummy example would be great. Many thanks in advance.
Maybe you could try loading it as a stars proxy. It doesn't load the file to the memory, it applies it directly to the hard drive.
https://r-spatial.github.io/stars/articles/stars2.html
Not answer for question sensu stricte, however in response to request in comment, an example of postgresql/postgis query for ST_Intersection. Based on OSM data in postgresql database imported with osm2pgsql:
WITH
highway AS (
select osm_id, way from planet_osm_line where osm_id = 332054927),
dln AS (
select osm_id, way from planet_osm_polygon where "boundary" = 'administrative'
and "admin_level" = '4' and "ref" = 'DS')
SELECT ST_Intersection(dln.way, highway.way) FROM highway, dln

Adding skipped/ungraded open-ended questions

Is there a way to include open-ended/free-form questions that are ungraded or skipped by r-exams?
Use case: we want to have an exam with mostly multiple choice questions using the package and its grading capability, but also have 5-10 open ended questions that are printed in the same exam. Ideally, r-exams would provide the grade for the first MCQ section, and we could manually add the grade of the open-ended questions.
I forked the package and made some small changes that allows one to control how many questions are printed on the first page and to remove the string-question pages.
The new parameters are number_of_closed_questions and include_string_pages. It is far away from being ideal, but works for me.
As an example let us have 6 mpc/single-choice questions and one essay question (essayreg):
# install devtools if you do not have it!
# install the fork
devtools::install_github("johannes-titz/exams")
library("exams")
myexam <- list(
"tstat2.Rnw",
"ttest.Rnw",
"relfreq.Rnw",
"anova.Rnw",
c("boxplots.Rnw", "scatterplot.Rnw"),
"cholesky.Rnw",
"essayreg.Rnw"
)
set.seed(403)
ex1 <- exams2nops(myexam, n = 2,
dir = "nops_pdf", name = "demo", date = "2015-07-29",
number_of_closed_questions = 6, include_string_pages = FALSE)
This will produce only 6 questions on the front page (instead of 7) and will also exclude the string-question pages.
If you want normal behavior, just exclude the new parameters. Obviously, one will have to set the number of closed questions manually, so one should be really careful.
I guess one could automatically detect how many string questions are loaded and from this determine the number of open-ended/closed-ended questions, but I currently do not have the time to write this and the presented solution is usable for my case.
I am not 100% sure that the scans will work this way, but I assume there should not be any bigger problems as I did not really change much. Maybe Achim Zeileis could comment on that? See my commit: https://github.com/johannes-titz/exams/commit/def044e7e171ea032df3553acec0ea0590ae7f5e
There is built-in support for up to three open-ended "string" questions that are printed on a separate sheet that has to be marked by hand. The resulting sheet can then be scanned and evaluated along with the main sheet using nops_scan() and nops_eval(). It's on the wish list for the package to extend that number but it hasn't been implemented yet.
Another "trick" you could do is to use the pages= argument of exams2nops() to include a separate PDF sheet with the extra questions. But this would have to be handled completely separately "by hand" afterwards.

How to check that a user-defined function works in r?

THis is probably a very silly question, but how can I check if a function written by myself will work or not?
I'm writing a not very simple function involving many other functions and loops and was wondering if there are any ways to check for errors/bugs, or simply just check if the function will work. Do I just create a simple fake data frame and test on it?
As suggested by other users in the comment, I have added the part of the function that I have written. So basically I have a data frame with good and bad data, and bad data are marked with flags. I want to write a function that allows me to produce plots as usual (with the flag points) when user sets flag.option to 1, and remove the flag points from the plot when user sets flag.option to 0.
AIR.plot <- function(mydata, flag.option) {
if (flag.option == 1) {
par(mfrow(2,1))
conc <- tapply(mydata$CO2, format(mydata$date, "%Y-%m-%d %T"), mean)
dates <- seq(mydata$date[1], mydata$date[nrow(mydata(mydata))], length = nrow(conc))
plot(dates, conc,
type = "p",
col = "blue",
xlab = "day",
ylab = "CO2"), error = function(e) plot.new(type = "n")
barplot(mydata$lines, horiz = TRUE, col = c("red", "blue")) # this is just a small bar plot on the bottom that specifies which sample-taking line (red or blue) is providing the samples
} else if (flag.option == 0) {
# I haven't figured out how to write this part yet but essentially I want to remove all
# of the rows with flags on
}
}
Thanks in advance, I'm not an experienced R user yet so please help me.
Before we (meaning, at my workplace) release any code to our production environment we run through a series of testing procedures to make sure our code behaves the way we want it to. It usually involves several people with different perspectives on the code.
Ideally, such verification should start before you write any code. Some questions you should be able to answer are:
What should the code do?
What inputs should it accept? (including type, ranges, etc)
What should the output look like?
How will it handle missing values?
How will it handle NULL values?
How will it handle zero-length values?
If you prepare a list of requirements and write your documentation before you begin writing any code, the probability of success goes up pretty quickly. Naturally, as you begin writing your code, you may find that your requirements need to be adjusted, or the function arguments need to be modified. That's okay, but document those changes when they happen.
While you are writing your function, use a package like assertthat or checkmate to write as many argument checks as you need in your code. Some of the best, most reliable code where I work consists of about 100 lines of argument checks and 3-4 lines of what the code actually is intended to do. It may seem like overkill, but you prevent a lot of problems from bad inputs that you never intended for users to provide.
When you've finished writing your function, you should at this point have a list of requirements and clearly documented expectations of your arguments. This is where you make use of the testthat package.
Write tests that verify all of the requirements you wrote are met.
Write tests that verify you can no put in unintended inputs and get the results you want.
Write tests that verify you get the output you intended on your test data.
Write tests that test any edge cases you can think of.
It can take a long time to write all of these tests, but once it is done, any further development is easier to check since anything that violates your existing requirements should fail the test.
That being said, I'm really bad at following this process in my own work. I have the tendency to write code, then document what I did. But the best code I've written has been where I've planned it out conceptually, wrote my documentation, coded, and then tested against my documentation.
As #antoine-sac pointed out in the links, some things cannot be checked programmatically; for example, if your function terminates.
Looking at it pragmatically, have a look at the packages assertthat and testthat. assertthat will help you insert checks of results "in between", testthat is for writing proper tests. Yes, the usual way of writing tests is creating a small test example including test data.

Nodes Size Relative to Associated Value

I'm fairly new to R and haven't been able to find an answer for this. Someone else asked a similar question, but no solution was ever reported. If I should have posted this Q on a different stackexchange, I apologize and will delete if it can't be migrated.
Using data I pulled from the FDIC on US based financial institutions and their total asset holdings, I would like to create a basic network graph where each node is proportionally sized to each other node in the graph. Each node would also be labeled with the name of the financial institution.
The edges of the graph actually don't matter for now, but I want each node connected to the network by at least one edge.
As of now, I've already successfully created a very basic network with 8 banks, connected by edges I randomly assigned, as shown here (I apparently can't embed pictures yet, sorry about that):
My .csv file will be formatted as:
id, bank, assets
1, JP Morgan Chase, 16928000
2, Bank of America, 19075000
... ... ...
For the graph I already created, it is the same as above except without the asset column. It was also only 8 banks, where the file I hope to use will have 25.
Like I already said, as for edges, I just randomly assigned some. If someone knows an easier way of creating random edges that connect the nodes I create, please let me know. Otherwise, this is how my file is formatted as of now:
to, from
1, 2
1, 3
...
And I created the graph I linked with the following commands:
> nodes <- read.csv("~/foo/foo/foo.csv")
> links <- read.csv("~/blah/taco/burrito/blah.csv")
> net <- graph_from_data_frame(d=links, vertices = nodes, directed = F)
> class(net)
> net
IGRAPH UN-- 8 10 --
+ attr: name (v/c), bank (v/c)
+ edges (vertex names):
[1] 1--2 1--3 1--4 1--5 2--3 2--4 2--7 4--5 5--8 7--8
> plot(net, main = "Financial Intermediaries", edge.arrow.size=.4, vertex.size=25, vertex.label.cex=1.5, vertex.label.color="black", vertex.label=V(net)$bank)
I hope I was clear with my problem and gave the necessary details/code. If not, please just let me know and I'll post it up here. Like I said, I'm really new to R (I literally picked it up today, lol), and much of the code I've used so far was less or more taken from Katya Ognyanova's examples/presentations on her blog.
For the sake of clarity, I'm currently using RStudio (most recent stable) and R v3.2.5.
I have been only using the igraph package, but if what I want can't be done with that, I am more than willing to switch over to a different package. That said, I would like to stay with R (unless there really is something so much easier for this it can't be ignored. I would like to stick with and learn R).
Thank you for any and all help, I really appreciate it.
as #Osssan linked to in the comments, there was a partial solution floating around.
That said, I think I created more of a 'hack' solution than a proper one with what I gleaned from the previous question. Here is what I did.
In my csv file, I had four columns. In the third column, I had the asset's for a given bank. NOTE Since I don't know how to do data manipulation inside of R, I had to do some work to adjust the size of the asset value so that it did not result in nodes that covered the entirety of the graph. With my solution, you will NOT get nodes that are relative in size automatically. You must do that first.
Since I wanted to create a network with nodes(banks) that were variable in size according to their respective asset holdings, what I did was create a separate vector like so
> df <- read.csv("~/blah/blah/blah.csv", colClasses = c("NULL","NULL", NA, "NULL"))
What this command does is read in the csv file, looks at the headings with colClasses and tell the interpreter to vacuum up all columns specified (non-NULL). With this vector, I then plugged it into my the plot function as such:
> plot(net, main = "Financial Intermediaries", edge.arrow.size=.4, vertex.size=as.matrix(df), vertex.label.color="black")
where I make a matrix using the as.matrix(df) and set it to vertex.size=. Given a vector of only one dimension, R is able to quickly make the appropriate matrix (I guess).
I still have to do some relabeling and connecting with edges, but it worked in graphing. I graphed the largest 26 commercial banks by total asset holdings (and adjusted them to % of total commercial bank assets in the US), so you will see that the size of nodes increase from 26-1. Here's the output.
Like I said, this solution works, but I am far from sure whether it would be considered proper or kosher. I welcome anyone to edit this solution so that it clarifies what is actually happening with my code and or post a proper/optimized solution if it exists. I'm going to give this post a solid few days before marking it solved, as I would like to still get a solid answer on this confusing problem.
P.S. If anyone knows of a way to force nodes not to overlap, I would appreciate a comment explaining how to do that. If you look at my picture, you'll see that the effect of dwarfing the other nodes is diminished when the largest node is covered by it's closely sized peers.

Masking a variable in GrADS

I am trying to plot a variable that is in a NetCDF file using GrADS and I would like to plot only the values that are smaller than -20 (could be any other number as an example). I can't find a way to do it though. I saw several examples of variable substitution using both maskoutand const (for example define ones = const(const(maskout(p,p-10),1),0,-u), which is here) but I couldn't make that work for my purposes.
I want the variable at a given point not plotted in case its value is below -20, and not this value change to another one in such point.
Solved it using the command maskout(p,p+20). The badly written documentation is what delayed me on this one. Answered here so that the community may benefit.
Thanks.

Resources