R: Graphing a table plan - r

General Goal
I am placing guests around tables according to a set of rules. My goal, in this post, is to have a handy function to display the names of these guests around their respective tables on an R graphic.
Input Data
guests describes the position of each guest at each table. Note that the number of guests per table varies.
guests = list(
table_1 = c("Jack", "Christelle", "Frank", "John S.", "Lucia"),
table_2 = c("George", "Amanda", "Alice", "Laura", "John H."),
table_3 = c("Jeanette", "Elizabeth", "Remi", "Fabian", "Urs", "Emma"),
table_5 = c("Roger", "Marry", "Henrique", "Claire", "Julia"),
table_6 = c("Alphonse", "Marie", "Dani", "Rachel")
)
Table_positions indicates where each table should be positioned on the graph. I assume here that each axis goes from 0 to 10, where the point c(5,5) is at the center of the graph.
Table_positions = data.frame(
y_position=c(3,2,3,7,8,7),
x_position=c(3,5,7,3,5,7)
)
Details of the graphic
I suggest that the tables are represented by circles centered at the position indicated by the data.frame Table_positions. The names of each guest should be written around these tables following the list guests.

Placement of Tables :
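library(plotrix)
# Plot the table centres on a 0-10 grid
plot(x = Table_positions$x_position,
     y = Table_positions$y_position,
     xlim = c(0, 10), ylim = c(0, 10), pch = ".")
# draw.circle() documents x and y as a single centre, so draw one circle per table
for (i in seq_len(nrow(Table_positions))) {
  draw.circle(Table_positions$x_position[i],
              Table_positions$y_position[i],
              radius = 0.5)
}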
Guests Positioning :
for (i in seq_along(guests)) {
  Table <- as.vector(unlist(guests[i]))
  posTable <- c(Table_positions$x_position[i], Table_positions$y_position[i])
  nbGuest <- length(Table)
  # Space the names evenly on a circle of radius 0.5 around the table
  for (j in 1:nbGuest) {
    text(x = posTable[1] + round(0.5 * cos((2 * j / nbGuest) * pi), 2),
         y = posTable[2] + round(0.5 * sin((2 * j / nbGuest) * pi), 2),
         labels = guests[[i]][[j]],
         cex = 0.5)
  }
}
I added table4 with one person named "Bla".
You can specify the text size with the cex option (here 0.5).
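Since the question asks for a handy function, the two steps above could be wrapped up roughly as follows. This is only a sketch under some assumptions: seat_coords() and plot_table_plan() are hypothetical names, base symbols() is swapped in for plotrix::draw.circle(), and asp = 1 is set so that the names land on the drawn circles.

```r
# Hypothetical helper: evenly spaced seat coordinates around one table.
seat_coords <- function(cx, cy, n, radius = 0.5) {
  angle <- 2 * pi * seq_len(n) / n
  data.frame(x = cx + radius * cos(angle), y = cy + radius * sin(angle))
}

# Hypothetical wrapper: draw every table as a circle and label its guests.
plot_table_plan <- function(guests, positions, radius = 0.5, cex = 0.5) {
  plot(positions$x_position, positions$y_position, type = "n",
       xlim = c(0, 10), ylim = c(0, 10), asp = 1, xlab = "", ylab = "")
  # Base-graphics circles; inches = FALSE keeps the radius in axis units.
  symbols(positions$x_position, positions$y_position,
          circles = rep(radius, nrow(positions)), inches = FALSE, add = TRUE)
  for (i in seq_along(guests)) {
    seats <- seat_coords(positions$x_position[i], positions$y_position[i],
                         length(guests[[i]]), radius)
    text(seats$x, seats$y, labels = guests[[i]], cex = cex)
  }
  invisible(NULL)
}

# Small usage example with two tables of different sizes.
demo_guests <- list(table_1 = c("Jack", "Christelle", "Frank"),
                    table_2 = c("George", "Amanda", "Alice", "Laura"))
demo_positions <- data.frame(y_position = c(3, 7), x_position = c(3, 7))
plot_table_plan(demo_guests, demo_positions)
```

Keeping the coordinate calculation in its own function makes it easy to check the seat positions without opening a graphics device.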

Related

Update Purrr loop to input data row by row in R

This question kinda builds on questions I asked here and here, but it's finally coming together and I think I know what the problem is, just need help kicking it over the goal line. TL;DR at the bottom.
The overall goal as simply put as possible:
I have a dataframe that is from an API pull of a redcap database. It has a few columns of information about various studies.
I'd like to go through that dataframe line by line, and push it into a different website called Oncore, through an API.
In the first question linked above (here again), I took a much simpler dataframe... took one column from that dataframe (the number), used it to do an API pull from Oncore where it would download from Oncore, copy one variable it downloaded over to a different spot, and push it back in. It would do this over and over, once per row. Then it would return a simple dataframe of the row number and the api status code returned.
Now I want to get a bit more complicated and instead of just pulling a number from one column, I want to swap over a bunch of variables from my original dataframe, and upload them.
The idea is for sample studies input into Redcap to be pushed into Oncore.
What I've tried:
I have this dataframe from the redcap api pull:
testprotocols<-structure(list(protocol_no = c("LS-P-Joe's API", "JoeTest3"),
nct_number = c(654321, 543210), library = structure(c(2L,
2L), levels = c("General Research", "Oncology"), class = "factor"),
organizational_unit = structure(c(1L, 1L), levels = c("Lifespan Cancer Institute",
"General Research"), class = "factor"), title = c("Testing to see if basic stuff came through",
"Testing Oncology Projects for API"), department = structure(c(2L,
2L), levels = c("Diagnostic Imaging", "Lifespan Cancer Institute"
), class = "factor"), protocol_type = structure(2:1, levels = c("Basic Science",
"Other"), class = "factor"), protocolid = 1:2), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
I have used this code to try and push the data into Oncore:
##This chunk gets a random one we're going to change later
base <- "https://website.forteresearchapps.com"
endpoint <- "/website/rest/protocols/"
protocol <- "2501"
## 'results' will get changed later to plug back in
## store
protocolid <- protocolnb <- library_names <- get_codes <- put_codes <- list()
UpdateAccountNumbers <- function(protocol){
call2<-paste(base,endpoint, protocol, sep="")
httpResponse <- GET(call2, add_headers(authorization = token))
results = fromJSON(content(httpResponse, "text"))
results$protocolId<- "8887" ## doesn't seem to matter
results$protocolNo<- testprotocols$protocol_no
results$library<- as.character(testprotocols$library)
results$title<- testprotocols$title
results$nctNo<-testprotocols$nct_number
results$objectives<-"To see if the API works, specifically if you can write over a previous number"
results$shortTitle<- "Short joseph Title"
results$nctNo<-testprotocols$nct_number
results$department <- as.character(testprotocols$department)
results$organizationalUnit<- as.character(testprotocols$organizational_unit)
results$protocolType<- as.character(testprotocols$protocol_type)
call2 <- paste(base,endpoint, protocol, sep="")
httpResponse_put <- PUT(
call2,
add_headers(authorization = token),
body=results, encode = "json",
verbose()
)
# save stats
protocolid <- append(protocolid, protocol)
protocolnb <- append(protocolnb, testprotocols$PROTOCOL_NO[match(protocol, testprotocols$PROTOCOL_ID)])
library_names <- append(library_names, testprotocols$LIBRARY[match(protocol, testprotocols$PROTOCOL_ID)])
get_codes <- append(get_codes, status_code(httpResponse_get))
put_codes <- append(put_codes, status_code(httpResponse_put))
}
## Oncology will have to change to whatever the df name is, above and below this
purrr::walk(testprotocols$protocol_no, UpdateAccountNumbers)
allresults <- tibble('protocolNo'=unlist(protocol_no),'protocolnb'=unlist(protocolnb),'library_names'=unlist(library_names), 'get_codes'=unlist(get_codes), 'put_codes'=unlist(put_codes) )
When I get to the line:
purrr::walk(testprotocols$protocol_no, UpdateAccountNumbers)
I get this error:
When I do traceback() I get this:
When I step through the loop line by line I realized that in this chunk of code:
call2<-paste(base,endpoint, protocol, sep="")
httpResponse <- GET(call2, add_headers(authorization = token))
results = fromJSON(content(httpResponse, "text"))
results$protocolId<- "8887" ## doesn't seem to matter
results$protocolNo<- testprotocols$protocol_no
results$library<- as.character(testprotocols$library)
results$title<- testprotocols$title
results$nctNo<-testprotocols$nct_number
results$objectives<-"To see if the API works, specifically if you can write over a previous number"
results$shortTitle<- "Short joseph Title"
results$nctNo<-testprotocols$nct_number
results$department <- as.character(testprotocols$department)
results$organizationalUnit<- as.character(testprotocols$organizational_unit)
results$protocolType<- as.character(testprotocols$protocol_type)
Where I had envisioned it downloading ONE sample study and replacing aspects of it with variables from ONE row of my beginning dataframe, it's actually trying to paste everything in the column in there. I.e. results$nctNo is "654321 543210" instead of just "654321" from the first row.
TL;DR version:
I need my purrr loop to take one row at a time instead of my entire column, and I think if I do that, it'll all magically work.
Within UpdateAccountNumbers(), you are referring to entire columns of the testprotocols frame when you do things like results$nctNo<-testprotocols$nct_number.
Instead, perhaps at the top of the UpdateAccountNumbers() function, you can do something like tp = testprotocols[testprotocols$protocol_no == protocol,], and then refer to tp instead of testprotocols when you assign values to results.
Note that your purrr::walk() command is already passing just one value of protocol at a time to the UpdateAccountNumbers() function.
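To make the row-subsetting idea concrete, here is a minimal sketch that leaves out the HTTP calls entirely. The data frame is a cut-down stand-in for testprotocols, build_fields() is a hypothetical helper name, and base lapply() is used so the example runs without extra packages (purrr::map() would be called the same way).

```r
# Toy stand-in for testprotocols, keeping a few of the original columns.
testprotocols <- data.frame(
  protocol_no = c("LS-P-Joe's API", "JoeTest3"),
  nct_number  = c(654321, 543210),
  title       = c("Testing to see if basic stuff came through",
                  "Testing Oncology Projects for API"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect the fields for ONE protocol only.
build_fields <- function(protocol) {
  # Keep just the row whose protocol_no matches the value passed in.
  tp <- testprotocols[testprotocols$protocol_no == protocol, ]
  list(protocolNo = tp$protocol_no, title = tp$title, nctNo = tp$nct_number)
}

# One call per protocol, so each result holds single values, not whole columns.
payloads <- lapply(testprotocols$protocol_no, build_fields)
```

Each element of payloads now carries one nctNo rather than both values; in the real function the same tp row would feed the assignments to results before the PUT.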

Flextable: change cell value if it meets a condition and formatting

I have a dataset (called example) like the following one.
mic <- rep(c("One", "Two", "Tree", "Four"), each = 3)
pap <- rep(c("1", "2", "3", "4"), each = 3)
ref <- rep(c("Trial 1", "Trial 2", "Trial 3", "Trial 4"), each = 3)
prob <- c(rep(NA,4), "Nogood", NA, "Bad", "Nogood", "Norel", NA, "Bad", "Nogood")
example <- data.frame(Micro = mic, Paper = pap, Reference = ref, Problem = prob)
example
Example
I would like to merge cells vertically when consecutive cells have identical values, so I use flextable's merge_v() function.
ft_example <- example %>%
flextable() %>%
merge_v(j = ~ Micro + Paper + Reference + Problem) %>%
theme_vanilla()
ft_example
I obtain the following table when knitting in Word:
Table obtained
Is there a way to:
Insert a posteriori the value "None identified" in the empty cells in the "Problem" field that are merged together; and
Remove the inappropriate horizontal lines in the "Problem" field when there is one (or more) non-empty cells and some empty cells, so that there is one horizontal line clearly separating each combination of Micro, Paper, Reference and horizontal lines separating only non-empty cells in the Problem field?
You can see the desired result here below:
Table desired

Display value other than 'size' with sunburstR

The following code generates a simple, interactive sunburst using sunburstR (example taken from https://timelyportfolio.github.io/sunburstR/articles/sunburst-2-0-0.html). When you scroll over any section it displays "size", and also wedges are colored according to "size". I would like to be able to manually specify a value other than 'size' that will come up when scrolled over and also that will be used to color the wedges. Is this possible? In other words, I would like to be able to have all wedges the same size, but be able to specify a different value for each wedge.
library(sunburstR)
library(htmltools)
library(d3r)
dat <- data.frame(
level1 = rep(c("a", "b"), each=3),
level2 = paste0(rep(c("a", "b"), each=3), 1:3),
size = c(10,5,2,3,8,6),
stringsAsFactors = FALSE
)
knitr::kable(dat)
tree <- d3_nest(dat, value_cols = "size")
tree
sb1 <- sunburst(tree, width="100%", height=400)
sb1
Just now seeing this and sorry for the delay. We can specify another field other than size with the valueField argument. See https://bl.ocks.org/timelyportfolio/616fc81b3bacee0d34a2975d53e9203a as an example.
library(treemap)
library(sunburstR)
library(d3r)
# use example from ?treemap::treemap
data(GNI2014)
tm <- treemap(GNI2014,
index=c("continent", "iso3"),
vSize="population",
vColor="continent",
type="index")
tm_nest <- d3_nest(
tm$tm[,c("continent", "iso3", "vSize", "color")],
value_cols = c("vSize", "color")
)
sunburst(
jsondata = tm_nest,
valueField = "vSize",
count = TRUE,
colors = htmlwidgets::JS("function(d){return d3.select(this).datum().color;}")
)
The prior example also shows how we can change color based on a column in the data.frame using a JavaScript function.
Here is another example controlling color https://github.com/timelyportfolio/sunburstR/issues/17#issuecomment-228448029.

R Find similar sentences in texts

I have a problem where I'm struggling to find a solution or an approach to solve it.
I have some model sentences, e.g.
model_sentences = data.frame("model_id" = c("model_id_1", "model_id_2"), "model_text" = c("Company x had 3000 employees in 2016.",
"Google makes 300 dollar in revenue in 2018."))
and some texts
data = data.frame("id" = c("id1", "id2"), "text" = c("Company y is expected to employ 2000 employees in 2020. This is an increase of 10%. Some stupid sentences.",
"Amazon´s revenue is 400 dollar in 2020. That is twice as much as last year."))
and I would like to extract sentences from those texts which are similar to the model sentences.
Something like this would be my desired solution
result = data.frame("id" = c("id1", "id2"), "model_id" = c("model_id_1", "model_id_2"), "sentence_from_data" = c("Company y is expected to employ 2000 employees in 2020.", "Amazon´s revenue is 400 dollar in 2020."), "score" = c(0.5, 0.4))
Maybe it is possible to find kind of a 'similarity_score'.
I use this function to split texts by sentence:
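library(stringi)  # provides stri_trim_both()
split_by_sentence <- function(text) {
  # Split on whitespace that follows four alphanumerics and a sentence-ending mark
  result <- unlist(strsplit(text, "(?<=[[:alnum:]]{4}[?!.])\\s+", perl = TRUE))
  result <- stri_trim_both(result)
  result <- result[nchar(result) > 0]
  if (length(result) == 0)
    result <- ""
  return(result)
}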
But I have no idea how to compare each sentence to a model sentence.
I'm glad for any suggestions.
Check out the stringdist package.
Example:
library(stringdist)
mysent = "This is a sentence"
apply(model_sentences, 1, function(row) {
stringdist(row['model_text'], mysent, method="jaccard")
})
It will return the Jaccard distance from mysent to each value of the model_text variable. The smaller the value, the more similar the sentences are under the given distance measure.
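To connect this back to the question, each sentence from a text can be scored against each model sentence and the best match kept. The sketch below is an illustration only: it swaps in a word-level Jaccard similarity written in base R (word_jaccard() is a hypothetical helper, not part of stringdist) so that it runs without extra packages, and higher scores mean more similar.

```r
# Hypothetical helper: word-level Jaccard similarity between two strings.
word_jaccard <- function(a, b) {
  # Lower-case, drop punctuation, split on whitespace, keep unique words.
  wa <- unique(strsplit(tolower(gsub("[[:punct:]]", "", a)), "\\s+")[[1]])
  wb <- unique(strsplit(tolower(gsub("[[:punct:]]", "", b)), "\\s+")[[1]])
  length(intersect(wa, wb)) / length(union(wa, wb))
}

model_text <- c("Company x had 3000 employees in 2016.",
                "Google makes 300 dollar in revenue in 2018.")
# Sentences of the first text, as split_by_sentence() would return them.
sentences <- c("Company y is expected to employ 2000 employees in 2020.",
               "This is an increase of 10%.",
               "Some stupid sentences.")

# Rows are candidate sentences, columns are model sentences.
scores <- outer(sentences, model_text, Vectorize(word_jaccard))

# Best-matching sentence for the first model sentence.
best <- sentences[which.max(scores[, 1])]
```

A word-based score ignores word order and synonyms ("employ" versus "employees"), so it is a rough first filter rather than a semantic comparison.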

Referencing and manipulating data frame in foreach

for(artist in 1:nrow(epop8)){
message(paste("Artist:", artist))
id = epop8$spotify_id[artist];
cur_showDate = epop8$ShowDate[artist]
dma_show = epop8$DMA_Region_Code[artist]
spotifySubset = na.omit(spotify[spotify$spotify_id == id,])
if(nrow(spotifySubset) == 0){
epop8$last6month[artist] = NA
next
}
monthsOfInterest_6 = epop8$ShowDate[artist] - 180:1
monthsOfInterest_12 = epop8$ShowDate[artist] - 365:1
epop8Subset6MO = unique(epop8[epop8$ShowDate %in% monthsOfInterest_6,
c("spotify_id", "DMA_Region_Code", "ShowDate")]) %>%
subset(., DMA_Region_Code == dma_show) %>% arrange(desc(ShowDate))
epop8Subset1Yr = unique(epop8[epop8$ShowDate %in% monthsOfInterest_12,
c("spotify_id", "DMA_Region_Code", "ShowDate")]) %>%
subset(., DMA_Region_Code == dma_show) %>% arrange(desc(ShowDate))
last6month = epop8Subset6MO[spotifySubset$relatedID %in% epop8Subset6MO$spotify_id,]
last12month = epop8Subset1Yr[spotifySubset$relatedID %in% epop8Subset1Yr$spotify_id,]
epop8$since_related_artist[artist] = ifelse(!nrow(last6month) && !nrow(last12month),
365,
as.double(cur_showDate - last6month$ShowDate[1]))
epop8$related_artist_count_6MO[artist] = nrow(last6month)
epop8$related_artist_count_12MO[artist] = nrow(last12month)
}
I am simply trying to "convert" this for loop into a parallelized foreach loop. I've tried to comprehend how to even approach the foreach loop. I've tried
ntasks <- nrow(epop8)
pb <- tkProgressBar(max=nrow(epop8))
progress <- function(n) setTkProgressBar(pb, n)
opts <- list(progress=progress)
foreach(artist=1:nrow(epop8), .combine = combine,
.packages = "dplyr", .options.snow=opts) %dopar%{
...
}
Where "..." is the exact same code that is contained within my for loop above. Now, I know this isn't the way to do it necessarily but I'm not sure how to update my dataframe or subset my dataframe at all within a foreach parallelized loop. I've scoured the internet but I'm beginning to think that foreach is a little over my head.
We have two dataframes, epop8 and spotify. epop8 is a dataframe full of receipts from shows around the US and spotify is spotify data for each artist in epop8.
This code establishes a unique id, a show date, a DMA (region), and a list of related musical artist IDs to the unique id (spotifySubset).
It then establishes an array of dates in the last 6 months and then an array of dates in the last 12 months to compare to.
The full dataset of shows is then subsetted down to all shows in the last 6 months and all shows in the last 12 months within the specified region.
These two subsets are then subsetted once again to just include related artists (as determined by spotify) and produces metrics based on the last6month and last12month data frames.
Using those two dataframes I write results to the original epop8 dataframe for the current artist.
Here is what the datasets look like:
spotify = data.frame(artistName = "Bob", spotify_id = "abcd",
related = c("Al", "Bill", "Charles","Daniel"),
relatedID = c("1234", "efrd", "bcde", "fghi"))
epop8 = data.frame(id = c("abcd", "asdf", "robd", "1234"),
dma = c("654", "332", "489", "654"),
ShowDate = c("2017-10-08", "2011-10-04", "2012-10-01", "2017-08-01"))
So here, when we subset epop8 for the last 6 months and 12 months in the DMA for Bob (the first artist in epop8), we get that there is one match for a show in that DMA in the last 12 months -- Daniel on 2017-08-01.
Hopefully this all makes sense, I'd really love your help on this one!
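One common way to approach this with foreach is to have each iteration return its results as a one-row data frame, instead of assigning into epop8 inside the loop, and to attach the combined rows afterwards. The sketch below is only an illustration of that pattern: the shows data frame and the shows_last_6MO metric are simplified stand-ins, and %do% is used so it runs without a parallel backend. Swapping in %dopar% after registering a backend (for example with doParallel::registerDoParallel()) is the assumed next step.

```r
library(foreach)

# Toy stand-in for epop8, with ShowDate stored as Date.
shows <- data.frame(
  spotify_id = c("abcd", "asdf", "1234"),
  ShowDate   = as.Date(c("2017-10-08", "2011-10-04", "2017-08-01"))
)

# Each iteration returns a one-row data.frame; .combine = rbind stacks them.
metrics <- foreach(artist = seq_len(nrow(shows)), .combine = rbind) %do% {
  # The 180 days before this artist's show, as in the original loop.
  window <- shows$ShowDate[artist] - 180:1
  data.frame(
    artist = artist,
    shows_last_6MO = sum(shows$ShowDate %in% window)
  )
}

# Attach the results to the original frame once the loop has finished.
shows <- cbind(shows, metrics["shows_last_6MO"])
```

Returning values this way matters in the parallel case because each worker operates on its own copy of the data, so assignments to epop8 made inside the loop body would not reach the original data frame.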
