Visualizing hierarchical data with circle packing in ggplot2? - r

I have some hierarchical data, e.g.,
> library(dplyr)
> df <- data_frame(id = 1:6, parent_id = c(NA, 1, 1, 2, 2, 5))
> df
Source: local data frame [6 x 2]

     id parent_id
  (int)     (dbl)
1     1        NA
2     2         1
3     3         1
4     4         2
5     5         2
6     6         5
I would like to plot the tree in a "top down" view through a circle packing plot:
http://bl.ocks.org/mbostock/4063530
The link above is a d3 example. Is there an equivalent that allows me to make such a plot in ggplot2?
(I want this plot in a shiny app, which does support d3, but I haven't used d3 before and am unsure about the learning curve. If d3 is the obvious choice, I will try to get that working instead. Thanks.)

There are two steps: (1) aggregate the data, then (2) convert it to JSON. After that, all the JavaScript is already written in the example page, so you can just plug in the resulting JSON data.
Since the aggregated data should have a structure similar to a treemap, we can use the treemap package to do the aggregation (you could also use a loop with successive aggregation). Then d3treeR (from GitHub) converts the treemap data to a nested list, and jsonlite converts the list to JSON.
I'm using the example data GNI2010, which ships with the treemap package. You can see all of the source files on Plunker.
library(treemap)
library(d3treeR) # devtools::install_github("timelyportfolio/d3treeR")
library(data.tree)
library(jsonlite)
## Get treemap data using package treemap
## Using example data GNI2010 from the treemap package
data(GNI2010)
## aggregate by these: continent, iso3,
## size by population, and color by GNI
indexList <- c('continent', 'iso3')
treedat <- treemap(GNI2010, index=indexList, vSize='population', vColor='GNI',
type="value", fun.aggregate = "sum",
palette = 'RdYlBu')
treedat <- treedat$tm # pull out the data
## Use d3treeR to convert to nested list structure
## Call the root node 'flare' so we can just plug it into the example
res <- d3treeR:::convert_treemap(treedat, rootname="flare")
## Convert to JSON using jsonlite::toJSON
json <- toJSON(res, auto_unbox = TRUE)
## Save the JSON next to the example index.html (create the folder if needed)
dir.create("d3circle", showWarnings = FALSE)
writeLines(json, "d3circle/flare.json")
I also replaced the d3 script line in the example index.html with:
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js"></script>
Then open index.html (you may need to serve the folder from a local web server, since browsers often block loading flare.json from a file:// URL) and you should see the circle packing layout.
Creating the Shiny bindings should be doable with htmlwidgets by following some examples (the d3treeR source has some). Note that some things don't work yet, such as the coloring. The JSON stored here actually contains a lot of information about each node (all the data aggregated by treemap) that you could leverage in the figure.
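For the question's own id/parent_id table, the treemap step isn't strictly needed: the "successive aggregation" alternative can be a short recursive function that builds the same name/children/size nesting that flare.json uses. A minimal sketch, assuming every leaf gets size = 1 because the example data carries no weights; to_nested() is a hypothetical helper, not part of any package:

```r
# Parent-child table from the question (base data.frame instead of data_frame)
df <- data.frame(id = 1:6, parent_id = c(NA, 1, 1, 2, 2, 5))

# Hypothetical helper: recursively nest children under their parent,
# producing the name / children / size shape used by flare.json.
to_nested <- function(node, df) {
  kids <- df$id[!is.na(df$parent_id) & df$parent_id == node]
  if (length(kids) == 0) {
    # Leaf: the example's pack layout reads circle sizes from `size` (assumed 1)
    return(list(name = as.character(node), size = 1))
  }
  list(name = as.character(node),
       children = lapply(kids, to_nested, df = df))
}

root <- df$id[is.na(df$parent_id)]  # the single node without a parent
tree <- to_nested(root, df)

# Serialize only if jsonlite is installed; auto_unbox keeps scalars unboxed
if (requireNamespace("jsonlite", quietly = TRUE)) {
  json <- jsonlite::toJSON(tree, auto_unbox = TRUE)
  print(json)
}
```

Writing that json to flare.json with writeLines(), as above, should let the same index.html draw the question's tree instead of the GNI2010 data.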

Related

How to create a new variable using the Mutate function?

I need to add a new variable to my data, and I would like to do it with the mutate function. How can I do it? (The data comes from the ISLR library.)
Create a new variable called "HighVol" that has the classes "yes" and "no" to indicate whether the location sold 10,000 units or more in the past year.
How many stores produced a high volume?
Example below:
carseats.df$HighVol <- factor(carseats.df$HighVol,
levels = c(0,1),
labels = c("No", "Yes"))
If you use mutate, you work on the whole data frame, which is what you want here, because the yes/no assignment depends conditionally on sales.
library(tidyverse)
# create a stand-in carseats.df with simulated sales
set.seed(39582) # make it repeatable
carseats.df <- data.frame(sales = rnorm(100, 10000, 505))
# now create the conditional variable
carseats.df <- carseats.df %>%
  mutate(HighVol = ifelse(sales >= 10000, # condition: 10,000 units or more
                          "yes",          # result if true
                          "no") %>%       # result if false
           as.factor())                   # store as a factor
head(carseats.df)
#       sales HighVol
# 1  9992.190      no
# 2 10077.482     yes
# 3  9507.145      no
# 4 10780.788     yes
# 5 10433.133     yes
# 6 10907.665     yes
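The question also asks how many stores produced a high volume; once HighVol exists, that is a simple tally. A base-R sketch on a small made-up sales vector (hypothetical numbers, not the ISLR data). Note that in ISLR's Carseats data Sales is recorded in thousands of units, so the matching threshold there would be Sales >= 10.

```r
# Hypothetical sales figures standing in for carseats.df$sales
sales <- c(9500, 10000, 12000, 8000, 10500)

# Same rule as the mutate() call: 10,000 units or more counts as "yes"
HighVol <- factor(ifelse(sales >= 10000, "yes", "no"), levels = c("no", "yes"))

counts <- table(HighVol)         # tally of "no" and "yes"
print(counts)

n_high <- sum(HighVol == "yes")  # number of high-volume stores
print(n_high)
```

With dplyr loaded, count(carseats.df, HighVol) returns the same kind of tally as a data frame.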
It looks like you're fairly new to SO; welcome to the community! If you want great answers quickly, it's best to make your question reproducible. This includes sample data, such as the output from dput(head(dataObject)), and any libraries you are using. Check it out: making R reproducible questions.
The reason you haven't gotten any help is most likely the lack of meaningful tags. You only have the tag tree, which isn't meaningful here. At a minimum, you would want to include a tag for the programming language: r. You could also add tags like mutate or the library it comes from, dplyr.

R Seurat: grouping samples

I am analyzing six single-cell RNA-seq datasets with the Seurat package.
Each of the 6 datasets was acquired in a different 10X run; they were then combined with batch-effect correction via the Seurat function "FindIntegrationAnchors".
Among the 6 datasets, data 1, 2, 3 and 4 are the "untreated" group, while data 5 and 6 belong to the "treated" group.
I merged all 6 datasets with batch correction, but I also need to compare features of "untreated" vs "treated".
How can I group data 1,2,3 and 4 into "untreated group", and data 5 and 6 into "treated group", and then perform downstream analysis?
Thanks.
One quick-and-dirty way to do this is to add the information before merging the Seurat objects:
# ...
so_samples[[1]]@meta.data$treatment <- "control"
so_samples[[2]]@meta.data$treatment <- "control"
so_samples[[3]]@meta.data$treatment <- "control"
so_samples[[4]]@meta.data$treatment <- "control"
so_samples[[5]]@meta.data$treatment <- "treated"
so_samples[[6]]@meta.data$treatment <- "treated"
# ...
anchors <- FindIntegrationAnchors(object.list = so_samples, dims = 1:20)
so_all_samples <- IntegrateData(anchorset = anchors, dims = 1:20)
In general, it would be better to load such metadata from a file and join it to the Seurat object instead of writing error-prone copy-paste code. Also note that it is generally a bad idea to modify R S4 objects (those whose elements you access with @) like this, but the functions the Seurat package provides for modifying Seurat objects are so cumbersome to use that I doubt its developers will ever change the underlying data structure.
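Following that suggestion, a minimal sketch: keep the sample-to-treatment mapping in one table (built inline here; in practice you might read it with read.csv()) and apply it in a loop instead of six copy-pasted lines. sample_sheet and treatment_of are hypothetical names, and the sketch assumes so_samples is in the same order as the sheet's rows:

```r
# Hypothetical sample sheet; could equally come from read.csv("samples.csv")
sample_sheet <- data.frame(
  sample    = paste0("data", 1:6),
  treatment = c(rep("control", 4), rep("treated", 2)),
  stringsAsFactors = FALSE
)

# Named lookup: sample -> treatment label
treatment_of <- setNames(sample_sheet$treatment, sample_sheet$sample)

# Apply the labels to the Seurat objects before integration.
# Assumption: so_samples[[i]] corresponds to row i of sample_sheet.
if (exists("so_samples")) {
  for (i in seq_along(so_samples)) {
    so_samples[[i]]@meta.data$treatment <- treatment_of[[i]]
  }
}
```

After IntegrateData(), one common next step (not part of the answer above) is to set Idents(so_all_samples) <- "treatment" and compare the groups with FindMarkers(so_all_samples, ident.1 = "treated", ident.2 = "control").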

Extracting from the data frame produced using GageRR/GageRRDesign in R

How do I extract the 'VarCompContrib' column from the data frame produced by the gageRR function in R?
This is for a Gage R&R analysis of a measurement system. I'm trying to make a very user-friendly program where other people can just enter the required information, such as the number of operators, parts, and measurements, as well as the measurements themselves, and get the correct analysis as output. I'm going to use an if-statement later on for the "analysis" portion, but I am having trouble actually working with the data frame produced by gageRR.
library(MASS)
library(Rsolnp)
library(qualityTools)
design = gageRRDesign(Operators=3, Parts=10, Measurements=2, randomize=FALSE)
response(design) = c(23,22,22,22,22,25,23,22,23,22,20,22,22,22,24,25,27,28,
23,24,23,24,24,22,22,22,24,23,22,24,20,20,25,24,22,24,21,20,21,22,21,22,21,
21,24,27,25,27,23,22,25,23,23,22,22,23,25,21,24,23)
gdo=gageRR(design)
plot(gdo)
I am looking to get the 7-number column vector under VarCompContrib.
For starters, you can look at the structure of gdo with str(gdo). From there, we see that Varcomp is a slot, so we can access it with gdo@Varcomp and convert it to a data.frame:
library(qualityTools)
design <- gageRRDesign(Operators = 3, Parts = 10, Measurements = 2, randomize = FALSE)
response(design) <- c(
23,22,22,22,22,25,23,22,23,22,20,22,22,22,24,25,27,28,23,24,23,24,24,22,22,22,24,23,22,24,
20,20,25,24,22,24,21,20,21,22,21,22,21,21,24,27,25,27,23,22,25,23,23,22,22,23,25,21,24,23
)
gdo <- gageRR(design)
data.frame(gdo@Varcomp)
#   totalRR repeatability reproducibility         a a_b     bTob totalVar
# 1 1.66441      1.209028       0.4553819 0.4553819   0 1.781211 3.445621
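The question asks for VarCompContrib rather than the raw components. In standard Gage R&R output, that column is each variance component divided by the total variance, so it can be derived from Varcomp. A sketch that hard-codes the values printed above; with a live gageRR object you would start from unlist(gdo@Varcomp) instead, assuming Varcomp is a named list of numbers as the data.frame() output suggests:

```r
# Variance components copied from the data.frame(gdo@Varcomp) output above
varcomp <- c(totalRR = 1.66441, repeatability = 1.209028,
             reproducibility = 0.4553819, a = 0.4553819, a_b = 0,
             bTob = 1.781211, totalVar = 3.445621)

# Contribution of each component to the total variance (a 7-number vector)
VarCompContrib <- varcomp / varcomp[["totalVar"]]
print(round(VarCompContrib, 4))
```

As a sanity check, the totalRR and bTob (part-to-part) contributions add up to 1, since totalVar is their sum.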

How to write contents of data frame back to range?

I need to perform the following sequence:
1. Open an Excel workbook
2. Read a specific worksheet into an R data frame
3. Read from a database, updating the data frame
4. Write the data frame back to the worksheet
I have steps 1-3 working using the BERT tool (the R scripting interface).
For step 2 I use range.to.data.frame from BERT.
Any pointers on how to perform step 4? There is no data.frame.to.range.
I tried range$put_Value(df), but it returns no error and Excel is not updated.
I can update a single cell from R using put_Value, which I cannot find documented.
#
# manipulate status data using R BERT tool
#
wb <- EXCEL$Application$get_ActiveWorkbook()
wbname = wb$get_FullName()
ws <- EXCEL$Application$get_ActiveSheet()
topleft = ws$get_Range( "a1" )
rng = topleft$get_CurrentRegion()
#rngbody = rng$get_Offset(1,0)
ssot = rng$get_Value()
ssotdf = range.to.data.frame( ssot, headers=T )
# emulate data update on 2 columns
ssotdf$ServerStatus = "Disposed"
ssotdf$ServerID = -1
# try to write df back
retcode = rng$put_Value(ssotdf)
This answer doesn't use BERT.
Try the openxlsx package. You can probably do all the steps with it. For step 4, after installing openxlsx, the following code writes the data frame to a new file:
openxlsx::write.xlsx(ssotdf, 'Dataframe.xlsx', asTable = TRUE)
I think your problem is that you are not changing the size of the range, so you are not going to see your new columns. Try creating a new range that has two extra columns before you insert the data.
I just had the same question and was able to resolve it by converting the data.frame to a matrix in the call to put_Value. I figured this out after experimenting with the old version in excel-functions.r. Try something like:
retcode = rng$put_Value(as.matrix(ssotdf))
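Two details are worth checking before calling put_Value: as.matrix() on a data frame that mixes numbers and text gives a character matrix, and the column names become dimnames rather than a row of values (presumably put_Value writes only the values). Because rng was read with headers=T, its first row holds the header cells, so the block should have one more row than the data frame to fill the same range. A sketch with a made-up two-row ssotdf (hypothetical values); the put_Value call is left as a comment because it needs BERT:

```r
# Made-up stand-in for the updated ssotdf
ssotdf <- data.frame(ServerName = c("srv1", "srv2"),
                     ServerStatus = "Disposed",
                     ServerID = -1,
                     stringsAsFactors = FALSE)

# Put the header row on top of the values so the block matches a range
# that starts at the header cell (as CurrentRegion from A1 does)
out <- unname(rbind(names(ssotdf), as.matrix(ssotdf)))
dim(out)  # 3 rows (header + 2 data rows) by 3 columns

# In BERT:
# retcode <- rng$put_Value(out)
```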
You may have already solved your problem but, if not, the following stripped down R function does what I think you need:
testDF <- function(rng1,rng2){
app <- EXCEL$Application
ref1 <- app$get_Range( rng1 ) # get source range reference
data <- ref1$get_Value() # get source range data
#
ref2 <- app$get_Range( rng2 ) # get destination range reference
ref2$put_Value( data ) # put data in destination range
}
I simulated a data frame by setting the values in range "D4:F6" of the spreadsheet to:
col1  col2  col3
1     2     txt1
7     3     txt2
then ran
testDF("D4:F6","H10:J12")
in the BERT console. The data frame then appears in range "H10:J12".

Have trouble running googlevis with my dataset

I am new to R programming and was trying to visualize a dataset with googleVis in R, but I was unable to get it to work.
The error I got was:
Error: Length of logical index vector must be 1 or 8, got: 14835
Can someone help?
Dataset is here:
https://www.kaggle.com/c/predict-west-nile-virus/data
Code is below
# Read competition data files:
library(readr)
data_dir <- "C:/Users/Wesley/Desktop/input"
train <- read_csv(file.path(data_dir, "train.csv"))
spray <- read_csv(file.path(data_dir, "spray.csv"))
# Generate output files with write_csv(), plot() or ggplot()
# Any files you write to the current directory get shown as outputs
# Install and read packages
library(lubridate)
library(googleVis)
# Create useful date columns
spray$Date <- as.Date(as.character(spray$Date),format="%Y-%m-%d")
spray$Week <- isoweek(spray$Date)
spray$Year <- year(spray$Date)
# Create a total count of measurements
spray$Total <- 1
for(i in 1:nrow(spray)) {
spray$Total[i] = i
}
# Aggregate data by Year, Week, Trap and order by old-new
spray_agg <- aggregate(cbind(Total)~Year+Week+Latitude+Longitude,data=spray,sum)
spray_agg <- spray[order(spray$Year,spray$Week),]
# Create a misc format for Week for Google Vis Motion Chart
spray_agg$Week_Format <- paste(spray_agg$Year,"W",spray_agg$Week,sep="")
# Function to create a motion chart together with a overview table
# It takes the aggregated data as input as well as a year of choice (2007,2009,2011,2013)
# It filters out "no presence" weeks since they distort the graphical view
# Next to that it creates an overview table of that year
# With gvisMerge you can merge the 3 html outputs into 1
create_motion <- function(data=spray_agg,year=2011){
data_motion <- data[data$Year==year]
motion <- gvisMotionChart(data=data_motion,idvar="Total",timevar="Week_Format",xvar="Longitude",yvar="Latitude"
,sizevar=0.1,colorvar="Blue",options=list(width="600"))
return(motion)
}
# Get the per year motion charts
#motion1 <- create_motion(spray_agg,2007)
#motion2 <- create_motion(spray_agg,2009)
motion3 <- create_motion(spray_agg,2011) # Error: Length of logical index vector must be 1 or 8, got: 14835
motion4 <- create_motion(spray_agg,2013) # Error: Length of logical index vector must be 1 or 8, got: 14835
# Merge them together into 1 dashboard
output <- gvisMerge(gvisMerge(motion1,motion2,horizontal=TRUE),gvisMerge(motion3,motion4,horizontal=TRUE),horizontal=FALSE)
plot(output)
# Plot the output in your browser
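The error message points at a likely cause (a reading of the code, not a tested fix): data[data$Year==year] inside create_motion() has no comma, so R treats the row-wise logical vector as a column selector. read_csv() returns a tibble, which rejects a logical column index whose length (here 14835, one per row) is neither 1 nor the number of columns (8 is presumably how many columns spray_agg has by then). A sketch of the row filter on a toy stand-in for spray_agg (hypothetical values):

```r
# Toy stand-in for spray_agg (hypothetical values)
spray_agg <- data.frame(
  Year        = c(2011, 2011, 2013),
  Week        = c(30, 31, 30),
  Latitude    = c(41.98, 41.95, 41.90),
  Longitude   = c(-87.80, -87.76, -87.72),
  Week_Format = c("2011W30", "2011W31", "2013W30"),
  stringsAsFactors = FALSE
)

year <- 2011
# Row filter: the comma means "these rows, all columns"
data_motion <- spray_agg[spray_agg$Year == year, ]
print(data_motion)
```

Other lines that may also need attention: spray_agg <- spray[order(spray$Year,spray$Week),] reorders spray rather than the aggregate, so the aggregate() result on the line before it is discarded; motion1 and motion2 are commented out but still passed to gvisMerge(); and gvisMotionChart() expects column names for sizevar and colorvar, not a number or a colour.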
