Change name of groups in bal.plot - r

I am trying to visualize results from a MatchIt procedure with bal.plot() from the cobalt package.
It works just fine, except I would like to change the labels for the groups, which by default are "Unadjusted sample" and "Adjusted sample".
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS", which = "both",
type = "histogram", mirror = F,
weights = AHEAD_nomiss$att.weights, treat = AHEAD_nomiss$group)

Author of cobalt package here. Thank you for using my package!
Edit. Original post at the bottom.
I just added some functionality to bal.plot for this in the development version of cobalt, which can be installed with devtools::install_github("ngreifer/cobalt"). Use the sample.names argument to supply a vector of names to give bal.plot and they'll appear in the facet labels. The vector should be as long as the number of samples (in your case, 2). Your new code should look like this:
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS", which = "both",
type = "histogram", mirror = F,
weights = AHEAD_nomiss$att.weights, treat = AHEAD_nomiss$group,
sample.names = c("UNWEIGHTED", "WEIGHTED"))
Of course you can change the names. If you don't want to install the development version of cobalt (it's not guaranteed to be stable), you can use my solutions below.
I didn't intend bal.plot to be used for publication so I didn't make it super flexible, unlike love.plot. One thing you can do is manually program the histograms using ggplot2. Of course, this requires you learning how to use ggplot2, which can be a challenge, and looking at the source code of bal.plot probably won't help because of all the checks and transformations that occur. Here's some code that might work for you:
library(ggplot2)
unweighted <- data.frame(KCH_TKS = AHEAD_nomiss$KCH_TKS,
treat = factor(AHEAD_nomiss$group),
weights = 1,
adj = "UNWEIGHTED",
stringsAsFactors = FALSE)
weighted <- data.frame(KCH_TKS = AHEAD_nomiss$KCH_TKS,
treat = factor(AHEAD_nomiss$group),
weights = AHEAD_nomiss$att.weights,
adj = "WEIGHTED",
stringsAsFactors = FALSE)
data <- rbind(unweighted, weighted)
ggplot(data, aes(x = KCH_TKS, fill = treat)) +
geom_histogram(aes(weight = weights), bins = 10, alpha = .4, color = "black") +
facet_grid(~adj)
One way you can hack bal.plot is to provide a set of weights that are all equal to 1 as well as your desired weights and leave which at its default. If you give the weights names, those names will appear on the facet labels. So, for your example, try
bal.plot(AHEAD_nomiss, var.name = "KCH_TKS",
type = "histogram", mirror = F,
weights = list(UNWEIGHTED = rep(1, nrow(AHEAD_nomiss)),
WEIGHTED = AHEAD_nomiss$att.weights),
treat = AHEAD_nomiss$group)
You should see that "UNWEIGHTED" and "WEIGHTED" are the new facet label names. You can of course change them to be whatever you want.

Related

How to specify tm_fill() if I want it to be a variable from a new object?

I am trying to create an R function that would run a GWR on variables that the user specifies from a Spatial Polygons Data Frame. The end result of running the function are two mappings - one of the independent variable's values and one of the coefficient values from the GWR model. I'm having trouble with the second map.
I have managed to create the GWR model and a 'results' object for the coefficients that I would be visualizing.
gwr.model <- gwr(SpatialPolygonsDataFrame@data[, y] ~ SpatialPolygonsDataFrame@data[, x],
data = SpatialPolygonsDataFrame,
adapt = GWRbandwidth,
hatmatrix = TRUE,
se.fit = TRUE)
results <- as.data.frame(gwr.model$SDF)
gwr.map <- SpatialPolygonsDataFrame
gwr.map@data <- cbind(SpatialPolygonsDataFrame@data, as.matrix(results))
To create the visualization of the GWR coefficients, I have to specify my tm_fill() to be a column from the 'results' object, but I do not know how to do it so that the function may be used with any Spatial Polygons Data Frame. So far, I have tried using the paste0() function, like so:
map2 <- tm_shape(gwr.map) + tm_fill(paste0("SpatialPolygonsDataFrame.", x), n = 5, style = "quantile", title = "Coefficient") +
tm_layout(frame = FALSE, legend.text.size = 0.5, legend.title.size = 0.6)
But I got an error saying that the fill argument is neither colors nor a valid variable name.
I'll be grateful for any tips that could help me resolve the issue.
Switching to the package sf - leaving sp behind - probably will solve your problem here.
In the absence of a reproducible example, let me try to suggest the following here:
Convert your results with gwr.map.sf <- sf::st_as_sf(gwr.map). Then add the results of your GWR as a new column: gwr.map.sf$results <- results (my understanding is that the dimensions should fit).
Finally you should be able to plot like this:
map2 <- tm_shape(gwr.map.sf) + tm_fill("results", n = 5, style = "quantile", title = "Coefficient") +
tm_layout(frame = FALSE, legend.text.size = 0.5, legend.title.size = 0.6)

R-package beeswarm generates same x-coordinates

I am working on a script where I need to calculate the coordinates for a beeswarm plot without immediately plotting. When I use beeswarm, I get x-coordinates that aren't swarmed, and more or less the same value:
But if I generate the same plot again it swarms correctly:
And if I use dev.off() I again get no swarming:
The code I used:
library(beeswarm)
n <- 250
df = data.frame(x = floor(runif(n, 0, 5)),
y = rnorm(n = n, mean = 500, sd = 100))
#Plot 1:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
#Plot 2:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
dev.off()
#Plot 3:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
It seems to me like beeswarm uses something like the current plot parameters (or however it is called) to do the swarming and therefore chokes when a plot isn't showing. I have tried to play around with beeswarm parameters such as spacing, breaks, corral, corralWidth, priority, and xlim, but it does not make a difference. FYI: If do.plot is set to TRUE the x-coordinates are calculated correctly, but this is not helpful as I don't want to plot immediately.
Any tips or comments are greatly appreciated!
You're right; beeswarm uses the current plot parameters to calculate the amount of space to leave between points. It seems that setting "do.plot=FALSE" does not do what one would expect, and I'm not sure why I included this parameter.
If you want to control the parameters manually, you could use the functions swarmx or swarmy instead. These functions must be applied to each group separately, e.g.
dfsplitswarmed <- by(df, df$x, function(aa) swarmx(aa$x, aa$y, xsize = 0.075, ysize = 7.5, cex = 1, log = ""))
dfswarmed <- do.call(rbind, dfsplitswarmed)
plot(dfswarmed)
In this case, I set the xsize and ysize values based on what the function would default to for this particular data set. If you can find a set of xsize/ysize values that work for your data, this approach might work for you.
Otherwise, perhaps a simpler approach would be to leave do.plot=TRUE, and then discard the plots.
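One way to "discard the plots" is to draw onto a throwaway graphics device. The following is only a sketch: swarm_coords is a hypothetical helper name, not part of the beeswarm package, and it assumes that a null PDF device (pdf(file = NULL)) is acceptable as the drawing target. Because the spacing is derived from the current plot parameters, the coordinates reflect the size of that temporary device, so its width and height may need to match the device used for the final plot.

```r
library(beeswarm)

# Hypothetical helper: let beeswarm() draw on a temporary null device
# and keep only the data frame of coordinates it returns.
swarm_coords <- function(formula, data, ...) {
  pdf(file = NULL)      # null PDF device, nothing is written to disk
  on.exit(dev.off())    # close the temporary device again, even on error
  beeswarm(formula, data = data, do.plot = TRUE, ...)
}

set.seed(1)
n <- 250
df <- data.frame(x = floor(runif(n, 0, 5)),
                 y = rnorm(n = n, mean = 500, sd = 100))

A <- swarm_coords(y ~ x, df)
# A$x and A$y can now be passed to plot() or any other plotting code.
```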

Generating Clusplot for K Prototype in R

I have a customer dataset with a mix of continuous and categorical variables, and would like to cluster the customers into groups. I am trying to use k-prototypes for the first time, but how would I get a nice visual representation similar to clusplot for kmeans?
install.packages("clustMixType")
library(clustMixType)
data = read.csv("customerdata.csv", header = TRUE)
kproto = kproto(data, k=5, lambda = NULL, iter.max = 100, nstart = 1,
keep.data = TRUE)
clprofiles(kproto, data, vars = NULL, col = NULL)
Don't rely on a black box function.
Study what clusplot does, and adapt it to suit your needs exactly. Get the source code, and check what it does.
The answer lies in your own code, where kproto is the fitted object and data is the data frame:
clprofiles(kproto, data, vars = NULL, col = NULL)

How to plot an nmds with coloured/symbol points based on SIMPROF

Hi, I am trying to plot an NMDS of assemblage data, which is in a Bray-Curtis dissimilarity matrix, in R. I have been able to apply ordiellipse() and ordihull(), and even change the colours based on group factors created by cutree() on an hclust() result,
e.g. using the dune data from the vegan package
library(vegan)
data(dune)
Dune.dis <- vegdist(dune, method = "bray")
Dune.mds <- metaMDS(dune, distance = "bray", k = 2)
#hierarchical cluster
clua <- hclust(Dune.dis, "average")
plot(clua, hang = -1)
# set groupings
rect.hclust(clua, 4)
grp <- cutree(clua, 4)
#plot mds
plot(Dune.mds, display = "sites", type = "text", cex = 1.5)
#show groupings
ordiellipse(Dune.mds, groups = grp, border = 1, col = "red", lwd = 3)
or even colour the points just by the cutree
colvec <- c("red2", "cyan", "deeppink3", "green3")
colvec[grp]
plot(Dune.mds, display = "sites", type = "text", cex = 1.5) #or use type = "points"
points(Dune.mds, display = "sites", col = colvec[grp], bg = colvec[grp], pch = 21)
However, what I really want to do is use the simprof function from the package "clustsig" and then colour the points based on the significant groupings. This is more of a coding question: I could create the vector of group factors by hand, but I am sure there is a more efficient way to do it.
Here's my code so far for that:
simp <- simprof(Dune.dis, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "braycurtis", alpha = 0.05, sample.orientation = "row")
#plot dendrogram
simprof.plot(simp, plot = TRUE)
Now I am just not sure how to do the next step, plotting the NMDS using the groupings defined by SIMPROF. How do I turn the SIMPROF results into a grouping factor without literally typing it out myself?
Thanks in advance.
You wrote that you know how to get colours from an hclust object with cutree. Then read the documentation of clustsig::simprof. It says that simprof returns an hclust object within its result object. It also returns numgroups, which is the suggested number of clusters. Now you have all the information you need to use cutree on an hclust object, as you already know how to do. If your simprof result is called simp, use cutree(simp$hclust, simp$numgroups) to extract the integer vector corresponding to the clustsig::simprof result, and use this for the colours.
I have never used simprof or clustsig, but I gathered all this information from its documentation.
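As a rough base-R sketch of the cutree step described above: the simp object below is only a stand-in list that mimics the two components named in the answer (hclust and numgroups); it is not produced by simprof, and the data set and the value 3 are arbitrary choices for illustration.

```r
# Stand-in for a simprof() result: only the hclust and numgroups
# components mentioned in the answer are mimicked (assumption).
d <- dist(scale(USArrests[1:12, ]))
simp <- list(hclust = hclust(d, method = "average"), numgroups = 3)

# Integer group membership, one entry per sample, named by sample
grp <- cutree(simp$hclust, k = simp$numgroups)

# Map groups to colours exactly as done with the plain hclust result
colvec <- c("red2", "cyan", "deeppink3", "green3")
point_cols <- colvec[grp]

# The same vector as a factor, if a grouping factor is needed
grp_factor <- factor(grp)
```

With a real simprof result, point_cols would take the place of colvec[grp] in the points() call from the question.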

Superheat Use Given Cluster Group

I generated a dendrogram and obtained k clusters. I want to produce a heatmap using the existing dendrogram and clusters. I want to display the heatmap across multiple pages, so I built the pages one by one in separate calls. To make the colour scale comparable, I also set heat.pal.values. Here's my code sample:
sh <- superheat(X = cbind(dt[grep("Action", colnames(dt))], profile),
heat.pal = brewer.pal(8, 'RdBu')[8:1],
heat.pal.values = c(-1.4:2.3),
bottom.label = 'variable',
grid.hline.col = '#F0EBEB',
grid.vline.col = '#F0EBEB',
smooth.heat = TRUE,
membership.rows = hc$cluster,
column.title = 'factors', row.title = 'Clusters',
bottom.label.text.size = 1.5, bottom.label.text.angle = 90)
sh$membership.rows == hc$cluster
When I then checked whether the memberships match, it displayed a mix of TRUE and FALSE. I can't work out what's wrong with my code. Could you please enlighten me?
