using duplicate factor to plot using ggplot2

I am trying to plot a dumbbell chart (geom_dumbbell from ggalt) with the following code:
library(ggplot2)
library(ggalt)
theme_set(theme_classic())
df_senPhi <- structure(list(phi = c(0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
0.8, 0.9, 0.9, 1), W = c(7833625.7334, 8291583.0188, 8762978.0131,
8169317.158, 8460793.8918, 8765222.8718, 8266025.5499, 8311199.2075,
8265304.816, 8289392.5799, 8273733.0523, 8284554.5615), Type = c("A, B, C",
"A, B, C", "A, B, C", "D, E", "D, E", "D, E", "F, G", "F, G",
"H, I", "H, I", "I, J", "I, J"), pChange = c(-0.0533144181552553,
0.00202924695507283, 0.0589968453118437, -0.0127464560859453,
0.0224782062508261, 0.0592681341679742, -0.00105934677399903,
0.00439984310620854, -0.00114644672167306, 0.00176453467558519,
-0.000127903066776307, 0.00117986514708678)), class = "data.frame", row.names = c(NA,
-12L), .Names = c("phi", "W", "Type", "pChange"))
df_senPhi$phi <- factor(df_senPhi$phi, levels=as.character(df_senPhi$phi)) # for right ordering of the dumbells
gg <- ggplot(df_senPhi, aes(x=0, xend=pChange, y=phi, color = Type)) +
geom_dumbbell(#colour="#a3c4dc",
size=0.75,
colour_xend="#0e668b") +
scale_x_continuous(label=scales::percent)
plot(gg)
If you run this code, you will get a warning saying "duplicate levels in factors are deprecated".
The data frame df_senPhi has 12 records, but only 11 are plotted. The 10th and 11th records have the same phi value (0.9), so they are mapped to the same factor level. Their dumbbells therefore overlap in the plot, which is probably why I see only 11 dumbbells.
I want all 12 records to be plotted, with the second 0.9 dumbbell appearing just above the first, as if they were two different values.
Is there a way to achieve this?

I used a bit of dplyr, but this seems to give what you are looking for:
library(dplyr)
df_senPhi %>%
mutate(row = 1:n()) %>% # one unique y position per record
ggplot(aes(0, row, color = Type)) +
geom_dumbbell(aes(xend = pChange)) +
scale_y_continuous(labels = as.character(df_senPhi$phi), # show phi values as axis labels
breaks = 1:12)
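
To see why one dumbbell goes missing, it can help to look at the factor on its own, without any plotting. The sketch below uses base R only and copies just the phi column from the question; the names n_levels, row_id and axis_labels are made up for illustration.

```r
# phi values copied from df_senPhi in the question; 0.9 appears twice
phi <- c(0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9, 1)

# A factor keeps one level per distinct value, so both 0.9 rows share a level
n_levels <- nlevels(factor(phi))

# A per-row index is unique by construction, so every record gets its own position
row_id <- seq_along(phi)

# Axis labels may repeat freely, because they are only text drawn at each break
axis_labels <- as.character(phi)
```

With 12 rows but fewer factor levels, two records have to share a y position. Plotting against row_id and relabelling the axis with axis_labels avoids that.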

Related

Zoom in rectangularly in map

I'm trying to have a rectangular "zoom in" into my chart. So far, I can create the chart itself and a smaller version, but I haven't figured out how to zoom in rectangularly.
(Builds on 1 and 2)
library(sf)
library(dplyr)
library(tmap)
Get shape files for Germany [55 MB]. In Germany zip codes are called Postleitzahlen (PLZ).
germany <- read_sf("data/OSM_PLZ.shp")
Create some arbitrary groups:
germany <- germany %>%
mutate(plz_groups = case_when(
substr(plz, 1, 1) == "1" ~ "Group A",
substr(plz, 2, 2) == "2" ~ "Group B",
substr(plz, 3, 3) == "2" ~ "Group C",
TRUE ~ "Group X" # rest
))
Make plot filling by PLZ:
map_de <- tm_shape(germany) +
tm_fill(col = "plz_groups")
map_de
germany_zoomin <- germany %>%
filter(substr(plz, 1, 1) == "4")
map_zoomin <- tm_shape(germany_zoomin) +
tm_fill(col = "plz_groups")
So I can create a zoomed in chart, but this is NOT what I want:
map_zoomin
print(map_de, vp = grid::viewport(0.8, 0.185, width = 0.2, height = 0.45))
# tmap_save("test.png")
Instead, I would like to specify the location, e.g. the PLZ of Cologne (50667) and draw a rectangular box around it.

creating kendall correlation matrix

I have data that looks like this (38 columns in total).
Sample data:
df <- structure(
list(
Christensenellaceae = c(
0.010484508,
0.008641566,
0.010017172,
0.010741488,
0.1,
0.2,
0.3,
0.4,
0.7,
0.8,
0.9,
0.1,
0.3,
0.45,
0.5,
0.55
),
Date=c(27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28),
Treatment = c(
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 2"
)
),class = "data.frame",
row.names = c(NA,-16L)
)
What I wish to do is create a Kendall correlation matrix (the data doesn't behave linearly) between the treatment types (10 in total, but 2 in the example) for every column except Treatment and Date, so 36 correlation matrices in total, each of size 10×10 (here 2×2).
This is my code:
res2 <- cor(as.matrix(data),method ="kendall")
But I get the error:
Error in cor(data, method = "kendall") : 'x' must be numeric
Is there any way to solve this? Thank you :)

You can do this with a tidyverse approach: first do some data wrangling, then use correlate() from corrr to calculate the pairwise correlation for every combination of variables.
library(corrr)
library(tidyverse)
df |>
# Transform data into wide format
pivot_wider(id_cols = Date,
names_from = Treatment,
values_from = -starts_with(c("Treatment", "Date"))) |>
# Unnest lists inside each column
unnest(cols = starts_with("Treatment")) |>
# Remove Date from the columns
select(-Date) |>
# Correlate all columns using kendall
correlate(method = "kendall")
# A tibble: 2 x 3
# term `Treatment 1` `Treatment 2`
# <chr> <dbl> <dbl>
#1 Treatment 1 NA 0.546
#2 Treatment 2 0.546 NA
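
For comparison, here is a minimal base R sketch of the same idea, producing one Kendall matrix per measurement column. It assumes every treatment has the same number of observations and that pairing them by row order is meaningful; the names measure_cols, kendall_by_column and tau are hypothetical.

```r
# Sample data from the question (only one measurement column is available here)
df <- data.frame(
  Christensenellaceae = c(0.010484508, 0.008641566, 0.010017172, 0.010741488,
                          0.1, 0.2, 0.3, 0.4, 0.7, 0.8, 0.9, 0.1,
                          0.3, 0.45, 0.5, 0.55),
  Date = rep(c(27, 28), each = 8),
  Treatment = rep(rep(c("Treatment 1", "Treatment 2"), each = 4), times = 2)
)

# Every column except the grouping columns is treated as a measurement
measure_cols <- setdiff(names(df), c("Treatment", "Date"))

kendall_by_column <- lapply(measure_cols, function(col) {
  # One vector of values per treatment, in row order
  groups <- split(df[[col]], df$Treatment)
  # cbind() pairs observations by position; this needs equal group sizes
  cor(do.call(cbind, groups), method = "kendall")
})
names(kendall_by_column) <- measure_cols

# Correlation between the two treatments for the sample column
tau <- kendall_by_column$Christensenellaceae["Treatment 1", "Treatment 2"]
```

On the sample data this should agree with the 0.546 reported by correlate() above. With 36 measurement columns, kendall_by_column would hold 36 matrices, one per column name.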

Is there a way to colour code directionality in an igraph?

I have some data that shows Twitter connections between people (i.e. people that tag other users in their tweets) and would like to map out the connections between people. In some cases the relationship is reciprocal, as in both people have tagged the other while some people have been tagged but have not tweeted.
In the example below, Person A has tagged Person B and Person C, while Person C has only tagged Person B. The arrows are unidirectional from Person A -> Person C and from Person C -> Person B, but bidirectional between Person A <-> Person B. Is it possible to make these arrows different colours?
library(igraph)
df <- data.frame (from = c("Person A", "Person A", "Person B", "Person C"),
to = c ("Person B", "Person C", "Person A", "Person B"),
weight = c (1, 3, 4, 5)
)
g_1 <- graph.data.frame(df,
directed = TRUE)
set.seed(123)
plot (g_1,
edge.width = E(g_1)$weight)

You can set the edge colour through the color attribute of E(), and you can find reciprocal edges with the is.mutual() function (named which_mutual() in newer igraph versions):
E(g_1)$color <- "grey50"
E(g_1)$color[is.mutual(g_1)] = "red"
plot(g_1, edge.width = E(g_1)$weight)

You can use the duplicated() function to colour bidirectional edges (taken from "R reciprocal edges in igraph in R" and modified to colour the edges instead of curving them). Here sorted_edges holds the two endpoints of every edge in sorted order:
E(g_1)[duplicated(sorted_edges) | duplicated(sorted_edges, fromLast = TRUE)]$color <- "red"
Complete example:
library(igraph)
df <- data.frame (from = c("Person A", "Person A", "Person B", "Person C"),
to = c ("Person B", "Person C", "Person A", "Person B"),
weight = c (1, 3, 4, 5)
)
g_1 <- graph.data.frame(df,
directed = TRUE)
set.seed(123)
# Sort each (from, to) pair so that A -> B and B -> A become identical rows
sorted_edges <- t(apply(get.edgelist(g_1), 1, sort))
E(g_1)$color <- "grey50"
E(g_1)[duplicated(sorted_edges) | duplicated(sorted_edges, fromLast = TRUE)]$color <- "red"
plot (g_1, edge.width = E(g_1)$weight)
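
The duplicated() trick can be checked without igraph. The following base R sketch applies it to the edge list from the question; the names edges, sorted_edges, is_reciprocal and edge_colour are chosen for illustration only.

```r
# Edge list from the question, as plain character columns
edges <- data.frame(
  from = c("Person A", "Person A", "Person B", "Person C"),
  to   = c("Person B", "Person C", "Person A", "Person B"),
  stringsAsFactors = FALSE
)

# Sort the endpoints within each row, so direction no longer matters
sorted_edges <- t(apply(unname(as.matrix(edges)), 1, sort))

# A row that occurs more than once (checked from both ends) has a partner
is_reciprocal <- duplicated(sorted_edges) |
  duplicated(sorted_edges, fromLast = TRUE)

# One colour per edge, in the original edge order
edge_colour <- ifelse(is_reciprocal, "red", "grey50")
```

One caveat of this approach: a graph containing the same directed edge twice (for example A -> B listed two times) would also be flagged, because the rows are identical after sorting. If that can happen in your data, checking mutuality on the graph itself is likely the safer route.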

Riverplot package in R - output plot covered in gridlines or outlines

I've made a Sankey diagram with the R riverplot package (v0.5). The output looks OK at a small size in RStudio, but when it is exported or zoomed in, the colours have dark outlines or gridlines.
I think it may be because the outlines of the shapes do not match the transparency I want to use for the fill.
I possibly need to find a way to get rid of outlines altogether (rather than make them semi-transparent), as I think they're also the reason why flows with a value of zero still show up as thin lines.
My code is here:
#loading packages
library(readr)
library("riverplot", lib.loc="C:/Program Files/R/R-3.3.2/library")
library(RColorBrewer)
#loading data
Cambs_flows <- read_csv("~/RProjects/Cambs_flows4.csv")
#defining the edges
edges = rep(Cambs_flows, col.names = c("N1","N2","Value"))
edges <- data.frame(edges)
edges$ID <- 1:25
#defining the nodes
nodes <- data.frame(ID = c("Cambridge","S Cambs","Rest of E","Rest of UK","Abroad","to Cambridge","to S Cambs","to Rest of E","to Rest of UK","to Abroad"))
nodes$x = c(1,1,1,1,1,2,2,2,2,2)
nodes$y = c(1,2,3,4,5,1,2,3,4,5)
#picking colours
palette = paste0(brewer.pal(5, "Set1"), "90")
#plot styles
styles = lapply(nodes$y, function(n) {
list(col = palette[n], lty = 0, textcol = "black")
})
#matching nodes to names
names(styles) = nodes$ID
#defining the river
r <- makeRiver( nodes, edges,
node_labels = c("Cambridge","S Cambs","Rest of E","Rest of UK","Abroad","to Cambridge","to S Cambs","to Rest of E","to Rest of UK","to Abroad"),
node_styles = styles)
#Plotting
plot( r, plot_area = 0.9)
And my data is here
dput(Cambs_flows)
structure(list(N1 = c("Cambridge", "Cambridge", "Cambridge",
"Cambridge", "Cambridge", "S Cambs", "S Cambs", "S Cambs", "S Cambs",
"S Cambs", "Rest of E", "Rest of E", "Rest of E", "Rest of E",
"Rest of E", "Rest of UK", "Rest of UK", "Rest of UK", "Rest of UK",
"Rest of UK", "Abroad", "Abroad", "Abroad", "Abroad", "Abroad"
), N2 = c("to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK",
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK",
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK",
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK",
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK",
"to Abroad"), Value = c(0L, 1616L, 2779L, 13500L, 5670L, 2593L,
0L, 2975L, 4742L, 1641L, 2555L, 3433L, 0L, 0L, 0L, 6981L, 3802L,
0L, 0L, 0L, 5670L, 1641L, 0L, 0L, 0L)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -25L), .Names = c("N1", "N2",
"Value"), spec = structure(list(cols = structure(list(N1 = structure(list(), class = c("collector_character",
"collector")), N2 = structure(list(), class = c("collector_character",
"collector")), Value = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("N1", "N2", "Value")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

The culprit is a line in riverplot::curveseg. We can hack this function to fix it, but there is also a very simple workaround that does not require hacking the function. In fact, the simple solution is probably preferable in many cases, but first I explain how to hack the function, so we understand why the workaround also works. Scroll to the end of this answer if you only want the simple solution.
UPDATE: The change suggested below has now been implemented in riverplot version 0.6
To edit the function, you can use
trace(curveseg, edit=T)
Then find the line near the end of the function that reads
polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]), c(yy[i],
yy[i + 1], yy[i + 1] + w, yy[i] + w), col = grad[i],
border = grad[i])
We can see here that the package author chose not to pass the lty parameter to polygon (UPDATE: see this answer for an explanation of why the package author did it this way). Change this line by adding lty = 0 (or, if you prefer, border = NA) and it works as intended for the OP's case. (But note that this may not work well if you wish to render a PDF - see here.)
polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]), c(yy[i],
yy[i + 1], yy[i + 1] + w, yy[i] + w), col = grad[i],
border = grad[i], lty=0)
As a side note, this also explains the somewhat odd reported behaviour in the comments that "if you run it twice, the second time the plot looks OK, although export it and the lines come back". When lty is not specified in a call to polygon, the default value it uses is lty = par("lty"). Initially, the default par("lty") is a solid line, but after running the riverplot function once, par("lty") gets set to 0 during a call to riverplot:::draw.nodes thus, suppressing the lines when riverplot is run a 2nd time. But if you then try to export the image, opening a new device resets par("lty") to its default value.
An alternative way to update the function with this edit is to use assignInNamespace to overwrite the package function with your own version. Like this:
curveseg.new = function (x0, x1, y0, y1, width = 1, nsteps = 50, col = "#ffcc0066",
grad = NULL, lty = 1, form = c("sin", "line"))
{
w <- width
if (!is.null(grad)) {
grad <- colorRampPaletteAlpha(grad)(nsteps)
}
else {
grad <- rep(col, nsteps)
}
form <- match.arg(form, c("sin", "line"))
if (form == "sin") {
xx <- seq(-pi/2, pi/2, length.out = nsteps)
yy <- y0 + (y1 - y0) * (sin(xx) + 1)/2
xx <- seq(x0, x1, length.out = nsteps)
}
if (form == "line") {
xx <- seq(x0, x1, length.out = nsteps)
yy <- seq(y0, y1, length.out = nsteps)
}
for (i in 1:(nsteps - 1)) {
polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]),
c(yy[i], yy[i + 1], yy[i + 1] + w, yy[i] + w),
col = grad[i], border = grad[i], lty=0)
lines(c(xx[i], xx[i + 1]), c(yy[i], yy[i + 1]), lty = lty)
lines(c(xx[i], xx[i + 1]), c(yy[i] + w, yy[i + 1] + w), lty = lty)
}
}
assignInNamespace("curveseg", curveseg.new, ns = "riverplot")
Now for the simple solution, which does not require changes to the function:
Just add the line par(lty=0) before you plot!!!
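
As a rough illustration of the workaround, the sketch below sets par(lty = 0) around a plain polygon() call and then restores the old value. It uses base graphics only; the null PDF device and the rectangle are stand-ins chosen for this example, not part of riverplot.

```r
# Null PDF device: nothing is written to disk (used here only for illustration)
pdf(NULL)

lty_before <- par("lty")  # line type of the freshly opened device

# 0 means "blank"; par() returns the previous setting so it can be restored
old_par <- par(lty = 0)

plot.new()
# polygon() defaults to lty = par("lty"), so this border is not drawn
polygon(c(0.2, 0.8, 0.8, 0.2), c(0.2, 0.2, 0.8, 0.8),
        col = "#E41A1C90", border = "#E41A1C90")

par(old_par)              # put the previous line type back
lty_after <- par("lty")

invisible(dev.off())
```

In the riverplot case, the plot.new() and polygon() calls would be replaced by plot(r, plot_area = 0.9). Since opening a new device resets par("lty"), as explained above, the par(lty = 0) call needs to come after the export device has been opened.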

I am the author of the package. I am now struggling to find a satisfactory solution to include in the next version of the package.
The problem is with how R renders PDFs as compared to bitmaps. In the original version of the package, I did indeed pass lty=0 to polygon() (you can still see it in the commented source code). However, polygons without borders look good only in PNG graphics. In PDF output, thin white lines appear between the polygons. Take a look:
cc <- "#E41A1C90"
plot.new()
rect(0.2, 0.2, 0.4, 0.4, col=cc, border=NA)
rect(0.4, 0.2, 0.6, 0.4, col=cc, border=NA)
dev.copy2pdf(file="riverplot.pdf")
In X or on png, the output is correct. However, if rendered as PDF, you will see a thin white line between the rectangles:
When you render a riverplot graphic as PDF like the one above, this looks really bad:
I therefore forced adding borders, but forgot to check transparency. When no transparency is used, this looks OK: the borders overlap with the polygons as well as with each other, but you cannot see it. The PDF is now acceptable. However, it messes up the figure if you have transparency.
EDIT:
I have now uploaded version 0.6 of riverplot to CRAN. Besides some new stuff (you can now add riverplot to any part of an existing drawing), by default it uses lty=0 again. However, there is now an option called "fix.pdf" which you can set to TRUE in order to draw the borders around the segments again.
Bottom line, and solutions for now:
Use riverplot 0.6.
If you want to render a PDF, don't use transparency and use fix.pdf=TRUE.
If you want to use both transparency and PDF, help me solve the issue.

R barplot - keep same colours after sorting

I want to plot percentages for 3 variables (a, b, c) one after the other. So I have a matrix of percentages for a set of activities for variables a, b and c.
dta = structure(c(0.0073, 0.1467, 0.0111, 0.0294, 0.0451, 0.0031, 0.1823,
0.0452, 0.2212, 0.1123, 7e-04, 0.1138, 0.0723, 0.1649, 0.0634),
.Dim = c(5L, 3L),
.Dimnames = list(c("c Work", "e Travel/Commute",
"f Cooking", "g Housework", "h Odd jobs"),
c("a", "b", "c")))
However, I would like to plot each variable sorted, while keeping the same colours for the set of activities.
These are the colours of the activities:
library(RColorBrewer)
rc = c(brewer.pal(n = 5, name = 'Set2'))
kol = list()
kol$act <- c("c Work", "e Travel/Commute", "f Cooking", "g Housework", "h Odd jobs" )
kol$colours <- rc
kol = as.data.frame(kol)
               act colours
1           c Work #66C2A5
2 e Travel/Commute #FC8D62
3        f Cooking #8DA0CB
4      g Housework #E78AC3
5       h Odd jobs #A6D854
So here are my barplots
par(mfrow = c(2,2))
barplot(dta[,1], horiz = T, las = 2, col = kol$colours)
barplot(dta[,2], horiz = T, las = 2, col = kol$colours)
barplot(dta[,3], horiz = T, las = 2, col = kol$colours)
What I want is to sort the bars while keeping the same colours for the activities:
par(mfrow = c(2,2))
barplot(sort(dta[,1]), horiz = T, las = 2)
barplot(sort(dta[,2]), horiz = T, las = 2)
barplot(sort(dta[,3]), horiz = T, las = 2)
How can I make the colours "match"?

You can use the function match to match the names of the "entities" and the desired colours, for example, for the first column:
kol$colours[match(names(sort(dta[,1])), kol$act)]
so, to obtain your barplot, just do:
par(mfrow = c(2,2), mar=c(5, 8, 4, 1)) # also modifying the margins to make the names fit in
for (i in 1:3) {
barplot(sort(dta[,i]), horiz = T, las = 2, col=kol$colours[match(names(sort(dta[, i])), kol$act)])
}
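
A possible variant of the same lookup is to store the colours in a named vector and index it by activity name. The sketch below uses column "a" of dta and the five hex codes printed in the question; the names a, act_colours, a_sorted and cols_sorted are made up for this example.

```r
# Column "a" of dta from the question, as a named numeric vector
a <- c("c Work" = 0.0073, "e Travel/Commute" = 0.1467, "f Cooking" = 0.0111,
       "g Housework" = 0.0294, "h Odd jobs" = 0.0451)

# Colours keyed by activity name (hex codes as printed in the question)
act_colours <- c("c Work" = "#66C2A5", "e Travel/Commute" = "#FC8D62",
                 "f Cooking" = "#8DA0CB", "g Housework" = "#E78AC3",
                 "h Odd jobs" = "#A6D854")

# sort() keeps the names, so they can be used to look up each bar's colour
a_sorted <- sort(a)
cols_sorted <- unname(act_colours[names(a_sorted)])
```

The sorted values and their colours can then be passed together, for example barplot(a_sorted, horiz = TRUE, las = 2, col = cols_sorted). Indexing by name gives the same result as match() here, as long as every activity has an entry in the colour vector.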
