In R, add a trace to a plotly density plot - r

Using the plotly package in R, I would like to make a density plot. Specifically, I need to add one more density line to my graph. I have data with the income information of some public company by geographical region. Something like this:
head(data)
id income region
1 4556 1
2 6545 1
3 65465 2
4 54555 1
5 71442 2
6 5645 6
First, I analysed the income of regions 5 and 6 with the following density plot:
reg56<- data[data$region %in% c(5,6) , ]
dens <- with(reg56, tapply(income, INDEX = region, density))
df <- data.frame(
x = unlist(lapply(dens, "[[", "x")),
y = unlist(lapply(dens, "[[", "y")),
cut = rep(names(dens), each = length(dens[[1]]$x))
)
# plot the density
p<- plot_ly(df, x = x, y = y, color = cut)
But I want more than this. I would like to add the total income, i.e. the income of all regions. I tried something like this:
data$aux<- 1
dens2 <- with(data, tapply(income, INDEX = aux, density))
df2 <- data.frame(
x = unlist(lapply(dens2, "[[", "x")),
y = unlist(lapply(dens2, "[[", "y")),
cut = rep(names(dens2), each = length(dens2[[1]]$x)) )
p<- plot_ly(df, x = x, y = y, color = cut)
p<- add_trace(p, df2, x = x, y = y, color = cut)
p
Error in FUN(X[[i]], ...) :
'options' must be a fully named list, or have no names (NULL)
Is there a solution for this?

Because you are not naming the parameters that you pass to add_trace, it interprets them as corresponding to the default parameter order. The usage of add_trace is
add_trace(p = last_plot(), ..., group, color, colors, symbol, symbols,
size, data = NULL, evaluate = FALSE)
So, in your function call, where you provide the data.frame df2 as the 2nd argument, it is assumed to correspond to the ... parameter, which must be a named list. You need to specify data = df2 so that add_trace understands what this argument is.
Let's generate some dummy data to demonstrate:
library(plotly)
set.seed(999)
data <- data.frame(id = 1:500, income = round(rnorm(500, 50000, 15000)), region = sample(6, 500, replace = TRUE))
Now, (after calculating df and df2 as in your example):
p <- plot_ly(df, x = x, y = y, color = cut) %>%
add_trace(data=df2, x = x, y = y, color = cut)
p
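A note for newer plotly releases: from plotly 4.0 on, columns are mapped with formulas (x = ~x instead of x = x), so the call above needs a small adjustment there. The following is only a sketch under that assumption. density_df is a hypothetical helper (not part of plotly) that replaces the tapply/unlist steps, and add_lines is used for both sets of curves so that no trace type has to be spelled out.

```r
set.seed(999)
data <- data.frame(id = 1:500,
                   income = round(rnorm(500, 50000, 15000)),
                   region = sample(6, 500, replace = TRUE))

# Hypothetical helper: kernel density of a numeric vector as a data frame
density_df <- function(values, label) {
  d <- density(values)
  data.frame(x = d$x, y = d$y, cut = label)
}

# Densities for regions 5 and 6, plus one for all regions together
df <- rbind(density_df(data$income[data$region == 5], "5"),
            density_df(data$income[data$region == 6], "6"))
df2 <- density_df(data$income, "all")

# Plot only if plotly is installed; note the formula (~) syntax
if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  p <- plot_ly() %>%
    add_lines(data = df, x = ~x, y = ~y, color = ~cut) %>%
    add_lines(data = df2, x = ~x, y = ~y, color = ~cut)
}
```

Printing p then shows the two regional curves and the overall curve in one plot. As before, the key point is that the second data frame is passed by name via data =.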

Related

Add a labelling function to just first or last ggplot label

I often find myself working with data with long-tail distributions, so that a huge amount of the range in values happens in the top 1-2% of the data. When I plot the data, the upper outliers cause variation in the rest of the data to wash out, but I want to show those differences.
I know there are other ways of handling this, but I found that capping the values towards the end of the distribution and then applying a continuous color palette (i.e., in ggplot) is one way that works for me to represent the data. However, I want to ensure the legend stays accurate by adding a >= sign to the last legend label.
The picture below shows the kind of legend I'd like to achieve programmatically, with the >= sign drawn in messily in red.
I also know I can manually set breaks and labels, but I'd really like to just do something like if (it's the last label) paste0(">=", label) else label (in pseudocode).
Reproducible example:
(I want to alter the plot legend to prefix just the last label)
set.seed(123)
x <- rnorm(1:1e3)
y <- rnorm(1:1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- tibble(x = x
,y = y
,z = z)
d %>%
ggplot(aes(x = x
,y = y
,fill = z
,color = z)) +
geom_point() +
scale_color_viridis_c()
One option would be to pass a function to the labels argument which replaces the last element or label with your desired label like so:
library(ggplot2)
set.seed(123)
x <- rnorm(1:1e3)
y <- rnorm(1:1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- data.frame(
x = x,
y = y,
z = z
)
ggplot(d, aes(
x = x,
y = y,
fill = z,
color = z
)) +
geom_point() +
scale_fill_continuous(labels = function(x) {
x[length(x)] <- paste0(">=", x[length(x)])
x
}, aesthetics = c("color", "fill"))
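To see what the labelling function does on its own, it can be pulled out and called on a plain vector of breaks. This is a minimal sketch: label_last_ge is a hypothetical helper name, and the NA handling is an assumption added in case the scale hands missing breaks to the function.

```r
# Hypothetical helper: prefix only the last non-missing label with ">="
label_last_ge <- function(breaks) {
  labels <- as.character(breaks)
  ok <- which(!is.na(breaks))
  if (length(ok) > 0) {
    last <- max(ok)
    labels[last] <- paste0(">=", labels[last])
  }
  labels
}

label_last_ge(c(20, 40, 60, 80))
# "20" "40" "60" ">=80"
```

The helper could then be passed as labels = label_last_ge to the scale call shown in the answer.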

How to specify groups with colors in qqplot()?

I have created a qqplot (with quantiles of a beta distribution) from a dataset including two groups. To visualize which points belong to which group, I would like to color them. I have tried the following:
res <- beta.mle(data$values) #estimate parameters of beta distribution
qqplot(qbeta(ppoints(500),res$param[1], res$param[2]),data$values,
col = data$group,
ylab = "Quantiles of data",
xlab = "Quantiles of Beta Distribution")
the result is shown here:
I have seen solutions specifying a "col" vector for qqnorm; however, this does not seem to work with qqplot, as simply half of the points are colored in each color, regardless of group. Is there a way to fix this?
I simulated some data just to show how to add color in ggplot.
Libraries
library(tidyverse)
# install.packages("Rfast")
Data
#Simulating data from beta distribution
x <- rbeta(n = 1000,shape1 = .5,shape2 = .5)
#Estimating parameters
res <- Rfast::beta.mle(x)
data <-
tibble(
simulated_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2])
) %>%
#Creating a group variable using quartiles
mutate(group = cut(x = simulated_data,
quantile(simulated_data,seq(0,1,.25)),
include.lowest = TRUE))
Code
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = simulated_data, col = group))+
geom_point()
Output
For those wondering how to work with pre-defined groups, this is the code that worked for me:
library(tidyverse)
library(Rfast)
res <- beta.mle(x)
# make sure groups are not numerical
# (else the color scale might turn out continuous)
g <- plyr::mapvalues(g, c("1", "2"), c("Group1", "Group2"))
data <-
tibble(
my_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2]),
group = g[order(x)]
)
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = my_data, col = group))+
geom_point()
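The reason col does not line up in the original attempt is that qqplot() sorts both vectors before plotting, so a color vector given in the original row order no longer matches the points. The same reordering idea as above can be sketched in base R without extra packages. The data here is simulated, and the shape parameters are assumed known instead of being estimated with beta.mle.

```r
set.seed(42)

# Simulated stand-ins for data$values and data$group
values <- rbeta(200, shape1 = 2, shape2 = 5)
group <- factor(sample(c("Group1", "Group2"), 200, replace = TRUE))

# Theoretical quantiles (shape parameters assumed known for this sketch)
theo <- qbeta(ppoints(length(values)), shape1 = 2, shape2 = 5)

# Sort the sample and carry the group labels along in the same order
ord <- order(values)
qq <- data.frame(theoretical = theo,
                 sample = values[ord],
                 group = group[ord])

plot(qq$theoretical, qq$sample,
     col = as.integer(qq$group), pch = 19,
     xlab = "Quantiles of Beta Distribution",
     ylab = "Quantiles of data")
legend("topleft", legend = levels(qq$group),
       col = seq_along(levels(qq$group)), pch = 19)
```

Because the group labels are reordered with the same index as the values, each point keeps the color of its own group.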

How to make scatter plot points into numbers?

I am creating a scatter plot using ggplot/geom_point. Here is my code for building the function in ggplot.
AddPoints <- function(x) {
list(geom_point(data = dat , mapping = aes(x = x, y = y) , shape = 1 , size = 1.5 ,
color = "blue"))
}
I am wondering if it would be possible to replace the standard points on the plot with numbers. That is, instead of seeing a dot on the plot, you would see a number on the plot to represent each observation. I would like that number to correspond to a column for that given observation (column name 'RP'). Thanks in advance.
Sample data.
Data <- data.frame(
X = sample(1:10),
Y = sample(3:12),
RP = sample(c(4,8,9,12,3,1,1,2,7,7)))
Use geom_text() and map the RP variable to the label aesthetic.
ggplot(Data, aes(x = X, y = Y, label = RP)) +
geom_text()
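If you want to keep the layer-returning function style from the question, the same idea might look like the sketch below. AddNumbers is a hypothetical counterpart to AddPoints, and the size and color values are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(1)
Data <- data.frame(
  X = sample(1:10),
  Y = sample(3:12),
  RP = sample(c(4, 8, 9, 12, 3, 1, 1, 2, 7, 7))
)

# Hypothetical counterpart to AddPoints: returns a text layer instead of points
AddNumbers <- function(dat) {
  list(geom_text(data = dat,
                 mapping = aes(x = X, y = Y, label = RP),
                 size = 4, color = "blue"))
}

p <- ggplot() + AddNumbers(Data)
p
```

Each observation is then drawn as its RP value at the (X, Y) position instead of as a dot.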

R::ggplot2 Loop over vector of Y to make multiple plots at one page

I would like to create multiple plots on one page, iterating over different Y variables against the same X, i.e. I want one plot per Y. Usually, I would copy and paste my ggplot call with just the Y value changed, store the individual plots as p.y1, p.y2, and plot all of them using grid.arrange(p.y1, p.y2).
This approach is not very fun when I have 10 different Y variables, and I want to plot all of them. I wonder how to make the process more efficient?
I thought that I could simply create a vector of Ys (colnames of df) and then loop through them to create multiple plots. But it seems that my output plots are not in the right form to pass to grid.arrange(), and I cannot plot them either.
How can I loop through multiple Ys, and then arrange all plots on one page? As I do not have multiple factors, I probably cannot use facet_grid nor facet_wrap.
Here is my dummy example for two Ys: y1 and y2
set.seed(5)
df <- data.frame(x = rep(c(1:5), 2),
y1 = rnorm(10)*3+2,
y2 = rnorm(10),
group = rep(c("a", "b"), each = 5))
# Example of simple line ggplot
ggplot(df, aes(x = x,
y = y2, # here I can set either y1, y2...
group = group,
color = group)) +
geom_line()
Now, iterate over the vectors of Ys and store output plots in a list:
# create vector of my Ys
my.s<-c("y1", "y2")
# Loop over list of y to create different plots
outPlots<- list()
for (i in my.s) {
print(i)
my.plot <-
ggplot(df, aes_string(x = "x",
y = i,
group = "group",
color = "group")) +
geom_line()
# print(plot)
outPlots <- append(outPlots, my.plot)
}
Intended plotting of multiple graphs on one page. This does not work and fails with: Error in gList(list(data.x1 = 1L, data.x2 = 2L, data.x3 = 3L, data.x4 = 4L, : only 'grobs' allowed in "gList"
grid.arrange(outPlots)
I propose another solution based on this post.
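library(ggplot2)
library(gridExtra)
# Build one line plot for the column named in y
Plotfunction <- function(y) {
  ggplot(df, aes_string(x = "x",
                        y = y,
                        group = "group",
                        color = "group")) +
    geom_line()
}
# Arrange the plots on a near-square grid
n <- ceiling(sqrt(length(my.s)))
do.call("grid.arrange",
        c(lapply(my.s, Plotfunction), ncol = n, nrow = n))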
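As for why the original loop fails: the error message suggests that append() spliced the components of the plot object into outPlots instead of adding the plot as a single element. A sketch of the loop with each plot stored as one list element follows. make_plot is a hypothetical helper, and .data[[...]] is used here in place of aes_string as an assumption about a reasonably recent ggplot2. The list is then handed to grid.arrange through its grobs argument.

```r
library(ggplot2)

set.seed(5)
df <- data.frame(x = rep(c(1:5), 2),
                 y1 = rnorm(10) * 3 + 2,
                 y2 = rnorm(10),
                 group = rep(c("a", "b"), each = 5))
my.s <- c("y1", "y2")

# Hypothetical helper: one line plot for the column named y_name
make_plot <- function(y_name) {
  force(y_name)  # fix the column name now, not when the plot is drawn
  ggplot(df, aes(x = x, y = .data[[y_name]],
                 group = group, color = group)) +
    geom_line() +
    ylab(y_name)
}

# Store every plot as ONE element of the list
outPlots <- list()
for (i in my.s) {
  outPlots[[i]] <- make_plot(i)
}

# Arrange all plots on one page (only if gridExtra is installed)
if (requireNamespace("gridExtra", quietly = TRUE)) {
  gridExtra::grid.arrange(grobs = outPlots, ncol = 1)
}
```

Using outPlots[[i]] <- ... keeps each ggplot object intact, which is what grid.arrange expects.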
You could also reshape the data to long format and facet by variable. I hope this helps.
library(reshape2)
Melted <- melt(df,id.vars = c('x','group'))
#Plot
ggplot(Melted,aes(x=x,y=value,group=group,color=group))+
geom_line()+
facet_wrap(~variable,ncol = 1,scales = 'free')+theme_bw()

creating histogram bins in r

I have this code.
a = c("a", 1)
b = c("b",2)
c = c('c',3)
d = c('d',4)
e = c('e',5)
z = data.frame(a,b,c,d,e)
hist = hist(as.numeric(z[2,]))
I am trying to have a histogram such that the bins would be a,b,c,d,e
and the freq values would be 1,2,3,4,5.
However, it gives me an empty screen(no bins at all for histogram model)
You are plotting the factor levels of each column for row 2, which in this case is always 1.
When creating the data frame, add stringsAsFactors=FALSE to avoid converting the strings to factors. This should work:
z = data.frame(a,b,c,d,e,stringsAsFactors=FALSE)
hist(as.numeric(z[2,]))
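The underlying effect can be reproduced with a small factor: as.numeric() on a factor returns the internal level codes, not the numbers spelled out in the labels. A minimal illustration (the vector is made up for the demonstration):

```r
f <- factor(c("10", "20", "20"))

# Internal level codes, not the original numbers
codes <- as.numeric(f)
print(codes)   # 1 2 2

# Going through character first recovers the numbers
values <- as.numeric(as.character(f))
print(values)  # 10 20 20
```

Note that from R 4.0.0 on, data.frame() uses stringsAsFactors = FALSE by default, so the conversion to factors only happens on older versions or when it is requested explicitly.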
Perhaps this would work for you: it creates a data frame with the x elements being the letters 'a' through 'e' and the y elements being the numbers 1 through 5. It then renders a histogram and tells ggplot not to perform any binning.
library(ggplot2)
tmp <- data.frame(x = letters[1:5], y = 1:5)
ggplot(tmp, aes(x = x, y = y)) + geom_histogram(stat = "identity")
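Since the numbers are already the frequencies and the letters are the categories, a plain bar chart is probably closer to what is wanted than a histogram, which bins raw observations. A base R sketch, assuming the letters and counts from the question are stored in a named vector:

```r
# Named vector: names are the bars, values are their heights
freq <- c(a = 1, b = 2, c = 3, d = 4, e = 5)

barplot(freq, xlab = "Category", ylab = "Frequency")
```

This draws one bar per label with the given heights and needs no extra packages.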
