R: How to visualize large and clumped scatter plot


status = sample(c(0, 1), 500, replace = TRUE)
value = rnorm(500)
plot(value)
smoothScatter(value)
I'm trying to make a scatterplot of value, but if I just plot it, the points are all clumped together and the result is not very presentable. I've tried smoothScatter(), which makes the plot look a bit nicer, but is there a way to color-code the values based on the corresponding status?
I am trying to see whether there is a relationship between status and value. What's another way to present the data nicely? I've tried a boxplot, but I'm wondering how I can improve the smoothScatter() plot, or whether there are other ways to visualize the data.

I'm assuming you meant to write plot(status, value) in your example? Regardless, there's not going to be much difference using this data, but you should get the idea of things to maybe look at with the following examples...
Have you looked into jitter?
Some basics:
plot(jitter(status), value)
or perhaps plot(jitter(status, 0.5), value)
Fancier with package ggplot2 you could do:
library(ggplot2)
df <- data.frame(value, status)
ggplot(data=df, aes(jitter(status, 0.10), value)) +
geom_point(alpha = 0.5)
or this...
ggplot(data=df, aes(factor(status), value)) +
geom_violin()
or...
ggplot(data=df, aes(x=status, y=value)) +
geom_density2d() +
scale_x_continuous(limits=c(-1,2))
or...
ggplot(data=df, aes(x=status, y=value)) +
geom_density2d() +
stat_density2d(geom="tile", aes(fill = ..density..), contour=FALSE) +
scale_x_continuous(limits=c(-1,2))
or even this..
ggplot(data=df, aes(fill=factor(status), value)) +
geom_density(alpha=0.2)
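To address the color-coding part of the question directly, here is a minimal base R sketch. It assumes the status and value vectors from the question; the seed, the two color names, and the jitter amount are arbitrary choices for illustration, not something the question specified.

```r
# Sketch: jittered strip plot with one color per status level.
set.seed(1)                                   # assumed seed, only for reproducibility
status <- sample(c(0, 1), 500, replace = TRUE)
value  <- rnorm(500)

# Look up a color for each point: status 0 -> first color, status 1 -> second.
palette2 <- c("steelblue", "tomato")          # arbitrary illustrative colors
cols     <- palette2[status + 1]

# Jitter only the x positions so each group spreads out horizontally.
plot(jitter(status, 0.5), value, col = cols, pch = 16,
     xaxt = "n", xlab = "status", ylab = "value")
axis(1, at = c(0, 1))
legend("topright", legend = c("status 0", "status 1"),
       col = palette2, pch = 16)
```

Indexing a color vector with status + 1 keeps the mapping explicit, so the same lookup can be reused for other base R plots of these data.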

Related

Problem when trying to plot two histograms using fill aesthetic

I've been trying to plot two histograms by using the fill aesthetic and a specific column with two levels. However, instead of displaying both desired histograms, my code displays one histogram with the whole data and another only for the second classification. I don't know whether there is a problem in my syntax or whether this is some kind of tricky issue.
library(tidyverse)
db1 <- data.frame(type=rep("A",100),val=rnorm(n=100,mean=50,sd=10))
db2 <- data.frame(type=rep("B",150),val=rnorm(n=150,mean=50,sd=10))
dbf <- bind_rows(db1,db2)
P1 <- ggplot(db1, aes(x=val)) + geom_histogram()
P2 <- ggplot(db2, aes(x=val)) + geom_histogram()
PF <- ggplot(dbf, aes(x=val)) + geom_histogram()
I want to get this, P1 and P2 overlaid:
ggplot(db1, aes(x=val)) + geom_histogram(fill="red", alpha=0.5) + geom_histogram(data=db2, aes(x=val),fill="green", alpha=0.5)
[Image: what I want]
But the code I think should work, P1 and P2 with the fill aesthetic for column type:
ggplot(dbf, aes(x=val)) + geom_histogram(aes(fill=type), alpha=0.5)
[Image: my code]
produces the combination of PF and P2, as if I had written:
ggplot(dbf, aes(x=val)) + geom_histogram(fill="red", alpha=0.5) + geom_histogram(data=db2, aes(x=val),fill="green", alpha=0.5)
[Image: what I get]
Any help or idea will be highly appreciated!

All you need is to pass position = "identity" to your geom_histogram function.
library(tidyverse)
library(ggplot2)
db1 <- data.frame(type=rep("A",100),val=rnorm(n=100,mean=50,sd=10))
db2 <- data.frame(type=rep("B",150),val=rnorm(n=150,mean=50,sd=10))
dbf <- bind_rows(db1,db2)
ggplot(dbf, aes(x=val, fill = type)) + geom_histogram(alpha=0.5, position = "identity")
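The reason this works, as far as I understand it, is that geom_histogram() stacks groups by default (position = "stack"), so one group's bars start where the other group's bars end. The sketch below uses layer_data() to compare the computed bar baselines under both settings; the seed and bins = 30 are assumptions made only for illustration.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only for reproducibility
dbf <- data.frame(
  type = rep(c("A", "B"), times = c(100, 150)),
  val  = rnorm(250, mean = 50, sd = 10)
)

# Default behaviour: bars of the two groups are stacked on top of each other.
p_stack <- ggplot(dbf, aes(x = val, fill = type)) +
  geom_histogram(bins = 30, alpha = 0.5)

# position = "identity": every bar starts at zero, so the groups overlap.
p_ident <- ggplot(dbf, aes(x = val, fill = type)) +
  geom_histogram(bins = 30, alpha = 0.5, position = "identity")

# Inspect the computed rectangles of the first (only) layer of each plot.
d_stack <- layer_data(p_stack, 1)
d_ident <- layer_data(p_ident, 1)

range(d_ident$ymin)   # baselines of the overlapping bars
range(d_stack$ymin)   # baselines of the stacked bars
```

Comparing the ymin columns shows which bars sit on the axis and which sit on top of another group's bar.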

Is your goal to show the overlap via the color combination? I'm not sure how to force geom_histogram to show the overlap, but geom_density does do what you want. You can play with the bandwidth (bw) to show more or less detail.
dbf %>% ggplot() +
aes(x = val, fill = type) +
geom_density(alpha = .5, bw = .5) +
scale_fill_manual(values = c("red","green"))

Arranging data for two facet R line plot

I am trying to make a two-facet line plot like the one in this example. My problem is arranging the data so that the desired variable appears on the x-axis. Here is the small data set I want to use.
Study,Cat,Dim1,Dim2,Dim3,Dim4
Study1,PK,-3.00,0.99,-0.86,0.46
Study1,US,-4.67,0.76,1.01,0.45
Study2,FL,-2.856,4.15,1.554,0.765
Study2,FL,-8.668,5.907,3.795,4.754
I tried to use the following code to draw line graph from this data frame.
plot1 <- ggplot(data = dims, aes(x = Cat, y = Dim1, group = Study)) +
geom_line() +
geom_point() +
facet_wrap(~Study)
As is clear, I can only use one value column to draw the lines. I want to put Dim1, Dim2, Dim3, Dim4 on the x-axis, which I cannot do with the data arranged this way. [I tried c(Dim1, Dim2, Dim3, Dim4) with no luck.]
Probably the solution is to transpose the table, but then I cannot reproduce the categorization for the facets (Study in the table above) and the colour (Cat in the table above). Any ideas how to solve this issue?

You can try this:
library(tidyr)
library(dplyr)
library(ggplot2)  # needed for ggplot(); the pipe comes from dplyr
gather(dims, variable, value, -Study, -Cat) %>%
ggplot(aes(x=variable, y=value, group=Cat, col=Cat)) +
geom_point() + geom_line() + facet_wrap(~Study)
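The same reshaping can be sketched with tidyr's pivot_longer(), which tidyr documents as the newer replacement for gather(). The data frame below is typed in from the question; row_id is a hypothetical helper column that I add because the two Study2 rows share the same Cat ("FL") and would otherwise be joined into a single line.

```r
library(tidyr)
library(ggplot2)

# The small data set from the question, read from inline text.
dims <- read.csv(text = "Study,Cat,Dim1,Dim2,Dim3,Dim4
Study1,PK,-3.00,0.99,-0.86,0.46
Study1,US,-4.67,0.76,1.01,0.45
Study2,FL,-2.856,4.15,1.554,0.765
Study2,FL,-8.668,5.907,3.795,4.754")

# Hypothetical helper: one id per original row, used to keep lines separate.
dims$row_id <- seq_len(nrow(dims))

# Wide -> long: one row per (original row, dimension) pair.
long <- pivot_longer(dims, cols = Dim1:Dim4,
                     names_to = "Dim", values_to = "Value")

# Group by row_id so each original row becomes its own line; colour by Cat.
p <- ggplot(long, aes(x = Dim, y = Value, group = row_id, colour = Cat)) +
  geom_line() +
  geom_point() +
  facet_wrap(~Study)
# print(p) draws the plot
```

Grouping by Cat instead of row_id gives the same picture for Study1, so the helper column only matters when a category repeats within a study.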

The solution was quite easy. Just had to think a bit and the re-arranged data looks like this.
Study,Cat,Dim,Value
Study1,PK,Dim1,-3
Study1,PK,Dim2,0.99
Study1,PK,Dim3,-0.86
Study1,PK,Dim4,0.46
Study1,US,Dim1,-4.67
Study1,US,Dim2,0.76
Study1,US,Dim3,1.01
Study1,US,Dim4,0.45
Study2,FL,Dim1,-2.856
Study2,FL,Dim2,4.15
Study2,FL,Dim3,1.554
Study2,FL,Dim4,0.765
Study2,FL,Dim1,-8.668
Study2,FL,Dim2,5.907
Study2,FL,Dim3,3.795
Study2,FL,Dim4,4.754
After that, R produced the desired result with this code.
plot1 <- ggplot(data=dims, aes(x=Dim, y=Value, colour=Cat, group=Cat)) + geom_line()+ geom_point() + facet_wrap(~Study)

How to format the scatterplots of data series in R

I have been struggling in creating a decent looking scatterplot in R. I wouldn't think it was so difficult.
After some research, it seemed to me that ggplot would have been a choice allowing plenty of formatting. However, I'm struggling in understanding how it works.
I'd like to create a scatterplot of two data series, displaying the points with two different colours, and perhaps different shapes, and a legend with series names.
Here is my attempt, based on this:
year1 <- mpg[which(mpg$year==1999),]
year2 <- mpg[which(mpg$year==2008),]
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy,color="yellow")) +
geom_point(data = year2, aes(x=cty,y=hwy,color="green")) +
xlab('cty') +
ylab('hwy')
Now, this looks almost OK, but with non-matching colors (unless I suddenly became color-blind). Why is that?
Also, how can I add series names and change symbol shapes?

Don't build 2 different dataframes:
df <- mpg[which(mpg$year%in%c(1999,2008)),]
df$year<-as.factor(df$year)
ggplot() +
geom_point(data = df, aes(x=cty,y=hwy,color=year,shape=year)) +
xlab('cty') +
ylab('hwy')+
scale_color_manual(values=c("green","yellow"))+
scale_shape_manual(values=c(2,8))+
guides(colour = guide_legend("Year"),
shape = guide_legend("Year"))

This will work with the way you currently have it set-up:
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy), col = "yellow", shape=1) +
geom_point(data = year2, aes(x=cty,y=hwy), col="green", shape=2) +
xlab('cty') +
ylab('hwy')
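On the "why don't the colors match" part of the question: my understanding is that a constant placed inside aes(), such as color="yellow", is treated as a data value and mapped through the default color scale, which also explains why the legend shows the strings rather than years. A sketch that keeps the mapping but names the colors explicitly follows; "gold" and "forestgreen" and the shape codes are illustrative choices of mine.

```r
library(ggplot2)

# Map year (as a discrete variable) to both colour and shape, then state
# explicitly which colour and shape each year should get.
p <- ggplot(mpg, aes(x = cty, y = hwy,
                     colour = factor(year), shape = factor(year))) +
  geom_point() +
  scale_colour_manual(name = "Year",
                      values = c("1999" = "gold", "2008" = "forestgreen")) +
  scale_shape_manual(name = "Year",
                     values = c("1999" = 16, "2008" = 17))
# print(p) draws the plot

# The colours actually assigned to the points after scaling:
used_colours <- unique(layer_data(p, 1)$colour)
```

Giving both scales the same name lets ggplot2 merge them into a single legend.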

You want:
library(ggplot2)
ggplot(mpg, aes(cty, hwy, color=as.factor(year)))+geom_point()

How to improve the aspect of ggplot histograms with log scales and discrete values

I am trying to improve the clarity and aspect of a histogram of discrete values which I need to represent with a log scale.
Please consider the following MWE
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram()
which produces
and then
ggplot(data, aes(x=dist)) + geom_histogram() + scale_x_log10(breaks=c(1,2,3,4,5,10,100))
which probably is even worse
since it now gives the impression that something is missing between "1" and "2", and it is also not totally clear which bar has value "1" (the bar is to the right of the tick) and which bar has value "2" (the bar is to the left of the tick).
I understand that technically ggplot provides the "right" visual answer for a log scale. Yet as an observer I have some trouble understanding it.
Is it possible to improve something?
EDIT:
This is what happened when I applied Jaap's solution to my real data:
Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why is the plot also mapping x=1.5 and x=2.5?

The first thing that comes to mind, is playing with the binwidth. But that doesn't give a great solution either:
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth=10) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0.015,0)) +
theme_bw()
gives:
In this case it is probably better to use a density plot. However, when you use scale_x_log10 you will get a warning message (Removed 524 rows containing non-finite values (stat_density)). This can be resolved by using a log plus one transformation.
The following code:
library(ggplot2)
library(scales)
ggplot(data, aes(x=dist)) +
stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
theme_bw()
will give this result:
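The warning about removed rows can be checked by hand. The sketch below, using the same simulated data as the question, counts the zeros and confirms that they are exactly the values log10 cannot place on the axis, while log1p keeps every value finite. The variable names are mine.

```r
set.seed(99)
dist <- as.integer(rlnorm(1000, sdlog = 2))

# Zeros arise because as.integer() truncates every draw below 1 to 0.
n_zero <- sum(dist == 0)

# log10(0) is -Inf, so these rows are dropped by a log10 scale.
n_nonfinite_log10 <- sum(!is.finite(log10(dist)))

# log1p(x) = log(1 + x) maps 0 to 0, so nothing is dropped.
n_nonfinite_log1p <- sum(!is.finite(log1p(dist)))

c(zeros = n_zero, dropped_log10 = n_nonfinite_log10,
  dropped_log1p = n_nonfinite_log1p)
```

On the dips asked about in the EDIT: a density curve is a smoothed, continuous estimate, so it is evaluated between the integers as well, and with a narrow bandwidth it can fall off between neighbouring discrete values.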

I am wondering what would happen if the y-axis were log-scaled instead of the x-axis. It will result in a few warnings wherever the counts are 0, but it may serve your purpose.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()
Also you may want to display frequencies as data labels, since people might ignore the y-scale and it takes some time to realize that y scale is logarithmic.
ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
stat_bin(geom="text", size=3.5, aes(label=..count.., y=0.8*(..count..)))

A solution could be to convert your data to a factor:
library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
ggplot(data, aes(x=factor(dist))) +
geom_histogram(stat = "count") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Resulting in:

I had the same issue and, inspired by @Jaap's answer, I fiddled with the histogram binwidth using the x-axis in log scale.
If you use binwidth = 0.201, the bars will be juxtaposed as expected. However, this means you can only have up to five bars between two x coordinates.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth = 0.201, color = 'red') +
scale_x_log10()
Result:

How do I create a categorical scatterplot in R like boxplots?

Does anyone know how to create a scatterplot in R that looks like the column scatterplots produced by GraphPad Prism?
I tried using boxplots but they don't display the data the way I want it. These column scatterplots that graphpad can generate show the data better for me.
Any suggestions would be appreciated.

As @smillig mentioned, you can achieve this using ggplot2. The code below reproduces the plot that you are after pretty well. Be warned: it is quite tricky. First load the ggplot2 package and generate some data:
library(ggplot2)
dd = data.frame(values=runif(21), type = c("Control", "Treated", "Treated + A"))
Next change the default theme:
theme_set(theme_bw())
Now we build the plot.
Construct a base object - nothing is plotted:
g = ggplot(dd, aes(type, values))
Add on the points: adjust the default jitter and change glyph according to type:
g = g + geom_jitter(aes(pch=type), position=position_jitter(width=0.1))
Add on the "box": calculate where the box ends. In this case, I've chosen the average value. If you don't want the box, just omit this step.
# fun, fun.max and fun.min replace fun.y, fun.ymax and fun.ymin in ggplot2 >= 3.3.0
g = g + stat_summary(fun = mean,
geom="bar", fill="white", colour="black")
Add on some error bars: calculate the upper/lower bounds (here a 95% t-interval around the mean, using the standard error sd/sqrt(n)) and adjust the bar width:
g = g + stat_summary(
fun.max=function(i) mean(i) + qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
fun.min=function(i) mean(i) - qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
geom="errorbar", width=0.2)
Display the plot
g
In my R code above I used stat_summary to calculate the values needed on the fly. You could also create separate data frames and use geom_errorbar and geom_bar.
To use base R, have a look at my answer to this question.
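The separate-data-frame route mentioned above could be sketched as follows. It assumes a 95% t-interval with standard error sd/sqrt(n); the seed, the summary column names, and the helper half_width() are hypothetical and only for illustration.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only for reproducibility
dd <- data.frame(values = runif(21),
                 type = rep(c("Control", "Treated", "Treated + A"), times = 7))

# Hypothetical helper: half-width of a 95% t-interval for the mean.
half_width <- function(x) qt(0.975, length(x) - 1) * sd(x) / sqrt(length(x))

# Summarise once per group, outside of ggplot.
m <- tapply(dd$values, dd$type, mean)
h <- tapply(dd$values, dd$type, half_width)
summ <- data.frame(type  = names(m),
                   mean  = as.numeric(m),
                   lower = as.numeric(m - h),
                   upper = as.numeric(m + h))

# Bars and error bars come from the summary; points come from the raw data.
g2 <- ggplot(dd, aes(type, values)) +
  geom_col(data = summ, aes(type, mean), fill = "white", colour = "black") +
  geom_errorbar(data = summ, aes(x = type, ymin = lower, ymax = upper),
                width = 0.2, inherit.aes = FALSE) +
  geom_jitter(aes(shape = type), width = 0.1, height = 0) +
  theme_bw()
# print(g2) draws the plot
```

Precomputing the summary makes the plotted numbers easy to inspect or reuse in a table, at the cost of keeping two data frames in sync.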

If you don't mind using the ggplot2 package, there's an easy way to make similar graphics with geom_boxplot and geom_jitter. Using the mtcars example data:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter() + theme_bw()
which produces the following graphic:
The documentation can be seen here: http://had.co.nz/ggplot2/geom_boxplot.html
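One caveat with layering geom_jitter() over geom_boxplot(), as far as I can tell, is that outliers are drawn twice: once by the boxplot and once as jittered points. A sketch that hides the boxplot's own outlier markers follows; the jitter width is an arbitrary choice.

```r
library(ggplot2)

# outlier.shape = NA suppresses the boxplot's outlier points, so each
# observation appears exactly once, via the jitter layer.
p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0) +   # jitter horizontally only
  theme_bw()
# print(p) draws the plot

# The jitter layer (layer 2) carries one point per row of mtcars.
n_points <- nrow(layer_data(p, 2))
```

Setting height = 0 keeps the mpg values exact, so the points can still be read against the y-axis.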

I recently faced the same problem and found my own solution, using ggplot2.
As an example, I created a subset of the chickwts dataset.
library(ggplot2)
library(dplyr)
data(chickwts)
Dataset <- chickwts %>%
filter(feed == "sunflower" | feed == "soybean")
Since geom_dotplot() cannot change the dots to symbols, I used geom_jitter() as follows:
Dataset %>%
ggplot(aes(feed, weight, fill = feed)) +
geom_jitter(aes(shape = feed, col = feed), size = 2.5, width = 0.1)+
stat_summary(fun = mean, geom = "crossbar", width = 0.7,
col = c("#9E0142","#3288BD")) +
scale_fill_manual(values = c("#9E0142","#3288BD")) +
scale_colour_manual(values = c("#9E0142","#3288BD")) +
theme_bw()
This is the final plot:
For more details, you can have a look at this post:
http://withheadintheclouds1.blogspot.com/2021/04/building-dot-plot-in-r-similar-to-those.html?m=1
