Continuous gradient color & fixed scale heatmap ggplot2 - r

I'm switching from Mathematica to R but I'm finding some difficulties with visualizations.
I'm trying to do a heatmap as follows:
short
penetration scc pi0
1 0 0 0.002545268
2 5 0 -0.408621176
3 10 0 -0.929432006
4 15 0 -1.121309680
5 20 0 -1.587298317
6 25 0 -2.957853131
7 30 0 -5.123329738
8 0 50 1.199748327
9 5 50 0.788581883
10 10 50 0.267771053
11 15 50 0.075893379
12 20 50 -0.390095258
13 25 50 -1.760650073
14 30 50 -3.926126679
15 0 100 2.396951386
16 5 100 1.985784941
17 10 100 1.464974112
18 15 100 1.273096438
19 20 100 0.807107801
20 25 100 -0.563447014
21 30 100 -2.728923621
mycol <- c("navy", "blue", "cyan", "lightcyan", "yellow", "red", "red4")
ggplot(data = short, aes(x = penetration, y = scc)) +
geom_tile(aes(fill = pi0)) +
scale_fill_gradientn(colours = mycol)
And I get this:
But I need something like this:
That is, I would like the color to vary continuously (as a gradient) over the surface of the plot instead of being discrete for each square. I've seen in other SO questions that some people interpolate the data, but I think there should be an easier way to do it inside the ggplot call (in Mathematica it is done by default).
Besides, I would like to lock the color scale so that 0 is always white (separating warm colors for positive values from cold colors for negative ones) and the color distribution is always the same across plots, independently of the range of the data (since I will use the same plot structure for several datasets).

You can use geom_raster with interpolate=TRUE:
ggplot(short , aes(x = penetration, y = scc)) +
geom_raster(aes(fill = pi0), interpolate=TRUE) +
scale_fill_gradient2(low="navy", mid="white", high="red",
midpoint=0, limits=range(short$pi0)) +
theme_classic()
To get the same color mapping to values of pi0 across all of your plots, set the limits argument of scale_fill_gradient2 to be the same in each plot. For example, if you have three data frames called short, short2, and short3, you can do this:
# Get range of `pi0` across all data frames
pi0.rng = range(lapply(list(short, short2, short3), function(s) s$pi0))
Then set limits=pi0.rng in scale_fill_gradient2 in all of your plots.
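As a rough sketch of the same idea, the limits can also be made symmetric around 0, so that equal distances from zero get equally strong colours on both sides. The data below are made-up stand-ins for short and short2, and plot_pi0 and pi0_lim are hypothetical names, not part of ggplot2:

```r
library(ggplot2)

# Hypothetical stand-ins for two datasets that share the same grid
short <- expand.grid(penetration = seq(0, 30, 5), scc = c(0, 50, 100))
short$pi0 <- 0.024 * short$scc - 0.005 * short$penetration^2
short2 <- transform(short, pi0 = pi0 / 2)

# Symmetric limits around 0, computed once over all datasets
max_abs <- max(abs(c(short$pi0, short2$pi0)))
pi0_lim <- c(-max_abs, max_abs)

# Helper (hypothetical) that applies the same fill scale to any dataset
plot_pi0 <- function(df) {
  ggplot(df, aes(x = penetration, y = scc)) +
    geom_raster(aes(fill = pi0), interpolate = TRUE) +
    scale_fill_gradient2(low = "navy", mid = "white", high = "red",
                         midpoint = 0, limits = pi0_lim)
}

p1 <- plot_pi0(short)
p2 <- plot_pi0(short2)
```

Because both plots share pi0_lim, a given value of pi0 should map to the same colour in each of them, and 0 stays white through midpoint = 0.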

I would adjust your scale_fill_gradient2:
scale_fill_gradient2('pi0', low = "blue", mid = "white", high = "red", midpoint = 0)
To make plot colours directly comparable, add consistent limits to each plot (replace the placeholders with your numeric lower and upper limits):
scale_fill_gradient2('pi0', low = "blue", mid = "white", high = "red", midpoint = 0, limits=c('your lower limit','your upper limit'))

Related

R: Plotting Columns of Different Sizes on Same Graph

I am using the R programming language. I have two datasets:
The first dataset:
my_data_1 <- data.frame(read.table(header=TRUE,
row.names = 1,
text="
height weight age
1 13.14600 2882.7709 49
2 12.65080 3183.7991 48
3 13.84154 3138.2280 48
4 15.25780 2786.5297 49
5 15.01213 3006.9687 50
6 14.37567 3286.9644 50
7 12.99385 2881.7667 51
8 15.38893 2916.1883 50
9 14.80093 2791.7292 49
10 15.40423 2427.7706 50
11 17.55129 630.8886 20
12 18.34758 1076.6810 19
13 16.37789 1778.5550 20
14 14.98782 1401.4328 17
15 17.40527 361.3323 20
16 16.53979 869.5829 21
17 16.61986 1712.1686 19
18 17.78508 1961.6090 20
19 16.83144 1043.5052 19
20 18.66166 360.3037 20
"))
The second dataset:
prior_age = rnorm(100000, 50,5)
prior_height = rnorm(100000, 17,1)
prior_weight = rnorm(100000, 3000, 200)
my_data_2 = data.frame(prior_age, prior_height, prior_weight)
(Based on the answer from this post: ggplot combining two plots from different data.frames) I am trying to plot the "densities" of the height variables from both data sets on the same graph. However, both datasets differ in the number of rows.
I tried the following code in R:
library(ggplot2)
ggplot() +
geom_density(data=my_data1, aes(x=height), color='green') +
geom_density(data=my_data2, aes(x=prior_height), color='red')
But this produces the following error:
Error: Aesthetics must be either length 1 or the same as the data (20): x
Can someone please show me how to fix this problem?
Thanks!
Well, with the code you provided, I didn't need to change the shape of the data. Just use guides(... = guide_legend(title = ...)) and scale_colour_discrete to manually change the legend's components.
ggplot() +
geom_density(data=my_data_1, aes(x=height), color='green') +
stat_density(data = my_data_1, aes(x=height, colour="red"), geom="line",position="identity") +
geom_density(data=my_data_2, aes(x=prior_height), color='red') +
stat_density(data = my_data_2, aes(x=prior_height, colour='green'), geom="line",position="identity") +
guides(colour = guide_legend(title = "new title")) +
scale_colour_discrete(labels = c( "prior", "measurements"))
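Another option, sketched here under the assumption that only the height columns matter, is to stack both data frames into one long data frame and let ggplot build the legend from a grouping column. The data are small random stand-ins, and heights and source are hypothetical names:

```r
library(ggplot2)

set.seed(1)
# Stand-ins for the two data frames from the question (different row counts)
my_data_1 <- data.frame(height = rnorm(20, 16, 2))
my_data_2 <- data.frame(prior_height = rnorm(1000, 17, 1))

# Stack the two height columns and record where each row came from
heights <- rbind(
  data.frame(source = "measurements", height = my_data_1$height),
  data.frame(source = "prior", height = my_data_2$prior_height)
)

# One density curve per source; colours are assigned by name
p <- ggplot(heights, aes(x = height, colour = source)) +
  geom_density() +
  scale_colour_manual(values = c(measurements = "green", prior = "red"))
```

The differing number of rows is not a problem here, because each density is estimated separately within its own group.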

How do you plot the first few values of a PCA

I've run a PCA with a moderately-sized data set, but I only want to visualize a certain amount of points from that analysis because they are from repeat observations and I want to see how close the paired observations are to each other on the plot. I've set it up so that the first 18 individuals are the ones I want to plot, but I can't seem to only plot just the first 18 points without only doing an analysis of only the first 18 instead of the whole data set (43 individuals).
# My data file
TrialsMR<-read.csv("NER_Trials_Matrix_Retrials.csv", row.names = 1)
# I ran the PCA of all of my values (without the categorical variable in col 8)
R.pca <- PCA(TrialsMR[,-8], graph = FALSE)
# When I try to plot only the first 18 individuals with this method, I get an error
fviz_pca_ind(R.pca[1:18,],
labelsize = 4,
pointsize = 1,
col.ind = TrialsMR$Bands,
palette = c("red", "blue", "black", "cyan", "magenta", "yellow", "gray", "green3", "pink" ))
# This is the error
Error in R.pca[1:18, ] : incorrect number of dimensions
The 18 individuals are each paired up, so only using 9 colours shouldn't cause an error (I hope).
Could anyone help me plot just the first 18 points from a PCA of my whole data set?
My data frame looks similar to this in structure
TrialsMR
Trees Bushes Shrubs Bands
JOHN1 1 4 18 BLUE
JOHN2 2 6 25 BLUE
CARL1 1 3 12 GREEN
CARL2 2 4 15 GREEN
GREG1 1 1 15 RED
GREG2 3 11 26 RED
MIKE1 1 7 19 PINK
MIKE2 1 1 25 PINK
where each band corresponds to a specific individual that has been tested twice.
You are using the wrong argument to specify individuals. Use select.ind to choose the individuals required, for example:
data(iris) # test data
If you want to rename your rows according to a grouping criterion so that they are readily identifiable in a plot, do it before the PCA. For example, number setosa in a series starting with 1 (100-199), versicolor in 200-299 and virginica in 300-399.
new_series <- c(101:150, 201:250, 301:350) # there are 50 of each
rownames(iris) <- new_series
R.pca <- prcomp(iris[,1:4],scale. = TRUE) # pca
library(factoextra)
fviz_pca_ind(X= R.pca, labelsize = 4, pointsize = 1,
select.ind= list(name = new_series[1:120]), # 120 out of 150 selected
col.ind = iris$Species ,
palette = c("blue", "red", "green" ))
Always refer to R documentation first before using a new function.
R documentation: fviz_pca {factoextra}
X
an object of class PCA [FactoMineR]; prcomp and princomp [stats]; dudi and pca [ade4]; expOutput/epPCA [ExPosition].
select.ind, select.var
a selection of individuals/variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib
For your particular dummy data, this should do:
R.pca <- prcomp(TrialsMR[,1:3], scale. = TRUE)
fviz_pca_ind(X= R.pca,
select.ind= list(name = row.names(TrialsMR)[1:4]), # 4 out of 8
pointsize = 1, labelsize = 4,
col.ind = TrialsMR$Bands,
palette = c("blue", "green" )) + ylim(-1,1)
Dummy Data:
TrialsMR <- read.table( text = "Trees Bushes Shrubs Bands
JOHN1 1 4 18 BLUE
JOHN2 2 6 25 BLUE
CARL1 1 3 12 GREEN
CARL2 2 4 15 GREEN
GREG1 1 1 15 RED
GREG2 3 11 26 RED
MIKE1 1 7 19 PINK
MIKE2 1 1 25 PINK", header = TRUE)
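If factoextra is not a requirement, a possible alternative is to run the PCA on all rows and then plot only the rows of interest from the score matrix. This is a minimal sketch on iris with prcomp and ggplot2; scores and first18 are hypothetical names:

```r
library(ggplot2)

# PCA on the whole data set (all 150 rows)
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Scores of every individual on the principal components
scores <- as.data.frame(pca$x)
scores$Species <- iris$Species

# Keep only the first 18 individuals for plotting
first18 <- scores[1:18, ]

p <- ggplot(first18, aes(x = PC1, y = PC2, colour = Species)) +
  geom_point(size = 1) +
  geom_text(aes(label = rownames(first18)), vjust = -0.8, size = 3)
```

The components are still computed from the full data set; only the plotted points are restricted.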

R Plot Bar graph transposed dataframe

I'm trying to plot the following dataframe as bar plot, where the values for the filteredprovince column are listed on a separate column (n)
Usually, ggplot and other plotting functions work on a horizontal dataframe, and after several searches I have not been able to find a way to plot this "transposed" version of the dataframe.
Each cluster should group its bars, and within each cluster I would like to plot each filteredprovince based on the value of the n column.
Thank you for the support
d <- read.table(text=
" cluster PROVINCIA n filteredprovince
1 1 08 765 08
2 1 28 665 28
3 1 41 440 41
4 1 11 437 11
5 1 46 276 46
6 1 18 229 18
7 1 35 181 other
8 1 29 170 other
9 1 33 165 other
10 1 38 153 other ", header=TRUE,stringsAsFactors = FALSE)
UPDATE
Thanks to the suggestion in the comments I almost achieved the desired format:
ggplot(tab_s, aes(x = cluster, y = n, fill = factor(filteredprovince))) + geom_col()
Is there any way to show percentages instead of frequencies on the Y axis?
If I understand correctly, you're trying to use the geom_bar() geom, which gives you problems because it wants to make a sort of histogram, but you have already done this kind of summary.
(If you had provided the code you have tried so far, I would not have to guess.)
In that case you can use geom_col() instead.
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) + geom_col()
Alternatively, you can change the default stat of geom_bar() from "count" to "identity"
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) +
geom_bar(stat = "identity")
See this SO question for what a stat is
EDIT: Update in response to OP's update:
To display percentages, you will have to modify the data itself.
Just divide n by the sum of all n and multiply by 100.
d$percentage <- d$n / sum(d$n) * 100
ggplot(d, aes(x = cluster, y = percentage, fill = factor(filteredprovince))) + geom_col()
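With more than one cluster, the percentages may need to be computed within each cluster rather than over the whole table. A small sketch of that variant follows; the second cluster and its counts are invented for illustration:

```r
library(ggplot2)

# Reduced example data; cluster 2 is made up for illustration
d <- data.frame(
  cluster = c(1, 1, 1, 2, 2),
  filteredprovince = c("08", "28", "other", "08", "other"),
  n = c(765, 665, 181, 300, 100)
)

# Share of each row within its own cluster, in percent
d$percentage <- d$n / ave(d$n, d$cluster, FUN = sum) * 100

p <- ggplot(d, aes(x = factor(cluster), y = percentage,
                   fill = factor(filteredprovince))) +
  geom_col()
```

Here ave() returns the per-cluster total aligned with each row, so every cluster's bars add up to 100.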
I'm not sure I perfectly understand, but if the problem is the orientation of your dataframe, you can transpose it with t(data) where data is your dataframe.

ggplot: how to choose the "proper" colors relating on a column

Suppose I have a simple dataframe to plot, in which I have to color the points related to the measure contained in a column. So, if I have:
dataframe
# X1 X2 pop
# 1 -0.11092652 -1.955598e-09 448053
# 2 -0.09999865 -2.310067e-10 418231
# 3 -0.05944755 -3.475013e-09 448473
# 4 0.51378848 1.631781e-09 119548
# 5 0.09438223 -9.606475e-10 323288
# 6 0.19349045 6.074025e-10 203153
# 7 0.06685609 3.210156e-10 208339
# 8 -0.10915456 -1.407190e-09 429178
# 9 -0.10348100 -1.401948e-09 1218038
# 10 -0.08607617 -7.356602e-10 383018
# 11 1.00343465 -2.423237e-08 209550
# 12 -0.05839148 1.503955e-09 287042
# 13 -0.09960163 2.167945e-10 973129
# 14 -0.05793417 2.510107e-09 187249
# 15 0.02191610 2.479708e-09 915225
# 16 0.48877872 1.338346e-08 462999
# 17 -0.10289556 1.472368e-09 1108776
# 18 -0.10316414 2.933469e-10 402422
# 19 -0.09545279 -2.926035e-10 274035
# 20 -0.06111044 3.464014e-09 230749
and I use ggplot in the following way:
ggplot(dataframe) +
ggtitle("Somehow useful spatialization")+ # Electricity / Gas
geom_point(aes(dataframe$X1, dataframe$X2), color = dataframe$pop, size=2 ) +
theme_classic(base_size = 16) +
guides(colour = guide_legend(override.aes = list(size=4)))+
xlab("X")+ylab("Y")
I obtain something like:
that is a possible representation.
Nevertheless, suppose that I want the points colored so as to represent the column pop, i.e., with colors going from (for example) light orange, through dark red, to black. How can I "scale" the column pop to obtain such a graphic?
EDIT:
> dput(dataframe)
structure(list(X1 = c(-0.110926520419347, -0.0999986452719714,
-0.0594475526112884, 0.513788479303472, 0.0943822277852107, 0.193490454204271,
0.0668560854540437, -0.109154563987586, -0.103480996064617, -0.0860761723229372,
1.00343465471568, -0.0583914756527933, -0.0996016272609995, -0.0579341671474729,
0.0219161022704227, 0.488778719096658, -0.102895564162661, -0.103164140322136,
-0.0954527927249849, -0.0611104428640883), X2 = c(-1.9555978205951e-09,
-2.31006712207053e-10, -3.47501251356368e-09, 1.63178106438806e-09,
-9.60647459243156e-10, 6.07402512804044e-10, 3.21015629676789e-10,
-1.40718981687972e-09, -1.40194842954735e-09, -7.35660154466167e-10,
-2.423237202138e-08, 1.50395541775022e-09, 2.16794489937917e-10,
2.51010717100061e-09, 2.47970820013341e-09, 1.33834570208731e-08,
1.47236816671351e-09, 2.93346922578509e-10, -2.92603459149485e-10,
3.46401369936372e-09), pop = c(448053L, 418231L, 448473L, 119548L,
323288L, 203153L, 208339L, 429178L, 1218038L, 383018L, 209550L,
287042L, 973129L, 187249L, 915225L, 462999L, 1108776L, 402422L,
274035L, 230749L)), .Names = c("X1", "X2", "pop"), row.names = c(NA,
20L), class = "data.frame")
With ggplot you can add your aesthetics (aes) in your initial ggplot call. Since you're already telling ggplot where the data is (in dataframe), you can refer to the variables directly by their name (without dataframe$). For the color to be a scale, it needs to be mapped as an aesthetic, inside the aes() call, and not passed as a static value. Once it is added as an aesthetic, we can customize how it behaves by adding a scale. Taking this all into account gives us the following code:
ggplot(dataframe, aes(x = X1, y = X2, color = pop)) +
ggtitle("Somehow useful spatialization")+ # Electricity / Gas
geom_point(size=2) +
theme_classic(base_size = 16) +
guides(colour = guide_legend(override.aes = list(size=4))) +
xlab("X")+ylab("Y") +
scale_color_gradient2(low = "green", mid = "red", high = "black", midpoint = mean(dataframe$pop))
This code gives the following graph. The colors could be further adjusted by playing around with the scale_color_gradient2 part. (Why green as low gives a better orange than actually choosing orange as the low color is beyond me, I just ended up there by coincidence)
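For a gradient that passes through more than one intermediate colour, scale_colour_gradientn accepts an arbitrary vector of colours. The following is only a sketch with a small made-up data frame; the particular colours are one guess at "light orange, dark red, black":

```r
library(ggplot2)

# Small made-up data frame with the same column names as in the question
df <- data.frame(
  X1 = c(-0.11, -0.10, 0.51, 0.09, 1.00),
  X2 = c(-2.0e-09, -2.3e-10, 1.6e-09, -9.6e-10, -2.4e-08),
  pop = c(448053, 418231, 119548, 323288, 1218038)
)

# pop is mapped inside aes(), then coloured along a three-colour gradient
p <- ggplot(df, aes(x = X1, y = X2, colour = pop)) +
  geom_point(size = 2) +
  scale_colour_gradientn(colours = c("orange", "darkred", "black")) +
  theme_classic(base_size = 16)
```

By default the colours are spread evenly over the range of pop, so low values should come out orange and the highest values black.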

Display single cases in mosaic plot in R

I have the following problem:
I need to create a mosaic plot but want to display the number of cases for each mosaic, as total numbers per country differ. The plot is based on the following data:
1 - not agree 2 3 4 5 - fully agree
DE 6 2 0 0 1
ES 5 3 1 1 0
FR 6 3 1 2 0
SE 4 3 0 0 0
I used the following code:
mosaicplot(Q1, col=c("red", "orange", "yellow", "green", "green4"),
las = 1,
main = "There is no need to do anything about it.",
ylab = "",
xlab = "Country")
Giving me this graph:
Now I would like to divide the first red bar into six bars of the same colour, as there were 6 votes in Germany, and so on. Any ideas on how to accomplish that?
I applied the procedure explained here:
https://learnr.wordpress.com/2009/03/29/ggplot2_marimekko_mosaic_chart/
Only I had to use two data frames, one for the percentages and one for the absolute values.
Both data frames went through the same calculations. Whilst dfm1 created the chart, dfm21 was used for the labels:
p2 <- p1 + geom_text(aes(x = xtext, y = ytext,
label = ifelse(dfm21$value == "0", paste(" "), paste(dfm21$value))), size = 3.5)
