Vertical gradient color with geom_area [duplicate] - r

This question already has answers here:
How to make gradient color filled timeseries plot in R
(4 answers)
Closed 5 years ago.
I'm having a hard time finding a way to create a gradient fill.
This is how it should look (don't mind the blue bars):
It's something similar to How to make gradient color filled timeseries plot in R, but that example is a bit too advanced for me to reuse. I don't have any negative values, and the maximum is 80. I tried the answer offered by nograpes: my PC froze for some 6-7 minutes and then I got this message:
Error in rowSums(na) :
'Calloc' could not allocate memory (172440001 of 16 bytes)
The data below is only a subset; the full data set has 841 rows (some containing NAs), so the solution in the previous answer can hardly work for me.
df <- structure(list(date = structure(c(1497178800, 1497182400, 1497186000,
1497189600, 1497193200, 1497196800, 1497200400, 1497204000, 1497207600,
1497211200, 1497214800, 1497218400, 1497222000, 1497225600, 1497229200,
1497232800, 1497236400, 1497240000, 1497243600, 1497247200, 1497250800,
1497254400, 1497258000, 1497261600, 1497265200, 1497268800, 1497272400,
1497276000, 1497279600, 1497283200, 1497286800, 1497290400, 1497294000,
1497297600, 1497301200, 1497304800, 1497308400, 1497312000, 1497315600,
1497319200, 1497322800, 1497326400, 1497330000, 1497333600, 1497337200,
1497340800, 1497344400, 1497348000, 1497351600, 1497355200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), dk_infpressure = c(22, 21.6, 21.2,
20.9, 20.5, 20.1, 19.8, 19.4, 19, 18.6, 18.2, 17.9, 17.5, 17.1,
16.8, 16.4, 16, 15.6, 15.2, 14.9, 14.5, 14.1, 13.8, 13.4, 13,
12.5, 11.9, 11.4, 10.8, 10.3, 9.8, 9.2, 8.7, 8.1, 7.6, 7, 6.5,
6, 5.4, 4.9, 4.3, 3.8, 3.2, 2.7, 2.2, 1.6, 1.1, 0.5, 0, 0)), .Names = c("date",
"dk_infpressure"), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
Code to get the basic plot:
library(ggplot2)
ggplot() +
  geom_area(data = df, aes(x = date, y = dk_infpressure)) +
  scale_y_continuous(limits = c(0, 80))

Because geom_area can't take a gradient fill, this is a somewhat hard problem.
Here's a decidedly hacky but possibly sufficient option: it builds a raster (using geom_tile, since the x and y sizes differ) and covers the ragged edges by cropping the scale expansion and drawing ggforce::geom_link (a version of geom_segment that can plot a gradient) along the top:
library(tidyverse)
df %>%
  filter(!is.na(dk_infpressure)) %>%  # seq() fails on NA, so drop those rows first
  mutate(dk_infpressure = map(dk_infpressure, ~seq(0, .x, .05))) %>%  # make grid of points
  unnest(cols = dk_infpressure) %>%
  ggplot(aes(date, dk_infpressure, fill = dk_infpressure)) +
  geom_tile(width = 3600, height = 0.05) +  # one tile per hour and per 0.05 step
  # hide square tops
  ggforce::geom_link(aes(color = dk_infpressure, xend = lag(date), yend = lag(dk_infpressure)),
                     data = df, size = 2.5, show.legend = FALSE) +
  scale_x_datetime(expand = c(0, 0)) +  # hide overplotting of line
  scale_y_continuous(expand = c(0, 0))
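A lighter-weight alternative (a different workaround, not the method used in the answer above) is to draw one full-height vertical gradient and then hide everything above the curve with a geom_ribbon filled in the panel background colour. It needs only about 200 tiles no matter how many rows the data has, which may help with the memory problem described in the question. This is a sketch under assumptions: the df below is a hypothetical stand-in for the question's data, and the yellow-to-red palette, the 0.4 band height and theme_classic() (chosen because its panel is white, matching the mask) are illustrative choices. Rows with NA are not handled; they would leave gaps in the mask.

```r
library(ggplot2)

# Hypothetical stand-in with the same shape as the question's df
df <- data.frame(
  date = as.POSIXct("2017-06-11 11:00:00", tz = "UTC") + 3600 * (0:49),
  dk_infpressure = seq(22, 0, length.out = 50)
)

y_max <- 80    # top of the y axis, as in the question
band  <- 0.4   # height of each gradient band (200 bands from 0 to 80)

# One wide tile per band, centred on the time range and filled by its height
x_rng <- range(df$date)
grad <- data.frame(
  x = mean(x_rng),
  y = seq(band / 2, y_max - band / 2, by = band)
)

p <- ggplot() +
  # full-height vertical gradient; width is in seconds on a datetime scale
  geom_tile(data = grad, aes(x = x, y = y, fill = y),
            width = as.numeric(diff(x_rng), units = "secs"), height = band) +
  # mask everything above the curve with the (white) panel background colour
  geom_ribbon(data = df, aes(x = date, ymin = dk_infpressure, ymax = y_max),
              fill = "white") +
  scale_fill_gradient(low = "yellow", high = "red", limits = c(0, y_max)) +
  coord_cartesian(expand = FALSE) +  # no padding, so the mask reaches the panel edges
  theme_classic()

p
```

If you use a theme with a grey panel, set the ribbon's fill to that panel colour instead.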

Related

How to transform average hourly data per month into time series?

I have a data set with peculiar granularity: each month has 24 average hourly values, and the time span is Jan 2012 to Dec 2019 (see image). I am interested in the data of column "Ws".
I tried, unsuccessfully, to transform it into a time series with this R code:
{
suppressPackageStartupMessages(library(forecast))
suppressPackageStartupMessages(library(dlm))
suppressPackageStartupMessages(library(fpp2))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(zoo))
suppressPackageStartupMessages(library(gridExtra))
}
data <- read.csv("c:/bib/test-1.csv", dec = ".", header = TRUE)
inds <- seq(as.Date("2012-01-01"), as.Date("2019-12-31"), by = "hour")
## Create a time series object
myData <- ts(data,
start = c(2012-1, as.numeric(format(inds[1], "%j"))),
frequency = 2304)
Output of 'dput(head(data, 20))' code:
structure(list(V1 = c(8.4, 8.2, 8.2, 8, 7.8, 7.5, 7.3, 7.2, 8.2,
8.8, 9.2, 9.5, 9.7, 9.9, 10, 10.1, 9.9, 9.5, 8.9, 8.6)), row.names = c(NA, 20L), class = "data.frame")
Could someone help me with this?
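A possible sketch, assuming the rows are ordered month by month (24 hourly averages for Jan 2012, then 24 for Feb 2012, and so on): each year then holds 12 × 24 = 288 observations, so the frequency would be 288 rather than 2304 (2304 = 96 months × 24 is the total number of rows). Note also that start = c(2012-1, ...) evaluates to 2011. The ws vector below is a hypothetical stand-in for the "Ws" column.

```r
# Hypothetical stand-in for the "Ws" column: 96 months x 24 hourly averages,
# ordered month by month (Jan 2012 hours 0..23, Feb 2012 hours 0..23, ...)
ws <- rep(10 + sin(2 * pi * (0:23) / 24), times = 96)

# Each year holds 12 months x 24 hourly averages = 288 observations
ws_ts <- ts(ws, start = c(2012, 1), frequency = 12 * 24)

length(ws_ts)     # 2304 observations in total (96 months x 24)
frequency(ws_ts)  # 288 observations per year
end(ws_ts)        # 2019 288, i.e. the last hourly average of Dec 2019

# Optional: a 24 x 96 matrix with one column per month, handy for daily profiles
ws_by_month <- matrix(ws_ts, nrow = 24)
```

With the real data you would use data$Ws in place of ws; if the rows are not in month-then-hour order, sort them first.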

How to color the points and background in within area using ggplot2

I am really new to R, so I am having trouble visualizing data with the ggplot2 package.
I would like to create a linear regression graph in which the points within a specific area share one color and the points outside that area share another. I would also like to change the background within that area to draw attention to it.
The graph I would like to make is similar to the graph below.
Target graph
So far, I have only been able to create the simple graph below.
My current graph
My code to generate the current graph is below.
g <- ggplot(df, aes(x = real, y = predicted))
g + geom_point() +
geom_abline(intercept = 0, slope = 1, color='black') +
theme_classic() +
geom_abline(intercept = 0+s_est, slope = 1, color = 'darkgrey')+
geom_abline(intercept = 0-s_est, slope = 1, color = 'darkgrey') +
ggtitle("Test Set")
The first 100 lines of data are as follows.
structure(list(real = c(3.33, 5.92, 5.3, 6, 6.96, 7.03, 6.6,
7.92, 8.3, 10.52, 6.34, 4.38, 4.59, 9.8, 10.3, 10, 8.25, 6, 7.44,
6.66, 9.09, 9.22, 9.7, 4.82, 6.1, 4.92, 4.29, 3.22, 6.01, 9.05,
9.04, 4.85, 8.22, 6.7, 6.7, 4.62, 4.82, 8.52, 5.24, 8.15, 7,
10, 7, 5.18, 5.93, 8.4, 7.7, 7.24, 9.54, 6.06, 8, 4.35, 4.2,
4.51, 2.48, 9.1, 5.34, 4.19, 8.05, 8.55, 6.55, 11.4, 10.96, 9.64,
4.49, 6, 6.9, 6.17, 9, 6.92, 3.77, 4.22, 8.92, 7.55, 7.6, 6.82,
5.32, 8.39, 5.09, 10.96, 6.68, 9.4, 5.04, 5.59, 9.21, 9.7, 6.98,
6.17, 8.89, 9.74, 6.08, 6.7, 4.41, 3.57, 7.12, 6.09, 6.11, 6.82,
7.3, 6.77), predicted = c(3.3049898147583, 7.57794666290283,
5.81329345703125, 3.71067190170288, 6.35026741027832, 6.59200620651245,
6.32752990722656, 7.13449430465698, 7.78791570663452, 8.61589622497559,
7.72269868850708, 5.33322525024414, 7.26069974899292, 9.23727989196777,
8.27904891967773, 7.55226612091064, 5.94742393493652, 4.07633399963379,
7.67468595504761, 5.64575576782227, 7.85368394851685, 7.73117685317993,
10.2843132019043, 4.96891403198242, 6.29262351989746, 6.03091764450073,
6.71697568893433, 3.50744342803955, 6.46608829498291, 8.20327758789062,
7.52885150909424, 4.58155632019043, 6.1530909538269, 6.49482202529907,
5.28225088119507, 4.44094896316528, 5.503089427948, 7.79408073425293,
5.6220269203186, 7.12402009963989, 6.30716276168823, 7.15596580505371,
7.26271867752075, 5.41359615325928, 5.68268489837646, 6.81329536437988,
7.10254955291748, 8.64251136779785, 8.65674114227295, 5.94885206222534,
9.24687099456787, 5.93400239944458, 5.66134691238403, 6.14793062210083,
2.94440221786499, 9.21078777313232, 5.96825170516968, 4.69157028198242,
7.91313886642456, 6.90836668014526, 6.72082805633545, 9.95611953735352,
9.15732383728027, 6.68948268890381, 3.60811305046082, 7.42742109298706,
6.05647945404053, 6.2350025177002, 8.12950134277344, 7.56590843200684,
5.3975772857666, 3.48417925834656, 7.63604927062988, 8.04048824310303,
7.78053188323975, 7.34217929840088, 7.93345308303833, 8.03125,
5.62498426437378, 4.80621385574341, 5.19631958007812, 7.51661252975464,
5.43919944763184, 5.5195426940918, 6.10152912139893, 8.25357818603516,
5.73111486434937, 7.27180528640747, 8.37008285522461, 7.78157567977905,
7.52273559570312, 4.32158374786377, 6.20211696624756, 4.30103015899658,
7.89811611175537, 6.88143062591553, 6.74230575561523, 6.75651741027832,
6.64747190475464, 6.72232007980347)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -100L))
s_est = 4.536
Thank you so much for any help.
In your target image, it looks like the points are colored by a measure of the absolute error: points that fall inside the confidence (?) band are colored blue and points that fall outside are colored red. To achieve the same result, you could map the absolute error (or whatever measure you prefer) onto the color aesthetic. To get the coloring right, I use scale_color_gradient2() with the midpoint set to s_est. I also set an upper bound for the color gradient, i.e. all values with an absolute error greater than or equal to 2 * s_est are assigned the same "red" color, but you could adjust that if you like.
To shade the area between your ablines, I replace the two grey geom_ablines with a geom_ribbon. One drawback is that the ribbon does not extend to the axes but is restricted to the data range. To "fix" that, I use a small hack: I pass the ribbon a separate dataset in which the range of real values is extended by 5% of the data range on each side, and I also remove the default expansion of the x scale.
Finally, I added coord_equal() so that one unit has the same length on both axes.
Note: I used a smaller value for s_est because, with the example data, no point would otherwise have fallen outside the confidence band.
library(ggplot2)
s_est <- 4.536 / 4
# Absolute Error
df$resid <- abs(df$predicted - df$real)
# Range of "real" values used for the ribbon. Manually expand range by 5%
range_ribbon <- diff(range(df$real))
range_ribbon <- range(df$real) + .05 * range_ribbon * c(-1, 1)
ggplot(df, aes(x = real, y = predicted)) +
geom_point(aes(color = resid)) +
geom_abline(intercept = 0, slope = 1, color = "black") +
geom_ribbon(
data = data.frame(real = range_ribbon, predicted = 0),
aes(ymin = real - s_est, ymax = real + s_est),
color = "darkgrey", fill = "darkgrey", alpha = .2
) +
# Remove default expansion of the x scale
scale_x_continuous(expand = c(0, 0)) +
# Color gradient. Limit range to 2 * s_est
scale_color_gradient2(
midpoint = s_est, low = "blue", high = "red",
limits = c(0, 2 * s_est),
oob = scales::oob_squish
) +
labs(title = "Test Set") +
coord_equal()
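If you would rather have exactly two colours (inside vs. outside the band) than a gradient, a simpler variant maps a two-level factor to colour with scale_colour_manual(). This is a sketch, not part of the answer above: the five-row df is a hypothetical stand-in (the first five rows of the question's data, rounded), the labels "inside band"/"outside band" are made up, and the band is drawn with ablines rather than a ribbon.

```r
library(ggplot2)

# Hypothetical stand-in: first five rows of the question's data, rounded
df <- data.frame(
  real      = c(3.33, 5.92, 5.30, 6.00, 6.96),
  predicted = c(3.30, 7.58, 5.81, 3.71, 6.35)
)
s_est <- 4.536 / 4  # same reduced band half-width as in the answer

# TRUE if the point lies within +/- s_est of the identity line
df$inside <- abs(df$predicted - df$real) <= s_est
df$band <- factor(ifelse(df$inside, "inside band", "outside band"),
                  levels = c("inside band", "outside band"))

p <- ggplot(df, aes(x = real, y = predicted)) +
  geom_abline(intercept = 0, slope = 1, colour = "black") +
  geom_abline(intercept = s_est, slope = 1, colour = "darkgrey") +
  geom_abline(intercept = -s_est, slope = 1, colour = "darkgrey") +
  geom_point(aes(colour = band)) +
  # named values are matched to the factor levels
  scale_colour_manual(values = c("inside band" = "blue", "outside band" = "red"),
                      name = NULL) +
  labs(title = "Test Set") +
  coord_equal() +
  theme_classic()

p
```

The geom_ribbon shading from the answer can be added to this plot unchanged if you also want the highlighted background.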

Is there a way I can move my first column in this excel dataset to be the column that specifies the numbers 1 to 8 [duplicate]

This question already has answers here:
Convert the values in a column into row names in an existing data frame
(5 answers)
Closed 1 year ago.
I'd like to take the first data column, named "Especies", with the species names below it (i.e. "Strix_varia", "Strix_rufipes", ...), and make it the column that shows the numbers 1 to 8 (enclosed in red in the linked image).
I'm working with Moran's I, and having the column "Especies" in the data gives me incorrect results.
Any help will be great!
Thanks!
Here's my dput() output:
structure(list(Especies = c("Strix_varia", "Strix_rufipes", "Strix_occidentalis",
"Strix_aluco", "Strix_uralensis", "Strix_woodfordii", "Strix_leptogrammica",
"Strix_nebulosa"), Notas.segundo = c(2.9, 4.3, 2.9, 1.3, 1, 3,
3.1, 1.1), Notas.llamado = c(6.3, 13.5, 12.2, 5, 3, 6, 4, 9.3
), Duracion.llamado = c(2.9, 2.9, 5.3, 4, 4.5, 1.6, 1.5, 7.3),
Frecuencia.minima = c(149.4, 157.4, 167, 314.7, 75.3, 149.3,
212.2, 147.5), Frecuencia.maxima = c(518.6, 564.8, 594.3,
846.2, 394.9, 438.4, 396.8, 263.8), Ancho.banda = c(369.1,
407.3, 427.2, 531.5, 319.6, 289, 184.6, 116.3), Frecuencia.central = c(522.1,
551.8, 589.9, 844, 385.9, 429, 374.9, 255.2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -8L))
Assuming that your table has one row per species and no species repeats, a simple data$Especies = seq_along(data$Especies) will do the job. I would suggest keeping the original column, though, so that you remember which code belongs to which species, e.g. with data$id = seq_along(data$Especies).
data = structure(list(Especies = c("Strix_varia", "Strix_rufipes", "Strix_occidentalis",
"Strix_aluco", "Strix_uralensis", "Strix_woodfordii", "Strix_leptogrammica",
"Strix_nebulosa"), Notas.segundo = c(2.9, 4.3, 2.9, 1.3, 1, 3,
3.1, 1.1), Notas.llamado = c(6.3, 13.5, 12.2, 5, 3, 6, 4, 9.3
), Duracion.llamado = c(2.9, 2.9, 5.3, 4, 4.5, 1.6, 1.5, 7.3),
Frecuencia.minima = c(149.4, 157.4, 167, 314.7, 75.3, 149.3,
212.2, 147.5), Frecuencia.maxima = c(518.6, 564.8, 594.3,
846.2, 394.9, 438.4, 396.8, 263.8), Ancho.banda = c(369.1,
407.3, 427.2, 531.5, 319.6, 289, 184.6, 116.3), Frecuencia.central = c(522.1,
551.8, 589.9, 844, 385.9, 429, 374.9, 255.2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -8L))
# If you don't want to overwrite your data:
data$id = seq_along(data$Especies)
# If you are OK with overwriting your data
data$Especies = seq_along(data$Especies)
If the species names are not unique and can repeat, seq_along() gives the same species different ids. If that is not what you want, you can use factor():
data$id = as.numeric(factor(data$Especies))
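To see the difference on a small, made-up vector with a repeated name (sp is hypothetical): factor() gives repeats the same id, but numbers the species in alphabetical order. If you want ids in order of first appearance instead, base R's match() against unique() is one option.

```r
sp <- c("Strix_varia", "Strix_aluco", "Strix_varia")

seq_along(sp)           # 1 2 3: the repeated species gets a new id
as.numeric(factor(sp))  # 2 1 2: same id for repeats, levels sorted alphabetically
match(sp, unique(sp))   # 1 2 1: same id for repeats, in order of first appearance
```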
Alternatively, you can create an encoding of your own by creating a named vector and use it to translate species names to id:
names = unique(data$Especies)
coding = seq_along(names)
names(coding) = names
data$id = coding[data$Especies]
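The duplicate target linked above turns the values of a column into row names instead of replacing them. A minimal base R sketch of that, assuming a trimmed stand-in for the question's data (three rows, two numeric columns): note that tibbles discard row names, so the data is converted to a plain data.frame first, which leaves only numeric columns for the Moran's I calculation.

```r
# Hypothetical stand-in: first three rows and two numeric columns of the question's data
data <- data.frame(
  Especies      = c("Strix_varia", "Strix_rufipes", "Strix_occidentalis"),
  Notas.segundo = c(2.9, 4.3, 2.9),
  Notas.llamado = c(6.3, 13.5, 12.2)
)

data <- as.data.frame(data)      # tibbles drop row names, so convert first
rownames(data) <- data$Especies  # species names become row names (must be unique)
data$Especies <- NULL            # remove the character column

data
```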

Data labels for mean and percentiles in a distribution chart

I'm creating a custom chart to visualize a variable's distribution using geom_density. I added 3 vertical lines: one for a custom value, one for the 5th percentile and one for the 95th percentile.
How do I add labels for those lines?
I tried using geom_text, but I don't know how to set the x and y parameters.
library(ggplot2)
ggplot(dataset, aes(x = dataset$`Estimated percent body fat`)) +
geom_density() +
geom_vline(aes(xintercept = dataset$`Estimated percent body fat`[12]),
color = "red", size = 1) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.05, na.rm = TRUE)),
color = "grey", size = 0.5) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.95, na.rm = TRUE)),
color="grey", size=0.5) +
geom_text(aes(x = dataset$`Estimated percent body fat`[12],
label = "Custom", y = 0),
colour = "red", angle = 0)
I'd like to obtain the following:
for the custom value, I'd like to add the label at the top of the chart, just to the right of the line
for the percentiles label, I'd like to add them in the middle of the chart; at the left of the line for the 5th percentile and right of the line for 95th percentile
Here is what I was able to obtain https://i.imgur.com/thSQwyg.png
And these are the first 50 lines of my dataset:
structure(list(`Respondent sequence number` = c(21029L, 21034L,
21043L, 21056L, 21067L, 21085L, 21087L, 21105L, 21107L, 21109L,
21110L, 21125L, 21129L, 21138L, 21141L, 21154L, 21193L, 21195L,
21206L, 21215L, 21219L, 21221L, 21232L, 21239L, 21242L, 21247L,
21256L, 21258L, 21287L, 21310L, 21325L, 21367L, 21380L, 21385L,
21413L, 21418L, 21420L, 21423L, 21427L, 21432L, 21437L, 21441L,
21444L, 21453L, 21466L, 21467L, 21477L, 21491L, 21494L, 21495L
), `Estimated percent body fat` = c(NA, 7.2, NA, NA, 24.1, 25.1,
30.2, 23.6, 24.3, 31.4, NA, 14.1, 20.5, NA, 23.1, 30.6, 21, 20.9,
NA, 24, 26.7, 16.6, NA, 26.9, 16.9, 21.3, 15.9, 27.4, 13.9, NA,
20, NA, 12.8, NA, 33.8, 18.1, NA, NA, 28.4, 10.9, 38.1, 33, 39.3,
15.9, 32.7, NA, 20.4, 16.8, NA, 29)), row.names = c(NA, 50L), class =
"data.frame")
First, I recommend cleaning the column names.
dat <- dataset
names(dat) <- tolower(gsub("\\s", "\\.", names(dat)))
With base R plots you could do the following. The key point is that you can store the quantiles and the custom position in variables and reuse them as coordinates later, which gives you dynamic positioning. I'm not sure if/how this is possible with ggplot.
plot(density(dat$estimated.percent.body.fat, na.rm=TRUE), ylim=c(0, .05),
main="Density curve")
abline(v=c1 <- dat$estimated.percent.body.fat[12], col="red")
abline(v=q1 <- quantile(dat$estimated.percent.body.fat, .05, na.rm=TRUE), col="grey")
abline(v=q2 <- quantile(dat$estimated.percent.body.fat, .95, na.rm=TRUE), col="grey")
text(c1 + 4, .05, c(expression("" %<-% "custom")), cex=.8)
text(q1 - 5.5, .025, c(expression("5% percentile" %->% "")), cex=.8)
text(q2 + 5.5, .025, c(expression("" %<-% "95% percentile")), cex=.8)
Note: In case you don't like the arrows, just use e.g. "5% percentile" instead of c(expression("5% percentile" %->% "")).
Or, in ggplot, you could use annotate():
library(ggplot2)
ggplot(dataset, aes(x = `Estimated percent body fat`)) +
  geom_density(na.rm = TRUE) +
  # constant positions go outside aes(), so each line is drawn once
  geom_vline(xintercept = dataset$`Estimated percent body fat`[12],
             color = "red", size = 1) +
  geom_vline(xintercept = quantile(dataset$`Estimated percent body fat`,
                                   0.05, na.rm = TRUE),
             color = "grey", size = 0.5) +
  geom_vline(xintercept = quantile(dataset$`Estimated percent body fat`,
                                   0.95, na.rm = TRUE),
             color = "grey", size = 0.5) +
  annotate("text", x = 16, y = .05, label = "custom") +
  annotate("text", x = 9.5, y = .025, label = "5% percentile") +
  annotate("text", x = 38, y = .025, label = "95% percentile")
Note that in either solution the result (i.e. the exact label positions) depends on your export size. To learn how to control this, take a look at e.g. How to save a plot as image on the disk?.
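The stored-values idea from the base R version can also be sketched with ggplot's annotate(), which avoids hard-coding x positions. This is a sketch under assumptions: body_fat is a hypothetical stand-in for the question's column, "the middle of the chart" is taken to mean half the height of the density peak, and linewidth assumes ggplot2 3.4.0 or later (use size on older versions). y = Inf pins a label to the top of the panel, and hjust decides on which side of the line the text sits. Labels close to the panel edge may still be clipped.

```r
library(ggplot2)

# Hypothetical stand-in for dataset$`Estimated percent body fat`
dat <- data.frame(body_fat = c(7.2, 24.1, 25.1, 30.2, 23.6, 24.3, 31.4,
                               14.1, 20.5, 23.1, 30.6, 21, NA))

custom <- dat$body_fat[8]  # 14.1, the value in row 12 of the question's data
pct <- unname(quantile(dat$body_fat, c(0.05, 0.95), na.rm = TRUE))
# half the height of the density peak, used as "the middle of the chart"
y_mid <- max(density(dat$body_fat, na.rm = TRUE)$y) / 2

p <- ggplot(dat, aes(x = body_fat)) +
  geom_density(na.rm = TRUE) +
  geom_vline(xintercept = custom, colour = "red", linewidth = 1) +
  geom_vline(xintercept = pct, colour = "grey", linewidth = 0.5) +
  # top of the panel, text starting just right of the line
  annotate("text", x = custom, y = Inf, label = " custom",
           hjust = 0, vjust = 1.5, colour = "red") +
  # hjust = 1 ends the text at the line (left side); hjust = 0 starts it there (right side)
  annotate("text", x = pct[1], y = y_mid, label = "5th percentile ", hjust = 1) +
  annotate("text", x = pct[2], y = y_mid, label = " 95th percentile", hjust = 0)

p
```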
Data
dataset <- structure(list(`Respondent sequence number` = c(21029L, 21034L,
21043L, 21056L, 21067L, 21085L, 21087L, 21105L, 21107L, 21109L,
21110L, 21125L, 21129L, 21138L, 21141L, 21154L, 21193L, 21195L,
21206L, 21215L, 21219L, 21221L, 21232L, 21239L, 21242L, 21247L,
21256L, 21258L, 21287L, 21310L, 21325L, 21367L, 21380L, 21385L,
21413L, 21418L, 21420L, 21423L, 21427L, 21432L, 21437L, 21441L,
21444L, 21453L, 21466L, 21467L, 21477L, 21491L, 21494L, 21495L
), `Estimated percent body fat` = c(NA, 7.2, NA, NA, 24.1, 25.1,
30.2, 23.6, 24.3, 31.4, NA, 14.1, 20.5, NA, 23.1, 30.6, 21, 20.9,
NA, 24, 26.7, 16.6, NA, 26.9, 16.9, 21.3, 15.9, 27.4, 13.9, NA,
20, NA, 12.8, NA, 33.8, 18.1, NA, NA, 28.4, 10.9, 38.1, 33, 39.3,
15.9, 32.7, NA, 20.4, 16.8, NA, 29)), row.names = c(NA, 50L), class =
"data.frame")

Wrap Axis Labels in Correlation Matrix

I'm attempting to use the ggcorr() function within library(GGally) to create a correlation matrix. The package is working as it is supposed to, but I'm running into an issue where I would like to edit how the axis labels appear on the plot.
Currently, the labels automatically get a _ or . in place of spaces or other characters in the names. Ideally, I would like to insert a line break (\n) at the spaces so that both long and short names are easy to read and don't extend much beyond their column and row.
I have found solutions that others have used on SO, including using str_wrap(), but it was within a ggplot() call, not this specific package. I have inspected the R code for the package, but couldn't find where to edit these labels specifically. Whenever I attempt to edit X or Y axis text, it adds an entirely new axis and set of labels.
I currently dcast() a data frame into the resulting data frame, and even when I gsub() "\n" into the player names column, the line breaks get lost in the dcast() step.
Here is an example of what I am working with. I would like to be able to automatically create line breaks between first and last name of the labels.
library(GGally)
library(ggplot2)
test <- structure(list(Date = structure(c(17100, 17102, 17103, 17106,
17107), class = "Date"), `Alexis Ajinca` = c(1.2, NA, 9.2, 6.4,
NA), `Anthony Davis` = c(95.7, 76.9, 29, 67, 24.9), `Buddy Hield` = c(9.7,
4.7, 17, 8, 28.3), `Cheick Diallo` = c(NA, NA, 3.2, NA, NA),
`Dante Cunningham` = c(0.5, 27.6, 14, 13.5, -1), `E'Twaun Moore` = c(19.2,
16.1, 22, 20.5, 10.1), `Lance Stephenson` = c(16.1, 31.6,
8, 8.1, 34.8), `Langston Galloway` = c(10.9, 2, 13.8, 2.2,
29.4), `Omer Asik` = c(4.7, 6.6, 9.9, 15.9, 14.2), `Solomon Hill` = c(4.7,
13.2, 12.8, 35.2, 4.4), `Terrence Jones` = c(17.1, 12.4,
9.8, NA, 20.8), `Tim Frazier` = c(40.5, 40.2, 18.3, 44.1,
7.2)), .Names = c("Date", "Alexis Ajinca", "Anthony Davis",
"Buddy Hield", "Cheick Diallo", "Dante Cunningham", "E'Twaun Moore",
"Lance Stephenson", "Langston Galloway", "Omer Asik", "Solomon Hill",
"Terrence Jones", "Tim Frazier"), row.names = c(NA, -5L), class = "data.frame")
ggc <- ggcorr(test[,-1], method = c("pairwise","pearson"),
hjust = .85, size = 3,
layout.exp=2)
ggc
Thank you for any and all help and please, let me know if you have any questions or need any clarification!
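A couple of approaches:
First, you can edit the object returned by ggcorr():
g = ggplot_build(ggc)                                   # build the plot to get at its layer data
g$data[[2]]$label = gsub("_", "\n", g$data[[2]]$label)  # layer 2 holds the variable-name labels
grid::grid.draw(ggplot_gtable(g))                       # redraw from the modified build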
Alternatively, you can create a new data frame and add the labels manually using geom_text. This probably gives a bit more control over the text justification and placement.
# I don't see how to suppress the labels, so just set their size to zero
ggc <- ggcorr(test[,-1], method = c("pairwise","pearson"),
              hjust = .85,
              size = 0, # set this to zero
              layout.exp = 2)
# Create labels (spaces replaced by line breaks) and plot them on the diagonal
dat <- data.frame(x = seq_along(test[-1]), y = seq_along(test[-1]),
                  lbs = gsub(" ", "\n", names(test[-1])))
ggc + geom_text(data = dat, aes(x, y, label = lbs), nudge_x = 2, hjust = 1)
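For names with more than two words, wrapping by width may read better than breaking at every space. A sketch using base R's strwrap(); wrap_labels() is a hypothetical helper (not part of GGally), and it assumes that "_" and "." in the labels are separators, which would also split names that legitimately contain a dot.

```r
# Hypothetical helper (not part of GGally): turn "_" or "." separators into
# spaces, then wrap each label at roughly `width` characters with strwrap()
wrap_labels <- function(x, width = 10) {
  x <- gsub("[_.]", " ", x)
  wrapped <- strwrap(x, width = width, simplify = FALSE)  # list of line vectors
  vapply(wrapped, paste, character(1), collapse = "\n")
}

wrap_labels(c("Anthony_Davis", "Lance.Stephenson", "Omer"))
# "Anthony\nDavis"  "Lance\nStephenson"  "Omer"
```

In the first approach, it could replace the gsub() call, e.g. g$data[[2]]$label = wrap_labels(g$data[[2]]$label).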
