I am working with the R programming language.
I have the following data that contains 10 measurements for a set of people (and includes NA's):
my_data <- structure(list(id = 1:20, weight_time_1 = c(NA, NA, NA, NA, 99.4800556826432,
NA, NA, NA, NA, 92.7723003148797, NA, 102.130637355002, NA, NA,
96.4306038435274, 117.519167258681, NA, NA, NA, NA), weight_time_2 = c(NA,
NA, NA, 100.096037354425, 98.5573457978251, NA, 99.2565971422039,
NA, NA, 78.2178327860056, NA, 93.1290042175411, NA, 105.999332486733,
102.324404273109, 106.249390147503, NA, NA, NA, NA), weight_time_3 = c(NA,
NA, NA, 109.653641754063, 108.67612106402, 89.245436013972, 76.0388764710753,
NA, 121.434141230992, 93.5040344542738, NA, 106.261290772666,
NA, 107.27650959864, 99.9614325607138, 106.822602397336, NA,
NA, NA, NA), weight_time_4 = c(NA, NA, NA, 83.4057073444694,
100.0475658129, 101.181524203485, 109.854456857605, NA, 109.39925298469,
100.127289780991, NA, 92.3537705948637, NA, 97.484431731186,
93.1880798156964, 98.2949614096827, NA, NA, NA, NA), weight_time_5 = c(85.9705471396862,
NA, 101.810197281424, 125.878759238011, 90.5377892614597, 100.977860860978,
105.206211167738, 105.925495763829, 95.0038093722839, 91.7697262180746,
112.751436397665, 89.3570085447357, NA, 105.334871042565, 107.101908594036,
121.466895783898, NA, NA, NA, NA), weight_time_6 = c(91.3939219450539,
NA, 102.295063295212, 112.648885364836, 92.858993235862, 84.9768973349691,
106.268407819189, 91.2142736262532, 94.5206092516322, 106.102317632812,
106.800383289515, 96.8243417950671, 112.526148273022, 96.0060934996047,
108.127666530717, 100.80395850135, NA, NA, NA, 97.1665601525516
), weight_time_7 = c(78.1538622765699, NA, 98.3267913598314,
97.694334342899, 88.2573884491152, 94.0391463446378, 79.107127345042,
98.6717305266368, 87.4584802875, 91.0212929680695, 115.449312672637,
108.505222479846, 87.7272780928247, 98.2950591116351, 108.64305435295,
100.971252881422, NA, NA, NA, 89.7627845887151), weight_time_8 = c(88.9847618154833,
NA, 75.9578295182105, 123.066624773516, 103.899907028919, 86.3922722708996,
101.056470605625, 93.9274704914096, 116.225266396545, 119.261812971557,
120.470004522712, 95.1540411812936, 103.625912955529, 119.112226243372,
97.2548085647629, 93.4809837458108, NA, 107.551887082473, 103.626395948971,
92.497583506856), weight_time_9 = c(106.965867937613, NA, 111.885847224286,
95.4347167550049, 89.629232996398, 99.279432759281, 111.111236025807,
106.187409603617, 95.0731389891664, 102.40946902701, 98.7215766413794,
108.440350789909, 111.841323303161, 98.6631240530225, 108.178201457868,
102.289607726024, 108.679229829576, 93.9424920702776, 102.660681952024,
90.7932196785015), weight_time_10 = c(98.5452360068031, 100.417384196154,
94.4492002344181, 100.711643341273, 119.565187908911, 103.54455492062,
74.0330331656656, 103.431332886172, 112.355083085616, 100.345180859457,
97.3988962137931, 96.9401740645521, 116.008033135044, 106.302406861972,
96.7028852299552, 111.699115637383, 95.3519501717543, 89.9061904342833,
107.36861168758, 102.797106848808)), row.names = c(NA, 20L), class = "data.frame")
I would like to make a "longitudinal" graph for this data. I tried to do this two different ways:
Option 1: https://cran.r-project.org/web/packages/lcsm/vignettes/v0-longitudinal-plots.html
library(lcsm)
library(ggplot2)
library(tidyr)
library(dplyr)
library(stringr)
x_var_list <- c("weight_time_1", "weight_time_2", "weight_time_3", "weight_time_4", "weight_time_5", "weight_time_6", "weight_time_7", "weight_time_8", "weight_time_9", "weight_time_10")
plot_trajectories(data = my_data,
id_var = "id",
var_list = x_var_list,
xlab = "Time", ylab = "Value",
connect_missing = FALSE,
random_sample_frac = 1,
title_n = TRUE)
This seems to have worked, but it produces warnings stating that the NAs were not plotted:
Warning messages:
1: Removed 64 row(s) containing missing values (geom_path).
2: Removed 64 rows containing missing values (geom_point).
Option 2: https://www.r-bloggers.com/2015/08/managing-longitudinal-data-conversion-between-the-wide-and-the-long/#google_vignette
dat <- reshape(my_data, varying= c("weight_time_1", "weight_time_2", "weight_time_3", "weight_time_4", "weight_time_5", "weight_time_6", "weight_time_7", "weight_time_8", "weight_time_9", "weight_time_10"), idvar="id", direction="long")
library(ggplot2)
ggplot(dat, aes(x=time, y=measure, colour=tx, group=id)), geom_line(alpha=.5)
But this returns the following error: Error in guess(varying) :
failed to guess time-varying variables from their names
Can someone please show me how to fix this and plot this data? I would like the NA's to appear on the graph.
Thanks!
NAs cannot be drawn as data points. They can, however, be made visible indirectly by plotting each subject in its own panel, so that the gaps are obvious.
library(tidyverse)
my_data <- as_tibble(my_data)
my_data <- my_data %>%
pivot_longer(-id, names_to = "tp", values_to = "measure") %>%
mutate(
tp = parse_number(tp),
tp = factor(tp),
id = factor(id)
)
my_data %>%
ggplot(aes(tp, measure, col = id, group = id)) +
geom_point() +
geom_smooth(method = "loess", se = FALSE) +
theme(legend.position = "none") +
facet_wrap(~id)
NAs are, by definition, missing data and therefore cannot be drawn. Strictly speaking, even the lines between the points are wrong, because the values between the measurements are unknown. The technically correct display shows the existing data as points only. Within the observed range, a smoothing line can be used to connect the points as well as possible; anything outside the observed range requires more elaborate modelling.
In short, NAs should not appear in the graph.
Your option 1 is probably fine. It is just warning you that it is impossible to plot an NA. Here is the ggplot2 version; you need to make the wide data long first.
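library(dplyr)
library(tidyr)
library(ggplot2)
my_data_long <- my_data %>%
  # strip the common prefix so that time becomes the integer 1..10 and sorts numerically
  pivot_longer(-id, names_to = "time", values_to = "Value",
               names_prefix = "weight_time_",
               names_transform = list(time = as.integer)) %>%
  drop_na() %>%
  mutate(id = factor(id))
ggplot(my_data_long, aes(x = time, y = Value, color = id)) +
  geom_point() +
  geom_line(aes(group = id)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = -90))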
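The error from Option 2 in the question comes from base reshape(), not from ggplot. When only varying is given, reshape() tries to guess the value and time names by splitting the column names at sep, which defaults to ".", and names such as weight_time_1 contain no dot. A minimal sketch that avoids the guessing by naming everything explicitly (the toy values and the column names "measure" and "time" are assumptions chosen for illustration):

```r
# Toy wide data using the question's naming scheme (values are made up).
wide <- data.frame(
  id = 1:3,
  weight_time_1 = c(NA, 99.5, 92.8),
  weight_time_2 = c(100.1, 98.6, NA),
  weight_time_3 = c(109.7, NA, 93.5)
)

long <- reshape(
  wide,
  direction = "long",
  idvar     = "id",
  varying   = grep("^weight_time_", names(wide), value = TRUE),
  v.names   = "measure",  # name of the stacked value column
  timevar   = "time",     # name of the new time column
  times     = 1:3         # one entry per wide column
)
long <- long[order(long$id, long$time), ]
```

With the full data the same call would use times = 1:10. The result can then be passed to ggplot with aes(x = time, y = measure, group = id).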
You could also use dygraphs, which is quite straightforward for this use case and handles NAs nicely:
library(dygraphs)
dygraph(my_data) %>% dyLegend(show = "always")
Further formatting options can be found here
Update: apparently the OP wanted to plot the missing values explicitly. See further below for one approach.
There are plenty of options for dealing with NAs when plotting with ggplot2:
Just leave them and accept the warning (there is really nothing wrong with that).
Drop the NAs before plotting; see JeffV's answer using tidyr::drop_na. There are many other ways, see this ultra-popular thread.
In your case, you can drop the NAs while pivoting: use tidyr::pivot_longer(..., values_drop_na = TRUE).
Add na.rm = TRUE to the geom of interest:
library(ggplot2)
library(dplyr)
library(tidyr)
my_data %>%
pivot_longer(cols = starts_with("weight")) %>%
# your x is essentially continuous. Thus make it REALLY continuous!
# your id is categorical, so make it that
# anchor on "_" and the end of the name so that "weight_time_10" gives 10, not 0
mutate(time = as.integer(gsub(".*_([0-9]+)$", "\\1", name)),
id = as.character(id)) %>%
ggplot(aes(x=time, y=value, colour=id, group=id)) +
geom_line(alpha=.5, na.rm = TRUE)
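One detail is worth checking when the time index is extracted from the column names with a regular expression. A greedy ".*" placed directly in front of a digit capture group takes as many characters as it can, so a pattern like ".*([0-9]+)" leaves only the final digit for the group. The following base R comparison is a small sketch of that effect and of two alternatives:

```r
nms <- c("weight_time_1", "weight_time_9", "weight_time_10")

# Greedy ".*" swallows every digit it can, so only the last digit is captured.
greedy <- as.integer(gsub(".*([0-9]+)", "\\1", nms))

# Anchoring on the underscore and the end of the string keeps all digits.
anchored <- as.integer(gsub(".*_([0-9]+)$", "\\1", nms))

# Alternative: delete everything that is not a digit.
digits_only <- as.integer(gsub("[^0-9]", "", nms))
```

Here greedy maps weight_time_10 to 0, which would put the tenth measurement at the left edge of the plot, while anchored and digits_only both give 10.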
Visualising NAs in a line plot
How to visualise NAs is a whole new problem. The {naniar} package helps with visualising NAs, but to my knowledge not within a line plot. One way to do that would be to first interpolate or impute the NAs from the data that are present. This is not the place to discuss the best way of doing so, but here is a quick way using the zoo package.
my_data_long <- my_data %>%
pivot_longer(cols = starts_with("weight")) %>%
# anchor on "_" and the end of the name so that "weight_time_10" gives 10, not 0
mutate(time = as.integer(gsub(".*_([0-9]+)$", "\\1", name)),
id = factor(id, levels = 1:max(id))) %>%
group_by(id) %>%
## interpolate NA's with the zoo package
mutate(na_ip = zoo::na.approx(value, time, na.rm = FALSE))
## store your NA's in a different frame
my_nas <- my_data_long %>% filter(is.na(value))
ggplot(my_data_long, aes(x=time, y=value, colour=id, group=id)) +
## e.g., use the interpolated values for dashed lines
geom_line(data = my_nas, aes(y = na_ip), lty = 2) +
geom_line(alpha=.5, na.rm = TRUE) +
## because this is otherwise a complete visual disaster, I'm untangling with facet
facet_wrap(~id) +
theme(legend.position = "none")
#> Warning: Removed 9 row(s) containing missing values (geom_path).
#> geom_path: Each group consists of only one observation. Do you need to adjust
#> the group aesthetic?
#> geom_path: Each group consists of only one observation. Do you need to adjust
#> the group aesthetic?
#> geom_path: Each group consists of only one observation. Do you need to adjust
#> the group aesthetic?
#> geom_path: Each group consists of only one observation. Do you need to adjust
#> the group aesthetic?
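To make the interpolation step more concrete, here is a sketch in base R that uses stats::approx as a stand-in for zoo::na.approx. It assumes plain linear interpolation between neighbouring observed values and leaves leading NAs untouched; the toy vectors are made up for illustration.

```r
time  <- 1:6
value <- c(NA, 10, NA, NA, 16, 18)

observed <- !is.na(value)

# rule = 1 (the default) returns NA outside the observed range,
# so the leading NA is not extrapolated.
filled <- approx(x = time[observed], y = value[observed], xout = time)$y

# Points that exist only through interpolation (candidates for a dashed line).
is_interpolated <- is.na(value) & !is.na(filled)
```

The interpolated values sit on the straight line between the neighbouring measurements, which is why drawing them with a different line type only marks where data are missing; it does not add information.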
I am trying to plot a Milestone Trend Analysis with R. At some point, a Milestone is reached and will not be reported anymore. That's also when the line in the graph should stop. So I tried to implement this with ggplot and reshape2 to melt the dataset in a long format:
#edit (Data model by code, sorry)
Datamodel:
MTA_data<- data.frame(ReportingDates = c("01.01.2021", "01.02.2021",
"01.03.2021", "01.04.2021", "01.05.2021", "01.06.2021", "01.07.2021",
"01.08.2021", "01.09.2021", "01.10.2021", "01.11.2021", "01.12.2021"),
Milestone1 = c("01.02.2021", "01.03.2021", NA, NA, NA, NA,NA, NA, NA, NA, NA, NA),
Milestone2 = c("01.06.2021", "01.06.2021","01.06.2021", "01.06.2021", "01.07.2021",
"01.07.2021", NA, NA,NA, NA, NA, NA),
Milestone3 = c("01.09.2021", "01.09.2021", "01.09.2021", "01.09.2021", "01.09.2021",
"01.09.2021", "01.09.2021", "01.11.2021", "01.11.2021", "01.11.2021",
"01.11.2021", NA),
MilestoneDates = c("01.01.2021","01.02.2021", "01.03.2021", "01.04.2021", "01.05.2021", "01.06.2021",
"01.07.2021", "01.08.2021", "01.09.2021", "01.10.2021", "01.11.2021","01.12.2021"))
dput(MTA_data)
#code for melt and plot:
MTA_data.long <-melt(MTA_data,id.vars = "ReportingDates")
x <- ggplot(MTA_data.long, aes(ReportingDates,value,color=variable))+
geom_line(data=MTA_data.long, aes(x=ReportingDates, y=value, group=variable))+
geom_point()
x
result:
Since some milestones no longer have a planning date at some reporting dates (the milestone has been reached), the value is NA or empty (it doesn't matter which, the issue remains the same). Is there a way to make ggplot ignore the NA/empty values so that the lines stop at these points?
expected result:
Thanks for your help!
To not melt the NA values, set na.rm = TRUE inside the melt() function.
MTA_data.long <-melt(MTA_data,id.vars = "ReportingDates", na.rm = TRUE)
x <- ggplot(MTA_data.long, aes(ReportingDates, value, color = variable)) +
geom_line(aes(group = variable)) +
geom_point()
As a side note, you don't need to re-specify the data and aesthetics set in ggplot(). They will be inherited by subsequent layers.
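What na.rm = TRUE amounts to can be sketched in base R without reshape2: build the long format by hand and drop the rows whose value is NA. The three rows below are taken from the start of the question's data; the manual stacking is only an illustration, not a replacement for melt().

```r
wide <- data.frame(
  ReportingDates = c("01.01.2021", "01.02.2021", "01.03.2021"),
  Milestone1     = c("01.02.2021", "01.03.2021", NA),
  Milestone2     = c("01.06.2021", "01.06.2021", "01.06.2021")
)

# Long format: one row per reporting date and milestone.
long <- data.frame(
  ReportingDates = rep(wide$ReportingDates, times = 2),
  variable       = rep(c("Milestone1", "Milestone2"), each = nrow(wide)),
  value          = c(wide$Milestone1, wide$Milestone2)
)

# Equivalent of na.rm = TRUE: rows without a planning date disappear,
# so each milestone's line ends at its last reported date.
long_no_na <- long[!is.na(long$value), ]
```

After this step Milestone1 has two rows and Milestone2 has three, so the line for Milestone1 simply stops one reporting date earlier.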
I am trying to filter every column of my dataframe with a certain threshold (in this case >= 1.2) using dplyr's filter function. It worked nicely so far, but suddenly I get this error message when I try to run the code:
Error in env_bind_lazy(private$bindings, !!!set_names(promises, names_bindings)) :
attempt to use zero-length variable name
This is part of my dataframe (it has 108 columns, some rows contain NA):
Mean 1    Mean 2    Mean 3
1.1874    1.0944    1.2376
1.2258    1.0665    1.2365
1.0953    1.1420    1.2479
1.2234    1.0949    1.0608
NA        NA        1.146
This is my code:
Heights_filtered = list()
for (i in 1:length(allHeights)){
filtered = filter(allHeights, allHeights[,i] >=1.2, .preserve = TRUE)
filterlist = cbind.data.frame(filtered[,i])
colnames(filterlist) = colnames(allHeights[i])
Heights_filtered[[i]] = cbind.data.frame(filterlist)
names(Heights_filtered) = colnames(allHeights[i])}
Do you have an idea why this happens now?
Thanks for your help!
These are the first rows of my dataframe
> dput(head(allHeights[1:10], 10))
structure(list(Mean1 = c(1.18743006611931, 1.22582285838843,
1.09595291724188, 1.22341059362058, 1.32431882583739, 1.31219937513623,
1.28004068880331, 1.29884472862021, 1.36733270362566, 1.38170457022452
), Mean2 = c(1.09447069039104, 1.09233667417252, 1.08767127319823,
1.06656658866469, 1.14203717603426, 1.09491221098798, 1.03171589621323,
1.15308990831089, 1.17585765375955, 1.11962264706315), Mean3 = c(1.23761700966768,
1.07486913672867, 1.2605330014152, 1.21512728264762, 1.23659397432181,
1.17488789237668, 1.28191444014391, 1.23137649405787, 1.22165765827209,
1.17481969002029), Mean4 = c(1.0608309164187, 1.06201740178538,
1.07512524012204, 1.07230027496328, 1.07823270179668, 1.08137782967343,
1.08704659309202, 1.09783795999849, 1.05538815021281, 1.04118799201477
), Mean5 = c(1.3872325431161, 1.34236438736957, 1.11657498580741,
1.19758040835503, 1.19718888867138, 1.12759626490222, 1.13074799835562,
1.19262768435683, 1.16498639469099, 1.2131433157802), Mean6 = c(1.18440664423239,
1.20342967777624, 1.21238802071329, 1.12420289186988, 1.22123880207133,
1.19712964243458, 1.20605725349191, 1.23989305305859, 1.21075923108837,
1.24834431998033), Mean7 = c(1.13543425248546, 1.12286625398612,
1.09469483808257, 1.10461963472656, 1.11445916679456, 1.08465067103221,
1.12117801538173, 1.08284306202145, 1.11304377483331, 1.13541719957027
), Mean8 = c(1.24793883159642, 1.19395390601616, 1.18592691355337,
1.19717830807325, 1.191232891622, 1.19336888792142, 1.17576392479116,
1.13564256754918, 1.11424178933907, 1.18585888819352), Mean9 = c(1.20505670697375,
1.18604713515832, 1.19024318309784, 1.21607636002896, 1.30812129661903,
1.24325012735609, 1.19658417567097, 1.27798482451672, 1.04137061962088,
1.30975681690216), Mean10 = c(1.06327665140615, 1.13939757285081,
1.12462757067074, 1.06967153549887, 1.08647627352663, 1.16336022091418,
1.15385873119686, 1.1672116851973, 1.22303975001817, 1.13392922026016
)), row.names = c(NA, 10L), class = "data.frame")
And here the last part of the dataframe which gives me the error:
dput(head(allHeights[100:108], 10))
structure(list(c(1.3975238170743, 1.42479618398277, 1.36302374440084,
1.33075672890157, 1.30214981303101, 1.29526565452359, 1.31860044132609,
1.23876534400972, 1.15907559361002, 1.26664552529697), c(2.22279564798051,
2.15443577725511, 2.36887256975583, 2.04737812822552, 2.21183099544832,
2.08881706966277, NA, NA, NA, NA), c(1.03731717809005, 1.07517206767995,
1.10263120160597, 1.17071264697448, 1.12660596501291, 1.07340120447376,
1.05339833667909, 1.02742328649269, 1.04743332377402, 1.09359764840837
), c(1.75325898322414, 1.80777043843246, 1.26273660420002, 1.59312822030592,
1.11652967053664, 1.62459472912435, 1.28563356786353, 1.95060067533935,
NA, NA), c(1.34261413268355, 1.30548480529631, 1.32490460208726,
1.05392855500896, 1.36887499425314, 1.12776424072456, 1.24322559882304,
1.24394280722725, 1.51098340306193, 1.35122063353409), c(1.30861179458687,
1.30802444638463, 1.32818477656957, 1.2115882212874, 1.27803793951901,
1.34488451464402, 1.2494642431939, 1.14564647987936, 1.13223271688229,
1.21111199301532), c(1.19828142850047, 1.2299458600308, 1.18492028013709,
1.24207768340535, 1.14210500173844, 1.14374410172354, 1.17129836586698,
1.20543386479909, 1.17938210897531, 1.1315377738042), c(1.06870742201506,
1.19744233297478, 1.14709573323772, 1.21291980399187, 1.19923509023545,
1.1095972272021, 1.1777817616828, 1.13757918011235, 1.18910601171268,
1.18139715549181), c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Mean100",
"Mean101", "Mean102", "Mean103", "Mean104", "Mean105", "Mean106",
"Mean107", NA), row.names = c(NA, 10L), class = "data.frame")
This solution might work and be more in line with a dplyr approach. The code below uses mtcars as an example. It keeps the rows in which all values are greater than or equal to one.
library(dplyr)
# if_all() (dplyr >= 1.0.4) replaces the deprecated use of across() inside filter()
mtcars %>%
  filter(
    if_all(
      .cols = everything(),
      .fns = ~ .x >= 1
    )
  )
Edit: If you also need to handle missing data, this should help. You can define the function you want to filter with and then apply it to every column. Here is another example with mtcars.
myrowfun <- function(x){is.na(x)| x >=2}
mtcars <- mutate(mtcars, mpg = NA)
mtcars$mpg[1:7] <- rep(1,7)
# apply myrowfun to each column and keep rows where it is TRUE everywhere
mtcars %>%
  filter(
    if_all(
      .cols = everything(),
      .fns = myrowfun
    )
  )
Edit: adding an example of how to remove columns that are entirely missing
mtcars <- mutate(mtcars, newcol = NA)
# shows which columns are not entirely missing
sapply(mtcars, function(x) !all(is.na(x)))
# subset on that
mtcars2 <- mtcars[, sapply(mtcars, function(x) !all(is.na(x)))]
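As for the error message itself, a likely cause (an inference from the dput output in the question, not something confirmed there) is the last column of allHeights: its name is NA and it contains only NA, and dplyr cannot bind a column without a usable name. A base R sketch that repairs the names, drops empty columns, and then filters each column separately could look like this; the toy data frame and the "unnamed_" prefix are assumptions for illustration.

```r
# Toy stand-in for allHeights: the last column is entirely NA and has no name,
# mirroring the dput() output in the question (values shortened).
allHeights <- structure(
  list(c(1.19, 1.23, 1.32), c(2.22, NA, 1.10), c(NA, NA, NA)),
  .Names = c("Mean1", "Mean2", NA),
  row.names = c(NA, -3L),
  class = "data.frame"
)

# 1. Give every column a usable name.
bad <- is.na(names(allHeights)) | names(allHeights) == ""
names(allHeights)[bad] <- paste0("unnamed_", which(bad))

# 2. Drop columns that contain no data at all.
all_na <- vapply(allHeights, function(col) all(is.na(col)), logical(1))
allHeights <- allHeights[!all_na]

# 3. Filter each column on its own; the results differ in length, so keep a list.
Heights_filtered <- lapply(allHeights, function(col) col[!is.na(col) & col >= 1.2])
```

Because each column keeps a different number of values, a named list is a more natural container here than a data frame.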
First of all, I apologise if my description is confusing; my English is not very good. I will explain my question as well as I can. If anything is unclear, please add a comment and I will explain in more detail.
The data set used to draw the plot looks like this (the image shows only part of it):
I have put the output of dput below.
This is movement data captured by a linear accelerator, with timestamps. I use ggplot2 to draw a line plot for my report. Here is my code:
......
#Convert timestamp format
time <- gsub(":", ".", x)
time <- strptime(time, format = "%H.%M.%OS")
time <- as.POSIXct(time)
df["time"] <- time
# Person B Plot
p <- ggplot(df, aes(x = time)) +
scale_x_datetime(name = "Time", labels = date_format("%H:%M:%OS")) +
ylab("PCA") +
geom_hline(aes(yintercept = 0)) +
scale_colour_manual("", values = c("PCA_A" = "hotpink3", "PCA_B" = "steelblue3", "Correlation" = "chocolate")) +
geom_line(aes(y = PCA_b, group = 1, colour = "PCA_B"), size = 0) +
# theme(text = element_text(size = 23), plot.title = element_text(hjust = 0.5)) +
ggtitle("PCA_Two")
Because the timestamps are stored in the CSV file as strings, I have to convert them to POSIXct so that I can use scale_x_datetime to show the time on the x axis. The resulting plot looks strange.
There is a break between two points. If I remove the first five lines and the scale_x_datetime call from the code shown above, the plot is fine and the curve is smooth, but the x axis no longer shows the time correctly.
Why does this happen, and how can I fix it?
---------- update 20/4/2020
I used dput(df[20:50,]) to output part of my data set; I hope that is helpful. Thanks to #chemdork123 for the help.
Here is a short description of the data structure. The data frame used for the plots has four columns: time, PCA_a, PCA_b and cor. I draw three line plots, all with time (the timestamp) on the x axis. In this post I only show the "time - PCA_b" plot. In fact, all three plots have the same issue, the break, and the break is at the same location in each. (The NAs in the "cor" column are not a bug; they are there on purpose.)
structure(list(time = structure(c(1587503540.556, 1587503540.577,
1587503540.615, 1587503540.637, 1587503540.675, 1587503540.696,
1587503540.716, 1587503540.756, 1587503540.776, 1587503540.817,
1587503540.837, 1587503540.876, 1587503540.893, 1587503540.915,
1587503540.937, 1587503540.976, 1587503540.997, 1587503541.018,
1587503541.059, 1587503541.078, 1587503541.117, 1587503541.138,
1587503541.18, 1587503541.201, 1587503541.24, 1587503541.26,
1587503541.3, 1587503541.339, 1587503541.358, 1587503541.4, 1587503541.423
), class = c("POSIXct", "POSIXt"), tzone = ""), PCA_a = c(1.56737319252217,
2.04606254627585, 2.49366222484302, 2.88101522283612, 3.18379411504211,
3.38503090762478, 3.47436865063648, 3.44747654856326, 3.30707775976109,
3.06371801441373, 2.73437161733756, 2.33935677190782, 1.89968708587307,
1.43586301558354, 0.967277030171067, 0.511214148600076, 0.0816220889456876,
-0.311381715806983, -0.661355048674678, -0.965683235694069, -1.22624198074107,
-1.44997061419577, -1.64740413737597, -1.82782646420492, -1.99421995781177,
-2.14199256386341, -2.26073408401317, -2.33585157388011, -2.34937651266747,
-2.28185734041769, -2.11603996134387), PCA_b = c(0.428589019048672,
0.437715207869297, 0.44415836273225, 0.447676595545035, 0.448336071890988,
0.446396459498192, 0.442205853553038, 0.43616876635858, 0.42877854629294,
0.420603253124693, 0.412148862183822, 0.403676755189904, 0.395124979959946,
0.386241966203463, 0.376849622459395, 0.367015680942488, 0.35712348581213,
0.347977244142877, 0.340825041944267, 0.337103574812562, 0.338073413214583,
0.344591707232845, 0.35695103029739, 0.374713701538921, 0.396660690638421,
0.420888192551911, 0.445042523797771, 0.466693774961235, 0.483678597255532,
0.494312865435414, 0.497599592736315), cor = c(0.787242026266416,
NA, NA, NA, NA, NA, NA, NA, NA, 0.297936210131332, NA, NA, NA,
NA, NA, NA, NA, NA, -0.074108818011257, NA, NA, NA, NA, NA, NA,
NA, NA, -0.437523452157598, NA, NA, NA)), row.names = 20:50, class = "data.frame")
---------- update 21/4/2020
I found something very interesting. If the data set has fewer than 277 rows, the plot is perfect; otherwise point no. 277 shifts. I made a gist here with a dput of 277 rows. Can anyone test it? My plot looks like this:
Thanks to #chemdork123 for the help. I found that all the data within one second around the break point are missing from the raw data set. This is probably a failure of the research instrument.
The answer is so simple that it makes me look like a fool XD.
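A gap like this can be found programmatically before plotting by looking at the differences between consecutive timestamps. The sketch below uses made-up timestamps and an assumed threshold of five times the median sampling interval; both would need adjusting for real data.

```r
# Timestamps in seconds; the jump between the 3rd and 4th value mimics a dropout.
time <- as.POSIXct(c(0.00, 0.02, 0.04, 1.10, 1.12),
                   origin = "1970-01-01", tz = "UTC")

# Interval between consecutive samples, in seconds.
step <- as.numeric(diff(time), units = "secs")

# Assumption: anything longer than 5x the median interval counts as a gap.
gap_after <- which(step > 5 * median(step))

gaps <- data.frame(
  from    = time[gap_after],
  to      = time[gap_after + 1],
  seconds = step[gap_after]
)
```

With a continuous datetime axis, geom_line draws a straight segment across such a gap, which is what shows up as the "break"; without scale_x_datetime the x values are treated differently, so the gap is hidden rather than fixed.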
I am a little bit new to R plot_ly and have an issue with a graph.
I have constructed a dataframe with values for different dates depending on the column. The dataframe has dates in the first column and then the values for those dates, but not necessarily for all dates. Hence I am using na.locf to replace each NA between two values with the preceding value (I hope I am clear enough). The remaining NAs are the ones before the first value of each column, which I then replace with 0.
Then I am trying to plot my df with the dates on the x axis and the evolution of my time series on the y axis.
My issue is that there is somehow a difference between the 0 values in one of my graphs (see screenshot). The orange line begins at 0 and then has some values. If I hover over the graph before the "big drop", the value is shown as 0; after it, the value is shown as 0.00. The df has no values for this time series at this point. Also, the graphs do not seem to share the same y axis even though they should, and I do not understand why.
My code to create the graph is:
if (dim(df1)[1] != 0){
df1 <- na.locf(df1)
df1[is.na(df1)] <- 0.00
all_names <- colnames(df1)[-1]
for (i in all_names){
if (i==all_names[1]){
p <- plot_ly(x = df1$date, y= df1[,i] , name = i, type = 'scatter', mode = 'lines')
}else{
p <- p %>% add_trace(y = df1[,i], name = i, type = 'scatter', mode = 'lines')
}
}
output$info_graph <- renderPlotly({
p
})
output$info_output <- renderUI({
plotlyOutput("info_graph")
})
}else{
output$info_output <- renderUI({
})
}
EDIT: I have added a screenshot of my data where the gap is (orange line is the third column (before the 2008-09-12 I only have NA), blue one is the second column:
EDIT2: I just reproduced with 26 dates. You can see the screenshot:
dput gives:
structure(list(date = structure(c(14062, 14069, 14076, 14083,
14090, 14097, 14104, 14111, 14118, 14125, 14132, 14134, 14139,
14141, 14146, 14148, 14153, 14155, 14160, 14162, 14167, 14169,
14174, 14176, 14181, 14183), class = "Date"), col1 = c(3036258.57195313,
3023427.6675, 2971520.82675781, 3093997.64199219, 3042965.63564453,
3119076.22796875, 3154652.82667969, 3120534.28529297, 3101871.15154297,
3226680.85849609, 3185563.64195312, NA, 3077375.78849609, NA,
3039466.29806641, NA, 2956357.03058594, NA, 2701488.6103125,
NA, 2715194.34916016, NA, 2687199.64853516, NA, 2733857.48291016,
NA), col2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0, NA, 0, NA, 0, NA, 0, NA, 0, NA, 0, NA, 0, NA, 0)), .Names = c("date",
"col1", "col2"), row.names = 145:170, class = "data.frame")
The problem with your code is that you are plotting character variables, which are converted to categorical variables in the plot call. The culprit is the misuse of the na.locf function.
The first column of your data frame is a character column; when you pass the whole data frame to na.locf, it converts everything to character. Here is a fix:
library(zoo)
library(plotly)
convert the date column to POSIXct
df1$date <- as.POSIXct(df1$date)
use na.locf only on numerical columns
# na.rm = FALSE keeps the leading NAs, so the number of rows is unchanged
df1[,2:3] <- na.locf(df1[,2:3], na.rm = FALSE)
df1[is.na(df1)] <- 0.00
# names of the value columns (everything except the date column)
all_names <- colnames(df1)[-1]
for (i in all_names){
if (i==all_names[1]){
p <- plot_ly(x = df1$date, y= df1[,i] , name = i, type = 'scatter', mode = 'lines')
}else{
p <- p %>% add_trace(y = df1[,i], name = i, type = 'scatter', mode = 'lines')
}
}
p
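To see why restricting the fill to the numeric columns matters, here is a sketch with a hand-rolled last-observation-carried-forward function in base R. It is a stand-in for zoo::na.locf, written only for illustration, and the toy values are shortened from the question's data.

```r
# Hand-rolled last-observation-carried-forward; leading NAs stay NA.
locf <- function(x) {
  # index of the latest non-NA value seen so far, 0 if there is none yet
  last_seen <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  c(NA, x)[last_seen + 1]
}

df1 <- data.frame(
  date = as.Date(c("2008-07-02", "2008-07-09", "2008-07-16", "2008-07-23")),
  col1 = c(3036258.6, NA, 2971520.8, NA),
  col2 = c(NA, NA, 0, NA)
)

# Fill only the value columns, then replace the remaining leading NAs with 0.
value_cols <- setdiff(names(df1), "date")
df1[value_cols] <- lapply(df1[value_cols], function(x) {
  x <- locf(x)
  x[is.na(x)] <- 0
  x
})

# The date column keeps its class and the value columns stay numeric.
col_classes <- sapply(df1, class)
```

Checking col_classes before plotting is a quick way to confirm that nothing has been turned into character, which is what produces a categorical y axis in plotly.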
How can I put two columns in one heatmap?
Lets say I have the following data
data<- structure(list(names = structure(c(5L, 1L, 10L, 2L, 6L, 4L, 9L,
7L, 11L, 3L, 8L), .Label = c("Bin", "Dari", "Down", "How", "India",
"Karachi", "Left", "middle", "Right", "Trash", "Up"), class = "factor"),
X1Huor = c(1.555555556, 5.2555556, 2.256544, 2.3654225, 1.2665545,
0, 1.889822365, 2.37232101, -1, -1.885618083, 1.128576187
), X2Hour = c(1.36558854, 2.254887, 2.3333333, 0.22255444,
2.256588, 5.66666, -0.377964473, 0.107211253, -1, 0, 0),
X3Hour = c(0, 1.222222222, 5.336666, 1.179323788, 0.832050294,
-0.397359707, 0.185695338, 1.393746295, -1, -2.121320344,
1.523019248), X4Hour = c(3.988620176, 3.544745039, -2.365555,
2.366666, 1.000000225, -0.662266179, -0.557086015, 0.862662186,
0, -1.305459824, 1.929157714), X5Hour = c(2.366666, 2.333365,
4.22222, 0.823333333, 0.980196059, -2.516611478, 2.267786838,
0.32163376, 0, -2.592724864, 0.816496581)), .Names = c("names",
"X1Huor", "X2Hour", "X3Hour", "X4Hour", "X5Hour"), class = "data.frame", row.names = c(NA,
-11L))
This data has 5 columns of values. I want to make a heatmap in which one half of each cell shows the value from the first column and the other half shows the value from the second column.
The same for the third and fourth columns.
The same for the fifth and sixth (there is no sixth, but I can leave it empty).
This is just an example to show what I am looking for. I have searched a lot but could not find anything like this.
The colours range from red to green: if the value is higher than 2 the colour is red, and if the value is lower than -2 the colour is green.
Any thoughts on how to do this?
This is a somewhat hacky solution, but it might work for you, so check this out.
The idea is to utilize geom_polygon to create the triangles and stack them. To do that we first need to generate the triangle coordinates
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
# the following two lines create the triangle coordinates
x = rep(c(1,2,2, 1, 1, 2),nrow(data))
# shift each pair of triangles up by one unit per row of data
y = rep(c(1,1,2, 1, 2, 2),nrow(data)) + rep(seq_len(nrow(data)) - 1, each=6)
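The layout of these coordinates is easier to follow in a small standalone sketch: each heatmap row is a unit square split along its diagonal into two triangles of three vertices each, and the squares are stacked by adding the row offset to y. The column names row and triangle are introduced here only for illustration.

```r
n_rows <- 3  # number of heatmap rows (11 in the question's data)

# Lower-right triangle: (1,1) (2,1) (2,2); upper-left triangle: (1,1) (1,2) (2,2)
tri_x <- c(1, 2, 2,  1, 1, 2)
tri_y <- c(1, 1, 2,  1, 2, 2)

coords <- data.frame(
  row      = rep(seq_len(n_rows), each = 6),
  triangle = rep(rep(c("lower", "upper"), each = 3), times = n_rows),
  x        = rep(tri_x, times = n_rows),
  # shift every square up by (row - 1) so the squares do not overlap
  y        = rep(tri_y, times = n_rows) + rep(seq_len(n_rows) - 1, each = 6)
)
```

Each combination of row and triangle then becomes one polygon group, which is why the answer needs a unique id per triangle rather than per name.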
Now that we have our coordinates we need to generate their ids, which are the names. But because we want each triangle to be unique, we need to create two unique versions of each name:
names <- data %>%
select(names, X1Huor, X2Hour) %>%
gather("key", "value", X1Huor, X2Hour) %>%
arrange(names, key) %>%
mutate(name = str_c(names, key)) %>%
.$name %>%
rep(each = 3)
And now we do the same with the hours:
hour <- data %>%
select(names, X1Huor, X2Hour) %>%
gather("key", "value", X1Huor, X2Hour) %>%
arrange(names, key) %>%
.$value %>%
rep(each = 3)
datapoly <- data.frame(x = x, y = y , hour = hour, names = names)
Since there are no proper labels for the plot in our datapoly df, we need to create one:
name_labels <- data %>%
select(names) %>%
arrange(names) %>%
.$names
The scene is now set for our graph:
ggplot(datapoly, aes(x = x, y = y)) +
geom_polygon(aes(group = names, fill = hour), color = "black") +
scale_fill_continuous(low = "green", high = "red") +
scale_y_continuous(breaks = 1:nrow(data), labels = name_labels) +
theme(axis.text.y = element_text(vjust = -2),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
axis.title = element_blank())
The output looks like this:
Several points to keep in mind: Is this really a plot you want to be creating and using? Is this really useful for your purposes? Perhaps other, more traditional visualization methods are more suitable. Also, I didn't bother doing the same for the other hour columns as these are quite tedious, but the method on how to achieve them should be clear enough (I hope).
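The question also asks for values above 2 to be fully red and values below -2 to be fully green, which the plot above does not enforce. One possible way, sketched here in base R with a few rounded values from the question's data, is to clamp the values to the interval [-2, 2] before mapping them to the fill colour:

```r
hour <- c(1.56, 5.26, -2.59, 0, -1)

# Clamp to [-2, 2]: larger values become 2, smaller values become -2.
hour_clamped <- pmin(pmax(hour, -2), 2)
```

Using the clamped column for fill means the colour scale saturates at both ends. A similar effect should be achievable in ggplot2 by setting limits = c(-2, 2) together with oob = scales::squish on the fill scale, though that variant has not been tried against this data.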