Multiple individual graphs from a unique dataframe - r

I know that some subjects are about similar questions, but even using those I was not able to resolve the issue on my own. Thus, I am sorry if this subject appears as a duplicate but I am a bit stuck.
I have to draw nearly 40 graphs representing body temperature variations across 24 hours (one graph per individual in the study). To do that, I tried to write a loop using the dplyr and ggplot2 packages. You may find below an example of my data. There are numerous missing values, but I don't think they are an issue for the current question.
structure(list(heures = structure(1:13, .Label = c("01:00:00",
"03:00:00", "05:00:00", "07:00:00", "08:00:00", "10:00:00", "12:00:00",
"13:30:00", "15:00:00", "17:00:00", "19:00:00", "21:00:00", "23:00:00"
), class = "factor"), x1= c(36.55, 36.5, 36.44444444,
36.6, 36.86666667, 37.26, 37, NA, NA, 37.3, 37.1, 37, 35.6),
x2 = c(NA, 34.5, 35.4, 36.1, NA, NA, NA, NA, NA,
NA, NA, NA, NA), x3 = c(36.9, 36.4, NA, NA, 36.9,
NA, NA, NA, NA, 37.5, 37.5, 36.9, 37.1), x4 = c(36,
35.8, NA, NA, NA, 37.4, 36.7, 36.3, NA, 37.5, 37, NA, NA)), class = "data.frame", row.names = c(NA,
-13L))
So far, I have written the following code with "indiv" being a dataframe containing the above presented data.
names <- c(colnames(indiv))
graph <- list()
test <- function(df, names) {
for (i in 1:length(df)) {
name <- names[i]
stock <- df %>%
filter(heures, !!name)
graph[[i]] <- ggplot(data=stock, aes(x=heures, y=stock[,2])) +
geom_point() +
labs(x="Hours (HH:MM:SS)",
y="Temperature",
title=colnames(stock[2]))
}
return(graph)
}
It returns an error that seems to indicate the filter function does not work properly:
Warning messages:
1: In Ops.factor(~heures, ~"x1") :
‘&’ not meaningful for factors
I can't figure out what I'm doing wrong here. I also tried a version without the dplyr part of the loop, but it didn't give me the desired output either.
Thank you in advance for your advice.

I came up with this idea: tidy the dataset a little to make it easier to use with ggplot, then split it and store the resulting data frames in a list. Then I use lapply with a custom plotting function instead of a loop.
This is not a very fast approach if you have a lot of data, but I use this trick a lot with small datasets.
This code creates a separate plot for each individual (not facets).
library(tidyverse) # all functions of these packages are not necessary here
df = structure(list(heures = structure(1:13, .Label = c("01:00:00",
"03:00:00", "05:00:00", "07:00:00", "08:00:00", "10:00:00", "12:00:00",
"13:30:00", "15:00:00", "17:00:00", "19:00:00", "21:00:00", "23:00:00"
), class = "factor"), x1= c(36.55, 36.5, 36.44444444,
36.6, 36.86666667, 37.26, 37, NA, NA, 37.3, 37.1, 37, 35.6),
x2 = c(NA, 34.5, 35.4, 36.1, NA, NA, NA, NA, NA,
NA, NA, NA, NA), x3 = c(36.9, 36.4, NA, NA, 36.9,
NA, NA, NA, NA, 37.5, 37.5, 36.9, 37.1), x4 = c(36,
35.8, NA, NA, NA, 37.4, 36.7, 36.3, NA, 37.5, 37, NA, NA)), class = "data.frame", row.names = c(NA,
-13L))
# tidy your data, good practice makes it easier to plot things with ggplot
df = df %>% pivot_longer(2:ncol(df), names_to = "individual", values_to = "temperature")
# I would do it this way:
df_list = split(df, df$individual)
plot_fun = function(df) {
title = unique(df$individual)
ggplot(df, aes(x=heures, y=temperature))+
geom_point() +
labs(title = title)
#### add here things to save your plots, store them somewhere, etc
}
lapply(df_list, FUN = plot_fun)
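For completeness, the loop in the question can also be repaired directly. The warning arises because filter() keeps rows matching logical conditions, so filter(heures, !!name) tries to combine a factor and a string with &. Picking columns is a job for select(). The following is a minimal sketch under that reading; make_plots is a hypothetical helper name and the small indiv data frame is an illustrative stand-in for the real data.

```r
library(dplyr)
library(ggplot2)

# Illustrative stand-in for the question's `indiv` data frame
indiv <- data.frame(
  heures = factor(c("01:00:00", "03:00:00", "05:00:00")),
  x1 = c(36.5, 36.6, NA),
  x2 = c(NA, 34.5, 35.4)
)

# Hypothetical helper: one plot per column other than the time column
make_plots <- function(df, time_col = "heures") {
  ids <- setdiff(names(df), time_col)
  plots <- lapply(ids, function(id) {
    stock <- df %>%
      select(all_of(c(time_col, id))) %>%  # select() picks columns
      filter(!is.na(.data[[id]]))          # filter() drops rows with no reading
    ggplot(stock, aes(x = .data[[time_col]], y = .data[[id]])) +
      geom_point() +
      labs(x = "Hours (HH:MM:SS)", y = "Temperature", title = id)
  })
  setNames(plots, ids)
}

plots <- make_plots(indiv)
```

Each element of plots can then be printed or saved by name, for example plots$x1.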

Using toy data, since your data frame is incomplete:
df <- tibble(
X=rep(1:10, times=2),
Y=c(1:10, seq(10, 1, -1)),
Name=rep(c("Patient 1", "Patient 2"), each=10)
)
df %>% ggplot() +
geom_line(aes(x=X, y=Y)) +
facet_grid(rows=vars(Name))
This gives a single figure with one panel per patient, stacked in rows.
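The same faceting idea can be applied to data shaped like the question's, once it is in long format. This is only a sketch: the three-row indiv data frame is illustrative, and facet_wrap with ncol = 4 is one possible layout for many individuals, not something the question asked for.

```r
library(tidyr)
library(ggplot2)

# Illustrative stand-in for the question's wide data (one column per individual)
indiv <- data.frame(
  heures = factor(c("01:00:00", "03:00:00", "05:00:00")),
  x1 = c(36.5, 36.6, NA),
  x2 = c(NA, 34.5, 35.4)
)

# Long format: one row per (hour, individual) reading, missing readings dropped
long <- pivot_longer(indiv, cols = -heures,
                     names_to = "individual", values_to = "temperature",
                     values_drop_na = TRUE)

# One panel per individual in a single figure
p <- ggplot(long, aes(x = heures, y = temperature)) +
  geom_point() +
  facet_wrap(vars(individual), ncol = 4) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

With around 40 individuals a single faceted figure gets crowded, which is one reason to prefer separate plots as in the other answer.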

Related

replace values in a data frame with an NA based on the occurrence of NAs in lookup table

I want to replace values in a data frame in R with NA based on the occurrence of NAs in a lookup table, as in the example below.
lookup <- data.frame(date1=c("2018-02-21", "2019-01-14", "2019-01-14", "2019-01-14"),
date2=c("2018-08-22", "2019-01-14", "2019-01-14", NA),
date3=c("2018-10-03", "2019-01-14", NA, NA),
date4=c("2018-10-31", NA, NA, NA)
)
values <- data.frame(val1=c(22.2, 42.1, 38.2, 41.9),
val2=c(23.8, 40.5, 38.5, 39.7),
val3=c(24.2, 39.8, 40.2, NA),
val4=c(27.0,40.1, NA, NA)
)
values_new <- data.frame(val1=c(22.2, 42.1, 38.2, 41.9),
val2=c(23.8, 40.5, 38.5, NA),
val3=c(24.2, 39.8, NA, NA),
val4=c(27.0,NA, NA, NA)
)
We may use
values2 <- values * NA^(is.na(lookup))
Or use
values[is.na(lookup)] <- NA
Checking the result of the second approach:
> identical(values, values_new)
[1] TRUE
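To see why both forms work, here is a small sketch on made-up data (the column names d1, d2, v1, v2 are illustrative). is.na(lookup) is a logical matrix with the same shape as values. NA^TRUE is NA and NA^FALSE is 1, so multiplying by NA^mask blanks exactly the flagged cells, while indexing with the mask assigns NA to them directly.

```r
# Illustrative data: lookup and values have the same dimensions
lookup <- data.frame(d1 = c("a", NA), d2 = c(NA, "b"))
values <- data.frame(v1 = c(1, 2), v2 = c(3, 4))

mask <- is.na(lookup)          # logical matrix, TRUE where lookup is NA

# Approach 1: NA^TRUE is NA, NA^FALSE is 1
masked_mult <- values * NA^mask

# Approach 2: assign NA through the logical matrix
masked_assign <- values
masked_assign[mask] <- NA
```

The multiplication form only makes sense for numeric columns; the assignment form also works when values holds other types.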

Remove NA and only fill cells containing numbers in tableGrob

I have a table (top.table) I would like to display in a ggplot, but am having issues reformatting the table. I need to format it such that all NA elements are blank, and only fill with specified colors if there is a number contained within the element. Basically, fill the colors like in the code below except the NA elements should be filled default (white), and the NA text should be removed. If the removing of the NA is not possible in the way I described, changing the text color/fill would also work for me (i.e. change text color/fill of numbers, but not NA).
top.table <- structure(c(7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 57.5, 45.5,
NA, NA, NA, 128.5, 78.5, 71.5, 49, NA, NA, NA, 1043, NA, NA,
710, 838, 1481, 737, NA, NA, 1096, 5923, 3697, NA, 1726, NA,
NA, 3545, NA, NA, 1733, 2333, NA, 3807, 1795, NA, 2761, NA, 2887,
NA, NA, 2211, 2544), .Dim = c(11L, 5L), .Dimnames = list(NULL,
c("Sample Number", "Static", "D10 FB", "D12 FB", "D14 FB"
)))
colors <- structure(list(newcolor = c("dodgerblue2", "#E31A1C", "#FDBF6F",
"palegreen2", "skyblue2", "green4", "#6A3D9A", "#FF7F00", "gold1",
"#CAB2D6", "#FB9A99")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
tt1 <- ttheme_minimal(
core = list(bg_params = list(fill = colors, col = NA))
)
g <- tableGrob(top.table, theme = tt1)
grid.draw(g)
This may seem like a very obvious solution, but why not just replace the NA with empty strings when you plot the table?
g <- tableGrob(replace(top.table, is.na(top.table), ""), theme = tt1)
grid.newpage()
grid.draw(g)
With help from @AllanCameron, the solution I came up with was to repeat the colors across the number of columns in top.table and use replace() to set the fill of all NA elements to "white" before calling tableGrob().
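# build a matrix of colors: one color per table row, repeated across columns
# (colors is a data frame, so take its newcolor column as a character vector)
table.colors <- matrix(rep(colors$newcolor, each = ncol(top.table)),
ncol = ncol(top.table), byrow = TRUE)
# index matrix to find NAs
table.ind <- is.na(top.table)
# make replacements
table.colors <- replace(table.colors, table.ind, "white")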
tt1 <- ttheme_minimal(
core = list(bg_params = list(fill = table.colors))
)
g <- tableGrob(replace(top.table, is.na(top.table), ""), theme = tt1)
grid.draw(g)
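The color-matrix step can be checked on its own in base R before involving tableGrob(). This is a small sketch with made-up values; vals, row_cols, fill and labels are illustrative names.

```r
# Illustrative 3 x 2 table with some missing cells
vals <- matrix(c(1, NA, 3,
                 4, 5, NA), nrow = 3)
row_cols <- c("red", "green", "blue")   # one fill color per row

# matrix() fills column by column, so repeating the row colors once per
# column gives every column the same top-to-bottom color sequence
fill <- matrix(rep(row_cols, times = ncol(vals)), nrow = nrow(vals))
fill[is.na(vals)] <- "white"            # NA cells get the default background

# Cell text with NA shown as an empty string
labels <- replace(vals, is.na(vals), "")
```

fill and labels can then be used in place of table.colors and the replace() call in the answer above.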

Plot graph with PLOTLY

This is a small example of my data set. It contains weekly data covering 52 weeks. You can see the data with the code below:
# CODE
#Data
ARTIFICIALDATA<-dput(structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52), `2019 Series_1` = c(534.771929824561,
350.385964912281, 644.736842105263, 366.561403508772, 455.649122807018,
533.614035087719, 829.964912280702, 466.035087719298, 304.421052631579,
549.473684210526, 649.719298245614, 537.964912280702, 484.982456140351,
785.929824561404, 576.736842105263, 685.508771929824, 514.842105263158,
464.491228070175, 608.245614035088, 756.701754385965, 431.859649122807,
524.315789473684, 739.40350877193, 604.736842105263, 669.684210526316,
570.491228070175, 641.649122807018, 649.298245614035, 664.210526315789,
530.385964912281, 754.315789473684, 646.80701754386, 764.070175438596,
421.333333333333, 470.842105263158, 774.245614035088, 752.842105263158,
575.368421052632, 538.315789473684, 735.578947368421, 522, 862.561403508772,
496.526315789474, 710.631578947368, 584.456140350877, 843.19298245614,
563.473684210526, 568.456140350877, 625.368421052632, 768.912280701754,
679.824561403509, 642.526315789474), `2020 Series_1` = c(294.350877192983,
239.824561403509, 709.614035087719, 569.824561403509, 489.438596491228,
561.964912280702, 808.456140350877, 545.157894736842, 589.649122807018,
500.877192982456, 584.421052631579, 524.771929824561, 367.438596491228,
275.228070175439, 166.736842105263, 58.2456140350878, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA)), row.names = c(NA, -52L), class = c("tbl_df", "tbl",
"data.frame")))
# CODE WITH PLOTLY
library(tidyverse)
library(plotly)
library(reshape2)
library(ggplot2)
library(dplyr)
ARTIFICIALDATA_rec <- ARTIFICIALDATA %>%
gather(key = Year_indicator, value = time_series_value, -1)
ARTIFICIALDATA_rec$color <- factor(ARTIFICIALDATA_rec$Year_indicator, labels = c("royalblue", "orange"))
Chart <- plot_ly(ARTIFICIALDATA_rec, x = ~week , y = ~time_series_value,
type = 'bar',
marker = list(color = ~color), name = ~Year_indicator) %>%
layout(title = "TEST",yaxis = list(title = 'Millions EUR '), barmode = 'stack')
Chart<-ggplotly(Chart)
Chart
The next step is to plot this data with plotly; the code above produces the plot I currently get.
My intention is to reproduce a plot I made in Excel, but I definitely need it in plotly. The most important thing is to compare only matching periods: for example, the 2020 data covers 16 weeks, so the comparison must be with the same 16 weeks of 2019. Can anybody help me build this plot with plotly?
You need to add a trace for each time series you want to plot and specify barmode in the layout of your plotly plot. No additional data manipulation seems necessary to get what you want:
CODE
library(data.table) # needed for as.data.table() and the dat[...] subsetting below
dat <- as.data.table(ARTIFICIALDATA)
colnames(dat) <- c('week', 'series1', 'series2')
plt <- plot_ly(dat) %>%
add_trace(x = ~week, y = ~series1, type = 'bar', name = '2019 Series 1') %>%
add_trace(x = ~week, y = ~series2, type = 'bar', name = '2020 Series 1') %>%
layout(
xaxis = list(title = 'week'),
yaxis = list(title = ''),
barmode = 'group'
)
The data.table part is not necessary. I did that purely to get simpler column names and because I prefer data.table for subsetting etc.
OUTPUT
The above code returns a grouped bar chart, with the 2019 and 2020 bars side by side for each week.
You can subset your data to include only weeks for which both series have data to get the graph in your post.
plt <- plot_ly(dat[!is.na(series2)]) %>%
...
Optionally, you can move the legend to the bottom by specifying the legend in layout - makes it nicer to read in my opinion:
layout(
...
legend = list(orientation = 'h')
)
This gives the same grouped chart with a horizontal legend at the bottom.
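Putting the pieces together, a complete version might look like the sketch below. It uses a plain data.frame instead of data.table, and the six-week dat is made-up data standing in for ARTIFICIALDATA after renaming its columns to week, series1 and series2.

```r
library(plotly)

# Illustrative data: the second series stops after week 4
dat <- data.frame(
  week    = 1:6,
  series1 = c(10, 12, 9, 14, 11, 13),
  series2 = c(8, 15, 7, 12, NA, NA)
)

# Keep only the weeks for which both series have data
shared <- dat[!is.na(dat$series2), ]

plt <- plot_ly(shared) %>%
  add_trace(x = ~week, y = ~series1, type = "bar", name = "2019 Series 1") %>%
  add_trace(x = ~week, y = ~series2, type = "bar", name = "2020 Series 1") %>%
  layout(
    xaxis = list(title = "week"),
    yaxis = list(title = ""),
    barmode = "group",                  # bars side by side instead of stacked
    legend = list(orientation = "h")    # horizontal legend
  )
```

Printing plt in an interactive session renders the chart.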

Trouble trying to clean a character vector in R data frame (UTF-8 encoding issue)

I'm having some issues cleaning up a dataset after I manually extracted the data online; I'm guessing these are encoding issues. I am trying to remove the "U+00A0" in the "Athlete" column cells along with the angle brackets around it. I looked up the corresponding UTF-8 code and it's for "No-Break Space". I'm also not sure how to replace the other UTF-8 characters to make the names legible, e.g. getting U+008A to display as Š.
Subset of data
head2007decathlon <- structure(list(Rank = 1:6, Athlete = c("<U+00A0>Roman <U+008A>ebrle<U+00A0>(CZE)", "<U+00A0>Maurice Smith<U+00A0>(JAM)", "<U+00A0>Dmitriy Karpov<U+00A0>(KAZ)", "<U+00A0>Aleksey Drozdov<U+00A0>(RUS)", "<U+00A0>Andr<e9> Niklaus<U+00A0>(GER)", "<U+00A0>Aleksey Sysoyev<U+00A0>(RUS)"), Total = c(8676L, 8644L, 8586L, 8475L, 8371L, 8357L), `100m` = c(11.04, 10.62, 10.7, 10.97, 11.12, 10.8), LJ = c(7.56, 7.5, 7.19, 7.25, 7.42, 7.01), SP = c(15.92, 17.32, 16.08, 16.49, 14.12, 16.16), HJ = c(2.12, 1.97, 2.06, 2.12, 2.06, 2.03), `400m` = c(48.8, 47.48, 47.44, 50, 49.4, 48.42), `110mh` = c(14.33, 13.91, 14.03, 14.76, 14.51, 14.59), DT = c(48.75, 52.36, 48.95, 48.62, 44.48, 49.76), PV = c(4.8, 4.8, 5, 5, 5.3, 4.9), JT = c(71.18, 53.61, 59.84, 65.51, 63.28, 57.75), `1500m` = c(275.32, 273.52, 279.68, 276.93, 272.5, 276.16), Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2007", class = "factor"), Nationality = c(NA, NA, NA, NA, NA, NA)), .Names = c("Rank", "Athlete", "Total", "100m", "LJ", "SP", "HJ", "400m", "110mh", "DT", "PV", "JT", "1500m", "Year", "Nationality"), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
This is what I've tried so far, without success:
1) head2007decathlon$Athlete <- gsub(pattern="\U00A0",replacement="",x=head2007decathlon$Athlete)
2) head2007decathlon$Athlete <- gsub(pattern="<U00A0>",replacement="",x=head2007decathlon$Athlete)
3) head2007decathlon$Athlete <- iconv(head2007decathlon$Athlete, from="UTF-8", to="LATIN1")
4) Encoding(head2007decathlon$Athlete) <- "UTF-8"
5) head2007decathlon$Athlete<- enc2utf8(head2007decathlon$Athlete)
The following would remove the no break space.
head2007decathlon$Athlete <- gsub(pattern="<U\\+00A0>",replacement="",x=head2007decathlon$Athlete)
Not sure how to convert the other characters. One problem could be that the codes are not exactly in a format that R sees as UTF-8.
One example:
iconv('\u008A', from="UTF-8", to="LATIN1")
This does have an effect, unlike trying to convert the literal text U+008A, although the output is:
[1] "\x8a"
which is not the character you want. Hope this helps somehow.
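One possible way to go further is sketched below. It rests on an assumption that is not confirmed by the question: that the strings contain the literal text "<U+008A>" and "<e9>", and that those values are really Windows-1252 bytes (in that code page, byte 0x8A is Š and 0xE9 is é). clean_athlete is a hypothetical helper name, and the "CP1252" encoding name passed to iconv() may need adjusting on some platforms.

```r
# Hypothetical helper: decode literal "<U+XXXX>" / "<xx>" markers in a
# character vector, assuming values below 256 are Windows-1252 bytes
clean_athlete <- function(x) {
  # Turn the literal no-break-space markers into ordinary spaces
  x <- gsub("<U\\+00A0>", " ", x)
  # Locate the remaining "<U+XXXX>" and "<xx>" markers
  m <- gregexpr("<U\\+[0-9A-Fa-f]{4}>|<[0-9A-Fa-f]{2}>", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    vapply(tokens, function(tok) {
      code <- strtoi(gsub("[<>U+]", "", tok), base = 16L)
      if (code < 256) {
        # Assumption: treat the value as a single Windows-1252 byte
        iconv(rawToChar(as.raw(code)), from = "CP1252", to = "UTF-8")
      } else {
        intToUtf8(code)
      }
    }, character(1), USE.NAMES = FALSE)
  })
  trimws(x)
}

cleaned <- clean_athlete(c("<U+00A0>Roman <U+008A>ebrle<U+00A0>(CZE)",
                           "<U+00A0>Andr<e9> Niklaus<U+00A0>(GER)"))
```

If the markers are only how the console prints real non-ASCII characters, rather than literal text in the strings, this pattern matching will find nothing and the cause lies in the locale or in how the file was read.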

Calculate percentage to total using rowPercents

I am trying to calculate a percentage of the total for, let's say, the following reproducible example:
structure(c(197.95, 197.95, 197.95, 186.8, 190.51, 195.16, 199.81,
202.59, 202.59, 202.59, 92.28, 92.28, 90.07, 89.82, 87.36, 87.61,
90.56, 89.82, 90.07, 89.82, 20.43, 20.43, 20.43, 20.43, 20.43,
20.43, 20.43, 20.43, 20.43, 20.64, 24.7, 24.95, 24.54, 23.97,
23.97, 24.38, 24.38, 24.38, 24.54, 24.54, 37.4, 37.4, 37.4, 35.43,
35.43, 35.43, 35.43, 35.43, 35.43, 39.37, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16.05,
16.05, 16.05, 16.05, 15.62, 15.62, 16.05, 15.62, 15.62, 15.62,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), index = structure(c(470620800,
470880000, 470966400, 471052800, 471139200, 471225600, 471484800,
471571200, 471657600, 471744000), tzone = "UTC", tclass = "Date"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", class = c("xts",
"zoo"), .Dim = c(10L, 9L), .Dimnames = list(NULL, c("AVON", "BA.",
"CMRG", "COB", "MGGT", "QQ.", "RR.", "SNR", "ULE")))
I need to return the same presentation of my data, but with each value expressed as a percentage of the total of the row it belongs to. I did a lot of research and tried prop.table, which returns a subscript error, and finally I used rowPercents, which is part of the RcmdrMisc package. However, I could not find how to make it ignore the NAs in my data set.
In the example provided there are two whole columns of NA. I cannot drop them, as the whole data set has some values in subsequent rows.
Note that the class of my example is zoo and xts.
You don't need any external packages for this.
dat.percent <- dat / rowSums(dat, na.rm = TRUE) * 100
Check that it works:
> all(abs(rowSums(dat.percent, na.rm = TRUE) - 100) < 0.0001)
[1] TRUE
prop.table does not seem to work with xts/zoo objects but this works:
library(xts)
prop.table(coredata(x), 1)
It returns all NAs which is correct since there is an NA in each row (and it is impossible to calculate the proportions without knowing every value). If you want to regard the NA values as zero then:
prop.table( na.fill(coredata(x), 0), 1)
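The row-wise division can be illustrated on a plain matrix with made-up numbers (m and pct are illustrative names). Dividing a matrix by a vector of length nrow(m) recycles the vector down the columns, so each element is divided by its own row total, and na.rm = TRUE keeps NA cells out of that total.

```r
# Illustrative 2 x 4 matrix with missing cells
m <- matrix(c(10, 30, NA, 60,
              20, NA, NA, 80),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C", "D")))

# Row totals ignoring NA, then each cell as a percentage of its row total
pct <- m / rowSums(m, na.rm = TRUE) * 100
```

NA cells stay NA in pct, while the remaining cells in each row add up to 100.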
