I am trying to generate a series of plots that show the same patient taking drinks and urinating at different times. Each plot represents a single day. I want to compare the days, so I need all the plots to share the same x-axis. My code is below; I adapted it from the question "How to specify the actual x axis values to plot as x axis ticks in R".
### Data Input
time_Thurs <- c("01:10", "05:50", "06:00","06:15", "06:25", "09:35", "10:00", "12:40",
"14:00", "17:20", "18:50", "19:10", "20:10", "21:00", "22:05", "22:35")
event_Thurs <- c("u", "u", "T", "T", "u", "u", "T","T","u", "u", "T", "T", "T", "T", "u", "W")
volume_Thurs <- c(NA, NA, 0.25, 0.25, NA, NA, 0.125, 0.625, NA, NA, 0.25, 0.25, 0.25, 0.25,
NA, 0.25)
total_liquids_Thurs <- sum(volume_Thurs, na.rm=TRUE)
time_Thurs <- paste("04/04/2019", time_Thurs, sep=" ")
time_Fri <- c("01:15", "06:00", "06:10", "06:25", "06:30", "07:10", "08:40", "09:20",
"12:45", "13:45")
event_Fri <- c("u","u", "T","T","u","uu","T", "u", "T", "u")
volume_Fri <- c(NA, NA, 0.25, 0.25, NA, NA, 0.125, NA, 0.625, NA)
total_liquids_Fri <- sum(volume_Fri, na.rm=TRUE)
time_Fri <- paste("05/04/2019", time_Fri, sep=" ")
### Collect all data together
event <- c(event_Thurs, event_Fri)
Volume <- c(volume_Thurs, volume_Fri)
time_log <- c(time_Thurs, time_Fri)
time_log <- strptime(time_log, format = "%d/%m/%Y %H:%M")
time_view <- format(time_log, "%H:%M")
### Put into Dataframe
patient_data <- data.frame(time_log, time_view, event, Volume)
# write.csv(patient_data, file="patient_data.csv", row.names = FALSE)
daily_plot <- function(x, day) {
# x patient data - a data.frame with four columns:
# POSIXct time, time of day (text), event and Volume
# day number of day of month to plot
# TotVol total volume of intake over the selected day
# event - drink or otherwise (column 3); Volume of liquid is column 4
x <- x[as.numeric(format(x[,1], "%d")) == day, ]
TotVol <- sum(x[,4], na.rm = TRUE)
DayOfWeek <- weekdays(x[1,1], abbreviate = FALSE)
plot(x[,1],x[,4],
xlim = c(x[1,1],x[length(x[,1]),1]),
xlab="Hours of Study", ylab = "Volume of Liquid Drank /L",
main = paste("Total Liquids Drank = ", TotVol, " L on ", DayOfWeek, "Week 1, Apr 2019"),
sub = "dashed red line = urination", pch=16,
col = c("black", "yellow", "green", "blue")[as.numeric(factor(x[,3]))],
xaxt = 'n'
)
xAxis_hrs <- seq(as.POSIXct(x[1,1]), as.POSIXct(x[length(x[,1]),1]), by="hour")
axis(1, at = xAxis_hrs, las = 2)
abline( v = c(x[x[,3] == "u",1]), lty=3, col="red")
}
When I run the function,
daily_plot(patient_data, 4)
I want the x-axis to be labelled in hours, covering the events over the 24-hour period.
When I wrap my xAxis_hrs vector in strptime(xAxis_hrs, format = "%H"), the code fails: the x-axis is not drawn and I see Error in axis(1, at = xAxis_hrs, las = 2) : (list) object cannot be coerced to type 'double'. Any help?
The issue is that you pass the labels to the wrong named argument, namely at (which should be the numeric positions of the labels). Use the following instead:
axis(1, at = xAxis_hrs, labels = strptime(xAxis_hrs, format = "%H"), las = 2)
Unfortunately this doesn’t change the fact that the axis labels don’t fit into the plot, and collide with the axis title. The former can be fixed by adjusting the plot margins. I’m not aware of a good solution for the latter, although changing the time format might help: it’s probably not necessary/helpful to print the full minutes and seconds (which are always 0). In fact, did you mean to use format instead of strptime?
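A minimal sketch of the format() variant hinted at above, using two made-up observations rather than the full patient data. The names day_start, hour_labels and obs_time are assumptions for illustration. Fixing xlim to the whole day gives every daily plot the same x-axis, and moving the axis title down with title(line = ...) is one possible way to keep it clear of the rotated labels.

```r
# Hourly tick positions covering one full day (UTC chosen only to keep the example deterministic)
day_start <- as.POSIXct("2019-04-04 00:00", tz = "UTC")
xAxis_hrs <- seq(day_start, by = "hour", length.out = 25)

# format() converts date-times to character labels; strptime() parses text INTO date-times
hour_labels <- format(xAxis_hrs, "%H:%M")

# Two illustrative observations (hypothetical subset of the data)
obs_time <- as.POSIXct(c("2019-04-04 06:00", "2019-04-04 12:40"), tz = "UTC")
obs_vol  <- c(0.25, 0.625)

# Enlarge the bottom margin so the rotated labels fit
op <- par(mar = c(7, 4, 4, 2) + 0.1)
plot(obs_time, obs_vol,
     xlim = range(xAxis_hrs),   # same 24-hour window for every day
     xaxt = "n", xlab = "",
     ylab = "Volume of Liquid Drank /L", pch = 16)
axis(1, at = xAxis_hrs, labels = hour_labels, las = 2)
# Draw the axis title further from the axis so it does not collide with the labels
title(xlab = "Hours of Study", line = 5)
par(op)
```

Here at receives the tick positions and labels receives plain character strings, which is what axis() expects.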
Apart from that I fundamentally agree with the other answer recommending ggplot2 in the long run. It makes this kind of stuff a lot less painful.
If you're open to a ggplot solution:
library(tidyverse)
library(lubridate)
daily_ggplot <- function(df, selected_day) {
df_day <- filter(df, day(time_log) == selected_day)
df_urine <- filter(df_day, event == "u")
df_drink <- filter(df_day, event != "u")
TotVol <- sum(df_day$Volume, na.rm = TRUE)
Date <- floor_date(df_day$time_log[1], 'days')
DayOfWeek <- weekdays(Date, abbreviate = FALSE)
plot_title <- paste0("Total drank = ", TotVol, "L on ", DayOfWeek, " Week 1, Apr 2019")
ggplot(df_drink) +
aes(time_log, Volume, color = event) +
geom_point(size = 2) +
geom_vline(data = df_urine, aes(xintercept = time_log), color = "red", linetype = 3) +
labs(x = "Hours of Study", y = "Volume of Liquid Drank (L)",
title = plot_title, subtitle = "lines = urination") +
theme_bw() +
scale_x_datetime(date_labels = "%H:%M", limits = c(Date, Date + days(1)))
}
daily_ggplot(patient_data, 4)
My problems can easily be summarized in these two pictures:
Sequence Plots:
Sequence Frequency:
The plots extend past the end of the x-axis, although the dataset ends in 2018. I really hope to find out why this is the case.
I tried shortening the time period by a year so that it ends in 2017, with no change. I am new to this, so my ideas were limited. My guess is that it gets confused by the "NA" category. The overall plot looks normal; only the cluster plots and the sequence frequency plot go beyond the x-axis.
MyData <- read.csv2(file="e:/Dokumente (-videoedluxe)/IBP dokumente/Msc/seq/mappe1.csv", sep=";", skipNul=TRUE, stringsAsFactors=FALSE, na = "empty")
str(MyData, sep=",")
install.packages("TraMineR")
library("TraMineR")
table(MyData$x1989)
data.seq <- seqdef(MyData, var = 2:31, ext = TRUE, gaps = "NA",
alphabet=c("GOVspec","GOVinv","GOVno","IOspec","IOinv","IOno","ICspec","ICinv","ICno","MNCspec","MNCinv","MNCno","NGOspec","NGOspec","NGOinv","NGOno","NPOspec","NPOinv","NPOno","UNIspec","UNIinv","UNIno","EDUspec","EDUinv","EDUno", NA),
states = c("GOVspec","GOVinv","GOVno","IOspec","IOinv","IOno","ICspec","ICinv","ICno","MNCspec","MNCinv","MNCno","NGOspec","NGOspec","NGOinv","NGOno","NPOspec","NPOinv","NPOno","UNIspec","UNIinv","UNIno","EDUspec","EDUinv","EDUno", NA))
cpal(data.seq) <-c("aquamarine2","aquamarine3","aquamarine4","chocolate2","chocolate3","chocolate4","cadetblue2","cadetblue3","cadetblue4","gold1","gold3","gold4","green2","green3","green4","hotpink2","hotpink3","hotpink4","orange2","orange3","orange4","purple2","purple4","rosybrown","orchid2", "white")
seqstatl(MyData)
summary(data.seq)
years = c(1989:2018)
par(mfrow = c(1, 2))
seqdplot(data.seq, with.legend = FALSE, border = NA, x = years)
seqlegend(data.seq)
cost.constant <- seqsubm(data.seq, method="CONSTANT", time.varying= T, with.miss = FALSE)
cost.trate <- seqsubm(data.seq, method="TRATE", time.varying= T, with.miss = FALSE)
seqfplot(data.seq, withlegend="FALSE")
seqmtplot(data.seq, withlegend="RIGHT", title="Mean Time")
analysis.manual <- seqdist(data.seq, method="OM", sm="TRATE", indel=1.5)
library(cluster)
analysis.manual = agnes(analysis.manual)
clusterward <- agnes(data.seq, method="ward")
plot(clusterward, which.plots = 2)
plot(analysis.trate, which.plots = 8)
## CLUSTER ANALYSIS
cluster1 = cutree(analysis.trate, 1)
cluster2 = cutree(analysis.trate, 2)
cluster3 = cutree(analysis.trate, 3)
cluster4 = cutree(analysis.trate, 4)
# Distribution plot
seqdplot(data.seq, group= cluster1, withlegend = F, border = NA, x = years)
# Index plots
seqIplot(data.seq, group= cluster1, withlegend = F, border = NA, x = years)
seqIplot(data.seq, group= cluster2, withlegend = F, border = NA, x = years)
seqIplot(data.seq, group= cluster3, withlegend = F, border = NA, x = years)
seqIplot(data.seq, group= cluster4, withlegend = F, border = NA, x = years)
I expect the plots to end when they should.
We cannot reproduce the issue because we don't have the data. However, I notice that your alphabet contains the state NGOspec twice. Removing one of them and adjusting the states and cpal arguments accordingly should solve the issue.
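As a quick check for this kind of problem, repeated state names can be found with base R before calling seqdef. This is only a sketch on a shortened, illustrative alphabet; the variable names are assumptions.

```r
# Shortened, illustrative alphabet containing one repeated state
alphabet <- c("GOVspec", "GOVinv", "NGOspec", "NGOspec", "NGOinv")

# duplicated() flags the second and later occurrences of each value
repeated_states <- alphabet[duplicated(alphabet)]
print(repeated_states)

# unique() keeps the first occurrence of each state, preserving order
alphabet_clean <- unique(alphabet)
print(length(alphabet_clean))
```

After removing a duplicate, the states vector and the colour palette need the same number of entries as the cleaned alphabet.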
I am studying patient fluid intake and frequency of urination.
I'm collecting volume and time of fluids drank and time of urination.
I want to mark on a graph of liquid intake the times when urination occurs.
Here's my data and code so far ...
time_log <- c("01:10", "05:50", "06:00","06:15", "06:25", "09:35", "10:00", "12:40",
"14:00")
time_log <- paste("04/04/2019", time_log, sep=" ")
time_log <- strptime(time_log, format = "%d/%m/%Y %H:%M")
time_view <- format(time_log, "%H:%M")
event <- c("u", "u", "T", "T", "u", "u", "T","T","u")
Volume <- c(NA, NA, 0.25, 0.25, NA, NA, 0.125, 0.625, NA)
patient_data <- data.frame(time_log, time_view, event, Volume)
total_liquids <- sum(patient_data$Volume, na.rm=TRUE)
plot(patient_data$time_log, patient_data$Volume,
xlim = c(as.POSIXct("2019-04-04 00:00:00"),as.POSIXct("2019-04-04 24:00:00")),
xlab="Hours of Study", ylab = "Volume of Liquid Drank /L",
main = paste("Total Liquids Drank = ", total_liquids, " L"))
This is related to the question "Time Series Data - How to", which was poorly received by the Stack Overflow community.
Here's a way using ggplot2 and dashed vertical lines. When adding the geom_vline, we subset the data for just the urination events (i.e., event == "u").
library(ggplot2)
ggplot(patient_data, aes(x = time_log, y = Volume)) +
geom_point() +
geom_vline(
data = subset(patient_data, event == "u"),
aes(xintercept = time_log),
linetype = 2
) +
labs(
title = paste("Total Liquids Drank = ", total_liquids, " L"),
subtitle = "Dashed line represents urination",
x = "Hours of Study",
y = "Volume of Liquid Drank (L)"
) +
scale_y_continuous(limits = c(0, NA)) # just so we don't start the y-axis at 0.1 or something misleading.
I am using twoord.plot for the first time, and I am having trouble getting the x axis set to years for a time-series data set. I have two different y-axes on different scales. Here is the code that I am working with.
#Install BatchGetSymbols
install.packages('BatchGetSymbols')
library(BatchGetSymbols)
#Get data from FRED
library(quantmod)
getSymbols('CPALTT01USM661S', src = 'FRED')
getSymbols('M2SL', src = 'FRED')
#Create data sets with equal number of observations
CPI = CPALTT01USM661S["1960-01-01/2019-01-01"]
M2 = M2SL["1960-01-01/2019-01-01"]
library(plotrix)
twoord.plot(rx = time(CPI), ry = CPI, lx = time(CPI), ly = M2,
main = "Money Supply and Prices",
xlim = NULL, lylim = NULL, rylim = NULL,
mar = c(5,4,4,4), lcol = "red", rcol = "blue", xlab = "", lytickpos = NA,
ylab = "M2", ylab.at = NA,
rytickpos = NA, rylab = "CPI", rylab.at = NA, lpch = 1,rpch = 2,
type = "l", xtickpos = NULL, xticklab = NULL,
halfwidth = 0.4, axislab.cex = 1, do.first = NULL)
Here is the graph that I am getting. Notice the x-axis is not in years.
The date values (the beginning of each month) are in the index of the xts objects, so to get the start of each year, take every 12th item:
twoord.plot(rx=time(CPI), ry=CPI, lx=time(CPI),ly = M2, main="Money Supply and Prices",xlim=NULL,lylim=NULL,rylim=NULL,
mar=c(5,4,4,4),lcol="red",rcol="blue",xlab="",lytickpos=NA,ylab="M2",ylab.at=NA,
rytickpos=NA,rylab="CPI",rylab.at=NA,lpch=1,rpch=2,
type="l",
xtickpos=index(CPI)[seq(1,nrow(CPI), by=12)], #tick at year start
xticklab=format( index(CPI)[seq(1,nrow(CPI), by=12)], "%Y"), #just year
halfwidth=0.4, axislab.cex=1,
do.first=NULL, las=2) # not sure why las=2 didn't seem to work.
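The every-12th-item idea can be checked without downloading anything. The sketch below uses a plain monthly Date sequence as a stand-in for index(CPI); the names monthly_index, tick_pos and tick_lab are assumptions for illustration.

```r
# Stand-in for index(CPI): first day of each month, 1960-01 to 2019-01
monthly_index <- seq(as.Date("1960-01-01"), as.Date("2019-01-01"), by = "month")

# Every 12th element starting at the first one is a 1 January
year_idx <- seq(1, length(monthly_index), by = 12)
tick_pos <- monthly_index[year_idx]   # positions for xtickpos
tick_lab <- format(tick_pos, "%Y")    # labels for xticklab

# Sixty yearly labels may be crowded; one tick every five years is an option
five_year_idx <- seq(1, length(monthly_index), by = 60)
tick_lab_5 <- format(monthly_index[five_year_idx], "%Y")
```

This only works because the series starts in January and has no missing months; with gaps, selecting on format(monthly_index, "%m") == "01" would be safer.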
I want to build several plots from one large database, so that I have one plot for each Text (factor) and for each Measure (the many resulting measures of an eye tracking study). The following is a much simpler example of what I am trying to do:
Let's say this is my dataset
Text <- c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)
Position <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
Modified <- c(1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0)
Line_on_page <- c(1, 1, 1, 1, 2,2,2,2 ,1 ,1,1,1,2,2,2,2)
IA_FIXATION_DURATION <- c(250.3, 70.82, 400, 120.12, 270, 120.5, 100.54, 212.43, 250.3, 70.82, 320.29, 123.12, 260, 121.5, 100.54, 272.43)
IA_FIXATION_COUNT <- c(1,0,1,1,3,2,0, 1, 1,0,1,2,3,2,0, 2)
IA_LABEL <- c("she", "did", "not", "know", "what", "to", "say", "to", "she", "did", "not", "know", "what", "to", "do", "to")
testDF <- data.frame(Text , Position , Line_on_page, Modified, IA_FIXATION_DURATION, IA_FIXATION_COUNT, IA_LABEL)
so I want a heatmap (or another graph) for each Text (1/2/3), and for each measure (IA_FIXATION_DURATION/IA_FIXATION_COUNT)
# so first i create my vectors
library(stringr)
library(reshape2)
library(ggplot2)
library(ggthemes)
library(tidyverse)
Text_list <- unique(testDF$Text)
Measure_list <- testDF %>% dplyr::select_if(is.numeric) %>% colnames() %>% as.vector()
# create graphing function
Heatmap_FN <- function(testDF, na.rm = TRUE, ...){
# create for loop to produce ggplot2 graphs
for (i in seq_along(Text_list)) {
for (j in seq_along(Measure_list)) {
# create plot for each text in dataset
plots <- ggplot(subset(testDF, testDF$Text==Text_list[i])) +
geom_tile(aes(x=Position,
y=Line_on_page,
fill = Measure_list[j])) +
geom_text(aes(x=Position,
y=Line_on_page,
label=IA_LABEL),
color = "white", size = 2, family = "sans") +
scale_fill_viridis_c(option = "C", na.value = "black") +
scale_y_reverse() +
facet_grid(Page ~ Modified)+
theme(legend.position = "bottom") +
ggtitle(paste(Text_list[i],j, 'Text \n'))
ggsave(plots, file=paste(Measure_list[j], "_T", Text_list[i], ".pdf", sep = ""), height = 8.27, width = 11.69, units = c("in"))
}
}
}
Heatmap_FN(testDF)
Now, I am pretty sure that the problem lies in the "fill" part of geom_tile, where I would like to tell the function to use the result variables one by one to produce the plots.
Any ideas on how to fix that?
Thanks
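One possible direction, offered as a sketch rather than a confirmed fix: inside aes(), fill = Measure_list[j] is evaluated as a character string, so every tile is mapped to the same constant instead of to the column of that name. ggplot2's .data pronoun lets a column be looked up by a string. The example below uses a cut-down, hypothetical data frame and omits the faceting (the Page column used in facet_grid does not appear in the example data).

```r
library(ggplot2)

# Cut-down, hypothetical version of testDF (Text 1, Modified 1 only)
small_df <- data.frame(
  Position = c(1, 2, 3, 4),
  Line_on_page = c(1, 1, 1, 1),
  IA_FIXATION_DURATION = c(250.3, 70.82, 400, 120.12),
  IA_FIXATION_COUNT = c(1, 0, 1, 1)
)

measures <- c("IA_FIXATION_DURATION", "IA_FIXATION_COUNT")

# Build one heatmap per measure; .data[[m]] looks the column up by its name
plots <- lapply(measures, function(m) {
  ggplot(small_df) +
    geom_tile(aes(x = Position, y = Line_on_page, fill = .data[[m]])) +
    scale_fill_viridis_c(option = "C", na.value = "black") +
    ggtitle(m)
})
names(plots) <- measures
```

Each element of plots can then be passed to ggsave, as in the original loop. Note also that Measure_list, as built with select_if(is.numeric), would include Text, Position and the other numeric columns, so it may need restricting to the actual measures.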
This is the first 10 rows of my data frame:
head(test.data,10)
# A tibble: 10 x 5
date o2.permeg co2.ppm apo o2.spike
<time> <dbl> <dbl> <dbl> <chr>
1 2015-01-01 00:00:00 -685.09 413.023 -354.1816 N
2 2015-01-01 00:02:00 -695.10 412.894 -364.8690 N
3 2015-01-01 00:04:00 -687.84 412.979 -357.1627 N
4 2015-01-01 00:06:00 -683.23 412.866 -353.1460 N
5 2015-01-01 00:08:00 -683.28 412.755 -353.7788 N
6 2015-01-01 00:10:00 -685.40 412.647 -356.4659 N
7 2015-01-01 00:12:00 -687.80 412.659 -358.8029 N
8 2015-01-01 00:14:00 -662.79 412.665 NA Y
9 2015-01-01 00:16:00 -684.17 412.762 -354.6321 N
10 2015-01-01 00:18:00 -680.37 412.720 -351.0526 N
As you can see there's a last column named o2.spike, which has characters N and Y in it. N means that the data point is not a spike, and Y means that it is a spike. In this sample, there's only 1 Y, but in the real frame, there are loads, and randomly placed.
My desire is to plot all the data points in a plot, and those marked with Y will be plotted in a different colour.
For your information, this is the current code that I am using to plot everything. The first 3 variables are plotted in red, green, and blue, and I want the "Y" rows to be plotted in as, for example, pink.
library(openair)
test.data$yr_day <- format(as.Date(test.data$date), "%Y-%m-%d")
dir.create(daily) # where "daily" is the path of the folder I want to save the plots into
for (d in unique(test.data$yr_day)) {
mypath <- file.path(daily, paste(name, d, ".png", sep = "" ))
png(filename = mypath, width = 963, height = 690)
timePlot(subset(test.data, yr_day == d),
plot.type = "p",
pollutant = c("co2.ppm", "o2.permeg", "apo"),
y.relation = "free",
date.pad = TRUE,
pch = c(19,19,19),
cex = 0.2,
xlab = paste("Time of day in hours on", d),
ylab = "CO2, O2, and APO concentrations",
name.pol = c("CO2 (ppm)", "O2 (per meg)", "APO (per meg)"),
date.breaks = 24,
date.format = "%H:%M"
)
dev.off()
}
An example plot (containing all the spikes with the same colour as the non-spike ones) is as follows:
So how do I plot the spikes in a different colour from the others? Thank you very much!
Edit:
As asked by Sebastian, I have added this (not sure how you guys will be able to extract the data from that)
dput(head(test.data,20))
structure(list(date = structure(c(1420070400, 1420070520, 1420070640,
1420070760, 1420070880, 1420071000, 1420071120, 1420071240, 1420071360,
1420071480, 1420071600, 1420071720, 1420071840, 1420071960, 1420072080,
1420072200, 1420072320, 1420072440, 1420072560, 1420072680), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), o2.permeg = c(-685.09, -695.1, -687.84,
-683.23, -683.28, -685.4, -687.8, -662.79, -684.17, -680.37,
-684.66, -686.13, -683.27, -680.77, -682.16, -692.54, NA, NA,
NA, NA), co2.ppm = c(413.023, 412.894, 412.979, 412.866, 412.755,
412.647, 412.659, 412.665, 412.762, 412.72, 412.692, 412.71,
412.757, 412.838, 412.922, 413.019, NA, NA, NA, NA), apo = c(-354.181646778043,
-364.868973747017, -357.162673031026, -353.145990453461, -353.778806682578,
-356.465871121718, -358.802863961814, NA, -354.632052505966,
-351.052577565632, -355.489594272076, -356.86508353222, -353.75830548926,
-350.833007159904, -351.781957040573, -361.652649164678, NA,
NA, NA, NA), o2.spike = c("N", "N", "N", "N", "N", "N", "N",
"Y", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"
)), .Names = c("date", "o2.permeg", "co2.ppm", "apo", "o2.spike"
), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
))
Unfortunately, without having data, it's not easy to answer the question.
A ggplot2 solution could be:
library(ggplot2)
g1 <- ggplot(data=test.data, aes(x=date, y=o2.permeg, col=o2.spike)) + geom_point()
g1
Passing a column of the data frame to the "col" parameter in "aes" maps each distinct value in that column to a different colour.
It even creates a legend that associates each value with its colour.
I tried this with another dataframe ("iris", contained in base R) and it worked, hope it will be helpful.
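If specific colours are wanted for the two groups, a manual colour scale can be added. The sketch below rebuilds the first nine rows of the dput() sample from the question; the colour choices and the name spike_cols are assumptions for illustration.

```r
library(ggplot2)

# First nine rows of the dput() sample (date, o2.permeg and o2.spike only)
test.data <- data.frame(
  date = as.POSIXct(1420070400 + 120 * 0:8, origin = "1970-01-01", tz = "GMT"),
  o2.permeg = c(-685.09, -695.1, -687.84, -683.23, -683.28,
                -685.4, -687.8, -662.79, -684.17),
  o2.spike = c("N", "N", "N", "N", "N", "N", "N", "Y", "N")
)

# Named vector: each value of o2.spike gets an explicit colour (arbitrary choice)
spike_cols <- c(N = "red", Y = "deeppink")

g1 <- ggplot(test.data, aes(x = date, y = o2.permeg, col = o2.spike)) +
  geom_point() +
  scale_colour_manual(values = spike_cols)
```

Printing g1 draws the plot; the single "Y" row (the 00:14 observation) appears in the second colour.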
Edit:
To have side-by-side plots, you can create 3 plots with ggplot and then use the function plot_grid() provided by the "cowplot" package.
library(cowplot)
g1 <- ggplot(data=test.data, aes(x=date, y=o2.permeg, col=o2.spike)) + geom_point()
g2 <- ggplot(data=test.data, aes(x=date, y=co2.ppm, col=o2.spike)) + geom_point()
g3 <- ggplot(data=test.data, aes(x=date, y=apo, col=o2.spike)) + geom_point()
plot_grid(g1, g2, g3, nrow=3, ncol=1)