I'm trying to visualize missing observations in a time series dataset using R.
I have a dataset in the following format, with temperature values from 5 different sensors recorded at 15-minute intervals.
quarter_hour_readings <- structure(list(sensor_id = c(
"5a", "5b", "5c", "6a", "6b", "5a",
"5b", "5c", "6a", "6b"
), date_time = structure(c(
1573805700,
1573805700, 1573805700, 1573805700, 1573805700, 1573806600, 1573806600,
1573806600, 1573806600, 1573806600
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(
14.4, 21.8, 15, 19.2, 10, 14.7, 21.1,
15.8, 18.5, 10.4
)), class = "data.frame", row.names = c(NA, -10L))
I tried the following code to plot a tile plot of observations of each sensor over time, but an empty plot appears with no error message.
library(ggplot2)
library(dplyr)
quarter_hour_readings %>%
ggplot(aes(x = date_time,
y = sensor_id,
fill = value)) +
geom_tile()
The following is a similar plot I made in Tableau from the same data, where each bar is a sensor and the x-axis represents continuous time.
Related
structure(list(Position = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Date = structure(c(1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400), tzone = "UTC", class = c("POSIXct", "POSIXt")), Time
= structure(c(-2209017523, -2209017518, -2209017513, -2209017508, -2209017503, -2209017498, -2209017493, -2209017488, -2209017483, -2209017478), tzone = "UTC", class = c("POSIXct", "POSIXt")), DateTime = structure(c(1685808077, 1685808082, 1685808087,
1685808092, 1685808097, 1685808102, 1685808107, 1685808112, 1685808117, 1685808122), tzone = "UTC", class = c("POSIXct", "POSIXt")), Temperatuur = c(21.2, 21.2, 21.6, 21.7, 22, 22.2, 20.1, 20.2, 20.3, 20.3), Treatment = c("Tempex", "Tempex", "Tempex",
"Tempex", "Tempex", "Tempex", "Tempex", "Tempex", "Tempex", "Tempex")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
My R code:
time.start=as.POSIXct("2023-06-03, 16:00",format="%H:%M")
time.end=as.POSIXct("2023-07-03, 07:30",format="%H:%M")
ggplot(logger, aes(DateTime, Temperatuur, color = Treatment))+
geom_line(size = 1)+
scale_x_datetime(limits = c(time.start, time.end),
breaks = date_breaks("24 hours"),
labels = date_format("%H:%M"))
I just cannot figure out why I am getting this: plot
Can someone please help me?
I tried to follow advice online, but that got me nowhere. Changing the breaks from 24 to 12 hours didn't help either. Maybe it's a problem in my Excel file, but that all seems fine.
If we remove the commas from the time.* strings and drop the errant format= argument, it should work. Note that date_breaks() and date_format() come from the scales package, so it needs to be loaded as well.
library(ggplot2)
library(scales)
time.start <- as.POSIXct("2023-06-03 16:00")
time.end <- as.POSIXct("2023-07-03 07:30")
ggplot(logger, aes(DateTime, Temperatuur, color = Treatment)) +
geom_line(size = 1) +
scale_x_datetime(
limits = c(time.start, time.end),
breaks = date_breaks("24 hours"),
labels = date_format("%H:%M"))
I think you may be conflating labels=date_format("%H:%M"), which only controls how the axis labels are rendered, with how time.start/time.end should be created. Since the x-axis is (I'm inferring) a POSIXt object, the time.* objects must be POSIXt objects as well. You don't need to make them "look" (i.e., "%H:%M") the way you want the axis labels rendered; ggplot handles the label formatting.
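To illustrate the point, here is a small sketch of how the two ways of building the limits behave. The tz = "UTC" argument is an assumption based on the tzone shown in the posted data; adjust it if your data uses a different time zone.

```r
# Parsing the original string with only "%H:%M" fails, so the limit becomes NA
bad <- as.POSIXct("2023-06-03, 16:00", format = "%H:%M")
is.na(bad)

# A full date-time string gives a proper POSIXct value
# (tz = "UTC" is an assumption taken from the posted dput)
time.start <- as.POSIXct("2023-06-03 16:00", tz = "UTC")
inherits(time.start, "POSIXct")

# The "%H:%M" look is only applied when formatting, e.g. for axis labels
format(time.start, "%H:%M")
```

So the limits carry the full date and time, and the "%H:%M" format only matters at the point where labels are drawn.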
I've been making these awful graphs in R with the very basic code below
mydata %>%
mutate(week = week(date)) %>%
ggplot(aes(x = week))+
geom_freqpoly()
The data contains recorded events, in the standard date format, in all four weeks of a month. But as you can see in the picture, the graph dives to the bottom between the weeks, which makes it look awful. How can I make the graph go from one point to the next without this dive?
To reconstruct the data frame:
structure(list(ID = c(82, 23, 81, 76, 56, 17, 11, 50, 69, 84),
pvm = structure(c(1295395200, 1295222400, 1295395200, 1295654400,
1294272000, 1294272000, 1293926400, 1294185600, 1294012800,
1295222400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Not sure if this is something you're looking for but you can use geom_point and geom_line to produce a 'better graph'. I'm not sure what the data is meant to show and why you're using geom_freqpoly
Data <- structure(list(ID = c(82, 23, 81, 76, 56, 17, 11, 50, 69, 84),
pvm = structure(c(1295395200, 1295222400, 1295395200, 1295654400,
1294272000, 1294272000, 1293926400, 1294185600, 1294012800,
1295222400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))
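library(ggplot2)
# group = 1 belongs inside aes(); passed to ggplot() outside aes() it has no effect
ggplot(Data, aes(x = pvm, y = ID, group = 1)) +
geom_point() +
geom_line()
geom_point and geom_line graph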
New to this answering questions game but let me know if this isn't what you're looking for.
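If the goal is the number of events per week, another option is to count the events per week first and draw a line through the counts. geom_freqpoly() splits the x-axis into many narrow bins by default, and the empty bins between whole week numbers are what pull the line down to zero. A minimal sketch, assuming the pvm column from the posted data and lubridate's week(); weekly_counts is just an illustrative name:

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Event dates from the posted data (pvm), rebuilt so the example is self-contained
Data <- data.frame(
  ID = c(82, 23, 81, 76, 56, 17, 11, 50, 69, 84),
  pvm = as.POSIXct(
    c(1295395200, 1295222400, 1295395200, 1295654400, 1294272000,
      1294272000, 1293926400, 1294185600, 1294012800, 1295222400),
    origin = "1970-01-01", tz = "UTC"
  )
)

# Count events per week instead of letting the plot bin them
weekly_counts <- Data %>%
  mutate(week = week(pvm)) %>%
  count(week)

# One point per week that has events, joined directly by a line
p <- ggplot(weekly_counts, aes(x = week, y = n)) +
  geom_line() +
  geom_point()
print(p)
```

Weeks with no events do not appear in weekly_counts, so the line connects the neighbouring weeks directly rather than dropping to zero.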
I'm having trouble getting my plot to display dates (e.g. 23/01) instead of weekday names (e.g. Thu). My dataset consists of dates and measurements of bat activity. I've set the 'Date' column of my data with as.Date using the format "%d.%m.%y", and whenever I plot my graph I get weekday names instead of dates.
My code looks like this:
rdate<-as.Date(df,"%d.%m.%Y")
plot(df$Afromontane)
My plot ends up looking like this (below). It's all fine except I'd like the weekday names to be dates in the format (d/m).
df looks like this:
structure(list(Date = c("23.01.20", "24.01.20", "25.01.20", "26.01.20",
"27.01.20", "28.01.20", "29.01.20"), Afromontane = c(13.67, 0,
0, 1.67, 3.67, 22, 3.33), Milkwood = c(8.33, 3.67, 8, 8.33, 4.33,
6.33, 1)), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"))
A minimal example using ggplot2:
library(ggplot2)
df = data.frame(date = sample(seq(as.Date('2001/01/01'), as.Date('2003/01/01'), by="day"), 10), x = runif(10, 1, 10))
df$shortdate <- format(df$date, format="%m-%d")
ggplot(df, aes(x = shortdate, y = x)) +
geom_point()
Alternatively, using base R:
df = data.frame(date = sample(seq(as.Date('2001/01/01'), as.Date('2003/01/01'), by="day"), 10), x = runif(10, 1, 10))
plot(as.Date(df$date), df$x,xaxt = "n", type = "p")
axis(1, df$date, format(df$date, "%m-%d"))
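Applied to the data from the question, the base R approach might look like the sketch below. It assumes the Date strings use a two-digit year, so the format is "%d.%m.%y" (lowercase y), and it only plots the Afromontane column; the axis titles are illustrative.

```r
# Data from the question, rebuilt so the example is self-contained
df <- data.frame(
  Date = c("23.01.20", "24.01.20", "25.01.20", "26.01.20",
           "27.01.20", "28.01.20", "29.01.20"),
  Afromontane = c(13.67, 0, 0, 1.67, 3.67, 22, 3.33)
)

# Convert the Date column itself; "%y" matches the two-digit year
df$Date <- as.Date(df$Date, format = "%d.%m.%y")

# Suppress the default x-axis, then draw one labelled as day/month
plot(df$Date, df$Afromontane, xaxt = "n", type = "b",
     xlab = "Date", ylab = "Afromontane")
axis(1, at = df$Date, labels = format(df$Date, "%d/%m"))
```

Passing the dates as the x argument also matters here: plot(df$Afromontane) on its own plots the values against their index, not against the dates.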
I have a long-format dataset in which each row is a separate measurement (as indicated by my "timeline.compressed" variable, which has 8 possible values; see dput below).
However, I now want to check the descriptive statistics of some of my variables (i.e., x1-x3), but for each of the timepoints separately. I've tried using the if function, but that gives me a warning that the condition has length > 1.
Does anyone know what code I should use to get summary statistics for each of the timepoints separately?
dput for table with possible timeline values:
structure(c(7518L, 6178L, 6393L, 5886L, 6121L, 5977L, 7440L,
5886L), .Dim = 8L, .Dimnames = structure(list(c("5", "16", "28",
"40", "52", "64", "79", "95")), .Names = ""), class = "table")
dput for example dataset
structure(list(nomem_encr = c(800009L, 800009L, 800012L, 800015L,
800015L, 800015L), timeline.compressed = c(79, 95, 79, 28, 40,
52), sel = c(4.9, NA, NA, 6.9, 6.7, NA), close_num = c(1, 0.2,
1, 0.8, 1, 0.8), gener_sat = c(7, 7, 8, 7, 7, 5)), .Names = c("ID",
"timeline.compressed", "x1", "x2", "x3"), row.names = c(NA,
6L), class = "data.frame")
Using dplyr you can do the following, with timeline_values being your frequency table and df your data:
library(dplyr)
data.frame(timeline.compressed = as.numeric(names(timeline_values))) %>%
left_join(df, by = "timeline.compressed") %>%
group_by(timeline.compressed) %>%
summarize_all(mean, na.rm = TRUE)
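If only the timepoints that actually occur in the data are needed, a sketch without the join could group the data directly and summarise x1-x3 with across(). This assumes a dplyr version that provides across() (1.0.0 or later); stats_by_time and the choice of mean and sd are illustrative.

```r
library(dplyr)

# Example data from the question, rebuilt so the example is self-contained
df <- data.frame(
  ID = c(800009L, 800009L, 800012L, 800015L, 800015L, 800015L),
  timeline.compressed = c(79, 95, 79, 28, 40, 52),
  x1 = c(4.9, NA, NA, 6.9, 6.7, NA),
  x2 = c(1, 0.2, 1, 0.8, 1, 0.8),
  x3 = c(7, 7, 8, 7, 7, 5)
)

# One row per timepoint, with the count, mean and sd of x1-x3
stats_by_time <- df %>%
  group_by(timeline.compressed) %>%
  summarise(
    n = n(),
    across(x1:x3, list(mean = ~ mean(.x, na.rm = TRUE),
                       sd = ~ sd(.x, na.rm = TRUE)))
  )
print(stats_by_time)
```

The result has columns named like x1_mean and x1_sd. Timepoints with a single non-missing value get NA for the sd, and timepoints where every value is missing get NaN for the mean.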
I have a data frame like this:
dput(head(y,20))
structure(list(DATETIME = structure(c(1369540800, 1369541700,
1369542600, 1369543500, 1369544400, 1369545300, 1369546200, 1369547100,
1369548000, 1369548900, 1369549800, 1369550700, 1369551600, 1369552500,
1369553400, 1369554300, 1369555200, 1369556100, 1369557000, 1369557900
), class = c("POSIXct", "POSIXt"), tzone = ""), CPU = c(14.84,
13.6333333333333, 14.7666666666667, 13.5333333333333, 17.8666666666667,
15.9333333333333, 14.2333333333333, 13.3, 10.8333333333333, 9.76666666666667,
8.93333333333333, 9.43333333333333, 10.2, 6.63333333333333, 13,
14.3, 15.3666666666667, 16.6666666666667, 17.8666666666667, 14.7
)), .Names = c("DATETIME", "CPU"), row.names = c(NA, 20L), class = "data.frame")
I would like to convert this data frame to json format as below:
library(RJSONIO)
data<-toJSON(y)
cat(data, file="data.json")
When I look at the data.json file, I only see the DATETIME header, not CPU. What am I doing wrong here?
[{"DATETIME":[1369540800,1369541700,1369542600,1369543500,1369544400