How to transform monthly average hourly data into a time series in R?

I have a data set with an unusual granularity: each month has 24 average hourly values, and the time span is January 2012 to December 2019 (see image). I am interested in the data in column "Ws".
I tried, unsuccessfully, to transform it into a time series with this R code:
{
suppressPackageStartupMessages(library(forecast))
suppressPackageStartupMessages(library(dlm))
suppressPackageStartupMessages(library(fpp2))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(zoo))
suppressPackageStartupMessages(library(gridExtra))
}
data <- read.csv("c:/bib/test-1.csv", dec = ".", header = TRUE)
inds <- seq(as.Date("2012-01-01"), as.Date("2019-12-31"), by = "hour")
## Create a time series object
myData <- ts(data,
start = c(2012-1, as.numeric(format(inds[1], "%j"))),
frequency = 2304)
Output of 'dput(head(data, 20))' code:
structure(list(V1 = c(8.4, 8.2, 8.2, 8, 7.8, 7.5, 7.3, 7.2, 8.2,
8.8, 9.2, 9.5, 9.7, 9.9, 10, 10.1, 9.9, 9.5, 8.9, 8.6)), row.names = c(NA, 20L), class = "data.frame")
Could someone help me with this?
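A minimal sketch of one way to build the series, assuming the rows are already ordered by month and then by hour (0 to 23), and that the column is called Ws (the dput() above shows it as V1, so rename or adjust). ts() expects `frequency` to be the number of observations per seasonal cycle; with one cycle per year that is 12 × 24 = 288, whereas 2304 is the total number of rows over the eight years. Note also that `c(2012-1, ...)` evaluates to `c(2011, ...)`. The values below are random placeholders, not the question's data, and the names `month_idx`, `hour_idx` and `slot_means` are illustrative.

```r
# Placeholder data standing in for the question's "Ws" column (not real values):
# 8 years x 12 months x 24 hourly averages = 2304 rows
set.seed(1)
data <- data.frame(Ws = round(runif(8 * 12 * 24, min = 5, max = 12), 1))

# One seasonal cycle per year: 12 months x 24 hourly averages = 288 values.
# Pass the year on its own: c(2012-1, ...) would start the series in 2011.
ws_ts <- ts(data$Ws, start = c(2012, 1), frequency = 12 * 24)

# Month (1-12) and hour (0-23) of each observation, useful for grouping
month_idx <- rep(1:12, each = 24, times = 8)
hour_idx  <- rep(0:23, times = 12 * 8)

# Example: mean of each month-hour slot across the eight years (12 x 24 matrix)
slot_means <- tapply(data$Ws, list(month_idx, hour_idx), mean)
```

This time axis treats the 288 month-hour slots as equally spaced within each year. That is a convenience for ts(), not real clock time, so tools that assume regular hourly sampling would need a different setup.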

Related

Difficulty splitting my data set by a desired time frame

I want to split my data into 6 data frames for time series analysis. Example:
Time period 1: 23/3/2015 to 23/4/2015
Time period 2: 23/3/2016 to 23/4/2016
I have been stuck at this stage for a while.
I tried using the splitByDate() function from the openair package like this:
n_data <- splitByDate(n_data, dates = "23/3/2015", "23/4/2015",
labels = c("March 2015", "April 2015"))
Error message:
Error in cut.default(as.numeric(mydata$date), breaks = c(0, as.numeric(dates), :
lengths of 'breaks' and 'labels' differ
In addition: Warning messages:
1: In cut(as.numeric(mydata$date), breaks = c(0, as.numeric(dates), :
NAs introduced by coercion
2: In sort.int(as.double(breaks)) : NAs introduced by coercion
Then I converted the date column and used the selectByDate() function:
data_1$date <- as.Date(data_1$date, format = "%d/%m/%y")
H_15 <- selectByDate(data_1, start = "23/3/2015", end = "24/4/2015")
The resulting data frame is empty.
dput() output:
structure(list(NO2 = c(10.04, 12.74, 16.95, 13.96, 12.68, 9.91,
8.48, 7.46, 7.24, 7.35), PM10 = c(28.1, 22.7, 22.3, 25.5, 21.8,
20, 15.2, 12.1, 14.2, 16.7), PM2.5 = c(24.4, 14.7, 16, 15.5,
13.4, 11.8, 7.5, 7.4, 8.3, 10.1), O3 = c(53.15, 50.24, 46.95,
51.49, 53.98, 57.08, 58.97, 61.22, 59.12, 57.78), date = c("01/01/2015",
"01/01/2015", "01/01/2015", "01/01/2015", "01/01/2015", "01/01/2015",
"01/01/2015", "01/01/2015", "01/01/2015", "01/01/2015"), time = c("00:00:00",
"01:00:00", "02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00",
"07:00:00", "08:00:00", "09:00:00")), row.names = c(NA, 10L), class = "data.frame")
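A base-R sketch of both steps, using made-up hourly data laid out like the dput() above (date as dd/mm/yyyy text, time as text). One likely reason the selection came back empty: the dates have four-digit years, but the format string uses %y (two-digit year). as.Date() then reads "20" as the year 2020 and ignores the trailing "15", so no row falls in 2015. The subsetting below uses plain comparisons instead of openair's splitByDate()/selectByDate(), and the `periods` table is an assumed way of listing the six date ranges.

```r
# Made-up hourly data in the question's layout (placeholder values only)
stamps <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "UTC"),
              as.POSIXct("2016-12-31 23:00:00", tz = "UTC"), by = "hour")
data_1 <- data.frame(
  NO2  = round(runif(length(stamps), 5, 20), 2),
  date = format(stamps, "%d/%m/%Y"),
  time = format(stamps, "%H:%M:%S")
)

# %y reads only two digits: "01/01/2015" becomes 2020-01-01
as.Date("01/01/2015", format = "%d/%m/%y")
# %Y reads the four-digit year correctly
data_1$date <- as.Date(data_1$date, format = "%d/%m/%Y")

# One row per desired period (extend to all six); bounds are inclusive
periods <- data.frame(
  label = c("2015", "2016"),
  start = as.Date(c("2015-03-23", "2016-03-23")),
  end   = as.Date(c("2015-04-23", "2016-04-23"))
)

# Named list with one data frame per period
split_data <- lapply(seq_len(nrow(periods)), function(i) {
  data_1[data_1$date >= periods$start[i] & data_1$date <= periods$end[i], ]
})
names(split_data) <- periods$label
```

To filter on the hour as well, the two columns could first be combined, for example with as.POSIXct(paste(date, time), format = "%d/%m/%Y %H:%M:%S", tz = "UTC") (the time zone here is an assumption), and the comparisons done on that.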

Retain rows in a dataframe based on 2 columns satisfying different criteria ranges; there are 27 rows of ranges in the dataframe

Edited for better clarity:
I have a data frame, dat, from which I need to extract entire rows based on two columns, ut and ctz, which must simultaneously satisfy their respective ranges in the rows of the data frame range_criteria. The ranges differ for ut and ctz, and each column must fall within its own range. If either ut or ctz is out of range, the entire row is discarded.
In other words, when checking against each row of the criteria, dat$ut must be between range_criteria$ut_min and range_criteria$ut_max (inclusive) AND dat$ctz must be between range_criteria$ctz_min and range_criteria$ctz_max (inclusive).
I have been racking my brain over this for 12 hours. I must make sure each row of dat is checked against every row of range_criteria. I know I have to loop, but I am not sure how... please help!
dat <- data.frame(name = c("Asics", 'Tom', "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely", "May", "Kelly"),
ut = c(33, 2.4, 3.2, 3.5,9.5,5.2,6.0,45, 46, 51),
ctz = c(7.3, 1, 6.0, 3.5, 5.1, 51.5, 6.6, 7, 9.1, 10.1))
range_criteria <- data.frame(ut_min = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
ut_max = c(5, 10, 15, 25, 30, 35, 50),
ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
ctz_max = c(5, 5.5, 6.1, 6.2, 6.4 ,6.5, 7.8))
The expected outcome should be:
interest <- data.frame(name = c('Asics', 'Tom', "David", "Daniely" , "May"),
ut = c(33, 2.4, 3.5,45, 46),
ctz = c(7.3, 1, 3.5, 7, 9.1))
Thank you so much!
Based on your description, it sounds like you want the ith row of dat to satisfy both ranges specified in the ith row of range_criteria. Is that correct?
If so, there's no need for an explicit loop. R's vectorized comparisons make this easy:
dat <- data.frame(name = c('Tom', "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
ut = c(2.4, 3.2, 3.5,9.5,5.2,6.0,45),
ctz = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7))
rc <- data.frame(ut_min = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
ut_max = c(5, 10, 15, 25, 30, 35, 50),
ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
ctz_max = c(5, 5.5, 6.1, 6.2, 6.4 ,6.5, 7.8))
dat[dat$ut >= rc$ut_min & dat$ut <= rc$ut_max & dat$ctz >= rc$ctz_min & dat$ctz <= rc$ctz_max,]
This also returns "Daniel" in addition to the other three names you mentioned, but looking at the data I think that's correct.
Alternatively, you could use a package designed for data manipulation, such as dplyr or data.table, to do the same thing a bit more smoothly.
library(data.table)
both <- cbind(dat, rc)
setDT(both)
interest <- both[between(ut, ut_min, ut_max) & between(ctz, ctz_min, ctz_max)]
or
library(dplyr)
both <- bind_cols(dat, rc)
interest <- both %>%
filter(ut >= ut_min & ut <= ut_max & ctz >= ctz_min & ctz <= ctz_max)
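The answers above pair row i of dat with row i of range_criteria. If instead, as the question's wording suggests, a row of dat should be kept when it fits any row of range_criteria, a minimal base-R sketch is to test each row against all criteria rows with any(). The helper name in_any_range is hypothetical. Under this reading the question's data keeps Asics, Tom, Harry, David, Daniel and Daniely, which differs from the expected outcome in the question: Harry (ut 3.2, ctz 6.0) fits the third criteria row, and May's ctz of 9.1 is above every ctz_max, so it is worth double-checking which rule is intended.

```r
dat <- data.frame(name = c("Asics", "Tom", "Harry", "David", "Daniel", "Harri",
                           "Davidi", "Daniely", "May", "Kelly"),
                  ut  = c(33, 2.4, 3.2, 3.5, 9.5, 5.2, 6.0, 45, 46, 51),
                  ctz = c(7.3, 1, 6.0, 3.5, 5.1, 51.5, 6.6, 7, 9.1, 10.1),
                  stringsAsFactors = FALSE)
range_criteria <- data.frame(ut_min  = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
                             ut_max  = c(5, 10, 15, 25, 30, 35, 50),
                             ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
                             ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8))

# TRUE if at least one row of rc contains both ut and ctz (inclusive bounds)
in_any_range <- function(ut, ctz, rc) {
  any(ut >= rc$ut_min & ut <= rc$ut_max &
      ctz >= rc$ctz_min & ctz <= rc$ctz_max)
}

# Check every row of dat against every row of range_criteria
keep <- mapply(in_any_range, dat$ut, dat$ctz,
               MoreArgs = list(rc = range_criteria))
interest <- dat[keep, ]
```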

Is there a way I can move my first column in this excel dataset to be the column that specifies the numbers 1 to 8 [duplicate]

This question already has answers here:
Convert the values in a column into row names in an existing data frame
I'd like to change the first data column, named "Especies", and the species names in it (i.e. "Strix_varia", "Strix_rufipes", ...) so that it becomes the numbers 1 to 8 enclosed in red in the linked image.
I'm working with Moran's I, and having the "Especies" column as data gives me incorrect results.
Any help will be great!
Thanks!
Here's my dput() output:
structure(list(Especies = c("Strix_varia", "Strix_rufipes", "Strix_occidentalis",
"Strix_aluco", "Strix_uralensis", "Strix_woodfordii", "Strix_leptogrammica",
"Strix_nebulosa"), Notas.segundo = c(2.9, 4.3, 2.9, 1.3, 1, 3,
3.1, 1.1), Notas.llamado = c(6.3, 13.5, 12.2, 5, 3, 6, 4, 9.3
), Duracion.llamado = c(2.9, 2.9, 5.3, 4, 4.5, 1.6, 1.5, 7.3),
Frecuencia.minima = c(149.4, 157.4, 167, 314.7, 75.3, 149.3,
212.2, 147.5), Frecuencia.maxima = c(518.6, 564.8, 594.3,
846.2, 394.9, 438.4, 396.8, 263.8), Ancho.banda = c(369.1,
407.3, 427.2, 531.5, 319.6, 289, 184.6, 116.3), Frecuencia.central = c(522.1,
551.8, 589.9, 844, 385.9, 429, 374.9, 255.2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -8L))
Assuming your table has one row per species and no species repeats, a simple data$Especies = seq_along(data$Especies) will do the job. I would suggest keeping the original column so you remember which code belongs to which species, for example with data$id = seq_along(data$Especies).
data = structure(list(Especies = c("Strix_varia", "Strix_rufipes", "Strix_occidentalis",
"Strix_aluco", "Strix_uralensis", "Strix_woodfordii", "Strix_leptogrammica",
"Strix_nebulosa"), Notas.segundo = c(2.9, 4.3, 2.9, 1.3, 1, 3,
3.1, 1.1), Notas.llamado = c(6.3, 13.5, 12.2, 5, 3, 6, 4, 9.3
), Duracion.llamado = c(2.9, 2.9, 5.3, 4, 4.5, 1.6, 1.5, 7.3),
Frecuencia.minima = c(149.4, 157.4, 167, 314.7, 75.3, 149.3,
212.2, 147.5), Frecuencia.maxima = c(518.6, 564.8, 594.3,
846.2, 394.9, 438.4, 396.8, 263.8), Ancho.banda = c(369.1,
407.3, 427.2, 531.5, 319.6, 289, 184.6, 116.3), Frecuencia.central = c(522.1,
551.8, 589.9, 844, 385.9, 429, 374.9, 255.2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -8L))
# If you don't want to overwrite your data:
data$id = seq_along(data$Especies)
# If you are OK with overwriting your data
data$Especies = seq_along(data$Especies)
If species names are not unique, seq_along() gives two rows of the same species different ids. If that is not what you want, you can use factor():
data$id = as.numeric(factor(data$Especies))
Alternatively, you can create your own encoding by building a named vector and using it to translate species names to ids:
species = unique(data$Especies)
coding = seq_along(species)
names(coding) = species
data$id = coding[data$Especies]
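The duplicate linked above takes a different route: moving the names into row names, so the data itself is purely numeric while each row still records which species it is. A minimal base-R sketch, using only the first two measurement columns to keep it short. It assumes a plain data.frame; tibbles discourage row names, so convert with as.data.frame() first. Whether your Moran's I function uses row names is something to check in its documentation.

```r
# First three columns of the question's data, as a plain data.frame
data <- data.frame(
  Especies = c("Strix_varia", "Strix_rufipes", "Strix_occidentalis",
               "Strix_aluco", "Strix_uralensis", "Strix_woodfordii",
               "Strix_leptogrammica", "Strix_nebulosa"),
  Notas.segundo = c(2.9, 4.3, 2.9, 1.3, 1, 3, 3.1, 1.1),
  Notas.llamado = c(6.3, 13.5, 12.2, 5, 3, 6, 4, 9.3),
  stringsAsFactors = FALSE
)

# Keep the species as row labels instead of as a data column
rownames(data) <- data$Especies
numeric_data <- data[, -1]   # only the numeric columns remain
```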

Vertical gradient color with geom_area [duplicate]

This question already has answers here:
How to make gradient color filled timeseries plot in R
I'm having a hard time finding a way to create a gradient fill.
This is how it should look (don't mind the blue bars):
Something similar to "How to make gradient color filled timeseries plot in R", but that example is a bit too advanced for me to reuse. I don't have any negative values, and the maximum is 80. I tried the answer offered by nograpes; my PC froze for some 6-7 minutes and then I got this message:
Error in rowSums(na) :
'Calloc' could not allocate memory (172440001 of 16 bytes)
And that was only a subset of my data, with 841 rows (some containing NAs), so the solution in that answer is hardly workable for me.
df <- structure(list(date = structure(c(1497178800, 1497182400, 1497186000,
1497189600, 1497193200, 1497196800, 1497200400, 1497204000, 1497207600,
1497211200, 1497214800, 1497218400, 1497222000, 1497225600, 1497229200,
1497232800, 1497236400, 1497240000, 1497243600, 1497247200, 1497250800,
1497254400, 1497258000, 1497261600, 1497265200, 1497268800, 1497272400,
1497276000, 1497279600, 1497283200, 1497286800, 1497290400, 1497294000,
1497297600, 1497301200, 1497304800, 1497308400, 1497312000, 1497315600,
1497319200, 1497322800, 1497326400, 1497330000, 1497333600, 1497337200,
1497340800, 1497344400, 1497348000, 1497351600, 1497355200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), dk_infpressure = c(22, 21.6, 21.2,
20.9, 20.5, 20.1, 19.8, 19.4, 19, 18.6, 18.2, 17.9, 17.5, 17.1,
16.8, 16.4, 16, 15.6, 15.2, 14.9, 14.5, 14.1, 13.8, 13.4, 13,
12.5, 11.9, 11.4, 10.8, 10.3, 9.8, 9.2, 8.7, 8.1, 7.6, 7, 6.5,
6, 5.4, 4.9, 4.3, 3.8, 3.2, 2.7, 2.2, 1.6, 1.1, 0.5, 0, 0)), .Names = c("date",
"dk_infpressure"), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
Code for the basic plot:
ggplot()+
geom_area(data=df, aes(x = date, y= dk_infpressure ) )+
scale_y_continuous(limits = c(0, 80))
Because geom_area can't take a gradient fill, it's a somewhat hard problem.
Here's a decidedly hacky but possibly sufficient option: it builds a raster (using geom_tile, since the x and y sizes differ) and covers the ragged edges with cropping and ggforce::geom_link (a version of geom_segment that can plot a gradient):
library(tidyverse)
df %>%
mutate(dk_infpressure = map(dk_infpressure, ~seq(0, .x, .05))) %>% # make grid of points
unnest(cols = dk_infpressure) %>% # one row per grid point
ggplot(aes(date, dk_infpressure, fill = dk_infpressure)) +
geom_tile(width = 3600, height = 0.05) +
# hide square tops
ggforce::geom_link(aes(color = dk_infpressure, xend = lag(date), yend = lag(dk_infpressure)),
data = df, size = 2.5, show.legend = FALSE) +
scale_x_datetime(expand = c(0, 0)) + # hide overplotting of line
scale_y_continuous(expand = c(0, 0))
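The tile approach expands every reading into a column of points, so with the asker's 841 rows and a maximum of 80, a step of 0.05 gives up to 80 / 0.05 = 1600 tiles per reading. Below is a hedged base-R sketch of just that expansion step with two adjustments: a configurable, coarser step to cut the number of tiles, and NA readings skipped, since seq(0, NA, ...) stops with an error. The function name expand_to_grid and the toy data are hypothetical; the plotting call is left as a comment so the block runs without ggplot2.

```r
# Expand each reading into a column of points from 0 up to its value.
# step controls the vertical resolution: a larger step means fewer tiles.
expand_to_grid <- function(df, step = 0.5) {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    y <- df$dk_infpressure[i]
    if (is.na(y)) return(NULL)  # seq() stops on NA, so skip missing readings
    data.frame(date = df$date[i], dk_infpressure = seq(0, y, by = step))
  })
  do.call(rbind, Filter(Negate(is.null), pieces))
}

# Small made-up example: three hourly readings, one of them missing
toy <- data.frame(
  date = as.POSIXct("2017-06-11 11:00:00", tz = "UTC") + 3600 * 0:2,
  dk_infpressure = c(1, NA, 0)
)
grid_pts <- expand_to_grid(toy, step = 0.5)

# Plotting, as in the answer above (requires ggplot2); height should match step:
# ggplot(expand_to_grid(df, step = 0.5),
#        aes(date, dk_infpressure, fill = dk_infpressure)) +
#   geom_tile(width = 3600, height = 0.5)
```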

Wrap Axis Labels in Correlation Matrix

I'm attempting to use the ggcorr() function from the GGally package to create a correlation matrix. The package works as intended, but I would like to change how the axis labels appear on the plot.
Currently, names containing spaces or other characters are automatically joined with _ or . instead. Ideally, I would like a line break (\n) at each space so that both long and short names are easy to read and don't extend far beyond their column and row.
I have found solutions others have used on SO, including str_wrap(), but they were inside a ggplot() call, not this specific package. I have inspected the package's R code but couldn't find where these labels are set. Whenever I try to edit the x- or y-axis text, it adds an entirely new axis and set of labels.
I currently create the data frame with dcast(), and even when I gsub() "\n" into the player-names column, the line breaks get lost in the dcast() step.
Here is an example of what I am working with. I would like line breaks to be created automatically between the first and last names in the labels.
library(GGally)
library(ggplot2)
test <- structure(list(Date = structure(c(17100, 17102, 17103, 17106,
17107), class = "Date"), `Alexis Ajinca` = c(1.2, NA, 9.2, 6.4,
NA), `Anthony Davis` = c(95.7, 76.9, 29, 67, 24.9), `Buddy Hield` = c(9.7,
4.7, 17, 8, 28.3), `Cheick Diallo` = c(NA, NA, 3.2, NA, NA),
`Dante Cunningham` = c(0.5, 27.6, 14, 13.5, -1), `E'Twaun Moore` = c(19.2,
16.1, 22, 20.5, 10.1), `Lance Stephenson` = c(16.1, 31.6,
8, 8.1, 34.8), `Langston Galloway` = c(10.9, 2, 13.8, 2.2,
29.4), `Omer Asik` = c(4.7, 6.6, 9.9, 15.9, 14.2), `Solomon Hill` = c(4.7,
13.2, 12.8, 35.2, 4.4), `Terrence Jones` = c(17.1, 12.4,
9.8, NA, 20.8), `Tim Frazier` = c(40.5, 40.2, 18.3, 44.1,
7.2)), .Names = c("Date", "Alexis Ajinca", "Anthony Davis",
"Buddy Hield", "Cheick Diallo", "Dante Cunningham", "E'Twaun Moore",
"Lance Stephenson", "Langston Galloway", "Omer Asik", "Solomon Hill",
"Terrence Jones", "Tim Frazier"), row.names = c(NA, -5L), class = "data.frame")
ggc <- ggcorr(test[,-1], method = c("pairwise","pearson"),
hjust = .85, size = 3,
layout.exp=2)
ggc
Thank you for any and all help and please, let me know if you have any questions or need any clarification!
A couple of approaches:
You can build the plot returned by ggcorr() and edit its label layer:
g = ggplot_build(ggc)
g$data[[2]]$label = gsub("_", "\n", g$data[[2]]$label )
grid::grid.draw(ggplot_gtable(g))
Or you can create a new data frame and add the labels manually using geom_text. This probably gives a bit more control over the text justification and placement.
# I don't see how to suppress the labels, so just set their size to zero
ggc <- ggcorr(test[,-1], method = c("pairwise","pearson"),
hjust = .85,
size = 0, # set this to zero
layout.exp=2)
# Create labels and plot
dat <- data.frame(x = seq_along(test[-1]), y = seq_along(test[-1]),
lbs = gsub(" ", "\n", names(test[-1]) ))
ggc + geom_text(data=dat, aes(x, y, label=lbs), nudge_x = 2, hjust=1)
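If some names have more than two words, wrapping by width rather than replacing every space may read better. A small sketch using base R's strwrap(), a dependency-free alternative to the stringr::str_wrap() mentioned in the question; the helper name wrap_names and the default width of 10 are assumptions to tune.

```r
# Wrap each name at word boundaries, keeping lines shorter than `width` characters
wrap_names <- function(x, width = 10) {
  vapply(strwrap(x, width = width, simplify = FALSE),
         paste, character(1), collapse = "\n")
}

wrapped <- wrap_names(c("Langston Galloway", "Anthony Davis"))
```

In the second approach above, this could replace the gsub() call, for example lbs = wrap_names(names(test[-1])).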
