I am trying to create a clustered bar chart and am using the gather function to group the correct bars together. When I do this, I get an error saying that the gather function could not be found. I have dplyr and magrittr installed. Any thoughts on how to make this work, or is there a better way to create the bar chart grouped by test and date?
Plaster <- Plaster_2019_Data %>%
gather("pH", "Temperature", "Surface", -Date)
Data:
Surface pH Temperature Date
12.08 8.56 11.16 5/13/2019
11.68 8.90 8.76 5/29/2019
8.69 9.07 14.65 6/10/2019
2.26 7.49 17.51 6/24/2019
4.54 7.77 23.82 7/8/2019
2.13 8.17 25.29 8/5/2019
6.34 8.62 26.50 8/19/2019
9.33 9.03 24.31 9/4/2019
10.98 8.58 21.02 9/16/2019
9.59 8.61 17.33 9/30/2019
16.07 8.70 10.39 10/14/2019
9.12 8.07 6.38 11/14/2019
gather is from tidyr, not dplyr. Install the package with install.packages("tidyr") if needed, then load it:
require('tidyr')
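For reference, gather()'s first two arguments are the names of the new key and value columns, and the remaining arguments select which columns to stack. A minimal sketch using the first two rows of the posted data (column names taken from the question):

```r
library(tidyr)

# First two rows of the posted data, for illustration only
Plaster_2019_Data <- data.frame(
  Surface = c(12.08, 11.68),
  pH = c(8.56, 8.90),
  Temperature = c(11.16, 8.76),
  Date = c("5/13/2019", "5/29/2019")
)

# Stack every column except Date into key/value pairs
Plaster <- gather(Plaster_2019_Data, key = "Test", value = "value", -Date)
```

This yields one row per (Date, Test) combination, which is the long shape the plotting code below expects.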
As mentioned by @akrun, you need tidyr. Note that the function pivot_longer is intended to replace gather in the near future (https://tidyr.tidyverse.org/reference/gather.html).
Also, since the ranges of values differ considerably between your tests, I would suggest using facet_wrap to make a nicer plot.
Altogether, you can write something like this:
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")

library(tidyr)
library(ggplot2)
library(dplyr)

df %>%
  pivot_longer(-Date, names_to = "Test", values_to = "value") %>%
  ggplot(aes(x = Date, y = value, fill = Test)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  facet_wrap(. ~ Test, scales = "free") +
  scale_x_date(date_labels = "%b %d", date_breaks = "2 weeks") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Data
structure(list(Surface = c(12.08, 11.68, 8.69, 2.26, 4.54, 2.13,
6.34, 9.33, 10.98, 9.59, 16.07, 9.12), pH = c(8.56, 8.9, 9.07,
7.49, 7.77, 8.17, 8.62, 9.03, 8.58, 8.61, 8.7, 8.07), Temperature = c(11.16,
8.76, 14.65, 17.51, 23.82, 25.29, 26.5, 24.31, 21.02, 17.33,
10.39, 6.38), Date = structure(c(18029, 18045, 18057, 18071,
18085, 18113, 18127, 18143, 18155, 18169, 18183, 18214), class = "Date")), row.names = c(NA,
-12L), class = "data.frame")
I have a problem I was hoping someone could help me with. I have a data set showing the prices of multiple goods (one per column) on a daily basis over several years. I ran the regression below on part of my data frame and then created predicted values for the rest of the time period. The predicted values are the log changes in the price of pet. For clarification, I already have all the actual prices of pet; I am just trying to predict them using this method.
lin <- lm(diff(log(pet)) ~ diff(log(bron)) + diff(log(yen)) + diff(yal) - 1,
          data = codData[1:634, ])
predictions <- predict(lin, newdata = codData[635:1025, ])
My problem now is getting from the predicted log changes back to actual prices of pet. Normally I would multiply the first predicted log change + 1 by the last known price of pet, which gives the first predicted price. I would then multiply the second predicted log change + 1 by that newly predicted price, and so on. I'm not sure how to do this in R, though. Does anyone have any ideas?
Thanks ahead of time!
Code to get sample data
codData <- structure(list(date = structure(c(1306800000, 1306886400, 1306972800,
1307059200, 1307318400, 1307404800, 1307491200, 1307577600, 1307664000,
1307923200, 1308009600, 1308096000, 1308182400, 1308268800, 1308528000,
1308614400, 1308700800, 1308787200, 1308873600, 1309132800, 1309219200,
1309305600, 1309392000, 1309478400, 1309824000, 1309910400, 1309996800,
1310083200, 1310342400, 1310428800, 1310515200, 1310601600, 1310688000,
1310947200, 1311033600, 1311120000, 1311206400, 1311292800, 1311552000,
1311638400, 1311724800, 1311811200, 1311897600, 1312156800, 1312243200,
1312329600, 1312416000, 1312502400, 1312761600, 1312848000, 1312934400,
1313020800, 1313107200, 1313366400, 1313452800, 1313539200, 1313625600,
1313712000, 1313971200, 1314057600, 1314144000, 1314230400, 1314316800,
1314576000, 1314662400, 1314748800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), yal = c(3.05, 2.96, 3.04, 2.99, 3.01, 3.01,
2.98, 3.01, 2.99, 3, 3.11, 2.98, 2.93, 2.94, 2.97, 2.99, 3.01,
2.93, 2.88, 2.95, 3.05, 3.14, 3.18, 3.22, 3.16, 3.12, 3.17, 3.03,
2.94, 2.92, 2.92, 2.98, 2.94, 2.94, 2.91, 2.96, 3.03, 2.99, 3.03,
2.99, 3.01, 2.98, 2.82, 2.77, 2.66, 2.64, 2.47, 2.58, 2.4, 2.2,
2.17, 2.34, 2.24, 2.29, 2.23, 2.17, 2.08, 2.07, 2.1, 2.15, 2.29,
2.23, 2.19, 2.28, 2.19, 2.23), sp = c(1345.2, 1314.55, 1312.9399,
1300.16, 1286.17, 1284.9399, 1279.5601, 1289, 1270.98, 1271.83,
1287.87, 1265.42, 1267.64, 1271.5, 1278.36, 1295.52, 1287.14,
1283.5, 1268.45, 1280.1, 1296.67, 1307.41, 1320.64, 1339.67,
1337.88, 1339.22, 1353.22, 1343.8, 1319.49, 1313.64, 1317.72,
1308.87, 1316.14, 1305.4399, 1326.73, 1325.84, 1343.8, 1345.02,
1337.4301, 1331.9399, 1304.89, 1300.67, 1292.28, 1286.9399, 1254.05,
1260.34, 1200.0699, 1199.38, 1119.46, 1172.53, 1120.76, 1172.64,
1178.8101, 1204.49, 1192.76, 1193.89, 1140.65, 1123.53, 1123.8199,
1162.35, 1177.6, 1159.27, 1176.8, 1210.08, 1212.92, 1218.89),
pet = c(102.63, 100.13, 100.54, 100.49, 98.85, 98.98, 100.93,
101.71, 99.02, 96.98, 99.17, 95.29, 94.96, 92.96, 93.25,
93.4, 94.59, 91.75, 91.25, 90.81, 92.89, 94.93, 94.92, 94.7,
96.8, 96.64, 98.49, 96.31, 95.05, 96.77, 97.89, 95.73, 97.32,
96, 97.7, 98.14, 99.25, 99.82, 99.13, 99.44, 97.31, 97.13,
95.92, 95.33, 93.25, 91.93, 86.44, 87.07, 80.74, 81.12, 81.55,
85.46, 85.25, 87.89, 86.93, 87.45, 81.58, 82.63, 84.12, 86.12,
85.17, 84.94, 85.42, 87.45, 88.76, 88.91), bron = c(419.25,
409.5, 409.7, 412.4, 412.25, 414.65, 411.25, 410.5, 404.45,
403.38, 415.85, 411.63, 412.3, 410.05, 407.7, 408.35, 405.85,
406.58, 408.45, 407.2, 409.85, 421.8, 426.45, 430.25, 432.95,
432.4, 442.15, 439.08, 434.5, 438.52, 438.52, 437.95, 440.73,
440.55, 446.45, 442.42, 437.92, 440.2, 440.33, 447.3, 443.15,
447.3, 448.3, 441, 438.3, 433.65, 421.4, 412.35, 393.05,
403.55, 389.5, 404.1, 399.5, 403.67, 399.25, 404, 394.13,
396.85, 393.98, 401.25, 401.27, 409.17, 409.8, 409.5, 414.7,
418.2), yen = c(929.87, 932.16, 927.79, 922.76, 925.77, 921.77,
925.73, 926.87, 934, 929.98, 928.28, 939.99, 939.99, 934.44,
934.93, 929.78, 932.43, 936.68, 940.12, 938.95, 935.56, 930.47,
927.23, 925.86, 929.43, 932.42, 930.49, 931.15, 939.64, 938.86,
929.71, 930.59, 929.31, 931.59, 929.23, 925.3, 919.2, 919.95,
918.83, 912.58, 917.17, 919.02, 915.52, 918.61, 920.61, 918.09,
932.46, 926.3, 931.17, 921.45, 931.42, 929.27, 929.41, 922.31,
923.17, 920.27, 926.05, 924.52, 926.53, 923.23, 926.24, 929.12,
923.74, 922.74, 924.79, 925.04)), row.names = c(NA, -66L), class = c("tbl_df",
"tbl", "data.frame"))
I recommend reading this post to get familiar with the fable package: https://www.mitchelloharawild.com/blog/fable/
library(tidyverse)
library(lubridate)
library(tsibble)
library(fable)
df_example <- codData %>%
  mutate(simple_date = as_date(date)) %>%
  select(-date) %>%
  as_tsibble(index = simple_date) %>%
  tsibble::fill_gaps() %>%
  tidyr::fill(yal:yen)

fit <- df_example %>%
  filter(simple_date < yearmonth("2011 08")) %>%
  model(linear_reg = TSLM(log(pet) ~ log(bron) + log(yen) + log(yal)))

forecasts_result <- fit %>% forecast(df_example)

forecasts_result %>%
  filter(simple_date >= yearmonth("2011 08")) %>%
  autoplot(df_example)

forecasts_result %>%
  accuracy(df_example)
#> # A tibble: 1 x 9
#> .model .type ME RMSE MAE MPE MAPE MASE ACF1
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 linear_reg Test -3.34 6.30 4.70 -3.97 5.34 NaN 0.947
Another option is a VAR model:

var_fit <- df_example %>%
  filter(simple_date < yearmonth("2011 08")) %>%
  model(VAR_MODEL = VAR(vars(yal, pet, bron, yen) ~ AR(7)))

forecast_result_var <- var_fit %>%
  forecast(h = 31)

forecast_result_var %>%
  autoplot(df_example)

forecast_result_var %>%
  accuracy(df_example)
#> # A tibble: 4 x 10
#> .model .response .type ME RMSE MAE MPE MAPE MASE ACF1
#> <chr> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 VAR_MODEL yal Test -0.448 0.477 0.450 -20.2 20.2 4.53 0.713
#> 2 VAR_MODEL pet Test -10.7 11.3 10.7 -12.6 12.6 4.33 0.639
#> 3 VAR_MODEL bron Test -32.0 34.7 32.0 -7.95 7.95 4.75 0.746
#> 4 VAR_MODEL yen Test 41.8 45.3 41.8 4.52 4.52 6.19 0.827
Created on 2020-01-05 by the reprex package (v0.3.0)
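Alternatively, the iterative update described in the question (multiply each predicted change + 1 by the previous price) can be vectorized directly with cumprod. A minimal base-R sketch with made-up numbers (the price and predicted changes below are illustrative, not from the posted data); the exact back-transform for log differences uses exp(cumsum(...)) instead:

```r
# Last observed price before the forecast window and some
# hypothetical predicted log changes (illustrative numbers only)
last_price <- 100
pred_changes <- c(0.01, -0.02, 0.005)

# The iterative "(1 + change) times previous price" rule, vectorized:
approx_prices <- last_price * cumprod(1 + pred_changes)

# Exact reconstruction for log differences: P_t = P_0 * exp(cumsum(pred))
exact_prices <- last_price * exp(cumsum(pred_changes))
```

The two series agree closely for small changes, since log(1 + x) is approximately x near zero.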
I am a newbie working with streamflow duration curves and the function fdc.
I am working with more than 300 series, and I am interested in saving the low-flow quantile threshold (lQ.thr) value that appears in the generated plot:
Here is the reproducible example:
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98, 4.21, 2.51,
         38.29, 8.57, 2.48, 3.78, 18.09, 15.16, 13.46, 8.69, 6.85, 11.97,
         12.10, 9.87, 21.89, 2.60, 2.40, 27.40, 4.94, 83.17, 12.10, 5.08,
         12.42, 6.19, 3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91,
         21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60, 12.73, 4.17,
         6.70, 16.45)
fdc(dat, plot = TRUE, lQ.thr = 0.8, ylab = 'Hm3',
    main = 'Upstream monthly duration curve', thr.shw = TRUE)
The fdc function returns a vector of probabilities, but I am not sure how to convert these probabilities back to the original units and select the 80th-percentile value expressed in Hm3, as I would do with pnorm, for example, when working with normal probabilities.
Thank you so much.
You can construct the FDC yourself:
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98, 4.21, 2.51,
         38.29, 8.57, 2.48, 3.78, 18.09, 15.16, 13.46, 8.69, 6.85, 11.97,
         12.10, 9.87, 21.89, 2.60, 2.40, 27.40, 4.94, 83.17, 12.10, 5.08,
         12.42, 6.19, 3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91,
         21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60, 12.73, 4.17,
         6.70, 16.45)
dat <- sort(dat, decreasing = TRUE)
df <- data.frame(x = 100 / length(dat) * 1:length(dat), y = dat)
plot(x = df$x, y = df$y, type = "l", log = "y")
So the sorted flow data is simply plotted against the percentage-exceedance scale. This scale is created by dividing 100% by the number of data points, which gives the increment for each point.
Therefore
quantile(dat, probs = c(0.2, 0.8), type = 1)
gives you your desired results.
Notice that fdc computes the quantiles differently. It seems they simply index into the decreasingly sorted series:
p <- c(0.8, 0.2)
dat[round(p * length(dat))]
> [1] 4.21 27.40
Here is a reproducible example of my data:
df_1 <- data.frame(
  Thriving = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53),
  Satisfaction = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72),
  Wellbeing = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74),
  id = 1:8
)
As you can see, it includes three variables of psychological measures and one identifier with an id for each respondent.
Now, my aim is to create a 2D grid that gives a nice overview of all the values for all respondents on each of the variables. The x-axis would show the id of each respondent and the y-axis all the variables, with the colour of each field depending on its value: 1 to 3 in red, 3 to 5 in yellow, and 5 to 7 in green. The style of the grid should be like this image.
All I have achieved so far is the following code, which gathers all the variables/items into one column so they can be portrayed together on the y-axis (the id is in its own column, as are the values):
df_1 %>%
  select("Thr" = Thriving, "Stf" = Satisfaction, "Wb" = Wellbeing, id = id) %>%
  na.omit() %>%
  gather(key = "variable", value = "value", -id)
I am looking for a solution that works without storing the data in a new frame.
Also, I am looking for a solution that would work even for 100 or more respondents and up to about 40 variables. It would not matter if each rectangle were very small; I just want a nice colour display that gives a quick sense of where an organisation may be scoring low or high, and how it is doing in general.
Thanks for reading, very grateful for any help!
There is probably a better graphics oriented approach, but you can do this with base plot and by treating your data as a raster:
library(raster)

df_1 <- cbind("Thriving" = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53),
              "Satisfaction" = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72),
              "Wellbeing" = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74),
              "id" = 1:8)

r <- raster(ncol = nrow(df_1), nrow = 3, xmn = 0, xmx = 8, ymn = 0, ymx = 3)
values(r) <- as.vector(as.matrix(df_1[, 1:3]))
plot(r, axes = FALSE, box = FALSE, asp = NA)
axis(1, at = seq(-0.5, 8.5, 1), 0:9)
axis(2, at = seq(-0.5, 3.5, 1), c("", colnames(df_1)), las = 1)
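If you'd rather avoid the raster dependency, base image() can draw the same grid with the exact colour bands from the question. A sketch using the posted data; the breaks assume all values lie between 1 and 7, as the question's colour scheme implies:

```r
# Posted data, rebuilt so the example is self-contained
df_1 <- data.frame(
  Thriving = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53),
  Satisfaction = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72),
  Wellbeing = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74),
  id = 1:8
)

# image() maps rows of z along x and columns along y,
# so ids run left to right and variables bottom to top
m <- as.matrix(df_1[, 1:3])
image(x = 1:nrow(m), y = 1:ncol(m), z = m,
      breaks = c(1, 3, 5, 7),              # 1-3 red, 3-5 yellow, 5-7 green
      col = c("red", "yellow", "green"),
      axes = FALSE, xlab = "id", ylab = "")
axis(1, at = 1:nrow(m), labels = df_1$id)
axis(2, at = 1:ncol(m), labels = colnames(m), las = 1)
```

Since breaks/col are vectorised, this scales unchanged to 100 respondents and 40 variables; each cell just shrinks.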