I have some hex colours:
hex_col <- hcl(h = c(0, 120, 240), c = 35, l = 85)
hex_col
#> [1] "#FFC5D0" "#BBDEB1" "#B8D8F8"
How can I find their HCL representation?
hcl_col <- cbind(h = c(0, 120, 240), c = 35, l = 85)
hcl_col
#> h c l
#> [1,] 0 35 85
#> [2,] 120 35 85
#> [3,] 240 35 85
decode_colour() from the farver package can be used to do this directly:
library(farver)
hex_col <- hcl(h = c(0, 120, 240), c = 35, l = 85)
decode_colour(hex_col, to = "hcl")
#> h c l
#> [1,] 359.3883 35.09095 84.86854
#> [2,] 120.2448 34.65261 84.98657
#> [3,] 239.3602 34.63943 85.03581
The result is not exact, presumably because each hex code rounds the colour to 8 bits per RGB channel, but it is close enough.
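If adding a package is not an option, the same conversion can be sketched with base R only. This assumes that hcl() uses polar CIE-LUV coordinates (as its help page states) and that convertColor() returns its Luv columns in the order L, u, v; treat it as an illustration rather than a drop-in replacement for farver.

```r
# Base R sketch: recover polar LUV (HCL) coordinates from hex colours.
hex_col <- hcl(h = c(0, 120, 240), c = 35, l = 85)

# col2rgb() gives a 3 x n matrix of 0-255 values; convertColor() expects
# one colour per row, scaled to [0, 1].
rgb01 <- t(col2rgb(hex_col)) / 255
luv <- convertColor(rgb01, from = "sRGB", to = "Luv")

# Columns are taken by position: 1 = L, 2 = u, 3 = v (assumed order).
hcl_back <- cbind(
  h = (atan2(luv[, 3], luv[, 2]) * 180 / pi) %% 360,  # hue angle in degrees
  c = sqrt(luv[, 2]^2 + luv[, 3]^2),                  # chroma
  l = luv[, 1]                                        # luminance
)
round(hcl_back, 1)
```

A hue near 0 may be reported as a value just below 360, as in the farver output above, so compare hues on the circle rather than by plain subtraction.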
Another question about lists. Say I have a data frame containing several list columns. Assume they're the results of an American election. They hold the vote shares for Democrats, Republicans, and Third Party candidates across three simulations for three states:
list1 <- list(c(40, 44, 52))
list2 <- list(c(22, 36, 18))
list3 <- list(c(45, 37, 42))
list4 <- list(c(60, 56, 48))
list5 <- list(c(34, 52, 26))
list6 <- list(c(55, 63, 58))
list7 <- list(c(0, 0, 0))
list8 <- list(c(44, 12, 56))
list9 <- list(c(0, 0, 0))
dat <- data.frame(State = c("Iowa", "Wisconsin", "Ohio"))
dat$DemocratVoteShare <- c(list1, list2, list3)
dat$RepublicanVoteShare <- c(list4, list5, list6)
dat$ThirdPartyVoteShare <- c(list7, list8, list9)
Note that Wisconsin is the only state with a Third Party candidate.
I'm trying to evaluate when a party had the maximum vote share in a state for a given simulation. The results would look like this:
dat$Winner <- c(list(c("Republican", "Republican", "Democrat")),
list(c("Third Party", "Republican", "Third Party")),
list(c("Republican", "Republican", "Republican")))
How can I achieve this using R, and ideally functionality from the tidyverse? Thanks in advance.
Here is a simple solution
library(dplyr)
dat %>%
  mutate(
    Winner = Map(function(x, y, z) {
      c("Democrat", "Republican", "Third Party")[max.col(cbind(x, y, z) == pmax(x, y, z))]
    }, DemocratVoteShare, RepublicanVoteShare, ThirdPartyVoteShare)
  )
Output
State DemocratVoteShare RepublicanVoteShare ThirdPartyVoteShare Winner
1 Iowa 40, 44, 52 60, 56, 48 0, 0, 0 Republican, Republican, Democrat
2 Wisconsin 22, 36, 18 34, 52, 26 44, 12, 56 Third Party, Republican, Third Party
3 Ohio 45, 37, 42 55, 63, 58 0, 0, 0 Republican, Republican, Republican
Update
As pointed out by @akrun, you can use max.col directly, and it allows you to select a ties.method (the default, "random", breaks ties at random).
dat %>%
mutate(
Winner = Map(function(x, y, z) {
c("Democrat", "Republican", "Third Party")[max.col(cbind(x, y ,z), "first")]
}, DemocratVoteShare, RepublicanVoteShare, ThirdPartyVoteShare)
)
With ties.method = "first", a tie goes to the column that comes first, so the code above gives higher priority to "Democrat". If you want "Republican" to win ties instead, swap their positions like this:
dat %>%
mutate(
Winner = Map(function(x, y, z) {
c("Republican", "Democrat", "Third Party")[max.col(cbind(x, y ,z), "first")]
}, RepublicanVoteShare, DemocratVoteShare, ThirdPartyVoteShare)
)
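To see what the ties.method argument changes, here is a small base R sketch. The shares matrix is made-up illustration data (one row per simulation, one column per party); the second half rebuilds dat from the question and computes the winners without dplyr.

```r
parties <- c("Democrat", "Republican", "Third Party")

# Made-up shares: row 1 is a Democrat/Republican tie, row 2 a clear win
shares <- rbind(c(50, 50, 0),
                c(10, 20, 30))
first_wins <- parties[max.col(shares, ties.method = "first")]
last_wins  <- parties[max.col(shares, ties.method = "last")]

# Same idea applied to the question's data, using Map() from base R
dat <- data.frame(State = c("Iowa", "Wisconsin", "Ohio"))
dat$DemocratVoteShare   <- list(c(40, 44, 52), c(22, 36, 18), c(45, 37, 42))
dat$RepublicanVoteShare <- list(c(60, 56, 48), c(34, 52, 26), c(55, 63, 58))
dat$ThirdPartyVoteShare <- list(c(0, 0, 0), c(44, 12, 56), c(0, 0, 0))

# For each state, bind the three vectors column-wise and pick the row maxima
dat$Winner <- Map(
  function(d, r, t) parties[max.col(cbind(d, r, t), ties.method = "first")],
  dat$DemocratVoteShare, dat$RepublicanVoteShare, dat$ThirdPartyVoteShare
)
```

Here first_wins starts with "Democrat" and last_wins with "Republican", because only the tied row is affected by the tie rule.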
We can use pmap
library(dplyr)
library(purrr)
# note: cur_data() is deprecated as of dplyr 1.1.0; pick(-State) is the newer equivalent
dat %>%
  mutate(Winner = pmap(select(cur_data(), -State),
    ~ c("Democrat", "Republican", "Third Party")[max.col(cbind(..1, ..2, ..3))]))
# State DemocratVoteShare RepublicanVoteShare ThirdPartyVoteShare Winner
#1 Iowa 40, 44, 52 60, 56, 48 0, 0, 0 Republican, Republican, Democrat
#2 Wisconsin 22, 36, 18 34, 52, 26 44, 12, 56 Third Party, Republican, Third Party
#3 Ohio 45, 37, 42 55, 63, 58 0, 0, 0 Republican, Republican, Republican
I think it would be better if you unnest the data.
library(dplyr)
val <- sub('VoteShare', '', names(dat[-1]))
dat %>%
tidyr::unnest(-State) %>%
mutate(Winner = val[max.col(.[-1], ties.method = 'first')])
# State DemocratVoteShare RepublicanVoteShare ThirdPartyVoteShare Winner
# <chr> <dbl> <dbl> <dbl> <chr>
#1 Iowa 40 60 0 Republican
#2 Iowa 44 56 0 Republican
#3 Iowa 52 48 0 Democrat
#4 Wisconsin 22 34 44 ThirdParty
#5 Wisconsin 36 52 12 Republican
#6 Wisconsin 18 26 56 ThirdParty
#7 Ohio 45 55 0 Republican
#8 Ohio 37 63 0 Republican
#9 Ohio 42 58 0 Republican
I am performing a splsda-model in R on 10 dataframes (data of 10 study areas), stored as a list (datalist). All these dataframes are similar, with the same variables, but just different values.
I use the mixOmics library to do this.
This is the head of the first study area. It compares the absence or presence of wetlands (factor variable - wetl or no wetl) depending on its value of TPI of different ranges.
> head(datalist[[1]])
OID POINTID WETLAND TPI200 TPI350 TPI500 TPI700 TPI900 TPI1000 TPI2000 TPI3000 TPI4000 TPI5000 TPI2500
1 -1 1 no wetl 70 67 55 50 48 46 53 47 49 63 48
2 -1 2 no wetl 37 42 35 29 32 16 17 35 49 63 26
3 -1 3 no wetl 45 55 45 39 41 41 53 47 49 63 48
4 -1 4 no wetl 46 58 51 43 46 36 54 47 49 62 49
5 -1 5 no wetl 58 55 53 49 47 46 54 47 49 62 49
6 -1 6 no wetl 56 53 51 49 46 46 54 47 49 61 49
I have done the cross validation step using following code:
library(mixOmics)
for (i in 1: length(model_list))
{
myperf_plsda <- perf(model_list[[i]], validation = "Mfold", folds = 10,
progressBar = FALSE, nrepeat = 10, auc = TRUE)
save(myperf_plsda, file="performancePLSDA.RData")
}
model_list is the list obtained from the splsda function.
But now I am stuck in the next step, which is to look at the error rate (overall and per class)
For just one dataframe (studyarea), I can use the following code:
# cross-validation error in function of nr of PCs
# can see how many PCs is best
plot(myperf_plsda, col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal")
# error rate overall and per class
myperf_plsda$error.rate
myperf_plsda$error.rate.class
myperf_plsda$auc
So first, I am trying to plot the error as a function of the number of principal components (the plot call in the first code block above, for one study area). I would like to have the result in a PDF.
Second, I want to know the overall error rate and error rate per class, from which the code is mentioned above for one study area. The result for one study area is then for example:
overall error rate:
error rate per class:
I have tried several ways to put this code in a for loop, or to use lapply, in order to get these results for the 10 study areas, such as:
### To see how many PCs is best ###
pdf('overallerrorrate_wetlall_small.pdf')
for (i in 1:length(myperf_plsda))
{
plot(model_list[[i]], col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal")
}
dev.off()
or
for (i in 1:length(myperf_plsda))
{plot(myperf_plsda, col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal")}
or
for (i in 1:length(myperf_plsda))
{myperf_plsda[[1]]error.rate
myperf_plsda[[1]]error.rate.class
myperf_plsda[[i]]auc
}
or
lapply(myperf_plsda, `[[`, 'error.rate')
But none of these attempts work! How can I run the code for multiple elements in a list? Many thanks!
Based on your output, you will have to create a new list and save the results in it. Assigning to myperf_plsda alone overwrites the previous result at each step of the loop. Also, most of the measures you want are lists, so I added some processing functions to turn them into data frames. I used the following dummy data:
library(mixOmics)
#Function
custom_splsda <- function(datalist, ncomp, keepX, ..., Xcols, Ycol){
Y <- datalist[[Ycol]]
X <- datalist[Xcols]
res <- splsda(X, Y, ncomp = ncomp, keepX = keepX, ...)
res
}
#Data
datalist <- list(df1 = structure(list(OID = c(-1, -1, -1, -1, -1, -1), POINTID = c(1,
2, 3, 4, 5, 6), WETLAND = c("no wetl", "no wetl", "no wetl",
"wetl", "wetl", "wetl"), TPI200 = c(70, 37, 45, 46, 58, 56),
TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55, 35, 45,
51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46),
TPI2000 = c(53, 17, 53, 54, 54, 54), TPI3000 = c(47, 35,
47, 47, 47, 47), TPI4000 = c(49, 49, 49, 49, 49, 49), TPI5000 = c(63,
63, 63, 62, 62, 61), TPI2500 = c(48, 26, 48, 49, 49, 49)), row.names = c(NA,
6L), class = "data.frame"), df2 = structure(list(OID = c(-1,
-1, -1, -1, -1, -1), POINTID = c(1, 2, 3, 4, 5, 6), WETLAND = c("no wetl",
"no wetl", "no wetl", "wetl", "wetl", "wetl"), TPI200 = c(70,
37, 45, 46, 58, 56), TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55,
35, 45, 51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46), TPI2000 = c(53,
17, 53, 54, 54, 54), TPI3000 = c(47, 35, 47, 47, 47, 47), TPI4000 = c(49,
49, 49, 49, 49, 49), TPI5000 = c(63, 63, 63, 62, 62, 61), TPI2500 = c(48,
26, 48, 49, 49, 49)), row.names = c(NA, 6L), class = "data.frame"))
Now the code. I will create an empty list myperf_plsda:
#Create model_list, you must have the object created
model_list <- lapply(datalist, custom_splsda,
ncomp = 2, keepX = c(5, 5),
Xcols = 4:8, Ycol = "WETLAND")
#Create empty list
myperf_plsda <- list()
#Loop for objects and saving
for (i in 1: length(model_list))
{
myperf_plsda[[i]] <- perf(model_list[[i]], validation = "Mfold", folds = 3,
progressBar = FALSE, nrepeat = 3, auc = TRUE)
object <- myperf_plsda[[i]]
save(object,file = paste0("performancePLSDA.",i,".RData"))
}
#Process the object myperf_plsda
#First function to get elements
extract1 <- function(x)
{
#Object
error.rate <- x$error.rate
error.rate <- lapply(error.rate, as.data.frame)
#Process
O1 <- do.call(rbind,error.rate)
#Separate vars
O1$id <- rownames(O1)
rownames(O1) <- NULL
O1$id1 <- gsub("\\..*","", O1$id )
O1$id2 <- gsub(".*\\.","", O1$id )
O1$id <- NULL
return(O1)
}
#Function 2
extract2 <- function(x)
{
#Object
error.rate.class <- x$error.rate.class
names(error.rate.class) <- gsub('.','_',names(error.rate.class),fixed = T)
error.rate.class <- lapply(error.rate.class, as.data.frame)
#Process
O2 <- do.call(rbind,error.rate.class)
#Separate vars
O2$id <- rownames(O2)
rownames(O2) <- NULL
O2$id1 <- gsub("\\..*","", O2$id )
O2$id2 <- gsub(".*\\.","", O2$id )
O2$id <- NULL
return(O2)
}
#Function 3
extract3 <- function(x)
{
#Object
auc <- x$auc
#Modify for dataframe
change <- function(x)
{
y <- as.data.frame(x)
y$id1 <- rownames(y)
rownames(y)<-NULL
y$id1 <- gsub('.','_',y$id1,fixed = T)
return(y)
}
auc <- lapply(auc, change)
#Process
O3 <- do.call(rbind,auc)
#Separate vars
O3$id2 <- rownames(O3)
rownames(O3) <- NULL
O3$id2 <- gsub("\\..*","", O3$id2 )
return(O3)
}
#Apply functions and save in lists for late process
L1 <- lapply(myperf_plsda,extract1)
L2 <- lapply(myperf_plsda,extract2)
L3 <- lapply(myperf_plsda,extract3)
#Assign the same names from model_list
names(L1) <- names(model_list)
names(L2) <- names(model_list)
names(L3) <- names(model_list)
#Bind the data
#Error rate
error.rate.df <- do.call(rbind,L1)
error.rate.df$genid <- gsub("\\..*","", rownames(error.rate.df) )
rownames(error.rate.df) <- NULL
#Error rate class
error.rate.class.df <- do.call(rbind,L2)
error.rate.class.df$genid <- gsub("\\..*","", rownames(error.rate.class.df) )
rownames(error.rate.class.df) <- NULL
#Auc
auc.df <- do.call(rbind,L3)
auc.df$genid <- gsub("\\..*","", rownames(auc.df) )
rownames(auc.df) <- NULL
With the previous code you end up with three data frames whose values are identified by the names of model_list. You can navigate by the variables id1, id2 and genid to see measures, components and datasets:
error.rate.df
max.dist centroids.dist mahalanobis.dist id1 id2 genid
1 0.2222222 0.2222222 0.2222222 overall comp1 df1
2 0.2777778 0.3888889 0.2777778 overall comp2 df1
3 0.2222222 0.2222222 0.2222222 BER comp1 df1
4 0.2777778 0.3888889 0.2777778 BER comp2 df1
5 0.2222222 0.2222222 0.2222222 overall comp1 df2
6 0.2777778 0.3333333 0.2777778 overall comp2 df2
7 0.2222222 0.2222222 0.2222222 BER comp1 df2
8 0.2777778 0.3333333 0.2777778 BER comp2 df2
error.rate.class.df
comp1 comp2 id1 id2 genid
1 0.3333333 0.3333333 max_dist no wetl df1
2 0.1111111 0.2222222 max_dist wetl df1
3 0.3333333 0.6666667 centroids_dist no wetl df1
4 0.1111111 0.1111111 centroids_dist wetl df1
5 0.3333333 0.3333333 mahalanobis_dist no wetl df1
6 0.1111111 0.2222222 mahalanobis_dist wetl df1
7 0.3333333 0.3333333 max_dist no wetl df2
8 0.1111111 0.2222222 max_dist wetl df2
9 0.3333333 0.5555556 centroids_dist no wetl df2
10 0.1111111 0.1111111 centroids_dist wetl df2
11 0.3333333 0.3333333 mahalanobis_dist no wetl df2
12 0.1111111 0.2222222 mahalanobis_dist wetl df2
auc.df
x id1 id2 genid
1 0.62966667 AUC_mean comp1 df1
2 0.06414361 AUC_sd comp1 df1
3 0.81483333 AUC_mean comp2 df1
4 0.06414361 AUC_sd comp2 df1
5 0.62966667 AUC_mean comp1 df2
6 0.06414361 AUC_sd comp1 df2
7 0.77780000 AUC_mean comp2 df2
8 0.11110000 AUC_sd comp2 df2
Finally, for the plots you can use the following code (I have assigned the name of each dataset to the x label so you can identify it in the plots):
#Plot and save
#Assign names
names(myperf_plsda) <- names(model_list)
pdf('example.pdf')
for (i in 1:length(myperf_plsda))
{
plot(myperf_plsda[[i]], col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal",xlab = paste0(names(myperf_plsda)[i],' (Comp)'))
}
dev.off()
As a remark, I reduced the number of folds and repeats to make the code work on the small dummy data; with your real data you can set your original values.
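The overwrite problem is easier to see with stand-in objects. In this sketch fake_perf() is a hypothetical placeholder for perf() that returns a small list, and the numbers in model_list are not real models; only the list handling is the point.

```r
# Hypothetical stand-in for perf(): returns a list with two named elements
fake_perf <- function(model) {
  list(error.rate = model * 0.1, auc = 1 - model * 0.1)
}
model_list <- list(df1 = 1, df2 = 2, df3 = 3)

# Overwriting: res is replaced on every pass, so only the last result survives
for (i in seq_along(model_list)) {
  res <- fake_perf(model_list[[i]])
}

# Collecting: one result per model, with the names of model_list carried over
res_list <- lapply(model_list, fake_perf)

# Pull the same element out of every result
error_rates <- lapply(res_list, `[[`, "error.rate")
```

Because lapply() keeps the names of its input, error_rates$df2 refers to the second study area without any extra bookkeeping.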
I have the following table, with ordered variables:
table <- data.frame(Ident = c("Id_01", "Id_02", "Id_03", "Id_04", "Id_05", "Id_06"),
X01 = c(NA, 18, 0, 14, 0, NA),
X02 = c(0, 16, 0, 17, 0, 53),
X03 = c(NA, 15, 20, 30, 0, 72),
X04 = c(0, 17, 0, 19, 0, NA),
X05 = c(NA, 29, 21, 23, 0, 73),
X06 = c(0, 36, 22, 19, 0, 55))
Ident X01 X02 X03 X04 X05 X06
Id_01 NA 0 NA 0 NA 0
Id_02 18 16 15 17 29 36
Id_03 0 0 20 0 21 22
Id_04 14 17 30 19 23 19
Id_05 0 0 0 0 0 0
Id_06 NA 53 72 NA 73 55
From a previous question, I have the following code, provided by a user here, which searches each row for one condition (1st and 2nd position > 0) and returns the position of the occurrence (the name of the variable at that position):
apply(table[-1], 1, function(x) {
i1 <- x > 0 & !is.na(x)
names(x)[which(i1[-1] & i1[-length(i1)])[1]]})
I'm looking to add a second condition to the apply code, so the conditions need to be:
1st and 2nd occurrence (consecutive) > 0
OR
1st and 3rd occurrence > 0
Considering this change, the output of the evaluation for the table posted before should be:
For Id_01: never occurs (NA?)
For Id_02: 1st position (X01)
For Id_03: 3rd position (X03)
For Id_04: 1st position (X01)
For Id_05: never occurs (NA?)
For Id_06: 2nd position (X02)
Thanks in advance!
We can use lead from dplyr
library(dplyr)
f1 <- function(x) {
  i1 <- x > 0 & !is.na(x)
  # position i qualifies if x[i] > 0 and x[i + 1] or x[i + 2] is also > 0
  which(i1 & (lead(i1, n = 1, default = FALSE) |
              lead(i1, n = 2, default = FALSE)))[1]
}
n1 <- apply(table[-1], 1, f1)
names(table)[-1][n1]
#[1] NA "X01" "X03" "X01" NA "X02"
Or use pmap
library(purrr)
n1 <- pmap_int(table[-1], ~ c(...) %>%
f1)
names(table)[-1][n1]
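A base R sketch of the same rule, without dplyr, in case it helps to see the shifting spelled out. It reads "1st and 3rd occurrence" as positions i and i + 2, which is an interpretation of the question rather than something it states outright; first_pair() and tab are names made up for this example.

```r
tab <- data.frame(Ident = c("Id_01", "Id_02", "Id_03", "Id_04", "Id_05", "Id_06"),
                  X01 = c(NA, 18, 0, 14, 0, NA),
                  X02 = c(0, 16, 0, 17, 0, 53),
                  X03 = c(NA, 15, 20, 30, 0, 72),
                  X04 = c(0, 17, 0, 19, 0, NA),
                  X05 = c(NA, 29, 21, 23, 0, 73),
                  X06 = c(0, 36, 22, 19, 0, 55))

# First position i where x[i] > 0 and x[i + 1] > 0 or x[i + 2] > 0
# (assumes each row has at least two values)
first_pair <- function(x) {
  ok <- !is.na(x) & x > 0
  next1 <- c(ok[-1], FALSE)             # value at position i + 1
  next2 <- c(ok[-(1:2)], FALSE, FALSE)  # value at position i + 2
  which(ok & (next1 | next2))[1]
}

pos <- apply(tab[-1], 1, first_pair)
result <- names(tab)[-1][pos]
result
```

Padding the shifted vectors with FALSE means positions near the end of a row can never match against values that do not exist.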
sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] WriteXLS_3.5.1 tidyr_0.2.0 scales_0.2.4 gridExtra_0.9.1
[5] ggplot2_1.0.1 RPostgreSQL_0.4 DBI_0.3.1
loaded via a namespace (and not attached):
[1] Rcpp_0.11.6 assertthat_0.1 dplyr_0.4.1 digest_0.6.8
[5] MASS_7.3-40 plyr_1.8.2 gtable_0.1.2 magrittr_1.5
[9] stringi_0.4-1 lazyeval_0.1.10 reshape2_1.4.1 proto_0.3-10
[13] tools_3.2.0 stringr_1.0.0 munsell_0.4.2 parallel_3.2.0
[17] colorspace_1.2-6
#
library(RPostgreSQL)
library(ggplot2)
library(gridExtra)
library(scales)
library(tidyr)
blue.bold.italic.16.text <- element_text(face = "bold", color = "black", size = 12)
#
Consider four machines running in parallel and producing products. Each data frame (l1, l2, l3, l4) below represents the number of pieces per hour for one machine (I actually collect the data from a database using RPostgreSQL; this is a sample of how it looks).
l1 <- structure(list(hours = structure(c(1434081600, 1434085200, 1434088800,
1434092400, 1434096000, 1434099600, 1434103200, 1434106800, 1434110400,
1434114000, 1434117600, 1434121200, 1434124800, 1434128400, 1434132000,
1434135600, 1434139200, 1434142800, 1434146400, 1434150000, 1434153600,
1434157200, 1434160800, 1434164400), class = c("POSIXct", "POSIXt"
), tzone = ""), count = c(25, 29, 28, 32, 33, 13, 33, 29, 32,
33, 27, 34, 25, 30, 13, 24, 26, 33, 40, 34, 26, 30, 22, 30)), .Names = c("hours",
"count"), row.names = c(NA, 24L), class = "data.frame")
l2 <- structure(list(hours = structure(c(1434081600, 1434085200, 1434088800,
1434092400, 1434096000, 1434099600, 1434103200, 1434106800, 1434110400,
1434114000, 1434117600, 1434121200, 1434124800, 1434128400, 1434132000,
1434135600, 1434139200, 1434142800, 1434146400, 1434150000, 1434153600,
1434157200, 1434160800, 1434164400), class = c("POSIXct", "POSIXt"
), tzone = ""), count = c(25, 29, 28, 32, 33, 13, 33, 29, 32,
33, 27, 34, 25, 30, 13, 24, 26, 33, 40, 34, 26, 30, 22, 30)), .Names = c("hours",
"count"), row.names = c(NA, 24L), class = "data.frame")
l3 <- structure(list(hours = structure(c(1434081600, 1434085200, 1434088800,
1434092400, 1434096000, 1434099600, 1434103200, 1434106800, 1434110400,
1434114000, 1434117600, 1434121200, 1434124800, 1434128400, 1434132000,
1434135600, 1434139200, 1434142800, 1434146400, 1434150000, 1434153600,
1434157200, 1434160800, 1434164400), class = c("POSIXct", "POSIXt"
), tzone = ""), count = c(25, 29, 28, 32, 33, 13, 33, 29, 32,
33, 27, 34, 25, 30, 13, 24, 26, 33, 40, 34, 26, 30, 22, 30)), .Names = c("hours",
"count"), row.names = c(NA, 24L), class = "data.frame")
l4 <- structure(list(hours = structure(c(1434081600, 1434085200, 1434088800,
1434092400, 1434096000, 1434099600, 1434103200, 1434106800, 1434110400,
1434114000, 1434117600, 1434121200, 1434124800, 1434128400, 1434132000,
1434135600, 1434139200, 1434142800, 1434146400, 1434150000, 1434153600,
1434157200, 1434160800, 1434164400), class = c("POSIXct", "POSIXt"
), tzone = ""), count = c(25, 29, 28, 32, 33, 13, 33, 29, 32,
33, 27, 34, 25, 30, 13, 24, 26, 33, 40, 34, 26, 30, 22, 30)), .Names = c("hours",
"count"), row.names = c(NA, 24L), class = "data.frame")
#
Here is my script for the attached plot (output):
df <- merge(l1,l2, by="hours")
df <- merge(df,l3, by="hours")
df <- merge(df,l4, by="hours")
colnames(df) <- c("hours","L 1","L 2","L 3","L 4")
pd <- gather(df, 'Ls', 'count', 2:5)
q <- ggplot(pd, aes(x = hours, y = count)) + geom_bar(stat = "identity") + theme(legend.position = "none")+
xlab("Time") + ylab("No. of Pieces") +
ggtitle("my sample")+
scale_y_continuous(breaks=seq(0,45, by = 5))+
theme(axis.text = blue.bold.italic.16.text) +
scale_x_datetime(breaks=date_breaks("2 hour"),minor_breaks=date_breaks("2 hour"),labels=date_format("%H")) +
theme(axis.text.x=element_text(angle=0))+
facet_grid(~ Ls)
When all 4 machines are working, everything is fine: I run the above script and get the required output.
If any machine is not working and I have a data frame with empty rows, I get an error while running my script file.
# df <- merge(l1,l2, by="hours")
df <- merge(df,l3, by="hours")
df <- merge(df,l4, by="hours")
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
and the next error at
pd <- gather(df, 'Ls', 'count', 2:5)
How can I skip the empty data frames and run the script successfully, producing the output for however many machines are operating (2, 3 or 4)?
Judging from the error message, the data.frame that causes the error has neither rows nor columns; it seems to be NULL. So the easiest way would be to check for that situation and, if the data.frame is NULL, create a dummy that can be merge()d and gather()ed.
What I would do (not saying this is the best way) is
# for easier looping, put your data.frames in a list
l <- list( l1, l2, l3, l4 )
# create a dummy that mimics the structure of your data.frames
dummy <- structure( list( hours = structure( c( Sys.time() ),
class = c( "POSIXct", "POSIXt" ), tzone = ""),
count = c(0)), .Names = c("hours", "count"),
row.names = c(NA, 1L), class = "data.frame")
# check for empty data.frames and replace with dummy (will be NA)
for( i in 1:4 ) if( length( l[[ i ]] ) == 0 ) l[[ i ]] <- dummy
# merge
for( i in 2:4 ) l[[ 1 ]] <- merge( l[[ 1 ]], l[[ i ]],
by = "hours", all = TRUE )
# remove dummy and go back to your code
df <- l[[ 1 ]][ 1:24, ]
colnames( df ) <- c( "hours","L 1","L 2","L 3","L 4" )
There is room for improvement, but at least it should display the results whether or not a machine is operating. To try it, set one data frame to NULL before running the code above:
l2 <- NULL
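An alternative to the dummy row, sketched in base R: drop the empty inputs first and merge only what is left. The helper is_empty() and the three-hour sample data are made up for illustration; the real l1 to l4 would go into the list instead.

```r
# Hypothetical helper: TRUE for NULL or for a data frame without rows
is_empty <- function(x) is.null(x) || NROW(x) == 0

# Small made-up sample: machines 2 and 4 delivered no data
hrs <- as.POSIXct("2015-06-12 00:00:00", tz = "UTC") + 3600 * 0:2
machines <- list("L 1" = data.frame(hours = hrs, count = c(25, 29, 28)),
                 "L 2" = NULL,
                 "L 3" = data.frame(hours = hrs, count = c(30, 31, 32)),
                 "L 4" = data.frame())

# Keep only machines that delivered data
active <- Filter(Negate(is_empty), machines)

# Rename each count column to its machine label so merge() needs no suffixes
active <- Map(function(d, nm) setNames(d, c("hours", nm)), active, names(active))

# Merge all remaining frames on hours, keeping hours missing from some machines
df <- Reduce(function(a, b) merge(a, b, by = "hours", all = TRUE), active)
```

Since the number of count columns now varies, the later reshaping step would need to select them as "everything except hours" (for example gather(df, 'Ls', 'count', -hours)) rather than by the fixed positions 2:5.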
One alternative would be to skip the merging altogether and go right to stacking the datasets. You would just need to add the Ls column to each individual dataset first.
l1$Ls = "L 1"
l2$Ls = "L 2"
l3$Ls = "L 3"
l4$Ls = "L 4"
Then you could use, e.g., bind_rows from dplyr to make your long dataset pd.
bind_rows(l1, l2, l3, l4)
Source: local data frame [96 x 3]
hours count Ls
1 2015-06-11 21:00:00 25 L 1
2 2015-06-11 22:00:00 29 L 1
3 2015-06-11 23:00:00 28 L 1
4 2015-06-12 00:00:00 32 L 1
5 2015-06-12 01:00:00 33 L 1
6 2015-06-12 02:00:00 13 L 1
7 2015-06-12 03:00:00 33 L 1
8 2015-06-12 04:00:00 29 L 1
9 2015-06-12 05:00:00 32 L 1
10 2015-06-12 06:00:00 33 L 1
.. ... ... ...
The advantage of this approach is that one of the objects you bind can be an empty data.frame or NULL and it still works.
Example empty data.frame:
l4.2 = data.frame()
bind_rows(l1, l2, l3, l4.2)
Source: local data frame [72 x 3]
hours count Ls
1 2015-06-11 21:00:00 25 L 1
2 2015-06-11 22:00:00 29 L 1
3 2015-06-11 23:00:00 28 L 1
4 2015-06-12 00:00:00 32 L 1
5 2015-06-12 01:00:00 33 L 1
6 2015-06-12 02:00:00 13 L 1
7 2015-06-12 03:00:00 33 L 1
8 2015-06-12 04:00:00 29 L 1
9 2015-06-12 05:00:00 32 L 1
10 2015-06-12 06:00:00 33 L 1
.. ... ... ...
Example NULL:
l4.3 = NULL
bind_rows(l1, l2, l3, l4.3)
Source: local data frame [72 x 3]
hours count Ls
1 2015-06-11 21:00:00 25 L 1
2 2015-06-11 22:00:00 29 L 1
3 2015-06-11 23:00:00 28 L 1
4 2015-06-12 00:00:00 32 L 1
5 2015-06-12 01:00:00 33 L 1
6 2015-06-12 02:00:00 13 L 1
7 2015-06-12 03:00:00 33 L 1
8 2015-06-12 04:00:00 29 L 1
9 2015-06-12 05:00:00 32 L 1
10 2015-06-12 06:00:00 33 L 1
.. ... ... ...
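One caveat when labelling the inputs: as far as I know, assigning l2$Ls = "L 2" to a NULL object turns it into a one-element list, and doing the same on a zero-row data frame raises an error, so the label should only be added to machines that actually have rows. A base R sketch of that, where label_one() and the sample data are made up for illustration:

```r
# Made-up sample: machine 2 is NULL, machine 3 is an empty data frame
hrs <- as.POSIXct("2015-06-12 00:00:00", tz = "UTC") + 3600 * 0:2
machines <- list("L 1" = data.frame(hours = hrs, count = c(25, 29, 28)),
                 "L 2" = NULL,
                 "L 3" = data.frame(),
                 "L 4" = data.frame(hours = hrs[1:2], count = c(30, 31)))

# Hypothetical helper: add the label only when there is data to label
label_one <- function(d, nm) {
  if (is.null(d) || nrow(d) == 0) return(NULL)  # skip machines with no data
  d$Ls <- nm
  d
}

labelled <- Map(label_one, machines, names(machines))
labelled <- Filter(Negate(is.null), labelled)  # drop the skipped machines

# Stack what is left into the long format used for plotting
pd <- do.call(rbind, unname(labelled))
rownames(pd) <- NULL
```

The resulting pd has the same three columns (hours, count, Ls) as the bind_rows() output, so the ggplot call with facet_grid(~ Ls) can be used unchanged.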