aggregate() puts multiple output columns in a matrix instead


I want to compute multiple quantiles for a certain variable:
> res1 <- aggregate(airquality$Wind, list(airquality$Month), function (x) quantile(x, c(0.9, 0.95, 0.975)))
> head(res1)
Group.1 x.90% x.95% x.97.5%
1 5 16.6000 17.5000 18.8250
2 6 14.9000 15.5600 17.3650
3 7 14.3000 14.6000 14.9000
4 8 12.6000 14.0500 14.6000
5 9 14.9600 15.5000 15.8025
The result looks good at first, but aggregate actually returns it in a very strange form, where the last 3 columns are not columns of a data.frame, but a single matrix!
> names(res1)
[1] "Group.1" "x"
> dim(res1)
[1] 5 2
> class(res1[,2])
[1] "matrix"
This causes a lot of problems in further processing.
A few questions:
1. Why is aggregate() behaving so strangely?
2. Is there any way to persuade it to produce the result I expect?
3. Or am I perhaps using the wrong function for this purpose? Is there another preferred way to get the desired result?
Of course I could transform the output of aggregate(), but I am looking for a simpler, more straightforward solution.

Q1: Why is the behavior so strange?
This is documented behavior (see ?aggregate), though it may still be unexpected. The relevant argument to look at is simplify.
If simplify is set to FALSE, aggregate() produces a list-column instead in a case like this:
res2 <- aggregate(airquality$Wind, list(airquality$Month), function (x)
quantile(x, c(0.9, 0.95, 0.975)), simplify = FALSE)
str(res2)
# 'data.frame': 5 obs. of 2 variables:
# $ Group.1: int 5 6 7 8 9
# $ x :List of 5
# ..$ 1 : Named num 16.6 17.5 18.8
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 32 : Named num 14.9 15.6 17.4
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 62 : Named num 14.3 14.6 14.9
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 93 : Named num 12.6 14.1 14.6
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 124: Named num 15 15.5 15.8
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
Both a matrix and a list as columns may seem strange, but I presume this is more a case of "by design" than a "bug" or a "flaw".
For instance, consider the following: We want to aggregate both the "Wind" and the "Temp" columns from the "airquality" dataset, and we know that each aggregation would result in multiple columns (like we would expect with quantile).
res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality,
function (x) quantile(x, c(0.9, 0.95, 0.975)))
res3
# Month Wind.90% Wind.95% Wind.97.5% Temp.90% Temp.95% Temp.97.5%
# 1 5 16.6000 17.5000 18.8250 74.000 77.500 79.500
# 2 6 14.9000 15.5600 17.3650 87.300 91.100 92.275
# 3 7 14.3000 14.6000 14.9000 89.000 91.500 92.000
# 4 8 12.6000 14.0500 14.6000 94.000 95.000 96.250
# 5 9 14.9600 15.5000 15.8025 91.100 92.550 93.000
In some ways, keeping these values as matrix-columns makes sense: the aggregated data are easily accessible by their original column names:
res3$Temp
# 90% 95% 97.5%
# [1,] 74.0 77.50 79.500
# [2,] 87.3 91.10 92.275
# [3,] 89.0 91.50 92.000
# [4,] 94.0 95.00 96.250
# [5,] 91.1 92.55 93.000
Q2: How do you get the results as separate columns in a data.frame?
But a list as a column is just as awkward to deal with as a matrix as a column in many cases. If you want to "flatten" your matrix into columns, use do.call(data.frame, ...):
do.call(data.frame, res1)
# Group.1 x.90. x.95. x.97.5.
# 1 5 16.60 17.50 18.8250
# 2 6 14.90 15.56 17.3650
# 3 7 14.30 14.60 14.9000
# 4 8 12.60 14.05 14.6000
# 5 9 14.96 15.50 15.8025
str(.Last.value)
# 'data.frame': 5 obs. of 4 variables:
# $ Group.1: int 5 6 7 8 9
# $ x.90. : num 16.6 14.9 14.3 12.6 15
# $ x.95. : num 17.5 15.6 14.6 14.1 15.5
# $ x.97.5.: num 18.8 17.4 14.9 14.6 15.8
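The same do.call(data.frame, ...) trick should also flatten res3, which has two matrix-columns (Wind and Temp). A minimal sketch, assuming only the built-in airquality data set; res3 is recomputed so the block runs on its own, and the name flat is illustrative:

```r
# Recompute res3 from the built-in airquality data set
res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality,
                  function(x) quantile(x, c(0.9, 0.95, 0.975)))

# do.call() passes each column of res3 (Month, Wind, Temp) as a separate
# argument to data.frame(); data.frame() expands every matrix argument
# into one ordinary column per matrix column.
flat <- do.call(data.frame, res3)

str(flat)
sapply(flat, is.matrix)  # no matrix-columns remain
```

If you prefer to keep the original quantile labels, adding check.names = FALSE, as in do.call(data.frame, c(res3, check.names = FALSE)), should give names such as Wind.90% instead of the syntactic Wind.90. form.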
Q3: Are there other alternatives?
As with most things R, yes of course. My preferred alternative would be to use the "data.table" package, with which you can do:
library(data.table)
as.data.table(airquality)[, as.list(quantile(Wind, c(.9, .95, .975))),
by = Month]
# Month 90% 95% 97.5%
# 1: 5 16.60 17.50 18.8250
# 2: 6 14.90 15.56 17.3650
# 3: 7 14.30 14.60 14.9000
# 4: 8 12.60 14.05 14.6000
# 5: 9 14.96 15.50 15.8025
str(.Last.value)
# Classes ‘data.table’ and 'data.frame': 5 obs. of 4 variables:
# $ Month: int 5 6 7 8 9
# $ 90% : num 16.6 14.9 14.3 12.6 15
# $ 95% : num 17.5 15.6 14.6 14.1 15.5
# $ 97.5%: num 18.8 17.4 14.9 14.6 15.8
# - attr(*, ".internal.selfref")=<externalptr>
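If you would rather stay in base R, the same per-month table can be sketched with split() and lapply(), stacking the per-group quantile vectors with do.call(rbind, ...). This is a swapped-in alternative technique, not what aggregate() does internally; the names probs, by_month and res4 are illustrative:

```r
probs <- c(0.9, 0.95, 0.975)

# One named quantile vector per month
by_month <- lapply(split(airquality$Wind, airquality$Month),
                   quantile, probs = probs)

# do.call(rbind, ...) stacks the vectors into a 5 x 3 matrix; data.frame()
# then turns the matrix columns into ordinary columns. row.names = NULL
# keeps default row names instead of the month labels.
res4 <- data.frame(Month = as.integer(names(by_month)),
                   do.call(rbind, by_month),
                   check.names = FALSE, row.names = NULL)
str(res4)
```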

Related

Converting the structure of input data in R

Below is the structure of my data frame:
'data.frame': 213 obs. of 2 variables:
$ up_entrez: Factor w/ 143 levels "101739","108077",...: 3 94 125 103 3 34 3 37 134 13 ...
$ Ratio : num 3.1 3.37 1.8 1.21 6.92 ....
and I want to convert it to something like this for the function to take it as an input:
Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ....
- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ....
How do I do that?

We can use setNames to create a named vector
v1 <- with(df1, setNames(Ratio, up_entrez))
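A minimal self-contained illustration, using a small hypothetical df1 with the same two columns as in the question (the IDs and ratios are made-up sample values). The as.character() call only makes the factor-to-label conversion explicit:

```r
# Hypothetical stand-in for the question's data frame
df1 <- data.frame(
  up_entrez = factor(c("4312", "8318", "10874")),
  Ratio     = c(4.57, 4.51, 4.42)
)

# setNames(values, names) returns `values` with the names attached
v1 <- with(df1, setNames(Ratio, as.character(up_entrez)))
str(v1)

v1[["8318"]]  # look up a ratio by its Entrez ID
```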

Appending/Growing List Elements Separately in r

I have a loop with an embedded function that creates a list with the same elements each time, but with different lengths. I want the lists created in each iteration to grow, or merge, element by element. Here is a very simplified version:
# Code Body-------------------------------------------------------------
desiredList <- list()
for (i in 1:3){
# "existingList" holds two data frames, A and B
# A is a data frame with columns x1 and x2
# B is a data frame with a single column b
# A and B are computed by "somefunction", and
# their number of rows differs in each iteration
existingList <- somefunction(variable[i])
# "desiredList" should also hold two data frames, A and B
# In each iteration, the new Ai, Bi should be appended to the A, B elements of "desiredList"
desiredList <- append(desiredList, existingList)
}
# existingList in each loop--------------------------------------------
# i=1................................
A:'data.frame': 3 obs. of 2 variables:
..$ x1: num [1:3] 1 2 3
..$ x2: num [1:3] 13 26 39
B:'data.frame': 1 obs. of 1 variables:
..$ b: num [1:1] 2.6
# i=2................................
A:'data.frame': 2 obs. of 2 variables:
..$ x1: num [1:2] 4 5
..$ x2: num [1:2] 52 65
B:'data.frame': 3 obs. of 1 variables:
..$ b: num [1:3] 5.2 7.8 10.4
# i=3................................
A:'data.frame': 5 obs. of 2 variables:
..$ x1: num [1:5] 6 7 8 9 10
..$ x2: num [1:5] 78 91 104 117 130
B:'data.frame': 2 obs. of 1 variables:
..$ b: num [1:2] 13 15.6
# desiredList at the end of the loop
A:'data.frame': 10 obs. of 2 variables:
..$ x1: num [1:10] 1 2 3 4 5 6 7 8 9 10
..$ x2: num [1:10] 13 26 39 52 65 78 91 104 117 130
B:'data.frame': 6 obs. of 1 variables:
..$ b: num [1:6] 2.6 5.2 7.8 10.4 13 15.6
I have tried append, lapply, Map, and many other functions, but none gives the correct result.

Your desired output requires "row-binding" the dataframes, as in:
library(tibble)
a1 <- tribble(
~A1, ~B1, ~C1,
1, 13 , 2.6,
2, 26 , 5.2,
3, 39 , 7.8,
7, 91 , 18.2,
8, 104, 20.8
)
a2 <- tribble(
~A2, ~B2,~C2,
4, 52, 10.4,
5, 65, 13,
6, 78, 15.6
)
a3 <- tribble(
~A3, ~B3, ~C3,
9, 117, 23.4,
10, 130, 26
)
out <- lapply(
X = list(a1, a2, a3),
FUN = function(x) `names<-`(x, substr(names(x), 1,1))
)
do.call("rbind", out)
#> # A tibble: 10 x 3
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 1 13 2.6
#> 2 2 26 5.2
#> 3 3 39 7.8
#> 4 7 91 18.2
#> 5 8 104 20.8
#> 6 4 52 10.4
#> 7 5 65 13
#> 8 6 78 15.6
#> 9 9 117 23.4
#> 10 10 130 26
dplyr::bind_rows(out)
#> # A tibble: 10 x 3
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 1 13 2.6
#> 2 2 26 5.2
#> 3 3 39 7.8
#> 4 7 91 18.2
#> 5 8 104 20.8
#> 6 4 52 10.4
#> 7 5 65 13
#> 8 6 78 15.6
#> 9 9 117 23.4
#> 10 10 130 26
Created on 2020-07-21 by the reprex package (v0.3.0)
If you have control of the data frame names and can avoid appending numbers to them, you can row bind them more easily. dplyr::bind_rows() is also an easy way to bind them.

One way to solve this is with the Map function. The only problem with Map is that it cannot merge an empty list with a list that contains elements. This can be solved with an if-else statement: in the first iteration, the empty list is set to the existing list, and in the remaining iterations, Map is applied to update the desired list.
# Code Body-------------------------------------------------------------
desiredList <- list()
for (i in 1:3){
# "existingList" generated using an external function
existingList <- somefunction(variable[i])
# "desiredList" generation
if (i == 1){ # define the list in first loop
desiredList <- existingList
} else { # append the list
desiredList <- Map(rbind, desiredList, existingList)
}
}
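A runnable sketch of the Map() approach. Since somefunction() and variable are not shown in the question, the hypothetical somefunction() below simply returns the per-iteration A and B data frames listed in the question; testing length(desiredList) == 0 plays the role of the i == 1 check:

```r
# Hypothetical stand-in for the question's somefunction(): it returns the
# A and B data frames shown in the question for iteration i.
pieces <- list(
  list(A = data.frame(x1 = 1:3,  x2 = c(13, 26, 39)),
       B = data.frame(b = 2.6)),
  list(A = data.frame(x1 = 4:5,  x2 = c(52, 65)),
       B = data.frame(b = c(5.2, 7.8, 10.4))),
  list(A = data.frame(x1 = 6:10, x2 = c(78, 91, 104, 117, 130)),
       B = data.frame(b = c(13, 15.6)))
)
somefunction <- function(i) pieces[[i]]

desiredList <- list()
for (i in 1:3) {
  existingList <- somefunction(i)
  if (length(desiredList) == 0) {
    # first iteration: nothing to bind yet, so take the new list as is
    desiredList <- existingList
  } else {
    # later iterations: row-bind A to A and B to B, element by element
    desiredList <- Map(rbind, desiredList, existingList)
  }
}
str(desiredList)
```

Because rbind() inside the loop copies the accumulated data every time, with many iterations it can be cheaper to store each existingList in a list and bind each element once at the end with do.call(rbind, ...).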

Why is distance matrix (dist()) giving empty values for data sets having more than ~50 observations?

I have a data set for which I'm calculating its distance matrix. Below is the data, which has 251 observations.
> str(mydata)
'data.frame': 251 obs. of 7 variables:
$ BodyFat: num 12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
$ Weight : num 154 173 154 185 184 ...
$ Chest : num 93.1 93.6 95.8 101.8 97.3 ...
$ Abdomen: num 85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
$ Hip : num 94.5 98.7 99.2 101.2 101.9 ...
$ Thigh : num 59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
$ Biceps : num 32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
I normalize the data.
means = apply(mydata,2,mean)
sds = apply(mydata,2,sd)
nor = scale(mydata,center=means,scale=sds)
When I calculate the distance matrix, I see a lot of empty values, and moreover the distances appear to be measured from only 4 observations.
distance =dist(nor)
> str(distance)
'dist' num [1:31375] 1.33 2.09 1.9 3.08 3.99 ...
- attr(*, "Size")= int 251
- attr(*, "Labels")= chr [1:251] "1" "2" "3" "4" ...
- attr(*, "Diag")= logi FALSE
- attr(*, "Upper")= logi FALSE
- attr(*, "method")= chr "euclidean"
- attr(*, "call")= language dist(x = nor)
> distance # most of the output is omitted from this post, as it covers 251 observations
1 2 3 4 5 6 7
2 1.3346445
3 2.0854437 2.5474796
4 1.8993458 1.4908813 2.5840752
5 3.0790252 3.4485667 2.2165366 2.7021809
8 9 10 11 12 13 14
2
3
4
5
15 16 17 18 19 20 21
This list goes on empty for the remaining 247 comparisons.
Now I reduce the data set to 20 observations, and here I get a proper distance matrix.
distancetiny=dist(nor)
> str(distancetiny)
'dist' num [1:1176] 1.14 1.8 1.61 2.62 3.39 ...
- attr(*, "Size")= int 49
- attr(*, "Labels")= chr [1:49] "1" "2" "3" "4" ...
- attr(*, "Diag")= logi FALSE
- attr(*, "Upper")= logi FALSE
- attr(*, "method")= chr "euclidean"
- attr(*, "call")= language dist(x = nor)
> distancetiny
1 2 3 4 5 6 7
2 1.1380433
3 1.7990293 2.2088928
4 1.6064118 1.2871522 2.2483586
5 2.6235853 2.9669283 1.9132224 2.3256624
6 3.3898119 3.3730508 3.3718447 2.2615557 2.0094434
7 1.8947704 2.0065514 1.7685604 1.1065940 1.7387938 2.2321156
8 1.1732465 1.0663217 1.6733689 0.8873140 2.1959298 2.7939555 1.1448269
9 2.2721969 2.0545882 3.4263262 1.4058375 3.1811955 2.4011074 2.3078714
10 2.3753110 2.2424464 3.0289947 1.2808398 2.3230202 1.4242653 1.8571654
11 1.5620472 1.1878554 2.5750350 0.5718248 2.7714795 2.6314286 1.5132365
12 3.5088571 3.2484020 4.1164488 2.2723772 3.1377318 1.4795230 2.8274818
13 2.1448841 2.2679705 1.8726670 1.3494988 1.2176727 1.5544030 1.0725518
14 3.6679035 3.7459402 3.6869023 2.6677308 2.1318420 0.7347359 2.5729973
15 2.9908457 3.3312661 3.1289870 2.4340473 1.8027070 1.3626019 2.3795360
16 1.6117570 2.0283356 1.2011116 1.5961064 1.3196981 2.4456436 1.2569683
17 3.2991393 3.5991747 3.0438049 2.6066933 1.4742664 1.0945621 2.2214101
18 3.9409008 4.0726826 4.0113908 2.9250144 2.5228901 0.9087254 2.8158563
19 2.7468511 2.9495031 3.2439229 1.8312508 2.4122436 1.3932604 1.9640170
20 3.7515064 3.7021743 3.9404231 2.5813440 2.5390519 0.8352961 2.6530503
21 2.3102053 2.3878491 2.0836800 1.4328028 1.2991221 1.5287862 1.1769205
There are no empty values in the output when there are 21 observations.
Why is this so? Does dist() stop working when the number of observations exceeds a threshold?
I'm unable to figure it out. Please help.

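This is a display issue rather than a problem with dist(). A dist object stores only the lower triangle of the distance matrix, so in the printout row i has values only in columns 1 to i - 1. In every later block of columns (8 to 14, 15 to 21, ...), the first rows are therefore blank by construction, and with 251 observations the printout wraps into dozens of such blocks, which makes it look mostly empty. The values themselves are all present: str() reports 31375 of them, which is exactly 251 * 250 / 2.
Further operations on the distance matrix (such as hierarchical agglomerative clustering) confirmed that there is nothing to worry about in its odd display.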
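To check that the values are really there, you can count them and expand the dist object into a full matrix with as.matrix(). A minimal sketch on simulated data (random numbers standing in for the 251 x 7 scaled body-fat data, which is not available here):

```r
set.seed(1)
# Hypothetical stand-in for the scaled 251 x 7 data
nor <- scale(matrix(rnorm(251 * 7), ncol = 7))
distance <- dist(nor)

length(distance)          # 251 * 250 / 2 stored distances
m <- as.matrix(distance)  # full symmetric 251 x 251 matrix
dim(m)
anyNA(m)                  # nothing is missing
m[200, 1:3]               # values far down the triangle are present
```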

Error in ncol(xj) : object 'xj' not found when using R matplot()

Using matplot, I'm trying to plot the 2nd, 3rd and 4th columns of airquality data.frame after dividing these 3 columns by the first column of airquality.
However I'm getting an error
Error in ncol(xj) : object 'xj' not found
Why are we getting this error? The code below will reproduce this problem.
attach(airquality)
airquality[2:4] <- apply(airquality[2:4], 2, function(x) x /airquality[1])
matplot(x= airquality[,1], y= as.matrix(airquality[-1]))

You have managed to mangle your data in an interesting way. Let's start with airquality before you mess with it. (And please don't attach(): it's unnecessary and sometimes dangerous/confusing.)
str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
After you do
airquality[2:4] <- apply(airquality[2:4], 2,
function(x) x /airquality[1])
you get
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R:'data.frame': 153 obs. of 1 variable:
..$ Ozone: num 4.63 3.28 12.42 17.39 NA ...
$ Wind :'data.frame': 153 obs. of 1 variable:
..$ Ozone: num 0.18 0.222 1.05 0.639 NA ...
$ Temp :'data.frame': 153 obs. of 1 variable:
..$ Ozone: num 1.63 2 6.17 3.44 NA ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
or
sapply(airquality,class)
## Ozone Solar.R Wind Temp Month Day
## "integer" "data.frame" "data.frame" "data.frame" "integer" "integer"
that is, you have data frames embedded within your data frame!
rm(airquality) ## clean up
Now change one character and divide by the column airquality[,1] rather than airquality[1] (divide by a vector, not a list of length one ...)
airquality[,2:4] <- apply(airquality[,2:4], 2,
function(x) x/airquality[,1])
matplot(x= airquality[,1], y= as.matrix(airquality[,-1]))
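The apply() call is not strictly needed either: dividing a data frame by a vector whose length equals nrow() recycles the vector down each column. A minimal sketch on a copy named aq (an illustrative name), plotting only columns 2 to 4 as the question intended:

```r
aq <- airquality  # work on a copy; no attach() needed

# A data frame divided by a length-nrow vector: each column of aq[, 2:4]
# is divided element-wise by the Ozone column.
aq[, 2:4] <- aq[, 2:4] / aq[, 1]

sapply(aq, class)  # plain atomic columns, no nested data frames

# Ratio columns against Ozone; NA values are simply not drawn
matplot(x = aq[, 1], y = as.matrix(aq[, 2:4]), pch = 1:3,
        xlab = "Ozone", ylab = "column / Ozone")
```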
In general it's safer to use [, ...] indexing rather than [] indexing to refer to columns of a data frame unless you really know what you're doing ...

Split and unsplit a dataframe in four parts

I'd like to split a data frame into 4 equal parts, because I'd like to use the 4 cores of my computer.
I did this:
df2 <- split(df, 1:4)
unsplit(df2, f=1:4)
and this:
df2 <- split(df, 1:4)
unsplit(df2, f=c('1','2','3','4'))
But unsplit() did not work; I get these warning messages:
1: In split.default(seq_along(x), f, drop = drop, ...) :
data length is not a multiple of split variable
...
Any idea what the reason is?

How many rows are in df? You will get that warning if the number of rows in your table is not divisible by 4. I think you are using the split factor f incorrectly, unless what you want to do is put each subsequent row into a different split data.frame.
If you really want to split your data into 4 data frames, one row after the other, then make your splitting factor the same length as the number of rows in your data frame, using rep_len like this:
## Split like this:
split(df , f = rep_len(1:4, nrow(df) ) )
## Unsplit like this:
unsplit( split(df , f = rep_len(1:4, nrow(df) ) ) , f = rep_len(1:4,nrow(df) ) )
Hopefully this example illustrates why the error occurs and how to avoid it (i.e. use a proper splitting factor!).
## Want to split our data.frame into two halves, but rows not divisible by 2
df <- data.frame( x = runif(5) )
df
## Splitting still works but...
## we get a warning because nrow(df) (5) is not a multiple of length(f) (2), so 'f' is recycled unevenly
split( df , f = 1:2 )
#$`1`
# x
#1 0.6970968
#3 0.5614762
#5 0.5910995
#$`2`
# x
#2 0.6206521
#4 0.1798006
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
## Instead let's use the same split levels (1:2)...
## but make it equal to the length of the rows in the table:
splt <- rep_len( 1:2 , nrow(df) )
splt
#[1] 1 2 1 2 1
## Split works, and f is not recycled because there are
## the same number of values in 'f' as rows in the table
split( df , f = splt )
#$`1`
# x
#1 0.6970968
#3 0.5614762
#5 0.5910995
#$`2`
# x
#2 0.6206521
#4 0.1798006
## And unsplitting then works as expected and reconstructs our original data.frame
unsplit( split( df , f = splt ) , f = splt )
# x
#1 0.6970968
#2 0.6206521
#3 0.5614762
#4 0.1798006
#5 0.5910995
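Since the goal is to hand one chunk to each of the 4 cores, contiguous blocks of rows may be more convenient than interleaved rows. A sketch that uses cut() on the row index to build a grouping with 4 levels (the toy df is hypothetical); because f has exactly one value per row, unsplit() can reassemble the rows without warnings:

```r
# Hypothetical toy data frame with 10 rows (not divisible by 4)
df <- data.frame(x = 1:10, y = letters[1:10])

# One group label (1 to 4) per row, in contiguous runs
f <- cut(seq_len(nrow(df)), breaks = 4, labels = FALSE)
f

parts <- split(df, f)
sapply(parts, nrow)  # chunk sizes; here they differ by at most one row

restored <- unsplit(parts, f)
```

Each element of parts could then be passed to a worker, for example with parLapply() or mclapply() from the parallel package.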

Consider the example from R's documentation for 'split':
aq <- airquality
g <- aq$Month
l <- split(aq,g)
After the 'scale' function is executed
l <- lapply(l, transform, Ozone = scale(Ozone))
I am guessing that at one time in R history the function 'scale' did not add extra attributes to the column it modifies:
..$ Ozone : num ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
As seen here:
> str(l)
List of 5
$ 5:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] 0.782 0.557 -0.523 -0.253 NA ...
.. ..- attr(*, "scaled:center")= num 23.6
.. ..- attr(*, "scaled:scale")= num 22.2
..$ Solar.R: int [1:31] 190 118 149 313 NA NA 299 99 19 194 ...
..$ Wind : num [1:31] 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
..$ Temp : int [1:31] 67 72 74 62 56 66 65 59 61 69 ...
..$ Month : int [1:31] 5 5 5 5 5 5 5 5 5 5 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 6:'data.frame': 30 obs. of 6 variables:
..$ Ozone : num [1:30, 1] NA NA NA NA NA ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
..$ Solar.R: int [1:30] 286 287 242 186 220 264 127 273 291 323 ...
..$ Wind : num [1:30] 8.6 9.7 16.1 9.2 8.6 14.3 9.7 6.9 13.8 11.5 ...
..$ Temp : int [1:30] 78 74 67 84 85 79 82 87 90 87 ...
..$ Month : int [1:30] 6 6 6 6 6 6 6 6 6 6 ...
..$ Day : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
$ 7:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] 2.399 -0.32 -0.857 NA 0.154 ...
.. ..- attr(*, "scaled:center")= num 59.1
.. ..- attr(*, "scaled:scale")= num 31.6
..$ Solar.R: int [1:31] 269 248 236 101 175 314 276 267 272 175 ...
..$ Wind : num [1:31] 4.1 9.2 9.2 10.9 4.6 10.9 5.1 6.3 5.7 7.4 ...
..$ Temp : int [1:31] 84 85 81 84 83 83 88 92 92 89 ...
..$ Month : int [1:31] 7 7 7 7 7 7 7 7 7 7 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 8:'data.frame': 31 obs. of 6 variables:
..$ Ozone : num [1:31, 1] -0.528 -1.284 -1.108 0.455 -0.629 ...
.. ..- attr(*, "scaled:center")= num 60
.. ..- attr(*, "scaled:scale")= num 39.7
..$ Solar.R: int [1:31] 83 24 77 NA NA NA 255 229 207 222 ...
..$ Wind : num [1:31] 6.9 13.8 7.4 6.9 7.4 4.6 4 10.3 8 8.6 ...
..$ Temp : int [1:31] 81 81 82 86 85 87 89 90 90 92 ...
..$ Month : int [1:31] 8 8 8 8 8 8 8 8 8 8 ...
..$ Day : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
$ 9:'data.frame': 30 obs. of 6 variables:
..$ Ozone : num [1:30, 1] 2.674 1.928 1.721 2.467 0.644 ...
.. ..- attr(*, "scaled:center")= num 31.4
.. ..- attr(*, "scaled:scale")= num 24.1
..$ Solar.R: int [1:30] 167 197 183 189 95 92 252 220 230 259 ...
..$ Wind : num [1:30] 6.9 5.1 2.8 4.6 7.4 15.5 10.9 10.3 10.9 9.7 ...
..$ Temp : int [1:30] 91 92 93 93 87 84 80 78 75 73 ...
..$ Month : int [1:30] 9 9 9 9 9 9 9 9 9 9 ...
..$ Day : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
But now it does add those attributes
..$ Ozone : num ...
.. ..- attr(*, "scaled:center")= num 29.4
.. ..- attr(*, "scaled:scale")= num 18.2
and the very simple 'unsplit' function is not programmed to handle those attributes.
> unsplit(l,g)
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long
The (direct and simple) solution is to get rid of those attributes.
# strip the attributes from the Ozone column of every piece
for (i in seq_along(l)) attributes(l[[i]]$Ozone) <- NULL
Then try to unsplit again.
> str( unsplit(l,g) )
'data.frame': 153 obs. of 6 variables:
$ Ozone : num 0.782 0.557 -0.523 -0.253 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
So, now it works.
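Rather than stripping the attributes afterwards, you can avoid them up front: as.vector() drops both the one-column matrix dimension (the num [1:31, 1] in the str() output above) and the scaled:* attributes. A minimal sketch (aq_scaled is an illustrative name):

```r
aq <- airquality
g <- aq$Month
l <- split(aq, g)

# as.vector() turns scale()'s one-column matrix back into a plain numeric
# vector, dropping "dim", "scaled:center" and "scaled:scale" in one step
l <- lapply(l, transform, Ozone = as.vector(scale(Ozone)))

aq_scaled <- unsplit(l, g)
str(aq_scaled)
```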
Andre Mikulec
