melt not working when upgrading from reshape to reshape2
I have a large list of values. Here is the summary (lots of columns):
List of 46
$ Date: Date[1:9], format: "2011-03-04" ...
$ 1 : num [1:9] 20278 19493 20587 24679 55708 ...
$ 2 : num [1:9] 24029 25317 25103 28871 79423 ...
$ 3 : num [1:9] 6657 7025 6603 8105 17883 ...
$ 4 : num [1:9] 29684 27555 28956 31504 73638 ...
$ 5 : num [1:9] 9572 8759 9947 11173 22341 ...
$ 6 : num [1:9] 18935 20168 22963 24387 58640 ...
$ 7 : num [1:9] 8299 8297 10484 10211 19277 ...
$ 8 : num [1:9] 14365 13691 13906 17149 38364 ...
$ 9 : num [1:9] 10333 10899 9708 11297 24100 ...
$ 10 : num [1:9] 33647 33455 35327 49031 128927 ...
$ 11 : num [1:9] 15090 16105 16343 18624 53809 ...
$ 12 : num [1:9] 17971 16408 15911 18350 44048 ...
$ 13 : num [1:9] 36820 44024 52026 62491 142186 ...
$ 14 : num [1:9] 27036 33240 39248 53035 148606 ...
$ 15 : num [1:9] 11490 11704 12587 17840 50201 ...
$ 16 : num [1:9] 11016 11768 13711 13323 21258 ...
$ 17 : num [1:9] 19792 18734 20477 30433 66028 ...
$ 18 : num [1:9] 19920 20316 21285 29360 88008 ...
$ 19 : num [1:9] 17046 19281 19610 30376 80302 ...
$ 20 : num [1:9] 32886 38971 44672 53278 141423 ...
$ 21 : num [1:9] 11324 13211 13123 15510 32014 ...
$ 22 : num [1:9] 21416 23530 25978 37096 94035 ...
$ 23 : num [1:9] 29527 33310 32701 42628 112442 ...
$ 24 : num [1:9] 19479 19181 20525 25210 69559 ...
$ 25 : num [1:9] 20727 20620 22190 29052 59528 ...
$ 26 : num [1:9] 16056 15122 15240 17327 39292 ...
$ 27 : num [1:9] 19020 28919 29659 43806 94475 ...
$ 28 : num [1:9] 19041 15803 15940 20319 49065 ...
$ 29 : num [1:9] 15775 15080 17841 21492 49891 ...
$ 30 : num [1:9] 9554 10395 9605 11513 13558 ...
$ 31 : num [1:9] 15322 16603 16348 17228 32973 ...
$ 32 : num [1:9] 19752 21591 21272 24639 52204 ...
$ 33 : num [1:9] 2017 2109 1944 1899 2224 ...
$ 34 : num [1:9] 18797 18496 17514 20066 39702 ...
$ 35 : num [1:9] 14306 13489 14507 18560 51028 ...
$ 36 : num [1:9] 2247 2558 2232 2401 2931 ...
$ 37 : num [1:9] 10971 10779 10272 11788 17386 ...
$ 38 : num [1:9] 6241 6414 6024 6291 8257 ...
$ 39 : num [1:9] 16933 18888 20160 25847 60786 ...
$ 40 : num [1:9] 18254 17638 17956 20265 43778 ...
$ 41 : num [1:9] 18249 19955 20016 25647 53012 ...
$ 42 : num [1:9] 9917 10655 10194 10354 15472 ...
$ 43 : num [1:9] 6561 6903 6941 6174 14034 ...
$ 44 : num [1:9] 5857 5968 6283 7645 9861 ...
$ 45 : num [1:9] 17185 18197 19508 26187 67014 ...
- attr(*, "row.names")= int [1:9] 1 2 3 4 5 6 7 8 9
- attr(*, "idvars")= chr "Date"
- attr(*, "rdimnames")=List of 2
..$ :'data.frame': 9 obs. of 1 variable:
.. ..$ Date: Date[1:9], format: "2011-03-04" ...
..$ :'data.frame': 45 obs. of 1 variable:
.. ..$ Store: num [1:45] 1 2 3 4 5 6 7 8 9 10 ...
With the original reshape library I am able to melt it down without issue:
'data.frame': 405 obs. of 3 variables:
$ Date : Date, format: "2011-03-04" ...
$ value: num 20278 19493 20587 24679 55708 ...
$ Store: num 1 1 1 1 1 1 1 1 1 2 ...
However, when I try to use melt from reshape2, I get the following warning and error:
attributes are not identical across measure variables; they will be dropped
Error: `by` must be supplied when `x` and `y` have no common variables.
What changed between versions? Any suggestions for a fix? I'm stuck using reshape2 for this. Thanks!
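The warning hints at the cause: the object above is a reshape-era cast_df (note the idvars and rdimnames attributes), and its Date column has a different class from the numeric measure columns, so reshape2's melt() drops the attributes and then cannot work out the id variable on its own. A minimal sketch of the usual fix, with a small toy data frame standing in for the real 45-store object: strip the old class with as.data.frame() and name the id column explicitly.

```r
library(reshape2)

# Toy stand-in for the real cast_df: one Date column plus numeric store columns
x <- data.frame(Date = as.Date("2011-03-04") + 0:8,
                `1` = rnorm(9), `2` = rnorm(9),
                check.names = FALSE)

# Drop any reshape-era class/attributes and state the id variable explicitly,
# so melt() never has to guess which columns are measure variables
molten <- melt(as.data.frame(x), id.vars = "Date", variable.name = "Store")
str(molten)  # 18 obs. of 3 variables: Date, Store, value
```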
Related
I have the following object containing many nested lists, and I need to convert all of them into data frames using R...
glimpse(pickle_data)
List of 32
$ 2020-02-01:List of 11
..$ model :List of 6
.. ..$ : num [1:88, 1:100] 0.00487 0.13977 -0.07648 0.18417 -0.1105 ...
.. ..$ : num [1:25, 1:100] -0.186 0.0703 0.1479 0.0321 0.1185 ...
.. ..$ : num [1:100(1d)] 0.0119 0.0457 0.023 0.0295 0.0115 ...
.. ..$ : num [1:25, 1:132] -0.024 0.0756 -0.0724 -0.1112 -0.1974 ...
.. ..$ : num [1:33, 1:132] 0.1904 0.1275 0.0684 0.1707 0.017 ...
.. ..$ : num [1:132(1d)] 0.0434 0.0636 0.0444 0.0329 0.0393 ...
..$ X_train : num [1:2494, 1:13, 1:88] 0.0676 0.0697 0.0717 0.0753 0.0783 ...
..$ X_test : num [1:3180, 1:13, 1:88] 0.0676 0.0697 0.0717 0.0753 0.0783 ...
..$ df_input : feat__price__mean_22_days ... feat__us_area_harvested__slope_66_days
ds ...
2010-01-01 NaN ... NaN
2010-01-04 NaN ... NaN
2010-01-05 NaN ... NaN
2010-01-06 NaN ... NaN
2010-01-07 NaN ... NaN
... ... ... ...
2022-09-13 0.699482 ... 0.157974
2022-09-14 0.705994 ... 0.163528
2022-09-15 0.713729 ... 0.171177
2022-09-16 0.722944 ... 0.176913
2022-09-19 0.728798 ... 0.184181
[3317 rows x 88 columns]
..$ index_test :DatetimeIndex(['2010-07-13', '2010-07-14', '2010-07-15', '2010-07-16',
'2010-07-19', '2010-07-20', '2010-07-21', '2010-07-22',
'2010-07-23', '2010-07-26',
...
'2022-09-06', '2022-09-07', '2022-09-08', '2022-09-09',
'2022-09-12', '2022-09-13', '2022-09-14', '2022-09-15',
'2022-09-16', '2022-09-19'],
dtype='datetime64[ns]', name='ds', length=3180, freq=None)
..$ window : int 12
..$ shift_coeff : int 4
..$ target_frequency : int 6
..$ nb_of_months_to_predict : int 9
..$ spine_params :List of 3
.. ..$ month_duration: int 22
.. ..$ week_duration : int 5
.. ..$ min_periods : int 7
..$ neural_network_hyperparams:List of 10
.. ..$ number_of_neurons_first_layer: int 25
.. ..$ l1_regularization : int 0
.. ..$ l2_regularization : int 0
.. ..$ decrease_reg_deep_layers : logi TRUE
.. ..$ decrease_reg_factor : int 10
.. ..$ dropout_rate : num 0.1
.. ..$ use_dropout : logi TRUE
.. ..$ initial_learning_rate : num 0.000151
.. ..$ output_layer_cell : chr "lstm"
.. ..$ gradient_initialization : int 42
$ 2020-03-01:List of 11
..$ model :List of 6
Is it possible, using R, to extract a data frame for the first list, for all the lists nested inside it, and so on through to the end of this object?
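A hedged base-R sketch of one way to do this, assuming the object is named pickle_data (the helper name to_dfs is my own): recurse through the nested lists, turn each matrix leaf into a data frame, and leave scalar parameters untouched.

```r
# Recursively convert matrix leaves of a nested list into data frames
to_dfs <- function(x) {
  if (is.matrix(x)) return(as.data.frame(x))
  if (is.list(x) && !is.data.frame(x)) return(lapply(x, to_dfs))
  x  # scalars, vectors, and existing data frames pass through unchanged
}

# Toy stand-in mirroring the structure shown above
pickle_data <- list(`2020-02-01` = list(model  = list(matrix(rnorm(6), 2)),
                                        window = 12L))
dfs <- to_dfs(pickle_data)
is.data.frame(dfs[["2020-02-01"]]$model[[1]])  # TRUE
```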
This is the first time I have performed KDE in R for anomaly detection on data with more than 5 variables.
As far as I know, KDE works for multidimensional data, but I couldn't find any examples that use more than 5 dimensions.
I'm using data with the 5 variables 'age', 'trestbps', 'chol', 'thalach', and 'oldpeak', as shown below.
'data.frame': 176 obs. of 5 variables:
$ age : int 30 50 50 50 50 60 50 40 50 40 ...
$ trestbps: int 130 130 130 130 130 130 130 130 130 130 ...
$ chol : int 198 245 221 288 205 309 240 243 289 250 ...
$ thalach : int 130 166 164 159 184 131 154 152 124 179 ...
$ oldpeak : num 1.6 2.4 0 0.2 0 1.8 0.6 0 1 0 ...
I performed KDE on this data with the approach below, but I'm not sure whether it is the correct approach or gives a proper result.
library(ks)  # for kde()

# Evaluate the KDE at every combination of the marginal quantiles
evpts <- do.call(expand.grid, lapply(df3, quantile, prob = c(0.1, 0.25, 0.5, 0.75, 0.9)))
hat2 <- kde(df3, eval.points = evpts)
> str(hat2)
List of 9
$ x : num [1:176, 1:5] 30 50 50 50 50 60 50 40 50 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "age" "trestbps" "chol" "thalach" ...
$ eval.points:'data.frame': 3125 obs. of 5 variables:
..$ age : Named num [1:3125] 40 40 50 60 60 40 40 50 60 60 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "25%" "50%" "75%" ...
..$ trestbps: Named num [1:3125] 108 108 108 108 108 112 112 112 112 112 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ chol : Named num [1:3125] 194 194 194 194 194 194 194 194 194 194 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ thalach : Named num [1:3125] 114 114 114 114 114 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ oldpeak : Named num [1:3125] 0 0 0 0 0 0 0 0 0 0 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..- attr(*, "out.attrs")=List of 2
.. ..$ dim : Named int [1:5] 5 5 5 5 5
.. .. ..- attr(*, "names")= chr [1:5] "age" "trestbps" "chol" "thalach" ...
.. ..$ dimnames:List of 5
.. .. ..$ age : chr [1:5] "age=40" "age=40" "age=50" "age=60" ...
.. .. ..$ trestbps: chr [1:5] "trestbps=108" "trestbps=112" "trestbps=120" "trestbps=128" ...
.. .. ..$ chol : chr [1:5] "chol=194.00" "chol=211.00" "chol=244.00" "chol=283.75" ...
.. .. ..$ thalach : chr [1:5] "thalach=113.50" "thalach=128.25" "thalach=150.00" "thalach=164.00" ...
.. .. ..$ oldpeak : chr [1:5] "oldpeak=0.0" "oldpeak=0.0" "oldpeak=0.8" "oldpeak=1.8" ...
$ estimate : Named num [1:3125] 5.64e-12 5.64e-12 2.85e-09 7.76e-10 7.76e-10 ...
..- attr(*, "names")= chr [1:3125] "1" "2" "3" "4" ...
$ H : num [1:5, 1:5] 6.972 0.866 5.065 -6.541 0.189 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:5] "age" "trestbps" "chol" "thalach" ...
$ w : num [1:176] 1 1 1 1 1 1 1 1 1 1 ...
$ type : chr "kde"
- attr(*, "class")= chr "kde"
If this is not a proper approach, could you please help me find the correct one?
Thank you for your support.
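For what it's worth, a common pattern for KDE-based anomaly detection is to evaluate the fitted density at the observations themselves rather than at a quantile grid, and flag the lowest-density points. A sketch under that assumption (a toy df3 stands in for the real 176 x 5 data, and the 5% cut-off is arbitrary):

```r
library(ks)

set.seed(1)
df3 <- as.data.frame(matrix(rnorm(176 * 5), ncol = 5))
names(df3) <- c("age", "trestbps", "chol", "thalach", "oldpeak")

hat <- kde(df3, eval.points = df3)   # density estimate at each observation
scores <- hat$estimate
anomalies <- which(scores < quantile(scores, 0.05))  # lowest-density points
```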
I'm unable to call plot() on a randomForest object loaded from an RData file.
library("randomForest")
load("rf.RData")
plot(rf)
I get the error:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
I get the same error when I call randomForest:::plot.randomForest(rf).
Other function calls on rf work just fine.
EDIT:
See output of str(rf)
str(rf)
List of 15
$ call : language randomForest(x = data[, match("feat1", names(data)):match("feat_n", names(data))], y = data[, match("my_y", n| __truncated__ ...
$ type : chr "regression"
$ predicted : Named num [1:723012] -1141 -1767 -1577 NA -1399 ...
..- attr(*, "names")= chr [1:723012] "1" "2" "3" "4" ...
$ oob.times : int [1:723012] 3 4 6 3 2 3 2 6 7 5 ...
$ importance : num [1:150, 1:2] 6172 928 6367 5754 1013 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
.. ..$ : chr [1:2] "%IncMSE" "IncNodePurity"
$ importanceSD : Named num [1:150] 400.9 96.7 500.1 428.9 194.8 ...
..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
$ localImportance: NULL
$ proximity : NULL
$ ntree : num 60
$ mtry : num 10
$ forest :List of 11
..$ ndbigtree : int [1:60] 392021 392219 392563 392845 393321 392853 392157 392709 393223 392679 ...
..$ nodestatus : num [1:393623, 1:60] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
..$ leftDaughter : num [1:393623, 1:60] 2 4 6 8 10 12 14 16 18 20 ...
..$ rightDaughter: num [1:393623, 1:60] 3 5 7 9 11 13 15 17 19 21 ...
..$ nodepred : num [1:393623, 1:60] -8.15 -31.38 5.62 -59.87 -16.06 ...
..$ bestvar : num [1:393623, 1:60] 118 57 82 77 65 148 39 39 12 77 ...
..$ xbestsplit : num [1:393623, 1:60] 1.08e+02 -8.26e+08 -2.50 8.55e+03 1.20e+04 ...
..$ ncat : Named int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
..$ nrnodes : int 393623
..$ ntree : num 60
..$ xlevels :List of 150
.. ..$ feat1 : num 0
.. ..$ feat2 : num 0
.. ..$ feat3 : num 0
.. ..$ feat4 : num 0
.. ..$ featn : num 0
.. .. [list output truncated]
$ coefs : NULL
$ y : num [1:723012] -1885 -1918 -1585 -1838 -2035 ...
$ test : NULL
$ inbag : NULL
- attr(*, "class")= chr "randomForest"
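A hedged guess from the str() output: plot.randomForest() draws rf$mse (for regression) or rf$err.rate (for classification), and neither component appears in the list above, so array() receives NULL. Objects that went through randomForest::combine() or grow() are known to lose these per-tree error vectors. A quick check, sketched on a toy forest:

```r
library(randomForest)

set.seed(42)
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 30)

# plot(rf) needs the per-tree OOB error; guard against its absence
if (is.null(rf$mse) && is.null(rf$err.rate)) {
  message("No per-tree error stored in this object; plot(rf) would fail")
} else {
  plot(rf)  # regression forest: plots rf$mse against the number of trees
}
```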
library(gains)

sortedSL <- sort(iris$Sepal.Length)
gainsV <- NULL
splitV <- NULL
for (i in 1:NROW(sortedSL)) {
  splitVal <- sortedSL[i]
  iris$new <- 0
  if (sum(iris$Sepal.Length > splitVal) > 0) {
    iris[iris$Sepal.Length > splitVal, ]$new <- 1
  }
  gainsV <- c(gainsV, gains(iris$Sepal.Length, iris$new))
  splitV <- c(splitV, splitVal)
}
finalSplitV <- splitV[which.max(gainsV)]
I typed in the code above but got this error message:
Error in which.max(gainsV) :
(list) object cannot be coerced to type 'double'
Please advise. Thanks.
This should admittedly be a comment, but the formatting options for comments are too limited to display the results of str() on complex objects. You are getting errors, but you are also ignoring what I suspect are important warnings:
There were 50 or more warnings (use warnings() to see the first 50)
... coming from this line of code:
gainsV<-c(gainsV,gains(iris$Sepal.Length, iris$new))
Looking at
str(gainsV)
#============
List of 2265
$ depth : num [1:2] 99 100
$ obs : int [1:2] 149 1
$ cume.obs : int [1:2] 149 150
$ mean.resp : num [1:2] 5.85 4.3
$ cume.mean.resp : num [1:2] 5.85 5.84
$ cume.pct.of.total: num [1:2] 0.995 1
$ lift : num [1:2] 100 74
$ cume.lift : num [1:2] 100 100
$ mean.prediction : num [1:2] 1 0
$ min.prediction : num [1:2] 1 0
$ max.prediction : num [1:2] 1 0
$ conf : chr "none"
$ optimal : logi FALSE
$ num.groups : int 2
$ percents : logi FALSE
$ depth : num [1:2] 97 100
$ obs : int [1:2] 146 4
$ cume.obs : int [1:2] 146 150
$ mean.resp : num [1:2] 5.88 4.38
$ cume.mean.resp : num [1:2] 5.88 5.84
$ cume.pct.of.total: num [1:2] 0.98 1
$ lift : num [1:2] 101 75
$ cume.lift : num [1:2] 101 100
$ mean.prediction : num [1:2] 1 0
$ min.prediction : num [1:2] 1 0
$ max.prediction : num [1:2] 1 0
$ conf : chr "none"
$ optimal : logi FALSE
$ num.groups : int 2
$ percents : logi FALSE
$ depth : num [1:2] 97 100
$ obs : int [1:2] 146 4
$ cume.obs : int [1:2] 146 150
$ mean.resp : num [1:2] 5.88 4.38
#### output at your console would continue for much longer
I'm suspicious that this is not the structure you had in mind, but since the goals of this effort were not described, I cannot know this with any degree of certainty.
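To make which.max() work, gainsV has to hold one number per split rather than whole "gains" objects (each call to gains() returns a list of 15 components, which c() splices into gainsV). A sketch of one repair, assuming the top-group cumulative lift is the criterion you meant to maximize; that choice is mine, purely for illustration:

```r
library(gains)

sortedSL <- sort(unique(iris$Sepal.Length))
gainsV <- numeric(0)
splitV <- numeric(0)

for (splitVal in sortedSL) {
  flag <- as.integer(iris$Sepal.Length > splitVal)
  if (sum(flag) == 0) next                              # skip an empty upper group
  g <- suppressWarnings(gains(iris$Sepal.Length, flag)) # a "gains" object (a list)
  gainsV <- c(gainsV, g$cume.lift[1])                   # keep ONE number per split
  splitV <- c(splitV, splitVal)
}

finalSplitV <- splitV[which.max(gainsV)]
```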
One of the best ways to make a question reproducible is to use one of the built-in data sets. Using data() alone, however, is frustrating because it provides no information about the structure of the data sets.
How can I quickly view the structure of available data sets?
The following function may help:
dataStr <- function(fun = function(x) TRUE)
  str(
    Filter(
      fun,
      Filter(
        Negate(is.null),  # drop data sets that could not be fetched
        mget(data()$results[, "Item"], inherits = TRUE, ifnotfound = list(NULL))
      )
    )
  )
It accepts a filtering function, applies it to all the data sets, and prints out the structure of the matching data sets. For example, if we're looking for matrices:
> dataStr(is.matrix)
List of 8
$ WorldPhones : num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
.. ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...
$ occupationalStatus : 'table' int [1:8, 1:8] 50 16 12 11 2 12 0 0 19 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ origin : chr [1:8] "1" "2" "3" "4" ...
.. ..$ destination: chr [1:8] "1" "2" "3" "4" ...
$ volcano : num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
--- 5 entries omitted ---
Or for data frames (also omitting entries):
> dataStr(is.data.frame)
List of 42
$ BOD :'data.frame': 6 obs. of 2 variables:
..$ Time : num [1:6] 1 2 3 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..- attr(*, "reference")= chr "A1.4, p. 270"
$ CO2 :Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 84 obs. of 5 variables:
..$ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
..$ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
..$ conc : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
..$ uptake : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
--- 40 entries omitted ---
Or even for simple vectors:
> dataStr(function(x) is.atomic(x) && is.vector(x) && !is.ts(x))
List of 4
$ euro : Named num [1:11] 13.76 40.34 1.96 166.39 5.95 ...
..- attr(*, "names")= chr [1:11] "ATS" "BEF" "DEM" "ESP" ...
$ islands: Named num [1:48] 11506 5500 16988 2968 16 ...
..- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
$ precip : Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
..- attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
$ rivers : num [1:141] 735 320 325 392 524 ...