Wrong data type conversion after melt - r

I get a numeric to integer64 type conversion after melting a data.table object in R.
Given the file stats.txt, tab separated:
id x y
A 283726709252 0.1
B 288604342155 0.2
C 329048184196 0.3
D 192107948937 0.4
I want to read it into a data.table and melt it. So:
library(data.table)
stats<- fread('stats.txt')
stats
id x y
1: A 283726709252 0.1
2: B 288604342155 0.2
3: C 329048184196 0.3
4: D 192107948937 0.4
str(stats)
Classes ‘data.table’ and 'data.frame': 4 obs. of 3 variables:
$ id: chr "A" "B" "C" "D"
$ x :integer64 283726709252 288604342155 329048184196 192107948937
$ y : num 0.1 0.2 0.3 0.4
- attr(*, ".internal.selfref")=<externalptr>
So far so good. Now if I melt it, I get the y variable converted from numeric to integer64:
xm<- melt.data.table(data= stats, id.vars= 'id')
xm
id variable value
1: A x 283726709252
2: B x 288604342155
3: C x 329048184196
4: D x 192107948937
5: A y 4591870180066957722
6: B y 4596373779694328218
7: C y 4599075939470750515
8: D y 4600877379321698714
str(xm)
Classes ‘data.table’ and 'data.frame': 8 obs. of 3 variables:
$ id : chr "A" "B" "C" "D" ...
$ variable: Factor w/ 2 levels "x","y": 1 1 1 1 2 2 2 2
$ value :integer64 283726709252 288604342155 329048184196 192107948937 4591870180066957722 4596373779694328218 4599075939470750515 4600877379321698714
- attr(*, ".internal.selfref")=<externalptr>
Is this a bug or am I doing something wrong?
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.2.1 data.table_1.10.4-3
loaded via a namespace (and not attached):
[1] compiler_3.4.1 colorspace_1.3-2 scales_0.5.0 lazyeval_0.2.0 plyr_1.8.4 gtable_0.2.0 tibble_1.3.3 Rcpp_0.12.12 grid_3.4.1 rlang_0.1.1 munsell_0.4.3
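The corrupted values look like the raw bit patterns of the doubles: 4591870180066957722 is the 64-bit representation of 0.1 read as an integer. One possible workaround, sketched under the assumption that the x values fit exactly in a double (they do here, being far below 2^53), is to read the column as double instead of integer64 so that melt() combines two ordinary numeric columns. The temporary file below only stands in for stats.txt.

```r
library(data.table)

# Recreate the tab-separated example file in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("id\tx\ty",
             "A\t283726709252\t0.1",
             "B\t288604342155\t0.2",
             "C\t329048184196\t0.3",
             "D\t192107948937\t0.4"), path)

# Ask fread() to store large integers as double rather than integer64
stats <- fread(path, integer64 = "double")

# Both measure columns are now plain numeric, so no reinterpretation can occur
xm <- melt(stats, id.vars = "id")
str(xm$value)
```

If integer64 precision is really needed, another option is to melt x and y separately (one measure.vars at a time) so the two types are never stacked into the same column.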


Is there a way to prevent copy-on-modify when modifying attributes?

I am surprised that a copy of the matrix is made in the following code:
> (m <- matrix(1:12, nrow = 3))
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
> tracemem(m)
[1] "<000001E2FC1E03D0>"
> str(m)
int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
> attr(m, "dim") <- 4:3
tracemem[0x000001e2fc1e03d0 -> 0x000001e2fcb05008]:
> m
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> str(m)
int [1:4, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
Is it useful? Is it avoidable?
EDIT: I do not have the same results as GKi.
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3
> m <- matrix(1:12, nrow = 3)
> tracemem(m)
[1] "<000001F8DB2C7D90>"
> attr(m, "dim") <- c(4, 3)
tracemem[0x000001f8db2c7d90 -> 0x000001f8db2d93f0]:
One difference is that I do not use BLAS library...
I'm using R 3.6.3 and indeed a copy is made. To change an attribute without making a copy, you can use the setattr function of the data.table package:
library(data.table)
m <- matrix(1:12, nrow = 3)
.Internal(inspect(m))
setattr(m, "dim", c(4L,3L))
.Internal(inspect(m))
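A compact way to check the same thing, as a sketch: compare the object's address before and after the change with address(), which is also provided by data.table. The variable names before and after are only illustrative.

```r
library(data.table)

m <- matrix(1:12, nrow = 3)

# Record where the object lives before touching the attribute
before <- address(m)

# setattr() modifies the attribute by reference
setattr(m, "dim", c(4L, 3L))

# Record the address again and compare
after <- address(m)
identical(before, after)
dim(m)
```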
In my case it is not making a copy of the data:
m <- matrix(1:12, nrow = 3)
.Internal(inspect(m))
##250ff98 13 INTSXP g0c4 [REF(1),ATT] (len=12, tl=0) 1,2,3,4,5,...
#ATTRIB:
# #38da270 02 LISTSXP g0c0 [REF(1)]
# TAG: #194d610 01 SYMSXP g0c0 [MARK,REF(1171),LCK,gp=0x4000] "dim" (has value)
# #38c3d88 13 INTSXP g0c1 [REF(65535)] (len=2, tl=0) 3,4
attr(m, "dim") <- 4:3
.Internal(inspect(m))
##250ff98 13 INTSXP g0c4 [REF(1),ATT] (len=12, tl=0) 1,2,3,4,5,...
#ATTRIB:
# #38da270 02 LISTSXP g0c0 [REF(1)]
# TAG: #194d610 01 SYMSXP g0c0 [MARK,REF(1171),LCK,gp=0x4000] "dim" (has value)
# #38d9978 13 INTSXP g0c0 [REF(65535)] 4 : 3 (expanded)
The data vector is at #250ff98 both before and after the assignment. Only the dim attribute changes, from #38c3d88 to #38d9978.
sessionInfo()
#R version 4.0.3 (2020-10-10)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: Debian GNU/Linux 10 (buster)
#
#Matrix products: default
#BLAS: /usr/local/lib/R/lib/libRblas.so
#LAPACK: /usr/local/lib/R/lib/libRlapack.so
#
#locale:
# [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
# [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
# [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C
#[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#loaded via a namespace (and not attached):
#[1] compiler_4.0.3 tools_4.0.3
The same with tracemem.
m <- matrix(1:12, nrow = 3)
tracemem(m)
#[1] "<0x289ff98>"
attr(m, "dim") <- 4:3
tracemem(m)
#[1] "<0x289ff98>"
But if you call str(m) in between, a copy is currently made:
m <- matrix(1:12, nrow = 3)
tracemem(m)
#[1] "<0x28a01c8>"
str(m)
# int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
attr(m, "dim") <- 4:3
#tracemem[0x28a01c8 -> 0x2895608]:

as_tibble() not working as expected

I am attempting an exercise in R for Data Science (7.5.2.1, #2): Use geom_tile() together with dplyr to explore how average flight delays vary by destination and month of year. What makes the plot difficult to read? How could you improve it?
First, transmute columns.
library(nycflights13)
foo <- nycflights13::flights %>%
transmute(tot_delay = dep_delay + arr_delay, m = month, d = dest) %>%
filter(!is.na(tot_delay)) %>%
group_by(m, d) %>%
summarise(avg_delay = mean(tot_delay))
Now foo appears to be a data frame based on the 'Source' output.
> foo
Source: local data frame [1,112 x 3]
Groups: m [?]
m d avg_delay
<int> <chr> <dbl>
1 1 ALB 76.571429
2 1 ATL 8.567982
3 1 AUS 19.017751
4 1 AVL 49.000000
5 1 BDL 32.081081
6 1 BHM 47.043478
7 1 BNA 25.930233
8 1 BOS 2.698517
9 1 BQN 8.516129
10 1 BTV 18.393665
# ... with 1,102 more rows
It doesn't appear that as_tibble() is working. What could I be doing wrong?
> as_tibble(foo)
Source: local data frame [1,112 x 3]
Groups: m [?]
m d avg_delay
<int> <chr> <dbl>
1 1 ALB 76.571429
2 1 ATL 8.567982
3 1 AUS 19.017751
4 1 AVL 49.000000
5 1 BDL 32.081081
6 1 BHM 47.043478
7 1 BNA 25.930233
8 1 BOS 2.698517
9 1 BQN 8.516129
10 1 BTV 18.393665
# ... with 1,102 more rows
Shouldn't the internals be different for a tibble?
> str(foo)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1112 obs. of 3 variables:
$ m : int 1 1 1 1 1 1 1 1 1 1 ...
$ d : chr "ALB" "ATL" "AUS" "AVL" ...
$ avg_delay: num 76.57 8.57 19.02 49 32.08 ...
- attr(*, "vars")=List of 1
..$ : symbol m
- attr(*, "drop")= logi TRUE
> str(as_tibble(foo))
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1112 obs. of 3 variables:
$ m : int 1 1 1 1 1 1 1 1 1 1 ...
$ d : chr "ALB" "ATL" "AUS" "AVL" ...
$ avg_delay: num 76.57 8.57 19.02 49 32.08 ...
- attr(*, "vars")=List of 1
..$ : symbol m
- attr(*, "drop")= logi TRUE
Note that is_tibble() works as expected:
> packageDescription("tibble")
Package: tibble
Encoding: UTF-8
Version: 1.3.0
> is_tibble(foo)
[1] TRUE
Works for me: foo is a "tibble" and is announced as "A tibble: 1,112 x 3" in the print:
> foo
Source: local data frame [1,112 x 3]
Groups: m [?]
# A tibble: 1,112 x 3
m d avg_delay
<int> <chr> <dbl>
1 1 ALB 76.571429
2 1 ATL 8.567982
So you possibly have an old version of dplyr. Mine is:
> packageDescription("dplyr")
Package: dplyr
Type: Package
Version: 0.5.0
And everything else:
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.5.0 tibble_1.3.1
loaded via a namespace (and not attached):
[1] magrittr_1.5 R6_2.2.0 assertthat_0.2.0 DBI_0.5-1
[5] tools_3.3.1 Rcpp_0.12.11 rlang_0.1.1
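The str() output in the question already lists 'tbl_df' among the classes, so foo is a tibble before as_tibble() is called; the extra 'grouped_df' class comes from group_by(). A minimal sketch of checking this and dropping the grouping with ungroup(), using the built-in mtcars data as a stand-in for the flights data (the column names here are only illustrative):

```r
library(dplyr)

# Grouped summary, analogous to foo in the question
foo <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(avg_mpg = mean(mpg))

# The result is already a tibble, with grouping information on top
class(foo)
inherits(foo, "tbl_df")

# ungroup() removes the grouping and leaves a plain tibble
bar <- ungroup(foo)
class(bar)
```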

How to save a CSV file with R with line breaks that Notepad will recognize?

Sorry to bother you with what is probably an encoding question. After spending a couple of hours without finding a solution, I decided to post it here.
I'm trying, unsuccessfully, to write a simple table using write.table, write.csv, or write.csv2 from Ubuntu 14.04. My data is kind of messy, as it comes from a cronjob:
ID <- c("",30,26,20,30,40,5,10,4)
b <- c("",2233,12,2,22,13,23,23,100)
c <- c("","","","","","","","","")
d <- c("","","","","","","","","")
e <- c("","","","","","800","","","")
f <- c("","","","","","","","","")
g <- c("","","","","","","","EA","")
h <- c("","","","","","","","","")
df <- data.frame(ID,b,c,d,e,f,g,h)
# change columns to chr
for(i in seq_len(ncol(df))) {
df[,i] <- as.character(df[,i])
}
str(df)
# 'data.frame': 9 obs. of 8 variables:
# $ ID: chr "" "30" "26" "20" ...
# $ b : chr "" "2233" "12" "2" ...
# $ c : chr "" "" "" "" ...
# $ d : chr "" "" "" "" ...
# $ e : chr "" "" "" "" ...
# $ f : chr "" "" "" "" ...
# $ g : chr "" "" "" "" ...
# $ h : chr "" "" "" "" ...
head(df,n=9)
# ID b c d e f g h
# 1
# 2 30 2233
# 3 26 12
# 4 20 2
# 5 30 22
# 6 40 13 800
# 7 5 23
# 8 10 23 EA
# 9 4 100
I have tried different combinations and suggestions found on SO, but nothing worked. The result is always displaced somehow: instead of long, it's wide. In the current example it's just one long row.
I tried:
write.table(df,"df.csv",row.names = FALSE, dec=".",sep=";")
write.table(df,"df.csv",row.names = FALSE,dec=".",sep=";", col.names = T)
write.table(df,"df.csv",row.names = FALSE,sep=";",fileEncoding = "UTF-8")
write.table(df,"df.csv",row.names = FALSE,fileEncoding = "UTF-8")
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8
[4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3 DBI_0.4-1 RGA_0.4.2 RMySQL_0.11-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.5 lubridate_1.5.6 digest_0.6.9 assertthat_0.1 R6_2.1.2
[6] plyr_1.8.3 jsonlite_1.0 magrittr_1.5 httr_1.1.0 stringi_1.1.1
[11] curl_0.9.7 tools_3.3.1 stringr_1.0.0 parallel_3.3.1
Wrong output (screenshot):
The same data produces correct output on:
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
The problem isn't R or Ubuntu; it is Notepad. Specifically, Notepad expects "\r\n" for line breaks, whereas most other text readers are happy with "\n", which is the default line break used by the write.xxx functions.
If you add the parameter eol="\r\n" then you should be able to open in Notepad and see the expected line breaks.
For instance:
write.table(df,"df.csv",row.names = FALSE, dec=".",sep=";",eol="\r\n")
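To confirm what actually ends up in the file, the raw bytes can be inspected. This is only a sketch with a small stand-in data frame and a temporary file; it assumes a Linux or macOS session, where R does not translate line endings on its own (on Windows, text-mode output may already add the "\r").

```r
# Small stand-in for the data frame in the question
df <- data.frame(ID = c("30", "26"), b = c("2233", "12"),
                 stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")
write.table(df, path, row.names = FALSE, sep = ";", eol = "\r\n")

# Read the file back as raw bytes and count carriage returns (0x0d)
bytes <- readBin(path, what = "raw", n = file.size(path))
n_cr <- sum(bytes == as.raw(0x0d))
n_lf <- sum(bytes == as.raw(0x0a))

# Every line feed should now be preceded by a carriage return
n_cr == n_lf
```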

Unable to pipe predict() output through filter() to ggplot()

I'm struggling to figure out why I can't use filter() on the results
of predict.gam() and then ggplot() the subset of predictions. I'm not
sure the prediction step is really part of the problem, but that's what
it takes to trigger the error. Just filter() %>% ggplot() with a
dataframe works fine.
library(dplyr)
library(ggplot2)
library(mgcv)
gam1 <- gam(Petal.Length~s(Petal.Width) + Species, data=iris)
nd <- expand.grid(Petal.Width = seq(0,5,0.05),
Species = levels(iris$Species),
stringsAsFactors = FALSE)
predicted <- predict(gam1,newdata=nd)
predicted <- cbind(predicted,nd)
filter(tbl_df(predicted), Species == "setosa") %>%
ggplot(aes(x=Petal.Width, y = predicted)) +
geom_point()
## Error: length(rows) == 1 is not TRUE
But:
filter(tbl_df(predicted), Species == "setosa")
## Source: local data frame [101 x 3]
##
## predicted Petal.Width Species
## (dbl[10]) (dbl) (chr)
## 1 1.294574 0.00 setosa
## 2 1.327482 0.05 setosa
## 3 1.360390 0.10 setosa
## 4 1.393365 0.15 setosa
## 5 1.426735 0.20 setosa
## 6 1.460927 0.25 setosa
## 7 1.496477 0.30 setosa
## 8 1.533949 0.35 setosa
## 9 1.573888 0.40 setosa
## 10 1.616810 0.45 setosa
## .. ... ... ...
And the problem is filter() because:
pick <- predicted$Species == "setosa"
ggplot(predicted[pick,],aes(x=Petal.Width, y = predicted)) +
geom_point()
I've also tried saving the result of filter to an object and using that directly in ggplot() but that has the same error.
Obviously not a crisis, because there's a workaround, but my mental
model of how to use filter() is obviously wrong! Any insights much
appreciated.
Edit: When I first posted this I was still using R 3.2.3 and was getting warnings from ggplot2 and dplyr. So I upgraded to 3.3.0 and it's still happening.
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10586)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mgcv_1.8-12 nlme_3.1-127 ggplot2_2.1.0 dplyr_0.4.3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.3 knitr_1.11 magrittr_1.5 munsell_0.4.2
## [5] colorspace_1.2-6 lattice_0.20-33 R6_2.1.1 stringr_1.0.0
## [9] plyr_1.8.3 tools_3.3.0 parallel_3.3.0 grid_3.3.0
## [13] gtable_0.1.2 DBI_0.3.1 htmltools_0.2.6 lazyeval_0.1.10
## [17] yaml_2.1.13 assertthat_0.1 digest_0.6.8 Matrix_1.2-6
## [21] formatR_1.2 evaluate_0.7.2 rmarkdown_0.9.5 labeling_0.3
## [25] stringi_1.0-1 scales_0.3.0
The problem arises because your predict() call generates a named array, instead of just a numerical vector.
class(predicted$predicted)
# [1] "array"
The first filter() appears to give the correct output, but if you inspect it you will notice that the predicted column is still a 1-d array holding all 303 elements rather than the 101 filtered ones.
str(filter(tbl_df(predicted), Species == "setosa"))
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 101 obs. of 3 variables:
$ predicted : num [1:303(1d)] 1.29 1.33 1.36 1.39 1.43 ...
..- attr(*, "dimnames")=List of 1
.. ..$ : chr "1" "2" "3" "4" ...
$ Petal.Width: num 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 ...
$ Species: chr "setosa" "setosa" "setosa" "setosa" ...
In contrast, good old logical subsetting does the job on all dimensions:
str(predicted[pick,])
'data.frame': 101 obs. of 3 variables:
$ predicted : num [1:101(1d)] 1.29 1.33 1.36 1.39 1.43 ... # Now 101 obs here too
..- attr(*, "dimnames")=List of 1
.. ..$ : chr "1" "2" "3" "4" ...
$ Petal.Width: num 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 ...
$ Species : chr "setosa" "setosa" "setosa" "setosa" ...
So either you coerce the predicted column to numeric:
library(dplyr)
library(ggplot2)
predicted %>% mutate(predicted = as.numeric(predicted)) %>%
filter(Species == "setosa") %>%
ggplot(aes(x = Petal.Width, y = predicted)) +
geom_point()
Or replace filter() by subset():
predicted %>%
subset(Species == "setosa") %>%
ggplot(aes(x = Petal.Width, y = predicted)) +
geom_point()
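Why the coercion helps can be seen on a small hand-made 1-d array, without fitting any model. This is an illustrative sketch in base R only; the values and names are made up to resemble the structure shown in the str() output above.

```r
# A 1-d array with dimnames, similar in structure to the predict() result
p <- array(c(1.29, 1.33, 1.36), dim = 3,
           dimnames = list(c("1", "2", "3")))
dim(p)

# as.numeric() strips dim and dimnames, leaving a plain numeric vector
v <- as.numeric(p)
is.null(dim(v))
is.null(attributes(v))
```

A plain vector like v is the kind of column that filter() subsets row by row without carrying extra attributes along.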

ddply error: Error in attributes(out) <- attributes(col) : 'names' attribute must be the same length as the vector

I am trying to apply ddply on a large data.frame (38000 rows / 10 variables), but I am stuck with an error:
ddply(uncertainty.long, .(Species), "nrow")
returns the error:
Error in attributes(out) <- attributes(col) :
'names' attribute [38000] must be the same length as the vector [3800]
> traceback()
11: FUN(1:10[[5L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: (function (i)
{
piece <- pieces[[i]]
if (.inform) {
res <- try(.fun(piece, ...))
if (inherits(res, "try-error")) {
piece <- paste(capture.output(print(piece)), collapse = "\n")
stop("with piece ", i, ": \n", piece, call. = FALSE)
}
}
else {
res <- .fun(piece, ...)
}
progress$step()
res
})(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(uncertainty.long, .(Species), "nrow")
Some more details about my data.frame:
> head(uncertainty.long)
Stack Variable PARun Model Species value year scenario GCM sp
1 sync_current Total PA1 GLM Arctosafulvolineata 100.0000 NA <NA> <NA> Arctosa\nfulvolineata
2 sync_cgcm2_B2A_2020 Total PA1 GLM Arctosafulvolineata 134.6840 2020 B2A cgcm2 Arctosa\nfulvolineata
3 sync_cgcm2_B2A_2050 Total PA1 GLM Arctosafulvolineata 153.7617 2050 B2A cgcm2 Arctosa\nfulvolineata
4 sync_cgcm2_B2A_2080 Total PA1 GLM Arctosafulvolineata 195.7176 2080 B2A cgcm2 Arctosa\nfulvolineata
5 sync_mk2_B2A_2020 Total PA1 GLM Arctosafulvolineata 172.2967 2020 B2A mk2 Arctosa\nfulvolineata
6 sync_mk2_B2A_2050 Total PA1 GLM Arctosafulvolineata 198.9391 2050 B2A mk2 Arctosa\nfulvolineata
> str(uncertainty.long)
'data.frame': 38000 obs. of 10 variables:
$ Stack : Factor w/ 19 levels "sync_cgcm2_B2A_2020",..: 7 1 2 3 14 15 16 11 12 13 ...
$ Variable: Factor w/ 5 levels "Lost","NetChange",..: 5 5 5 5 5 5 5 5 5 5 ...
$ PARun : Factor w/ 5 levels "PA1","PA2","PA3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Model : Factor w/ 8 levels "CTA","FDA","GAM",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "names")= chr "1" "1" "1" "1" ...
$ value : num 100 135 154 196 172 ...
$ year : num NA 2020 2050 2080 2020 2050 2080 2020 2050 2080 ...
$ scenario: chr NA "B2A" "B2A" "B2A" ...
$ GCM : chr NA "cgcm2" "cgcm2" "cgcm2" ...
$ sp : chr "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" ...
This is my sessionInfo():
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] parallel splines grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 Hmisc_3.12-2 Formula_1.1-1 RCurl_1.95-4.1 bitops_1.0-6 biomod2_3.0.3 pROC_1.5.4 plyr_1.8
[9] rpart_4.1-3 randomForest_4.6-7 mda_0.4-4 class_7.3-9 gbm_2.1 survival_2.37-4 nnet_7.3-7 rasterVis_0.21
[17] hexbin_1.26.2 latticeExtra_0.6-26 RColorBrewer_1.0-5 lattice_0.20-23 abind_1.4-0 raster_2.1-49 sp_1.0-13 ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] cluster_1.14.4 colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 gtable_0.1.2 labeling_0.2 MASS_7.3-29 munsell_0.4.2 proto_0.3-10 scales_0.2.3
[11] stringr_0.6.2 tools_3.0.1 zoo_1.7-10
I have tried to reproduce it with fewer columns (2 columns), but that did not change anything.
However, if I reduce the number of rows, it works when the grouping variable "Species" has only one level present:
> small.df <- uncertainty.long[1:3800, ]
> unique(small.df$Species)
[1] Arctosafulvolineata
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> ddply(small.df, .(Species), "nrow")
Species nrow
1 Arctosafulvolineata 3800
But if I add another row:
> small.df <- uncertainty.long[1:3801, ]
> unique(small.df$Species)
[1] Arctosafulvolineata Argyronetaaquatica
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> small.df[3800:3801, ]
Stack Variable PARun Model Species value year scenario GCM sp
3800 sync_hadcm3_A1B_2080 Lost PA5 MAXENT Arctosafulvolineata -54.90872 2080 A1B hadcm3 Arctosa\nfulvolineata
3801 sync_current Total PA1 GLM Argyronetaaquatica 100.00000 NA <NA> <NA> Argyroneta\naquatica
> ddply(small.df, .(Species), "nrow")
Error in attributes(out) <- attributes(col) :
'names' attribute [3801] must be the same length as the vector [3800]
I have found others with a similar problem : https://stackoverflow.com/a/14162351/2788395.
However, their workaround (reinstalling plyr 1.7 instead of 1.8) did not work for me.
Does anyone have an idea of the problem and/or how to solve it?
Thanks!
Problem solved
The issue was with the "names" attribute of the "Species" column.
I removed them with the following code and ddply worked:
> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
Species nrow
1 Arctosafulvolineata 3800
2 Argyronetaaquatica 3800
3 Dolomedesplantarius 3800
4 Enoplognathamordax 3800
5 Iciussubinermis 3800
6 Neonvalentulus 3800
7 Pardosabifasciata 3800
8 Pardosaoreophila 3800
9 Piratauliginosus 3800
10 Trochosaspinipalpis 3800
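When stripping the names, the right-hand side has to be the NULL object and not a quoted string. A small base R sketch on a toy factor (the object names are hypothetical) shows the difference:

```r
# Toy factor carrying a names attribute, like the Species column
species <- factor(c("a", "a", "b"))
names(species) <- c("1", "1", "2")

# Assigning the string "NULL" keeps a names attribute, padded with NA
kept <- species
names(kept) <- "NULL"
names(kept)

# Assigning the NULL object removes the attribute entirely
cleaned <- species
names(cleaned) <- NULL
names(cleaned)
```

unname(species) would be an equivalent way to get the cleaned version.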
