I am using the ggmcmc package to produce a summary PDF file of rjags output via the ggmcmc() function. However, I get the following error message:
> ggmcmc(x, file = "Model0-output.pdf")
Plotting histograms
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 160, 164
When I check the structure of the input data frame I created with the ggs() function, everything looks correct.
> str(x)
'data.frame': 240000 obs. of 4 variables:
$ Iteration: int 1 2 3 4 5 6 7 8 9 10 ...
$ Chain : int 1 1 1 1 1 1 1 1 1 1 ...
$ Parameter: Factor w/ 32 levels "N[1]","N[2]",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 96 87 76 79 89 95 85 78 86 89 ...
- attr(*, "nChains")= int 3
- attr(*, "nParameters")= int 32
- attr(*, "nIterations")= int 2500
- attr(*, "nBurnin")= num 2000
- attr(*, "nThin")= num 2
- attr(*, "description")= chr "postout0"
- attr(*, "parallel")= logi FALSE
Can anyone help me identify what is causing the error and how I can correct it? Am I missing something obvious?
ggmcmc 0.5.1 computes the number of bins in a different manner than previous versions did. Previous versions relied on ggplot2:::bin, whereas 0.5.1 computes the bins and their binwidth by itself.
It is likely that in your case the range of some of the parameters is so extreme that rounding errors give some of them one more or one fewer bin, thereby producing this error.
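If you cannot change ggmcmc versions, one possible workaround is to skip the histogram plot entirely so the failing bin computation is never reached. A minimal sketch, assuming the individual ggs_* plotting functions documented in the ggmcmc package (check ?ggmcmc for the names your version provides):
# Hedged workaround: build the pdf from individual ggs_* plot functions,
# skipping ggs_histogram(), whose bin computation triggers the error.
pdf("Model0-output.pdf")
print(ggs_density(x))
print(ggs_traceplot(x))
print(ggs_running(x))
print(ggs_autocorrelation(x))
dev.off()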
Issue:
I have a data frame (called Yeo) containing seven parameters with continuous values (columns 5-11; see parameters below), and I conducted a Shapiro-Wilk test to determine whether or not the univariate samples came from a normal distribution. For each parameter the residuals showed non-normality and were skewed, so I want to transform my variables using both the yjPower (Yeo-Johnson transformation) and bcPower (Box-Cox transformation) families to compare the two transformations.
I have used the R code below on many occasions before, so I know it works. However, for this data frame I keep getting the error shown below. Unfortunately, I cannot provide a reproducible example online as the data belong to three different organisations. I have opened an old data frame with the same parameters and my R code runs absolutely fine. I really can't figure out a solution.
Would anybody be able to please help me understand this error message below?
Many thanks if you can advise.
Error
transform=powerTransform(as.matrix(Yeo[5:11]), family= "yjPower")
Error in optim(start, llik, hessian = TRUE, method = method, ...) :
non-finite finite-difference value [1]
#save transformed data in stand_trans to compare both transformations
stand_trans=Yeo
stand_trans[,5]=yjPower(Yeo[,5],transform$lambda[1])
stand_trans[,6]=yjPower(Yeo[,6],transform$lambda[2])
stand_trans[,7]=yjPower(Yeo[,7],transform$lambda[3])
stand_trans[,8]=yjPower(Yeo[,8],transform$lambda[4])
stand_trans[,9]=yjPower(Yeo[,9],transform$lambda[5])
stand_trans[,10]=yjPower(Yeo[,10],transform$lambda[6])
stand_trans[,11]=yjPower(Yeo[,11],transform$lambda[7])
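Since a reproducible example cannot be shared, a first diagnostic step is to compare this data frame against the old one that works. A minimal sketch, assuming the car package's powerTransform(): optim() often reports "non-finite finite-difference value" when a column contains NA/NaN/Inf, or values so extreme that the profile log-likelihood overflows.
# Diagnostic sketch: count non-finite entries and inspect value ranges
# in the columns being transformed.
sapply(Yeo[5:11], function(col) sum(!is.finite(col)))  # should all be zero
sapply(Yeo[5:11], range, na.rm = TRUE)                 # look for extreme magnitudes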
Parameters
'data.frame': 888 obs. of 14 variables:
$ ID : num 1 2 3 4 5 6 7 8 9 10 ...
$ Year : num 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
$ Date : Factor w/ 19 levels "","01.09.2019",..: 19 19 19 19 19 19 19 17 17 17 ...
$ Country : Factor w/ 3 levels "","France","Argentina": 3 3 3 3 3 3 3 3 3 3 ...
$ Low.Freq : num 4209 8607 9361 9047 7979 ...
$ High.Freq : num 15770 18220 19853 18220 17843 ...
$ Start.Freq : num 4436 13945 16264 12283 12691 ...
$ End.Freq : num 4436 13945 16264 12283 12691 ...
$ Peak.Freq : num 4594 8906 11531 10781 8812 ...
$ Center.Freq : num 1.137 0.754 0.785 0.691 0.883 ...
$ Delta.Freq : num 11560 9613 10492 9173 9864 ...
I have a large data set of 2000 variables, including factors and continuous variables.
For example:
library(finalfit)
library(dplyr)
data(colon_s)
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
I use the following function to compare the mean of each continuous variable across the levels of the categorical dependent variable (ANOVA), or the percentage of each categorical variable across those levels (chi-square):
summary_factorlist(colon_s, dependent = "perfor.factor", explanatory = explanatory,
                   add_dependent_label = TRUE, p = TRUE, p_cat = "fisher",
                   p_cont_para = "aov", fit_id = TRUE)
But as soon as I run the above code, I get the following error:
Error in dplyr::summarise():
! Problem while computing ..1 = ...$p.value.
Caused by error in fisher.test():
! 'x' and 'y' must have at least 2 levels
In the data set there are some variables which do not include at least two levels, or where just one of their levels has a non-zero frequency. I was wondering if there is any loop function to remove a variable if one of these conditions is satisfied:
If the variable includes just one level
If the variable includes more than one level but just one level has a non-zero frequency
If all values of the variable are missing
Update (partial answer):
With this code we can remove factors with only one level while keeping all non-factor variables:
x <- colon_s[, (sapply(colon_s, nlevels)>1) | (sapply(colon_s, is.factor)==FALSE)]
The OP's code does work with the data provided:
library(dplyr)
library(finalfit)
summary_factorlist(colon_s, dependent = "perfor.factor",
                   explanatory = explanatory,
                   add_dependent_label = TRUE, p = TRUE, p_cat = "fisher",
                   p_cont_para = "aov", fit_id = TRUE)
Dependent: Perforation No Yes p fit_id index
Age (years) Mean (SD) 59.8 (11.9) 58.4 (13.3) 0.542 age 1
Age <40 years 68 (7.5) 2 (7.4) 1.000 age.factor<40 years 2
40-59 years 334 (37.0) 10 (37.0) age.factor40-59 years 3
60+ years 500 (55.4) 15 (55.6) age.factor60+ years 4
Sex Female 432 (47.9) 13 (48.1) 1.000 sex.factorFemale 5
Male 470 (52.1) 14 (51.9) sex.factorMale 6
Obstruction No 715 (81.2) 17 (63.0) 0.026 obstruct.factorNo 7
Yes 166 (18.8) 10 (37.0) obstruct.factorYes 8
The structure of the data shows the factor variables to have more than one level:
> str(colon_s[c(explanatory, dependent)])
'data.frame': 929 obs. of 5 variables:
$ age : num 43 63 71 66 69 57 77 54 46 68 ...
..- attr(*, "label")= chr "Age (years)"
$ age.factor : Factor w/ 3 levels "<40 years","40-59 years",..: 2 3 3 3 3 2 3 2 2 3 ...
..- attr(*, "label")= chr "Age"
$ sex.factor : Factor w/ 2 levels "Female","Male": 2 2 1 1 2 1 2 2 2 1 ...
..- attr(*, "label")= chr "Sex"
$ obstruct.factor: Factor w/ 2 levels "No","Yes": NA 1 1 2 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "Obstruction"
$ perfor.factor : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "Perforation"
Regarding the selection of factor variables under the conditions mentioned, we could use:
library(dplyr)
colon_s_sub <- colon_s %>%
   select(where(~ is.factor(.x) && nlevels(.x) > 1 &&
                  sum(table(.x) > 0) > 1 &&      # more than one level actually observed
                  sum(complete.cases(.x)) > 0))  # not entirely missing
I'm re-running Kaplan-Meier survival curves from previously published data, using the exact data set used in the publication (Charpentier et al. 2008 - Inbreeding depression in ring-tailed lemurs (Lemur catta): genetic diversity predicts parasitism, immunocompetence, and survivorship). This publication ran the curves in SAS version 9, using LIFETEST, to analyze the age at death structured by genetic heterozygosity and sex of the animal (n=64). She reports a chi-square value of 6.31 and a p-value of 0.012; however, when I run the curves in R, I get a chi-square value of 0.9 and a p-value of 0.821. Can anyone explain this?
R code used: age is the time to death, mort1 is the censoring indicator, sex is the stratum of gender, and ho2 is the factor delineating the two groups to be compared.
> survdiff(Surv(age, mort1)~ho2+sex,data=mariekmsurv1)
Call:
survdiff(formula = Surv(age, mort1) ~ ho2 + sex, data = mariekmsurv1)
N Observed Expected (O-E)^2/E (O-E)^2/V
ho2=1, sex=F 18 3 3.23 0.0166 0.0215
ho2=1, sex=M 12 3 2.35 0.1776 0.2140
ho2=2, sex=F 17 5 3.92 0.3004 0.4189
ho2=2, sex=M 17 4 5.50 0.4088 0.6621
Chisq= 0.9 on 3 degrees of freedom, p= 0.821
> str(mariekmsurv1)
'data.frame': 64 obs. of 6 variables:
$ id : Factor w/ 65 levels "","aeschylus",..: 14 31 33 30 47 57 51 39 36 3 ...
$ sex : Factor w/ 3 levels "","F","M": 3 2 3 2 2 2 2 2 2 2 ...
$ mort1: int 0 0 0 0 0 0 0 0 0 0 ...
$ age : num 0.12 0.192 0.2 0.23 1.024 ...
$ sex.1: Factor w/ 3 levels "","F","M": 3 2 3 2 2 2 2 2 2 2 ...
$ ho2 : int 1 1 1 2 1 1 1 1 1 2 ...
- attr(*, "na.action")=Class 'omit' Named int [1:141] 65 66 67 68 69 70 71 72 73 74 ...
.. ..- attr(*, "names")= chr [1:141] "65" "66" "67" "68" ...
Some ideas:
Try running it in SAS -- see if you get the same results as the author. Maybe they didn't send you the exact same dataset they used.
Look into the default values of the relevant SAS PROC and compare to the defaults of the R function you are using.
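A third idea, sketched below under the assumption that the publication stratified the test by sex: a STRATA statement in PROC LIFETEST stratifies the log-rank test, whereas ~ ho2 + sex in survdiff() jointly compares all four ho2-by-sex groups, which is a different test.
# Hedged sketch: a sex-stratified log-rank test of ho2 (survival package)
library(survival)
survdiff(Surv(age, mort1) ~ ho2 + strata(sex), data = mariekmsurv1)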
Given the HUGE difference between the chi-square values (6.31 and 0.9) and p-values (0.012 and 0.821) between the SAS and R procedures for survival analysis, I suspect that you have used the wrong variables in one of the procedures.
Procedural or data-handling differences between SAS and R can cause some very small differences.
This is not a software error; it is highly likely to be a human error.
I am trying to convert a spatial object into a data.frame using the function fortify from the package ggplot2, but I am getting an error. For example, following the exact same code used in Hadley Wickham's plotting polygon shapefiles example, I run the following commands:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
utah = readOGR(dsn="/path/to/shapefile", layer="eco_l3_ut")
OGR data source with driver: ESRI Shapefile
Source: ".", layer: "eco_l3_ut"
with 10 features and 7 fields
Feature type: wkbPolygon with 2 dimensions
utah@data$id = rownames(utah@data)
Everything seems to work OK:
> str(utah)
..# data :'data.frame': 10 obs. of 8 variables:
.. ..$ AREA : num [1:10] 1.42e+11 1.33e+11 3.10e+11 4.47e+10 1.26e+11 ...
.. ..$ PERIMETER : num [1:10] 4211300 3689180 4412500 2722190 3388270 ...
.. ..$ USECO_ : int [1:10] 164 170 204 208 247 367 373 386 409 411
.. ..$ USECO_ID : int [1:10] 163 216 201 206 245 366 372 385 408 410
.. ..$ ECO : Factor w/ 7 levels "13","14","18",..: 7 3 1 4 5 6 2 4 4 6
.. ..$ LEVEL3 : int [1:10] 80 18 13 19 20 21 14 19 19 21
.. ..$ LEVEL3_NAM: Factor w/ 7 levels "Central Basin and Range",..: 4 7 1 6 2 5 3 6 6 5
.. ..$ id : chr [1:10] "0" "1" "2" "3" ...
...
...
However, when I try to convert the utah object using the function fortify from the package ggplot2, I get the following error:
> utah.points = fortify(utah, region="id")
Error in UseMethod("fortify") : no applicable method for 'fortify' applied to an object of class "c('SpatialPolygonsDataFrame', 'SpatialPolygons', 'Spatial')"
I am getting the same error for all other spatial objects that I have tried to convert using fortify; even when using code that had worked ok in the past (before upgrading to R's version 3.0.2).
I am running R version 3.0.2 on a Mac with an Intel Core i7 and 16GB of RAM.
I had the same problem.
After reinstalling the main packages, the error message was still there.
Eventually I realized that a function fortify is also present in the package lme4, which was loaded after ggplot2.
Using ggplot2::fortify(utah, region="id") solved the problem.
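For anyone hitting the same error, a quick way to check whether some other attached package is masking fortify (a sketch using base R utilities):
find("fortify")           # which attached packages define an object named fortify
conflicts(detail = TRUE)  # every masked object on the search path, by package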
I realized that the problem has to do with the .Rprofile file. This is what I have in that file:
options(repos="http://cran.stat.ucla.edu")
utils::update.packages(ask=FALSE)
pkgs <- getOption("defaultPackages")
options(defaultPackages = c(pkgs,"ggplot2","arm", "Zelig","stringr", "plyr", "reshape2", "MatchIt", "ISLR", "rgdal"))
Whenever ggplot2 loads from .Rprofile, I get the error mentioned in my question above. Whenever I take ggplot2 out of the .Rprofile options, I don't get the error.
I downloaded the shapefile and tried your code; it (or rather fortify) appeared to work fine on my installation. I suggest you reinstall the main packages, reboot, and try again.
> utah.points = fortify(utah, region="id")
Loading required package: rgeos
rgeos version: 0.3-1, (SVN revision 413M)
GEOS runtime version: 3.3.8-CAPI-1.7.8
Polygon checking: TRUE
> head(utah.points)
long lat order hole piece group id
1 -1405382 2224519 1 FALSE 1 0.1 0
2 -1406958 2222744 2 FALSE 1 0.1 0
3 -1408174 2221195 3 FALSE 1 0.1 0
4 -1409680 2220162 4 FALSE 1 0.1 0
5 -1411068 2219579 5 FALSE 1 0.1 0
6 -1412780 2219001 6 FALSE 1 0.1 0
> tail(utah.points)
long lat order hole piece group id
19615 -1172872 1741373 19615 FALSE 1 9.1 9
19616 -1172522 1740139 19616 FALSE 1 9.1 9
19617 -1172366 1739158 19617 FALSE 1 9.1 9
19618 -1172124 1737840 19618 FALSE 1 9.1 9
19619 -1171788 1737281 19619 FALSE 1 9.1 9
19620 -1171309 1736884 19620 FALSE 1 9.1 9
>
I have a data.frame mydf, that contains data from 27 subjects. There are two predictors, congruent (2 levels) and offset (5 levels), so overall there are 10 conditions. Each of the 27 subjects was tested 20 times under each condition, resulting in a total of 10*27*20 = 5400 observations. RT is the response variable. The structure looks like this:
> str(mydf)
'data.frame': 5400 obs. of 4 variables:
$ subject : Factor w/ 27 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
$ congruent: logi TRUE FALSE FALSE TRUE FALSE TRUE ...
$ offset : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 1 2 5 5 2 2 3 5 ...
$ RT : int 330 343 457 436 302 311 595 330 338 374 ...
I've used daply() to calculate the mean RT of each subject in each of the 10 conditions:
myarray <- daply(mydf, .(subject, congruent, offset), summarize, mean = mean(RT))
The result looks just the way I wanted, i.e. a 3-d array: in effect, five tables (one for each offset condition) showing the mean of each subject in the congruent=FALSE vs. the congruent=TRUE condition.
However, if I check the structure of myarray, I get confusing output:
List of 270
$ : num 417
$ : num 393
$ : num 364
$ : num 399
$ : num 374
...
# and so on
...
[list output truncated]
- attr(*, "dim")= int [1:3] 27 2 5
- attr(*, "dimnames")=List of 3
..$ subject : chr [1:27] "1" "2" "3" "5" ...
..$ congruent: chr [1:2] "FALSE" "TRUE"
..$ offset : chr [1:5] "1" "2" "3" "4" ...
This looks totally different from the structure of the prototypical ozone array from the plyr package, even though it's a very similar format (3 dimensions, only numerical values).
I want to compute some further summary information on this array by means of aaply(). Specifically, I want to calculate the difference between the congruent and incongruent means for each subject and offset.
However, even the most basic application of aaply(), such as aaply(myarray, 2, mean), returns nonsense output:
FALSE TRUE
NA NA
Warning messages:
1: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
2: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
I have no idea why the daply() function returns such weirdly structured output, which prevents any further use of aaply(). Any help is much appreciated; I frankly admit that I have hardly any experience with the plyr package.
Since you haven't included your data it's hard to know for sure, but I tried to make a dummy set based on your str(). You can do what you want (I'm guessing) with two uses of ddply(): first the means, then the difference of the means.
library(plyr)

#Make dummy data
mydf <- data.frame(subject = rep(1:5, each = 150),
congruent = rep(c(TRUE, FALSE), each = 75),
offset = rep(1:5, each = 15), RT = sample(300:500, 750, replace = T))
#Make means
mydf.mean <- ddply(mydf, .(subject, congruent, offset), summarise, mean.RT = mean(RT))
#Calculate difference between congruent and incongruent
mydf.diff <- ddply(mydf.mean, .(subject, offset), summarise, diff.mean = diff(mean.RT))
head(mydf.diff)
# subject offset diff.mean
# 1 1 1 39.133333
# 2 1 2 9.200000
# 3 1 3 20.933333
# 4 1 4 -1.533333
# 5 1 5 -34.266667
# 6 2 1 -2.800000
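If you would rather keep the 3-d array from the question: the list-typed array arises because summarize() returns a one-row data.frame for every cell. A minimal sketch of an alternative: have daply() return a bare numeric per cell, which yields a numeric array that aaply() handles directly.
# Sketch: a plain numeric per cell gives a numeric array, not a list-array
myarray <- daply(mydf, .(subject, congruent, offset), function(d) mean(d$RT))
aaply(myarray, 2, mean)  # mean RT for congruent = FALSE vs TRUE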