Error with function fortify of ggplot2 - r

I am trying to convert a spatial object into a data.frame using the function fortify from the package ggplot2, but I am getting an error. For example, following the exact same code used in Hadley Wickham's plotting polygon shapefiles example, I type the following commands:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
utah = readOGR(dsn="/path/to/shapefile", layer="eco_l3_ut")
OGR data source with driver: ESRI Shapefile
Source: ".", layer: "eco_l3_ut"
with 10 features and 7 fields
Feature type: wkbPolygon with 2 dimensions
utah@data$id = rownames(utah@data)
Everything seems to work OK:
> str(utah)
..@ data :'data.frame': 10 obs. of 8 variables:
.. ..$ AREA : num [1:10] 1.42e+11 1.33e+11 3.10e+11 4.47e+10 1.26e+11 ...
.. ..$ PERIMETER : num [1:10] 4211300 3689180 4412500 2722190 3388270 ...
.. ..$ USECO_ : int [1:10] 164 170 204 208 247 367 373 386 409 411
.. ..$ USECO_ID : int [1:10] 163 216 201 206 245 366 372 385 408 410
.. ..$ ECO : Factor w/ 7 levels "13","14","18",..: 7 3 1 4 5 6 2 4 4 6
.. ..$ LEVEL3 : int [1:10] 80 18 13 19 20 21 14 19 19 21
.. ..$ LEVEL3_NAM: Factor w/ 7 levels "Central Basin and Range",..: 4 7 1 6 2 5 3 6 6 5
.. ..$ id : chr [1:10] "0" "1" "2" "3" ...
...
...
However, when I try to convert the utah object using the function fortify from the package ggplot2, I get the following error:
> utah.points = fortify(utah, region="id")
Error in UseMethod("fortify") : no applicable method for 'fortify' applied to an object of class "c('SpatialPolygonsDataFrame', 'SpatialPolygons', 'Spatial')"
I am getting the same error for all other spatial objects that I have tried to convert using fortify, even with code that had worked fine in the past (before I upgraded to R version 3.0.2).
I have R version 3.0.2 running on a Mac with an Intel Core i7 and 16 GB of RAM.

I had the same problem.
After reinstalling the main packages, the error message was still there.
Eventually I realized that a function fortify is also present in the package lme4, which was loaded after ggplot2.
Using ggplot2::fortify(utah, region="id") solved the problem.
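When two attached packages export the same name, a bare call resolves to whichever package was attached last. A minimal base-R sketch of diagnosing this kind of masking (the fortify call itself is commented out, since it assumes ggplot2 and a loaded shapefile):

```r
# find() lists every attached environment that defines a given name;
# the first entry is the one a bare call will resolve to.
find("print")
# e.g. "package:base"

# If lme4 is attached after ggplot2, find("fortify") would list
# "package:lme4" first, so a bare fortify() hits the wrong generic.
# Qualifying the call with :: always resolves to the intended package:
# utah.points <- ggplot2::fortify(utah, region = "id")
```

conflicts(detail = TRUE) gives a fuller picture of every masked name on the search path.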

I realized that the problem has to do with the .Rprofile file. This is what I have in that file:
options(repos="http://cran.stat.ucla.edu")
utils::update.packages(ask=FALSE)
pkgs <- getOption("defaultPackages")
options(defaultPackages = c(pkgs,"ggplot2","arm", "Zelig","stringr", "plyr", "reshape2", "MatchIt", "ISLR", "rgdal"))
Whenever ggplot2 loads from .Rprofile, I get the error mentioned in my question above. Whenever I take out ggplot2 from the .Rprofile options, I don't get the error.

I downloaded the shapefile and tried your code; it (or rather fortify) appeared to work fine on my installation. I suggest you reinstall the main packages, reboot, and try again.
> utah.points = fortify(utah, region="id")
Loading required package: rgeos
rgeos version: 0.3-1, (SVN revision 413M)
GEOS runtime version: 3.3.8-CAPI-1.7.8
Polygon checking: TRUE
> head(utah.points)
long lat order hole piece group id
1 -1405382 2224519 1 FALSE 1 0.1 0
2 -1406958 2222744 2 FALSE 1 0.1 0
3 -1408174 2221195 3 FALSE 1 0.1 0
4 -1409680 2220162 4 FALSE 1 0.1 0
5 -1411068 2219579 5 FALSE 1 0.1 0
6 -1412780 2219001 6 FALSE 1 0.1 0
> tail(utah.points)
long lat order hole piece group id
19615 -1172872 1741373 19615 FALSE 1 9.1 9
19616 -1172522 1740139 19616 FALSE 1 9.1 9
19617 -1172366 1739158 19617 FALSE 1 9.1 9
19618 -1172124 1737840 19618 FALSE 1 9.1 9
19619 -1171788 1737281 19619 FALSE 1 9.1 9
19620 -1171309 1736884 19620 FALSE 1 9.1 9
>

Car Package in R: powerTransform() - Error non-finite finite-difference value [1]

Issue:
I have a data frame (called Yeo) containing seven parameters with continuous values (columns 5-11) (see parameters below), and I conducted a Shapiro-Wilk test to determine whether or not the univariate samples came from a normal distribution. For each parameter, the residuals were non-normal and skewed, so I want to transform my variables using both the yjPower (Yeo-Johnson transformation) and the bcPower (Box-Cox transformation) families to compare the two transformations.
I have used the R code below on many occasions before, so I know it works. However, for this data frame, I keep getting the error below. Unfortunately, I cannot provide a reproducible example online as the data belong to three different organisations. I have opened an old data frame with the same parameters and my R code runs absolutely fine. I really can't figure out a solution.
Would anybody be able to please help me understand this error message below?
Many thanks if you can advise.
Error
transform=powerTransform(as.matrix(Yeo[5:11]), family= "yjPower")
Error in optim(start, llik, hessian = TRUE, method = method, ...) :
non-finite finite-difference value [1]
#save transformed data in stand_trans to compare both
stand_trans=Yeo
stand_trans[,5]=yjPower(Yeo[,5],transform$lambda[1])
stand_trans[,6]=yjPower(Yeo[,6],transform$lambda[2])
stand_trans[,7]=yjPower(Yeo[,7],transform$lambda[3])
stand_trans[,8]=yjPower(Yeo[,8],transform$lambda[4])
stand_trans[,9]=yjPower(Yeo[,9],transform$lambda[5])
stand_trans[,10]=yjPower(Yeo[,10],transform$lambda[6])
stand_trans[,11]=yjPower(Yeo[,11],transform$lambda[7])
Parameters
'data.frame': 888 obs. of 14 variables:
$ ID : num 1 2 3 4 5 6 7 8 9 10 ...
$ Year : num 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
$ Date : Factor w/ 19 levels "","01.09.2019",..: 19 19 19 19 19 19 19 17 17 17 ...
$ Country : Factor w/ 3 levels "","France","Argentina": 3 3 3 3 3 3 3 3 3 3 ...
$ Low.Freq : num 4209 8607 9361 9047 7979 ...
$ High.Freq : num 15770 18220 19853 18220 17843 ...
$ Start.Freq : num 4436 13945 16264 12283 12691 ...
$ End.Freq : num 4436 13945 16264 12283 12691 ...
$ Peak.Freq : num 4594 8906 11531 10781 8812 ...
$ Center.Freq : num 1.137 0.754 0.785 0.691 0.883 ...
$ Delta.Freq : num 11560 9613 10492 9173 9864 ...
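Not an answer to the optimiser internals, but a hedged first diagnostic: this optim() failure commonly appears when a column passed to powerTransform() contains NA, Inf, or (for the bcPower family) non-positive values, so counting non-finite entries per column is a cheap check. The toy data frame below stands in for Yeo[5:11]; the values are invented.

```r
# toy stand-in for two of the frequency columns (values invented)
Yeo_num <- data.frame(Low.Freq  = c(4209, 8607, NA),
                      High.Freq = c(15770, Inf, 19853))

# count entries per column that are NA, NaN, or infinite
bad <- sapply(Yeo_num, function(col) sum(!is.finite(col)))
bad
# Low.Freq High.Freq
#        1         1
```

Columns flagged here can then be inspected or cleaned before retrying the transformation.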

C5.0 algorithm not working due to logical factor, solutions?

This question has been asked before; however, it was not answered in a way that solved my problem, and the question was also slightly different.
I am trying to build a decision tree model using the c5 package. I am trying to predict if MMA fighters have championship potential (this is a logical factor with 2 levels yes/no).
Originally this column was a Boolean, but I converted it to a factor using:
fighters_clean$championship_potential <- as.factor(fighters_clean$championship_potential)
table(fighters_clean$championship_potential)
#Rename binary outcome
fighters_clean$championship_potential <- factor(fighters_clean$championship_potential,
levels = c("TRUE", "FALSE"), labels = c("YES", "NO"))
In my data frame it shows "Factor with 2 levels", which should work as the classifier for a C5.0 decision tree; however, I keep getting this error message:
Error in UseMethod("QuinlanAttributes") :
no applicable method for 'QuinlanAttributes' applied to an object of class "logical"
The code for my model is below.
#Lets use a decision tree to see what fighters have that championship potential
table(fighters_clean$championship_potential)
#FALSE TRUE
#2578 602
#create test and training data
#set seed alters the random number generator so that it is random but repeatable, the number is arbitrary.
set.seed(123)
Tree_training <- sample(3187, 2868)
str(Tree_training)
#So what this does is it creates a vector of 2868 random integers.
#We use this vector to split our data into training and test data
#it should be a representative 90/10 split.
Tree_Train <- fighters_clean[Tree_training, ]
Tree_Test <- fighters_clean[-Tree_training, ]
#That worked, sweet.
#Now lets see if they are representative.
#Should be even number of champ potential in both data sets,
prop.table(table(Tree_Train$championship_potential))
prop.table(table(Tree_Test$championship_potential))
#awesome so thats a perfect split, with each data set having 18% champions.
#C5 is a commercial software for decision tree models that is built into R
#We will use this to build a decision tree.
str(Tree_Train)
'data.frame': 2868 obs. of 12 variables:
$ name : chr "Jesse Juarez" "Milton Vieira" "Joey Gomez" "Gilbert Smith" ...
$ SLpM : num 1.71 1.13 2.93 1.09 5.92 0 0 1.2 0 2.11 ...
$ Str_Acc : num 48 35 35 41 51 0 0 33 0 50 ...
$ SApM : num 2.87 2.36 4.03 2.73 3.6 0 0 1.73 0 1.89 ...
$ Str_Def : num 52 48 53 35 55 0 0 73 0 63 ...
$ TD_Avg : num 2.69 2.67 1.15 3.51 0.44 0 0 0 0 0.19 ...
$ TD_Acc : num 33 53 37 60 33 0 0 0 0 40 ...
$ TD_Def : num 50 12 50 0 70 0 0 50 0 78 ...
$ Sub_Avg : num 0 0.7 0 1.2 0.4 0 0 0 0 0.3 ...
$ Win_percentage : num 0.667 0.565 0.875 0.714 0.8 ...
$ championship_potential: Factor w/ 2 levels "YES","NO": 2 2 1 2 2 2 1 2 2 2 ...
$ contender : logi FALSE FALSE TRUE TRUE TRUE TRUE ...
library(C50)
DTModel <- C5.0(Tree_Train [-11], Tree_Train$championship_potential, trials = 1, costs = NULL)
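The "QuinlanAttributes ... applied to an object of class 'logical'" error usually points at a remaining logical predictor column (here contender), not at the outcome. A hedged sketch, with a toy data frame standing in for fighters_clean (column names from the question, values invented); the C5.0() call itself is commented out since it assumes the C50 package:

```r
# toy stand-in for Tree_Train (values invented)
Tree_Train <- data.frame(
  name = c("Jesse Juarez", "Milton Vieira", "Joey Gomez"),
  SLpM = c(1.71, 1.13, 2.93),
  championship_potential = factor(c("NO", "NO", "YES")),
  contender = c(FALSE, FALSE, TRUE),   # logical column that trips C5.0
  stringsAsFactors = FALSE
)

# coerce every logical column to a factor
is_log <- vapply(Tree_Train, is.logical, logical(1))
Tree_Train[is_log] <- lapply(Tree_Train[is_log], factor)

# drop the free-text name column and the outcome from the predictors
predictors <- Tree_Train[setdiff(names(Tree_Train),
                                 c("name", "championship_potential"))]

# library(C50)
# DTModel <- C5.0(predictors, Tree_Train$championship_potential)
```

Selecting predictors by name rather than by position (Tree_Train[-11]) also avoids silently keeping the wrong column if the data frame's layout changes.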

Error using ggmcmc package

I am using the ggmcmc package to produce a summary pdf file of rjags package output using the ggmcmc() function. However, I get the following error message:
> ggmcmc(x, file = "Model0-output.pdf")
Plotting histograms
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 160, 164
When I check the structure of the input dataframe I created with the ggs() function, everything looks to be correct.
> str(x)
'data.frame': 240000 obs. of 4 variables:
$ Iteration: int 1 2 3 4 5 6 7 8 9 10 ...
$ Chain : int 1 1 1 1 1 1 1 1 1 1 ...
$ Parameter: Factor w/ 32 levels "N[1]","N[2]",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 96 87 76 79 89 95 85 78 86 89 ...
- attr(*, "nChains")= int 3
- attr(*, "nParameters")= int 32
- attr(*, "nIterations")= int 2500
- attr(*, "nBurnin")= num 2000
- attr(*, "nThin")= num 2
- attr(*, "description")= chr "postout0"
- attr(*, "parallel")= logi FALSE
Can anyone help me identify where the error is being caused and how I can correct it? Am I missing something obvious?
ggmcmc 0.5.1 computes the number of bins in a different manner than previous versions did. Previous versions relied on ggplot2:::bin, whereas 0.5.1 computes the bins and their binwidth by itself.
It is likely that in your case the range of some of the parameters was so extreme that rounding errors made some of them end up with one bin more or one bin fewer, thereby producing this error.

Error reading Stata data in R

I am trying to read a Stata dataset in R with the foreign package, but when I try to read the file using:
library(foreign)
data <- read.dta("data.dta")
I got the following error:
Error in read.dta("data.dta") : a binary read error occurred
The file works fine in Stata. This site suggests saving the file in Stata without labels and then reading it into R. With this workaround I am able to load the file into R, but then I lose the labels. Why am I getting this error, and how can I read the file into R with the labels? Another person found that they got this error when they had variables with no values. My data do have at least one or two such variables, but I have no easy way to identify them in Stata. It is a very large file with thousands of variables.
You should call library(foreign) before reading the Stata data.
library(foreign)
data <- read.dta("data.dta")
Updates: As mentioned here,
"The error message implies that the file was found, and that it started
with the right sequence of bytes to be a Stata .dta file, but that
something (probably the end of the file) prevented R from reading what it
was expecting to read. "
But we might just be guessing without further information.
Update to OP's question and answer:
I have tested whether that is the case using the auto data from Stata, but it's not. So there must be other reasons:
*Claims 1 and 2: if a variable contains only missing values, or the dataset has labels, R's read.dta will generate the error*
sysuse auto  // this dataset has labels
replace mpg=.  // generates missings for the mpg variable
br in 1/10
make price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign
AMC Concord 4099 . 3 2.5 11 2930 186 40 121 3.58 Domestic
AMC Pacer 4749 . 3 3.0 11 3350 173 40 258 2.53 Domestic
AMC Spirit 3799 . . 3.0 12 2640 168 35 121 3.08 Domestic
Buick Century 4816 . 3 4.5 16 3250 196 40 196 2.93 Domestic
Buick Electra 7827 . 4 4.0 20 4080 222 43 350 2.41 Domestic
Buick LeSabre 5788 . 3 4.0 21 3670 218 43 231 2.73 Domestic
Buick Opel 4453 . . 3.0 10 2230 170 34 304 2.87 Domestic
Buick Regal 5189 . 3 2.0 16 3280 200 42 196 2.93 Domestic
Buick Riviera 10372 . 3 3.5 17 3880 207 43 231 2.93 Domestic
Buick Skylark 4082 . 3 3.5 13 3400 200 42 231 3.08 Domestic
save "~myauto"
describe
Contains data from ~\myauto.dta
obs: 74 1978 Automobile Data
vars: 12 25 Aug 2013 11:32
size: 3,478 (99.9% of memory free) (_dta has notes)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
make str18 %-18s Make and Model
price int %8.0gc Price
mpg int %8.0g Mileage (mpg)
rep78 int %8.0g Repair Record 1978
headroom float %6.1f Headroom (in.)
trunk int %8.0g Trunk space (cu. ft.)
weight int %8.0gc Weight (lbs.)
length int %8.0g Length (in.)
turn int %8.0g Turn Circle (ft.)
displacement int %8.0g Displacement (cu. in.)
gear_ratio float %6.2f Gear Ratio
foreign byte %8.0g origin Car type
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: foreign
library(foreign)
myauto<-read.dta("myauto.dta") #works perfect
str(myauto)
'data.frame': 74 obs. of 12 variables:
$ make : chr "AMC Concord" "AMC Pacer" "AMC Spirit" "Buick Century" ...
$ price : int 4099 4749 3799 4816 7827 5788 4453 5189 10372 4082 ...
$ mpg : int NA NA NA NA NA NA NA NA NA NA ...
$ rep78 : int 3 3 NA 3 4 3 NA 3 3 3 ...
$ headroom : num 2.5 3 3 4.5 4 4 3 2 3.5 3.5 ...
$ trunk : int 11 11 12 16 20 21 10 16 17 13 ...
$ weight : int 2930 3350 2640 3250 4080 3670 2230 3280 3880 3400 ...
$ length : int 186 173 168 196 222 218 170 200 207 200 ...
$ turn : int 40 40 35 40 43 43 34 42 43 42 ...
$ displacement: int 121 258 121 196 350 231 304 196 231 231 ...
$ gear_ratio : num 3.58 2.53 3.08 2.93 2.41 ...
$ foreign : Factor w/ 2 levels "Domestic","Foreign": 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "datalabel")= chr "1978 Automobile Data"
- attr(*, "time.stamp")= chr "25 Aug 2013 11:23"
- attr(*, "formats")= chr "%-18s" "%8.0gc" "%8.0g" "%8.0g" ...
- attr(*, "types")= int 18 252 252 252 254 252 252 252 252 252 ...
- attr(*, "val.labels")= chr "" "" "" "" ...
- attr(*, "var.labels")= chr "Make and Model" "Price" "Mileage (mpg)" "Repair Record 1978" ...
- attr(*, "expansion.fields")=List of 2
..$ : chr "_dta" "note1" "from Consumer Reports with permission"
..$ : chr "_dta" "note0" "1"
- attr(*, "version")= int 12
- attr(*, "label.table")=List of 1
..$ origin: Named int 0 1
.. ..- attr(*, "names")= chr "Domestic" "Foreign"
Here's a list of things to try. My guess is that the first item has a 75% chance of solving your issue.
In Stata, resave a fresh copy of your dta file with saveold, and try again.
If that fails, provide a sample to show what kind of values kill the read.dta function.
If missing values are to blame, run the loop from the other answer.
A more thorough description of the dataset would be required to work past that point. The issue seems fixable; I've never had much trouble using foreign with tons of Stata files.
You might also give a try to the Stata.file function in the memisc package to see if that fails too.
I do not know why this occurs and would be interested if anyone could explain, but read.dta indeed cannot handle variables that are all NA. A solution is to delete such variables in Stata with the following code:
foreach varname of varlist * {
quietly sum `varname'
if `r(N)'==0 {
drop `varname'
disp "dropped `varname' for too much missing data"
}
}
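If you can get the data into R once (e.g. via the no-labels workaround), the same cleanup can be done on the R side instead of in Stata. A hedged base-R sketch, with a toy data frame standing in for the imported dataset:

```r
# toy stand-in for the imported dataset (one column is entirely NA)
data <- data.frame(x = 1:3,
                   empty = c(NA, NA, NA),
                   y = c("a", "b", "c"))

# drop columns whose values are all missing
all_na <- vapply(data, function(col) all(is.na(col)), logical(1))
data <- data[!all_na]

names(data)
# "x" "y"
```

This mirrors what the Stata loop above does, without needing a second round-trip through Stata.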
It's been a long time, but I solved this same problem by exporting the .dta data to .csv. The problem was related to the labels of the factor variables, especially because the labels were in Spanish and the ASCII encoding is a mess. I hope this works for someone with the same problem and access to Stata.
In stata:
export delimited using "/Users/data.csv", nolabel replace
In R:
df <- read.csv("lapop2014.csv")

R merge() not working (anymore) as intended [duplicate]

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 6 years ago.
This has worked for me before, but now it isn't working, and I have spent two days tinkering with it before asking for help here.
I have two datasets, one called Access, the other CO2. Each one has three variables, two of which (x and y) are common and are what I want to use to merge the two datasets. Just to play it really safe, I am pasting the head() and str() outputs here:
> head(Access)
       x     y  access
1 -32.65 83.65    0.00
2 -36.85 83.55 4481.25
3 -36.75 83.55 4464.75
4 -36.65 83.55 4448.25
5 -36.55 83.55 4431.00
6 -36.45 83.55 4414.50
> head(CO2)
       x     y   CO2equ
1 -32.65 83.65 183316.4
2 -36.85 83.55 173327.8
3 -36.75 83.55 301413.9
4 -36.65 83.55 360757.2
5 -36.55 83.55 409523.5
6 -36.45 83.55 448302.0
> str(Access)
'data.frame': 2183106 obs. of 3 variables:
$ x : num -32.7 -36.8 -36.8 -36.7 -36.5 ...
$ y : num 83.7 83.5 83.5 83.5 83.5 ...
$ access: num 0 4481 4465 4448 4431 ...
- attr(*, "data_types")= chr "N" "N" "N"
> str(CO2)
'data.frame': 2183106 obs. of 3 variables:
$ x : num -32.7 -36.9 -36.8 -36.7 -36.6 ...
$ y : num 83.6 83.5 83.5 83.5 83.5 ...
$ CO2equ: num 183316 173328 301414 360757 409523 ...
- attr(*, "data_types")= chr "N" "N" "N"
Now I am trying two versions of merge(). The first one results in an empty data.frame; the second gives every row twice, once with the variables from the first dataset and once with the variables from the second:
> M1 = merge(Access, CO2, c("x","y"))
> head(M1)
[1] x y access CO2equ
<0 rows> (or 0-length row.names)
> M2 = merge(Access, CO2, by=c("x","y"), all=TRUE)
> length(M2$x)
[1] 4366212
> head(M2)
x y access CO2equ
1 -179.95 -89.95 NA 0
2 -179.95 -89.85 NA 0
3 -179.95 -89.75 NA 0
4 -179.95 -89.65 NA 0
5 -179.95 -89.55 NA 0
6 -179.95 -89.45 NA 0
Obviously, the respective x- and y-values are not recognized as being equivalent, but I do not know why. The data types are the same, the values look the same, and worst of all, I did this successfully a few months ago. Back then, I saved the command history, and now when I just copy and paste it into my R console, it does not work. I tried it in both R 2.13.0 and Revolution R Enterprise 4.3. I am reasonably sure that this is not a software bug but something trivial that I just overlooked, even after spending some two days on this.
Cheers,
Jochen
Try round(..., 2) on both x and y in each data frame before the merge. Your coordinates sit on a 0.1-degree grid with two-decimal cell centers, so rounding to two digits strips floating-point noise without collapsing neighbouring cells (rounding to one digit would merge them).
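A minimal sketch of why the merge fails and how rounding fixes it (coordinates invented; the 1e-12 offset mimics floating-point noise from whatever produced the two grids):

```r
Access <- data.frame(x = c(-36.85, -36.75), y = c(83.55, 83.55),
                     access = c(4481.25, 4464.75))
CO2    <- data.frame(x = c(-36.85 + 1e-12, -36.75), y = c(83.55, 83.55),
                     CO2equ = c(173327.8, 301413.9))

# the tiny offset makes the first x pair unequal, so that row drops out
nrow(merge(Access, CO2, by = c("x", "y")))   # 1

# rounding both sides to the grid resolution restores exact equality
Access[c("x", "y")] <- round(Access[c("x", "y")], 2)
CO2[c("x", "y")]    <- round(CO2[c("x", "y")], 2)
nrow(merge(Access, CO2, by = c("x", "y")))   # 2
```

The same idea applies with all.equal-style tolerant joins, but for gridded data a plain round() to the grid's precision is the simplest fix.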
