Cannot coerce class ....to a data.frame error - r

I get a "cannot coerce class "c("summary.turnpoints", "turnpoints")" to a data.frame" error when trying to save the summary to a file. I tried to fix it with as.data.frame, with no success.
Code:
library(plyr)
library(pastecs)
data <- read.table("C:\\Users\\Ron\\Desktop\\dataset.txt", header=FALSE, col.names="A")
data.tp=turnpoints(data$A)
print(data.tp)
Turning points for: data$A
nbr observations : 5990
nbr ex-aequos : 51
nbr turning points: 413 (first point is a pit)
E(p) = 3992 Var(p) = 1064.567 (theoretical)
data.sum=summary(data.tp)
print(data.sum)
  point type        proba     info
1    11  pit 7.232437e-15 46.97444
2    21 peak 7.594058e-14 43.58212
3    30  pit 3.479857e-27 87.89303
4    51 peak 5.200612e-29 93.95723
5    62  pit 7.594058e-14 43.58212
6    70 peak 6.213321e-14 43.87163
7    81  pit 6.276081e-16 50.50099
8    91 peak 5.534016e-23 73.93602
.....................................
write.table(data.sum, file = "C:\\Users\\Ron\\Desktop\\datasetTurnP.txt")
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class "c("summary.turnpoints", "turnpoints")" to a data.frame
In addition: Warning messages:
1: package ‘plyr’ was built under R version 3.0.1
2: package ‘pastecs’ was built under R version 3.0.1
How can I save these summary results to a text file?
Thank you.

Look at the Value section of:
?pastecs::summary.turnpoints
It should be clear that this will not be a set of lists all of which have the same length. Hence the error message. So rather than asking for the impossible, ... tell us what you wanted to save.
It's actually not impossible, just not possible with write.table, since the summary is not a data frame. The dump function lets you write an ASCII file containing the structure(...) representation of that summary object. Note that dump takes the object's name as a character string:
dump("data.sum", file="dump_data_sum.asc")
The file can then be source()-ed to recreate the object.
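If a plain-text copy of what print() shows is all that is needed, capture.output() may be a simpler route. The following is a minimal sketch under stated assumptions: a summary.lm object stands in for data.sum (so the example runs without pastecs), and out_file is a hypothetical temporary path rather than the path from the question.

```r
# Stand-in for an object that has a print method but is not a data frame
# (illustrative only; in the question this would be data.sum).
obj <- summary(lm(dist ~ speed, data = cars))

# Hypothetical output path for the sketch.
out_file <- tempfile(fileext = ".txt")

# capture.output() collects the lines that print() would show on the console.
writeLines(capture.output(print(obj)), out_file)

# Read the file back to check what was written.
saved <- readLines(out_file)
```

Unlike dump(), the resulting file is meant for reading, not for recreating the object with source().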

Related

Duplicate Row Name Error for FAMD Visualization

I'm trying to perform this function in R: fviz_famd_ind() and keep getting an error. It works on the wine dataset provided in the package, but not on my cleaned data set from Telco.Customer.Churn from IBM.
I've created the object of the FAMD function using the cleaned data set called dfcfamd1. I've verified there are no duplicate row or column names in the sets using any(duplicated(rownames())) for both Telco.Customer.Churn and dfcfamd1 which both return FALSE.
fviz_famd_ind(dfcfamd1)
> Error in `.rowNamesDF<-`(x, value = value) :
> duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique values when setting 'row.names': ‘No’, ‘Yes’
Sample Data below
head(Telco.Customer.Churn)
customerID gender SeniorCitizen Partner Dependents tenure
1 7590-VHVEG Female 0 Yes No 1
2 5575-GNVDE Male 0 No No 34
3 3668-QPYBK Male 0 No No 2
PhoneService MultipleLines InternetService OnlineSecurity
1 No No DSL No
2 Yes No DSL Yes
3 Yes Yes Fiber optic No
OnlineBackup DeviceProtection TechSupport StreamingTV
1 Yes No No No
2 No No No No
3 No Yes No Yes
StreamingMovies Contract PaperlessBilling PaymentMethod
1 No Month-to-month Yes Electronic check
2 No One year No Mailed check
3 No Month-to-month Yes Mailed check
MonthlyCharges TotalCharges Churn
1 29.85 29.85 No
2 56.95 1889.50 No
3 53.85 108.15 Yes
The call should produce a plot, which it does for the package data but not for my data.
Attempting to set names to unique, I get a vector error.
rownames(dfcfamd1) = make.names(names, unique=TRUE)
> Error in as.character(names) :
> cannot coerce type 'builtin' to vector of type 'character'
The issue is that names is a function, so this call hands make.names the function itself rather than a character vector of names:
rownames(dfcfamd1) = make.names(names, unique=TRUE)
Instead, it should be:
row.names(dfcfamd1) = make.names(row.names(dfcfamd1), unique=TRUE)
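To see what make.names(..., unique = TRUE) does with repeated labels, here is a small base-R sketch. The data frame df and the labels vector are made up for illustration and are not taken from the Telco data.

```r
# Toy data frame and deliberately repeated labels (both hypothetical).
df <- data.frame(x = 1:3)
labels <- c("No", "Yes", "No")

# unique = TRUE appends a numeric suffix to repeated names so that they
# become valid, distinct row names.
row.names(df) <- make.names(labels, unique = TRUE)

row.names(df)
# "No" "Yes" "No.1"
```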
Try:
fviz_pca_ind(dfcfamd1)
PS: I ran into the same problem. It can be worked around by calling fviz_pca_ind rather than fviz_famd_ind, as the two functions use data with similar structures.
It seems that fviz_famd_ind cannot handle the same values across multiple categorical columns.
One way to solve this is to rename the values to be unique across columns:
# recode_factor() below comes from dplyr
library(dplyr)
# Define factors
cols <- c("Partner", "Dependents", "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity", "OnlineBackup", "DeviceProtection",
          "TechSupport", "StreamingTV", "StreamingMovies", "PaperlessBilling", "Churn")
dfcfamd1[cols] <- lapply(dfcfamd1[cols], factor)
rm(cols)
# Rename the factors
# Do this for every column until only unique values remain.
dfcfamd1$Partner <- recode_factor(dfcfamd1$Partner, "Yes" = "yesPartner", "No" = "noPartner")
#[...]
dfcfamd1$Churn<- recode_factor(dfcfamd1$Churn,"Yes" = "yesChurn", "No" = "noChurn")
# Run the function on dfcfamd1
fviz_famd_ind(dfcfamd1)
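Recoding every column by hand gets tedious. A possible shortcut, sketched here in base R on a made-up two-column data frame (df and cols are hypothetical, not the Telco data), is to prefix each factor's levels with its column name so that no level is shared between columns.

```r
# Toy data with the same "Yes"/"No" levels in two columns (hypothetical).
df <- data.frame(Partner = c("Yes", "No"),
                 Churn   = c("No", "Yes"),
                 stringsAsFactors = TRUE)
cols <- c("Partner", "Churn")

# Prefix the levels of each factor with the column name, e.g. "Partner_Yes".
for (col in cols) {
  levels(df[[col]]) <- paste(col, levels(df[[col]]), sep = "_")
}

# Collect all levels to confirm that none is repeated across columns.
all_levels <- unlist(lapply(df[cols], levels))
```

Whether this resolves the fviz_famd_ind error on the real data is not verified here; it only automates the renaming step described above.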

subsetting data.cube inside custom function

I am trying to make a function of my own to subset a data.cube in R, and format the result automatically for some predefined plots I aim to build.
This is my function.
require(data.table)
require(data.cube)
secciona <- function(cubo = NULL,
                     fecha_valor = list(),
                     loc_valor = list(),
                     prod_valor = list(),
                     drop = FALSE){
  cubo[fecha_valor, loc_valor, prod_valor, drop = drop]
  ## The line above will really be an assignment of type y <- format(cubo[...drop])
  ## Rest of code which will end up plotting the subset of the function
}
The thing is I keep on getting the error: Error in eval(expr, envir, enclos) : object 'fecha_valor' not found
What I find strangest is that everything works fine on the console, but not inside my own subsetting function.
In console:
> dc[list(as.Date("2013/01/01"))]
> dc[list(as.Date("2013/01/01")),]
> dc[list(as.Date("2013/01/01")),,]
> dc[list(as.Date("2013/01/01")),list(),list()]
all give as result:
<data.cube>
fact:
5627 rows x 2 dimensions x 1 measures (0.32 MB)
dimensions:
localizacion : 4 entities x 3 levels (0.01 MB)
producto : 153994 entities x 3 levels (21.29 MB)
total size: 21.61 MB
But whenever I try
secciona(dc)
secciona(dc, fecha_valor = list(as.Date("2013/01/01")))
secciona(dc, fecha_valor = list())
I always get the error mentioned above.
Any ideas why this is happening? Should I take a different approach to editing the subset for plotting?
This is the standard issue R users face when dealing with non-standard evaluation. It is a consequence of R's "computing on the language" feature.
The [.data.cube function expects to be used interactively. That makes the arguments passed to it more flexible, but it also imposes some restrictions. In this respect it is similar to [.data.table when expressions are passed from a wrapper function to the [ subset operator. I've added a dummy example to make this reproducible.
I see you are already using the data.cube-oop branch, so this is just to clarify for other readers: the data.cube-oop branch is 92 commits ahead of the master branch. To install it, use the following.
install.packages("data.cube", repos = paste0("https://", c(
"jangorecki.gitlab.io/data.cube",
"Rdatatable.github.io/data.table",
"cran.rstudio.com"
)))
library(data.cube)
set.seed(1)
ar = array(rnorm(8,10,5), rep(2,3),
dimnames = list(color = c("green","red"),
year = c("2014","2015"),
country = c("IN","UK"))) # sorted
dc = as.data.cube(ar)
f = function(color=list(), year=list(), country=list(), drop=FALSE){
  expr = substitute(
    dc[color=.color, year=.year, country=.country, drop=.drop],
    list(.color=color, .year=year, .country=country, .drop=drop)
  )
  eval(expr)
}
f(year=list(c("2014","2015")), country="UK")
#<data.cube>
#fact:
# 4 rows x 3 dimensions x 1 measures (0.00 MB)
#dimensions:
# color : 2 entities x 1 levels (0.00 MB)
# year : 2 entities x 1 levels (0.00 MB)
# country : 1 entities x 1 levels (0.00 MB)
#total size: 0.01 MB
You can inspect the expression by putting print(expr) before, or instead of, eval(expr).
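The same substitute()/eval() pattern can be tried in base R without installing data.cube. The sketch below is illustrative only: df, f and the .y placeholder are made-up names, and subset() merely stands in for the [.data.cube call.

```r
# Toy data (hypothetical); subset() plays the role of the NSE function.
df <- data.frame(year = c(2014, 2015, 2015), value = c(1, 2, 3))

f <- function(min_year) {
  # Build the call with the argument's *value* spliced in for .y,
  # so the NSE function never has to look up 'min_year' itself.
  expr <- substitute(subset(df, year >= .y), list(.y = min_year))
  print(expr)   # shows the call that will be evaluated
  eval(expr)
}

res <- f(2015)
```

Because the value is substituted before evaluation, the call that subset() sees contains a literal 2015 and no reference to the wrapper's argument.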
Read more about non-standard evaluation:
- R Language Definition: Computing on the language
- Advanced R: Non-standard evaluation
- manual of substitute function
And some related SO questions:
- Passing on non-standard evaluation arguments to the subset function
- In R, why is [ better than subset?

R object not found, although returned using print

I don't understand the R output. It seems that my clearly defined object outcome is not found, although it is successfully used in sub-functions and printed. How is that possible?
My R code:
f.hazardratio <- function(input)
{
  outcome <- c("A","B","C","D","E","F")
  category <- c(rep("surv",2),rep("term",2),rep("lobw",2))
  for(i in 1:length(outcome))
  {
    if(nrow(subset(input,input[,paste("out",category[i],sep=".")]==outcome[i]))>0)
    {
      lex <- f.lexis(data=input,
                     out=category[i],
                     out.case=outcome[i])
      print(str(lex))
      print(outcome[i])
      print(head(subset(lex, lex.Xst=="A")))
      print(head(subset(lex, lex.Xst==outcome[i])))
      # nrow(subset(lex, lex.Xst==outcome[i])) is the value I am actually interested in; it causes the same error message as print(), which I only added to identify the problem
      # code continues, but not shown ...
    }
  }
}
And the output:
Classes ‘Lexis’ and 'data.frame': 107455 obs. of 6 variables:
$ pre.time : num
$ lex.dur : num
$ lex.Xst : Factor w/ 3 levels
$ lex.Cst : Factor w/ 3 levels
[1] "A"
     pre.time lex.dur lex.Xst lex.Cst
930       145      36       A      vv
2255      273      14       A      vv
4842      115      99       A      vv
5127      260      30       A      vv
5217       71     108       A       v
5422      152       2       A      vv
Error in eval(expr, envir, enclos) (from #32) : object 'outcome' not found
I have already tried to alter the type of variables from factor to character or vice versa and tried to define an intermediate, temporary variable tmp <- outcome[i]. Unfortunately, nothing has worked so far.
Replacing subset() with square-bracket indexing, as suggested by Spacedman, solved the problem.
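For readers who want to see the bracket form, here is a minimal sketch. The function count_state and the toy lex data frame are hypothetical stand-ins, not the original f.lexis output. The logical condition is computed in the function's own scope, so a loop variable such as states[i] is looked up normally.

```r
# Count rows per state using bracket indexing instead of subset().
count_state <- function(dat, states) {
  n <- integer(length(states))
  for (i in seq_along(states)) {
    # Ordinary evaluation: 'states' and 'i' are found in this function.
    keep <- as.character(dat$lex.Xst) == states[i]
    n[i] <- nrow(dat[keep, , drop = FALSE])
  }
  n
}

# Toy data shaped loosely like the output in the question (made up).
lex <- data.frame(lex.dur = c(36, 14, 99),
                  lex.Xst = factor(c("A", "B", "A")))
counts <- count_state(lex, c("A", "B", "C"))
counts
# 2 1 0
```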
Welcome to functional programming. If you want to specify a particular value to be returned, wrap it in return(.). Otherwise the value returned is simply the result of the last evaluation. All of the variables created within the function are inaccessible from outside it and are later garbage-collected. Calling print may or may not give you exactly the object: some authors pass the object to be printed on to summary.class (where class is an attribute of the object) first and do not return an exact copy of the argument.

Reading large fixed format text file in r

I am trying to read a large (> 70 MB) fixed format text file into R. For a smaller file (< 1 MB), I can use the read.fwf() function as shown below.
condodattest1a <- read.fwf(impfile1,widths=testcsv3$Varlen,col.names=testcsv3$Varname)
When I try to run the line of code below,
condodattest1 <- read.fwf(impfile,widths=testcsv3$Varlen,col.names=testcsv3$Varname)
I get the following error message:
Error: cannot allocate vector of size 2 Kb
The only difference between the 2 lines is the size of the input file.
The formatting for the file I want to import is given in the dataframe called testcsv3. I show a small snippet of the dataframe below:
> head(testcsv3)
  Varlen Varname    Varclass Varsep Varforfmt
1      2    "V1" "character"      2    "A2.0"
2     15    "V2" "character"     17   "A15.0"
3     28    "V3" "character"     45   "A28.0"
4      3    "V4" "character"     48    "F3.0"
5      1    "V5" "character"     49    "A1.0"
6      3    "V6" "character"     52    "A3.0"
At least part of my problem is that I am reading in all the data as factors when I use read.fwf() and I end up exceeding the memory limit on my computer.
I tried to use read.table() as a way of formatting each variable but it seems I need a text delimiter with that function. There is a suggestion in section 3.3 in the link below that I could use sep to identify the column where every variable starts.
http://data.princeton.edu/R/readingData.html
However, when I use the command below:
condodattest1b <- read.table(impfile1,sep=testcsv3$Varsep,col.names=testcsv3$Varname, colClasses=testcsv3$Varclass)
I get the following error message:
Error in read.table(impfile1, sep = testcsv3$Varsep, col.names = testcsv3$Varname, : invalid 'sep' argument
Finally, I tried to use:
condodattest1c <- read.fortran(impfile1,lengths=testcsv3$Varlen, format=testcsv3$Varforfmt, col.names=testcsv3$Varname)
but I get the following message:
Error in processFormat(format) : missing lengths for some fields
In addition: Warning messages:
1: In processFormat(format) : NAs introduced by coercion
2: In processFormat(format) : NAs introduced by coercion
3: In processFormat(format) : NAs introduced by coercion
All I am trying to do at this point is format the data when they come into R as something other than factors. I am hoping this will limit the amount of memory I am using and allow me to actually input the file. I would appreciate any suggestions about how I can do this. I know the Fortran formats for all the variables and the column at which each variable begins.
Thank you,
Warren
Maybe this code works for you. You have to fill varlen with the field sizes and pass the corresponding type strings (e.g. numeric, character, integer) in colclasses.
my.readfwf <- function(filename, varlen, colclasses) {
  # Start and end character positions of each field
  sidx <- cumsum(c(1, varlen[1:(length(varlen)-1)]))
  eidx <- sidx + varlen - 1
  # Read the file as one string per line
  filecontent <- scan(filename, character(0), sep="\n")
  if (any(diff(nchar(filecontent)) != 0))
    stop("line lengths differ!")
  res <- list()
  for (i in seq_along(varlen)) {
    # substring() is vectorized over the lines, so no sapply() is needed
    res[[i]] <- substring(filecontent, first=sidx[i], last=eidx[i])
    # Convert the field to the requested type
    mode(res[[i]]) <- colclasses[i]
  }
  # Turn the list of columns into a data frame without copying it
  attributes(res) <- list(names=paste("V", seq_along(res), sep=""),
                          row.names=seq_along(res[[1]]),
                          class="data.frame")
  return(res)
}
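Since the question suspects factor conversion as part of the memory problem, another thing to try is passing colClasses to read.fwf(), which forwards extra arguments to read.table(). This is a sketch on a tiny made-up file, not a tested fix for the 70 MB file; the file contents, widths and column names are assumptions for illustration.

```r
# Write a tiny fixed-width file to a temporary path (made-up content).
tmp <- tempfile(fileext = ".txt")
writeLines(c("AB123", "CD456"), tmp)

# colClasses is passed on to read.table(), so each field is read with the
# requested type and no column is converted to a factor.
# buffersize controls how many lines are processed at a time.
dat <- read.fwf(tmp,
                widths = c(2, 3),
                col.names = c("V1", "V2"),
                colClasses = c("character", "integer"),
                buffersize = 1000)

str(dat)
```

With the question's objects, the equivalent arguments would presumably be widths = testcsv3$Varlen and colClasses = testcsv3$Varclass, provided those columns hold plain values without embedded quote characters.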

r strucchange package: error numeric 'envir' arg not of length one

I want to carry out a structural change test on exchange rate data. I have a zoo series named m with several exchange rates. I first create a window m1.
m1 <- window(m, start = as.Date("1998-12-31"), end = as.Date("2010-12-31"))
The first column of m1 looks like the following.
head(m1$mbel)
1998-12-31 1999-01-31 1999-02-28 1999-03-31 1999-04-30 1999-05-31
    1.2346     1.2278     1.2269     1.2259     1.2328     1.2357
mbel is the variable of interest currently. For mbel, I want to test parameter stability for a simple linear model, mbel ~ mbel(lag1).
I first combine the level and first lags of logs of mbel data.
cb <- cbind(log(m1$mbel),lag(log(m1$mbel),k = -1))
colnames(cb) <- c("rate","ratelag1")
cb <- window(cb, start = as.Date("1999-01-31"), end = as.Date("2010-12-31"))
head(cb)
                rate  ratelag1
1999-01-31 0.2052240 0.2107470
1999-02-28 0.2044907 0.2052240
1999-03-31 0.2036753 0.2044907
1999-04-30 0.2092880 0.2036753
1999-05-31 0.2116376 0.2092880
1999-06-30 0.2190552 0.2116376
Now with library strucchange, I use the following tests.
r <- Fstats(cb$rate~cb$ratelag1, data = cb )
re <- efp(cb$rate~cb$ratelag1,data = cb,type="RE")
plot(re)
sctest(re)
The following error is showing up:
Error in eval(attr(terms(formula), "variables")[[2]], data, env) :
numeric 'envir' arg not of length one
What am I missing here? Please help.
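One likely cause, offered as an assumption because neither strucchange nor the original data is used here: cb is a zoo object, which is a numeric matrix underneath, and eval() cannot use a numeric matrix as the place to look up formula variables. The base-R sketch below uses a made-up matrix m to show that eval() fails on a matrix, then shows the form that works with lm(): bare column names in the formula plus a data frame as data.

```r
# Made-up numeric matrix standing in for the zoo object cb.
m <- cbind(rate     = c(0.205, 0.204, 0.203, 0.209, 0.211),
           ratelag1 = c(0.210, 0.205, 0.204, 0.203, 0.209))

# Evaluating a variable "inside" a numeric matrix fails; msg holds the
# error message if an error was raised.
msg <- tryCatch(eval(quote(rate), m),
                error = function(e) conditionMessage(e))

# A data frame is a valid lookup target, and the formula refers to its
# columns by bare name instead of cb$rate and cb$ratelag1.
cb_df <- as.data.frame(m)
fit <- lm(rate ~ ratelag1, data = cb_df)
```

Whether the same two changes (data = as.data.frame(cb) and rate ~ ratelag1) are sufficient for Fstats() and efp() is untested here.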
