Ordering categorical variables with ggplot stacked barplots - r

I have a data frame tbl with 252 obs of 8 variables.
> str(CE0008)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 252 obs. of 8 variables:
$ Period start time: POSIXct, format: "2018-02-28" "2018-02-28" "2018-02-28" "2018-02-28" ...
$ WBTS name : chr "CE0008" "CE0008" "CE0008" "CE0008" ...
$ WBTS ID : num 27 27 27 27 27 27 27 27 27 27 ...
$ WCEL name : chr "CE0008U09C3" "CE0008U21B1" "CE0008U21B3" "CE0008U21C2" ...
$ WCEL ID : num 33 2 22 13 32 3 23 1 11 31 ...
$ PRACHDelayRange : num 4 4 4 4 4 4 4 4 4 4 ...
$ class : Ord.factor w/ 21 levels "Class 0"<"Class 1"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ count : num 22177 37507 37580 24066 6029 ...
I wish to produce a bar plot where the x axis shows the class variable, ordered from Class 0 through to Class 20. The height of the bars is given by the count variable and the bars are coloured by WCEL name.
According to str, the class variable is an ordered factor, but when I plot the data the order doesn't carry over, and the fill colour by 'WCEL name' doesn't work either. Any pointers would be greatly appreciated.
ggplot(data = CE0008, aes(x = class, y = count, fill = 'WCEL name')) + geom_col()

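Two things need fixing. Reset the factor levels explicitly (unique() is needed because factor() rejects duplicated levels), and refer to the WCEL name column with backticks rather than quotes, since a quoted string is treated as a constant. Note that this orders the levels by first appearance, so it assumes the rows are already sorted from Class 0 to Class 20:
CE0008$class <- factor(CE0008$class, levels = unique(as.character(CE0008$class)))
ggplot(data = CE0008, aes(x = class, y = count, fill = `WCEL name`)) + geom_col()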
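If the rows are not already sorted, the levels can be built from the class numbers instead. The following is only a minimal sketch on made-up data: the toy data frame and its values are assumptions standing in for CE0008, and the plot is built only if ggplot2 is installed.

```r
# Toy data standing in for CE0008 (all values are made up for illustration)
set.seed(1)
toy <- data.frame(
  class = rep(paste("Class", 0:20), times = 2),
  count = rpois(42, lambda = 100)
)
toy[["WCEL name"]] <- rep(c("CE0008U09C3", "CE0008U21B1"), each = 21)

# Default level order is alphabetical, which sorts "Class 10" before "Class 2"
default_levels <- levels(factor(toy$class))

# Build the levels from the class numbers so the order is numeric
toy$class <- factor(toy$class, levels = paste("Class", 0:20))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Backticks make `WCEL name` a column reference; quotes would make it a constant
  p <- ggplot(toy, aes(x = class, y = count, fill = `WCEL name`)) + geom_col()
  # print(p) draws the chart
}
```

Comparing default_levels with levels(toy$class) shows the difference between the alphabetical and the numeric ordering.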

Related

unable to write to the csv file [duplicate]

I am trying to write a data frame in R to a text file, but it returns the following error:
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L)
X[[j]] <- as.matrix(X[[j]]) :
missing value where TRUE/FALSE needed
I used the following command for the export:
write.table(df, file ='dfname.txt', sep='\t' )
I have no idea what the problem could stem from. As for "missing value where TRUE/FALSE needed", I have only one column which contains TRUE/FALSE values, and none of these values are missing.
Contents of the dataframe:
> str(df)
'data.frame': 776 obs. of 15 variables:
$ Age : Factor w/ 4 levels "","A","J","SA": 2 2 2 2 2 2 2 2 2 2 ...
$ Sex : Factor w/ 2 levels "F","M": 1 1 1 1 2 2 2 2 2 2 ...
$ Rep : Factor w/ 11 levels "L","NR","NRF",..: 1 1 4 4 2 2 2 2 2 2 ...
$ FA : num 61.5 62.5 60.5 61 59.5 59.5 59.1 59.2 59.8 59.9 ...
$ Mass : num 20 19 16.5 17.5 NA 14 NA 23 19 18.5 ...
$ Vir1 : num 999 999 999 999 999 999 999 999 999 999 ...
$ Vir2 : num 999 999 999 999 999 999 999 999 999 999 ...
$ Vir3 : num 40 999 999 999 999 999 999 999 999 999 ...
$ Location : Factor w/ 4 levels "Loc1",..: 4 4 4 4 4 4 2 2 2 2 ...
$ Site : Factor w/ 6 levels "A","B","C",..: 5 5 5 5 5 5 3 3 3 3 ...
$ Date : Date, format: "2010-08-30" "2010-08-30" ...
$ Record : int 35 34 39 49 69 38 145 112 125 140 ...
$ SampleID : Factor w/ 776 levels "AT1-A-F1","AT1-A-F10",..: 525 524 527 528 529 526 111 78 88 110 ...
$ Vir1Inc : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ Month :'data.frame': 776 obs. of 2 variables:
..$ Dates: Date, format: "2010-08-30" "2010-08-30" ...
..$ Month: Factor w/ 19 levels "Apr-2011","Aug-2010",..: 2 2 2 2 2 2 18 18 18 18 ...
I hope I've given enough/the right information ...
Many thanks,
Heather
An example to reproduce the error. I create a nested data.frame:
Month=data.frame(Dates= as.Date("2003-02-01") + 1:15,
Month=gl(12,2,15))
dd <- data.frame(Age=1:15)
dd$Month <- Month
str(dd)
'data.frame': 15 obs. of 2 variables:
$ Age : int 1 2 3 4 5 6 7 8 9 10 ...
$ Month:'data.frame': 15 obs. of 2 variables:
..$ Dates: Date, format: "2003-02-02" "2003-02-03" "2003-02-04" ...
..$ Month: Factor w/ 12 levels "1","2","3","4",..: 1 1 2 2 3 3 4 4 5 5 ...
Now when I try to save it, I reproduce the error:
write.table(dd)
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L)
X[[j]] <- as.matrix(X[[j]]) : missing value where TRUE/FALSE needed
Without investigating further, one option is to remove the nested data.frame:
write.table(data.frame(subset(dd,select=-c(Month)),unclass(dd$Month)))
The solution by agstudy provides a great quick fix, but there is a simple, more general alternative that does not require you to specify which element(s) of your data.frame are nested:
The following bit is just copied from agstudy's solution to obtain the nested data.frame dd:
Month=data.frame(Dates= as.Date("2003-02-01") + 1:15,
Month=gl(12,2,15))
dd <- data.frame(Age=1:15)
dd$Month <- Month
You can use akhilsbehl's LinearizeNestedList() function (which mrdwab made available here) to flatten (or linearize) the nested levels:
library(devtools)
source_gist(4205477) #loads the function
ddf <- LinearizeNestedList(dd, LinearizeDataFrames = TRUE)
# ddf is now a list with two elements (Age and Month)
ddf <- LinearizeNestedList(ddf, LinearizeDataFrames = TRUE)
# ddf is now a list with 3 elements (Age, `Month/Dates` and `Month/Month`)
ddf <- as.data.frame.list(ddf)
# transforms the flattened/linearized list into a data.frame
ddf is now a data.frame without nesting. However, its column names still reflect the nested structure:
names(ddf)
[1] "Age" "Month.Dates" "Month.Month"
If you want to change this (in this case it seems redundant to have Month. written before Dates, for example), you can use gsub with a regular expression that I copied from Sacha Epskamp to remove everything in the column names up to and including the last dot:
names(ddf) <- gsub(".*\\.","",names(ddf))
names(ddf)
[1] "Age" "Dates" "Month"
The only thing left now is exporting the data.frame as usual:
write.table(ddf, file="test.txt")
Alternatively, you could use the flatten function from the jsonlite package to flatten the data frame before export. It achieves the same result as the other functions mentioned and is much easier to use.
jsonlite::flatten
https://rdrr.io/cran/jsonlite/man/flatten.html
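As a rough sketch of the flattening step, base R's do.call(data.frame, ...) also expands a nested data.frame column into ordinary columns. The object names mirror the dd example above; the temporary output file is an assumption for illustration, and the jsonlite call runs only if that package is installed.

```r
# Rebuild the nested data.frame from the example
Month <- data.frame(Dates = as.Date("2003-02-01") + 1:15,
                    Month = gl(12, 2, 15))
dd <- data.frame(Age = 1:15)
dd$Month <- Month

# Base R: passing the columns back through data.frame() expands the nested one
flat <- do.call(data.frame, dd)
names(flat)

# Optional: the jsonlite route mentioned above, if the package is available
if (requireNamespace("jsonlite", quietly = TRUE)) {
  flat_json <- jsonlite::flatten(dd)
}

# The flattened data.frame can now be exported as usual
out <- tempfile(fileext = ".txt")
write.table(flat, file = out, sep = "\t")
```

The do.call route needs no extra package, which may matter in scripts that should run on a bare R installation.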

single instead multiple boxplots with ggplot

I would like to make a boxplot for a variable (Theta..vol..) depending on two factors (Tiefe) and (Ort).
> str(data)
'data.frame': 30 obs. of 6 variables:
$ Nummer : int 1 2 3 4 5 6 7 8 9 10 ...
$ Name : int 11 12 13 14 15 16 17 18 19 20 ...
$ Ort : Factor w/ 2 levels "NNW","S": 2 2 2 2 2 2 2 2 2 2 ...
$ Tiefe : int 20 20 20 20 20 50 50 50 50 50 ...
$ Gerät : int 2 2 2 2 2 2 2 2 2 2 ...
$ Theta..vol..: num 15 16.4 14.9 16.6 10.6 22.1 17.6 10 18 20.3 ...
My code is:
ggplot(data, aes(x = Tiefe, y = Theta..vol.., fill=Ort))+geom_boxplot()
Since the variable Tiefe has 3 levels and the variable Ort has 2 levels, I expect to see three pairs of boxplots (one pair for each Tiefe).
But I see just a single pair (one boxplot for each level of Ort).
What should I change to get a pair for each of the three Tiefe values? Thank you
In your code, Tiefe is being read as an integer not a factor.
Easy fix using dplyr with ggplot2:
First I made some dummy data:
library(dplyr)
library(ggplot2)
data <- tibble(
  Ort = ifelse(runif(30) > 0.5, "NNW", "S"),
  Tiefe = rep(c(20, 50, 75), times = 10),
  Theta..vol.. = rnorm(30,15))
Next, we modify the Tiefe column before piping into the ggplot:
data %>%
  mutate(Tiefe = factor(Tiefe)) %>%
  ggplot(aes(x = Tiefe, y = Theta..vol.., fill = Ort)) +
  geom_boxplot()
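The same conversion can be done inside aes() without dplyr. This is a minimal sketch under assumptions: the data frame d, the column name Theta and all values are made up, and the plot is built only if ggplot2 is installed.

```r
# Toy data (made-up values): 3 depths x 2 locations, 5 observations each
set.seed(42)
d <- data.frame(
  Ort   = rep(c("NNW", "S"), each = 15),
  Tiefe = rep(c(20L, 50L, 75L), times = 10),
  Theta = rnorm(30, mean = 15)
)

# An integer x is treated as continuous, so boxes are grouped by fill only
is.numeric(d$Tiefe)

# Converting to a factor gives one group per Tiefe/Ort combination
groups <- interaction(d$Tiefe, d$Ort)
n_boxes <- nlevels(groups)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # factor() inside aes() leaves the original column untouched
  p <- ggplot(d, aes(x = factor(Tiefe), y = Theta, fill = Ort)) +
    geom_boxplot()
  # print(p) draws the chart
}
```

Here n_boxes counts the Tiefe/Ort combinations, which is the number of boxes the plot should contain.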

GGPLOT: Printing Stacked Bar Chart & Line to File

I know that it might not look like it from this question, but I've been programming for over 20 years; I'm just new to R. I'm trying to move away from Excel and to automate creation of about 100 charts I currently do in Excel by hand. I've asked two previous questions about this: here and here. Those solutions work for those toy examples, but when I try the exact same code in my own full program, it behaves very differently and I'm completely befuddled as to why. When I run the program below, the output file (ggsavetest.png) is just a plot of the line, without the stacked bar chart.
So here is my (full) code as cut down as I can make it. If anyone wants to critique my programming, go ahead. I know that the comments are light, but that's to try to shorten it for this post. Also, this does actually download the USDA PSD database which is about 20MB compressed and is 170MB uncompressed...sorry but I would love someone's help on this!
Edit, here are str() outputs of both 'full' data and 'toy' data. The toy data works, the full data doesn't.
> str(melteddata)
Classes ‘data.table’ and 'data.frame': 18 obs. of 3 variables:
$ Year : int 1 2 3 4 5 6 1 2 3 4 ...
$ variable: Factor w/ 3 levels "stocks","exports",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Qty : num 2 4 3 2 4 3 4 8 6 4 ...
- attr(*, ".internal.selfref")=<externalptr>
> str(SoySUHist)
Classes ‘data.table’ and 'data.frame': 159 obs. of 3 variables:
$ Year : int 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 ...
$ variable: Factor w/ 3 levels "Stocks","DomCons",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Qty : num 0.0297 0.0356 0.0901 0.1663 0.3268 ...
- attr(*, ".internal.selfref")=<externalptr>
> str(linedata)
Classes ‘data.table’ and 'data.frame': 6 obs. of 2 variables:
$ Year: int 1 2 3 4 5 6
$ Qty : num 15 16 15 16 15 16
- attr(*, ".internal.selfref")=<externalptr>
> str(SoyProd)
Classes ‘data.table’ and 'data.frame': 53 obs. of 2 variables:
$ Year: int 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 ...
$ Qty : num 701 846 928 976 1107 ...
- attr(*, ".internal.selfref")=<externalptr>
library(data.table)
library(ggplot2)
library(ggthemes)
library(plyr)
toyplot <- function(plotdata,linedata){
  plotCExp <- ggplot(plotdata) +
    geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity") +
    geom_line(data=linedata, aes(x=Year,y=Qty)) # <---- comment out this line & the stack plot works
  ggsave(plotCExp,filename = "ggsavetest.png", width=7, height=5, units="in")
}
convertto <- function(value,crop,unit='BU'){
  if (unit=='BU' & ( crop=='WHEAT' | crop=='SOYBEANS')){
    value = value * 36.7437
  }
  return(value)
}
# =====================================
# Download Data (Warning...large download!)
# =====================================
system("curl https://apps.fas.usda.gov/psdonline/download/psd_alldata_csv.zip | funzip > DATA/psd.csv")
tmp <- fread("DATA/psd.csv")
PSD = data.table(tmp)
rm(tmp)
setkey(PSD,Country_Code,Commodity_Code,Attribute_ID)
tmp=unique(PSD[,.(Commodity_Description,Attribute_Description,Commodity_Code,Attribute_ID)])
tmp[order(Commodity_Description)]
names(PSD)[names(PSD) == "Market_Year"] = "Year"
names(PSD)[names(PSD) == "Value"] = "Qty"
PSDCmdtyAtt = unique(PSD[,.(Commodity_Code,Attribute_ID)])
# Soybean Production, Consumption, Stocks/Use
SoyStocks = PSD[list("US",2222000,176),.(Year,Qty)] # Ending Stocks
SoyExp = PSD[list("US",2222000,88),.(Year,Qty)] # Exports
SoyProd = PSD[list("US",2222000,28),.(Year,Qty)] # Total Production
SoyDmCons = PSD[list("US",2222000,125),.(Year,Qty)] # Total Dom Consumption
SoyStocks$Qty = convertto(SoyStocks$Qty,"SOYBEANS","BU")/1000
SoyExp$Qty = convertto(SoyExp$Qty,"SOYBEANS","BU")/1000
SoyProd$Qty = convertto(SoyProd$Qty,"SOYBEANS","BU")/1000
SoyDmCons$Qty = convertto(SoyDmCons$Qty,"SOYBEANS","BU")/1000
# Stocks/Use
SoySUPlot <- SoyExp
names(SoySUPlot)[names(SoySUPlot) == "Qty"] = "Exports"
SoySUPlot$DomCons = SoyDmCons$Qty
SoySUPlot$Stocks = SoyStocks$Qty
SoySUHist <- melt(SoySUPlot,id.vars="Year")
SoySUHist$Qty = SoySUHist$value/1000
SoySUHist$value <- NULL
SoySUPlot$StocksUse = 100*SoySUPlot$Stocks/(SoySUPlot$DomCons+SoySUPlot$Exports)
SoySUPlot$Production = SoyProd$Qty/1000
SoySUHist$variable <- factor(SoySUHist$variable, levels = rev(levels(SoySUHist$variable)))
SoySUHist = arrange(SoySUHist,variable)
toyplot(SoySUHist,SoyProd)
All right, I'm feeling generous. Your example code contains a lot of fluff that should not be in a minimal reproducible example and your system call is not portable, but I had a look anyway.
The good news: Your code works as expected.
Let's plot only the bars:
ggplot(SoySUHist) +
  geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity")
Now only the lines:
ggplot(SoySUHist) +
  geom_line(data=SoyProd, aes(x=Year,y=Qty))
Now compare the scales of the y-axes. If you plot both together, the bars get plotted, but they are so small that you can't see them. You need to rescale:
ggplot(SoySUHist) +
  geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity") +
  geom_line(data=SoyProd, aes(x=Year,y=Qty/1000))
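The scale mismatch can be checked numerically before plotting. The sketch below uses made-up toy data (the bars and line data frames and all their values are assumptions, not the USDA figures) and builds the combined plot only if ggplot2 is installed.

```r
# Toy data: stacked bars on a small scale, line on a much larger one
bars <- data.frame(
  Year     = rep(2001:2006, times = 3),
  variable = rep(c("Stocks", "DomCons", "Exports"), each = 6),
  Qty      = c(0.2, 0.3, 0.2, 0.4, 0.3, 0.2,
               1.5, 1.6, 1.7, 1.6, 1.8, 1.7,
               0.9, 1.0, 1.1, 1.0, 1.2, 1.1)
)
line <- data.frame(Year = 2001:2006,
                   Qty  = c(2700, 2800, 2900, 2750, 3000, 3100))

# The height of each stacked bar is the per-year total
bar_totals <- tapply(bars$Qty, bars$Year, sum)

# A ratio far above 1 means the bars will be squashed flat on a shared axis
scale_ratio <- max(line$Qty) / max(bar_totals)

# Bring the line onto the bars' scale before combining the layers
line$QtyScaled <- line$Qty / 1000

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(bars) +
    geom_col(aes(x = Year, y = Qty, fill = variable)) +
    geom_line(data = line, aes(x = Year, y = QtyScaled))
  # print(p) draws the chart
}
```

Dividing by 1000 is only appropriate when the two series differ by that unit factor; otherwise a secondary axis or separate panels may be a better fit.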

How to apply Naive Bayes model to new data

I asked a question on this this morning but am deleting that and posting here with more betterer wording.
I created my first machine learning model using train and test data. I returned a confusion matrix and saw some summary stats.
I would now like to apply the model to new data to make predictions but I don't know how.
Context: Predicting monthly "churn" cancellations. Target variable is "churned" and it has two possible labels "churned" and "not churned".
head(tdata)
  months_subscription nvk_medium                                org_type     churned
1                  25       none                               Community not churned
2                   7       none                            Sports clubs not churned
3                  28       none                            Sports clubs not churned
4                  18    unknown Religious congregations and communities not churned
5                  15       none              Association - Professional not churned
6                   9       none              Association - Professional not churned
Here's me training and testing:
library("klaR")
library("caret")
# import data
test_data_imp <- read.csv("tdata.csv")
# subset only required vars
# had to remove "revenue" since all churned records are 0 (need last price point)
variables <- c("months_subscription", "nvk_medium", "org_type", "churned")
tdata <- test_data_imp[variables]
#training
rn_train <- sample(nrow(tdata),
floor(nrow(tdata)*0.75))
train <- tdata[rn_train,]
test <- tdata[-rn_train,]
model <- NaiveBayes(churned ~., data=train)
# testing
predictions <- predict(model, test)
confusionMatrix(test$churned, predictions$class)
Everything up till here works fine.
Now I have new data, structured and laid out the same way as tdata above. How can I apply my model to this new data to make predictions? Intuitively I was expecting a new column, cbinded to the data, holding the predicted class for each record.
I tried this:
## prediction ##
# import data
data_imp <- read.csv("pdata.csv")
pdata <- data_imp[variables]
actual_predictions <- predict(model, pdata)
#append to data and output (as head by default)
predicted_data <- cbind(pdata, actual_predictions$class)
# output
head(predicted_data)
Which threw errors
actual_predictions <- predict(model, pdata)
Error in object$tables[[v]][, nd] : subscript out of bounds
In addition: Warning messages:
1: In FUN(1:6433[[4L]], ...) :
Numerical 0 probability for all classes with observation 1
2: In FUN(1:6433[[4L]], ...) :
Numerical 0 probability for all classes with observation 2
3: In FUN(1:6433[[4L]], ...) :
Numerical 0 probability for all classes with observation 3
How can I apply my model to the new data? I'd like a new data frame with a new column that has the predicted class?
Following a comment, here are the head and str of the new data for prediction:
head(pdata)
  months_subscription nvk_medium                                org_type     churned
1                  26       none                               Community not churned
2                   8       none                            Sports clubs not churned
3                  30       none                            Sports clubs not churned
4                  19    unknown Religious congregations and communities not churned
5                  16       none              Association - Professional not churned
6                  10       none              Association - Professional not churned
> str(pdata)
'data.frame': 6433 obs. of 4 variables:
$ months_subscription: int 26 8 30 19 16 10 3 5 14 2 ...
$ nvk_medium : Factor w/ 16 levels "cloned","CommunityIcon",..: 9 9 9 16 9 9 9 3 12 9 ...
$ org_type : Factor w/ 21 levels "Advocacy and civic activism",..: 8 18 18 14 6 6 11 19 6 8 ...
$ churned : Factor w/ 1 level "not churned": 1 1 1 1 1 1 1 1 1 1 ...
This is most likely caused by a mismatch in the encoding of factors between the training data (tdata in your case) and the new data passed to predict (pdata), typically because the new data contains factor levels that are not present in the training data. You have to enforce consistent encoding of the features yourself, because predict does not check it. I therefore suggest double-checking the levels of the features nvk_medium and org_type in both data frames.
The error message:
Error in object$tables[[v]][, nd] : subscript out of bounds
is raised when evaluating a given feature (the v-th feature) in a data point, in which nd is the numeric value of the factor corresponding to the feature. You also have warnings, indicating that the posterior probabilities for all the cases in data points ("observation") 1, 2, and 3 are all zero, but it is not clear if this is also related to the encoding of the factors...
To reproduce the error that you are seeing, consider the following toy data (from http://amunategui.github.io/binary-outcome-modeling/), which has a set of features somewhat similar to that in your data:
# Data setup
# From http://amunategui.github.io/binary-outcome-modeling/
titanicDF <- read.csv('http://math.ucdenver.edu/RTutorial/titanic.txt', sep='\t')
titanicDF$Title <- as.factor(ifelse(grepl('Mr ',titanicDF$Name),'Mr',ifelse(grepl('Mrs ',titanicDF$Name),'Mrs',ifelse(grepl('Miss',titanicDF$Name),'Miss','Nothing'))) )
titanicDF$Age[is.na(titanicDF$Age)] <- median(titanicDF$Age, na.rm=T)
titanicDF$Survived <- as.factor(titanicDF$Survived)
titanicDF <- titanicDF[c('PClass', 'Age', 'Sex', 'Title', 'Survived')]
# Separate into training and test data
inds_train <- sample(1:nrow(titanicDF), round(0.5 * nrow(titanicDF)), replace = FALSE)
Data_train <- titanicDF[inds_train, , drop = FALSE]
Data_test <- titanicDF[-inds_train, , drop = FALSE]
with:
> str(Data_train)
'data.frame': 656 obs. of 5 variables:
$ PClass : Factor w/ 3 levels "1st","2nd","3rd": 1 3 3 3 1 1 3 3 3 3 ...
$ Age : num 35 28 34 28 29 28 28 28 45 28 ...
$ Sex : Factor w/ 2 levels "female","male": 2 2 2 1 2 1 1 2 1 2 ...
$ Title : Factor w/ 4 levels "Miss","Mr","Mrs",..: 2 2 2 1 2 4 3 2 3 2 ...
$ Survived: Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 1 2 1 ...
> str(Data_test)
'data.frame': 657 obs. of 5 variables:
$ PClass : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
$ Age : num 47 63 39 58 19 28 50 37 25 39 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 2 1 1 2 1 2 2 2 ...
$ Title : Factor w/ 4 levels "Miss","Mr","Mrs",..: 2 1 2 3 3 2 3 2 2 2 ...
$ Survived: Factor w/ 2 levels "0","1": 2 2 1 2 2 1 2 2 2 2 ...
Then everything goes as expected:
model <- NaiveBayes(Survived ~ ., data = Data_train)
# This will work
pred_1 <- predict(model, Data_test)
> str(pred_1)
List of 2
$ class : Factor w/ 2 levels "0","1": 1 2 1 2 2 1 2 1 1 1 ...
..- attr(*, "names")= chr [1:657] "6" "7" "8" "9" ...
$ posterior: num [1:657, 1:2] 0.8352 0.0216 0.8683 0.0204 0.0435 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:657] "6" "7" "8" "9" ...
.. ..$ : chr [1:2] "0" "1"
However, if the encoding is not consistent, e.g.:
# Mess things up, by "displacing" the factor values (i.e., 'Nothing'
# will now be encoded as number 5, which was not present in the
# training data)
Data_test_2 <- Data_test
Data_test_2$Title <- factor(
as.character(Data_test_2$Title),
levels = c("Dr", "Miss", "Mr", "Mrs", "Nothing")
)
> str(Data_test_2)
'data.frame': 657 obs. of 5 variables:
$ PClass : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
$ Age : num 47 63 39 58 19 28 50 37 25 39 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 2 1 1 2 1 2 2 2 ...
$ Title : Factor w/ 5 levels "Dr","Miss","Mr",..: 3 2 3 4 4 3 4 3 3 3 ...
$ Survived: Factor w/ 2 levels "0","1": 2 2 1 2 2 1 2 2 2 2 ...
then:
> pred_2 <- predict(model, Data_test_2)
Error in object$tables[[v]][, nd] : subscript out of bounds
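One way to check and enforce consistent encoding before calling predict is sketched below in base R. The toy data frames and the align_levels helper are hypothetical, made up for illustration; they are not part of klaR or caret.

```r
# Toy training and new data (made-up values); "Dr" never appears in training
train   <- data.frame(Title = factor(c("Miss", "Mr", "Mrs", "Mr", "Nothing")))
newdata <- data.frame(Title = factor(c("Mr", "Dr", "Miss")))

# 1. List the levels that occur in the new data but were unseen in training
unseen <- setdiff(levels(newdata$Title), levels(train$Title))

# 2. Hypothetical helper: re-encode every factor column of `new` with the
#    levels of the matching column in `ref`; unseen values become NA
align_levels <- function(new, ref) {
  for (col in names(ref)) {
    if (is.factor(ref[[col]]) && col %in% names(new)) {
      new[[col]] <- factor(as.character(new[[col]]),
                           levels = levels(ref[[col]]))
    }
  }
  new
}

aligned <- align_levels(newdata, train)
```

Rows that end up with NA carry values the model has never seen, so they need to be dropped or handled separately before prediction.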

