$ operator is invalid for atomic vectors :: R shiny - r

I have a customer survey of 25 questions. Each answer is one of "1", "2", "3", "4" (1-Very Good, 2-Good, 3-Normal, 4-Bad).
Each row contains one respondent with all the answers they gave.
The data is in this format: the respondent ID followed by the response values. The column headers contain the question names.
21044194- 1- 2- 4- 1- 3- 1- 1- 2- 1- 2- 1- 3- 2- 2- 1- 2- 4- 2
21044198- 1- 2- 4- 4- 3- 1- 1- 2- 1- 2- 1- 3- 2- 4- 1- 3- 4- 2
21044199- 1- 2- 3- 1- 2- 3- 2- 1- 1- 2- 1- 3- 2- 4- 1- 3- 4- 2
Now I want to create a Shiny app in which the list of all 25 questions is an input, and on the basis of the selected question I need to display a pie chart of the answers. For example, for question 1: 31% of people chose Very Good, 22% chose Good, 31% chose Normal and 17% chose Bad.
I have written the following code ->
Ui.R
library(shiny)
maxraw <- read.csv("C:/Users/Suchita/Desktop/maxraw.csv")
coln <- colnames(maxraw)
# Define UI for dataset viewer application
shinyUI(pageWithSidebar(
headerPanel('Iris k-means clustering'),
sidebarPanel(
selectInput('xcol', 'X Variable', choices = c(coln[26], coln[27], coln[28], coln[29])),
#selectInput('ycol', 'Y Variable', names(iris),
#selected=names(iris)[[2]]),
numericInput('clusters', 'Cluster count', 3,
min = 1, max = 9)
),
mainPanel(
plotOutput('plot1')
)
))
Server.R
library(shiny)
library(datasets)
maxraw <- read.csv("C:/Users/Suchita/Desktop/maxraw.csv")
# Define server logic required to summarize and view the selected
# dataset
shinyServer(function(input, output, session) {
# Combine the selected variables into a new data frame
selectedData <- reactive({
ss <- switch(input$xcol,
"Question1." = 26,
"Question2" = 27,
"Question3" = 28)
a = table(maxraw[,ss])
a = as.data.frame(a)
a$pct <- round(a$Freq/sum(a$Freq)*100) #calculated percentage
a$pcts <- paste(a$pct, "%") # add percents to labels
})
output$plot1 <- renderPlot({
pie(a$pct,labels = a$pcts, main = "Hospital Survey")
})
})
Here is the str(maxraw)
str(maxraw)
'data.frame': 43 obs. of 48 variables:
$ Response.ID : int 21044194 21044264 21044287 21044402 21044435 21044481 21044529 21059249 21059266 21059297 ...
$ IP.Address : Factor w/ 6 levels "","122.177.157.116",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Timestamp..MM.DD.YYYY. : Factor w/ 44 levels "","02/12/2014 04:30:20",..: 2 3 4 5 6 7 8 9 10 11 ...
$ Duplicate : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ Time.Taken.to.Complete..Seconds. : int 146 125 181 94 111 112 575 149 115 0 ...
$ Response.Status : Factor w/ 3 levels "","Complete",..: 2 2 2 2 2 2 2 2 2 3 ...
$ Seq..Number : int 1 1 1 1 1 1 1 1 1 1 ...
$ External.Reference : logi NA NA NA NA NA NA ...
$ Custom.Variable.1 : logi NA NA NA NA NA NA ...
$ Custom.Variable.2 : logi NA NA NA NA NA NA ...
$ Custom.Variable.3 : logi NA NA NA NA NA NA ...
$ Custom.Variable.4 : logi NA NA NA NA NA NA ...
$ Custom.Variable.5 : logi NA NA NA NA NA NA ...
$ Respondent.Email : logi NA NA NA NA NA NA ...
$ Email.Group.Code : logi NA NA NA NA NA NA ...
$ Country.Code : Factor w/ 2 levels "","IN": 2 2 2 2 2 2 2 2 2 2 ...
$ Region : int 10 10 10 10 10 10 10 10 10 10 ...
$ Please.take.a.minute.to.give.us.your.feedback...it.helps.us.improve.Thank.you.very.much.for.your.time.and.support..Please.start.with.the.survey.now.by.clicking.on.the..B.Continue..B..button.below. : logi NA NA NA NA NA NA ...
$ Date.Of.Visit : Factor w/ 28 levels "","01/01/2014",..: 22 6 24 1 19 2 21 7 5 1 ...
$ First.Name : Factor w/ 39 levels "","Abhi","Afsar",..: 16 21 39 15 14 29 26 38 17 1 ...
$ Last.Name : Factor w/ 40 levels "","Abhinav","Ali",..: 24 37 35 19 33 13 29 25 9 1 ...
$ Phone : num 4.1e+07 4.1e+07 4.1e+07 4.1e+07 4.1e+07 ...
$ Email.Address : Factor w/ 40 levels "","aali#gmail.com",..: 17 24 39 16 15 29 28 38 18 1 ...
$ Name.of.the.doctor. : Factor w/ 29 levels "","Dr Jholu",..: 29 17 14 28 28 26 12 5 18 1 ...
$ Max.ID. : num 45367298 65438900 67534373 67543923 78654389 ...
$ Satisfaction.With.Doctor.Was.the.Doctor.available.on.time. : int 4 3 2 3 NA 3 4 2 1 NA ...
$ Satisfaction.With.Doctor.Did.the.Doctor.treat.you.with.courtesy.and.respect. : int 4 2 3 4 2 3 4 3 1 NA ...
$ Satisfaction.With.Doctor.Did.the.Doctor.explain.your.diagnosis.and.treatment.plan.in.a.way.you.could.understand. : int 4 3 2 3 3 3 4 3 1 NA ...
$ Satisfaction.with.Nurses.Did.the.Nurses.treat.you.with.courtesy.and.respect. : int 4 3 2 4 3 4 4 3 1 NA ...
$ Appointment.Was.your.appointment.call.handled.efficiently.and.queries.resolved.to.your.satisfaction. : int 4 2 3 3 4 3 4 3 1 NA ...
$ Reception.Helpdesk.Was.the.Help.Desk.staff.at.the.hospital.helpful.and.courteous. : int 4 3 4 3 4 3 4 2 1 NA ...
$ Hospital.Infrastructure.Environment.Was.the.out.patient.department.location.convenient.to.identify. : int 4 2 2 3 4 2 4 3 1 NA ...
$ Hospital.Infrastructure.Environment.Did.the.areas.you.visited.in.the.hospital.look.clean.and.orderly. : int 4 3 3 4 3 1 4 2 1 NA ...
$ Hospital.Infrastructure.Environment.Were.the.public.area.washrooms.clean.and.hygienic. : int 4 3 2 3 4 2 4 2 1 NA ...
$ Front.Office.and.Billing.Did.the.front.office.staff.explain.and.resolve.your.query.regarding.registration.consult.diagnostics.charges.efficiently. : int 4 3 2 3 2 3 4 2 1 NA ...
$ Front.Office.and.Billing.Was.your.billing.handled.in.a.timely.and.accurate.manner. : int 4 2 3 2 1 2 4 NA 1 NA ...
$ Diagnostics.Services.Were.the.diagnostic.tests.conducted.in.a.timely.manner. : int 4 3 2 1 1 3 4 2 1 NA ...
$ Diagnostics.Services.Were.the.diagnostic.tests.conducted.efficiently.and.sensitively. : int 4 3 3 1 2 3 4 2 1 NA ...
$ Diagnostics.Services.Were.you.clearly.informed.about.report.delivery.time.and.mode.of.collection. : int 4 3 3 1 2 NA 4 2 1 NA ...
$ Max.Chemist.Were.all.the.prescribed.medicines.or.substitutes.available.at.the.chemist. : int 4 3 NA 2 1 4 4 2 1 NA ...
$ Max.Chemist.Did.you.find.the.services.at.the.pharmacy.efficient.and.timely. : int 4 4 NA 3 1 2 4 2 1 NA ...
$ Security...Parking.Did.you.find.our.car.parking.Valet.service.polite.and.efficient. : int 4 3 3 3 3 3 4 2 1 NA ...
$ How.likely.is.that.you.would.recommend.Max.Healthcare.to.a.friend.or.colleague. : int 9 7 6 4 7 8 10 6 1 NA ...
$ Any.additional.suggestions.or.comments : Factor w/ 31 levels ""," No","Abhinz was good",..: 28 29 18 21 31 NA 28 30 6 1 ...
$ Help.us.recognize.any.of.our.staff.who.served.you.exceptionally.well..by.providing.his.her.name. : logi NA NA NA NA NA NA ...
$ A. : Factor w/ 25 levels "","Abhinav","Abhinav ",..: 21 21 21 20 14 8 9 12 16 1 ...
$ B. : Factor w/ 24 levels "","Abhinav","balu ",..: 21 18 19 20 14 10 16 22 1 1 ...
$ C. : Factor w/ 17 levels "","Chiya","Dimple",..: 1 15 1 7 9 1 3 4 1 1 ...
I'm getting the error "$ operator is invalid for atomic vectors".
Can someone please suggest a way around this?
Thanks.

The error appears to be caused by the fact that selectedData is returning a vector rather than a data.frame, but you are trying to use it as a data.frame.
You need to explicitly return the full data.frame from selectedData (e.g., add return(a) at the bottom of that reactive). Then you need to actually call selectedData() within your renderPlot call (e.g., start output$plot1 with a <- selectedData()).
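A minimal sketch of why this happens, without Shiny: the value of a function (or reactive) body is its last expression, and the value of an assignment such as a$pcts <- ... is the right-hand side, not the data frame. The vector answers and the helper names summarise_bad and summarise_good are made up for illustration.

```r
# Hypothetical stand-in for one survey column (values 1-4, with NAs)
answers <- c(4, 3, 2, 3, NA, 3, 4, 2, 1, NA)

# Mirrors the original reactive body: the last expression is an assignment,
# so the function returns the assigned character vector, not `a`.
summarise_bad <- function(x) {
  a <- as.data.frame(table(x))
  a$pct <- round(a$Freq / sum(a$Freq) * 100)
  a$pcts <- paste(a$pct, "%")
}

# Same body, but the data frame is returned explicitly.
summarise_good <- function(x) {
  a <- as.data.frame(table(x))
  a$pct <- round(a$Freq / sum(a$Freq) * 100)
  a$pcts <- paste(a$pct, "%")
  a
}

bad <- summarise_bad(answers)
good <- summarise_good(answers)

is.data.frame(bad)    # FALSE: `bad` is a character vector, so bad$pct fails
is.data.frame(good)   # TRUE: good$pct and good$pcts are available for pie()
```

In the app, the equivalent change would be to end the reactive with a and to begin renderPlot with a <- selectedData().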

I realise this is an old post and you might have fixed this already. This is more for others who hit the dreaded "$ operator is invalid for atomic vectors" message. For those having a >nuclear meltdown<, see below...
Have you tried updating shinyapps? You have to do this from their GitHub repository, from the R command line, not the normal way (such as the update-packages button in RStudio).
a) get devtools (if you don't have it)
install.packages('devtools')
b) re-install shiny apps from command line
devtools::install_github('rstudio/shinyapps')
A clue to where the error is: look at the text ahead of the message, as it might help track it down. Mine was:
Error in account$server : $ operator is invalid for atomic vectors
This suggested a problem on the shiny server side more than in my code. It was a bug in their new rollout of shinyapps, and updating the shinyapps version cleared my atomic-vector error.
I hope, Aarithmo, you got it sorted. Debugging Shiny is tricky. Note that you can put browser() statements in your Shiny code to debug: execution stops at that point and you can check your variables for issues.

Related

Error in model.frame.default, variable lengths differ

I ran a glmer and got the following error message: "Error in model.frame.default(data = data.density.EM.gra, weights = number_of_nest.boxes, : variable lengths differ (found for 'year')". I don't understand what this means despite reading a number of different posts regarding the same error.
Here is my model:
model.1.EM.gra<-glmer(cbind(data.density$number.nest.boxes.occupied.that.year,data.density$number_of_nest.boxes)~ caterpillar.sc +(1|year),data = data.density.EM.gra,weights = number_of_nest.boxes,family = binomial)
I appreciate any suggestions you may have.
setwd("~/Word/UQAM/Master's_Reale/DATA/Blue tits data and instructions/csv") # work station
install.packages("dplyr")
#calling libraries.
library(dplyr)
library (reprex)
library(lme4)
data.density<-read.csv ("nest_box_caterpillar_density.csv")
data.density$year<-factor (data.density$year)# making year a factor (categorical variable)
str(data.density) # now we see year as a factor in the data.
#> 'data.frame': 63 obs. of 16 variables:
#> $ year : Factor w/ 9 levels "2011","2012",..: 1 2 3 4 5 6 7 8 9 1 ...
#> $ number.nest.boxes.occupied.that.year: int 17 13 12 16 16 16 15 17 12 17 ...
#> $ number_of_nest.boxes : int 20 20 20 20 20 20 20 20 20 30 ...
#> $ failure : int 3 3 3 3 3 3 3 3 3 13 ...
#> $ proportion_occupied_boxes : num 0.85 0.65 0.6 0.8 0.8 0.8 0.75 0.85 0.6 0.57 ...
#> $ site : Factor w/ 7 levels "ari","ava","fel",..: 5 5 5 5 5 5 5 5 5 1 ...
#> $ population : Factor w/ 3 levels "D-Muro","E-Muro",..: 2 2 2 2 2 2 2 2 2 2 ...
#> $ mean_yearly_frass : num 295 231 437 263 426 ...
#> $ site_ID : Factor w/ 63 levels "2011_ari_","2011_ava_",..: 5 12 19 26 33 40 47 54 61 1 ...
#> $ exploration_avg : num 13.28 14.19 9.85 9.42 8.67 ...
#> $ X : logi NA NA NA NA NA NA ...
#> $ X.1 : logi NA NA NA NA NA NA ...
#> $ X.2 : Factor w/ 2 levels "","failure means the total number of nest boxes -the number of nest boxes occupied. ": 1 1 1 1 1 1 1 1 1 2 ...
#> $ X.3 : logi NA NA NA NA NA NA ...
#> $ X.4 : logi NA NA NA NA NA NA ...
#> $ X.5 : Factor w/ 5 levels "","1 column with number of nest boxes used. ",..: 1 1 4 3 1 2 5 1 1 1 ...
#making new objects
density<-data.density$proportion_occupied_boxes # making a new object called density
caterpillar<-data.density$mean_yearly_frass # making new object called caterpillar
caterpillar.sc<-scale(caterpillar)
data.density.EM<-filter(data.density,population=='E-Muro') # data for population 'E-Muro'
data.density.EM.gra<-filter(data.density.EM,site=='gra') # data for site gra in in the E-Muro population.
View(data.density.EM.gra)
model.1.EM.gra<-glmer(cbind(data.density$number.nest.boxes.occupied.that.year,data.density$number_of_nest.boxes)~ caterpillar.sc +(1|year),
data = data.density.EM.gra,
weights = number_of_nest.boxes,
family = binomial)
#> Error in model.frame.default(data = data.density.EM.gra, weights = number_of_nest.boxes, : variable lengths differ (found for 'year')
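One possible reading of this error, offered as a sketch rather than a confirmed diagnosis: the formula mixes vectors taken from the full data.density (63 rows, via data.density$... and the global caterpillar.sc) with year, which is looked up in the filtered data.density.EM.gra. The same message can be reproduced in base R with lm(); the data frame full and its columns are hypothetical.

```r
# Hypothetical data: 6 rows in total, 3 of them for site "gra"
full <- data.frame(
  y    = c(1.2, 0.8, 1.9, 2.4, 1.1, 2.2),
  x    = c(1, 2, 3, 4, 5, 6),
  site = c("gra", "ari", "gra", "ari", "gra", "ari")
)
sub <- full[full$site == "gra", ]

# full$y has 6 values, while x is looked up in `sub` and has only 3
msg <- tryCatch(
  {
    lm(full$y ~ x, data = sub)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# With bare column names, every variable is taken from `data`
fit <- lm(y ~ x, data = sub)
nobs(fit)
```

Under that reading, the glmer call would use bare column names inside cbind() and a caterpillar column stored in (and scaled within) the filtered data frame, so that all variables have the same length.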

How to combine training and testing dataset in same format

I am practicing with this dataset: http://archive.ics.uci.edu/ml/datasets/Census+Income
I loaded training & testing data.
# Downloading train and test data
trainFile = "adult.data"; testFile = "adult.test"
if (!file.exists (trainFile))
download.file (url = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
destfile = trainFile)
if (!file.exists (testFile))
download.file (url = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",
destfile = testFile)
# Assigning column names
colNames = c ("age", "workclass", "fnlwgt", "education",
"educationnum", "maritalstatus", "occupation",
"relationship", "race", "sex", "capitalgain",
"capitalloss", "hoursperweek", "nativecountry",
"incomelevel")
# Reading training data
training = read.table (trainFile, header = FALSE, sep = ",",
strip.white = TRUE, col.names = colNames,
na.strings = "?", stringsAsFactors = TRUE)
# Load the testing data set
testing = read.table (testFile, header = FALSE, sep = ",",
strip.white = TRUE, col.names = colNames,
na.strings = "?", fill = TRUE, stringsAsFactors = TRUE)
I need to combine the two into one, but there is a problem: the structure of the two data frames is not the same.
Display structure of the training data
> str (training)
'data.frame': 32561 obs. of 15 variables:
$ age : int 39 50 38 53 28 37 49 52 31 42 ...
$ workclass : Factor w/ 8 levels "Federal-gov",..: 7 6 4 4 4 4 4 6 4 4 ...
$ fnlwgt : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
$ education : Factor w/ 16 levels "10th","11th",..: 10 10 12 2 10 13 7 12 13 10 ...
$ educationnum : int 13 13 9 7 13 14 5 9 14 13 ...
$ maritalstatus: Factor w/ 7 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
$ occupation : Factor w/ 14 levels "Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
$ relationship : Factor w/ 6 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
$ race : Factor w/ 5 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
$ sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 1 1 1 2 1 2 ...
$ capitalgain : int 2174 0 0 0 0 0 0 0 14084 5178 ...
$ capitalloss : int 0 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int 40 13 40 40 40 40 16 45 50 40 ...
$ nativecountry: Factor w/ 41 levels "Cambodia","Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
$ incomelevel : Factor w/ 2 levels "<=50K",">50K": 1 1 1 1 1 1 1 2 2 2 ...
Display structure of the testing data
> str (testing)
'data.frame': 16282 obs. of 15 variables:
$ age : Factor w/ 74 levels "|1x3 Cross validator",..: 1 10 23 13 29 3 19 14 48 9 ...
$ workclass : Factor w/ 9 levels "","Federal-gov",..: 1 5 5 3 5 NA 5 NA 7 5 ...
$ fnlwgt : int NA 226802 89814 336951 160323 103497 198693 227026 104626 369667 ...
$ education : Factor w/ 17 levels "","10th","11th",..: 1 3 13 9 17 17 2 13 16 17 ...
$ educationnum : int NA 7 9 12 10 10 6 9 15 10 ...
$ maritalstatus: Factor w/ 8 levels "","Divorced",..: 1 6 4 4 4 6 6 6 4 6 ...
$ occupation : Factor w/ 15 levels "","Adm-clerical",..: 1 8 6 12 8 NA 9 NA 11 9 ...
$ relationship : Factor w/ 7 levels "","Husband","Not-in-family",..: 1 5 2 2 2 5 3 6 2 6 ...
$ race : Factor w/ 6 levels "","Amer-Indian-Eskimo",..: 1 4 6 6 4 6 6 4 6 6 ...
$ sex : Factor w/ 3 levels "","Female","Male": 1 3 3 3 3 2 3 3 3 2 ...
$ capitalgain : int NA 0 0 0 7688 0 0 0 3103 0 ...
$ capitalloss : int NA 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int NA 40 50 40 40 30 30 40 32 40 ...
$ nativecountry: Factor w/ 41 levels "","Cambodia",..: 1 39 39 39 39 39 39 39 39 39 ...
$ incomelevel : Factor w/ 3 levels "","<=50K.",">50K.": 1 2 2 3 3 2 2 2 3 2 ...
Problem 1:
age has become a factor in testing, and every factor in testing has one more level than the same factor in training. This is because the first row of testing is an unnecessary row:
|1x3 Cross validator
I tried to get rid of this by re-assigning testing:
testing = testing[-1,]
but after running str() again, I don't see any change.
Problem 2:
As I said previously, I need to combine those two data frames into one. So I ran this:
combined <- rbind(training , testing)
Besides problem 1, I can see a new problem after running str():
> str(combined)
'data.frame': 48842 obs. of 15 variables:
$ age : chr "39" "50" "38" "53" ...
$ workclass : Factor w/ 9 levels "Federal-gov",..: 7 6 4 4 4 4 4 6 4 4 ...
$ fnlwgt : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
$ education : Factor w/ 17 levels "10th","11th",..: 10 10 12 2 10 13 7 12 13 10 ...
$ educationnum : int 13 13 9 7 13 14 5 9 14 13 ...
$ maritalstatus: Factor w/ 8 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
$ occupation : Factor w/ 15 levels "Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
$ relationship : Factor w/ 7 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
$ race : Factor w/ 6 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
$ sex : Factor w/ 3 levels "Female","Male",..: 2 2 2 2 1 1 1 2 1 2 ...
$ capitalgain : int 2174 0 0 0 0 0 0 0 14084 5178 ...
$ capitalloss : int 0 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int 40 13 40 40 40 40 16 45 50 40 ...
$ nativecountry: Factor w/ 42 levels "Cambodia","Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
$ incomelevel : Factor w/ 5 levels "<=50K",">50K",..: 1 1 1 1 1 1 1 2 2 2 ...
The target variable (incomelevel) has 5 factor levels in the combined data frame, whereas it has 2 (which is correct) in the training data frame and 3 (one extra because of problem 1) in the testing data frame. This is because there is a . (dot) after each value of incomelevel in the testing data frame (<=50K., <=50K., >50K., ...). So I need to remove that dot, but I have no idea how. Is there a function for this?
I am very new to data and R, which is why I am facing this type of basic issue. Can you please help me solve it?
I think you can ignore the first line of the test file, which solves the issue of age being a factor, because it looks like a header:
head(readLines(testFile))
[1] "|1x3 Cross validator"
[2] "25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, Black, Male, 0, 0, 40, United-States, <=50K."
[3] "38, Private, 89814, HS-grad, 9, Married-civ-spouse, Farming-fishing, Husband, White, Male, 0, 0, 50, United-States, <=50K."
Reusing your code, we can use read.csv with skip=1 for the test file:
colNames = c ("age", "workclass", "fnlwgt", "education",
"educationnum", "maritalstatus", "occupation",
"relationship", "race", "sex", "capitalgain",
"capitalloss", "hoursperweek", "nativecountry",
"incomelevel")
# Reading training data
training = read.csv (trainFile, header = FALSE, col.names = colNames,stringsAsFactors = TRUE,na.strings = "?",strip.white = TRUE)
testing = read.csv (testFile, header = FALSE, col.names = colNames,na.strings = "?",stringsAsFactors = TRUE,skip=1,strip.white = TRUE)
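As a side note on why testing = testing[-1,] seemed to change nothing in the question: dropping a row does not change a column's type, and a factor keeps its unused levels. A small sketch with a made-up three-value column:

```r
# Hypothetical miniature of the `age` column as read from the test file
age <- factor(c("|1x3 Cross validator", "25", "38"))

age_trimmed <- age[-1]              # drop the junk first value
is.factor(age_trimmed)              # still a factor
nlevels(age_trimmed)                # the unused level is still recorded
nlevels(droplevels(age_trimmed))    # droplevels() removes unused levels

# Going through character avoids getting the internal level codes
age_int <- as.integer(as.character(age_trimmed))
age_int
```

Skipping the line at read time, as above, avoids the problem at the source because the column is then parsed as integer in the first place.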
Now the income level: unfortunately we have to correct it manually. It's a good thing you checked:
testing$incomelevel = factor(gsub("\\.","",as.character(testing$incomelevel)))
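The same idea on a tiny made-up factor, to show what the line does: convert to character, strip the dot, and rebuild the factor so the levels are recomputed. The anchored pattern "\\.$" is a variation that only removes a trailing dot.

```r
# Hypothetical miniature of testing$incomelevel
income <- factor(c("<=50K.", ">50K.", "<=50K."))

# Remove every literal dot, then rebuild the factor
cleaned <- factor(gsub("\\.", "", as.character(income)))
levels(cleaned)

# Variation: only strip a dot at the end of the string
cleaned_end <- factor(sub("\\.$", "", as.character(income)))
identical(levels(cleaned), levels(cleaned_end))
```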
Checking the levels, the only difference is native country:
all.equal(sapply(testing,levels) ,sapply(training,levels))
[1] "Component “nativecountry”: Lengths (40, 41) differ (string compare on first 40)"
[2] "Component “nativecountry”: 26 string mismatches"
And I don't think there's much you can do about it; maybe you have to remove it before or after joining:
setdiff(levels(training$nativecountry),levels(testing$nativecountry))
[1] "Holand-Netherlands"
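It may help to know that rbind() on data frames takes the union of factor levels, which is also what the str(combined) output in the question shows. A level that is missing from one side therefore need not block the join. A sketch with two made-up one-column data frames:

```r
# Hypothetical miniatures: one level exists only in the training data
train <- data.frame(country = factor(c("Holand-Netherlands", "United-States")))
test  <- data.frame(country = factor(c("United-States", "Cuba")))

combined <- rbind(train, test)

nrow(combined)              # rows of both inputs
levels(combined$country)    # union of the levels from both inputs
```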

'Object not found' error even though table() verifies the object is in the data set

I've read through others who have had a similar issue, but my situation doesn't seem to be the same as the fixes that have been proposed for those other issues. I'm trying to recode a variable using a conditional statement. I want to take a character string & turn it into a numeric so I can subset those observations out into a new data frame. Here's what I have, so far:
blad_mor <- read.csv("blad_mor.csv", header = T)
str(blad_mor)
blad_mor_recode <- gsub(C670:C679, 29010, blad_mor$cod)
I get this output for the str() command:
> str(blad_mor)
'data.frame': 127073 obs. of 12 variables:
$ year : int 1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
$ sex : Factor w/ 4 levels "1","2","F","M": 1 1 1 2 1 2 2 2 2 2 ...
$ race : Factor w/ 17 levels "America","Asian &",..: 4 4 4 4 4 4 4 4 4 4 ...
$ county : Factor w/ 79 levels "COUNTY1","COUNTY2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ cod : Factor w/ 327 levels "C001","C005",..: 89 108 108 294 63 42 172 74 85 269 ...
$ fips : int 1 1 1 1 1 1 1 1 1 1 ...
$ state : int 5 5 5 5 5 5 5 5 5 5 ...
$ race_code : int 2 2 2 2 2 2 2 2 2 2 ...
$ ethnicity : Factor w/ 4 levels "","Hispanic",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ethnicity_code: int NA NA NA NA NA NA NA NA NA NA ...
But when I try the blad_mor_recode <- gsub(C670:C679, 29010, blad_mor$cod) code I get this error:
> blad_mor_recode <- gsub(C670:C679, 29010, blad_mor$cod)
Error in gsub(C670:C679, 29010, blad_mor$cod) : object 'C670' not found
So, I verify that there actually is that object by table(blad_mor$cod) with this being some of the output:
C578 C579 C58 C60 C601 C609 C61 C629 C631 C639 C64 C65 C66 C670 C672 C674 C675 C676
2 43 4 1 1 53 6162 62 1 14 2911 30 47 1 4 1 1 2
C677 C678 C679 C680 C689 C690 C692 C693 C694 C695 C696 C699 C700 C701 C709 C71 C710 C711
1 4 2776 35 77 1 4 5 1 1 8 45 7 3 11 1 29 34
The object 'C670' has one instance as per this output, yet R is telling me it is not there & doesn't run the command. What am I missing here? Should I change the class type from factor to something else? I'm quite confused.
Edit: I have tried quotes around the character strings (e.g. blad_mor_recode <- gsub('C670:C679', '29010', blad_mor$cod)) as well as ifelse(). I still get the same error message.
If you want to change all strings from C70 to C79, you have to use a regular expression. Something like the following would work:
blad_mor_recode <- gsub("C7[0-9]", "29010", blad_mor$cod)
A simple example:
gsub("C7[0-9]","",c("C60","C70","C78"))
[1] "C60" "" ""
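For the range actually named in the question (C670 to C679), a hedged variation on the same idea: unquoted, C670 is parsed as a variable name, hence "object 'C670' not found", and the quoted 'C670:C679' is treated as a literal pattern rather than a range. An anchored character class expresses the range; the vector cod below is made up for illustration.

```r
# Hypothetical miniature of blad_mor$cod
cod <- factor(c("C66", "C670", "C679", "C680", "C71"))

# ^ and $ anchor the match so that e.g. "C6790" would not be affected
pattern <- "^C67[0-9]$"

recoded <- gsub(pattern, "29010", as.character(cod))
recoded

# Logical index that could be used to subset the matching rows
in_range <- grepl(pattern, as.character(cod))
in_range
```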

Error when trying to use one_hot encoding

I know this may be a potential duplicate question, but I found other answers didn't work in my situation.
I am using the following dataset:
> str(total_data)
'data.frame': 32260 obs. of 13 variables:
$ age : int 40 42 44 32 25 31 30 30 27 28 ...
$ workclass : Factor w/ 4 levels "Other-Unknown",..: 3 2 2 1 2 2 2 3 2 3 ...
$ education : Ord.factor w/ 7 levels "1"<"2"<"3"<"4"<..: 2 3 2 2 2 3 2 2 2 2 ...
$ marital.status : Factor w/ 5 levels "Divorced","Married",..: 2 1 2 3 3 3 3 2 2 3 ...
$ occupation : Factor w/ 6 levels "Blue-Collar",..: 5 3 6 2 1 6 6 1 1 6 ...
$ race : Factor w/ 5 levels "Amer-Indian-Eskimo",..: 1 5 1 1 5 5 5 5 5 5 ...
$ sex : Factor w/ 2 levels "Female","Male": 2 2 2 1 2 2 2 2 1 1 ...
$ hours.per.week : int 84 40 40 38 40 38 48 70 35 38 ...
$ naitive.country: Factor w/ 41 levels "?","Cambodia",..: 39 39 39 39 39 39 39 12 39 39 ...
$ classifier : chr "<=50K" "<=50K" ">50K" "<=50K" ...
$ class_num : Factor w/ 2 levels "1","2": 1 1 2 1 1 1 1 2 1 1 ...
$ age_norm : num 0.315 0.342 0.37 0.205 0.11 ...
$ hours_norm : num 0.847 0.398 0.398 0.378 0.398 ...
I'm trying to encode the factors into binary using one_hot() but receive the following error message:
encoded_data <- one_hot(total_data, dropCols = FALSE)
ERROR MESSAGE:
Error in `[.data.frame`(dt, , cols, with = FALSE) :
unused argument (with = FALSE)
I'm not sure what the "with" argument is as I don't see it in the R documentation.
I also saw that someone suggested to use model.matrix. However, when I use that, my ordered factor gets encoded as well, which is what I'm trying to avoid.
This is what happens to my ordered factor variable:
education.L education.Q education.C education^4 education^5 education^6
-3.779645e-01 9.690821e-17 4.082483e-01 -0.5640761 4.364358e-01 -0.19738551
-1.889822e-01 -3.273268e-01 4.082483e-01 0.0805823 -5.455447e-01 0.49346377
I'm also not sure why there are sometimes letters or numbers after the attribute name, i.e. education.L vs education^5.
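Convert the data.frame into a data.table and it should work fine. The with = FALSE in the error message is an argument of data.table's [ method, which a plain data.frame does not accept.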
library(data.table)
dt = data.table(total_data)
one_hot(dt)
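On the suffixes seen in the model.matrix output: for an ordered factor, R uses polynomial contrasts by default, and the columns are named .L (linear), .Q (quadratic), .C (cubic), then ^4, ^5 and so on. A small base R sketch with a made-up three-level column; education_plain is a hypothetical name:

```r
# Hypothetical miniature of the ordered `education` column
df <- data.frame(
  education = factor(c("1", "2", "3", "1"),
                     levels = c("1", "2", "3"), ordered = TRUE)
)

# Ordered factor: polynomial contrast columns
poly_cols <- colnames(model.matrix(~ education, data = df))
poly_cols

# Same values as an unordered factor: one indicator column per non-reference level
df$education_plain <- factor(df$education, ordered = FALSE)
dummy_cols <- colnames(model.matrix(~ education_plain, data = df))
dummy_cols
```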

Extracting complete dataframe from Hmisc package in R

I've used aregImpute to impute the missing values, then I used the impute.transcan function to try to get the complete dataset, using the following code.
impute_arg <- aregImpute(~ age + job + marital + education + default +
balance + housing + loan + contact + day + month + duration + campaign +
pdays + previous + poutcome + y , data = mov.miss, n.impute = 10 , nk =0)
imputed <- impute.transcan(impute_arg, imputation=1, data=mov.miss, list.out=TRUE, pr=FALSE, check=FALSE)
y <- completed[names(imputed)]
When I run str(y) I already get a data frame, but it still contains NAs, as if it had not been imputed. My question is: how do I get the complete dataset without NAs after imputation?
str(y)
'data.frame': 4521 obs. of 17 variables:
$ age : int 30 NA 35 30 NA 35 36 39 41 43 ...
$ job : Factor w/ 12 levels "admin.","blue-collar",..: 11 8 5 5 2 5 7 10 3 8 ...
$ marital : Factor w/ 3 levels "divorced","married",..: 2 2 3 2 2 3 2 2 2 2 ...
$ education: Factor w/ 4 levels "primary","secondary",..: 1 2 3 3 2 3 NA 2 3 1 ...
$ default : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 NA 1 1 1 ...
$ balance : int NA 4789 1350 1476 0 747 307 147 NA -88 ...
$ housing : Factor w/ 2 levels "no","yes": NA 2 2 2 NA 1 2 2 2 2 ...
$ loan : Factor w/ 2 levels "no","yes": 1 2 1 2 NA 1 1 NA 1 2 ...
$ contact : Factor w/ 3 levels "cellular","telephone",..: 1 1 1 3 3 1 1 1 NA 1 ...
$ day : int 19 NA 16 3 5 23 14 6 14 NA ...
$ month : Factor w/ 12 levels "apr","aug","dec",..: 11 9 1 7 9 4 NA 9 9 1 ...
$ duration : int 79 220 185 199 226 141 341 151 57 313 ...
$ campaign : int 1 1 1 4 1 2 1 2 2 NA ...
$ pdays : int -1 339 330 NA -1 176 330 -1 -1 NA ...
$ previous : int 0 4 NA 0 NA 3 2 0 0 2 ...
$ poutcome : Factor w/ 4 levels "failure","other",..: 4 1 1 4 4 1 2 4 4 1 ...
$ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
I have tested your code myself, and it works just fine, except for the last line:
y <- completed[names(imputed)]
I believe there's a typo in the above line. Plus, you do not even need the completed function.
Besides, if you want to get a data.frame from the impute.transcan function, then wrap it with as.data.frame:
imputed <- as.data.frame(impute.transcan(impute_arg, imputation=1, data=mov.miss, list.out=TRUE, pr=FALSE, check=FALSE))
Moreover, if you need to test your missing data pattern, you can also use the md.pattern function provided by the mice package.
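For reference, an alternative to as.data.frame is to copy the original data frame and overwrite the imputed columns by name. The sketch below uses base R only and assumes that impute.transcan(..., list.out=TRUE) returns a named list of filled-in columns; the list imputed and all values here are made-up stand-ins, not output from Hmisc.

```r
# Hypothetical miniature of mov.miss with some missing values
mov_miss <- data.frame(
  age     = c(30, NA, 35),
  balance = c(NA, 4789, 1350)
)

# Made-up stand-in for the list returned by impute.transcan(..., list.out = TRUE)
imputed <- list(
  age     = c(30, 33, 35),
  balance = c(1200, 4789, 1350)
)

# Start from the original data and overwrite the columns that were imputed
completed <- mov_miss
completed[names(imputed)] <- imputed

anyNA(completed)
```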

Resources