Some of the categorical variables I'm trying to convert to dummies did not convert

data.type_of_meal_plan.value_counts()
Meal Plan 1     27835
Not Selected     5130
Meal Plan 2      3305
Meal Plan 3         5
Name: type_of_meal_plan, dtype: int64
When I converted the above variable to dummies and ran the logistic regression model, I got this error message:
ValueError: could not convert string to float: 'Meal Plan 1'
Does this mean meal plan 1 was not converted to a dummy?
How do I fix this?
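One common cause of this error is that the dummies were created but the result was never assigned back, so the model still sees the original string column. The following is only a minimal sketch under that assumption, using pandas. Only type_of_meal_plan comes from the question; the lead_time and booking_cancelled columns and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; only type_of_meal_plan
# is taken from the question, the other columns are assumptions.
data = pd.DataFrame({
    "type_of_meal_plan": ["Meal Plan 1", "Not Selected", "Meal Plan 2", "Meal Plan 1"],
    "lead_time": [10, 45, 3, 120],        # assumed numeric predictor
    "booking_cancelled": [0, 1, 0, 1],    # assumed binary target
})

X = data.drop(columns=["booking_cancelled"])

# get_dummies returns a new DataFrame. If the result is not assigned,
# X keeps the original strings and the model cannot convert them to float.
X = pd.get_dummies(X, columns=["type_of_meal_plan"], drop_first=True, dtype=int)

# Sanity check before fitting: no non-numeric columns should remain.
remaining_text_columns = X.select_dtypes(exclude="number").columns.tolist()
print(X.columns.tolist())
print(remaining_text_columns)
```

With drop_first=True the first level (here 'Meal Plan 1') becomes the reference category, so it has no column of its own. If remaining_text_columns is not empty, those columns still need encoding before the model is fit.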

Related

Do I need to convert categorical variables to factor type for CART?

I have a dataset with:
gender: 0 - female, 1 - male
cancer : 0 - no, 1 - yes
etc
Do I need to convert these to the factor data type?
I need to explore these data to come up with graphs and, lastly, use CART in R.
Thank you!

How to create a binary variable for logistic regression using keywords in a text variable

I have criminal sentencing data with a text variable that contains phrases like "2 months jail", "14 months prison", and "12 months community supervision". I would like to run a logistic regression to determine the odds that a particular defendant is sent to prison or jail rather than released to community supervision. So I want to create a binary variable that shows a 1 for someone sent to "jail"/"prison" and a 0 for those sent to another program.
I have tried using library(qdap) but have not had any luck. I have also tried ifelse(df$text %in% "jail", "1", "0") but it only shows 1 observation when I know there are several thousand.
Small data sample:
data<-data.frame('caseid'=c(1,2,3),'text'=c("went to prison","went to jail","released"))
  caseid           text
1      1 went to prison
2      2   went to jail
3      3       released
Trying to create a binary variable - sentenced - to analyze logistically like:
  caseid           text sentenced
1      1 went to prison         1
2      2   went to jail         1
3      3       released         0
Thank you for any help you can offer!
You can do the following in base R:
transform(data, sentenced = +grepl("(jail|prison)", text))
#  caseid           text sentenced
#1      1 went to prison         1
#2      2   went to jail         1
#3      3       released         0
Explanation: "(jail|prison)" matches "jail" or "prison", and the unary operator + turns the output of grepl into an integer.

How to fix linear model fitting error in S-plus

I am trying to fit values in my algorithm so that I can predict next month's number. I am getting a "No data for variable" error even though I have clearly defined the objects that I am putting into the equation.
I've tried placing them in vectors so that one vector can be used as a training data set to predict the new values. The current script has worked for me on a different dataset, but for some reason it isn't working here.
The data is small so I was wondering if that has anything to do with it. The data is:
Month  io  obs  Units Sold
12     in  1    114
1      in  2    29
2      in  3    105
3      in  4    30
4      in  5
I'm trying to predict Units Sold with the code below:
matt<-TEST1
isdf<-matt[matt$month<=3,]
isdf<-na.omit(isdf)
osdf<-matt[matt$Units.Sold==4,]
lmfit<-lm(Units.Sold~obs+Month,data=isdf,na.action=na.omit)
predict(lmFit,osdf[1,1])
I am expecting to be able to place lmfit in predict and get an output.

Using R to change a row into columns for x-number of subsequent rows

I have a data set that I would like to manipulate, but I am having difficulty getting it into a user-friendly format. I have the following:
Person 1
Class Grade
Math A
Science C
English A
Person 2
Class Grade
Math D
English A
Person 3
...
I would like to change it to the following format:
Name Class Grade
Person 1 Math A
Person 1 Science C
Person 1 English A
Person 2 Math D
Person 2 English A
Person 3
The issue I am having is handling a different number of subsequent rows for each person, and also taking a single row and turning it into a column for some of the subsequent rows.

How to load CSV as factor in R

I have a file called metadata.csv that I want to load into R and convert to a factor.
I begin with:
metadata <- read.csv(file="metadata.csv", header=T, stringsAsFactors=T)
And this loads the CSV just fine. I've printed out metadata here:
> metadata
                   Filename  Genre   Date Gender
1           Austen_Emma.txt Social  Early Female
2           Bronte_Eyre.txt Social Middle Female
3  Dickens_Expectations.txt Social   Late   Male
4            Eliot_Mill.txt Social   Late Female
5            Lewis_Monk.txt Gothic  Early   Male
6     Radcliffe_Italian.txt Gothic  Early Female
7  Shelley_Frankenstein.txt Gothic Middle Female
8        Stoker_Dracula.txt Gothic   Late   Male
9      Thackeray_Vanity.txt Social Middle   Male
10       Trollope_Vicar.txt Social Middle   Male
Now I want to convert it to a factor:
as.factor(metadata)
This gives me the following error:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
metadata is a data frame, which is a special type of list made up of vectors of equal length. You can only use as.factor() on vectors. Therefore you must call as.factor() on each vector in the data frame. This can be done using the lapply function:
metadata <- data.frame(lapply(metadata, factor))
This will convert each column to a factor (check this by class(metadata[, 1])). The overall structure of metadata will still be a dataframe.
read.csv puts data into a data.frame.
You cannot convert a data.frame into a factor. That's very basic R stuff.
It's like you're trying to change a bunch of .doc files into PDFs by converting your computer into a PDF. It just doesn't make sense.
The error is asking "Have you called sort on a list?" Yes, you have. as.factor calls sort, and your data.frame is a list.
