Plotting with different x axis - r

I've got the following data:
data
Tenor Coupon Price Last 1 Month 1 Year Time
1 3 Month 0.0000 0.0150 0.02% +1 -4 06:45:02
2 6 Month 0.0000 0.0550 0.06% +2 -3 06:22:02
3 12 Month 0.0000 0.0950 0.10% +2 -1 06:50:35
4 2 Year 0.3750 99-22¾ 0.52% +10 +20 06:37:41
5 5 Year 1.5000 99-14½ 1.62% +9 +17 06:37:58
6 10 Year 2.3750 100-12 2.33% +6 -44 06:40:21
7 30 Year 3.1250 101-10½ 3.06% +5 -80 06:35:23
I downloaded it from this website with the help of this topic.
Now I want to plot it so that it looks similar to the chart on the website above. I've used:
my_x <- c(3,6,12,24,12*5,10*12,30*12)
plot(my_x,data$Coupon, type = "l")
But it doesn't look nice because of the values on the x axis, and I don't know how to convert the first column into a suitable format. I tried ggplot2 as well, but that failed too.
This may be helpful as well:
str(data)
'data.frame': 7 obs. of 7 variables:
$ Tenor : Factor w/ 7 levels "10 Year","12 Month",..: 5 7 2 3 6 1 4
$ Coupon : Factor w/ 5 levels "0.0000","0.3750",..: 1 1 1 2 3 4 5
$ Price : Factor w/ 7 levels "0.0150","0.0550",..: 1 2 3 7 6 4 5
$ Last : Factor w/ 7 levels "0.02%","0.06%",..: 1 2 3 4 5 6 7
$ 1 Month: Factor w/ 6 levels "+1","+10","+2",..: 1 3 3 2 6 5 4
$ 1 Year : Factor w/ 7 levels "-1","+17","+20",..: 5 4 1 3 2 6 7
$ Time : Factor w/ 7 levels "06:22:02","06:35:23",..: 6 1 7 3 4 5 2

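str(data) shows that your columns are factors, but they need to be numeric for plotting. Convert them from factor to numeric by going through character first (calling as.numeric directly on a factor returns the internal level codes, not the values):
data$Coupon <- as.numeric(as.character(data$Coupon))
Afterwards, the plot should work.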
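For the x axis itself, one possible approach is to plot against equally spaced positions and label the ticks with the Tenor values. The sketch below rebuilds only the Tenor and Coupon columns from the question by hand; the equal spacing and the helper name pos are assumptions of this sketch, not something the original answer prescribes.

```r
# Hand-built copy of two columns from the question's data frame (factors, as in str(data))
data <- data.frame(
  Tenor  = factor(c("3 Month", "6 Month", "12 Month", "2 Year",
                    "5 Year", "10 Year", "30 Year")),
  Coupon = factor(c("0.0000", "0.0000", "0.0000", "0.3750",
                    "1.5000", "2.3750", "3.1250"))
)

# factor -> character -> numeric, so the actual values are kept
data$Coupon <- as.numeric(as.character(data$Coupon))

# Assumption: equally spaced tick positions, one per row
pos <- seq_len(nrow(data))

# Suppress the default x axis, then draw one labelled with the tenor names
plot(pos, data$Coupon, type = "l", xaxt = "n", xlab = "Tenor", ylab = "Coupon")
axis(1, at = pos, labels = as.character(data$Tenor))
```

If the spacing should reflect the real maturities instead, the my_x vector from the question could be passed as the at argument, at the cost of crowded labels near the short end.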

Related

apply chi-square test in R for each level of a categorical variable

I have a data frame as shown below. I want to run a chi-square test between the 'placement' and 'books_quantile' variables at each level of zip code. I've tried a few approaches without success. Can somebody help?
Thank you!
str(zip2)
tibble [10,748 x 3] (S3: tbl_df/tbl/data.frame)
$ placement : Factor w/ 5 levels "3 or More Grade Levels Below",..: 5 3 3 3 5 2 2 5 3 5 ...
$ books_quantile: Factor w/ 4 levels "Q1 (>=56 books)",..: 2 2 2 2 3 3 3 2 1 2 ...
$ zip : Factor w/ 24 levels "38016","38018",..: 11 21 9 8 22 12 15 15 13 12 ...
You can build a three-way contingency table and apply the test to each zip slice:
apply(xtabs(~placement + books_quantile + zip, zip2), 3, chisq.test)
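As a rough illustration of what that call returns, here is a minimal sketch on synthetic data. The level names and sample size are made up for the example; only the column names come from the question. Each element of the result is an ordinary chisq.test result, so the p-values can be pulled out per zip code.

```r
set.seed(1)  # synthetic data, only so the sketch is reproducible

# Made-up stand-in for zip2 with the same three factor columns
zip2 <- data.frame(
  placement      = factor(sample(c("Below", "At", "Above"), 300, replace = TRUE)),
  books_quantile = factor(sample(c("Q1", "Q2"), 300, replace = TRUE)),
  zip            = factor(sample(c("38016", "38018"), 300, replace = TRUE))
)

# Three-way contingency table: placement x books_quantile x zip
tab <- xtabs(~ placement + books_quantile + zip, data = zip2)

# Run chisq.test on each placement x books_quantile slice (one slice per zip)
tests <- apply(tab, 3, chisq.test)

# Collect the p-value for every zip code into a named vector
p_values <- sapply(tests, function(t) t$p.value)
```

With real data, zip codes that have very few observations may trigger warnings about the chi-square approximation, so the per-zip counts are worth checking first.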

Categorical variable with 132 levels in a prediction problem

I am trying to use a random forest to predict price with the data frame below:
'data.frame': 10682 obs. of 9 variables:
Airline : Factor w/ 12 levels "Air Asia","Air India",..: 4 2 5 4 4 9 5 5 5 7 ...
Source : Factor w/ 5 levels "Banglore","Chennai",..: 1 4 3 4 1 4 1 1 1 3 ...
Destination : Factor w/ 6 levels "Banglore","Cochin",..: 6 1 2 1 6 1 6 6 6 2 ...
Route : Factor w/ 132 levels "BLR → AMD → DEL",..: 19 88 123 96 30 68 6 6 6 109 ...
Additional_Info: Factor w/ 10 levels "1 Long layover",..: 8 8 8 8 8 8 6 8 6 8 ...
Duration_Num : num 1.04 2 2.94 1.69 1.56 ...
Total_Stops_Num: num 0 2 2 1 1 0 1 1 1 1 ...
Departure_Num : POSIXct, format: "2019-03-24 22:20:00" "2019-05-01 05:50:00" ...
Price : num 8.27 8.94 9.54 8.74 9.5 ...
Initially I tried multiple linear regression, so I log-transformed the dependent variable (Price).
All the non-numeric variables were character before, so I converted them to factor and date-time.
The variable Route has 132 levels. I tried one-hot encoding, but the results were not as good.
How should I preprocess this variable with 100+ levels, given that the random forest fails every time?

change value of variable in r using dplyr

refine_original %>%
+ mutate(company=replace(company, grepl("ps",company), "phillips")) %>%
+ as.data.frame()
Error in replace(company, grepl("ps", company), "phillips") :
object 'company' not found
I do not know why it gives the error "object not found".
> str(refine_original)
'data.frame': 25 obs. of 6 variables:
$ company : Factor w/ 19 levels "ak zo","akz0",..: 10 8 7 13 11 9 3 4 5 2 ...
$ Product.code...number: Factor w/ 23 levels "p-23","p-34",..: 4 3 19 20 17 1 13 11 22 2 ...
$ address : Factor w/ 25 levels "Delfzijlstraat 54",..: 9 10 11 12 13 14 19 20 21 22 ...
$ city : Factor w/ 1 level "arnhem": 1 1 1 1 1 1 1 1 1 1 ...
$ country : Factor w/ 1 level "the netherlands": 1 1 1 1 1 1 1 1 1 1 ...
$ name : Factor w/ 20 levels "dhr j. Gansen",..: 7 6 1 9 4 5 2 10 3 8 ...
Please help
Your code has extra + signs in it (these are console continuation prompts, not part of the code). Remove them and the error should go away:
refine_original %>%
mutate(company=replace(company, grepl("ps",company), "phillips")) %>%
as.data.frame()
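One further point that may matter here: company is a factor, and assigning a value that is not already one of its levels produces NA with a warning. A minimal base R sketch that sidesteps this by going through character is shown below; the four company spellings are invented for illustration and are not taken from the question's data.

```r
# Made-up stand-in for refine_original with a factor column, as in str() above
refine_original <- data.frame(
  company = factor(c("phillips", "phlips", "akzo", "philips"))
)

# Work on a character copy so new values are not limited to existing levels
company <- as.character(refine_original$company)

# Replace every entry containing "ps" with the canonical spelling
company[grepl("ps", company)] <- "phillips"

# Convert back to a factor; unused misspelled levels are dropped
refine_original$company <- factor(company)
```

The same as.character() conversion could be placed inside the mutate() call if the dplyr pipeline is preferred.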

Extracting complete dataframe from Hmisc package in R

I used aregImpute to impute the missing values, then used the impute.transcan function to try to get the complete dataset, with the following code.
impute_arg <- aregImpute(~ age + job + marital + education + default +
balance + housing + loan + contact + day + month + duration + campaign +
pdays + previous + poutcome + y , data = mov.miss, n.impute = 10 , nk =0)
imputed <- impute.transcan(impute_arg, imputation=1, data=mov.miss, list.out=TRUE, pr=FALSE, check=FALSE)
y <- completed[names(imputed)]
When I run str(y) it gives me a data frame, but it still contains NAs as if nothing had been imputed. My question is: how do I get the complete dataset without NAs after imputation?
str(y)
'data.frame': 4521 obs. of 17 variables:
$ age : int 30 NA 35 30 NA 35 36 39 41 43 ...
$ job : Factor w/ 12 levels "admin.","blue-collar",..: 11 8 5 5 2 5 7 10 3 8 ...
$ marital : Factor w/ 3 levels "divorced","married",..: 2 2 3 2 2 3 2 2 2 2 ...
$ education: Factor w/ 4 levels "primary","secondary",..: 1 2 3 3 2 3 NA 2 3 1 ...
$ default : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 NA 1 1 1 ...
$ balance : int NA 4789 1350 1476 0 747 307 147 NA -88 ...
$ housing : Factor w/ 2 levels "no","yes": NA 2 2 2 NA 1 2 2 2 2 ...
$ loan : Factor w/ 2 levels "no","yes": 1 2 1 2 NA 1 1 NA 1 2 ...
$ contact : Factor w/ 3 levels "cellular","telephone",..: 1 1 1 3 3 1 1 1 NA 1 ...
$ day : int 19 NA 16 3 5 23 14 6 14 NA ...
$ month : Factor w/ 12 levels "apr","aug","dec",..: 11 9 1 7 9 4 NA 9 9 1 ...
$ duration : int 79 220 185 199 226 141 341 151 57 313 ...
$ campaign : int 1 1 1 4 1 2 1 2 2 NA ...
$ pdays : int -1 339 330 NA -1 176 330 -1 -1 NA ...
$ previous : int 0 4 NA 0 NA 3 2 0 0 2 ...
$ poutcome : Factor w/ 4 levels "failure","other",..: 4 1 1 4 4 1 2 4 4 1 ...
$ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
I have tested your code myself, and it works just fine, except for the last line:
y <- completed[names(imputed)]
I believe there's a typo in the above line. Besides, you do not need completed at all.
If you want to get a data.frame from the impute.transcan function, wrap the call in as.data.frame:
imputed <- as.data.frame(impute.transcan(impute_arg, imputation=1, data=mov.miss, list.out=TRUE, pr=FALSE, check=FALSE))
Moreover, if you need to inspect your missing data pattern, you can also use the md.pattern function provided by the mice package.

must a dataset contain all factors in SVM in R

I'm trying to find class probabilities of new input vectors with support vector machines in R.
Training the model shows no errors.
fit <-svm(device~.,data=dataframetrain,
kernel="polynomial",probability=TRUE)
But predicting on a new input throws an error:
predict(fit,dataframetest,probability=prob)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
dataframetrain looks like:
> str(dataframetrain)
'data.frame': 24577 obs. of 5 variables:
$ device : Factor w/ 3 levels "mob","pc","tab": 1 1 1 1 1 1 1 1 1 1 ...
$ geslacht : Factor w/ 2 levels "M","V": 1 1 1 1 1 1 1 1 1 1 ...
$ leeftijd : num 77 67 67 66 64 64 63 61 61 58 ...
$ invultijd: num 12 12 12 12 12 12 12 12 12 12 ...
$ type : Factor w/ 8 levels "A","B","C","D",..: 5 5 5 5 5 5 5 5 5 5 ...
and dataframetest looks like:
> str(dataframetest)
'data.frame': 8 obs. of 4 variables:
$ geslacht : Factor w/ 1 level "M": 1 1 1 1 1 1 1 1
$ leeftijd : num 20 60 30 25 36 52 145 25
$ invultijd: num 6 12 2 5 6 8 69 7
$ type : Factor w/ 8 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8
I trained the model with 2 levels for 'geslacht', but sometimes I have to predict on data that contains only 1 level of 'geslacht'.
Is it possible to predict the class probabilities for a test set with only 1 level of 'geslacht'?
I hope someone can help me!!
Add another level (but not data) to geslacht.
x <- factor(c("A", "A"), levels = c("A", "B"))
x
[1] A A
Levels: A B
or
x <- factor(c("A", "A"))
levels(x) <- c("A", "B")
x
[1] A A
Levels: A B
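Applied to the question's setup, the same idea might look like the sketch below, which reuses the training levels so the two factors always agree. The small vectors are stand-ins for dataframetrain$geslacht and dataframetest$geslacht, not the real data.

```r
# Stand-ins for the training and test columns from the question
train_geslacht <- factor(c("M", "V", "M"))
test_geslacht  <- factor(c("M", "M"))   # only one level present

# Re-create the test factor with the training levels; the values stay the same
test_geslacht <- factor(test_geslacht, levels = levels(train_geslacht))

# The test factor now has two levels even though only "M" occurs
nlevels(test_geslacht)
```

Taking the levels from the training data rather than typing them by hand also guards against the level order differing between the two data frames.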
