I've got the following data:
data
     Tenor Coupon   Price  Last 1 Month 1 Year     Time
1  3 Month 0.0000  0.0150 0.02%      +1     -4 06:45:02
2  6 Month 0.0000  0.0550 0.06%      +2     -3 06:22:02
3 12 Month 0.0000  0.0950 0.10%      +2     -1 06:50:35
4   2 Year 0.3750  99-22¾ 0.52%     +10    +20 06:37:41
5   5 Year 1.5000  99-14½ 1.62%      +9    +17 06:37:58
6  10 Year 2.3750  100-12 2.33%      +6    -44 06:40:21
7  30 Year 3.1250 101-10½ 3.06%      +5    -80 06:35:23
I downloaded it from this website with the help of this topic.
Now I want to plot it so that it looks similar to the chart on the website above. I've used:
my_x <- c(3,6,12,24,12*5,10*12,30*12)
plot(my_x,data$Coupon, type = "l")
But it doesn't look nice because of the values on the x axis, and I don't know how to convert the first column into a suitable format. I've tried ggplot2, but that failed as well.
It may be helpful as well:
str(data)
'data.frame': 7 obs. of 7 variables:
$ Tenor : Factor w/ 7 levels "10 Year","12 Month",..: 5 7 2 3 6 1 4
$ Coupon : Factor w/ 5 levels "0.0000","0.3750",..: 1 1 1 2 3 4 5
$ Price : Factor w/ 7 levels "0.0150","0.0550",..: 1 2 3 7 6 4 5
$ Last : Factor w/ 7 levels "0.02%","0.06%",..: 1 2 3 4 5 6 7
$ 1 Month: Factor w/ 6 levels "+1","+10","+2",..: 1 3 3 2 6 5 4
$ 1 Year : Factor w/ 7 levels "-1","+17","+20",..: 5 4 1 3 2 6 7
$ Time : Factor w/ 7 levels "06:22:02","06:35:23",..: 6 1 7 3 4 5 2
str(data) shows that your columns are factors, but they need to be numeric for plotting. Convert through character first, because calling as.numeric() directly on a factor returns the internal level codes rather than the values:
data$Coupon <- as.numeric(as.character(data$Coupon))
Afterwards, the plot should work.
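For the x-axis part of the question, here is a minimal sketch rather than a definitive solution. It rebuilds only the Tenor and Coupon columns as a toy data frame, derives the number of months from the Tenor text instead of typing my_x by hand, and labels the ticks with the tenor names. The log x axis is my own assumption to keep the short tenors from being squashed together; drop log = "x" if you prefer a linear axis.

```r
# Toy copy of the question's data (only the two columns used here).
data <- data.frame(
  Tenor  = factor(c("3 Month", "6 Month", "12 Month", "2 Year",
                    "5 Year", "10 Year", "30 Year")),
  Coupon = factor(c("0.0000", "0.0000", "0.0000", "0.3750",
                    "1.5000", "2.3750", "3.1250"))
)

# Go through character first: as.numeric() on a factor gives level codes.
data$Coupon <- as.numeric(as.character(data$Coupon))

# Turn "3 Month" / "2 Year" into a number of months.
parts  <- strsplit(as.character(data$Tenor), " ")
n      <- as.numeric(sapply(parts, `[`, 1))
unit   <- sapply(parts, `[`, 2)
months <- ifelse(unit == "Year", n * 12, n)

# Suppress the default x axis, then draw one tick per tenor with its name.
plot(months, data$Coupon, type = "l", log = "x", xaxt = "n",
     xlab = "Tenor", ylab = "Coupon")
axis(1, at = months, labels = as.character(data$Tenor))
```

R may skip some tick labels if they would overlap; las = 2 in the axis() call rotates them if that happens.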
Related
I have the data frame shown below. I want to run a chi-square test between the 'placement' and 'books_quantile' variables within each zip code. I've tried a few approaches without success. Can somebody help?
Thank you!
str(zip2)
tibble [10,748 x 3] (S3: tbl_df/tbl/data.frame)
$ placement : Factor w/ 5 levels "3 or More Grade Levels Below",..: 5 3 3 3 5 2 2 5 3 5 ...
$ books_quantile: Factor w/ 4 levels "Q1 (>=56 books)",..: 2 2 2 2 3 3 3 2 1 2 ...
$ zip : Factor w/ 24 levels "38016","38018",..: 11 21 9 8 22 12 15 15 13 12 ...
You can build a three-way contingency table with xtabs and apply chisq.test to each zip slice:
apply(xtabs(~placement + books_quantile + zip, zip2), 3, chisq.test)
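To see what that call returns, here is a small sketch on synthetic data (the level names and the three zip codes are made up for illustration). It runs the same xtabs/apply combination and then pulls the p-values into a named vector.

```r
set.seed(1)  # synthetic data, for illustration only
zip2 <- data.frame(
  placement      = factor(sample(c("Below", "At", "Above"), 600, replace = TRUE)),
  books_quantile = factor(sample(c("Q1", "Q2"), 600, replace = TRUE)),
  zip            = factor(sample(c("38016", "38018", "38028"), 600, replace = TRUE))
)

# Three-way contingency table; the third dimension is zip.
tab <- xtabs(~ placement + books_quantile + zip, data = zip2)

# One chi-square test per zip slice (MARGIN = 3); the result is a list
# of test objects named by zip code.
tests <- apply(tab, 3, chisq.test)

# Collect the p-values into a named numeric vector.
p_values <- sapply(tests, function(res) res$p.value)
p_values
```

With real data, zip codes that have very few observations can trigger a warning that the chi-squared approximation may be incorrect, so it is worth checking the cell counts per zip first.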
I am trying to use a random forest to predict price with the data frame below:
'data.frame': 10682 obs. of 9 variables:
$ Airline : Factor w/ 12 levels "Air Asia","Air India",..: 4 2 5 4 4 9 5 5 5 7 ...
$ Source : Factor w/ 5 levels "Banglore","Chennai",..: 1 4 3 4 1 4 1 1 1 3 ...
$ Destination : Factor w/ 6 levels "Banglore","Cochin",..: 6 1 2 1 6 1 6 6 6 2 ...
$ Route : Factor w/ 132 levels "BLR → AMD → DEL",..: 19 88 123 96 30 68 6 6 6 109 ...
$ Additional_Info: Factor w/ 10 levels "1 Long layover",..: 8 8 8 8 8 8 6 8 6 8 ...
$ Duration_Num : num 1.04 2 2.94 1.69 1.56 ...
$ Total_Stops_Num: num 0 2 2 1 1 0 1 1 1 1 ...
$ Departure_Num : POSIXct, format: "2019-03-24 22:20:00" "2019-05-01 05:50:00" ...
$ Price : num 8.27 8.94 9.54 8.74 9.5 ...
Initially I tried multiple linear regression, so I log-transformed the dependent variable (Price).
All the non-numeric variables were character before, so I converted them to factor and date-time.
The variable Route has 132 levels. I tried one-hot encoding, but the results were not as good.
How should I preprocess this variable with 100+ levels, given that random forest fails every time?
refine_original %>%
+ mutate(company=replace(company, grepl("ps",company), "phillips")) %>%
+ as.data.frame()
Error in replace(company, grepl("ps", company), "phillips") :
object 'company' not found
I do not know why it gives the error "object not found".
> str(refine_original)
'data.frame': 25 obs. of 6 variables:
$ company : Factor w/ 19 levels "ak zo","akz0",..: 10 8 7 13 11 9 3 4 5 2 ...
$ Product.code...number: Factor w/ 23 levels "p-23","p-34",..: 4 3 19 20 17 1 13 11 22 2 ...
$ address : Factor w/ 25 levels "Delfzijlstraat 54",..: 9 10 11 12 13 14 19 20 21 22 ...
$ city : Factor w/ 1 level "arnhem": 1 1 1 1 1 1 1 1 1 1 ...
$ country : Factor w/ 1 level "the netherlands": 1 1 1 1 1 1 1 1 1 1 ...
$ name : Factor w/ 20 levels "dhr j. Gansen",..: 7 6 1 9 4 5 2 10 3 8 ...
Please help
Your code has extra + signs in it. They are most likely console continuation prompts that were copied along with the code. Remove them and the error should go away:
refine_original %>%
mutate(company=replace(company, grepl("ps",company), "phillips")) %>%
as.data.frame()
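One further thing to watch, sketched here in base R on a made-up company column: str() shows company is a factor, and assigning a string that is not already one of its levels into a factor produces NA with a warning. I can't tell from the truncated str() output whether "phillips" is an existing level, so converting to character before replacing is the safer route.

```r
# Toy stand-in for the question's column (values are made up).
refine_original <- data.frame(
  company = factor(c("philips", "phillps", "akzo", "fillips"))
)

# Work on character values so that the new spelling does not need to be
# an existing factor level.
company <- as.character(refine_original$company)
company[grepl("ps", company)] <- "phillips"

# Convert back to a factor once the cleaning is done.
refine_original$company <- factor(company)
refine_original$company
```

The same idea works inside the pipeline by wrapping the column in as.character() before calling replace().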
I've used aregImpute to impute the missing values, and then I used the impute.transcan function to try to get a complete dataset with the following code.
impute_arg <- aregImpute(~ age + job + marital + education + default +
balance + housing + loan + contact + day + month + duration + campaign +
pdays + previous + poutcome + y , data = mov.miss, n.impute = 10 , nk =0)
imputed <- impute.transcan(impute_arg, imputation=1, data=mov.miss, list.out=TRUE, pr=FALSE, check=FALSE)
y <- completed[names(imputed)]
When I run str(y), I do get a data frame, but it still contains NAs, as if nothing had been imputed. My question is: how do I get a complete dataset without NAs after imputation?
str(y)
'data.frame': 4521 obs. of 17 variables:
$ age : int 30 NA 35 30 NA 35 36 39 41 43 ...
$ job : Factor w/ 12 levels "admin.","blue-collar",..: 11 8 5 5 2 5 7 10 3 8 ...
$ marital : Factor w/ 3 levels "divorced","married",..: 2 2 3 2 2 3 2 2 2 2 ...
$ education: Factor w/ 4 levels "primary","secondary",..: 1 2 3 3 2 3 NA 2 3 1 ...
$ default : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 NA 1 1 1 ...
$ balance : int NA 4789 1350 1476 0 747 307 147 NA -88 ...
$ housing : Factor w/ 2 levels "no","yes": NA 2 2 2 NA 1 2 2 2 2 ...
$ loan : Factor w/ 2 levels "no","yes": 1 2 1 2 NA 1 1 NA 1 2 ...
$ contact : Factor w/ 3 levels "cellular","telephone",..: 1 1 1 3 3 1 1 1 NA 1 ...
$ day : int 19 NA 16 3 5 23 14 6 14 NA ...
$ month : Factor w/ 12 levels "apr","aug","dec",..: 11 9 1 7 9 4 NA 9 9 1 ...
$ duration : int 79 220 185 199 226 141 341 151 57 313 ...
$ campaign : int 1 1 1 4 1 2 1 2 2 NA ...
$ pdays : int -1 339 330 NA -1 176 330 -1 -1 NA ...
$ previous : int 0 4 NA 0 NA 3 2 0 0 2 ...
$ poutcome : Factor w/ 4 levels "failure","other",..: 4 1 1 4 4 1 2 4 4 1 ...
$ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
I have tested your code myself, and it works just fine, except for the last line:
y <- completed[names(imputed)]
I believe there's a typo in the line above. You also do not need completed at all.
Besides, if you want to get a data.frame from the impute.transcan function, then wrap it with as.data.frame:
imputed <- as.data.frame(impute.transcan(impute_arg, imputation=1, data=mov.miss, list.out=TRUE, pr=FALSE, check=FALSE))
Moreover, if you need to test your missing data pattern, you can also use the md.pattern function provided by the mice package.
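If you only want a quick check that no NAs remain, a base-R alternative to md.pattern is sketched below. The toy data frame is made up; run the same two calls on your imputed data frame instead.

```r
# Toy frame with a few missing values (made-up numbers).
df <- data.frame(
  age     = c(30, NA, 35, 30),
  balance = c(NA, 4789, 1350, 1476),
  loan    = factor(c("no", "yes", NA, "yes"))
)

# Number of missing values per column.
na_per_column <- colSums(is.na(df))
na_per_column

# TRUE only if no NA is left anywhere in the data frame.
is_complete <- !anyNA(df)
is_complete
```

After a successful imputation, every count should be zero and is_complete should be TRUE.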
I'm trying to find class probabilities of new input vectors with support vector machines in R.
Training the model shows no errors.
fit <-svm(device~.,data=dataframetrain,
kernel="polynomial",probability=TRUE)
But predicting some input vector shows some errors.
predict(fit,dataframetest,probability=prob)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
dataframetrain looks like:
> str(dataframetrain)
'data.frame': 24577 obs. of 5 variables:
$ device : Factor w/ 3 levels "mob","pc","tab": 1 1 1 1 1 1 1 1 1 1 ...
$ geslacht : Factor w/ 2 levels "M","V": 1 1 1 1 1 1 1 1 1 1 ...
$ leeftijd : num 77 67 67 66 64 64 63 61 61 58 ...
$ invultijd: num 12 12 12 12 12 12 12 12 12 12 ...
$ type : Factor w/ 8 levels "A","B","C","D",..: 5 5 5 5 5 5 5 5 5 5 ...
and dataframetest looks like:
> str(dataframetest)
'data.frame': 8 obs. of 4 variables:
$ geslacht : Factor w/ 1 level "M": 1 1 1 1 1 1 1 1
$ leeftijd : num 20 60 30 25 36 52 145 25
$ invultijd: num 6 12 2 5 6 8 69 7
$ type : Factor w/ 8 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8
I trained the model with 2 levels for 'geslacht', but sometimes I have to predict on data that contains only 1 level of 'geslacht'.
Is it possible to predict class probabilities for a test set with only 1 level of 'geslacht'?
I hope someone can help me!
Add the missing level (but no data) to geslacht in the test set, so that the factor has the same levels as in the training data.
x <- factor(c("A", "A"), levels = c("A", "B"))
x
[1] A A
Levels: A B
or
x <- factor(c("A", "A"))
levels(x) <- c("A", "B")
x
[1] A A
Levels: A B
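Applied to the data frames from the question, this could look like the sketch below. The toy rows are made up; only the names dataframetrain, dataframetest and geslacht come from the question. Taking the levels from the training data avoids typing them by hand.

```r
# Toy stand-ins for the question's data frames (rows are made up).
dataframetrain <- data.frame(geslacht = factor(c("M", "V", "M", "V")))
dataframetest  <- data.frame(geslacht = factor(c("M", "M", "M")))

levels(dataframetest$geslacht)   # only one level before the fix

# Re-create the factor with the training levels; the values are unchanged.
dataframetest$geslacht <- factor(dataframetest$geslacht,
                                 levels = levels(dataframetrain$geslacht))

levels(dataframetest$geslacht)   # now matches the training data
table(dataframetest$geslacht)    # the added level has a count of zero
```

The same call can be repeated for any other factor column whose levels differ between training and test data.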