I'm trying to run fviz_famd_ind() in R and keep getting an error. It works on the wine dataset provided with the package, but not on my cleaned version of IBM's Telco.Customer.Churn dataset.
I've created the FAMD object, dfcfamd1, from the cleaned data set. I've verified that there are no duplicate row or column names using any(duplicated(rownames())), which returns FALSE for both Telco.Customer.Churn and dfcfamd1.
fviz_famd_ind(dfcfamd1)
> Error in `.rowNamesDF<-`(x, value = value) :
> duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique values when setting 'row.names': ‘No’, ‘Yes’
Sample data:
head(Telco.Customer.Churn)
customerID gender SeniorCitizen Partner Dependents tenure
1 7590-VHVEG Female 0 Yes No 1
2 5575-GNVDE Male 0 No No 34
3 3668-QPYBK Male 0 No No 2
PhoneService MultipleLines InternetService OnlineSecurity
1 No No DSL No
2 Yes No DSL Yes
3 Yes Yes Fiber optic No
OnlineBackup DeviceProtection TechSupport StreamingTV
1 Yes No No No
2 No No No No
3 No Yes No Yes
StreamingMovies Contract PaperlessBilling PaymentMethod
1 No Month-to-month Yes Electronic check
2 No One year No Mailed check
3 No Month-to-month Yes Mailed check
MonthlyCharges TotalCharges Churn
1 29.85 29.85 No
2 56.95 1889.50 No
3 53.85 108.15 Yes
The output should be a plot, which I get for the package data but not for my data.
When I try to make the row names unique, I get a coercion error:
rownames(dfcfamd1) = make.names(names, unique=TRUE)
> Error in as.character(names) :
> cannot coerce type 'builtin' to vector of type 'character'
The issue is that names on its own is the base function names(), not a vector of names, so make.names() cannot coerce it to character. Instead of
rownames(dfcfamd1) = make.names(names, unique=TRUE)
use
row.names(dfcfamd1) = make.names(row.names(dfcfamd1), unique=TRUE)
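A small base R sketch of both cases: passing the bare function names to make.names() reproduces the coercion error, while passing an actual character vector de-duplicates it. The toy vector c("No", "Yes", "No") is hypothetical and only stands in for real row names.

```r
# `names` by itself is the base function names(), not a character vector
err <- tryCatch(make.names(names, unique = TRUE),
                error = function(e) conditionMessage(e))
err

# Passing real names works; repeated values get a numeric suffix
fixed <- make.names(c("No", "Yes", "No"), unique = TRUE)
fixed  # "No" "Yes" "No.1"
```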
Try:
fviz_pca_ind(dfcfamd1)
PS: I had the same problem! It can be solved by simply using fviz_pca_ind() instead of fviz_famd_ind(), as the two functions take objects with a similar structure.
It seems that fviz_famd_ind() cannot handle the same value (e.g. "Yes" or "No") appearing in more than one categorical column.
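The warning in the question lists ‘No’ and ‘Yes’, which suggests that category levels end up as row names of some internal table; that internal detail is an assumption, not something confirmed here. This base R sketch only reproduces the same warning and error by giving a toy data frame row names taken from two hypothetical yes/no columns.

```r
# Levels of two hypothetical yes/no columns, e.g. Partner and Churn
partner_levels <- levels(factor(c("Yes", "No")))
churn_levels   <- levels(factor(c("No", "Yes")))
category_names <- c(partner_levels, churn_levels)  # "No" "Yes" "No" "Yes"

# Toy coordinate table with one row per category level
coords <- data.frame(Dim.1 = c(0.1, -0.2, 0.3, -0.1))

# Row names must be unique, so repeating "No"/"Yes" fails
msg <- tryCatch(
  suppressWarnings(rownames(coords) <- category_names),
  error = function(e) conditionMessage(e)
)
msg
```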
One way to solve this is to rename the values to be unique across columns:
library(dplyr)       # recode_factor()
library(FactoMineR)  # FAMD()
library(factoextra)  # fviz_famd_ind()
# Define factors
cols <- c("Partner", "Dependents", "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity", "OnlineBackup", "DeviceProtection",
"TechSupport", "StreamingTV", "StreamingMovies", "PaperlessBilling", "Churn")
dfcfamd1[cols] <- lapply(dfcfamd1[cols], factor)
rm(cols)
# Rename the factor levels
# Do this for every column until only unique values remain.
dfcfamd1$Partner <- recode_factor(dfcfamd1$Partner, "Yes" = "yesPartner", "No" = "noPartner")
#[...]
dfcfamd1$Churn <- recode_factor(dfcfamd1$Churn, "Yes" = "yesChurn", "No" = "noChurn")
# Re-run FAMD on the recoded data, then plot the individuals
res.famd <- FAMD(dfcfamd1, graph = FALSE)
fviz_famd_ind(res.famd)
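Instead of recoding each column by hand, one alternative (a sketch, not the answer's own method) is to prefix every value with its column name so no level can repeat across columns. The toy data frame df and its columns are hypothetical stand-ins for the cleaned Telco data.

```r
# Hypothetical toy data standing in for the cleaned Telco columns
df <- data.frame(
  Partner = c("Yes", "No", "No"),
  Churn   = c("No", "No", "Yes"),
  tenure  = c(1, 34, 2)
)
cols <- c("Partner", "Churn")

# Prefix each value with its column name, e.g. "Yes" -> "Partner_Yes"
df[cols] <- lapply(cols, function(col) factor(paste(col, df[[col]], sep = "_")))

levels(df$Partner)  # "Partner_No" "Partner_Yes"
levels(df$Churn)    # "Churn_No" "Churn_Yes"
```

Numeric columns such as tenure are left untouched, so the recoded data frame can be passed to FAMD() as before.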
I have a location variable with 197 levels. I want to simplify it by creating a new variable, INSIDE, which stores 1 when the location is a building, home, etc. and 0 when the location is outside. I have tried grepl(), but it gives a warning:
data$Inside<-ifelse(grepl(data$Premise.Description,pattern = c("BUILDING","ROOM","AUTO","BALCONY","BANK","BAR","STORE","CHURCH","COLLEGE","CONDOMINIUM","CENTER","DAY CARE","SCHOOL","HOSPITAL","LIBRARY","PARLOR","OFFICE","MOSQUE","CLUB","PORCH","MALL","WAREHOUSE")),1,0)
Warning message:
In grepl(crime_3yr$Premise.Description, pattern = c("BUILDING", :
argument 'pattern' has length > 1 and only the first element will be used
I have also tried using lapply(), but that did not work either.
I want the output to be like this:
BUILDING 1
SHOP 1
Street 0
grepl() takes a single regular expression rather than a vector of options, so join the alternatives with | instead:
data$Inside<-ifelse(grepl(data$Premise.Description,pattern = "BUILDING|ROOM|AUTO|BALCONY|BANK|BAR|STORE|CHURCH|COLLEGE|CONDOMINIUM|CENTER|DAY CARE|SCHOOL|HOSPITAL|LIBRARY|PARLOR|OFFICE|MOSQUE|CLUB|PORCH|MALL|WAREHOUSE"),1,0)
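A minimal sketch on toy rows shaped like the desired output in the question. The pattern is shortened, and SHOP is added as an assumption because it is not in the original list; as.integer() on the logical result gives the same 1/0 values as ifelse().

```r
# Hypothetical toy rows mirroring the desired output in the question
data <- data.frame(Premise.Description = c("BUILDING", "SHOP", "Street"))

# Shortened pattern; SHOP is an assumed addition, not from the original list
inside_pattern <- "BUILDING|ROOM|STORE|SHOP"

# TRUE/FALSE converted to 1/0
data$Inside <- as.integer(grepl(inside_pattern, data$Premise.Description))
data$Inside  # 1 1 0
```

grepl() is case-sensitive by default; pass ignore.case = TRUE if the descriptions mix upper and lower case.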
If you want to keep the code similar to what you wrote, you need to look into regular expressions, which is what the pattern argument of grepl() expects:
data$Inside<-ifelse(grepl(data$Premise.Description,pattern = "BUILDING|ROOM|AUTO|BALCONY|BANK|BAR|STORE|CHURCH|COLLEGE|CONDOMINIUM|CENTER|DAY CARE|SCHOOL|HOSPITAL|LIBRARY|PARLOR|OFFICE|MOSQUE|CLUB|PORCH|MALL|WAREHOUSE"),1,0)
Try this code:
Example data.frame:
data<-data.frame(Premise.Description= c("BUILDING 1","MY ROOM","AUTO","BALCONY","OTHER"))
The solution:
toMatch<-c("BUILDING","ROOM","AUTO","BALCONY","BANK","BAR","STORE","CHURCH","COLLEGE","CONDOMINIUM","CENTER","DAY CARE","SCHOOL","HOSPITAL","LIBRARY","PARLOR","OFFICE","MOSQUE","CLUB","PORCH","MALL","WAREHOUSE")
data$Inside<-grepl(paste(toMatch,collapse="|"), data$Premise.Description)
data
Premise.Description Inside
1 BUILDING 1 TRUE
2 MY ROOM TRUE
3 AUTO TRUE
4 BALCONY TRUE
5 OTHER FALSE
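Note that a pattern built with paste(collapse = "|") also matches substrings, so "BAR" matches inside "BARBER SHOP". Whether that is wanted depends on the data; this sketch (with a shortened, hypothetical toMatch list) compares the plain alternation with a word-boundary version.

```r
toMatch <- c("BUILDING", "ROOM", "AUTO", "BAR", "DAY CARE")  # shortened list

loose  <- paste(toMatch, collapse = "|")
# \\b anchors each alternative at word boundaries
strict <- paste0("\\b(", paste(toMatch, collapse = "|"), ")\\b")

places <- c("BAR", "BARBER SHOP", "DAY CARE CENTER", "STREET")
grepl(loose, places)                # TRUE  TRUE  TRUE FALSE
grepl(strict, places, perl = TRUE)  # TRUE FALSE  TRUE FALSE
```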
You might be better off using data.table:
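library(data.table)
setDT(data)
# grepl() needs one regex, so collapse the alternatives with "|"
inside_pattern <- paste(c("BUILDING","ROOM","AUTO","BALCONY","BANK","BAR","STORE","CHURCH","COLLEGE","CONDOMINIUM","CENTER","DAY CARE","SCHOOL","HOSPITAL","LIBRARY","PARLOR","OFFICE","MOSQUE","CLUB","PORCH","MALL","WAREHOUSE"), collapse = "|")
# Start with FALSE, then flip matching rows to TRUE
data[, Inside := FALSE]
data[grepl(inside_pattern, Premise.Description), Inside := TRUE]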
I'm trying to subset a data.frame based on a 1 or 0 value in one of its columns.
Here is some sample data:
> Test
Close High Low Dn.BB MaVg Up.BB Per.BB Dn.Brk
2007-02-27 6286.1 6434.7 6270.5 6305.813 6389.679 6473.544 -0.11752900 1
2007-02-28 6171.5 6286.1 6166.2 6237.635 6377.186 6516.737 -0.23695539 1
2007-03-01 6116.0 6230.7 6038.9 6164.470 6358.129 6551.787 -0.12514308 1
2007-03-02 6116.2 6164.4 6085.6 6110.807 6341.179 6571.550 0.01170495 0
2007-03-05 6058.7 6116.2 5989.6 6047.421 6318.100 6588.779 0.02083561 0
2007-03-06 6138.5 6138.5 6058.7 6018.953 6297.907 6576.861 0.21427696 0
2007-03-07 6156.5 6167.6 6106.1 6001.139 6278.136 6555.133 0.28043853 0
2007-03-08 6227.7 6233.1 6156.5 5997.989 6264.436 6530.882 0.43106389 0
2007-03-09 6245.2 6255.8 6190.3 6003.152 6250.207 6497.262 0.48986661 0
2007-03-12 6233.3 6276.3 6219.3 6007.297 6237.421 6467.546 0.49104464 0
2007-03-13 6161.2 6240.7 6161.2 6000.401 6223.429 6446.457 0.36049188 0
Here, I would like something that iterates along the data.frame and splits out the subsets where Dn.Brk > 0. I can only think of a loop here and am not too familiar with subsetting, so I was wondering if anyone could point me in the right direction or suggest functions or packages that could achieve this.
A little more detail:
Sub <- rep(0,nrow(Test))
for (i in nrow(Test)){
if (Test[i,8] > 0){Sub = Test(i:i+10,1)}
}
So, at every row where Test[i,8] > 0, the code above should select Test$Close from row i to row i+10.
Ideally, I'd like every sample to be stored in a separate row/column in a new df. Is that possible?
You can use sapply here:
sapply(which(Test[, 8] > 0), function(z) Test$Close[z:(z+10)])
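To get each sample into its own column of a new data frame, as the question asks, the sapply() result can be converted directly. This sketch keeps only the Close and Dn.Brk values copied from the sample above, so the column is referenced by name instead of position 8; windows that run past the last row are padded with NA.

```r
# Close and Dn.Brk values copied from the sample data in the question
Test <- data.frame(
  Close  = c(6286.1, 6171.5, 6116.0, 6116.2, 6058.7, 6138.5,
             6156.5, 6227.7, 6245.2, 6233.3, 6161.2),
  Dn.Brk = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)

starts <- which(Test$Dn.Brk > 0)
# One window per start row; indices past nrow(Test) come back as NA
windows <- sapply(starts, function(z) Test$Close[z:(z + 10)])

# Each column of the new data frame is one sample
samples <- as.data.frame(windows)
names(samples) <- paste0("start_", starts)
samples
```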
A few things to note about the loop you provided, though:
- You are not iterating: for (i in nrow(Test)) runs only once, with i equal to nrow(Test); use for (i in seq_len(nrow(Test))) instead.
- You would be overwriting Sub on every iteration.
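A short base R check of the first point, plus one more pitfall in the question's loop: `:` binds more tightly than `+`, so i:i+10 is a single number rather than a range. (Also, Test(i:i+10,1) is a function call in R; indexing needs square brackets.) The counters below are purely illustrative.

```r
n_rows <- 11

# for (i in n_rows) loops over a single value, so the body runs once
runs_scalar <- 0
for (i in n_rows) runs_scalar <- runs_scalar + 1

# seq_len() gives 1, 2, ..., n_rows, so the body runs n_rows times
runs_seq <- 0
for (i in seq_len(n_rows)) runs_seq <- runs_seq + 1

c(runs_scalar, runs_seq)  # 1 11

i <- 3
i:i + 10    # 13, i.e. (i:i) + 10
i:(i + 10)  # 3 4 5 ... 13
```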
If you still want to do it with a for loop, here is one way:
#### results list #####
rows.test <- seq_len(nrow(Test))
results <- list()
for (i in rows.test){
  if (Test[i,8] > 0)
  {
    results[[i]] = Test$Close[i:(i+10)]
  }
  else {results[[i]] = "no value"}
}
This could also be parallelised with the foreach package if your dataset is huge. A good intro is here: http://www.vikparuchuri.com/blog/parallel-r-loops-for-windows-and-linux/. You could also replace "no value" with next if you only want the matching windows in the list; in the sample data the matches are the first three rows, so the list would have three elements.
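A sketch of the next variant, using the Close and Dn.Brk values from the sample. Two choices here are assumptions rather than part of the answer: each element is named after its start row, so non-matching rows leave no gaps, and windows are cut at the last row instead of being padded with NA.

```r
# Close and Dn.Brk values copied from the sample data in the question
Test <- data.frame(
  Close  = c(6286.1, 6171.5, 6116.0, 6116.2, 6058.7, 6138.5,
             6156.5, 6227.7, 6245.2, 6233.3, 6161.2),
  Dn.Brk = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)

results <- list()
for (i in seq_len(nrow(Test))) {
  if (Test$Dn.Brk[i] <= 0) next  # skip rows without a break
  # Name the element after its start row; stop the window at the last row
  results[[as.character(i)]] <- Test$Close[i:min(i + 10, nrow(Test))]
}
names(results)  # "1" "2" "3"
```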
I have a CSV file like this:
score text
1 0 RT #RealJackEdwards: (2 of) a solution. 7 st yrs in playoffs, a Cup, a Final, a Prez Trophy. Yup, Boychuk trade a disaster; Bottom 6 fwds r…
I need to write all tweets with a negative score to a separate file. I am trying to use an if statement:
if(stat$score < 0 )
write.csv(stat$text, file=paste('negtweetscore.csv'), row.names=TRUE)
But when I run this code, I get the following message:
In if (stat$score < 0) write.csv(stat$text, file = paste("negtweetscore.csv"), :
the condition has length > 1 and only the first element will be used
if() expects a single TRUE/FALSE value, so it only looks at the first score. Subset the text column with the condition instead:
write.csv(stat$text[stat$score < 0], file = "negtweetscore.csv", row.names = TRUE)
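A self-contained sketch of the same idea on hypothetical toy tweets: the comparison yields one TRUE/FALSE per row, and that logical vector selects the rows to write. A temporary file stands in for negtweetscore.csv so the example can be run anywhere.

```r
# Hypothetical toy data with the same columns as the question
stat <- data.frame(
  score = c(0, -2, 3, -1),
  text  = c("neutral tweet", "angry tweet", "happy tweet", "grumpy tweet"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row; if() would only use the first of these
is_negative <- stat$score < 0

out_file <- tempfile(fileext = ".csv")  # stand-in for "negtweetscore.csv"
write.csv(stat[is_negative, ], file = out_file, row.names = FALSE)

negatives <- read.csv(out_file, stringsAsFactors = FALSE)
negatives
```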
I get a "cannot coerce class "c("summary.turnpoints", "turnpoints")" to a data.frame" error when trying to save the summary to a file. I have tried to fix it with as.data.frame(), without success.
Code:
library(plyr)
library(pastecs)
data <- read.table("C:\\Users\\Ron\\Desktop\\dataset.txt", header = FALSE, col.names = "A")
data.tp=turnpoints(data$A)
print(data.tp)
Turning points for: data$A
nbr observations : 5990
nbr ex-aequos : 51
nbr turning points: 413 (first point is a pit)
E(p) = 3992 Var(p) = 1064.567 (theoretical)
data.sum=summary(data.tp)
print(data.sum)
point type proba info
1 11 pit 7.232437e-15 46.97444
2 21 peak 7.594058e-14 43.58212
3 30 pit 3.479857e-27 87.89303
4 51 peak 5.200612e-29 93.95723
5 62 pit 7.594058e-14 43.58212
6 70 peak 6.213321e-14 43.87163
7 81 pit 6.276081e-16 50.50099
8 91 peak 5.534016e-23 73.93602
.....................................
write.table(data.sum, file = "C:\\Users\\Ron\\Desktop\\datasetTurnP.txt")
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class "c("summary.turnpoints", "turnpoints")" to a data.frame
In addition: Warning messages:
1: package ‘plyr’ was built under R version 3.0.1
2: package ‘pastecs’ was built under R version 3.0.1
How can I save these summary results to a text file?
Thank you.
Look at the Value section of:
?pastecs::summary.turnpoints
It should be clear that its components are not all of the same length, so the object cannot be turned into a data frame; hence the error message. So rather than asking for the impossible, tell us what you want to save.
It's actually not impossible, just not possible with write.table(), since the object is not a data frame. The dump() function writes an ASCII structure(...) representation of the summary object. Note that dump() takes the object's name as a character string, not the object itself:
dump("data.sum", file="dump_data_sum.asc")
The file can then be read back with source(), which recreates data.sum.
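A round-trip sketch with a hypothetical stand-in for data.sum (a classed list whose components differ in length, so write.table() would fail on it too). The real summary.turnpoints object is not recreated here; a temporary file replaces dump_data_sum.asc.

```r
# Hypothetical stand-in for data.sum
data.sum <- structure(
  list(point = c(11L, 21L, 30L), type = c("pit", "peak", "pit"), n = 5990L),
  class = "myclass"
)

out_file <- tempfile(fileext = ".R")
dump("data.sum", file = out_file)  # pass the name, not the object

original <- data.sum
rm(data.sum)
source(out_file)  # recreates data.sum in the global environment
identical(data.sum, original)
```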