Reading value labels from an SPSS file in R

I use this code to read an SPSS file:
dt <- read.spss("dt.sav", to.data.frame = TRUE, use.value.labels = TRUE)
But I got this warning:
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
I couldn't find anything about it. Could anyone help me?

According to the warning message, some value labels in your file are duplicated, so the resulting factor would have duplicated levels.
You should use:
dt <- read.spss("dt.sav", to.data.frame = TRUE, use.value.labels = TRUE, duplicated.value.labels = "append")
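To make the idea behind "append" concrete, here is a minimal base-R sketch that makes repeated labels unique by appending an infix and the underlying code. The codes and labels are hypothetical, and the exact text that foreign generates may differ; this only illustrates the principle.

```r
# Hypothetical SPSS codes and their value labels; "No" appears twice.
values <- c(1, 2, 3)
labels <- c("Yes", "No", "No")

# Make each repeated label unique by appending an infix and its code
# (an approximation of the "append" strategy, not foreign's own code).
dup <- duplicated(labels)
labels[dup] <- paste0(labels[dup], "_duplicated_", values[dup])

# Every code now maps to a distinct factor level.
f <- factor(c(1, 3, 2, 3), levels = values, labels = labels)
print(levels(f))
```

The other documented choice, duplicated.value.labels = "condense", merges the codes that share a label instead of keeping them apart.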

Related

R: error in tableplot: Error in if (by < 1) stop("'by' must be > 0")

I'm a beginner in R and cannot understand what the problem is in this simple code:
install.packages("tabplot")
library("tabplot")
library("MASS")
Boston$chas <- factor(Boston$chas)
Boston$rad <- ordered(Boston$rad)
tableplot(Boston)
After running the function 'tableplot' I get this error message:
Error in if (by < 1) stop("'by' must be > 0") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In chunk.default(from = 1L, to = 506L, by = c(double = 23058430092136940), :
NAs introduced by coercion to integer range
What is the problem? There are no missing values in the dataset. Could someone explain it, please?
Many thanks in advance
Daria
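Reading the two messages together: the warning shows that `by` was computed as a double far beyond R's integer range, so converting it to an integer gives NA, and the check `if (by < 1)` then fails. That would explain why the error mentions a missing value even though the dataset has none. The base-R sketch below reproduces only that mechanism, not tabplot's internals, and it does not establish why `by` became so large.

```r
# Value copied from the warning message in the question.
by <- 23058430092136940

# It does not fit in an R integer.
too_large <- by > .Machine$integer.max

# Coercion yields NA (unsuppressed, R warns about the integer range).
by_int <- suppressWarnings(as.integer(by))

# A condition that evaluates to NA makes `if` stop with an error.
outcome <- tryCatch(
  if (by_int < 1) "stop" else "continue",
  error = function(e) conditionMessage(e)
)
print(outcome)
```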

Error replacing a data frame column with other values in R

I'm trying to replace the default values I set in a data frame with calculated ones, but I get an error that I don't understand, since as far as I know I have no factors.
Here is the code:
nb_agences_iris <- agences %>%
group_by(CODE_IRIS) %>%
summarise(nb_agences = n()) %>%
arrange(CODE_IRIS)
int <- data.frame("CODE_IRIS" = as.character(intersect(typo$X0, nb_agences_iris$CODE_IRIS)))
typo$nb_agences <- as.character(rep(0, nrow(typo)))
typo[int$CODE_IRIS,]$nb_agences <- as.character(nb_agences_iris[int$CODE_IRIS,]$nb_agences)
And I get the following error:
Error in Summary.factor(1:734, na.rm = FALSE) :
‘max’ not meaningful for factors
In addition: Warning message:
In Ops.factor(i, 0L) : ‘>=’ not meaningful for factors
Thanks in advance for your help.
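A likely cause, assuming an R version where data.frame() converts strings to factors by default, is that int$CODE_IRIS is a factor and is then used as a row index. A minimal sketch that avoids row indexing altogether by looking codes up with match() follows. The two small data frames are hypothetical stand-ins for typo and nb_agences_iris, and the counts are kept as integers rather than converted to character.

```r
# Hypothetical stand-ins for the data in the question.
typo <- data.frame(X0 = c("A", "B", "C"), stringsAsFactors = FALSE)
nb_agences_iris <- data.frame(
  CODE_IRIS = c("B", "C", "D"),
  nb_agences = c(2L, 5L, 1L),
  stringsAsFactors = FALSE
)

# Position of each typo code in the summary table (NA when absent).
idx <- match(typo$X0, nb_agences_iris$CODE_IRIS)

# Use the computed count where a match exists, otherwise the default 0.
typo$nb_agences <- ifelse(is.na(idx), 0L, nb_agences_iris$nb_agences[idx])
print(typo)
```

If the code columns are factors in the real data, wrapping them in as.character() before calling match() keeps the comparison on the codes themselves.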

PCA with result non-interactively in R

I am writing because I would like to run a PCA in R with the package ade4.
I have the data "PAYSAGE":
All the variables are numeric, PAYSAGE is a data frame, and there are no NAs or blanks.
But when I run:
require(ade4)
ACP<-dudi.pca(PAYSAGE)
2
I get this error message:
You can reproduce this result non-interactively with:
dudi.pca(df = PAYSAGE, scannf = FALSE, nf = NA)
Error in if (nf <= 0) nf <- 2 : missing value where TRUE/FALSE needed
In addition: Warning message:
In as.dudi(df, col.w, row.w, scannf = scannf, nf = nf, call = match.call(), :
NAs introduced by coercion
I don't understand what that means. Do you have any idea?
Thank you so much.
I'd suggest sharing a data set or example that others can access, if possible. This seems data-specific, and with "NAs introduced by coercion" you may want to check the type of each input column, for example with str(PAYSAGE) (typeof(PAYSAGE) only reports "list" for a data frame). The manual for dudi.pca states that it takes a data frame of numeric values as input.
Yes, for example:
ag_div <- c(75362,68795,78384,79087,79120,73155,58558,58444,68795,76223,50696,0,17161,0,0)
canne <- c(rep(0,10),5214,6030,0,0,0)
prairie_el<- c(60, rep(0,13),76985)
sol_nu <- c(18820,25948,13150,9903,12097,21032,35032,35504,25948,20438,12153,33096,15748,33260,44786)
urb_peu_d <- c(448,459,5575,5902,5562,458,6271,6136,459,1850,40,13871,40,13920,28669)
urb_den <- c(rep(0,12),14579,0,0)
veg_arbo <- c(2366,3327,3110,3006,3049,2632,7546,7620,3327,37100,3710,0,181,0,181)
veg_arbu <- c(18704,18526,15768,15527,15675,18886,12971,12790,18526,15975,22216,24257,30962,24001,14523)
eau <- c(rep(0,10),34747,31621,36966,32165,28054)
PAYSAGE<-data.frame(ag_div,canne,prairie_el,sol_nu,urb_peu_d,urb_den,veg_arbo,veg_arbu,eau)
require(ade4)
ACP<-dudi.pca(PAYSAGE)
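The message itself hints at the likely cause: it reports nf = NA, which suggests that the answer to the "number of axes" prompt was not read as a number (for instance when the whole script is run in one go), rather than a problem in the data. One way to sidestep the prompt is to pass the number of axes directly. The sketch below assumes two axes are wanted, as in the question, and uses a built-in dataset as a hypothetical stand-in for PAYSAGE; it only runs the PCA if ade4 is installed.

```r
# Hypothetical stand-in data: four numeric columns from a built-in dataset.
df <- iris[, 1:4]

pca <- NULL
if (requireNamespace("ade4", quietly = TRUE)) {
  # scannf = FALSE skips the interactive prompt; nf sets the axes kept.
  pca <- ade4::dudi.pca(df, scannf = FALSE, nf = 2)
  print(pca$eig)
}
```

With the data from the question, the equivalent call would be dudi.pca(PAYSAGE, scannf = FALSE, nf = 2).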

Deprecated levels warning with read.dta in R

(This is a beginner question, but I didn't find an answer elsewhere. Relevant posts include this one, this one, and this one, but I'm not sure how to apply them to my case.)
When I use read.dta to import Stata-format data into R, there is a warning:
> lca <- read.dta("trial.dta")
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else
paste0(labels, :
duplicated levels in factors are deprecated
Does it simply mean that the variables ("factors" in R) contain duplicate values? If so, why is this even a warning -- isn't this expected of most variables?
Try this:
library(foreign)
# Read the raw codes first; the factors are built by hand below.
don <- read.dta("trial.dta", convert.dates = TRUE, convert.factors = FALSE)
val_labels <- attr(don, "val.labels")    # name of the label set used by each column
label_table <- attr(don, "label.table")  # one named vector (label -> code) per label set
for (i in seq_along(don)) {
  set_name <- val_labels[i]
  if (set_name != "" && !is.null(label_table[[set_name]])) {
    levels <- label_table[[set_name]]
    labels <- names(levels)
    if (anyDuplicated(labels) > 0) {
      # Recode every code to the first code that carries the same label,
      # then drop the repeated labels.
      first_code <- levels[match(labels, labels)]
      hit <- match(don[[i]], levels)
      don[[i]][!is.na(hit)] <- first_code[hit[!is.na(hit)]]
      keep <- !duplicated(labels)
      levels <- levels[keep]
      labels <- labels[keep]
    }
    don[[i]] <- factor(don[[i]], levels = levels, labels = labels)
  }
}
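On the question of what the warning means: it is not about repeated values in the data, which are indeed expected. It concerns the label table, where two different codes carry the same label, so the factor would end up with two identical levels. A small base-R sketch with a hypothetical label table shows the difference and lists the codes that share a label.

```r
# Hypothetical data column and label table (label -> code).
x <- c(1, 1, 2, 9, 2)
label_table <- c(low = 1, high = 2, high = 9)

# Repeated values in the data are normal and harmless.
data_has_repeats <- anyDuplicated(x) > 0

# Repeated labels in the table are what triggers the warning.
labels_repeat <- anyDuplicated(names(label_table)) > 0

# Group the codes by label and keep the labels used more than once.
by_label <- split(unname(label_table), names(label_table))
shared <- Filter(function(codes) length(codes) > 1, by_label)
print(shared)
```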

Replace non <NA> with a number from data.frame

I'm having some trouble understanding the logic of the errors I'm getting.
I need to replace the non-NA values in a data frame with the number 1.
I tested this simple code on a vector a:
a<-c("a","a","a", NA,"a")
a[!is.na(a)]<-1
And it worked.
But what I need is to apply the same process to a data frame, imported using:
data <- read.table("dataframe.csv", header = TRUE, sep = ",", na.strings = c(" ", ""))
But when I run the same code as before:
data$column1[!is.na(data$column1)] <- 1
R returns:
Warning message:
In `[<-.factor`(`*tmp*`, !is.na(data$column1), value = c(NA_integer_, :
invalid factor level, NA generated
Does someone have any idea where the problem could be?
In the end I was able to do it using the replace function, as explained in another topic on this forum:
data <- replace(data.frame(lapply(data, as.character), stringsAsFactors = FALSE),
                !is.na(data), "1")
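The warning arises because the column is a factor: a factor only accepts values that are already among its levels, so assigning 1 produces NA. A minimal sketch of the problem and of a per-column fix follows, using a small hypothetical data frame in place of the imported file.

```r
# Hypothetical column stored as a factor, as read.table produces when
# strings are converted to factors.
data <- data.frame(column1 = factor(c("a", "a", NA, "b")))

# Assigning a value that is not an existing level yields NA
# (unsuppressed, R warns "invalid factor level, NA generated").
bad <- data$column1
suppressWarnings(bad[!is.na(bad)] <- 1)

# Converting to character first makes the replacement work.
data$column1 <- as.character(data$column1)
data$column1[!is.na(data$column1)] <- "1"
print(data$column1)
```

Reading the file with stringsAsFactors = FALSE in read.table avoids the factor conversion in the first place.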
