Deprecated levels warning with read.dta in R

(This is a beginner question, but I didn't find an answer elsewhere. Relevant posts include this one, this one, and this one, but I'm not sure how to apply them to my case.)
When I use read.dta to import Stata-format data into R, I get a warning:
> lca <- read.dta("trial.dta")
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else
paste0(labels, :
duplicated levels in factors are deprecated
Does it simply mean that the variables ("factors" in R) contain duplicate values? If so, why is this even a warning -- isn't this expected of most variables?
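To make the distinction concrete, here is a minimal sketch. It assumes the warning comes from a value-label set in which two codes carry the same label; `label_table` is a hypothetical stand-in, not data from trial.dta. Repeated values in a variable are normal; what the warning is about is the set of levels (the labels) containing a repeat.

```r
# Repeated *values* are normal: each level still appears only once
x <- factor(c("a", "b", "a", "a"))
levels(x)

# Hypothetical Stata-style label set: names are labels, values are codes.
# Codes 1 and 3 share the label "Yes", so the level set would contain a duplicate.
label_table <- c(Yes = 1, No = 2, Yes = 3)
dup <- duplicated(names(label_table))
names(label_table)[dup]   # labels that occur more than once
```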

Try this:
library(foreign)
don <- read.dta("trial.dta", convert.dates = TRUE, convert.factors = FALSE)
val.labels <- attr(don, "val.labels")     # name of the label set used by each column
label.table <- attr(don, "label.table")   # label sets: names are labels, values are codes
for (i in seq_len(ncol(don))) {
  valuelabel <- val.labels[i]
  tab <- if (nzchar(valuelabel)) label.table[[valuelabel]] else NULL
  if (!is.null(tab)) {
    labels <- names(tab)
    levels <- unname(tab)
    dup <- duplicated(labels)
    if (any(dup)) {
      # recode each duplicated code to the first code that carries the same label
      first <- levels[match(labels, labels)]
      hit <- match(don[[i]], levels)
      don[[i]][!is.na(hit)] <- first[hit[!is.na(hit)]]
      labels <- labels[!dup]
      levels <- levels[!dup]
    }
    don[[i]] <- factor(don[[i]], levels = levels, labels = labels)
  }
}

Related

R: error in tableplot: Error in if (by < 1) stop("'by' must be > 0")

I'm a beginner in R and cannot see what the problem is in this simple code:
install.packages("tabplot")
library("tabplot")
library("MASS")
Boston$chas <- factor(Boston$chas)
Boston$rad <- ordered(Boston$rad)
tableplot(Boston)
After running the function 'tableplot' I get this error message:
Error in if (by < 1) stop("'by' must be > 0") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In chunk.default(from = 1L, to = 506L, by = c(double = 23058430092136940), :
NAs introduced by coercion to integer range
What is the problem? There are no missing values in the dataset. Could someone explain it, please?
Many thanks in advance
Daria

"data are essentially constant" error with t test

t.test(antibioticdata$Bacteria,
antibioticdata$Inhibition,
alternative = c("two.sided"),
paired = FALSE,
var.equal = FALSE)
Here is my R code to run a t-test on a data set on antibiotic resistance of bacteria. It gives me this error:
Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In var(x) :
Calling var(x) on a factor x is deprecated and will become an error.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
I'm not sure what I am doing wrong.
I just ran into the same error. It probably occurs because all the values in each group are the same.
So just add two more "if else" branches. In my case, I did:
library("greenbrown")
P.for.data.table <- apply(data.table, 1, function(x) {
  if (AllEqual(x[1:9])) {
    # all nine values are identical
    return(1)
  } else if (AllEqual(x[1:4]) && AllEqual(x[5:9])) {
    # each group is constant, but the two groups differ
    return(0)
  } else {
    t.results <- t.test(as.numeric(x[1:4]), as.numeric(x[5:9]))
    return(t.results$p.value)
  }
})
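Separately from the constant-data case, the two warnings quoted in the question ("argument is not numeric or logical" and "Calling var(x) on a factor x is deprecated") suggest that the columns passed to t.test are factors rather than numbers. A minimal sketch of converting them first, assuming the factor levels are numbers stored as text; the data frame below is a hypothetical stand-in for antibioticdata, and `to_num` is a helper introduced only for illustration:

```r
# Hypothetical stand-in for antibioticdata: numeric values stored as factors
antibioticdata <- data.frame(
  Bacteria   = factor(c("12.1", "9.8", "11.4", "10.2", "13.0")),
  Inhibition = factor(c("3.4", "2.9", "4.1", "3.8", "3.0"))
)

# as.numeric() on a factor returns the internal level codes,
# so go through character to recover the original numbers
to_num <- function(f) as.numeric(as.character(f))
bacteria   <- to_num(antibioticdata$Bacteria)
inhibition <- to_num(antibioticdata$Inhibition)

res <- t.test(bacteria, inhibition,
              alternative = "two.sided",
              paired = FALSE,
              var.equal = FALSE)
res$p.value
```

Checking str(antibioticdata) before the test shows whether the columns are factors in the first place.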

PCA with result non-interactively in R

I would like to run a PCA in R with the package ade4.
I have the data "PAYSAGE":
All the variables are numeric, PAYSAGE is a data frame, and there are no NAs or blanks.
But when I do:
require(ade4)
ACP<-dudi.pca(PAYSAGE)
2
I get this error message:
You can reproduce this result non-interactively with:
dudi.pca(df = PAYSAGE, scannf = FALSE, nf = NA)
Error in if (nf <= 0) nf <- 2 : missing value where TRUE/FALSE needed
In addition: Warning message:
In as.dudi(df, col.w, row.w, scannf = scannf, nf = nf, call = match.call(), :
NAs introduced by coercion
I don't understand what that means. Do you have any idea?
Thank you so much
I'd suggest sharing a data set/example others could access, if possible. This seems data-specific and with NAs introduced by coercion you may want to check the type of your input - typeof(PAYSAGE) - the manual for dudi.pca states it takes a data frame of numeric values as input.
Yes, for example:
ag_div <- c(75362,68795,78384,79087,79120,73155,58558,58444,68795,76223,50696,0,17161,0,0)
canne <- c(rep(0,10),5214,6030,0,0,0)
prairie_el<- c(60, rep(0,13),76985)
sol_nu <- c(18820,25948,13150,9903,12097,21032,35032,35504,25948,20438,12153,33096,15748,33260,44786)
urb_peu_d <- c(448,459,5575,5902,5562,458,6271,6136,459,1850,40,13871,40,13920,28669)
urb_den <- c(rep(0,12),14579,0,0)
veg_arbo <- c(2366,3327,3110,3006,3049,2632,7546,7620,3327,37100,3710,0,181,0,181)
veg_arbu <- c(18704,18526,15768,15527,15675,18886,12971,12790,18526,15975,22216,24257,30962,24001,14523)
eau <- c(rep(0,10),34747,31621,36966,32165,28054)
PAYSAGE<-data.frame(ag_div,canne,prairie_el,sol_nu,urb_peu_d,urb_den,veg_arbo,veg_arbu,eau)
require(ade4)
ACP<-dudi.pca(PAYSAGE)
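The error message itself points to a non-interactive call, and the `nf = NA` in it suggests that the typed `2` was not read as the number of axes (this can happen, for example, when the lines are run together as a script instead of answering the prompt). A minimal sketch, assuming two axes are wanted and using only a subset of the columns above to keep it short:

```r
library(ade4)

# Subset of the example data from this thread
PAYSAGE <- data.frame(
  ag_div    = c(75362, 68795, 78384, 79087, 79120, 73155, 58558, 58444,
                68795, 76223, 50696, 0, 17161, 0, 0),
  sol_nu    = c(18820, 25948, 13150, 9903, 12097, 21032, 35032, 35504,
                25948, 20438, 12153, 33096, 15748, 33260, 44786),
  urb_peu_d = c(448, 459, 5575, 5902, 5562, 458, 6271, 6136,
                459, 1850, 40, 13871, 40, 13920, 28669),
  veg_arbu  = c(18704, 18526, 15768, 15527, 15675, 18886, 12971, 12790,
                18526, 15975, 22216, 24257, 30962, 24001, 14523)
)

# scannf = FALSE skips the interactive scree-plot prompt;
# nf gives the number of axes to keep explicitly
ACP <- dudi.pca(df = PAYSAGE, scannf = FALSE, nf = 2)
ACP$eig        # eigenvalues
head(ACP$li)   # row coordinates on the two kept axes
```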

Warning message:invalid factor level, NA generated

I get this warning when I try to assign a new character value to some of the values in one of my columns.
This works fine:
merge_output$extra_dod[merge_output$extra_dod == 'Refugees camps in forestreserve.'] <-'Refugees'
but this doesn't:
merge_output$extra_dod[merge_output$extra_dod=='Air Strip'] <-'strip'
And it returns this warning message:
Warning message:
In `[<-.factor`(`*tmp*`, merge_output$extra_dod == "Lime", value = c(5L, :
invalid factor level, NA generated
I'm not sure why I can replace some of the values but not others.
Here's a much-simplified example that fails in the same way:
f <- factor(c("a","b","c","d"))
f[f=="d"] <- "e"
Warning message:
In `[<-.factor`(`*tmp*`, f == "d", value = "e") :
invalid factor level, NA generated
If you happen to try replacing with a factor level that already exists, it works:
f[f=="c"] <- "b"
A few more general options:
Convert the variable back into a character vector before trying to replace values (or use something like stringsAsFactors=FALSE in read.csv/read.table).
Use car::recode.
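The first option, plus two base-R variants that were not mentioned above (adding the level first, or renaming the level), can be sketched on the same toy factor. This is an illustration, not the only way to do it:

```r
f <- factor(c("a", "b", "c", "d"))

# Option 1: go through a character vector, then rebuild the factor
g <- as.character(f)
g[g == "d"] <- "e"
g <- factor(g)

# Option 2: add the new level before assigning, then drop the unused one
h <- f
levels(h) <- c(levels(h), "e")
h[h == "d"] <- "e"
h <- droplevels(h)

# Option 3: rename the level itself, which changes every "d" at once
k <- f
levels(k)[levels(k) == "d"] <- "e"
```

All three end with the levels "a", "b", "c", "e" and no NA values.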

Reading value labels to spss file in R

I use this code to read an SPSS file:
dt<-read.spss("dt.sav",to.data.frame = TRUE,use.value.labels = TRUE)
But I got this warning:
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else
paste0(labels, : duplicated levels in factors are deprecated
I didn't find anything about it. Could anyone help me?
According to the warning message, it seems that some of your variables have duplicated value labels.
You should use:
dt<-read.spss("dt.sav",to.data.frame = TRUE,use.value.labels = TRUE,duplicated.value.labels="append")
