Warning message: invalid factor level, NA generated - r

I get this warning when I try to assign a new character value to some of the values in one of my columns.
This works fine:
merge_output$extra_dod[merge_output$extra_dod == 'Refugees camps in forestreserve.'] <-'Refugees'
but this doesn't:
merge_output$extra_dod[merge_output$extra_dod=='Air Strip'] <-'strip'
And it returns this error message:
Warning message:
In `[<-.factor`(`*tmp*`, merge_output$extra_dod == "Lime", value = c(5L, :
invalid factor level, NA generated
I'm not sure why I can replace some of the values but not others.

Here's a much-simplified example that fails in the same way:
f <- factor(c("a","b","c","d"))
f[f=="d"] <- "e"
Warning message:
In `[<-.factor`(`*tmp*`, f == "d", value = "e") :
invalid factor level, NA generated
If you happen to try replacing with a factor level that already
exists, it works:
f[f=="c"] <- "b"
A few more general options:
- Convert the variable back into a character vector before trying to replace values (or use something like stringsAsFactors=FALSE in read.csv/read.table).
- Use car::recode.
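As a minimal sketch on the simplified factor above: one route is the character conversion already mentioned, and another (not listed above, added here as an assumption about what may suit you) is to register the new level before assigning. The variable g and the call to droplevels() are illustrative choices only.

```r
# Route 1 (not in the list above): add the new level first, then assign.
f <- factor(c("a", "b", "c", "d"))
levels(f) <- c(levels(f), "e")   # "e" is now a valid level
f[f == "d"] <- "e"
f <- droplevels(f)               # drop the now-unused level "d"

# Route 2: work on a character vector, where any string is allowed.
g <- as.character(factor(c("a", "b", "c", "d")))
g[g == "d"] <- "e"
g_factor <- factor(g)            # convert back only if a factor is needed
```

Both routes end with the values a, b, c, e and no NA.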

Related

rounding up error in R with data type error

I know this is a very basic question, so you might wonder why it is even bothering me.
But I have an issue with rounding numbers up.
I tried the following, but none of it worked.
mode(RNA_data) <- 'numeric'
Error in mde(x) : (list) object cannot be coerced to type 'double'
RNA_data<-as.numeric(RNA_data)
Error: (list) object cannot be coerced to type 'double'
round(P7_N02_RNA, digits=0)
Error in Math.data.frame(list(P07_N02_RNA.genes.V5 = c(326L, 1L, 851L, :
non-numeric variable(s) in data frame: P07_N02_RNA.genes.V5
Error: unexpected symbol in "non-numeric variable"
P7_N02_RNA <- round(P7_N02_RNA, digits=0)
Error in Math.data.frame(list(P07_N02_RNA.genes.V5 = c(326L, 1L, 851L, :
non-numeric variable(s) in data frame: P07_N02_RNA.genes.V5
Error: unexpected symbol in "non-numeric variable"
trimmed_RNA <- round(RNA_data$p07_N01,digit = 0)
Error in round(RNA_data$p07_N01, digit = 0) :
non-numeric argument to mathematical function
Error: unexpected symbol in "non-numeric argument"
trimmed_RNA <- round(RNA_data[-1,], digits=0)
Error in Math.data.frame(list(geneID = 2:58639, p07_N01 = c(2175L, 9753L, : non-numeric variable(s) in data frame: geneID, p07_N01, p07_T01, p07_N02, p07_T02, p08_N01, p08_T01, p08_N02, p08_T02, p09_N01, p09_T01, p09_N02, p09_T02
RNA_data <-data.frame(RNA_data)
trimmed_RNA <- data.frame(round(as.numeric(levels(RNA_data)[RNA_data])))
got a result with 0 obs.
rm(trimmed_RNA)
require(data.table)
setDT(RNA_data)
RNA_data[, RNA_data:=round(as.numeric(levels(RNA_data)[RNA_data]))]
Error in `[.data.table`(RNA_data, , `:=`(RNA_data, round(as.numeric(levels(RNA_data)[RNA_data])))) : RHS of assignment to existing column 'RNA_data' is zero length but not NULL. If you intend to delete the column use NULL. Otherwise, the RHS must have length > 0; e.g., NA_integer_. If you are trying to change the column type to be an empty list column then, as with all column type changes, provide a full length RHS vector such as vector('list',nrow(DT)); i.e., 'plonk' in the new column.
This is what the data looks like, with 58639 rows.
I also tried exporting the data to a CSV file and trimming it in Excel, but that produced a circular reference error and didn't work either.
Now I have no idea what to do.
Can anybody help me round these numbers?
It's hard to tell from your question, but first convert your variables to numeric like this:
mtcars$mpg <- as.numeric( as.character( mtcars$mpg))
And is this what you mean by rounding up?
ceiling( mtcars$qsec )
round( mtcars$disp , -1 )
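The same conversion can be applied to every count column of a data frame at once. This is only a sketch: the small RNA_data below is a made-up stand-in (the column names come from the question, the values and the geneID labels are invented), and it assumes the count columns were read in as factors.

```r
# Hypothetical stand-in for the real RNA_data; values are invented.
RNA_data <- data.frame(
  geneID  = c("g1", "g2", "g3"),
  p07_N01 = factor(c("2175.4", "9753.6", "12.4")),
  p07_T01 = factor(c("1.2", "3.7", "8.9")),
  stringsAsFactors = FALSE
)

# Every column except the identifier holds counts.
count_cols <- setdiff(names(RNA_data), "geneID")

# Factor -> character -> numeric, column by column.
RNA_data[count_cols] <- lapply(
  RNA_data[count_cols],
  function(col) as.numeric(as.character(col))
)

# round() works on a data frame once all selected columns are numeric.
rounded <- RNA_data
rounded[count_cols] <- round(RNA_data[count_cols], digits = 0)

# ceiling() always rounds up, if that is what is meant.
rounded_up <- ceiling(RNA_data$p07_N01)
```

Going through as.character() matters: calling as.numeric() directly on a factor returns the internal level codes, not the printed numbers.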

Pass arguments to the R Phyloseq subset_taxa wrapper

I'll explain the end goal, and what I'm trying as a test first. (Because I'm likely going about it the wrong way.)
I am using the phyloseq package to visualize microbiome data. I want to "automate" it to an extent by having users choose levels of analysis and have my script generate the visualizations without someone hand typing in each combination.
The issue is passing variables into the subset function. I mostly get these errors (depending on which combination of paste0, eval, parse, as.logical, expression, noquote, etc. I've tried):
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
A user would set the levels of analysis. So let's say for now there are two levels, and selecting the second level automatically means you want the first level as well. (I haven't worked on that part yet, but I wanted to explain it upfront.)
#Set lineage level
lin_level <- 1
lin_list <- c("k__Kingdom", "p__Phylum","c__Class", "o__Order","f__Family")
lin_select <- lin_list[lin_level]
sub_lin <- lin_list[(lin_level +1)]
#Kingdom
king_list <- "k__Bacteria"
#set Phylum list
if (lin_select == "p__Phylum"){
phylum_list <- c("p__Firmicutes","p__Proteobacteria","p__Bacteroidetes","p__Actinobacteria","p__Tenericutes")
}
subgroup <- "All"
From here, the script would ultimately get to the graphing section. If lin_level is set to 1, it would look like this:
FIXED
gphic = subset_taxa(physeq1, Kingdom=="k__Bacteria")
title = paste0(subgroup," ", "Bacteria-only")
plot_bar(gpsfb, "Phylum", "Abundance", "Phylum",
title=title, facet_grid="Type~.")
AUTOMATED
gphic = subset_taxa(physeq1, (substring(lin_select,4)) == king_list)
title = paste0(subgroup," ", (substring(king_list,4)),"-only")
plot_bar(gpsfb, (substring(sub_lin,4)), "Abundance", (substring(sub_lin,4)),
title=title, facet_grid="Type~.")
But, trying to pass (substring(lin_select,4)) == king_list as an argument results in errors.
I've searched through the various threads on this issue, but haven't been able to get the different answers to work. Ultimately I need to run the graphing section once for Kingdom, and then once for each item in the Phylum list. But before I can get there, I need to be able to pass the arguments into the subset function.
Things I've tried:
test <- paste0(substring(lin_select,4),"==","\"","p__Bacteroidetes","\"")
noquote(test)
[1] Phylum=="p__Bacteroidetes"
gphic = subset_taxa(physeq1, noquote(test))
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
gphic = subset_taxa(physeq1, paste0(substring(lin_select,4),"==","\"","p__Bacteroidetes","\""))
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
gphic = subset_taxa(physeq1, as.logical(test))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
as.logical(noquote(test))
[1] NA
gphic = subset_taxa(physeq1, as.logical(noquote(test)))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
noquote(test)
[1] Phylum=="p__Bacteroidetes"
as.logical(noquote(test))
[1] NA
as.logical(as.character(noquote(test)))
[1] NA
test2 <- eval(parse(text= test))
Error in eval(parse(text = test)) : object 'Phylum' not found
test2 <- eval(test)
gphic = subset_taxa(physeq1, as.logical(test2))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
as.logical(test2)
[1] NA
And a lot of other permutations trying to sub in different things, but you get the idea.
gphic = subset_taxa(physeq1, eval(as.name(level_tax)) == king_list)
Here, level_tax is the loop variable. Say level_tax = "Order"; we convert the string "Order" into a variable name with as.name(level_tax) or as.symbol(level_tax). Then we use eval(), which takes an expression and evaluates it in the specified environment.
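To see the mechanism without installing phyloseq, here is a base R sketch. It assumes, as the subset.data.frame error above suggests, that subset_taxa hands its condition to subset() on the taxonomy table. The tax data frame is a made-up stand-in for that table.

```r
# Hypothetical taxonomy table standing in for the one inside physeq1.
tax <- data.frame(
  Kingdom = c("k__Bacteria", "k__Bacteria", "k__Archaea"),
  Phylum  = c("p__Firmicutes", "p__Proteobacteria", "p__Euryarchaeota"),
  stringsAsFactors = FALSE
)

level_tax <- "Kingdom"       # column name chosen at run time
king_list <- "k__Bacteria"   # value to keep

# as.name() turns the string into a symbol; eval() then looks that
# symbol up among the columns, so the condition becomes a logical vector.
kept <- subset(tax, eval(as.name(level_tax)) == king_list)

# The same selection with plain indexing, for comparison.
kept_indexed <- tax[tax[[level_tax]] == king_list, ]
```

A pasted string such as "Kingdom==\"k__Bacteria\"" stays a character value, which is why subset() complains that 'subset' must be logical.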

Error in cor(x, use = use) : supply both 'x' and 'y' or a matrix-like 'x'

I am using the psych package.
This is the code I tried:
library(psych)
str(price_per_d)
Least_appealing <-subset(zdf_base, select=c("price_per_h",
"price_per_d", "mileage", "one_way_option", "difficulties",
"vehicle_types", "parking_spot","picking_up","availability", "dirty",
"returning","refilling", "loalty_programs"))
# code from stackoverflow which I use, to get a numeric x
Least_appealing <- gsub(",", "", Least_appealing)
Least_appealing <- as.numeric(Least_appealing)
fa.parallel(Least_appealing)
I get these error messages:
> library(psych)
> str(price_per_d)
Factor w/ 1 level "Price (daily rate too high)": 1 NA 1 1 1 NA NA 1 1
NA ...
> Least_appealing <-subset(zdf_base, select=c("price_per_h",
+ "price_per_d",
"mileage", "one_way_option", "difficulties",
+ "vehicle_types",
"parking_spot","picking_up","availability", "dirty",
+ "returning","refilling",
"loalty_programs"))
>
> Least_appealing <- gsub(",", "", Least_appealing)
> Least_appealing <- as.numeric(Least_appealing)
Warning message:
NAs introduced by coercion
>
> fa.parallel(Least_appealing)
Error in cor(x, use = use) : supply both 'x' and 'y' or a matrix-like 'x'
>
How can I conduct a factor analysis successfully?
At first I got an error message saying my 'x' must be numeric, which is why I used the code above.
When I used this code, R told me that NAs were produced by the conversion.
I carried on anyway and tried fa.parallel, which gave me another error message.
If you have character data intermixed with numeric data (e.g., your coding is categorical and you need to convert it to numerical), you could try using the char2numeric function before doing the fa.
E.g., with data that are a mix of categorical and numerical:
describe(data) #this will flag those variables that are categorical with an asterisk
new.data <- char2numeric(data) #this makes all numeric
fa(new.data, nfactors=3) #to get three factors
It appears that you have only one variable in your Least_appealing object.
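A base R sketch of why the object stops being a data frame: gsub() coerces its input to a character vector, so a data frame is flattened to one string per column. The survey data frame below is a made-up stand-in for zdf_base, and coding a ticked answer as 1 and NA as 0 is purely an assumption about what the factors mean.

```r
# Hypothetical stand-in: each column is a factor with one level plus NA.
survey <- data.frame(
  price_per_d = factor(c("too high", NA, "too high")),
  mileage     = factor(c(NA, "too low", "too low"))
)

# gsub() flattens the data frame to one string per column.
collapsed <- gsub(",", "", survey)
length(collapsed)                                  # one element per column
flat <- suppressWarnings(as.numeric(collapsed))    # every element is NA

# Converting column by column keeps the data frame shape.
# Assumption: a non-missing answer means "ticked" (1), NA means 0.
indicators <- as.data.frame(
  lapply(survey, function(col) as.integer(!is.na(col)))
)
```

The result keeps one numeric column per original variable, which is the matrix-like shape that cor() asks for.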

replacing a value in column X based on columns Y with R

I've gone through several answers and tried the following, but each attempt yields either an error or an unwanted result.
Here's the data:
Network             Campaign
Moburst_Chartboost  Test Campaign
Moburst_Chartboost  Test Campaign
Moburst_Appnext     unknown
Moburst_Appnext     1065
I'd like to replace "Test Campaign" with "1055" whenever "Network" == "Moburst_Chartboost". I realize this should be very simple, but I tried this:
dataset = read.csv('C:/Users/User/Downloads/example.csv')
for( i in 1:nrow(dataset)){
if(dataset$Network == 'Moburst_Chartboost') dataset$Campaign <- '1055'
}
This yields these warnings:
Warning messages:
1: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
2: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
etc.
Then I tried:
within(dataset, {
dataset$Campaign <- ifelse(dataset$Network == 'Moburst_Chartboost', '1055', dataset$Campaign)
})
This turned ALL 4 values in column "Campaign" into "1055", overwriting what was there even when the condition isn't met.
Also this:
dataset$Campaign[which(dataset$Network == 'Moburst_Chartboost')] <- 1055
yields this warning, and replaces the values in the first two rows of "Campaign" with NA:
Warning message:
In `[<-.factor`(`*tmp*`, which(dataset$Network == "Moburst_Chartboost"), :
invalid factor level, NA generated
Scratching my head here. I'm new to R, but this shouldn't be so hard :(
In your first attempt, you're trying to iterate over all the columns, when you only want to change the 2nd column.
In your second, you're trying to assign the value "1055" to all of the 2nd column.
The way to think about it is as an if else, where if the condition in col 1 is met, col 2 is changed, otherwise it remains the same.
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
"Moburst_Appnext", "Moburst_Appnext"),
Campaign = c("Test Campaign", "Test Campaign",
"unknown", "1065"))
dataset$Campaign <- ifelse(dataset$Network == "Moburst_Chartboost",
"1055",
dataset$Campaign)
head(dataset)
Network Campaign
1 Moburst_Chartboost 1055
2 Moburst_Chartboost 1055
3 Moburst_Appnext unknown
4 Moburst_Appnext 1065
You may also try dataset$Campaign[dataset$Campaign=="Test Campaign"]<-1055 to avoid the use of loops and ifelse statements,
where dataset is:
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
"Moburst_Appnext", "Moburst_Appnext"),
Campaign = c("Test Campaign", "Test Campaign",
"unknown", 1065))
Try the following
dataset <- read.csv('C:/Users/User/Downloads/example.csv', stringsAsFactors = FALSE)
for (i in seq_len(nrow(dataset))) {
  if (dataset$Network[i] == 'Moburst_Chartboost') dataset$Campaign[i] <- '1055'
}
It seems you forgot the index variable. Without [i] you work on the whole column of the data frame, resulting in the warning you mentioned.
Note that I added stringsAsFactors = FALSE to the read.csv() call to make sure the strings are read as strings and not factors. With factors, this would result in a warning like this:
In `[<-.factor`(`*tmp*`, i, value = c(NA, 2L, 3L, 1L)) :
invalid factor level, NA generated
Alternatively you can do the following without using a for loop:
idx <- which(dataset$Network == 'Moburst_Chartboost')
dataset$Campaign[idx] <- '1055'
Here, idx is a vector containing the positions where Network has the value 'Moburst_Chartboost'
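A self-contained sketch of the same idea that also checks the Campaign value, as the question asks. The data frame is rebuilt inline from the four rows in the question, and stringsAsFactors = TRUE is set on purpose to reproduce the factor situation; the name hit is illustrative.

```r
# Rebuild the example data; factors are forced to mimic the original problem.
dataset <- data.frame(
  Network  = c("Moburst_Chartboost", "Moburst_Chartboost",
               "Moburst_Appnext", "Moburst_Appnext"),
  Campaign = c("Test Campaign", "Test Campaign", "unknown", "1065"),
  stringsAsFactors = TRUE
)

# Convert the target column to character so any string can be assigned.
dataset$Campaign <- as.character(dataset$Campaign)

# Logical index: right network AND the placeholder campaign name.
hit <- dataset$Network == "Moburst_Chartboost" &
  dataset$Campaign == "Test Campaign"

dataset$Campaign[hit] <- "1055"
```

A logical vector can index directly, so which() is optional here.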
Thank you for the help! It's not elegant, but since this stayed with me when I went to sleep last night, I decided to bludgeon it with some ugly code, and that worked too, just as a workaround: I split the data into two data frames, replaced all values in one, and then bound them back together.
# subsetting only chartboost
chartboost <- subset(dataset, dataset$Network=='Moburst_Chartboost')
# replace all values in Campaign
chartboost$Campaign <-sub("^.*", "1055",chartboost$Campaign)
#subsetting only "not chartboost"
notChartboost <-subset(dataset, dataset$Network!='Moburst_Chartboost')
# binding back to single dataframe
newSet <- rbind(chartboost, notChartboost)
Ugly as a duckling but worked :)

Invalid factor level error when pushing values into data frame

I have the following code that is aimed at going to a website, fetching some data and putting it in a data.frame
create_links <- function(keyword, distance) {
data.frame <- data.frame(character(), character())
postcode <- c(3511, 4000, 5000)
var_website1 <- "http://www.marktplaats.nl/z.html?query="
var_website2 <- "&postcode="
var_website3 <- "&distance="
for (i in 1:length(postcode)) {
website <- paste0(var_website1, keyword, var_website2, postcode[i], var_website3, distance)
html <- read_html(website)
number <- html_nodes(html, "h1 span")
number <- as.character(number)
website <- as.character(website)
data.frame <- rbind(data.frame, number, website)
}
data.frame
}
When I call the function:
library(rvest)
create_links("bureaustoel", 10000)
Everything seems to work, but I have a small problem putting the values in the data.frame. I get the following warnings:
1: In `[<-.factor`(`*tmp*`, ri, value = "<span>20.474 resultaten voor 'bureaustoel'</span>") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "http://www.marktplaats.nl/z.html?query=bureaustoel&postcode=4000&distance=10000") :
invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, ri, value = "<span>20.474 resultaten voor 'bureaustoel'</span>") :
invalid factor level, NA generated
I don't understand why I get these warnings, though (since I'm converting the values to character before pushing them into the data.frame).
Any thoughts?
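One possible direction, offered only as a sketch: on R versions where stringsAsFactors defaults to TRUE, data.frame(character(), character()) creates factor columns with no levels, so rbind() has no valid level for the incoming strings. Collecting one small data frame per postcode and binding them at the end avoids that. The scraping step is replaced by a placeholder string here so the example runs offline; the names rows and links are illustrative.

```r
keyword  <- "bureaustoel"
distance <- 10000
postcode <- c(3511, 4000, 5000)

# One list slot per postcode; each slot will hold a one-row data frame.
rows <- vector("list", length(postcode))

for (i in seq_along(postcode)) {
  website <- paste0("http://www.marktplaats.nl/z.html?query=", keyword,
                    "&postcode=", postcode[i], "&distance=", distance)
  # Placeholder for as.character(html_nodes(read_html(website), "h1 span")).
  number <- paste("placeholder result", i)
  rows[[i]] <- data.frame(number = number, website = website,
                          stringsAsFactors = FALSE)
}

# Bind all rows once; the columns stay character throughout.
links <- do.call(rbind, rows)
```

Note also that rbind(data.frame, number, website) in the original adds number and website as two separate rows, not as two columns of one row.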
