Invalid factor level error when pushing values into data frame - r

I have the following code, which is meant to go to a website, fetch some data and put it in a data.frame:
create_links <- function(keyword, distance) {
  data.frame <- data.frame(character(), character())
  postcode <- c(3511, 4000, 5000)
  var_website1 <- "http://www.marktplaats.nl/z.html?query="
  var_website2 <- "&postcode="
  var_website3 <- "&distance="
  for (i in 1:length(postcode)) {
    website <- paste0(var_website1, keyword, var_website2, postcode[i], var_website3, distance)
    html <- read_html(website)
    number <- html_nodes(html, "h1 span")
    number <- as.character(number)
    website <- as.character(website)
    data.frame <- rbind(data.frame, number, website)
  }
  data.frame
}
When I call the function:
library(rvest)
create_links("bureaustoel", 10000)
Everything seems to work, but I have a small problem putting the values into the data.frame. I get the following warnings:
1: In `[<-.factor`(`*tmp*`, ri, value = "<span>20.474 resultaten voor 'bureaustoel'</span>") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "http://www.marktplaats.nl/z.html?query=bureaustoel&postcode=4000&distance=10000") :
invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, ri, value = "<span>20.474 resultaten voor 'bureaustoel'</span>") :
invalid factor level, NA generated
I don't understand why I get these warnings, though, because I'm converting the values to character before pushing them into the data.frame.
Any thoughts?
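One possible reading of these warnings, assuming an R version before 4.0 (where data.frame() defaults to stringsAsFactors = TRUE): the empty data.frame(character(), character()) holds two factor columns with no levels, and rbind(data.frame, number, website) appends number and website as two separate rows, so every string lands in a factor that has no matching level. A minimal sketch of one way around this is to build one row per postcode with character columns and bind them once at the end. The fetch argument below is a hypothetical stand-in for the read_html()/html_nodes() calls so that the example runs offline; it is not part of rvest.

```r
# Sketch only: `fetch` is a hypothetical placeholder for the scraping step.
create_links <- function(keyword, distance,
                         postcode = c(3511, 4000, 5000),
                         fetch = function(url) NA_character_) {
  base <- "http://www.marktplaats.nl/z.html?query="
  # Build one single-row data frame per postcode, keeping strings as character
  rows <- lapply(postcode, function(pc) {
    website <- paste0(base, keyword, "&postcode=", pc, "&distance=", distance)
    data.frame(number = fetch(website), website = website,
               stringsAsFactors = FALSE)
  })
  # Bind all rows in one call instead of growing the data frame in the loop
  do.call(rbind, rows)
}

res <- create_links("bureaustoel", 10000)
str(res)
```

In real use, fetch could wrap the original calls, for example function(url) as.character(html_nodes(read_html(url), "h1 span")), provided the selector matches exactly one node per page.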

Related

Error in FUN(left, right) : non-numeric argument to binary operator 2

I'm trying to solve a problem in which the equation goes like this:
Values of Band 2 = (Band 3 - Band 1) / (Column name of Band 3 - Column name of Band 1)
I have lots of values in each column, so I decided to loop over them.
Here is my code:
data4 <- read.csv(file.choose())
XBC <- data4[data4$Crops == "XBC", ]
NB <- data4[data4$Crops == "NB", ]
name <- colnames(NB)[5:125] # store column names in the variable name
name <- gsub("[a-zA-Z ]", "", name) # delete letters from column names so they look numeric
cols <- 5:125
colsname <- 1:121
NB[cols] <- lapply(NB[cols], as.numeric) # set values of columns in NB as numeric
name[colsname] <- lapply(name[colsname], as.numeric) # set column names to numeric
NB[cols+1] <- ((NB[cols+1] - NB[cols-1]))/((name[colsname+1] - name[colsname-1])) # equation
This is the error I got:
Error in FUN(left, right) : non-numeric argument to binary operator
This is an example of what the columns and rows in NB look like:
X413.278 X417.897 X422.515
28.86137122 25.83735038 23.18536764
15.21502939 13.81200807 12.47974824
16.0551981 14.54152526 13.02826111
22.16092833 20.66666667 18.69994899
24.35706355 21.73813623 19.65632493
15.74024166 14.17246326 12.71688841
16.64029416 15.14249927 13.55668394
21.13782229 19.40196624 17.63372817
If I substitute one row of values into the equation, it should look like this:
Values of X417.897 = (23.18536764 - 28.86137122) / (422.515 - 413.278)
and I am going to do this for the rest of the rows and columns.
I am using base R.
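name[colsname+1] and name[colsname-1] are not plain numeric vectors (assigning the result of lapply() into name turns it into a list rather than converting it), which is why you're getting a "non-numeric argument" error. Try: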
NB[cols+1] <- ((NB[cols+1] - NB[cols-1]))/((as.numeric(name[colsname+1]) - as.numeric(name[colsname-1])))
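As a minimal sketch of the same calculation on a small, self-contained example: the data frame below reuses the first two rows of the sample shown in the question, the wavelengths are parsed from the column names in one step, and the result is stored in a separate matrix (deriv, a name chosen here for illustration) instead of overwriting NB. Only interior columns are computed, since the first and last columns have no neighbour on one side.

```r
# Two rows of the sample data from the question
NB <- data.frame(X413.278 = c(28.86137122, 15.21502939),
                 X417.897 = c(25.83735038, 13.81200807),
                 X422.515 = c(23.18536764, 12.47974824))

# Strip the leading "X" and convert the column names to numbers
wl <- as.numeric(sub("^X", "", colnames(NB)))

vals <- as.matrix(NB)
mid <- 2:(ncol(NB) - 1)              # interior columns only

# (next band - previous band) / (next wavelength - previous wavelength)
num <- vals[, mid + 1, drop = FALSE] - vals[, mid - 1, drop = FALSE]
den <- wl[mid + 1] - wl[mid - 1]
deriv <- sweep(num, 2, den, "/")     # divide each column by its own denominator
colnames(deriv) <- colnames(NB)[mid]
deriv
```

Keeping the result separate avoids later columns being computed from values that were already overwritten earlier in the same assignment.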

Error replacing a column with other values data frames R

I'm trying to replace the default values I set in a data frame with calculated ones, but I get an error that I don't understand, since as far as I know I have no factors.
Here is the code:
nb_agences_iris <- agences %>%
  group_by(CODE_IRIS) %>%
  summarise(nb_agences = n()) %>%
  arrange(CODE_IRIS)
int <- data.frame("CODE_IRIS" = as.character(intersect(typo$X0, nb_agences_iris$CODE_IRIS)))
typo$nb_agences <- as.character(rep(0, nrow(typo)))
typo[int$CODE_IRIS,]$nb_agences <- as.character(nb_agences_iris[int$CODE_IRIS,]$nb_agences)
And I get the following error:
Error in Summary.factor(1:734, na.rm = FALSE) :
‘max’ not meaningful for factors
In addition: Warning message:
In Ops.factor(i, 0L) : ‘>=’ not meaningful for factors
Thanks in advance for your help.

PCA with result non-interactively in R

I would like to run a PCA in R with the ade4 package.
I have the data "PAYSAGE":
All the variables are numeric, PAYSAGE is a data frame, and there are no NAs or blanks.
But when I do:
require(ade4)
ACP<-dudi.pca(PAYSAGE)
2
I get this error message:
You can reproduce this result non-interactively with:
dudi.pca(df = PAYSAGE, scannf = FALSE, nf = NA)
Error in if (nf <= 0) nf <- 2 : missing value where TRUE/FALSE needed
In addition: Warning message:
In as.dudi(df, col.w, row.w, scannf = scannf, nf = nf, call = match.call(), :
NAs introduced by coercion
I don't understand what that means. Do you have any idea?
Thank you so much
I'd suggest sharing a data set or example that others can access, if possible. This seems data-specific, and with "NAs introduced by coercion" you may want to check the types of your input columns, for example with str(PAYSAGE) (typeof(PAYSAGE) only reports "list" for a data frame). The manual for dudi.pca states that it takes a data frame of numeric values as input.
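A minimal sketch of such a check in base R, using a small made-up data frame (the text column canne below is hypothetical and only there to show what a non-numeric column looks like; it is not the asker's data):

```r
# Hypothetical example: one numeric column, one column accidentally read as text
PAYSAGE <- data.frame(ag_div = c(75362, 68795, 78384),
                      canne  = c("0", "5214", "n/a"),
                      stringsAsFactors = FALSE)

typeof(PAYSAGE)                       # a data frame is stored as a list

# Check each column individually
col_ok <- vapply(PAYSAGE, is.numeric, logical(1))
names(col_ok)[!col_ok]                # names of the non-numeric columns
```

If every column comes back numeric, the data frame itself is probably not the source of the coercion warning.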
Yes, for example:
ag_div <- c(75362,68795,78384,79087,79120,73155,58558,58444,68795,76223,50696,0,17161,0,0)
canne <- c(rep(0,10),5214,6030,0,0,0)
prairie_el<- c(60, rep(0,13),76985)
sol_nu <- c(18820,25948,13150,9903,12097,21032,35032,35504,25948,20438,12153,33096,15748,33260,44786)
urb_peu_d <- c(448,459,5575,5902,5562,458,6271,6136,459,1850,40,13871,40,13920,28669)
urb_den <- c(rep(0,12),14579,0,0)
veg_arbo <- c(2366,3327,3110,3006,3049,2632,7546,7620,3327,37100,3710,0,181,0,181)
veg_arbu <- c(18704,18526,15768,15527,15675,18886,12971,12790,18526,15975,22216,24257,30962,24001,14523)
eau <- c(rep(0,10),34747,31621,36966,32165,28054)
PAYSAGE<-data.frame(ag_div,canne,prairie_el,sol_nu,urb_peu_d,urb_den,veg_arbo,veg_arbu,eau)
require(ade4)
ACP<-dudi.pca(PAYSAGE)

Warning message: invalid factor level, NA generated

I get this warning when I try to assign a new character value to some of the values in one of my columns.
This works fine:
merge_output$extra_dod[merge_output$extra_dod == 'Refugees camps in forestreserve.'] <-'Refugees'
but this doesn't:
merge_output$extra_dod[merge_output$extra_dod=='Air Strip'] <-'strip'
And it returns this error message:
Warning message:
In `[<-.factor`(`*tmp*`, merge_output$extra_dod == "Lime", value = c(5L, :
invalid factor level, NA generated
I'm not sure why I can replace some of the values but not others.
Here's a much-simplified example that fails in the same way:
f <- factor(c("a","b","c","d"))
f[f=="d"] <- "e"
Warning message:
In `[<-.factor`(`*tmp*`, f == "d", value = "e") :
invalid factor level, NA generated
If you happen to try replacing with a factor level that already exists, it works:
f[f=="c"] <- "b"
A few more general options:
Convert the variable back into a character vector before trying to replace values (or use something like stringsAsFactors=FALSE in read.csv/read.table).
Use car::recode.
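The options above can be sketched in base R roughly as follows, reusing the small factor f from the example. The variable names f1, f2 and f3 are only for illustration, and the third variant (renaming the level directly) is an addition that is not mentioned in the answer.

```r
f <- factor(c("a", "b", "c", "d"))

# Variant 1: add the new level first, then assign, then drop the unused level
f1 <- f
levels(f1) <- c(levels(f1), "e")
f1[f1 == "d"] <- "e"
f1 <- droplevels(f1)

# Variant 2: go through a character vector and rebuild the factor
f2 <- as.character(f)
f2[f2 == "d"] <- "e"
f2 <- factor(f2)

# Variant 3: rename the level itself
f3 <- f
levels(f3)[levels(f3) == "d"] <- "e"
```

All three avoid the warning because the value "e" is a valid level (or the vector is not a factor) at the moment of assignment.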

replacing a value in column X based on columns Y with R

I've gone through several answers and tried the following, but each attempt yields either an error or an unwanted result.
Here's the data:
Network Campaign
Moburst_Chartboost Test Campaign
Moburst_Chartboost Test Campaign
Moburst_Appnext unknown
Moburst_Appnext 1065
I'd like to replace "Test Campaign" with "1055" whenever "Network" == "Moburst_Chartboost". I realize this should be very simple, but I tried this:
dataset = read.csv('C:/Users/User/Downloads/example.csv')
for (i in 1:nrow(dataset)) {
  if(dataset$Network == 'Moburst_Chartboost') dataset$Campaign <- '1055'
}
This yields the following warnings:
1: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
2: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
etc.
Then I tried:
within(dataset, {
  dataset$Campaign <- ifelse(dataset$Network == 'Moburst_Chartboost', '1055', dataset$Campaign)
})
This turned all 4 values in the column "Campaign" into "1055", overwriting what was there even when the condition isn't met.
I also tried this:
dataset$Campaign[which(dataset$Network == 'Moburst_Chartboost')] <- 1055
which yields this warning and replaces the values in the first two rows of "Campaign" with NA:
Warning message:
In `[<-.factor`(`*tmp*`, which(dataset$Network == "Moburst_Chartboost"), :
invalid factor level, NA generated
I'm scratching my head here. I'm new to R, but this shouldn't be so hard :(
In your first attempt, the loop never uses i, so every iteration tests the whole Network column at once; if() then only looks at the first element and overwrites the entire Campaign column.
In your second, you're trying to assign the value "1055" to all of the 2nd column.
The way to think about it is as an if else, where if the condition in col 1 is met, col 2 is changed, otherwise it remains the same.
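# stringsAsFactors = FALSE keeps Campaign as character, so ifelse() returns
# the original strings rather than factor codes on R versions before 4.0
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
                                  "Moburst_Appnext", "Moburst_Appnext"),
                      Campaign = c("Test Campaign", "Test Campaign",
                                   "unknown", "1065"),
                      stringsAsFactors = FALSE)
dataset$Campaign <- ifelse(dataset$Network == "Moburst_Chartboost",
                           "1055",
                           dataset$Campaign)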
head(dataset)
Network Campaign
1 Moburst_Chartboost 1055
2 Moburst_Chartboost 1055
3 Moburst_Appnext unknown
4 Moburst_Appnext 1065
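You may also try dataset$Campaign[dataset$Campaign == "Test Campaign"] <- 1055 to avoid loops and ifelse statements. Note that this replaces every "Test Campaign" regardless of Network, and it needs Campaign to be a character column rather than a factor.
Here dataset is:
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
                                  "Moburst_Appnext", "Moburst_Appnext"),
                      Campaign = c("Test Campaign", "Test Campaign",
                                   "unknown", 1065),
                      stringsAsFactors = FALSE)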
Try the following:
dataset = read.csv('C:/Users/User/Downloads/example.csv', stringsAsFactors = FALSE)
for (i in 1:nrow(dataset)) {
  if(dataset$Network[i] == 'Moburst_Chartboost') dataset$Campaign[i] <- '1055'
}
It seems you forgot the index variable. Without [i] you work on the whole column of the data frame, which results in the warning you mentioned.
Note that I added stringsAsFactors = FALSE to the read.csv() call to make sure the strings are read as character vectors and not as factors. With factors, this would produce a warning like this:
In `[<-.factor`(`*tmp*`, i, value = c(NA, 2L, 3L, 1L)) :
invalid factor level, NA generated
Alternatively you can do the following without using a for loop:
idx <- which(dataset$Network == 'Moburst_Chartboost')
dataset$Campaign[idx] <- '1055'
Here, idx is a vector containing the positions where Network has the value 'Moburst_Chartboost'
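A closely related sketch, in case only rows that currently hold "Test Campaign" should change: combine both conditions into one logical vector and index with it directly. The data frame is rebuilt inline from the sample in the question so the example is self-contained, and the name hit is only for illustration.

```r
dataset <- data.frame(
  Network  = c("Moburst_Chartboost", "Moburst_Chartboost",
               "Moburst_Appnext", "Moburst_Appnext"),
  Campaign = c("Test Campaign", "Test Campaign", "unknown", "1065"),
  stringsAsFactors = FALSE   # keep Campaign as character, not factor
)

# TRUE only where both the network and the current campaign value match
hit <- dataset$Network == "Moburst_Chartboost" &
  dataset$Campaign == "Test Campaign"

dataset$Campaign[hit] <- "1055"
dataset
```

Logical indexing makes which() unnecessary here, although which() has the advantage of silently dropping NA comparisons.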
Thank you for the help! It's not elegant, but since this stayed with me when I went to sleep last night, I decided to bludgeon it with some ugly code, and it worked too. Just as a workaround: I split the data into two data frames, replaced all the values in one, and then bound them back together.
# subsetting only chartboost
chartboost <- subset(dataset, dataset$Network=='Moburst_Chartboost')
# replace all values in Campaign
chartboost$Campaign <-sub("^.*", "1055",chartboost$Campaign)
#subsetting only "not chartboost"
notChartboost <-subset(dataset, dataset$Network!='Moburst_Chartboost')
# binding back to single dataframe
newSet <- rbind(chartboost, notChartboost)
Ugly as a duckling but worked :)
