Error replacing a column with other values in a data frame in R

I'm trying to replace the default values I set in a data frame with calculated ones, but I get an error that I don't understand, since as far as I know I have no factors.
Here is the code:
nb_agences_iris <- agences %>%
  group_by(CODE_IRIS) %>%
  summarise(nb_agences = n()) %>%
  arrange(CODE_IRIS)
int <- data.frame("CODE_IRIS" = as.character(intersect(typo$X0, nb_agences_iris$CODE_IRIS)))
typo$nb_agences <- as.character(rep(0, nrow(typo)))
typo[int$CODE_IRIS,]$nb_agences <- as.character(nb_agences_iris[int$CODE_IRIS,]$nb_agences)
And I get the following error:
Error in Summary.factor(1:734, na.rm = FALSE) :
‘max’ not meaningful for factors
In addition: Warning message:
In Ops.factor(i, 0L) : ‘>=’ not meaningful for factors
Thanks in advance for your help.
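A likely cause, assuming an R version older than 4.0: data.frame() converts character vectors to factors by default there, so int$CODE_IRIS ends up as a factor even though as.character() was applied first, and using it as a row index would explain the factor errors. Indexing rows with a character vector also looks up row names, not values of X0. Below is a minimal sketch on made-up toy data (only the column names come from the question) that sidesteps both issues with match():

```r
# Toy stand-ins for the data in the question; the values are invented.
typo <- data.frame(X0 = c("A1", "B2", "C3", "D4"), stringsAsFactors = FALSE)
nb_agences_iris <- data.frame(CODE_IRIS = c("B2", "D4", "Z9"),
                              nb_agences = c(3L, 5L, 1L),
                              stringsAsFactors = FALSE)

# Default of 0 for every row, kept as an integer rather than a string.
typo$nb_agences <- 0L

# For each typo$X0, match() gives the row position of the same code in
# nb_agences_iris, or NA when the code is absent there.
idx <- match(typo$X0, nb_agences_iris$CODE_IRIS)
found <- !is.na(idx)

# Overwrite the default only where a count exists.
typo$nb_agences[found] <- nb_agences_iris$nb_agences[idx[found]]

print(typo)
```

Passing stringsAsFactors = FALSE to data.frame() is harmless on newer R versions, where it is already the default.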

Related

Having trouble with making K Nearest Neighbors work in R Studio

I'm trying to use the knn function in R, but I keep getting this error message when I run it.
> knn(Taxi_train,Taxi_test,cl,k=100)
Error in knn(Taxi_train, Taxi_test, cl, k = 100) :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(Taxi_train, Taxi_test, cl, k = 100) : NAs introduced by coercion
2: In knn(Taxi_train, Taxi_test, cl, k = 100) : NAs introduced by coercion
I don't know exactly what is wrong with my code, so I need some help to get it working.
I tried making sure that all the variables are numeric, but that didn't change anything. It may also be an issue with my cl factor in the knn call.
Here is my current code:
date<-chicago_taxi$date
class(date)
Date <- as.Date(date)
class(Date)
Julian <- yday(Date)
class(Julian)
head(Julian)
chicago_taxi <- cbind(chicago_taxi,Julian)
chicago_taxi$seconds <- as.numeric(chicago_taxi$seconds)
set.seed(7777)
train_set <- sample(1:13081,10400,replace = FALSE)
Taxi_train <- chicago_taxi[train_set,]
Taxi_test <- chicago_taxi[-train_set,]
cl <- Taxi_train$payment_type
scale(chicago_taxi$miles)
scale(chicago_taxi$seconds)
scale(chicago_taxi$Julian)
knn(Taxi_train,Taxi_test,cl,k=100)
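
Two things in the code above may explain the coercion warnings: the results of scale() are never stored, and knn() receives the whole data frame, including any non-numeric columns such as payment_type. Here is a minimal sketch on invented data (the column names follow the question, the values and the choice of k do not) that passes only scaled numeric predictors to knn() from the class package:

```r
library(class)  # provides knn()

set.seed(7777)

# Made-up stand-in for chicago_taxi; only the column names follow the question.
n <- 200
chicago_taxi <- data.frame(
  miles = runif(n, 0, 20),
  seconds = runif(n, 60, 3600),
  Julian = sample(1:365, n, replace = TRUE),
  payment_type = sample(c("Cash", "Credit Card"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Keep only the numeric predictors and store the scaled result.
predictors <- scale(chicago_taxi[, c("miles", "seconds", "Julian")])

# Split row indices into training and test sets.
train_set <- sample(seq_len(n), 150)

# Class labels for the training rows only.
cl <- factor(chicago_taxi$payment_type[train_set])

pred <- knn(predictors[train_set, ], predictors[-train_set, ], cl, k = 5)
table(pred)
```

Running anyNA(predictors) before the call is a cheap way to check whether missing values are still present.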

Error: must rename columns with a valid subscript vector

I'm just trying to import a Kaggle data set to study R with, and it's turning into a nightmare.
I'm trying to rename the columns in my data frame, but I keep getting errors.
library(tidyverse)
library(dplyr)
library(ggplot2)
library(tibble)
library(janitor)
food_advs<- read.csv("CAERS_ASCII_2004_2017Q2.csv")
food_df <- data.frame(food_advs)
food_df %>% rename(food_df, Product = PRI_Reported.Brand.Product.Name, Industry = PRI_FDA.Industry.Name, Person_age = CI_Age.at.Adverse.Event, Gender = CI_Gender, Outcomes = AEC_One.Row.Outcomes, Symptoms = SYM_One.Row.Coded.Symptoms)
> food_df %>% rename(food_df, "Product" = "PRI_Reported.Brand.Product.Name", "Industry" = "PRI_FDA.Industry.Name", "Person_age" = "CI_Age.at.Adverse.Event", "Gender" = "CI_Gender", "Outcomes" = "AEC_One.Row.Outcomes", "Symptoms" = "SYM_One.Row.Coded.Symptoms")
Error: Must rename columns with a valid subscript vector.
x Subscript has the wrong type `data.frame<
RA_Report.. : integer
RA_CAERS.Created.Date : character
AEC_Event.Start.Date : character
PRI_Product.Role : character
PRI_Reported.Brand.Product.Name: character
PRI_FDA.Industry.Code : integer
PRI_FDA.Industry.Name : character
CI_Age.at.Adverse.Event : integer
CI_Age.Unit : character
CI_Gender : character
AEC_One.Row.Outcomes : character
SYM_One.Row.Coded.Symptoms : character
>`.
i It must be numeric or character.
Run `rlang::last_error()` to see where the error occurred.
Try the following:
food_df %>%
  rename(Product = PRI_Reported.Brand.Product.Name,
         Industry = PRI_FDA.Industry.Name,
         Person_age = CI_Age.at.Adverse.Event,
         Gender = CI_Gender,
         Outcomes = AEC_One.Row.Outcomes,
         Symptoms = SYM_One.Row.Coded.Symptoms
  )
Your mistake is in how you use %>%: it is redundant to write rename(data, ...) when data %>% already precedes the call. The pipe passes food_df as the first argument, so the second food_df is interpreted as a column selection, which is why the error complains about a data.frame subscript.
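
To make the point concrete, here is a small sketch on an invented two-column data frame (the column names are borrowed from the question, the values are not) showing that the piped call and the direct call are the same thing:

```r
library(dplyr)

# Toy data frame; the values are made up for illustration.
toy <- data.frame(CI_Gender = c("F", "M"),
                  CI_Age.at.Adverse.Event = c(34L, 51L),
                  stringsAsFactors = FALSE)

# Piped form: toy is supplied as the first argument by %>%.
piped <- toy %>%
  rename(Gender = CI_Gender, Person_age = CI_Age.at.Adverse.Event)

# Direct form: toy is written explicitly as the first argument.
direct <- rename(toy, Gender = CI_Gender, Person_age = CI_Age.at.Adverse.Event)

names(piped)
identical(piped, direct)
```

Note that rename() returns a new data frame; assign the result if the renamed columns should persist.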

PCA with result non-interactively in R

I would like to run a PCA in R with the ade4 package.
I have the data "PAYSAGE":
All the variables are numeric, PAYSAGE is a data frame, and there are no NAs or blanks.
But when I run:
require(ade4)
ACP<-dudi.pca(PAYSAGE)
2
I get this error message:
You can reproduce this result non-interactively with:
dudi.pca(df = PAYSAGE, scannf = FALSE, nf = NA)
Error in if (nf <= 0) nf <- 2 : missing value where TRUE/FALSE needed
In addition: Warning message:
In as.dudi(df, col.w, row.w, scannf = scannf, nf = nf, call = match.call(), :
NAs introduced by coercion
I don't understand what this means. Do you have any idea?
Thank you so much
I'd suggest sharing a data set or example that others can access, if possible. This seems data-specific, and with "NAs introduced by coercion" you may want to check the types of your input columns, for example with str(PAYSAGE), since typeof(PAYSAGE) only reports "list" for a data frame. The manual for dudi.pca states that it takes a data frame of numeric values as input.
Yes, for example:
ag_div <- c(75362,68795,78384,79087,79120,73155,58558,58444,68795,76223,50696,0,17161,0,0)
canne <- c(rep(0,10),5214,6030,0,0,0)
prairie_el<- c(60, rep(0,13),76985)
sol_nu <- c(18820,25948,13150,9903,12097,21032,35032,35504,25948,20438,12153,33096,15748,33260,44786)
urb_peu_d <- c(448,459,5575,5902,5562,458,6271,6136,459,1850,40,13871,40,13920,28669)
urb_den <- c(rep(0,12),14579,0,0)
veg_arbo <- c(2366,3327,3110,3006,3049,2632,7546,7620,3327,37100,3710,0,181,0,181)
veg_arbu <- c(18704,18526,15768,15527,15675,18886,12971,12790,18526,15975,22216,24257,30962,24001,14523)
eau <- c(rep(0,10),34747,31621,36966,32165,28054)
PAYSAGE<-data.frame(ag_div,canne,prairie_el,sol_nu,urb_peu_d,urb_den,veg_arbo,veg_arbu,eau)
require(ade4)
ACP<-dudi.pca(PAYSAGE)
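
The message itself hints at what happened: the call ended up with nf = NA, which suggests the answer typed at the "number of axes" prompt was not read as a number. One way around the prompt, sketched here on the built-in USArrests data set rather than PAYSAGE (an assumption made only to keep the example self-contained), is to set scannf = FALSE and give nf explicitly:

```r
library(ade4)

# scannf = FALSE skips the interactive scree plot prompt;
# nf = 2 keeps two axes, the value the question tried to type in.
pca <- dudi.pca(USArrests, scannf = FALSE, nf = 2)

pca$eig       # eigenvalues
head(pca$li)  # row coordinates on the kept axes
```

The same two arguments should apply unchanged to dudi.pca(PAYSAGE, ...).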

Principal Component Analysis error

I keep getting this error when I try to run a Principal Component Analysis:
Final_Dataset <- Final_Dataset[, colSums(is.na(Final_Dataset)) != nrow(Final_Dataset)]
Final_Dataset <- Final_Dataset[,-grep("Date|factor|character|logical", sapply(Final_Dataset, class))]
table(sapply(Final_Dataset, class))
nzv <- nearZeroVar(Final_Dataset, saveMetrics = TRUE)
print(paste('range:', range(nzv$percentUnique)))
dim(nzv[nzv$percentUnique > 0.1,])
gisette_nzv <- Final_Dataset[c(rownames(nzv[nzv$percentUnique > 0.1,]))]
pmatrix <- scale(gisette_nzv)
princ <- prcomp(pmatrix)
Error in svd(x, nu = 0) : infinite or missing values in 'x'
Is there any way of telling the function to omit NAs? The problem is that the dataset is huge, so if I remove NAs there will be no rows left: out of the ~1000 there are always rows with missing values.
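
prcomp() on a matrix has no option that skips individual missing cells, and scale() turns constant columns into NaN, so either can produce the svd error. One possible workaround, which is a swapped-in technique and not something the question used, is to drop zero-variance columns and fill each NA with its column mean before the PCA. A rough sketch on invented data:

```r
set.seed(1)

# Made-up data: 20 rows, 10 numeric columns, 15 scattered NAs,
# plus one constant column that scale() would turn into NaN.
m <- matrix(rnorm(200), nrow = 20)
m[sample(length(m), 15)] <- NA
X <- as.data.frame(m)
X$constant <- 1

# Keep only columns whose standard deviation is defined and positive.
keep <- vapply(X, function(col) isTRUE(sd(col, na.rm = TRUE) > 0), logical(1))
X <- X[, keep, drop = FALSE]

# Crude imputation: replace each NA with the mean of its column.
X_imp <- as.data.frame(lapply(X, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}))

# prcomp() can centre and scale itself.
princ <- prcomp(X_imp, center = TRUE, scale. = TRUE)
summary(princ)
```

Mean imputation shrinks the variance of the affected columns, so the resulting components should be treated as exploratory.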

Error in as(x, class(k)) : no method or default for coercing “NULL” to “data.frame”

I am currently facing the error shown below, which is related to NULL values being coerced to a data frame. The data set does contain nulls; however, I have tried both the is.na() and is.null() functions to replace the null values with something else. The data is stored on HDFS in pig.hive format. I have also included the code below. The code works fine if I remove v[,25] from the key.
Code:
AM = c("AN");
UK = c("PP");
sample.map <- function(k,v){
  key <- data.frame(acc = v[!which(is.na(v[,1])),1],
                    year = substr(v[!which(is.na(v[,1])),2],1,4),
                    month = substr(v[!which(is.na(v[,1])),2],5,6))
  value <- data.frame(v[,3],count=1)
  keyval(key,value)
}
sample.reduce <- function(key,v){
  AT <- sum(v[which(v[,1] %in% AM=="TRUE"),2])
  UnknownT <- sum(v[which(v[,1] %in% UK=="TRUE"),2])
  Total <- AT + UnknownT
  d <- data.frame(AT,UnknownT,Total)
  keyval(key,d)
}
out <- mapreduce(input ="/user/hduser/input",
                 output = "/user/hduser/output",
                 input.format = make.input.format("pig.hive", sep = "\u0001"),
                 output.format = make.output.format("csv", sep = ","),
                 map = sample.map,
                 reduce = sample.reduce)
Error:
Warning in asMethod(object) : NAs introduced by coercion
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) : data length is not a multiple of split variable
Warning in rmr.split(x, x, FALSE, keep.rownames = FALSE) : number of items to replace is not a multiple of replacement length
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) : data length is not a multiple of split variable
Warning in rmr.split(v, ind, lossy = lossy, keep.rownames = TRUE) : number of items to replace is not a multiple of replacement length
Error in as(x, class(k)) : no method or default for coercing “NULL” to “data.frame”
Calls: <Anonymous> ... apply.reduce -> c.keyval -> reduce.keyval -> lapply -> FUN -> as
No traceback available
UPDATE
I have added the sample data and edited the code above. Hope this helps!
Sample Data:
NULL,"2014-03-14","PP"
345689202,"2014-03-14","AN"
234539390,"2014-03-14","PP"
123125444,"2014-03-14","AN"
NULL,"2014-03-14","AN"
901828393,"2014-03-14","AN"
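For inspecting the sample locally, outside Hadoop, the literal NULL entries can be read as NA and filtered with a logical mask. This is only a sketch: it uses read.csv() on the six sample rows instead of the pig.hive input format, and it takes the month from characters 6 to 7 because the sample dates are written as YYYY-MM-DD. It also shows that negating the positions returned by which() selects no rows at all, whereas !is.na() keeps the non-missing ones.

```r
sample_text <- 'NULL,"2014-03-14","PP"
345689202,"2014-03-14","AN"
234539390,"2014-03-14","PP"
123125444,"2014-03-14","AN"
NULL,"2014-03-14","AN"
901828393,"2014-03-14","AN"'

# Treat the literal string NULL as a missing value while reading.
v <- read.csv(text = sample_text, header = FALSE, na.strings = "NULL",
              stringsAsFactors = FALSE)

# which() returns positions (here 1 and 5); applying ! to them gives
# FALSE for every position, so this index selects nothing.
n_selected_with_which <- length(v[!which(is.na(v[, 1])), 1])

# A logical mask keeps exactly the rows with a non-missing first column.
keep <- !is.na(v[, 1])
key <- data.frame(acc = v[keep, 1],
                  year = substr(v[keep, 2], 1, 4),
                  month = substr(v[keep, 2], 6, 7))
value <- data.frame(type = v[keep, 3], count = 1)

print(n_selected_with_which)
print(key)
```

Filtering value with the same mask keeps the key and value data frames at the same number of rows.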
There are some issues with as() that have been identified recently. I don't see why as() can't handle this case by default, but you can define a method for coerce, the S4 generic that performs the conversion, so that it calls as.data.frame:
setMethod("coerce",c("NULL","data.frame"), function(from, to, strict=TRUE) as.data.frame(from))
[1] "coerce"
as(NULL,"data.frame")
data frame with 0 columns and 0 rows
