I have written this code that reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases. I need help with the error this code produces, which is:
Error in `[.data.frame`(data, i) : undefined columns selected
In addition: Warning messages:
1: In comp[i] <- !is.na(data[i]) :
number of items to replace is not a multiple of replacement length
2: In comp[i] <- !is.na(data[i]) :
number of items to replace is not a multiple of replacement length
3: In comp[i] <- !is.na(data[i]) :
number of items to replace is not a multiple of replacement length
The code is the following:
complete<-function(directory, id=1:332){
files.list<-list.files(directory, full.names=TRUE, pattern=".csv")
comp<-character()
return.data<-data.frame()
nobs<-numeric()
for(i in id){
data<-read.csv(files.list[i])
comp[i]<-!is.na(data[i])
nobs[i]<-nrow(comp[i])
}
return.data<-c(id,nobs)
}
Your problem is that !is.na() returns a logical vector, not a single value; you cannot insert multiple elements into the single element comp[i].
R has a function, complete.cases, which does exactly what you attempted. With it, your function would look like this:
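complete <- function(directory, id = 1:332) {
  # match only file names that end in ".csv"
  files.list <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  nobs <- numeric(length(id))
  # index nobs by position so that id need not start at 1
  for (k in seq_along(id)) {
    data <- read.csv(files.list[id[k]])
    # number of rows without any missing value
    nobs[k] <- sum(complete.cases(data))
  }
  data.frame(id, nobs)
}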
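To see what complete.cases returns, here is a minimal sketch on a toy data frame. The name toy and its values are made up for illustration and are not part of the original data.

```r
# Hypothetical toy data: rows 2 and 3 each contain one missing value
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c("x", "y", NA, "z"))

# One logical value per row: TRUE only where no column is NA
cc <- complete.cases(toy)

# Summing the logical vector counts the complete rows
n_complete <- sum(cc)
```

Here cc is TRUE, FALSE, FALSE, TRUE and n_complete is 2, which is the per-file count the function needs.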
That aside, your code has several flaws I want to point out:
Why is comp of type character?
Allocate the size of a vector if you know it beforehand (nobs <- numeric(length(id))).
Do you really want to check only column i of your i-th loaded data frame for missing values?
If you assign return.data <- c(id, nobs), return.data will be a single numeric vector with the ids at the beginning and the nobs values at the end.
You need to provide both a row and a column index to your data so that it selects all rows and column i, e.g.:
comp[i]<-!is.na(data[ ,i])
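The earlier point about c(id, nobs) versus data.frame(id, nobs) can be checked with a small sketch. The values of id and nobs here are invented for illustration.

```r
# Made-up ids and counts, for illustration only
id <- 1:3
nobs <- c(10, 20, 30)

# c() concatenates: one numeric vector of length 6
as_vector <- c(id, nobs)

# data.frame() keeps the two columns side by side: 3 rows, 2 columns
as_frame <- data.frame(id, nobs)
```

as_vector has lost the pairing between each id and its count, while as_frame keeps one row per file.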
First time poster, so please let me know how to improve my question if needed. Certainly eager to improve.
I have a data frame that has a column of strings that I need to replace characters in using a column containing a list of multiple positions and another column with a list of characters for those positions. Using example data:
#create the values to build the data frame for use case
food <-"pasta"
string <-"bacorogi"
pos <- c(1,4,7)
chars <- c("m","a","n")
#convert vectors to lists
poslist <-list(pos)
charlist <-list(chars)
#create data frame
df <-data.frame(cbind(food,
poslist,
charlist,
string))
I figured out how to do this when the string, positions, and characters exist as separate vectors using:
for(i in seq_along(pos)) substring(string, pos[i], pos[i]) <- chars[i]
string
[1] "macaroni"
When I try to apply this to the data frame I run into an error:
for(i in seq_along(df$pos)) substring(df$string, df$pos[i], df$pos[i]) <- df$chars[i]
Error in `substring<-`(`*tmp*`, df$pos[i], df$pos[i], value = df$chars[i]) :
(list) object cannot be coerced to type 'integer'
To try to properly apply this to a data frame, I tried below and got an error:
for(i in seq_len(nrow(df))) substring(df$string, df$poslist[i], df$poslist[i]) <- df$charlist[i]
Error in `substring<-`(`*tmp*`, df$poslist[i], df$poslist[i], value = df$charlist[i]) :
(list) object cannot be coerced to type 'integer'
I am not really sure how to get around this problem or how to adapt this to a data frame.
I do have more rows in my data frame, but I figured if someone could help me figure out how to do this to one row, I could take it from there. Thanks for any input you can provide!
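The nesting is one level deep, so extract that element and loop (the columns are written out in full as df$poslist rather than relying on partial matching of df$pos):
for(i in seq_along(df$poslist[[1]])) {
  substring(df$string[[1]], df$poslist[[1]][i],
            df$poslist[[1]][i]) <- df$charlist[[1]][i]
}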
-output
df$string
#$string
#[1] "macaroni"
If there are more rows, use a nested loop:
for(i in seq_along(df$poslist)) {
  for(j in seq_along(df$poslist[[i]])) {
    substring(df$string[[i]], df$poslist[[i]][j],
              df$poslist[[i]][j]) <- df$charlist[[i]][j]
  }
}
df$string
#$string
#[1] "macaroni"
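As an alternative to the nested loop, the per-row work can be wrapped in a small helper and applied with mapply. This is only a sketch: replace_chars and df2 are hypothetical names, and the second row ("xello") is invented to show more than one row.

```r
# Hypothetical helper: replace the characters of one string at the given positions
replace_chars <- function(s, pos, chars) {
  for (j in seq_along(pos)) {
    substring(s, pos[j], pos[j]) <- chars[j]
  }
  s
}

# Example data frame with plain character strings and two list columns
df2 <- data.frame(string = c("bacorogi", "xello"), stringsAsFactors = FALSE)
df2$poslist <- list(c(1, 4, 7), 1)
df2$charlist <- list(c("m", "a", "n"), "h")

# Apply the helper row by row; the result is a character vector
df2$string <- mapply(replace_chars, df2$string, df2$poslist, df2$charlist,
                     USE.NAMES = FALSE)
```

After this runs, df2$string holds "macaroni" and "hello". Keeping the strings in an ordinary character column, rather than a list column, also makes the result easier to work with afterwards.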
I am new to R and really need some help. I keep getting the error message
"Error in eff_weights[i, ] <- eff.port$pw : number of items to
replace is not a multiple of replacement length"
when I run the loop. Can someone help me figure out what I am doing wrong? Thank you so much in advance!
# Create for loop to find efficient frontier
for (i in 1 : length(grid)) {
eff.port <- portfolio.optim(returns, pm = grid[i], shorts =TRUE)
vector_pm[i] <- eff.port$pm
vector_psd[i] <- eff.port$ps
eff_weights[i, ] <- eff.port$pw
}
Without a sample of your data or dummy data to reproduce the problem, it is hard to give a definite solution. However, in your loop you assign a vector, eff.port$pw, to the i-th row of a data frame or matrix, eff_weights[i, ]. The error message says that the two have different lengths: use the length() or dim() functions to compare them. Your vector eff.port$pw and the row eff_weights[i, ] must be the same length.
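A minimal sketch of the length check, assuming eff_weights should be a matrix with one column per asset. The values of n_assets and grid and the equal-weight vector pw are stand-ins, not the output of portfolio.optim.

```r
# Assumed sizes, for illustration only
n_assets <- 3
grid <- seq(0.01, 0.03, length.out = 4)

# Preallocate one row per grid point and one column per asset
eff_weights <- matrix(NA_real_, nrow = length(grid), ncol = n_assets)

for (i in seq_along(grid)) {
  # Stand-in for eff.port$pw: a weight vector with one entry per asset
  pw <- rep(1 / n_assets, n_assets)
  # The assignment only works when these two lengths agree
  stopifnot(length(pw) == ncol(eff_weights))
  eff_weights[i, ] <- pw
}
```

If the stopifnot line fails in your own loop, eff_weights was created with the wrong number of columns for the number of assets in returns.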
I have multiple dataframes and would like to remove the first row in all of them.
I have tried using a for loop but cannot understand what I am doing wrong
for (i in cities){
i <- i[-1, ]
}
I get the following error code:
Error in i[-1, ] : incorrect number of dimensions
If we assume that the only objects in your workspace are data frames, then this might succeed:
cities <- objects()
for (i in cities) { assign(i, get(i)[-1, ]) }
Explanation:
Two things were wrong with the original code:
One was already mentioned in the comments: "df" is not the same as df. You need to use get to convert a character value to a "true" R name that is used to retrieve an object having that name. The result of objects() is only a character vector. In R the term "name" means a "language object". See the help page: ?mode. (There is potential confusion about row names and column names, which are always of class "character".) It's not like SAS, which is a macro language that has no such distinction.
The second error was trying to get substitution for the i on the left-hand side of <-. That would have failed even if you were working with actual R names. The assign function is designed to handle character values that are then converted to R names.
Say you get a list of all the tables in your environment, and you call that list cities. You can't just iterate over each value of cities and change things, because in the list they are just character strings.
Here is what you need:
for (i in cities){
tmp <- get(i) # load the actual table
tmp <- tmp[-1, ] # remove first row
assign(i, tmp) # re-assign table to original table name
}
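A common alternative to get and assign is to keep the data frames in a named list and use lapply. The sketch below assumes two made-up data frames; cities_list, paris and rome are hypothetical names.

```r
# Hypothetical data frames collected in a named list
cities_list <- list(
  paris = data.frame(x = 1:3, y = c("a", "b", "c")),
  rome  = data.frame(x = 4:6, y = c("d", "e", "f"))
)

# Drop the first row of every data frame; drop = FALSE keeps the result
# a data frame even when there is only one column
cities_list <- lapply(cities_list, function(d) d[-1, , drop = FALSE])
```

Each element of cities_list now has two rows, and the individual tables remain accessible as cities_list$paris and cities_list$rome.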
Let's say that I have these vectors:
time <- c(306,455,1010,210,883,1022,310,361,218,166)
status <- c(1,1,0,1,1,0,1,1,1,1)
gender <- c(1,1,1,1,1,1,2,2,1,1)
And I turn them into this data frame:
dataset <- data.frame(time, status, gender)
I want to list the factors in the third column using this function (p/s: pardon the immaturity. I'm still learning):
getFactor<-function(dataset){
result <- list()
result["Factors"] <- unique(dataset[[3]])
return(result)
}
And all I get is this:
getFactor(dataset)
$Factors
[1] 1
Warning message:
In result["Factors"] <- unique(dataset[[3]]) :
number of items to replace is not a multiple of replacement length
I tried using levels, but all I get is an empty list. My question is (1) why does this happen? and (2) is there any other way that I can get the list of the factor in a function?
Solution is simple, you just need double brackets around "Factors" :)
In the function
result[["Factors"]] <- unique(dataset[[3]])
That should be the line.
The double brackets return an element, single brackets return that selection as a list.
It sounds silly, but try this:
test <- list()
class(test["Factors"])
class(test[["Factors"]])
The first class will be 'list'. The second will be 'NULL'. This is because single brackets return a subset as a list, and double brackets return the element itself. Which one is useful depends on the scenario. The element in this case is NULL because nothing has been assigned to it.
The warning "number of items to replace is not a multiple of replacement length" appears because you've asked R to put 2 values (the result of unique()) into a single list element selected with single brackets, so only the first value is kept. With double brackets, the whole vector is stored as that one element, so it works!
Hope that makes sense!
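The difference between the two assignments can be seen side by side in a short sketch. The names res_single, res_double and msg are made up for illustration.

```r
# Single brackets: one list slot, two values, so R warns
res_single <- list()
msg <- tryCatch({
  res_single["Factors"] <- c(1, 2)
  NA_character_
}, warning = function(w) conditionMessage(w))

# Double brackets: the whole vector becomes the element named "Factors"
res_double <- list()
res_double[["Factors"]] <- c(1, 2)
```

msg captures the text of the warning raised by the single-bracket assignment, while res_double$Factors contains both values.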
Currently, when you create your data frame, dataset$gender is a double vector (which is what R produces automatically when everything in it is a number). If you want it to be a factor, you can declare it that way at the beginning:
dataset <- data.frame(time, status, gender = as.factor(gender))
Or coerce it to be a factor later:
dataset$gender <- as.factor(gender)
Then getting a vector of the levels is simple, without writing a function:
level_vector <- levels(dataset$gender)
level_vector
A note on the subsetting in your function: dataset[[3]] and dataset[, 3] both return the third column of dataset as a vector, and the first element of a list is extracted with list[[1]].
odds_evens <- function(df){
odds <- 0
for ( n in df){
if(n%%2==1){
odds<-odds+1
}
}
odds
}
I want to use this function in another function and on a different data set: naively I supposed that I would include source("reference to the file") and then use lapply(data, odds_evens) to count the number of odd numbers in every row of a data set.
I get the following error both after the console runs source(filename) and when the function is called within my second function:
Error in FUN(X[[1L]], ...) : unused argument (X[[1]])
Could someone kindly explain what is going on? I'm still new at this!