Undefined columns error when using lapply - r

Why does this work and not the lapply?
Using the built in base R ChickWeight data:
names(ChickWeight)<-tolower(names(ChickWeight))
This works if I just want the correlations for one column, "time":
library(reshape)
cor(cast(melt(ChickWeight[,c("time","diet","chick")],id.vars=c("chick","diet")),chick~diet))
This doesn't when I try to apply the same thing to both "time" and "weight", i.e. columns 1:2:
lapply(as.list(ChickWeight[,c(1:2)]), FUN=function(i){
cor(cast(melt(ChickWeight[,c(i,"diet","chick")], id.vars=c("chick","diet")),chick~diet))
})
So the fact that the function part works fine by itself makes me think there's something I don't understand about using lapply like this. I get this error:
Error in `[.data.frame`(ChickWeight, , c(i, "diet", "chick")) :
undefined columns selected

Ah I see what you are trying to do here now.
Replace:
lapply(as.list(ChickWeight[,c(1:2)]), .........
With
lapply(names(ChickWeight)[1:2], .............
You are passing the column values when what you want is the column name.
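To see the difference concretely, here is a minimal base R sketch. It leaves out melt/cast and only compares what lapply would hand to the function in each case; the names cw, by_value, by_name, msg and cols are made up for illustration.

```r
# Plain data.frame copy of the built-in data, with lower-case names as in the question
cw <- as.data.frame(ChickWeight)
names(cw) <- tolower(names(cw))

by_value <- as.list(cw[, 1:2])   # list of column VALUES (numeric vectors)
by_name  <- names(cw)[1:2]       # character vector of column NAMES

# Passing values: c(<numbers>, "diet", "chick") becomes strings such as "42",
# none of which is a column name, so `[.data.frame` fails.
msg <- tryCatch(
  cw[, c(by_value[[1]], "diet", "chick")],
  error = function(e) conditionMessage(e)
)

# Passing names: each i is a single string, so the subset is valid.
cols <- lapply(by_name, function(i) names(cw[, c(i, "diet", "chick")]))
```

Here msg holds the same "undefined columns selected" message as in the question, while cols lists the three columns selected on each pass.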

Related

Trying to find a better way to sort the data in R

In my data frame I am trying to sort the data in descending order. I am using the below line of code for sorting my data and it works as intended.
CNS25VOL <- CNS25VOL[order(-CNS25VOL$MATVOL22), ]
However, if I refer to the same column by its index number, the code throws an error:
CNS25VOL <- CNS25VOL[order(-CNS25VOL[, 2]), ]
The error thrown is:
Error in CNS25VOL[, 2] : incorrect number of dimensions
I do have a working solution, but if the name of my column suddenly changes, the code will stop working. I know that the column positions in the data frame will stay the same.
How can I handle this?
In order(-CNS25VOL[, 2]), order() expects a vector, which you try to construct with CNS25VOL[, 2]. An ordinary data.frame returns a vector holding just the 2nd column. A tibble, however, returns a tibble with only one column.
You can reproduce the behaviour of ordinary data.frames by passing the drop = TRUE argument to [], as in
CNS25VOL[, 2, drop = TRUE]
Always be aware of whether you are using a standard data.frame, a tibble, or a data.table: they look very similar but differ in the details. Also see https://tibble.tidyverse.org/reference/subsetting.html
dplyr functions tend to give you a tibble back even if you fed them a classical data.frame.
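Another way to select by position without depending on the class is [[ ]], which extracts a single column as a plain vector from both data.frames and tibbles. A minimal sketch, assuming a small made-up stand-in for CNS25VOL (only the column name MATVOL22 comes from the question; the ID column and the values are hypothetical):

```r
# Hypothetical stand-in for CNS25VOL; ID and the values are invented
CNS25VOL <- data.frame(ID = c("a", "b", "c"), MATVOL22 = c(5, 20, 10))

# [[2]] extracts the 2nd column by position as a plain vector
vol <- CNS25VOL[[2]]

# Sort in descending order of that column, whatever it is called
sorted <- CNS25VOL[order(-vol), ]
```

Because the column is addressed by position, renaming MATVOL22 would not break the sort.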

Create multiple columns in data.table with `:=` without colnames

I was wondering if it is possible to use the following data.table feature without providing column names:
dt <- data.table(mtcars)[,.(mpg, cyl)]
dt[,`:=`(avg=mean(mpg), med=median(mpg))]
Let's say, for example, that I have a function that returns more than one column, like this:
mfunc <- function(x){cbind(x^2,x^3)}
If I want to assign its result as new columns in that specific way, R executes mfunc twice, which is not efficient.
dt[,`:=`(sqr=mfunc(mpg)[,1], cub=mfunc(mpg)[,2])]
So, without 'work arounds', is it possible to do something similar to this:
dt[,`:=`(mfunc(mpg))] #this returns an error
dt[,`:=`(error2=mfunc(mpg))] #this returns an error

Error in using grep in SparkR

I am having an issue with subsetting my Spark DataFrame.
I have a DataFrame called nfe, which contains a column called ITEM_PRODUTO that is formatted as a string. I would like to subset this DataFrame based on whether the item column contains the word "AREIA". I can easily subset the data based on an exact phrase:
nfe.subset1 <- subset(nfe, nfe$ITEM_PRODUTO == "AREIA LAVADA FINA")
nfe.subset2 <- subset(nfe, nfe$ITEM_PRODUTO %in% "AREIA")
However, what I would like is a subset of all rows that contain the word "AREIA" in the ITEM_PRODUTO column. When I try to use grep, though, I receive an error message:
nfe.subset3 <- subset(nfe, grep("AREIA", nfe$ITEM_PRODUTO))
# Error in as.character.default(x) :
# no method for coercing this S4 class to a vector
I've tried multiple iterations of syntax, and tried grepl as well, but nothing seems to work. It's probably a syntax error, but could anyone help me out?
Thanks!
Standard R functions such as grep cannot be applied to a SparkDataFrame column. Use either like:
where(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))
or rlike:
where(nfe, rlike(nfe$ITEM_PRODUTO, ".*AREIA.*"))

gsub apply combination in R

I am trying to use gsub on every column of a data frame to remove some characters. I have tried using apply to do this, without success:
data<-apply(data,2, function(x) gsub("£","",data[x]))
returns error
Error in `[.data.frame`(data, x) : undefined columns selected
I know it works if I do
for(i in 1: length(data)){data[,i]<-gsub("£","",data[,i]) }
But why doesn't the apply call work?
Here's the next best reproducible example. Though there might be a better / faster (vectorized) way if I thought a little harder. But since you asked for apply:
# mtcars was just the first dataset that came to mind; here every "." becomes ","
# as.character() is only needed because mtcars is numeric;
# it should not be necessary for your data
ds <- mtcars  # start from a copy so that ds exists before ds[] is assigned
ds[] <- sapply(mtcars,function(x) gsub("\\.",",",as.character(x)))
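As for why the original call fails: apply(data, 2, f) passes each column's values to the function as x, so data[x] tries to use those values as column names. Applying gsub to x itself avoids that. A minimal sketch with lapply, using a made-up data frame named prices in place of the asker's data:

```r
# Hypothetical data; the asker's real data frame is not shown in the question
prices <- data.frame(a = c("£10", "£25"), b = c("£3", "£40"),
                     stringsAsFactors = FALSE)

# x is already the column's contents, so gsub works on x directly.
# Assigning into prices[] keeps the data.frame structure.
prices[] <- lapply(prices, function(x) gsub("£", "", x, fixed = TRUE))
```

lapply is used here rather than apply because apply first converts the data frame to a matrix, whereas lapply visits the columns one by one.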

Error in R "undefined columns selected"

I am trying to initiate this code using the zoo command:
gld <- zoo(gld[,7], gld_dates)
Unfortunately I get an error message telling me this:
Error in `[.data.frame`(gld, , 7) : undefined columns selected
I want to use the zoo function to create zoo objects from my data. The function should take two arguments: a vector of data and a vector of dates.
This is the data I am using[LINK BROKEN].
I believe I have 7 columns in my data set. Any ideas?
The code I am trying to implement is found here[LINK BROKEN].
Is there anything wrong with this code?
You don't say what your gld_dates is exactly, but if gld starts as your original data and you want to make a zoo object of the 7th column ordering by the 1st column (dates), I can do
gld_zoo <- zoo(gld[, 7], gld[, 1])
just fine. Equivalently, but with more readability,
gld_zoo <- zoo(gld$Adj.close, gld$Date)
reminds me what each column is.
Subsetting requires the names of the selected columns to match those in the data frame. This code subsets the dataset french_fries with potatoes instead of potato:
library(reshape2)  # french_fries ships with reshape2
data("french_fries")
df_potato <- french_fries[, c("potatoes")]
and it fails with:
Error in `[.data.frame`(french_fries, , c("potatoes")) :
undefined columns selected
but using the right name potato works:
df_potato <- french_fries[, c("potato")]
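Both causes discussed here (a misspelled name and an index past the last column) can be checked before subsetting. A minimal base R sketch, using the built-in mtcars instead of french_fries or gld so that it runs without extra packages; the variables wanted, missing_cols, present, col_index and index_ok are hypothetical:

```r
df <- mtcars

# "cylinders" is deliberately wrong; the real column is called "cyl"
wanted <- c("mpg", "cylinders")

# Names that would trigger "undefined columns selected"
missing_cols <- setdiff(wanted, names(df))

# Keep only the names that exist; drop = FALSE keeps a data.frame
present <- intersect(wanted, names(df))
subset_df <- df[, present, drop = FALSE]

# For numeric indices, compare against the actual number of columns
col_index <- 7
index_ok <- col_index <= ncol(df)
```

Printing names(df) and ncol(df) first is often the quickest way to find out which of the two cases applies.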
