Error in R "undefined columns selected" - r

I am trying to run this code using the zoo function:
gld <- zoo(gld[,7], gld_dates)
Unfortunately I get an error message telling me this:
Error in `[.data.frame`(gld, , 7) : undefined columns selected
I want to use the zoo function to create zoo objects from my data.
The function should take two arguments: a vector of data and
a vector of dates.
This is the data I am using[LINK BROKEN].
I believe I have 7 columns in my data set. Any ideas?
The code I am trying to implement is found here[LINK BROKEN].
Is there anything wrong with this code?

You don't say what your gld_dates is exactly, but if gld starts as your original data and you want to make a zoo object of the 7th column ordered by the 1st column (dates), I can do
gld_zoo <- zoo(gld[, 7], gld[, 1])
just fine. Equivalently, but with more readability,
gld_zoo <- zoo(gld$Adj.close, gld$Date)
reminds me what each column is.

Subsetting by name requires the requested column names to match those in the data frame. This code subsets the french_fries dataset (from the reshape2 package) with potatoes instead of potato:
data("french_fries", package = "reshape2")
df_potato <- french_fries[, c("potatoes")]
and it fails with:
Error in `[.data.frame`(french_fries, , c("potatoes")) :
undefined columns selected
but using the right name potato works:
df_potato <- french_fries[, c("potato")]
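Both answers come down to asking for a column the data frame does not have, whether by position or by name. A minimal sketch of how one might check this before subsetting, assuming the built-in iris data as a stand-in for the asker's data (the names df, wanted and missing_cols are hypothetical):

```r
# Hypothetical stand-in for the asker's data: iris has 5 columns.
df <- iris

# Asking for a 7th column that does not exist raises the error from the question.
msg <- tryCatch(df[, 7], error = function(e) conditionMessage(e))
print(msg)

# Checks worth running before subsetting by position or by name.
print(ncol(df))    # how many columns are really there
print(names(df))   # what they are actually called

# The second requested name is a deliberate typo.
wanted <- c("Species", "Speceis")
missing_cols <- setdiff(wanted, names(df))
print(missing_cols)   # requested names that are not columns of df
```

If ncol() reports fewer columns than expected, the file was probably read with the wrong separator or header setting, which is worth ruling out first.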

Related

Trying to find a better way to sorting the data in R

In my data frame I am trying to sort the data in descending order. I am using the below line of code for sorting my data and it works as intended.
CNS25VOL <- CNS25VOL[order(-CNS25VOL$MATVOL22), ]
However, if I refer to the same column by its index number, the code throws an error:
CNS25VOL <- CNS25VOL[order(-CNS25VOL[, 2]), ]
Error thrown is
Error in CNS25VOL[, 2] : incorrect number of dimensions
I do have a working solution, but the issue I see is that if the name of my column suddenly changes, the code won't work. I know that the column positions will stay the same in the data frame.
How can we handle this?
In order(-CNS25VOL[, 2]), order expects a vector, which you try to construct with the [] in CNS25VOL[, 2]. An ordinary data frame returns a vector holding just the 2nd column. A tibble, however, returns a tibble with only one column.
You can reproduce the behaviour of ordinary data frames with the drop = TRUE argument to [], as in
CNS25VOL[, 2, drop = TRUE]
Try to always be aware of whether you are using a standard data.frame, a tibble or a data.table, because they look very similar but differ in the details. Also see https://tibble.tidyverse.org/reference/subsetting.html
dplyr functions tend to give you a tibble back even if you fed them a classical data.frame.
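Since the concern is that the column name may change while its position stays fixed, another option is the [[ operator, which extracts a single column as a plain vector for both data frames and tibbles. A minimal sketch, using a small hypothetical data frame vol in place of CNS25VOL:

```r
# Hypothetical data standing in for CNS25VOL.
vol <- data.frame(
  ID = c("a", "b", "c"),
  MATVOL22 = c(10, 30, 20),
  stringsAsFactors = FALSE
)

# [[2]] extracts the second column as a vector, whatever it is named.
sorted <- vol[order(-vol[[2]]), ]
print(sorted)
```

The same line keeps working if MATVOL22 is later renamed, as long as it remains the second column.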

Subsetting a dataframe in R with named vector giving undefined columns error

I am trying to extract specific columns from a dataframe using a defined named vector created beforehand. The vector looks like this -
>select_cols <- ids$test
>select_cols
01534314-832a-495f-99c2-40d9783401a2 053a0aff-7912-4f18-b997-d2f20d91bbf0
"TCGA-NH-A8F7-01A" "TCGA-DM-A288-01A"
My dataframe contains gene expression values for hundreds of samples, which looks like this
>exp.coad
ens.names TCGA-NH-A8F7-01A TCGA-DM-A288-01A TCGA-CA-5254-01A TCGA-5M-AAT5-01A TCGA-AA-3489-01A
ENSG00000000003.15 TSPAN6 13.067896 11.586922 11.022340 12.431234 11.768204
ENSG00000000005.6 TNMD 5.905824 4.061119 2.174923 6.898203 7.191496
ENSG00000000419.13 DPM1 11.677447 11.406170 11.355047 11.899990 11.245281
ENSG00000000457.14 SCYL3 9.226378 9.256162 8.972929 9.223441 9.316666
ENSG00000000460.17 C1orf112 8.472735 8.393176 8.039962 9.225961 8.731497
ENSG00000000938.13 FGR 6.745847 7.344758 7.014380 5.558205 9.901777
When I'm subsetting the dataframe using the following command
exp5 <- exp.coad[up, select_cols]
where up is a value assigned dynamically in a for loop,
I am getting the following error -
Error in `[.data.frame`(exp.coad, up, select_cols) :
undefined columns selected
However, if I try to extract columns after explicitly mentioning it, I can get the desired dataframe
> exp.coad[up, c('TCGA-NH-A8F7-01A','TCGA-DM-A288-01A')]
TCGA-NH-A8F7-01A TCGA-DM-A288-01A
ENSG00000158769.18 12.64327 13.06597
I verified that the named vector is indeed a character datatype by doing
typeof(select_cols)
[1] "character"
Since I have to extract multiple columns in a loop, I can't explicitly mention column names every time. What could be the reason behind this error?
Update
I resolved the issue by using the following tweak -
exp.coad[up, names(exp.coad) %in% select_cols]
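The post does not say why the original call failed. One possible cause, offered only as an assumption, is that select_cols contains at least one ID that is not a column of exp.coad; the %in% workaround would silently skip such IDs. A minimal sketch for checking this, using a hypothetical two-column miniature of exp.coad and a made-up ID "TCGA-XX-0000-01A":

```r
# Hypothetical miniature of exp.coad; check.names = FALSE keeps the hyphens.
expr <- data.frame(
  "TCGA-NH-A8F7-01A" = c(13.07, 5.91),
  "TCGA-DM-A288-01A" = c(11.59, 4.06),
  check.names = FALSE
)

# Hypothetical named selection vector; the second ID is made up and is not a column.
select_cols <- c(id1 = "TCGA-NH-A8F7-01A", id2 = "TCGA-XX-0000-01A")

# IDs requested but absent from the data frame.
missing_ids <- setdiff(select_cols, names(expr))
print(missing_ids)

# Subset only by the IDs that are present, in the order given by select_cols.
present <- intersect(select_cols, names(expr))
sub <- expr[1, present, drop = FALSE]
print(sub)
```

Unlike the logical %in% mask, indexing by the intersect() result keeps the columns in the order of select_cols rather than the order of the data frame.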

Undefined columns error when using lapply

Why does this work and not the lapply?
Using the built in base R ChickWeight data:
names(ChickWeight)<-tolower(names(ChickWeight))
This works if I just want the correlations for one column, "time":
library(reshape)
cor(cast(melt(ChickWeight[,c("time","diet","chick")],id.vars=c("chick","diet")),chick~diet))
This doesn't work when I try to apply the same thing to both "time" and "weight", i.e. columns 1:2:
lapply(as.list(ChickWeight[,c(1:2)]), FUN=function(i){
cor(cast(melt(ChickWeight[,c(i,"diet","chick")], id.vars=c("chick","diet")),chick~diet))
})
So the fact that the function part works fine by itself makes me think there's something I don't understand about using lapply like this. I get this error:
Error in `[.data.frame`(ChickWeight, , c(i, "diet", "chick")) :
undefined columns selected
Ah I see what you are trying to do here now.
Replace:
lapply(as.list(ChickWeight[,c(1:2)]), .........
With
lapply(names(ChickWeight)[1:2], .............
You are passing the column values when what you want is the column name.
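The difference can be seen without the reshape package. A minimal sketch in base R, where cw is a hypothetical working copy of ChickWeight:

```r
# Work on a copy of the built-in data with lower-case names, as in the question.
cw <- ChickWeight
names(cw) <- tolower(names(cw))

# Iterating over the columns hands the function each column's values,
# so i is a whole numeric vector, not a name.
by_value <- lapply(as.list(cw[, 1:2]), function(i) is.numeric(i))
print(by_value)

# Iterating over the names hands the function one column name at a time,
# which is what c(i, "diet", "chick") needs for subsetting.
by_name <- lapply(names(cw)[1:2], function(i) names(cw[, c(i, "diet", "chick")]))
print(by_name)
```

In the failing version, c(i, "diet", "chick") mixes hundreds of data values with two column names, none of which (apart from the last two) exist as columns.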

Error in using grep in SparkR

I am having an issue with subsetting my Spark DataFrame.
I have a DataFrame called nfe, which contains a column called ITEM_PRODUTO that is formatted as a string. I would like to subset this DataFrame based on whether the item column contains the word "AREIA". I can easily subset the data based on an exact phrase:
nfe.subset1 <- subset(nfe, nfe$ITEM_PRODUTO == "AREIA LAVADA FINA")
nfe.subset2 <- subset(nfe, nfe$ITEM_PRODUTO %in% "AREIA")
However, what I would like is a subset of all rows that contain the word "AREIA" in the ITEM_PRODUTO column. When I try to use grep, though, I receive an error message:
nfe.subset3 <- subset(nfe, grep("AREIA", nfe$ITEM_PRODUTO))
# Error in as.character.default(x) :
# no method for coercing this S4 class to a vector
I've tried multiple iterations of syntax, and tried grepl as well, but nothing seems to work. It's probably a syntax error, but could anyone help me out?
Thanks!
Standard R functions cannot be applied to a SparkDataFrame. Use either like:
where(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))
or rlike:
where(nfe, rlike(nfe$ITEM_PRODUTO, ".*AREIA.*"))

Cbind Error "Object not found"

I am trying to run a panel regression in RStudio. When I use the cbind command
x<- cbind(DEX, GRW , Debt, Life)
for my independent variables, it returns this error:
Error in cbind(DEX, GRW, Debt, Life) : object 'DEX' not found
However, my dependent variable works fine with cbind, as shown below:
y<- cbind(GDP)
Can you help?
Thanks.
You defined one and only one object when you executed:
tino=read.delim("clipboard")
The column names of that object are not treated as objects in their own right. If you wanted to create a new object from that data frame you could do this:
x <- with(tino, cbind(DEX, GRW , Debt, Life) )
It's possible this might do violence to the contents of x, and it would be safer to extract just those columns of the data frame tino:
x <- tino[ , c('DEX', 'GRW' , 'Debt', 'Life')]
You should realize that vectors passed to cbind get turned into a matrix (where all elements have the same class and no other attributes are supported). Matrices have different features than data frames (whose columns can each have a different class).
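The two approaches can be compared directly. A minimal sketch, assuming a small made-up data frame in place of the clipboard data (the values are invented for illustration only):

```r
# Hypothetical stand-in for tino; the real data came from the clipboard.
tino <- data.frame(
  DEX  = c(1.2, 1.5),
  GRW  = c(0.3, 0.4),
  Debt = c(50, 55),
  Life = c(70, 71)
)

# with() evaluates cbind() inside the data frame, so the column names are found.
x_mat <- with(tino, cbind(DEX, GRW, Debt, Life))

# Column extraction keeps the result a data frame.
x_df <- tino[, c("DEX", "GRW", "Debt", "Life")]

print(is.matrix(x_mat))    # cbind of vectors gives a matrix
print(is.data.frame(x_df)) # extraction gives a data frame
```

With all-numeric columns the matrix loses nothing, but if one column were character or factor, cbind would coerce every column to a single type.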
