Individual variables (columns) from the header row of my data frame are not recognized as objects

I am really new to R and Stack Overflow, so here is a beginner question. It seems so simple that I cannot find the answer online.
I loaded my data:
df <- read.csv2("HOR.csv", na.strings = "NA", header = TRUE)
Now I would like to get an overview of individual variables in my df.
I use
summary(df)
and that works.
But when I try the same with individual variables/columns, e.g.
summary(A)
I get an "object not found" error. For example, the console says:
Error in summary(PSumme) : Object 'PSumme' not found
It is a very basic question, but I cannot do any kind of analysis, because R does not seem to recognize any of my columns as objects. I get the same error when I try to add or subtract columns from each other, or to change variables, e.g.:
PSumme * -1
I have already tried tibble and other commands to turn the column headers into variables, and that seems to work, but the variables are still not treated as objects.
Best regards, and thank you for your time!
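A minimal sketch of the usual fix, assuming a small made-up data frame in place of HOR.csv (the values and the extra column A are hypothetical). The columns live inside df rather than in the workspace, so they have to be reached through df, e.g. with $, [[ ]] or with():

```r
# Hypothetical stand-in for the data read from HOR.csv
df <- data.frame(PSumme = c(10, 20, 30), A = c(1, 2, 3))

# summary(PSumme) fails because PSumme is a column inside df,
# not a standalone object in the workspace
exists("PSumme")            # FALSE

# Refer to the column through its data frame instead
summary(df$PSumme)
summary(df[["PSumme"]])     # equivalent; also works when the name is stored in a string

# Arithmetic on columns works the same way; store results as new columns
df$PSumme_neg <- df$PSumme * -1
df$diff <- df$PSumme - df$A

# with() evaluates an expression using the columns of df as variables
with(df, mean(PSumme - A))
```

Attaching the data frame with attach() would also make bare column names visible, but the explicit forms above avoid confusion once several data frames share column names.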

Related

Reticulate (R) Error: In py_to_r.pandas.core.frame.DataFrame(result) : index contains duplicated values: row names not set

I am using reticulate. I pass a data frame from R to pandas and back again, and I get this error:
In py_to_r.pandas.core.frame.DataFrame(result) :
index contains duplicated values: row names not set
I apologize for not having a reproducible example. My code is spread across several scripts and is currently hardwired with some personal information.
Instead, I am looking for someone who knows this error and can explain what is going on.
Thanks so much!
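As background, here is a minimal base-R sketch (no Python needed) of the constraint the message appears to refer to: R data frames do not allow duplicated row names, so a pandas index with repeated values presumably cannot be carried over as row names. One workaround to consider is making the index unique before or after the conversion, either on the pandas side (e.g. with reset_index()) or, as shown here, with make.unique() in R. The names "a"/"b" are purely illustrative:

```r
df <- data.frame(x = 1:3)
dup_names <- c("a", "a", "b")   # illustrative index with a duplicated value

# Assigning duplicated row names to a data frame fails with an error
res <- tryCatch(
  {
    rownames(df) <- dup_names
    "row names set"
  },
  error = function(e) conditionMessage(e)
)
print(res)

# make.unique() appends suffixes (".1", ".2", ...) so the names become unique
df2 <- data.frame(x = 1:3)
rownames(df2) <- make.unique(dup_names)
rownames(df2)
```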

Column of original data is returning length as 0 in R

I am working in RStudio and trying to create a table. The error I keep getting is "Error in table(players, fitmod1$classification) : all arguments must have the same length". When I check the lengths of my data, fitmod1$classification returns a value, but players returns 0. I have no idea how to fix this.
Players is a qualitative column of the Hitters data in the R package ISLR, and fitmod1 is an mclust model. I am attaching my code below, so hopefully that helps! Thanks
Your issue is that the players are the row names, not an actual column of the data. So when you subset the Hitters data frame with:
players <- Hitters[,0]
you end up with an empty data frame (though the rows are still named, which is what you see when you view it in RStudio).
Instead you want to get the row names and store them as a vector:
players <- row.names(Hitters)
You will now be able to generate a table.
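The same behaviour can be reproduced with the built-in mtcars data set, used here as an assumed stand-in for Hitters because it also keeps its identifiers (car names) as row names. length() of a data frame counts its columns, which is why the zero-column subset reports a length of 0:

```r
# mtcars, like Hitters, stores its identifiers as row names
players_wrong <- mtcars[, 0]     # data frame with 32 rows but 0 columns
length(players_wrong)            # 0: length() of a data frame counts columns
nrow(players_wrong)              # 32

players <- row.names(mtcars)     # character vector of the row names
length(players)                  # 32, same as nrow(mtcars)

# Any vector of length nrow(mtcars) can now be tabulated against players
tab <- table(players, mtcars$cyl)
dim(tab)                         # one row per car, one column per cylinder count
```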
Here is all of the code. (By the way, it is much easier for the community to answer your questions if you paste your code using Stack Overflow's code formatting rather than attaching a PNG; that way we can copy and paste your code instead of retyping it by hand.)
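library(ISLR)
library(mclust)
data(Hitters)
# Keep the first seven (numeric) columns and drop rows with missing values
Hitters <- Hitters[, 1:7]
Hitters <- na.omit(Hitters)
# Player names are stored as row names, not as a column
players <- row.names(Hitters)
# Fit a three-component Gaussian mixture with the "VEE" covariance model
fitmod1 <- Mclust(Hitters, G = 3, modelNames = c("VEE"))
table(players, fitmod1$classification)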

R lapply-split-rbindlist - does subset cause problems?

I'm sure this will be very easy, as I'm still an R beginner, but here goes...
I've started with a data frame which I've successfully put through lapply-split followed by rbindlist to regenerate as a dataframe.
From this same data set, I subset some data, performed lapply-split followed by rbindlist, and got the following error:
"Error in rbindlist(df) : Item 1 of list input is not a data.frame, data.table or list"
This is confusing since it's the same (sub)set of data being split by the same parameter.
When I call:
df[1]
I get:
$SWS1Ami
[1] 13451.02
which is the mean value I wanted to calculate for the SWS1Ami group (so it seems to have done the lapply split correctly). When I call:
typeof(df[1])
it tells me that the type of this element is "list".
Two questions:
(1) What could cause rbindlist to not work after doing lapply-split? Why does this seem to sometimes work and sometimes not work?
(2) Is there a quick litmus test to tell if your dataframe is in the "right" setup to undergo lapply-split-rbindlist?
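A base-R sketch of what may be going on, using a hypothetical two-group data frame (the names dat, group and value are assumptions, not the asker's). df[1] with single brackets always returns a sub-list, so typeof() says "list" even when the element itself is a plain number; df[[1]] shows the element. rbindlist() requires every element to be a data.frame, data.table or list, and a bare mean is none of these. A plausible reason for "sometimes works" is that the function passed to lapply() returned a data frame in one run and a single number in the other. Checking is.list() on every element is one quick litmus test:

```r
# Hypothetical toy data
dat <- data.frame(group = c("A", "A", "B", "B"), value = c(1, 3, 10, 20))

# lapply() over split() returning a bare number per group
means <- lapply(split(dat$value, dat$group), mean)
typeof(means[1])     # "list": single brackets return a sub-list
typeof(means[[1]])   # "double": double brackets return the element itself

# Litmus test: every element must be a list (data.frames and data.tables are lists)
all(vapply(means, is.list, logical(1)))      # FALSE, so rbindlist() would fail

# Returning a one-row data.frame per group makes every element a list
means_df <- lapply(split(dat, dat$group), function(d) {
  data.frame(group = d$group[1], mean = mean(d$value))
})
all(vapply(means_df, is.list, logical(1)))   # TRUE

# Base-R way to stack the pieces; data.table::rbindlist(means_df) should also accept them
result <- do.call(rbind, means_df)
result
```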

R: Error in .Primitive, non-numeric argument to binary operator

I did some reading on similar SO questions, but couldn't figure out how to resolve my error.
I have written the following line of code:
points[paste0(score.avail,"_pts")] <-
Map('*', points[score.avail], mget(paste0(score.avail,'_m')) )
Essentially, 'score.avail' is a list of columns in the 'points' data frame. I am multiplying each of these columns by a respective constant, named by the paste0(score.avail, '_m') expression, and appending the products as new fields, named by the paste0(score.avail, "_pts") expression.
I have used this function before in a similar setup with no issues. However, I am now getting the following error:
Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
non-numeric argument to binary operator
I'm pretty sure R is telling me that one of the fields I'm trying to multiply is not numeric. However, I have checked all my fields, and they are numeric. I have even tried running as.numeric(score.avail), but that doesn't help. I also ran the following (before the Map call above) to replace NAs in the fields with 0:
for(col in score.avail){
points[is.na(get(col)) & (data.source == "average" |
data.source == "averageWeighted"), (col) := 0]}
The thing that stumps me is that this expression has worked with no issues before.
Update
I did some more digging by running each component of my original expression separately. I'm getting odd output from points[score.avail]. Previously, this returned just those columns for all of my rows. Now, however, I get none of the rows from my original data frame; instead, the column names in 'score.avail' appear as rows, with NAs everywhere (this is clearly the source of my problem).
I think this is because the object I'm pointing to is a data.table with key variables set, whereas previously I had been pointing to a data frame.
Off to try a few more things.
Another Update
I was able to solve my problem by copying the 'points' object with as.data.frame(). However, I will leave the question open to see whether anyone knows how to reset the data.table key variables so that the expression above works.
I was able to solve my problem by copying the 'points' object using as.data.frame(). Apparently classifying the object as a data.table was causing my headaches.
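A minimal sketch of the same Map() pattern on a plain data frame, which is what the as.data.frame() fix relies on. The columns goals/assists and the multipliers goals_m/assists_m are hypothetical stand-ins. On a data.frame, points[score.avail] selects columns; on a keyed data.table, as far as I know, a character vector in that position is instead looked up as values of the key column (a join), which would explain the rows of NAs. Column selection from a data.table goes through j instead, e.g. points[, ..score.avail] or points[, score.avail, with = FALSE]:

```r
# Hypothetical stand-ins for the asker's objects
points <- data.frame(goals = c(1, 2, 3), assists = c(0, 1, 4))
score.avail <- c("goals", "assists")
goals_m <- 6      # multiplier found by mget("goals_m")
assists_m <- 3    # multiplier found by mget("assists_m")

# On a plain data.frame, points[score.avail] is a list of columns,
# so Map() multiplies each column by its matching constant
points[paste0(score.avail, "_pts")] <-
  Map(`*`, points[score.avail], mget(paste0(score.avail, "_m")))

points
```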

R: partimat function doesn't recognize my classes

I am a relatively novice R user and am attempting to use the partimat() function in the klaR package to plot decision boundaries for a linear discriminant analysis, but I keep encountering the same error. I have tried inputting the arguments in several different ways according to the manual, but I keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded under the name "sources1", with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried entering a formula, like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column headings.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
The source code of the partimat.default function (which you can view with getAnywhere(partimat.default)) contains:
if (nlevels(grouping) < 2)
stop("at least two classes required")
So perhaps you haven't defined your grouping column as a factor variable. What do you get from summary(sources1[,2])? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
Or, in your second approach, try removing the "sources1$" prefix from each variable name in the formula, since the data argument already tells partimat which data frame to look in. I think you are effectively specifying the data frame twice, so it might be looking, for instance, for
"sources1$sources1$group"
rather than
"sources1$group"
Without further error messages or a reproducible example (i.e. some data included in your post), it's hard to say for sure.
HTH
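A small illustration of why a check like nlevels(grouping) < 2 trips on non-factor input, assuming the grouping column arrived as character or numeric codes (since R 4.0.0, read.csv() and data.frame() no longer convert strings to factors by default). The class labels below are hypothetical:

```r
# Hypothetical class labels, as they might be read from a CSV file
groups_chr <- c("A", "B", "A", "B")
groups_num <- c(1, 2, 1, 2)

# nlevels() counts factor levels; character and numeric vectors have none
nlevels(groups_chr)     # 0, which is < 2
nlevels(groups_num)     # 0 as well

# Converting to a factor exposes the classes
groups_fac <- as.factor(groups_chr)
nlevels(groups_fac)     # 2
levels(groups_fac)      # "A" "B"
```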
