Object not found when aggregating in R - r

I am trying to use aggregate in R. I found some example code:
attach(mtcars)
agg=aggregate(mtcars, by=list(cyl,vs),FUN=mean, na.rm=TRUE)
detach(mtcars)
This works fine. However when I try to do it using my data:
library(stats)
FileName="Raw.csv"
Raw=read.csv(FileName,header = TRUE)
Acc1=aggregate(Raw,by=list(Experiment,SsNum),FUN=mean, na.rm=TRUE)
I get the following error message:
Error in aggregate.data.frame(Raw, by = list(Experiment, SsNum), FUN = mean, object 'Experiment' not found
I also tried to run:
Acc2=aggregate(Raw,by=list(Raw$Experiment,Raw$SsNum),FUN=mean, na.rm=TRUE)
and I got the following message:
There were 50 or more warnings (use warnings() to see the first 50)
The warnings are:
1: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
My main question is how Acc1 differs from the online example (which works fine).
Thank you very much,
Ariel

You can only compute the mean of a numeric variable, so you need at least to take a subset of the data that excludes character variables. That is most likely where your data differ from mtcars: mtcars contains only numeric columns, so the example runs without warnings. As for Acc1, the example calls attach(mtcars) first, which makes cyl and vs visible as bare names; your code never attaches Raw, so Experiment is not found.
So in this line:
Acc2=aggregate(Raw,by=list(Raw$Experiment,Raw$SsNum),FUN=mean, na.rm=TRUE)
You get warnings because Raw appears to contain columns that are not numeric.
Suppose you have:
set.seed(4)
Experiment <- sample(seq(1:3), 5, replace=TRUE)
SsNum <- sample(1:10, 5, replace=TRUE)
value <- rnorm(5)
df <- data.frame(Experiment, SsNum, value)
Then aggregate works as follows:
aggregate(value ~Experiment + SsNum, data = df, FUN = mean)
  Experiment SsNum      value
1          3     1  1.7768632
2          2     3  0.6892754
3          1     8 -1.2812466
4          1    10  0.8416977
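If the real data has non-numeric columns mixed in, one option is to aggregate only the numeric measurement columns. A minimal sketch, assuming a made-up stand-in for Raw (the RT and Label columns are hypothetical, not from the question):

```r
# Made-up stand-in for Raw; RT and Label are hypothetical column names
Raw <- data.frame(
  Experiment = c(1, 1, 2, 2),
  SsNum      = c(1, 1, 1, 2),
  RT         = c(10, 20, 30, 40),
  Label      = c("a", "b", "c", "d")
)

group_cols <- c("Experiment", "SsNum")
# Numeric columns that are not grouping columns
value_cols <- setdiff(names(Raw)[sapply(Raw, is.numeric)], group_cols)

# Refer to the grouping columns through Raw, so no attach() is needed
Acc <- aggregate(Raw[value_cols], by = Raw[group_cols], FUN = mean, na.rm = TRUE)
print(Acc)
```

The character column is simply left out, so mean() never sees it and no warnings are raised.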

Related

Reordering variables using the mean value of a separate column in r

I have a dataset of the form
regional.indicator ladder.score
1 A 100
2 A 200
3 B 30
4 B 40
5 C 50
where I am trying to reorder the levels of a factor named regional.indicator by mean ladder.score and assign the resulting vector to order1 (similar to this). My issue is that the code fails with an error saying that regional.indicator does not exist.
Example
library(dplyr)
# Create dataset
df <- data.frame(regional.indicator = c("A","A","B","B","C"),
ladder.score = c(100,200, 30,40,50))
# Change regional.indicator to factor
df$regional.indicator <- as.factor(df$regional.indicator)
# Function where the error arises
order1 <- df %>%
group_by(regional.indicator)%>%
summarise(Laddermean = mean(ladder.score))%>%
arrange(Laddermean)%>%
pull(regional.indicator)
Error message that arose:
Error: Can't extract columns that don't exist.
x Column `regional.indicator` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
How can I get rid of this error or do this in a different way? Perhaps using forcats?
In case you have not solved your problem yet, try this:
Just add dplyr:: to summarise. I guess plyr's summarise is masking dplyr's on your system:
Replace:
summarise(Laddermean = mean(ladder.score))%>%
with
dplyr::summarise(Laddermean = mean(ladder.score))%>%
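If you would rather avoid the masking question altogether, base R's reorder() can produce the same ordering without dplyr or forcats. This is a sketch of an alternative, not the fix proposed above; note that it returns a character vector of level names rather than a factor:

```r
df <- data.frame(regional.indicator = c("A", "A", "B", "B", "C"),
                 ladder.score = c(100, 200, 30, 40, 50))
df$regional.indicator <- as.factor(df$regional.indicator)

# reorder() sorts the factor levels by the per-level mean of ladder.score
order1 <- levels(reorder(df$regional.indicator, df$ladder.score, FUN = mean))
print(order1)
```

With the example data the means are A = 150, B = 35 and C = 50, so the levels come out as B, C, A.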

Loop through a character vector to use in a function

I am conducting a method-comparison study, comparing measurements from two different systems. My dataset has a large number of columns with variables containing measurements from one of the two systems.
aX and bX are both measures of X, but from systems a and b. I have about 80 pairs of variables like this.
A simplified version of my data looks like this:
set.seed(1)
df <- data.frame(
ID = as.factor(rep(1:2, each=10)),
aX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
bX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
aY = rep(1:10+rnorm(10,mean=1,sd=0.5), 2),
bY = rep(1:10-rnorm(10,mean=1,sd=0.5),2))
head(df)
  ID       aX       bX       aY         bY
1  1 1.686773 2.755891 2.459489 -0.6793398
2  1 3.091822 3.194922 3.391068  1.0513939
3  1 3.582186 3.689380 4.037282  1.8061642
4  1 5.797640 3.892650 4.005324  3.0269025
5  1 6.164754 6.562465 6.309913  4.6885298
6  1 6.589766 6.977533 6.971936  5.2074973
I am trying to loop through the elements of a character vector, and use the elements to point to columns in the dataframe. But I keep getting error messages when I try to call functions with variable names generated in the loop.
For simplicity, I have changed the loop to include a linear model as this produces the same type of error as I have in my original script.
#This line is only included to show that
#the formula used in the loop works when
#called directly with the "real" column names
(broom::glance(lm(aX~bX, data = df)))$r.squared
[1] 0.9405218
#Now I try the loop
varlist <- c("X", "Y")
for(i in 1:length(varlist)){
aVAR <- paste0("a", varlist[i])
bVAR <- paste0("b", varlist[i])
#aVAR and bVAR appear to hold names identical to column names in the df dataframe
print(c(aVAR, bVAR))
#Try the formula with the loop variable names
print((broom::glance(lm(aVAR~bVAR, data = df)))$r.squared)
}
The error messages I get when calling functions from inside the loop vary with the function I am calling. The common denominator is that they occur when I try to use the character vector (varlist) to pick out specific columns.
Example of error messages:
rmcorr(ID, aVAR, bVAR, df)
Error in rmcorr(ID, aVAR, bVAR, df) :
'Measure 1' and 'Measure 2' must be numeric
or
broom::glance(lm(aVAR~bVAR, data = df))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion
Can you help me understand what goes wrong in the loop? Or suggest and show another way to accomplish what I am trying to do?
Variables aren't evaluated in formulas (the things with ~).
You can type
bert ~ ernie
and not get an error even if variables named bert and ernie do not exist. Formulas store relationships between symbols/names and do not attempt to evaluate them. Also note we are not using quotes here. Variable names (or symbols) are not interchangeable with character values (i.e. aX is very different from "aX").
So when putting together a formula from string values, I suggest you use the reformulate() function. It takes a vector of names for the right-hand side and an optional value for the left-hand side. So you would create the same formula with
reformulate("ernie", "bert")
# bert ~ ernie
And you can use that with your lm:
lm(reformulate(bVAR, aVAR), data = df)
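Putting the pieces together, the loop from the question could look like the sketch below. It uses summary(lm(...))$r.squared instead of broom::glance() only to stay within base R; that swap is my choice, not part of the suggestion above.

```r
set.seed(1)
df <- data.frame(
  ID = as.factor(rep(1:2, each = 10)),
  aX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  aY = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bY = rep(1:10 - rnorm(10, mean = 1, sd = 0.5), 2))

varlist <- c("X", "Y")

# Build the formula a<VAR> ~ b<VAR> from strings for each pair, then fit
r2 <- sapply(varlist, function(v) {
  f <- reformulate(paste0("b", v), response = paste0("a", v))
  summary(lm(f, data = df))$r.squared
})
print(r2)
```

sapply() keeps the results in a named vector (one entry per element of varlist), which is handier than printing when there are 80 pairs.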
I'm too lazy to search for a duplicate on how to construct formulas programmatically, so here is a solution:
varlist <- c("X", "Y")
for(i in 1:length(varlist)){
#make these symbols:
aVAR <- as.symbol(paste0("a", varlist[i]))
bVAR <- as.symbol(paste0("b", varlist[i]))
#aVAR and bVAR are now symbols matching column names in the df dataframe
print(c(aVAR, bVAR))
#Try the formula with the loop variable names
#construct the call to `lm` with `bquote` and `eval` the expression
print((broom::glance(eval(bquote(lm(.(aVAR) ~ .(bVAR), data = df)))))$r.squared)
}
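For functions that take plain vectors rather than a formula, a character name can be used to pull the column out with [[ ]]. A small sketch using cor() as a stand-in (whether this suits rmcorr depends on how that function evaluates its arguments, which I have not checked):

```r
set.seed(1)
df <- data.frame(
  aX = 1:10 + rnorm(10, mean = 1, sd = 0.5),
  bX = 1:10 + rnorm(10, mean = 1, sd = 0.5)
)

aVAR <- "aX"
bVAR <- "bX"

# df[[aVAR]] looks up the column whose name is stored in aVAR;
# df$aVAR would instead look for a column literally called "aVAR"
r <- cor(df[[aVAR]], df[[bVAR]])
print(r^2)
```

For a simple regression with one predictor, the squared correlation equals the R-squared from lm, so this is a quick way to check the loop's output.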

Convert a character column in a dataframe into a numeric column in R

I'm trying to convert some financial data I scraped, and it has turned into a big headache! I've tried most things out there, like using as.numeric(as.character()), and nothing seems to work. What REALLY throws me off is how it can filter using Close >= 20 if it's a character column!? And I can plot it, but not do any mathematical operations! Not sure if this is an artifact of having it in a dataframe? Any help would be appreciated.
# Links to oil futures prices and downloads them
library(dplyr)
library(rvest) # provides read_html() and html_table()
oilurl <- "http://quotes.ino.com/exchanges/contracts.html?r=NYMEX_CL"
download.file(oilurl, destfile = ".../Strip/OilStrip.html")
oilhtml <- read_html("..../Strip/OilStrip.html")
# Puts the Oil data into a df
wti_df <- as.data.frame(read_html("..../Strip/OilStrip.html") %>% html_table(fill = TRUE))
colnames(wti_df) <- c("A","Date","C","D","E","Close","G","H","I")
# Cleans up the data into just Dates and Close price, removes any less than $20
wti_df <- select(wti_df,Date,Close)
wti_df <- slice(wti_df, 3:1023)
wti_df <- filter(wti_df,Close >= 20)
#turn "Close" into numeric
transform(wti_df, Close = as.numeric)
I've tried pretty much everything to make the column into a numeric and this is the closest I get. Here is the summary
> summary(wti_df$Close)
Length Class Mode
[1,] 1 -none- numeric
[2,] 1 -none- numeric
But when I try and do anything like add a number to Close column or plot the data....
> plot(wti_df$Close)
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' is a list, but does not have components 'x' and 'y'
> 5.0 + wti_df$Close
Error in 5 + wti_df$Close : non-numeric argument to binary operator
"transform" has some issues when you give it multiple operations at once (such as asking it to first apply as.character and then as.numeric to your column). Note also that Close = as.numeric passes the function itself rather than calling it on the column. Maybe try a simpler approach to see whether transform is the problem or whether something else is going on with your data:
wti_df$Close <- as.numeric(as.character(wti_df$Close))
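On the side question of why Close >= 20 works on a character column: R coerces the number to a string and compares text, which can quietly give the wrong rows. A small sketch with made-up prices (the values and the thousands separator are assumptions for illustration, not the scraped data):

```r
# Hypothetical scraped data: prices stored as text
wti_df <- data.frame(Date  = c("Jan", "Feb", "Mar"),
                     Close = c("45.10", "9.75", "1,020.50"),
                     stringsAsFactors = FALSE)

# Text comparison: "9.75" sorts after "20", so this is TRUE
text_cmp <- "9.75" >= 20
print(text_cmp)

# Strip anything as.numeric() cannot parse (here a comma), then convert
wti_df$Close <- as.numeric(gsub(",", "", wti_df$Close))
print(wti_df$Close + 5)
```

If as.numeric() still returns NA for some rows, printing the offending strings usually shows what else needs stripping before the conversion.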

having troubles with handling large data in R

I'm currently building a recommender system with 8k users and 200k items using the recommenderlab package.
Before I can use the functions of recommenderlab, I'm having trouble converting my data frame to a real rating matrix.
                    item_idx       mem_idx rating
1 00600015987465341234f7dae4 534122168382b      4
2 0060001660924533ad0cd443e1 53d79f413e3aa      5
3 006000195520453d7ac28e4b4b 53d79f413e3aa      5
4 0060001986642536d6fc77d269 535146eb5af95      4
5 00708969975005409278f828f3 540927366f478      5
This is part of my data frame; all the (item_idx, mem_idx) pairs are distinct.
mat <- tapply(df$rating, list(df$mem_idx, df$ID), FUN=function(x) x)
I tried to convert the data frame to a matrix using this code. Sometimes it succeeds, but usually an error like this occurs:
Error: cannot allocate vector of size 1.1 Gb
In the cases where it succeeds,
r <- as(mat, "realRatingMatrix")
I apply this code to turn it into a realRatingMatrix,
but it always fails with this error:
Error in which(x == 0, arr.ind = TRUE) :
error in evaluating the argument 'x' in selecting a method for function 'which': Error: (list) object cannot be coerced to type 'double'
If anyone knows how to avoid either of these errors, please help me.
Convert the data frame to a sparse matrix and then to the realRatingMatrix class:
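library(Matrix) # provides sparseMatrix()
library(recommenderlab) # provides the realRatingMatrix class
# data is the ratings data frame (called df in the question)
itm <- factor(data[,1])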
mem <- factor(data[,2])
# sparsematrix
s <- sparseMatrix(
as.numeric(itm),
as.numeric(mem),
dimnames = list(
as.character(levels(itm)),
as.character(levels(mem))),
x = data[,3])
#convert to realRatingMatrix class
rm <- new("realRatingMatrix",data=s)
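To see why the sparse route avoids the allocation error, here is a tiny self-contained sketch with made-up IDs (i1, u1 and so on are hypothetical). Only the observed ratings are stored, whereas tapply builds the full dense table of every member against every item. I put members in rows here, which is an assumption about the layout you want:

```r
library(Matrix)

# Made-up ratings in the same long format as the question
ratings <- data.frame(item_idx = c("i1", "i2", "i2", "i3"),
                      mem_idx  = c("u1", "u1", "u2", "u3"),
                      rating   = c(4, 5, 5, 4))

itm <- factor(ratings$item_idx)
mem <- factor(ratings$mem_idx)

# Integer codes of the factors become row/column positions
s <- sparseMatrix(i = as.integer(mem),
                  j = as.integer(itm),
                  x = ratings$rating,
                  dimnames = list(levels(mem), levels(itm)))
print(s)

# With recommenderlab loaded, the conversion shown above would then be:
# rm <- new("realRatingMatrix", data = s)
```

The matrix has 9 cells but stores only the 4 ratings; at 8k users by 200k items that difference is what keeps it in memory.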

t-tests on different groups by iteration in R

I have a group of patient scores such as:
P1 <- c(7.81,6.93,7.11)
P2 <- c(8.61,7.95,8.11)
P3 <- c(8.41,7.65,7.01)
....etc
I have a big group of healthy people scores such as:
HC <- c(5.22,4.87,6.93,5.27,6.01,4.55,.....etc)
I have listed the names of patients in a vector:
patients <- c('P1','P2','P3',....etc)
I am trying to perform t-tests for each of the patient scores against the healthy control group. I have written:
for (i in patients){t.test(patients[i],HC)}
I was expecting R to print the result of a load of t-tests to the console but it tells me:
Error in t.test.default(patients[i], HC) :
not enough 'x' observations
In addition: Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
I just need to get some P-values on the data and think this may be a simple syntax problem but don't work much with R and can't seem to find a quick answer. Any help would be great?
Use a list for patients containing the actual vectors, rather than the names of the vectors:
> patients <- list(P1, P2, P3)
> for (i in patients){print(t.test(i,HC)$p.value)}
[1] 0.005015573
[1] 0.0002672035
[1] 0.00899473
Try this: for (i in patients){print(t.test(get(i),HC))}
The problem is that i is cycling through your patients vector and returning a character. R doesn't know what to do with the character 'P1'. get tells R to look in the environment for an object called 'P1'. The print() is needed because results are not printed automatically inside a for loop.
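If you want the p-values collected in one named vector rather than printed, mget() can fetch all the named objects at once. A sketch, assuming HC holds just the six values visible in the question (the real vector is longer):

```r
P1 <- c(7.81, 6.93, 7.11)
P2 <- c(8.61, 7.95, 8.11)
P3 <- c(8.41, 7.65, 7.01)
# Assumption: only the six control scores shown in the question
HC <- c(5.22, 4.87, 6.93, 5.27, 6.01, 4.55)

patients <- c("P1", "P2", "P3")

# mget() returns a named list of the objects whose names are in patients
p_values <- sapply(mget(patients), function(x) t.test(x, HC)$p.value)
print(p_values)
```

Because HC is truncated here, the numbers will not match the p-values quoted above.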
