new R user has regression trouble - r

I am new to R and am trying to run a regression analysis. I have constructed arbitrary vectors with the c() function to learn the plot, lm, fit, abline, and summary functions. That has worked properly, but when trying to regress imported data, I receive the following error message. I don't know what's causing the error or how to fix it. Any thoughts? Thanks.
library(xlsx)
Loading required package: xlsxjars
Loading required package: rJava
x <- "~/Desktop/x.xlsx"
y <- "~/Desktop/y.xlsx"
X <- read.xlsx(x,1)
Y <- read.xlsx(y,1)
dim(X)
[1] 149 1
dim(Y)
[1] 149 1
plot(X,Y)
Error in stripchart.default(x1, ...) : invalid plotting method
plot(X)
plot(Y)
Also, I don't think I understand all of the arguments accepted by the read.xlsx function. For example, if sheetIndex is meant to index the sheets, wouldn't, in this example, x be 1 and y be 2? But then:
X <- read.xlsx(x,1)
Y <- read.xlsx(y,2)
Error in sheets[[sheetIndex]] : subscript out of bounds
Furthermore, the dimension is incorrect. The .xlsx file has 1 column, 150 rows, and no header.
dim(X)
[1] 149 1
When converting to a .csv file, which I don't particularly want to do because of the total number of .xlsx files I have, I still get the same plotting error, although the dimension seems to be correct. In this example, the numbers of columns and rows remain the same at 1 and 150 respectively, but there is a header.
x <- "~/Desktop/x.csv"
y <- "~/Desktop/y.csv"
X <- read.table(x, header = T)
Y <- read.table(y, header = T)
plot(X,Y)
Error in stripchart.default(x1, ...) : invalid plotting method
dim(X)
[1] 150 1

The problem is that X and Y are objects called data frames (see ?data.frame for details), not vectors. The plot function is a generic that dispatches to class-specific plot methods; for a one-column data frame it ends up calling stripchart(), which receives Y where it expects a plotting method, hence the error. This reproduces the problem, and fixes it:
X=data.frame(x=1:100)
Y=data.frame(y=rnorm(100,mean=1:100,sd=5))
plot(X,Y)
names(X)
names(Y)
plot(X$x, Y$y)
Assuming your data always consist of just one column, you could fix your code above by converting from a data.frame into an object of type numeric (i.e., the same sort of object as X=c(1,2,3,5)), which can be done in a variety of ways:
X <- unlist(read.xlsx(x,1))
Y <- unlist(read.xlsx(y,1))
or, alternatively, it's better to just have read.xlsx() return a list instead of a data.frame:
X <- read.xlsx(x,1, as.data.frame=FALSE)
Y <- read.xlsx(y,1, as.data.frame=FALSE)
Or you can just access the first column of the data.frame when you call plot
plot(X[,1], Y[,1])
See the help files for what all these functions return (e.g. ?as.numeric, ?unlist, ?names, etc.) and also see ?class, ?mode and ?typeof for querying object properties.
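As a minimal sketch of the whole workflow, assuming simulated one-column data frames stand in for the imported files (the column names x and y and the vector names x_vec and y_vec are made up here): extract the numeric column, check its class, then plot and fit.

```r
# Simulated stand-ins for the two imported one-column data frames
set.seed(42)
X <- data.frame(x = 1:100)
Y <- data.frame(y = rnorm(100, mean = 1:100, sd = 5))

class(X)            # "data.frame", which is why plot(X, Y) fails

# Extract the single column of each as a plain numeric vector
x_vec <- X[[1]]
y_vec <- Y[[1]]
class(x_vec)        # "integer"

# Scatter plot, linear fit, and fitted line
plot(x_vec, y_vec)
fit <- lm(y_vec ~ x_vec)
abline(fit)
summary(fit)
```

X[[1]] does the same job as X[,1] or unlist(X) for a single column; it is used here only because it always returns the column itself.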

Furthermore, the dimension is incorrect. The .xlsx file has 1 column, 150 rows, and no header.
header is TRUE by default, so the first row is consumed as the column name. Since your file has no header, you should specify header = FALSE in your read.xlsx call.
X <- read.xlsx(x, 1, header = FALSE)
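The effect of the header argument on the row count can be checked without Excel. This is a small sketch using read.table on a temporary text file of 150 simulated values; the file and the object names are made up for the illustration, and read.xlsx is assumed to treat its header argument the same way.

```r
# Write 150 numbers, one per line, with no header row (temporary file)
set.seed(1)
tmp <- tempfile(fileext = ".csv")
writeLines(as.character(round(rnorm(150), 3)), tmp)

with_header    <- read.table(tmp, header = TRUE)   # first value consumed as a column name
without_header <- read.table(tmp, header = FALSE)  # every line kept as data

dim(with_header)     # 149 1
dim(without_header)  # 150 1
```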
Regarding the plot error:
plot(X,Y)
Error in stripchart.default(x1, ...) : invalid plotting method
read.xlsx returns data.frames, which is why the error shows up. Here is an example:
X <- data.frame(rnorm(150))
Y <- data.frame(rnorm(150))
plot(X, Y)
# Error in stripchart.default(x1, ...) : invalid plotting method
Please read the read.xlsx documentation carefully, and read up on R object types.

Related

error (unused argument) using plyr with lattice xyplot

Hello everybody on stackoverflow,
it's my first question asked here... (well, actually the first one no one had already replied to!).
I'm trying to use the lattice xyplot function to plot a big df (2362422 rows) that should be split by a variable into several subplots (each of them with about 52 panels).
This is a highly simplified reproduction of the df and of the code I'm using:
library(lattice)
library(plyr)
set.seed(1)
df <- as.data.frame(cbind(x = rnorm(30), y=(1:2), z=rnorm(30), q = c("a","b","c","d","e")))
grpro <- function () {xyplot (x ~ z| q, data=df)}
grpro()
When I try to call the grpro function with d_ply to plot all the subplots based on the y variable, with the following code
d_ply(df, .(y), grpro)
I get the following error
Error in .fun(.data[[i]], ...) : unused argument (.data[[i]])
From what I understand, the d_ply function splits the df into several data frames, in this case two dfs based on the values "1" and "2" of y.
I assume that my code is working on that, and any other argument used in my grpro seems to be useful also when I split the df by y.
So, where am I wrong?
Thanks a lot for your help,
MZ
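One likely cause, sketched here in base R so that it runs without plyr: d_ply passes each piece of the split data frame to the function as its first argument, but grpro is declared with no parameters. split() plus lapply() follows the same pattern and fails in the same way. The names no_arg and one_arg are made up for this illustration.

```r
set.seed(1)
df <- data.frame(x = rnorm(30), y = rep(1:2, 15), z = rnorm(30))

# A function with no parameters cannot accept the piece it is handed
no_arg <- function() nrow(df)
msg <- tryCatch(lapply(split(df, df$y), no_arg),
                error = function(e) conditionMessage(e))
msg   # an "unused argument" error, as in the question

# Accepting the piece as a parameter, and using it, fixes the call
one_arg <- function(piece) nrow(piece)
sizes <- lapply(split(df, df$y), one_arg)
sizes
```

With lattice, the analogous change would be grpro <- function(d) print(xyplot(x ~ z | q, data = d)). The print() matters because lattice plots are only drawn when printed, which does not happen automatically inside a function.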

How does the curve function in R work? - Example of curve function

How does the following code work? I found the example while reading the R help page ?curve, but I have not understood it.
for(ll in c("", "x", "y", "xy"))
curve(log(1+x), 1, 100, log = ll,
sub = paste("log= '", ll, "'", sep = ""))
In particular, I am accustomed to numeric values as arguments inside the for-loop, as in
for(ll in 1:10)
But what is the following command saying:
for(ll in c("","x","y","xy"))
c("","x","y","xy") looks like a string vector? How does c("","x","y","xy") work inside the curve function, both in log(1+x) (what is x here? the string "x" in c("","x","y","xy")?) and in log=ll?
Apparently, there are no answers on Stack Overflow about how the curve function in R works, and especially about the log argument, so this might be a good chance to delve into it a bit more (I liked the question, btw):
First of all the easy part:
c("","x","y","xy") is a string vector or more formally a character vector.
for(ll in c("","x","y","xy")) will start a loop of 4 iterations and each time ll will be '','x','y','xy' respectively. Unfortunately, the way this example is built you will only see the last one plotted which is for ll = 'xy'.
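To see all four results at once, one option (an addition to the help-page example, not part of it) is a 2-by-2 plot layout. The sketch below also stores what each curve() call returns; the names old_par and results are made up.

```r
# Draw the four variants side by side instead of overwriting one another
old_par <- par(mfrow = c(2, 2))
results <- list()
for (ll in c("", "x", "y", "xy")) {
  # curve() invisibly returns the x and y values it plotted
  results[[paste0("log=", ll)]] <- curve(log(1 + x), 1, 100, log = ll,
                                         sub = paste0("log = '", ll, "'"))
}
par(old_par)  # restore the previous layout

sapply(results, function(r) length(r$x))  # number of points in each curve
```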
Let's dive into the source code of the curve function to answer the rest:
First of all, what does the x represent in log(1+x)?
log(1+x) is an expression in x. x represents a vector of numbers that gets created inside the curve function in the following part (from the source code):
x <- exp(seq.int(log(from), log(to), length.out = n)) # if the log argument contains 'x', or
x <- seq.int(from, to, length.out = n) # if the log argument does not contain 'x'
# in our case from and to are 1 and 100 respectively
As long as the n argument is the default the x vector will contain 101 elements. Obviously the x in log(1+x) is totally different to the 'x' in the log argument.
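As for y, it is always created as (from the source code; note that ll here is a list internal to curve that holds the x vector, not the loop variable of the example):
y <- eval(expr, envir = ll, enclos = parent.frame()) # where expr is in this case log(1+x); the others are not important to analyse now.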
#i.e. you get a y value for each x value on the x vector which was calculated just previously
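The two grids and the evaluation step can be reproduced by hand and compared with what curve() returns. This is a sketch under the assumption that curve() builds x exactly as quoted above; the names x_lin, x_log, y_log and res are made up.

```r
from <- 1; to <- 100; n <- 101   # n = 101 is curve()'s default

# Grid used when the log argument does not contain "x": evenly spaced
x_lin <- seq.int(from, to, length.out = n)

# Grid used when it does: evenly spaced in log(x), then transformed back
x_log <- exp(seq.int(log(from), log(to), length.out = n))

# y is the expression evaluated with x bound to the grid
y_log <- eval(quote(log(1 + x)), envir = list(x = x_log))

# Compare with what curve() itself returns (invisibly)
res <- curve(log(1 + x), from, to, log = "x")
all.equal(res$x, x_log)
all.equal(res$y, y_log)
```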
Second, what is the purpose of the log argument?
The log argument decides which axes are drawn on a log scale: the x-axis if the log argument is 'x', the y-axis if it is 'y', both axes if it is 'xy', and neither if it is ''.
Note that the log scaling of the axes is done by the plot function, which curve calls internally; in that sense curve is only a wrapper for plot.
This is why, when the log argument contains 'x' (see above), the x vector is built as the exponential of an evenly spaced sequence of logs: the points then come out evenly spaced once plot draws the x-axis on a log scale.
P.S. the source code for the curve function can be seen by typing graphics::curve in the console.
I hope this makes a bit of sense now!

Exclude Node in semPaths {semPlot}

I'm trying to plot a SEM path diagram with R.
I'm using an .out file produced by Mplus with semPaths {semPlot}.
It seems to work, but I want to remove some latent variables and I don't know how.
I am using the following syntax:
Out from Mplus : https://www.dropbox.com/s/vo3oa5fqp7wydlg/questedMOD2.out?dl=0
outfile1 <- "questedMOD.out"
semPaths(outfile1,what="est", intercepts=FALSE, rotation=4, edge.color="black", sizeMan=5, esize=TRUE, structural="TRUE", layout="tree2", nCharNodes=0, intStyle="multi" )
There may be an easier way to do this (and ignoring if it is sensible to do it) - one way you can do this is by removing nodes from the object prior to plotting.
Using the Mplus example from your question Rotate Edges in semPaths/qgraph
library(qgraph)
library(semPlot)
library(MplusAutomation)
# This downloads an output file from Mplus examples
download.file("http://www.statmodel.com/usersguide/chap5/ex5.8.out",
outfile <- tempfile(fileext = ".out"))
# Unadjusted plot
s <- semPaths(outfile, intercepts = FALSE)
In the above call to semPaths, outfile is of class character, so the line (near the start of code for semPaths)
if (!"semPlotModel" %in% class(object))
object <- do.call(semPlotModel, c(list(object), modelOpts))
returns the object from semPlot:::semPlotModel.mplus.model(outfile). This is of class "semPlotModel".
So the idea is to create this object first, amend it and then pass this object to semPaths.
# Call semPlotModel on your Mplus file
obj <- semPlot:::semPlotModel.mplus.model(outfile)
# obj <- do.call(semPlotModel, list(outfile)) # this is more general / not just for Mplus
# Remove one factor (F1) from obj@Pars - need to check lhs and rhs columns
idx <- apply(obj@Pars[c("lhs", "rhs")], 1, function(i) any(grepl("F1", i)))
obj@Pars <- obj@Pars[!idx, ]
class(obj)
obj is now of class "semPlotModel" and can be passed directly to semPaths
s <- semPaths(obj, intercepts = FALSE)
You can use str(s) to see the structure of this returned object.
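The row-filtering step can be tried without Mplus or semPlot on a toy table. The data frame pars below is hypothetical; it only mimics the lhs/rhs layout of the Pars slot of the model object.

```r
# Hypothetical parameter table with the same lhs/rhs layout as the Pars slot
pars <- data.frame(
  lhs = c("F1", "F1", "F2", "F2", "F1"),
  rhs = c("Y1", "Y2", "Y3", "Y4", "F2"),
  stringsAsFactors = FALSE
)

# TRUE for every row that mentions F1 on either side
idx <- apply(pars[c("lhs", "rhs")], 1, function(i) any(grepl("F1", i)))
kept <- pars[!idx, ]
kept

# Stricter variant: exact match, so a factor named "F10" would not be caught
idx_exact <- pars$lhs == "F1" | pars$rhs == "F1"
identical(unname(idx), idx_exact)
```

grepl("F1", ...) is a pattern match, so in a model with ten or more factors the exact comparison may be the safer choice.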
Assuming that you use the following semPaths code to plot your SEM
library(magrittr) # provides the %>% pipe
semPaths(obj, intercepts = FALSE) %>%
plot()
you can use the following function to remove any node by its label:
remove_nodes_and_edges <- function(semPaths_obj,node_tbrm_vec){
relevent_nodes_index <- semPaths_obj$graphAttributes$Nodes$names %in% node_tbrm_vec
semPaths_obj$graphAttributes$Nodes$width[relevent_nodes_index]=0
semPaths_obj$graphAttributes$Nodes$height[relevent_nodes_index]=0
semPaths_obj$graphAttributes$Nodes$labels[relevent_nodes_index]=""
return(semPaths_obj)
}
and use this function in the plotting pipe in the following way
semPaths(obj, intercepts = FALSE) %>%
remove_nodes_and_edges(c("Y1","Y2","Y3")) %>%
plot()

r - Add text to each lattice histogram with panel.text but has error "object x is missing"

In the following R code, I try to create 30 histograms for the variable allowed.clean by the factor zip_cpt (which has 30 levels).
For each of these histograms, I also want to add the mean and the sample size; they need to be calculated for each level of the factor zip_cpt. So I used panel.text to do this.
After I ran this code, I got an error message inside each histogram which reads "Error using packet 21..."x" is missing, with..." (I am not able to read the whole error message because it is not displayed in full). I guess there's something wrong with the object x. Is it because mean(x) and length(x) don't actually apply to the data at each level of the factor zip_cpt?
I appreciate any help!
histogram(~allowed.clean|zip_cpt,data=cpt.IC_CAB1,
type='density',
nint=100,
breaks=NULL,
layout=c(10,3),
scales= list(y=list(relation="free"),
x=list(relation="free")),
panel=function(x,...) {
mean.values <-mean(x)
sample.n <- length(x)
panel.text(lab=paste("Sample size = ",sample.n))
panel.text(lab=paste("Mean = ",mean.values))
panel.histogram(x,col="pink", ...)
panel.mathdensity(dmath=dnorm, col="black",args=list(mean=mean(x, na.rm = TRUE),sd=sd(x, na.rm = TRUE)), ...)})
A discussion I found online is helpful for adding customized text (e.g., basic statistics) on each of the histograms:
https://stat.ethz.ch/pipermail/r-help/2007-March/126842.html
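One likely cause of the "x is missing" message is that panel.text() is called without coordinates: it needs x and y positions in addition to the label. Below is a minimal sketch with simulated data (the data frame d and its columns value and group are made up), placing the text at the top-left corner of each panel.

```r
library(lattice)

set.seed(1)
d <- data.frame(value = rnorm(300, mean = 10),
                group = gl(3, 100, labels = c("a", "b", "c")))

p <- histogram(~ value | group, data = d, type = "density",
               panel = function(x, ...) {
                 panel.histogram(x, col = "pink", ...)
                 # Panel limits give a position that works in every panel
                 lims <- current.panel.limits()
                 panel.text(x = lims$xlim[1], y = lims$ylim[2],
                            labels = paste0("n = ", length(x),
                                            "\nmean = ", round(mean(x, na.rm = TRUE), 2)),
                            adj = c(0, 1))
               })
print(p)
```

Inside the panel function, x holds only the values belonging to that panel, so length(x) and mean(x) are computed per level of the conditioning factor.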

'x' is a list, but does not have components 'x' and 'y'

I am trying to plot a ROC curve for a multiclass problem, using the multiclass.roc function from the pROC package, but I get this error:
'x' is a list, but does not have components 'x' and 'y'
What does this error mean? Searching the web didn't help me find an answer. I can print the roc object, but cannot plot it.
Thank you!
If you call plot on a list l: plot (l), the x coordinates will be taken from l$x and the y coordinates from l$y. Your list doesn't have elements x and y.
You need to call plot (l$your.x.coordinate, l$your.y.coordinate) instead.
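The behaviour can be reproduced with plain lists. In this sketch the component names fpr and tpr are made up and the values are arbitrary; they stand in for whatever coordinates your object actually holds.

```r
# A list with components x and y can be plotted directly
good <- list(x = 1:10, y = (1:10)^2)
plot(good)

# A list with other component names triggers the error from the question
bad <- list(fpr = seq(0, 1, by = 0.1), tpr = sqrt(seq(0, 1, by = 0.1)))
msg <- tryCatch(plot(bad), error = function(e) conditionMessage(e))
msg

# Passing the two components explicitly works
plot(bad$fpr, bad$tpr, type = "l")
```

names(bad) or str(bad) shows which components a list actually has, which is a quick way to find the ones to pass to plot.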
Another (lazy) approach is to simply use the useful library
install.packages('useful')
library(useful)
Example -
wineUrl <- 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data'
wine <- read.table(wineUrl, header=F, sep=',')
wine_kmeans <- wine[, which(names(wine) != "Cultivar")]
wine_cluster <- kmeans(x=wine_kmeans , centers=3)
plot(wine_cluster, data=wine_kmeans)
