Bland-Altman plot from CSV file in Rscript

I have a simple CSV file containing 2 columns of numbers with the headers "Colli_On" and "Colli_Off". I have written a simple Rscript that takes 3 arguments (the file name and the two column names) and should produce a Bland-Altman plot. However, I get the following error message:
> Error in plot.window(...) : need finite 'xlim' values
Calls: baplot ... do.call -> plot -> plot.default -> localWindow -> plot.window
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Where am I going wrong?
#!/usr/bin/Rscript
# -*- mode: R -*-
#script passes 3 arguments filename and 2 columns and does bland altman analysis
#Example BA /home/moadeep/Data/sehcat.csv Colli_on Colli_off
args <- commandArgs(TRUE)
mydata <- read.csv(file=args[1],head=TRUE,sep="\t")
baplot = function(x, y){
  bamean = (x + y)/2
  badiff = (y - x)
  plot(badiff ~ bamean, pch=20, xlab="mean", ylab="difference")
  # in the following, the deparse(substitute(varname)) is what retrieves the
  # name of the argument as data
  title(main=paste("Bland-Altman plot of collimator x and y\n",
                   deparse(substitute(x)), "and", deparse(substitute(y)),
                   "standardized"), adj=0.5)
  # construct the reference lines on the fly: no need to save the values in new
  # variable names
  abline(h = c(mean(badiff), mean(badiff) + 1.96 * sd(badiff),
               mean(badiff) - 1.96 * sd(badiff)), lty=2)
}
pdf(file="test.pdf")
baplot(mydata$args[2],mydata$argss[3])
dev.off()

The problem is with this line:
baplot(mydata$args[2],mydata$argss[3])
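Apart from the typo (argss), the problem is that $ does not evaluate what follows it: when you ask for mydata$args[2], R looks for a column literally named "args" in your data.frame. There is no such column, so you get NULL (and NULL[2] is still NULL), which leaves plot with no data to work with. The programmatic way of extracting a column whose name is stored in a variable is [. The correct syntax should be:
baplot(mydata[, args[2]], mydata[, args[3]])
Selecting a single column this way returns a plain numeric vector; mydata[args[2]] without the comma would instead return a one-column data.frame, which the arithmetic and sd() inside baplot do not handle as intended. That should fix your problem.
(Also note that the [ operator, unlike $, will throw an error if you try to extract a column that does not exist, a preferable feature IMHO. Column names are case-sensitive, so the names you pass must match the header exactly, e.g. Colli_On rather than Colli_on.)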
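To check the column selection without the command line, here is a minimal sketch on a small made-up data frame standing in for the CSV file. The helper name ba_stats and the toy values are hypothetical, and in the real script the column names would come from commandArgs(TRUE). It uses mydata[, name], which returns a plain vector, and shows the difference in behaviour: $ silently returns NULL for an unknown name, while [ stops with an error.

```r
# Hypothetical helper: Bland-Altman quantities for two numeric vectors
ba_stats <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  badiff <- y - x
  bias <- mean(badiff)
  s <- sd(badiff)
  list(mean = (x + y) / 2, diff = badiff, bias = bias,
       lower = bias - 1.96 * s, upper = bias + 1.96 * s)
}

# Made-up data standing in for the CSV file
mydata <- data.frame(Colli_On = c(10, 12, 11, 13),
                     Colli_Off = c(11, 12, 13, 12))
args <- c("data.csv", "Colli_On", "Colli_Off")  # stand-in for commandArgs(TRUE)

# $ does not evaluate 'args': it looks for a column literally called "args"
null_col <- mydata$args

# [ with a comma returns a plain vector and fails loudly on an unknown name
res <- ba_stats(mydata[, args[2]], mydata[, args[3]])
bad <- try(mydata[, "Colli_on"], silent = TRUE)  # wrong case: error, not NULL

pdf(file = tempfile(fileext = ".pdf"))
plot(res$mean, res$diff, pch = 20, xlab = "mean", ylab = "difference")
abline(h = c(res$bias, res$lower, res$upper), lty = 2)
invisible(dev.off())
```

Keeping the statistics in a separate function from the plotting makes it easy to check the bias and limits of agreement numerically before drawing anything.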

Related

Plotting data in R: Error in plot.window(...) : need finite 'xlim' values

Trying to plot some data in R - I am a basic user and teaching myself. However, whenever I try to plot, it fails, and I am not sure why.
> View(Pokemon_BST)
> Pokemon_BST <- read.csv("~/Documents/Pokemon/Pokemon_BST.csv")
> View(Pokemon_BST)
> plot("Type_ID", "Gender_ID")
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf
This is my code, but could it be an issue with my .csv file? I have assigned numbers to the "Type_ID" and "Gender_ID" columns: Type_ID has values between 1 and 20; Gender_ID has 1 for male, 2 for female, and 3 for both. Both ID columns contain only numeric values, nothing more.
I then tried using the barplot function, and this error occurred:
> barplot("Gender_ID", "Type_ID")
Error in width/2 : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(width) : argument is not numeric or logical: returning NA
There are no missing values, no characters within these columns, nothing that SHOULD cause an error according to my basic knowledge. I am just not sure what is going wrong.
It seems to me that you are giving the plot function the wrong inputs.
For the x and y axes, plot expects numeric vectors, but you are only providing a single string for each. The function does not know that "Type_ID" and "Gender_ID" are columns of the Pokemon_BST data frame; it tries to convert the strings themselves to numbers, gets NA (hence "NAs introduced by coercion"), and is then left with no finite values to set the axis limits from.
To reach your data you must tell R which object the columns come from. You do this by indexing the data frame with double square brackets and writing the name of the column inside them.
Pokemon_BST <- read.csv("~/Documents/Pokemon/Pokemon_BST.csv")
View(Pokemon_BST)
# Refer to the columns of the data frame
plot(Pokemon_BST[["Type_ID"]], Pokemon_BST[["Gender_ID"]])
# barplot() expects a vector of bar heights (its second argument is the bar width),
# so plot e.g. the counts per gender
barplot(table(Pokemon_BST[["Gender_ID"]]))
See also the help page ?Extract for an introduction to subsetting in R.
The problem is how you're passing the values to the plot function. In your code above, "Gender_ID" is just some string and the plot function doesn't know what to do with that. One way to plot your values is to pass the vectors Pokemon_BST$Gender_ID and Pokemon_BST$Type_ID to the function.
Here's a sample dataframe with the plot you were intending.
Pokemon_BST <- data.frame(
Type_ID = sample(1:20, 10, replace = TRUE),
Gender_ID = sample(1:3, 10, replace = TRUE))
plot(Pokemon_BST$Gender_ID, Pokemon_BST$Type_ID)
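To see where the error message comes from, here is a small sketch using the same kind of made-up data frame as above (set.seed is added only to make the random values reproducible). It shows that coercing the string "Type_ID" to a number gives NA, so plot() has nothing finite to compute min() and max() from, while passing the columns themselves works.

```r
set.seed(42)  # only to make the random toy data reproducible
Pokemon_BST <- data.frame(
  Type_ID = sample(1:20, 10, replace = TRUE),
  Gender_ID = sample(1:3, 10, replace = TRUE))

# What plot("Type_ID", "Gender_ID") effectively does: coerce the strings to numbers
coerced <- suppressWarnings(as.numeric("Type_ID"))
print(coerced)  # NA, so there is nothing finite to take min()/max() of

pdf(file = tempfile(fileext = ".pdf"))
# Capture the error instead of stopping, to show the message
msg <- suppressWarnings(tryCatch(plot("Type_ID", "Gender_ID"),
                                 error = function(e) conditionMessage(e)))
# Passing the columns themselves works
plot(Pokemon_BST[["Type_ID"]], Pokemon_BST[["Gender_ID"]],
     xlab = "Type_ID", ylab = "Gender_ID")
invisible(dev.off())
```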

Will only plot Factors?

I'm working on a new data set and I always start it with
options(stringsAsFactors = FALSE)
The problem I'm having now is that R will only plot my data if the stringsAsFactors option is set to TRUE.
Whenever I try to plot with stringsAsFactors = FALSE, I get the following error message:
plot(Data$Jobs, Data$RXH)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
But when I set stringsAsFactors to TRUE, it plots without any problem...
This is the script.
#Setting WD.
getwd()
setwd("C:/Windows/System32/config/systemprofile/Documents/R proj")
options(stringsAsFactors = F)
get <- read.csv("WorkExcelR.csv", header = TRUE, sep = ",")
Data <- na.omit(get)
And these are Data$Jobs and Data$RXH:
> Data$Jobs
[1] "Playstation" "RWC Heineken" "Jagermeister" "RWC Heineken"
[5] "RWC Heineken" "RWC Heineken"
> Data$RXH
[1] 90 90 100 90 90 90
The problem you are illustrating stems from the fact that there is a plot.factor function but no plot.character function. You can see the available plot methods by typing:
methods(plot)
This is not particularly well described in the help page for ?plot, but there is a separate help page for ?plot.factor. Functions in R are dispatched on the basis of their arguments: S3 functions on the basis only of the class of their first argument and S4 methods on the basis of their argument signatures. In a sense the plot.factor function elaborates on that strategy, because it then dispatches to different plotting routines based on the second argument's class as well, assuming it is matched by position or named y.
You have a few choices: call the method directly, which must be done with the ::: operator since plot.factor is not exported; do the coercion yourself; or call a more specific plotting function:
graphics:::plot.factor(Data$Jobs, Dat
plot(factor(Data$Jobs), Data$RXH)
boxplot(Data$RXH ~Data$Jobs) # which is the result if x is factor and y is numeric
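A self-contained sketch of the dispatch point, rebuilding Data from the values printed in the question: there is a plot.factor method but no plot.character method, so a character x falls through to plot.default, which coerces the job names to NA. utils::getS3method(..., optional = TRUE) is used here only to check which methods exist; it returns NULL when there is none.

```r
# Data rebuilt from the values shown in the question
Data <- data.frame(
  Jobs = c("Playstation", "RWC Heineken", "Jagermeister",
           "RWC Heineken", "RWC Heineken", "RWC Heineken"),
  RXH = c(90, 90, 100, 90, 90, 90),
  stringsAsFactors = FALSE)

# Which S3 methods exist? (NULL means "no such method")
has_factor_method <- !is.null(utils::getS3method("plot", "factor", optional = TRUE))
has_character_method <- !is.null(utils::getS3method("plot", "character", optional = TRUE))

pdf(file = tempfile(fileext = ".pdf"))
# Character x: dispatches to plot.default, which coerces "Playstation" etc. to NA
msg <- suppressWarnings(tryCatch(plot(Data$Jobs, Data$RXH),
                                 error = function(e) conditionMessage(e)))
# Coercing to factor yourself selects plot.factor, which draws a boxplot
plot(factor(Data$Jobs), Data$RXH)
# Or call the specific plot type directly
boxplot(RXH ~ Jobs, data = Data)
invisible(dev.off())
```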

Screeplot in R with psych package

I have computed a PCA with the principal function in the psych package in R. I would like to build a scree plot from the eigenvalues, but both scree(PCA) and screeplot(PCA) give me errors and no plot. Is there a function within this package that I'm not aware of? (I have very, very little R experience.)
NOTE: I've been simply working in the command line.
Error for scree(PCA):
Error in if (nvar != dim(rx)[1]) { : argument is of length zero
Error for screeplot(PCA):
Error in plot.window(xlim, ylim, log = log, ...) :
need finite 'xlim' values
In addition: Warning messages:
1: In min(w.l) : no non-missing arguments to min; returning Inf
2: In max(w.r) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Without data it is hard for us to check this. The error message looks like the data is empty.
Here are some tips for R beginners.
Try getting help on the scree function: are you missing a parameter? Type at the command line:
help(scree)
Look at your variable PCA
head(PCA) - shows first few rows of your data
str(PCA) - shows structure of the variable. Is it what scree function is expecting?
Do you have missing values or text values in your data? The function may be thrown off by these. You can drop missing data: take a look at complete.cases. is.na() is how you check for NA values (e.g. to check for NAs in a variable mydata, sum(is.na(mydata)) tells you how many there are). Drop those rows and see if that gets your scree function working.
Take a look at the vignette for the package:
https://cran.r-project.org/web/packages/psych/vignettes/overview.pdf
Hope this gets you on track.
Did you enter a correlation matrix as your input to the scree( ) function?
Using my own data, I was able to generate a scree plot with the following two lines of code:
humor_cor <- cor(humor, use = "pairwise.complete.obs")
scree(humor_cor, factors = FALSE)
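If psych::scree() keeps failing, a scree plot can also be drawn in base R, since it is just the eigenvalues of the correlation matrix plotted against component number. This is a minimal sketch using a few columns of the built-in mtcars data as a stand-in for your own data (an assumption; swap in yours). The last plot uses stats::screeplot(), which works on a prcomp fit because it reads the fit's sdev component; an object returned by psych::principal() presumably lacks that component, which would fit the empty-plot error above, but that is an inference rather than something verified here.

```r
# Stand-in data: a few numeric columns of the built-in mtcars data set
dat <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]

# Correlation matrix (pairwise deletion tolerates scattered NAs, as in the answer)
R <- cor(dat, use = "pairwise.complete.obs")
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values  # sorted, largest first

pdf(file = tempfile(fileext = ".pdf"))
plot(seq_along(ev), ev, type = "b", xlab = "Component", ylab = "Eigenvalue",
     main = "Scree plot")
abline(h = 1, lty = 2)  # reference line at eigenvalue 1 (Kaiser criterion)

# stats::screeplot() on a prcomp fit plots the same eigenvalues (sdev^2)
pca <- prcomp(dat, scale. = TRUE)
screeplot(pca, type = "lines")
invisible(dev.off())
```

The eigenvalues of a correlation matrix sum to the number of variables, which is a quick sanity check on the input.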

How to perform clustering without removing rows where NA is present in R

I have data that contain some NA values.
What I want to do is perform clustering without removing the rows where NA is present.
I understand that the Gower distance measure in daisy allows for this.
But why doesn't my code below work?
I also welcome alternatives to 'daisy'.
# plot heat map with dendrogram together.
library("gplots")
library("cluster")
# Arbitrarily assigning NA to some elements
mtcars[2,2] <- "NA"
mtcars[6,7] <- "NA"
mydata <- mtcars
hclustfunc <- function(x) hclust(x, method="complete")
# Initially I wanted to use this but it didn't take NA
#distfunc <- function(x) dist(x,method="euclidean")
# Try using daisy GOWER function
# which suppose to work with NA value
distfunc <- function(x) daisy(x,metric="gower")
d <- distfunc(mydata)
fit <- hclustfunc(d)
# Perform clustering heatmap
heatmap.2(as.matrix(mydata),dendrogram="row",trace="none", margin=c(8,9), hclust=hclustfunc,distfun=distfunc);
The error message I got is this:
Error in which(is.na) : argument to 'which' is not logical
Calls: distfunc.g -> daisy
In addition: Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion
3: In daisy(x, metric = "gower") :
binary variable(s) 8, 9 treated as interval scaled
Execution halted
At the end of the day, I'd like to perform hierarchical clustering with the NA allowed data.
Update
Converting with as.numeric works for the example above.
But why does this code fail when the data are read from a text file?
library("gplots")
library("cluster")
# This time read from file
mtcars <- read.table("http://dpaste.com/1496666/plain/",na.strings="NA",sep="\t")
# Following suggestion convert to numeric
mydata <- apply( mtcars, 2, as.numeric )
hclustfunc <- function(x) hclust(x, method="complete")
#distfunc <- function(x) dist(x,method="euclidean")
# Try using daisy GOWER function
distfunc <- function(x) daisy(x,metric="gower")
d <- distfunc(mydata)
fit <- hclustfunc(d)
heatmap.2(as.matrix(mydata),dendrogram="row",trace="none", margin=c(8,9), hclust=hclustfunc,distfun=distfunc);
The error I get is this:
Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Error in hclust(x, method = "complete") :
NA/NaN/Inf in foreign function call (arg 11)
Calls: hclustfunc -> hclust
Execution halted
The error is due to the presence of non-numeric variables in the data (numbers encoded as strings).
You can convert them to numbers:
mydata <- apply( mtcars, 2, as.numeric )
d <- distfunc(mydata)
Using as.numeric may help in this case, but I do think that the original question points to a bug in the daisy function. Specifically, it has the following code:
if (any(ina <- is.na(type3)))
stop(gettextf("invalid type %s for column numbers %s",
type2[ina], pColl(which(is.na))))
The intended error message is not printed, because which(is.na) is wrong. It should be which(ina).
I guess I should find out where / how to submit this bug now.
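A minimal sketch of the numeric fix applied to the first example: assign a real NA rather than the string "NA", so the columns stay numeric and daisy()'s Gower distance can skip the missing entries. It assumes the cluster package (a recommended package shipped with standard R installations); heatmap.2 from gplots is left out so the sketch runs without extra packages.

```r
library(cluster)  # provides daisy(); shipped with standard R installations

# The question's mistake: a string "NA" turns the whole column into character
bad <- mtcars
bad[2, 2] <- "NA"
bad_type <- class(bad$cyl)  # "character"

# Assigning a real missing value keeps every column numeric
mydata <- mtcars
mydata[2, 2] <- NA
mydata[6, 7] <- NA

# Gower distance ignores variables that are missing for a given pair of rows;
# daisy() warns that the 0/1 columns vs and am are treated as interval scaled
d <- suppressWarnings(daisy(mydata, metric = "gower"))
fit <- hclust(d, method = "complete")
```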

plot.window error in loop for R creating basic plots

I apologize for not generating pseudo data for this question, but I think the problems I am facing are basic to most non-novice individuals on this site. I am attempting to create a loop that plots a scatterplot of x and y for each value of a z variable.
x=rnorm(n=50)
y=rnorm(n=50)
z<-rep(c(1,2,3,4,5),10)
dataset <-cbind(x,y,z)
Dataset<-as.data.frame(dataset)
attach(Dataset)
jpeg()
z <-Dataset$z[1:5]
for(i in 1:5) {
y<-y[z==i]
x <-x[z==i]
ARMAXpath<-file.path("C:", "Desktop", paste("myplot_", z[i], ".jpg", sep=""))
jpeg(file = ARMAXpath)
TheTitle = paste("Scatter Plots", z[i])
plot.new()
plot.window(xlim=c(0,1), ylim=c(5,10))
plot(y,x)
dev.off()
}
detach(Dataset)
No matter what I do I get the same plot.window error. I ran this code with and without attach. I ran it with and without plot.window. I also moved it in and outside the loop.
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
My question is: how do I generate plots of two time series for each value of a third variable in my dataset (e.g. region) and write the output to a folder, as I have poorly attempted to do above?
Why not consolidate all three plot calls into one:
plot(y,x, xlim=c(0,1), ylim=c(5,10))
Some alternative code, using @DWin's comment:
x=rnorm(n=50)
y=rnorm(n=50)
z<-rep(c(1,2,3,4,5),10)
Dataset<-data.frame(x=x,y=y,z=z)
my.plot <- function(x,y,z){
ARMAXpath<-file.path("C:", "Desktop", paste0("myplot_", z, ".jpg"))
jpeg(file = ARMAXpath)
plot(y,x, xlim=c(0,1), ylim=c(5,10))
dev.off()
}
by(Dataset, Dataset$z, function(d) my.plot(d$x,d$y,unique(d$z)))
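For reference, in the question's loop y <- y[z == i] and x <- x[z == i] overwrite the full vectors, so by the third pass x and y contain only NA and plot() has nothing finite to draw. A minimal base-R sketch that subsets into a new object instead is below. It writes to tempdir() and uses pdf() only so that it runs anywhere; both are assumptions standing in for the question's folder and jpeg(), and jpeg(filename = ...) can be used the same way where a bitmap device is available.

```r
set.seed(1)  # reproducible toy data, as in the question
x <- rnorm(n = 50)
y <- rnorm(n = 50)
z <- rep(c(1, 2, 3, 4, 5), 10)
Dataset <- data.frame(x = x, y = y, z = z)

out_dir <- tempdir()  # stand-in for the question's output folder
files <- character(0)

for (g in sort(unique(Dataset$z))) {
  sub <- Dataset[Dataset$z == g, ]  # a NEW object: x, y and Dataset stay intact
  path <- file.path(out_dir, paste0("myplot_", g, ".pdf"))
  pdf(file = path)                  # jpeg(filename = ...) works the same way
  # no fixed xlim/ylim: R picks finite limits from the data
  plot(sub$x, sub$y, xlab = "x", ylab = "y",
       main = paste("Scatter plot, z =", g))
  dev.off()
  files <- c(files, path)
}
```

Fixed limits such as ylim = c(5, 10) do not cause an error, but with standard-normal data they leave almost every point outside the plotting region, so it is safer to let R choose them.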
