Error message: 'x' and 'y' must have the same length - r

I keep getting the following error message in R whilst trying to run a simple correlation. Can anyone help?
Error message is:
Error in cor.test.default(my_data$Year, my_data$Total, method =
"spearman") : 'x' and 'y' must have the same length
this is the code I am using:
library("dplyr")
library ("ggpubr")
library("devtools")
my_data<- read.csv(file.choose())
set.seed(1234)
dplyr::sample_n(my_data, 10)
ggdensity(my_data$Total,
main = "Density plot of barrier closures",
xlab = "Year ending")
ggqqplot(my_data$Total)
shapiro.test(my_data$Total)
cor.test(my_data$Year, my_data$Total, method = "spearman")
The data I am using has two columns in a CSV file, one labelled "year" and one labelled "total". Both columns have 39 numeric entries, so the lengths of the columns are identical. Every other part of the code works fine. I am using the latest version of R and the latest versions of all the packages.
Edit: Someone asked for my data frame so here it is:
structure(list(ï..Year = 83:121, Total = c(1L, 0L, 0L, 1L, 1L,
0L, 1L, 4L, 2L, 0L, 4L, 7L, 4L, 4L, 1L, 1L, 2L, 6L, 24L, 4L,
20L, 1L, 4L, 3L, 8L, 6L, 5L, 5L, 0L, 0L, 5L, 50L, 1L, 1L, 2L,
3L, 2L, 9L, 6L)), class = "data.frame", row.names = c(NA, -39L
))

As user2554330 rightly stated: You'll get that error if you misspecify one of the column names. As can be seen from the output of dput(my_data), the first column's name is not Year, but ï..Year. The given error does not occur with
cor.test(my_data$ï..Year, my_data$Total, method = "spearman")
(You may be able to remove the merging of this byte order mark with the column name by adding the argument fileEncoding="UTF-8-BOM" in the read.csv() call.)
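A minimal sketch of both points, assuming a small made-up CSV written to a temporary file (the file contents and the variable names tmp, wrong and res are illustrative, not the asker's data). It shows that a misspelled column name returns NULL, which is why cor.test() complains about lengths, and that fileEncoding = "UTF-8-BOM" keeps the byte order mark out of the first column name:

```r
# Build a tiny CSV that begins with a UTF-8 byte order mark (bytes EF BB BF).
# The file contents are made up for illustration.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Year,Total\n83,1\n84,0\n85,4\n86,7\n87,2\n"), con)
close(con)

# "UTF-8-BOM" asks R to drop the mark instead of merging it into the first name
my_data <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(my_data)

# A column name that does not exist yields NULL (length 0),
# so its length cannot match that of the other column
wrong <- my_data$year
length(wrong)
length(my_data$Total)

# With the correct name, the test runs
res <- cor.test(my_data$Year, my_data$Total, method = "spearman")
res$estimate
```

If re-reading the file is not convenient, renaming the column in place with names(my_data)[1] <- "Year" should have the same effect.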

How to specify upper and lower parameter bounds in nlmer models of lme4 using the bobyqa optimizer

I have a dataset to which I want to fit a nonlinear model with random effects. The dataset involves different lines being observed along time. The total number of lines were split up into batches that were executed on different times in the year. When using nlmer(), I ran into issues on how to specify boundaries of parameters when using the bobyqa optimizer.
A simple version of my dataset is as follows:
batch<-c(rep("A",29),rep("B",10),rep("C",10))
line<-c(rep(1:3,9), 1,3,rep(4:5,5),rep(6:7,5))
day<-c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L,
9L, 9L)
result<-c(-2.5336544395728, -2.69179853934892, -2.85649494251061, -4.08634506491338,
-3.57079698958629, -2.62038994824068, -2.69029745619165, -2.18131299587959,
-2.1028751114459, -2.56553968316024, -2.55450633017557, -2.43072209061048,
-2.42496349148255, -2.52850292795008, -1.09958807849945, -1.49448455383069,
-0.461525929110392, -0.298569396331159, -0.520372425046126, -0.676393841451061,
-0.930448741799686, -0.414050789171074, -0.0915696466880981,
-0.239509444743891, -0.319036274966057, -0.189385981406834, -0.376015368786241,
-0.269728570922294, -0.260869642513491, -0.206260420960064, -0.790169432375232,
-0.0573210164472325, -0.202013642441365, -0.0853200223702248,
-0.13422881481997, 0.0831839881028635, -0.0288333371533044, 0.124233139837959,
-0.16906818823674, -0.299957519185662, -0.085547531863026, 0.00929447912629702,
-0.117359669415726, -0.0764263122102468, -0.00718772329252618,
0.0110076995240481, -0.0304444368953004, 0.0586926009563272,
-0.0457574905606751)
data <- data.frame(day, line, batch, result)
data$line<-as.factor(data$line)
data$batch<-as.factor(data$batch)
The nlmer() function of lme4 allows for complex random effects to be specified. I use bobyqa as optimizer, to avoid convergence issues:
#defining the function needed for nlmer()
nform <- ~ z-(p0*(z-Za))/(p0+(1-p0)*(1/(1+s))^day)
nfun <- deriv(nform, namevec = c("z","p0","Za","s"),
function.arg = c("day", "z","p0","Za","s"))
nlmerfit = nlmer(log10perfract ~ nfun(day, z, p0, Za, s) ~
(z+s+Za|batch),
data = data,
start= coef(nlsfit),
control= nlmerControl(optimizer = "bobyqa"))
However, specifying upper and lower limits does not work (with nlme or nls, no issues whatsoever) :
Error in nlmerControl(optimizer = "bobyqa", lower = lower_bounds,
upper = upper_bounds) : unused arguments (lower = lower_bounds,
upper = upper_bounds)
When specifying these bounds in an optCtrl argument as a list, R returns that my starting values violate the bounds (which they do not?):
nlmerfit = nlmer(log10perfract ~ nfun(day, z, p0, Za, s) ~
(z+s+Za|batch),
data = data,
start= coef(nlsfit),
control= nlmerControl(optimizer = "bobyqa",
optCtrl = list(lower = lower_bounds,
upper = upper_bounds)
)
)
Error in (function (par, fn, lower = -Inf, upper = Inf, control =
list(), : Starting values violate bounds
I need these bounds to be working as my real data is even a bit more complex (containing different groups of data for which the bounds are needed to allow a fit).

Rotating y axis labels with mosaic plots WITHOUT overlap

This question is extremely similar to this one, but it approaches the problem from another angle that has not been answered.
Following the proposed code, I am able to generate mosaic plots and rotate the labels so that they are legible. The problem is that the mosaic() function from the vcd package does not seem to recognise the rotation, so it does not adapt the graph to fit the labels, yielding results like the following:
Is there any way to change the margins between the labels and the titles? I would be surprised if I am the first one that has encountered this issue. I am open to using other packages to get mosaic graphs if applicable as well.
Code
aux = structure(c(0L, 0L, 3L, 46L, 107L, 14L, 0L, 0L, 4L, 0L, 0L, 2L,
9L, 0L, 23L, 2L, 1L, 3L, 14L, 1L, 8L, 26L, 6L, 11L, 6L, 1L, 6L,
0L, 1L, 1L, 29L, 10L, 62L, 1L, 3L, 1L, 1L, 3L, 1L), .Dim = c(3L,
13L), .Dimnames = list(abcdefghi = c("Madrid", "Valencia", "Granada"
), jklmnopqr = c("roknbjftxcwl", "mfchldbxuyig", "gtyoxeduijpw",
"akbcefymvsiw", "ucbfxplietqk", "mzeykauprfdh", "piermgawyjht",
"chjvatqbylxo", "merhcogjflbd", "wiyrugvmhjlq", "glszdqmjhkov",
"giowaxrtsknm", "pxucytzvljqw")), class = "table")
library(vcd)
colours = c("brown","darkgreen","darkgrey","orange","darkred","gold","blue","red",
"white","pink","purple","navy","lightblue","green","peachpuff","violet","yellow","yellow4")
aux_names = names(attr(aux,"dimnames"))
mosaic(aux,main=paste(aux_names,collapse=" vs. "),
gp=gpar(fill=matrix(sample(colours,max(nrow(aux),ncol(aux))),1,max(nrow(aux),ncol(aux)))),
pop = FALSE,labeling = labeling_border(rot_labels=c(90,0,0,0),
just_labels=c("left","right")))
This code should do what I think you're after.
mosaic(aux,main=paste(aux_names,collapse=" vs. "),
gp=gpar(fill=matrix(sample(colours,max(nrow(aux),ncol(aux))),1,max(nrow(aux),ncol(aux)))),
pop = FALSE,labeling = labeling_border(rot_labels=c(90,0,0,0),
just_labels=c("left","right"),
offset_varnames = c(8,8,8,8)),
margins = c(10, 10, 10, 10))

Plotting multiple effect plots from logistic regression

I have a number of logistic regression models with different response variables but the same predictor variables. I want to use grid.arrange (or anything else) to make a single figure with all these effect plots that were made with the effects package. I followed the advice here to make such a graph: grid.arrange with John Fox's effects plots
library(effects)
library(gridExtra)
data <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L,1L, 1L, 2L, 2L, 2L), .Label = c("group1", "group2"), class = "factor"),obs = c(1L, 1L, 4L, 4L, 6L, 12L, 26L, 1L, 10L, 6L),responseA = c(1L, 1L, 2L, 0L, 1L, 10L, 20L, 0L, 3L, 2L), responseB = c(0L, 0L, 2L, 4L, 6L, 4L, 8L, 1L, 8L, 5L)), .Names = c("group", "obs", "responseA","responseB"), row.names = c(53L, 54L, 55L, 56L, 57L, 58L,59L, 115L, 116L, 117L), class = "data.frame")
model1<-glm(cbind(responseA,(obs-responseA))~group,family=binomial, data=data)
model2<-glm(cbind(responseB,(obs-responseB))~group,family=binomial, data=data)
ef1 <-allEffects(model1)[[1]]
ef2 <- allEffects(model2)[[1]]
elist <- list( ef1,ef2)
class(elist) <- "efflist"
plot(elist, col=2)
The problem is that the models use the response variable in the form cbind(response A, no response A), but for the figure I would like to change it to something cleaner (like Response A). I tried changing the y labels by passing a vector of labels, but got a warning, and it turned both labels into "Response A".
plot(elist, ylab=c("response A","response B"),col=2)
I then tried the second suggested method of changing the class to trellis, but got an error, so grid.arrange didn't work either.
p1<-plot(allEffects(model1),ylab="Response A")
p2<-plot(allEffects(model2),ylab="Response B")
class(p1) <- class(p2) <- "trellis"
grid.arrange(p1, p2, ncol=2)
Can anyone provide a method to change each y-axis label separately?
With the ef1 and ef2 variables you created, you can try the following
plot1 <- plot(ef1, ylab = "Response A")
plot2 <- plot(ef2, ylab = "Response B")
grid.arrange(plot1, plot2, ncol=2)

Data not ordering by date using as.Date in R

I have a data set with a date column like this:
dateCol other column
"2013/11/12" some data
"2012/05/02" more data
"2013/09/22" etc
"" etc
"2013/09/17" etc
When I try to order the data frame by this column (dateCol) by date, it just does nothing. I tried several variations; my last attempt was:
mydata<-mydata[with(mydata, order(as.Date(mydata[,dateCol], format="%y/%m/%d"))),]
But it is not working. Any ideas?
Thanks in advance!
You need to provide the right format of the dates for the conversion to succeed. In this case you need "%Y" with a capital Y for data with years including the centuries.
Try
sort(as.Date(mydata[,"dateCol"], format="%Y/%m/%d"))
#[1] "2011-07-13" "2011-08-21" "2012-05-02" "2012-07-02" "2012-07-17" "2013-01-29"
#[7] "2013-08-19" "2013-09-17" "2013-09-22" "2013-11-12" "2014-04-02"
data
mydata <-structure(list(dateCol = structure(c(1L, 11L, 4L, 10L, 1L, 1L,
1L, 1L, 1L, 9L, 6L, 5L, 12L, 1L, 1L, 8L, 1L, 7L, 3L, 2L),
.Label = c("", "2011/07/13", "2011/08/21", "2012/05/02",
"2012/07/02", "2012/07/17", "2013/01/29", "2013/08/19",
"2013/09/17", "2013/09/22", "2013/11/12", "2014/04/02"),
class = "factor")), .Names = "dateCol",
row.names = c(NA, -20L), class = "data.frame")
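To reorder the whole data frame rather than just the date vector, the parsed dates can be passed to order(). This is a small sketch using the five example rows from the question (the names parsed and sorted are illustrative). Blank entries become NA when parsed, and order() places NA values last by default:

```r
# The five example rows from the question, kept as plain character strings
mydata <- data.frame(
  dateCol = c("2013/11/12", "2012/05/02", "2013/09/22", "", "2013/09/17"),
  other   = c("some data", "more data", "etc", "etc", "etc"),
  stringsAsFactors = FALSE
)

# Lowercase %y expects a two-digit year, so it does not fit "2013/11/12"
as.Date("2013/11/12", format = "%y/%m/%d")

# Capital %Y matches the four-digit year; the empty string is parsed as NA
parsed <- as.Date(mydata$dateCol, format = "%Y/%m/%d")

# Reorder the rows by date; rows with a missing date end up at the bottom
sorted <- mydata[order(parsed), ]
sorted
```

If the rows with missing dates should be dropped instead, order(parsed, na.last = NA) removes them from the result.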

How to add multiple data series to a scatterplot and how to format numbers to appear in standard form on y axis

My data set:
structure(list(Site = c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L, 6L), Average.worm.weight..g. = c(0.1934,
0.249, 0.263, 0.262, 0.4186, 0.204, 0.311, 0.481, 0.326, 0.657,
0.347, 0.311, 0.239, 0.4156, 0.31, 0.3136, 0.4033, 0.302, 0.277
), Average.total.immune.cell.count = structure(c(8L, 16L, 11L,
12L, 10L, 1L, 4L, 15L, 4L, 3L, 17L, 13L, 18L, 7L, 5L, 6L, 9L,
14L, 2L), .Label = c("0", "168750", "18650000", "200,000", "21,600,000",
"226666.6", "22683333.33", "2533333.33", "283333.333", "291666.6",
"335833.3", "435800", "474816666.7", "500000", "6450000", "729166.667",
"7433333.3", "9916667"), class = "factor"), Average.eleocyte.number = structure(c(2L,
5L, 14L, 10L, 1L, 1L, 6L, 1L, 6L, 7L, 1L, 9L, 15L, 8L, 12L, 3L,
11L, 13L, 4L), .Label = c("0", "1266666.67", "153333.3", "168740",
"17", "200,000", "2266666.667", "22683333.33", "23116666.67",
"264000", "283333.333", "442", "500000", "7.3", "9916667"), class = "factor")), .Names = c("Site",
"Average.worm.weight..g.", "Average.total.immune.cell.count",
"Average.eleocyte.number"), class = "data.frame", row.names = c(NA,
-19L))
This is my R script so far:
# Plotting multiple data series on a graph
y1<-dframe1$"Average.total.immune.cell.count"
y2<-dframe1$"Average.eleocyte.number"
x<-dframe1$"Average.worm.weight..g."
plot.default(y1~x,type="p" )
points(y2~x)
I am trying to add two y series to the same scatterplot and I am struggling to do so. I want to have different symbols for the points so as to tell the two data series apart. I would also like the axes to meet at the bottom left-hand corner and would appreciate being told how I can do that. Finally, I would like the y axis to be in standard form, but do not know how to get R to do that.
Best regards.
K.
So this is an object lesson in getting your data into the correct format to begin with. Your numbers have commas, which R does not like. Hence the numbers get converted to character and imported as factors (which your structure(...) clearly shows). You need to fix that, or better yet get rid of the commas prior to exporting.
Something like this will work
colnames(dframe) <- c("Site","x","y1","y2")
dframe$y1 <- as.numeric(as.character(gsub(",","",dframe$y1,fixed=TRUE)))
dframe$y2 <- as.numeric(as.character(gsub(",","",dframe$y2,fixed=TRUE)))
plot(y1~x,dframe, col="red", pch=20)
points(y2~x,dframe, col="blue", pch=20)
But there are additional problems. One of the numbers (in row 12) is a factor of 10 larger than all the others, so the plot above is not very informative. It's hard to know if this is a data input error, or a genuine outlier in your data.
EDIT: Response to OP's comment
dframe <- dframe[-12,] # remove row 12
dframe <- dframe[order(dframe$x),] # order by increasing x
plot(y1~x,dframe, col="red", pch=20, type="b")
points(y2~x,dframe, col="blue", pch=20, type="b")
legend("topleft",legend=c("y1","y2"),col=c("red","blue"),pch=20)
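The question also asked for different symbols per series, axes that meet at the bottom left, and a y axis in standard form. One possible base-graphics sketch follows, using made-up toy values rather than the OP's data (x, y1, y2 and ticks are illustrative names). It relies on xaxs = "i" and yaxs = "i" to remove the padding around the axis limits, and on format(..., scientific = TRUE) to label the ticks:

```r
# Toy values standing in for the cleaned columns (hypothetical, not the OP's data)
x  <- c(0.19, 0.25, 0.31, 0.42, 0.66)
y1 <- c(2533333, 729167, 200000, 291667, 18650000)
y2 <- c(1266667, 17, 200000, 0, 2266667)

pdf(NULL)  # null device so the sketch also runs non-interactively
plot(x, y1, pch = 16, col = "red",
     xlim = c(0, 1.05 * max(x)), ylim = c(0, 1.05 * max(y1, y2)),
     xaxs = "i", yaxs = "i",  # no padding, so the axes meet at the bottom-left corner
     yaxt = "n",              # suppress the default y axis; it is drawn manually below
     xlab = "Average worm weight (g)", ylab = "Cell count")
points(x, y2, pch = 17, col = "blue")  # a different symbol for the second series

# Draw the y axis with tick labels in scientific (standard form) notation
ticks <- axTicks(2)
axis(2, at = ticks, labels = format(ticks, scientific = TRUE))
legend("topleft", legend = c("y1", "y2"), pch = c(16, 17), col = c("red", "blue"))
dev.off()
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped so that the plot appears on screen.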
