(list) object cannot be coerced in clogitLasso - r

I have a problem with the package clogitLasso where I continually get the error "(list) object cannot be coerced to type 'double'"
I've done plenty of searching on this, and there are plenty of ways to pre-convert the data to solve this problem, but no matter what I do it keeps coming up.
I'm not sure what I'm doing wrong here - I can generate data structured exactly like this within R and it runs with the same syntax without any problems, but when I read it in like this it doesn't work.
Using the data (trimmed, but gives the same error): https://pastebin.com/WfB1LJQ2
And the code:
library(clogitLasso)
#Read in data
data <- read.csv('data.txt',sep="\t")
#Data must be sorted so that the
#binary=1 option comes FIRST within the strata
datasorted <- data[order(data$groupid,-data$binary),]
#Convert from a data frame to numericals
X <- as.matrix(datasorted[,1:4])
y <- as.numeric(datasorted[,5])
group <- as.numeric(datasorted[,6])
results <- clogitLasso(X,y,group)
This gives the same error every time. Any tips would be greatly appreciated!

The object y must be of class matrix. Here is the modified code:
library(clogitLasso)
data <- read.csv('WfB1LJQ2.txt',sep="\t", header=T)
datasorted <- data[order(data$groupid,-data$binary),]
X <- as.matrix(datasorted[,1:4])
y <- as.matrix(datasorted[,5])
group <- as.numeric(datasorted[,6])
results <- clogitLasso(X,y,group)
plot(results)
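To see the difference this answer relies on, here is a minimal base R sketch. The small data frame and its column names are hypothetical stand-ins for the pastebin data, and the sketch does not call clogitLasso itself; it only shows that as.numeric yields a plain vector while as.matrix yields a one-column matrix.

```r
# Hypothetical stand-in for the sorted data (column names are assumptions)
datasorted <- data.frame(x1 = c(0.5, 1.2, 0.3, 0.9),
                         binary = c(1, 0, 1, 0),
                         groupid = c(1, 1, 2, 2))

y_vec <- as.numeric(datasorted[, "binary"])  # plain numeric vector, no dim attribute
y_mat <- as.matrix(datasorted[, "binary"])   # n x 1 numeric matrix

is.matrix(y_vec)
is.matrix(y_mat)
dim(y_mat)
```

Checking class() and dim() on X, y and group before the call is a quick way to confirm what is actually being passed.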

Related

R - Error in colMeans(wind.speed, na.rm = T) : 'x' must be numeric

I am trying to import a single column of a text file data set where each file is a single day of data. I want to take the mean of each day's wind speed. Here is the code I have written for that:
daily.wind.speed <- c()
file.names <- dir("C:\\Users\\User Name\\Desktop\\R project\\Sightings Data\\Weather Data", pattern =".txt")
for(i in 1:length(file.names))
{
##import data file into data frame
weather.data <-read.delim(file.names[i])
## extract wind speed column
wind.speed <- weather.data[3]
##Attempt to fix numeric error
##wind.speed.num <- as.numeric(wind.speed)
##Take the column mean of wind speed
daily.avg <- colMeans(wind.speed,na.rm=T)
##Add daily average to list
daily.wind.speed <- c(daily.wind.speed,daily.avg)
##Print for troubleshooting and progress
print(daily.wind.speed)
}
This code seems to work on some files in my data set, but others give me this error during this section of the code:
> daily.avg <- colMeans(wind.speed,na.rm=T)
Error in colMeans(wind.speed, na.rm = T) : 'x' must be numeric
I am also having trouble converting these values to numeric and am looking for options to either convert my data to numeric, or to possibly take the mean in a different way that doesn't encounter this issue.
> as.numeric(wind.speed.df)
Error: (list) object cannot be coerced to type 'double'
weather.data Example
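Even though this is not a reproducible example, the likely problem is that weather.data[3] is a one-column data frame (a list) rather than a numeric vector, and colMeans is a matrix function that fails when that column was not read in as numeric. Extract the column as a vector with weather.data[[3]], convert it to numeric if needed, and replace colMeans with mean.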
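A minimal sketch of taking the mean from a vector instead of a one-column data frame. The data frame is a hypothetical stand-in for one day's file, and the stray "M" entry is an assumption about why some files are read as non-numeric.

```r
# Hypothetical contents of one day's file; the "M" entry is an assumed
# non-numeric marker that forces the column to be read as text
weather.data <- data.frame(time = 1:4,
                           temp = c(10, 11, 12, 13),
                           wind = c("3.5", "4.0", "M", "5.5"),
                           stringsAsFactors = FALSE)

is.data.frame(weather.data[3])   # single brackets keep a data frame
wind.speed <- weather.data[[3]]  # double brackets return the column as a vector

# as.character() first also makes this safe if the column is a factor;
# entries that cannot be parsed become NA
wind.speed.num <- suppressWarnings(as.numeric(as.character(wind.speed)))

daily.avg <- mean(wind.speed.num, na.rm = TRUE)
daily.avg
```

Inside the loop, the same three lines would replace the colMeans call; na.rm = TRUE drops the values that could not be converted.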

Creating a comparative object in R from two dataframes for comparative phylogenetics

I'm trying to read two dataframes into a comparative object so I can plot them using pgls.
I'm not sure what the error being returned means, and how to go about getting rid of it.
My code:
library(ape)
library(geiger)
library(caper)
taxatree <- read.nexus("taxonomyforzeldospecies.nex")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")
LWEVIYRcombodataPGLS <-data.frame(LWEVIYRcombodata$Sum.of.percentage,OGT=LWEVIYRcombodata$OGT, Species=LWEVIYRcombodata$Species)
comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, "Species")
Returns error:
> comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, 'Species')
Error in if (tabulate(phy$edge[, 1])[ntips + 1] > 2) FALSE else TRUE :
missing value where TRUE/FALSE needed
This might come from your data set and your phylogeny having some discrepancies that comparative.data struggles to handle (by the look of the error message).
You can try cleaning both the data set and the tree using dispRity::clean.data:
library(dispRity)
## Reading the data
taxatree <- read.nexus("taxonomyforzeldospecies.nex")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")
LWEVIYRcombodataPGLS <- data.frame(LWEVIYRcombodata$Sum.of.percentage,OGT=LWEVIYRcombodata$OGT, Species=LWEVIYRcombodata$Species)
## Cleaning the data
cleaned_data <- clean.data(LWEVIYRcombodataPGLS, taxatree)
## Preparing the comparative data object
comp.dat <- comparative.data(cleaned_data$tree, cleaned_data$data, "Species")
However, as @MrFlick suggests, it's hard to know if that solves the problem without a reproducible example.
The error here was that I was using a nexus file. Although ?comparative.data does not specify which phylo objects it accepts, newick trees seem to work fine, whereas nexus files do not.
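Independently of the file format, a quick way to spot discrepancies between the tree and the data is to compare the names on both sides. A minimal base R sketch, assuming hypothetical tip labels and species names; with a tree read by ape, the labels would come from taxatree$tip.label.

```r
# Hypothetical names; replace with taxatree$tip.label and the Species column
tip_labels   <- c("Homo_sapiens", "Pan_troglodytes", "Mus_musculus")
data_species <- c("Homo_sapiens", "Mus_musculus", "Rattus_norvegicus")

only_in_tree <- setdiff(tip_labels, data_species)  # tips with no data row
only_in_data <- setdiff(data_species, tip_labels)  # data rows with no tip

only_in_tree
only_in_data
```

Two empty results mean the names match exactly; anything else shows which entries would need to be dropped or renamed.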

Using a function in R to scrape website, returning "subscript out of bounds" error

I am trying to scrape player data from the Baseball Reference website, using a function to loop through multiple years (variable "year") for each player notated by "playerid."
library(plyr)
library(XML)
fetch_stats <- function(playerid, year) {
url <- paste0("http://www.baseball-reference.com/players/gl.cgi?id=",playerid,"&t=b&year=",year)
data <- readHTMLTable(url, stringsAsFactors = FALSE)
data <- data[[3]]
data$Year <- year
data$PlayerId <- playerid
data
}
This function works perfectly well when it is applied to a single year's worth of data, as seen here:
AdrianGonzales <- ldply("gonzaad01", fetch_stats, year= 2008, .progress="text")
However, as soon as I actually use the function to loop through the multiple years in a player's career, it always spits out the following error:
AdrianGonzales <- ldply("gonzaad01", fetch_stats, year= 2009:2004, .progress="text")
Error in data[[3]] : subscript out of bounds
In addition: Warning message:
XML content does not seem to be XML: 'http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2009
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2008
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2007
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2006
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2005
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2004'
From what I have been able to find, the "subscript out of bounds" error happens when you exceed the limits of a defined dataset within R. For this particular function, I may just be dumb, but I don't see how that would apply in this case- or why it would work for a single year, but not for several at a time.
I'm open to any and all suggestions. Thanks ahead of time.
You could use lapply as shown below. I put a minor fix into fetch_stats, as it seems that the 6th column returned has no name. You can do what you like with that column; the point is just to show how you can use lapply instead.
library(plyr)
library(XML)
# Minor change made to get function working (naming column 6)
fetch_stats <- function(playerid, year) {
url <- paste0("http://www.baseball-reference.com/players/gl.cgi?id=",playerid,"&t=b&year=",year)
data <- readHTMLTable(url, stringsAsFactors = FALSE)
data <- data[[3]]
data$Year <- year
data$PlayerId <- playerid
### Column six name is empty.
names(data)[6] <- 'EMPTY'
data
}
res <- lapply(2009:2004, function(x) fetch_stats("gonzaad01", x))
resdf <- ldply(res)
This will create a list of 6 elements, one for each year, then convert the list to a data.frame.
The way ldply is applied in your code, it is not passing fetch_stats one year at a time; it is passing the entire vector of years all at once.
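The effect can be seen without touching the website. In this minimal sketch, build_url is a hypothetical helper that contains only the paste0 step of fetch_stats; it shows that a vector of years produces a vector of six URLs in a single call, which matches the six URLs listed in the warning.

```r
# Hypothetical helper: only the URL-building step of fetch_stats
build_url <- function(playerid, year) {
  paste0("http://www.baseball-reference.com/players/gl.cgi?id=",
         playerid, "&t=b&year=", year)
}

# One call with the whole vector: paste0 is vectorised, so six URLs come back
urls_all <- build_url("gonzaad01", 2009:2004)
length(urls_all)

# One call per year: each call sees a single year and returns a single URL
urls_each <- lapply(2009:2004, function(y) build_url("gonzaad01", y))
lengths(urls_each)
```

In the original call, "gonzaad01" is the only element being iterated over, so year = 2009:2004 is forwarded unchanged as an extra argument.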
EDIT
After looking a little closer, here is a solution using ldply
new_res <- ldply(.data = 2009:2004,
.fun = function(x) fetch_stats("gonzaad01", x),
.progress="text")
This gave me the same results as the other method above.

R language, problems with SpatialPixelsDataFrame

The following two scripts will generate a "SpatialPixelsDataFrame" object:
# FIRST
library(rgdal)
elev.grid <- readGDAL("whatever.asc")
elev.grid <- as(elev.grid, "SpatialPixelsDataFrame")
# SECOND
library(raster)
library(SDMTools)
library(adehabitat)
elev.grid <- raster("whatever.asc")
elev.grid.asc <- asc.from.raster(elev.grid)
elev.grid.SPDF <- asc2spixdf(elev.grid.asc)
HOWEVER, the first one exceeds the capability of my computing resources when applying it to big (15000 x 16000) grids, and the second one generates an object which I can't use for some of my further analyses. For example, when I use it for krige purposes
x <- krige(V3~var, points, elev.grid)
I get the following:
Error in model.frame.default(terms(formula), as(data, "data.frame"), na.action = na.fail) : invalid type (closure) for variable 'var'
I will be really thankful if somebody is kind enough to tell me how to fix it, whether providing me a trick to bypass the memory/capability issue in the first case (preferably), or fixing the error generated by the second case.
THANKS A LOT IN ADVANCE!!!
perep
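The "invalid type (closure)" message can be reproduced in base R without any spatial packages. A minimal sketch, assuming a hypothetical data frame with a column named elev: when the formula names a variable that is not a column of the data, R falls back to the function stats::var, which is a closure.

```r
# Hypothetical data; "elev" is an assumed column name, "var" is not a column
points_df <- data.frame(V3 = c(1.2, 3.4, 2.2), elev = c(100, 250, 180))

# "var" is not found in the data, so the function stats::var is picked up
err <- tryCatch(
  model.frame(V3 ~ var, data = points_df),
  error = function(e) conditionMessage(e)
)
err

# Using a name that exists as a column works
ok <- model.frame(V3 ~ elev, data = points_df)
names(ok)
```

So it may be worth checking names() on the grid object from the second script to see what its data column is actually called.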

Adding a variable to a ts object in R

I have an object that I have created using the as.ts function in R, and now I would like a simple way to transform one of the variables and add it to the same ts object. So, for example
tsMloa <- ts(read.dta("http://www.stata-press.com/data/r12/mloa.dta"), frequency=12, start=1959)
tsMloa[, "meanLog"] <- tsMloa[,"log"] - mean(tsMloa[,"log"])
gives me a subscript out of bounds error. How can I get around this?
Firstly, you ought to consider adding require(foreign) to your example code, as it's necessary to run your code.
I don't know anything about *.dta files or their formatting, but I can tell you that if you'd like to work with time series in R, you'd do well to look into the zoo and xts family of functions.
With that in mind, try the following:
require(xts)
require(foreign)
tsMloa <- ts(read.dta("http://www.stata-press.com/data/r12/mloa.dta"), frequency=12, start=1959)
tt <- seq(as.Date("1959-01-01"), as.Date("1990-12-01"), by='mon')
tsMloa_x <- xts(unclass(tsMloa)[,1:3], order.by=tt)
tsMloa_x$meanLog <- tsMloa_x$log - mean(tsMloa_x$log)
That should do what you are looking for, and it gives you a reason to look into these very good packages.
Doing it with zoo, plus I've created a function to turn your integers into months.
require(foreign)
require(zoo)
Mloa <- read.dta("http://www.stata-press.com/data/r12/mloa.dta")
intToMonth <- function(intMonth, origin = "1960-01-01"){
dd <- as.POSIXlt(origin)
ddVec <- rep(dd, length(intMonth))
ddVec$mon <- ddVec$mon + intMonth%%12
ddVec$year <- ddVec$year + intMonth%/%12
ddRet <- as.Date(ddVec)
return(ddRet)
}
dateString <- intToMonth(Mloa[, 'tm'])
zMloa <- zoo(Mloa[, -2], dateString)
zMloa$meanLog <- zMloa$log - mean(zMloa$log)
As I see it, your problem is with converting the timestamps in the source file to something R understands and can work with. I found this part of adapting to R especially tricky.
The above function will take your month-integers, and turn them into a Date object. The resultant output will work with both zoo and xts as the order.by argument.
If you need to change the origin date, just supply the second argument to the function -- i.e. otherDateString <- intToMonth(timeInts, "2011-01-01").
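As a base R alternative to the xts and zoo approaches above, a new series can also be attached to a ts object with cbind rather than by assigning to a column that does not exist yet. This is a different technique from the answer's, shown as a minimal sketch on synthetic data; tsDemo and its values are hypothetical stand-ins for tsMloa.

```r
# Synthetic stand-in for tsMloa: two years of monthly data
set.seed(1)
tsDemo <- ts(cbind(tm = 0:23, log = rnorm(24, mean = 5.8, sd = 0.01)),
             frequency = 12, start = 1959)

# Demeaned series; still a ts with the same time index
meanLog <- tsDemo[, "log"] - mean(tsDemo[, "log"])

# cbind on ts objects builds a new multivariate ts
tsDemo2 <- cbind(tsDemo, meanLog)

# cbind prefixes the old column names, so reset them explicitly
colnames(tsDemo2) <- c(colnames(tsDemo), "meanLog")
colnames(tsDemo2)
```

The result keeps the frequency and start of the original series, so it can be used wherever the original ts object was.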
