S4 Classes for GTFS in R: help with subsetting? - r

As part of a college assignment, I am trying to make a small R package that provides some basic statistics and graphics on GTFS feeds.
I am using the code from https://github.com/ondrejivanic/131500/blob/master/gtfs.r.
The assignment requires a number of S4 classes, so I have created a separate class for each GTFS feed file. I am trying to make a list of service IDs to produce a graphic of the number of trips on a given day.
Here I define the class and create an object of it:
library(lubridate)  # for ymd()
# Create the S4 class for calendar_dates.txt
# calendar_dates.txt - service_id, date, exception_type
setClass("CalendarDates",
         representation(service_id = "factor", date = "POSIXct", exception_type = "numeric"))
# read calendar_dates.txt into a data frame and parse the dates;
# tz = "UTC" makes ymd() return POSIXct (current lubridate returns Date otherwise)
calendar_dates <- transform(
  read.gtfs.file("calendar_dates.txt", "data"),
  date = ymd(date, tz = "UTC")
)
# create the S4 CalendarDates object
calendar_datesS4 <- new("CalendarDates", service_id = calendar_dates$service_id,
                        date = calendar_dates$date, exception_type = calendar_dates$exception_type)
The part I cannot understand is how to perform this subset on an S4 object. The piece below works when calendar.dates is the data frame:
calendar.dates <- calendar_dates
calendar.dates[calendar.dates$date == d & calendar.dates$exception_type == 1, c("service_id")]
[1] "daily_1" "daily_2" "daily_3" "daily_4"
Doing the same with the S4 object, using @ to reach its slots, results in an error:
calendar.dates <- calendar_datesS4
calendar.dates[calendar.dates@date == d & calendar.dates@exception_type == 1, c("service_id")]
Error in calendar.dates[dates == d & exceptions == 2, c("service_id")] :
object of type 'S4' is not subsettable
I have not found any other questions about subsetting an S4 object where a condition must be met.
I really appreciate any help with this!
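A minimal sketch of two ways to do a conditional subset on an S4 object. It assumes a simplified, hypothetical CalendarDates class with character IDs and a Date slot instead of factor and POSIXct, so it runs without lubridate, and it uses made-up sample rows. Slot access with @ returns ordinary vectors, so the condition can be built from the slots directly. Alternatively, a `[` method, written here for illustration and not part of any package, lets the object itself be subset by a logical index.

```r
library(methods)

# Simplified, hypothetical version of the asker's class: character IDs and
# a Date slot (instead of factor and POSIXct) so no lubridate is needed
setClass("CalendarDates",
         representation(service_id = "character", date = "Date",
                        exception_type = "numeric"))

# Made-up sample rows
cd <- new("CalendarDates",
          service_id = c("daily_1", "daily_2", "weekend_1"),
          date = as.Date(c("2014-05-19", "2014-05-19", "2014-05-20")),
          exception_type = c(1, 1, 2))
d <- as.Date("2014-05-19")

# Option 1: `@` returns plain vectors, so build the condition from the
# slots and index the service_id slot with it
keep <- cd@date == d & cd@exception_type == 1
active_ids <- cd@service_id[keep]

# Option 2: give the class a `[` method that subsets every slot by the
# same index and returns a new CalendarDates object
setMethod("[", "CalendarDates", function(x, i, j, ..., drop = TRUE) {
  new("CalendarDates",
      service_id = x@service_id[i],
      date = x@date[i],
      exception_type = x@exception_type[i])
})
subset_cd <- cd[cd@date == d & cd@exception_type == 1]
```

Option 1 needs no extra code. Option 2 keeps the result as a CalendarDates object, which can help if later plotting functions expect that class.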

Related

How to automate creation of shape files with multipolygons using rgdal

I have coordinate data for given areas in a country; some of the areas are divided into blocks. I can create an individual shape file of multiple blocks using sp::SpatialPolygons, but I want to automate the creation of shape files because there are a lot of areas (83 in the current sheet). Some of these areas have only one block and some have several, which is where the trouble arises.
For an individual area, making the shape file with the code from https://mhallwor.github.io/_pages/basics_SpatialPolygons works fine.
To automate the process, I tried an if/else statement:
if (block == 1) {
  firstSpatialPoly <- sp::SpatialPolygons(list(firstPoly))
} else if (block == 2) {
  firstSpatialPoly <- sp::SpatialPolygons(list(sp::Polygons(list(poly1), ID = "A"),
                                               sp::Polygons(list(poly2), ID = "B")))
} else {
  firstSpatialPoly <- sp::SpatialPolygons(list(sp::Polygons(list(poly1), ID = "A"),
                                               sp::Polygons(list(poly2), ID = "B"),
                                               sp::Polygons(list(poly3), ID = "C")))
}
but this gives me the following error
Error in if (block == 1) { : the condition has length > 1
I also tried ifelse():
firstSpatialPoly <- ifelse(block == 1, sp::SpatialPolygons(list(firstPoly)),
                    ifelse(block == 2, sp::SpatialPolygons(list(sp::Polygons(list(poly1), ID = "A"),
                                                                sp::Polygons(list(poly2), ID = "B"))),
                    ifelse(block == 3, sp::SpatialPolygons(list(sp::Polygons(list(poly1), ID = "A"),
                                                                sp::Polygons(list(poly2), ID = "B"),
                                                                sp::Polygons(list(poly3), ID = "C"))),
                           NA)))
but this also does not work and gives the error
Error in rep(yes, length.out = len) : attempt to replicate an object of type 'S4'
Any help or advice is welcome. Thank you
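The first error occurs because `block` has length > 1, and `if` needs a single TRUE/FALSE. The second occurs because ifelse() cannot return S4 objects. The sketch below avoids branching on the block count altogether. It assumes a hypothetical long-format table with `area`, `block`, `x` and `y` columns, since the real sheet's layout isn't shown. It splits by area, then by block, and builds one closed coordinate ring per block. The sp step only runs if sp is installed, and it reuses the question's "A"/"B"/"C" IDs via LETTERS.

```r
# Hypothetical coordinate table: one row per vertex, with the area and
# block each vertex belongs to (column names are assumptions)
coords <- data.frame(
  area  = c(rep("North", 4), rep("South", 8)),
  block = c(rep(1, 4), rep(1, 4), rep(2, 4)),
  x = c(0, 1, 1, 0,  2, 3, 3, 2,  4, 5, 5, 4),
  y = c(0, 0, 1, 1,  0, 0, 1, 1,  0, 0, 1, 1)
)

# For one area, build a list of closed coordinate matrices, one per block,
# however many blocks the area has; no if/else on the block count
block_rings <- function(area_df) {
  lapply(split(area_df, area_df$block), function(b) {
    m <- as.matrix(b[, c("x", "y")])
    rbind(m, m[1, ])  # close the ring by repeating the first vertex
  })
}

rings_by_area <- lapply(split(coords, coords$area), block_rings)

# If sp is installed, turn each area's rings into one SpatialPolygons
# object with IDs "A", "B", "C", ... as in the question's code
if (requireNamespace("sp", quietly = TRUE)) {
  sp_by_area <- lapply(rings_by_area, function(rings) {
    sp::SpatialPolygons(lapply(seq_along(rings), function(k) {
      sp::Polygons(list(sp::Polygon(rings[[k]])), ID = LETTERS[k])
    }))
  })
}
```

Because the list of Polygons grows with the number of blocks, areas with one, two or three blocks all go through the same code path.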

how to interpolate data within groups in R using seqtime?

I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-series microbiome data, as follows:
meta = data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta<- meta[order(meta$day, meta$condition),]
meta.ts<-as.data.frame(t(meta))
otu=matrix(1:390, ncol = 39)
oturar<-rarefyFilter(otu, min=0)
rarotu<-oturar$rar
time<-meta.ts[1,]
interp.otu<-interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
The interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried changing method to "hyman", but that returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and am fairly new to R.
Can anyone tell me what I am doing wrong, or how to get around these errors?
Many thanks!
I spent quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable passed to the time.vector parameter.
Because meta mixes a numeric and a character column, t(meta) returns a character matrix, so day becomes strings such as "15". as.data.frame() then converts all strings to factors, because stringsAsFactors defaults to TRUE before R 4.0.0 (you are on 3.6.1). This includes day.
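A base-R sketch of the coercion just described. It uses a plain data.frame in place of the data.table, which is an assumption, and seqtime itself isn't needed. It shows that t() yields a character matrix, that converting factor columns to integers gives level codes rather than days, and that going through character recovers the real days.

```r
meta <- data.frame(day = rep(15:27, each = 3),
                   condition = rep(c("a", "b", "c"), times = 13),
                   stringsAsFactors = FALSE)
meta <- meta[order(meta$day, meta$condition), ]

# Mixed column types: t() goes through as.matrix(), which yields a
# character matrix, so the days become the strings "15", "16", ...
m <- t(meta)

# stringsAsFactors = TRUE mimics the default in R < 4.0.0
as_factors <- as.data.frame(m, stringsAsFactors = TRUE)
as_chars   <- as.data.frame(m, stringsAsFactors = FALSE)

# Row 1 holds the days. Factor columns convert to their level codes
# (each column has levels such as "15" and "a"), not to the day values
codes <- unname(vapply(as_factors[1, ], as.integer, integer(1)))

# Character columns convert to the actual days
days <- unname(vapply(as_chars[1, ], as.integer, integer(1)))
```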
To adjust, you can edit your code to the following:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(meta.ts[1,]) # Now 'day' is character, so convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
As a bonus, read this blog post for more on the stringsAsFactors parameter. Strings being silently converted to factors is a common source of confusion.

Selecting features from a feature set using mRMRe package

I am a new R user trying to use the mRMRe package (mRMR is a well-known feature selection approach) to obtain a feature subset from a feature set. Please excuse me if my question is simple; I really want to know how to fix an error. Below are the details.
Suppose I have a CSV file (gene.csv) with a feature set of 6 attributes ([G1.1.1.1], [G1.1.1.2], [G1.1.1.3], [G1.1.1.4], [G1.1.1.5], [G1.1.1.6]) and a target class variable [Output] ('1' indicates the positive class and '-1' the negative class). Here's a sample gene.csv file:
[G1.1.1.1] [G1.1.1.2] [G1.1.1.3] [G1.1.1.4] [G1.1.1.5] [G1.1.1.6] [Output]
11.688312 0.974026 4.87013 7.142857 3.571429 10.064935 -1
12.538226 1.223242 3.669725 6.116208 3.363914 9.174312 1
10.791367 0.719424 6.115108 6.47482 3.597122 10.791367 -1
13.533835 0.37594 6.766917 7.142857 2.631579 10.902256 1
9.737828 2.247191 5.992509 5.992509 2.996255 8.614232 -1
11.864407 0.564972 7.344633 4.519774 3.389831 7.909605 -1
11.931818 0 7.386364 5.113636 3.409091 6.818182 1
16.666667 0.333333 7.333333 4.333333 2 8.333333 -1
I am trying to get the best subset of 2 features (out of the 6 above) and wrote the following R code:
library(mRMRe)
file_n<-paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
f_data <- mRMR.data(data = data.frame(df))
featureData(f_data)
mRMR.ensemble(data = f_data, target_indices = 7,
feature_count = 2, solution_count = 1)
When I run this code, I get the following error for the statement f_data <- mRMR.data(data = data.frame(df)):
Error in .local(.Object, ...) :
data columns must be either of numeric, ordered factor or Surv type
However, the data in each column of the CSV file are real numbers. So how can I change the R code to fix this problem? I am also not sure what the value of target_indices should be in the statement mRMR.ensemble(data = f_data, target_indices = 7, feature_count = 2, solution_count = 1), since my target class variable is named "[Output]" in gene.csv.
I would appreciate it if anyone could help me obtain the best feature subset from gene.csv using the mRMRe package.
I solved the problem by modifying my code as follows.
library(mRMRe)
file_n<-paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
df[[7]] <- as.numeric(df[[7]])
f_data <- mRMR.data(data = data.frame(df))
results <- mRMR.classic("mRMRe.Filter", data = f_data, target_indices = 7,
feature_count = 2)
solutions(results)
It worked fine. The output of the code gives the indices of the selected 2 features.
I think it has to do with your Output column, which is probably of class integer. You can check that with class(df[[7]]).
To convert it to numeric, as the error message requires, just type:
df[[7]] <- as.numeric(df[[7]])
That worked for me.
As for the other question: after reading the documentation, target_indices = 7 seems the right choice, since [Output] is the 7th column.
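mRMRe isn't needed to see the cause. This base-R sketch uses a three-row inline excerpt of the sample data, keeping only two feature columns for brevity (an assumption). It shows that read.csv() stores an all-whole-number column such as [Output] as integer. It then converts every integer column to double and finds the target column's position. read.csv() rewrites bracketed headers into syntactic names, so the lookup uses grep() rather than the literal "[Output]". The resulting position is what you would pass as target_indices.

```r
# Three rows of the sample data with two of the feature columns
# (an inline excerpt for brevity, not the full gene.csv)
csv_text <- "[G1.1.1.1],[G1.1.1.2],[Output]
11.688312,0.974026,-1
12.538226,1.223242,1
10.791367,0.719424,-1"
df <- read.csv(text = csv_text, header = TRUE)

# Record each column's class: Output contains only whole numbers, so
# read.csv() stores it as "integer"; the features are "numeric" (double)
col_classes <- vapply(df, function(col) class(col)[1], character(1))

# Convert every integer column to double so that all columns are numeric
int_cols <- vapply(df, is.integer, logical(1))
df[int_cols] <- lapply(df[int_cols], as.numeric)

# read.csv() turns "[Output]" into a syntactic name (check.names = TRUE),
# so find the target's position by pattern instead of the literal header
target_index <- grep("Output", names(df))
```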

How to choose specific rows in a dataset using R

I have a dataset and want to pick the rows where Date is 17/12/2006 or 18/12/2006. The type of Date is character. I used this code:
a<-c('17/12/2006','18/12/2006')
NewTable<-WholeTable[which($Date %in% a)]
The error is "Error in which$Date : object of type 'closure' is not subsettable"
Then I tried another approach:
WholeTable$Date <- as.character(WholeTable$Date)
NewTable<-subset(WholeTable, Date == "17/12/2006"|Date == "18/12/2006")
It creates a new subset, but with 0 rows.
I'm really confused.
It would be easier to help if you provided a minimal dataset. If I understand correctly, though, this should work:
# In this example date is a factor with 4 levels (in R >= 4.0.0 it stays character; the subset works either way)
Wholetable <- data.frame(date = c("16/12/2006", "17/12/2006", "18/12/2006", "19/12/2006"), a = c(1:4))
Newtable <- subset(Wholetable, date == "17/12/2006" | date == "18/12/2006")
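The first attempt fails because `which` is a base function (a closure), so `which$Date` cannot be subset with `$`. Below is a sketch of the %in% version, using a small hypothetical WholeTable since the real data isn't shown. The zero-row result of the second attempt can't be diagnosed without the data. Stray whitespace in the stored strings is one possible cause, shown here as an assumption rather than a diagnosis.

```r
WholeTable <- data.frame(
  Date  = c("16/12/2006", "17/12/2006", "18/12/2006", "19/12/2006"),
  Value = 1:4,
  stringsAsFactors = FALSE
)
a <- c("17/12/2006", "18/12/2006")

# %in% must be applied to the column itself (WholeTable$Date), and the
# row index goes before the comma: table[rows, ]
NewTable <- WholeTable[WholeTable$Date %in% a, ]

# If the result still has 0 rows, compare the stored strings with the
# targets. Here a trailing space (hypothetical) breaks the exact match
messy <- WholeTable
messy$Date[2] <- "17/12/2006 "
rows_raw     <- nrow(messy[messy$Date %in% a, ])          # 1 row
rows_trimmed <- nrow(messy[trimws(messy$Date) %in% a, ])  # 2 rows
```

Running unique(WholeTable$Date) on the real data is a quick way to see how the dates are actually stored.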

Filter xts objects by common dates

I am stuck with the following code.
For reference, the code is taken from the following website (http://gekkoquant.com/2013/01/21/statistical-arbitrage-trading-a-cointegrated-pair/). I am running it in RStudio.
library("quantmod")
startDate = as.Date("2013-01-01")
symbolLst<-c("WPL.AX","BHP.AX")
symbolData <- new.env()
getSymbols(symbolLst, env = symbolData, src = "yahoo", from = startDate)
stockPair <- list(
a =coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[1],"\"",sep="")))))
,b = coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[2],"\"",sep="")))))
,hedgeRatio = 0.70 ,name=title)
spread <- stockPair$a - stockPair$hedgeRatio*stockPair$b
I am getting the following error.
Error in stockPair$a - stockPair$hedgeRatio * stockPair$b :
non-conformable arrays
These particular series don't match because "WPL.AX" has an extra value (date 19-05-2014), so its matrix is longer than the one for "BHP.AX". How can I solve this issue when loading the data?
I have also tested other stock pairs, such as "ANZ" and "WBC" with src = "google", which produce two arrays of the same length.
> length(stockPair$a)
[1] 360
> length(stockPair$b)
[1] 359
Add code such as this prior to the stockPair computation, to trim each xts set to the intersection of dates:
common_dates <- as.Date(Reduce(intersect, eapply(symbolData, index)), origin = "1970-01-01")  # origin in case intersect() drops the Date class
symbolData <- eapply(symbolData, `[`, i=common_dates)
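The same intersect-then-subset idea is sketched here in base R with two small hypothetical price series stored as data frames. No xts or Yahoo download is involved, and the prices are made up. Adding origin = "1970-01-01" is a precaution: depending on the R version, intersect() can drop the Date class, and as.Date() on a plain number needs an origin in older R versions.

```r
# Two hypothetical close-price series keyed by date; the second is missing
# 2014-05-19, mimicking the WPL.AX / BHP.AX mismatch
a <- data.frame(date  = as.Date(c("2014-05-16", "2014-05-19", "2014-05-20")),
                close = c(38.1, 38.4, 38.0))
b <- data.frame(date  = as.Date(c("2014-05-16", "2014-05-20")),
                close = c(36.9, 37.2))

# Dates present in both series, converted back to Date if needed
common <- as.Date(Reduce(intersect, list(a$date, b$date)), origin = "1970-01-01")

# Keep only the shared dates; both inputs are sorted by date, so the rows
# line up position by position
a_c <- a[a$date %in% common, ]
b_c <- b[b$date %in% common, ]

hedge_ratio <- 0.70
spread <- a_c$close - hedge_ratio * b_c$close
```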
Your code works fine if you don't convert your xts objects to matrices via coredata(). Ops.xts then ensures that only rows with the same index are subtracted. And fortune(106) applies:
fortunes::fortune(106)
# If the answer is parse() you should usually rethink the question.
# -- Thomas Lumley
# R-help (February 2005)
stockPair <- list(
a = Cl(symbolData[[symbolLst[1]]])
,b = Cl(symbolData[[symbolLst[2]]])
,hedgeRatio = 0.70
,name = "title")
spread <- stockPair$a - stockPair$hedgeRatio*stockPair$b
Here's an alternative approach:
# merge stocks into a single xts object
stockPair <- do.call(merge, eapply(symbolData, Cl))
# ensure stockPair columns are in the same order as symbolLst, since
# eapply may loop over the environment in an order you don't expect
stockPair <- stockPair[,pmatch(symbolLst, colnames(stockPair))]
colnames(stockPair) <- c("a","b")
# add hedgeRatio and name as xts attributes
xtsAttributes(stockPair) <- list(hedgeRatio=0.7, name="title")
spread <- stockPair$a - attr(stockPair,'hedgeRatio')*stockPair$b
