How to convert matrix with tick data into xts? - r

I have data that I am trying to convert into xts format:
> dput(data)
structure(list(50370788L, 50370777L, 50370694L, 50370620L, 50370504L,
620639L, 620639L, 592639L, 592639L, 592639L, "2015-10-24",
"2015-10-24", "2015-09-04", "2015-09-04", "2015-09-04", structure(list(
id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
structure(list(id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
structure(list(id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
structure(list(id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
structure(list(id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
"$GBSN Still sticking with my prediction of FDA coming sometime in March..",
"$GBSN Last time I check NASDAQ gave them till sometime in April to get it together or else they'll see pink. Correct me if in wrong?",
"$GBSN time for retailers to get knocked out of the ring with a 25 to 30 % gain",
"$GBSN market cap will end up around 65 million not enough to comply rs takes it to 21 dollars pps 26$ by august",
"$GBSN shorts are going to attack the sell off"), .Dim = c(5L,
5L), .Dimnames = list(c("2016-02-28 16:59:53", "2016-02-28 16:58:58",
"2016-02-28 16:51:36", "2016-02-28 16:46:09", "2016-02-28 16:34:34"
), c("GBSN.Message_ID", "GBSN.User_ID", "GBSN.User_Join_Date",
"GBSN.Message_Symbols", "GBSN.Message_Body")))
I have been trying to use:
Message_series <- xts(zoo(data, format='%Y-%m-%d %H:%M:%S'))
I get this error:
Error in zoo(data, format = "%Y-%m-%d %H:%M:%S") :
unused argument (format = "%Y-%m-%d %H:%M:%S")

Your matrix is not tidy. Look at the fourth column (data[,4]): each element is a data.frame. zoo, and hence xts, does not support such complex objects, only a simple matrix whose elements all have the same type.
The first and second columns are OK, but they inherit the list properties of the matrix, so the conversion is not entirely straightforward.
data.mat <- matrix(as.numeric(data[,1:2]), ncol = 2)
colnames(data.mat) <- colnames(data)[1:2]
xts(data.mat, order.by = as.POSIXct(rownames(data)))
The join date can be converted and included as well:
data.mat <- cbind(data.mat, as.numeric(as.Date(as.character(data[,3]))))
colnames(data.mat) <- colnames(data)[1:3]
data.xts <- xts(data.mat, order.by = as.POSIXct(rownames(data)))
and transformed back:
as.Date(as.numeric(coredata(data.xts['2016-02-28 16:59:53',3])), origin = "1970-01-01")
You can encode the id, symbol, and title variables from Message_Symbols in the same way, too.
I recommend storing Message_Body in a separate object (e.g. a data.frame).
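To see why these extra conversion steps are needed, here is a minimal base-R sketch of a list matrix like the one in the question. The object `m`, its shortened column names, and the `tz = "UTC"` choice are assumptions made for illustration; only the structure mirrors `data`.

```r
# Hypothetical 2 x 2 list matrix: every cell is a list element,
# so columns can hold different types (as in the question's `data`)
m <- matrix(list(50370788L, 50370777L, "2015-10-24", "2015-10-24"),
            nrow = 2,
            dimnames = list(c("2016-02-28 16:59:53", "2016-02-28 16:58:58"),
                            c("Message_ID", "User_Join_Date")))

typeof(m)      # "list", not "integer" or "character"
is.matrix(m)   # TRUE: it still has a dim attribute

# Each column has to be turned into an atomic vector before use
ids   <- as.numeric(m[, "Message_ID"])
dates <- as.Date(unlist(m[, "User_Join_Date"]))

# The row names parse as date-times and can serve as the index
idx <- as.POSIXct(rownames(m), tz = "UTC")
```

Once every column is atomic and of one common type, the pieces can be bound into an ordinary matrix and handed to xts together with the index.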

Based on the column names of data, it appears that all of your data is, or could be, of character type. However, data[,4] (GBSN.Message_Symbols) contains lists rather than an atomic vector, so we have to flatten it using rbind. apply is then used to convert each column to a character vector, and the results are combined into a character matrix. The xts object is formed by converting the row names to POSIXct date-times and using them as the index. The code would look like:
# flatten list data in column 4 to a data frame
mat4 <- do.call(rbind, data[,4])
# convert all data to character type
data.mat <- apply(cbind(data[,-4], mat4), 2, as.character)
# create xts time series
data.xts <- xts(data.mat, order.by = as.POSIXct(rownames(data)))

Related

How to select one value of a data.frame within a list column with R?

I have a data.frame that contains a list-type column. Each list element contains a 1x3 data.frame, and I only want one value from it. This would flatten my data.frame so that I can write it out as a CSV.
How do I select one item from the nested data.frame (see the 2nd column)?
Here's the nested col. I'd provide the data but cannot flatten to write_csv.
result of dput:
structure(list(id = c("1386707", "1386700", "1386462", "1386340",
"1386246", "1386300"), fields.created = c("2020-05-07T02:09:27.000-0700",
"2020-05-07T01:20:11.000-0700", "2020-05-06T21:38:14.000-0700",
"2020-05-06T07:19:44.000-0700", "2020-05-06T06:11:43.000-0700",
"2020-05-06T02:26:44.000-0700"), fields.customfield_10303 = c(NA,
NA, 3, 3, NA, NA), fields.customfield_28100 = list(NULL, structure(list(
self = ".../rest/api/2/customFieldOption/76412",
value = "New Feature", id = "76412"), .Names = c("self",
"value", "id"), class = "data.frame", row.names = 1L), structure(list(
self = ".../rest/api/2/customFieldOption/76414",
value = "Technical Debt", id = "76414"), .Names = c("self",
"value", "id"), class = "data.frame", row.names = 1L), NULL,
structure(list(self = ".../rest/api/2/customFieldOption/76411",
value = "Maintenance", id = "76411"), .Names = c("self",
"value", "id"), class = "data.frame", row.names = 1L), structure(list(
self = ".../rest/api/2/customFieldOption/76412",
value = "New Feature", id = "76412"), .Names = c("self",
"value", "id"), class = "data.frame", row.names = 1L))), row.names = c(NA,
6L), class = "data.frame", .Names = c("id", "fields.created",
"fields.customfield_10303", "fields.customfield_28100"))
I found a way to do this.
First, instead of changing the data, I added a column with mutate. Then I directly selected the same column from all the nested data.frames, converted the resulting list column into a vector, and finally cleaned up by removing the other columns.
It seems to work. I don't know yet how it will handle multiple rows within a nested data.frame.
dat <- sample_dat %>%
mutate(cats = sapply(nested_col, `[[`, 2)) %>%
mutate(categories = sapply(cats, toString)) %>%
select(-nested_col, -cats)
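For comparison, here is a base-R sketch of the same idea that selects the column by name and handles both NULL entries and nested data.frames with more than one row. The names `sample_dat` and `nested_col` come from the code above; the values are a made-up stand-in for the question's data.

```r
# Hypothetical stand-in: a list column whose elements are either NULL
# or a small data.frame (one of them with two rows)
sample_dat <- data.frame(id = c("1386707", "1386700", "1386462"),
                         stringsAsFactors = FALSE)
sample_dat$nested_col <- list(
  NULL,
  data.frame(value = "New Feature", id = "76412", stringsAsFactors = FALSE),
  data.frame(value = c("Maintenance", "Technical Debt"),
             id = c("76411", "76414"), stringsAsFactors = FALSE)
)

# Pull `value` out of each element: NULL becomes NA, and several rows
# are collapsed into one comma-separated string
sample_dat$categories <- vapply(
  sample_dat$nested_col,
  function(x) if (is.null(x)) NA_character_ else toString(x[["value"]]),
  character(1)
)

# Drop the list column so the result is flat and can be written to CSV
dat <- sample_dat[, c("id", "categories")]
```

Because vapply is told to expect exactly one string per element, it fails loudly if an element ever yields something else, rather than silently returning a list.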
Related
How to directly select the same column from all nested lists within a list?
r-convert list column into character vector where lists are characters
library(dplyr)
library(tidyr)
df <- tibble(Group=c("A","A","B","C","D","D"),
Batman=1:6,
Superman=c("red","blue","orange","red","blue","red"))
nested <- df %>%
nest(data=-Group)
unnested <- nested %>%
unnest(data)
Nesting and unnesting data with tidyr
library(purrr)
nested %>%
mutate(data=map(data,~select(.x,2))) %>%
unnest(data)
select with purrr, but lapply as you've done is fine, it's just for aesthetics ;)

Using literal month names with year in ramcharts

Here is my code to generate a bar plot using rAmCharts:
library(rAmCharts)
amBarplot(x = "month", y = "value", data = dataset,
dataDateFormat = "MM/YYYY", minPeriod = "MM",
show_values = FALSE, labelRotation = -90, depth = 0.1)
However, is there a way to use month names and the year on my x axis? I am trying to use the MMM-YY format.
Sample dataset:
structure(list(value = c(11544, 9588, 9411, 10365, 11154, 12688
), month = c("05/2012", "06/2012", "07/2012", "08/2012", "09/2012",
"10/2012")), .Names = c("value", "month"), row.names = c(NA,
6L), class = "data.frame")
Thanks.
It appears that rAmCharts doesn't expose AmCharts' dateFormats setting in the categoryAxis, so you have to access it through the init event and create your own dateFormats array with a modified format string for the MM period. I'm not very experienced with R, but here's how I managed to make it work using R 3.4.2 and rAmCharts 2.1.5
chart <- amBarplot( ... settings omitted ... )
addListener(.Object = chart,
name = 'init',
expression = paste(
"function(e) {",
"e.chart.categoryAxis.dateFormats = ",
'[{"period":"fff","format":"JJ:NN:SS"},{"period":"ss","format":"JJ:NN:SS"},',
'{"period":"mm","format":"JJ:NN"},{"period":"hh","format":"JJ:NN"},{"period":"DD","format":"MMM DD"},',
'{"period":"WW","format":"MMM DD"},',
'{"period":"MM","format":"MMM-YY"},', # add YY to the default MM format
'{"period":"YYYY","format":"YYYY"}]; ',
'e.chart.validateData();',
"}")
)
Here is a different solution:
library(rAmCharts)
dataset <- structure(list(value = c(11544, 9588, 9411, 10365, 11154, 12688
), month = c("05/2012", "06/2012", "07/2012", "08/2012", "09/2012",
"10/2012")), .Names = c("value", "month"), row.names = c(NA,
6L), class = "data.frame")
dataset$month <- as.character(
format(
as.Date(paste0("01/",dataset$month), "%d/%m/%Y"),
"%B %Y"))
amBarplot(x = "month", y = "value", data = dataset,
show_values = FALSE, labelRotation = -90, depth = 0.1)
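If the abbreviated MMM-YY style from the question is wanted rather than the full month name, the same relabelling can be sketched with a different format string. This is a base-R sketch only: it uses a shortened copy of the sample data and assumes an English locale for the month names.

```r
# Shortened copy of the question's sample data
dataset <- data.frame(
  value = c(11544, 9588, 9411),
  month = c("05/2012", "06/2012", "07/2012"),
  stringsAsFactors = FALSE
)

# Prepend a day so each string parses as a full date
dates <- as.Date(paste0("01/", dataset$month), format = "%d/%m/%Y")

# %b = abbreviated month name (locale-dependent), %y = two-digit year
dataset$month <- format(dates, "%b-%y")
# In an English locale: "May-12" "Jun-12" "Jul-12"
```

The labels are then plain category strings, which is why the amBarplot call above drops dataDateFormat and minPeriod.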

Automatically split function output (list) into component data.frames

I have a function which yields 2 dataframes. As functions can only return one object, I combined these dataframes into a list. However, I need to work with both dataframes separately. Is there a way to automatically split the list into its component dataframes, or to write the function in a way that both objects are returned separately?
The function:
install.packages("plyr")
require(plyr)
fun.docmerge <- function(x, y, z, crit, typ, doc = checkmerge) {
mergedat <- paste(deparse(substitute(x)), "+",
deparse(substitute(y)), "=", z)
countdat <- nrow(x)
check_t1 <- data.frame(mergedat, countdat)
z1 <- join(x, y, by = crit, type = typ)
countdat <- nrow(z1)
check_t2 <- data.frame(mergedat, countdat)
doc <- rbind(doc, check_t1, check_t2)
t1<-list()
t1[["checkmerge"]]<-doc
t1[[z]]<-z1
return(t1)
}
This is the call to the function, saving the result list to the new object results.
results <- fun.docmerge(x = df1, y = df2, z = "df3", crit = c("id"), typ = "left")
The following sample data replicates the problem:
df1 <- structure(list(id = c("XXX1", "XXX2", "XXX3",
"XXX4"), tr.isincode = c("ISIN1", "ISIN2",
"ISIN3", "ISIN4")), .Names = c("id", "isin"
), row.names = c(NA, 4L), class = "data.frame")
df2 <- structure(list(id= c("XXX1", "XXX5"), wrong= c(1L,
1L)), .Names = c("id", "wrong"), row.names = 1:2, class = "data.frame")
checkmerge <- structure(list(mergedat = structure(integer(0), .Label = character(0), class = "factor"),
countdat = numeric(0)), .Names = c("mergedat", "countdat"
), row.names = integer(0), class = "data.frame")
In the example, a list containing the dataframes df3 and checkmerge is returned. I need both dataframes separately. I know that I could do this via manual assignment (e.g., checkmerge <- results$checkmerge), but I want to eliminate manual changes as much as possible and am therefore looking for an automated way.
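One common way to do this in base R is list2env, which creates one object per named list element. The sketch below uses a made-up `results` list in place of the real output of fun.docmerge, so the values are assumptions; only the element names match the question.

```r
# Hypothetical stand-in for the list returned by fun.docmerge()
results <- list(
  checkmerge = data.frame(mergedat = "df1 + df2 = df3", countdat = 4),
  df3 = data.frame(id = c("XXX1", "XXX2"), wrong = c(1L, NA))
)

# Create one object per list element, named after that element.
# Note: this overwrites any existing objects with the same names.
list2env(results, envir = .GlobalEnv)

# Both dataframes are now available on their own
nrow(df3)
names(checkmerge)
```

Since the element names are taken from the list, the `z` argument of the function decides what the merged dataframe ends up being called.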

Using dygraph with xts-object drops Label in plot

I have a seemingly small challenge, but I can't get to an answer. Here is my minimum working example.
fr_nuke <- structure(list(Date = structure(c(1420070400, 1420074000, 1420077600,
1420081200, 1420084800, 1420088400), class = c("POSIXct", "POSIXt"), tzone = ""),
`61` = c(57945, 57652, 57583, 57551, 57465, 57683),
`3244` = c(72666.64, 73508.78, 69749.17, 67080.13, 66357.65, 66524.13),
`778` = c(2.1133, 2.1133, 2.1133, 2.1133, 2.1133, 2.1133),
fcasted_nuke_temp = c(54064.6099092888, 54064.6099092888, 54064.6099092888,
54064.6099092888, 54064.6099092888, 54064.6099092888),
fcasted_nuke_cons = c(55921.043096775, 56319.5688170977, 54540.4094334057,
53277.340242333, 52935.4411965463, 53014.2244890147)),
.Names = c("Date", "61", "3244", "778", "fcasted_nuke_temp", "fcasted_nuke_cons"),
row.names = c(NA, 6L), class = "data.frame")
series1 <- as.xts(fr_nuke$'61', fr_nuke$Date)
series2 <- as.xts(fr_nuke$fcasted_nuke_temp, fr_nuke$Date)
series3 <- as.xts(fr_nuke$fcasted_nuke_cons, fr_nuke$Date)
grp_input <- cbind(series1,series2,series3)
dygraph(grp_input)
The resulting plot does not show the labels of the individual series. Specifying the series with
dygraph(grp_input) %>% dySeries("V1", label = "Label1")
Results in:
Error in dySeries(., "V1", label = "Label1") : One or more of the
specified series were not found. Valid series names are: ..1, ..2, ..3
However, it works if I plot only one series (e.g. series1).
dygraph(series1) %>% dySeries("V1", label = "Label1")
Either set the colnames for the grp_input object, or use merge to construct the column names from the object names.
# setting colnames
require(dygraphs)
require(xts)
grp_input <- cbind(series1, series2, series3)
colnames(grp_input) <- c("V1", "V2", "V3")
dygraph(grp_input) %>% dySeries("V1", label = "Label1")
# using merge
require(dygraphs)
require(xts)
grp_input <- merge(series1, series2, series3)
dygraph(grp_input) %>% dySeries("series1", label = "Label1")

BioConductor IRanges coverage counts and identify segments

I have a dataset with interval information for a bunch of manufacturing circuits
df <- data.frame(structure(list(circuit = structure(c(2L, 1L, 2L, 1L, 2L, 3L,
1L, 1L, 2L), .Label = c("a", "b", "c"), class = "factor"), start = structure(c(1393621200,
1393627920, 1393628400, 1393631520, 1393650300, 1393646400, 1393656000,
1393668000, 1393666200), class = c("POSIXct", "POSIXt"), tzone = ""),
end = structure(c(1393626600, 1393631519, 1393639200, 1393632000,
1393660500, 1393673400, 1393667999, 1393671600, 1393677000
), class = c("POSIXct", "POSIXt"), tzone = ""), id = structure(1:9, .Label = c("1001",
"1002", "1003", "1004", "1005", "1006", "1007", "1008", "1009"
), class = "factor")), .Names = c("circuit", "start", "end",
"id"), class = "data.frame", row.names = c(NA, -9L)))
Circuit: Identifier for circuit
Start: Time the circuit started running
End: Time the circuit stopped running
Id: Unique identifier for the row
I'm able to create a new data set that counts the number of overlapping intervals:
ir <- IRanges(start = as.numeric(df$start), end = as.numeric(df$end), names = df$id)
cov <- coverage(ir)
start_time <- as.POSIXlt(start(cov), origin = "1970-01-01")
end_time <- as.POSIXlt(end(cov), origin = "1970-01-01")
seconds <- runLength(cov)
circuits_running <- runValue(cov)
res <- data.frame(start_time,end_time,seconds,circuits_running)[-1,]
But what I really need is something that looks more like this:
sqldf("select
res.start_time,
res.end_time,
res.seconds,
res.circuits_running,
df.circuit,
df.id
from res left join df on (res.start_time between df.start and df.end)")
The problem is that the sqldf way of using an inequality join is unbearably slow on my full dataset.
How can I get something similar using IRanges alone?
I suspect it has something to do with RangedData but I've not been able to see how to get what I want. Here's what I've tried...
rd <- RangedData(ir, circuit = df$circuit, id = df$id)
coverage(rd) # works but seems to lose the circuit/id info
The coverage can be represented as ranges, dropping the first (the range from 1970 to the first start point)
cov <- coverage(ir)
intervals <- ranges(cov)[-1]
Your query matches each coverage interval against the circuits, so I narrow each interval to a single coordinate and find overlaps with the original ranges (the first argument is the 'query', the second the 'subject')
olaps <- findOverlaps(narrow(intervals, width(intervals)), ir)
The number of circuits running in a particular interval is
tabulate(queryHits(olaps), queryLength(olaps))
and the actual circuits are
df[subjectHits(olaps), c("circuit", "id")]
The pieces can be knit together as, perhaps
df1 <- cbind(uid=seq_along(intervals),
as.data.frame(intervals),
circuits_running=tabulate(queryHits(olaps), queryLength(olaps)))
df2 <- cbind(uid=queryHits(olaps),
df[subjectHits(olaps), c("circuit", "id")])
merge(df1, df2, by="uid", all=TRUE)
Ranges can have associated with them 'metadata' that is accessible and subset in a coordinated way, so the connection between data.frame and ranges does not have to be so loose and ad hoc. I might instead have
ir <- IRanges(start = as.numeric(df$start), end = as.numeric(df$end))
mcols(ir) <- DataFrame(df)
## ...
mcols(ir[subjectHits(olaps)])
perhaps with as.data.frame() when done with IRanges-land.
It's better to ask your questions about IRanges on the Bioconductor mailing list; no subscription required.
