passing column names to data.table programmatically - r

I would like to be able to write a function that runs regressions in a data.table by groups and then nicely organizes the results. Here is a sample of what I would like to do:
require(data.table)
dtb = data.table(y=1:10, x=10:1, z=sample(1:10), weights=1:10, thedate=1:2)
models = c("y ~ x", "y ~ z")
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})
#do more stuff with res
I would like to wrap all this into a function, since the #do more stuff part might be long. The issue I face is how to pass the various column names to data.table. For example, how do I pass the column name weights? How do I pass thedate? I envision a prototype that looks like this:
myfun = function(dtb, models, weights, dates)
Let me be clear: passing the formulas to my function is NOT the problem. If the weights I wanted to use and the column name describing the date, thedate were known then my function could simply look like this:
myfun = function(dtb, models) {
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})
#do more stuff with res
}
However, the column names corresponding to thedate and to the weights are unknown in advance. I would like to pass them to my function like so:
#this will not work
myfun = function(dtb, models, w, d) {
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=w, data=.SD))),by=d]})
#do more stuff with res
}
Thanks

Here is a solution that relies on having the data in long format (which makes more sense to me, in this case):
library(reshape2)
dtlong <- data.table(melt(dtb, measure.var = c('x','z')))
foo <- function(f, d, by, w ){
# get the name of the w argument (weights)
w.char <- deparse(substitute(w))
# convert `list(a,b)` to `c('a','b')`
# obviously, this would have to change depending on how `by` was defined
by <- unlist(lapply(as.list(as.list(match.call())[['by']])[-1], as.character))
# create the call substituting the names as required
.c <- substitute(as.list(coef(lm(f, data = .SD, weights = w))), list(w = as.name(w.char)))
# actually perform the calculations
d[,eval(.c), by = by]
}
foo(f= y~value, d= dtlong, by = list(variable, thedate), w = weights)
variable thedate (Intercept) value
1: x 1 11.000000 -1.00000000
2: x 2 11.000000 -1.00000000
3: z 1 1.009595 0.89019190
4: z 2 7.538462 -0.03846154

one possible solution:
fun = function(dtb, models, w_col_name, date_name) {
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=eval(parse(text=w_col_name)), data=.SD))),by=eval(parse(text=paste0("list(",date_name,")")))]})
}
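An alternative that avoids eval(parse(...)) is to build the j expression with substitute(), in the same spirit as the long-format answer above, and to pass the grouping column as a character vector. This is only a sketch, assuming w and d are single column names given as strings; myfun and its argument names follow the prototype in the question, W is just a placeholder symbol, and set.seed is added only to make z reproducible.

```r
library(data.table)

set.seed(1)  # only so that z is reproducible
dtb <- data.table(y = 1:10, x = 10:1, z = sample(1:10), weights = 1:10, thedate = 1:2)
models <- c("y ~ x", "y ~ z")

# w and d are column names passed as character strings (an assumption of this sketch)
myfun <- function(dtb, models, w, d) {
  lapply(models, function(f) {
    fml <- as.formula(f)
    # build the j expression, replacing the placeholder W by the weights column symbol
    jexpr <- substitute(
      as.list(coef(lm(fml, weights = W, data = .SD))),
      list(W = as.name(w))
    )
    # by accepts a character vector of column names
    dtb[, eval(jexpr), by = d]
  })
}

res <- myfun(dtb, models, w = "weights", d = "thedate")
```

Each element of res is a data.table with the grouping column followed by one column per coefficient, so the "do more stuff" step can work on res exactly as in the non-function version.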

Can't you just add (inside that anonymous function call):
f <- as.formula(f)
... as a separate line before the dtb[,as.list(coef(lm(f, ...)? That's the usual way of turning a character element into a formula object.
> res = lapply(models, function(f) {f <- as.formula(f)
dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})
>
> str(res)
List of 2
$ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
..$ thedate : int [1:2] 1 2
..$ (Intercept): num [1:2] 11 11
..$ x : num [1:2] -1 -1
..- attr(*, ".internal.selfref")=<externalptr>
$ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
..$ thedate : int [1:2] 1 2
..$ (Intercept): num [1:2] 6.27 11.7
..$ z : num [1:2] 0.0633 -0.7995
..- attr(*, ".internal.selfref")=<externalptr>
If you need to build character versions of formulas from component names, just use paste or paste0 and pass to the models character vector. Tested code supplied with receipt of testable examples.

Related

Is there an R function to figure out which data type is the proper one here?

I want to run a loop with 26 matrices, 13 with numbers (e.g. 1,1,2,2,2,3) and the other 13 with letters simulating different parameters (e.g. U1, U2, U3, etc.). My problem comes when I want to run them in a loop, attaching the first term of each matrix in order to run them simultaneously in the function. The error that appears is the following:
Errors were caught in checkModelList The model value for U is not
allowed. Check ?MARSS.form Error: Stopped in checkModelList() due to
specification problem(s).
The code and the data structure are below:
str(Y)
num [1:43, 1:24] NA NA NA 0.158 -1.172 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:43] "WGR_D_l.s" "WGR_D_l.m" "WGR_D_l.l" "WGR_Sh_l.s" ...
..$ : NULL
str(Z1)
num [1:43, 1] 1 1 1 1 1 1 1 1 1 1 ...
str(U1)
chr [1, 1] "U1"
z = c("Z1_","Z2_",'Z3_', 'Z4_', 'Z5_', "Z6_","Z7_",'Z8_', 'Z9_', 'Z10_','Z11_', 'Z12_', 'Z13_')
u = c("U1_","U2_",'U3_', 'U4_', 'U5_', "U6_","U7_",'U8_', 'U9_', 'U10_','U11_', 'U12_', 'U13_')
Q = c("unconstrained", "diagonal and unequal", "diagonal and equal")
q = c('Qun1','Qdu1','Qde1')
for(g in 1:length(U)){
model01$U = U[g]
for(i in 1:length(Z)){
model01$Z = Z[[i]]
for(j in 1:length(Q)){
model01$Q = Q[j]
print(paste(q[j], sep=""))
m1 = MARSS(Y, model=model01,
control=list(maxit = 5000,trace = -1, conv.test.slope.tol=100),
silent=2, method="kem")
model.name.txt = paste("C:/Users/ubeda/OneDrive/Desktop/Resultados post-TFM/1. RUN_Nocovariates_2xU/TXT/",q[j],'.txt', sep='')
capture.output(m1,file=model.name.txt)
print(model.name.txt)
model.name.rds = paste("C:/Users/ubeda/OneDrive/Desktop/Resultados post-TFM/1. RUN_Nocovariates_2xU/RDS/",q[j], '.rds', sep='')
print(model.name.rds)
saveRDS(m1,model.name.rds)
}#j
}#i
}#end g
However, if I run them one by one, the code runs smoothly. It would take me an eternity to run the code 13 times, so I thought the loop was the best option. Here is the code in which the model runs without problems:
Q = c("unconstrained", "diagonal and unequal", "diagonal and equal")
q = c('Qun1','Qdu1','Qde1')
h = "catches raw"
for(j in 1:length(Q)){
model01$Q = Q[j]
print(paste(q[j], sep=""))
m1 = MARSS(Y, model=model01,
control=list(maxit = 5000,trace = -1, conv.test.slope.tol=100),
silent=2, method="kem")
model.name.txt = paste("C:/Users/ubeda/OneDrive/Desktop/Resultados post-TFM/RUN_NAO_catches_raw_2xU/TXT/",q[j],h[1],'.txt', sep='')
capture.output(m1,file=model.name.txt)
print(model.name.txt)
model.name.rds = paste("C:/Users/ubeda/OneDrive/Desktop/Resultados post-TFM/RUN_NAO_catches_raw_2xU/RDS/",q[j],h[1], '.rds', sep='')
print(model.name.rds)
saveRDS(m1,model.name.rds)
}#j
Does anyone have any idea what my problem is?

lapply error - PGLS (caper) with multiples comparative.data

I need help with the following problem.
I generated a list containing 1000 comparative.data objects and I want to run 1000 pgls models, one for each of these comparative.data. I tried to use the lapply function for this, using the following code:
pg <- lapply(obj, function(z){pgls(formula = y ~ x, cd[[z]], lambda = "ML")})
obj is a list of 1000 data.frames with my data. cd is my list of 1000 comparative.data.
When I tried to run this code, the following error was returned:
Error in pgls(formula = y ~ x, cd[[z]], lambda = "ML") :
object 'z' not found
I cannot see where the error comes from.
Thanks in advance
More information
obj is used to generate the comparative.data. To generate the 1000 comparative.data using the 1000 data frames in the obj list, I used:
cd <- lapply(1:1000, function(x) comparative.data(phy = phylogeny,
data = as.data.frame(obj[[x]]),
names.col = species_name,
vcv=T, vcv.dim=3))
To run one pgls for the hundredth comparative.data the code is:
mod <- pgls(formula = y ~ x, cd[[100]], lambda = "ML")
Calling the hundredth obj and hundredth cd
obj[[100]]
# A tibble: 136 x 3
# Groups: Binomial, herbivores [136]
Binomial herbivores tm
* <chr> <dbl> <dbl>
1 Abies_alba 30. 0.896
2 Abies_balsamea 2. 0.990
3 Abies_borisii-regis 1. 0.940
4 Alcea_rosea 7. 0.972
5 Amaranthus_caudatus 1. 0.173
6 Amaranthus_hybridus_subsp._cruentus 1. 0.310
7 Aquilegia_vulgaris 9. 0.365
8 Arabidopsis_thaliana 8. 0.00280
9 Arabis_alpina 2. 0.978
10 Ariocarpus_fissuratus 1. 0.930
# ... with 126 more rows
cd[[100]]
Comparative dataset of 136 taxa:
Phylogeny: tree
136 tips, 134 internal nodes
chr [1:136] "Mercurialis_annua" "Manihot_esculenta"
"Malpighia_emarginata" "Comarum_palustre" ...
VCV matrix present:
VCV.array [1:136, 1:136, 1:16] 61.9 189.3 189.3 189.3 189.3 ...
Data: as.data.frame(obj[[x]])
$ herbivores: num [1:136] 4 1 1 5 19 21 7 4 4 2 ...
$ tm : num [1:136] 0.516 0.915 1.013 0.46 0.236 ...
Since cd was created from obj, there is no need to reference obj in lapply call but simply pass your list of comparative.data which you can do by object:
# BELOW d IS DATA FRAME OBJECT PASSED INTO LAPPLY LOOP
pg_list <- lapply(cd, function(d) pgls(formula = y ~ x, d, lambda = "ML"))
Or by index:
# BELOW i IS INTEGER VALUE PASSED INTO LAPPLY LOOP
pg_list <- lapply(seq_along(cd), function(i) pgls(formula = y ~ x, cd[[i]], lambda = "ML"))
Alternatively, you can combine both lapply calls, assuming you do not need the intermediate object, cd list, for other purposes:
# BELOW x IS OBJECT PASSED INTO LAPPLY LOOP
pg_list <- lapply(obj, function(x) {
cd <- comparative.data(phy = phylogeny,
data = as.data.frame(x),
names.col = species_name,
vcv = TRUE, vcv.dim = 3)
pgls(formula = y ~ x, cd, lambda = "ML")
})
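To illustrate the object-versus-index distinction without caper, here is a small sketch on toy data, with lm standing in for pgls (the list cd_toy and its contents are hypothetical). In the question, lapply(obj, function(z) ...) hands each data frame to z, so cd[[z]] tries to index a list with a data frame; iterating over the list itself or over seq_along() avoids that.

```r
# toy stand-ins for the list of comparative.data objects (hypothetical data)
cd_toy <- list(
  first  = data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)),
  second = data.frame(x = 1:5, y = c(1, 3, 2, 5, 4))
)

# by object: d is the list element itself
fit_by_object <- lapply(cd_toy, function(d) coef(lm(y ~ x, data = d)))

# by index: i is an integer used to subset the list
fit_by_index <- lapply(seq_along(cd_toy), function(i) coef(lm(y ~ x, data = cd_toy[[i]])))
```

Both calls fit the same models; the only visible difference is that iterating over the list keeps its element names, while iterating over indices returns an unnamed list.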

function on nested ranked lists: comparing lists of data against combinations of another list with data R

I really need help with my script, I am not a professional in R.
Some background information about what I want to do.
There are two ranked lists of data ( drugs,diseases ). In these datasets there is information about how genes change in expression.
The drugRL (drug) dataset is a ranked list. The description of the diseaseRL (disease) dataset says it is the same (?diseaseRL), but it does not seem to be a ranked list.
What I did was take the absolute values from the diseaseRL dataset and normalize the data using the range of the data (max - min of the vector for a particular disease in that dataset).
So what I have now are two lists of data frames containing the gene expression information, as ranked lists.
Some code examples, first build the needed packages:
# Compile/install packages using biocLite.
#source("https://bioconductor.org/biocLite.R")
#biocLite("DrugVsDiseasedata")
#biocLite("gespeR")
#biocLite("DrugVsDisease") # may not be needed.
Then import packages/datasets :
#import libraries
library("DrugVsDisease")#may not be needed
library("DrugVsDiseasedata")
library("cMap2data")
library("gespeR")
#import datasets
data(diseaseRL)
data(drugRL)
> class(drugRL)
[1] "matrix"
>
> class(diseaseRL)
[1] "matrix"
>
> str(drugRL)
num [1:11709, 1:1309] 1870 4059 2250 10284 8999 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:11709] "ZNF702P" "SAMD4A" "VN1R1" "ZNF419" ...
..$ : chr [1:1309] "(+)-chelidonine" "(+)-isoprenaline" "(+/-)-catechin" "(-)-MK-801" ...
>
> str(diseaseRL)
num [1:11709, 1:45] 0.01683 -0.00112 -0.00126 0.04902 0.02605 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:11709] "LINC00115" "GOT2P1" "TP73-AS1" "PIN1P1" ...
..$ : chr [1:45] "wilms-tumor" "glaucoma-open-angle" "diabetes-mellitus-type-ii" "soft-tissue-sarcoma" ...
>
Now comes the part where I created a function to normalize the datasets:
NormalizeRLData <- function(x){
data.rankedlist <- x
data.rankedlist.abs <- as.data.frame(abs(data.rankedlist))
data.rankedlist.abs.ordered <-
data.rankedlist.abs[order(data.rankedlist.abs,decreasing=T), , drop = FALSE]
data.rankedlist.abs.ordered.max <- max(data.rankedlist.abs.ordered)
data.rankedlist.abs.ordered.min <- min(data.rankedlist.abs.ordered)
data.rankedlist.abs.ordered.normalizedToOwnRange <- (data.rankedlist.abs.ordered
/
(data.rankedlist.abs.ordered.max - data.rankedlist.abs.ordered.min ))
data.rankedlist.abs.ordered.normalizedToOwnRange.ordered <-
data.rankedlist.abs.ordered.normalizedToOwnRange[order(
data.rankedlist.abs.ordered.normalizedToOwnRange,decreasing=T ), , drop =
FALSE]
return(data.rankedlist.abs.ordered.normalizedToOwnRange.ordered)
}
diseaseRL.normalized <- apply(diseaseRL,2,NormalizeRLData)
drugRL.normalized <- apply(drugRL,2,NormalizeRLData)
I have several doubts about how to proceed. I am unsure whether what I have done so far can be done more effectively, in particular with regard to the next step, which uses rank-biased overlap (RBO).
RBO is a function that compares two ranked lists. I want to use this function on the lists of normalized data, which contain data.frames of the genes (for the disease and drug ranked lists). The input of this rbo function is a named vector.
example :
> a <- c(4,2,5,5)
> b <- c(1,2,3,4)
> names(a) <- c('one','two','three','four')
> names(b) <- c('one','two','three','four')
> rbo(a,b, p = 0.95)
[1] 0.9650417
What is the most efficient way to do this? First of all, can I produce a better output than what I have at the moment to provide to the rbo function?
And second:
If not (or in a similar case), I will have a list of data.frames containing the gene information for either a drug or a disease. I want to run the rbo function for every drug against every disease.
I tried using sapply, but I could not get it to work properly, and I am unsure if it is the right way to go. I need to maintain the names of the drugs and, for the other dataset, the names of the diseases, but the gene names are also important, so that I can later check which genes, diseases and drugs are interacting.
I really hope someone here can shed some light on this!
P.S.: If anyone tries to help me here but has problems compiling the packages, I may be able to help! Maybe I could send an example dataset (not sure if I can attach anything here directly).
Best Regards,
Rick
First, your user-defined method can vastly be reduced in verbosity. No need to cast into data.frame, initially order, or use drop in [] as vectors are being passed into the method. Consider the following adjustment where last line is the returned object:
NormalizeRLData <- function(x){
rnklist <- abs(x)
rnklist <- rnklist[order(rnklist)]
normRng <- rnklist / (max(rnklist) - min(rnklist))
normRng[order(normRng, decreasing = TRUE)]
}
diseaseRL.normalized <- apply(diseaseRL,2,NormalizeRLData)
drugRL.normalized <- apply(drugRL,2,NormalizeRLData)
Secondly, your normalized matrices (not data frames) can possibly be run with sapply by iterating over the column indices of one matrix and passing the column indices of the other as an extra argument. Note that sapply does not build a Cartesian product of the two vectors: the extra argument is handed whole to every call, so the function itself must cope with a vector j (cor does, because it accepts a matrix). If rbo only compares two named vectors, iterate over both index sets explicitly instead.
Since matrices maintain named columns and rows, the columns should adhere to rbo requirements. When each call returns one value per drug, the result is a matrix with one row per drug column and one column per disease column.
# TWO-INPUT SAPPLY
rbo_mat <- sapply(seq(ncol(diseaseRL.normalized)), function(i,j) rbo(diseaseRL.normalized[,i], drugRL.normalized[,j], p = 0.95),
seq(ncol(drugRL.normalized)))
# EQUIVALENT WITH VAPPLY TO [V]ERIFY TYPE AND LENGTH OF OUTPUT
rbo_mat <- vapply(seq(ncol(diseaseRL.normalized)), function(i,j) rbo(diseaseRL.normalized[,i], drugRL.normalized[,j], p = 0.95),
numeric(ncol(drugRL.normalized)),
seq(ncol(drugRL.normalized)))
You might even be able to use the lesser known apply function, rapply (recursive apply):
cols_list <- list(seq(ncol(diseaseRL.normalized)), seq(ncol(drugRL.normalized)))
rbo_mat2 <- rapply(cols_list, function(i,j) rbo(drugRL.normalized[,j], diseaseRL.normalized[,i], p = 0.95),
how="replace")[[1]]
TEST EXAMPLE
Because I cannot reproduce OP's data and do not have the necessary packages, below is a working example of the above methodology with random normal data, using the correlation function cor as a substitute for rbo:
set.seed(142)
mat1 <- sapply(1:10, function(i) rnorm(20))
colnames(mat1) <- LETTERS[1:10]
rownames(mat1) <- letters[1:20]
str(mat1)
# num [1:20, 1:10] 1.255 1.704 0.88 -0.582 -0.169 ...
# - attr(*, "dimnames")=List of 2
# ..$ : chr [1:20] "a" "b" "c" "d" ...
# ..$ : chr [1:10] "A" "B" "C" "D" ...
mat2 <- sapply(1:5, function(i) rnorm(20))
colnames(mat2) <- LETTERS[1:5]
rownames(mat2) <- letters[1:20]
str(mat2)
# num [1:20, 1:5] -0.156 0.449 -0.822 -1.062 0.838 ...
# - attr(*, "dimnames")=List of 2
# ..$ : chr [1:20] "a" "b" "c" "d" ...
# ..$ : chr [1:5] "A" "B" "C" "D" ...
corr_mat <- sapply(seq(ncol(mat1)), function(i,j) cor(mat1[,i], mat2[,j]),
seq(ncol(mat2)))
corr_mat2 <- vapply(seq(ncol(mat1)), function(i,j) cor(mat1[,i], mat2[,j]),
numeric(ncol(mat2)),
seq(ncol(mat2)))
corr_mat3 <- rapply(list(seq(ncol(mat1)), ncol(mat2)), function(i,j) cor(mat2[,j], mat1[,i]),
how="replace")[[1]]
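For a function that only compares two vectors at a time, every column pair has to be visited explicitly. The following is a sketch using a nested sapply on random data; pairwise_cols is a hypothetical helper name, cor again stands in for rbo, and the column names d1 to d5 are made up. Extra arguments such as p = 0.95 could be forwarded through the dots.

```r
set.seed(142)
mat1 <- matrix(rnorm(200), nrow = 20, dimnames = list(letters[1:20], LETTERS[1:10]))
mat2 <- matrix(rnorm(100), nrow = 20, dimnames = list(letters[1:20], paste0("d", 1:5)))

# hypothetical helper: apply FUN to every column of A against every column of B
pairwise_cols <- function(A, B, FUN, ...) {
  sapply(colnames(B), function(j) {
    sapply(colnames(A), function(i) FUN(A[, i], B[, j], ...))
  })
}

# rows are named after the columns of mat1, columns after the columns of mat2
pair_mat <- pairwise_cols(mat1, mat2, cor)
dim(pair_mat)
```

Because the helper iterates over column names rather than positions, the result keeps both sets of names, which addresses the wish to retain drug and disease labels.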

Plot from package "lomb" in ggplot2

I am using the package "lomb" to calculate Lomb-Scargle Periodograms, a method for analysing biological time series data. The package does create a plot if you tell it to do so. However, the plots are not too nice (compared to ggplot2 plots). Therefore, I would like to plot the results with ggplot. However, I do not know how to access the function for the curve plotted...
This is a sample code for a plot:
library(lomb)
TempDiff <- runif(4033, 3.0, 18) # just generate random numbers
Time2 <- seq(1,4033) # Time vector
Rand.LombScargle <- randlsp(repeats=10, TempDiff, times = Time2, from = 12, to = 36,
type = c("period"), ofac = 10, alpha = 0.01, plot = T,
trace = T, xlab="period", main = "Lomb-Scargle Periodogram")
I have also tried to find out something about the function looking into the function randlsp itself, but could not really find anything that seemed useful to me there...
getAnywhere(randlsp)
A single object matching ‘randlsp’ was found
It was found in the following places
package:lomb
namespace:lomb
with value
function (repeats = 1000, x, times = NULL, from = NULL, to = NULL,
type = c("frequency", "period"), ofac = 1, alpha = 0.01,
plot = TRUE, trace = TRUE, ...)
{
if (is.ts(x)) {
x = as.vector(x)
}
if (!is.vector(x)) {
times <- x[, 1]
x <- x[, 2]
}
if (plot == TRUE) {
op <- par(mfrow = c(2, 1))
}
realres <- lsp(x, times, from, to, type, ofac, alpha, plot = plot,
...)
realpeak <- realres$peak
pks <- NULL
if (trace == TRUE)
cat("Repeats: ")
for (i in 1:repeats) {
randx <- sample(x, length(x))
randres <- lsp(randx, times, from, to, type, ofac, alpha,
plot = F)
pks <- c(pks, randres$peak)
if (trace == TRUE) {
if (i/10 == floor(i/10))
cat(i, " ")
}
}
if (trace == TRUE)
cat("\n")
prop <- length(which(pks >= realpeak))
p.value <- prop/repeats
if (plot == TRUE) {
mx = max(c(pks, realpeak)) * 1.25
hist(pks, xlab = "Peak Amplitude", xlim = c(0, mx), main = paste("P-value: ",
p.value))
abline(v = realpeak)
par(op)
}
res = realres[-(8:9)]
res = res[-length(res)]
res$random.peaks = pks
res$repeats = repeats
res$p.value = p.value
class(res) = "randlsp"
return(invisible(res))
}
Any idea will be appreciated!
Best,
Christine
PS: Here an example of the plot with real data.
The key to getting ggplot graphs out of any returned object is to convert the data that you need into some sort of data.frame. To do this, look at what kind of object the returned value is and see which data you can immediately extract into a data.frame:
str(Rand.LombScargle) # get the data type and structure of the returned value
List of 12
$ scanned : num [1:2241] 12 12 12 12 12 ...
$ power : num [1:2241] 0.759 0.645 0.498 0.341 0.198 ...
$ data : chr [1:2] "times" "x"
$ n : int 4033
$ type : chr "period"
$ ofac : num 10
$ n.out : int 2241
$ peak : num 7.25
$ peak.at : num [1:2] 24.6908 0.0405
$ random.peaks: num [1:10] 4.99 9.82 7.03 7.41 5.91 ...
$ repeats : num 10
$ p.value : num 0.3
- attr(*, "class")= chr "randlsp"
In the case of randlsp, it is a list, which is what statistical functions usually return. Most of this information can also be obtained from ?randlsp.
It looks as if Rand.LombScargle$scanned and Rand.LombScargle$power contain most of what is needed for the first graph.
There is also a horizontal line on the Periodogram, but it doesn't correspond to anything that was returned by randlsp. Looking at the source code that you provided, it looks as if the Periodogram is actually generated by lsp().
LombScargle <- lsp( TempDiff, times = Time2, from = 12, to = 36,
type = c("period"), ofac = 10, alpha = 0.01, plot = F)
str(LombScargle)
List of 12
$ scanned : num [1:2241] 12 12 12 12 12 ...
$ power : num [1:2241] 0.759 0.645 0.498 0.341 0.198 ...
$ data : chr [1:2] "Time2" "TempDiff"
$ n : int 4033
$ type : chr "period"
$ ofac : num 10
$ n.out : int 2241
$ alpha : num 0.01
$ sig.level: num 10.7
$ peak : num 7.25
$ peak.at : num [1:2] 24.6908 0.0405
$ p.value : num 0.274
- attr(*, "class")= chr "lsp"
I am guessing that, based on this data, the line is indicating the significance level LombScargle$sig.level
Putting this together, we can create our data to pass to ggplot from lsp:
library(ggplot2)
lomb.df <- data.frame(period=LombScargle$scanned, power=LombScargle$power)
# use the data frame to set up the line plot
g <- ggplot(lomb.df, aes(period, power)) + geom_line() +
labs(y="normalised power", title="Lomb-Scargle Periodogram")
# add the sig.level horizontal line
g + geom_hline(yintercept=LombScargle$sig.level, linetype="dashed")
For the histogram, it looks like this is based on the vector Rand.LombScargle$random.peaks from randlsp:
rpeaks.df <- data.frame(peaks=Rand.LombScargle$random.peaks)
ggplot(rpeaks.df, aes(peaks)) +
geom_histogram(binwidth=1, fill="white", colour="black") +
geom_vline(xintercept=Rand.LombScargle$peak, linetype="dashed") +
xlim(c(0,12)) +
labs(title=paste0("P-value: ", Rand.LombScargle$p.value),
x="Peak Amplitude",
y="Frequency")
Play around with these graphs to get them looking to your taste.
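If several periodograms need plotting, the extraction step can be wrapped in a small helper. This is only a sketch: mock_lsp is a hypothetical stand-in with made-up values for the list returned by lsp(), and periodogram_df is an illustrative name, not part of lomb.

```r
# hypothetical stand-in for the list returned by lsp(); only the fields
# used above are mocked, and the values are made up
mock_lsp <- list(
  scanned   = seq(12, 36, length.out = 5),
  power     = c(0.8, 0.6, 0.5, 0.3, 0.2),
  sig.level = 10.7
)

# hypothetical helper: pull the periodogram curve into a data frame for ggplot()
periodogram_df <- function(res) {
  stopifnot(length(res$scanned) == length(res$power))
  data.frame(period = res$scanned, power = res$power)
}

lomb_df <- periodogram_df(mock_lsp)
```

The resulting data frame has the same two columns as lomb.df above, so it can be passed to the same ggplot() call.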

Convert text to date/timestamp in a matrix in R for plot.ts

I have a matrix with value and timestamps as strings.
> m
value time
[1,] 0 "2014-10-20T01:48:00.019+02:00"
[2,] 0 "2014-10-20T01:48:30.019+02:00"
[3,] 0 "2014-10-20T01:49:00.019+02:00"
[4,] 0 "2014-10-20T01:49:30.020+02:00"
[5,] 0 "2014-10-20T01:50:00.020+02:00"
...
I would like to convert the strings to timestamps or similar, to plot them on a time series chart (I assume plot.ts is the way to go?). I know that I can use
strptime(data, "%Y-%m-%dT%H:%M:%OS")
to convert the string, but I don't know how to apply it to the matrix.
Background:
I loaded data from a JSON file:
{
measurements:
0: {
id: "87144000"
self: "http://xxxyyyzzz.com/measurement/measurements/87144000"
source: {...}
time: "2014-10-20T01:48:00.019+02:00"
type: "LightSensor"
LightSensor: {
light: {
unit: "LUX"
value: 0
}
}
}
...
}
I loaded and transformed it:
> l <- fromJSON(file = "./dev/learning-r/data/c8y-measurement-light.json")
> m <- lapply (l$measurements, function(x) c(x$LightSensor$light['value'], x['time']))
> m <- do.call(rbind, m)
> str(m)
List of 2000
$ : num 0
...
$ : num 0
[list output truncated]
- attr(*, "dim")= int [1:2] 1000 2
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "value" "time"
Thanks!
You might use:
strptime(x['time'], "%Y-%m-%dT%H:%M:%OS");
directly in your lapply function. This ignores the timezone for now, but should work. Look at the %z parameter if you are interested in the timezone (some regex might be required).
Then convert the matrix to a dataframe:
df <- data.frame(m)
Now you should be able to plot the data directly (please check that not all values are zero with summary(df)) with plot(df).
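To keep the UTC offset rather than ignore it, one option is to strip the colon from the offset with a regex and then parse with %z. This is a sketch on two of the timestamps from the question; the variable names are illustrative, and the result is expressed in UTC here purely as a choice for the example.

```r
ts <- c("2014-10-20T01:48:00.019+02:00", "2014-10-20T01:48:30.019+02:00")

# %z expects an offset like +0200, so drop the colon in the trailing +02:00
ts_fixed <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", ts)

# parse with fractional seconds (%OS) and the offset (%z); display in UTC
parsed <- as.POSIXct(ts_fixed, format = "%Y-%m-%dT%H:%M:%OS%z", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Because the offset is +02:00, the first timestamp corresponds to 23:48:00 on 2014-10-19 in UTC.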
I found out, that the structure m is a list. I converted the list to a data.frame
dat <- data.frame(m)
Now I can apply @agtudy's command, slightly changed:
dat$time <- as.POSIXct(strptime(dat$time, "%Y-%m-%dT%H:%M:%OS"))
Results in
> dat
value time
1 0 2014-10-20 01:48:00

Resources