I am trying to insert forecasted values from a forecasting model, along with their timestamps, into MongoDB from R.
The following code converts the R data frame into JSON and then BSON. However, when the result is inserted into MongoDB, the timestamps are not recognized as date objects.
library(rmongodb)
library(jsonlite)
mongo1 <- mongo.create(host = "localhost:27017", db = "test", username = "test", password = "test")
rev<-data.frame(ts=c("2017-01-06 05:30:00","2017-01-06 05:31:00","2017-01-06 05:32:00","2017-01-06 05:33:00","2017-01-06 05:34:00"),value=c(10,20,30,40,50))
rev$ts<-as.POSIXct(strptime(rev$ts,format = "%Y-%m-%d %H:%M:%S",tz=""))
revno<-"Revision1"
mylist <- list()
mylist[[ revno ]] <- rev
mylist["lastRevision"]<-revno
StartTime<-"2017-01-06 05:30:00"
site<-"Site1"
id <- mongo.bson.buffer.create()
mongo.bson.buffer.append(id, "site",site)
mongo.bson.buffer.append(id, "ts",as.POSIXct(strptime(StartTime,format = "%Y-%m-%d %H:%M:%S",tz="")) )
s <- mongo.bson.from.buffer(id)
rev.json<-toJSON(mylist,POSIXt=c("mongo"))
rev.bson<-mongo.bson.from.JSON(rev.json)
actPower <- mongo.bson.buffer.create()
mongo.bson.buffer.append(actPower, "_id",s)
mongo.bson.buffer.append(actPower,"activePower",rev.bson)
x <- mongo.bson.from.buffer(actPower)
x
mongo.insert(mongo1,'solarpulse.forecast',x)
Actual Output:
{
"_id" : {
"site" : "site1",
"ts" : ISODate("2017-01-06T18:30:00Z")
},
"activePower" : {
"Revision1" : [
{
"ts" : 1483660800000,
"value" : 10
},
{
"ts" : 1483660860000,
"value" : 20
},
{
"ts" : 1483660920000,
"value" : 30
},
{
"ts" : 1483660980000,
"value" : 40
},
{
"ts" : 1483661040000,
"value" : 50
}
],
"lastRevision" : [
"Revision1"
]
}
}
Expected Output format:
"_id" : {
"site" : "test",
"ts" : ISODate("2016-12-18T18:30:00Z")
}
"Revision1": [{
"ts": ISODate("2016-12-19T07:30:00Z"),
"value": 31
}, {
"ts": ISODate("2016-12-19T07:45:00Z"),
"value": 52
}, {
"ts": ISODate("2016-12-19T08:00:00Z"),
"value": 53
}, {
"ts": ISODate("2016-12-19T08:15:00Z"),
"value": 30
}, {
"ts": ISODate("2016-12-19T08:30:00Z"),
"value": 43
}, {
"ts": ISODate("2016-12-19T08:45:00Z"),
"value": 31
}, {
"ts": ISODate("2016-12-19T09:00:00Z"),
"value": 16
}, {
"ts": ISODate("2016-12-19T09:15:00Z"),
"value": 39
}, {
"ts": ISODate("2016-12-19T09:30:00Z"),
"value": 17
}, {
"ts": ISODate("2016-12-19T09:45:00Z"),
"value": 45
}, {
"ts": ISODate("2016-12-19T10:00:00Z"),
"value": 60
}, {
"ts": ISODate("2016-12-19T10:15:00Z"),
"value": 39
}, {
"ts": ISODate("2016-12-19T10:30:00Z"),
"value": 46
}, {
"ts": ISODate("2016-12-19T10:45:00Z"),
"value": 57
}, {
"ts": ISODate("2016-12-19T11:00:00Z"),
"value": 29
}, {
"ts": ISODate("2016-12-19T11:15:00Z"),
"value": 7
}]
You can use library(mongolite) to insert dates correctly for you. However, I've only managed to get it to insert dates correctly from data.frames; it fails to do so from lists or JSON strings.
Here is a working example using a data.frame to insert the data.
library(mongolite)
m <- mongo(collection = "test_dates", db = "test", url = "mongodb://localhost")
# m$drop()
df <- data.frame(id = c("site1","site2"),
ts = c(Sys.time(), Sys.time()))
m$insert(df)
#Complete! Processed total of 2 rows.
#$nInserted
#[1] 2
#
#$nMatched
#[1] 0
#
#$nRemoved
#[1] 0
#
#$nUpserted
#[1] 0
#
#$writeErrors
#list()
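As a quick sanity check (not part of the original example), reading the documents back should show the ts column as POSIXct if the values were stored as BSON dates rather than numbers:
m$find('{}')
# returns a data.frame; the ts column comes back as POSIXct,
# confirming the values were stored as BSON dates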
A potential (but less than ideal) solution could be to coerce your list to a data.frame and then insert that.
rev<-data.frame(ts=c("2017-01-06 05:30:00","2017-01-06 05:31:00","2017-01-06 05:32:00","2017-01-06 05:33:00","2017-01-06 05:34:00"),value=c(10,20,30,40,50))
rev$ts<-as.POSIXct(strptime(rev$ts,format = "%Y-%m-%d %H:%M:%S",tz=""))
revno<-"Revision1"
mylist <- list()
mylist[[ revno ]] <- rev
mylist["lastRevision"]<-revno
m$insert(data.frame(mylist))
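Note that the coercion flattens the nested structure (and recycles lastRevision across all rows), which is part of why it is less than ideal. A quick check of the resulting column names, using the objects defined above, shows the flattening:
names(data.frame(mylist))
# [1] "Revision1.ts"    "Revision1.value" "lastRevision"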
Alternatively, insert your list as-is, and then write a function directly within mongo to convert the ts values to ISODate().
Related
Example JSON data:
{
"data": [
{
"place": "FM346",
"id": [
"7_day_A",
"7_day_B",
"7_day_C",
"7_day_D"
],
"values": [
0,
30,
23,
43
]
},
{
"place": "LH210",
"id": [
"1_day_A",
"1_day_B",
"1_day_C",
"1_day_D"
],
"values": [
4,
45,
100,
9
]
}
]
}
What I need to transform it into:
{
"data": [
{
"place": "FM346",
"7_day_A": {
"value": 0
},
"7_day_B": {
"value": 30
},
"7_day_C": {
"value": 23
},
"7_day_D": {
"value": 43
}
},
{
"place": "LH210",
"1_day_A": {
"value": 4
},
"1_day_B": {
"value": 45
},
"1_day_C": {
"value": 100
},
"1_day_D": {
"value": 9
}
}
]
}
I have tried this:
{
data:[.data |.[]|
{
place: (.place),
(.id[]):
{
value: (.values[])
}
}]
}
(in jqplay: https://jqplay.org/s/f4BBtN9gwmp)
and this:
{
data:[.data |.[]|
{
place: (.place),
test:
[{
(.id[]):
{
value: (.values[])
}
}]
}]
}
(in jqplay: https://jqplay.org/s/pKIvQe1CzgX)
but they aren't grouped in the way I wanted, and each id gets every value rather than only its corresponding one.
I have been trying for some time now, but I'm new to jq and have no idea how to transform it this way. Thanks in advance for any answers.
You can use transpose here, which plays the key role in converting the arrays to key/value pairs:
.data[] |= {place} +
([ .id, .values ] | transpose | map({(.[0]): { value: .[1] } }) | add)
The solution works by transposing the array-of-arrays [ .id, .values ], i.e. converting
[["7_day_A","7_day_B","7_day_C","7_day_D"],[0,30,23,43]]
[["1_day_A","1_day_B","1_day_C","1_day_D"],[4,45,100,9]]
to
[["7_day_A",0],["7_day_B",30],["7_day_C",23],["7_day_D",43]]
[["1_day_A",4],["1_day_B",45],["1_day_C",100],["1_day_D",9]]
With that transformation done, we construct one object per pair, using the zeroth element as the key and an object containing the first element as the value, and then combine the results into a single object with add.
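For the first entry, for example, the map step produces an array of single-key objects, which add then merges into one object:
[{"7_day_A":{"value":0}},{"7_day_B":{"value":30}},{"7_day_C":{"value":23}},{"7_day_D":{"value":43}}]
becomes
{"7_day_A":{"value":0},"7_day_B":{"value":30},"7_day_C":{"value":23},"7_day_D":{"value":43}}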
Demo - jqplay
I'm trying to get some data from a JSON file using R, but it does not work when the data is nested under brackets and keys. I'm getting a lot of data back; the problem is actually getting the value of the "released" parameter. Example:
{
"index": [
{
"id": "a979eb2b85d6c13086b29a21bdc421b2673379a4",
"date": "2019-03-22T01:20:01-0300",
"status": "OK",
"sensor": [
{
"id": "15",
"number": 127,
"callback": {
"released": true #it is not possible to return this data
}
}
]
},
{
"id": "db2890f501a3a49ed74aeb065168e057c3fd51d2",
"date": "2019-03-25T01:20:01-0300",
"status": "NOK",
"sensor": [
{
"id": "15",
"number": 149,
"callback": {
"released": false #it is not possible to return this data
}
}
]
}
]
}
Here is the code:
library(jsonlite)
data <- fromJSON("Desktop/json/file.json")
pagination <- list()
for(i in 0:10){
pagination[[i+1]] <- data$index$sensor$callback
}
data_org <- rbind_pages(pagination)
nrow(data_org)
length <- nrow(data_org)
data_org[1:length, c("released")]
The response was:
nrow(data_org)
# [1] 0
data_org[1:length, c("released")]
# NULL
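For reference, a hedged sketch of one way to reach the nested field, assuming jsonlite's default simplification (index becomes a data frame, sensor a list of data frames, and callback a nested data frame inside each), and assuming file.json holds the example document without the inline # comments (which are not valid JSON):
library(jsonlite)
data <- fromJSON("Desktop/json/file.json")
# sensor is a list column (one data frame per index entry), so callback has to
# be pulled out of each element before its released flag can be read
released <- sapply(data$index$sensor, function(s) s$callback$released)
released
# [1]  TRUE FALSE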
Ideally, I need to get the count of how many times "London" is used as the city name, but the query returns separate counts for "london", "London", "LoNdOn", and so on.
I have tried using the case-insensitive option, but it doesn't give me the required result.
Here's my query:
{
"queryType": "topN",
"dataSource": "wikiticker",
"dimension":"cityName",
"granularity": "ALL",
"metric": "count",
"threshold": 10,
"filter":
{
"type": "search",
"dimension": "cityName",
"query": {
"type": "insensitive_contains",
"value": "london",
}
},
"aggregations": [
{
"type": "longSum",
"name": "count",
"fieldName": "count"
}
],
"intervals": ["2014-10-01T00:00:00.000Z/2016-10-07T00:00:00.000Z"]
}
And here's my result:
[ {
"timestamp" : "2015-09-12T00:46:58.771Z",
"result" : [ {
"count" : 21,
"cityName" : "London"
},
{
"count" : 10,
"cityName" : "New London"
},
{
"count" : 3,
"cityName" : "london"
},
{
"count" : 1,
"cityName" : "LoNdon"
},
{
"count" : 1,
"cityName" : "LondOn"
} ]
} ]
I should get something like:
[ {
"timestamp" : "2015-09-12T00:46:58.771Z",
"result" : [ {
"count" : 26,
"cityName" : "London"
},
{
"count" : 10,
"cityName" : "New London"
} ]
} ]
Use the filtered aggregator:
A filtered aggregator wraps any given aggregator, but only aggregates the values for which the given dimension filter matches.
{
"type" : "filtered",
"filter" : {
"type" : "search",
"dimension" : cityName,
"query": {
"type":"contains",
"value":"london"
}
},
"aggregator" : {
"type": "count",
"name": "Total Count of the Name London"
}
}
References
Druid Documentation: Filtered Aggregator
I am trying to use a for loop over every object using jq.
Sample input generated by Elasticsearch:
{
"took": 202,
"timed_out": false,
"aggregations": {
"aggsDateHistogram": {
"buckets": [
{
"key": 1465974236000,
"search": {
"value": 14
}
},
{
"key": 1465975137000,
"search": {
"value": 16
}
}
]
}
}
}
I want to produce objects that pair each key with the corresponding value field from search.
{ "date": .aggregations.aggsDateHistogram.buckets[].key, "value": .aggregations.aggsDateHistogram.buckets[].search.value }
This gives me objects, but as a Cartesian product; I only want pairs like
key[1] : search[1].value
key[2] : search[2].value
So you want to produce this output?
[
{
"key": 1465974236000,
"value": 14
},
{
"key": 1465975137000,
"value": 16
}
]
The following will do just that:
.aggregations[].buckets
| map({key: .key, value: .search.value})
And from a terminal:
jq '.aggregations[].buckets
| map({key: .key, value: .search.value})' input.json
Here is a slightly simpler solution:
[ .aggregations[].buckets[] | {key, value:.search.value} ]
I am accessing bulk data in Elasticsearch through R. For analytics purposes I need to query data over a relatively long duration (say a month). The data for a month is approximately 4.5 million rows, and R runs out of memory.
Sample data is below (for 1 day):
library(elastic)
dt <- as.Date("2015-09-01", "%Y-%m-%d")
frmdt <- strftime(dt,"%Y-%m-%d")
todt <- as.Date(dt+1)
todt <- strftime(todt,"%Y-%m-%d")
connect(es_base="http://xx.yy.zzz.kk")
start_date <- as.integer(as.POSIXct(frmdt))*1000
end_date <- as.integer(as.POSIXct(todt))*1000
query <- sprintf('{"query":{"range":{"time":{"gte":"%s","lte":"%s"}}}}',start_date,end_date)
s_list <- elastic::Search(index = "organised_2015_09",type = "PROPERTY_SEARCH", body=query ,
fields = c("trackId", "time"), size=1000000)$hits$hits
length(s_list)
[1] 144612
This result for 1 day has 144k records and is 222 MB. Sample list item below:
> s_list[[1]]
$`_index`
[1] "organised_2015_09"
$`_type`
[1] "PROPERTY_SEARCH"
$`_id`
[1] "1441122918941"
$`_version`
[1] 1
$`_score`
[1] 1
$fields
$fields$time
$fields$time[[1]]
[1] 1441122918941
$fields$trackId
$fields$trackId[[1]]
[1] "fd4b4ce88101e58623ba9e6e31971d1f"
Actually, a summary count of the number of items by "trackId" and "time" (summarized per day) would suffice for the analytics purpose. Hence I tried to transform this into a count query with aggregations and constructed the query below:
query <- '{"size" : 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"time": {
"gte": 1441045800000,
"lte": 1443551400000
}
}
}
}
},
"aggs": {
"articles_over_time": {
"date_histogram": {
"field": "time",
"interval": "day",
"time_zone": "+05:30"
},
"aggs": {
"group_by_state": {
"terms": {
"field": "trackId",
"size": 0
}
}
}
}
}
}'
response <- elastic::Search(index="organised_recent",type="PROPERTY_SEARCH",body=query, search_type="count")
However, I did not gain in speed or in the size of the returned response. I think I am missing something, but I'm not sure what.
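Assuming the aggregation query above returns the expected buckets, a hedged sketch (not from the original post) of collapsing the parsed response into the per-day, per-trackId counts described above could look like the following; the paths assume elastic returns the body as nested lists:
buckets <- response$aggregations$articles_over_time$buckets
# one row per (day, trackId) combination, with the document count for each
daily_counts <- do.call(rbind, lapply(buckets, function(day) {
  data.frame(
    day     = day$key_as_string,
    trackId = sapply(day$group_by_state$buckets, `[[`, "key"),
    count   = sapply(day$group_by_state$buckets, `[[`, "doc_count"),
    stringsAsFactors = FALSE
  )
}))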