Extract a specific key word from a string in R


I have a column "place" in my table which contains data about a place that looks like:
{ "id" : "94965b2c45386f87", "name" : "New York", "boundingBoxCoordinates" : [ [ { "longitude" : -79.76259, "latitude" : 40.477383 }, { "longitude" : -79.76259, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 40.477383 } ] ], "countryCode" : "US", "fullName" : "New York, USA", "boundingBoxType" : "Polygon", "URL" : "https://api.twitter.com/1.1/geo/id/94965b2c45386f87.json", "accessLevel" : 0, "placeType" : "admin", "country" : "United States" }
From this, I want to extract the country name. I have tried the following code:
loc <- t1$place
loc = gsub('"', '', loc)
loc = gsub(',', '', loc)
to clean up the string and now it looks like this:
"{ id : 00ed6f0947c230f4 name : Caloocan City boundingBoxCoordinates : [ [ { longitude : 120.9607709 latitude : 14.6344661 } { longitude : 120.9607709 latitude : 14.7873208 } { longitude : 121.1015117 latitude : 14.7873208 } { longitude : 121.1015117 latitude : 14.6344661 } ] ] countryCode : PH fullName : Caloocan City National Capital Region boundingBoxType : Polygon URL : https://api.twitter.com/1.1/geo/id/00ed6f0947c230f4.json accessLevel : 0 placeType : city country : Republika ng Pilipinas }"
Now to extract the country name, I want to use the word() function:
word(loc, n, sep=fixed(" : "))
where n is the position of the country name, which I have not counted yet. The function gives the correct output for n=1 but throws an error for any other value of n:
Error in word[loc, "start"] : subscript out of bounds
Why is that happening? The loc variable certainly has more words with that separation. Or can someone suggest a better way of extracting the country name from that field?
EDIT: t1 is the data frame that contains my entire table. At present I am interested only in the place field, which holds the information in the format shown above. Hence I am loading the place field into a separate variable called "loc" using a basic assignment:
loc <- t1$place
To read it as JSON, the place field seems to need single-quote delimiters, which it does not originally have. I have 2 million rows in my table, so I can't add the delimiters manually.

This looks like a JSON object, so it would be easier to use a JSON parser to extract the data.
So if this is your string value
x <- '{ "id" : "94965b2c45386f87", "name" : "New York", "boundingBoxCoordinates" : [ [ { "longitude" : -79.76259, "latitude" : 40.477383 }, { "longitude" : -79.76259, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 40.477383 } ] ], "countryCode" : "US", "fullName" : "New York, USA", "boundingBoxType" : "Polygon", "URL" : "https://api.twitter.com/1.1/geo/id/94965b2c45386f87.json", "accessLevel" : 0, "placeType" : "admin", "country" : "United States" }'
then you can do
library(jsonlite)
# or library(RJSONIO)
# or library(rjson)
fromJSON(x)$country
# [1] "United States"
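On the single-quote worry from the question's edit: the quotes around x above are only R's syntax for typing a string literal. A column read from a file is already a character vector (wrap it in as.character() if it came in as a factor), so nothing has to be added to the 2 million rows. A minimal sketch of applying the same idea to a whole column, assuming jsonlite is installed; place is a hypothetical stand-in for t1$place and get_country is a helper name made up for this example:

```r
library(jsonlite)

# Hypothetical stand-in for t1$place: shortened strings shaped like the
# question's data, plus one broken row to show the fallback.
place <- c(
  '{ "id" : "94965b2c45386f87", "name" : "New York", "countryCode" : "US", "country" : "United States" }',
  '{ "id" : "00ed6f0947c230f4", "name" : "Caloocan City", "countryCode" : "PH", "country" : "Republika ng Pilipinas" }',
  'not valid json'
)

# Parse one string and return its country field.
# Assumption: rows that fail to parse, or have no single character
# "country" value, should become NA instead of stopping the run.
get_country <- function(s) {
  tryCatch({
    value <- fromJSON(s)$country
    if (is.character(value) && length(value) == 1) value else NA_character_
  }, error = function(e) NA_character_)
}

# vapply() calls get_country once per row and guarantees a character vector.
country <- vapply(place, get_country, character(1), USE.NAMES = FALSE)
# country now holds "United States", "Republika ng Pilipinas" and NA
```

With the real table this would be something like t1$country <- vapply(as.character(t1$place), get_country, character(1), USE.NAMES = FALSE). Parsing every row is slower than a plain regular expression, but it does not depend on the order or spacing of the fields.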

Related

Unable to get the square brackets for the single individual records while using the json_agg inside the json_compose in Teradata-sql-assistance

According to the documentation,
if we have the following data in a table named emp_table
empID company empName empAge
1, 'Teradata', 'Cameron', 24
2, 'Teradata', 'Justin', 34
3, 'Apple', 'Someone', 34
and if we use the following query :
SELECT JSON_Compose(T.company, T.employees)
FROM
(
SELECT company, JSON_agg(empID AS id,
empName AS name,
empAge AS age) AS employees
FROM emp_table
GROUP BY company
) AS T;
we are supposed to get :
JSON_Compose
------------
{
"company" : "Teradata",
"employees" : [
{ "id" : 1, "name" : "Cameron", "age" : 24 },
{ "id" : 2, "name" : "Justin", "age" : 34 }
]
}
{
"company" : "Apple",
"employees" : [
{ "id" : 3, "name" : "Someone", "age" : 24 }
]
}
But when I follow the same steps, I don't get the square brackets for a group that contains only a single record.
The result I'm getting is :
JSON_Compose
------------
{
"company" : "Teradata",
"employees" : [
{ "id" : 1, "name" : "Cameron", "age" : 24 },
{ "id" : 2, "name" : "Justin", "age" : 34 }
]
}
{
"company" : "Apple",
"employees" :
{ "id" : 3, "name" : "Someone", "age" : 24 }
}
Is there any way to get the square brackets for single-record groups as well?

Firebase rules, how to allow users to only see their own data

I am trying to get a rule set working that allows users to see their own data...
My current rule set is:
{
"rules": {
".read": "root.child('users').child(auth.uid).child('admin').val() === true",
".write": "root.child('users').child(auth.uid).child('admin').val() === true",
"users": {
".indexOn": ["active"],
"$user_id": {
".read": "$user_id === auth.uid",
".write": "$user_id === auth.uid"
}
},
"active_alerts": {
".indexOn": "alert_id"
},
"trips": {
".indexOn": "archive",
"$trip_id": {
".read": "data.child('who_called').child('key').val() === root.child('users').child(auth.uid).child('customer').child('key').val()",
"notes": {
"$note_id": {
".read": "data.child('display').val() === true"
}
}
}
}
}
}
The user path has data that looks like this:
{
"active" : true,
"admin" : false,
"customer" : {
"key" : "-Ldsu71CgIJxh1DVTTCP",
"name" : "Demo Customer"
},
"email" : "emai@email.com",
"last_login" : "2019-05-02T18:34:26.466Z",
"name" : "Demo",
"primary_phone" : "4197460180",
"typeahead" : "demo"
}
and the matching item in /trips:
{
"airline" : {
"key" : "195",
"name" : "AAL"
},
"archive" : false,
"arrival_airport" : {
"code" : "PHL",
"icao" : "KPHL",
"key" : "108",
"name" : "Philadelphia",
"timezone" : "America/New_York"
},
"bill_to" : {
"key" : "-LdqFpqAOm-dOl9xBtp2",
"name" : "AGT Global Logistics "
},
"consignee" : {
"key" : "-LdqHNMzPrP9epp_W-DS",
"name" : "Exelon Peach Bottom"
},
"customer_reference" : "124914",
"departure_airport" : {
"code" : "MKE",
"icao" : "KMKE",
"key" : "90",
"name" : "Milwaukee",
"timezone" : "America/Chicago"
},
"last_update" : "2019-05-02T18:02:57.274Z",
"level" : {
"key" : "-LWlODaCFUcejExn41Rr",
"name" : "Next Flight Out"
},
"milestones" : [ {
"airport" : {
"code" : "MKE",
"icao" : "KMKE",
"key" : "90",
"name" : "Milwaukee",
"timezone" : "America/Chicago"
},
"flight_time" : "2019-05-02T12:33:00.000Z",
"status" : {
"key" : "4",
"name" : "completed"
},
"type" : {
"key" : "0",
"name" : "Picked up"
}
}, {
"airport" : {
"code" : "MKE",
"icao" : "KMKE",
"key" : "90",
"name" : "Milwaukee",
"timezone" : "America/Chicago"
},
"flight_time" : "2019-05-02T13:51:00.000Z",
"status" : {
"key" : "4",
"name" : "completed"
},
"type" : {
"key" : "1",
"name" : "Dropped to departure airport"
}
}, {
"airline" : {
"key" : "195",
"name" : "AAL"
},
"airport" : {
"code" : "MKE",
"icao" : "KMKE",
"key" : "90",
"name" : "Milwaukee",
"timezone" : "America/Chicago"
},
"alert_id" : 29624287,
"flight_number" : "4883",
"flight_time" : "2019-05-02T16:28:03.000Z",
"ident" : "PDT4883-1556601968-airline-0144",
"img_url" : "....",
"note" : "arrival ~ PDT4883 arrived at PHL from MKE",
"status" : {
"key" : "4",
"name" : "completed"
},
"type" : {
"key" : "2",
"name" : "Departed Airport"
}
}, {
"airline" : {
"key" : "195",
"name" : "AAL"
},
"airport" : {
"code" : "PHL",
"icao" : "KPHL",
"key" : "108",
"name" : "Philadelphia",
"timezone" : "America/New_York"
},
"alert_id" : 29624287,
"flight_number" : "4883",
"flight_time" : "2019-05-02T18:02:00.000Z",
"note" : "arrival ~ PDT4883 arrived at PHL from MKE",
"status" : {
"key" : "4",
"name" : "completed"
},
"type" : {
"key" : "4",
"name" : "Arrived Airport"
}
}, {
"airport" : {
"code" : "PHL",
"icao" : "KPHL",
"key" : "108",
"name" : "Philadelphia",
"timezone" : "America/New_York"
},
"flight_time" : "2019-05-02T20:00:00.000Z",
"status" : {
"key" : 0,
"name" : "planned"
},
"type" : {
"key" : "6",
"name" : "Out for delivery"
}
}, {
"airport" : {
"code" : "PHL",
"icao" : "KPHL",
"key" : "108",
"name" : "Philadelphia",
"timezone" : "America/New_York"
},
"flight_time" : "2019-05-02T21:30:00.000Z",
"status" : {
"key" : 0,
"name" : "planned"
},
"type" : {
"key" : "7",
"name" : "Delivered"
}
} ],
"pieces" : [ {
"description" : "Valves",
"height" : "11",
"length" : "27",
"qty" : "1",
"units" : {
"key" : "2",
"name" : "IN"
},
"weight" : "50",
"weight_units" : {
"key" : "3",
"name" : "LBS"
},
"width" : "19"
} ],
"protect_time" : "2019-05-02T21:30:00.000Z",
"ready_time" : "2019-05-02T13:00:00.000Z",
"shipper" : {
"key" : "-LdqG3I48m662R7ABa5i",
"name" : "FAIRBANKS MORSE - MKE"
},
"trip_id" : "LFC-155676269",
"trip_notes" : [ {
"date_time" : "2019-05-02T10:29:43.892Z",
"display" : true,
"note" : "delay ~ Philadelphia Intl (PHL) is experiencing all inbound flights being held at their origin due to low clouds"
}, {
"date_time" : "2019-05-02T13:05:52.708Z",
"display" : true,
"note" : "filed ~ PDT4883 (E145) filed to depart MKE @ Thu (02 May) 16:24 GMT for PHL @ ETA 18:09 GMT (02 May) (UECKR5 SAMPL ADIME GERBS J146 CXR EWC JST BOJID2)"
}, {
"date_time" : "2019-05-02T14:51:00.000Z",
"display" : true,
"note" : "Shipment has been manifested onto flight AA4883 - TC"
}, {
"date_time" : "2019-05-02T16:28:31.325Z",
"display" : true,
"note" : "departure ~ PDT4883 (E145) departed MKE @ 16:28 GMT for PHL ETA 18:13 GMT"
}, {
"date_time" : "2019-05-02T18:02:57.274Z",
"display" : true,
"note" : "arrival ~ PDT4883 arrived at PHL from MKE"
} ],
"who_called" : {
"key" : "-Ldsu71CgIJxh1DVTTCP",
"name" : "Demo Customer"
}
}
As you can see, customer.key and who_called.key match, but the user is still not able to see the data. I'm not sure what I am doing wrong in the rule set. Your help is appreciated!

To allow users to see the trips they added, use query-based rules like this:
"trips": {
".indexOn": "archive",
".read": "auth.uid != null && query.orderByChild == 'who_called/key' &&
query.equalTo == root.child('users/' + auth.uid + '/customer/key').val()",
"$trip_id": {
".read": "data.child('who_called').child('key').val() === root.child('users').child(auth.uid).child('customer').child('key').val()",
"notes": {
"$note_id": {
".read": "data.child('display').val() === true"
}
}
}
}
The only restriction is that you cannot access the user's trips without using the query specified in the rule. That means you can't access the trips like this:
firebase.database().ref('users/USERID/customer/-Ldsu71CgIJxh1DVTTCP')
It has to be done like this:
firebase.database().ref('trips').orderByChild('who_called/key')
.equalTo('-Ldsu71CgIJxh1DVTTCP')
I've tried it and it works. Hope it helps

Missing matchType at HERE geocoding responses

HERE Geocoding docs say that there's a "MatchType" result ("quality of the location match, either pointAddress or interpolated"). Nevertheless, I'm not receiving it. Here's a curl example request:
$ curl -G https://geocoder.cit.api.here.com/6.2/geocode.json --data-urlencode "app_id=XXX" --data-urlencode "app_code=YYYY" --data-urlencode "searchtext=Plaza España, Valladolid, Spain" | json_pp
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 916 100 916 0 0 4182 0 --:--:-- --:--:-- --:--:-- 4182
{
"Response" : {
"MetaInfo" : {
"Timestamp" : "2018-07-11T12:21:43.726+0000"
},
"View" : [
{
"_type" : "SearchResultsViewType",
"Result" : [
{
"MatchQuality" : {
"Country" : 1,
"City" : 1,
"District" : 1
},
"Location" : {
"DisplayPosition" : {
"Latitude" : 41.65039,
"Longitude" : -4.72559
},
"LocationType" : "point",
"MapView" : {
"BottomRight" : {
"Longitude" : -4.71874,
"Latitude" : 41.64807
},
"TopLeft" : {
"Longitude" : -4.72961,
"Latitude" : 41.65196
}
},
"LocationId" : "NT_zM4WS5CjEFeGmNW-rZko9A",
"Address" : {
"District" : "Plaza España",
"County" : "Valladolid",
"Country" : "ESP",
"City" : "Valladolid",
"State" : "Castilla y León",
"PostalCode" : "47002",
"Label" : "Plaza España, Valladolid, Castilla y León, España",
"AdditionalData" : [
{
"value" : "España",
"key" : "CountryName"
},
{
"key" : "StateName",
"value" : "Castilla y León"
},
{
"value" : "Valladolid",
"key" : "CountyName"
}
]
},
"NavigationPosition" : [
{
"Longitude" : -4.72559,
"Latitude" : 41.65039
}
]
},
"Relevance" : 1,
"MatchLevel" : "district"
}
],
"ViewId" : 0
}
]
}
}

The documentation isn't very clear on this one.
The MatchType indicates the precision of a house number match. If the place has an address you will get a pointAddress or interpolated, but if there is no house number at all MatchType is excluded from the response. The interpolated response happens if the data comes from an address range so the bounding box is calculated for that location, for example units 1-4 are identified as townhouses in a building but you searched for unit 3.
In this example, since you appear to be searching for a landmark rather than a house number, MatchType doesn't come through in the response.

Mongolite and aggregation with $lookup on ObjectId vs character

Working with mongolite v0.9.1 (R) and MongoDB v3.4, I'd like to join two collections: the parent, whose documents have an ObjectId _id, and the children, whose documents store the parent's ObjectId as a string.
This is the basic syntax :
conParent$aggregate('[
{ "$lookup":
{ "from":"Children",
"localField": "_id",
"foreignField": "parent_id",
"as": "children"
}
}
]')
$lookup seems to accept only a field name. I've tried this, which produces syntax errors:
.../...
"foreignField": "{'$oid':'parent_id'}"
.../...
So is there a way to deal with that?
Alternatively, I tried to save the parent's ObjectId in the children in ObjectId format, with no luck (I still get a string in MongoDB):
result <- transform(
computeFunction,
parent_id = sprintf('{"$oid":"%s"}',parent$"_id"))
resultCon <- conout$insert(as.data.frame(result))
Is it possible to store an Id as ObjectId in mongolite?
Note: I'm doing bulk inserts, so I can't deal with JSON string manipulations.
Any idea?
Edit:
Here is an example of the collections I am using:
The Parent collection :
{
"_id" : ObjectId("586f7e8b837abeabb778d2fd"),
"name" : "Root1",
"date" : "2017-01-01",
"value" : 1.0,
"value1" : 10.0,
"value2" : 100.0
},
{
"_id" : ObjectId("586f7ea4837abeabb778d30a"),
"name" : "Root1",
"date" : "2017-01-02",
"value" : 2.0,
"value1" : 20.0,
"value2" : 200.0
}
The Children collection :
{
"_id" : ObjectId("586f7edf837abeabb778d319"),
"name" : "Item1",
"value" : 1.1,
"date" : "2017-01-01",
"parent_id" : "586f7e8b837abeabb778d2fd"
}
{
"_id" : ObjectId("586f7efa837abeabb778d324"),
"name" : "Item2",
"value1" : 11.111111111,
"value2" : 12.222222222,
"date" : "2017-01-01",
"parent_id" : "586f7e8b837abeabb778d2fd"
}
{
"_id" : ObjectId("586f7f15837abeabb778d328"),
"name" : "Item1",
"value" : 2.2,
"date" : "2017-01-02",
"parent_id" : "586f7ea4837abeabb778d30a"
}
{
"_id" : ObjectId("586f7f2b837abeabb778d32e"),
"name" : "Item2",
"value1" : 21.111111111,
"value2" : 22.222222222,
"date" : "2017-01-02",
"parent_id" : "586f7ea4837abeabb778d30a"
}

Could you try :
"foreignField": "_id"
Starting from mongo's website example :
library(mongolite)
library(jsonlite)
a = '[{ "_id" : 1, "item" : 1, "price" : 12, "quantity" : 2 },
{ "_id" : 2, "item" : 2, "price" : 20, "quantity" : 1 },
{ "_id" : 3 }]'
b= '[{ "_id" : 1, "sku" : "abc", "description": "product 1", "instock" : 120 },
{ "_id" : 2, "sku" : "def", "description": "product 2", "instock" : 80 },
{ "_id" : 3, "sku" : "ijk", "description": "product 3", "instock" : 60 },
{ "_id" : 4, "sku" : "jkl", "description": "product 4", "instock" : 70 },
{ "_id" : 5, "sku": null, "description": "Incomplete" },
{ "_id" : 6 }]'
mongo_orders <- mongo(db = "mydb", collection = "orders")
mongo_orders$insert(fromJSON(a))
mongo_inventory <- mongo(db = "mydb", collection = "inventory")
mongo_inventory$insert(fromJSON(b))
df <- mongo_orders$aggregate('[
{
"$lookup":
{
"from": "inventory",
"localField": "item",
"foreignField": "_id",
"as": "inventory_docs"
}
}
]')
str(df)
It works as well when both are set to _id
"localField": "_id",
"foreignField": "_id",

Well, I must say that's not possible at all!
mongolite retrieves _id as a character string and does not contain any ObjectId implementation.
So...

For loop in R does not work correctly

This is a continuation of my previous question on keyword extraction from a string in R: Extract a specific key word from a string in R
I have written the following code, which returns the keyword as I wish:
loc <- t1$place
loc <- gsub('"', '', loc)
loc <- gsub(',', '', loc)
for(i in 1:nrow(t1))
country <- word(loc[i], 19, sep=fixed(" : "))
country <- gsub(' }', '', country)
The for loop does not seem to work correctly. When I run the code from inside the for loop with a hardcoded index, as shown below:
country <- word(loc[2], 19, sep=fixed(" : "))
country <- gsub(' }', '', country)
the code seems to work. But when I put it through a loop, it gives me an error:
Error in word[loc, "start"] : subscript out of bounds
Please help me where it is going wrong.
class(country)
says it is a character type. Is the way I coded the for loop wrong?
Other details: t1 is the dataframe of my table. I used Import dataset to load my file week_tweet_filtered.csv and used the command:
t1 <- week_tweet_filtered
to load the same in t1 variable. I access the place column of my table using t1$place. Also, the place column contains fields of the format:
{ "id" : "94965b2c45386f87", "name" : "New York", "boundingBoxCoordinates" : [ [ { "longitude" : -79.76259, "latitude" : 40.477383 }, { "longitude" : -79.76259, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 40.477383 } ] ], "countryCode" : "US", "fullName" : "New York, USA", "boundingBoxType" : "Polygon", "URL" : "https://api.twitter.com/1.1/geo/id/94965b2c45386f87.json", "accessLevel" : 0, "placeType" : "admin", "country" : "United States" }

This worked for me
x<-'{ "id" : "94965b2c45386f87", "name" : "New York", "boundingBoxCoordinates" : [ [ { "longitude" : -79.76259, "latitude" : 40.477383 }, { "longitude" : -79.76259, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 40.477383 } ] ], "countryCode" : "US", "fullName" : "New York, USA", "boundingBoxType" : "Polygon", "URL" : "https://api.twitter.com/1.1/geo/id/94965b2c45386f87.json", "accessLevel" : 0, "placeType" : "admin", "country" : "United States" }'
library(jsonlite) # provides fromJSON()
y<-fromJSON(x)
y[['country']]
Notice that the first line encloses the JSON in single quotes; I don't know if that is the problem you are having.
If you don't have the quotes, try
x<-as.character(t1$place)
I don't really understand how you are getting that as anything other than a string.
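On the loop itself: a for statement without braces repeats only the single statement that follows it, so in the question's code the gsub() line runs once, after the loop has finished. The variable country is also overwritten on every pass, so only the last row would survive. A minimal sketch of both a corrected loop and a loop-free version, using base R's sub() with a regular expression in place of word(); place is a hypothetical stand-in for t1$place, and the pattern assumes the field is always written as "country" : "..." with that exact spacing:

```r
# Hypothetical stand-in for t1$place: shortened strings shaped like the
# question's data.
place <- c(
  '{ "id" : "94965b2c45386f87", "name" : "New York", "countryCode" : "US", "country" : "United States" }',
  '{ "id" : "00ed6f0947c230f4", "name" : "Caloocan City", "countryCode" : "PH", "country" : "Republika ng Pilipinas" }'
)

# Capture whatever sits between the quotes after "country" : .
# Note: sub() returns the input unchanged when the pattern does not match.
pattern <- '.*"country" : "([^"]*)".*'

# Loop version: the braces group the body, and country[i] keeps one
# result per row instead of overwriting a single value.
country <- character(length(place))
for (i in seq_along(place)) {
  country[i] <- sub(pattern, "\\1", place[i])
}

# Vectorized version: sub() already works element-wise, so the loop is optional.
country_vec <- sub(pattern, "\\1", place)
```

Because the pattern looks for the field by name instead of counting " : " separators, it does not depend on every row having the same number of pieces.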
