Getting JSON data into a dataframe in R

I have this file with JSON-formatted data that I need to get into a dataframe. Ultimately I would like to plot the geolocations onto a map, but I can't seem to get the data into a df first.
json_to_df <- function(file) {
  file <- lapply(file, function(x) {
    x[sapply(x, is.null)] <- NA
    unlist(x)
  })
  df <- do.call("rbind", file)
  return(df)
}
But I only get this error:
Error in fromJSON(file) :
STRING_ELT() can only be applied to a 'character vector', not a 'list'
The file structure looks like this (this is only part of the data):
{
"results": [
{
"utc_offset": 7200000,
"venue": {
"country": "nl",
"localized_country_name": "Netherlands",
"city": "Bergen",
"address_1": "16 Notweg",
"name": "FitClub Bergen",
"lon": 4.699218,
"id": 24632049,
"lat": 52.673046,
"repinned": false
},
"headcount": 0,
"distance": 22.46796989440918,
"visibility": "public",
"waitlist_count": 0,
"created": 1467149834000,
"rating": {
"count": 0,
"average": 0
},
"maybe_rsvp_count": 0,
"description": "<p>Start your week off right with a Monday Morning Bootcamp!!! The fresh air and peaceful dunes provide the perfect setting for a total body workout. Whether you are a beginner with brand spankin' new health goals and in need of some direction, or training for a race or competition, we're the trainers for you!!! See you at 8:50 for sign-in!</p>",
"event_url": "https://www.meetup.com/FitClubBergen/events/234936736/",
"yes_rsvp_count": 3,
"duration": 3600000,
"name": "Free Bootcamp in the Bergen Dunes",
"id": "glzqvlyvnbgc",
"time": 1477292400000,
"updated": 1477297999000,
"group": {
"join_mode": "open",
"created": 1441658286000,
"name": "FitClub Bergen Free Bootcamp in the Dunes",
"group_lon": 4.710000038146973,
"id": 18908751,
"urlname": "FitClubBergen",
"group_lat": 52.66999816894531,
"who": "FitClubbers"
},
"status": "past"
},
{
"utc_offset": 7200000,
"venue": {
"country": "nl",
"localized_country_name": "Netherlands",
"city": "Bergen",
"address_1": "16 Notweg",
"name": "FitClub Bergen",
"lon": 4.699218,
"id": 24632049,
"lat": 52.673046,
"repinned": false
},
"headcount": 0,
"distance": 22.46796989440918,
"visibility": "public",
"waitlist_count": 0,
"created": 1467149834000,
"rating": {
"count": 0,
"average": 0
},
"maybe_rsvp_count": 0,
"description": "<p>Start your week off right with a Monday Morning Bootcamp!!! The fresh air and peaceful dunes provide the perfect setting for a total body workout. Whether you are a beginner with brand spankin' new health goals and in need of some direction, or training for a race or competition, we're the trainers for you!!! See you at 8:50 for sign-in!</p> <p>ALWAYS FREE</p> <p>FOR ALL LEVELS OF FITNESS</p> <p>BRING: water bottle and energy</p>",
"event_url": "https://www.meetup.com/FitClubBergen/events/234936737/",
"yes_rsvp_count": 3,
"name": "Monday Morning Bootcamp in the Bergen Dunes",
"id": "flzqvlyvnbgc",
"time": 1477292400000,
"updated": 1477303926000,
"group": {
"join_mode": "open",
"created": 1441658286000,
"name": "FitClub Bergen Free Bootcamp in the Dunes",
"group_lon": 4.710000038146973,
"id": 18908751,
"urlname": "FitClubBergen",
"group_lat": 52.66999816894531,
"who": "FitClubbers"
},
"status": "past"
},
{
"utc_offset": 7200000,
"venue": {
"country": "nl",
"localized_country_name": "Netherlands",
"city": "Amsterdam",
"phone": "020 4275777",
"address_1": "Dijksgracht 2",
"address_2": "1019 BS ",
"name": "Klimmuur Central",
"lon": 4.91284,
"id": 1143381,
"lat": 52.376626,
"repinned": false
},
"headcount": 0,
"distance": 1.0689502954483032,
"visibility": "public",
"waitlist_count": 0,
"created": 1477215767000,
"rating": {
"count": 0,
"average": 0
},
"maybe_rsvp_count": 0,
"description": "<p>Climbing Right After Work: RAW.<br/>Quiet hall, pretty much every rope available; no rope chasing necessary. And.. still some time left to do other things later that evening. Take you gear and an extra sandwich to work and join me afterwards pulling some plastic.<br/>Some notes:<br/>- This events starts #17:00. If you can't make it that early, please comment the time you can.<br/>- Please fill in your belaying skills in your profile. If you've never climbed before or don't have belaying skills: follow an introduction course a the gym first! Safety above all!</p>",
"event_url": "https://www.meetup.com/The-Amsterdam-indoor-rockclimbing/events/235054729/",
"yes_rsvp_count": 3,
"name": "Monday's RAW Climb",
"id": "235054729",
"time": 1477321200000,
"updated": 1477334279000,
"group": {
"join_mode": "approval",
"created": 1358348565000,
"name": "The Amsterdam indoor rockclimbing",
"group_lon": 4.889999866485596,
"id": 6689952,
"urlname": "The-Amsterdam-indoor-rockclimbing",
"group_lat": 52.369998931884766,
"who": "Climbers"
},
"status": "past"
},
{
"utc_offset": 7200000,
"venue": {
"country": "nl",
"localized_country_name": "Netherlands",
"city": "Amstelveen",
"address_1": "Langs de Akker 3",
"name": "Emergohal",
"lon": 4.87967,
"id": 23816542,
"lat": 52.290199,
"repinned": false
},
"rsvp_limit": 12,
"headcount": 0,
"distance": 5.541957378387451,
"visibility": "public",
"waitlist_count": 0,
"created": 1474452073000,
"fee": {
"amount": 5.5,
"accepts": "cash",
"description": "per person",
"currency": "EUR",
"label": "price",
"required": "0"
},
"rating": {
"count": 0,
"average": 0
},
"maybe_rsvp_count": 0,
"description": "<p>We will play the Whole Season indoor soccer on Mondays from 18:00 - 19:00 starting 5 September until May 2017 in the Emergohal Amstelveen.</p> <p>Preferred payment is with Paypal EUR 5.50 (in advance)<br/>If this is not possible you may pay cash but then I will ask EUR 6,-<br/>(Please have the exact cash with you)</p> <p>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</p> <p>A couple of Unisys (ex)colleagues and football lovers are playing every Monday in the Emergohal Amstelveen at 6PM on a reasonable good level. We are looking for a compact group of players who are willing/able to play (almost) every Monday playing 5v5 (or 6v6).<br/>We are playing with the FIFA Futsal rules in mind:<br/>http://www.fifa.com/mm/document/footballdevelopment/refereeing/51/44/50/lawsofthegamefutsal2014_15_eneu_neutral.pdf</p> <p>The Emergohal has dressing rooms and a nice bar for after the game.</p> <p>Hope to see you on Mondays</p> <p>Cheers Jeroen</p> <p>For questions you may call me on[masked], send a text message (SMS) or leave a message on this meetup group.</p>",
"event_url": "https://www.meetup.com/Futsal_Emergohal_Monday_18-00/events/234290812/",
"yes_rsvp_count": 11,
"duration": 4500000,
"name": "Futsal",
"id": "234290812",
"time": 1477323900000,
"updated": 1477330559000,
"group": {
"join_mode": "approval",
"created": 1474445066000,
"name": "Futsal_Emergohal_Monday_18.00",
"group_lon": 4.860000133514404,
"id": 20450096,
"urlname": "Futsal_Emergohal_Monday_18-00",
"group_lat": 52.31999969482422,
"who": "Players"
},
"status": "past"
}],
"meta": {
"next": "https://api.meetup.com/2/open_events?and_text=False&offset=1&city=Amsterdam&sign=True&format=json&lon=4.88999986649&limited_events=False&photo-host=public&page=20&time=-24m%2C&radius=25.0&lat=52.3699989319&status=past&desc=False",
"method": "OpenEvents",
"total_count": 643,
"link": "https://api.meetup.com/2/open_events",
"count": 20,
"description": "Searches for recent and upcoming public events hosted by Meetup groups. Its search window is the past one month through the next three months, and is subject to change. Open Events is optimized to search for current events by location, category, topic, or text, and only lists Meetups that have **3 or more RSVPs**. The number or results returned with each request is not guaranteed to be the same as the page size due to secondary filtering. If you're looking for a particular event or events within a particular group, use the standard [Events](/meetup_api/docs/2/events/) method.",
"lon": ,
"title": "Meetup Open Events v2",
"url": "",
"signed_url": "{signed_url}",
"id": "",
"updated": 1479988687055,
"lat":
}
}
So I was wondering how I could get this into a dataframe (or even a CSV) so that I can extract the geolocations later?

There is no need to write a parser yourself; there are a number of packages that can read JSON-formatted data. The one I use, and that @hrbrmstr linked, is jsonlite. This package provides a fromJSON function which can parse JSON into a data.frame:
fromJSON('file.json', flatten = TRUE)
Note that the flatten argument here ensures the JSON is flattened into a nice data.frame.
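To make this concrete, here is a minimal sketch using an inline sample that mirrors the structure shown in the question (for the real data, pass the file path to fromJSON instead of the txt string; the venue.lat / venue.lon column names are produced by the flattening):

```r
library(jsonlite)

# Inline sample mirroring the question's structure; for the real data,
# pass the file path to fromJSON() instead of this string.
txt <- '{"results": [{"name": "Free Bootcamp in the Bergen Dunes",
                      "venue": {"lat": 52.673046, "lon": 4.699218}}],
        "meta": {}}'

res    <- fromJSON(txt, flatten = TRUE)  # nested "venue" becomes venue.* columns
events <- res$results                    # the events table; $meta holds paging info

# just the columns needed for plotting on a map
events[, c("name", "venue.lat", "venue.lon")]
```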

Related

Getting a specific item in a sub array and selecting one value from it

I want to get the boardgame rank (value) from this nested array in Cosmos DB.
{
"name": "Alpha",
"statistics": {
"numberOfUserRatingVotes": 4155,
"averageRating": 7.26201,
"baysianAverageRating": 6.71377,
"ratingStandardDeviation": 1.18993,
"ratingMedian": 0,
"rankings": [
{
"id": 1,
"name": "boardgame",
"friendlyName": "Board Game Rank",
"type": "subtype",
"value": 746
},
{
"id": 4664,
"name": "wargames",
"friendlyName": "War Game Rank",
"type": "family",
"value": 140
},
{
"id": 5497,
"name": "strategygames",
"friendlyName": "Strategy Game Rank",
"type": "family",
"value": 434
}
],
"numberOfComments": 1067,
"weight": 2.3386,
"numberOfWeightVotes": 127
},
}
So I want:
{
"name": "Alpha",
"rank": 746
}
Using this query:
SELECT g.name, r
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'
I get this (so close!):
{
"name": "Alpha",
"r": {
"id": 1,
"name": "boardgame",
"friendlyName": "Board Game Rank",
"type": "subtype",
"value": 746
}
},
But extending the query to this:
SELECT g.name, r.value as rank
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'
I get this error:
Failed to query item for container Games:
Message: {"errors":[{"severity":"Error","location":{"start":21,"end":26},"code":"SC1001","message":"Syntax error, incorrect syntax near 'value'."}]}
ActivityId: 0a0cb394-2fc3-4a67-b54c-4d02085b6878, Microsoft.Azure.Documents.Common/2.14.0
I don't understand why this doesn't work, or what the syntax error is. I tried adding square braces, but that didn't help. Can someone help me understand why I get this error, and how to achieve the output I'm looking for?
This should work: value is a reserved keyword in Cosmos DB's SQL grammar, so you have to access the property with bracket notation instead of dot notation.
SELECT g.name, r["value"] as rank
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'

Missing values in JSON in R function

I'm using the following function to extract the retweet ids from a tweet JSON file (to build retweet cascades). Here is a part of the code in R:
parse_raw_tweets_to_cascades <- function(path, batch = 1000, cores = 1, output_path = NULL,
                                         keep_user = F, keep_absolute_time = F, keep_text = F,
                                         keep_retweet_count = F, progress = T, return_as_list = T,
                                         save_temp = F, api_version = 2) {
  check_required_packages(c('jsonlite', 'data.table', 'bit64'))
  library(data.table)
  # a helper function
  zero_if_null <- function(count) {
    ifelse(is.null(count), 0, count)
  }
  if (api_version == 2) {
    parse_tweet <- function(tweet, keep_text = F) {
      tryCatch({
        json_tweet <- jsonlite::fromJSON(tweet)
        if (is.null(json_tweet$includes) || is.null(json_tweet$includes$users)) {
          stop('The author information is required!')
        }
        id <- json_tweet$data$id
        magnitude <- zero_if_null(json_tweet$includes$users$public_metrics$followers_count)
        user_id <- json_tweet$data$author_id
        username <- json_tweet$user$username
        retweet_id <- NA
        #print(typeof(id)) #character
        #cat(sprintf('id is: %s \n', id))
        cat(sprintf('magnitude is: %d \n', magnitude))
        if (keep_text) text <- json_tweet$data$text
        if (!is.null(json_tweet$data$referenced_tweets) && json_tweet$data$referenced_tweets$type == 'retweeted') {
          # if this tweet is a retweet, get the original tweet's information
          retweet_id <- json_tweet$data$referenced_tweets$id
          cat("retweet_id: ", retweet_id, "\n")
          if (keep_text) text <- NA
        }
        cat("Monaaaaa", "\n")
        res <- list(id = id, magnitude = magnitude, user_id = user_id,
                    username = username, retweet_id = retweet_id)
        if (keep_text) res[['text']] <- text
        res
      },
      .... # warning for error processing json
      )
    }
  }
}
This is the error I receive:
Error processing json: Error in if
(!is.null(json_tweet$data$referenced_tweets) &&
json_tweet$data$referenced_tweets$type == : missing value where
TRUE/FALSE needed
Question:
I don't know why the retweet ids are coming back as missing.
I checked my JSON file and searched for "retweeted"; the path (json_tweet$data$referenced_tweets$type) is correct.
NOTE: The above function is part of the evently library. The package works fine with the sample data provided on GitHub, which is in Twitter API V1 format, but it doesn't work with my JSON file, which is in V2.
Here is a small part of my data (part of JSON for retweets of one user):
{"data": [{"referenced_tweets": [{"type": "retweeted", "id": "1253739069273710594"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}, {"start": 18, "end": 24, "username": "AC360", "id": "227837742"}], "annotations": [{"start": 25, "end": 39, "probability": 0.7096, "type": "Person", "normalized_text": "President Trump"}], "urls": [{"start": 98, "end": 121, "url": "", "expanded_url": "", "display_url": "", "images": [{"url": "", "width": 144, "height": 144}, {"url": "", "width": 144, "height": 144}], "status": 200, "title": "Ultraviolet Irradiation of Blood: \u201cThe Cure That Time Forgot\u201d?", "description": "Ultraviolet blood irradiation (UBI) was extensively used in the 1940s and 1950s to treat many diseases including septicemia, pneumonia, tuberculosis, arthritis, asthma and even poliomyelitis. The early studies were carried out by several physicians in ...", "unwound_url": ""}]}, "public_metrics": {"retweet_count": 3, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253834847258370048", "context_annotations": [{"domain": {"id": "3", "name": "TV Shows", "description": "Television shows from around the world"}, "entity": {"id": "10000271509", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. 
Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1249271407508242432", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1249277031881138178", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1250891078401552385", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. 
Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1249271407508242432", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1249277031881138178", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1250891078401552385", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. 
And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}], "created_at": "2020-04-24T23:54:57.000Z", "author_id": "1890848160", "text": "RT #warriors_mom: #AC360 President Trump was referring to this well-documented medical treatment: ", "source": "Twitter for iPhone", "conversation_id": "1253834847258370048"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253452455540666371"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 24, "end": 27, "probability": 0.691, "type": "Place", "normalized_text": "U.S."}]}, "public_metrics": {"retweet_count": 5, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253828982413410307", "context_annotations": [{"domain": {"id": "123", "name": "Ongoing News Story", "description": "Ongoing News Stories like 'Brexit'"}, "entity": {"id": "1220701888179359745", "name": "COVID-19"}}], "created_at": "2020-04-24T23:31:39.000Z", "author_id": "863857568", "text": "RT #warriors_mom: Major U.S. 
credit-card issuers begin lowering customer spending limits as coronavirus pandemic shutdowns leave millions j\u2026", "source": "Twitter for iPhone", "conversation_id": "1253828982413410307"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253815956662620163"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}, {"start": 18, "end": 32, "username": "RealMattCouch", "id": "601535938"}], "annotations": [{"start": 33, "end": 41, "probability": 0.8682, "type": "Person", "normalized_text": "Seth Rich"}]}, "public_metrics": {"retweet_count": 2, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253816055161651202", "created_at": "2020-04-24T22:40:16.000Z", "author_id": "1065308069645754368", "text": "RT #warriors_mom: #RealMattCouch Seth Rich", "source": "Twitter for Android", "conversation_id": "1253816055161651202"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253811776103333890"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 63, "end": 67, "probability": 0.9967, "type": "Person", "normalized_text": "Trump"}, {"start": 69, "end": 74, "probability": 0.9523, "type": "Place", "normalized_text": "Russia"}, {"start": 87, "end": 95, "probability": 0.8678, "type": "Organization", "normalized_text": "Alfa Bank"}]}, "public_metrics": {"retweet_count": 1, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253812582806216704", "context_annotations": [{"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, 
"entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "30", "name": "Entities [Entity Service]", "description": "Entity Service top level domain, every item that is in Entity Service should be in this domain"}, "entity": {"id": "848920371311001600", "name": "Technology", "description": "Technology and computing"}}, {"domain": {"id": "30", "name": "Entities [Entity Service]", "description": "Entity Service top level domain, every item that is in Entity Service should be in this domain"}, "entity": {"id": "898650876658634752", "name": "Cybersecurity", "description": "Cybersecurity"}}], "created_at": "2020-04-24T22:26:29.000Z", "author_id": "987931361963950080", "text": "RT #warriors_mom: Top cyber security team finds no evidence of Trump-Russia chatter on Alfa Bank server: A cyber security report debunks th\u2026", "source": "Twitter for Android", "conversation_id": "1253812582806216704"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253461793168674821"}], "attachments": {"media_keys": ["3_1253461775980339201", "3_1253461780254392326", "3_1253461784981377024", "3_1253461788408102912"]}, "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "hashtags": [{"start": 23, "end": 32, "tag": "FakeNews"}], "urls": [{"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461775980339201"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461780254392326"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461784981377024"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461788408102912"}]}, "public_metrics": {"retweet_count": 6, "reply_count": 0, "like_count": 0, 
"quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253787731517476866", "created_at": "2020-04-24T20:47:44.000Z", "author_id": "461486301", "text": "RT #warriors_mom: Dear #FakeNews Media... seriously? \ud83d\ude44\ud83e\udd23 ", "source": "Twitter Web App", "conversation_id": "1253787731517476866"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253348491805577216"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 18, "end": 23, "probability": 0.868, "type": "Organization", "normalized_text": "Amazon"}]}, "public_metrics": {"retweet_count": 2, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253787648180789253", "context_annotations": [{"domain": {"id": "45", "name": "Brand Vertical", "description": "Top level entities that describe a Brands industry"}, "entity": {"id": "781974596706635776", "name": "Retail"}}, {"domain": {"id": "46", "name": "Brand Category", "description": "Categories within Brand Verticals that narrow down the scope of Brands"}, "entity": {"id": "783335558466506752", "name": "Online"}}, {"domain": {"id": "47", "name": "Brand", "description": "Brands and Companies"}, "entity": {"id": "10026792024", "name": "Amazon"}}], "created_at": "2020-04-24T20:47:24.000Z", "author_id": "3003997593", "text": "RT #warriors_mom: Amazon Scooped Up Data From Its Own Sellers to Launch Competing Products: Contrary to assertions to Congress, employees o\u2026", "source": "Twitter for iPhone", "conversation_id": "1253787648180789253"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253716749817729025"}], ....}
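The error can be reproduced and guarded in isolation. A small sketch (my own illustration, not code from the evently package): with API V2 data, jsonlite::fromJSON returns data frames, so the $type lookup can yield NA, NULL, or a multi-element vector for ordinary tweets, and the bare if (... && ...) condition then stops with "missing value where TRUE/FALSE needed". Wrapping the condition in isTRUE() together with any() collapses those cases to FALSE:

```r
# My own illustration, not code from the evently package.
# json_tweet$data$referenced_tweets$type can be NA or NULL for ordinary
# tweets, and `if (cond)` then stops with "missing value where TRUE/FALSE
# needed". isTRUE() plus any() collapses those cases to FALSE:
is_retweet <- function(ref) {
  isTRUE(!is.null(ref$type) && any(ref$type == "retweeted"))
}

is_retweet(list(type = "retweeted"))  # TRUE
is_retweet(list(type = NA))           # FALSE instead of an error
is_retweet(NULL)                      # FALSE
```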

HERE geocode API latinized address

I've noticed quite severe inconsistency in the results provided by the HERE /geocode API endpoint. Some address parts keep their original special characters, as in the city name "Łódź", and some don't.
When making the following request:
https://geocoder.cit.api.here.com/6.2/geocode.json?lon=19.4734111&lat=51.73771300000001&language=sv-SE&searchtext=sienkiewicza lodz&result_types=address,place&cs=pds&additionaldata=Country2,true
we get an inconsistent result:
"Address": {
"Label": "ulica Henryka Sienkiewicza, 90-009 Lodz, Polen",
"Country": "POL",
"State": "Woj. Łódzkie",
"County": "Lodz",
"City": "Lodz",
"District": "Lodz",
"Subdistrict": "Śródmieście",
"Street": "ulica Henryka Sienkiewicza",
"PostalCode": "90-009",
"AdditionalData": [
{
"value": "PL",
"key": "Country2"
},
{
"value": "Polen",
"key": "CountryName"
},
{
"value": "Woj. Łódzkie",
"key": "StateName"
},
{
"value": "Lodz",
"key": "CountyName"
}
]
}
As we can see, the value for state contains Polish characters ("Woj. Łódzkie"), but the city is "Lodz", which is not OK.
All results should contain the original letters, as in "Łódź". In other words, such results shouldn't be latinized.
Thank you
When using a language code different from that of the original data, as in your case sv-SE for data in Poland, you get exonyms "where available", which is why you may get a mix of alphabets.
If you remove the language parameter from the query, or set it to Polish explicitly with language=pl-PL, you get the following response for your example:
"Address": {
"Label": "ulica Henryka Sienkiewicza, 90-057 Łódź, Polska",
"Country": "POL",
"State": "Woj. Łódzkie",
"County": "Łódź",
"City": "Łódź",
"District": "Łódź",
"Subdistrict": "Śródmieście",
"Street": "ulica Henryka Sienkiewicza",
"PostalCode": "90-057",
"AdditionalData": [
{
"value": "PL",
"key": "Country2"
},
{
"value": "Polska",
"key": "CountryName"
},
{
"value": "Woj. Łódzkie",
"key": "StateName"
},
{
"value": "Łódź",
"key": "CountyName"
}
]
}
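For completeness, a base-R sketch of how the request URL could be assembled with an explicit language=pl-PL parameter (the endpoint and search text are taken from the question; the remaining parameters from the original URL are omitted here for brevity):

```r
# Assemble the geocoder request with an explicit Polish language code
# (endpoint and search text from the question; base R only).
base_url <- "https://geocoder.cit.api.here.com/6.2/geocode.json"
params <- c(
  searchtext   = "sienkiewicza lodz",
  language     = "pl-PL",        # or omit this parameter entirely
  result_types = "address,place"
)
url <- paste0(base_url, "?",
              paste(names(params),
                    vapply(params, URLencode, character(1), reserved = TRUE),
                    sep = "=", collapse = "&"))
url
# jsonlite::fromJSON(url) would then fetch and parse the response
```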

How can I get all cities within a given radius of a given city from the HERE API?

I need to be able to give a City/State or Postal Code and an integer for the radius, and return all cities/postal codes within the given radius using HERE, but the documentation seems unclear.
https://geocoder.api.here.com/6.2/reversegeocode.json?app_id=xxx&app_code=xxxx
The return is:
<ns2:Error xmlns:ns2="http://www.navteq.com/lbsp/Errors/1" type="PermissionError" subtype="InvalidCredentials">
<Details>invalid credentials for xxxxxxx</Details>
</ns2:Error>
But when I use
https://geocoder.api.here.com/6.2/geocode.json?app_id=xxx&app_code=xxxx
I get data back.
It is worth mentioning I am on the Freemium Package for the REST API
So, first: why can't I get data back from the Reverse Geocode API?
And what is the appropriate request string to accomplish the above?
Update:
Leaving the rest here in case someone else runs into this. To use Reverse Geocode, the API route is actually
https://reverse.geocoder.api.here.com/6.2/reversegeocode.json
Though I still need some help on how to get the Radius Data
Update 2:
https://reverse.geocoder.api.here.com/6.2/reversegeocode.json?app_id=x&app_code=x&level=city&mode=retrieveAreas&prox=52.5309,13.3847,80467.2
Returns:
{
"Response": {
"MetaInfo": {
"Timestamp": "2019-04-27T17:47:41.043+0000"
},
"View": [
{
"_type": "SearchResultsViewType",
"ViewId": 0,
"Result": [
{
"Relevance": 1,
"Distance": 0,
"Direction": 0,
"MatchLevel": "district",
"MatchQuality": {
"Country": 1,
"State": 1,
"County": 1,
"City": 1,
"District": 1,
"PostalCode": 1
},
"Location": {
"LocationId": "NT_0ES-GaH3lZzJCuLQBrdw7C",
"LocationType": "point",
"DisplayPosition": {
"Latitude": 52.5309,
"Longitude": 13.3847
},
"MapView": {
"TopLeft": {
"Latitude": 52.54063,
"Longitude": 13.36566
},
"BottomRight": {
"Latitude": 52.50407,
"Longitude": 13.42964
}
},
"Address": {
"Label": "Mitte, Berlin, Deutschland",
"Country": "DEU",
"State": "Berlin",
"County": "Berlin",
"City": "Berlin",
"District": "Mitte",
"PostalCode": "10178",
"AdditionalData": [
{
"value": "Deutschland",
"key": "CountryName"
},
{
"value": "Berlin",
"key": "StateName"
},
{
"value": "Berlin",
"key": "CountyName"
}
]
},
"MapReference": {
"ReferenceId": "53500282",
"SideOfStreet": "neither",
"CountryId": "20147700",
"StateId": "20187401",
"CountyId": "20187402",
"CityId": "20187403",
"DistrictId": "20187417"
}
}
}
]
}
]
}
}
The documentation says to put in coordinates and a radius; that's 50 miles, so I'm not sure why I am only receiving one city, Berlin, in the response.
Update 3:
https://reverse.geocoder.api.here.com/6.2/multi-reversegeocode.json?app_id=x&app_code=x&level=city&mode=retrieveAreas&prox=60.5544,-151.2583,80000
I tried with multi-reversegeocode; the return is: {}
Get cities in a proximity radius
Use the Reverse Geocode endpoint with the following parameters
https://reverse.geocoder.api.here.com/6.2/reversegeocode.json?
prox=52.5309,13.3847,80500 /* Note: 80.5 km around Berlin */
&app_id=YOUR_APP_ID
&app_code=YOUR_APP_CODE
&mode=retrieveAreas
&level=city
&gen=9
In short, your example in "Update 2" is correct, except that it is missing the gen query parameter. Indeed, as per the API Reference, the query parameter level is valid only in combination with gen=2 or higher.

Want to output two values from each line of a huge JSONL file in R Studio

I'm walking through a huge JSONL file (100G, 100M rows) line by line, extracting two key values from the data. Ideally, I want this written to a file with two columns. I'm a real beginner here.
Here is an example of the JSON on each row of the file referenced on my C drive:
https://api.unpaywall.org/v2/10.6118/jmm.2017.23.2.135?email=YOUR_EMAIL
or:
{
"best_oa_location": {
"evidence": "open (via page says license)",
"host_type": "publisher",
"is_best": true,
"license": "cc-by-nc",
"pmh_id": null,
"updated": "2018-02-14T11:18:21.978814",
"url": "FAKEURL",
"url_for_landing_page": "URL2",
"url_for_pdf": "URL4",
"version": "publishedVersion"
},
"data_standard": 2,
"doi": "10.6118/jmm.2017.23.2.135",
"doi_url": "URL5",
"genre": "journal-article",
"is_oa": true,
"journal_is_in_doaj": false,
"journal_is_oa": false,
"journal_issns": "2288-6478,2288-6761",
"journal_name": "Journal of Menopausal Medicine",
"oa_locations": [
{
"evidence": "open (via page says license)",
"host_type": "publisher",
"is_best": true,
"license": "cc-by-nc",
"pmh_id": null,
"updated": "2018-02-14T11:18:21.978814",
"url": "URL6",
"url_for_landing_page": "URL7",
"url_for_pdf": "URL8",
"version": "publishedVersion"
},
{
"evidence": "oa repository (via OAI-PMH doi match)",
"host_type": "repository",
"is_best": false,
"license": "cc-by-nc",
"pmh_id": "oai:pubmedcentral.nih.gov:5606912",
"updated": "2017-10-21T18:12:39.724143",
"url": "URL9",
"url_for_landing_page": "URL11",
"url_for_pdf": "URL12",
"version": "publishedVersion"
},
{
"evidence": "oa repository (via pmcid lookup)",
"host_type": "repository",
"is_best": false,
"license": null,
"pmh_id": null,
"updated": "2018-10-11T01:49:34.280389",
"url": "URL13",
"url_for_landing_page": "URL14",
"url_for_pdf": null,
"version": "publishedVersion"
}
],
"published_date": "2017-01-01",
"publisher": "The Korean Society of Menopause (KAMJE)",
"title": "A Case of Granular Cell Tumor of the Clitoris in a Postmenopausal Woman",
"updated": "2018-06-20T20:31:37.509896",
"year": 2017,
"z_authors": [
{
"affiliation": [
{
"name": "Department of Obstetrics and Gynecology, Soonchunhyang University Cheonan Hospital, University of Soonchunhyang College of Medicine, Cheonan, Korea."
}
],
"family": "Min",
"given": "Ji-Won"
},
{
"affiliation": [
{
"name": "Department of Obstetrics and Gynecology, Soonchunhyang University Cheonan Hospital, University of Soonchunhyang College of Medicine, Cheonan, Korea."
}
],
"family": "Kim",
"given": "Yun-Sook"
}
]
}
Here's the code i'm using/wrote:
library (magrittr)
library (jqr)
con = file("C:/users/ME/desktop/miniunpaywall.jsonl", "r");
while ( length(line <- readLines(con, n = -1)) > 0) {
write.table( line %>% jq ('.doi,.best_oa_location.license'), file='test.txt', quote=FALSE, row.names=FALSE);}
What results from this is a line of text for each row of JSON that looks like this:
"10.1016/j.ijcard.2018.10.014,CC-BY"
This is effectively:
"[DOI],[LICENSE]"
I want ideally to have the output be:
[DOI] tab [LICENSE]
I believe my problem is that I'm writing the values as a string into a single column when I say:
write.table(line %>% jq('.doi,.best_oa_location.license'), ...)
I haven't figured out a way to remove the quotes I'm getting around each line in my file, or how I could separate the two values with a tab. I feel I'm pretty close. Help!
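One way to get tab-separated, quote-free output is to skip write.table entirely and build each output line yourself. Here is a sketch using jsonlite instead of jqr (my own suggestion, not a drop-in for the code above); note also that readLines(con, n = -1) reads the whole file in one go, whereas reading in fixed-size chunks keeps memory bounded:

```r
library(jsonlite)

# Build "<doi>\t<license>" from one JSONL line (NA when license is null).
doi_license <- function(line) {
  x   <- fromJSON(line)
  lic <- x$best_oa_location$license
  paste(x$doi, if (is.null(lic)) NA else lic, sep = "\t")
}

# Stream the file in bounded chunks instead of reading all 100G at once:
# con <- file("C:/users/ME/desktop/miniunpaywall.jsonl", "r")
# out <- file("test.txt", "w")
# repeat {
#   lines <- readLines(con, n = 10000)
#   if (length(lines) == 0) break
#   writeLines(vapply(lines, doi_license, character(1)), out)
# }
# close(con); close(out)
```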
