Missing values in JSON in R function - r

I'm using the following function to extract the retweet ids from a tweet JSON file (to build retweet cascades). Here is a part of the code in R:
parse_raw_tweets_to_cascades <- function(path, batch = 1000, cores = 1, output_path = NULL, keep_user = F, keep_absolute_time = F, keep_text = F, keep_retweet_count = F, progress = T, return_as_list = T, save_temp = F, api_version=2) {
  check_required_packages(c('jsonlite', 'data.table', 'bit64'))
  library(data.table)
  # a helper function
  zero_if_null <- function(count) {
    ifelse(is.null(count), 0, count)
  }
  if (api_version == 2) {
    parse_tweet <- function(tweet, keep_text = F) {
      tryCatch({
        json_tweet <- jsonlite::fromJSON(tweet)
        if (is.null(json_tweet$includes) || is.null(json_tweet$includes$users)) {
          stop('The author information is required!')
        }
        id <- json_tweet$data$id
        magnitude <- zero_if_null(json_tweet$includes$users$public_metrics$followers_count)
        user_id <- json_tweet$data$author_id
        username <- json_tweet$user$username
        retweet_id <- NA
        #print(typeof(id)) #character
        #cat(sprintf('id is: %s \n', id))
        cat(sprintf('magnitude is: %d \n', magnitude))
        if (keep_text) text <- json_tweet$data$text
        if (!is.null(json_tweet$data$referenced_tweets) && json_tweet$data$referenced_tweets$type == 'retweeted') {
          #if this tweet is a retweet, get original tweet's information
          retweet_id <- json_tweet$data$referenced_tweets$id
          cat("retweet_id: ", retweet_id, "\n")
          if (keep_text) text <- NA
        }
        cat("Monaaaaa", "\n")
        res <- list(id = id, magnitude = magnitude, user_id = user_id,
                    username = username, retweet_id = retweet_id)
        if (keep_text) res[['text']] <- text
        res
      },
      .... # warning for error processing json
      )
    }
  }
This is the error I receive:
Error processing json: Error in if
(!is.null(json_tweet$data$referenced_tweets) &&
json_tweet$data$referenced_tweets$type == : missing value where
TRUE/FALSE needed
Question:
I don't know why the retweet ids are null.
I checked my json file and searched for retweeted. I see the path (json_tweet$data$referenced_tweets$type) is correct.
NOTE: The above function is part of the evently library. The package works fine with their sample data provided on Github which is in Twitter API V1 format, but it doesn't work with my JSON file which is in V2.
Here is a small part of my data (part of JSON for retweets of one user):
{"data": [{"referenced_tweets": [{"type": "retweeted", "id": "1253739069273710594"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}, {"start": 18, "end": 24, "username": "AC360", "id": "227837742"}], "annotations": [{"start": 25, "end": 39, "probability": 0.7096, "type": "Person", "normalized_text": "President Trump"}], "urls": [{"start": 98, "end": 121, "url": "", "expanded_url": "", "display_url": "", "images": [{"url": "", "width": 144, "height": 144}, {"url": "", "width": 144, "height": 144}], "status": 200, "title": "Ultraviolet Irradiation of Blood: \u201cThe Cure That Time Forgot\u201d?", "description": "Ultraviolet blood irradiation (UBI) was extensively used in the 1940s and 1950s to treat many diseases including septicemia, pneumonia, tuberculosis, arthritis, asthma and even poliomyelitis. The early studies were carried out by several physicians in ...", "unwound_url": ""}]}, "public_metrics": {"retweet_count": 3, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253834847258370048", "context_annotations": [{"domain": {"id": "3", "name": "TV Shows", "description": "Television shows from around the world"}, "entity": {"id": "10000271509", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. 
Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1249271407508242432", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1249277031881138178", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1250891078401552385", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. 
Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1249271407508242432", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1249277031881138178", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1250891078401552385", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. 
And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}], "created_at": "2020-04-24T23:54:57.000Z", "author_id": "1890848160", "text": "RT #warriors_mom: #AC360 President Trump was referring to this well-documented medical treatment: ", "source": "Twitter for iPhone", "conversation_id": "1253834847258370048"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253452455540666371"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 24, "end": 27, "probability": 0.691, "type": "Place", "normalized_text": "U.S."}]}, "public_metrics": {"retweet_count": 5, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253828982413410307", "context_annotations": [{"domain": {"id": "123", "name": "Ongoing News Story", "description": "Ongoing News Stories like 'Brexit'"}, "entity": {"id": "1220701888179359745", "name": "COVID-19"}}], "created_at": "2020-04-24T23:31:39.000Z", "author_id": "863857568", "text": "RT #warriors_mom: Major U.S. 
credit-card issuers begin lowering customer spending limits as coronavirus pandemic shutdowns leave millions j\u2026", "source": "Twitter for iPhone", "conversation_id": "1253828982413410307"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253815956662620163"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}, {"start": 18, "end": 32, "username": "RealMattCouch", "id": "601535938"}], "annotations": [{"start": 33, "end": 41, "probability": 0.8682, "type": "Person", "normalized_text": "Seth Rich"}]}, "public_metrics": {"retweet_count": 2, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253816055161651202", "created_at": "2020-04-24T22:40:16.000Z", "author_id": "1065308069645754368", "text": "RT #warriors_mom: #RealMattCouch Seth Rich", "source": "Twitter for Android", "conversation_id": "1253816055161651202"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253811776103333890"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 63, "end": 67, "probability": 0.9967, "type": "Person", "normalized_text": "Trump"}, {"start": 69, "end": 74, "probability": 0.9523, "type": "Place", "normalized_text": "Russia"}, {"start": 87, "end": 95, "probability": 0.8678, "type": "Organization", "normalized_text": "Alfa Bank"}]}, "public_metrics": {"retweet_count": 1, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253812582806216704", "context_annotations": [{"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, 
"entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "30", "name": "Entities [Entity Service]", "description": "Entity Service top level domain, every item that is in Entity Service should be in this domain"}, "entity": {"id": "848920371311001600", "name": "Technology", "description": "Technology and computing"}}, {"domain": {"id": "30", "name": "Entities [Entity Service]", "description": "Entity Service top level domain, every item that is in Entity Service should be in this domain"}, "entity": {"id": "898650876658634752", "name": "Cybersecurity", "description": "Cybersecurity"}}], "created_at": "2020-04-24T22:26:29.000Z", "author_id": "987931361963950080", "text": "RT #warriors_mom: Top cyber security team finds no evidence of Trump-Russia chatter on Alfa Bank server: A cyber security report debunks th\u2026", "source": "Twitter for Android", "conversation_id": "1253812582806216704"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253461793168674821"}], "attachments": {"media_keys": ["3_1253461775980339201", "3_1253461780254392326", "3_1253461784981377024", "3_1253461788408102912"]}, "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "hashtags": [{"start": 23, "end": 32, "tag": "FakeNews"}], "urls": [{"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461775980339201"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461780254392326"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461784981377024"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461788408102912"}]}, "public_metrics": {"retweet_count": 6, "reply_count": 0, "like_count": 0, 
"quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253787731517476866", "created_at": "2020-04-24T20:47:44.000Z", "author_id": "461486301", "text": "RT #warriors_mom: Dear #FakeNews Media... seriously? \ud83d\ude44\ud83e\udd23 ", "source": "Twitter Web App", "conversation_id": "1253787731517476866"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253348491805577216"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 18, "end": 23, "probability": 0.868, "type": "Organization", "normalized_text": "Amazon"}]}, "public_metrics": {"retweet_count": 2, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253787648180789253", "context_annotations": [{"domain": {"id": "45", "name": "Brand Vertical", "description": "Top level entities that describe a Brands industry"}, "entity": {"id": "781974596706635776", "name": "Retail"}}, {"domain": {"id": "46", "name": "Brand Category", "description": "Categories within Brand Verticals that narrow down the scope of Brands"}, "entity": {"id": "783335558466506752", "name": "Online"}}, {"domain": {"id": "47", "name": "Brand", "description": "Brands and Companies"}, "entity": {"id": "10026792024", "name": "Amazon"}}], "created_at": "2020-04-24T20:47:24.000Z", "author_id": "3003997593", "text": "RT #warriors_mom: Amazon Scooped Up Data From Its Own Sellers to Launch Competing Products: Contrary to assertions to Congress, employees o\u2026", "source": "Twitter for iPhone", "conversation_id": "1253787648180789253"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253716749817729025"}], ....}
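A possible explanation, sketched under assumptions rather than confirmed against the evently source: in the sample above, "data" is a JSON array, so jsonlite::fromJSON simplifies it into a data frame with one row per tweet, and referenced_tweets becomes a list column holding one small data frame per row. Calling $type on that unnamed list returns NULL, NULL == 'retweeted' is logical(0), and TRUE && logical(0) evaluates to NA, which would produce "missing value where TRUE/FALSE needed" inside if(). The JSON below is a trimmed copy of two tweets from the sample, and get_retweet_id is a hypothetical helper, not part of evently.

```r
library(jsonlite)

# Two tweets trimmed from the sample above (only the fields used here).
json_text <- '{
  "data": [
    {"id": "1253834847258370048", "author_id": "1890848160",
     "referenced_tweets": [{"type": "retweeted", "id": "1253739069273710594"}]},
    {"id": "1253828982413410307", "author_id": "863857568",
     "referenced_tweets": [{"type": "retweeted", "id": "1253452455540666371"}]}
  ]
}'

json_tweet <- fromJSON(json_text)

# `data` is an array, so it is simplified to a data frame (one row per tweet);
# `referenced_tweets` is a list column with one data frame per row.
refs <- json_tweet$data$referenced_tweets
is_list_column <- is.list(refs) && !is.data.frame(refs)

# `$type` on that unnamed list finds nothing and returns NULL,
# which is what makes the condition in the if() statement evaluate to NA.
type_lookup <- refs$type

# Hypothetical helper: retweet id for one tweet's referenced_tweets entry.
get_retweet_id <- function(ref) {
  if (is.null(ref) || !is.data.frame(ref)) return(NA_character_)
  hit <- ref$id[ref$type == "retweeted"]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# One retweet id per row of `data`.
retweet_ids <- vapply(refs, get_retweet_id, character(1))
print(retweet_ids)
```

If this is the cause, the parser seems to expect one tweet object per JSON string, while the file holds a whole page of tweets under a single "data" array, so the page would have to be split into individual tweets (or handled row by row as above) before parsing.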

Related

Getting a specific item in a sub array and selecting one value from it

I want to get the boardgame rank (value) from this nested array in Cosmos DB.
{
"name": "Alpha",
"statistics": {
"numberOfUserRatingVotes": 4155,
"averageRating": 7.26201,
"baysianAverageRating": 6.71377,
"ratingStandardDeviation": 1.18993,
"ratingMedian": 0,
"rankings": [
{
"id": 1,
"name": "boardgame",
"friendlyName": "Board Game Rank",
"type": "subtype",
"value": 746
},
{
"id": 4664,
"name": "wargames",
"friendlyName": "War Game Rank",
"type": "family",
"value": 140
},
{
"id": 5497,
"name": "strategygames",
"friendlyName": "Strategy Game Rank",
"type": "family",
"value": 434
}
],
"numberOfComments": 1067,
"weight": 2.3386,
"numberOfWeightVotes": 127
},
}
So I want:
{
"name": "Alpha",
"rank": 746
}
Using this query:
SELECT g.name, r
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'
I get this (so close!):
{
"name": "Alpha",
"r": {
"id": 1,
"name": "boardgame",
"friendlyName": "Board Game Rank",
"type": "subtype",
"value": 746
}
},
But extending the query to this:
SELECT g.name, r.value as rank
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'
I get this error:
Failed to query item for container Games:
Message: {"errors":[{"severity":"Error","location":{"start":21,"end":26},"code":"SC1001","message":"Syntax error, incorrect syntax near 'value'."}]}
ActivityId: 0a0cb394-2fc3-4a67-b54c-4d02085b6878, Microsoft.Azure.Documents.Common/2.14.0
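I don't understand why this doesn't work, or what the syntax error is. I tried adding square brackets, but that didn't help. Can someone help me understand why I get this error, and how to achieve the output I'm looking for?
This should work. value is a reserved keyword in the Cosmos DB SQL dialect (it is used in SELECT VALUE), so r.value fails to parse; use bracket notation to access the property instead: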
SELECT g.name, r["value"] as rank
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'

Here.com Route API - V8 - State Mileage

We're trying to migrate from routes v7 to routes v8. Using v8, how can we get a breakdown of miles per US State?
In version 7.2 we could do
https://route.ls.hereapi.com/routing/7.2/calculateroute.json?apiKey=API_KEY&mode=fastest;truck&excludecountries=MEX,CAN&metricSystem=imperial&routeattributes=sm,sc&instructionFormat=text&truckType=tractorTruck&trailersCount=1&waypoint0=geo!33.90251,-81.13206&waypoint1=geo!39.80203,-105.08759
And the state codes would be in the summaryByCountry element:
"summaryByCountry": [
{
"distance": 189887,
"trafficTime": 8239,
"baseTime": 8206,
"flags": [
"motorway",
"builtUpArea"
],
"text": "The trip takes 118 mi and 2:17 h.",
"travelTime": 8206,
"country": "South Carolina",
"_type": "RouteSummaryByCountryType"
},
...
In version 8, a similar request:
https://router.hereapi.com/v8/routes?apiKey=API_KEY&origin=32.20618,-110.96474&destination=40.391537,-104.681168&routingMode=fast&transportMode=truck&avoid[features]=ferry&exclude[countries]=MEX,CAN&units=imperial&return=polyline,summary,actions,instructions&spans=countryCode,length,truckAttributes,notices&truck[trailerCount]=1&via=40.014984,-105.270546
"spans": [
{
"offset": 0,
"truckAttributes": [
"open"
],
"length": 1460740,
"countryCode": "USA"
},
{
"offset": 14050,
"truckAttributes": [
"open",
"tollRoad"
],
"length": 272,
"countryCode": "USA"
},
{
"offset": 14053,
"truckAttributes": [
"open"
],
"length": 23153,
"countryCode": "USA"
}
v7: summaryByCountry
v8: Not present. If spans are requested with spans=countryCode,length, then information about the distance in each country can be retrieved. No plans to support in any other manner.
Response in v8 contains :
"spans": [
{
"offset": 0,
"truckAttributes": [
"open"
],
"length": 76817,
"countryCode": "USA"
}
],
Link to the migration guide: https://developer.here.com/documentation/routing-api/migration_guide/index.html
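As a rough illustration of the approach the migration guide describes, the span lengths can be summed per countryCode once the response is parsed. The sketch below is in R and hard-codes the three spans from the question's v8 response into a data frame, rather than calling the API; the lengths are added up in whatever unit the API returned them. The same grouping would apply to a finer-grained span attribute, if the API returns one for your request.

```r
# Spans copied from the v8 response shown in the question.
spans <- data.frame(
  offset = c(0, 14050, 14053),
  length = c(1460740, 272, 23153),
  countryCode = c("USA", "USA", "USA"),
  stringsAsFactors = FALSE
)

# Group span lengths by country code and sum each group.
length_by_country <- vapply(
  split(spans$length, spans$countryCode),
  sum,
  numeric(1)
)

print(length_by_country)
```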

Transit API how to get all Stops served by specific route?

I want to display the transit routes near a location in my city. I can get them from the next-departure results, but how can I get all stops served by a specific route, in sequence order?
The following is a single station object returned by stations/by_geocoord.json:
{
"id": "400702222",
"name": "Cosburn Ave at Woodbine Ave",
"distance": 24,
"duration": "PT0H0M24S",
"x": -79.317285,
"y": 43.696509,
"has_board": 1,
"country": "Canada",
"ccode": "CAN",
"state": "ON",
"postal": "M4C 4G4",
"district": "Woodbine-Lumsden",
"street": "Woodbine Ave",
"number": "1349",
"city": "Toronto",
"Transports": {
"Transport": [
{
"name": "87",
"mode": 5,
"dir": "West - 87A Cosburn towards Broadview Station via East York Acres",
"At": {
"textColor": "#FFFFFF",
"color": "#804000"
}
},
{
"name": "87",
"mode": 5,
"dir": "West - 87C Cosburn towards Broadview Station",
"At": {
"textColor": "#FFFFFF",
"color": "#804000"
}
}
]
}
}
How can I get the list of stops included in dir: West - 87A Cosburn towards Broadview Station via East York Acres?
You can get all the transit stations/stops within a given radius by using the stations/by_geocoord resource.
Here is the source documentation:
https://developer.here.com/documentation/transit/dev_guide/topics/resource-search-geocoord.html

Want to output two values from each line of a huge JSONL file in R Studio

I'm walking through a huge JSONL file (100G, 100M rows) line by line extracting two key values from the data. Ideally, I want this written to a file with two columns. I'm a real beginner here.
Here is an example of the JSON on each row of the file referenced on my C drive:
https://api.unpaywall.org/v2/10.6118/jmm.2017.23.2.135?email=YOUR_EMAIL
or:
{
"best_oa_location": {
"evidence": "open (via page says license)",
"host_type": "publisher",
"is_best": true,
"license": "cc-by-nc",
"pmh_id": null,
"updated": "2018-02-14T11:18:21.978814",
"url": "FAKEURL",
"url_for_landing_page": "URL2",
"url_for_pdf": "URL4",
"version": "publishedVersion"
},
"data_standard": 2,
"doi": "10.6118/jmm.2017.23.2.135",
"doi_url": "URL5",
"genre": "journal-article",
"is_oa": true,
"journal_is_in_doaj": false,
"journal_is_oa": false,
"journal_issns": "2288-6478,2288-6761",
"journal_name": "Journal of Menopausal Medicine",
"oa_locations": [
{
"evidence": "open (via page says license)",
"host_type": "publisher",
"is_best": true,
"license": "cc-by-nc",
"pmh_id": null,
"updated": "2018-02-14T11:18:21.978814",
"url": "URL6",
"url_for_landing_page": "URL7",
"url_for_pdf": "URL8",
"version": "publishedVersion"
},
{
"evidence": "oa repository (via OAI-PMH doi match)",
"host_type": "repository",
"is_best": false,
"license": "cc-by-nc",
"pmh_id": "oai:pubmedcentral.nih.gov:5606912",
"updated": "2017-10-21T18:12:39.724143",
"url": "URL9",
"url_for_landing_page": "URL11",
"url_for_pdf": "URL12",
"version": "publishedVersion"
},
{
"evidence": "oa repository (via pmcid lookup)",
"host_type": "repository",
"is_best": false,
"license": null,
"pmh_id": null,
"updated": "2018-10-11T01:49:34.280389",
"url": "URL13",
"url_for_landing_page": "URL14",
"url_for_pdf": null,
"version": "publishedVersion"
}
],
"published_date": "2017-01-01",
"publisher": "The Korean Society of Menopause (KAMJE)",
"title": "A Case of Granular Cell Tumor of the Clitoris in a Postmenopausal Woman",
"updated": "2018-06-20T20:31:37.509896",
"year": 2017,
"z_authors": [
{
"affiliation": [
{
"name": "Department of Obstetrics and Gynecology, Soonchunhyang University Cheonan Hospital, University of Soonchunhyang College of Medicine, Cheonan, Korea."
}
],
"family": "Min",
"given": "Ji-Won"
},
{
"affiliation": [
{
"name": "Department of Obstetrics and Gynecology, Soonchunhyang University Cheonan Hospital, University of Soonchunhyang College of Medicine, Cheonan, Korea."
}
],
"family": "Kim",
"given": "Yun-Sook"
}
]
}
Here's the code I wrote and am using:
library (magrittr)
library (jqr)
con = file("C:/users/ME/desktop/miniunpaywall.jsonl", "r");
while ( length(line <- readLines(con, n = -1)) > 0) {
write.table( line %>% jq ('.doi,.best_oa_location.license'), file='test.txt', quote=FALSE, row.names=FALSE);}
What results from this is a line of text for each row of JSON that looks like this:
"10.1016/j.ijcard.2018.10.014,CC-BY"
This is effectively:
"[DOI],[LICENSE]"
I want ideally to have the output be:
[DOI] tab [LICENSE]
I believe my problem is that I'm writing the values as a string into a single column when I say:
write.table( line %>% jq ('.doi,.best_oa_location.license')
I haven't figured out a way to remove the quotes I'm getting around each line in my file, or how I could separate the two values with a tab. I feel I'm pretty close. Help!
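One way this could be approached, as a sketch rather than a tested solution for a 100G file: read the file in fixed-size chunks instead of all at once (readLines with n = -1 reads every line in one go), parse each line, and append two tab-separated columns per chunk. Note that this swaps jqr for jsonlite, and that extract_doi_license, jsonl_to_tsv, and the chunk size are hypothetical names and values chosen for illustration.

```r
library(jsonlite)

# Hypothetical helper: pull the DOI and the license out of one JSON line.
# Missing or null fields become NA.
extract_doi_license <- function(json_line) {
  rec <- fromJSON(json_line, simplifyVector = FALSE)
  license <- rec$best_oa_location$license
  c(doi = if (is.null(rec$doi)) NA_character_ else rec$doi,
    license = if (is.null(license)) NA_character_ else license)
}

# Hypothetical driver: stream the JSONL file chunk by chunk and append
# tab-separated rows (DOI, license) to the output file.
jsonl_to_tsv <- function(in_path, out_path, chunk_size = 10000) {
  con <- file(in_path, "r")
  on.exit(close(con))
  if (file.exists(out_path)) file.remove(out_path)
  while (length(lines <- readLines(con, n = chunk_size)) > 0) {
    lines <- lines[nzchar(lines)]  # skip blank lines
    if (length(lines) == 0) next
    # vapply returns a 2 x n matrix; transpose to one row per record.
    rows <- t(vapply(lines, extract_doi_license, character(2), USE.NAMES = FALSE))
    write.table(rows, file = out_path, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE, append = TRUE)
  }
  invisible(out_path)
}

# Small demonstration on a temporary two-line file.
demo_in <- tempfile(fileext = ".jsonl")
demo_out <- tempfile(fileext = ".tsv")
writeLines(c(
  '{"doi": "10.6118/jmm.2017.23.2.135", "best_oa_location": {"license": "cc-by-nc"}}',
  '{"doi": "10.1000/example", "best_oa_location": null}'
), demo_in)
jsonl_to_tsv(demo_in, demo_out, chunk_size = 1)
demo_lines <- readLines(demo_out)
print(demo_lines)
```

Here sep = "\t" gives the tab between the two values, quote = FALSE keeps write.table from adding quotes, and append = TRUE keeps each chunk from overwriting the previous one.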

R Getting JSON data into dataframe

I have a file with JSON-formatted data that I need to get into a data frame. Ultimately I would like to plot the geolocations on a map, but I can't seem to get this data into a data frame first.
json_to_df <- function(file){
file <- lapply(file, function(x) {
x[sapply(x, is.null)] <- NA
unlist(x)
})
df <- do.call("rbind", file)
return(df)
}
But I get only this error:
Error in fromJSON(file) :
STRING_ELT() can only be applied to a 'character vector', not a 'list'
The file structure looks like this (this is only part of the data):
{
"results": [
{
"utc_offset": 7200000,
"venue": {
"country": "nl",
"localized_country_name": "Netherlands",
"city": "Bergen",
"address_1": "16 Notweg",
"name": "FitClub Bergen",
"lon": 4.699218,
"id": 24632049,
"lat": 52.673046,
"repinned": false
},
"headcount": 0,
"distance": 22.46796989440918,
"visibility": "public",
"waitlist_count": 0,
"created": 1467149834000,
"rating": {
"count": 0,
"average": 0
},
"maybe_rsvp_count": 0,
"description": "<p>Start your week off right with a Monday Morning Bootcamp!!! The fresh air and peaceful dunes provide the perfect setting for a total body workout. Whether you are a beginner with brand spankin' new health goals and in need of some direction, or training for a race or competition, we're the trainers for you!!! See you at 8:50 for sign-in!</p>",
"event_url": "https://www.meetup.com/FitClubBergen/events/234936736/",
"yes_rsvp_count": 3,
"duration": 3600000,
"name": "Free Bootcamp in the Bergen Dunes",
"id": "glzqvlyvnbgc",
"time": 1477292400000,
"updated": 1477297999000,
"group": {
"join_mode": "open",
"created": 1441658286000,
"name": "FitClub Bergen Free Bootcamp in the Dunes",
"group_lon": 4.710000038146973,
"id": 18908751,
"urlname": "FitClubBergen",
"group_lat": 52.66999816894531,
"who": "FitClubbers"
},
"status": "past"
},
{
"utc_offset": 7200000,
"venue": {
"country": "nl",
"localized_country_name": "Netherlands",
"city": "Bergen",
"address_1": "16 Notweg",
"name": "FitClub Bergen",
"lon": 4.699218,
"id": 24632049,
"lat": 52.673046,
"repinned": false
},
"headcount": 0,
"distance": 22.46796989440918,
"visibility": "public",
"waitlist_count": 0,
"created": 1467149834000,
"rating": {
"count": 0,
"average": 0
},
"maybe_rsvp_count": 0,
"description": "<p>Start your week off right with a Monday Morning Bootcamp!!! The fresh air and peaceful dunes provide the perfect setting for a total body workout. Whether you are a beginner with brand spankin' new health goals and in need of some direction, or training for a race or competition, we're the trainers for you!!! See you at 8:50 for sign-in!</p> <p>ALWAYS FREE</p> <p>FOR ALL LEVELS OF FITNESS</p> <p>BRING: water bottle and energy</p>",
"event_url": "https://www.meetup.com/FitClubBergen/events/234936737/",
"yes_rsvp_count": 3,
"name": "Monday Morning Bootcamp in the Bergen Dunes",
"id": "flzqvlyvnbgc",
"time": 1477292400000,
"updated": 1477303926000,
"group": {
"join_mode": "open",
"created": 1441658286000,
"name": "FitClub Bergen Free Bootcamp in the Dunes",
"group_lon": 4.710000038146973,
"id": 18908751,
"urlname": "FitClubBergen",
"group_lat": 52.66999816894531,
"who": "FitClubbers"
},
"status": "past"
},
{
"utc_offset": 7200000,
"venue": {
"country": "nl",
"localized_country_name": "Netherlands",
"city": "Amsterdam",
"phone": "020 4275777",
"address_1": "Dijksgracht 2",
"address_2": "1019 BS ",
"name": "Klimmuur Central",
"lon": 4.91284,
"id": 1143381,
"lat": 52.376626,
"repinned": false
},
"headcount": 0,
"distance": 1.0689502954483032,
"visibility": "public",
"waitlist_count": 0,
"created": 1477215767000,
"rating": {
"count": 0,
"average": 0
},
"maybe_rsvp_count": 0,
"description": "<p>Climbing Right After Work: RAW.<br/>Quiet hall, pretty much every rope available; no rope chasing necessary. And.. still some time left to do other things later that evening. Take you gear and an extra sandwich to work and join me afterwards pulling some plastic.<br/>Some notes:<br/>- This events starts #17:00. If you can't make it that early, please comment the time you can.<br/>- Please fill in your belaying skills in your profile. If you've never climbed before or don't have belaying skills: follow an introduction course a the gym first! Safety above all!</p>",
"event_url": "https://www.meetup.com/The-Amsterdam-indoor-rockclimbing/events/235054729/",
"yes_rsvp_count": 3,
"name": "Monday's RAW Climb",
"id": "235054729",
"time": 1477321200000,
"updated": 1477334279000,
"group": {
"join_mode": "approval",
"created": 1358348565000,
"name": "The Amsterdam indoor rockclimbing",
"group_lon": 4.889999866485596,
"id": 6689952,
"urlname": "The-Amsterdam-indoor-rockclimbing",
"group_lat": 52.369998931884766,
"who": "Climbers"
},
"status": "past"
},
{
"utc_offset": 7200000,
"venue": {
"country": "nl",
"localized_country_name": "Netherlands",
"city": "Amstelveen",
"address_1": "Langs de Akker 3",
"name": "Emergohal",
"lon": 4.87967,
"id": 23816542,
"lat": 52.290199,
"repinned": false
},
"rsvp_limit": 12,
"headcount": 0,
"distance": 5.541957378387451,
"visibility": "public",
"waitlist_count": 0,
"created": 1474452073000,
"fee": {
"amount": 5.5,
"accepts": "cash",
"description": "per person",
"currency": "EUR",
"label": "price",
"required": "0"
},
"rating": {
"count": 0,
"average": 0
},
"maybe_rsvp_count": 0,
"description": "<p>We will play the Whole Season indoor soccer on Mondays from 18:00 - 19:00 starting 5 September until May 2017 in the Emergohal Amstelveen.</p> <p>Preferred payment is with Paypal EUR 5.50 (in advance)<br/>If this is not possible you may pay cash but then I will ask EUR 6,-<br/>(Please have the exact cash with you)</p> <p>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</p> <p>A couple of Unisys (ex)colleagues and football lovers are playing every Monday in the Emergohal Amstelveen at 6PM on a reasonable good level. We are looking for a compact group of players who are willing/able to play (almost) every Monday playing 5v5 (or 6v6).<br/>We are playing with the FIFA Futsal rules in mind:<br/>http://www.fifa.com/mm/document/footballdevelopment/refereeing/51/44/50/lawsofthegamefutsal2014_15_eneu_neutral.pdf</p> <p>The Emergohal has dressing rooms and a nice bar for after the game.</p> <p>Hope to see you on Mondays</p> <p>Cheers Jeroen</p> <p>For questions you may call me on[masked], send a text message (SMS) or leave a message on this meetup group.</p>",
"event_url": "https://www.meetup.com/Futsal_Emergohal_Monday_18-00/events/234290812/",
"yes_rsvp_count": 11,
"duration": 4500000,
"name": "Futsal",
"id": "234290812",
"time": 1477323900000,
"updated": 1477330559000,
"group": {
"join_mode": "approval",
"created": 1474445066000,
"name": "Futsal_Emergohal_Monday_18.00",
"group_lon": 4.860000133514404,
"id": 20450096,
"urlname": "Futsal_Emergohal_Monday_18-00",
"group_lat": 52.31999969482422,
"who": "Players"
},
"status": "past"
}],
"meta": {
"next": "https://api.meetup.com/2/open_events?and_text=False&offset=1&city=Amsterdam&sign=True&format=json&lon=4.88999986649&limited_events=False&photo-host=public&page=20&time=-24m%2C&radius=25.0&lat=52.3699989319&status=past&desc=False",
"method": "OpenEvents",
"total_count": 643,
"link": "https://api.meetup.com/2/open_events",
"count": 20,
"description": "Searches for recent and upcoming public events hosted by Meetup groups. Its search window is the past one month through the next three months, and is subject to change. Open Events is optimized to search for current events by location, category, topic, or text, and only lists Meetups that have **3 or more RSVPs**. The number or results returned with each request is not guaranteed to be the same as the page size due to secondary filtering. If you're looking for a particular event or events within a particular group, use the standard [Events](/meetup_api/docs/2/events/) method.",
"lon": ,
"title": "Meetup Open Events v2",
"url": "",
"signed_url": "{signed_url}",
"id": "",
"updated": 1479988687055,
"lat":
}
}
So I was wondering how I could put this into a data frame, or even a CSV, so that I can extract the geolocations later?
There is no need to write a parser yourself; a number of packages can read JSON-formatted data. The one I use, and that @hrbrmstr linked, is jsonlite. This package provides a fromJSON function which can parse JSON into a data.frame:
fromJSON('file.json', flatten = TRUE)
Note that the flatten argument here ensures the JSON is flattened into a nice data.frame.
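To make this concrete for the structure shown in the question, here is a small sketch. It assumes the events sit under the top-level "results" key, as in the sample, and uses an inline JSON string trimmed to two events with a few fields instead of reading the real file. It calls jsonlite::flatten explicitly on the results data frame so that the nested venue fields become ordinary columns.

```r
library(jsonlite)

# Trimmed-down version of the structure from the question.
json_text <- '{
  "results": [
    {"name": "Free Bootcamp in the Bergen Dunes",
     "venue": {"city": "Bergen", "lon": 4.699218, "lat": 52.673046}},
    {"name": "Futsal",
     "venue": {"city": "Amstelveen", "lon": 4.87967, "lat": 52.290199}}
  ],
  "meta": {"count": 2}
}'

parsed <- fromJSON(json_text)

# `results` is parsed as a data frame whose `venue` column is itself a
# data frame; flatten() turns it into venue.city, venue.lon, venue.lat.
events <- jsonlite::flatten(parsed$results)

# Keep only the columns needed for mapping.
geo <- events[, c("name", "venue.city", "venue.lon", "venue.lat")]
print(geo)
```

From there, write.csv(geo, "events.csv", row.names = FALSE) would give a CSV, and the venue.lon and venue.lat columns can be passed to whichever mapping package you prefer.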
