I'm having trouble with parsing a JSON file - r

I am attempting to use a .json file I found online, but I'm starting to think that there is an underlying issue with the file. I am not very knowledgeable in .json files, so I am trying to convert it into a CSV file. I have yet to find a website that can do that for me.
I've tried using R to convert the file since the file is also quite large and I can only assume that most websites have a size limit. I have tried flattening it in r with this code:
library(jsonlite)
library(tidyr)
library(tidyverse)
json_string <- readLines("data.json")
json_data <- fromJSON(json_string)
json_data <- flatten(json_data)
df <- as_data_frame(json_data)
write_csv(df, "output.csv")
but it returns this error:
! Tibble columns must have compatible sizes.
* Size 2: Columns `A-Alrund, God of the Cosmos // A-Hakka, Whispering Raven`, `A-Blessed Hippogriff // A-Tyr's Blessing`, `A-Emerald Dragon // A-Dissonant Wave`, `A-Monster Manual // A-Zoological Study`, `A-Rowan, Scholar of Sparks // A-Will, Scholar of Frost`, and 484 more.
* Size 3: Column `Smelt // Herd // Saw`.
* Size 5: Column `Who // What // When // Where // Why`.
* Size 6: Columns `Everythingamajig`, `Garbage Elemental`, `Ineffable Blessing`, `Knight of the Kitchen Sink`, `Scavenger Hunt`, and 4 more.
i Only values of size one are recycled.
Backtrace:
1. tibble::as_data_frame(json_data)
3. tibble:::as_tibble.list(x, ...)
4. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
5. tibble:::recycle_columns(x, .rows, lengths)
Here is what the first 2 items of the .json file look like
{"data": {"\"Ach! Hans, Run!\"": [{"colorIdentity": ["G", "R"], "colors": ["G", "R"], "convertedManaCost": 6.0, "foreignData": [], "identifiers": {"scryfallOracleId": "a2c5ee76-6084-413c-bb70-45490d818374"}, "isFunny": true, "layout": "normal", "legalities": {}, "manaCost": "{2}{R}{R}{G}{G}", "manaValue": 6.0, "name": "\"Ach! Hans, Run!\"", "printings": ["UNH"], "purchaseUrls": {"cardKingdom": "https://mtgjson.com/links/84dfefe718a51cf8", "cardKingdomFoil": "https://mtgjson.com/links/d8c9f3fc1e93c89c", "cardmarket": "https://mtgjson.com/links/b9d69f0d1a9fb80c", "tcgplayer": "https://mtgjson.com/links/c51d2b13ff76f1f0"}, "rulings": [], "subtypes": [], "supertypes": [], "text": "At the beginning of your upkeep, you may say \"Ach! Hans, run! It's the . . .\" and the name of a creature card. If you do, search your library for a card with that name, put it onto the battlefield, then shuffle. That creature gains haste. Exile it at the beginning of the next end step.", "type": "Enchantment", "types": ["Enchantment"]}], "\"Brims\" Barone, Midway Mobster": [{"colorIdentity": ["B", "W"], "colors": ["B", "W"], "convertedManaCost": 5.0, "foreignData": [], "identifiers": {"scryfallOracleId": "c64c31f2-c1be-414e-9dff-c3b77ba97545"}, "isFunny": true, "layout": "normal", "leadershipSkills": {"brawl": false, "commander": true, "oathbreaker": false}, "legalities": {}, "manaCost": "{3}{W}{B}", "manaValue": 5.0, "name": "\"Brims\" Barone, Midway Mobster", "power": "5", "printings": ["UNF"], "purchaseUrls": {"cardKingdom": "https://mtgjson.com/links/d1e320bd9d6813c0", "cardKingdomFoil": "https://mtgjson.com/links/18f86e8a04682c34", "cardmarket": "https://mtgjson.com/links/d5a3d8cfb60767d4", "tcgplayer": "https://mtgjson.com/links/980f45f2bc8c3733"}, "rulings": [], "subtypes": ["Human", "Rogue"], "supertypes": ["Legendary"], "text": "When \"Brims\" Barone, Midway Mobster enters the battlefield, put a +1/+1 counter on each other creature you control that has a hat.\n\"Brims\" Barone, Midway Mobster has menace as long as you're wearing a hat.", "toughness": "4", "type": "Legendary Creature — Human Rogue", "types": ["Creature"]}]}
I am hoping that the resulting csv file has the keys as the column names, and the values to be assigned to the columns based on their keys.
EDIT:
I have now attached a screenshot of what the json_data structure looks like.Structure of json_data

Assuming it's one of the JSON dumps from scryfall, try this:
library(jsonlite)
library(tidyr)
library(tidyverse)
todo <- list.files(pattern = ".json")
json_data <- fromJSON(todo)
json_data_flat_jsl <- jsonlite::flatten(json_data)
df <- as_tibble(json_data_flat_jsl)
write_csv(df, "output.csv")

Related

OpenAI package leaving linebreak in response

I've starting using OpenAI API in R. I downloaded the openai package. I keep getting a double linebreak in the text response. Here's an example of my code:
library(openai)
vector = create_completion(
model = "text-davinci-003",
prompt = "Tell me what the weather is like in London, UK, in Celsius in 5 words.",
max_tokens = 20,
temperature = 0,
echo = FALSE
)
vector_2 = vector$choices[1]
vector_2$text
[1] "\n\nRainy, mild, cool, humid."
Is there a way to get rid of this without 'correcting' the response text using other functions?
No, it's not possible.
The OpenAI API returns the completion with starting \n\n by default. There's no parameter for the Completions endpoint to control this.
You need to remove linebreak manually.
Example response looks like this:
{
"id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
"object": "text_completion",
"created": 1589478378,
"model": "text-davinci-003",
"choices": [
{
"text": "\n\nThis is indeed a test",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 7,
"total_tokens": 12
}
}

extract value from JSON object using SQLite and the json_tree function

I have a table (named, patrons) that contains a column (named, json_patron_varfields) of JSON data--an array of objects that looks something like this:
[
{
"display_order": 1,
"field_content": "example 1",
"name": "Note",
"occ_num": 0,
"varfield_type_code": "x"
},
{
"display_order": 2,
"field_content": "example 2",
"name": "Note",
"occ_num": 1,
"varfield_type_code": "x"
},
{
"display_order": 3,
"field_content": "some field we do not want",
"occ_num": 0,
"varfield_type_code": "z"
}
]
What I'm trying to do is to target the objects that contain the key named varfield_type_code and the value of x which I've been able to do with the following query:
SELECT
patrons.patron_record_id,
json_extract(patrons.json_patron_varfields, json_tree.path)
FROM
patrons,
json_tree(patrons.json_patron_varfields)
WHERE
json_tree.key = 'varfield_type_code'
AND json_tree.value = 'x'
My Question is... how do I extract (or even possibly filter on) the values of the field_content keys from the objects I'm extracting?
I'm struggling with the syntax of how to do that... I was thinking it could be as simple as using json_extract(patrons.json_patron_varfields, json_tree.path."field_content") but that doesn't appear to be correct..
You can concat to build the string
json_tree.path || '.field_content'
With the structure you've given - you can also use json_each() instead of json_tree() which may simplify things.
extract:
SELECT
patrons.patron_record_id,
json_extract(value, '$.field_content')
FROM
patrons,
json_each(patrons.json_patron_varfields)
WHERE json_extract(value, '$.varfield_type_code') = 'x'
filter:
SELECT
patrons.patron_record_id,
value
FROM
patrons,
json_each(patrons.json_patron_varfields)
WHERE json_extract(value, '$.varfield_type_code') = 'x'
AND json_extract(value, '$.field_content') = 'example 2'

split a key-value pair in Python

I have a dictionairy as follows:
{
"age": "76",
"Bank": "98310",
"Stage": "final",
"idnr": "4578",
"last number + Value": "[345:K]"}
I am trying to adjust the dictionary by splitting the last key-value pair creating a new key('total data'), it should look like this:
"Total data":¨[
{
"last number": "345"
"Value": "K"
}]
}
Does anyone know if there is a split function based on ':' and '+' or a for loop to accomplish this?
Thanks in advance.
One option to accomplish that could be getting the last key from the dict and using split on + for the key and : for the value removing the outer square brackets assuming the format of the data is always the same.
If you want Total data to contain a list, you can wrap the resulting dict in []
from pprint import pprint
d = {
"age": "76",
"Bank": "98310",
"Stage": "final",
"idnr": "4578",
"last number + Value": "[345:K]"
}
last = list(d.keys())[-1]
d["Total data"] = dict(
zip(
last.strip().split('+'),
d[last].strip("[]").split(':')
)
)
pprint(d)
Output (tested with Python 3.9.4)
{'Bank': '98310',
'Stage': 'final',
'Total data': {' Value': 'K', 'last number ': '345'},
'age': '76',
'idnr': '4578',
'last number + Value': '[345:K]'}
Python demo

Getting a list item within a dictionary by key

I have a list of state names:
stateNames = ['Alabama', 'Georgia', 'Florida']
And I have a dictionary that has a list of codes for each state name. *Not all states have codes. And I don't need the codes for all states, just the ones from the list:
masterdict = {'Alaska': [1,2,3], 'Alabama': [4, 5, 6], 'Arkansas': [7,8,9], 'Arizona': [], 'California': [], 'Colorado': [], 'Connecticut': [], 'DistrictOfColumbia': [], 'Delaware': [], 'Florida': [21, 48], 'Georgia': ['1,3,2,4,5'], 'Wyoming': []}
I want to look through my list and get the codes individually for each state in that list. I'm still a little off on the logic. One is a list(item in list) and one is a dictionary with keys ('state name') and values (list of codes). What am I doing incorrectly here:
for item in stateNames:
for k in masterdict.item():
if item == masterdict[k]:
print(masterdict[v])
In your first loop, you are getting all of the keys. So you don't need to do another loop. Just grab the value by the key.
for item in stateNames:
print(masterdict[item])

importing data from postgreSQL to R: characters conversion

I have a table stored in a postgreSQL and a column(named js) has the following format:
I am using the following code to import the data:
library('RPostgreSQL')
drv <- dbDriver('PostgreSQL')
CON <- dbConnect(drv,host='bi*********zonaws.com',port=****,user='****',password='*****')
dbGetQuery(CON,'SELECT js FROM ln2 LIMIT 1')
and I have the following resulst for the first row:
{"id": "pub%2Fc%25C3%25B3nal-o-meara%2F27%2F4a5%2F933", "age": null, "name": "Cónal O'Meara", "emails": [] }#continues but i stop it here. . . .
My question is this conversion in some letters. In the original table the name is Cónal O'Meara and in the imported in R is "Cónal O'Meara". How can I overcame that?

Resources