OpenAI package leaving linebreak in response - r

I've started using the OpenAI API in R. I downloaded the openai package. I keep getting a double line break in the text response. Here's an example of my code:
library(openai)
vector = create_completion(
model = "text-davinci-003",
prompt = "Tell me what the weather is like in London, UK, in Celsius in 5 words.",
max_tokens = 20,
temperature = 0,
echo = FALSE
)
vector_2 = vector$choices[1]
vector_2$text
[1] "\n\nRainy, mild, cool, humid."
Is there a way to get rid of this without 'correcting' the response text using other functions?

No, it's not possible.
The OpenAI API returns the completion text with a leading \n\n by default, and the Completions endpoint has no parameter to control this.
You need to remove the line breaks manually.
An example response looks like this:
{
"id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
"object": "text_completion",
"created": 1589478378,
"model": "text-davinci-003",
"choices": [
{
"text": "\n\nThis is indeed a test",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 7,
"total_tokens": 12
}
}
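Removing the line breaks manually is a one-line step once the response has been parsed. The following is a minimal sketch in Python rather than R, using the example response above trimmed to the relevant fields; the helper name completion_text is hypothetical. In R, base trimws() applied to the text field does the same job.

```python
# Example response from the answer above, trimmed to the relevant fields.
response = {
    "object": "text_completion",
    "model": "text-davinci-003",
    "choices": [
        {"text": "\n\nThis is indeed a test", "index": 0, "finish_reason": "length"}
    ],
}


def completion_text(resp: dict) -> str:
    """Return the first choice's text without leading/trailing whitespace.

    Hypothetical helper: the API itself offers no such option.
    """
    return resp["choices"][0]["text"].strip()


cleaned = completion_text(response)
print(repr(cleaned))
```

Note that strip() also removes trailing whitespace; use lstrip("\n") instead if only the leading line breaks should go.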

I'm having trouble with parsing a JSON file

I am attempting to use a .json file I found online, but I'm starting to think that there is an underlying issue with the file. I am not very knowledgeable in .json files, so I am trying to convert it into a CSV file. I have yet to find a website that can do that for me.
I've tried using R to convert the file, since the file is also quite large and I assume most websites have a size limit. I tried flattening it in R with this code:
library(jsonlite)
library(tidyr)
library(tidyverse)
json_string <- readLines("data.json")
json_data <- fromJSON(json_string)
json_data <- flatten(json_data)
df <- as_data_frame(json_data)
write_csv(df, "output.csv")
but it returns this error:
! Tibble columns must have compatible sizes.
* Size 2: Columns `A-Alrund, God of the Cosmos // A-Hakka, Whispering Raven`, `A-Blessed Hippogriff // A-Tyr's Blessing`, `A-Emerald Dragon // A-Dissonant Wave`, `A-Monster Manual // A-Zoological Study`, `A-Rowan, Scholar of Sparks // A-Will, Scholar of Frost`, and 484 more.
* Size 3: Column `Smelt // Herd // Saw`.
* Size 5: Column `Who // What // When // Where // Why`.
* Size 6: Columns `Everythingamajig`, `Garbage Elemental`, `Ineffable Blessing`, `Knight of the Kitchen Sink`, `Scavenger Hunt`, and 4 more.
i Only values of size one are recycled.
Backtrace:
1. tibble::as_data_frame(json_data)
3. tibble:::as_tibble.list(x, ...)
4. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
5. tibble:::recycle_columns(x, .rows, lengths)
Here is what the first 2 items of the .json file look like
{"data": {"\"Ach! Hans, Run!\"": [{"colorIdentity": ["G", "R"], "colors": ["G", "R"], "convertedManaCost": 6.0, "foreignData": [], "identifiers": {"scryfallOracleId": "a2c5ee76-6084-413c-bb70-45490d818374"}, "isFunny": true, "layout": "normal", "legalities": {}, "manaCost": "{2}{R}{R}{G}{G}", "manaValue": 6.0, "name": "\"Ach! Hans, Run!\"", "printings": ["UNH"], "purchaseUrls": {"cardKingdom": "https://mtgjson.com/links/84dfefe718a51cf8", "cardKingdomFoil": "https://mtgjson.com/links/d8c9f3fc1e93c89c", "cardmarket": "https://mtgjson.com/links/b9d69f0d1a9fb80c", "tcgplayer": "https://mtgjson.com/links/c51d2b13ff76f1f0"}, "rulings": [], "subtypes": [], "supertypes": [], "text": "At the beginning of your upkeep, you may say \"Ach! Hans, run! It's the . . .\" and the name of a creature card. If you do, search your library for a card with that name, put it onto the battlefield, then shuffle. That creature gains haste. Exile it at the beginning of the next end step.", "type": "Enchantment", "types": ["Enchantment"]}], "\"Brims\" Barone, Midway Mobster": [{"colorIdentity": ["B", "W"], "colors": ["B", "W"], "convertedManaCost": 5.0, "foreignData": [], "identifiers": {"scryfallOracleId": "c64c31f2-c1be-414e-9dff-c3b77ba97545"}, "isFunny": true, "layout": "normal", "leadershipSkills": {"brawl": false, "commander": true, "oathbreaker": false}, "legalities": {}, "manaCost": "{3}{W}{B}", "manaValue": 5.0, "name": "\"Brims\" Barone, Midway Mobster", "power": "5", "printings": ["UNF"], "purchaseUrls": {"cardKingdom": "https://mtgjson.com/links/d1e320bd9d6813c0", "cardKingdomFoil": "https://mtgjson.com/links/18f86e8a04682c34", "cardmarket": "https://mtgjson.com/links/d5a3d8cfb60767d4", "tcgplayer": "https://mtgjson.com/links/980f45f2bc8c3733"}, "rulings": [], "subtypes": ["Human", "Rogue"], "supertypes": ["Legendary"], "text": "When \"Brims\" Barone, Midway Mobster enters the battlefield, put a +1/+1 counter on each other creature you control that has a hat.\n\"Brims\" Barone, 
Midway Mobster has menace as long as you're wearing a hat.", "toughness": "4", "type": "Legendary Creature — Human Rogue", "types": ["Creature"]}]}
I am hoping that the resulting csv file has the keys as the column names, and the values to be assigned to the columns based on their keys.
EDIT:
I have now attached a screenshot of what the json_data structure looks like. (Screenshot: Structure of json_data)
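Assuming it's one of the JSON dumps from Scryfall, try this:
library(jsonlite)
library(tidyr)
library(tidyverse)
# Pick up the JSON file in the working directory (assumes there is exactly one).
todo <- list.files(pattern = "\\.json$")
json_data <- fromJSON(todo)
# Use the jsonlite:: prefix: purrr (attached by tidyverse) masks flatten().
json_data_flat_jsl <- jsonlite::flatten(json_data)
df <- as_tibble(json_data_flat_jsl)
write_csv(df, "output.csv")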
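The error message in the question suggests that each card name maps to a list holding one or more records (names containing "//" show up with sizes 2, 3, or 5), which is why the columns cannot be recycled into one table. One way around this, sketched here in Python rather than R, is to emit one CSV row per record instead of one column per card name. The sample data is a made-up miniature of the file's apparent structure, and flatten_record, cards_to_rows, and rows_to_csv are hypothetical helper names.

```python
import csv
import io


def flatten_record(record, prefix=""):
    """Flatten nested dicts into dotted keys; join list values with ';'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_record(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = ";".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat


def cards_to_rows(doc):
    """Turn {"data": {card name: [record, ...]}} into one flat row per record."""
    rows = []
    for records in doc["data"].values():
        for record in records:
            rows.append(flatten_record(record))
    return rows


def rows_to_csv(rows):
    """Write rows to CSV text, using the union of all keys as columns."""
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up miniature of the structure shown in the question (assumption).
sample = {
    "data": {
        "Card A": [
            {"name": "Card A", "colors": ["G", "R"], "manaValue": 6.0,
             "identifiers": {"scryfallOracleId": "id-a"}}
        ],
        "Half One // Half Two": [
            {"name": "Half One", "colors": ["W"], "manaValue": 2.0, "power": "1"},
            {"name": "Half Two", "colors": [], "manaValue": 3.0},
        ],
    }
}

rows = cards_to_rows(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Records that lack a key (for example power) get an empty cell, so every row has the same columns regardless of how many records a card name holds.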

How can I use jq to sort by datetime field and filter based on attribute?

I am trying to sort the following JSON response by "startTime". I also want to filter on "name" and fetch only the "dataCenter" of each matching record. Can you please help with a jq filter for doing this?
I tried something like jq '.[]|= sort_by(.startTime)' but it doesn't return the correct result.
[
{
"name": "JPCSKELT",
"dataCenter": "mvsADM",
"orderId": "G9HC8",
"scheduleTable": "FD33515",
"nodeGroup": null,
"controlmApp": "P/C-DEVELOPMENT-LRSP",
"groupName": "SCMTEST",
"assignmentGroup": "HOST_CONFIG_MGMT",
"owner": "PC00000",
"description": null,
"startTime": "2021-11-11 17:45:48.0",
"endTime": "2021-11-11 17:45:51.0",
"successCount": 1,
"failureCount": 0,
"dailyRunCount": 0,
"scriptName": "JPCSKELT"
},
{
"name": "JPCSKELT",
"dataCenter": "mvsADM",
"orderId": "FWX98",
"scheduleTable": "JPCS1005",
"nodeGroup": null,
"controlmApp": "P/C-DEVELOPMENT-LRSP",
"groupName": "SCMTEST",
"assignmentGroup": "HOST_CONFIG_MGMT",
"owner": "PC00000",
"description": null,
"startTime": "2021-07-13 10:49:47.0",
"endTime": "2021-07-13 10:49:49.0",
"successCount": 1,
"failureCount": 0,
"dailyRunCount": 0,
"scriptName": "JPCSKELT"
},
{
"name": "JPCSKELT",
"dataCenter": "mvsADM",
"orderId": "FWX98",
"scheduleTable": "JPCS1005",
"nodeGroup": null,
"controlmApp": "P/C-DEVELOPMENT-LRSP",
"groupName": "SCMTEST",
"assignmentGroup": "HOST_CONFIG_MGMT",
"owner": "PC00000",
"description": null,
"startTime": "2021-10-13 10:49:47.0",
"endTime": "2021-10-13 10:49:49.0",
"successCount": 1,
"failureCount": 0,
"dailyRunCount": 0,
"scriptName": "JPCSKELT"
}
]
You can use the following expression to sort the input -
sort_by(.startTime | sub("(?<time>.*)\\..*"; "\(.time)") | strptime("%Y-%m-%d %H:%M:%S") | mktime)
The sub("(?<time>.*)\\..*"; "\(.time)") expression removes the trailing decimal fraction.
I assume you can use the result of the above query to perform the desired filtering.
Welcome. From what I'm guessing you're asking, you want to supply a value to filter the records on using the name property, sort the results by the startTime property and then just output the value of the dataCenter property for those records. How about this:
jq --arg name JPCSKELT '
map(select(.name==$name))|sort_by(.startTime)[].dataCenter
' data.json
Based on your sample data, this produces:
"mvsADM"
"mvsADM"
"mvsADM"
So I'm wondering if this is what you're really asking?
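For comparison, the same filter, sort, and extract pipeline can be sketched in Python. This is only an illustration of the logic, not a replacement for the jq answers; the records are the sample from the question trimmed to the relevant fields, and parse_start and data_centers are hypothetical helper names. Parsing the timestamp (including its fractional part, via %f) makes the sort chronological rather than lexical.

```python
from datetime import datetime

# Sample records from the question, trimmed to the fields used here.
records = [
    {"name": "JPCSKELT", "dataCenter": "mvsADM", "orderId": "G9HC8",
     "startTime": "2021-11-11 17:45:48.0"},
    {"name": "JPCSKELT", "dataCenter": "mvsADM", "orderId": "FWX98",
     "startTime": "2021-07-13 10:49:47.0"},
    {"name": "JPCSKELT", "dataCenter": "mvsADM", "orderId": "FWX98",
     "startTime": "2021-10-13 10:49:47.0"},
]


def parse_start(record):
    """Parse startTime, including its fractional seconds, into a datetime."""
    return datetime.strptime(record["startTime"], "%Y-%m-%d %H:%M:%S.%f")


def data_centers(items, name):
    """Keep records whose name matches, sort by startTime, return dataCenter."""
    matching = [item for item in items if item["name"] == name]
    return [item["dataCenter"] for item in sorted(matching, key=parse_start)]


ordered_times = [r["startTime"] for r in sorted(records, key=parse_start)]
result = data_centers(records, "JPCSKELT")
print(ordered_times)
print(result)
```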

extract value from JSON object using SQLite and the json_tree function

I have a table named patrons that contains a column named json_patron_varfields holding JSON data: an array of objects that looks something like this:
[
{
"display_order": 1,
"field_content": "example 1",
"name": "Note",
"occ_num": 0,
"varfield_type_code": "x"
},
{
"display_order": 2,
"field_content": "example 2",
"name": "Note",
"occ_num": 1,
"varfield_type_code": "x"
},
{
"display_order": 3,
"field_content": "some field we do not want",
"occ_num": 0,
"varfield_type_code": "z"
}
]
What I'm trying to do is to target the objects that contain the key named varfield_type_code and the value of x which I've been able to do with the following query:
SELECT
patrons.patron_record_id,
json_extract(patrons.json_patron_varfields, json_tree.path)
FROM
patrons,
json_tree(patrons.json_patron_varfields)
WHERE
json_tree.key = 'varfield_type_code'
AND json_tree.value = 'x'
My Question is... how do I extract (or even possibly filter on) the values of the field_content keys from the objects I'm extracting?
I'm struggling with the syntax of how to do that... I was thinking it could be as simple as using json_extract(patrons.json_patron_varfields, json_tree.path."field_content") but that doesn't appear to be correct..
You can concatenate to build the path string:
json_tree.path || '.field_content'
With the structure you've given - you can also use json_each() instead of json_tree() which may simplify things.
extract:
SELECT
patrons.patron_record_id,
json_extract(value, '$.field_content')
FROM
patrons,
json_each(patrons.json_patron_varfields)
WHERE json_extract(value, '$.varfield_type_code') = 'x'
filter:
SELECT
patrons.patron_record_id,
value
FROM
patrons,
json_each(patrons.json_patron_varfields)
WHERE json_extract(value, '$.varfield_type_code') = 'x'
AND json_extract(value, '$.field_content') = 'example 2'
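Both approaches can be tried against an in-memory database. The sketch below uses Python's sqlite3 module and assumes the underlying SQLite build includes the JSON functions; the table layout follows the question, while the patron_record_id value of 1 is made up for the example.

```python
import json
import sqlite3

# Sample array from the question; the patron id (1) is an assumption.
varfields = [
    {"display_order": 1, "field_content": "example 1", "name": "Note",
     "occ_num": 0, "varfield_type_code": "x"},
    {"display_order": 2, "field_content": "example 2", "name": "Note",
     "occ_num": 1, "varfield_type_code": "x"},
    {"display_order": 3, "field_content": "some field we do not want",
     "occ_num": 0, "varfield_type_code": "z"},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patrons (patron_record_id INTEGER, json_patron_varfields TEXT)"
)
conn.execute("INSERT INTO patrons VALUES (?, ?)", (1, json.dumps(varfields)))

# json_each(): one row per array element, then extract from each element.
each_rows = conn.execute(
    """
    SELECT patrons.patron_record_id,
           json_extract(value, '$.field_content')
    FROM patrons, json_each(patrons.json_patron_varfields)
    WHERE json_extract(value, '$.varfield_type_code') = 'x'
    """
).fetchall()

# json_tree(): append the wanted key to the path of the matching object.
tree_rows = conn.execute(
    """
    SELECT patrons.patron_record_id,
           json_extract(patrons.json_patron_varfields,
                        json_tree.path || '.field_content')
    FROM patrons, json_tree(patrons.json_patron_varfields)
    WHERE json_tree.key = 'varfield_type_code'
      AND json_tree.value = 'x'
    """
).fetchall()

print(each_rows)
print(tree_rows)
conn.close()
```

For this flat array of objects the two queries select the same records; json_tree() is mainly useful when the objects of interest can sit at varying depths.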

Elastic package in R: Sort for version > v5...not working

Using elastic version V5.1
I'm trying to use the example of index shakespeare.
Tried:
Search(index="shakespeare", type="act", sort = '{"_source": ["speaker:desc"] }', size = 5)
and
Search(index="shakespeare",body = '{"_source": ["play_name", "speaker", "text_entry"] }',
sort='{"_source": ["text_entry" : {"order" : "desc"}] }' ,q="york", size = 5)
But not getting the right results.
Can someone help me with the correct sort syntax for version 5 and above?
Thanks.
Okay, fix pushed.
Reinstall with devtools::install_github("ropensci/elastic")
The problem is explained here: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
So to sort on a field, you need to enable fielddata on that field. For the example above, do:
library(elastic)
connect()
mapping_create("shakespeare", "act", update_all_types = TRUE, body = '{
"properties": {
"speaker": {
"type": "text",
"fielddata": true
}
}
}')
res <- Search("shakespeare", "act", body = '{"sort":[{"speaker":{"order" : "desc"}}]}')
vapply(res$hits$hits, "[[", "", c("_source", "speaker"))
#> [1] "ARCHBISHOP OF YORK" "VERNON" "PLANTAGENET" "PETO" "KING HENRY IV"
#> [6] "HOTSPUR" "FALSTAFF" "CHARLES" ""
Does that work for you?

How to process output from match function in jq?

I'm using the jq tool to parse some JSON/strings. My minimal example is the following command:
echo '"foo foo"' | jq 'match("(foo)"; "g")'
Which results in the following output:
{
"offset": 0,
"length": 3,
"string": "foo",
"captures": [
{
"offset": 0,
"length": 3,
"string": "foo",
"name": null
}
]
}
{
"offset": 4,
"length": 3,
"string": "foo",
"captures": [
{
"offset": 4,
"length": 3,
"string": "foo",
"name": null
}
]
}
I want my final output for this example to be:
"foo,foo"
But in this case I get two separate objects instead of an array or similar that I could call implode on. I guess either the API isn't made for my use case or my understanding of it is very wrong. Please advise.
The following script takes the string value from each of the separate objects with .string, wraps them in an array [...] and then joins the members of the array with commas using join.
I modified the regex because you didn't actually need a capture group for the given use case, but if you wanted to access the capture groups you could do .captures[].string instead of .string.
echo '"foo foo"' | jq '[match("foo"; "g").string] | join(",")'
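The collect-then-join idea is not specific to jq. As a rough analogue in Python, re.findall gathers every match into a list, which plays the role of the [...] wrapper, and str.join plays the role of join(","). The function name join_matches is hypothetical.

```python
import re


def join_matches(pattern: str, text: str, separator: str = ",") -> str:
    """Collect all non-overlapping matches of pattern and join them."""
    # With no capture groups in the pattern, findall returns the full matches.
    return separator.join(re.findall(pattern, text))


joined = join_matches("foo", "foo foo")
print(joined)
```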
