Is there an R library or function for formatting international currency strings?

Here's a snippet of the JSON data I'm working with:
{
"item" = "Mexican Thing",
...
"raised": "19",
"currency": "MXN"
},
{
"item" = "Canadian Thing",
...
"raised": "42",
"currency": "CDN"
},
{
"item" = "American Thing",
...
"raised": "1",
"currency": "USD"
}
You get the idea.
I'm hoping there's a function out there that can take a standard currency abbreviation and a number and spit out the appropriate string. I could theoretically write this myself, except I can't pretend to know all the ins and outs of this stuff, and I'm bound to spend days or weeks being surprised by bugs or edge cases I didn't think of. I'm hoping there's a library (or at least a web API) already written that can handle this, but my Googling has yielded nothing useful so far.
Here's an example of the result I want (let's pretend "currency" is the function I'm looking for)
currency("USD", "32") --> "$32"
currency("GBP", "45") --> "£45"
currency("EUR", "19") --> "€19"
currency("MXN", "40") --> "MX$40"

Assuming your real JSON is valid, this should be relatively simple. Here's a valid JSON string, fixing the three invalid portions: = should be :; ... is obviously a placeholder; and the objects should be a list wrapped in [ and ]:
js <- '[{
"item": "Mexican Thing",
"raised": "19",
"currency": "MXN"
},
{
"item": "Canadian Thing",
"raised": "42",
"currency": "CDN"
},
{
"item": "American Thing",
"raised": "1",
"currency": "USD"
}]'
with(jsonlite::parse_json(js, simplifyVector = TRUE),
     paste(raised, currency))
# [1] "19 MXN" "42 CDN" "1 USD"
Edit: to convert to specific currency symbols, don't make this too difficult: just instantiate a lookup vector where "USD" (for example) prepends "$" and appends "" (nothing) to the raised string. (I say both prepend and append because I believe some currency symbols always come after the digits ... I could be wrong.)
pre_currency <- Vectorize(function(curr) switch(curr, USD="$", GBP="£", EUR="€", CDN="$", "?"))
post_currency <- Vectorize(function(curr) switch(curr, USD="", GBP="", EUR="", CDN="", "?"))
with(jsonlite::parse_json(js, simplifyVector = TRUE),
     paste0(pre_currency(currency), raised, post_currency(currency)))
# [1] "?19?" "$42" "$1"
I intentionally left "MXN" out of the lookup here to demonstrate that you need a default, "?" (for both pre and post) in this case. You may choose a different value for default/unknown currencies.
An alternative:
currency <- function(val, currency) {
  pre <- sapply(currency, switch, USD="$", GBP="£", EUR="€", CDN="$", "?")
  post <- sapply(currency, switch, USD="", GBP="", EUR="", CDN="", "?")
  paste0(pre, val, post)
}
with(jsonlite::parse_json(js, simplifyVector = TRUE),
     currency(raised, currency))
# [1] "?19?" "$42" "$1"

Related

Group nested array objects to parent key in JQ

I have JSON coming from an external application, formatted like so:
{
  "ticket_fields": [
    {
      "url": "https://example.com/1122334455.json",
      "id": 1122334455,
      "type": "tagger",
      "custom_field_options": [
        {
          "id": 123456789,
          "name": "I have a problem",
          "raw_name": "I have a problem",
          "value": "help_i_have_problem",
          "default": false
        },
        {
          "id": 456789123,
          "name": "I have feedback",
          "raw_name": "I have feedback",
          "value": "help_i_have_feedback",
          "default": false
        }
      ]
    },
    {
      "url": "https://example.com/6677889900.json",
      "id": 6677889900,
      "type": "tagger",
      "custom_field_options": [
        {
          "id": 321654987,
          "name": "United States",
          "raw_name": "United States",
          "value": "location_123_united_states",
          "default": false
        },
        {
          "id": 987456321,
          "name": "Germany",
          "raw_name": "Germany",
          "value": "location_456_germany",
          "default": false
        }
      ]
    }
  ]
}
The end goal is to get the data into a TSV where each object in the custom_field_options array is grouped under its parent ID (ticket_fields.id), and each object is represented on a single line, like so:
Ticket Field ID    Name               Value
1122334455         I have a problem   help_i_have_problem
1122334455         I have feedback    help_i_have_feedback
6677889900         United States      location_123_united_states
6677889900         Germany            location_456_germany
I have been able to export the data to TSV already, but it comes out as one line per ticket field, without keeping each name paired with its value, like so:
Using jq -r '.ticket_fields[] | select(.type=="tagger") | [.id, .custom_field_options[].name, .custom_field_options[].value] | @tsv'
Ticket Field ID    Name               Name              Value                 Value
1122334455         I have a problem   I have feedback   help_i_have_problem   help_i_have_feedback
6677889900         United States      Germany           location_123_united_states   location_456_germany
Each of the custom_field_options arrays in production may consist of any number of objects (not limited to 2 each). But I seem to be stuck on how to appropriately group or map these objects to their parent ticket_fields.id and to transpose the data in a clean manner. The select(.type=="tagger") is mentioned in the query as there are multiple values for ticket_fields.type which need to be filtered out.
Based on another answer on here, I did try variants of jq -r '.ticket_fields[] | select(.type=="tagger") | map(.custom_field_options |= from_entries) | group_by(.custom_field_options.ticket_fields) | map(map( .custom_field_options |= to_entries))' without success. Any assistance would be greatly appreciated!
You need two nested iterations, one in each array. Save the value of .id in a variable to access it later.
jq -r '
  .ticket_fields[] | select(.type=="tagger") | .id as $id
  | .custom_field_options[] | [$id, .name, .value]
  | @tsv
'

Use scrapy to collect information for one item from multiple pages (and output it as a nested dictionary)

I'm trying to scrape data from a tournaments site.
Each tournament has some information such as the venue, the date, prices, etc., and also the ranking of the teams that took part. The ranking is a table that simply lists each team's name and its position.
You can then click on a team's name, which takes you to a page where you can get the roster of players that the team selected for that tournament.
I'd like to scrape the data into something like:
[{
  "name": "Grand Tournament",
  "venue": "...",
  "date": "...",
  "rank": [
    {
      "team_name": "Team name",
      "rank": 1,
      "roster": ["player1", "player2", "..."]
    },
    {
      "team_name": "Team name",
      "rank": 2,
      "roster": ["player1", "player2", "..."]
    }
  ]
}]
I have the following spider to scrape a single tournament page (usage: scrapy crawl tournamentspider -a start_url="<tournamenturl>")
import scrapy
from urllib.parse import urlparse

class TournamentSpider(scrapy.Spider):
    name = "tournamentspider"
    allowed_domains = ["..."]

    def start_requests(self):
        try:
            yield scrapy.Request(url=self.start_url, callback=self.parse)
        except AttributeError:
            raise ValueError("You must use this spider with argument start_url.")

    def parse(self, response):
        tournament_item = TournamentItem()
        tournament_item['teams'] = []
        tournament_item['name'] = "Tournament Name"
        tournament_item['date'] = "Date"
        tournament_item['venue'] = "Venue"
        ladder = response.css('#ladder')
        for row in ladder.css('table tbody tr'):
            row_cells = row.xpath('td')
            participation_item = PlayerParticipationItem()
            participation_item['team_name'] = "Team Name"
            participation_item['rank'] = "x"
            # Parse roster
            roster_url_page = row_cells[2].xpath('a/@href').get()
            # Follow link to extract list
            base_url = urlparse(response.url)
            absolute_url = f'{base_url.scheme}://{base_url.hostname}/{roster_url_page}'
            request = scrapy.Request(absolute_url, callback=self.parse_roster_page)
            request.meta['participation_item'] = participation_item
            yield request
            # Include participation item in the roster
            tournament_item['teams'].append(participation_item)
        yield tournament_item

    def parse_roster_page(self, response):
        participation_item = response.meta['participation_item']
        participation_item['roster'] = ["Player1", "Player2", "..."]
        return participation_item
My problem is that this spider produces the following output:
[{
  "name": "Grand Tournament",
  "venue": "...",
  "date": "...",
  "rank": [
    {
      "team_name": "Team name",
      "rank": 1
    },
    {
      "team_name": "Team name",
      "rank": 2
    }
  ]
},
{
  "team_name": "Team name",
  "rank": 1,
  "roster": ["player1", "player2", "..."]
},
{
  "team_name": "Team name",
  "rank": 2,
  "roster": ["player1", "player2", "..."]
}]
I know that those extra items in the output are generated by the yield request line. When I remove it, I'm no longer scraping the roster page, so the extra items disappear, but I no longer have the roster data.
Is it possible to get the output I'm aiming for?
I know that a different approach could be to scrape the tournament information and then the teams separately, with a field on each team that identifies its tournament. But I'd like to know if the initial approach is achievable.
You can use scrapy-inline-requests to call parse_roster_page and get the roster data back without yielding it out as a separate item.
The only change you need is the @inline_requests decorator on the parse_roster_page function.
from inline_requests import inline_requests

class TournamentSpider(scrapy.Spider):

    def parse(self, response):
        ...

    @inline_requests
    def parse_roster_page(self, response):
        ...
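For reference, here is a rough, untested sketch of how the inline-request pattern could look if applied to the parse callback itself, so each roster is fetched and attached before the single nested tournament item is yielded. The item classes, selectors, and placeholder values are taken from the question; the decorator placement here is an assumption based on the scrapy-inline-requests README rather than the answer above:

import scrapy
from inline_requests import inline_requests


class TournamentSpider(scrapy.Spider):
    name = "tournamentspider"

    @inline_requests
    def parse(self, response):
        tournament_item = TournamentItem()          # item classes as defined in the question
        tournament_item['teams'] = []
        for row in response.css('#ladder table tbody tr'):
            participation_item = PlayerParticipationItem()
            participation_item['team_name'] = "Team Name"
            participation_item['rank'] = "x"
            roster_url = row.xpath('td')[2].xpath('a/@href').get()
            # The yield hands the roster page's Response straight back here,
            # so the roster can be attached before the item is collected.
            roster_response = yield scrapy.Request(response.urljoin(roster_url))
            participation_item['roster'] = ["Player1", "Player2", "..."]  # parse roster_response here
            tournament_item['teams'].append(participation_item)
        # A single nested tournament item is yielded at the end, matching the desired output.
        yield tournament_item

Either way, the key point is that the roster request no longer goes through a separate callback that yields its own item.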

How to effectively chain groupby queries from flat api data in Kafka Streams?

I have some random data coming from an API into a Kafka topic that looks like this:
{"vin": "1N6AA0CA7CN040747", "make": "Nissan", "model": "Pathfinder", "year": 1993, "color": "Blue", "salePrice": "$58312.28", "city": "New York City", "state": "New York", "zipCode": "10014"}
{"vin": "1FTEX1C88AF678435", "make": "Audi", "model": "200", "year": 1991, "color": "Aquamarine", "salePrice": "$65651.53", "city": "Newport Beach", "state": "California", "zipCode": "92662"}
{"vin": "JN8AS1MU1BM237985", "make": "Subaru", "model": "Legacy", "year": 1990, "color": "Violet", "salePrice": "$21325.27", "city": "Joliet", "state": "Illinois", "zipCode": "60435"}
{"vin": "SCBGR3ZA1CC504502", "make": "Mercedes-Benz", "model": "E-Class", "year": 1986, "color": "Fuscia", "salePrice": "$81822.04", "city": "Pasadena", "state": "California", "zipCode": "91117"}
I am able to create KStream objects and observe them, like this:
KStream<byte[], UsedCars> usedCarsInputStream =
        builder.stream("used-car-colors", Consumed.with(Serdes.ByteArray(), new UsedCarsSerdes()));

// k, v => year, count of cars in year
KTable<String, Long> yearCount = usedCarsInputStream
        .filter((k, v) -> v.getYear() > 2010)
        .selectKey((k, v) -> v.getVin())
        .groupBy((key, value) -> Integer.toString(value.getYear()))
        .count();
yearCount.toStream().print(Printed.<String, Long>toSysOut().withLabel("blah"));
This of course gives us a count of the records grouped by each year greater than 2010. However, what I would like to do in the next step, but have been unable to accomplish, is to simply take each of those years, as in a foreach, and count the number of cars in each color per year. I attempted writing a foreach on yearCount.toStream() to further process the data, but got no results.
I am looking for output that might look like this:
{
  "2011": [
    {
      "blue": "99",
      "green": "243",
      "red": "33"
    }
  ],
  "2012": [
    {
      "blue": "74",
      "green": "432",
      "red": "2"
    }
  ]
}
I believe I may have answered my own question; I would welcome comments from others on my solution.
What I did not realize is that you can group by what is essentially a compound object. In this case, I needed the equivalent of the following SQL statement:
SELECT year, color, count(*) FROM used_car_colors
GROUP BY year, color
In Kafka Streams, you can accomplish this by creating a compound key object (in this situation, I created a POJO class called 'YearColor' with members year and color) and then selecting it as the key:
usedCarsInputStream
        .selectKey((k, v) -> new YearColor(v.getYear(), v.getColor()))
        .groupByKey(Grouped.with(new YearColorSerdes(), new UsedCarsSerdes()))
        .count()
        .toStream()
        .peek((yc, ct) -> System.out.println("year: " + yc.getYear() + " color: " + yc.getColor()
                + " count: " + ct));
You of course have to implement the Serializer and Deserializer for this object (I did so with YearColorSerdes()). When I run the Kafka Streams application, the output gives me updates on the counts as they change, like so:
year: 2012 color: Maroon count: 2
year: 2013 color: Khaki count: 1
year: 2012 color: Crimson count: 5
year: 2011 color: Pink count: 4
year: 2011 color: Green count: 2
which is what I was looking for.

split a key-value pair in Python

I have a dictionary as follows:
{
    "age": "76",
    "Bank": "98310",
    "Stage": "final",
    "idnr": "4578",
    "last number + Value": "[345:K]"
}
I am trying to adjust the dictionary by splitting the last key-value pair to create a new key ("Total data"); it should look like this:
"Total data":¨[
{
"last number": "345"
"Value": "K"
}]
}
Does anyone know if there is a split function based on ':' and '+' or a for loop to accomplish this?
Thanks in advance.
One option could be to get the last key from the dict and use split on '+' for the key and ':' for the value, removing the outer square brackets, assuming the format of the data is always the same.
If you want Total data to contain a list, you can wrap the resulting dict in []
from pprint import pprint

d = {
    "age": "76",
    "Bank": "98310",
    "Stage": "final",
    "idnr": "4578",
    "last number + Value": "[345:K]"
}

last = list(d.keys())[-1]
d["Total data"] = dict(
    zip(
        last.strip().split('+'),
        d[last].strip("[]").split(':')
    )
)
pprint(d)
Output (tested with Python 3.9.4)
{'Bank': '98310',
'Stage': 'final',
'Total data': {' Value': 'K', 'last number ': '345'},
'age': '76',
'idnr': '4578',
'last number + Value': '[345:K]'}
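As a small follow-up to the answer above (a sketch under the same assumption that the input format is fixed), the split keys and values can also be stripped of surrounding whitespace and the result wrapped in a list, which matches the asker's desired "Total data" shape more closely:

d = {
    "age": "76",
    "Bank": "98310",
    "Stage": "final",
    "idnr": "4578",
    "last number + Value": "[345:K]",
}

last = list(d.keys())[-1]
keys = [k.strip() for k in last.split('+')]                   # ['last number', 'Value']
values = [v.strip() for v in d[last].strip("[]").split(':')]  # ['345', 'K']
d["Total data"] = [dict(zip(keys, values))]

print(d["Total data"])
# [{'last number': '345', 'Value': 'K'}]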

Dynamodb SK range key is not returning data as expected

I am using the query below to get data from the DB (a GSI):
results = table.query(
    IndexName="Table-ID-index",
    KeyConditionExpression=Key("id").eq(id),
)
However, my data is not sorted based on the range key set on the GSI. Sample response I am getting with the above query:
{
    "value": "test1",
    "sk": "1#1",
    "id": "1"
},
{
    "value": "test19",
    "sk": "19#19",
    "id": "19"
},
{
    "value": "test2",
    "sk": "2#2",
    "id": "2"
}
sk 19 should come after sk 2. Is there anything I have missed in my query?
If memory serves, this is because the strings are stored and compared in their UTF-8 encoded form. From the documentation:
"DynamoDB collates and compares strings using the bytes of the underlying UTF-8 string encoding. For example, "a" (0x61) is greater than "A" (0x41), and "¿" (0xC2BF) is greater than "z" (0x7A)."
