How to split column into rows by Object in array? - r

Hi all, my dataframe looks something like this:
**| Descriptor |**
[{"name": "Some name", "id": "L73871287"}, {"name": "Another name", "id": "L7123287"}]
[{"name": "Yet another name", "id": "L73556287"}, {"name": "Yet another name", "id": "L73556287"}]
How would one go about splitting this data by objects in R?
So to get:
**| Descriptor |**
{"name": "Some name", "id": "L73871287"}
{"name": "Another name", "id": "L7123287"}
{"name": "Yet another name", "id": "L73556287"}
{"name": "Yet another name", "id": "L73556287"}
Even better would be to get a column "name" and a column "id", but I don't know if this is possible in R (I have a Python and JavaScript background, but the file was too large for Python).

Maybe this is what you are looking for:
library(jsonlite)
json <- '[{"name": "Some name", "id": "L73871287"}, {"name": "Another name", "id": "L7123287"}],
[{"name": "Yet another name", "id": "L73556287"}, {"name": "Yet another name", "id": "L73556287"}]'
ls <- fromJSON(txt = paste0("[", json, "]"))
do.call(rbind, ls)
#> name id
#> 1 Some name L73871287
#> 2 Another name L7123287
#> 3 Yet another name L73556287
#> 4 Yet another name L73556287
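Since the question mentions a Python background, here is a minimal Python sketch of the same reshaping for comparison. It uses only the standard json module and assumes each cell of the column is a string holding a JSON array of objects, as shown in the question. descriptor_column and explode_descriptors are illustrative names, not part of the original.

```python
import json

# Hypothetical stand-in for the "Descriptor" column: each cell is a JSON array string.
descriptor_column = [
    '[{"name": "Some name", "id": "L73871287"}, {"name": "Another name", "id": "L7123287"}]',
    '[{"name": "Yet another name", "id": "L73556287"}, {"name": "Yet another name", "id": "L73556287"}]',
]


def explode_descriptors(cells):
    """Yield one (name, id) pair per object, parsing one cell at a time."""
    for cell in cells:
        for obj in json.loads(cell):
            yield obj["name"], obj["id"]


for name, obj_id in explode_descriptors(descriptor_column):
    print(name, obj_id, sep="\t")
```

Because explode_descriptors is a generator, only one cell is parsed at a time. This may help with memory if the cells are also read from the file lazily, though that part is not shown here.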

Related

How to select specific key after filtering the data using YQ

myres.json
[
{
"id": "id_1",
"name": "default"
},
{
"id": "id_2",
"name": "name2"
},
{
"id": "id_3",
"name": "name3"
}
]
I wanted to get only the name whose id is id_3.
I am able to filter the object using the following yq command:
yq -r '.[] | select(.id == "id_3" )' myres.json
and the output is:
{
"id": "id_3",
"name": "name3"
}
I tried with_entries and from_entries, but with no luck.
Thanks in advance!
I am using kislyuk/yq version 2.14.1.
As per @Inian's suggestion, I made a few changes to the query to fit my requirements:
yq -r '.[] | select(.id=="id_2").name' s.txt
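For readers less familiar with jq/yq syntax, the filter `.[] | select(.id == "...") | .name` iterates over the array, keeps matching objects, and then projects one key. Below is a rough Python equivalent of that idea, offered only as an illustration. It uses the question's data with the trailing commas removed, and names_for_id is a hypothetical helper name.

```python
import json

# The question's myres.json, with trailing commas removed so it parses as JSON.
MYRES = """
[
  {"id": "id_1", "name": "default"},
  {"id": "id_2", "name": "name2"},
  {"id": "id_3", "name": "name3"}
]
"""


def names_for_id(items, wanted_id):
    # Mirrors `.[] | select(.id == wanted) | .name`: iterate, filter, then project one key.
    return [item["name"] for item in items if item.get("id") == wanted_id]


records = json.loads(MYRES)
print(names_for_id(records, "id_3"))
```

It returns a list rather than a single value because, like the yq filter, it emits one result per matching object, and that could be zero or several.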

Group nested array objects to parent key in JQ

I have JSON coming from an external application, formatted like so:
{
"ticket_fields": [
{
"url": "https://example.com/1122334455.json",
"id": 1122334455,
"type": "tagger",
"custom_field_options": [
{
"id": 123456789,
"name": "I have a problem",
"raw_name": "I have a problem",
"value": "help_i_have_problem",
"default": false
},
{
"id": 456789123,
"name": "I have feedback",
"raw_name": "I have feedback",
"value": "help_i_have_feedback",
"default": false
}
]
},
{
"url": "https://example.com/6677889900.json",
"id": 6677889900,
"type": "tagger",
"custom_field_options": [
{
"id": 321654987,
"name": "United States",
"raw_name": "United States",
"value": "location_123_united_states",
"default": false
},
{
"id": 987456321,
"name": "Germany",
"raw_name": "Germany",
"value": "location_456_germany",
"default": false
}
]
}
]
}
The end goal is to get the data into a TSV in which each object in the custom_field_options array is grouped by its parent ID (ticket_fields.id) and then transposed, so that each object is represented on a single line, like so:
| Ticket Field ID | Name | Value |
|---|---|---|
| 1122334455 | I have a problem | help_i_have_problem |
| 1122334455 | I have feedback | help_i_have_feedback |
| 6677889900 | United States | location_123_united_states |
| 6677889900 | Germany | location_456_germany |
I have already been able to export the data to TSV. However, it comes out as one line per ticket field, with all the names followed by all the values, so the name/value pairing is lost, like so:
Using jq -r '.ticket_fields[] | select(.type=="tagger") | [.id, .custom_field_options[].name, .custom_field_options[].value] | @tsv'
| Ticket Field ID | Name | Name | Value | Value |
|---|---|---|---|---|
| 1122334455 | I have a problem | I have feedback | help_i_have_problem | help_i_have_feedback |
| 6677889900 | United States | Germany | location_123_united_states | location_456_germany |
In production, each custom_field_options array may contain any number of objects (not just 2). I'm stuck on how to group or map these objects to their parent ticket_fields.id and transpose the data cleanly. The select(.type=="tagger") is in the query because ticket_fields.type has other values that need to be filtered out.
Based on another answer on here, I did try variants of jq -r '.ticket_fields[] | select(.type=="tagger") | map(.custom_field_options |= from_entries) | group_by(.custom_field_options.ticket_fields) | map(map( .custom_field_options |= to_entries))' without success. Any assistance would be greatly appreciated!
You need two nested iterations, one over each array. Save the value of .id in a variable so you can access it later.
jq -r '
.ticket_fields[] | select(.type=="tagger") | .id as $id
| .custom_field_options[] | [$id, .name, .value]
| @tsv
'
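The same nested-iteration idea can be sketched in Python for comparison. The outer loop plays the role of `.ticket_fields[] | select(...)`, the saved parent_id plays the role of `.id as $id`, and the inner loop plays the role of `.custom_field_options[]`. DOC is a trimmed copy of the question's data, and tagger_rows and to_tsv are hypothetical names used only here.

```python
import csv
import io

# Trimmed copy of the question's document (only the keys this sketch reads).
DOC = {
    "ticket_fields": [
        {"id": 1122334455, "type": "tagger", "custom_field_options": [
            {"name": "I have a problem", "value": "help_i_have_problem"},
            {"name": "I have feedback", "value": "help_i_have_feedback"},
        ]},
        {"id": 6677889900, "type": "tagger", "custom_field_options": [
            {"name": "United States", "value": "location_123_united_states"},
            {"name": "Germany", "value": "location_456_germany"},
        ]},
    ]
}


def tagger_rows(doc):
    for field in doc["ticket_fields"]:
        if field.get("type") != "tagger":  # select(.type=="tagger")
            continue
        parent_id = field["id"]  # .id as $id
        for option in field.get("custom_field_options", []):  # .custom_field_options[]
            yield [parent_id, option["name"], option["value"]]


def to_tsv(rows):
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Ticket Field ID", "Name", "Value"])
    writer.writerows(rows)
    return buf.getvalue()


print(to_tsv(tagger_rows(DOC)), end="")
```

One difference to be aware of: as far as I know, jq's @tsv escapes tabs, newlines and backslashes inside values. Python's csv module instead quotes fields that contain special characters. The output can therefore differ for such values.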

jq: remove elements from array and output into a single line each one

Is there any way to take an input array (the "slurp" format) and output its elements individually? Input:
[
{ "id": "id1", "value": "value1"},
{ "id": "id2", "value": "value2"}
]
I'd like to get:
{ "id": "id1", "value": "value1"}
{ "id": "id2", "value": "value2"}
Each element should be outside the array and on its own line.
I've tried the -c option, but it puts the whole array on a single line.
That is, the -c option gives me:
[{"id":"id1","value":"value1"},{"id":"id2","value":"value2"}]
jq -c '.[]' does what you want:
In:
[
{"foo": 42, "bar": "less interesting data"},
{"foo": 42, "bar": "less interesting data"},
{"foo": 42, "bar": "less interesting data"}
]
Out:
{"foo":42,"bar":"less interesting data"}
{"foo":42,"bar":"less interesting data"}
{"foo":42,"bar":"less interesting data"}
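For comparison, here is a small Python sketch that produces the same "one compact element per line" output as `jq -c '.[]'`. The `separators=(",", ":")` argument drops the spaces json.dumps would otherwise add. `ensure_ascii=False` keeps non-ASCII characters as-is rather than escaping them. compact_lines is a hypothetical name.

```python
import json


def compact_lines(array_text):
    # Like `jq -c '.[]'`: emit each top-level element on its own line, no extra whitespace.
    for element in json.loads(array_text):
        yield json.dumps(element, separators=(",", ":"), ensure_ascii=False)


source = '[{ "id": "id1", "value": "value1"}, { "id": "id2", "value": "value2"}]'
for line in compact_lines(source):
    print(line)
```

Key order is preserved because Python dicts keep insertion order. By default jq does not sort keys either (that requires -S).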

Is my partition transform in Vega written correctly because the graph that is visualized is not accurate

I am creating a hierarchical representation of data in Vega. To do this I am using the stratify and partition transforms. The issue lies with the x coordinates generated by the partition transform. In the link, navigate to the data viewer and select tree-map. The x0 and x1 of the initial, top-most element in the hierarchy, "completed stories", range from 0 to 650. The next two elements, "testable" and "not testable", should have a combined x range of 0 to 650, but instead they range from 0 to 455. Their widths should be based on their quantities, stored in the "amount" field. Any suggestions as to why the generated rectangles are not proportional to the quantities?
Link to Vega Editor with code shown
For your dataset "rawNumbers", values should only be provided for the leaf nodes when using the stratify transform:
{
"name": "rawNumbers",
"values": [
{"id": "completed stories", "parent": null},
{"id": "testable", "parent": "completed stories"},
{"id": "not testable", "parent": "completed stories", "amount": 1435},
{"id": "sufficiently tested", "parent": "testable"},
{"id": "insufficiently tested", "parent": "testable"},
{"id": "integration tested", "parent": "sufficiently tested", "amount": 1758},
{"id": "unit tested", "parent": "sufficiently tested", "amount": 36},
{"id": "partial coverage", "parent": "insufficiently tested", "amount": 298},
{"id": "no coverage", "parent": "insufficiently tested", "amount": 341}
]
},
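One likely reason the original widths fell short is how the values are summed. As I understand it, Vega's stratify and partition transforms build on d3-hierarchy. There, each node's value is its own amount plus its children's values, and a parent's width is split among its children in proportion to child value divided by parent value. An amount placed on an internal node therefore adds to that node's total without being handed to any child, so the children cannot fill the parent's full extent. The Python sketch below reproduces that arithmetic under this assumption. subtree_values and child_extents are hypothetical helpers, not Vega APIs, and children are laid out in input order.

```python
# id -> (parent, amount) for the "rawNumbers" nodes above; missing amounts count as 0.
RAW_NUMBERS = {
    "completed stories": (None, 0),
    "testable": ("completed stories", 0),
    "not testable": ("completed stories", 1435),
    "sufficiently tested": ("testable", 0),
    "insufficiently tested": ("testable", 0),
    "integration tested": ("sufficiently tested", 1758),
    "unit tested": ("sufficiently tested", 36),
    "partial coverage": ("insufficiently tested", 298),
    "no coverage": ("insufficiently tested", 341),
}


def subtree_values(nodes):
    """Post-order sum: each node's value is its own amount plus its children's values."""
    children = {}
    for node_id, (parent, _) in nodes.items():
        children.setdefault(parent, []).append(node_id)

    values = {}

    def visit(node_id):
        own = nodes[node_id][1]
        values[node_id] = own + sum(visit(kid) for kid in children.get(node_id, []))
        return values[node_id]

    for root in children.get(None, []):
        visit(root)
    return values


def child_extents(nodes, parent_id, x0, x1):
    """Split [x0, x1] among parent_id's children in proportion to child value / parent value."""
    values = subtree_values(nodes)
    scale = (x1 - x0) / values[parent_id]
    extents, x = {}, x0
    for kid, (parent, _) in nodes.items():
        if parent == parent_id:
            width = values[kid] * scale
            extents[kid] = (x, x + width)
            x += width
    return extents


print(child_extents(RAW_NUMBERS, "completed stories", 0, 650))
```

With amounts only on leaves, "testable" sums to 2433 and "not testable" to 1435, and together they span the full 0 to 650. If the root is given an amount of its own, the children's combined range ends short of 650.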

Use scrapy to collect information for one item from multiple pages (and output it as a nested dictionary)

I'm trying to scrape data from a tournaments site.
Each tournament has some information such as the venue, the date, prices etc.
And also the rank of teams that took part. The rank is a table that simply provides the name of the team, and its position in the rank.
Then, you can click on the name of the team, which takes you to a page where we can get the roster of players that the team selected for that tournament.
I'd like to scrape the data into something like:
[{
"name": "Grand Tournament",
"venue": "...",
"date": "...",
"rank": [
{"team_name": "Team name",
"rank": 1,
"roster": ["player1", "player2", "..."]
},
{"team_name": "Team name",
"rank": 2,
"roster": ["player1", "player2", "..."]
}
]
}]
I have the following spider to scrape a single tournament page (usage: scrapy crawl tournamentspider -a start_url="<tournamenturl>"):
import scrapy

# TournamentItem and PlayerParticipationItem come from the project's items module (not shown)

class TournamentSpider(scrapy.Spider):
    name = "tournamentspider"
    allowed_domains = ["..."]

    def start_requests(self):
        try:
            yield scrapy.Request(url=self.start_url, callback=self.parse)
        except AttributeError:
            raise ValueError("You must use this spider with argument start_url.")

    def parse(self, response):
        tournament_item = TournamentItem()
        tournament_item['teams'] = []
        tournament_item['name'] = "Tournament Name"
        tournament_item['date'] = "Date"
        tournament_item['venue'] = "Venue"
        ladder = response.css('#ladder')
        for row in ladder.css('table tbody tr'):
            row_cells = row.xpath('td')
            participation_item = PlayerParticipationItem()
            participation_item['team_name'] = "Team Name"
            participation_item['rank'] = "x"
            # Parse roster link (XPath selects the attribute with @href)
            roster_url_page = row_cells[2].xpath('a/@href').get()
            # Follow link to extract the roster; urljoin resolves relative hrefs
            absolute_url = response.urljoin(roster_url_page)
            request = scrapy.Request(absolute_url, callback=self.parse_roster_page)
            request.meta['participation_item'] = participation_item
            yield request
            # Include participation item in the tournament's team list
            tournament_item['teams'].append(participation_item)
        yield tournament_item

    def parse_roster_page(self, response):
        participation_item = response.meta['participation_item']
        participation_item['roster'] = ["Player1", "Player2", "..."]
        return participation_item
My problem is that this spider produces the following output:
[{
"name": "Grand Tournament",
"venue": "...",
"date": "...",
"rank": [
{"team_name": "Team name",
"rank": 1,
},
{"team_name": "Team name",
"rank": 2,
}
]
},
{"team_name": "Team name",
"rank": 1,
"roster": ["player1", "player2", "..."]
},
{"team_name": "Team name",
"rank": 2,
"roster": ["player1", "player2", "..."]
}]
I know that those extra items in the output are generated by the yield request line. When I remove it, I'm no longer scraping the roster page, so the extra items disappear, but I no longer have the roster data.
Is it possible to get the output I'm aiming for?
I know that a different approach could be to scrape the tournament information, and then teams with a field that identifies the tournament. But I'd like to know if the initial approach is achievable.
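You can use scrapy-inline-requests to fetch the roster page from inside parse, so you get the roster data without yielding it out as a separate item.
Decorate the callback that issues the roster requests (parse) with @inline_requests. Each yield scrapy.Request(...) then returns the response directly inside parse, so the separate parse_roster_page callback is no longer needed:
from inline_requests import inline_requests

class TournamentSpider(scrapy.Spider):
    ...
    @inline_requests
    def parse(self, response):
        ...
        for row in ladder.css('table tbody tr'):
            ...
            # With @inline_requests, yielding a Request hands its response back here
            roster_response = yield scrapy.Request(response.urljoin(roster_url_page))
            participation_item['roster'] = ["Player1", "Player2", "..."]  # extract from roster_response
            tournament_item['teams'].append(participation_item)
        yield tournament_item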
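If you would rather not add a dependency, another common pattern can be used instead (swapped in here; it is not what the answer above uses). Pass one shared aggregator to every roster request through request.meta, and have parse_roster_page emit the tournament only once the last roster has arrived. Below is a framework-free sketch of that bookkeeping. TournamentAggregator and add_team are hypothetical names.

```python
class TournamentAggregator:
    """Hypothetical helper: holds one tournament until every team's roster has come back."""

    def __init__(self, tournament, team_count):
        # team_count must be > 0; with no teams you would emit the tournament directly.
        self.tournament = dict(tournament, teams=[])
        self.pending = team_count

    def add_team(self, team):
        """Record one finished team; return the full tournament once the last one arrives."""
        self.tournament["teams"].append(team)
        self.pending -= 1
        if self.pending == 0:
            # Responses can arrive in any order, so restore rank order before emitting.
            self.tournament["teams"].sort(key=lambda t: t["rank"])
            return self.tournament
        return None


agg = TournamentAggregator({"name": "Grand Tournament"}, team_count=2)
print(agg.add_team({"team_name": "Team B", "rank": 2, "roster": ["player3"]}))  # None: still waiting
print(agg.add_team({"team_name": "Team A", "rank": 1, "roster": ["player1", "player2"]}))
```

In the spider, parse would create one aggregator, attach it to each roster request's meta, and stop yielding tournament_item itself. parse_roster_page would then yield add_team's return value only when it is not None. A roster request that fails never decrements the counter, so in practice you would probably also want an errback that does.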
