I need help fixing my GraphQL API scraper - web-scraping
Using the Network tab in Google Chrome DevTools, I copied the GraphQL request ("Copy as cURL (bash)"), pasted it into Insomnia and generated a working Python request from it. Now the provider has changed something: I cannot even run the cURL request in Insomnia anymore; I only get a 400 response.
With my previous code I now get an error message that I cannot solve on my own.
I would be very grateful for a working solution.
The code that worked until now is:
import requests
import json


def scrape_digitec():
    url = "https://www.digitec.ch/api/graphql"
    headers = {
        "authority": "www.digitec.ch",
        "accept": "application/json",
        "accept-language": "de-CH",
        "cache-control": "no-cache",
        "content-type": "application/json",
        "origin": "https://www.digitec.ch",
        "pragma": "no-cache",
        "referer": "https://www.digitec.ch/search?q=bang%20olufsen",
        "sec-ch-ua": '"Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"Windows"',
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
        "x-dg-country": "ch",
        "x-dg-mandator": "406802",
        "x-dg-portal": "25",
        "x-dg-testgroup": "Default"
    }
    search = 'lg'
    offset = '0'
payload = '{"query":"query ENTER_SEARCH(\\t$query: String!\\t$sortOrder: ProductSort\\t$limit: Int = 9\\t$offset: Int = 0\\t$filters: [SearchFilter]\\t$include: [String!]\\t$exclude: [String!]\\t$searchQueryId: String\\t$siteId: String) {\\tsearch(\\t\\tquery: $query\\t\\tfilters: $filters\\t\\tsearchQueryId: $searchQueryId\\t\\tsiteId: $siteId\\t) {\\t\\tproducts(limit: $limit, offset: $offset, sortOrder: $sortOrder) {\\t\\t\\ttotal\\t\\t\\thasMore\\t\\t\\tnextOffset\\t\\t\\tresults {\\t\\t\\t\\t...ProductSearchResult\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\t__typename\\t\\t}\\t\\tfilters(include: $include, exclude: $exclude) {\\t\\t\\tproduct {\\t\\t\\t\\tidentifier\\t\\t\\t\\tname\\t\\t\\t\\tfilterType\\t\\t\\t\\tscore\\t\\t\\t\\ttooltip {\\t\\t\\t\\t\\t...FilterTooltipResult\\t\\t\\t\\t\\t__typename\\t\\t\\t\\t}\\t\\t\\t\\t...CheckboxSearchFilterResult\\t\\t\\t\\t...RangeSearchFilterResult\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\t__typename\\t\\t}\\t\\tmagazinePages(limit: 3) {\\t\\t\\tids {\\t\\t\\t\\tid\\t\\t\\t\\tscore\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\ttotal\\t\\t\\t__typename\\t\\t}\\t\\tauthors(limit: 3) {\\t\\t\\tids {\\t\\t\\t\\tid\\t\\t\\t\\tscore\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\ttotal\\t\\t\\t__typename\\t\\t}\\t\\tdiscussions(limit: 3) {\\t\\t\\tids {\\t\\t\\t\\tid\\t\\t\\t\\tscore\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\ttotal\\t\\t\\t__typename\\t\\t}\\t\\tquestions(limit: 3) {\\t\\t\\tids {\\t\\t\\t\\tid\\t\\t\\t\\tscore\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\ttotal\\t\\t\\t__typename\\t\\t}\\t\\tratings(limit: 3) {\\t\\t\\tids {\\t\\t\\t\\tid\\t\\t\\t\\tscore\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\ttotal\\t\\t\\t__typename\\t\\t}\\t\\tproductTypes(limit: 24) {\\t\\t\\ttotal\\t\\t\\tresults {\\t\\t\\t\\tid\\t\\t\\t\\tname\\t\\t\\t\\tprimarySynonyms\\t\\t\\t\\tisVisible\\t\\t\\t\\tdescription\\t\\t\\t\\tmetaDescription\\t\\t\\t\\timageUrl\\t\\t\\t\\tsearchScore\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\t__typename\\t\\t}\\t\\tbrands(limit: 
24) {\\t\\t\\ttotal\\t\\t\\tresults {\\t\\t\\t\\tid\\t\\t\\t\\ttitle\\t\\t\\t\\tsearchScore\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\t__typename\\t\\t}\\t\\thelp(limit: 3) {\\t\\t\\tids {\\t\\t\\t\\tid\\t\\t\\t\\tscore\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\ttotal\\t\\t\\thasMore\\t\\t\\tresults {\\t\\t\\t\\tsearchScore\\t\\t\\t\\ttitle\\t\\t\\t\\tid\\t\\t\\t\\turl\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\t__typename\\t\\t}\\t\\t_meta {\\t\\t\\tqueryInfo {\\t\\t\\t\\tcorrectedQuery\\t\\t\\t\\tdidYouMeanQuery\\t\\t\\t\\tlastProductSearchPass\\t\\t\\t\\texecutedSearchTerm\\t\\t\\t\\ttestGroup\\t\\t\\t\\tisManagedQuery\\t\\t\\t\\tisRerankedQuery\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\tredirectionUrl\\t\\t\\tportalReferral {\\t\\t\\t\\tproductCount\\t\\t\\t\\tportalName\\t\\t\\t\\turl\\t\\t\\t\\tproductImageUrls\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\t__typename\\t\\t}\\t\\t__typename\\t}}fragment ProductSearchResult on ProductSearchResultItem {\\tsearchScore\\tmandatorSpecificData {\\t\\t...ProductMandatorSpecific\\t\\t__typename\\t}\\tproduct {\\t\\t...ProductMandatorIndependent\\t\\t__typename\\t}\\toffer {\\t\\t...ProductOffer\\t\\t__typename\\t}\\t__typename}fragment FilterTooltipResult on FilterTooltip {\\ttext\\tmoreInformationLink\\t__typename}fragment CheckboxSearchFilterResult on CheckboxSearchFilter {\\toptions {\\t\\tidentifier\\t\\tname\\t\\tproductCount\\t\\tscore\\t\\treferenceValue {\\t\\t\\tvalue\\t\\t\\tunit {\\t\\t\\t\\tabbreviation\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\t__typename\\t\\t}\\t\\tpreferredValue {\\t\\t\\tvalue\\t\\t\\tunit {\\t\\t\\t\\tabbreviation\\t\\t\\t\\t__typename\\t\\t\\t}\\t\\t\\t__typename\\t\\t}\\t\\ttooltip {\\t\\t\\t...FilterTooltipResult\\t\\t\\t__typename\\t\\t}\\t\\t__typename\\t}\\t__typename}fragment RangeSearchFilterResult on RangeSearchFilter {\\treferenceMin\\tpreferredMin\\treferenceMax\\tpreferredMax\\treferenceStepSize\\tpreferredStepSize\\trangeMergeInfo 
{\\t\\tisBottomMerged\\t\\tisTopMerged\\t\\t__typename\\t}\\treferenceUnit {\\t\\tabbreviation\\t\\t__typename\\t}\\tpreferredUnit {\\t\\tabbreviation\\t\\t__typename\\t}\\trangeFilterDataPoint {\\t\\t...RangeFilterDataPointResult\\t\\t__typename\\t}\\t__typename}fragment ProductMandatorSpecific on MandatorSpecificData {\\tisBestseller\\tisDeleted\\tshowroomSites\\tsectorIds\\t__typename}fragment ProductMandatorIndependent on ProductV2 {\\tid\\tproductId\\tname\\tnameProperties\\tproductTypeId\\tproductTypeName\\tbrandId\\tbrandName\\taverageRating\\ttotalRatings\\ttotalQuestions\\tisProductSet\\timages {\\t\\turl\\t\\theight\\t\\twidth\\t\\t__typename\\t}\\tenergyEfficiency {\\t\\tenergyEfficiencyColorType\\t\\tenergyEfficiencyLabelText\\t\\tenergyEfficiencyLabelSigns\\t\\tenergyEfficiencyImage {\\t\\t\\turl\\t\\t\\theight\\t\\t\\twidth\\t\\t\\t__typename\\t\\t}\\t\\t__typename\\t}\\tseo {\\t\\tseoProductTypeName\\t\\tseoNameProperties\\t\\tproductGroups {\\t\\t\\tproductGroup1\\t\\t\\tproductGroup2\\t\\t\\tproductGroup3\\t\\t\\tproductGroup4\\t\\t\\t__typename\\t\\t}\\t\\tgtin\\t\\t__typename\\t}\\thasVariants\\tsmallDimensions\\tbasePrice {\\t\\tpriceFactor\\t\\tvalue\\t\\t__typename\\t}\\t__typename}fragment ProductOffer on OfferV2 {\\tid\\tproductId\\tofferId\\tshopOfferId\\tprice {\\t\\tamountIncl\\t\\tamountExcl\\t\\tcurrency\\t\\tfraction\\t\\t__typename\\t}\\tdeliveryOptions {\\t\\tmail {\\t\\t\\tclassification\\t\\t\\tfutureReleaseDate\\t\\t\\t__typename\\t\\t}\\t\\tpickup {\\t\\t\\tsiteId\\t\\t\\tclassification\\t\\t\\tfutureReleaseDate\\t\\t\\t__typename\\t\\t}\\t\\tdetailsProvider {\\t\\t\\tproductId\\t\\t\\tofferId\\t\\t\\tquantity\\t\\t\\ttype\\t\\t\\t__typename\\t\\t}\\t\\t__typename\\t}\\tlabel\\ttype\\tvolumeDiscountPrices {\\t\\tminAmount\\t\\tprice {\\t\\t\\tamountIncl\\t\\t\\tamountExcl\\t\\t\\tcurrency\\t\\t\\t__typename\\t\\t}\\t\\tisDefault\\t\\t__typename\\t}\\tsalesInformation 
{\\t\\tnumberOfItems\\t\\tnumberOfItemsSold\\t\\tisEndingSoon\\t\\tvalidFrom\\t\\t__typename\\t}\\tincentiveText\\tisIncentiveCashback\\tisNew\\tisSalesPromotion\\thideInProductDiscovery\\tcanAddToBasket\\thidePrice\\tinsteadOfPrice {\\t\\ttype\\t\\tprice {\\t\\t\\tamountIncl\\t\\t\\tamountExcl\\t\\t\\tcurrency\\t\\t\\tfraction\\t\\t\\t__typename\\t\\t}\\t\\t__typename\\t}\\tminOrderQuantity\\t__typename}fragment RangeFilterDataPointResult on RangeFilterDataPoint {\\tcount\\treferenceValue {\\t\\tvalue\\t\\tunit {\\t\\t\\tabbreviation\\t\\t\\t__typename\\t\\t}\\t\\t__typename\\t}\\tpreferredValue {\\t\\tvalue\\t\\tunit {\\t\\t\\tabbreviation\\t\\t\\t__typename\\t\\t}\\t\\t__typename\\t}\\t__typename}\",\"variables\":{\"limit\":100,\"offset\":'+offset+',\"query\":\"'+search+'\",\"filters\":[],\"sortOrder\":null,\"include\":[\"bra\",\"pt\",\"pr\"],\"exclude\":[\"off\"],\"searchQueryId\":\"4ce81461-09e2-4f7a-bb9a-8f6f8503fdc4\",\"siteId\":null},\"operationName\":\"ENTER_SEARCH\"}'
    response = requests.request("POST", url, data=payload, headers=headers)
    print(response)
    data = response.json()
    print(json.dumps(data, indent=2))
    print(json.dumps(data))


if __name__ == '__main__':
    scrape_digitec()
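The payload above is built by pasting `offset` and `search` into a hand-escaped JSON string. A minimal sketch of why that is fragile, assuming a shortened placeholder query (not the real ENTER_SEARCH query) and made-up search terms: a term containing a double quote produces invalid JSON, while building Python objects and serializing them with `json.dumps` escapes it correctly.

```python
import json

# Shortened placeholder for the long ENTER_SEARCH query text (hypothetical).
QUERY = "query ENTER_SEARCH($query: String!) { search(query: $query) { __typename } }"


def payload_by_concatenation(search, offset):
    # Mirrors the question's approach: values are pasted into a JSON string.
    return ('{"query":"' + QUERY + '","variables":{"offset":' + offset
            + ',"query":"' + search + '"},"operationName":"ENTER_SEARCH"}')


def payload_by_json_dumps(search, offset):
    # Build Python objects first and let json.dumps handle quoting and escaping.
    return json.dumps({
        "query": QUERY,
        "variables": {"offset": offset, "query": search},
        "operationName": "ENTER_SEARCH",
    })


for term in ['lg', 'bang & olufsen 17"']:
    try:
        json.loads(payload_by_concatenation(term, '0'))
        print(term, "-> concatenation: valid JSON")
    except json.JSONDecodeError:
        print(term, "-> concatenation: INVALID JSON")
    decoded = json.loads(payload_by_json_dumps(term, 0))
    print(term, "-> json.dumps round-trips:", decoded["variables"]["query"] == term)
```

With requests, `requests.post(url, json=obj)` performs this `json.dumps` step for you and sets `Content-Type: application/json` if you have not set a content type yourself.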
You need to build your payload as Python objects (dicts and lists) instead of a hand-assembled JSON string, and then pass it with the json parameter instead of data:
import requests
import json


def scrape_digitec():
    url = "https://www.digitec.ch/api/graphql"
    headers = {
        "authority": "www.digitec.ch",
        "accept": "application/json",
        "accept-language": "de-CH",
        "cache-control": "no-cache",
        "content-type": "application/json",
        "origin": "https://www.digitec.ch",
        "pragma": "no-cache",
        "referer": "https://www.digitec.ch/search?q=bang%20olufsen",
        "sec-ch-ua": '"Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"Windows"',
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
        "x-dg-country": "ch",
        "x-dg-mandator": "406802",
        "x-dg-portal": "25",
        "x-dg-testgroup": "Default"
    }
    search = 'lg'
    offset = 0
    # A list containing a single GraphQL operation, built from Python objects
    payload = [{
        "operationName": "ENTER_SEARCH",
        "variables": {
            "limit": 24,
            "offset": offset,
            "query": search,
            "filters": [],
            "sortOrder": None,  # JSON null is written as None in Python
            "include": ["bra", "pt", "pr", "off"],
            "searchQueryId": "e1b620fc-bf9c-41c6-85c0-cc49e5d12e25",
            "siteId": None,
        },
"query":"query ENTER_SEARCH($query: String!, $sortOrder: ProductSort, $limit: Int = 9, $offset: Int = 0, $filters: [SearchFilter], $include: [String!], $exclude: [String!], $searchQueryId: String, $siteId: String) {\n search(\n query: $query\n filters: $filters\n searchQueryId: $searchQueryId\n siteId: $siteId\n ) {\n products(limit: $limit, offset: $offset, sortOrder: $sortOrder) {\n total\n hasMore\n nextOffset\n results {\n ...ProductSearchResult\n __typename\n }\n __typename\n }\n filters(include: $include, exclude: $exclude) {\n product {\n identifier\n name\n filterType\n score\n tooltip {\n ...FilterTooltipResult\n __typename\n }\n ...CheckboxSearchFilterResult\n ...RangeSearchFilterResult\n __typename\n }\n __typename\n }\n magazinePages(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n authors(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n discussions(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n questions(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n ratings(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n productTypes(limit: 24) {\n total\n results {\n id\n name\n primarySynonyms\n isVisible\n description\n metaDescription\n imageUrl\n searchScore\n __typename\n }\n __typename\n }\n brands(limit: 24) {\n total\n results {\n id\n title\n searchScore\n __typename\n }\n __typename\n }\n _meta {\n queryInfo {\n correctedQuery\n didYouMeanQuery\n lastProductSearchPass\n executedSearchTerm\n testGroup\n isManagedQuery\n isRerankedQuery\n __typename\n }\n redirectionUrl\n portalReferral {\n productCount\n portalName\n url\n productImageUrls\n __typename\n }\n __typename\n }\n __typename\n }\n}\n\nfragment ProductSearchResult on ProductSearchResultItem {\n searchScore\n mandatorSpecificData {\n ...ProductMandatorSpecific\n __typename\n }\n product {\n ...ProductMandatorIndependent\n __typename\n }\n offer {\n 
...ProductOffer\n __typename\n }\n __typename\n}\n\nfragment FilterTooltipResult on FilterTooltip {\n text\n moreInformationLink\n __typename\n}\n\nfragment CheckboxSearchFilterResult on CheckboxSearchFilter {\n options {\n identifier\n name\n productCount\n score\n referenceValue {\n value\n unit {\n abbreviation\n __typename\n }\n __typename\n }\n preferredValue {\n value\n unit {\n abbreviation\n __typename\n }\n __typename\n }\n tooltip {\n ...FilterTooltipResult\n __typename\n }\n __typename\n }\n __typename\n}\n\nfragment RangeSearchFilterResult on RangeSearchFilter {\n referenceMin\n preferredMin\n referenceMax\n preferredMax\n referenceStepSize\n preferredStepSize\n rangeMergeInfo {\n isBottomMerged\n isTopMerged\n __typename\n }\n referenceUnit {\n abbreviation\n __typename\n }\n preferredUnit {\n abbreviation\n __typename\n }\n rangeFilterDataPoint {\n ...RangeFilterDataPointResult\n __typename\n }\n __typename\n}\n\nfragment ProductMandatorSpecific on MandatorSpecificData {\n isBestseller\n isDeleted\n showroomSites\n sectorIds\n __typename\n}\n\nfragment ProductMandatorIndependent on ProductV2 {\n id\n productId\n name\n nameProperties\n productTypeId\n productTypeName\n brandId\n brandName\n averageRating\n totalRatings\n totalQuestions\n isProductSet\n images {\n url\n height\n width\n __typename\n }\n energyEfficiency {\n energyEfficiencyColorType\n energyEfficiencyLabelText\n energyEfficiencyLabelSigns\n energyEfficiencyImage {\n url\n height\n width\n __typename\n }\n __typename\n }\n seo {\n seoProductTypeName\n seoNameProperties\n productGroups {\n productGroup1\n productGroup2\n productGroup3\n productGroup4\n __typename\n }\n gtin\n __typename\n }\n hasVariants\n smallDimensions\n basePrice {\n priceFactor\n value\n __typename\n }\n __typename\n}\n\nfragment ProductOffer on OfferV2 {\n id\n productId\n offerId\n shopOfferId\n price {\n amountIncl\n amountExcl\n currency\n fraction\n __typename\n }\n deliveryOptions {\n mail {\n classification\n 
futureReleaseDate\n __typename\n }\n pickup {\n siteId\n classification\n futureReleaseDate\n __typename\n }\n detailsProvider {\n productId\n offerId\n quantity\n type\n __typename\n }\n __typename\n }\n label\n type\n volumeDiscountPrices {\n minAmount\n price {\n amountIncl\n amountExcl\n currency\n __typename\n }\n isDefault\n __typename\n }\n salesInformation {\n numberOfItems\n numberOfItemsSold\n isEndingSoon\n validFrom\n __typename\n }\n incentiveText\n isIncentiveCashback\n isNew\n isSalesPromotion\n hideInProductDiscovery\n canAddToBasket\n hidePrice\n insteadOfPrice {\n type\n price {\n amountIncl\n amountExcl\n currency\n fraction\n __typename\n }\n __typename\n }\n minOrderQuantity\n __typename\n}\n\nfragment RangeFilterDataPointResult on RangeFilterDataPoint {\n count\n referenceValue {\n value\n unit {\n abbreviation\n __typename\n }\n __typename\n }\n preferredValue {\n value\n unit {\n abbreviation\n __typename\n }\n __typename\n }\n __typename\n}\n"}]
    response = requests.post(url, json=payload, headers=headers)
    print(response)
    data = response.json()
    print(json.dumps(data, indent=2))
    print(json.dumps(data))


if __name__ == '__main__':
    scrape_digitec()
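The query asks for `total`, `hasMore` and `nextOffset` under `search.products` and takes `$limit`/`$offset` variables, so paging through all results could be sketched as below. Assumptions, none confirmed by the thread: the endpoint answers the one-element list with a one-element list shaped like `[{"data": {"search": {...}}}]`, `nextOffset` is the offset of the next page, and `fetch` is a hypothetical callable you supply. The demo uses a fake `fetch` so it runs offline.

```python
import copy


def iter_products(fetch, base_payload, page_size=24, max_pages=50):
    """Yield product results page by page, following hasMore/nextOffset.

    fetch: hypothetical callable that sends the payload and returns decoded JSON.
    base_payload: a one-element list like the answer's (operationName, variables, query).
    """
    offset = 0
    for _ in range(max_pages):  # hard stop in case hasMore never becomes False
        payload = copy.deepcopy(base_payload)  # leave the caller's payload untouched
        payload[0]["variables"].update({"offset": offset, "limit": page_size})
        response = fetch(payload)
        products = response[0]["data"]["search"]["products"]  # assumed response shape
        yield from products["results"]
        if not products["hasMore"] or products["nextOffset"] is None:
            break
        offset = products["nextOffset"]


def fake_fetch(payload):
    # Offline stand-in that serves 5 fake items, one page per call.
    variables = payload[0]["variables"]
    start, limit = variables["offset"], variables["limit"]
    items = [{"id": i} for i in range(5)][start:start + limit]
    end = start + len(items)
    return [{"data": {"search": {"products": {
        "total": 5, "hasMore": end < 5, "nextOffset": end if end < 5 else None,
        "results": items}}}}]


# "query" is a placeholder here, not the real ENTER_SEARCH query text.
base = [{"operationName": "ENTER_SEARCH", "variables": {"query": "lg"}, "query": "..."}]
print([p["id"] for p in iter_products(fake_fetch, base, page_size=2)])  # [0, 1, 2, 3, 4]
```

With the answer's code, `fetch` could be `lambda p: requests.post(url, json=p, headers=headers).json()`; check the real response shape before relying on it.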
Related
Scrapy Request returns a 400 error while Python requests returns 200
I have the following Python script working: import requests import json url = "https://www.galaxus.ch/api/graphql/enter-search" payload = json.dumps([ { "operationName": "ENTER_SEARCH", "variables": { "limit": 24, "offset": 0, "query": "8719934001237", "filters": [], "sortOrder": None, "include": [ "bra", "pt", "pr", "off" ], "searchQueryId": "5ca2074a-59ea-44be-a6b4-74946d50285c", "siteId": None }, "query": "query ENTER_SEARCH($query: String!, $sortOrder: ProductSort, $limit: Int = 9, $offset: Int = 0, $filters: [SearchFilter], $include: [String!], $exclude: [String!], $searchQueryId: String, $rewriters: [String!], $siteId: String) {\n search(\n query: $query\n filters: $filters\n searchQueryId: $searchQueryId\n rewriters: $rewriters\n siteId: $siteId\n ) {\n products(limit: $limit, offset: $offset, sortOrder: $sortOrder) {\n total\n hasMore\n nextOffset\n results {\n ...ProductSearchResult\n __typename\n }\n __typename\n }\n filters(include: $include, exclude: $exclude) {\n product {\n identifier\n name\n filterType\n score\n tooltip {\n ...FilterTooltipResult\n __typename\n }\n ...CheckboxSearchFilterResult\n ...RangeSearchFilterResult\n __typename\n }\n __typename\n }\n magazinePages(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n authors(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n discussions(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n questions(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n ratings(limit: 3) {\n ids {\n id\n score\n __typename\n }\n total\n __typename\n }\n productTypes(limit: 24) {\n total\n results {\n id\n name\n primarySynonyms\n isVisible\n description\n metaDescription\n imageUrl\n searchScore\n __typename\n }\n __typename\n }\n brands(limit: 24) {\n total\n results {\n id\n title\n searchScore\n __typename\n }\n __typename\n }\n _meta {\n queryInfo {\n correctedQuery\n didYouMeanQuery\n lastProductSearchPass\n 
executedSearchTerm\n testGroup\n isManagedQuery\n isRerankedQuery\n __typename\n }\n redirectionUrl\n portalReferral {\n productCount\n portalName\n url\n productImageUrls\n __typename\n }\n __typename\n }\n __typename\n }\n}\n\nfragment ProductSearchResult on ProductSearchResultItem {\n searchScore\n mandatorSpecificData {\n ...ProductMandatorSpecific\n __typename\n }\n product {\n ...ProductMandatorIndependent\n __typename\n }\n offer {\n ...ProductOffer\n __typename\n }\n __typename\n}\n\nfragment FilterTooltipResult on FilterTooltip {\n text\n moreInformationLink\n __typename\n}\n\nfragment CheckboxSearchFilterResult on CheckboxSearchFilter {\n options {\n identifier\n name\n productCount\n score\n referenceValue {\n value\n unit {\n abbreviation\n __typename\n }\n __typename\n }\n preferredValue {\n value\n unit {\n abbreviation\n __typename\n }\n __typename\n }\n tooltip {\n ...FilterTooltipResult\n __typename\n }\n __typename\n }\n __typename\n}\n\nfragment RangeSearchFilterResult on RangeSearchFilter {\n referenceMin\n preferredMin\n referenceMax\n preferredMax\n referenceStepSize\n preferredStepSize\n rangeMergeInfo {\n isBottomMerged\n isTopMerged\n __typename\n }\n referenceUnit {\n abbreviation\n __typename\n }\n preferredUnit {\n abbreviation\n __typename\n }\n rangeFilterDataPoint {\n ...RangeFilterDataPointResult\n __typename\n }\n __typename\n}\n\nfragment ProductMandatorSpecific on MandatorSpecificData {\n isBestseller\n isDeleted\n showroomSites\n sectorIds\n __typename\n}\n\nfragment ProductMandatorIndependent on ProductV2 {\n id\n productId\n name\n nameProperties\n productTypeId\n productTypeName\n brandId\n brandName\n averageRating\n totalRatings\n totalQuestions\n isProductSet\n images {\n url\n height\n width\n __typename\n }\n energyEfficiency {\n energyEfficiencyColorType\n energyEfficiencyLabelText\n energyEfficiencyLabelSigns\n energyEfficiencyImage {\n url\n height\n width\n __typename\n }\n __typename\n }\n seo {\n 
seoProductTypeName\n seoNameProperties\n productGroups {\n productGroup1\n productGroup2\n productGroup3\n productGroup4\n __typename\n }\n gtin\n __typename\n }\n hasVariants\n smallDimensions\n basePrice {\n priceFactor\n value\n __typename\n }\n __typename\n}\n\nfragment ProductOffer on OfferV2 {\n id\n productId\n offerId\n shopOfferId\n price {\n amountIncl\n amountExcl\n currency\n fraction\n __typename\n }\n deliveryOptions {\n mail {\n classification\n futureReleaseDate\n __typename\n }\n pickup {\n siteId\n classification\n futureReleaseDate\n __typename\n }\n detailsProvider {\n productId\n offerId\n quantity\n type\n __typename\n }\n __typename\n }\n label\n type\n volumeDiscountPrices {\n minAmount\n price {\n amountIncl\n amountExcl\n currency\n __typename\n }\n isDefault\n __typename\n }\n salesInformation {\n numberOfItems\n numberOfItemsSold\n isEndingSoon\n validFrom\n __typename\n }\n incentiveText\n isIncentiveCashback\n isNew\n isSalesPromotion\n hideInProductDiscovery\n canAddToBasket\n hidePrice\n insteadOfPrice {\n type\n price {\n amountIncl\n amountExcl\n currency\n fraction\n __typename\n }\n __typename\n }\n minOrderQuantity\n __typename\n}\n\nfragment RangeFilterDataPointResult on RangeFilterDataPoint {\n count\n referenceValue {\n value\n unit {\n abbreviation\n __typename\n }\n __typename\n }\n preferredValue {\n value\n unit {\n abbreviation\n __typename\n }\n __typename\n }\n __typename\n}\n" } ]) headers = { 'accept-language': 'de-CH', 'content-type': 'application/json', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36' } response = requests.request("POST", url, headers=headers, data=payload) print(response.text) However, when I convert it to scrapy request, just changing the "data" parameter by "body" I get 400 error. I have seen possible solutions at stack overflow but none seems to work...
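When Scrapy reports a 400 or 403, the response body often says why, but Scrapy's HttpErrorMiddleware drops non-2xx responses before they reach your callback; setting `handle_httpstatus_list = [400, 403]` on the spider lets them through. GraphQL servers frequently (not always) describe the failure in an `errors` array. The helper below is a hypothetical stdlib sketch for pulling those messages out of a body, whether it is a single object, a batched list, or not JSON at all (e.g. an HTML block page).

```python
import json


def graphql_error_messages(body):
    """Return error messages from a GraphQL response body (str or bytes).

    Handles a single response object, a batched list of them, and non-JSON
    bodies, which are reported as one truncated message.
    """
    try:
        decoded = json.loads(body)
    except (ValueError, TypeError):  # JSONDecodeError is a ValueError subclass
        text = body.decode("utf-8", "replace") if isinstance(body, bytes) else str(body)
        return ["non-JSON body: " + text[:200]]
    responses = decoded if isinstance(decoded, list) else [decoded]
    messages = []
    for item in responses:
        if isinstance(item, dict):
            for err in item.get("errors") or []:
                messages.append(err.get("message", str(err)) if isinstance(err, dict) else str(err))
    return messages


print(graphql_error_messages(b'{"errors": [{"message": "Variable \\"$query\\" is required"}]}'))
print(graphql_error_messages("<html>Access denied</html>"))
```

In a Scrapy callback you would pass `response.body` to it.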
I fixed it: when making a Scrapy request, I had to remove the "[]" at the beginning and end of the body.
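Removing the brackets turns the one-element list `[{...}]` into a single operation object `{...}`. Instead of editing the serialized string, you can serialize the first element directly. This sketch uses a shortened placeholder payload; why the endpoint accepted the list from requests but not from Scrapy is not explained in the thread, so treat it as a workaround to try rather than a diagnosis.

```python
import json

# Shortened placeholder for the list-wrapped payload from the question.
batched = [{
    "operationName": "ENTER_SEARCH",
    "variables": {"query": "8719934001237", "sortOrder": None, "siteId": None},
    "query": "query ENTER_SEARCH(...) { ... }",  # placeholder, not the real query
}]

batched_body = json.dumps(batched)    # starts with "[": a list holding one operation
single_body = json.dumps(batched[0])  # starts with "{": the same operation on its own

print(batched_body[:1], single_body[:1])  # [ {
# In Scrapy, single_body would be passed as body=single_body together with method="POST".
```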
I want to scrape data from a website that has a hidden API, but sending form data is not working either
I want to extract data from this website through its hidden API, but sending the form data does not work in Scrapy either. This is the main URL of the website: 'https://www.priceline.com/relax/at/478502/from/20220523/to/20220527/rooms/1/adults/2?vrid=2af9fb11ff31fc1a4170ac6a891116da' and this is the API URL: 'https://www.priceline.com/pws/v0/pcln-graph/'. I POST the request with the form data, but I don't get any data back; I only get a 403 response code. This is the code:
# packages import scrapy from scrapy.crawler import CrawlerProcess from scrapy.selector import Selector from scrapy.http import FormRequest import urllib import os import json import csv import datetime # property scraper class class ResidentialSale(scrapy.Spider): # scraper name name = 'therapists' start_url = 'https://www.priceline.com/relax/at/478502/from/20220523/to/20220527/rooms/1/adults/2?vrid=2af9fb11ff31fc1a4170ac6a891116da' base_url = 'https://www.priceline.com/pws/v0/pcln-graph/' # headers headers = { "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36" } headers2 = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36", #"Accept": "*/*", #"Accept-Encoding": "gzip, deflate, br", #"Accept-Language": "en-US,en;q=0.9,bn;q=0.8,es;q=0.7,ar;q=0.6", #"Connection": "keep-alive", #"Content-Length": "1843", "Content-Type": "application/json", #"Host": "apis.airportthai.co.th", 'origin': 'https://www.priceline.com', 'referer': 'https://www.priceline.com/relax/at/478502/from/20220523/to/20220527/rooms/1/adults/2?vrid=2af9fb11ff31fc1a4170ac6a891116da', #"sec-ch-ua-mobile": "?0" } # payload payload = {"query":"query getHotelContentDeals($deals: [ContentDealType], $cguid: String, $rid: String, $at: String, $rguid: String, $visitId: String, $appc: String, $responseOptions: String, $addErrToResponse: Boolean, $googleMapStatic: GoogleMapStaticArguments) 
{\n hotelContent(deals: $deals, rid: $rid, at: $at, rguid: $rguid, cguid: $cguid, visitId: $visitId, appc: $appc, responseOptions: $responseOptions, addErrToResponse: $addErrToResponse, googleMapStatic: $googleMapStatic) {\n rguid\n errorMessage\n hotels {\n name\n starRating\n hotelId\n pclnId\n brandId\n chainCode\n taxId\n propertyTypeId\n quotes {\n text\n __typename\n }\n childrenStayFree\n maxChildrenStayFreeAge\n maxChildrenStayFreeNum\n customDesc {\n paragraphTitle\n text\n __typename\n }\n description\n hotelThemes {\n hotelThemeId\n hotelThemeName\n __typename\n }\n guaranteedBrandsIcon {\n icon\n name\n iconName\n __typename\n }\n policies {\n additionalPolicies\n cardsAccepted\n checkInTime\n checkOutTime\n parkingPolicy {\n policyText\n freeParking\n __typename\n }\n internetPolicy {\n policyText\n freeInternet\n __typename\n }\n childPolicy {\n policyText\n childrenStayFree\n __typename\n }\n childrenDescription\n importantInfo\n coronaInfoCheck\n coronaImportantInfo\n petDescription\n __typename\n }\n location {\n neighborhoodName\n neighborhoodDescription\n neighborhoodId\n lat\n lon\n address {\n addressLine1\n cityName\n provinceCode\n countryName\n zip\n phone\n isoCountryCode\n __typename\n }\n googleMapStatic {\n url\n __typename\n }\n cityID\n zoneId\n __typename\n }\n hotelFeatures {\n breakfastDetails\n features\n topAmenities\n hotelAmenities {\n code\n displayable\n filterable\n free\n name\n type\n category\n categoryId\n globalAmenityName\n relatedImages {\n urls\n __typename\n }\n __typename\n }\n cleanlinessAmensList\n highlightedAmenities\n amenityCategories {\n categoryId\n relatedImages {\n urls\n __typename\n }\n __typename\n }\n __typename\n }\n hotelOtherInfo {\n hotelOtherInfoData {\n id\n name\n detail\n __typename\n }\n __typename\n }\n images {\n imageHDUrl\n imageUrl\n __typename\n }\n __typename\n }\n __typename\n 
}\n}\n","variables":{"deals":[{"dealId":"478502","isSopqHotel":"false"}],"appc":"DESKTOP","rid":"DTDIRECT","responseOptions":"ALL_AMENITIES,HOTEL_IMAGES,UHD_IMAGES","cguid":"b6a02daf29ebfcd2d3a1f83498e688da","visitId":"2021102715122841282f06-RRLXGQD","addErrToResponse":"true","googleMapStatic":{"size":{"x":320}}},"operationName":"getHotelContentDeals"} try: os.remove('abx.csv') except OSError: pass # custom settings custom_settings = { 'CONCURRENT_REQUEST_PER_DOMAIN': 2, 'DOWNLOAD_DELAY': 1 } # general crawler def start_requests(self): # initial HTTP request yield scrapy.Request( url=self.start_url, #body = json.dumps(self.payload), headers=self.headers, #method = "POST", callback=self.parse ) def parse(self, res): print(res.status) yield FormRequest( url = self.base_url, body = json.dumps(self.payload), method = "POST", headers = self.headers2, callback = self.parse2 ) def parse2(self, response): print(response) ''' with open('qsranks.csv', 'a') as csv_file: writer = csv.DictWriter(csv_file, fieldnames=items.keys()) writer.writerow(items) ''' if __name__ == '__main__': # run scraper process = CrawlerProcess() process.crawl(ResidentialSale) process.start() #ResidentialSale.parse(ResidentialSale, '') all information is in this script. 
And this is the error I'm getting:
021-10-27 21:13:32 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-10-27 21:13:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.priceline.com/relax/at/478502/from/20220523/to/20220527/rooms/1/adults/2?vrid=2af9fb11ff31fc1a4170ac6a891116da> (referer: None)
200
2021-10-27 21:13:34 [scrapy.core.engine] DEBUG: Crawled (403) <POST https://www.priceline.com/pws/v0/pcln-graph/> (referer: https://www.priceline.com/relax/at/478502/from/20220523/to/20220527/rooms/1/adults/2?vrid=2af9fb11ff31fc1a4170ac6a891116da)
2021-10-27 21:13:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.priceline.com/pws/v0/pcln-graph/>: HTTP status code is not handled or not allowed
2021-10-27 21:13:35 [scrapy.core.engine] INFO: Closing spider (finished)
2021-10-27 21:13:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
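A Request Payload copied from DevTools is JSON, so its booleans and nulls are spelled `true`/`false`/`null`; in a Python dict literal they must be `True`/`False`/`None`, and `json.dumps` converts them back when building the body. A minimal sketch with shortened, hypothetical values showing both routes: let `json.loads` translate the pasted text, or write the literal with Python spellings.

```python
import json

# Payload text as copied from DevTools (shortened, hypothetical values).
copied = '{"variables": {"unlockDeals": true, "paymentRateMerge": false, "refClickID": null}}'

# Option 1: let json.loads translate true/false/null into True/False/None.
payload = json.loads(copied)
print(payload["variables"])  # {'unlockDeals': True, 'paymentRateMerge': False, 'refClickID': None}

# Option 2: a dict literal written in Python must already use True/False/None;
# a bare `true` here would raise NameError before any request is sent.
literal = {"variables": {"unlockDeals": True, "paymentRateMerge": False, "refClickID": None}}

# Either way, json.dumps turns them back into JSON's true/false/null for the body.
body = json.dumps(literal)
print(body)  # {"variables": {"unlockDeals": true, "paymentRateMerge": false, "refClickID": null}}
```

Pasting the copied text as a string and calling `json.loads` avoids converting each value by hand.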
Here is a working example. The problem was in the body, i.e. the Request Payload. The Request Payload shown in DevTools is JSON, where booleans are written in lowercase (true/false). When that payload is pasted into the script as a Python dict, which is then serialized with json.dumps and sent as the body, the booleans must be written as Python's True/False. There are a few true/false values in the Request Payload; I converted them to True/False manually and got response status 200.
Code:
import scrapy import json class PriceLine(scrapy.Spider): name = 'price' def start_requests(self): body = {"query":"query getHotelDetails($hotelID: ID, $allInclusive: Boolean, $checkIn: String, $checkOut: String, $roomsCount: Int, $cguid: ID, $cugdor: String, $currencyCode: String, $pclnID: ID, $metaID: ID, $metaHotelId: ID, $rehabRateKey: ID, $preferredRateID: ID, $rID: ID, $rateDisplayOption: String, $rguid: ID, $visitId: String, $refClickID: String, $reviewCount: Float, $paymentRateMerge: Boolean, $multiOccDisplay: Boolean, $multiOccRates: Boolean, $appCode: String, $adults: Int, $children: [String], $unlockDeals: Boolean, $authToken: ID, $responseOptions: String, $includePrepaidFeeRates: Boolean, $addErrToResponse: Boolean, $packagesDetailsSearchQuery: HotelPsapiDetailsArguments) {\n details: hotelDetails(hotelID: $hotelID, checkIn: $checkIn, checkOut: $checkOut, roomsCount: $roomsCount, cguid: $cguid, cugdor: $cugdor, currencyCode: $currencyCode, pclnID: $pclnID, metaID: $metaID, metaHotelId: $metaHotelId, rehabRateKey: $rehabRateKey, preferredRateID: $preferredRateID, rID: $rID, rateDisplayOption: $rateDisplayOption, rguid: $rguid, visitId: $visitId, refClickID: $refClickID, reviewCount: $reviewCount, paymentRateMerge: $paymentRateMerge, multiOccDisplay: $multiOccDisplay, multiOccRates: $multiOccRates, appCode: $appCode, adults: $adults, children: $children, allInclusive: $allInclusive, unlockDeals: $unlockDeals, authToken: $authToken, responseOptions: $responseOptions, 
includePrepaidFeeRates: $includePrepaidFeeRates, addErrToResponse: $addErrToResponse, packagesDetailsSearchQuery: $packagesDetailsSearchQuery) {\n rguid\n errorMessage\n hotel {\n pkgComponentIndex\n maxPricedOccupancy\n maxOccupancy\n merchandisingInfo {\n color\n badgeText\n bannerHeader\n bannerText\n __typename\n }\n reasonsToBook {\n color\n icon\n header\n substring\n __typename\n }\n hotelViewCount {\n cumulativeViewCount\n __typename\n }\n commonRoomAmenities {\n type\n name\n __typename\n }\n recmdScore\n totalReviewCount\n overallGuestRating\n rooms {\n isUnlockedMemberDeal\n displayableRates {\n originalRates {\n gid\n __typename\n }\n __typename\n }\n __typename\n }\n transformedRooms {\n maxPricedOccupancy\n roomDisplayName\n maxOccupancy\n isGreatForFamily\n roomId\n longDescription\n roomFacilities\n cleanliness {\n score\n totalReviews\n __typename\n }\n beddingOption\n bedCount\n roomThumbnailUrl\n roomSize\n amenities {\n code\n __typename\n }\n imageUrls {\n largeUrl\n mediumUrl\n __typename\n }\n roomOccupancies {\n roomCode\n numberOfAdults\n numberOfChildren\n numberOfBeds\n numberOfRooms\n __typename\n }\n roomRates {\n cartToken\n pkgPriceInformation {\n totalCost\n totalCostPerTraveler\n totalCostWithHotelMandatoryFees\n totalPayNow\n totalPayLater\n totalSavings\n originalCostPerTraveler\n totalStrikethrough\n totalHotelMandatoryFees\n roomMandatoryFees\n __typename\n }\n preferredRateFlag\n pricedOccupancy\n couponApplicable\n suggestedNumOfRooms\n mergedRate {\n isFullyUnlocked\n rateIdentifier\n price\n grandTotal\n currencySymbol\n roomsLeft\n cancellationPolicy\n cancellationPolicyLongText\n cancellationMsg\n refundPolicy\n debugString\n paymentOptionsText\n feeAmount\n isPayLater\n isUniversalCartEligible\n isXSellEligible\n __typename\n }\n isPayLater\n rateIdentifier\n isBestDeal\n price\n grandTotal\n currencySymbol\n roomsLeft\n strikeThroughPrice\n isFreeCancellation\n cancellationPolicy\n cancellationPolicyLongText\n 
cancellationMsg\n ccRequired\n refundPolicy\n savingPct\n payLaterMessage\n feeAmount\n bannerText\n programName\n merchandisingFlag\n rateLevelAmenities {\n name\n isHighlighted\n __typename\n }\n totalPriceExcludingTaxesAndFeePerStay\n paymentOptionsText\n disclaimerMessage\n debugString\n promos {\n promoType\n isVariableMarkupPromo\n title\n desc\n isHighlighted\n __typename\n }\n isFullyUnlocked\n incrementalPricingIconName\n isUniversalCartEligible\n basketPriceKey\n isXSellEligible\n itemDetailsKey\n bundlePriceKey\n rateKey\n __typename\n }\n cartToken\n basketPriceKey\n itemDetailsKey\n priceKey\n bundlePriceKey\n token\n planCode\n rateTypeCode\n gdsName\n __typename\n }\n guestReviews {\n firstName\n overallScore\n reviewTextGeneral\n reviewTextNegative\n reviewTextPositive\n sourceCode\n travelerType\n travelerTypeId\n creationDate\n __typename\n }\n reviewRatingSummary {\n ratings {\n description\n label\n score\n summaryCount\n summaryValue\n __typename\n }\n travelerType {\n count\n id\n type\n __typename\n }\n __typename\n }\n signInDealsAvailable\n signInDealsMinRate\n ratings {\n category\n score\n __typename\n }\n bookings {\n firstName\n lastNameInitial\n bookedPrice\n bookedCurrencyCode\n justBookedBadge\n __typename\n }\n ratesSummary {\n pricedOccupancy\n suggestedNumOfRooms\n freeCancelableRateAvail\n minPrice\n totalCostPerTraveler\n minStrikePrice\n promptUserToNativeApp\n savingsClaimStrikePrice\n savingsClaimDisclaimer\n savingsClaimPercentage\n minCurrencyCodeSymbol\n minCurrencyCode\n roomLeft\n payWhenYouStayAvailable\n pclnId\n programName\n merchandisingFlag\n preferredRateId\n rateIdentifier\n showRecommendation\n suggestedNumOfRooms\n status\n __typename\n }\n hasNodateRooms\n isAllInclusiveHotel\n location {\n neighborhoodDescription\n __typename\n }\n hotelFeatures {\n features\n highlightedAmenities\n hotelAmenities {\n code\n displayable\n free\n name\n type\n __typename\n }\n topAmenities\n breakfastDetails\n __typename\n }\n 
policies {\n checkInTime\n checkOutTime\n petDescription\n childrenDescription\n importantInfo\n __typename\n }\n itemKey\n basketItemKey\n componentKey\n retailPrice {\n pricePerPerson\n displayPricePerPerson\n amount\n displayAmount\n __typename\n }\n images {\n imageHDURL\n imageURL\n __typename\n }\n __typename\n }\n componentKeyMap\n los\n signInDealRelatedInfo {\n promptUserToSignIn\n __typename\n }\n __typename\n }\n}\n","variables":{"appCode":"DESKTOP","cguid":"0175bb9aa41f22723c5b1eefa03d025c","checkIn":"20220523","checkOut":"20220527","rID":"DTDIRECT","roomsCount":1,"currencyCode":"USD","refClickID":"","unlockDeals":True,"includePrepaidFeeRates":True,"visitId":"202110271625255708419d-RRLXGQD","addErrToResponse":True,"adults":2,"paymentRateMerge":False,"multiOccDisplay":True,"multiOccRates":True,"hotelID":"478502","rateDisplayOption":"S","reviewCount":5,"responseOptions":"POP_COUNT,REVIEWS,CUSTOM_DESC,RATE_SUMMARY,RATINGS,DETAILED_ROOM,HOTEL_IMAGES,RATE_IMPORTANT_INFO,RATE_CHARGES_DETAIL,PROXIMITY,BOOKINGS,NORATEROOMS,REFUND_INFO"},"operationName":"getHotelDetails"}
        yield scrapy.Request(
            url='https://www.priceline.com/pws/v0/pcln-graph/',
            callback=self.parse,
            method='POST',
            body=json.dumps(body),
            # No hardcoded 'content-length': it must match the exact serialized
            # body, and Scrapy fills it in from the body it actually sends.
            headers={
                'accept': '*/*',
                'accept-encoding': 'gzip, deflate, br',
                'accept-language': 'en-US,en;q=0.9,bn;q=0.8,es;q=0.7,ar;q=0.6',
                'apollographql-client-name': 'relax',
                'apollographql-client-version': 'master-1.1.813',
                'content-type': 'application/json',
                'origin': 'https://www.priceline.com',
                'referer': 'https://www.priceline.com/relax/at/478502/from/20220523/to/20220527/rooms/1/adults/2?vrid=c97c644f8dd3411a3a8337ad364d86bc',
                'sec-ch-ua-mobile': '?0',
                'sec-fetch-dest': 'empty',
                'sec-fetch-mode': 'cors',
                'sec-fetch-site': 'same-origin',
                'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36',
            },
        )

    def parse(self, response):
        resp = json.loads(response.body)
        # Each entry in transformedRooms is one room type offered by the hotel
        for h in resp['data']['details']['hotel']['transformedRooms']:
            yield {'roomDisplayName': h['roomDisplayName']}

Output:

{'roomDisplayName': 'Standard Room'}
2021-10-28 01:40:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.priceline.com/pws/v0/pcln-graph/>
{'roomDisplayName': 'Standard Room'}
2021-10-28 01:40:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.priceline.com/pws/v0/pcln-graph/>
{'roomDisplayName': 'Standard Room with River View'}
2021-10-28 01:40:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.priceline.com/pws/v0/pcln-graph/>
{'roomDisplayName': 'Superior Room with Cathedral View'}
2021-10-28 01:40:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.priceline.com/pws/v0/pcln-graph/>
{'roomDisplayName': 'Family Room'}
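Headers copied from DevTools also carry a Content-Length that only fits the body the browser sent. Below is a minimal stdlib sketch, not the poster's code, that serializes the GraphQL payload once, drops any copied Content-Length so the HTTP library computes it, and reads the room names defensively, because a GraphQL error response can come back with "data": null. The names build_graphql_request and room_names are hypothetical, and QUERY is only a placeholder for the full getHotelDetails query above.

```python
import json
import urllib.request

# Placeholder for the full getHotelDetails query text shown above (assumption).
QUERY = "query getHotelDetails { ... }"


def build_graphql_request(url, query, variables, operation_name, headers):
    """Build (but do not send) a GraphQL POST request with a freshly serialized body."""
    payload = {"query": query, "variables": variables, "operationName": operation_name}
    data = json.dumps(payload).encode("utf-8")
    # A Content-Length copied from DevTools only matches the browser's body, so drop it.
    clean = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    clean["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=clean, method="POST")


def room_names(resp):
    """Return every roomDisplayName, or [] if any level is missing or null."""
    node = resp
    for key in ("data", "details", "hotel", "transformedRooms"):
        if not isinstance(node, dict):
            return []
        node = node.get(key)
    return [room.get("roomDisplayName") for room in node or []]


req = build_graphql_request(
    "https://www.priceline.com/pws/v0/pcln-graph/",
    QUERY,
    {"hotelID": "478502", "adults": 2},
    "getHotelDetails",
    {"accept": "*/*", "content-length": "3452"},
)
# urllib.request.urlopen(req) would send it; it is not sent here.
```

With requests, passing json=payload to requests.post has the same effect: the body is serialized and its length is computed for you.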
A way to pass data to instance from external file
I'd like to pass a configuration file to an instance through metadata. Currently, for testing purposes, I'm just concatenating text inside the blueprint, like this:

RE_vm:
  type: cloudify.openstack.nodes.Server
  properties:
    resource_id: { concat: ['router_cp_', { get_input: client_name }] }
    server:
      image: { get_input: re_image }
      flavor: { get_input: re_flavor }
      key_name: ''
    install_agent: false
    openstack_config: *openstack_config
  interfaces:
    cloudify.interfaces.lifecycle:
      create:
        inputs:
          args:
            meta:
              hostname: { concat: ['vMX-RE-', { get_input: client_name }] }
            (...)
            files:
              "/var/db/configfile": { concat: ["groups {\n re0 {\n system {\n host-name %hostname%;\n }\n interfaces {\n unit 0 {\n address ", { get_attribute: [left_port, fixed_ip_address] }, "/", { get_input: left_network_mask }, ";\n }\n }\n }\n global {\n system {\n root-authentication {\n encrypted-password \"", { get_secret: encrypted_password }, "\"; ## SECRET-DATA\n }\n }\n }\n }\n "]}
  relationships:
    - target: mgmt_port
      type: cloudify.openstack.server_connected_to_port

But for real-life use, this is not feasible. How can I point to an external file and pass it to the instance? I haven't found any obvious way of doing that in the documentation. Preferably, I'd also like to fill in some variables dynamically, based on inputs or some rules (like in Ansible templates). How can I do that with Cloudify?
If I understand correctly what you are asking for, you can either use an SSH connection to the VM (using the fabric plugin) and upload the file or its content to the VM, or use the cloudify-utilities-plugin and its configuration plugin for this. To fill in the values dynamically, change the input values when creating the deployment: the inputs can live in a file (with a different file per deployment), or be passed to cfy deployments create as variables, on the command line or in the UI.
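The answer leaves the templating itself open. One minimal way to get Ansible-style substitution, assuming the configuration text lives in its own file and the values come from the deployment inputs, is to render it with Python's string.Template before the result is handed to the VM (for example as the files entry, or uploaded over SSH). The template text, the placeholder names and the helper functions below are hypothetical and loosely modelled on the question's config; this is not a Cloudify API.

```python
from string import Template

# Hypothetical template, loosely modelled on /var/db/configfile in the question.
# string.Template only treats $name specially, so the Junos braces pass through.
CONFIG_TEMPLATE = """groups {
    re0 {
        system {
            host-name $hostname;
        }
        interfaces {
            unit 0 {
                address $address/$netmask;
            }
        }
    }
}
"""


def render_config(template_text, values):
    """Fill $placeholders from a dict of inputs; raises KeyError if one is missing."""
    return Template(template_text).substitute(values)


def load_and_render(path, values):
    """Read a template file from disk and render it with the given inputs."""
    with open(path, encoding="utf-8") as fh:
        return render_config(fh.read(), values)


rendered = render_config(
    CONFIG_TEMPLATE,
    {"hostname": "vMX-RE-client1", "address": "10.0.0.1", "netmask": "24"},
)
print(rendered)
```

Using substitute rather than safe_substitute makes a forgotten input fail loudly instead of leaving a literal $name in the router config.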
Unable to parse the JSON document: Unrecognized token '$util': was expecting ('true', 'false' or 'null')
I am trying to use a query with a filter:

query queryPitchesByApprovedIndex($approved: Boolean = true) {
  queryPitchesByApprovedIndex(approved: $approved) {
    items {
      id
    }
  }
}

The mapping template, made by AppSync, looks like this:

{
    "version": "2017-02-28",
    "operation": "Query",
    "query": {
        "expression": "#approved = :approved",
        "expressionNames": {
            "#approved": "approved",
        },
        "expressionValues": {
            ":approved": {"B": $util.dynamodb.toBinary($ctx.args.approved)},
        },
    },
    "index": "approved-index",
    "limit": $util.defaultIfNull($ctx.args.first, 20),
    "nextToken": $util.toJson($util.defaultIfNullOrEmpty($ctx.args.after, null)),
    "scanIndexForward": true,
    "select": "ALL_ATTRIBUTES",
}

The error I get is:

Unable to parse the JSON document: 'Unrecognized token '$util': was expecting ('true', 'false' or 'null')\n at [Source: (String)\"{\n \"version\": \"2017-02-28\",\n \"operation\": \"Query\",\n \"query\": {\n \"expression\": \"#approved = :approved\",\n \"expressionNames\": {\n \"#approved\": \"approved\",\n },\n \"expressionValues\": {\n \":approved\": {\"B\": $util.dynamodb.toBinary($ctx.args.approved)},\n },\n },\n \"index\": \"approved-index\",\n \"limit\": 20,\n \"nextToken\": null,\n \"scanIndexForward\": true,\n \"select\": \"ALL_ATTRIBUTES\",\n}\"; line: 10, column: 31]'

Any idea how I can fix that?
$util.dynamodb.toBinary(String data) takes a String as input, but you are passing a Boolean, which is why evaluation fails and the unresolved $util call is left verbatim in the JSON (the other $util calls in your error output were resolved). This is good feedback; I will check with the team whether the utility can be made more lenient and also accept a Boolean, i.e. $util.dynamodb.toBinary(Boolean data). Here is a possible workaround in the meantime:

#if($ctx.args.approved)
    #set($approved = $util.dynamodb.toBinaryJson("true"))
#else
    #set($approved = $util.dynamodb.toBinaryJson("false"))
#end
{
    "version": "2017-02-28",
    "operation": "Query",
    "query": {
        "expression": "#approved = :approved",
        "expressionNames": {
            "#approved": "approved",
        },
        "expressionValues": {
            ":approved": $approved
        },
    },
    "index": "approved-index",
    "limit": $util.defaultIfNull($ctx.args.first, 20),
    "nextToken": $util.toJson($util.defaultIfNullOrEmpty($ctx.args.after, null)),
    "scanIndexForward": true,
    "select": "ALL_ATTRIBUTES",
}
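To see what the workaround resolves to, here is a small Python sketch that mirrors the template logic outside AppSync. It assumes, as the workaround implies, that approved is stored in the index as a Binary attribute holding the text "true" or "false", and that toBinaryJson("true") renders as {"B": "true"}. The function names are hypothetical, and nothing here calls AppSync or DynamoDB; the point is that the Boolean is turned into a string before the Binary conversion, so every placeholder resolves to a concrete value.

```python
import json


def approved_attribute(approved):
    """Mirror of the #if/#else branch: choose a string first, then wrap it as Binary.

    Assumes toBinaryJson("true") renders as {"B": "true"} in the resolved template.
    """
    return {"B": "true" if approved else "false"}


def resolve_query_request(args):
    """Hypothetical helper: the request document the fixed template resolves to."""
    first = args.get("first")
    return {
        "version": "2017-02-28",
        "operation": "Query",
        "query": {
            "expression": "#approved = :approved",
            "expressionNames": {"#approved": "approved"},
            "expressionValues": {":approved": approved_attribute(args.get("approved"))},
        },
        "index": "approved-index",
        # $util.defaultIfNull($ctx.args.first, 20)
        "limit": first if first is not None else 20,
        # $util.defaultIfNullOrEmpty($ctx.args.after, null): null or "" becomes null
        "nextToken": args.get("after") or None,
        "scanIndexForward": True,
        "select": "ALL_ATTRIBUTES",
    }


print(json.dumps(resolve_query_request({"approved": True}), indent=2))
```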
How to maintain state with Cloud Functions and Cloud Firestore
How do you maintain the correct state when using Cloud Functions? They are not guaranteed to fire in the same order in which they are called. Here is a sequence of events:

1. A document is updated: currentState: state1
2. A document is updated: currentState: state2
3. The Cloud Function is triggered by the state2 update.
4. The Cloud Function is triggered by the state1 update.

If your application requires functions to run in the order in which the states were set, this is a problem.
Cloud Functions are not guaranteed to fire in order, or only once. Therefore, you must make them idempotent. You can resolve this in the following way:

- Always use transactions to update the state, so that two clients don't try to change the state at the same time.
- Create a state table that manages the state and runs functions based on the current state vs. the previous state.
- Clients must not change the state to a value lower than the current one.

states.json

[
  {"currentState": "state1", "action": "state2", "newStates": ["state2"]},
  {"currentState": "state1", "action": "state3", "newStates": ["state2", "state3"]},
  {"currentState": "state1", "action": "state4", "newStates": ["state2", "state3", "state4"]},
  {"currentState": "state1", "action": "state5", "newStates": ["state2", "state3", "state4", "state5"]},
  {"currentState": "state2", "action": "state3", "newStates": ["state3"]},
  {"currentState": "state2", "action": "state4", "newStates": ["state3", "state4"]},
  {"currentState": "state2", "action": "state5", "newStates": ["state3", "state4", "state5"]},
  {"currentState": "state3", "action": "state4", "newStates": ["state4"]},
  {"currentState": "state3", "action": "state5", "newStates": ["state4", "state5"]},
  {"currentState": "state4", "action": "state5", "newStates": ["state5"]}
]

app.js

function processStates(beforeState, afterState) {
  const states = require('../states');

  // Look up the transition from the previous state to the new one;
  // unknown or backward transitions yield null
  const transition = states.find(
    (e) => e.currentState === beforeState && e.action === afterState
  );
  const newStates = transition ? transition.newStates : null;
  console.log(`newStates: ${newStates}`);

  if (newStates) {
    // Run every intermediate state in order, so skipped states are still processed
    newStates.forEach((newState) => {
      switch (newState) {
        case 'state1': {
          // Process state1 change
          break;
        }
        case 'state2': {
          // Process state2 change
          break;
        }
        default: {
        }
      }
    });
  }
}

Once you have an array of states, you can iterate through them using something like forEach or map to process the required commands.
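The same idea as a Python sketch, not the answer's own code: the transition table is generated from an ordered list of states instead of being written out by hand (it produces the same ten entries as states.json), and a guard ignores any update that would not move the state forward, which is what makes a late state1 trigger or a duplicate delivery harmless. The in-memory lock only stands in for a Firestore transaction (assumption: in production the read, compare and write must happen inside a real transaction), and ORDERED_STATES, StateDoc and advance are hypothetical names.

```python
import threading

ORDERED_STATES = ["state1", "state2", "state3", "state4", "state5"]


def build_state_table(ordered):
    """Equivalent of states.json: each forward jump lists the states to run, in order."""
    table = {}
    for i, current in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            table[(current, ordered[j])] = ordered[i + 1 : j + 1]
    return table


STATE_TABLE = build_state_table(ORDERED_STATES)


def process_states(before_state, after_state):
    """Return the states to process, or None for unknown or backward transitions."""
    return STATE_TABLE.get((before_state, after_state))


class StateDoc:
    """In-memory stand-in for the Firestore document; the lock plays the transaction."""

    def __init__(self, state):
        self.state = state
        self.processed = []
        self._lock = threading.Lock()

    def advance(self, requested):
        with self._lock:
            steps = process_states(self.state, requested)
            if steps is None:
                # Out-of-order or duplicate trigger: nothing to do (idempotent)
                return []
            for step in steps:
                self.processed.append(step)  # run the handler for `step` here
            self.state = requested
            return steps


doc = StateDoc("state1")
doc.advance("state3")  # returns ["state2", "state3"]
doc.advance("state2")  # late trigger, ignored: returns []
```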