Is there a way to minimize the number of unique combinations? - r

Trying to request ERA5 data. Each request is limited in size, and the system will automatically reject any request bigger than the limit. However, one wants to be as close to the request limit as possible, as each request takes a few hours to be processed by the Climate Data Store (CDS).
For example, I have a vector of years <- seq(from = 1981, to = 2019, by = 1) and a vector of variables <- c("a", "b", "c", "d", "e", ..., "z"). The max request size is 11, which means length(years) * length(variables) must be less than or equal to 11.
For each request, I have to provide a list containing character vectors for years and variables. For example:
req.list <- list(year = c("1981", "1982", ..."1991"), variable = c("a")) This will work since there are 11 years and 1 variable.
I thought about using expand.grid(), then taking rows 1-11, rows 12-22, and so on, and applying unique() to each column to get the years and variables for a request. But this approach sometimes leads to a request that is too big:
req.list <- list(year = c("2013", "2014", ..."2018"), variable = c("a", "b")) is rejected since length(year) * length(variable) = 12 > 11.
Also, I am using foreach() and doParallel to create multiple requests (max 15 requests at a time).
If anyone has a better solution (minimizing the number of unique combos while obeying the request size limit), please share. Thank you very much.
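A minimal sketch of the blocking idea, in Python to match the answer below (the helper best_shape and all names here are illustrative assumptions, not part of any API): fix a shape of v variables by floor(limit / v) years, pick the shape that needs the fewest requests, and tile the year-variable grid with it.

import math

years = [str(y) for y in range(1981, 2020)]                  # 1981..2019
variables = [chr(c) for c in range(ord('a'), ord('z') + 1)]  # "a".."z"
LIMIT = 11  # max fields per request, from the question

def best_shape(n_vars, n_years, limit):
    # Try every block of v variables x (limit // v) years and keep
    # the shape that tiles the full grid with the fewest requests.
    shapes = []
    for v in range(1, min(n_vars, limit) + 1):
        y = limit // v
        n_req = math.ceil(n_vars / v) * math.ceil(n_years / y)
        shapes.append((n_req, v, y))
    return min(shapes)

n_req, v, y = best_shape(len(variables), len(years), LIMIT)

# Every request satisfies length(year) * length(variable) <= LIMIT
# by construction.
requests = [
    {"year": years[i:i + y], "variable": variables[j:j + v]}
    for j in range(0, len(variables), v)
    for i in range(0, len(years), y)
]

With 39 years, 26 variables and a limit of 11, this picks 1 variable x 11 years per request, 104 requests in total.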

The limit is set in terms of the number of fields, which you can think of as the number of "records" in the GRIB sense. The usual suggestion is to keep the list of variables and the shorter timescales in the retrieval command and then loop over the years (the longer timescale). For ERA5 this is a matter of choice, though, as the data is all on cache, not on tape; with tape-based datasets (e.g. if you use the CDS to retrieve seasonal forecasts or other non-ERA5 datasets) it is important to retrieve data on the same tape with a single request.
Here is a simple looped example:
import cdsapi

c = cdsapi.Client()

# One request per year: all variables, months, days and hours go in a
# single call, and the loop walks over the years (the longer timescale).
yearlist = [str(s) for s in range(1979, 2019)]
for year in yearlist:
    c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'format': 'netcdf',
            'variable': [
                '10m_u_component_of_wind', '10m_v_component_of_wind',
                '2m_dewpoint_temperature', '2m_temperature',
            ],
            'year': year,
            'month': ['01', '02', '03', '04', '05', '06',
                      '07', '08', '09', '10', '11', '12'],
            'day': ['01', '02', '03', '04', '05', '06', '07', '08',
                    '09', '10', '11', '12', '13', '14', '15', '16',
                    '17', '18', '19', '20', '21', '22', '23', '24',
                    '25', '26', '27', '28', '29', '30', '31'],
            'time': ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
                     '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
                     '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
                     '18:00', '19:00', '20:00', '21:00', '22:00', '23:00'],
        },
        'data' + year + '.nc')
I presume you can parallelize this with foreach(), although I've never tried. I suspect it won't help much, as there is a per-user job limit which is set quite low, so you will just end up with a large number of jobs waiting in the queue there...
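For what it's worth, a minimal sketch of parallel submission in Python (a thread pool standing in for foreach/doParallel; the trimmed request body and the worker count are illustrative assumptions):

import cdsapi
from concurrent.futures import ThreadPoolExecutor

def fetch(year):
    # One client per call keeps the workers independent; retrieve()
    # blocks while the request waits in the CDS queue, so the threads
    # mostly just hold a place in line.
    c = cdsapi.Client()
    c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'format': 'netcdf',
            'variable': ['2m_temperature'],
            'year': str(year),
            'month': ['01'],
            'day': ['01'],
            'time': ['00:00'],
        },
        'data' + str(year) + '.nc')

# Keep the pool at or below the per-user job limit; extra workers only
# lengthen the queue.
with ThreadPoolExecutor(max_workers=5) as pool:
    list(pool.map(fetch, range(1979, 2019)))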

Related

I'm having trouble with parsing a JSON file

I am attempting to use a .json file I found online, but I'm starting to think that there is an underlying issue with the file. I am not very knowledgeable about .json files, so I am trying to convert it into a CSV file. I have yet to find a website that can do that for me.
I've tried using R to convert the file, since it is quite large and I can only assume that most websites have a size limit. I tried flattening it in R with this code:
library(jsonlite)
library(tidyr)
library(tidyverse)
json_string <- readLines("data.json")
json_data <- fromJSON(json_string)
json_data <- flatten(json_data)
df <- as_data_frame(json_data)
write_csv(df, "output.csv")
but it returns this error:
! Tibble columns must have compatible sizes.
* Size 2: Columns `A-Alrund, God of the Cosmos // A-Hakka, Whispering Raven`, `A-Blessed Hippogriff // A-Tyr's Blessing`, `A-Emerald Dragon // A-Dissonant Wave`, `A-Monster Manual // A-Zoological Study`, `A-Rowan, Scholar of Sparks // A-Will, Scholar of Frost`, and 484 more.
* Size 3: Column `Smelt // Herd // Saw`.
* Size 5: Column `Who // What // When // Where // Why`.
* Size 6: Columns `Everythingamajig`, `Garbage Elemental`, `Ineffable Blessing`, `Knight of the Kitchen Sink`, `Scavenger Hunt`, and 4 more.
i Only values of size one are recycled.
Backtrace:
1. tibble::as_data_frame(json_data)
3. tibble:::as_tibble.list(x, ...)
4. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
5. tibble:::recycle_columns(x, .rows, lengths)
Here is what the first two items of the .json file look like:
{"data": {"\"Ach! Hans, Run!\"": [{"colorIdentity": ["G", "R"], "colors": ["G", "R"], "convertedManaCost": 6.0, "foreignData": [], "identifiers": {"scryfallOracleId": "a2c5ee76-6084-413c-bb70-45490d818374"}, "isFunny": true, "layout": "normal", "legalities": {}, "manaCost": "{2}{R}{R}{G}{G}", "manaValue": 6.0, "name": "\"Ach! Hans, Run!\"", "printings": ["UNH"], "purchaseUrls": {"cardKingdom": "https://mtgjson.com/links/84dfefe718a51cf8", "cardKingdomFoil": "https://mtgjson.com/links/d8c9f3fc1e93c89c", "cardmarket": "https://mtgjson.com/links/b9d69f0d1a9fb80c", "tcgplayer": "https://mtgjson.com/links/c51d2b13ff76f1f0"}, "rulings": [], "subtypes": [], "supertypes": [], "text": "At the beginning of your upkeep, you may say \"Ach! Hans, run! It's the . . .\" and the name of a creature card. If you do, search your library for a card with that name, put it onto the battlefield, then shuffle. That creature gains haste. Exile it at the beginning of the next end step.", "type": "Enchantment", "types": ["Enchantment"]}], "\"Brims\" Barone, Midway Mobster": [{"colorIdentity": ["B", "W"], "colors": ["B", "W"], "convertedManaCost": 5.0, "foreignData": [], "identifiers": {"scryfallOracleId": "c64c31f2-c1be-414e-9dff-c3b77ba97545"}, "isFunny": true, "layout": "normal", "leadershipSkills": {"brawl": false, "commander": true, "oathbreaker": false}, "legalities": {}, "manaCost": "{3}{W}{B}", "manaValue": 5.0, "name": "\"Brims\" Barone, Midway Mobster", "power": "5", "printings": ["UNF"], "purchaseUrls": {"cardKingdom": "https://mtgjson.com/links/d1e320bd9d6813c0", "cardKingdomFoil": "https://mtgjson.com/links/18f86e8a04682c34", "cardmarket": "https://mtgjson.com/links/d5a3d8cfb60767d4", "tcgplayer": "https://mtgjson.com/links/980f45f2bc8c3733"}, "rulings": [], "subtypes": ["Human", "Rogue"], "supertypes": ["Legendary"], "text": "When \"Brims\" Barone, Midway Mobster enters the battlefield, put a +1/+1 counter on each other creature you control that has a hat.\n\"Brims\" Barone, Midway Mobster has menace as long as you're wearing a hat.", "toughness": "4", "type": "Legendary Creature — Human Rogue", "types": ["Creature"]}]}
I am hoping that the resulting CSV file has the keys as the column names, with the values assigned to the columns based on their keys.
EDIT: I have now attached a screenshot of what the json_data structure looks like (Structure of json_data).
Assuming it's one of the JSON dumps from scryfall, try this:
library(jsonlite)
library(tidyr)
library(tidyverse)
# Assumes exactly one .json file in the working directory; the anchored
# pattern avoids matching files that merely contain "json" in the name.
todo <- list.files(pattern = "\\.json$")
json_data <- fromJSON(todo)
json_data_flat_jsl <- jsonlite::flatten(json_data)
df <- as_tibble(json_data_flat_jsl)
write_csv(df, "output.csv")
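Alternatively, the error happens because each card name maps to a list of printings of different lengths, which the tibble recycling cannot reconcile. A hedged sketch in Python with pandas (the file name and the top-level "data" key follow the excerpt above) flattens one row per printing instead:

import json
import pandas as pd

with open("data.json") as f:
    raw = json.load(f)

# "data" maps each card name to a list of printings (dicts); collecting
# them gives one row per printing, so no column recycling is needed.
rows = [card for printings in raw["data"].values() for card in printings]
df = pd.json_normalize(rows)
df.to_csv("output.csv", index=False)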

Extracting information from string and creating new variable

I have a column with big character strings like this:
"[{'tipoTeste': 'RT-PCR', 'estadoTeste': 'Concluído',
'dataColetaTeste': {'__type': 'Date', 'iso': '2021-12-30T03:00:00.000Z'},
'resultadoTeste': 'Detectável', 'loteTeste': None,
'fabricanteTeste': None, 'codigoTipoTeste': '1', 'codigoEstadoTeste': '3',
'codigoResultadoTeste': '1', 'codigoFabricanteTeste': None}]"
And I want to create another variable called Date with the date information inside this huge string; in this case it is 2021-12-30.
I'm not managing to grep this date information for all rows...
This would work:
library(stringr)
str_extract_all(str1, "[0-9]{4}-[0-9]{2}-[0-9]{2}")
[[1]]
[1] "2021-12-30"
We could use parse_date directly on the string
library(parsedate)
> as.Date(parse_date(str1))
[1] "2021-12-30"
data
str1 <- "[{'tipoTeste': 'RT-PCR', 'estadoTeste': 'Concluído',
'dataColetaTeste': {'__type': 'Date', 'iso': '2021-12-30T03:00:00.000Z'},
'resultadoTeste': 'Detectável', 'loteTeste': None,
'fabricanteTeste': None, 'codigoTipoTeste': '1', 'codigoEstadoTeste': '3',
'codigoResultadoTeste': '1', 'codigoFabricanteTeste': None}]"

split a key-value pair in Python

I have a dictionary as follows:
{
    "age": "76",
    "Bank": "98310",
    "Stage": "final",
    "idnr": "4578",
    "last number + Value": "[345:K]"
}
I am trying to adjust the dictionary by splitting the last key-value pair and creating a new key ('Total data'); it should look like this:
"Total data": [
    {
        "last number": "345",
        "Value": "K"
    }
]
Does anyone know if there is a split function based on ':' and '+' or a for loop to accomplish this?
Thanks in advance.
One option to accomplish that could be to get the last key from the dict, then split on '+' for the key and on ':' for the value (removing the outer square brackets), assuming the format of the data is always the same.
If you want Total data to contain a list, you can wrap the resulting dict in [] (see the variation after the output below).
from pprint import pprint

d = {
    "age": "76",
    "Bank": "98310",
    "Stage": "final",
    "idnr": "4578",
    "last number + Value": "[345:K]"
}

last = list(d.keys())[-1]
d["Total data"] = dict(
    zip(
        last.strip().split('+'),
        d[last].strip("[]").split(':')
    )
)
pprint(d)
Output (tested with Python 3.9.4)
{'Bank': '98310',
'Stage': 'final',
'Total data': {' Value': 'K', 'last number ': '345'},
'age': '76',
'idnr': '4578',
'last number + Value': '[345:K]'}
Python demo
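Note the stray spaces in the resulting keys (' Value', 'last number '). Continuing from the snippet above, a small variation strips each piece and wraps the result in a list, matching the shape asked for in the question:

# Strip each key/value piece and wrap the dict in a list to match the
# desired "Total data" shape from the question.
d["Total data"] = [dict(
    zip(
        (k.strip() for k in last.split('+')),
        (v.strip() for v in d[last].strip("[]").split(':'))
    )
)]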

Replacing key from one dictionary with the key from another

Suppose I have 2 dictionaries:
Dict #1:
statedict = {'Alaska': '02', 'Alabama': '01', 'Arkansas': '05', 'Arizona': '04', 'California':'06', 'Colorado': '08', 'Connecticut': '09','DistrictOfColumbia': '11', 'Delaware': '10', 'Florida': '12', 'Georgia': '13', 'Hawaii': '15', 'Iowa': '19', 'Idaho': '16', 'Illinois': '17', 'Indiana': '18', 'Kansas': '20', 'Kentucky': '21', 'Louisiana': '22', 'Massachusetts': '25', 'Maryland': '24', 'Maine': '23', 'Michigan': '26', 'Minnesota': '27', 'Missouri': '29', 'Mississippi': '28', 'Montana': '30', 'NorthCarolina': '37', 'NorthDakota': '38', 'Nebraska': '31', 'NewHampshire': '33', 'NewJersey': '34', 'NewMexico': '35', 'Nevada': '32', 'NewYork': '36', 'Ohio': '39', 'Oklahoma': '40', 'Oregon': '41', 'Pennsylvania': '42', 'PuertoRico': '72', 'RhodeIsland': '44', 'SouthCarolina': '45', 'SouthDakota': '46', 'Tennessee': '47', 'Texas': '48', 'Utah': '49', 'Virginia': '51', 'Vermont': '50', 'Washington': '53', 'Wisconsin': '55', 'WestVirginia': '54', 'Wyoming': '56'}
Dict #2:
master_dict = {'01': ['01034', '01112'], '06': ['06245', '06025', '06007'], '13': ['13145']}
*The actual master_dict is much longer.
Basically, I want to replace the 2-digit keys in master_dict with the long name keys in statedict. How do I do this? I am trying to use the following, but it doesn't quite work.
for k, v in master_dict.items():
    for state, fip in statedict.items():
        if k == fip:
            master_dict[k] = statedict[state]
You can use a dictionary comprehension to make a lookup table mapping values to keys. A second dictionary comprehension performs the lookups to replace numbers with words:
lookup = {v: k for k, v in statedict.items()}
result = {lookup[k]: v for k, v in master_dict.items()}
print(result)
Output:
{'Alabama': ['01034', '01112'],
'California': ['06245', '06025', '06007'],
'Georgia': ['13145']}
Try it here
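If master_dict can contain a code that is missing from statedict, the comprehension above raises a KeyError. A hedged variant keeps the original key in that case:

# Hypothetical guard: fall back to the original two-digit key when a
# code has no entry in the lookup table.
result = {lookup.get(k, k): v for k, v in master_dict.items()}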

Grafana: How to have the duration for a selected period

I can't find the correct mathematical formula to compute an SLA (availability) with Grafana.
I have a graph showing the duration of downtime for each day.
From this, I would like to compute the SLA (e.g. 99.5%).
On the graph for the selected period (Last 7 days) I can get this value: 71258 is the sum of the downtime durations in seconds. I get it with summarize(1day, max, false).
I need the total duration of the selected period (here 7 days = 604800 seconds). But how?
If I have this last value, then I can compute:
X % = (100 * 71258) / 604800 ≈ 11.78 %
100 % - X % ≈ 88.22 % = my SLA!
My question is: which formula gives the duration of the selected period in Grafana?
One of the databases you can run behind Grafana is Axibase Time Series Database (ATSD). It provides built-in aggregation functions that can perform SLA-type calculations, for example computing the percentage of a period during which the value exceeded a threshold:
THRESHOLD_COUNT - number of violations in the period
THRESHOLD_DURATION - cumulative duration of the violations
THRESHOLD_PERCENT - duration divided by period
In your example, that would be THRESHOLD_PERCENT.
Here's a sample SLA report for an Amazon Web Services instance: https://apps.axibase.com/chartlab/0aa34311/6/. THRESHOLD_PERCENT is visualized on the top chart.
The API request looks as follows:
{
  "queries": [{
    "startDate": "2016-02-22T00:00:00Z",
    "endDate": "2016-02-23T00:00:00Z",
    "timeFormat": "iso",
    "entity": "nurswgvml007",
    "metric": "app.response_time",
    "aggregate": {
      "types": [
        "THRESHOLD_COUNT",
        "THRESHOLD_DURATION",
        "THRESHOLD_PERCENT"
      ],
      "period": {
        "count": 1,
        "unit": "HOUR"
      },
      "threshold": {
        "max": 200
      }
    }
  }]
}
ATSD driver: https://github.com/grafana/grafana-plugins/tree/master/datasources/atsd
Disclosure: I work for Axibase.
