Use xarray open_mfdataset on files with no time dimension included - netcdf

I have a list of NetCDF files that I would like to open with the xarray.open_mfdataset function.
This would normally be trivial; however, I am running into an issue because the files I am trying to open do not have any "time" dimension in them:
data
Out[51]:
<xarray.Dataset>
Dimensions: (lat: 850, lon: 1500)
Coordinates:
* lat (lat) float64 54.98 54.94 54.9 54.86 ... 21.14 21.1 21.06 21.02
* lon (lon) float64 -126.0 -125.9 -125.9 -125.9 ... -66.1 -66.06 -66.02
Data variables:
Data (lat, lon) float32 ...
When I try to open my list of files with open_mfdataset, I of course get an error:
xr.open_mfdataset(files)
ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation
I do, however, have a list of dates corresponding to each file:
dates
Out[54]:
array([datetime.datetime(2009, 1, 1, 0, 0),
datetime.datetime(2009, 1, 2, 0, 0),
datetime.datetime(2009, 1, 3, 0, 0), ...,
datetime.datetime(2019, 12, 29, 0, 0),
datetime.datetime(2019, 12, 30, 0, 0),
datetime.datetime(2019, 12, 31, 0, 0)], dtype=object)
I assume there is some way I can add a time dimension to each file and open them all with open_mfdataset, possibly with the "preprocess" argument.
Thanks for any help.

Here is my solution:
Create a function which adds a time dimension to each dataset and fills it with an arbitrary date:
from datetime import datetime

def add_time_dim(xda):
    xda = xda.expand_dims(time=[datetime.now()])
    return xda
Then, pass this function to the preprocess argument when calling the open_mfdataset function:
data = xr.open_mfdataset(files, preprocess=add_time_dim)
Finally, fill the time dimension with my dates:
data['time'] = dates
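If each file's date can be recovered from its filename, another option is to attach the correct date inside preprocess instead of overwriting data['time'] afterwards. A rough sketch, where the filename pattern "data_20090101.nc" is only an assumed example (xarray exposes the path of the file being opened via ds.encoding["source"]):
import os
from datetime import datetime
import xarray as xr

def add_time_dim_from_filename(ds):
    # xarray records the source path of the opened file in ds.encoding["source"]
    name = os.path.basename(ds.encoding["source"])
    # assumed pattern "data_20090101.nc"; adjust the parsing to your actual filenames
    date = datetime.strptime(os.path.splitext(name)[0].split("_")[-1], "%Y%m%d")
    return ds.expand_dims(time=[date])

data = xr.open_mfdataset(files, preprocess=add_time_dim_from_filename)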

Related

"You have to specify either input_ids or inputs_embeds", but I did specify the input_ids

I trained a BERT-based encoder-decoder model (EncoderDecoderModel) named ed_model with Hugging Face's transformers module.
I used a BertTokenizer named input_tokenizer.
I tokenized the input with:
txt = "Some wonderful sentence to encode"
inputs = input_tokenizer(txt, return_tensors="pt").to(device)
print(inputs)
The output clearly shows that input_ids is in the returned dict:
{'input_ids': tensor([[ 101, 5660, 7975, 2127, 2053, 2936, 5061, 102]], device='cuda:0'), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}
But when I try to predict, I get this error:
ed_model.forward(**inputs)
ValueError: You have to specify either input_ids or inputs_embeds
Any ideas?
Well, apparently this is a known issue; see, for example, this issue for T5.
The problem is that there's probably a renaming procedure in the code: since we use an encoder-decoder architecture, we have two types of input IDs.
The solution is to explicitly specify the type of input ID:
ed_model.forward(decoder_input_ids=inputs['input_ids'], **inputs)
I wish it was documented somewhere, but now you know :-)
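As a side note, for plain inference it is often simpler to call generate(), which builds the decoder inputs itself (provided the model's decoder_start_token_id and pad_token_id are configured). A rough sketch reusing the names above, assuming the same tokenizer is appropriate on the output side:
# Rough sketch: assumes ed_model.config.decoder_start_token_id and pad_token_id
# are set, and that input_tokenizer can also decode the generated token IDs.
generated = ed_model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=64,
)
print(input_tokenizer.batch_decode(generated, skip_special_tokens=True))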

python import data from file, and get date sequence

I have a txt file containing some lines, like this:
[datetime.datetime(2013, 1, 4, 9, 35, 0, 4996), datetime.datetime(2013, 1, 4, 9, 40, 0, 4998),datetime.datetime(2013, 1, 4, 9, 45, 0, 5000)]
How do I load the data and translate it to a list like this?
[2013-01-04 09:35:00.004996,
2013-01-04 09:40:00.004998,
2013-01-04 09:45:00.005000]
for line in dataFile.readlines():
    print(type(line))
I get
<class 'str'>
How can I do this? Thank you in advance.
You will always get strings from a text file, but you can convert the strings to datetime objects:
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S.%f'
dt = datetime.strptime('2013-01-04 09:35:00.004996', fmt)
print(dt)
Or maybe I got you wrong and your file really contains a string that looks like a list (please clarify); then you could try:
from ast import literal_eval
import datetime
import re

strg = '[datetime.datetime(2013, 1, 4, 9, 35, 0, 4996), datetime.datetime(2013, 1, 4, 9, 40, 0, 4998),datetime.datetime(2013, 1, 4, 9, 45, 0, 5000)]'
dates = []
match = re.findall(r'datetime\.datetime\([0-9 ,]+\)', strg)
for date_str in match:
    args = literal_eval(date_str.replace('datetime.datetime', ''))
    dates.append(datetime.datetime(*args))
print(dates)
Your text file was not dumped properly for reading as a JSON file.
It's OK; you can solve your problem as below:
import datetime

output = []
# I am assuming that you have already defined dataFile
for line in dataFile.readlines():
    output.append(eval(line))
print(output)
However, before writing data to your text file you should use json.dumps(object); then it is easy to get your object back by using json.load().
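For example, a rough sketch of that round trip; datetime objects are not JSON-serializable by default, so they are dumped as ISO strings here (the filename dates.json is only for illustration):
import json
from datetime import datetime

dates = [datetime(2013, 1, 4, 9, 35, 0, 4996), datetime(2013, 1, 4, 9, 40, 0, 4998)]

# Write: dump ISO-format strings, since datetime objects cannot be serialized directly.
with open('dates.json', 'w') as f:
    json.dump([d.isoformat() for d in dates], f)

# Read: parse the ISO strings back into datetime objects.
with open('dates.json') as f:
    loaded = [datetime.fromisoformat(s) for s in json.load(f)]
print(loaded)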
import re
import datetime

dateList = []
for line in dataFile.readlines():
    match = re.findall(r'datetime\.datetime\([0-9 ,]+\)', line)
    for date_str in match:
        # I can get this
        print(eval(date_str))
        # Translate
        dates = date_str.replace('datetime.datetime', '')
        dateList.append(dates)
# get this
print(dateList)
Thanks ALL

Is this intended behavior or a bug in datetime timedelta?

from datetime import datetime, timedelta
import pytz
ppt = pytz.timezone('US/Pacific')
first = ppt.localize(datetime(2013, 3, 10, 0, 0, 0))
first+=timedelta(hours=2)
first
returns datetime.datetime(2013, 3, 10, 2, 0, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
It should return datetime.datetime(2013, 3, 10, 3, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
You can work around this apparent bug by calling astimezone(ppt) after adding the hours.
So, is this a bug? Am I doing it wrong? Or is it intended that you have to refresh the timezone after adding time?
You need to call the timezone object's normalize() method again after doing datetime arithmetic:
>>> first
datetime.datetime(2013, 3, 10, 2, 0, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
>>> ppt.normalize(first)
datetime.datetime(2013, 3, 10, 3, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
As noted in the docs:
In addition, if you perform date arithmetic on local times that cross DST boundaries, the result may be in an incorrect timezone. A normalize() method is provided to correct this.
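Putting that together, a minimal runnable sketch (assuming pytz is installed):
from datetime import datetime, timedelta
import pytz

ppt = pytz.timezone('US/Pacific')
first = ppt.localize(datetime(2013, 3, 10, 0, 0, 0))
first += timedelta(hours=2)   # arithmetic crosses the spring-forward DST boundary
print(first)                  # 2013-03-10 02:00:00-08:00, still labelled PST
print(ppt.normalize(first))   # 2013-03-10 03:00:00-07:00, corrected to PDT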

Get same output as R console in Java using JRI

When I enter the following commands directly into the R console
library("xts")
mySeries <- xts(c(1.0, 2.0, 3.0, 5.0, 6.0), order.by=c(ISOdatetime(2001, 1, 1, 0, 0, 0), ISOdatetime(2001, 1, 2, 0, 0, 0), ISOdatetime(2001, 1, 3, 0, 0, 0), ISOdatetime(2001, 1, 4, 0, 0, 0), ISOdatetime(2001, 1, 5, 0, 0, 0)))
resultingSeries <- to.monthly(mySeries)
resultingSeries
I will get an output like this
mySeries.Open mySeries.High mySeries.Low mySeries.Close
Jan 2001 1 6 1 6
When I look into the attributes, I see the following output
attributes(resultingSeries)
$dim
[1] 1 4
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "mySeries.Open" "mySeries.High" "mySeries.Low" "mySeries.Close"
$index
[1] 978307200
attr(,"tclass")
[1] "yearmon"
$tclass
[1] "POSIXct" "POSIXt"
$tzone
[1] ""
$class
[1] "xts" "zoo"
$.indexCLASS
[1] "yearmon"
This is the same as what I get in Java. I'm wondering where the magic happens that produces the nice output I see in R. I have no access to the event loop, since I'm using JRI like this (it's the recommended way and simplifies error handling):
REngine engine = REngine.engineForClass("org.rosuda.REngine.JRI.JRIEngine");
REXP result = engine.parseAndEval(...)
Edit:
In Java I execute each command from above as follows:
REXP result = engine.parseAndEval("resultingSeries") // or any other command
What I get is
org.rosuda.REngine.REXPDouble@4ac66122+[12]
The payload consists of the doubles 1, 6, 1, 6.
The attributes are the same as specified above.
Now R does some magic to display the output above. Is there a way I can get the same output without having to create it manually myself? Where is the implementation that produces the above-mentioned output?
Here is a piece of code that will work. Here I extracted the first element of the field mySeries.Open from the object resultingSeries (which I converted to a data frame), which is equal to 1. Notice that you can't pass the whole resultingSeries object straight into Java; you will need to break it down.
package stackoverflow;

import org.rosuda.JRI.REXP;
import org.rosuda.JRI.Rengine;

/**
 *
 * @author yschellekens
 */
public class StackOverflow {

    public static void main(String[] args) throws Exception {
        String[] Rargs = {"--vanilla"};
        Rengine rengine = new Rengine(Rargs, false, null);
        rengine.eval("library('xts')");
        rengine.eval("mySeries <- xts(c(1.0, 2.0, 3.0, 5.0, 6.0), order.by=c(ISOdatetime(2001, 1, 1, 0, 0, 0), ISOdatetime(2001, 1, 2, 0, 0, 0), ISOdatetime(2001, 1, 3, 0, 0, 0), ISOdatetime(2001, 1, 4, 0, 0, 0), ISOdatetime(2001, 1, 5, 0, 0, 0)))");
        rengine.eval("resultingSeries <- to.monthly(mySeries)");
        rengine.eval("resultingSeries <- as.data.frame(resultingSeries)");
        REXP result = rengine.eval("resultingSeries$mySeries.Open");
        System.out.println("Greeting from R: " + result.asDouble());
    }
}
And the Java output:
run:
Greeting from R: 1.0
I figured out the following workaround. The solution is far from perfect.
R offers a command to capture its console output as a character vector:
capture.output( {command} )
We can access the output using:
REXP result = engine.parseAndEval("capture.output(to.monthly(mySeries))");
String[] output = result.asStrings();
The variable output will contain all output lines
[0] mySeries.Open mySeries.High mySeries.Low mySeries.Close
[1]Jan 2001 1 6 1 6
Alternatively you could use JRIEngine and attach yourself to the event loop, which I did not want in my case (due to the more complicated error handling).

minValue and maxValue as Time Range in hAxis in Google Chart

I need to set the time range for my hAxis to have a minValue of 09:00 and a maxValue of 17:00, with an increment of 1 hour (i.e. 9, 10, 11, 12, 13, 14, ..., 17).
Currently my data is formatted as H:m (for example: 09:35, 10:20)
var formatter3 = new google.visualization.DateFormat({pattern: 'H:m'});
formatter3.format(data,0);
And below are my options:
var options = {
    curveType: "function",
    title: '',
    hAxis: {slantedTextAngle: 90, textStyle: {fontSize: 8}},
    colors: ['red', '#3366CC', '#999999'],
    vAxes: {
        0: {logScale: false, format: '0.0000'},
        1: {logScale: false}
    },
    hAxis: {
        format: 'H:m',
        minValue: new Date(null, null, null, 9, 0, 0),
        maxValue: new Date(null, null, null, 17, 0, 0),
        viewWindow: {
            min: new Date(null, null, null, 9, 0, 0),
            max: new Date(null, null, null, 17, 0, 0)
        }
    },
    series: {
        0: {targetAxisIndex: 0, type: "line"},
        1: {targetAxisIndex: 0, type: "line"},
        2: {targetAxisIndex: 1, type: "bars"}
    }
};
However, it is still not working. Please advise. Thanks!
Unfortunately, the minValue, maxValue, and baseline value are ignored for date and time values. I am not sure that this is a recent bug but I just noticed it a week ago. You might try to experiment with the viewWindow min and max, and the gridlines.count option to get the desired result. Or you might be able to convert all your date values to strings, if the values are evenly spaced, in which case axes will use your explicit values.
Another new feature that could work for you is that you can provide an explicit array of tick values, with a ticks: [...] option. In the current release of gviz, the formatting is done using your format option, and that should be enough for your needs. In an upcoming release, you can also specify the formatting of each tick value.
So it might be best to specify the times in your example using timeofday values like so:
hAxis: {
    ticks: [[9, 0, 0], [10, 0, 0], [11, 0, 0], [12, 0, 0], ...]
}
I think you could do the same kind of thing with datetime values instead, if that's what your data values are.
