I am trying to feed two sentences at the character level into an LSTM layer for classification. My samples look like the following, and my labels are one-hot encoded classes.
Label:
array([1., 0., 0., 0., 0.])
Sample:
array([['0', ' '],
[' ', 'l'],
['1', 'e'],
['1', 't'],
['2', 'n'],
['8', 'i'],
[' ', ' '],
['"', ';'],
['h', 'h'],
['t', 's'],
['t', 'o'],
['p', 't'],
['s', 'n'],
[':', 'i'],
['/', 'c'],
['/', 'a'],
['w', 'm'],
['w', '('],
['w', ' '],
['.', '0'],
['e', '.'],
['x', '5'],
['a', '/'],
['m', 'a'],
['p', 'l'],
['l', 'l'],
['e', 'i'],
['.', 'z'],
['c', 'o'],
['o', 'm'],
['m', '"'],
['/', ' '],
['c', '"'],
['m', '/'],
['s', 'd'],
['/', 'a'],
['t', 'o'],
['i', 'l'],
['n', 'n'],
['a', 'w'],
['-', 'o'],
['a', 'd'],
['c', '-'],
['c', 'r'],
['e', 'o'],
['s', 'f'],
['s', '-'],
['-', 'r'],
['e', 'o'],
['d', 't'],
['i', 'i']], dtype='<U1')
I am trying to use the Keras Embedding layer to map the characters to vectors. The Embedding layer, however, only accepts one-dimensional sequences. How can I adjust the network to take in a multi-dimensional sequence? Currently I have the following code, which works for one-dimensional samples. 51 is my LSTM window size and 74 is the size of my vocabulary.
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=74,
                                 output_dim=74,
                                 input_length=51))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.LSTM(64,
                            dropout=0.5,
                            recurrent_dropout=0.5,
                            return_sequences=True,
                            input_shape=(51, 74)))
model.add(keras.layers.LSTM(64,
                            dropout=0.5,
                            recurrent_dropout=0.5))
model.add(keras.layers.Dense(num_classes, activation='sigmoid'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
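Whatever the architecture, the samples above are arrays of single-character strings (dtype '<U1'), while Embedding looks up rows by integer index, so every character has to be mapped to an integer first. A minimal NumPy sketch, assuming the vocabulary is built from the data itself; char_to_index is a hypothetical name, and index 0 is left free (for example for padding), so Embedding's input_dim must then be at least the number of distinct characters + 1:

```python
import numpy as np

# A tiny sample in the same (window, 2) layout as the question
sample = np.array([['0', ' '],
                   [' ', 'l'],
                   ['1', 'e']], dtype='<U1')

# Hypothetical vocabulary built from the data; indices start at 1 so 0 stays free
chars = sorted(set(sample.ravel()))
char_to_index = {ch: i + 1 for i, ch in enumerate(chars)}

# Element-wise lookup: same shape as the sample, but integer indices
encode = np.vectorize(char_to_index.get)
encoded = encode(sample).astype('int32')
print(encoded.shape)  # (3, 2)
```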
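OK, I solved this problem by adding a Reshape layer before the Embedding layer and another Reshape layer after it. The first Reshape flattens each (lstm_window_size, 2) sample into a single sequence of 2 * lstm_window_size character indices, which is what Embedding expects; the second Reshape regroups the embedded sequence so that each time step holds the embeddings of both characters side by side (2 * 100 = 200 features). Here is the code: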
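# lstm_window_size, vocab_size and num_classes are defined elsewhere;
# each input sample is a (lstm_window_size, 2) array of integer character indices.
model = keras.models.Sequential()
# Flatten each sample into a 1-D sequence of 2 * lstm_window_size indices
model.add(keras.layers.Reshape((2 * lstm_window_size,),
                               input_shape=(lstm_window_size, 2)))
# Map every index to a 100-dimensional vector -> (2 * lstm_window_size, 100)
model.add(keras.layers.Embedding(input_dim=vocab_size + 1,
                                 output_dim=100,
                                 input_length=lstm_window_size * 2))
# Put both characters of a time step side by side -> (lstm_window_size, 200)
model.add(keras.layers.Reshape((lstm_window_size, 200)))
model.add(keras.layers.Dropout(0.2))
# input_shape is not needed here: only the first layer of a Sequential model uses it
model.add(keras.layers.LSTM(64,
                            dropout=0.5,
                            recurrent_dropout=0.5,
                            return_sequences=True))
model.add(keras.layers.LSTM(64,
                            dropout=0.5,
                            recurrent_dropout=0.5))
# softmax pairs with one-hot labels and categorical_crossentropy
model.add(keras.layers.Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])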
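The Reshape, Embedding, Reshape sequence relies on row-major reshaping, so each row of the final (lstm_window_size, 200) tensor is the concatenation of the two characters' embeddings at that time step. This can be checked outside Keras with plain NumPy, using a random array as a hypothetical stand-in for the Embedding weights (the sizes 51, 74 and 100 are taken from the question and answer):

```python
import numpy as np

window, emb_dim, vocab_size = 51, 100, 74
rng = np.random.default_rng(0)

# Hypothetical integer-encoded sample and a random stand-in for the embedding table
sample = rng.integers(1, vocab_size + 1, size=(window, 2))
table = rng.normal(size=(vocab_size + 1, emb_dim))

flat = sample.reshape(2 * window)                  # first Reshape: (102,)
embedded = table[flat]                             # Embedding lookup: (102, 100)
regrouped = embedded.reshape(window, 2 * emb_dim)  # second Reshape: (51, 200)

# Row t = embedding of character 1 followed by embedding of character 2 at step t
t = 5
print(np.array_equal(regrouped[t, :emb_dim], table[sample[t, 0]]))  # True
print(np.array_equal(regrouped[t, emb_dim:], table[sample[t, 1]]))  # True
print(regrouped.shape)  # (51, 200)
```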
It seems that there is no on_click option for Dropdown widgets, and I was wondering if there is some sort of workaround. One method I was thinking of is, every time an option is chosen, to flush the options and start the dropdown from the top again, where the top option would be the empty string "".
For instance suppose I have:
import ipywidgets as widgets
from IPython.display import display
def dropdown_event_handler(change):
    print(change.new)
    # flush the options and start from "" again
options = ["", "A", "B"]
dropdown = widgets.Dropdown(options=options, description="Categories")
dropdown.observe(dropdown_event_handler, names="value")
display(dropdown)
So the desired behaviour is that if I select "A" and then "A" again, A is printed twice.
As you already suggested, you could set the value of the widget back to "" after each change:
import ipywidgets as widgets
from IPython.display import display
def dropdown_event_handler(change):
    if change.new == "":
        # ignore the notification caused by the reset below
        return
    print(change.new)
    dropdown.value = ""
options = ["", "A", "B"]
dropdown = widgets.Dropdown(options=options, description="Categories")
dropdown.observe(dropdown_event_handler, names='value')
display(dropdown)
I fear that is your only option: the Dropdown widget emits no notification type other than "change". You can see every notification it sends by observing it with type=All:
import ipywidgets as widgets
from IPython.display import display
from traitlets import All
def dropdown_event_handler(change):
    print(change)
options = ["", "A", "B"]
dropdown = widgets.Dropdown(options=options, description="Categories")
dropdown.observe(dropdown_event_handler, type=All)
display(dropdown)
Output:
{'name': '_property_lock', 'old': traitlets.Undefined, 'new': {'index': 1}, 'owner': Dropdown(description='Categories', options=('', 'A', 'B'), value=''), 'type': 'change'}
{'name': 'label', 'old': '', 'new': 'A', 'owner': Dropdown(description='Categories', index=1, options=('', 'A', 'B'), value=''), 'type': 'change'}
{'name': 'value', 'old': '', 'new': 'A', 'owner': Dropdown(description='Categories', index=1, options=('', 'A', 'B'), value='A'), 'type': 'change'}
{'name': 'index', 'old': 0, 'new': 1, 'owner': Dropdown(description='Categories', index=1, options=('', 'A', 'B'), value='A'), 'type': 'change'}
{'name': '_property_lock', 'old': {'index': 1}, 'new': {}, 'owner': Dropdown(description='Categories', index=1, options=('', 'A', 'B'), value='A'), 'type': 'change'}
So you cannot observe a selection in a Dropdown widget if the value did not change. For more information, see the Traitlets documentation.
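Dropdown.value is a traitlet, and traitlets only notify observers when a value actually changes. The behaviour can be reproduced without a notebook front end by using traitlets directly; Selector below is a hypothetical stand-in for the widget, not part of ipywidgets. The sketch shows that re-assigning the same value is silent, and that resetting to "" inside the handler (with a guard for the reset's own notification) makes repeated selections visible:

```python
from traitlets import HasTraits, Unicode


class Selector(HasTraits):
    # Hypothetical stand-in for Dropdown.value, which is also a traitlet
    value = Unicode("")


# 1) Without a reset, selecting "A" twice notifies only once
plain = Selector()
seen = []
plain.observe(lambda change: seen.append(change.new), names="value")
plain.value = "A"
plain.value = "A"  # same value: no notification
print(seen)  # ['A']

# 2) With the reset-to-"" workaround, every selection notifies
sel = Selector()
picked = []


def handler(change):
    if change.new == "":
        return  # ignore the notification caused by our own reset
    picked.append(change.new)
    sel.value = ""  # reset so the next selection is a real change


sel.observe(handler, names="value")
sel.value = "A"
sel.value = "A"
sel.value = "B"
print(picked)  # ['A', 'A', 'B']
```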
I am trying to request ERA5 data. Requests are limited by size, and the system automatically rejects any request larger than the limit. However, I want each request to be as close to the limit as possible, because each request takes a few hours to be processed by the Climate Data Store (CDS).
For example, I have a vector of years, years <- seq(from = 1981, to = 2019, by = 1), and a vector of variables, variables <- c("a", "b", "c", "d", "e", ..., "z"). The maximum request size is 11, which means that for each request length(years) * length(variables) must be less than or equal to 11.
For each request, I have to provide a list containing character vectors of years and variables. For example:
req.list <- list(year = c("1981", "1982", ..."1991"), variable = c("a"))
This works, since there are 11 years and 1 variable.
I thought about using expand.grid(), then taking rows 1-11, rows 12-22, ..., and using the unique() values of each column as the years and variables of a request. But this approach sometimes produces a request that is too big:
req.list <- list(year = c("2013", "2014", ..."2018"), variable = c("a", "b"))
is rejected, since length(year) * length(variable) = 12 > 11.
Also, I am using foreach() and doParallel to send multiple requests (at most 15 at a time).
If anyone has a better solution (minimizing the number of unique combinations while obeying the request size limit), please share. Thank you very much.
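The expand.grid() approach fails because consecutive rows do not form a rectangle of years × variables, so the unique() values can multiply out to more than the limit. Splitting both vectors into fixed-size chunks and pairing every year chunk with every variable chunk guarantees that each request is a rectangle of at most limit cells. A sketch of that idea, written in Python to match the other code on this page (plan_requests is a hypothetical helper); it picks the best uniform chunk sizes, which is not guaranteed to be the global minimum over arbitrary partitions:

```python
import math
from itertools import product


def plan_requests(years, variables, limit):
    """Split years x variables into rectangular blocks of at most `limit` cells.

    Tries every variable-chunk size and keeps the uniform split with the fewest
    requests. Assumes `years` and `variables` are non-empty.
    """
    best = None
    for n_vars in range(1, min(limit, len(variables)) + 1):
        n_years = min(limit // n_vars, len(years))  # widest year chunk that still fits
        count = math.ceil(len(years) / n_years) * math.ceil(len(variables) / n_vars)
        if best is None or count < best[0]:
            best = (count, n_years, n_vars)
    _, n_years, n_vars = best
    year_chunks = [years[i:i + n_years] for i in range(0, len(years), n_years)]
    var_chunks = [variables[i:i + n_vars] for i in range(0, len(variables), n_vars)]
    return [{"year": y, "variable": v} for y, v in product(year_chunks, var_chunks)]


years = [str(y) for y in range(1981, 2020)]                   # 39 years
variables = [chr(c) for c in range(ord("a"), ord("z") + 1)]   # 26 variables
requests = plan_requests(years, variables, 11)
print(len(requests))  # 104
```

Each returned dictionary maps directly onto one req.list; the requests can then be dispatched with whatever parallel mechanism is already in use.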
The limit is set in terms of the number of fields, which you can think of as the number of "records" in the GRIB sense. The approach usually suggested is to keep the list of variables and the shorter time dimensions (months, days, hours) in the retrieval command and then loop over the years (the longer time dimension). For ERA5 this is a matter of choice, though, because all ERA5 data is on disk cache rather than on tape. With tape-based requests (i.e. if you use the CDS to retrieve seasonal forecasts or other datasets that are not ERA5), it is important to retrieve data stored on the same tape in a single request.
This is a simple looped example:
import cdsapi
c = cdsapi.Client()
yearlist = [str(s) for s in range(1979, 2019)]
for year in yearlist:
    c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'format': 'netcdf',
            'variable': [
                '10m_u_component_of_wind', '10m_v_component_of_wind', '2m_dewpoint_temperature',
                '2m_temperature',
            ],
            'year': year,
            'month': [
                '01', '02', '03',
                '04', '05', '06',
                '07', '08', '09',
                '10', '11', '12',
            ],
            'day': [
                '01', '02', '03',
                '04', '05', '06',
                '07', '08', '09',
                '10', '11', '12',
                '13', '14', '15',
                '16', '17', '18',
                '19', '20', '21',
                '22', '23', '24',
                '25', '26', '27',
                '28', '29', '30',
                '31',
            ],
            'time': [
                '00:00', '01:00', '02:00',
                '03:00', '04:00', '05:00',
                '06:00', '07:00', '08:00',
                '09:00', '10:00', '11:00',
                '12:00', '13:00', '14:00',
                '15:00', '16:00', '17:00',
                '18:00', '19:00', '20:00',
                '21:00', '22:00', '23:00',
            ],
        },
        'data' + year + '.nc')
I presume you could parallelize this with foreach, although I have never tried it. I suspect it will not help much, because there is a per-user job limit that is set quite low, so you would just end up with a large number of jobs waiting in the queue.
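The long literal lists of months, days and hours in the loop above can also be generated with zero-padded formatting. A sketch that only builds the request dictionary for one year, so it can be passed to c.retrieve unchanged; era5_request is a hypothetical helper, and the keys and values are copied from the example above:

```python
def era5_request(year, variables):
    """Build the ERA5 single-levels request dict for one full year (hypothetical helper)."""
    return {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': list(variables),
        'year': str(year),
        'month': [f'{m:02d}' for m in range(1, 13)],   # '01' .. '12'
        'day': [f'{d:02d}' for d in range(1, 32)],     # '01' .. '31'
        'time': [f'{h:02d}:00' for h in range(24)],    # '00:00' .. '23:00'
    }


req = era5_request(1979, ['2m_temperature'])
print(req['month'][:3], req['day'][-1], req['time'][-1])  # ['01', '02', '03'] 31 23:00
```

Inside the loop, the call would then read c.retrieve('reanalysis-era5-single-levels', era5_request(year, variables), 'data' + year + '.nc').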
Suppose I have 2 dictionaries:
Dict #1:
statedict = {'Alaska': '02', 'Alabama': '01', 'Arkansas': '05', 'Arizona': '04', 'California':'06', 'Colorado': '08', 'Connecticut': '09','DistrictOfColumbia': '11', 'Delaware': '10', 'Florida': '12', 'Georgia': '13', 'Hawaii': '15', 'Iowa': '19', 'Idaho': '16', 'Illinois': '17', 'Indiana': '18', 'Kansas': '20', 'Kentucky': '21', 'Louisiana': '22', 'Massachusetts': '25', 'Maryland': '24', 'Maine': '23', 'Michigan': '26', 'Minnesota': '27', 'Missouri': '29', 'Mississippi': '28', 'Montana': '30', 'NorthCarolina': '37', 'NorthDakota': '38', 'Nebraska': '31', 'NewHampshire': '33', 'NewJersey': '34', 'NewMexico': '35', 'Nevada': '32', 'NewYork': '36', 'Ohio': '39', 'Oklahoma': '40', 'Oregon': '41', 'Pennsylvania': '42', 'PuertoRico': '72', 'RhodeIsland': '44', 'SouthCarolina': '45', 'SouthDakota': '46', 'Tennessee': '47', 'Texas': '48', 'Utah': '49', 'Virginia': '51', 'Vermont': '50', 'Washington': '53', 'Wisconsin': '55', 'WestVirginia': '54', 'Wyoming': '56'}
Dict #2:
master_dict = {'01': ['01034', '01112'], '06': ['06245', '06025', '06007'], '13': ['13145']}
*The actual master_dict is much longer.
Basically, I want to replace the 2-digit keys in master_dict with the long-name keys from statedict. How do I do this? I am trying to use the following, but it doesn't quite work:
for k, v in master_dict.items():
    for state, fip in statedict.items():
        if k == fip:
            master_dict[k] = statedict[state]
Your loop assigns to master_dict[k], which overwrites the value (with the code itself) instead of renaming the key. You can use a dictionary comprehension to build a lookup table that maps values to keys. A second dictionary comprehension then performs the lookups to replace the numbers with the state names:
lookup = {v: k for k, v in statedict.items()}
result = {lookup[k]: v for k, v in master_dict.items()}
print(result)
Output:
{'Alabama': ['01034', '01112'], 'California': ['06245', '06025', '06007'], 'Georgia': ['13145']}
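lookup[k] raises a KeyError for any code that has no entry in statedict. If the real master_dict may contain such codes, dict.get with a default keeps the original code as the key instead. Inverting statedict also assumes that every code appears only once, which holds for the statedict shown. A sketch with a shortened statedict; the '99' entry is a hypothetical unknown code:

```python
statedict = {'Alabama': '01', 'California': '06', 'Georgia': '13'}
master_dict = {'01': ['01034', '01112'],
               '06': ['06245', '06025', '06007'],
               '99': ['99001']}  # hypothetical code with no state name

# Invert statedict: code -> state name
lookup = {fip: state for state, fip in statedict.items()}

# Fall back to the original key when a code has no state name
result = {lookup.get(k, k): v for k, v in master_dict.items()}
print(result)
```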
Is there an equivalent of the CSS z-index for vis.js nodes?
Suppose that I have 2 kinds of nodes (in a graph with physics disabled): Circles and rectangles. I would like the rectangles to always be displayed over the circles when they are overlapping.
This is a late reply, but the short answer is: no.
See this issue: https://github.com/almende/vis/issues/3146
Judging by that issue, a more precise answer is: there is no documented way to set a z-index (vis.js has no such concept), but you can rely on the fact that nodes are drawn in the same order in which they are defined, at the risk of this breaking in some future update. From a comment on the issue:
I used the following test nodes:
var nodes = [
{id: 'a', label: 'a', shape: 'dot'},
{id: 'b', label: 'b', shape: 'dot'},
{id: 'c', label: 'c', shape: 'dot'},
{id: 'd', label: 'd', shape: 'dot'}
];
When not selected, these are drawn in the order in which the nodes are defined.
Now, let's change the order:
var nodes = [
{id: 'c', label: 'c', shape: 'dot'},
{id: 'b', label: 'b', shape: 'dot'},
{id: 'd', label: 'd', shape: 'dot'},
{id: 'a', label: 'a', shape: 'dot'}
];
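Given that (undocumented) draw order, one way to keep the rectangles above the circles is to emit the node list with every box-shaped node last. vis.js itself is JavaScript; this sketch assumes the node data is generated in Python and serialized to JSON before it reaches the page, and the node ids are hypothetical:

```python
import json

nodes = [
    {'id': 'r1', 'label': 'r1', 'shape': 'box'},
    {'id': 'c1', 'label': 'c1', 'shape': 'dot'},
    {'id': 'r2', 'label': 'r2', 'shape': 'box'},
    {'id': 'c2', 'label': 'c2', 'shape': 'dot'},
]

# False sorts before True, so non-box nodes come first and boxes are drawn last;
# sorted() is stable, so the relative order within each group is preserved.
ordered = sorted(nodes, key=lambda n: n['shape'] == 'box')
print([n['id'] for n in ordered])  # ['c1', 'c2', 'r1', 'r2']

nodes_json = json.dumps(ordered)  # pass this to the page that builds the vis.js network
```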
I think this is pretty straightforward. All I am trying to do is update the 'code' in each of the original dictionaries with the corresponding value from another dictionary. I have a feeling that the two for loops and the if statement can be shortened. In my actual problem, I have a few thousand dicts to update. Thanks!
Python:
referencedict = {'A': 'abc', 'B': 'xyz'}
mylistofdict = [{'name': 'John', 'code': 'A', 'age': 28}, {'name': 'Mary', 'code': 'B', 'age': 32}, {'name': 'Joe', 'code': 'A', 'age': 43}]
for eachdict in mylistofdict:
    for key, value in eachdict.items():
        if key == 'code':
            eachdict[key] = referencedict[value]
print(mylistofdict)
Output:
[{'name': 'John', 'code': 'abc', 'age': 28}, {'name': 'Mary', 'code': 'xyz', 'age': 32}, {'name': 'Joe', 'code': 'abc', 'age': 43}]
There is no need to loop over all the items of eachdict; just look up 'code' directly:
for eachdict in mylistofdict:
    if 'code' not in eachdict:
        continue
    eachdict['code'] = referencedict[eachdict['code']]
You can probably omit the test for 'code' being present, since every dict in your example list contains a 'code' entry, but I thought it better to be safe. Looking up the code in referencedict assumes that every possible code is present there; otherwise a KeyError is raised.
I used if 'code' not in eachdict: continue here. The opposite test (if 'code' in eachdict) is just as valid, but this way you can more easily remove the line if you do not need it, and you save yourself an indentation level.
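Alternatively, with try/except:
referencedict = {'A': 'abc', 'B': 'xyz'}
mylistofdict = [{'name': 'John', 'code': 'A', 'age': 28}, {'name': 'Mary', 'code': 'B', 'age': 32}, {'name': 'Joe', 'code': 'A', 'age': 43}]
for x in mylistofdict:
    try:
        # KeyError is raised only when x has no 'code' key;
        # .get() returns None for a code that is missing from referencedict
        x['code'] = referencedict.get(x['code'])
    except KeyError:
        pass
print(mylistofdict)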
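Both loops modify the dictionaries in place, and the .get() variant silently stores None for a code that is missing from referencedict. If the originals should stay untouched and unknown codes should be kept as they are, a list comprehension can build updated copies instead. A sketch; the 'Ann' and 'Joe' entries are hypothetical, added to show an unknown code and a missing 'code' key:

```python
referencedict = {'A': 'abc', 'B': 'xyz'}
mylistofdict = [{'name': 'John', 'code': 'A', 'age': 28},
                {'name': 'Mary', 'code': 'B', 'age': 32},
                {'name': 'Ann', 'code': 'Z', 'age': 51},  # hypothetical unknown code
                {'name': 'Joe', 'age': 43}]               # hypothetical entry without 'code'

# Copy each dict with 'code' translated; unknown codes are kept, dicts without 'code' pass through
updated = [
    {**d, 'code': referencedict.get(d['code'], d['code'])} if 'code' in d else d
    for d in mylistofdict
]
print(updated)
```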