How to pull out variable-length lists using PyParsing - pyparsing

I have the following data:
protocol:: DHCP Other items
following:: line
I'd like to pull the "protocol" data strings into an array of ['dhcp', 'other', 'items'], and then have the following line parsed as a separate thing. I've tried 'protocol::' + OneOrMore(Word(alphas)), but that is eating up the following lines as well.
I've tried a several different variations of this, and nothing has worked. Is there a preferred way of doing this?
eg:
text = """
protocol:: DHCP some other thing
following:: line
"""
delim = '::'
proto_line = Group('protocol' + delim + OneOrMore(Word(alphanums)))
following_line = Group('following' + delim + OneOrMore(Word(alphanums)))
grammar = OneOrMore(proto_line | following_line)
print grammar.parseString(text).dump()
prints:
[['DHCP', 'some', 'other', 'thing', 'following']]
[0]:
['DHCP', 'some', 'other', 'thing', 'following']
and I would like [[protocol stuff], [following stuff]].

Related

how to split date-time and data from same column in csv using python?

I am storing data from Arduino to RaspberryPi. My code is working well, but the date-time and data are logging in the first column together (picture 1), although those are shown separately in the RaspberryPi display (picture 2). How can I log those in a seperate column? I attached my code here.
import time,datetime
from datetime import datetime
import csv
import serial
arduino_port = "/dev/ttyACM0"
baud = 9600
fileName="Arduino1.csv"
ser = serial.Serial(arduino_port, baud)
print("Connected to Arduino port:" + arduino_port)
file = open(fileName, "a")
print("Created file")
samples =float('inf')
print_labels = False
line = 0
print('Press Ctrl-C to quit...')
print('Time, CO2(ppm), Temp(C), Humi(%), Vis, IR, UV, pH')
print('-' *85)
while line <= samples:
d=datetime.now().strftime('%Y-%m-%d %H:%M:%S')
if print_labels:
if line==0:
print("Printing Column Headers")
else:
print("Line " + str(line) + ": writing...")
getData=str(ser.readline())
data=getData [1:] [1:-5]
print(d, data)
file = open(fileName, "a")
file.write(d+ data+ "\n")
line = line+1
file.close()
try adding a comma after the Date-time string to separate the data to another column? The date-time and data appear to be separate in the RaspberryPi, but not comma separated, thus both are in the same column.
Try this -
file.write(d + "," + data + "\n")

pyparsing recursive grammar space separated list inside a comma separated list

Have the following string that I'd like to parse:
((K00134,K00150) K00927,K11389) (K00234,K00235)
each step is separated by a space and alternation is represented by a comma. I'm stuck in the first part of the string where there is a space inside the brackets. The desired output I'm looking for is:
[[['K00134', 'K00150'], 'K00927'], 'K11389'], ['K00234', 'K00235']
What I've got so far is a basic setup to do recursive parsing, but I'm stumped on how to code in a space separated list into the bracket expression
from pyparsing import Word, Literal, Combine, nums, \
Suppress, delimitedList, Group, Forward, ZeroOrMore
ortholog = Combine(Literal('K') + Word(nums, exact=5))
exp = Forward()
ortholog_group = Suppress('(') + Group(delimitedList(ortholog)) + Suppress(')')
atom = ortholog | ortholog_group | Group(Suppress('(') + exp + Suppress(')'))
exp <<= atom + ZeroOrMore(exp)
You are on the right track, but I think you only need one place where you include grouping with ()'s, not two.
import pyparsing as pp
LPAR,RPAR = map(pp.Suppress, "()")
ortholog = pp.Combine('K' + pp.Word(pp.nums, exact=5))
ortholog_group = pp.Forward()
ortholog_group <<= pp.Group(LPAR + pp.OneOrMore(ortholog_group | pp.delimitedList(ortholog)) + RPAR)
expr = pp.OneOrMore(ortholog_group)
tests = """\
((K00134,K00150) K00927,K11389) (K00234,K00235)
"""
expr.runTests(tests)
gives:
((K00134,K00150) K00927,K11389) (K00234,K00235)
[[['K00134', 'K00150'], 'K00927', 'K11389'], ['K00234', 'K00235']]
[0]:
[['K00134', 'K00150'], 'K00927', 'K11389']
[0]:
['K00134', 'K00150']
[1]:
K00927
[2]:
K11389
[1]:
['K00234', 'K00235']
This is not exactly what you said you were looking for:
you wanted: [[['K00134', 'K00150'], 'K00927'], 'K11389'], ['K00234', 'K00235']
I output : [[['K00134', 'K00150'], 'K00927', 'K11389'], ['K00234', 'K00235']]
I'm not sure why there is grouping in your desired output around the space-separated part (K00134,K00150) K00927. Is this your intention or a typo? If intentional, you'll need to rework the definition of ortholog_group, something that will do a delimited list of space-delimited groups in addition to the grouping at parens. The closest I could get was this:
[[[[['K00134', 'K00150']], 'K00927'], ['K11389']], [['K00234', 'K00235']]]
which required some shenanigans to group on spaces, but not group bare orthologs when grouped with other groups. Here is what it looked like:
ortholog_group <<= pp.Group(LPAR + pp.delimitedList(pp.Group(ortholog_group*(1,) & ortholog*(0,))) + RPAR) | pp.delimitedList(ortholog)
The & operator in combination with the repetition operators gives the space-delimited grouping (*(1,) is equivalent to OneOrMore, *(0,) with ZeroOrMore, but also supports *(10,) for "10 or more", or *(3,5) for "at least 3 and no more than 5"). This too is not quite exactly what you asked for, but may get you closer if indeed you need to group the space-delimited bits.
But I must say that grouping on spaces is ambiguous - or at least confusing. Should "(A,B) C D" be [[A,B],C,D] or [[A,B],C],[D] or [[A,B],[C,D]]? I think, if possible, you should permit comma-delimited lists, and perhaps space-delimited also, but require the ()'s when items should be grouped.

String recognition in idl

I have the following strings:
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat
and from each I want to extract the three variables, 1. SWIR32 2. the date and 3. the text following the date. I want to automate this process for about 200 files, so individually selecting the locations won't exactly work for me.
so I want:
variable1=SWIR32
variable2=2005210
variable3=East_A
variable4=SWIR32
variable5=2005210
variable6=Froemke-Hoy
I am going to be using these to add titles to graphs later on, but since the position of the text in each string varies I am unsure how to do this using strmid
I think you want to use a combination of STRPOS and STRSPLIT. Something like the following:
s = ['F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat', $
'F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat']
name = STRARR(s.length)
date = name
txt = name
foreach sub, s, i do begin
sub = STRMID(sub, 1+STRPOS(sub, '\', /REVERSE_SEARCH))
parts = STRSPLIT(sub, '_', /EXTRACT)
name[i] = parts[0]
date[i] = parts[1]
txt[i] = STRJOIN(parts[2:*], '_')
endforeach
You could also do this with a regular expression (using just STRSPLIT) but regular expressions tend to be complicated and error prone.
Hope this helps!

IndexError: list index out of range, scores.append( (fields[0], fields[1]))

I'm trying to read a file and put contents in a list. I have done this mnay times before and it has worked but this time it throws back the error "list index out of range".
the code is:
with open("File.txt") as f:
scores = []
for line in f:
fields = line.split()
scores.append( (fields[0], fields[1]))
print(scores)
The text file is in the format;
Alpha:[0, 1]
Bravo:[0, 0]
Charlie:[60, 8, 901]
Foxtrot:[0]
I cant see why it is giving me this problem. Is it because I have more than one value for each item? Or is it the fact that I have a colon in my text file?
How can I get around this problem?
Thanks
If I understand you well this code will print you desired result:
import re
with open("File.txt") as f:
# Let's make dictionary for scores {name:scores}.
scores = {}
# Define regular expressin to parse team name and team scores from line.
patternScore = '\[([^\]]+)\]'
patternName = '(.*):'
for line in f:
# Find value for team name and its scores.
fields = re.search(patternScore, line).groups()[0].split(', ')
name = re.search(patternName, line).groups()[0]
# Update dictionary with new value.
scores[name] = fields
# Print output first goes first element of keyValue in dict then goes keyName
for key in scores:
print (scores[key][0] + ':' + key)
You will recieve following output:
60:Charlie
0:Alpha
0:Bravo
0:Foxtrot

how to print recursively a Python dictionary and its subdictionaries with whitespace alignment into columns

I want to create a function that can take a dictionary of dictionaries such as the following
information = {
"sample information": {
"ID": 169888,
"name": "ttH",
"number of events": 124883,
"cross section": 0.055519,
"k factor": 1.0201,
"generator": "pythia8",
"variables": {
"trk_n": 147,
"zappo_n": 9001
}
}
}
and then print it in a neat way such as the following, with alignment of keys and values using whitespace:
sample information:
ID: 169888
name: ttH
number of events: 124883
cross section: 0.055519
k factor: 1.0201
generator: pythia8
variables:
trk_n: 147
zappo_n: 9001
My attempt at the function is the following:
def printDictionary(
dictionary = None,
indentation = ''
):
for key, value in dictionary.iteritems():
if isinstance(value, dict):
print("{indentation}{key}:".format(
indentation = indentation,
key = key
))
printDictionary(
dictionary = value,
indentation = indentation + ' '
)
else:
print(indentation + "{key}: {value}".format(
key = key,
value = value
))
It produces the output like the following:
sample information:
name: ttH
generator: pythia8
cross section: 0.055519
variables:
zappo_n: 9001
trk_n: 147
number of events: 124883
k factor: 1.0201
ID: 169888
As is shown, it successfully prints the dictionary of dictionaries recursively, however is does not align the values into a neat column. What would be some reasonable way of doing this for dictionaries of arbitrary depth?
Try using the pprint module. Instead of writing your own function, you can do this:
import pprint
pprint.pprint(my_dict)
Be aware that this will print characters such as { and } around your dictionary and [] around your lists, but if you can ignore them, pprint() will take care of all the nesting and indentation for you.

Resources