The following code snippet is adapted from Fredrik Lundh's effbot.org entry, "Using Regular Expressions for Lexical Analysis".
Say you want to tokenize an expression such as "(3+5)*10":
#!/usr/bin/env python
'''
Use regex to tokenize a string expression.
adapted from:
http://effbot.org/zone/xml-scanner.htm
'''
import re

reg_token = re.compile(r"""
    \s*                     # skip whitespace
    ([0-9\.]+|              # one or more digits or '.', aka floats or ints
    \w+|                    # words
    [+\-*/!^%&|]{1,2}|      # operators
    .)                      # any character except newline
    """,
    re.VERBOSE)

def tokenize(expr):
    '''
    Returns a list of tokens for an expression string.
    Allows operators +-*/!^%&|
    Treats doubled operator e.g., **, ++ as single token
    '''
    def v_token(obj):
        # convert numeric tokens to float or int, leave the rest as strings
        try:
            if '.' in obj:
                return float(obj)
            else:
                return int(obj)
        except ValueError:
            return obj
    # group(1) is the token itself; group() would also include
    # the leading whitespace consumed by \s*
    return [v_token(tkn.group(1)) for tkn
            in reg_token.finditer(expr)]

Let's test on some expressions:

expr = ["(3+7)*90",                # basic
        "(3+7.1)*90",              # has floats
        "(3+7.1)*90*alpha",        # has variables
        "(3+7.1)*90*alpha, g",     # invalid expression, tokenize and leave to parser
        "(5.0 - 3.2)/6*9",         # other forms
        "b = 2 + a*10",
        "x = \n x**2",             # picks up **, ++ as a token
        "i++",
        ""
        ]

for exp in expr:
    tkns = tokenize(exp)
    print("\nExpression: %s\nTokens: %s " % (exp, tkns))

Gives us...

Expression: (3+7)*90
Tokens: ['(', 3, '+', 7, ')', '*', 90]

Expression: (3+7.1)*90
Tokens: ['(', 3, '+', 7.1, ')', '*', 90]

Expression: (3+7.1)*90*alpha
Tokens: ['(', 3, '+', 7.1, ')', '*', 90, '*', 'alpha']

Expression: (3+7.1)*90*alpha, g
Tokens: ['(', 3, '+', 7.1, ')', '*', 90, '*', 'alpha', ',', 'g']

Expression: (5.0 - 3.2)/6*9
Tokens: ['(', 5.0, '-', 3.2, ')', '/', 6, '*', 9]

Expression: b = 2 + a*10
Tokens: ['b', '=', 2, '+', 'a', '*', 10]

Expression: x = 
 x**2
Tokens: ['x', '=', 'x', '**', 2]

Expression: i++
Tokens: ['i', '++']

Expression: 
Tokens: []
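
Two details of this kind of pattern are easy to trip over, so here is a small sketch of both. The names good, wrapped and first_tokens are only illustrative and are not part of the snippet. In re.VERBOSE mode, whitespace and # comments inside the pattern are ignored, so a comment that wraps onto a second line without its own # quietly becomes part of the regex. Also, group() returns the whole match, including whatever \s* consumed, while group(1) returns only the parenthesized token.

```python
import re

# Intended pattern: each comment stays on its own line.
good = re.compile(r"""
    \s*             # skip whitespace
    ([0-9\.]+|      # digits or '.', aka floats or ints
    \w+|            # words
    .)              # any other character
    """, re.VERBOSE)

# Same pattern, but the comment has wrapped: "aka floats or ints"
# is now read as the literal text "akafloatsorints" in front of \w+,
# so plain words no longer match the "words" alternative.
wrapped = re.compile(r"""
    \s*             # skip whitespace
    ([0-9\.]+|      # digits or '.'
    aka floats or ints
    \w+|            # words
    .)              # any other character
    """, re.VERBOSE)

def first_tokens(pattern, text):
    # group(1) is the captured token, without the skipped whitespace
    return [m.group(1) for m in pattern.finditer(text)]

print(first_tokens(good, "2 * alpha"))     # ['2', '*', 'alpha']
print(first_tokens(wrapped, "2 * alpha"))  # ['2', '*', 'a', 'l', 'p', 'h', 'a']

# group() keeps the whitespace that \s* matched
print([m.group() for m in good.finditer("2 * alpha")])  # ['2', ' *', ' alpha']
```

If a word such as alpha ever comes back one character at a time, a wrapped comment in the verbose pattern is a likely place to look.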
The code snippet is at DZone: http://snippets.dzone.com/user/bondgeek