Tuesday, July 14, 2009

RegEx Tokenizer

The following code snippet is adapted from Fredrik Lundh's effbot.org entry
Using Regular Expressions for Lexical Analysis

Say you want to tokenize an expression such as "(3+5)*10":
#!/usr/bin/env python
'''
Use regex to tokenize a string expression.
adapted from:
http://effbot.org/zone/xml-scanner.htm
'''
import re

reg_token = re.compile(r"""
    \s*                     # skip leading whitespace
    ([0-9\.]+|              # one or more digits or '.', aka floats or ints
     \w+|                   # words
     [+\-*/!^%&|]{1,2}|     # operators, single or doubled
     .)                     # any other character except newline
    """,
    re.VERBOSE)

def tokenize(expr):
    '''
    Returns a list of tokens for an expression string.
    Allows operators +-*/!^%&|
    Treats doubled operator e.g., **, ++ as single token
    '''
    def v_token(obj):
        # convert numeric tokens to float or int; leave anything else as a string
        try:
            if '.' in obj:
                return float(obj)
            else:
                return int(obj)
        except ValueError:
            return obj

    # group(1) is the token itself; group() would also include the
    # leading whitespace matched by \s*
    return [v_token(tkn.group(1)) for tkn
            in reg_token.finditer(expr)]


Let's test on some expressions:

expr = ["(3+7)*90", # basic
        "(3+7.1)*90", # has floats
        "(3+7.1)*90*alpha", # has variables
        "(3+7.1)*90*alpha, g", # invalid expression, tokenize and leave to parser
        "(5.0 - 3.2)/6*9", # other forms
        "b = 2 + a*10",
        "x = \n x**2", # picks up **, ++ as a token
        "i++",
        ""
        ]

for exp in expr:
    tkns = tokenize(exp)
    print("\nExpression: %s\nTokens: %s " % (exp, tkns))

Gives us...
Expression: (3+7)*90
Tokens: ['(', 3, '+', 7, ')', '*', 90]

Expression: (3+7.1)*90
Tokens: ['(', 3, '+', 7.1, ')', '*', 90]

Expression: (3+7.1)*90*alpha
Tokens: ['(', 3, '+', 7.1, ')', '*', 90, '*', 'alpha']

Expression: (3+7.1)*90*alpha, g
Tokens: ['(', 3, '+', 7.1, ')', '*', 90, '*', 'alpha', ',', 'g']

Expression: (5.0 - 3.2)/6*9
Tokens: ['(', 5.0, '-', 3.2, ')', '/', 6, '*', 9]

Expression: b = 2 + a*10
Tokens: ['b', '=', 2, '+', 'a', '*', 10]

Expression: x =
 x**2
Tokens: ['x', '=', 'x', '**', 2]

Expression: i++
Tokens: ['i', '++']

Expression:
Tokens: []
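
The tokenizer deliberately leaves validation to a parser. As a rough idea of what could sit on the other end, here is a minimal sketch of a recursive-descent evaluator (Python 3) that consumes a token list like the first one above. The evaluate function and its inner helpers are hypothetical names, not part of the original snippet, and it only understands numbers, parentheses and the binary operators + - * /; anything else is rejected with a SyntaxError.

```python
def evaluate(tokens):
    """Evaluate a flat token list such as ['(', 3, '+', 7, ')', '*', 90].

    Hypothetical helper for illustration only: numbers are expected to be
    int/float objects already, everything else a one- or two-character string.
    """
    pos = 0  # index of the next unread token

    def peek():
        # look at the next token without consuming it (None at end of input)
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        # consume and return the next token
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        # factor := number | '(' expr ')'
        tok = advance()
        if isinstance(tok, (int, float)):
            return tok
        if tok == '(':
            value = expr()
            if advance() != ')':
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError("unexpected token: %r" % (tok,))

    def term():
        # term := factor (('*' | '/') factor)*
        value = factor()
        while peek() in ('*', '/'):
            if advance() == '*':
                value *= factor()
            else:
                value /= factor()
        return value

    def expr():
        # expr := term (('+' | '-') term)*
        value = term()
        while peek() in ('+', '-'):
            if advance() == '+':
                value += term()
            else:
                value -= term()
        return value

    result = expr()
    if peek() is not None:
        # leftover tokens, e.g. the ',' in "alpha, g"
        raise SyntaxError("unexpected token: %r" % (peek(),))
    return result


print(evaluate(['(', 3, '+', 7, ')', '*', 90]))
```

Splitting the grammar into expr, term and factor is what gives * and / higher precedence than + and -. Variables such as alpha would need a lookup table, which this sketch leaves out.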


code snippet is at dzone: http://snippets.dzone.com/user/bondgeek



Bare Minimum Vim

Because you just can't seem to avoid using it.

v,V - visual mode, visual-line mode
(hit ESC to exit)
n__ - prefix a command with a number to repeat 'n' times
E.g., 5dd deletes 5 lines

Write & Quit = Save & Exit
! - forces a command, overriding the usual safety checks--see :q! and :w!
:w - write the file without exit
:w [filename] - saves to new filename, stays in current buffer
:w! [filename] - overwrites if filename exists
:wq - write and quit
:q - quit (fails if there are unsaved changes)
:q! - quit and throw away changes

Navigation
NOTE: lower case treats punctuation as word break
[n]G - go to line "n", end of file if n omitted
e,E - jump ahead to end of word
b,B - jump back to beginning of word
w,W - jump ahead to start of word
0,$ - jump to start(0) or end($) of line

Edits
r - replace character under cursor
x - delete character under cursor (places it in vim's default register, i.e. its clipboard)
dd, dw - delete a line, word
J - Join line below to current line
. - dot repeats last change
u - undo last change
:e! - discard all changes since the last write

Insert Mode
i,I - start insert mode at i=cursor, I=start of line
a,A - start insert mode after a=cursor, A=end of line
Esc - exit insert mode

Yank & Put = Copy & Paste
yy,yw - yank a line, word
y$ - yank to end of line
p, P - put the clipboard after, before cursor

Find & Replace
/ptrn - search forward for pattern, wraps by default
?ptrn - search backward for pattern
n,N - repeat in same(n), opposite(N) direction
:%s/old/new/gc, :%s/old/new/g
- replace old with new throughout the file, with (gc) or without (g) confirm


Splitting windows
Note: Ctrl+w, then the command
^w s - Split window horizontally
^w v - Split window vertically
^w w - switch between windows
^w q - Quit a window

Combo commands
ea - jump to end of word, then start insert mode after cursor
xp - delete then paste (transposes characters)
:wq - write then quit

Slightly beyond bare minimum
:e - re-edit current file--e.g., if it has changed on disk
:e [file] - edit [file] in new buffer
ZZ = :wq
ZQ = :q!
d[move] - delete over a movement, e.g. dw, d$, dj, or d with an arrow key
D - delete to end of line
R - Replace mode, instead of insert
~ - Switch case of character under cursor
:r [file] - read file, inserting it below the cursor line
h,j,k,l - Navigate left, down, up, right
(), {}, [[ ]] - Navigate by sentence, paragraph, section
E.g. '(' moves 1 sentence backwards, ')' moves forward

Some other useful references
http://www.worldtimzone.com/res/vi.html
http://www.fprintf.net/vimCheatSheet.html
http://jmcpherson.org/editing.html
-- Good place to start if you want to know more, written by somebody who appreciates vim.