Thursday, August 13, 2009

best blog ever

johnaugust.com

Has nothing to do with quantitative finance or technology--it's about screenwriting.  But it's very well written (no surprise) and, most strikingly, reading the comments doesn't reinforce my most cynical feelings for humanity the way most blogs do.



Thursday, August 6, 2009

Dear Steve,

MEMORANDUM--HIGHLY CONFIDENTIAL

To:  Steve Jobs

From:  bondgeek

Date: August 6th, 2009

re:  BlackBerry Curve Outsells iPhone 3GS -- Smartphone Sales -- InformationWeek

Dear Mr. Jobs,

It's about the email.  True push technology and straightforward syncing with corporate email (so what if you're smarter than the Lords of the IT realm at all these major corporations--so is my coffee cup--they control their fiefdoms and you don't--WORK WITH THEM!!!).

Please let me know if you have any questions.





Tuesday, July 14, 2009

RegEx Tokenizer

The following code snippet is adapted from Fredrik Lundh's effbot.org entry,
"Using Regular Expressions for Lexical Analysis".

Say you want to tokenize an expression such as "(3+5)*10":
#!/usr/bin/env python
'''
Use regex to tokenize a string expression.
adapted from:
http://effbot.org/zone/xml-scanner.htm
'''
import re

reg_token = re.compile(r"""
    \s*                   # skip whitespace
    ([0-9\.]+|            # one or more digits or '.', aka floats or ints
    \w+|                  # words
    [+\-*/!^%&|]{1,2}|    # operators
    .)                    # any character except newline
    """,
    re.VERBOSE)

def tokenize(expr):
    '''
    Returns a list of tokens for an expression string.
    Allows operators +-*/!^%&|
    Treats doubled operator e.g., **, ++ as single token
    '''
    def v_token(obj):
        # convert numeric tokens to float or int; leave anything else as a string
        try:
            if '.' in obj:
                return float(obj)
            else:
                return int(obj)
        except ValueError:
            return obj

    # group(1) is the token itself, without the whitespace skipped by \s*
    return [v_token(tkn.group(1)) for tkn
            in reg_token.finditer(expr)]


Let's test on some expressions:

expr = ["(3+7)*90", # basic
        "(3+7.1)*90", # has floats
        "(3+7.1)*90*alpha", # has variables
        "(3+7.1)*90*alpha, g", # invalid expression, tokenize and leave to parser
        "(5.0 - 3.2)/6*9", # other forms
        "b = 2 + a*10",
        "x = \n x**2", #picks up **, ++ as a token
        "i++",
        ""
        ]

for exp in expr:
    tkns = tokenize(exp)
    print("\nExpression: %s\nTokens: %s " % (exp, tkns))

Gives us...
Expression: (3+7)*90
Tokens: ['(', 3, '+', 7, ')', '*', 90]

Expression: (3+7.1)*90
Tokens: ['(', 3, '+', 7.1, ')', '*', 90]

Expression: (3+7.1)*90*alpha
Tokens: ['(', 3, '+', 7.1, ')', '*', 90, '*', 'alpha']

Expression: (3+7.1)*90*alpha, g
Tokens: ['(', 3, '+', 7.1, ')', '*', 90, '*', 'alpha', ',', 'g']

Expression: (5.0 - 3.2)/6*9
Tokens: ['(', 5.0, '-', 3.2, ')', '/', 6, '*', 9]

Expression: b = 2 + a*10
Tokens: ['b', '=', 2, '+', 'a', '*', 10]

Expression: x =
 x**2
Tokens: ['x', '=', 'x', '**', 2]

Expression: i++
Tokens: ['i', '++']

Expression:
Tokens: []


The code snippet is at DZone: http://snippets.dzone.com/user/bondgeek
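
One detail of the pattern is worth a closer look: the leading \s* sits outside the capturing parentheses, so the whole match and the captured group are not the same thing. Here is a minimal sketch of the difference, using a cut-down pattern of my own (not the full tokenizer regex) and written for Python 3:

```python
import re

# Cut-down illustration pattern (an assumption, not the full tokenizer):
# optional whitespace, then one captured token.
pattern = re.compile(r"\s*(\w+|.)")

for match in pattern.finditer("b = 2"):
    # group() is the whole match, including any skipped whitespace;
    # group(1) is only the text captured inside the parentheses.
    print(repr(match.group()), repr(match.group(1)))
```

So if tokens come back with stray leading spaces, the usual culprit is reading group() where group(1) was meant.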



Bare Minimum Vim

Because you just can't seem to avoid using it.

v,V - visual mode, visual-line mode
(hit ESC to exit)
n__ - prefix a command with a number to repeat 'n' times
E.g., 5dd deletes 5 lines

Write & Quit = Save & Exit
! - generally causes a command to ignore changes--see :q!
:w - write the file without exit
:w [filename] - saves to new filename, stays in current buffer
:w! [filename] - overwrites if filename exists
:wq - write and quit
:q - quit (fails if there are unsaved changes)
:q! - quit and throw away changes

Navigation
NOTE: lower case treats punctuation as word break
[n]G - go to line "n", end of file if n omitted
e,E - jump ahead to end of word
b,B - jump back to beginning of word
w,W - jump ahead to start of word
0,$ - jump to start(0) or end($) of line

Edits
r - replace character under cursor
x - delete character under cursor (places in clipboard)
dd, dw - delete a line, word
J - Join line below to current line
. - dot repeats last change
u - undo last change
:e! - discard changes since last write

Insert Mode
i,I - start insert mode at i=cursor, I=start of line
a,A - start insert mode after a=cursor, A=end of line
Esc - exit insert mode

Yank & Put = Copy & Paste
yy,yw - yank a line, word
y$ - yank to end of line
p, P - put the clipboard after, before cursor

Find & Replace
/ptrn - search forward for pattern, wraps by default
?ptrn - search backward for pattern
n,N - repeat in same(n), opposite(N) direction
:%s/old/new/g, :%s/old/new/gc
- global replace old with new, without (g) or with (gc) confirm


Splitting windows
Note: Ctrl+w, then the command
^w s - Split window horizontally
^w v - Split window vertically
^w w - switch between windows
^w q - Quit a window

Combo commands
ea - jump to end of word, then start insert mode after cursor
xp - delete then paste (transposes characters)
:wq - write then quit

Slightly beyond bare minimum
:e - re-edit current file--e.g, if it has changed on disk
:e [file] - edit [file] in new buffer
ZZ = :wq
ZQ = :q!
d[move] - delete in direction of arrow movement
D - delete to end of line
R - Replace mode, instead of insert
~ - Switch case of character under cursor
:r [file] - read file, inserting below the current line
h,j,k,l - Navigate left, down, up, right
(), {}, [[ ]] - Navigate by sentence, paragraph, section
E.g. '(' moves 1 sentence backwards, ')' moves forward

Some other useful references
http://www.worldtimzone.com/res/vi.html
http://www.fprintf.net/vimCheatSheet.html
http://jmcpherson.org/editing.html
-- Good place to start if you want to know more, written by somebody who appreciates vim.

Thursday, May 28, 2009

switch statements in python

I may as well weigh in on the absence of a switch statement in Python, though the topic has already been addressed; see:

Python Zone » Python switch statement, or
http://dinomite.net/2008/python-switch-statements-part-2

Here are some examples of switching structures using dictionaries:

# switch example 1
cases = {
'a':
    lambda: 'one',
'b':
    lambda: 'two',
'default':
    lambda: 'three'
}

switch = lambda c: cases.get(c, cases['default'])()

var = ['b','xxx']
for v in var:
    out = switch(v)
    print("Switch on %s = %s"%(v,out))


# switch example 2
op = raw_input("Enter operation for 2 'op' 3: ")

if op in "+-*/":
    print("2 %s 3 = " % op)
else:
    print("'%s' is an unknown operation." % op)

cases = {
     '+': lambda : 2 + 3,
     '-': lambda : 2 - 3,
     '*': lambda : 2 * 3,
     '/': lambda : 2. / 3.
     }

switch = cases.get(op, lambda : 0)()

print(switch)


# switch example 3
name = raw_input("What is your name? ")
op = raw_input("enter 'l' or 'p'")

cases = {
    'l': len,
    'p': lambda txt: txt.upper()
    }

def default(obj):
    print("Invalid entry, %s" % obj)

switch = cases.get(op, default)
out = switch(name)

print(out)

I would say that each of these is as readable as C's switch statement.  I can't say anything for performance, since I haven't run any tests--but Python's dictionaries are fast.   In fact, if I understand the implementation correctly, this pattern is close to what a compiled switch statement often does--set up a lookup table of test values (typically a jump table rather than a hash table, but the idea is the same) and choose a code block based on the condition entered.

The main argument against the dictionary approach would be that it gets cumbersome when the cases are larger blocks of code.  To handle that you need to move each block into a function defined by def.   My only response is that if you're writing large blocks of code inside a switch statement, you should think about putting them in functions anyway--in my experience there is usually a fair amount of reused code inside a large switch statement, since the conditions usually each deal with a separate state of a single variable.  Also, the difference in readability between a switch and an if...elif...else chain diminishes rapidly as the conditional code blocks get larger.
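
To make the "move each block into a function" idea concrete, here is a rough sketch in Python 3.  The handler names, the HANDLERS table and the job dictionary are all made up for illustration; the only real machinery is dict.get with a default, the same as in the examples above.

```python
def handle_start(job):
    # Hypothetical handler: a larger "case" body lives in its own function.
    job["state"] = "running"
    return "started %s" % job["name"]

def handle_stop(job):
    job["state"] = "stopped"
    return "stopped %s" % job["name"]

def handle_unknown(job):
    # Plays the role of the default branch.
    return "no handler for %s" % job["name"]

# The "switch": command strings mapped to the functions that handle them.
HANDLERS = {"start": handle_start, "stop": handle_stop}

def dispatch(command, job):
    # dict.get supplies the default when the command is not a key.
    return HANDLERS.get(command, handle_unknown)(job)

job = {"name": "backup", "state": "idle"}
print(dispatch("start", job))
print(dispatch("pause", job))
```

Each case can now grow without touching the dispatch line, and shared code between cases can be pulled into ordinary helper functions.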

So, should Python have a switch statement?  Readability of code long term may benefit if there is one way to do a switch as opposed to using different, although similar patterns like above.  But in general, I support keeping control flow logic made of the simplest building blocks--in the long run I think that keeps bloat down and enforces more thinking about the nature of the specific programming problem at hand.

But if a switch statement were added to Python, I'd probably use it.





Wednesday, May 13, 2009

Speaking of timing

Unpacking a tuple in a single assignment is much faster than indexing each element:

In [35]: ll = [(x,x*x) for x in range(100)]

In [36]: def f1(obj):
....:     for row in obj:
....:         x = row[0]
....:         y = row[1]
....:

In [37]: def f2(obj):
....:     for row in obj:
....:         x,y = row
....:

In [38]: def f3(obj):
....:     for row in obj:
....:         x,y = (row[0],row[1])
....:

In [39]: timing(f1,10000,ll)
f1 2.04

In [40]: timing(f2,10000,ll)
f2 0.8

In [41]: timing(f3,10000,ll)
f3 2.41

Function calls always add a bit of overhead:

def f1(obj):
    x = min(obj,0.0)

def f3(obj):
    if obj <= 0.0:
        x = obj
    else:
        x = 0.0

In [56]: timing(f1,10000,5.)
f1 0.06

In [57]: timing(f1,10000,-5.)
f1 0.05

In [67]: timing(f3,10000,5.)
f3 0.04

In [68]: timing(f3,10000,-5.)
f3 0.04


Useful to know.
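
If you want to rerun the unpacking comparison without the hand-rolled timing helper, the standard library's timeit module does the job.  This is only a sketch for Python 3; the function names are mine, and the numbers will depend on your machine and interpreter (the gap may well be smaller than the one I saw).

```python
import timeit

rows = [(x, x * x) for x in range(100)]

def by_index(rows):
    # Index each element separately.
    for row in rows:
        x = row[0]
        y = row[1]

def by_unpacking(rows):
    # Unpack both elements in one assignment.
    for row in rows:
        x, y = row

for func in (by_index, by_unpacking):
    # timeit calls the lambda `number` times and returns total seconds.
    seconds = timeit.timeit(lambda: func(rows), number=10000)
    print("%-14s %.3f s" % (func.__name__, seconds))
```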






Wednesday, May 6, 2009

Really, they're slow...

Maybe that previous example isn't fair...after all, I'm treating the value like a list.

In [187]: simple = lambda d: d+d

In [188]: timing(simple, 100000,dyyyymm)
<lambda> 34.84

In [189]: simple(dyyyymm)
Out[189]: Decimal("4018.10")

In [190]: simple(fyyyymm)
Out[190]: 4018.0999999999999

In [191]: timing(simple, 100000,fyyyymm)
<lambda> 0.26




Python Decimals are really slow

10.4. decimal — Decimal fixed point and floating point arithmetic — Python v2.6.2 documentation

In [182]: rparts = lambda d: map(int, (floor(d), round(100*(d%1),0)))

In [183]: rparts(dyyyymm)
Out[183]: [2009, 5]

In [184]: rparts(fyyyymm)
Out[184]: [2009, 5]

In [185]: timing(rparts,10000,dyyyymm)
<lambda> 9.03

In [186]: timing(rparts,10000,fyyyymm)
<lambda> 0.28


Note:  the timing function I'm using here is the same one referred to in an earlier post "Timing is everything". 
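
For anyone who wants to repeat the Decimal-versus-float comparison, here is a small sketch using timeit in Python 3.  The two starting values are my reconstruction from the outputs above (doubling gives 4018.10), so treat them as assumptions.  Also note that since Python 3.3 the decimal module has a C implementation, so the gap should be much smaller than the Python 2.6 numbers shown here.

```python
import timeit
from decimal import Decimal

# Assumed inputs, inferred from the outputs shown in the post.
d_yyyymm = Decimal("2009.05")
f_yyyymm = 2009.05

def double(value):
    # Same operation as the `simple` lambda: add the value to itself.
    return value + value

for label, value in (("Decimal", d_yyyymm), ("float", f_yyyymm)):
    seconds = timeit.timeit(lambda: double(value), number=100000)
    print("%-8s %.3f s" % (label, seconds))
```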


Saturday, May 2, 2009

6. Built-in Types — Python v2.6.2 documentation

Python variables contain pointers to the data, not the data itself--this is one of the more confusing aspects of the language for many.  The implications of this, though, may be made clear by the following example:


class mytester(object):
    def __init__(self, thingone={}, thingtwo=None):
        self.thingone = thingone
        if isinstance(thingtwo,dict):
            self.thingtwo = thingtwo
        else:
            self.thingtwo = {}
    def setone(self, **kwargs):
        self.thingone.update(kwargs)
        return self
    def settwo(self, **kwargs):
        self.thingtwo.update(kwargs)
        return self

t1 = mytester()
t2 = mytester()

print(t1.thingone, t2.thingone)
# ({}, {})

print(t1.thingtwo, t2.thingtwo) 
# ({}, {})

t1.settwo(say='hi').settwo(to='thing one and thing two') # set thingtwo for t1
# <__main__.mytester object at 0xecc9d0>

print(t1.thingtwo, t2.thingtwo)  # t1 & t2 remain independent
# ({'to': 'thing one and thing two', 'say': 'hi'}, {})

t1.setone(say='hi').setone(to='thing one and thing two') # set thingone for t1
# <__main__.mytester object at 0xecc9d0>

print(t1.thingone, t2.thingone) # thingone dict points to same object for each
# ({'to': 'thing one and thing two', 'say': 'hi'}, {'to': 'thing one and thing two', 'say': 'hi'})

print(id(t1.thingone), id(t2.thingone)) #they're the same object!
# (15704976, 15704976)

print(id(t1.thingtwo), id(t2.thingtwo)) #they're not!
# (9733440, 9733584)

In the first case, the default value '={}' for 'thingone' is evaluated only once, when the __init__ method is defined, so the default is a single dictionary object attached to the function (NOT to self.thingone).  Every instance created without an explicit 'thingone' argument points to that same dictionary.  So, when you change the values in that object for one instance, you change them for all.

One could see where the ability to do this would be useful (say, where you would use a singleton pattern in C++, for example).  Still, in general the ability to pass a value to the __init__ method of a class implies to the user that the values will be unique for each instance of the class--so the pattern used for 'thingtwo' should be used more often.

Also, this exercise highlights the importance of the id() function in Python--it clarifies a lot of things.
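
The same effect shows up with plain functions, which makes it easier to see.  Here is a minimal sketch (Python 3, function names made up for illustration) of a shared mutable default next to the None-default idiom that 'thingtwo' approximates:

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and reused on every call that omits `bucket`.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # None is a safe sentinel: a new list is built on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first = append_shared("a")
second = append_shared("b")
print(first is second, second)    # True ['a', 'b']

third = append_fresh("a")
fourth = append_fresh("b")
print(third is fourth, fourth)    # False ['b']
```

Comparing with `is` here does the same job as comparing id() values.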





Wednesday, April 22, 2009

Timing is everything

Python Patterns - An Optimization Anecdote
The above link is to an article by Guido van Rossum on the Python website.   Needless to say, anything Guido says on the topic of Python is worth looking at (he is the author of the Python programming language). 

I thought of this particular article while reading a post, trying to decide how to check if an object is a sequence.  While all the contributors to the discussion are helpful, none actually checks the performance of the proposed solutions.  This is typical of the posts you see on various forums. 

Guido's article highlights how straightforward it is to do basic testing most of the time.  Here is a quick summary of performance of the proposed solutions in the above link:

import time

# Guido's timing function
def timing(f, n, a):
    print f.__name__,
    r = range(n)
    t1 = time.clock()
    for i in r:
        f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
    t2 = time.clock()
    print round(t2-t1, 3)

if __name__ == "__main__":
    #For Example
    # some functions to check if an object is a sequence
    def isit(obj):
        try:
            it = iter(obj)
            return True
        except TypeError:
            return False

    isit2 = lambda obj: isinstance(obj,basestring) or getattr(obj,'__iter__',False)

    def isit3(obj):
        return (isinstance(obj,basestring) or getattr(obj,'__iter__',False))

    #...then:
    '''
    >>> timing(isit3, 100000, [])
    isit3 0.99
    >>> timing(isit2, 100000, [])
    <lambda> 0.99

    >>> timing(isit, 100000, [])
    isit 0.53
    '''
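
The code above is Python 2 (print statements, basestring, time.clock).  For reference, here is my rough Python 3 equivalent; it is a sketch, not Guido's function.  It swaps in time.perf_counter because time.clock was removed in Python 3.8, and the two checker names are my own.  In Python 3, str has an __iter__ attribute, so the separate string check is no longer needed.

```python
import time

def timing(func, n, arg):
    # Ten calls per pass, as in the original, to dilute the loop overhead.
    start = time.perf_counter()
    for _ in range(n):
        func(arg); func(arg); func(arg); func(arg); func(arg)
        func(arg); func(arg); func(arg); func(arg); func(arg)
    elapsed = time.perf_counter() - start
    print(func.__name__, round(elapsed, 3))
    return elapsed

def is_iterable_try(obj):
    # Ask Python directly: iter() raises TypeError for non-iterables.
    try:
        iter(obj)
        return True
    except TypeError:
        return False

def is_iterable_attr(obj):
    # Attribute check; misses objects that only define __getitem__.
    return hasattr(obj, "__iter__")

timing(is_iterable_try, 10000, [])
timing(is_iterable_attr, 10000, [])
```

The two checks are not quite equivalent: iter() also accepts old-style sequences that implement only __getitem__, which the attribute test reports as not iterable.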





Tuesday, April 21, 2009

Choosing a Python GUI api

I narrowed the choice to Tkinter and wxPython fairly quickly--based on Tkinter being the de facto standard and wxPython being the most discussed alternative on a basic Google search of "Python GUI".

Also, wxPython has the largest widget collection, including a spreadsheet, and since what I'm doing will involve a spreadsheet-like interface, I decided that further investigation had quickly diminishing returns.

PyQt looks very powerful, and drives the incredibly impressive Orange application--but failed the "can I install and use it without much brain damage?" test, as did PyGTK  (also known as the "can an idiot install it?" test--me being the idiot--if something requires more than one or two steps to install, it generally fails this test).   I would not be surprised to need to revisit PyQt and PyGTK for larger scale projects.

The following, very helpful discussion walks through a very simple app in Tkinter and wxPython side-by-side.  Good for understanding the basic differences between the two packages and for understanding the basics of GUI programming.

Building a basic GUI application in Python with Tkinter and wxWidgets

NB:  One fact that might help clear confusion as you surf GUI related posts-- wxWidgets == wxWindows.   The name of the underlying C++ library was changed to wxWidgets at some point (no doubt copyright/trademark related).

I'm also starting to look at wxGlade, a GUI builder wrapper for wxPython.

NB, re Editors:  I am using Eclipse with Pydev, with good results. 


Friday, April 17, 2009

Autumn ORM

I've been playing with Autumn, an Object Relational Mapper by Jared Kuolt.  First, I can recommend it, highly.  It took me less time to get it up and running and start doing useful things with it than to get through the first sections of the documentation for any of the other ORMs for Python out there.

Second, an interesting thought came to me while reading some of the comments on old posts from when Jared first announced Autumn.   Several commenters saw no reason to release a new package, feeling, more or less strongly, that the open source way is to jump on to one of the existing projects.   This to me is very wrong-headed.

Part of the power of open source comes from the willingness to throw something out because you don't have a tremendous revenue or sunk cost number associated with it--and to start over again using what you learned from the first effort.



Python: Static Methods versus Class Methods

The best discussion I've seen regarding the differences between static methods and class methods in Python is an old post at Miya's blog.    I like the way Miya approaches it.  Rather than going into the technical discussion found in the Python docs, he asks why would one use static methods, if it seems that class methods can do everything static methods can but not vice versa.

The key to understanding the difference between the two is in the comments.  A commenter points out
"classmethod give[s] you access to the class's attributes. static method does not so..."

Modifying the commenter's example slightly:
>>> class MyClass(object):

        myattribute = 'spam'

        @classmethod
        def eggs(cls):
            return cls.myattribute

        @staticmethod
        def static_eggs():
            self.myattribute   # Will this work??
       
>>> MyClass.eggs()            # O.K. for class method
'spam'
>>> MyClass.static_eggs()     # ...not so much for static method
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    MyClass.static_eggs()
  File "<pyshell#31>", line 8, in static_eggs
    self.myattribute
NameError: global name 'self' is not defined

To recap:
  • Both static and class methods can be called from the class without an instance:
>>> MyClass.static_method_that_says_hi()
"HI"
>>> MyClass.class_method_that_says_hi()
"HI"
>>> x = MyClass()
>>> x.static_method_that_says_hi()
"HI"
  • Both can be inherited by sub-classes and maintain their identity (i.e., both are actually static).
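  • Class methods receive the class as their first argument (cls), which gives them access to the class's attributes; static methods receive nothing implicitly, so they can reach class attributes only by naming the class explicitly.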
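
The points above can be checked with a small sketch (Python 3; the Base and Child classes and their method names are hypothetical, made up for this illustration).  Note what happens in the subclass: the class method sees the subclass's attribute, while the static method only knows the class it names explicitly.

```python
class Base(object):
    greeting = "HI"

    @classmethod
    def class_hi(cls):
        # cls is whichever class the call came through.
        return cls.greeting

    @staticmethod
    def static_hi():
        # No cls or self here; the class has to be named explicitly.
        return Base.greeting

class Child(Base):
    greeting = "HELLO"

print(Base.class_hi(), Base.static_hi())        # HI HI
print(Child.class_hi(), Child.static_hi())      # HELLO HI
print(Child().class_hi(), Child().static_hi())  # HELLO HI
```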

So, why use one instead of the other?  Why not just use class methods since they're more powerful?

For me the principle is to use the simplest structure that handles the problem.   Class methods can do more, and therefore using them should signal that your class does fairly complicated stuff.  Having a bias toward static methods means that you've thought about parsimony in your design.

I'll come up with some examples of each and be back.