i-josh.com - My name is Josh and I support this message

In the Field system I built with Python (covered in the Field Encapsulation Pattern series of posts, found here), I ended up coming across an interesting problem. I thought I’d post my thoughts on this issue and how I ended up solving it.

As a quick recap, the system defines classes which are definitions of fields on a data model, representing the rules on how to store, represent, and validate, say, a date, a url, etc. Each of these fields will have a list of validation rules that it has to meet. An example of this is an IntegerField class which has a rule stating that the value must be an integer.

# field definition
class IntegerField(Field):
    default_clean_funcs = [validators.is_integer]

# field usage
class MyModel(Model):
    fields: {
        'my_integer_field': IntegerField('my_integer_field')
    }

The classes also have the added flexibility to accept additional validation rules from the user. This allows the user to pass in rules saying “this integer must be greater than zero” or “this integer must be odd” rather than defining additional field classes. This is very powerful because, without this functionality, I’d have to define more classes: a PositiveIntegerField, and an OddIntegerField. What if I want to mix the two? in class form: it’s a PositiveOddIntegerField (or is it an OddPositiveIntegerField?). Now, let’s say I want the value of the field to be limited to numbers less than 100; now I have 3 more classes, and naming becomes a huge problem. Instead of this mess, I just pass the validation functions into the IntegerField constructor.

class MyModel(Model):
    fields: {
        'pos_int': IntegerField('pos_int', clean_funcs=[validators.is_positive]),
        'odd_int': IntegerField('odd_int', clean_funcs=[validators.is_odd]),
        'pos_odd_int': IntegerField('pos_odd_int', clean_funcs=[validators.is_odd, validators.is_positive]),
        ...
    }

Great. Now, the problem I came across is in the constructor of the field object. See, in the IntegerField definition above, default_clean_funcs is a class or static variable, not an instance variable. This means that it’s shared among all instances of the class or, and this is important, among all instances of its subclasses. This is where I initially walked right into it.

So, Python uses the keyword self to differentiate variables on the instance from those in local scope. The… problem?… well, issue I had was that self.my_var will resolve to either an instance or a static variable, not just to an instance. Therefore, when I originally wrote the Fields, I did this:

class Field(object):
    clean_funcs = []
    
    def __init__(self, addl_clean_funcs=[])
        self.clean_funcs += addl_clean_funcs

I was assuming that self would only resolve to an instance member, creating it if it doesn’t yet exist (after all, this is how you define instance members, by using the self keyword in the __init__ function. After experiencing bugs where a UUID wasn’t being validated as an Integer (among other bugs), I realized that self.clean_funcs was actually resolving to the static member, so I had to resolve it.

Solution 1

For the posts which I wrote, and originally, I resolved it using the following method:

class Field(object):
    default_clean_funcs = []
    
    def __init__(self, addl_clean_funcs=[]):
        self.clean_funcs = default_clean_funcs + addl_clean_funcs

This is perfectly functional, but I don’t like it. You see, I end up making needless copies of the objects defined in default_clean_funcs. Okay, technically they’re pointers, so it’s really not all that horrible; it still feels like wasted space to me. Not only that, but it’s somewhat dangerous in my opinion. If I have an instance member in an object, unless it’s explicitly stated, I ought to be able to modify that instance member with impunity; from the object’s perspective, it’s MY content, it’s part of MY definition, why shouldn’t I be able to change it? In the example above, if one of the children modifies one of the functions defined in default_clean_funcs, it will be modified in EVERY child of that field which can break all the things. A simple way to get around this is to make a deep copy of default_clean_funcs, but now you really ARE making needless copies, and that will take up a bunch of room, so I don’t like this approach. Not horrible, but there’s risk.

Solution 2

class Field(object):
    def __init__(self):
        self.clean_funcs = []

class IntegerField(object):
    def __init__(self, addl_clean_funcs=[]):
        super(IntegerField, self).__init__(self)
        self.clean_funcs += addl_clean_funcs

This one has two fatal flaws for my design. First, we come back to the bajillions of copies of validation functions that exists in Solution 1. The second major issue is that I have to do a lot of overloading __init__. If you’ll recall my original design, I basically want to just have a two-line class definition if possible:

class IntegerField(Field):
    clean_funcs=[validators.is_int]

and offload all the common crap onto the parent. So this solution just won’t work.

Solution 3

class Field(object):
    default_clean_funcs = []
    
    def __init__(self, addl_clean_funcs=[]):
        self.addl_clean_funcs = addl_clean_funcs
    
    def get_clean_funcs():
        return self.default_clean_funcs + self.addl_clean_funcs

One benefit of this is that I only ever have the initial definition of default_clean_funcs; there’s no wasted space there. Another huge (to me, at any rate) benefit is that there’s a clear separation of intention here. default_clean_funcs is on the class level; there’s never any obfuscation about that fact. addl_clean_funcs is kept separate because those will change from instance to instance. Then, to get a summary of all the rules (in order to do validation, probably), you have to call get_clean_funcs; since you’re returned a value rather than accessing a member, it’s strongly implied that any changes you may make to the value are going to be transient. In reality, this is not the case; you’re still getting pointers to the function objects rather than copies, but doing the changes will be discouraged because you’d have to do them every time you fetch all the clean functions.

Drawback is that you’ll be doing these operations every time you want a list of the clean functions. Now, you could cache the results, but then you’d have pointers in a cache which you can modify… that’s no good. Again, you could get around this by making deep copies of the clean funcs, and in my opinion, it’s more okay in this instance. At least you know the copies will be used instead of in the constructor case where you won’t know.

Winner Winner, Chicken Dinner

In the end, we went with Solution 3 for production. Performance-wise and bug-wise, it’s been fine for the several weeks since I wrote that post and system. I’m still trying to find a better solution, but I think this one is probably about as good as I need to get. If you have any suggestions, please let me know in the comments below!

Using Static Members as Instance Members in Python

Solution 1

Solution 2

Solution 3

Winner Winner, Chicken Dinner

Category

Tags

Share

Navigation