No more denial! I am a Python junkie — one of my favorite languages for rapid development and for performing data/database integration. Recently I re-read several chapters of the Dive Into Python book (www.diveintopython.org) and came across a section I had no recollection of reading before, and (spoken in a Chris Farley voice) “It was awesome”.
“Data-centric Programming” (section 16.5 of Dive Into Python) is a great example of an embedded rant, a brief manifesto on the merits of Python over other languages with a slight slam on VB (nice!). More than just a slam on VB, this part of the book is a strong reminder to those who have chosen Python to embrace its elegance and, what may seem to be counter to everyday practice, harness its capabilities and form new habits.
The Python language has some way-cool features such as map, filter, reduce that have been around in the language for quite some time. Map and filter (assuming I’ve read the Python 3.0 changes correctly) are being deprecated and the recommended path forward, and the Python docs have included this for some time now, is to use List Comprehensions. Depending on what you’re planning to accomplish, list comprehensions have the benefit of working at almost-C speed and are considered to be be way more Pythonic than for loops.
Why that preamble? Well partly to ask for forgiveness on what comes next.
One of my pet projects is to build some automated tools for translating relational database models into ESRI Geodatabase models. In another post (one that is in the works but still needs polish) I’m identifying the components that I plan to address but for now I’m putting this one out there as a taster.
Databases used check constraints for data integrity. Check constraints in an ESRI Geodatabase could be implemented as RDBMS check constraints but this would not allow for ease of use of the new feature class or table within the ESRI suite of tools. So instead of leaving the check constraint as a pure RDBMS implementation we can translated that into an ESRI Coded Domain or ESRI Range Domain.
Looking at the PPDM 3.7 database model we can easily gain access to all of the check constraints by querying, in this particular case, the Oracle metadata tables/DBA views. The query would look like:
SELECT TABLE_NAME, SEARCH_CONDITION
FROM SYS.DBA_CONSTRAINTS
WHERE OWNER = 'PPDM37'
AND CONSTRAINT_TYPE = 'C'
Without the CONSTRAINT_TYPE set to ‘C’ this will return all of the constraints in the database which is fine but for now we’ll just handle check constraints.
This brings back results that look something like:
and gives us a lot of useful information like the table and field name (although it is embedded in the constraint text) as well as the type of check that is being performed. In the output above most of what show are IS NOT NULL checks — just so happens that the best way to handle these in a Geodatabase is by using the exact same approach, that is, set the property of the underlying field so that is cannot accept NULL values.
However, the types of check constraints that we’re seeking are those that use range or coded values. After trolling through the results for a while the patterns of the returned results became pretty obvious, only coded domains were found and within these coded domains we saw that many of the checks used the same values (e.g. value must be Y or N). From a Geodatabase perspective we can create a coded domain and apply it to many fields, a little bit of a deviation from how it works in a pure relational sense.
Let’s build a a mini framework for what we plan to accomplish:
- Find tables that have constraints
- Identify fields that have constraints
- Extract the constraints
- Collect common constraints together as a coded domain
- Keep track of table/field combinations that require the coded domain
- Implement as ESRI Geodatabase objects**
**this item we’ll keep for a later date since it relies heavily on implementation of other concepts and objects.
Now for some code…where to begin? For this particular example we will be accessing an Oracle database so cx_Oracle is a great module use in Python. Another thing we want to do is group the coded domains based on their values, sure we can write code for this (a dictionary is a great structure to use) but we can also use the itertools module. We also need to perform some string manipulation so let’s include the string module as well.
import cx_Oracle, string, itertools
connection = cx_Oracle.Connection('UNAME/PWORD@ORA')
cursor = connection.cursor()
#This is the SQL statement above
cursor.execute(sqlConstraint)
results = cursor.fetchall()
What this snippet does is import the modules we need, connects to the database, opens a cursor and then uses that cursor to execute a query (the same one presented above). Last step is to retrieve, in this case, all of the results of the query. These results will include the “IS NOT NULL” constraints, they also include extra spaces and hard-returns which are great for readability but useless for the computer.
So what do we do? The results represent (as per the Python DB spec) a list of tuples where each tuple is a record and in our case holds the (TABLE_NAME, SEARCH_CONDITION). The approach we used is a list comprehension, it allows us to filter the data and also transform it at the same time.
resultsFilter = [(r[0], string.split(r[1], ' ')[0], eval(string.replace(r[1], '\n', '')[string.find(r[1], ' IN ') + 4:])) for r in results if 'IS NOT NULL' not in r[1]]
This one line of code does a wild amount of work. The if statement part of the list comprehension ensures that only those entries are not ‘IS NOT NULL’ get processed — this means we’ll only get items that are eligible for coded domains. The r[0] term is simply the table name and it gets returned as the first element of a tuple. The second item of the tuple is the field name and the third item in the tuple is magic — okay, it isn’t magic, just code — and it strips hard-returns from the search condition string, slices the string based on the location of the IN operator, and then uses an eval function to translate from string into a tuple. Sample output:
('ECOZONE_XREF', 'ACTIVE_IND', ('Y', 'N')),
('ENTITLEMENT', 'ACTIVE_IND', ('Y', 'N')),
('ENT_COMPONENT', 'ACTIVE_IND', ('Y', 'N')),
('ENT_COMPONENT', 'INFO_ITEM_TYPE', ('RM_MAP', 'RM_WELL_LOG', 'RM_DOCUMENT', 'RM_SEIS_TRACE', 'RM_FOSSIL', 'RM_COMPOSITE', 'RM_LITH_SAMPLE', 'RM_EQUIPMENT')),
('ENT_COMPONENT', 'LAND_RIGHT_TYPE', ('LAND_UNIT', 'LAND_UNIT_TRACT', 'LAND_TITLE', 'LAND_AGREEMENT', 'LAND_AGREE_PART'))
Sure, at first glance the readability may not be that high, but once you’re familiar with the syntax it is remarkably elegant.
This is really close to what we need but the last step will be to group this information together so that we have a list of unique coded domains and all of the associated table/field combinations. It is easy enough to write a loop to do this for us, pushing the results into a dictionary, but why write what is already written? The itertools module has a slick groupBy method that can be used for performing the next step, the output of an itertools method is an iterator so we wrap the result in a list comprehension to pull the data back into a ‘more useable’ form i.e. lists and tuples.
groupValues = [[key, [g[:2] for g in group]] for key, group in itertools.groupby(resultsFilter, lambda r: r[2])]
Sample output:
[[('Y', 'N'), [('APPLICATION', 'ACTIVE_IND'), ('APPLICATION', 'FEES_PAID_IND'), ('APPLICATION', 'RESUBMISSION_IND'), ('APPLICATION', 'SUBMISSION_COMPLETE_IND'), ... , ('AREA_DESCRIPTION', 'ACTIVE_IND'), ('BA_ADDRESS', 'ACTIVE_IND'), ('BA_ADDRESS', 'WITHHOLDING_TAX_IND'), ('BA_ALIAS', 'ACTIVE_IND'), ('BA_ALIAS', 'PREFERRED_IND'), ('BA_AUTHORITY', 'ACTIVE_IND'), ('BA_AUTHORITY_COMP', 'ACTIVE_IND')]],
[('LAND_UNIT', 'LAND_UNIT_TRACT', 'LAND_TITLE', 'LAND_AGREEMENT', 'LAND_AGREE_PART'), [('BA_AUTHORITY_COMP', 'LAND_RIGHT_TYPE')]]]
And that’s about it! Between some basic SQL and two list comprehensions we have all the information needed to setup ESRI Geodatabase coded domains and to assign the domains to a field in a table.
2 Comments so far
Leave a comment
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <pre> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>
[...] Constraints and Comprehension [...]
Pingback by Segregating Foreign Keys « Thiiink: Ideas, Imagination, and Innovation in GIS September 14, 2008 @ 7:37 pm[...] at it, we should also bring back into consideration the check constraints we played with in our Constraints and Comprehension [...]
Pingback by Imminent Domain « Thiiink: Ideas, Imagination, and Innovation in GIS October 12, 2008 @ 8:57 pm