Home > ArcGIS, GIS, Python > Geometry Handling in pygp versus arcgisscripting

Geometry Handling in pygp versus arcgisscripting

A year or so ago we gave a presentation at a local Esri Developer Meetup that showcased some of the advanced functionality we have implemented in our pygp package.  For those unaware of pygp, it stands for Python Geoprocessing and is a relatively large suite of api-like code (and documentation and class diagrams) that allows for the same code to operate Python versions 2.4 through to 2.7 and Esri ArcGIS 9.2 through to 10.x (yes, that includes the dozen or so service packs in between too).

pygp contains a large number of functions and classes that are hugely beneficial when developing applications that need to be more than a typical 5 lines of code exported from Model Builder.  The main benefit of pygp is that we use it in our day-to-day business, this means it is constantly under development with new capabilities being added weekly.  Not only that, we implement things in a Pythonic and natural manner i.e. as professionals tasked with automating complex workflows how would we prefer to work?

Here for example is an example adapted from the Esri help documentation circa 9.2 and 9.3.  Recall that this is the vintage where lists were actually enumerations and cursors had to be stepped over using a while loop.

# -*- coding: ascii -*-
"""
Raises ValueError on GetObject for Null point in 9.2 and 9.3 and
Returns None in 10.
"""

__author__ = 'Jason Humber - Integrated Informatics Inc.'
__maintainer__ = '$LastChangedBy$'
__vcs_id__ = '$HeadURL$'

from os.path import join as osjoin
from arcgisscripting import create
from pygp.dataelement.extended import FeatureClass

__copyright__ = 'Copyright (c) 2009-2012, Integrated Informatics Inc.'
__license__ = 'LGPLv3'
__version__ = '0.1.0'
__email__ = 'gis@integrated-informatics.com'

POLYGON = r'sample_data\fgdb_with_data.gdb\prj_a'

def example_geometry_help(path):
    """
    Example from ESRI help documentation on accessing parts and points.

    Code adapted from ESRI Help for 9.3, reading geometries
    This approach avoids the ValueError

    :param path: Workspace Path
    :type path: str
    """
    gp = create()
    shape_field_name = gp.Describe(osjoin(path, POLYGON)).ShapeFieldName
    srows = gp.SearchCursor(POLYGON)
    srow = srows.Next()
    while srow:
        geom = srow.GetValue(shape_field_name).GetPart()
        part_num = 0
        part_count = range(geom.PartCount)
        while part_num < part_count:
            part = geom.GetPart(part_num)
            pnt = part.Next()
            pnt_count = 0
            while pnt:
                print pnt.x, pnt.y
                pnt = part.Next()
                pnt_count += 1

                if not pnt:
                    pnt = part.Next()
                    if pnt:
                        print "Interior Ring:"
            part_num += 1
        srow = srows.Next()
    del srow, srows
# End example_geometry_help function

The help documentation spells out what this results in and this example is essentially just showing how to get the points and parts of a geometry while looping over the rows in a feature class.  Nothing magical here other that to point out that this is a lot of code and that the approach does not raise a ValueError like this second approach (which should work by returning a null point).

def example_geometry_value_error(path):
    """
    Example Showing how ValueError was raised when using GetObject on a
    Null Point (i.e. the separator for exterior/interior rings) in releases
    before ArcGIS 10.  As of ArcGIS 10 the Null point is returned as None.

    :param path: Workspace Path
    :type path: str
    """
    gp = create()
    shape_field_name = gp.Describe(osjoin(path, POLYGON)).ShapeFieldName
    srows = gp.SearchCursor(osjoin(path, POLYGON))
    srow = srows.Next()

    # This approach will hit the ValueError in 9.2/9.3
    while srow:
        geom = srow.GetValue(shape_field_name).GetPart()
        part_num = 0
        part_count = range(geom.PartCount)
        while part_num < part_count:
            part = geom.GetPart(part_num)
            for i in range(part.count):
                try:
                    pnt = part.GetObject(i)
                    print pnt.x, pnt.y
                # This ValueError was reliable until 10
                # Now it no longer gets raised, code broken!!!
                except ValueError:
                    print "Interior Ring:"
                    continue
            part_num += 1
        srow = srows.Next()
    del srow, srows
# End example_geometry_value_error function

So this is annoying.  Accessing points in the geometry one way works fine and in another manner it bombs.  For those of you wondering, this issue was around for all of 9.x but then finally (and unexpectedly) it got fixed in 10!  That’s great!  But what about code that needs to work in both 9.x and 10.x, that becomes the issue.  Not only that, it is still a huge amount of code to write just to get the points out of a geometry.  Part of me wishes for the simplicity of Avenue when geometries were fast and super easy to work with…so with a little motivation here is what we did:

def example_geometry(path):
    """
    Example showing use of Geometry helper class that does the heavy lifting
    on the geometry object and returns something quite similar to WKT/GeoJSON

    Structure is simple, a tuple of tuple of Point objects, very similar to
    Avenue days of geometry and WKT MultiLineString etc.
    (((0, 498266, 6100519, None, None), (0, 499775, 6100281, None, None),
      (0, 500224, 6098694, None, None), (0, 499616, 6097662, None, None),
      (0, 498346, 6096789, None, None)))

    :param path: Workspace Path
    :type path: str
    """
    feature_class = FeatureClass(osjoin(path, POLYGON))
    for srow in feature_class.search():
        print srow.get_value(feature_class.shape_field_name).as_tuple()
# End example_geometry function

Let me step through the subtlties of this code.  First of all, it is only 3 lines (could have made it one if the motivation was to distract).  See how we have a nice FeatureClass object, this is quasi similar to a DEFeatureClass in ArcObjects but implemented for Python.

The next thing to see is that we can loop over the rows in the feature class using a search method (cool!).  Not only that, we can loop over it using a for loop, now that is hot!

On the third line we are using get_value, a method on our SearchRow object that is smart enough to know how to return geometry in a nicely usable way, that is, by returning our Geometry object.  Also see that when we want to return a special field the FeatureClass object we can get the name using a property of the FeatureClass object (for example, shape_field_name).

Last thing is the use of the as_tuple method, a beautiful method that knows how to return geometry as a set of nested tuples of Point objects that can just as easily be reassembled.

About these ads
Categories: ArcGIS, GIS, Python
  1. February 5, 2012 at 10:43 pm | #1

    Is this all pure Python? No C++ at all for speed/access to underlying ArcObjects?

  2. February 6, 2012 at 12:32 pm | #3

    Have you done any benchmarking against pure gp.whatever calls to see the performance hit that doing all this inflicts? I like the idea of a clean API, but I’ve had to move a lot of the core arcpy functionality similar to this back to the C++ part because too much of this logic in Python was making scripts ported to Arcpy unacceptably slow.

    Also, you can get a for loop on the GP searchcursor pretty easily by just doing

    for row in iter(cursor.next, None):

    The iter() cojnstructor accepts an optional stop value, which is super nice.

    • February 6, 2012 at 9:46 pm | #4

      Thanks for that iter tip, the power and simplicity of Python shines through when the optional arguments are used!

    • February 6, 2012 at 10:09 pm | #5

      As for benchmarking, we have done some but likely not in the same areas as what you might be interested. Most of our basis and biases for performance involving cursors and geometry come from our days of Avenue scripting and AO development on 8.x and 9.x whereas our database access expectations are more from ADO and the Python DB-API.

      The slowest thing we see in any of the massive data handling (like bulk inserts of features into ArcSDE) is really already handled by the improvements made at 10 (SP2+) and 10.1 and that performance issue was the slowness of inserts. It seemed that regardless of techniques chosen we could not get more than (and this is just guesstimate) about 1000 features per second to insert when working with narrow tables (i.e. few attributes) and low dimension geometry. Other examples we have is doing inserts of 100k features of Z and M aware lines with the total number of vertices around 10,000,000 across all features — this slowed down considerably to about 10 features per second (and directly using arcgisscripting). I’ll post up a performance comparison here shortly between raw arcgisscripting, pygp cursors, and also our validated cursors.

      More recently we rewrote one of our VB.NET AO applications into Python and had a gain of about 8-10 times in performance when running on a single process (and about 30 times boost when running as multiprocess). In this case the lesson learned was around implementation and algorithm and not so much the choice of language.

      What I have seen over and over again .NET developers dive into Python is that they write their Python like it is .NET. Totally understandable, I find my C# work smells like Python (but that’s a sweet smell right?) — the point being that performance can suffer for reasons that are tied to style and structure that may look ok but when profiled show a different side of the issue.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: