Inventorying Feature Classes
Over on another blog we saw a post about getting a listing of feature classes by recursively navigating the file structure. It was a pretty good post, but it read like a snippet of code from how we used to do things back in the early days of geoprocessing and Python, and especially before pygp became what it is today.
What exactly were the notable items that made us cringe?
- Missing parameter documentation for descriptions and types
- Use of an argument named workspace where really a folder is the only legit input
- Storage of existing environment settings prior to overwriting them
- Multiple calls to listing functions (e.g. ListWorkspaces) with different arguments
- Listing feature classes on a Feature Dataset (aka slow)
- If statements to test if an iterable has content
So what do we suggest other than blatant promotion of pygp? Roughly in order of the items above we would recommend:
- Documentation: docstrings are good but not the full story. Learn to use epydoc or reStructuredText fields and your code becomes self-documenting, with readability through the roof. Take it to another level and use doctest (a topic for another day).
- Choose your argument names carefully to allow fellow developers to easily recognize scope of use, i.e. folder is more precise than workspace.
- Context management, a beautiful thing that can be used for more than just file opening and da cursors. The with statement can be your pal if you learn the pattern.
- Step it up and use objects that know their types, getting past use of strings for return values for almost everything (except of course where we want them)
- There are much faster ways to get the feature classes (and other elements) within a feature dataset than to list the feature classes.
- No need to test for content in the iterable, just iterate over it in the list comprehension, and if it is empty it will be a “do nothing” action
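A couple of these points fit in one tiny, pygp-free sketch. The function name and its behaviour are made up purely for illustration; the point is the reStructuredText fields in the docstring, the doctest lines, and the absence of an "is it empty?" test before the list comprehension.

```python
def feature_class_names(feature_classes):
    """
    Return the upper-cased names of the given items.

    :param feature_classes: Names to convert, possibly empty
    :type feature_classes: list
    :return: Upper-cased names, in the same order
    :rtype: list

    >>> feature_class_names(['roads', 'wells'])
    ['ROADS', 'WELLS']
    >>> feature_class_names([])
    []
    """
    # No "if feature_classes:" guard needed; an empty input simply
    # produces an empty list.
    return [name.upper() for name in feature_classes]


if __name__ == '__main__':
    import doctest
    doctest.testmod()
```

Running the module directly executes the examples in the docstring, so the documentation doubles as a small test.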
Here is roughly how we would code this type of recursive listing of feature classes using pygp these days, in half as much code, with three times as much documentation and a little bit of messaging to improve the user experience.
# -*- coding: ascii -*-
"""
Recursive Listing of Feature Classes
"""
__author__ = 'Jason Humber - Integrated Informatics Inc.'
__maintainer__ = '$LastChangedBy$'
__vcs_id__ = '$HeadURL$'
from os import walk
from pygp import gp
from pygp.environment import env, SwapContext
from pygp.dataelement.extended import FolderWorkspace
__copyright__ = 'Copyright (c) 2012, Integrated Informatics Inc.'
__license__ = 'LGPLv3'
__version__ = '0.1.0'
__email__ = 'gis@integrated-informatics.com'
def seek_feature_classes(folder):
    """
    Recursively searches a folder/directory for Feature Classes, returning a
    list of all Feature Classes found.

    :param folder: Root Folder from which the recursive seek commences
    :type folder: str
    :return: List of Feature Class objects
    :rtype: list
    """
    folder_workspaces = [folder]
    feature_classes = FolderWorkspace(folder).feature_classes
    with SwapContext('workspace', folder):
        for root, _, _ in walk(folder):
            if root not in folder_workspaces:
                continue
            gp.add_message('Seeking feature classes inside %s...' % root)
            env.workspace = root
            for name, workspace in gp.list_workspaces().items():
                if isinstance(workspace, FolderWorkspace):
                    folder_workspaces.append(name)
                feature_classes.update(workspace.feature_classes)
    return feature_classes.values()
# End seek_feature_classes function


if __name__ == '__main__':
    pass
In this example we’ve left out the feature type and wild card portions, but if we were to include them it would be done as a post-process on the feature class list.
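As a rough idea of what that post-process could look like, here is a sketch using fnmatch from the standard library. The FeatureClass tuple and its name and shape_type attributes are stand-ins invented for this example; the objects pygp returns will look different, so treat the attribute names as assumptions.

```python
from collections import namedtuple
from fnmatch import fnmatch

# Hypothetical stand-in for a feature class object (not the pygp class)
FeatureClass = namedtuple('FeatureClass', ['name', 'shape_type'])


def filter_feature_classes(feature_classes, wild_card='*', feature_type=None):
    """
    Filter feature classes by name pattern and, optionally, shape type.

    :param feature_classes: Feature class objects to filter
    :type feature_classes: list
    :param wild_card: Shell-style pattern matched against the name
    :type wild_card: str
    :param feature_type: Shape type to keep, or None to keep all types
    :type feature_type: str
    :return: Feature classes that satisfy both conditions
    :rtype: list
    """
    # Lower-case both sides so matching is case-insensitive everywhere
    return [fc for fc in feature_classes
            if fnmatch(fc.name.lower(), wild_card.lower())
            and (feature_type is None or fc.shape_type == feature_type)]


found = [FeatureClass('roads_2012', 'Polyline'),
         FeatureClass('wells', 'Point'),
         FeatureClass('road_signs', 'Point')]
road_points = filter_feature_classes(found, 'road*', 'Point')
```

Keeping the filter separate means the recursive seek stays simple and the same list can be filtered several ways without walking the disk again.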
Let’s talk a little more about this example. We begin by keeping track of folders that are in fact folder workspaces, because we only want to recursively dive into folders and not into every workspace that is folder-like (an argument could be made about folders underneath coverage workspaces). At the same time we give ourselves a container to store the feature classes, seeded with the feature classes in the specified folder.
Now we use a context manager for environment settings. This way, if there is an exception during the processing, we will get back to our original workspace settings. Looping over the folders using walk from the os module is a very nice way to recursively access directory names, but it will return file geodatabase and coverage workspaces too, so we test the directory name and if it is not in the folder workspaces we skip on to the next folder. Follow this by a little love to the user to brighten their day.
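For anyone who has not written a context manager before, the swap-and-restore pattern can be sketched with nothing but the standard library. This is not how SwapContext is implemented (we are not showing its internals here); the Settings class and swap_setting function below are hypothetical stand-ins that only illustrate the idea.

```python
from contextlib import contextmanager


class Settings(object):
    """Hypothetical stand-in for a geoprocessing environment object."""
    workspace = None


@contextmanager
def swap_setting(settings, name, value):
    """
    Temporarily set an attribute, restoring the previous value on exit.

    :param settings: Object holding the setting
    :type settings: object
    :param name: Attribute name of the setting to swap
    :type name: str
    :param value: Temporary value for the setting
    """
    previous = getattr(settings, name)
    setattr(settings, name, value)
    try:
        yield settings
    finally:
        # Runs on a normal exit and also when the body raises
        setattr(settings, name, previous)


settings = Settings()
settings.workspace = 'C:/original'
try:
    with swap_setting(settings, 'workspace', 'C:/data'):
        settings.workspace = 'C:/data/nested'  # changed again mid-process
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass
# settings.workspace is 'C:/original' again at this point
```

The finally clause is what makes this safe: no matter how many times the setting is changed inside the with block, or how the block ends, the original value comes back.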
At this point we have not done much, so we set the workspace to the current folder and then find the workspaces within it. In our case this returns nicely crafted objects that represent workspaces of varying types. In the case of FolderWorkspaces we want to be sure to capture their names for subsequent tests in the loop (i.e. root not in folder_workspaces), and in all cases we want to pull the feature classes. After the recursive looping is done we exit the context manager and the environment settings are put back to what they were before we began (cool eh).
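The "only dive into known folders" trick works without any geoprocessing library, so here is a stripped-down sketch of it. The is_plain_folder rule (anything not ending in .gdb) is an assumption made up for this example; in the pygp version that decision comes from the type of the workspace object instead.

```python
import os
import shutil
import tempfile


def is_plain_folder(path):
    # Stand-in rule for illustration only: treat *.gdb directories as
    # workspaces that should not be walked into.
    return not path.lower().endswith('.gdb')


def walk_plain_folders(folder):
    """
    Walk a directory tree, visiting only directories known to be folders.

    :param folder: Root folder where the walk starts
    :type folder: str
    :return: Paths of the folders that were visited
    :rtype: list
    """
    known_folders = [folder]
    visited = []
    # os.walk is top-down by default, so a parent is always handled
    # before its children are tested against known_folders.
    for root, dirs, _ in os.walk(folder):
        if root not in known_folders:
            continue
        visited.append(root)
        for name in dirs:
            path = os.path.join(root, name)
            if is_plain_folder(path):
                known_folders.append(path)
    return visited


base = tempfile.mkdtemp()
try:
    os.makedirs(os.path.join(base, 'plain', 'deeper'))
    os.makedirs(os.path.join(base, 'data.gdb', 'inner'))
    visited = walk_plain_folders(base)
finally:
    shutil.rmtree(base)
relative = sorted(os.path.relpath(path, base) for path in visited)
```

Because a skipped directory never registers its children, everything beneath data.gdb is ignored as well, even though walk still yields those paths.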
An important note here is that the feature_classes property on a workspace object knows that feature classes (and other objects, for that matter) can reside in a feature dataset, so we do not need to concern ourselves with manually diving in. The symmetry that this provides is the simplicity for which we strive.