
Visualizing Data Source Inventory Across Layers and Maps using Data Inventory Diagrams (DID)

This has been one of those nagging items that comes up every so often in conversations about inventorying datasets in an enterprise environment. The goal is to understand which datasets are actually in use and which ones are used heavily (i.e., referenced by many map documents or layer files).

When we were upgrading our Integrated Repoint and Repath tool last year, we consciously implemented and exposed some items that would make this fairly easy.

The first was a full suite of Layer types. Along with this, we also sweetened the return value of the data_source property: instead of just returning a string, it returns a Data Element object. It does so in a manner that works even for data sources that are unreachable (i.e., broken layers and table views).

The second was a function that seeks out layer files and map documents.

The third was abstracting Map Documents and Layers into the concept of a MappingObject (OK, perhaps not the best name, but it was intended to be generic). This lets both be handled in virtually the same manner from a file standpoint.

There were, of course, a few more things, but that covers the major items we did with inventorying in mind.
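The file-discovery step can be approximated with nothing but the standard library. The sketch below is not pygp's implementation. It is a minimal illustration, assuming that Map Documents and Layer Files can be recognised by their `.mxd` and `.lyr` extensions. The function name `find_mapping_files` is hypothetical.

```python
import os

# Extensions assumed to identify Map Documents and Layer Files (an assumption for this sketch).
MAPPING_EXTENSIONS = {".mxd", ".lyr"}


def find_mapping_files(root):
    """Recursively walk `root` and return sorted paths to map documents and layer files.

    Hypothetical helper for illustration only; not part of pygp.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively so "Roads.LYR" is also picked up.
            if os.path.splitext(name)[1].lower() in MAPPING_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Calling `find_mapping_files("/path/to/maps")` on a folder tree returns every candidate file. Each one could then be opened and its data sources read.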

More recently we started thinking more about how to generate meaningful outputs from an inventory, outputs that could highlight pinch points and orphans at the same time. You could just write an inventory to a spreadsheet and then do a bunch of filters and sorts. Or, if you were more savvy, you could put the data into a database and run some summaries using GROUP BY queries and so forth. While useful overall, those approaches are just one way of slicing the inventory. What we were looking for is something that tells us, like slaps us in the face with it, which datasets are the most used and most critical.

What we came up with is a Data Inventory Diagram (DID). It depicts the files (Map Documents and Layer Files) and the data sources they contain, color-coded by usage (how many references each data source has). For example, this is a Data Inventory Diagram for just a single file:
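At the core of the diagram is a reference count per data source. A minimal sketch of that tally follows. It assumes the inventory has already been gathered as a dictionary mapping each file path to the data source paths it contains; this layout and the name `count_references` are hypothetical. The tally plays the same role as the GROUP BY summary mentioned above.

```python
from collections import Counter


def count_references(inventory):
    """Count how many files reference each data source.

    `inventory` maps a file path (map document or layer file) to the data
    source paths it contains. A source listed twice in the same file counts
    once for that file, so each count is the number of distinct referring files.
    """
    counts = Counter()
    for sources in inventory.values():
        counts.update(set(sources))
    return counts


inventory = {
    "parcels.mxd": ["gis.sde/Parcels", "gis.sde/Roads"],
    "roads.lyr": ["gis.sde/Roads"],
    "overview.mxd": ["gis.sde/Roads", "C:/data/hydro.shp"],
}
print(count_references(inventory).most_common(1))  # [('gis.sde/Roads', 3)]
```

Counting distinct referring files, rather than every occurrence, is a design choice of this sketch. Drop the `set()` if repeated layers within one map document should count separately.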

[Figure: Data Inventory Diagram for a single file]

The file is shown in the center of the image as a grey ellipse. The data sources (in this case) radiate out from the file as rectangles. Light blue marks a data source with a single reference, and purplish blue marks data sources that are referenced more than once.
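The legend can be expressed as a threshold function from reference count to fill colour. Only the two tiers described above are modelled here. The hex values are assumptions chosen to approximate light blue and purplish blue, not the colours the tool actually uses, and `fill_color` is a hypothetical name.

```python
# Assumed fill colours approximating the legend described above.
LIGHT_BLUE = "#add8e6"     # data source referenced exactly once
PURPLISH_BLUE = "#8470ff"  # data source referenced more than once


def fill_color(reference_count):
    """Return the box fill colour for a data source with the given reference count."""
    if reference_count < 1:
        raise ValueError("a data source in the diagram is referenced at least once")
    return LIGHT_BLUE if reference_count == 1 else PURPLISH_BLUE
```

Extra tiers can be added as further thresholds in the same function, such as the pink boxes that appear in larger diagrams.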

Running on one file is a pretty easy case, and it really isn't the main goal of what we want to accomplish; what we want to do is inventory large networks. Thankfully, we code for the complex case and it works for the simple ones (it scales down nicely!). So when we want to run this on a set of folders, we just do so, with no additional code. This example is a bit larger than the previous one and touches approximately 50 Layer Files and Map Documents.

[Figure: Data Inventory Diagram for approximately 50 Layer Files and Map Documents]

Here you can see that, by running this on a larger set of files, the most heavily used datasets become apparent at a glance (i.e., the pink and purple boxes).

Note that in this diagram we’ve added to each data source box a count of how many times the dataset was found. To make the diagram even easier to read, we also color-code the arrows that run from the files (grey ellipses) to the data sources.
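The post does not say what renders these diagrams. Assuming Graphviz, the elements described so far map directly onto DOT: grey ellipses for files, boxes labelled with a reference count, and arrows coloured by the target's usage tier. The sketch below emits that DOT text from a file-to-data-sources dictionary. The function name `to_dot` and the two-colour palette are assumptions for illustration.

```python
from collections import Counter


def _esc(text):
    # Escape backslashes (e.g. in Windows paths) and double quotes for a DOT string.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_dot(inventory):
    """Render an inventory (file path -> data source paths) as Graphviz DOT text.

    Files become grey ellipses, data sources become boxes labelled with the
    number of files referencing them, and each arrow takes the colour of the
    box it points to (assumed two-tier palette).
    """
    counts = Counter()
    for sources in inventory.values():
        counts.update(set(sources))

    def color(n):
        # Assumed palette: single use vs. multiple uses (X11 colour names).
        return "lightblue" if n == 1 else "lightslateblue"

    lines = ["digraph DID {"]
    for path in sorted(inventory):
        lines.append(f'  "{_esc(path)}" [shape=ellipse, style=filled, fillcolor=grey];')
    for source, n in sorted(counts.items()):
        lines.append(
            f'  "{_esc(source)}" [shape=box, style=filled, fillcolor={color(n)}, '
            f'label="{_esc(source)}\\n({n})"];'
        )
    for path in sorted(inventory):
        for source in sorted(set(inventory[path])):
            lines.append(f'  "{_esc(path)}" -> "{_esc(source)}" [color={color(counts[source])}];')
    lines.append("}")
    return "\n".join(lines)
```

Writing the result to `did.dot` and running `dot -Tpng did.dot -o did.png` produces an image. A radial layout engine such as `twopi` may look closer to the single-file diagram.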

[Figure: close-up of the larger diagram showing reference counts and color-coded arrows]

Can’t forget about the orphans either: you should be able to see that some data sources are used by only one file. Again, this is important information if you want to understand impact or need to prioritize efforts.
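Finding these single-use data sources needs only a reverse index from each data source to the set of files that reference it. A minimal sketch follows, assuming the same hypothetical file-to-data-sources dictionary; `single_use_sources` is an illustrative name, not a pygp function.

```python
from collections import defaultdict


def single_use_sources(inventory):
    """Return {data source: referring file} for sources referenced by exactly one file."""
    referrers = defaultdict(set)
    for path, sources in inventory.items():
        for source in sources:
            referrers[source].add(path)
    # Keep only sources with a single referring file, in sorted order.
    return {
        source: next(iter(files))
        for source, files in sorted(referrers.items())
        if len(files) == 1
    }


print(single_use_sources({"a.mxd": ["x", "y"], "b.lyr": ["y", "z"]}))
# {'x': 'a.mxd', 'z': 'b.lyr'}
```

Returning the referring file alongside each orphan answers the follow-up question directly: which map document or layer file would be affected if that dataset moved.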

Does this scale up? Sure thing. Remember that we implemented the return value from data_source as a Data Element object. What this means (thanks to pygp handling it for us) is that we can readily segregate these Data Inventory Diagrams by Workspace (now that’s cool). Man, do I wish I had this years ago when doing a pile of infrastructure upgrades for ArcGIS Server and ArcSDE: read the network and tell me what is important.
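Because each data source is an object rather than a bare string, grouping by workspace reduces to a dictionary keyed on the workspace. The sketch uses a minimal `DataElement` with `workspace` and `name` fields. That class is a hypothetical stand-in, not pygp's Data Element, and `group_by_workspace` is likewise an illustrative name.

```python
from collections import defaultdict
from typing import NamedTuple


class DataElement(NamedTuple):
    """Hypothetical stand-in for a data source reference (not pygp's class)."""
    workspace: str  # e.g. an SDE connection file or a folder/geodatabase path
    name: str       # dataset name within that workspace


def group_by_workspace(inventory):
    """Map each workspace to {dataset name: set of referring files}.

    `inventory` maps a file path to the DataElement objects it references.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for path, elements in inventory.items():
        for element in elements:
            grouped[element.workspace][element.name].add(path)
    # Convert to plain dicts so callers do not silently create empty entries.
    return {workspace: dict(names) for workspace, names in grouped.items()}
```

Each workspace's entry could then feed its own diagram, for example one per ArcSDE connection, which is the view that helps most when planning server upgrades.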

In case you’re wondering, this functionality is available via our Integrated Repoint and Repath tool and is also bundled into our Spatial Data Sniffer and for followers of the blog, yes, it is all built around our Python Geoprocessing module (pygp).
