Sunday, July 19, 2015

Using Google Earth for Geographic Exploration (GETECH)

Readers of this blog know that I use Google Earth (GE) to plan outings and to manage data that I collect using various GPS devices. I use Google Earth as a sort of entry-level Geographic Information System (GIS), and in the following series of posts I plan to share what I've learned about using it as part of a citizen/community science data management system. I'll probably move these posts to a separate blog in the future, but to get started I'm putting them here. If you aren't interested in these more technically oriented posts you can skip them. I'll add (GETECH) to the title to make them easy to recognize.

On most of my outings I collect geo-referenced data using one or more GPS devices. I have a Garmin hand-held GPS, GPS in my phone and even a camera with built-in GPS. I collect "tracks": the GPS receiver stores a point every few seconds, which lets you trace your route. I also save waypoints, which are specific locations of interest; the GPS saves the coordinates along with a name you enter. The Garmin GPS can grab a bunch of coordinate pairs in quick succession and average them to improve the accuracy of the fix. This is useful if you want the most accurate location that your device can record, usually within around ten feet. I'll do a future post on GPS accuracy to explain this limitation.
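The averaging idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how the Garmin firmware works: it takes a plain arithmetic mean of latitude and longitude, which is reasonable for fixes that are a few feet apart. The function name `average_fix` and the sample coordinates are made up for the example.

```python
from statistics import mean


def average_fix(fixes):
    """Return the mean (lat, lon) of a list of (lat, lon) pairs in decimal degrees.

    A plain mean is only sensible for tightly clustered fixes; it ignores
    edge cases such as readings that straddle the 180-degree meridian.
    """
    if not fixes:
        raise ValueError("need at least one fix")
    lats = [lat for lat, _ in fixes]
    lons = [lon for _, lon in fixes]
    return mean(lats), mean(lons)


# Hypothetical readings taken a second or two apart at the same spot.
samples = [(35.00010, -83.50020), (35.00014, -83.50016), (35.00012, -83.50018)]
lat, lon = average_fix(samples)
print(f"{lat:.5f}, {lon:.5f}")
```

Each individual reading wanders a little around the true position, so the mean of several readings tends to land closer to it than any single one.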

After a trip I upload the data from my GPS devices into GE. This allows me to see the tracks (my route) and waypoints in the geographic context provided by GE. If the data was collected for a specific use I move it into whatever storage system I'm using for that project (more on this in a future post). My Garmin GPS is directly supported by GE, so I simply plug the GPS unit into a USB port and use the GPS option on the Tools menu. There are options that let you select what you want to import. By default GPS data is added to a sub-folder of the "Temporary Places" folder in your GE Places. I typically tweak things a bit and move the imported data into the "MyPlaces" folder, which lets me save each GPS data set for ready access in GE. You could also right-click on your newly uploaded dataset and save it to your computer using the Save Place As option.

It's even easier to grab the GPS data using the phone. I have several GPS-related apps on my Android phone (a Google Nexus 5) but I most often use GPS Essentials. As with the dedicated Garmin, GPS Essentials allows me to record tracks and save waypoints (it has other capabilities that I'll discuss in future posts). To get the data into GE I use the Export feature of GPS Essentials and save the data directly to Google Drive (Internet-based storage). It takes a few minutes, but once the data shows up in Google Drive I save it to my local computer as a KML file. KML is the native file format of Google Earth, and most software that works with geographically referenced data can read it.
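To give a feel for what is inside a KML file, here is a minimal Python sketch that reads waypoints from a small hand-written sample. The sample content (the name, time and coordinates) is invented for illustration, and the code assumes the standard KML 2.2 namespace; a file exported by a particular app may use extra elements or a different layout. Note that KML stores coordinates as longitude, latitude, altitude, in that order.

```python
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"  # assumed namespace (KML 2.2)
KML_NS = {"kml": KML_URI}

# Hand-written sample; real exports will contain more detail.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Trailhead</name>
      <TimeStamp><when>2015-07-18T14:05:00Z</when></TimeStamp>
      <Point><coordinates>-83.5002,35.0001,640</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def read_waypoints(kml_text):
    """Return a list of dicts describing every point Placemark in the KML text."""
    root = ET.fromstring(kml_text)
    waypoints = []
    for placemark in root.iter(f"{{{KML_URI}}}Placemark"):
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        ).strip()
        if not coords:
            continue  # not a point (for example, a track line)
        lon, lat = (float(value) for value in coords.split(",")[:2])
        waypoints.append(
            {
                "name": placemark.findtext("kml:name", default="", namespaces=KML_NS),
                "when": placemark.findtext("kml:TimeStamp/kml:when", namespaces=KML_NS),
                "lat": lat,
                "lon": lon,
            }
        )
    return waypoints


for waypoint in read_waypoints(SAMPLE):
    print(waypoint)
```

The "where" lives in the `coordinates` element and the "when", if the exporting app recorded it, in `TimeStamp`.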

This has been my basic working process for several years, but I've recently become concerned that it was breaking down. The data I collect is best categorized as "ecological inventory". I note where species are found and when I saw them, and I record additional information about the circumstances. The "where" and "when" are essential to the data having value, and I was not managing this information in a consistent way. It's there, embedded in the KML files, but I didn't have an easy way to find everything related to a location or everything for a specific time span. Some of the data was stored in raw files and some was in databases that I've created. To maximize its value the data needs to be consolidated, and I need a more robust process for recording and managing the metadata. Also, with over 200 top-level sub-folders in my MyPlaces folder, Google Earth was running noticeably slower.

Screen capture of Google Earth after I cleaned up and organized my MyPlaces data. The lines are tracks captured using GPS and the markers are points of interest.

So I looked into how GE actually manages this data and found that all of the data you see in GE in the "MyPlaces" folder is actually stored as a single file on your computer named myplaces.kml. The location of this file varies based on the operating system of your computer and the version of GE you are using. On my computer, running Windows 7 and using Google Earth Pro, the file is saved to:
C:\Users\<your-windows-name>\AppData\LocalLow\Google\GoogleEarth

Seeing the MyPlaces.kml file made it clear why GE was straining a bit. The MyPlaces.kml file on my computer was just under 50 megabytes in size and using a text editor to open it I found that it contained over 1.1 million lines. For context, if we assume that a page of text averages around 60 lines, the MyPlaces file on my computer contained over 18,000 pages. That's a bit much.
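If you want to check your own file without opening it in an editor, a short Python sketch like the following would do. The function names are my own, and the 60 lines per page figure is the same rough assumption used above.

```python
def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def estimate_pages(line_count, lines_per_page=60):
    """Convert a line count to an approximate page count."""
    return line_count / lines_per_page


# The figures from my file: about 1.1 million lines at 60 lines per page.
print(round(estimate_pages(1_100_000)))  # 18333
```

To measure a real file, pass the path of a copy of it to `count_lines` and feed the result to `estimate_pages`.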

I had let the situation get out of hand and now it was going to require a lot of work to straighten it out. One approach would be to use the GE user interface. In GE you can save each folder or item in the MyPlaces folder to a separate file on your computer. You can then delete that item from MyPlaces and open the file when you want to use that data. If the item you save is a folder, GE creates a single archive file (with the .KMZ extension) containing the entire contents of the folder. This approach would work but it was going to be a tedious process, to say the least.
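A KMZ file is an ordinary ZIP archive with a KML file inside (usually named doc.kml), so folders saved this way can also be read outside of GE. Here is a minimal Python sketch; the helper name `read_kml_from_kmz` is my own, and the demo builds a tiny archive in memory rather than using a real GE export.

```python
import io
import zipfile


def read_kml_from_kmz(kmz_file):
    """Return the text of the first .kml entry found in a KMZ archive.

    kmz_file may be a path or a file-like object opened in binary mode.
    """
    with zipfile.ZipFile(kmz_file) as archive:
        kml_names = [n for n in archive.namelist() if n.lower().endswith(".kml")]
        if not kml_names:
            raise ValueError("no .kml file found in the archive")
        return archive.read(kml_names[0]).decode("utf-8")


# Build a stand-in KMZ in memory so the example runs without any files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("doc.kml", "<kml xmlns='http://www.opengis.net/kml/2.2'/>")
buffer.seek(0)

print(read_kml_from_kmz(buffer))
```

With a real file you would pass its path instead of the in-memory buffer.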

A second alternative is to edit a copy of the MyPlaces.kml file using a text editor (note that I said "edit a copy"; don't edit the file used by GE). You can use any text editor to open a KML file. Unfortunately, given the size of the file on my computer, this was also going to be a slow and error-prone process. Even in Notepad++, an editor designed for working with lots of text, opening and navigating the file was extremely sluggish (recall the 1.1 million lines). I could have copied out sections and worked on pieces of the file, but the data is structured (it's an XML file, if you are familiar with that sort of thing) and there are references among different sections in the file. It would be very easy to mess this up.

The solution I chose was to write a script to run through the file (a copy, of course) and save each folder, document and placemark to a new, separate KML file. I could then remove the MyPlaces.kml file used by GE, after making a backup copy (make sure that GE is not running if you do this). The next time GE runs it creates a new, empty MyPlaces.kml file and you are back to a "clean" install of GE. The separate KML files created by the script can then be opened using GE when I need them.

I'm making this script available for anyone who wants to run it, and you can access it here:
https://gist.github.com/kentstanton/3441cc368d3c52621b19

Please note that the script requires Windows PowerShell 5.0, the very latest version as of this writing, and you need to know how to run PowerShell scripts. If you are familiar with PowerShell programming you can alter the script as needed, and it would not be too difficult to port the code to a different language if you do not have a Windows computer to run it on. I'm working on a larger project that will incorporate this functionality and remove the PowerShell 5.0 requirement, but I wanted to go ahead and make this available now because it might be useful to some people as is.
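For readers without PowerShell, the general idea of splitting the file can be sketched in Python. This is not a port of the script linked above, just a minimal illustration under some assumptions: the file uses the KML 2.2 namespace, the folders, documents and placemarks to split sit directly under the first element inside `<kml>` (in a real MyPlaces.kml they may be nested one level deeper), and shared `Style`/`StyleMap` elements are copied into every output so style references keep working. The function name and sample data are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"  # assumed namespace (KML 2.2)
ET.register_namespace("", KML_URI)  # write plain <Folder>, not <ns0:Folder>

FEATURES = {f"{{{KML_URI}}}{tag}" for tag in ("Folder", "Document", "Placemark")}
SHARED = {f"{{{KML_URI}}}{tag}" for tag in ("Style", "StyleMap")}

SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Style id="s1"/>
    <Folder><name>Trip 1</name>
      <Placemark><name>Camp</name><styleUrl>#s1</styleUrl></Placemark>
    </Folder>
    <Placemark><name>Nest site</name></Placemark>
  </Document>
</kml>"""


def split_features(kml_text):
    """Return {file_name: kml_text} with one entry per top-level feature."""
    root = ET.fromstring(kml_text)
    container = root[0]  # assumption: a single Document or Folder under <kml>
    shared = [child for child in container if child.tag in SHARED]
    features = [child for child in container if child.tag in FEATURES]
    outputs = {}
    for index, feature in enumerate(features):
        name = feature.findtext(f"{{{KML_URI}}}name", default="unnamed")
        safe = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_") or "unnamed"
        new_root = ET.Element(f"{{{KML_URI}}}kml")
        document = ET.SubElement(new_root, f"{{{KML_URI}}}Document")
        document.extend(shared)  # keep styleUrl references resolvable
        document.append(feature)
        outputs[f"{index:03d}_{safe}.kml"] = ET.tostring(new_root, encoding="unicode")
    return outputs


for file_name in split_features(SAMPLE):
    print(file_name)
```

The function returns the new files as text rather than writing them, so you can inspect the result first; saving each entry to disk is a short loop over the returned dictionary. As with the PowerShell script, run anything like this against a copy of the file, never the one GE is using.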