ACCURACY ASSESSMENT WORKSHEET                                              2/15/93

After reading several articles on the subject, and spurred by the data quality report that came with the ARC/USA data, I have tried to come up with a workable scheme to help us better include data quality information in our documentation of GIS coverages.

Data quality in the past was generally understood to mean the positional accuracy of the coordinates. Today, data quality has come to include, in addition to POSITIONAL ACCURACY: LINEAGE (what the source materials were and how the coverage was derived from them), ATTRIBUTE ACCURACY (do the attributes correctly describe the geographic entities), LOGICAL CONSISTENCY (are the features topologically correct--no dangling nodes, unclosed polygons, etc.), and COMPLETENESS (is anything missing; were certain features left out because of small size or area). Sometimes TIMELINESS (how current the data is) is included in the data quality list. Many of these areas I believe we already handle in our documentation. Some of these items we do know about but don't report.

For positional accuracy, I think there are some things we need to do better to describe what we know about the coordinates in a coverage, but we currently have no procedure for it. In the past, we have used the scale of the USGS base map as an estimate of the positional accuracy of the coverage. In some cases this is necessary information and should be part of the accuracy assessment. In other cases it has little to do with the real accuracy of our data, but was a convenient way to say something about it.

What I have come up with is based on the ESRI procedure used with the ARC/USA data. It is called a deductive estimate of the positional accuracy, based on errors that may(!) have occurred in each production step. The total error is found using a root-sum-square calculation, which fortunately means that when you add everything up, the error isn't as bad as it sounds at first! Of course it may have nothing to do with reality, but as yet we don't have any way to test the data against reality. So we'll do the best we can without getting too bent out of shape, I hope.

To calculate the overall positional accuracy, we will try to evaluate as many of the following factors as we can: error introduced by the scale of the base map, digitizer transformation error, digitizer operator error, processing errors, and data compilation errors. For many of these factors we can only make an educated guess, either because we didn't record some processing step or message, or because there just isn't any good way to measure something like operator error or compilation error. Some of our data just has a nebulous or vague quality about it anyway, so don't make a big deal about calculating the little stuff. For coverages like PLSS24, ROAD100, and RIVER100 it is important to calculate these numbers because they will be used in so many ways. For others, it's more important to estimate the nebulousness.
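Since the whole procedure boils down to a root-sum-square of the individual error estimates, here is a minimal sketch of the arithmetic (Python, purely for illustration; the factor values shown are the PLSS24 numbers worked out in Example 1 at the end of this worksheet, not new measurements):

import math

def root_sum_square(errors_m):
    """Combine independent error estimates (in ground meters) into one overall figure."""
    return math.sqrt(sum(e * e for e in errors_m))

# Example 1 (PLSS24) factor estimates, already converted to ground meters:
factors = [12.19,   # base map (NMAS, 1:24,000)
           12.19,   # digitizer transformation
           12.19,   # digitizer operator
           5.0]     # processing (largest CLEAN fuzzy tolerance)
print(round(root_sum_square(factors), 1))   # -> 21.7 meters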
                                 PLANIMETRIC ACCURACY
                    GROUND DISTANCES IN METERS AT VARIOUS MAP SCALES

                        | Minimum    Arc/Info
                        | Digitizer  Digitizing                    Hand-drawn
              NMAS(90%) | Resolution Target                        maps
MAP SCALE       METERS  |  .001"     .003"    .010"    .020"     .5mm      1mm      2mm
========================|==============================================================
1:2,000,000    1016.00  |  50.80    152.40   508.00  1016.00  1000.00  2000.00  4000.00
1:1,000,000     508.00  |  25.40     76.20   254.00   508.00   500.00  1000.00  2000.00
1:500,000       254.00  |  12.70     38.10   127.00   254.00   250.00   500.00  1000.00
1:250,000       127.00  |   6.35     19.05    63.50   127.00   125.00   250.00   500.00
1:100,000        50.80  |   2.54      7.62    25.40    50.80    50.00   100.00   200.00
1:62,500         31.75  |   1.59      4.76    15.88    31.75    31.25    62.50   125.00
1:24,000         12.19  |   0.61      1.83     6.10    12.19    12.00    24.00    48.00
1:15,840         13.41  |   0.40      1.21     4.02     8.05     7.92    15.84    31.68
1:12,000         10.16  |   0.30      0.91     3.05     6.10     6.00    12.00    24.00
1:7,920 (660')    6.71  |   0.20      0.60     2.01     4.02     3.96     7.92    15.84
1:4,800 (400')    4.06  |   0.12      0.37     1.22     2.44     2.40     4.80     9.60
1:1,200 (100')    1.02  |   0.03      0.09     0.30     0.61     0.60     1.20     2.40

*)DATA COMPILATION ERRORS
The first step in estimating the positional error is to determine how the data was compiled on the map that was digitized. There are two cases: digitizing data already printed on existing maps, and digitizing information that was added by hand to a base map. Data compilation errors can be the most significant errors in the estimation of coverage accuracy, particularly for geographic features plotted or drawn by hand on base maps. Factors affecting the compilation are how much care was used in plotting the features (were they "eye-balled" or plotted using a light table or zoom transfer scope), what features were used to locate the object's position, and, probably the most important factor, the real nature or inherent accuracy of whatever the original data tries to represent.

For features digitized directly from USGS maps, like roads and rivers, it is assumed that the data compilation errors are small enough that the maps conform to NMAS. NMAS accuracy measures really only apply to well-defined points like road intersections. It is not known to what extent other features on the USGS maps deviate from those measures, but it is not hard to imagine that the green forest areas and the shorelines of water bodies are very dependent on the date of the aerial photography used to create the quad map.

For digitized lines or points that were plotted by hand, consideration must be given to which features on the base map were used to locate the object, say a well or a field boundary, and place it accurately on the map. If roads or PLSS section corners are used to locate objects, the major factor becomes the compiler's ability to measure the distance from the object to the known feature accurately. Using a standard ruler, that ability may only translate into a plottable accuracy of 1/32" or .5mm. When no plotting aids are used to transfer a line from one map to the base map, a 1mm to 2mm error is probably generous.

If the position of the data is poorly known, or the data itself represents nebulous or vague boundaries, there is little point in worrying about base map, digitizer, or operator error. I believe, however, that it is important to give even a rough estimate of how poorly known the position may be. It is better to be on the conservative side than to be overly optimistic about the locational accuracy. This way, it is hoped, the data will not be pushed past its useful limits because no warning was given about its true locational nature.
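All of the "map inches (or millimeters) at scale" conversions used below come straight from multiplying the map-distance error by the scale denominator. Here is a small sketch of that conversion (Python, for illustration only), reproducing a couple of entries from the table above:

INCH_M = 0.0254   # meters per map inch
MM_M   = 0.001    # meters per map millimeter

def ground_meters(map_error, unit_m, scale_denom):
    """Convert an error measured on the map to ground meters at the given scale."""
    return map_error * unit_m * scale_denom

# A 2 mm hand-plotting error on a 1:100,000 base map:
print(ground_meters(2, MM_M, 100000))                  # -> 200.0 meters
# The .003" ARC/INFO digitizing target at 1:24,000:
print(round(ground_meters(0.003, INCH_M, 24000), 2))   # -> 1.83 meters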
There are no good guidelines, only the knowledge of the persons involved in creating the data or coverage. To summarize the various cases:

* When the digitized data comes from the printed map itself, well-defined features such as roads, bridges, buildings, and stream center lines should conform to NMAS, and no additional compilation error is accumulated.

* When digitized data is located and plotted on a map based on well-defined features such as roads, bridges, corners of large buildings, centers of smaller buildings, and monumented benchmarks, plotting error should be in the range of .5mm to 1mm, with luck. Otherwise, when "eye-balling" in a line from another map, I think 2mm is pretty generous.

* When data represents poorly located features, or vague, interpreted, or transitional boundaries (such as soils, geologic units, hand-drawn contours, or forest/pasture land cover classes), map scale is much less important than a well-educated guess by the data developer as to the inherent accuracy of the source material. Do the best you can. If you find that the error from this step is large, don't waste time figuring out the following items--they won't affect the outcome that much.

*)BASE MAP ERROR
The next step in estimating the positional accuracy of a coverage is to determine the accuracy of the base map. Traditionally, scale has been used as a proxy for positional accuracy, mainly because USGS base maps are supposed to conform to National Map Accuracy Standards (NMAS) based on their scale. Typically, USGS quad maps are used as base maps for digitizing, and hence the coverage develops an inherent "accuracy" from its base map. NMAS measures for various scales are listed in the accompanying table.

*)DIGITIZER TRANSFORMATION ERROR
The digitizer transformation error is given by ARC/INFO as the RMS error when map tics are entered at the digitizer. Up to now, digitizer RMS errors have rarely been recorded, but in general practice, values of .003" up to .01" have been used. If the RMS value is unknown, use a value in the middle of that range, like .005" or .006". Unless the map was really off, this is not usually a significant source of error. Convert map inches at scale to real-world meters using the table.

*)DIGITIZER OPERATOR ERROR
Error from the act of digitizing is a hard one to quantify, but I believe we can make some guesses. If the operator was very, very careful and working with a small number of easily identifiable points, give them an error of .01" (category 1). If the operator was very, very careful but doing lots of digitizing, give them an error of .02" (category 2). If the operator was doing lots of digitizing of curved boundaries or contours, give them an error of .04" (category 3). Convert map inches at scale to real-world meters using the table. You might want to consider things like how the operator handled thick, hand-drawn lines--whether they went down the middle or had to interpret contour lines as they went along.

*)PROCESSING ERRORS
Errors from processing can come from using CLEAN with a high fuzzy tolerance, or GENERALIZE with a large weed distance, or digitizing with high values for weed distance or snap distance. Fuzzy tolerances are recorded in the log file of the coverage, but other values are hard to recover if they were not recorded at the time of coverage creation. Do what you can with the log file, and don't mess with very small (<1.0 meter) values. For CLEANs, use the value of the largest fuzzy tolerance used.
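Pulling the factors together, here is a sketch of how one coverage's deductive estimate might be computed (Python, for illustration only; the function name, argument names, and operator-category table are mine, and the numbers plugged in are the ALLUV100 figures worked through in Example 2 below):

import math

INCH_M = 0.0254                              # meters per map inch
OPERATOR_IN = {1: 0.01, 2: 0.02, 3: 0.04}    # operator-category guesses from above, in map inches

def positional_error(scale_denom, nmas_m, rms_in, operator_cat, fuzzy_tol_m, compilation_m=0.0):
    """Root-sum-square of the factor estimates, all in ground meters."""
    factors = [
        compilation_m,                                       # data compilation
        nmas_m,                                              # base map (NMAS value from table)
        rms_in * INCH_M * scale_denom,                       # digitizer transformation RMS
        OPERATOR_IN[operator_cat] * INCH_M * scale_denom,    # digitizer operator
        fuzzy_tol_m,                                         # largest CLEAN fuzzy tolerance
    ]
    return math.sqrt(sum(f * f for f in factors))

# ALLUV100: 2mm compilation error at 1:100,000 (200 m), ~.005" RMS, category 3 operator, 50 m fuzzy tolerance
print(round(positional_error(100000, 50.8, 0.005, 3, 50.0, compilation_m=200.0)))   # -> 236 meters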
*)MISCELLANEOUS STUFF
When data is located using PLSS township, range, section and QQQQ, etc. (using John's ADD_UTM program), the point will be within:

    SECTION = 800-1130 meters
    Quarter = 400-570 meters
    QQ      = 200-280 meters
    QQQ     = 100-141 meters

From here on, the accuracy of the section corners themselves kicks in:

    QQQQ    = 75 meters  ((70**2 + 22**2)**1/2)
    QQQQQ   = 40 meters
    QQQQQQ  = 28 meters

When using conversions from latitude/longitude data, the number of digits in the fractional part of the number becomes important. At 42 degrees North, a degree measured along a parallel (a degree of longitude) is 82,853 meters; between 41 and 42 degrees North, a degree measured along a meridian (a degree of latitude) is 111,062 meters. Therefore one second is about 23 meters wide and 31 meters tall, so seconds should be carried to the 1/100 place, i.e. dd mm ss.00, and decimal degrees to the 6th place, i.e. dd.000000 (see the short sketch at the end of this worksheet).

*)EXAMPLE 1--PLSS DATA
1) Data compilation errors: Because we digitized points printed on the map, we assume no additional data compilation error because the map conforms to NMAS.
2) Base map error: Use the NMAS value from the table for 1:24,000 scale topo quads--12.19 meters.
   SQUARED ERROR = 12.19 x 12.19 = 148.60
3) Digitizer transformation error: The Perkin-Elmer digitizing program handled conversion of digitizer units to map projection units. A built-in test made sure tics were within 2 digitizer units (1/50") of where they were supposed to be. At 1:24,000, 1/50" = 12.19 meters.
   SQUARED ERROR = 12.19 x 12.19 = 148.60
4) Digitizer operator error: Digitizing of PLSS was done by one person only, Madhukar, who was usually very careful. I think he probably fits into category 2 (careful digitizing of many point locations): .02" at 1:24K = 12.19 meters.
   SQUARED ERROR = 12.19 x 12.19 = 148.60
5) Processing errors: We first cleaned these with a 24 meter fuzzy tolerance, back in the good old, stupid days. After we figured out how much that screwed up the section corners, we used much smaller values. I cannot find a complete summary of how we reprocessed the data, but I think it usually went something like this: CLEAN 5 meters, 2 x CLEAN 1 meter, 2 x CLEAN 3 meters. The largest value used was 5 meters.
   SQUARED ERROR = 5 x 5 = 25

ROOT-SUM-SQUARED calculation: (148.6 + 148.6 + 148.6 + 25)**1/2 = (470.8)**1/2 = 21.7 meters

*)EXAMPLE 2--ALLUV100, Alluvial deposits
1) Data compilation errors: Hand transferred and plotted from various scales of maps; error around 2mm, at 100k--200 meters.
   SQUARED ERROR = 200 x 200 = 40000
2) Base map error: 100k county maps, using the NMAS value from the table--50.8 meters.
   SQUARED ERROR = 50.8 x 50.8 = 2580.64
3) Digitizer transformation error: Unrecorded, probably close to .005"; at 100k--12.7 meters.
   SQUARED ERROR = 12.7 x 12.7 = 161.29
4) Digitizer operator error: Digitizing was done mainly by one person, Bob Rosenburg. This probably fits category 3, digitizing lots of curved lines; error around .04", at 100k--101.6 meters.
   SQUARED ERROR = 101.6 x 101.6 = 10322.56
5) Processing errors: CLEAN (I guessed at this) with a fuzzy tolerance of 50 meters.
   SQUARED ERROR = 50 x 50 = 2500

ROOT-SUM-SQUARED calculation: (40000 + 2580 + 161 + 10323 + 2500)**1/2 = (55564)**1/2 = 236 meters

As you can see, the data compilation error had the greatest effect on the overall error, so estimating this step well is important.
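Finally, a small sketch of the latitude/longitude precision check mentioned under MISCELLANEOUS STUFF (Python, for illustration only; the degree lengths are the figures quoted above for roughly 42 degrees North):

DEG_LON_M = 82853.0    # meters per degree measured along a parallel (longitude) at 42 N
DEG_LAT_M = 111062.0   # meters per degree measured along a meridian (latitude), 41-42 N

# Ground size of one second of arc:
print(DEG_LON_M / 3600, DEG_LAT_M / 3600)                  # ~23 m wide, ~31 m tall
# Ground size of the last digit we propose to keep:
print(0.01 * DEG_LON_M / 3600, 0.01 * DEG_LAT_M / 3600)    # dd mm ss.00 -> roughly 0.23 x 0.31 m
print(1e-6 * DEG_LON_M, 1e-6 * DEG_LAT_M)                  # dd.000000   -> roughly 0.08 x 0.11 m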