ACCURACY ASSESSMENT WORKSHEET                                              2/15/93

After reading several articles on the subject, and spurred by the data quality report that came with the ARC/USA data, I have tried to come up with a workable scheme to help us better include data quality information in our documentation of GIS coverages.

Data quality in the past was generally understood to mean the positional accuracy of the coordinates. Today, data quality has come to include, in addition to POSITIONAL ACCURACY: LINEAGE (what the source materials were and how the coverage was derived from them), ATTRIBUTE ACCURACY (do the attributes correctly describe the geographic entities), LOGICAL CONSISTENCY (are the features topologically correct--no dangling nodes, unclosed polygons, etc.), and COMPLETENESS (is anything missing; were certain features left out because of small size or area). Sometimes TIMELINESS (how current the data is) is included in the data quality list. Many of these areas I believe we already handle in our documentation. Some of these items we do know about but don't report.

For positional accuracy, I think there are some things we need to do better to describe what we know about the coordinates in a coverage, but we currently have no procedure for it. In the past, we have used the scale of the USGS base map as an estimate of the positional accuracy of the coverage. In some cases this is necessary information and should be part of the accuracy assessment. In other cases it has little to do with the real accuracy of our data, but was a convenient way to say something about it.

What I have come up with is based on the ESRI procedure used with the ARC/USA data. It is called a deductive estimate of the positional accuracy, based on errors that may(!) have occurred in each production step. The total error is found using a root-sum-square calculation, which fortunately means that when you add everything up, the error isn't as bad as it sounds at first! Of course it may have nothing to do with reality, but as yet we don't have any way to test the data against reality. So we'll do the best we can without getting too bent out of shape, I hope.

To calculate the overall positional accuracy, we will try to evaluate as many of the following factors as we can: error introduced by the scale of the base map, digitizer transformation error, digitizer operator error, processing errors, and data compilation errors. For many of these factors we can only make an educated guess, either because we didn't record some processing step or message, or because there just isn't any good way to measure something like operator error or compilation error. Some of our data just has a nebulous or vague quality about it anyway, so don't make a big deal about calculating the little stuff. For coverages like PLSS24, ROAD100, and RIVER100 it is important to calculate these numbers because they will be used in so many ways. For others, it's more important to estimate the nebulousness.
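Since the whole procedure boils down to a root-sum-square of the individual error estimates, here is a minimal sketch of the arithmetic (Python, purely for illustration; the factor values shown are the PLSS24 numbers worked out in Example 1 at the end of this worksheet, not new measurements):

import math

def root_sum_square(errors_m):
    """Combine independent error estimates (in ground meters) into one overall figure."""
    return math.sqrt(sum(e * e for e in errors_m))

# Example 1 (PLSS24) factor estimates, already converted to ground meters:
factors = [12.19,   # base map (NMAS, 1:24,000)
           12.19,   # digitizer transformation
           12.19,   # digitizer operator
           5.0]     # processing (largest CLEAN fuzzy tolerance)
print(round(root_sum_square(factors), 1))   # -> 21.7 meters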
                                 PLANIMETRIC ACCURACY
                    GROUND DISTANCES IN METERS AT VARIOUS MAP SCALES

                        | Minimum    Arc/Info
                        | Digitizer  Digitizing                    Hand-drawn
              NMAS(90%) | Resolution Target                        maps
MAP SCALE       METERS  |  .001"     .003"    .010"    .020"     .5mm      1mm      2mm
========================|==============================================================
1:2,000,000    1016.00  |  50.80    152.40   508.00  1016.00  1000.00  2000.00  4000.00
1:1,000,000     508.00  |  25.40     76.20   254.00   508.00   500.00  1000.00  2000.00
1:500,000       254.00  |  12.70     38.10   127.00   254.00   250.00   500.00  1000.00
1:250,000       127.00  |   6.35     19.05    63.50   127.00   125.00   250.00   500.00
1:100,000        50.80  |   2.54      7.62    25.40    50.80    50.00   100.00   200.00
1:62,500         31.75  |   1.59      4.76    15.88    31.75    31.25    62.50   125.00
1:24,000         12.19  |   0.61      1.83     6.10    12.19    12.00    24.00    48.00
1:15,840         13.41  |   0.40      1.21     4.02     8.05     7.92    15.84    31.68
1:12,000         10.16  |   0.30      0.91     3.05     6.10     6.00    12.00    24.00
1:7,920 (660')    6.71  |   0.20      0.60     2.01     4.02     3.96     7.92    15.84
1:4,800 (400')    4.06  |   0.12      0.37     1.22     2.44     2.40     4.80     9.60
1:1,200 (100')    1.02  |   0.03      0.09     0.30     0.61     0.60     1.20     2.40

*)DATA COMPILATION ERRORS
The first step in estimating the positional error is to determine how the data was compiled on the map that was digitized. There are two cases: digitizing data already printed on existing maps, and digitizing information that was added by hand to a base map. Data compilation errors can be the most significant errors in the estimation of coverage accuracy, particularly for geographic features plotted or drawn by hand on base maps. Factors affecting the compilation are how much care was used in plotting the features (were they "eye-balled" or plotted using a light table or zoom transfer scope), what features were used to locate the object's position, and, probably the most important factor, the real nature or inherent accuracy of whatever the original data tries to represent.

For features digitized directly from USGS maps, like roads and rivers, it is assumed that the data compilation errors are small enough that the maps conform to NMAS. NMAS accuracy measures really only apply to well-defined points like road intersections. It is not known to what extent other features on the USGS maps deviate from those measures, but it is not hard to imagine that the green forest areas and the shorelines of water bodies are very dependent on the date of the aerial photography used to create the quad map.

For digitized lines or points that were plotted by hand, consideration must be given to which features on the base map were used to locate the object, say a well or a field boundary, and place it accurately on the map. If roads or PLSS section corners are used to locate objects, the major factor becomes the compiler's ability to measure the distance from the object to the known feature accurately. Using a standard ruler, that ability may only translate into a plottable accuracy of 1/32" or .5mm. When no plotting aids are used to transfer a line from one map to the base map, a 1mm to 2mm error is probably generous.

If the position of the data is poorly known, or the data itself represents nebulous or vague boundaries, there is little point in worrying about base map, digitizer, or operator error. I believe, however, that it is important to give even a rough estimate of how poorly known the position may be. It is better to be on the conservative side than to be overly optimistic about the locational accuracy. This way, it is hoped, the data will not be pushed past its useful limits because no warning was given about its true locational nature.
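All of the "map inches (or millimeters) at scale" conversions used below come straight from multiplying the map-distance error by the scale denominator. Here is a small sketch of that conversion (Python, for illustration only), reproducing a couple of entries from the table above:

INCH_M = 0.0254   # meters per map inch
MM_M   = 0.001    # meters per map millimeter

def ground_meters(map_error, unit_m, scale_denom):
    """Convert an error measured on the map to ground meters at the given scale."""
    return map_error * unit_m * scale_denom

# A 2 mm hand-plotting error on a 1:100,000 base map:
print(ground_meters(2, MM_M, 100000))                  # -> 200.0 meters
# The .003" ARC/INFO digitizing target at 1:24,000:
print(round(ground_meters(0.003, INCH_M, 24000), 2))   # -> 1.83 meters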
There are no good guidelines, only the knowledge of the persons involved in creating the data or coverage. To summarize the various cases:

* When the digitized data comes from the printed map itself, well-defined features such as roads, bridges, buildings, and stream center lines should conform to NMAS, and no additional compilation error is accumulated.

* When digitized data is located and plotted on a map based on well-defined features such as roads, bridges, corners of large buildings, centers of smaller buildings, and monumented benchmarks, plotting error should be in the range of .5mm to 1mm, with luck. Otherwise, when "eye-balling" in a line from another map, I think 2mm is pretty generous.

* When data represents poorly located features, or vague, interpreted, or transitional boundaries (such as soils, geologic units, hand-drawn contours, or forest/pasture land cover classes), map scale is much less important than a well-educated guess by the data developer as to the inherent accuracy of the source material. Do the best you can. If you find that the error from this step is large, don't waste time figuring out the following items--they won't affect the outcome that much.

*)BASE MAP ERROR
The next step in estimating the positional accuracy of a coverage is to determine the accuracy of the base map. Traditionally, scale has been used as a proxy for positional accuracy, mainly because USGS base maps are supposed to conform to National Map Accuracy Standards (NMAS) based on their scale. Typically, USGS quad maps are used as base maps for digitizing, and hence the coverage develops an inherent "accuracy" from its base map. NMAS measures for various scales are listed in the accompanying table.

*)DIGITIZER TRANSFORMATION ERROR
The digitizer transformation error is given by ARC/INFO as the RMS error when map tics are entered at the digitizer. Up to now, digitizer RMS errors have rarely been recorded, but in general practice, values of .003" up to .01" have been used. If the RMS value is unknown, use a value in the middle of that range, like .005" or .006". Unless the map was really off, this is not usually a significant source of error. Convert map inches at scale to real-world meters using the table.

*)DIGITIZER OPERATOR ERROR
Error from the act of digitizing is a hard one to quantify, but I believe we can make some guesses. If the operator was very, very careful and working with a small number of easily identifiable points, give them an error of .01" (category 1). If the operator was very, very careful but doing lots of digitizing, give them an error of .02" (category 2). If the operator was doing lots of digitizing of curved boundaries or contours, give them an error of .04" (category 3). Convert map inches at scale to real-world meters using the table. You might want to consider things like how the operator handled thick, hand-drawn lines--whether they went down the middle or had to interpret contour lines as they went along.

*)PROCESSING ERRORS
Errors from processing can come from using CLEAN with a high fuzzy tolerance, or GENERALIZE with a large weed distance, or digitizing with high values for weed distance or snap distance. Fuzzy tolerances are recorded in the log file of the coverage, but other values are hard to recover if they were not recorded at the time of coverage creation. Do what you can with the log file, and don't mess with very small (<1.0 meter) values. For CLEANs, use the value of the largest fuzzy tolerance used.
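Pulling the factors together, here is a sketch of how one coverage's deductive estimate might be computed (Python, for illustration only; the function name, argument names, and operator-category table are mine, and the numbers plugged in are the ALLUV100 figures worked through in Example 2 below):

import math

INCH_M = 0.0254                              # meters per map inch
OPERATOR_IN = {1: 0.01, 2: 0.02, 3: 0.04}    # operator-category guesses from above, in map inches

def positional_error(scale_denom, nmas_m, rms_in, operator_cat, fuzzy_tol_m, compilation_m=0.0):
    """Root-sum-square of the factor estimates, all in ground meters."""
    factors = [
        compilation_m,                                       # data compilation
        nmas_m,                                              # base map (NMAS value from table)
        rms_in * INCH_M * scale_denom,                       # digitizer transformation RMS
        OPERATOR_IN[operator_cat] * INCH_M * scale_denom,    # digitizer operator
        fuzzy_tol_m,                                         # largest CLEAN fuzzy tolerance
    ]
    return math.sqrt(sum(f * f for f in factors))

# ALLUV100: 2mm compilation error at 1:100,000 (200 m), ~.005" RMS, category 3 operator, 50 m fuzzy tolerance
print(round(positional_error(100000, 50.8, 0.005, 3, 50.0, compilation_m=200.0)))   # -> 236 meters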
*)MISCELLANEOUS STUFF
When data is located using PLSS township, range, section and QQQQ, etc. (using John's ADD_UTM program), the point will be within:

    SECTION = 800-1130 meters
    Quarter = 400-570 meters
    QQ      = 200-280 meters
    QQQ     = 100-141 meters

From here on, the accuracy of the section corners themselves kicks in:

    QQQQ    = 75 meters  ((70**2 + 22**2)**1/2)
    QQQQQ   = 40 meters
    QQQQQQ  = 28 meters

When using conversions from latitude/longitude data, the number of digits in the fractional part of the number becomes important. At 42 degrees North, a degree measured along a parallel (a degree of longitude) is 82,853 meters; between 41 and 42 degrees North, a degree measured along a meridian (a degree of latitude) is 111,062 meters. Therefore one second is about 23 meters wide and 31 meters tall, so seconds should be carried to the 1/100 place, i.e. dd mm ss.00, and decimal degrees to the 6th place, i.e. dd.000000 (see the short sketch at the end of this worksheet).

*)EXAMPLE 1--PLSS DATA
1) Data compilation errors: Because we digitized points printed on the map, we assume no additional data compilation error because the map conforms to NMAS.
2) Base map error: Use the NMAS value from the table for 1:24,000 scale topo quads--12.19 meters.
   SQUARED ERROR = 12.19 x 12.19 = 148.60
3) Digitizer transformation error: The Perkin-Elmer digitizing program handled conversion of digitizer units to map projection units. A built-in test made sure tics were within 2 digitizer units (1/50") of where they were supposed to be. At 1:24,000, 1/50" = 12.19 meters.
   SQUARED ERROR = 12.19 x 12.19 = 148.60
4) Digitizer operator error: Digitizing of PLSS was done by one person only, Madhukar, who was usually very careful. I think he probably fits into category 2 (careful digitizing of many point locations): .02" at 1:24K = 12.19 meters.
   SQUARED ERROR = 12.19 x 12.19 = 148.60
5) Processing errors: We first cleaned these with a 24 meter fuzzy tolerance, back in the good old, stupid days. After we figured out how much that screwed up the section corners, we used much smaller values. I cannot find a complete summary of how we reprocessed the data, but I think it usually went something like this: CLEAN 5 meters, 2 x CLEAN 1 meter, 2 x CLEAN 3 meters. The largest value used was 5 meters.
   SQUARED ERROR = 5 x 5 = 25

ROOT-SUM-SQUARED calculation: (148.6 + 148.6 + 148.6 + 25)**1/2 = (470.8)**1/2 = 21.7 meters

*)EXAMPLE 2--ALLUV100, Alluvial deposits
1) Data compilation errors: Hand transferred and plotted from various scales of maps; error around 2mm, at 100k--200 meters.
   SQUARED ERROR = 200 x 200 = 40000
2) Base map error: 100k county maps, using the NMAS value from the table--50.8 meters.
   SQUARED ERROR = 50.8 x 50.8 = 2580.64
3) Digitizer transformation error: Unrecorded, probably close to .005"; at 100k--12.7 meters.
   SQUARED ERROR = 12.7 x 12.7 = 161.29
4) Digitizer operator error: Digitizing was done mainly by one person, Bob Rosenburg. This probably fits category 3, digitizing lots of curved lines; error around .04", at 100k--101.6 meters.
   SQUARED ERROR = 101.6 x 101.6 = 10322.56
5) Processing errors: CLEAN (I guessed at this) with a fuzzy tolerance of 50 meters.
   SQUARED ERROR = 50 x 50 = 2500

ROOT-SUM-SQUARED calculation: (40000 + 2580 + 161 + 10323 + 2500)**1/2 = (55564)**1/2 = 236 meters

As you can see, the data compilation error had the greatest effect on the overall error, so estimating this step well is important.
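Finally, a small sketch of the latitude/longitude precision check mentioned under MISCELLANEOUS STUFF (Python, for illustration only; the degree lengths are the figures quoted above for roughly 42 degrees North):

DEG_LON_M = 82853.0    # meters per degree measured along a parallel (longitude) at 42 N
DEG_LAT_M = 111062.0   # meters per degree measured along a meridian (latitude), 41-42 N

# Ground size of one second of arc:
print(DEG_LON_M / 3600, DEG_LAT_M / 3600)                  # ~23 m wide, ~31 m tall
# Ground size of the last digit we propose to keep:
print(0.01 * DEG_LON_M / 3600, 0.01 * DEG_LAT_M / 3600)    # dd mm ss.00 -> roughly 0.23 x 0.31 m
print(1e-6 * DEG_LON_M, 1e-6 * DEG_LAT_M)                  # dd.000000   -> roughly 0.08 x 0.11 m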