Data Quality and Database Management within the Geospatial environment

Thitipat Wongsawan
2 min read · Nov 20, 2021

Data Quality and Database Management within the geospatial environment has many points that should be scoped and checked. I'll summarize data quality and database management under three topics.

1. Data Specification: Good primary data is the source of a good database, so when starting work we need a good data specification. A good data specification has to be clear, comprehensive, and free of duplicate collection, so taking time to research the data specification is the best route. Of course, it's difficult to build the best data specification and database without any fixes along the way, but we can keep the fix rate low if our research is good enough. We can then use the data specification to write a user data manual and use it to train operators.
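
As a rough illustration, a specification can also be written in machine-readable form so the same rules that train operators can drive automated checks. This is only a minimal sketch: the field names, allowed categories, and coordinate bounds below are assumptions for the example, not a fixed standard.

```python
# A minimal sketch of a machine-readable data specification for a POI layer.
# Field names, allowed categories, and coordinate bounds are illustrative assumptions.

POI_SPEC = {
    "name":     {"type": str,   "required": True},
    "category": {"type": str,   "required": True,
                 "allowed": {"restaurant", "fuel", "hospital", "school"}},
    "lat":      {"type": float, "required": True, "range": (-90.0, 90.0)},
    "lon":      {"type": float, "required": True, "range": (-180.0, 180.0)},
}

def validate_record(record: dict, spec: dict = POI_SPEC) -> list[str]:
    """Return a list of specification violations for one collected record."""
    errors = []
    for field, rule in spec.items():
        if field not in record:
            if rule.get("required"):
                errors.append(f"missing required field: {field}")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: '{value}' not in allowed values")
        if "range" in rule:
            lo, hi = rule["range"]
            if not (lo <= value <= hi):
                errors.append(f"{field}: {value} outside [{lo}, {hi}]")
    return errors

# Example: a record with an out-of-range longitude fails the spec.
print(validate_record({"name": "Central Cafe", "category": "restaurant",
                       "lat": 13.75, "lon": 200.5}))
```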

2. Data trust level: In the age of disruption we receive big data from our work. The first thing we should do when data arrives is assign it a trust level, because each trust level gets its own QC method, man-hours, tools, and work period. We can identify the trust level from several keys, such as the operator's work history, the average quality of the data collection operation, and internal news. After we identify the trust level we can send each dataset into a suitable workflow. If the trust level is low we should apply full QC: record all mistaken points > conclude the mistaken cases > find tools to check and recheck > find tools to cross-check the QC again, and so on. If the trust level is high we can drop some of the time-consuming steps after checking many times without finding any errors.
Example of four trust levels (a short code sketch of this mapping follows the list):
Level 1: 100% QC by tools > 100% QC by operators > 100% cross-check by tools > 80% cross-check by operators.
Level 2: 100% QC by tools > 90% QC by operators > 100% cross-check by tools > 70% cross-check by operators.
Level 3: 100% QC by tools > 80% QC by operators > 100% cross-check by tools > 60% cross-check by operators.
Level 4: 100% QC by tools > 70% QC by operators > 100% cross-check by tools > 50% cross-check by operators.
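
A minimal sketch of how these example levels could drive a QC plan. The percentages mirror the list above; the step names and the sampling helper are assumptions for illustration, not a prescribed workflow.

```python
# Map each trust level to the fraction of records covered by each QC step,
# following the four example levels above. The sampler is an illustrative helper.
import random

QC_PLAN = {
    1: {"tool_qc": 1.0, "operator_qc": 1.0, "tool_crosscheck": 1.0, "operator_crosscheck": 0.8},
    2: {"tool_qc": 1.0, "operator_qc": 0.9, "tool_crosscheck": 1.0, "operator_crosscheck": 0.7},
    3: {"tool_qc": 1.0, "operator_qc": 0.8, "tool_crosscheck": 1.0, "operator_crosscheck": 0.6},
    4: {"tool_qc": 1.0, "operator_qc": 0.7, "tool_crosscheck": 1.0, "operator_crosscheck": 0.5},
}

def sample_for_step(record_ids: list, trust_level: int, step: str, seed: int = 42) -> list:
    """Pick the subset of records that a given QC step should cover."""
    fraction = QC_PLAN[trust_level][step]
    k = round(len(record_ids) * fraction)
    return random.Random(seed).sample(record_ids, k)

# Example: a low-trust (level 1) batch sends 80% of records to operator cross-check,
# while a high-trust (level 4) batch sends only 50%.
batch = list(range(1000))
print(len(sample_for_step(batch, 1, "operator_crosscheck")))  # 800
print(len(sample_for_step(batch, 4, "operator_crosscheck")))  # 500
```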

3. Data Freshness: Real-world geospatial data changes all the time. We can update the data from the real world into our database, but we can't update the whole map database in real time (unless we let every user update the data and send it to us continuously). So the data freshness rate is important: we have to choose how many times per year each area of data is updated. If an area is not updated for a long time, many places and many POIs will be lost and our map database will give mistaken navigation. At the same time, data freshness varies directly with cost, time, and man-hours. So our homework is to identify the required freshness of every place and set a smooth maintenance plan that matches cost, time, and man-hours.
The keys to identify the freshness rate can change following company policy and mapping customers, and we'll find them in customer history, feedback, and environment data.
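A minimal sketch of such a maintenance plan: the change scores, survey costs, and budget below are made-up inputs for illustration; in practice the keys would come from customer history, feedback, and environment data.

```python
# Allocate resurveys per year to each area, proportional to how fast it changes
# and constrained by a total budget. All numbers are illustrative assumptions.

areas = [
    {"name": "downtown",   "change_score": 0.9, "cost_per_survey": 400},
    {"name": "suburbs",    "change_score": 0.4, "cost_per_survey": 250},
    {"name": "rural_road", "change_score": 0.1, "cost_per_survey": 150},
]

ANNUAL_BUDGET = 3000  # total spend available for resurveys per year

def plan_updates(areas: list[dict], budget: float) -> dict[str, int]:
    """Split the budget by change score, then convert each share into survey visits."""
    total_score = sum(a["change_score"] for a in areas)
    plan = {}
    for a in areas:
        share = budget * a["change_score"] / total_score
        plan[a["name"]] = max(1, int(share // a["cost_per_survey"]))  # at least one visit
    return plan

print(plan_updates(areas, ANNUAL_BUDGET))
# {'downtown': 4, 'suburbs': 3, 'rural_road': 1}
```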

If we can clear these three topics, our data quality and database management will flow smoothly; the rest is up to our tools.
