Abstract:
Geocoding is the process of converting addresses into spatial coordinates. It has become essential in many fields, such as strategy making, disaster management, location-based analysis, and infrastructure planning. Address matching is critical to geocoding and depends on the language, format, and components of the addresses. Various algorithms, tools, and applications have been designed to improve address matching. The goal of this research was to standardize and geocode addresses using different Natural Language Processing (NLP) models. Islamabad was chosen as the study area because it contains both standard and unusual addresses, which makes it suitable for standardizing and geocoding distinct types of addresses. Address datasets were obtained from telecommunication companies operating in the study area. Two NLP models, DeepParse (DP) and spaCy's Named Entity Recognition (NER), were trained and used for address standardization. Each model parses addresses using a different technique. The spaCy model performed well, with an accuracy of over 80% for all types of addresses. The DP model outperformed the spaCy model only in urban areas, where it reached an accuracy of 95%; for the other address types its accuracy was below 80%. This study also examined which types of addresses should be used to improve geocoding. The methods discussed in this study could help achieve a higher address-matching success rate in the geocoding process and guide the choice of the best technique for matching addresses in Pakistan.
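As a rough illustration of how addresses can be prepared for training an NER model, the sketch below joins an address that has already been split into labelled components and records each component's character offsets in the `(text, {"entities": [(start, end, label)]})` shape commonly used for spaCy NER training data. The helper name `to_spacy_example`, the component labels (`HOUSE`, `STREET`, `SECTOR`, `CITY`) and the sample address are hypothetical; the abstract does not describe the study's label set or preprocessing.

```python
from typing import Dict, List, Tuple

# (start, end, label) character span, end-exclusive as in Python slicing.
Span = Tuple[int, int, str]


def to_spacy_example(
    components: List[Tuple[str, str]], sep: str = ", "
) -> Tuple[str, Dict[str, List[Span]]]:
    """Hypothetical helper: join labelled address components into one string
    and record each component's character offsets as an entity span."""
    text_parts: List[str] = []
    entities: List[Span] = []
    offset = 0
    for i, (value, label) in enumerate(components):
        if i > 0:
            # Account for the separator inserted between components.
            text_parts.append(sep)
            offset += len(sep)
        start, end = offset, offset + len(value)
        entities.append((start, end, label))
        text_parts.append(value)
        offset = end
    return "".join(text_parts), {"entities": entities}


# Illustrative Islamabad-style address with hypothetical labels.
text, annotations = to_spacy_example([
    ("House 12", "HOUSE"),
    ("Street 5", "STREET"),
    ("F-7/2", "SECTOR"),
    ("Islamabad", "CITY"),
])
print(text)
print(annotations)
```

In spaCy v3, such a pair can be turned into a training example with `Example.from_dict(nlp.make_doc(text), annotations)`. The offsets should be checked against the tokenizer's output, because entity spans that do not fall on token boundaries cannot be aligned to tokens.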