CHAPTER 2: Some specific issues related to geographical names data modelling
The special characteristics of place names data, compared to other geospatial data, are usually related to data modelling needs. The following sections provide some tentative principles for a general conceptual model for place names information as well as the core elements of the model and the interrelationships and key attributes of the elements. The model presented here, in its broadest application, is intended for modelling place name information in a multilingual, multi-names, and multi-scriptural environment.
2.2 Place names data modelling related standards, manuals, or guidelines
2.2.1 Modelling
Some current ISO/OGC related references for geospatial data modelling are:
- ISO 19101-1:2014, Geographic information - Reference model - Part 1: Fundamentals
- ISO/TS 19103:2015, Geographic information — Conceptual schema language
- ISO 19107:2019, Geographic information - Spatial schema
- ISO 19137:2007, Geographic information - Core profile of the spatial schema
- ISO 19109:2015, Geographic information - Rules for application schema
- ISO 19131:2007, Geographic information - Data product specifications
- ISO 19104:2016, Geographic information - Terminology
- ISO 19112:2019, Geographic information - Spatial referencing by geographic identifiers
- INSPIRE Data Specification on Geographical Names (European Commission)
- Open Regional Gazetteer (EuroGeographics)
Data instance identifiers must be permanent and unique within the data set. Combined with other identifiers (such as a data set identifier), they can be used to create a permanent URI, e.g., for 'Linked Data' purposes.
A current ISO/OGC related reference for the generation of universally unique identifiers is:
- ITU-T X.667 = ISO/IEC 9834-8:2014, Generation of universally unique identifiers and their use in object identifiers
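As a minimal sketch of the identifier principle above, a UUID in the sense of ITU-T X.667 can serve as the permanent local identifier and be combined with a data set identifier into a URI. The base URL `https://example.org/id`, the data set identifier `placenames`, and the function names are hypothetical and chosen only for illustration.

```python
import uuid

# Hypothetical namespace of the data provider; not a real service.
BASE_URL = "https://example.org/id"


def new_local_id() -> str:
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def make_place_uri(dataset_id: str, local_id: str) -> str:
    """Combine a data set identifier and a local identifier into a URI."""
    return f"{BASE_URL}/{dataset_id}/{local_id}"


local_id = new_local_id()
uri = make_place_uri("placenames", local_id)
print(uri)
```

For the identifier to stay permanent, it would be generated once when the data instance is created and then stored, never regenerated on export.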
The selection of appropriate metadata elements depends on the application. Some current ISO/OGC related and other references for metadata for geospatial and geographical names data are:
- ISO 19115-1:2014, Geographic information — Metadata — Part 1: Fundamentals
- INSPIRE Metadata
Geometry
The “geometries” of named places/features are often vague. Different practices and rationales exist in different countries, data sets and applications. The previously mentioned ISO 19107:2019 introduces the geometric primitives point, curve, and surface, as well as the geometric complex (a combination of primitives). File and data exchange formats (e.g., GML, GeoJSON) specify the notations to be used, for example Point, LineString, Polygon. Every named place must have a reference point as its fundamental geometry.
Some current references are:
- ISO 19111:2019, Geographic information — Referencing by coordinates
- EPSG, Geodetic Parameter Dataset
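The reference-point requirement can be illustrated with a GeoJSON Feature. This is a sketch only: the property names `name` and `featureType` are hypothetical, and the coordinates for Athens are approximate. GeoJSON (RFC 7946) expects WGS 84 coordinates in longitude, latitude order.

```python
import json

# Approximate WGS 84 coordinates for Athens, longitude first (GeoJSON order).
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [23.73, 37.98]},
    # Property names are illustrative, not taken from any specification.
    "properties": {"name": "Athína", "featureType": "populated place"},
}

# ensure_ascii=False keeps diacritics readable instead of escaping them.
text = json.dumps(feature, ensure_ascii=False)
print(text)
```

A data set that also stores a curve or surface geometry for the same place could keep this point alongside it as the fundamental geometry.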
Many national feature type catalogues are in use. An appropriate feature type classification depends on the application. The following references provide the overall ISO methodology as well as examples of a general first-level and second-level classification with clear definitions, which could also work as a global standard:
- ISO 19110:2016, Geographic information — Methodology for feature cataloguing
- Examples for regional feature type catalogues
- EuroGeographics Regional Gazetteer - Documentation: https://ome-download-data.s3.eu-west-1.amazonaws.com/open-gazetteer/documents/2022-10-14_OpenRegionalGazetteer_specification.pdf
Appropriate additional geographic attributes depend on the dataset and application. Some current references for geographic attributes for geospatial data (e.g., country, administrative area, (global) grid reference, elevation, another feature type) are:
- ISO 3166, Country codes
- ISO 3166-1:2020, Codes for the representation of names of countries and their subdivisions — Part 1: Country code
- ISO 3166-2:2020, Codes for the representation of names of countries and their subdivisions — Part 2: Country subdivision code
- ISO 3166-3:2020, Codes for the representation of names of countries and their subdivisions — Part 3: Code for formerly used names of countries
- M49 Standard: United Nations Statistics Division, Standard country or area codes for statistical use (M49)
- SALB, UN Second Administrative Level Boundaries
- ISO 19170-1:2021, Geographic information — Discrete Global Grid Systems Specifications — Part 1: Core Reference System and Operations, and Equal Area Earth Reference System
Any metadata attribute(s)
Metadata for the entire dataset or product or delivery may be sufficient, depending on the dataset and application. Feature specific metadata might be introduced if appropriate, e.g., source of the geometry, data source, life span information.
2.2.5 Place name / Geographical name
UNGEGN manuals and guidelines on the standardization of geographical names are found here: https://unstats.un.org/unsd/ungegn/pubs/.
UNGEGN acknowledges that UN-GGIM seeks data specifications that follow agreed standards which are interoperable between UN-GGIM's fundamental data themes. Beyond that, UNGEGN would also like to emphasize an important aspect of geographical names: the intangible cultural heritage elements that go hand in hand with the physical characteristics relating to location identification for administration, planning, navigation, emergency response, science, resilience, etc. The sense of place, identity (both individual and collective), nation building, commemoration, language and story that go with each geographical name offer insights into much more than the 'data structure' within which this information sits. The treasure that this information reflects can be difficult to quantify, but it can elevate peoples' status and connection to the land they are part of - their place to stand and what they seek to preserve and sustain. Cultural heritage data is not owned by those who capture it, but by the people who named those places. Therefore, an element of respect and sensitivity needs to be attributed to that cultural heritage to ensure its accuracy and authenticity from the people of the place. In making this information discoverable, consideration should be given to ensuring the safety of sensitive cultural heritage data, i.e., ensuring that the people of the place are comfortable with the level of cultural heritage data provided about their place names. In doing this, a shared and positive outcome between geographical naming authorities is that equal attention to cultural heritage translates to acceptance, celebration and longevity of place names within communities.
According to the conceptual model, each named place is associated with one or several geographical names. The different geographical names of one given spatial object may be, for example, parallel names in one or different languages, or names in different forms (e.g., complete and short forms of country and administrative unit names).
Language
The current references for language codes to be used for geospatial data are:
- ISO 639, Language codes
- ISO 639-1:2002, Codes for the representation of names of languages — Part 1: Alpha-2 code
- ISO 639-2:1998, Codes for the representation of names of languages — Part 2: Alpha-3 code
- ISO 639-3:2007, Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive coverage of languages
- ISO 639-4:2010, Codes for the representation of names of languages — Part 4: General principles of coding of the representation of names of languages and related entities, and application guidelines
- ISO 639-5:2008, Codes for the representation of names of languages — Part 5: Alpha-3 code for language families and groups
- SIL International, ISO 639 Code Tables, all ISO 639 parts in a single table
During the development of the EU INSPIRE Data Specification on Geographical Names - Technical Guidelines, the different versions of ISO 639 standards were evaluated. The conclusion of the evaluation was: “Language is a major aspect of geographical names, and the choice of most appropriate codes received much attention during the preparation of this specification. The only solution enabling to code languages with sufficient details, but also enabling to code languages family as existing in some actual data sets, appeared to be a combination of the non-conflicting codes of ISO 639-3 and ISO 639-5.”
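The combination described in the INSPIRE conclusion could be sketched as a lookup that accepts a three-letter code from either standard. The small code tables below are illustrative excerpts only, and the function name `classify_language_code` is hypothetical; a real implementation would load the complete published code tables.

```python
# Illustrative excerpts only; the full code tables are much larger.
ISO_639_3 = {"ell": "Modern Greek", "eng": "English", "sme": "Northern Sami"}
ISO_639_5 = {"smi": "Sami languages", "gem": "Germanic languages"}


def classify_language_code(code: str) -> str:
    """Report which of the two code tables a three-letter code belongs to."""
    code = code.lower()
    if code in ISO_639_3:
        return "ISO 639-3"
    if code in ISO_639_5:
        return "ISO 639-5"
    raise ValueError(f"unknown language code: {code!r}")


for code in ("ell", "smi"):
    print(code, classify_language_code(code))
```

Checking the individual-language table first reflects the idea that a language family code is only needed when a data set does not record the specific language.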
Nativeness (endonym or exonym)
The simplest way of dealing with 'nativeness' is to classify a geographical name as either an 'endonym' or an 'exonym'. Endonyms are names given by native / local people; exonyms are other names, not given by native / local people. A third option besides endonym and exonym may be considered, as the classification of some types of toponyms is under discussion (e.g., names in Antarctica, undersea features). The current definitions for endonym and exonym agreed by UNGEGN are:
Endonym: Name of a →geographical feature in an official or well-established language occurring in that area where the feature is situated. Examples: Vārānasī (not Benares); Aachen (not Aix-la-Chapelle); Krung Thep (not Bangkok); Al-Uqşur (not Luxor).
Exonym: Name used in a specific language for a →geographical feature situated outside the area where that language is widely spoken, and differing in its form from the respective →endonym(s) in the area where the geographical feature is situated. Examples: Warsaw is the English exonym for Warszawa (Polish); Mailand is German for Milano; Londres is French for London; Kūlūniyā is Arabic for Köln. The officially romanized endonym Moskva for Москва is not an exonym, nor is the Pinyin form Beijing, while Peking is an exonym. The United Nations recommends minimizing the use of exonyms in international usage. See also →name, traditional.
Status of name
An appropriate list of status values depends on the data set, and the scope and application of the data set. For example, the four types of the European INSPIRE Specification (official, standardized, historical, other) may not be appropriate or sufficient for other purposes and applications. It would be useful to learn about different practices and their rationale in different countries, data sets and applications.
Any linguistic attribute(s)
Possible or appropriate attributes, such as etymology, may depend on the application. For example, the INSPIRE data specification recognizes the linguistic gender and linguistic number as attributes.
Any metadata attribute(s)
According to the dataset and application, name specific metadata, e.g., source of name, life span information, can be considered, i.e., attributes that can have different values by object/feature.
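The name-level attributes discussed in this section could be sketched as follows. This is one possible arrangement under stated assumptions, not a prescribed schema: the class and field names are hypothetical, the status values are the four INSPIRE types mentioned above, and the Warsaw example is taken from the exonym definition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Nativeness(Enum):
    ENDONYM = "endonym"
    EXONYM = "exonym"


class NameStatus(Enum):
    # The four INSPIRE values mentioned above; other data sets may need more.
    OFFICIAL = "official"
    STANDARDIZED = "standardized"
    HISTORICAL = "historical"
    OTHER = "other"


@dataclass
class GeographicalName:
    text: str
    language: str  # three-letter language code, e.g. "pol"
    nativeness: Optional[Nativeness] = None  # None when not classified
    status: Optional[NameStatus] = None  # None when not recorded
    source_of_name: Optional[str] = None  # name-specific metadata


names = [
    GeographicalName("Warszawa", "pol", Nativeness.ENDONYM),
    GeographicalName("Warsaw", "eng", Nativeness.EXONYM),
]
endonyms = [n.text for n in names if n.nativeness is Nativeness.ENDONYM]
print(endonyms)
```

Leaving `nativeness` optional accommodates the cases where neither endonym nor exonym clearly applies.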
2.2.6 Spelling of name
Each geographical name may have one or several spellings, i.e., proper ways of writing it, in one or several scripts, such as the Latin/Roman, Greek and Cyrillic scripts. All original and correct spellings shall be retained; for example, no omission or transformation of diacritical characters should be allowed.
An example:
- The city of Athens is the named place
- The endonym “Athína” (Greek language) and the exonym “Athens” (English language) are two different geographical names of this unique named place
- “Αθήνα” (Greek script) and its standard romanization “Athína” (Latin script/Romanized form) are two different spellings of the same geographical name “Athína”
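The Athens example can be written down as a small object structure: one named place, two geographical names, and one or more spellings per name. The class and field names below are hypothetical and the structure is a simplified sketch of the conceptual model, not a normative schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Spelling:
    text: str
    script: str  # ISO 15924 code, e.g. "Grek" or "Latn"


@dataclass
class GeographicalName:
    language: str  # three-letter language code
    nativeness: str  # "endonym" or "exonym"
    spellings: List[Spelling] = field(default_factory=list)


@dataclass
class NamedPlace:
    label: str  # informal label used only in this example
    names: List[GeographicalName] = field(default_factory=list)


athens = NamedPlace(
    label="City of Athens",
    names=[
        # Greek endonym with its Greek-script and romanized spellings
        GeographicalName("ell", "endonym",
                         [Spelling("Αθήνα", "Grek"), Spelling("Athína", "Latn")]),
        # English exonym with a single Latin-script spelling
        GeographicalName("eng", "exonym", [Spelling("Athens", "Latn")]),
    ],
)

for name in athens.names:
    print(name.language, [s.text for s in name.spellings])
```

Keeping spellings as a separate level means the romanized form is stored next to the original script rather than replacing it.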
At present, the UNGEGN Glossary of Terms for the Standardization of Geographical Names doesn't recognize spelling as a separately defined or described term.
Text (character content)
The current references for character content are:
- 35.040.10, ISO, Coding of character sets
- ISO/IEC 10646:2020, Information technology — Universal coded character set (UCS)
- ISO 8859 family (8-bit character encoding)
- Unicode Standard, latest version (now 15.0)
- Relation between ISO/IEC 10646 and Unicode (according to the Unicode Consortium)
- Letter database, Eesti Keele Instituut, Characters (“non-English”) needed to write a certain language in the Latin script
- Also sets requirements for the realized character content of fonts to be used
The current references for scripts are:
- ISO 15924:2022, Information and documentation — Codes for the representation of names of scripts
- Codes for the representation of names of scripts, the same ISO codes provided by Unicode
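As a rough illustration of how a script code could be attached to a spelling, the sketch below guesses the script from Unicode character names using Python's standard `unicodedata` module. This is a heuristic, not the Unicode Script property: the prefix table and the function name `guess_scripts` are hypothetical, and only the three scripts mentioned in this chapter are covered.

```python
import unicodedata

# Character-name prefix -> ISO 15924 code; only three scripts are covered here.
PREFIX_TO_SCRIPT = {"LATIN": "Latn", "GREEK": "Grek", "CYRILLIC": "Cyrl"}


def guess_scripts(text: str) -> set:
    """Return the ISO 15924 codes of the scripts the letters appear to use."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, hyphens, digits and combining marks
        prefix = unicodedata.name(ch, "").split(" ")[0]
        # "Zzzz" is the ISO 15924 code for an uncoded script.
        scripts.add(PREFIX_TO_SCRIPT.get(prefix, "Zzzz"))
    return scripts


for spelling in ("Αθήνα", "Athína", "Москва"):
    print(spelling, guess_scripts(spelling))
```

A result containing more than one code can flag a spelling that mixes look-alike letters from different scripts, which is a common data entry or conversion error.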
The current references for transliteration schemes are:
- 01.140.10, ISO, Writing and transliteration
- ISO/TC 46/WG 3, Conversion of Written Languages
- BGN/PCGN Romanization systems
- UNGEGN WG on Romanization Systems
- Library of Congress Romanization Tables
References could also be made to, for example, an official or approved orthography for a certain language.
Any linguistic attribute(s)
Possible / appropriate attributes may depend on the application or may be irrelevant.
Any metadata attribute(s)
Possible / appropriate attributes may depend on the application or may be irrelevant.
2.2.7 Pronunciation of name
Pronunciation standards are not available for the time being. No notable references can be made.
Sound link
Sound / audio files are to be considered.
IPA notation
The International Phonetic Alphabet (IPA) is the best (and only) way of systematically recording pronunciation.
Any linguistic attribute(s)
Further linguistic attributes are not considered for the time being. No notable references can be made.
Any metadata attribute(s)
Further metadata attributes on pronunciation, e.g., pronunciation specific metadata such as automated / human voice, or the native language or dialect of the human speaker, are not considered for the time being.
2.2.8 Other expression of name
Other expressions of a name, e.g., signs in sign languages, Morse code, maritime signal flags, etc., are not considered for the time being.