UNSD — United Nations Group of Experts on Geographical Names

United Nations Group of Experts on Geographical Names

Data Modelling & Standards Chapter 2

CHAPTER 2: Some specific issues related to geographical names data modelling

The special characteristics of place names data, compared to other geospatial data, are usually related to data modelling needs. The following sections provide some tentative principles for a general conceptual model for place names information as well as the core elements of the model and the interrelationships and key attributes of the elements. The model presented here, in its broadest application, is intended for modelling place name information in a multilingual, multi-names, and multi-scriptural environment.

2.2 Place names data modelling related standards, manuals, or guidelines

2.2.1 Modelling

Some current ISO/OGC related references for geospatial data modelling are:

ISO 19101-1:2014, Geographic information - Reference model - Part 1: Fundamentals
ISO/TS 19103:2015, Geographic information — Conceptual schema language
ISO 19107:2019, Geographic information - Spatial schema
ISO 19137:2007, Geographic information - Core profile of the spatial schema
ISO 19109:2015, Geographic information - Rules for application schema
ISO 19131:2007, Geographic information - Data product specifications
ISO 19104:2016, Geographic information - Terminology
ISO 19112:2019, Geographic information - Spatial referencing by geographic identifiers

Other references specific to geographical names and related products:

INSPIRE Data Specification on Geographical Names (European Commission)
Open Regional Gazetteer (EuroGeographics)

2.2.2 Permanent unique identifiers

Data instance identifiers must be permanent and unique within the data set. Combined with other identifiers (such as a data set identifier), they can be used to build, for example, a permanent URI for 'Linked Data' purposes.

A current ISO/OGC related reference for the generation of universally unique identifiers is:

ITU-T X.667 | ISO/IEC 9834-8:2014, Generation of universally unique identifiers and their use in object identifiers
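As a minimal sketch of how a permanent identifier and a URI could be combined, the following Python example generates a random (version 4) UUID with the standard library and joins it with a data set identifier. The base URI `https://example.org/id` and the data set identifier `placenames` are hypothetical values chosen for illustration only; they are not prescribed by any of the references above.

```python
import uuid

# Hypothetical base URI and data set identifier, used only for illustration.
BASE_URI = "https://example.org/id"
DATASET_ID = "placenames"


def new_instance_id() -> str:
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def persistent_uri(dataset_id: str, instance_id: str) -> str:
    """Combine the data set identifier and the instance identifier into a URI."""
    return f"{BASE_URI}/{dataset_id}/{instance_id}"


place_id = new_instance_id()
uri = persistent_uri(DATASET_ID, place_id)
print(uri)
```

For the identifier to be permanent, it would be generated once when the data instance is created and then stored with it, never regenerated on export.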

2.2.3 Metadata

The selection of appropriate metadata elements depends on the application. Some current ISO/OGC related and other references for metadata for geospatial and geographical names data are:

ISO 19115-1:2014, Geographic information — Metadata — Part 1: Fundamentals
INSPIRE Metadata

2.2.4 Named place

Geometry

The “geometries” of named places/features are often vague. Different practices and rationales exist in different countries, data sets and applications. The previously mentioned ISO 19107:2019 defines the geometric primitives point, curve and surface, as well as the geometric complex (a combination of primitives). File and data exchange formats (e.g., GML, GeoJSON) specify the notations to be used, for example Point, LineString, Polygon. Every named place must have a reference point as its fundamental geometry.

Some current references are:

ISO 19111:2019, Geographic information — Referencing by coordinates
EPSG, Geodetic Parameter Dataset
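The reference point requirement can be illustrated with a short Python sketch that builds a GeoJSON Feature with a Point geometry. GeoJSON stores positions as longitude followed by latitude. The function name, the identifier `place-0001`, the `featureType` property and the approximate coordinates for Athens are assumptions made for this example only.

```python
import json


def reference_point_feature(place_id, lon, lat, properties=None):
    """Build a GeoJSON Feature whose geometry is a single reference point."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("longitude/latitude out of range")
    return {
        "type": "Feature",
        "id": place_id,
        # GeoJSON coordinate order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties or {},
    }


# Approximate, illustrative coordinates; identifier and property are hypothetical.
athens = reference_point_feature(
    "place-0001", 23.73, 37.98, {"featureType": "populated place"}
)
print(json.dumps(athens, indent=2))
```

A more detailed geometry (LineString, Polygon) could be stored alongside the reference point where a data set provides one.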

Feature type

Many different national feature type catalogues are in use. An appropriate feature type classification depends on the application. The references below give the overall ISO methodology as well as examples of a general first-level and second-level classification with clear definitions, which could also serve as a global standard:

ISO 19110:2016, Geographic information — Methodology for feature cataloguing
Examples for regional feature type catalogues
- EuroGeographics Regional Gazetteer - Documentation: https://ome-download-data.s3.eu-west-1.amazonaws.com/open-gazetteer/documents/2022-10-14_OpenRegionalGazetteer_specification.pdf

Any geographic attribute(s)

Appropriate additional geographic attributes depend on the dataset and application. Some current references for geographic attributes for geospatial data (e.g., country, administrative area, (global) grid reference, elevation, another feature type) are:

ISO 3166, Country codes
- ISO 3166-1:2020, Codes for the representation of names of countries and their subdivisions — Part 1: Country code
- ISO 3166-2:2020, Codes for the representation of names of countries and their subdivisions — Part 2: Country subdivision code
- ISO 3166-3:2020, Codes for the representation of names of countries and their subdivisions — Part 3: Code for formerly used names of countries

M49 Standard: United Nations Statistics Division, Standard country or area codes for statistical use (M49)
SALB, UN Second Administrative Level Boundaries
ISO 19170-1:2021, Geographic information — Discrete Global Grid Systems Specifications — Part 1: Core Reference System and Operations, and Equal Area Earth Reference System

Any metadata attribute(s)

Metadata for the entire dataset or product or delivery may be sufficient, depending on the dataset and application. Feature specific metadata might be introduced if appropriate, e.g., source of the geometry, data source, life span information.

2.2.5 Place name / Geographical name

UNGEGN manuals and guidelines on the standardization of geographical names are found here: https://unstats.un.org/unsd/ungegn/pubs/.

UNGEGN acknowledges that UN-GGIM seeks data specifications that follow agreed standards that are interoperable across UN-GGIM's fundamental data themes. Beyond that, UNGEGN would also like to emphasize an important aspect of geographical names: the intangible cultural heritage elements that go hand in hand with the physical characteristics used to identify locations for administration, planning, navigation, emergency response, science, resilience, etc. The sense of place, identity (both individual and collective), nation building, commemoration, language and story that go with each geographical name offer insights into much more than the 'data structure' within which this information sits. The treasure that this information reflects can be difficult to quantify, but it can elevate people's status and connection to the land they are part of: their place to stand, and what they seek to preserve and sustain. Cultural heritage data is not owned by those who capture it, but by the people who named those places. Therefore, this cultural heritage needs to be treated with respect and sensitivity, and its accuracy and authenticity should be confirmed with the people of the place. In making this information discoverable, consideration should be given to the safety of sensitive cultural heritage data, i.e., ensuring that the people of the place are comfortable with the level of cultural heritage data provided about their place names. A shared and positive outcome for geographical naming authorities is that equal attention to cultural heritage translates into acceptance, celebration and longevity of place names within communities.

According to the conceptual model, each named place is associated with one or several geographical names. The different geographical names of one given spatial object may be, for example, parallel names in one or different languages, or names in different forms (e.g., complete and short forms of country and administrative unit names).

Language

The current references for language codes to be used for geospatial data are:

ISO 639, Language codes
- ISO 639-1:2002, Codes for the representation of names of languages — Part 1: Alpha-2 code
- ISO 639-2:1998, Codes for the representation of names of languages — Part 2: Alpha-3 code
- ISO 639-3:2007, Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive coverage of languages
- ISO 639-4:2010, Codes for the representation of names of languages — Part 4: General principles of coding of the representation of names of languages and related entities, and application guidelines
- ISO 639-5:2008, Codes for the representation of names of languages — Part 5: Alpha-3 code for language families and groups

SIL International, ISO 639 Code Tables, all ISO 639 parts in a single table

During the development of the EU INSPIRE Data Specification on Geographical Names - Technical Guidelines, the different versions of ISO 639 standards were evaluated. The conclusion of the evaluation was: “Language is a major aspect of geographical names, and the choice of most appropriate codes received much attention during the preparation of this specification. The only solution enabling to code languages with sufficient details, but also enabling to code languages family as existing in some actual data sets, appeared to be a combination of the non-conflicting codes of ISO 639-3 and ISO 639-5.”
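The combination of ISO 639-3 and ISO 639-5 codes described in the quotation can be sketched as a two-step lookup. The tables below are tiny hand-picked excerpts for illustration; a real implementation would load the complete code tables. The function name `resolve_language` is hypothetical.

```python
# Tiny illustrative excerpts; a real system would load the full code tables.
ISO_639_3 = {
    "ell": "Modern Greek (1453-)",
    "eng": "English",
    "fin": "Finnish",
    "sme": "Northern Sami",
}
ISO_639_5 = {
    "smi": "Sami languages",
    "gem": "Germanic languages",
    "roa": "Romance languages",
}


def resolve_language(code):
    """Look up an alpha-3 code, first as an individual language, then as a family."""
    code = code.lower()
    if code in ISO_639_3:
        return ("ISO 639-3", ISO_639_3[code])
    if code in ISO_639_5:
        return ("ISO 639-5", ISO_639_5[code])
    raise KeyError(f"unknown language code: {code}")


print(resolve_language("sme"))  # individual language
print(resolve_language("smi"))  # language family
```

Checking ISO 639-3 first means a name is recorded at the most detailed level available, with the family code as a fallback for data sets that only record a language group.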

Nativeness (endonym or exonym)

The simplest way of dealing with 'nativeness' is to classify each geographical name as either an 'endonym' or an 'exonym'. Endonyms are names given by native / local people; exonyms are names not given by native / local people. A third option besides endonym and exonym may be considered, as the classification of some types of toponyms (e.g., names in Antarctica or of undersea features) is still under discussion. The current definitions for endonym and exonym agreed by UNGEGN are:

Endonym:

Name of a →geographical feature in an official or well-established language occurring in that area where the feature is situated. Examples: Vārānasī (not Benares); Aachen (not Aix-la-Chapelle); Krung Thep (not Bangkok); Al-Uqşur (not Luxor).

Exonym:

Name used in a specific language for a →geographical feature situated outside the area where that language is widely spoken, and differing in its form from the respective →endonym(s) in the area where the geographical feature is situated. Examples: Warsaw is the English exonym for Warszawa (Polish); Mailand is German for Milano; Londres is French for London; Kūlūniyā is Arabic for Köln. The officially romanized endonym Moskva for Москва is not an exonym, nor is the Pinyin form Beijing, while Peking is an exonym. The United Nations recommends minimizing the use of exonyms in international usage. See also →name, traditional.

Status of name

An appropriate list of status values depends on the data set, and the scope and application of the data set. For example, the four types of the European INSPIRE Specification (official, standardized, historical, other) may not be appropriate or sufficient for other purposes and applications. It would be useful to learn about different practices and their rationale in different countries, data sets and applications.

Any linguistic attribute(s)

Possible or appropriate attributes, such as etymology, may depend on the application. For example, the INSPIRE data specification recognizes the linguistic gender and linguistic number as attributes.

Any metadata attribute(s)

Depending on the dataset and application, name-specific metadata (e.g., source of name, life span information) can be considered, i.e., attributes whose values can differ from one object/feature to another.

2.2.6 Spelling of name

Each geographical name may have one or several spellings, i.e., proper ways of writing it, in one or several scripts, such as the Latin/Roman, Greek and Cyrillic scripts. All original and correct spellings shall be retained; for example, diacritical characters must not be omitted or transformed.

An example:

The city of Athens is the named place
The endonym “Athína” (Greek language) and the exonym “Athens” (English language) are two different geographical names of this unique named place
“Αθήνα” (Greek script) and its standard romanization “Athína” (Latin script/Romanized form) are two different spellings of the same geographical name “Athína”

At present, the UNGEGN Glossary of Terms for the Standardization of Geographical Names doesn't recognize spelling as a separately defined or described term.
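The Athens example can be expressed as a small data structure: one named place, two geographical names, and one or two spellings per name. The following Python sketch is one possible reading of the conceptual model, not a normative schema. All class names, field names and the identifier `place-0001` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Spelling:
    text: str
    script: str  # ISO 15924 code, e.g. "Grek" or "Latn"
    transliteration_scheme: Optional[str] = None


@dataclass
class GeographicalName:
    language: str  # ISO 639-3 code
    nativeness: str  # "endonym" or "exonym"
    spellings: List[Spelling] = field(default_factory=list)


@dataclass
class NamedPlace:
    identifier: str  # hypothetical permanent identifier
    feature_type: str
    names: List[GeographicalName] = field(default_factory=list)


athens = NamedPlace(
    identifier="place-0001",
    feature_type="populated place",
    names=[
        GeographicalName(
            language="ell",
            nativeness="endonym",
            spellings=[
                Spelling("Αθήνα", "Grek"),
                Spelling("Athína", "Latn", "standard romanization"),
            ],
        ),
        GeographicalName(
            language="eng",
            nativeness="exonym",
            spellings=[Spelling("Athens", "Latn")],
        ),
    ],
)

for name in athens.names:
    print(name.language, name.nativeness, [s.text for s in name.spellings])
```

Keeping spellings as a list under each name reflects the distinction made above: the Greek-script and romanized forms are two spellings of one name, whereas the English form is a separate name.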

Text (character content)

The current references for character content are:

35.040.10, ISO, Coding of character sets
- ISO/IEC 10646:2020, Information technology — Universal coded character set (UCS)
- ISO 8859 family (8-bit character encoding)

Unicode Standard, latest version (15.0 at the time of writing)
- Relation between ISO/IEC 10646 and Unicode (according to the Unicode Consortium)

Letter database, Eesti Keele Instituut, Characters (“non-English”) needed to write a certain language in the Latin script
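The requirement to retain diacritical characters can be illustrated with Unicode normalization. In the sketch below, normalizing to form NFC unifies two byte-level representations of the same spelling without losing the accent, whereas stripping combining marks is lossy and is shown only as the kind of transformation to avoid. The function names are hypothetical.

```python
import unicodedata


def normalize_spelling(text):
    """Normalize to NFC: equivalent representations become identical, diacritics are kept."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text):
    """Lossy transformation, shown only as a counter-example."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


composed = "Ath\u00edna"      # i with acute as a single code point
decomposed = "Athi\u0301na"   # i followed by a combining acute accent

print(composed == decomposed)                      # False: different code points
print(normalize_spelling(decomposed) == composed)  # True: same spelling after NFC
print(strip_diacritics(composed))                  # Athina: the accent is lost
```

Storing spellings in one normalization form makes comparison and search reliable while leaving the spelling itself unchanged.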

Script

The current references for scripts are:

ISO 15924:2022, Information and documentation — Codes for the representation of names of scripts
Codes for the representation of names of scripts, the same ISO codes as provided by Unicode
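As a rough sketch of how a script code could be assigned to a spelling, the following Python example maps the leading word of each letter's Unicode character name to an ISO 15924 code. This is a heuristic that assumes only Latin, Greek and Cyrillic input: the Python standard library does not expose the Unicode Script property directly, so a production system would rely on the Unicode script data instead. The names `SCRIPT_PREFIXES` and `scripts_in` are hypothetical.

```python
import unicodedata

# Heuristic mapping from Unicode character-name prefixes to ISO 15924 codes.
SCRIPT_PREFIXES = {"LATIN": "Latn", "GREEK": "Grek", "CYRILLIC": "Cyrl"}


def scripts_in(text):
    """Return the set of ISO 15924 codes for the letters found in text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, punctuation, digits and combining marks
        prefix = unicodedata.name(ch, "").split(" ", 1)[0]
        # "Zyyy" is the ISO 15924 code for an undetermined script.
        found.add(SCRIPT_PREFIXES.get(prefix, "Zyyy"))
    return found


print(scripts_in("Αθήνα"))
print(scripts_in("Athína"))
print(scripts_in("Москва"))
```

A result containing more than one code for a single spelling can flag look-alike letters from different scripts that were mixed by mistake.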

Transliteration scheme

The current references for transliteration schemes are:

01.140.10, ISO, Writing and transliteration
ISO/TC 46/WG 3, Conversion of Written Languages
BGN/PCGN Romanization systems
UNGEGN WG on Romanization Systems
Library of Congress Romanization Tables

Status of spelling

References could be made to, for example, an official or approved orthography for a certain language.

Any linguistic attribute(s)

Possible / appropriate attributes may depend on the application or may be irrelevant.

Any metadata attribute(s)

Possible / appropriate attributes may depend on the application or may be irrelevant.

2.2.7 Pronunciation of name

Pronunciation standards are not available for the time being. No notable references can be made.

Sound link

Sound / Audio files are to be considered.

IPA notation

The International Phonetic Alphabet (IPA) is the best (only) way of systematically recording pronunciation:

Handbook of the IPA

Any linguistic attribute(s)

Further linguistic attributes are not considered for the time being. No notable references can be made.

Any metadata attribute(s)

Further metadata attributes on pronunciation, e.g., pronunciation-specific metadata such as automated or human voice, or the native language or dialect of the human speaker, are not considered for the time being.

2.2.8 Other expression of name

Other expressions of a name, e.g., signs in sign languages, Morse code or maritime signal flags, are not considered for the time being.
