CJK Unified Ideographs Extension I

CJK Unified Ideographs Extension I is a Unicode block of CJK unified ideographs that appeared in drafts of an amendment to China's GB 18030 standard, circulated in 2022 and 2023. The characters were fast-tracked into Unicode in 2023.

CJK Unified Ideographs Extension I
Range: U+2EBF0..U+2EE5F (624 code points)
Plane: SIP
Scripts: Han
Assigned: 622 code points
Unused: 2 reserved code points
Unicode version history: 15.1 (2023): 622 (+622)
Unicode documentation: Code chart ∣ Web page
Note: [1][2]

Background

Unlike most other sets of CJK unified ideographs, Extension I was not prepared and submitted by the Ideographic Research Group (IRG).

GB 18030 is a mandatory national standard of the People's Republic of China (PRC). It defines a Unicode Transformation Format which retains compatibility with existing data in the earlier GBK and EUC-CN character encodings, and specifies particular Unicode characters which devices sold in China must support.[3] Its 2022 edition, GB 18030-2022, changed a number of required characters to map to standard Unicode code points, rather than to private use area code points.
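As a rough illustration of what "Unicode Transformation Format" means in practice, the sketch below encodes a supplementary-plane code point as a GB 18030 four-byte sequence. The arithmetic is not described in this article: it assumes the commonly documented linear mapping in which U+10000 corresponds to the bytes 90 30 81 30. The result is compared against Python's built-in gb18030 codec. The function name gb18030_four_byte is hypothetical.

```python
def gb18030_four_byte(code_point: int) -> bytes:
    """Encode a supplementary-plane code point as four GB 18030 bytes.

    Assumption: the linear mapping that starts at 90 30 81 30 for U+10000.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled")
    linear = code_point - 0x10000
    linear, b4 = divmod(linear, 10)   # fourth byte: 0x30..0x39
    linear, b3 = divmod(linear, 126)  # third byte:  0x81..0xFE
    b1, b2 = divmod(linear, 10)       # second byte: 0x30..0x39
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


# U+2EBF0 is the first code point of CJK Unified Ideographs Extension I.
first_ext_i = 0x2EBF0
by_hand = gb18030_four_byte(first_ext_i)
by_codec = chr(first_ext_i).encode("gb18030")

print(by_hand.hex(" "))     # 99 39 f8 36
print(by_hand == by_codec)  # True
```

Because the mapping is algorithmic, the codec round-trips every code point in this range, whether or not a given implementation has fonts or character data for it.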

In late 2022, the PRC made a draft of a further amendment to GB 18030 available for public consultation. This draft would have placed 897 new sinographic characters in Plane 10 (hexadecimal: 0A), an as-yet-untitled astral Unicode plane.[4] The amendment was motivated by a "strong need of citizen real-name certification in China".[5] Since it would impact ISO/IEC 10646 (the Universal Coded Character Set, the ISO standard synchronised with Unicode), the draft was circulated in ISO/IEC JTC 1/SC 2, the ISO subcommittee responsible for ISO 10646. The Chinese national body maintained that "ISO/IEC 10646 do not specify the purpose of the 0A plane", which ISO 10646 denotes as "reserved for future standardization", and that this use was therefore "not inappropriate".[4]

However, the intent of ISO 10646 was for Plane 10 to be reserved for future allocation by ISO 10646 and Unicode via their usual ballot process, not for it to be allocated unilaterally by national standards bodies. Experts and other national bodies therefore criticised the proposed move as one which would "destabilize the synchronization" between GB 18030 and ISO/IEC 10646 (and thus Unicode), and which would make it impossible to conform to both with a single implementation,[4] effectively forking Unicode.

As an alternative, the repertoire (eventually reduced to 622 characters after expert review) was fast-tracked into Unicode version 15.1 in the CJK Unified Ideographs Extension I block.[4] The CJK Unified Ideographs Extension D block was cited as a precedent, since it comprised a repertoire of urgently needed characters (UNCs) from IRG member bodies, whereas the IRG working-set initially slated to become Extension D would instead become Extension E.[6] For compactness, the block was allocated to the available space in the Supplementary Ideographic Plane after CJK Unified Ideographs Extension F, as opposed to on the Tertiary Ideographic Plane after CJK Unified Ideographs Extension H; this means that the CJK extension blocks are no longer in alphabetical order by extension letter.[7] Following this, the draft GB 18030 amendment was modified to use the Extension I code points.[5]

The Extension I characters make up the "GIDC23" Unihan source,[8] defined as sourced from the "ID system of the Ministry of Public Security of China, 2023".[9]

Block

CJK Unified Ideographs Extension I[1][2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+2EBFx 𮯰𮯱𮯲𮯳𮯴𮯵𮯶𮯷 𮯸𮯹𮯺𮯻𮯼𮯽𮯾𮯿
U+2EC0x 𮰀𮰁𮰂𮰃𮰄𮰅𮰆𮰇 𮰈𮰉𮰊𮰋𮰌𮰍𮰎𮰏
U+2EC1x 𮰐𮰑𮰒𮰓𮰔𮰕𮰖𮰗 𮰘𮰙𮰚𮰛𮰜𮰝𮰞𮰟
U+2EC2x 𮰠𮰡𮰢𮰣𮰤𮰥𮰦𮰧 𮰨𮰩𮰪𮰫𮰬𮰭𮰮𮰯
U+2EC3x 𮰰𮰱𮰲𮰳𮰴𮰵𮰶𮰷 𮰸𮰹𮰺𮰻𮰼𮰽𮰾𮰿
U+2EC4x 𮱀𮱁𮱂𮱃𮱄𮱅𮱆𮱇 𮱈𮱉𮱊𮱋𮱌𮱍𮱎𮱏
U+2EC5x 𮱐𮱑𮱒𮱓𮱔𮱕𮱖𮱗 𮱘𮱙𮱚𮱛𮱜𮱝𮱞𮱟
U+2EC6x 𮱠𮱡𮱢𮱣𮱤𮱥𮱦𮱧 𮱨𮱩𮱪𮱫𮱬𮱭𮱮𮱯
U+2EC7x 𮱰𮱱𮱲𮱳𮱴𮱵𮱶𮱷 𮱸𮱹𮱺𮱻𮱼𮱽𮱾𮱿
U+2EC8x 𮲀𮲁𮲂𮲃𮲄𮲅𮲆𮲇 𮲈𮲉𮲊𮲋𮲌𮲍𮲎𮲏
U+2EC9x 𮲐𮲑𮲒𮲓𮲔𮲕𮲖𮲗 𮲘𮲙𮲚𮲛𮲜𮲝𮲞𮲟
U+2ECAx 𮲠𮲡𮲢𮲣𮲤𮲥𮲦𮲧 𮲨𮲩𮲪𮲫𮲬𮲭𮲮𮲯
U+2ECBx 𮲰𮲱𮲲𮲳𮲴𮲵𮲶𮲷 𮲸𮲹𮲺𮲻𮲼𮲽𮲾𮲿
U+2ECCx 𮳀𮳁𮳂𮳃𮳄𮳅𮳆𮳇 𮳈𮳉𮳊𮳋𮳌𮳍𮳎𮳏
U+2ECDx 𮳐𮳑𮳒𮳓𮳔𮳕𮳖𮳗 𮳘𮳙𮳚𮳛𮳜𮳝𮳞𮳟
U+2ECEx 𮳠𮳡𮳢𮳣𮳤𮳥𮳦𮳧 𮳨𮳩𮳪𮳫𮳬𮳭𮳮𮳯
U+2ECFx 𮳰𮳱𮳲𮳳𮳴𮳵𮳶𮳷 𮳸𮳹𮳺𮳻𮳼𮳽𮳾𮳿
U+2ED0x 𮴀𮴁𮴂𮴃𮴄𮴅𮴆𮴇 𮴈𮴉𮴊𮴋𮴌𮴍𮴎𮴏
U+2ED1x 𮴐𮴑𮴒𮴓𮴔𮴕𮴖𮴗 𮴘𮴙𮴚𮴛𮴜𮴝𮴞𮴟
U+2ED2x 𮴠𮴡𮴢𮴣𮴤𮴥𮴦𮴧 𮴨𮴩𮴪𮴫𮴬𮴭𮴮𮴯
U+2ED3x 𮴰𮴱𮴲𮴳𮴴𮴵𮴶𮴷 𮴸𮴹𮴺𮴻𮴼𮴽𮴾𮴿
U+2ED4x 𮵀𮵁𮵂𮵃𮵄𮵅𮵆𮵇 𮵈𮵉𮵊𮵋𮵌𮵍𮵎𮵏
U+2ED5x 𮵐𮵑𮵒𮵓𮵔𮵕𮵖𮵗 𮵘𮵙𮵚𮵛𮵜𮵝𮵞𮵟
U+2ED6x 𮵠𮵡𮵢𮵣𮵤𮵥𮵦𮵧 𮵨𮵩𮵪𮵫𮵬𮵭𮵮𮵯
U+2ED7x 𮵰𮵱𮵲𮵳𮵴𮵵𮵶𮵷 𮵸𮵹𮵺𮵻𮵼𮵽𮵾𮵿
U+2ED8x 𮶀𮶁𮶂𮶃𮶄𮶅𮶆𮶇 𮶈𮶉𮶊𮶋𮶌𮶍𮶎𮶏
U+2ED9x 𮶐𮶑𮶒𮶓𮶔𮶕𮶖𮶗 𮶘𮶙𮶚𮶛𮶜𮶝𮶞𮶟
U+2EDAx 𮶠𮶡𮶢𮶣𮶤𮶥𮶦𮶧 𮶨𮶩𮶪𮶫𮶬𮶭𮶮𮶯
U+2EDBx 𮶰𮶱𮶲𮶳𮶴𮶵𮶶𮶷 𮶸𮶹𮶺𮶻𮶼𮶽𮶾𮶿
U+2EDCx 𮷀𮷁𮷂𮷃𮷄𮷅𮷆𮷇 𮷈𮷉𮷊𮷋𮷌𮷍𮷎𮷏
U+2EDDx 𮷐𮷑𮷒𮷓𮷔𮷕𮷖𮷗 𮷘𮷙𮷚𮷛𮷜𮷝𮷞𮷟
U+2EDEx 𮷠𮷡𮷢𮷣𮷤𮷥𮷦𮷧 𮷨𮷩𮷪𮷫𮷬𮷭𮷮𮷯
U+2EDFx 𮷰𮷱𮷲𮷳𮷴𮷵𮷶𮷷 𮷸𮷹𮷺𮷻𮷼𮷽𮷾𮷿
U+2EE0x 𮸀𮸁𮸂𮸃𮸄𮸅𮸆𮸇 𮸈𮸉𮸊𮸋𮸌𮸍𮸎𮸏
U+2EE1x 𮸐𮸑𮸒𮸓𮸔𮸕𮸖𮸗 𮸘𮸙𮸚𮸛𮸜𮸝𮸞𮸟
U+2EE2x 𮸠𮸡𮸢𮸣𮸤𮸥𮸦𮸧 𮸨𮸩𮸪𮸫𮸬𮸭𮸮𮸯
U+2EE3x 𮸰𮸱𮸲𮸳𮸴𮸵𮸶𮸷 𮸸𮸹𮸺𮸻𮸼𮸽𮸾𮸿
U+2EE4x 𮹀𮹁𮹂𮹃𮹄𮹅𮹆𮹇 𮹈𮹉𮹊𮹋𮹌𮹍𮹎𮹏
U+2EE5x 𮹐𮹑𮹒𮹓𮹔𮹕𮹖𮹗 𮹘𮹙𮹚𮹛𮹜𮹝
Notes
1. As of Unicode version 15.1
2. Grey areas in the original chart indicate non-assigned code points; here these are the last two code points of the block, U+2EE5E and U+2EE5F
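The counts quoted for the block follow directly from its bounds. The sketch below is a minimal check that uses only the range U+2EBF0..U+2EE5F and the last assigned code point U+2EE5D given in this article. The constant and function names are hypothetical.

```python
# Bounds taken from the block summary and history table in this article.
EXT_I_FIRST = 0x2EBF0          # first code point of the block
EXT_I_LAST = 0x2EE5F           # last code point of the block
EXT_I_LAST_ASSIGNED = 0x2EE5D  # last assigned code point as of Unicode 15.1


def in_extension_i(char: str) -> bool:
    """Return True if a single character lies anywhere inside the block."""
    return EXT_I_FIRST <= ord(char) <= EXT_I_LAST


def is_assigned_in_extension_i(char: str) -> bool:
    """Return True if the character is one of the assigned ideographs."""
    return EXT_I_FIRST <= ord(char) <= EXT_I_LAST_ASSIGNED


BLOCK_SIZE = EXT_I_LAST - EXT_I_FIRST + 1
ASSIGNED_COUNT = EXT_I_LAST_ASSIGNED - EXT_I_FIRST + 1
RESERVED = [f"U+{cp:X}" for cp in range(EXT_I_LAST_ASSIGNED + 1, EXT_I_LAST + 1)]

print(BLOCK_SIZE, ASSIGNED_COUNT, RESERVED)
# 624 622 ['U+2EE5E', 'U+2EE5F']
```

A range test like this does not depend on the Unicode version bundled with the Python interpreter, which may predate version 15.1.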

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Unified Ideographs Extension I block:

Version | Final code points[a] | Count | L2 ID | WG2 ID | IRG ID | Document
15.1 | U+2EBF0..2EE5D | 622 | L2/23-011 | | | Lunde, Ken (2023-01-11), "18) GB 18030-2022 Amendment", CJK & Unihan Group Recommendations for UTC #174 Meeting
 | | | L2/23-057 | N5201 | N2591 | Draft GB 18030-2022 Amendment Feedback & Recommendations, 2023-02-03
 | | | L2/23-100 | | | GB 18030-2022 Amendment, Draft 2 + Disposition of Comments, Draft 1, 2023-04-10
 | | | L2/23-082 | | | Lunde, Ken (2023-04-22), "02 and 03", CJK & Unihan Group Recommendations for UTC #175 Meeting
 | | | L2/23-106 | N5214 | | Lunde, Ken (2023-04-24), "The Alternate Proposal—Unicode Version 15.1", Proposal to provisionally assign or accept 603 urgently-needed ideographs
 | | | L2/23-076 | | | Constable, Peter (2023-05-01), "E.4.2 Proposal to provisionally assign or accept 603 urgently-needed ideographs", UTC #175 Minutes
 | | | L2/23-114R | N5214R2 | | Lunde, Ken (2023-07-05), Proposal to encode 622 urgently needed ideographs in UCS
 | | | L2/23-115 | | | Constable, Peter (2023-05-01), USNB Comments on Draft 2 of GB 18030-2022 Amendment 1 and recommendation for ISO/IEC 10646:2020 Amendment 2
 | | | L2/23-154 | N5238 | | Revision of 622 UNCs of China (Feedback on WG2 N5214), 2023-06-30
 | | | L2/23-163 | | | Lunde, Ken (2023-07-11), "01", CJK & Unihan Group Recommendations for UTC #176 Meeting
 | | | L2/23-157 | | | Constable, Peter (2023-07-31), "E.1 Section 1) CJK Unified Ideographs Extension I", UTC #176 Minutes
  a. Proposed code points and character names may differ from final code points and names

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-09-12.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-09-12.
  3. Kaplan, Michael S (2013-03-28). "You call it GB18030, I call it UTF-GBK..." Sorting it all out.
  4. United States National Body (2023-05-01). "USNB Comments on Draft 2 of GB 18030-2022 Amendment 1 and recommendation for ISO/IEC 10646:2020 Amendment 2" (PDF). ISO/IEC JTC1/SC2 N4852, WG2 N5222; UTC L2/23-115.
  5. China National Body (2023-10-13). "IRG #61 Activity Report" (PDF). ISO/IEC JTC1/SC2/WG2/IRG N2623; UTC L2/23-240.
  6. Lunde, Ken (2023-04-22). "03) L2/23-100: GB 18030-2022 Amendment, Draft 2 + Disposition of Comments, Draft 1" (PDF). CJK & Unihan Group Recommendations for UTC #175 Meeting. UTC L2/23-082.
  7. "CJK/Unihan Changes". Unicode 15.1.0. Unicode Consortium. 2023-09-12. To keep the CJK block ranges as compact as possible, Extension I has been added to Plane 2, instead of directly after Extension H on Plane 3. Implementers should also check that their code does not assume that CJK extensions all occur in alphabetic order by the extension letter.
  8. "CJK Unified Ideographs Extension I" (PDF). The Unicode Standard, Version 15.1. Unicode Consortium. 2023.
  9. Lunde, Ken; Cook, Richard, eds. (2023-09-01). "kIRG_GSource". Unicode Han Database (Unihan). Unicode 15.1.0. UAX #38.

Further reading

  • Lunde, Ken (2023-07-15). "The First Amendment". This article details how the CJK Unified Ideographs Extension I block became standardized, and its relationship with two drafts of the GB 18030-2022 amendment.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.