Unicode version 9.0 is scheduled for release in June 2016. The final repertoire is not yet fixed, but 7,227 characters are currently scheduled for addition, which will bring the total number of graphic and format characters in the Unicode Standard to 127,899. (In case you are concerned that Unicode is running out of space, that still leaves room for another 846,566 characters to be encoded.) In summary, Unicode 9.0 will include at least 9 new blocks (named ranges of characters) and cover at least 4 new scripts (Osage, Bhaiksuki, Marchen and Tangut), making a total of 268 blocks and 133 scripts. I will update this post with new figures when the final repertoire for Unicode 9.0 is fixed.
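For readers wondering where the figure of 846,566 comes from, here is a minimal sketch of one accounting that reproduces it. It assumes that the remaining space is what is left of the 17 planes after subtracting surrogates, noncharacters, private use code points, control codes, and the 127,899 graphic and format characters quoted above. The constant and function names are purely illustrative.

```python
# Sketch: one way to reproduce the "room for another 846,566 characters" figure.
# The category sizes are fixed properties of the Unicode code space; the
# assigned-character count is the figure quoted in this post.

TOTAL_CODE_POINTS = 17 * 0x10000                 # 17 planes of 65,536 code points
SURROGATES = 0xDFFF - 0xD800 + 1                 # U+D800..U+DFFF
NONCHARACTERS = 32 + 2 * 17                      # U+FDD0..U+FDEF plus the last two code points of each plane
PRIVATE_USE = (0xF8FF - 0xE000 + 1) + 2 * (0xFFFD + 1)  # BMP private use area plus planes 15 and 16
CONTROLS = 32 + 1 + 32                           # C0 controls, DEL, C1 controls

GRAPHIC_AND_FORMAT_9_0 = 127_899                 # provisional Unicode 9.0 total from this post


def remaining_code_points(assigned: int) -> int:
    """Code points still available for encoding graphic or format characters."""
    reserved = SURROGATES + NONCHARACTERS + PRIVATE_USE + CONTROLS
    return TOTAL_CODE_POINTS - reserved - assigned


if __name__ == "__main__":
    print(f"{remaining_code_points(GRAPHIC_AND_FORMAT_9_0):,}")
```

If the final repertoire grows (for example, if the emoji candidates are fast-tracked), passing the new total to `remaining_code_points` gives the updated figure.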
No new emoji or emoticons have yet been accepted for inclusion in Unicode 9.0. However, 74 Emoji characters have been accepted for encoding, and are candidates for inclusion in Unicode 9.0.
|Code Point||Provisional Character Name||Source|
Encoded to match U+1F483 💃 DANCER (typically implemented as a female dancer)
unequivocally represented as black in all variants
Encoded because there is a need for a black-coloured heart emoji, and U+2764 ❤ HEAVY BLACK HEART is typically implemented as a red heart
|U+1F919||CALL ME HAND||L2/15-054|
|U+1F91A||RAISED BACK OF HAND||L2/15-054|
|U+1F91E||HAND WITH INDEX AND MIDDLE FINGERS CROSSED||L2/15-054|
|U+1F920||FACE WITH COWBOY HAT||L2/15-054|
|U+1F923||ROLLING ON THE FLOOR LAUGHING||L2/15-054|
typically used with face or human figure
Encoded to match U+1F478 👸 PRINCESS
|U+1F935||MAN IN TUXEDO
Encoded to match U+1F470 👰 BRIDE WITH VEIL
Encoded to match U+1F385 🎅 FATHER CHRISTMAS
|U+1F938||PERSON DOING CARTWHEEL
|U+1F941||DRUM WITH DRUMSTICKS||L2/15-195|
typically shown with iced drink
marksmanship, shooting (Olympic sport)
|U+1F947||FIRST PLACE MEDAL
|U+1F948||SECOND PLACE MEDAL
|U+1F949||THIRD PLACE MEDAL
|U+1F94B||MARTIAL ARTS UNIFORM
judo and other martial arts
|U+1F958||SHALLOW PAN OF FOOD
döner kebab, falafel, gyro, shawarma
|U+1F95B||GLASS OF MILK||L2/15-267|
NB The above code points and character names are subject to change, and should not be relied on at this point in time.
- L2/15-054 Emoji Subcommittee, Emoji Additions: Animals, Compatibility, and More Popular Requests (2015-05-21)
- L2/15-195 Emoji Subcommittee, Emoji Additions Tranche 6: More Popular Requests and Gap Filling (2015-07-28)
- L2/15-196 Hiroyuki Komatsu, Proposal to add more sports-related emoji characters (2015-07-31)
- L2/15-267 Hiroyuki Komatsu, Proposal to add more food emoji characters (2015-11-05)
These characters are currently under ISO ballot for inclusion in ISO/IEC 10646:2016 (5th ed.) (see WG2 N4705 pages 130, 131, 135, and 137–138). Most of the 8,514 characters in this document will feed into Unicode version 10.0 in June 2017, but if national bodies have no objection to the emoji characters in the current CD ballot which closes 29 February 2016, then it is very likely that the Unicode Technical Committee (UTC) will fast-track most or all of these 74 emoji into Unicode 9.0. None of these characters are particularly contentious or controversial, and I do not see any reason why they should not all be included in Unicode 9.0, but the final decision will not be made until the UTC meets in early May 2016.
It is possible that the final number of emoji added to Unicode 9.0 will be more than these seventy-four, and could include a Dumpling Emoji, and maybe even some emojis for professional women, but we will have to wait and see what the UTC decides.
It's Not All About Emoji !
[If you've got this far, and think that the proposed Shark emoji is a metaphor for Unicode in 2016, then you may want to read my rant from last year on Emoji and Unicode, entitled An Optional Discourse on Emoji.]
The 7,227 non-emoji additions to Unicode 9.0 are all included in ISO/IEC 10646:2014 (4th ed.) Amendment 2, and are highlighted in this document (along with one currency sign, nine CJK unified ideographs, 36 emoji characters, and 5 emoji modifier characters which have already been included in Unicode 8.0). 109 of these characters belong to existing scripts, but 7,118 characters belong to four new scripts :
- Osage [Osge] : An alphabet for the Osage language which was devised between 2004 and 2006 for use by the Osage Nation in the USA.
- Bhaiksuki [Bhks] : A Brahmic script used for writing Buddhist manuscripts and inscriptions in the region of northern India and Tibet during the 11th and 12th centuries.
- Marchen [Marc] : A Brahmic script used for writing the extinct Zhang Zhung language in the Bon religious tradition in Tibet.
- Tangut [Tang] : An ideographic script used by the Tangut people to write the extinct Tangut language in the Western Xia and in China (Yuan and Ming dynasties) during the 11th through 16th centuries.
Inscription in the Marchen script on the library of the Yungdrung Bon Monastery in Dolanji (Himachal Pradesh)
Photograph © Chris Hatchell
Of the 7,227 non-emoji additions, 7,178 characters are included in 9 new blocks, and 49 characters are added to existing blocks, as detailed in the two tables below. The code points and character names for all these characters are now fixed, and will not be changed. Draft official Unicode data files are available here, and I have made a plain text list of all the new characters to be added to Unicode 9.0 available here.
Postscript [2016-01-05]. To clarify, the 7,227 characters listed below will definitely be included in Unicode 9.0, and some or all of the 74 Emoji characters listed above will almost certainly also end up in Unicode 9.0, but it is also possible that some other characters in the Committee Draft for ISO/IEC 10646:2016 (5th ed.) (full draft is downloadable as N4446) will also be fast-tracked into Unicode 9.0. These may include 19 Japanese TV symbols required for ARIB STD-B62. It is not impossible that the UTC may even decide to fast-track the Adlam and/or Newa scripts into Unicode 9.0, even though there is no precedent for fast-tracking a script into a version of Unicode whilst it is still under ISO technical ballot, and such a move could potentially destabilize the relationship between the Unicode and ISO/IEC 10646 standards. At the present time I do not know for sure what the UTC will decide, but I will update this post when I have more information.
|Block Name||Range||Characters / Source Documents|
9 letters used in early Church Slavonic (1C80..1C88).
Aleksandr Andreev, Yuri Shardt, and Nikita Simmons, "Proposal to Use Standardized Variation Sequences to Encode Church Slavonic Glyph Variants in Unicode" (2014-07-20) [L2/13-153]
72 letters for Osage: 36 uppercase letters (104B0..104D3) and 36 lowercase letters (104D8..104FB).
13 head marks for Mongolian (11660..1166C).
Aaron Bell, Greg Eck, Andrew Glass, and Andrew West, "Encoding Mongolian head letters" (2014-01-17) [L2/14-030]
97 characters for Bhaiksuki: 46 letters (11C00..11C08, 11C0A..11C2E), 12 vowel signs (11C2F..11C36, 11C38..11C3B), 5 other signs (11C3C..11C40), 2 dandas (11C41..11C42), a word separator (11C43), 2 gap fillers (11C44..11C45), 10 decimal digits (11C50..11C59), 18 numbers (11C5A..11C6B), and a hundreds unit mark (11C6C).
Anshuman Pandey and Dragomir Dimitrov, "Revised Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2014-01-27) [L2/14-036]
68 characters for Marchen: 2 marks (11C70..11C71), 30 letters (11C72..11C8F), 29 subjoined letters (11C92..11CA7, 11CA9..11CAF), 5 vowel signs (11CB0..11CB4), and 2 other signs (11CB5..11CB6).
|Ideographic Symbols and Punctuation||16FE0..16FFF||
1 iteration mark for Tangut (16FE0).
See under Tangut.
6,125 Tangut ideographs (17000..187EC) [characters are named algorithmically based on their code point, as TANGUT IDEOGRAPH-hhhhh].
Richard Cook (UC Berkeley Script Encoding Initiative), "Proposal to encode Tangut characters in UCS Plane 1" (2007-05-09) [WG2 N3297 || L2/07-143] [Multi Column Chart : WG2 N3297A || L2/07-144] [Single Column Chart: WG2 N3297B || L2/07-145]
Richard Cook (UC Berkeley Script Encoding Initiative), "Tangut Proposal Code Chart Update" (2007-07-24) [L2/07-229]
Richard Cook, "Single-Column Tangut Code Chart (using Column G font)" (2008-09-03) [L2/08-336]
Michael Everson, Nathan Hill, Guillaume Jacques, Andrew West, Viacheslav Zaytsev, "Proposal for a revised Tangut character set for encoding in the SMP of the UCS" (2009-03-01) [WG2 N3577 || L2/09-095]
Michael Everson, Nathan Hill, Guillaume Jacques, Andrew West, Viacheslav Zaytsev, "Proposal for a revised Tangut character set for encoding in the SMP of the UCS" (2009-04-08) [WG2 N3577R || L2/09-115] [Appendix A: WG2 N3577R-A || L2/09-116] [Appendix B: WG2 N3577R-B || L2/09-117]
Deborah Anderson and Richard Cook, "Request for Tangut font and mappings from N3577 to Amendment 7 repertoire" (2009-03-04) [WG2 N3586]
Richard Cook and Deborah Anderson, Script Encoding Initiative, UC Berkeley, "Comments on Tangut report N4033" (2011-06-01) [WG2 N4094]
China, "Comments on N4325, 4326 and N4327 (Tangut)" (2012-10-20) [WG2 N4370]
China, "Explanation on the Re-facture of Tangut Fonts" (2013-06-10) [WG2 N4455]
China, "Review of N4558R Tangut glyph corrections" (2014-09-29) [WG2 N4640]
755 Tangut radicals and character components (18800..18AF2).
Richard Cook and Deborah Anderson, "Comments on the Tangut radicals and strokes proposal (N3495 = L2/08-335)" (2008-10-29) [L2/08-399]
38 combining letters for Glagolitic (1E000..1E006, 1E008..1E018, 1E01B..1E021, 1E023..1E024, 1E026..1E02A).
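The algorithmic naming rule mentioned for the Tangut ideographs (TANGUT IDEOGRAPH-hhhhh) can be sketched in a few lines. This is only an illustration under the assumption that hhhhh is the code point written as five uppercase hexadecimal digits, and that the rule covers exactly the range 17000..187EC given above. The function name is made up for this example.

```python
# Sketch: deriving the names of Tangut ideographs from their code points.
# Range taken from the table above: 6,125 ideographs at 17000..187EC.

TANGUT_FIRST = 0x17000
TANGUT_LAST = 0x187EC


def tangut_ideograph_name(code_point: int) -> str:
    """Return the algorithmic name for a Tangut ideograph code point."""
    if not TANGUT_FIRST <= code_point <= TANGUT_LAST:
        raise ValueError(f"U+{code_point:04X} is not in the Tangut ideograph range")
    # The name is a fixed prefix followed by the code point in uppercase hex.
    return f"TANGUT IDEOGRAPH-{code_point:05X}"


if __name__ == "__main__":
    print(tangut_ideograph_name(0x17000))
    print(tangut_ideograph_name(0x187EC))
    print(TANGUT_LAST - TANGUT_FIRST + 1, "ideographs in the range")
```

Because the names follow from the code points, a names list only needs to record the range rather than 6,125 separate entries.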
Leaf from a Tangut Buddhist manuscript (Great Perfection of Wisdom Sutra)
|Block Name||Range||Characters / Source Documents|
5 Arabic letters for Bravanese (08B6..08BA).
Hamid Banafunzi, Marghani Banafunzi, and Maxamed Nuur, "Proposal to encode five Arabic script characters for the Bravanese (Chimiini)" (2014-08-31) [L2/13-178]
Roozbeh Pournader and Shervin Afshar, "Proposal to Encode Arabic Letter Teh with Small Teh Above for Bravanese" (2014-11-01) [L2/13-293]
3 Arabic letters for Warsh-based orthographies (08BB..08BD).
Lorna Evans (SIL International), "Supporting the Warsh orthography for Arabic script" (2014-04-29) [L2/14-104]
15 Quranic marks used in Pakistani printing (08D4..08E2).
Lateef Sagar Shaikh, "Proposal to encode Quranic marks used in Quran published in Pakistan" (2014-04-24) [L2/14-095]
Lateef Sagar Shaikh, "Proposal to encode Quranic Alternate Dammatan used in Quran published in Pakistan" (2014-04-25) [L2/14-096]
1 spacing candrabindu sign (0C80).
3 chillu letters (0D54..0D56).
1 para sign (0D4F).
10 characters for fractions (0D58..0D5E, 0D76..0D78).
4 power button symbols (23FB..23FE).
1 punctuation mark for Slavonic (2E43: DASH WITH LEFT UPTURN).
1 suspension mark for Byzantine Greek (2E44: DOUBLE SUSPENSION MARK).
1 letter for Unifon (A7AE: LATIN CAPITAL LETTER SMALL CAPITAL I).
1 candrabindu sign (A8C5).
|Ancient Greek Numbers||10140..1018F||
2 signs for ancient Greek (1018D..1018E).
1 sukun sign for Arabic transliteration in the Khojki script (1123E).
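The character counts in this table can be checked mechanically from the code point ranges. The following is a minimal sketch, assuming the range notation used in this post (hexadecimal code points, `..` for an inclusive range, commas between ranges). The helper name and the list are illustrative, and the ranges are simply copied from the rows above.

```python
# Sketch: counting code points in range specifications such as "0D58..0D5E, 0D76..0D78".

def count_code_points(spec: str) -> int:
    """Count the code points covered by a comma-separated list of hex ranges."""
    total = 0
    for part in spec.split(","):
        first, _, last = part.strip().partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # a single code point has no ".."
        if end < start:
            raise ValueError(f"range runs backwards: {part.strip()}")
        total += end - start + 1                # ranges are inclusive at both ends
    return total


# Ranges for the additions to existing blocks, as listed in the table above.
EXISTING_BLOCK_ADDITIONS = [
    "08B6..08BA", "08BB..08BD", "08D4..08E2",      # Arabic letters and Quranic marks
    "0C80",                                        # spacing candrabindu
    "0D54..0D56", "0D4F", "0D58..0D5E, 0D76..0D78",  # chillus, para sign, fractions
    "23FB..23FE",                                  # power button symbols
    "2E43", "2E44",                                # punctuation marks
    "A7AE", "A8C5",                                # Unifon letter, candrabindu
    "1018D..1018E",                                # ancient Greek signs
    "1123E",                                       # Khojki sukun
]

if __name__ == "__main__":
    print(sum(count_code_points(spec) for spec in EXISTING_BLOCK_ADDITIONS))
```

The same helper can be pointed at the ranges in the new-blocks table to check the per-block figures given there.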
The author in front of a Tangut Buddhist inscription on the Cloud Platform at Juyong Pass
Previous Posts on Unicode Versions
- What's new in Unicode 5.0 ? [November 2005]
- What's new in Unicode 5.1 ? [June 2007]
- What's new in Unicode 5.2 ? [April 2008]
- What's new in Unicode 6.0 ? [November 2009]
- What's new in Unicode 6.1 ? [June 2011]
- What's new in Unicode 6.2 ? [May 2012]
- What's new in Unicode 6.3 ? [January 2013]
- What's new in Unicode 7.0 ? [October 2013]
- What's new in Unicode 8.0 ? [April 2015]