Unicode version 9.0 is scheduled for release in June 2016. The final repertoire is now fixed, and 7,500 characters (including 72 emoji) will be added to Unicode 9.0. This will bring the total number of graphic and format characters in the Unicode Standard to 128,172 characters (in case you are concerned that Unicode is running out of space, that still leaves room for another 846,293 characters to be encoded). In summary, Unicode 9.0 will include 11 new blocks (named ranges of characters) and cover 6 new scripts (Osage, Newa, Bhaiksuki, Marchen, Tangut, and Adlam), making a total of 270 blocks and 135 scripts.
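The 846,293 figure can be reproduced with a little arithmetic over the code space. The sketch below assumes that the figure excludes surrogates, private-use code points, noncharacters and the 65 control characters, as well as the 128,172 graphic and format characters. The sizes of those categories are standard Unicode facts rather than numbers given in this post, and the helper `remaining_code_points` is purely illustrative.

```python
# Sizes of the reserved parts of the Unicode code space (standard Unicode facts).
CODESPACE = 0x110000                            # U+0000..U+10FFFF
SURROGATES = 0xE000 - 0xD800                    # U+D800..U+DFFF
PRIVATE_USE = (0xF900 - 0xE000) + 2 * 65_534    # U+E000..U+F8FF, plus planes 15 and 16
NONCHARACTERS = 32 + 2 * 17                     # U+FDD0..U+FDEF, plus xxFFFE/xxFFFF in all 17 planes
CONTROLS = 32 + 33                              # U+0000..U+001F and U+007F..U+009F

# Figure quoted in this post for Unicode 9.0.
GRAPHIC_AND_FORMAT = 128_172


def remaining_code_points() -> int:
    """Code points still free for new characters, under the assumptions above."""
    return (CODESPACE - SURROGATES - PRIVATE_USE - NONCHARACTERS
            - CONTROLS - GRAPHIC_AND_FORMAT)


print(remaining_code_points())  # 846293
```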
Seventy-four emoji characters have been accepted for encoding in Unicode 9.0. However, two of these characters have been de-emojified at the request of Apple: U+1F946 RIFLE (representing Shooting or Hunting) and U+1F93B MODERN PENTATHLON (which includes Pistol Shooting as one of its disciplines) will have no Unicode properties to suggest that they are emoji. The two characters will therefore still be encoded in Unicode 9.0, but as plain symbols rather than as emoji, and it is unlikely that any major vendor will implement them as emoji.
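In practice, having "no Unicode properties to suggest that they are emoji" means that the two code points are simply absent from the ranges listed with the Emoji property in the emoji data files. The sketch below parses lines in the `code point(s) ; property # comment` layout used by `emoji-data.txt`. The `SAMPLE` excerpt and the `parse_property` helper are hypothetical, written for illustration only, and are not copied from the real file.

```python
from __future__ import annotations

# Hypothetical excerpt in the "code point(s) ; property # comment" layout of
# emoji-data.txt; written for illustration, not copied from the real file.
SAMPLE = """\
1F57A         ; Emoji  # MAN DANCING
1F933..1F93A  ; Emoji  # SELFIE..FENCER (1F93B MODERN PENTATHLON omitted)
1F93C..1F93E  ; Emoji  # WRESTLERS..HANDBALL
1F940..1F945  ; Emoji  # WILTED FLOWER..GOAL NET (1F946 RIFLE omitted)
1F947..1F94B  ; Emoji  # FIRST PLACE MEDAL..MARTIAL ARTS UNIFORM
"""


def parse_property(text: str, prop: str) -> set[int]:
    """Collect every code point that the data lines assign to `prop`."""
    code_points: set[int] = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        field, value = (part.strip() for part in data.split(";"))
        if value != prop:
            continue
        start, _, end = field.partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first  # single code point has no ".."
        code_points.update(range(first, last + 1))
    return code_points


emoji = parse_property(SAMPLE, "Emoji")
for cp in (0x1F93A, 0x1F93B, 0x1F946, 0x1F947):
    print(f"U+{cp:04X}:", "Emoji" if cp in emoji else "not Emoji")
```

A renderer that consults such a property set would fall back to ordinary (monochrome) glyph handling for U+1F93B and U+1F946.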
|Code Point||Provisional Character Name||Source||Notes|
|U+1F57A||MAN DANCING|| ||Encoded to match U+1F483 💃 DANCER (typically implemented as a female dancer)|
|U+1F5A4||BLACK HEART|| ||unequivocally represented as black in all variants; encoded because there is a need for a black-coloured heart emoji, and U+2764 ❤ HEAVY BLACK HEART is typically implemented as a red heart|
|U+1F919||CALL ME HAND||L2/15-054|| |
|U+1F91A||RAISED BACK OF HAND||L2/15-054|| |
|U+1F91E||HAND WITH INDEX AND MIDDLE FINGERS CROSSED||L2/15-054|| |
|U+1F920||FACE WITH COWBOY HAT||L2/15-054|| |
|U+1F923||ROLLING ON THE FLOOR LAUGHING||L2/15-054|| |
|U+1F933||SELFIE|| ||typically used with face or human figure|
|U+1F934||PRINCE|| ||Encoded to match U+1F478 👸 PRINCESS|
|U+1F935||MAN IN TUXEDO|| ||Encoded to match U+1F470 👰 BRIDE WITH VEIL|
|U+1F936||MOTHER CHRISTMAS|| ||Encoded to match U+1F385 🎅 FATHER CHRISTMAS|
|U+1F938||PERSON DOING CARTWHEEL|| || |
|U+1F93B||MODERN PENTATHLON|| ||NOT AN EMOJI (see above)|
|U+1F941||DRUM WITH DRUMSTICKS||L2/15-195|| |
|U+1F943||TUMBLER GLASS|| ||typically shown with iced drink|
|U+1F946||RIFLE|| ||marksmanship, shooting (Olympic sport); NOT AN EMOJI (see above)|
|U+1F947||FIRST PLACE MEDAL|| || |
|U+1F948||SECOND PLACE MEDAL|| || |
|U+1F949||THIRD PLACE MEDAL|| || |
|U+1F94B||MARTIAL ARTS UNIFORM|| ||judo and other martial arts|
|U+1F958||SHALLOW PAN OF FOOD|| || |
|U+1F959||STUFFED FLATBREAD|| ||döner kebab, falafel, gyro, shawarma|
|U+1F95B||GLASS OF MILK||L2/15-267|| |
NB The above code points and character names are subject to change, and should not be relied on at this point in time.
- L2/15-054 Emoji Subcommittee, Emoji Additions: Animals, Compatibility, and More Popular Requests (2015-05-21)
- L2/15-195 Emoji Subcommittee, Emoji Additions Tranche 6: More Popular Requests and Gap Filling (2015-07-28)
- L2/15-196 Hiroyuki Komatsu, Proposal to add more sports-related emoji characters (2015-07-31)
- L2/15-267 Hiroyuki Komatsu, Proposal to add more food emoji characters (2015-11-05)
These characters are currently under ISO ballot for inclusion in ISO/IEC 10646:2016 (5th ed.) (see WG2 N4705 pages 130, 131, 135, and 137–138). Most of the 8,514 characters in this document will feed into Unicode version 10.0 in June 2017, but due to the urgent need of netizens to be able to use new emoji at the earliest possible date, the Unicode Technical Committee (UTC) has a habit (policy?) of fast-tracking emoji characters into the Unicode Standard out of synchronization with the corresponding ISO standard (ISO/IEC 10646). On January 26 these 74 emoji characters were authorized for inclusion in the Unicode 9.0 beta, and unless a national body raises strong and compelling objections to any of them in the current CD ballot (which closes 29 February 2016), these 74 emoji characters will definitely be in Unicode 9.0. A final decision will be made when the UTC meets in early May 2016.
In the end, at its meeting in May 2016, the UTC decided to accept only 72 of the 74 characters as emoji. At the request of Apple (in response to several well-publicized emoji gun incidents, and a campaign against adding more violent emoji to Unicode), U+1F946 RIFLE and U+1F93B MODERN PENTATHLON (which includes shooting as one of its disciplines) were de-emojified, and will be encoded in Unicode 9.0 as plain non-emoji symbols. Of course, people can still use U+1F946 🥆 RIFLE (or various combinations of the letters A-Z, and many other Unicode characters) to threaten other people in text messages, but the threats will not need to be taken seriously because the rifle character will not be displayed in colour (and it is quite likely that major vendors will not support this character at all in their fonts).
More Emoji to Look Forward to ...
Proposals by Jennifer 8. Lee and friends to encode emoji characters representing chopsticks, dumplings, fortune cookies, and Chinese takeout boxes were joyfully accepted by the shadowy Emoji subcommittee at the January UTC meeting, but they were submitted too late for inclusion in Unicode 9.0; we can look forward to welcoming them into Unicode 10.0 in June 2017.
Alolita Sharma (@alolita) : #UTC146 Peter Edberg accepts #dumpling #chopsticks #fortunecookie #takeoutbox originals from emoji designer YiyingLu (25 January 2016)
It's Not All About Emoji !
Emoji make up 99% of the noise and hype surrounding Unicode 9.0, but they account for only 1% of the new characters.
7,227 of the 7,426 non-emoji characters to be added to Unicode 9.0 are included in ISO/IEC 10646:2014 (4th ed.) Amendment 2, and are highlighted in this document (along with one currency sign, nine CJK unified ideographs, 36 emoji characters, and 5 emoji modifier characters which were fast-tracked into Unicode 8.0). These characters have all been through at least two rounds of ISO technical ballots, and they are now stable (they cannot be moved, removed, or renamed). The remaining 199 characters are included in the Committee Draft for ISO/IEC 10646:2016 (5th ed.) (the full draft is downloadable as N4446). This edition has not yet completed its two rounds of technical ballots by ISO national bodies, but the UTC has decided to fast-track the Adlam script, the Newa script, and Japanese TV symbols (in addition to the 74 emoji discussed above) into Unicode 9.0. It is not unusual for the UTC to fast-track urgently required characters (such as currency symbols and emoji) into a version of Unicode before they have completed their final technical ballot, but it is unprecedented to fast-track complete scripts, especially when the first technical ballot has not yet completed.
Newa in particular has been a very difficult script to get encoded because of technical and political differences of opinion about what characters to include and the encoding model to use (see the long list of documents relating to Newa in the table below). As recently as the first ballot on the Committee Draft for ISO/IEC 10646 in August 2015, the UK national body expressed concerns over the encoding of murmured resonants as atomic characters (L2/15-262 p. 16), so the encoding of Newa cannot be considered uncontroversial. By fast-tracking Adlam and Newa into Unicode 9.0, the UTC has effectively stifled any ISO national body opposition to the Newa repertoire that the UTC has agreed upon. The CD ballot for ISO/IEC 10646 closes 29 February 2016, which theoretically allows the UTC time to tweak (or even withdraw) any of the fast-tracked characters in response to ballot comments by ISO national bodies, but any requests to change the character repertoire, character positions or character names for Newa or Adlam in the final ISO technical ballot (DIS ballot) later this year will have to be rejected, as the encoding of Newa and Adlam is already a fait accompli.
Fast-tracked characters from the ISO/IEC 10646 CD are marked ** in the tables below.
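The figure of 199 fast-tracked non-emoji characters can be cross-checked against the counts given for Adlam, Newa and the Japanese TV symbols in the tables below. A minimal tally, assuming these are the only non-emoji characters taken from the CD (the dictionary keys are descriptive labels, not official names):

```python
# Counts taken from this post.
from_amendment_2 = 7_227          # non-emoji characters in ISO/IEC 10646:2014 Amd. 2
emoji_count = 74
fast_tracked_from_cd = {          # non-emoji characters fast-tracked from the CD
    "Adlam": 87,
    "Newa": 92,
    "Newa combining deletion mark (1DFB)": 1,
    "Japanese TV symbols (1F19B..1F1AC)": 18,
    "Japanese TV symbol (1F23B)": 1,
}

fast_tracked = sum(fast_tracked_from_cd.values())
non_emoji = from_amendment_2 + fast_tracked
print(fast_tracked, non_emoji, non_emoji + emoji_count)  # 199 7426 7500
```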
7,297 of the 7,500 new characters in Unicode 9.0 belong to six new scripts :
- Osage [Osge] : An alphabet for the Osage language which was devised between 2004 and 2006 for use by the Osage Nation in the USA.
- Newa [Newa] : A Brahmic script used in Nepal to write Newar (Nepal Bhasa).
- Bhaiksuki [Bhks] : A Brahmic script used for writing Buddhist manuscripts and inscriptions in the region of northern India and Tibet during the 11th and 12th centuries.
- Marchen [Marc] : A Brahmic script used for writing the extinct Zhang Zhung language in the Bon religious tradition in Tibet.
- Tangut [Tang] : An ideographic script used by the Tangut people to write the extinct Tangut language in the Western Xia and in China (Yuan and Ming dynasties) during the 11th through 16th centuries.
- Adlam [Adlm] : An alphabetic script devised by the brothers Ibrahima Barry and Abdoulaye Barry during the late 1980s, in order to represent the Fulani language.
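Summing the per-script counts from the block tables below reproduces the 7,297 figure, provided that the Tangut iteration mark (16FE0, in the Ideographic Symbols and Punctuation block) is counted with Tangut, and the Newa combining deletion mark (1DFB) is not counted with Newa. A quick check under those assumptions:

```python
# Characters per new script, as listed in the block tables below.
new_scripts = {
    "Osage": 72,
    "Newa": 92,
    "Bhaiksuki": 97,
    "Marchen": 68,
    "Tangut": 6_125 + 755 + 1,   # ideographs + components + iteration mark (16FE0)
    "Adlam": 87,
}

total = sum(new_scripts.values())
print(total, f"{total / 7_500:.1%}")  # 7297 97.3%
```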
Inscription in the Marchen script on the library of the Yungdrung Bon Monastery in Dolanji (Himachal Pradesh)
Photograph © Chris Hatchell
Of the 7,500 characters added to Unicode 9.0 (including the 74 emoji), 7,357 characters are included in 11 new blocks, and 143 characters are added to existing blocks, as detailed in the two tables below. The code points and character names for all these characters are now fixed, and will not be changed. Draft official Unicode data files are available here, and I have made a plain text list of all the new characters to be added to Unicode 9.0 available here.
|Block Name||Range||Characters / Source Documents|
9 letters used in early Church Slavonic (1C80..1C88).
Aleksandr Andreev, Yuri Shardt, and Nikita Simmons, "Proposal to Use Standardized Variation Sequences to Encode Church Slavonic Glyph Variants in Unicode" (2014-07-20) [L2/13-153]
72 letters for Osage: 36 uppercase letters (104B0..104D3) and 36 lowercase letters (104D8..104FB).
92 characters for Newa: 53 letters (11400..11434); 13 vowel signs (11435..11441); 7 other signs (11442..11448); an Om character (11449); a Siddhi character (1144A); 5 punctuation marks (1144B..1144F); 10 digits (11450..11459); a placeholder mark (1145B); and an insertion sign (1145D).
Ken Whistler, "On the encoding of the “Nepaalalipi” / “Newar” script" (2012-05-11) [L2/12-200]
Dev Dass Manandhar, "Response to L2/12-200 “On the encoding of ‘Nepaalalipi’/‘Newar’ script”" (2012-07-21) [L2/12-244]
Dev Dass Manandhar, "Ancillary materials on “breathy consonants” in “Nepaalalipi”" (2012-07-21) [L2/12-245]
Dev Dass Manandhar, Samir Karmacharya and Bishnu Chitrakar, "Proposal for the Nepaalalipi script in the UCS" (2012-10-29) [L2/12-349]
Deborah Anderson, "Comparison between Newar and Nepaalalipi proposals (L2/12‐003 and L2/12‐349)" (2012-11-08) [L2/12-390]
Dev Dass Manandhar, Bishnu Chitrakar and Samir Karmacharya, "To Unicode Technical Committee (UTC)" (2013-01-28) [L2/13-029]
Dev Dass Manandhar, Samir Karmacharya and Bishnu Chitrakar, "Proposal to Encode Nepaalalipi Script in ISO/IEC 10646" (2014-04-10) [L2/14-086]
Deborah Anderson, "Comparison between Newar and Nepaalalipi proposals (L2/12‐003 and L2/14‐086)" (2014-09-23) [L2/14-220]
Deborah Anderson, "Recommendations to UTC from Script Meeting in Nepal" (2014-10-06) [L2/14-253]
Ken Whistler, "Rationale for Atomic Encoding of Murmured Resonants in Newa" (2014-10-27) [L2/14-281]
13 head marks for Mongolian (11660..1166C).
Aaron Bell, Greg Eck, Andrew Glass, and Andrew West, "Encoding Mongolian head letters" (2014-01-17) [L2/14-030]
97 characters for Bhaiksuki: 46 letters (11C00..11C08, 11C0A..11C2E); 12 vowel signs (11C2F..11C36, 11C38..11C3B); 5 other signs (11C3C..11C40); 2 dandas (11C41..11C42); a word separator (11C43); 2 gap fillers (11C44..11C45); 10 decimal digits (11C50..11C59); 18 numbers (11C5A..11C6B); and a hundreds unit mark (11C6C).
Anshuman Pandey and Dragomir Dimitrov, "Revised Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2014-01-27) [L2/14-036]
68 characters for Marchen: 2 marks (11C70..11C71); 30 letters (11C72..11C8F); 29 subjoined letters (11C92..11CA7, 11CA9..11CAF); 5 vowel signs (11CB0..11CB4); and 2 other signs (11CB5..11CB6).
|Ideographic Symbols and Punctuation||16FE0..16FFF||
1 iteration mark for Tangut (16FE0).
See under Tangut.
6,125 Tangut ideographs (17000..187EC) [characters are named algorithmically based on their code point, as TANGUT IDEOGRAPH-hhhhh].
Richard Cook (UC Berkeley Script Encoding Initiative), "Proposal to encode Tangut characters in UCS Plane 1" (2007-05-09) [WG2 N3297 || L2/07-143] [Multi Column Chart : WG2 N3297A || L2/07-144] [Single Column Chart: WG2 N3297B || L2/07-145]
Richard Cook (UC Berkeley Script Encoding Initiative), "Tangut Proposal Code Chart Update" (2007-07-24) [L2/07-229]
Richard Cook, "Single-Column Tangut Code Chart (using Column G font)" (2008-09-03) [L2/08-336]
Michael Everson, Nathan Hill, Guillaume Jacques, Andrew West, Viacheslav Zaytsev, "Proposal for a revised Tangut character set for encoding in the SMP of the UCS" (2009-03-01) [WG2 N3577 || L2/09-095]
Michael Everson, Nathan Hill, Guillaume Jacques, Andrew West, Viacheslav Zaytsev, "Proposal for a revised Tangut character set for encoding in the SMP of the UCS" (2009-04-08) [WG2 N3577R || L2/09-115] [Appendix A: WG2 N3577R-A || L2/09-116] [Appendix B: WG2 N3577R-B || L2/09-117]
Deborah Anderson and Richard Cook, "Request for Tangut font and mappings from N3577 to Amendment 7 repertoire" (2009-03-04) [WG2 N3586]
Richard Cook and Deborah Anderson, Script Encoding Initiative, UC Berkeley, "Comments on Tangut report N4033" (2011-06-01) [WG2 N4094]
China, "Comments on N4325, 4326 and N4327 (Tangut)" (2012-10-20) [WG2 N4370]
China, "Explanation on the Re-facture of Tangut Fonts" (2013-06-10) [WG2 N4455]
China, "Review of N4558R Tangut glyph corrections" (2014-09-29) [WG2 N4640]
755 Tangut radicals and character components (18800..18AF2).
Richard Cook and Deborah Anderson, "Comments on the Tangut radicals and strokes proposal (N3495 = L2/08‐335)" (2008-10-29) [L2/08-399]
38 combining letters for Glagolitic (1E000..1E006, 1E008..1E018, 1E01B..1E021, 1E023..1E024, 1E026..1E02A).
87 characters for Adlam: 34 uppercase letters (1E900..1E921); 34 lowercase letters (1E922..1E943); 7 marks (1E944..1E94A); 10 digits (1E950..1E959); and 2 punctuation marks (1E95E..1E95F).
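Because Tangut ideographs are named algorithmically (TANGUT IDEOGRAPH-hhhhh), a name can be derived from the code point rather than looked up in a table. A minimal sketch, assuming the 17000..187EC range given above (later versions of Unicode may extend it); `tangut_ideograph_name` is a hypothetical helper, not a library function:

```python
TANGUT_FIRST, TANGUT_LAST = 0x17000, 0x187EC   # Unicode 9.0 range only


def tangut_ideograph_name(cp: int) -> str:
    """Derive the algorithmic name of a Unicode 9.0 Tangut ideograph."""
    if not TANGUT_FIRST <= cp <= TANGUT_LAST:
        raise ValueError(f"U+{cp:04X} is not a Unicode 9.0 Tangut ideograph")
    return f"TANGUT IDEOGRAPH-{cp:X}"   # upper-case hex code point as suffix


print(tangut_ideograph_name(0x17000))       # TANGUT IDEOGRAPH-17000
print(TANGUT_LAST - TANGUT_FIRST + 1)       # 6125
```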
Leaf from a Tangut Buddhist manuscript (Great Perfection of Wisdom Sutra)
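The character counts in the first table above follow directly from the listed code point ranges. The sketch below expands each range list and sums the results, which reproduces the 7,357 characters in the 11 new blocks. The dictionary keys are the Unicode 9.0 block names, and `count_code_points` is a hypothetical helper written for this check.

```python
def count_code_points(ranges: str) -> int:
    """Count the code points in a list such as "11C00..11C08, 11C0A..11C2E"."""
    total = 0
    for item in ranges.split(","):
        start, _, end = item.strip().partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first  # a single code point has no ".."
        total += last - first + 1
    return total


# Ranges copied from the table above, keyed by Unicode 9.0 block name.
new_blocks = {
    "Cyrillic Extended-C": "1C80..1C88",
    "Osage": "104B0..104D3, 104D8..104FB",
    "Newa": ("11400..11434, 11435..11441, 11442..11448, 11449, 1144A, "
             "1144B..1144F, 11450..11459, 1145B, 1145D"),
    "Mongolian Supplement": "11660..1166C",
    "Bhaiksuki": ("11C00..11C08, 11C0A..11C2E, 11C2F..11C36, 11C38..11C3B, "
                  "11C3C..11C40, 11C41..11C42, 11C43, 11C44..11C45, "
                  "11C50..11C59, 11C5A..11C6B, 11C6C"),
    "Marchen": ("11C70..11C71, 11C72..11C8F, 11C92..11CA7, 11CA9..11CAF, "
                "11CB0..11CB4, 11CB5..11CB6"),
    "Ideographic Symbols and Punctuation": "16FE0",
    "Tangut": "17000..187EC",
    "Tangut Components": "18800..18AF2",
    "Glagolitic Supplement": ("1E000..1E006, 1E008..1E018, 1E01B..1E021, "
                              "1E023..1E024, 1E026..1E02A"),
    "Adlam": ("1E900..1E921, 1E922..1E943, 1E944..1E94A, "
              "1E950..1E959, 1E95E..1E95F"),
}

for block, ranges in new_blocks.items():
    print(f"{block}: {count_code_points(ranges)}")
print("Total:", sum(count_code_points(r) for r in new_blocks.values()))
```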
|Block Name||Range||Characters / Source Documents|
5 Arabic letters for Bravanese (08B6..08BA).
Hamid Banafunzi, Marghani Banafunzi, and Maxamed Nuur, "Proposal to encode five Arabic script characters for the Bravanese (Chimiini)" (2014-08-31) [L2/13-178]
Roozbeh Pournader and Shervin Afshar, "Proposal to Encode Arabic Letter Teh with Small Teh Above for Bravanese" (2014-11-01) [L2/13-293]
3 Arabic letters for Warsh-based orthographies (08BB..08BD).
Lorna Evans (SIL International), "Supporting the Warsh orthography for Arabic script" (2014-04-29) [L2/14-104]
15 Quranic marks used in Pakistani printing (08D4..08E2).
Lateef Sagar Shaikh, "Proposal to encode Quranic marks used in Quran published in Pakistan" (2014-04-24) [L2/14-095]
Lateef Sagar Shaikh, "Proposal to encode Quranic Alternate Dammatan used in Quran published in Pakistan" (2014-04-25) [L2/14-096]
1 spacing candrabindu sign (0C80).
3 chillu letters (0D54..0D56).
1 para sign (0D4F).
10 characters for fractions (0D58..0D5E, 0D76..0D78).
|Combining Diacritical Marks Supplement **||1DC0..1DFF||
1 combining deletion mark for Newa (1DFB).
4 power button symbols (23FB..23FE).
1 punctuation mark for Slavonic (2E43: DASH WITH LEFT UPTURN).
1 suspension mark for Byzantine Greek (2E44: DOUBLE SUSPENSION MARK).
1 letter for Unifon (A7AE: LATIN CAPITAL LETTER SMALL CAPITAL I).
1 candrabindu sign (A8C5).
|Ancient Greek Numbers||10140..1018F||
2 signs for ancient Greek (1018D..1018E).
1 sukun sign for Arabic transliteration in the Khojki script (1123E).
|Enclosed Alphanumeric Supplement **||1F100..1F1FF||
18 Japanese TV symbols required for ARIB STD-B62 (1F19B..1F1AC).
|Enclosed Ideographic Supplement **||1F200..1F2FF||
1 Japanese TV symbol required for ARIB STD-B62 (1F23B).
|Miscellaneous Symbols and Pictographs **||1F300..1F5FF||
2 emoji (see top of post):
1F57A : MAN DANCING
1F5A4 : BLACK HEART
|Transport and Map Symbols **||1F680..1F6FF||
5 emoji (see top of post):
1F6D1 : OCTAGONAL SIGN
1F6D2 : SHOPPING TROLLEY
1F6F4 : SCOOTER
1F6F5 : MOTOR SCOOTER
1F6F6 : CANOE
|Supplemental Symbols and Pictographs **||1F900..1F9FF||
67 emoji and emoticons (see top of post):
1F919 : CALL ME HAND
1F91A : RAISED BACK OF HAND
1F91B : LEFT-FACING FIST
1F91C : RIGHT-FACING FIST
1F91D : HANDSHAKE
1F91E : HAND WITH INDEX AND MIDDLE FINGERS CROSSED
1F920 : FACE WITH COWBOY HAT
1F921 : CLOWN FACE
1F922 : NAUSEATED FACE
1F923 : ROLLING ON THE FLOOR LAUGHING
1F924 : DROOLING FACE
1F925 : LYING FACE
1F926 : FACE PALM
1F927 : SNEEZING FACE
1F930 : PREGNANT WOMAN
1F933 : SELFIE
1F934 : PRINCE
1F935 : MAN IN TUXEDO
1F936 : MOTHER CHRISTMAS
1F937 : SHRUG
1F938 : PERSON DOING CARTWHEEL
1F939 : JUGGLING
1F93A : FENCER
1F93B : MODERN PENTATHLON
1F93C : WRESTLERS
1F93D : WATER POLO
1F93E : HANDBALL
1F940 : WILTED FLOWER
1F941 : DRUM WITH DRUMSTICKS
1F942 : CLINKING GLASSES
1F943 : TUMBLER GLASS
1F944 : SPOON
1F945 : GOAL NET
1F946 : RIFLE
1F947 : FIRST PLACE MEDAL
1F948 : SECOND PLACE MEDAL
1F949 : THIRD PLACE MEDAL
1F94A : BOXING GLOVE
1F94B : MARTIAL ARTS UNIFORM
1F950 : CROISSANT
1F951 : AVOCADO
1F952 : CUCUMBER
1F953 : BACON
1F954 : POTATO
1F955 : CARROT
1F956 : BAGUETTE BREAD
1F957 : GREEN SALAD
1F958 : SHALLOW PAN OF FOOD
1F959 : STUFFED FLATBREAD
1F95A : EGG
1F95B : GLASS OF MILK
1F95C : PEANUTS
1F95D : KIWIFRUIT
1F95E : PANCAKES
1F985 : EAGLE
1F986 : DUCK
1F987 : BAT
1F988 : SHARK
1F989 : OWL
1F98A : FOX FACE
1F98B : BUTTERFLY
1F98C : DEER
1F98D : GORILLA
1F98E : LIZARD
1F98F : RHINOCEROS
1F990 : SHRIMP
1F991 : SQUID
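Likewise, the additions to existing blocks in the table above add up to 143 characters, 74 of which are the emoji in the last three blocks, and together with the 7,357 characters in new blocks give the 7,500 total. A minimal tally; block names for rows whose name is not shown in the table are supplied from the Unicode block list:

```python
# Additions to existing blocks, per the table above.
additions_to_existing_blocks = {
    "Arabic Extended-A": 5 + 3 + 15,        # Bravanese, Warsh, Quranic marks
    "Kannada": 1,
    "Malayalam": 3 + 1 + 10,                # chillus, para sign, fractions
    "Combining Diacritical Marks Supplement": 1,
    "Miscellaneous Technical": 4,           # power symbols
    "Supplemental Punctuation": 2,
    "Latin Extended-D": 1,
    "Saurashtra": 1,
    "Ancient Greek Numbers": 2,
    "Khojki": 1,
    "Enclosed Alphanumeric Supplement": 18,
    "Enclosed Ideographic Supplement": 1,
    "Miscellaneous Symbols and Pictographs": 2,
    "Transport and Map Symbols": 5,
    "Supplemental Symbols and Pictographs": 67,
}
emoji_blocks = ("Miscellaneous Symbols and Pictographs",
                "Transport and Map Symbols",
                "Supplemental Symbols and Pictographs")

existing_total = sum(additions_to_existing_blocks.values())
emoji_total = sum(additions_to_existing_blocks[b] for b in emoji_blocks)
print(existing_total, emoji_total, existing_total + 7_357)  # 143 74 7500
```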
The author in front of a Tangut Buddhist inscription on the Cloud Platform at Juyong Pass
Previous Posts on Unicode Versions
- What's new in Unicode 5.0 ? [November 2005]
- What's new in Unicode 5.1 ? [June 2007]
- What's new in Unicode 5.2 ? [April 2008]
- What's new in Unicode 6.0 ? [November 2009]
- What's new in Unicode 6.1 ? [June 2011]
- What's new in Unicode 6.2 ? [May 2012]
- What's new in Unicode 6.3 ? [January 2013]
- What's new in Unicode 7.0 ? [October 2013]
- What's new in Unicode 8.0 ? [April 2015]
Last modified: 2016-05-24