SMS Encoding Guide and Segment Calculator

Compare GSM-7, ASCII, Latin-1, and UCS-2 behavior, then calculate exact segment counts for any message payload.

Practical rule: the encoding determines how many units a message consumes, how many segments it is split into, and therefore what it costs to deliver.

Encoding Comparison

A single SMS carries at most 140 bytes (1,120 bits) of user data. A concatenated SMS adds a User Data Header (UDH) to every segment, typically 6 bytes, which reduces per-segment capacity.

| Encoding | Typical Use | Single Segment | Concatenated Segment | Unit |
|---|---|---|---|---|
| GSM-7 | Default SMS alphabet, most Western-language traffic | 160 | 153 | septets (7-bit chars) |
| ASCII | 7-bit ASCII characters sent in 8-bit payload mode | 140 | 134 | bytes |
| Latin-1 | Extended Western European characters in 8-bit mode | 140 | 134 | bytes |
| UCS-2 | Multilingual/Unicode messaging | 70 | 67 | 16-bit code units |
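The limits in the table follow from the 140-byte payload. The sketch below re-derives them, assuming a 6-byte concatenation UDH (an assumption consistent with the 134-byte figure above; the names `PAYLOAD_BYTES`, `UDH_BYTES`, and `capacity` are illustrative only).

```python
# Re-derive per-segment capacities from the 140-byte payload limit.
PAYLOAD_BYTES = 140
UDH_BYTES = 6  # assumed UDH size for concatenated SMS


def capacity(bits_per_unit: int, concatenated: bool) -> int:
    """Return how many whole units fit in one segment."""
    usable_bytes = PAYLOAD_BYTES - (UDH_BYTES if concatenated else 0)
    return (usable_bytes * 8) // bits_per_unit


for name, bits in [("GSM-7", 7), ("8-bit", 8), ("UCS-2", 16)]:
    print(name, capacity(bits, False), capacity(bits, True))
```

For GSM-7, 134 bytes is 1,072 bits, which holds 153 whole septets with one bit left over.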

Segment Calculator

The calculator reports the segment count, units used, units per segment, and units remaining, along with the resolved encoding (Auto by default), payload bytes, and character count.
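The calculator's main outputs can be approximated with simple arithmetic over the limits in the comparison table. This is a minimal sketch, not the page's own implementation: `LIMITS` and `segment_stats` are hypothetical names, and it ignores the edge case where a two-unit character would straddle a segment boundary.

```python
# Limits copied from the comparison table:
# (single-segment units, units per concatenated segment).
LIMITS = {
    "GSM-7": (160, 153),
    "ASCII": (140, 134),
    "Latin-1": (140, 134),
    "UCS-2": (70, 67),
}


def segment_stats(units_used: int, encoding: str) -> dict:
    """Return segments, per-segment units, and remaining units."""
    single, concat = LIMITS[encoding]
    if units_used == 0:
        # Assumption: an empty message reports zeros, like an empty form.
        return {"segments": 0, "per_segment": 0, "remaining": 0}
    if units_used <= single:
        segments, per_segment = 1, single
    else:
        per_segment = concat
        segments = (units_used + concat - 1) // concat  # ceiling division
    remaining = segments * per_segment - units_used
    return {"segments": segments, "per_segment": per_segment, "remaining": remaining}


print(segment_stats(161, "GSM-7"))
```

A 161-septet GSM-7 message no longer fits in one segment, so it is split into 2 segments of 153 septets, leaving 145 septets unused.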

Cost Estimator (Editable Assumptions)

The estimator reports an estimated message cost in USD and, for each encoding, whether it is valid for the message, plus its segments, units, payload bytes, and estimated cost.
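One way to estimate cost is to multiply the segment count by a per-segment price. The sketch below assumes per-segment billing and uses a made-up rate; `estimate_cost` and the `0.0075` price are hypothetical and should be replaced with your provider's actual pricing.

```python
from decimal import Decimal


def estimate_cost(segments: int, price_per_segment: Decimal) -> Decimal:
    """Estimated cost in USD, rounded to four decimal places."""
    return (segments * price_per_segment).quantize(Decimal("0.0001"))


# Hypothetical rate of $0.0075 per segment, for illustration only.
print(estimate_cost(3, Decimal("0.0075")))
```

`Decimal` is used instead of `float` so that small per-segment prices do not accumulate binary rounding error.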

GSM-7 Notes

  • GSM-7 extension-table characters consume 2 septets (`^`, `{`, `}`, `\`, `[`, `]`, `~`, `|`, `€`).
  • If a message includes characters outside GSM-7 tables, GSM-7 is not a valid encoding for that payload.
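  • GSM-7 supports 160 septets for one segment and 153 septets per segment when concatenated; that equals 160 and 153 characters only when the message contains no extension-table characters.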
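The notes above translate into a short septet counter. This is a sketch under stated assumptions: the character sets are transcribed from the tables in this guide, the space character is added to the basic set, and `gsm7_septets` is a hypothetical helper name.

```python
from typing import Optional

# Basic table (1 septet each). The space character is included here as an
# assumption about the standard alphabet.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table (2 septets each: escape septet plus character septet).
GSM7_EXTENSION = set("^{}\\[]~|€")


def gsm7_septets(message: str) -> Optional[int]:
    """Return the septet count, or None if GSM-7 cannot encode the message."""
    total = 0
    for ch in message:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENSION:
            total += 2
        else:
            return None  # character is outside both GSM-7 tables
    return total


print(gsm7_septets("Hello"))
print(gsm7_septets("€5 [ok]"))
print(gsm7_septets("你好"))
```

Returning `None` rather than raising keeps the validity check and the count in one call, which is convenient when comparing several candidate encodings.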

UCS-2 Notes

  • UCS-2 uses 2 bytes per 16-bit code unit.
  • Single-segment limit is 70 code units, then 67 per concatenated segment.
  • For characters outside the BMP (for example, many emoji), practical implementations send UTF-16 surrogate pairs, so each such character consumes 2 code units.
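Code units can be counted by encoding the text as UTF-16 and halving the byte length, which also accounts for surrogate pairs. A minimal sketch, with `ucs2_code_units` as a hypothetical helper name:

```python
def ucs2_code_units(message: str) -> int:
    """Count 16-bit code units; non-BMP characters count as 2."""
    # utf-16-be emits 2 bytes per code unit and no byte-order mark.
    return len(message.encode("utf-16-be")) // 2


for text in ["hello", "日本語", "😀"]:
    units = ucs2_code_units(text)
    print(text, units, "code units,", units * 2, "payload bytes")
```

Here `len(text)` alone would undercount in Python, because it counts code points: the emoji is one code point but two code units.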

Supported Character Sets

Use these references to validate payload compatibility before selecting an encoding in production.

GSM-7 Character Set

Basic table characters (1 septet each):

@ £ $ ¥ è é ù ì ò Ç \n Ø ø \r Å å Δ _ Φ Γ Λ Ω Π Ψ Σ Θ Ξ Æ æ ß É
(space) ! " # ¤ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
¡ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ä Ö Ñ Ü §
¿ a b c d e f g h i j k l m n o p q r s t u v w x y z ä ö ñ ü à

Extension table characters (2 septets each): ^ { } \ [ ] ~ | €

ASCII Character Set

Supported range: U+0000 to U+007F (7-bit ASCII).

Common printable range: U+0020 to U+007E, including letters, digits, punctuation, and symbols.

Latin-1 (ISO-8859-1) Character Set

Supported range: U+0000 to U+00FF.

Includes ASCII plus Western European accented characters such as Á É Í Ó Ú Ñ Ç Ö Ü ß æ ø.
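The ASCII and Latin-1 ranges above can be checked by code point. The following sketch uses hypothetical helper names and relies on Python's built-in `latin-1` codec, which maps U+0000 to U+00FF onto single bytes.

```python
def is_ascii(message: str) -> bool:
    """True when every character is in U+0000..U+007F."""
    return all(ord(ch) <= 0x7F for ch in message)


def is_latin1(message: str) -> bool:
    """True when every character is in U+0000..U+00FF."""
    return all(ord(ch) <= 0xFF for ch in message)


def eight_bit_payload_bytes(message: str) -> int:
    """One byte per character; raises UnicodeEncodeError outside Latin-1."""
    return len(message.encode("latin-1"))


print(is_ascii("Señor"), is_latin1("Señor"), eight_bit_payload_bytes("Señor"))
```

Note that `€` (U+20AC) is outside Latin-1, so a message containing it fails both checks even though GSM-7 can carry it through the extension table.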

UCS-2 Character Set

Supports 16-bit Unicode code units in the Basic Multilingual Plane (BMP), roughly U+0000 to U+FFFF.

Characters outside the BMP (for example, many emoji) are encoded as surrogate pairs in practical implementations and consume 2 code units each.