FE 0.9.0
Header-only C++ frontend library
Loading...
Searching...
No Matches
fe::utf8 Namespace Reference

UTF-8 helpers for decoding byte streams, encoding char32_t values, and running ASCII-style character classification on char32_t. More...

Classes

struct  Char32
 Wrapper for char32_t with an operator<< that writes UTF-8. More...

Functions

constexpr size_t num_bytes (char8_t c) noexcept
 Returns the expected number of bytes for an UTF-8 char sequence by inspecting the first byte.
constexpr char32_t append (char32_t c, char8_t b) noexcept
 Append b to c for converting UTF-8 to UTF-32.
constexpr char32_t first (char32_t c, char32_t num) noexcept
 Get relevant bits of first UTF-8 byte c of a multi-byte sequence consisting of num bytes.
constexpr char32_t min_code_point (size_t num) noexcept
 Minimum Unicode scalar value representable in an UTF-8 sequence of num bytes.
constexpr bool is_scalar_value (char32_t c) noexcept
 Is c a valid Unicode scalar value?
constexpr char8_t is_valid234 (char8_t c) noexcept
 Is the 2nd, 3rd, or 4th byte of an UTF-8 byte sequence valid?
char32_t decode (std::istream &is)
 Decodes the next UTF-8 sequence from is into a single char32_t.
bool encode (std::ostream &os, char32_t c32)
 Encodes c32 as UTF-8 and writes the resulting bytes to os.
Wrappers

Safe char32_t-style wrappers for <ctype> functions:

Like all other functions from <cctype>, the behavior of std::isalnum is undefined if the argument's value is neither representable as unsigned char nor equal to EOF.

bool isalnum (char32_t c) noexcept
bool isalpha (char32_t c) noexcept
bool isblank (char32_t c) noexcept
bool iscntrl (char32_t c) noexcept
bool isdigit (char32_t c) noexcept
bool isgraph (char32_t c) noexcept
bool islower (char32_t c) noexcept
bool isprint (char32_t c) noexcept
bool ispunct (char32_t c) noexcept
bool isspace (char32_t c) noexcept
bool isupper (char32_t c) noexcept
bool isxdigit (char32_t c) noexcept
bool isascii (char32_t c) noexcept
char32_t tolower (char32_t c) noexcept
char32_t toupper (char32_t c) noexcept
constexpr bool isrange (char32_t c, char32_t begin, char32_t finis) noexcept
 Is c within [begin, finis]?
constexpr auto isrange (char32_t begin, char32_t finis) noexcept
constexpr bool isodigit (char32_t c) noexcept
 Is octal digit?
constexpr bool isbdigit (char32_t c) noexcept
 Is binary digit?
any

Build a predicate that checks whether a code point matches any of the given values.

bool _any (char32_t c, char32_t d)
template<class... T>
bool _any (char32_t c, char32_t d, T... args)
template<class... T>
auto any (T... args)

Variables

static constexpr size_t Max = 4
 Maximal number of char8_ts of an UTF-8 byte sequence.
static constexpr char32_t BOM = 0xfeff
 Byte Order Mark.
static constexpr char32_t EoF = (char32_t)std::istream::traits_type::eof()
 End of stream sentinel returned by decode.
static constexpr char32_t Null = 0
 U+0000 NULL returned unchanged by decode.
static constexpr char32_t Invalid = 0x110000
 Sentinel returned by decode for malformed UTF-8.

Detailed Description

UTF-8 helpers for decoding byte streams, encoding char32_t values, and running ASCII-style character classification on char32_t.

The central entry points are decode and encode. Decoding returns sentinel values such as EoF and Invalid instead of throwing.

Function Documentation

◆ _any() [1/2]

bool fe::utf8::_any ( char32_t c,
char32_t d )
inline

Definition at line 153 of file utf8.h.

Referenced by _any(), and any().

◆ _any() [2/2]

template<class... T>
bool fe::utf8::_any ( char32_t c,
char32_t d,
T... args )
inline

Definition at line 155 of file utf8.h.

References _any().

◆ any()

template<class... T>
auto fe::utf8::any ( T... args)
inline

Definition at line 159 of file utf8.h.

References _any().

◆ append()

char32_t fe::utf8::append ( char32_t c,
char8_t b )
constexprnoexcept

Append b to c for converting UTF-8 to UTF-32.

Definition at line 35 of file utf8.h.

Referenced by decode().

◆ decode()

char32_t fe::utf8::decode ( std::istream & is)
inline

Decodes the next UTF-8 sequence from is into a single char32_t.

Returns EoF when the stream is exhausted and Invalid for malformed, overlong, surrogate, or otherwise non-scalar encodings.

Definition at line 64 of file utf8.h.

References append(), EoF, first(), Invalid, is_scalar_value(), is_valid234(), min_code_point(), and num_bytes().

Referenced by fe::Lexer< K, S >::next().

◆ encode()

bool fe::utf8::encode ( std::ostream & os,
char32_t c32 )
inline

Encodes c32 as UTF-8 and writes the resulting bytes to os.

Returns false when c32 is outside the encodable range.

Definition at line 96 of file utf8.h.

Referenced by fe::utf8::Char32::operator<<.

◆ first()

char32_t fe::utf8::first ( char32_t c,
char32_t num )
constexprnoexcept

Get relevant bits of first UTF-8 byte c of a multi-byte sequence consisting of num bytes.

Definition at line 38 of file utf8.h.

Referenced by decode().

◆ is_scalar_value()

bool fe::utf8::is_scalar_value ( char32_t c)
constexprnoexcept

Is c a valid Unicode scalar value?

Definition at line 52 of file utf8.h.

Referenced by decode().

◆ is_valid234()

char8_t fe::utf8::is_valid234 ( char8_t c)
constexprnoexcept

Is the 2nd, 3rd, or 4th byte of an UTF-8 byte sequence valid?

Returns
the extracted char8_t or char8_t(-1) if invalid.

Definition at line 56 of file utf8.h.

Referenced by decode().

◆ isalnum()

bool fe::utf8::isalnum ( char32_t c)
inlinenoexcept

Definition at line 125 of file utf8.h.

◆ isalpha()

bool fe::utf8::isalpha ( char32_t c)
inlinenoexcept

Definition at line 126 of file utf8.h.

◆ isascii()

bool fe::utf8::isascii ( char32_t c)
inlinenoexcept

Definition at line 137 of file utf8.h.

◆ isbdigit()

bool fe::utf8::isbdigit ( char32_t c)
constexprnoexcept

Is binary digit?

Definition at line 146 of file utf8.h.

References isrange().

◆ isblank()

bool fe::utf8::isblank ( char32_t c)
inlinenoexcept

Definition at line 127 of file utf8.h.

◆ iscntrl()

bool fe::utf8::iscntrl ( char32_t c)
inlinenoexcept

Definition at line 128 of file utf8.h.

◆ isdigit()

bool fe::utf8::isdigit ( char32_t c)
inlinenoexcept

Definition at line 129 of file utf8.h.

◆ isgraph()

bool fe::utf8::isgraph ( char32_t c)
inlinenoexcept

Definition at line 130 of file utf8.h.

◆ islower()

bool fe::utf8::islower ( char32_t c)
inlinenoexcept

Definition at line 131 of file utf8.h.

◆ isodigit()

bool fe::utf8::isodigit ( char32_t c)
constexprnoexcept

Is octal digit?

Definition at line 145 of file utf8.h.

References isrange().

◆ isprint()

bool fe::utf8::isprint ( char32_t c)
inlinenoexcept

Definition at line 132 of file utf8.h.

◆ ispunct()

bool fe::utf8::ispunct ( char32_t c)
inlinenoexcept

Definition at line 133 of file utf8.h.

◆ isrange() [1/2]

auto fe::utf8::isrange ( char32_t begin,
char32_t finis )
constexprnoexcept

Definition at line 143 of file utf8.h.

References isrange().

◆ isrange() [2/2]

bool fe::utf8::isrange ( char32_t c,
char32_t begin,
char32_t finis )
constexprnoexcept

Is c within [begin, finis]?

Definition at line 142 of file utf8.h.

Referenced by isbdigit(), isodigit(), and isrange().

◆ isspace()

bool fe::utf8::isspace ( char32_t c)
inlinenoexcept

Definition at line 134 of file utf8.h.

◆ isupper()

bool fe::utf8::isupper ( char32_t c)
inlinenoexcept

Definition at line 135 of file utf8.h.

◆ isxdigit()

bool fe::utf8::isxdigit ( char32_t c)
inlinenoexcept

Definition at line 136 of file utf8.h.

◆ min_code_point()

char32_t fe::utf8::min_code_point ( size_t num)
constexprnoexcept

Minimum Unicode scalar value representable in an UTF-8 sequence of num bytes.

Definition at line 41 of file utf8.h.

Referenced by decode().

◆ num_bytes()

size_t fe::utf8::num_bytes ( char8_t c)
constexprnoexcept

Returns the expected number of bytes for an UTF-8 char sequence by inspecting the first byte.

Retuns 0 if invalid.

Definition at line 26 of file utf8.h.

Referenced by decode().

◆ tolower()

char32_t fe::utf8::tolower ( char32_t c)
inlinenoexcept

Definition at line 138 of file utf8.h.

Referenced by fe::Lexer< K, S >::accept().

◆ toupper()

char32_t fe::utf8::toupper ( char32_t c)
inlinenoexcept

Definition at line 139 of file utf8.h.

Referenced by fe::Lexer< K, S >::accept().

Variable Documentation

◆ BOM

char32_t fe::utf8::BOM = 0xfeff
staticconstexpr

Byte Order Mark.

Definition at line 18 of file utf8.h.

Referenced by fe::Lexer< K, S >::next().

◆ EoF

char32_t fe::utf8::EoF = (char32_t)std::istream::traits_type::eof()
staticconstexpr

End of stream sentinel returned by decode.

Definition at line 19 of file utf8.h.

Referenced by decode(), and fe::Lexer< K, S >::next().

◆ Invalid

char32_t fe::utf8::Invalid = 0x110000
staticconstexpr

Sentinel returned by decode for malformed UTF-8.

Definition at line 22 of file utf8.h.

Referenced by decode().

◆ Max

size_t fe::utf8::Max = 4
staticconstexpr

Maximal number of char8_ts of an UTF-8 byte sequence.

Definition at line 17 of file utf8.h.

◆ Null

char32_t fe::utf8::Null = 0
staticconstexpr

U+0000 NULL returned unchanged by decode.

Definition at line 21 of file utf8.h.