Primitive Type char [−]
A character type.
The char type represents a single character. More specifically, since
'character' isn't a well-defined concept in Unicode, char is a 'Unicode
scalar value', which is similar to, but not the same as, a 'Unicode code
point'.
This documentation describes a number of methods and trait implementations on the
char type. For technical reasons, there is additional, separate
documentation in the std::char module as well.
Representation
char is always four bytes in size. This is a different representation than
a given character would have as part of a String. For example:
let v = vec!['h', 'e', 'l', 'l', 'o']; // five elements times four bytes for each element assert_eq!(20, v.len() * std::mem::size_of::<char>()); let s = String::from("hello"); // five elements times one byte per element assert_eq!(5, s.len() * std::mem::size_of::<u8>());
As always, remember that a human intuition for 'character' may not map to Unicode's definitions. For example, emoji symbols such as '❤️' can be more than one Unicode code point; this ❤️ in particular is two:
fn main() { let s = String::from("❤️"); // we get two chars out of a single ❤️ let mut iter = s.chars(); assert_eq!(Some('\u{2764}'), iter.next()); assert_eq!(Some('\u{fe0f}'), iter.next()); assert_eq!(None, iter.next()); }let s = String::from("❤️"); // we get two chars out of a single ❤️ let mut iter = s.chars(); assert_eq!(Some('\u{2764}'), iter.next()); assert_eq!(Some('\u{fe0f}'), iter.next()); assert_eq!(None, iter.next());
This means it won't fit into a char. Trying to create a literal with
let heart = '❤️'; gives an error:
error: character literal may only contain one codepoint: '❤
let heart = '❤️';
^~
Another implication of the 4-byte fixed size of a char is that
per-char processing can end up using a lot more memory:
let s = String::from("love: ❤️"); let v: Vec<char> = s.chars().collect(); assert_eq!(12, s.len() * std::mem::size_of::<u8>()); assert_eq!(32, v.len() * std::mem::size_of::<char>());
Methods
impl char
1.0.0fn is_digit(self, radix: u32) -> bool
Checks if a char is a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Arbitrary radicum are supported.
Compared to is_numeric(), this function only recognizes the characters
0-9, a-z and A-Z.
'Digit' is defined to be only the following characters:
0-9a-zA-Z
For a more comprehensive understanding of 'digit', see is_numeric().
Panics
Panics if given a radix larger than 36.
Examples
Basic usage:
fn main() { assert!('1'.is_digit(10)); assert!('f'.is_digit(16)); assert!(!'f'.is_digit(10)); }assert!('1'.is_digit(10)); assert!('f'.is_digit(16)); assert!(!'f'.is_digit(10));
Passing a large radix, causing a panic:
fn main() { use std::thread; let result = thread::spawn(|| { // this panics '1'.is_digit(37); }).join(); assert!(result.is_err()); }use std::thread; let result = thread::spawn(|| { // this panics '1'.is_digit(37); }).join(); assert!(result.is_err());
1.0.0fn to_digit(self, radix: u32) -> Option<u32>
Converts a char to a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Arbitrary radicum are supported.
'Digit' is defined to be only the following characters:
0-9a-zA-Z
Errors
Returns None if the char does not refer to a digit in the given radix.
Panics
Panics if given a radix larger than 36.
Examples
Basic usage:
fn main() { assert_eq!('1'.to_digit(10), Some(1)); assert_eq!('f'.to_digit(16), Some(15)); }assert_eq!('1'.to_digit(10), Some(1)); assert_eq!('f'.to_digit(16), Some(15));
Passing a non-digit results in failure:
fn main() { assert_eq!('f'.to_digit(10), None); assert_eq!('z'.to_digit(16), None); }assert_eq!('f'.to_digit(10), None); assert_eq!('z'.to_digit(16), None);
Passing a large radix, causing a panic:
fn main() { use std::thread; let result = thread::spawn(|| { '1'.to_digit(37); }).join(); assert!(result.is_err()); }use std::thread; let result = thread::spawn(|| { '1'.to_digit(37); }).join(); assert!(result.is_err());
1.0.0fn escape_unicode(self) -> EscapeUnicode
Returns an iterator that yields the hexadecimal Unicode escape of a
character, as chars.
All characters are escaped with Rust syntax of the form \\u{NNNN}
where NNNN is the shortest hexadecimal representation.
Examples
Basic usage:
fn main() { for c in '❤'.escape_unicode() { print!("{}", c); } println!(""); }for c in '❤'.escape_unicode() { print!("{}", c); } println!("");
This prints:
\u{2764}
Collecting into a String:
let heart: String = '❤'.escape_unicode().collect(); assert_eq!(heart, r"\u{2764}");
1.0.0fn escape_default(self) -> EscapeDefault
Returns an iterator that yields the literal escape code of a char.
The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:
- Tab is escaped as
\t. - Carriage return is escaped as
\r. - Line feed is escaped as
\n. - Single quote is escaped as
\'. - Double quote is escaped as
\". - Backslash is escaped as
\\. - Any character in the 'printable ASCII' range
0x20..0x7einclusive is not escaped. - All other characters are given hexadecimal Unicode escapes; see
escape_unicode.
Examples
Basic usage:
fn main() { for i in '"'.escape_default() { println!("{}", i); } }for i in '"'.escape_default() { println!("{}", i); }
This prints:
\
"
Collecting into a String:
let quote: String = '"'.escape_default().collect(); assert_eq!(quote, "\\\"");
1.0.0fn len_utf8(self) -> usize
Returns the number of bytes this char would need if encoded in UTF-8.
That number of bytes is always between 1 and 4, inclusive.
Examples
Basic usage:
fn main() { let len = 'A'.len_utf8(); assert_eq!(len, 1); let len = 'ß'.len_utf8(); assert_eq!(len, 2); let len = 'ℝ'.len_utf8(); assert_eq!(len, 3); let len = '💣'.len_utf8(); assert_eq!(len, 4); }let len = 'A'.len_utf8(); assert_eq!(len, 1); let len = 'ß'.len_utf8(); assert_eq!(len, 2); let len = 'ℝ'.len_utf8(); assert_eq!(len, 3); let len = '💣'.len_utf8(); assert_eq!(len, 4);
The &str type guarantees that its contents are UTF-8, and so we can compare the length it
would take if each code point was represented as a char vs in the &str itself:
// as chars let eastern = '東'; let capitol = '京'; // both can be represented as three bytes assert_eq!(3, eastern.len_utf8()); assert_eq!(3, capitol.len_utf8()); // as a &str, these two are encoded in UTF-8 let tokyo = "東京"; let len = eastern.len_utf8() + capitol.len_utf8(); // we can see that they take six bytes total... assert_eq!(6, tokyo.len()); // ... just like the &str assert_eq!(len, tokyo.len());
1.0.0fn len_utf16(self) -> usize
Returns the number of 16-bit code units this char would need if
encoded in UTF-16.
See the documentation for len_utf8() for more explanation of this
concept. This function is a mirror, but for UTF-16 instead of UTF-8.
Examples
Basic usage:
fn main() { let n = 'ß'.len_utf16(); assert_eq!(n, 1); let len = '💣'.len_utf16(); assert_eq!(len, 2); }let n = 'ß'.len_utf16(); assert_eq!(n, 1); let len = '💣'.len_utf16(); assert_eq!(len, 2);
fn encode_utf8(self) -> EncodeUtf8
unicode #27784)Returns an interator over the bytes of this character as UTF-8.
The returned iterator also has an as_slice() method to view the
encoded bytes as a byte slice.
Examples
#![feature(unicode)] fn main() { let iterator = 'ß'.encode_utf8(); assert_eq!(iterator.as_slice(), [0xc3, 0x9f]); for (i, byte) in iterator.enumerate() { println!("byte {}: {:x}", i, byte); } }#![feature(unicode)] let iterator = 'ß'.encode_utf8(); assert_eq!(iterator.as_slice(), [0xc3, 0x9f]); for (i, byte) in iterator.enumerate() { println!("byte {}: {:x}", i, byte); }
fn encode_utf16(self) -> EncodeUtf16
unicode #27784)Returns an interator over the u16 entries of this character as UTF-16.
The returned iterator also has an as_slice() method to view the
encoded form as a slice.
Examples
#![feature(unicode)] fn main() { let iterator = '𝕊'.encode_utf16(); assert_eq!(iterator.as_slice(), [0xd835, 0xdd4a]); for (i, val) in iterator.enumerate() { println!("entry {}: {:x}", i, val); } }#![feature(unicode)] let iterator = '𝕊'.encode_utf16(); assert_eq!(iterator.as_slice(), [0xd835, 0xdd4a]); for (i, val) in iterator.enumerate() { println!("entry {}: {:x}", i, val); }
1.0.0fn is_alphabetic(self) -> bool
Returns true if this char is an alphabetic code point, and false if not.
Examples
Basic usage:
fn main() { assert!('a'.is_alphabetic()); assert!('京'.is_alphabetic()); let c = '💝'; // love is many things, but it is not alphabetic assert!(!c.is_alphabetic()); }assert!('a'.is_alphabetic()); assert!('京'.is_alphabetic()); let c = '💝'; // love is many things, but it is not alphabetic assert!(!c.is_alphabetic());
fn is_xid_start(self) -> bool
unicode): mainly needed for compiler internals
Returns true if this char satisfies the 'XID_Start' Unicode property, and false
otherwise.
'XID_Start' is a Unicode Derived Property specified in
UAX #31,
mostly similar to ID_Start but modified for closure under NFKx.
fn is_xid_continue(self) -> bool
unicode): mainly needed for compiler internals
Returns true if this char satisfies the 'XID_Continue' Unicode property, and false
otherwise.
'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.
1.0.0fn is_lowercase(self) -> bool
Returns true if this char is lowercase, and false otherwise.
'Lowercase' is defined according to the terms of the Unicode Derived Core
Property Lowercase.
Examples
Basic usage:
fn main() { assert!('a'.is_lowercase()); assert!('δ'.is_lowercase()); assert!(!'A'.is_lowercase()); assert!(!'Δ'.is_lowercase()); // The various Chinese scripts do not have case, and so: assert!(!'中'.is_lowercase()); }assert!('a'.is_lowercase()); assert!('δ'.is_lowercase()); assert!(!'A'.is_lowercase()); assert!(!'Δ'.is_lowercase()); // The various Chinese scripts do not have case, and so: assert!(!'中'.is_lowercase());
1.0.0fn is_uppercase(self) -> bool
Returns true if this char is uppercase, and false otherwise.
'Uppercase' is defined according to the terms of the Unicode Derived Core
Property Uppercase.
Examples
Basic usage:
fn main() { assert!(!'a'.is_uppercase()); assert!(!'δ'.is_uppercase()); assert!('A'.is_uppercase()); assert!('Δ'.is_uppercase()); // The various Chinese scripts do not have case, and so: assert!(!'中'.is_uppercase()); }assert!(!'a'.is_uppercase()); assert!(!'δ'.is_uppercase()); assert!('A'.is_uppercase()); assert!('Δ'.is_uppercase()); // The various Chinese scripts do not have case, and so: assert!(!'中'.is_uppercase());
1.0.0fn is_whitespace(self) -> bool
Returns true if this char is whitespace, and false otherwise.
'Whitespace' is defined according to the terms of the Unicode Derived Core
Property White_Space.
Examples
Basic usage:
fn main() { assert!(' '.is_whitespace()); // a non-breaking space assert!('\u{A0}'.is_whitespace()); assert!(!'越'.is_whitespace()); }assert!(' '.is_whitespace()); // a non-breaking space assert!('\u{A0}'.is_whitespace()); assert!(!'越'.is_whitespace());
1.0.0fn is_alphanumeric(self) -> bool
Returns true if this char is alphanumeric, and false otherwise.
'Alphanumeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.
Examples
Basic usage:
fn main() { assert!('٣'.is_alphanumeric()); assert!('7'.is_alphanumeric()); assert!('৬'.is_alphanumeric()); assert!('K'.is_alphanumeric()); assert!('و'.is_alphanumeric()); assert!('藏'.is_alphanumeric()); assert!(!'¾'.is_alphanumeric()); assert!(!'①'.is_alphanumeric()); }assert!('٣'.is_alphanumeric()); assert!('7'.is_alphanumeric()); assert!('৬'.is_alphanumeric()); assert!('K'.is_alphanumeric()); assert!('و'.is_alphanumeric()); assert!('藏'.is_alphanumeric()); assert!(!'¾'.is_alphanumeric()); assert!(!'①'.is_alphanumeric());
1.0.0fn is_control(self) -> bool
Returns true if this char is a control code point, and false otherwise.
'Control code point' is defined in terms of the Unicode General
Category Cc.
Examples
Basic usage:
fn main() { // U+009C, STRING TERMINATOR assert!(''.is_control()); assert!(!'q'.is_control()); }// U+009C, STRING TERMINATOR assert!(''.is_control()); assert!(!'q'.is_control());
1.0.0fn is_numeric(self) -> bool
Returns true if this char is numeric, and false otherwise.
'Numeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No'.
Examples
Basic usage:
fn main() { assert!('٣'.is_numeric()); assert!('7'.is_numeric()); assert!('৬'.is_numeric()); assert!(!'K'.is_numeric()); assert!(!'و'.is_numeric()); assert!(!'藏'.is_numeric()); assert!(!'¾'.is_numeric()); assert!(!'①'.is_numeric()); }assert!('٣'.is_numeric()); assert!('7'.is_numeric()); assert!('৬'.is_numeric()); assert!(!'K'.is_numeric()); assert!(!'و'.is_numeric()); assert!(!'藏'.is_numeric()); assert!(!'¾'.is_numeric()); assert!(!'①'.is_numeric());
1.0.0fn to_lowercase(self) -> ToLowercase
Returns an iterator that yields the lowercase equivalent of a char.
If no conversion is possible then an iterator with just the input character is returned.
This performs complex unconditional mappings with no tailoring: it maps
one Unicode character to its lowercase equivalent according to the
Unicode database and the additional complex mappings
SpecialCasing.txt. Conditional mappings (based on context or
language) are not considered here.
For a full reference, see here.
Examples
Basic usage:
fn main() { assert_eq!('C'.to_lowercase().next(), Some('c')); // Japanese scripts do not have case, and so: assert_eq!('山'.to_lowercase().next(), Some('山')); }assert_eq!('C'.to_lowercase().next(), Some('c')); // Japanese scripts do not have case, and so: assert_eq!('山'.to_lowercase().next(), Some('山'));
1.0.0fn to_uppercase(self) -> ToUppercase
Returns an iterator that yields the uppercase equivalent of a char.
If no conversion is possible then an iterator with just the input character is returned.
This performs complex unconditional mappings with no tailoring: it maps
one Unicode character to its uppercase equivalent according to the
Unicode database and the additional complex mappings
SpecialCasing.txt. Conditional mappings (based on context or
language) are not considered here.
For a full reference, see here.
Examples
Basic usage:
fn main() { assert_eq!('c'.to_uppercase().next(), Some('C')); // Japanese does not have case, and so: assert_eq!('山'.to_uppercase().next(), Some('山')); }assert_eq!('c'.to_uppercase().next(), Some('C')); // Japanese does not have case, and so: assert_eq!('山'.to_uppercase().next(), Some('山'));
In Turkish, the equivalent of 'i' in Latin has five forms instead of two:
- 'Dotless': I / ı, sometimes written ï
- 'Dotted': İ / i
Note that the lowercase dotted 'i' is the same as the Latin. Therefore:
fn main() { let upper_i = 'i'.to_uppercase().next(); }let upper_i = 'i'.to_uppercase().next();
The value of upper_i here relies on the language of the text: if we're
in en-US, it should be Some('I'), but if we're in tr_TR, it should
be Some('İ'). to_uppercase() does not take this into account, and so:
let upper_i = 'i'.to_uppercase().next(); assert_eq!(Some('I'), upper_i);
holds across languages.
Trait Implementations
impl Display for char1.0.0
impl Debug for char1.0.0
impl Hash for char1.0.0
fn hash<H>(&self, state: &mut H) where H: Hasher
1.3.0fn hash_slice<H>(data: &[Self], state: &mut H) where H: Hasher
impl<'a> Pattern<'a> for char
Searches for chars that are equal to a given char