Rust Programming: Storing UTF-8 Encoded Text with Strings

Author - FTT World

September 04, 2023

Strings in Rust are implemented as a collection of bytes, plus some methods to provide useful functionality when those bytes are interpreted as text. The String type in Rust is a growable, mutable, owned, UTF-8 encoded string type.
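As a minimal sketch of what "growable, mutable, owned" means in practice, the example below builds a String piece by piece. The function name build_greeting and the sample text are hypothetical, chosen only for illustration.

```rust
// Hypothetical helper: builds a greeting by growing an owned String in place.
fn build_greeting(name: &str) -> String {
    let mut s = String::new(); // starts empty; `mut` allows it to change
    s.push_str("Hello, ");     // append a string slice
    s.push_str(name);
    s.push('!');               // append a single char
    s                          // ownership moves to the caller
}

fn main() {
    let greeting = build_greeting("мир");
    println!("{}", greeting);

    // len() counts bytes, not characters: the three Cyrillic letters
    // take two bytes each in UTF-8.
    println!("{} bytes", greeting.len());           // 14 bytes
    println!("{} chars", greeting.chars().count()); // 11 chars
}
```

The gap between the byte count and the character count is the source of most of the difficulties described in the rest of this post.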

New Rustaceans commonly get stuck on strings for a combination of three reasons:

Rust's propensity for exposing possible errors

Strings being a more complicated data structure than many programmers give them credit for

UTF-8

These factors combine in a way that can seem difficult when you're coming from other programming languages.

One of the ways in which String is different from other collections is that indexing into a String is complicated by the differences between how people and computers interpret String data. For example, the following code will not compile:

let hello = "Здравствуйте";
let answer = &hello[0];

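This code fails to compile because Rust's string types do not support indexing with a single integer. The first character of the string "Здравствуйте" is the Cyrillic capital letter Ze, which takes two bytes to encode in UTF-8. The byte at index 0 is therefore only the first half of that character and is not a valid Unicode scalar value on its own. Rather than return an unexpected byte value, Rust rejects the code at compile time.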
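The sketch below shows the mismatch between byte positions and characters, assuming the same sample string. The helper byte_prefix is hypothetical. It uses the standard get() method, which returns None when a range does not fall on character boundaries. Slicing directly with &hello[0..1] would compile but panic at runtime.

```rust
// Hypothetical helper: returns the first `n` bytes of `s` as a string slice,
// or None if `n` is out of range or falls inside a multi-byte character.
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(0..n)
}

fn main() {
    let hello = "Здравствуйте";

    // len() reports bytes; each of these Cyrillic letters takes two.
    println!("bytes: {}", hello.len());           // bytes: 24
    println!("chars: {}", hello.chars().count()); // chars: 12

    // Byte offset 1 is in the middle of the first letter.
    println!("{}", hello.is_char_boundary(1));    // false

    println!("{:?}", byte_prefix(hello, 4));      // Some("Зд")
    println!("{:?}", byte_prefix(hello, 1));      // None
}
```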

Although a String cannot be indexed by integer position, you can use the chars() method to iterate over the Unicode scalar values in the string. For example, the following code prints the first character of the string "Здравствуйте":

let hello = "Здравствуйте";

// chars() yields Unicode scalar values; next() returns the first one, if any.
if let Some(c) = hello.chars().next() {
    println!("{}", c);
}

This code will print the following output:

З

You can also use the bytes() method to iterate over the raw UTF-8 bytes of a String. This can be useful when you need the encoded form itself, for example when writing text to a file or sending it over a network.
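As a brief illustration of bytes(), the sketch below collects the first two bytes of the sample string and decodes them back into text. The helper leading_bytes is a hypothetical name used only for this example.

```rust
// Hypothetical helper: collects the first `n` UTF-8 bytes of `s`.
fn leading_bytes(s: &str, n: usize) -> Vec<u8> {
    s.bytes().take(n).collect()
}

fn main() {
    let hello = "Здравствуйте";

    // The first letter is encoded as two bytes.
    let first_two = leading_bytes(hello, 2);
    println!("{:?}", first_two); // [208, 151]

    // Decoding succeeds only when the bytes form valid UTF-8.
    println!("{:?}", std::str::from_utf8(&first_two));     // Ok("З")
    println!("{}", std::str::from_utf8(&first_two[..1]).is_err()); // true
}
```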
