Strings in Rust are implemented as a collection of bytes, plus some methods to provide useful functionality when those bytes are interpreted as text. The String type in Rust is a growable, mutable, owned, UTF-8 encoded string type.
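As a minimal sketch of what "growable, mutable, owned" means in practice, the following example builds a String piece by piece. The helper name build_greeting is hypothetical and exists only for illustration; push_str and push are standard String methods.

```rust
// Hypothetical helper: builds a greeting by growing an owned String in place.
fn build_greeting(name: &str) -> String {
    // `mut` is required because the String is modified after creation.
    let mut s = String::from("Hello");
    s.push_str(", "); // append a string slice
    s.push_str(name);
    s.push('!'); // append a single char
    s // ownership of the String moves to the caller
}

fn main() {
    let greeting = build_greeting("Ferris");
    println!("{greeting}");
    // len() reports the length in bytes, not in characters.
    println!("{} bytes", greeting.len());
}
```

Note that len() counts bytes. For ASCII-only text the byte count and the character count happen to match, which is not true in general.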
New Rustaceans commonly get stuck on strings for a combination of three reasons:
- Rust's propensity for exposing possible errors
- Strings being a more complicated data structure than many programmers give them credit for
- UTF-8
These factors combine in a way that can seem difficult when you're coming from other programming languages.
One of the ways in which String is different from other collections is that indexing into a String is complicated by the differences between how people and computers interpret String data. For example, the following code will not compile:
let hello = "Здравствуйте";
let answer = &hello[0];
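This is because the first character of the string "Здравствуйте" is the Cyrillic letter Ze (З), which takes two bytes to encode in UTF-8. Index 0 would therefore refer only to the first of those two bytes (208), which is not a valid character on its own. Because a byte offset does not always line up with a character, Rust does not allow indexing a string with a single integer at all, and the code is rejected at compile time.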
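To make the gap between bytes and characters concrete, here is a small sketch that compares the two counts for the same string and inspects its first bytes. The helper byte_and_char_counts is a hypothetical name introduced for this example; len(), chars(), as_bytes(), and get() are standard str methods.

```rust
// Hypothetical helper: returns (length in bytes, number of Unicode scalar values).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let hello = "Здравствуйте";
    let (bytes, chars) = byte_and_char_counts(hello);
    println!("{bytes} bytes, {chars} chars");

    // The first two bytes together encode the letter З.
    println!("{:?}", &hello.as_bytes()[..2]);

    // get() takes a byte range and returns None when the range does not
    // fall on character boundaries, instead of panicking.
    println!("{:?}", hello.get(0..1)); // splits З in half
    println!("{:?}", hello.get(0..2)); // covers З exactly
}
```

Every letter in this particular string needs two bytes, so the byte length is twice the character count. The get() method is used here because slicing with &hello[0..1] compiles but panics at runtime when the range cuts through a character.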
To index into a String safely, you can use the chars() method to iterate over the Unicode scalar values in the string. For example, the following code will print the first character of the string "Здравствуйте":
let hello = "Здравствуйте";
// chars() yields Unicode scalar values; next() returns the first one, if any.
if let Some(c) = hello.chars().next() {
    println!("{}", c);
}
This code will print the following output:
З
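The same idea extends to any position, not just the first. The sketch below wraps chars().nth() in a helper; the name nth_char is hypothetical and is not part of the standard library.

```rust
// Hypothetical helper: returns the n-th Unicode scalar value (0-based),
// or None if the string has fewer than n + 1 of them.
fn nth_char(s: &str, n: usize) -> Option<char> {
    // chars() decodes the UTF-8 bytes lazily; nth() walks the iterator
    // from the start, so the cost grows with n.
    s.chars().nth(n)
}

fn main() {
    let hello = "Здравствуйте";
    println!("{:?}", nth_char(hello, 0));
    println!("{:?}", nth_char(hello, 11));
    println!("{:?}", nth_char(hello, 12)); // past the end
}
```

Unlike indexing into an array, this lookup is not constant time, because the string has to be decoded from the beginning to find where each character starts. Returning an Option also makes the out-of-range case explicit rather than a panic.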
You can also use the bytes() method to iterate over the bytes in a String. This can be useful for tasks such as reading and writing binary data.
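For illustration, this short sketch collects the first few raw bytes of the same string. The helper first_bytes is a hypothetical name used only in this example; bytes() is the standard str method.

```rust
// Hypothetical helper: collects the first n bytes of the string's UTF-8 encoding.
fn first_bytes(s: &str, n: usize) -> Vec<u8> {
    s.bytes().take(n).collect()
}

fn main() {
    let hello = "Здравствуйте";
    // Each Cyrillic letter here is encoded as two bytes, so four bytes
    // cover the first two letters, З and д.
    println!("{:?}", first_bytes(hello, 4));

    // Iterating directly yields one u8 per step.
    for b in hello.bytes().take(2) {
        println!("{b}");
    }
}
```

The values produced are plain u8 numbers, not characters, so they are only meaningful as text once they are decoded again as UTF-8.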