String Literals in C Programming

String literals in C programming are sequences of characters enclosed in double quotes. They can include zero or more characters and come in different types, including character string literals, UTF-8 string literals, and wide string literals. 

The syntax for a string literal in C is as follows:
string-literal:
    encoding-prefix(opt) " s-char-sequence(opt) "

encoding-prefix:
    u8
    u
    U
    L

s-char-sequence:
    s-char
    s-char-sequence s-char

s-char:
    any member of the source character set except
        the double-quote ", backslash \, or new-line character
    escape-sequence

A sequence of adjacent string literal tokens shall not include both a wide string literal and a UTF-8 string literal.

A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz". A UTF–8 string literal is the same, except prefixed by u8. A wide string literal is the same, except prefixed by the letter L, u, or U.
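
As a rough illustration of the three kinds described above, the sketch below declares one array per kind and reports how many characters each holds. All identifiers (plain_text, utf8_text, wide_text and the helper functions) are made up for this example, and it assumes a C11 or later compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Illustrative names; none of these identifiers come from the text. */
static const char    plain_text[] = "xyz";    /* character string literal */
static const char    utf8_text[]  = u8"xyz";  /* UTF-8 string literal */
static const wchar_t wide_text[]  = L"xyz";   /* wide string literal (L prefix) */

/* Number of characters before the terminating zero in each array. */
size_t plain_length(void) { return strlen(plain_text); }
size_t utf8_length(void)  { return strlen(utf8_text); }
size_t wide_length(void)  { return wcslen(wide_text); }

/* u"xyz" and U"xyz" are wide string literals too. Dividing by the size
   of one element counts the elements (terminator included) without
   needing the char16_t and char32_t typedefs from <uchar.h>. */
size_t u16_elements(void) { return sizeof(u"xyz") / sizeof(u"xyz"[0]); }
size_t u32_elements(void) { return sizeof(U"xyz") / sizeof(U"xyz"[0]); }

/* Hypothetical self-check tying the pieces together. */
void check_literal_kinds(void)
{
    assert(plain_length() == 3);
    assert(utf8_length() == 3);
    assert(wide_length() == 3);
}
```

Calling check_literal_kinds() from a test or from main should pass silently; the two element counts come out one higher than the lengths because they include the terminating zero.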

The same considerations apply to each element of the sequence in a string literal as if it were in an integer character constant (for a character or UTF-8 string literal) or a wide character constant (for a wide string literal), except that the single-quote ' is representable either by itself or by the escape sequence \', but the double-quote " shall be represented by the escape sequence \".
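
The quoting rule can be checked with a small sketch like the following. The array names and helper functions are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <string.h>

/* An escaped double quote occupies a single element of the array. */
static_assert(sizeof "\"" == 2, "one quote character plus the terminating zero");

static const char quoted[]  = "say \"hi\"";  /* double quotes must be escaped */
static const char bare[]    = "it's";        /* single quote written directly */
static const char escaped[] = "it\'s";       /* single quote via escape sequence */

/* Both spellings of the single quote produce the same array contents. */
int apostrophe_forms_match(void) { return strcmp(bare, escaped) == 0; }

/* Counts the characters in: say "hi" */
size_t quoted_length(void) { return strlen(quoted); }
```

Here apostrophe_forms_match() returns 1 and quoted_length() returns 8, since each \" contributes exactly one character.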

In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence. If any of the tokens has an encoding prefix, the resulting multibyte character sequence is treated as having the same prefix; otherwise, it is treated as a character string literal. Whether differently-prefixed wide string literal tokens can be concatenated and, if so, the treatment of the resulting multibyte character sequence are implementation-defined.
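
A minimal sketch of phase 6 concatenation, assuming a C11 compiler; the usage text and all identifiers are invented for illustration.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Adjacent tokens become one array with a single terminating zero. */
static_assert(sizeof("ab" "cd") == 5, "four characters plus one terminator");

/* Adjacent literals let a long message span several source lines. */
static const char usage[] =
    "usage: tool [options] file\n"
    "  -h  show help\n";

/* An unprefixed token takes on the prefix of its neighbor. */
static const wchar_t wide_joined[] = L"ab" "cd";

int usage_is_joined(void)
{
    return strcmp(usage, "usage: tool [options] file\n  -h  show help\n") == 0;
}

int wide_joined_is_abcd(void)
{
    return wcscmp(wide_joined, L"abcd") == 0;
}
```

Both functions return 1. Mixing two different wide prefixes, such as L"a" u"b", is deliberately left out because its treatment is implementation-defined.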

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. The element type and the initialization depend on the kind of literal:

  • For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
  • For UTF-8 string literals, the array elements have type char, and are initialized with the characters of the multibyte character sequence, as encoded in UTF-8.
  • For wide string literals prefixed by the letter L, the array elements have type wchar_t and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by the mbstowcs function with an implementation-defined current locale.
  • For wide string literals prefixed by the letter u or U, the array elements have type char16_t or char32_t, respectively, and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by successive calls to the mbrtoc16 or mbrtoc32 function as appropriate for its type, with an implementation-defined current locale.

The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined.
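
Two consequences of phase 7 are easy to observe: the terminating zero counts toward the array size, and the array outlives any function call. The sketch below shows both; status_name and status_length are hypothetical helpers, not anything defined by the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* The terminating zero counts toward the array size. */
static_assert(sizeof "abc" == 4, "three characters plus the terminator");
static_assert(sizeof "" == 1, "an empty literal still holds the terminator");
static_assert(sizeof L"abc" == 4 * sizeof(wchar_t), "elements are wchar_t");

/* Hypothetical helper: returning a pointer to a literal is safe because
   the underlying array has static storage duration. */
const char *status_name(int ok)
{
    return ok ? "ok" : "failed";
}

/* Length of the selected name, not counting the terminator. */
size_t status_length(int ok)
{
    return strlen(status_name(ok));
}
```

The static_assert lines are checked at compile time, so the file only builds if the sizes are as described.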

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
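
In practice this means a program should treat the literal's array as read-only and modify a copy instead. One possible pattern is sketched below; the names shared and edit_copy are made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Pointing at the literal's array: declaring the target const lets the
   compiler reject an accidental write. */
static const char *const shared = "hello";

/* Hypothetical helper: edits a private copy, never the literal itself. */
int edit_copy(void)
{
    char copy[] = "hello";   /* a modifiable array initialized from the literal */
    assert(sizeof copy == sizeof "hello");   /* room for the terminator too */

    copy[0] = 'j';           /* fine: copy is an ordinary array */

    /* Writing through a pointer to the literal, e.g. ((char *)shared)[0] = 'j',
       would be undefined behavior and is intentionally not done here. */
    return strcmp(copy, "jello") == 0 && strcmp(shared, "hello") == 0;
}
```

edit_copy() returns 1. Because it is unspecified whether identical literals share storage, comparing two literals by address (rather than with strcmp) gives no portable answer.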

Examples

  • This pair of adjacent character string literals "\x12" "3" produces a single character string literal containing the two characters whose values are '\x12' and '3', because escape sequences are converted into single members of the execution character set just prior to adjacent string literal concatenation.
  • Each of the sequences of adjacent string literal tokens "a" "b" L"c", "a" L"b" "c", L"a" "b" L"c", L"a" L"b" L"c" is equivalent to the string literal L"abc". Likewise, each of the sequences "a" "b" u"c", "a" u"b" "c", u"a" "b" u"c", u"a" u"b" u"c" is equivalent to u"abc".
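
The two examples above can be verified with a short sketch such as this one, assuming a C11 compiler. The function names are illustrative only, and memcmp is used for the u-prefixed case so that the char16_t typedef is not required.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* "\x12" "3" is two characters plus the terminator: the hex escape is
   converted before the tokens are joined, so the 3 is not absorbed. */
static_assert(sizeof("\x12" "3") == 3, "two characters and a terminating zero");

static const char hex_then_digit[] = "\x12" "3";

int hex_example_holds(void)
{
    return hex_then_digit[0] == 0x12
        && hex_then_digit[1] == '3'
        && hex_then_digit[2] == '\0';
}

/* Every mix of unprefixed and L-prefixed tokens yields L"abc". */
int wide_example_holds(void)
{
    return wcscmp("a" "b" L"c", L"abc") == 0
        && wcscmp("a" L"b" "c", L"abc") == 0
        && wcscmp(L"a" "b" L"c", L"abc") == 0
        && wcscmp(L"a" L"b" L"c", L"abc") == 0;
}

/* The same holds for the u prefix; compare the raw array contents. */
int u16_example_holds(void)
{
    return memcmp("a" "b" u"c", u"abc", sizeof(u"abc")) == 0
        && memcmp("a" u"b" "c", u"abc", sizeof(u"abc")) == 0
        && memcmp(u"a" "b" u"c", u"abc", sizeof(u"abc")) == 0
        && memcmp(u"a" u"b" u"c", u"abc", sizeof(u"abc")) == 0;
}
```

All three functions return 1.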