HTML Common microsyntaxes

The HTML Standard, maintained by the WHATWG (Web Hypertext Application Technology Working Group), defines a set of common microsyntaxes: shared rules for how values such as dates, numbers, colors, and token lists are written and parsed in attributes. Defining these rules once keeps data handling consistent across elements, browsers, and websites. In this article, we'll look at several of the microsyntaxes specified in the HTML Standard, starting with boolean attributes and enumerated attributes, then moving on to colors, token lists, hash-name references, media queries, and unique internal values.

Common Parser Idioms

To follow these microsyntaxes, it helps to know a few common parser idioms. Many of the parsing algorithms share a standard pattern: an 'input' variable holds the string being parsed, and a 'position' variable points at the next character to parse in 'input'. Each parsing step reads characters at 'position' and advances it, until 'position' moves past the end of 'input'.
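
The 'input' and 'position' pattern can be sketched in Python as follows. This is a minimal illustration, not the specification's own algorithm text; the helper names `collect_while` and `skip_ascii_whitespace` are chosen for this example, and returning the new position alongside the result is an assumption made because Python strings have no built-in cursor.

```python
# ASCII whitespace: tab, line feed, form feed, carriage return, space.
ASCII_WHITESPACE = "\t\n\f\r "


def collect_while(input_str, position, predicate):
    """Collect consecutive characters that satisfy `predicate`.

    Starts at `position` and stops at the first non-matching character
    or at the end of the input. Returns (collected_text, new_position).
    """
    start = position
    while position < len(input_str) and predicate(input_str[position]):
        position += 1
    return input_str[start:position], position


def skip_ascii_whitespace(input_str, position):
    """Advance `position` past any run of ASCII whitespace."""
    _, position = collect_while(
        input_str, position, lambda c: c in ASCII_WHITESPACE
    )
    return position
```

For instance, with the input `"  42px"`, skipping whitespace from position 0 moves the position to 2, and collecting ASCII digits from there yields `"42"` and leaves the position at 4, pointing at `"p"`.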

Boolean Attributes

Boolean attributes are attributes in HTML that represent a binary state – true or false. These attributes are associated with elements, and their presence or absence determines their value. If a boolean attribute is present on an element, it represents the 'true' value. If it is absent, it represents the 'false' value.

The value of a boolean attribute must either be an empty string or a value that matches the attribute's canonical name in an ASCII case-insensitive manner, with no leading or trailing whitespace. The values "true" and "false" are not allowed for boolean attributes. To represent 'false,' the attribute must be entirely omitted.

For example, a checkbox can be checked and disabled using boolean attributes:

HTML
<label><input type="checkbox" checked name="cheese" disabled> Cheese</label>

This could be equivalently written as:

HTML
<label><input type="checkbox" checked="checked" name="cheese" disabled="disabled"> Cheese</label>

Mixing attribute styles is also valid:

HTML
<label><input type="checkbox" checked name="cheese" disabled=""> Cheese</label>

These examples demonstrate the use of boolean attributes to control the checked and disabled states of a checkbox.
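
The rules for boolean attributes can also be expressed as a small check. The sketch below assumes an element's attributes are available as a Python dict keyed by attribute name; the function names are hypothetical and exist only for this illustration.

```python
import string

# Translation table that lowercases only the ASCII letters A-Z.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text):
    """Lowercase ASCII letters only, leaving all other characters alone."""
    return text.translate(_ASCII_LOWER)


def is_valid_boolean_attribute_value(name, value):
    """True if `value` is allowed for the boolean attribute `name`.

    Allowed: the empty string, or an ASCII case-insensitive match for
    the attribute's name with no surrounding whitespace.
    """
    return value == "" or ascii_lower(value) == ascii_lower(name)


def boolean_attribute_state(attributes, name):
    """Presence means true and absence means false; the value is ignored."""
    return name in attributes
```

Note that `boolean_attribute_state({"disabled": "false"}, "disabled")` is still true: the value "false" is not a valid value, but the attribute is present, so it represents the true state.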

Keywords and Enumerated Attributes

Some HTML attributes are enumerated attributes, which means they can take on a finite set of states. The state of an enumerated attribute is determined by a combination of the attribute's value, a set of keyword-to-state mappings defined in the attribute's specification, and two special states: the invalid value default and the missing value default. To determine the state of an enumerated attribute:

  • If the attribute is not specified:
    • If the attribute has a missing value default state defined, return that missing value default state.
    • Otherwise, return no state.
  • If the attribute's value is an ASCII case-insensitive match for one of the keywords defined for the attribute, return the state represented by that keyword.
  • If the attribute has an invalid value default state defined, return that invalid value default state.
  • Return no state.
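
The steps above can be sketched as a single function. This is an illustration under assumptions: a missing attribute is modeled as `None`, "no state" is also modeled as `None`, the keyword table is a dict with lowercase ASCII keys, and the keywords and states in the usage note belong to a made-up attribute rather than any real HTML attribute.

```python
import string

# Translation table that lowercases only the ASCII letters A-Z.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def enumerated_state(value, keywords, missing_default=None, invalid_default=None):
    """Determine the state of an enumerated attribute.

    value           -- the attribute's value, or None if not specified
    keywords        -- dict mapping lowercase keywords to states
    missing_default -- missing value default state, or None if undefined
    invalid_default -- invalid value default state, or None if undefined
    """
    if value is None:
        # Attribute not specified: missing value default, else no state.
        return missing_default
    key = value.translate(_ASCII_LOWER)
    if key in keywords:
        # ASCII case-insensitive keyword match.
        return keywords[key]
    # Unknown value: invalid value default, else no state.
    return invalid_default
```

With a hypothetical keyword table `{"on": "On", "off": "Off"}`, a missing value default of `"Off"`, and an invalid value default of `"On"`, an absent attribute yields `"Off"`, the value `"ON"` yields `"On"`, and an unrecognized value such as `"bogus"` also yields `"On"`.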

For authoring conformance, the value of an enumerated attribute must be an ASCII case-insensitive match for one of the keywords defined for that attribute, with no leading or trailing whitespace.

In cases where multiple keywords map to the same state, and there is a need to determine a canonical keyword for a state, the specification for the attribute explicitly defines the canonical keyword for that state.

These rules ensure that enumerated attributes are processed consistently, and their values are mapped to the correct states.

Colors 

In HTML, colors are commonly represented as three 8-bit numbers ranging from 0 to 255, indicating the red, green, and blue components in the 'srgb' color space. A simple color in HTML is a string exactly seven characters long, starting with a '#' character, followed by six ASCII hex digits. The first two digits represent the red component, the middle two the green, and the last two the blue. A lowercase simple color is a valid simple color that does not contain uppercase letters (A-F).

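Parsing a simple color checks that the string is exactly seven characters long, starts with '#', and continues with six ASCII hex digits, then interprets each pair of digits as a hexadecimal number to obtain the red, green, and blue components. Serializing goes the other way: it writes the three components as two-digit lowercase hexadecimal numbers after a '#', producing a lowercase simple color.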
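
Parsing and serializing simple colors can be sketched as below. The function names are chosen for this example, and signalling a parse error by returning `None` is an assumption of the sketch.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_simple_color(text):
    """Parse '#rrggbb' into an (r, g, b) tuple, or return None on error."""
    if len(text) != 7 or text[0] != "#":
        return None
    # Check digits explicitly: int(..., 16) alone would accept "+f" or " f".
    if not all(c in HEX_DIGITS for c in text[1:]):
        return None
    return int(text[1:3], 16), int(text[3:5], 16), int(text[5:7], 16)


def serialize_simple_color(rgb):
    """Serialize an (r, g, b) tuple as a lowercase simple color."""
    r, g, b = rgb
    return "#{:02x}{:02x}{:02x}".format(r, g, b)
```

For example, `parse_simple_color("#FF8000")` gives `(255, 128, 0)`, and serializing that tuple gives `"#ff8000"`.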

Space-Separated Tokens

A set of space-separated tokens is a string containing zero or more words (the tokens), separated by one or more ASCII whitespace characters. The string may also have leading or trailing ASCII whitespace. Such a set can be an unordered set of unique tokens, where no token is repeated and order is not important, or an ordered set of unique tokens, where no token is repeated and order is significant. Some sets of space-separated tokens have a predefined list of allowed values; otherwise, any value is considered conforming.
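
Splitting such a string into its tokens can be sketched as below. The sketch uses an explicit character class because Python's argument-less `str.split()` also splits on non-ASCII whitespace such as U+00A0, which is not ASCII whitespace. The function names are hypothetical.

```python
import re

# One or more of: tab, line feed, form feed, carriage return, space.
_ASCII_WHITESPACE_RUN = re.compile(r"[\t\n\f\r ]+")


def split_on_ascii_whitespace(text):
    """Return the tokens of a space-separated token string, in order."""
    # Leading or trailing whitespace produces empty pieces; drop them.
    return [token for token in _ASCII_WHITESPACE_RUN.split(text) if token]


def has_unique_tokens(text):
    """True if no token is repeated, as unique token sets require."""
    tokens = split_on_ascii_whitespace(text)
    return len(tokens) == len(set(tokens))
```

For example, `split_on_ascii_whitespace("  btn  btn-primary\tlarge ")` gives `["btn", "btn-primary", "large"]`, and an empty string gives an empty list.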

Comma-Separated Tokens 

A set of comma-separated tokens is a string containing zero or more tokens, each separated from the next by a single ',' character. A token is any string of zero or more characters that neither begins nor ends with ASCII whitespace and contains no ',' characters. Tokens may be surrounded by ASCII whitespace, which is not considered part of the token. Like space-separated tokens, comma-separated token sets may have restrictions on valid tokens, or they may accept any value.
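
Splitting on commas can be sketched in one line of Python. The function name is chosen for this example. Unlike whitespace splitting, empty tokens are kept in this sketch, since a token may consist of zero characters.

```python
# ASCII whitespace: tab, line feed, form feed, carriage return, space.
ASCII_WHITESPACE = "\t\n\f\r "


def split_on_commas(text):
    """Split on ',' and strip ASCII whitespace around each token."""
    return [token.strip(ASCII_WHITESPACE) for token in text.split(",")]
```

For example, `split_on_commas(" image/png , image/jpeg,.pdf ")` gives `["image/png", "image/jpeg", ".pdf"]`, while `split_on_commas("a,,b")` gives `["a", "", "b"]`.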

References

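In HTML, some attributes refer to another element in the document using a hash-name reference. A valid hash-name reference to an element of a given type consists of a '#' character followed by a string that exactly matches the value of the 'name' attribute of an element of that type in the same tree. Parsing a hash-name reference takes the text after the first '#' character and returns the first element of the required type, in tree order, whose 'id' or 'name' attribute has that value; if the string contains no '#', or nothing follows it, no element is returned. Note that although 'id' attributes are taken into account when parsing, they are not used to determine whether a reference is valid.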
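
The lookup can be sketched as follows. The sketch makes several assumptions: the tree is flattened into a list of dicts in tree order, each dict carries a hypothetical `"type"` key for the element type, a failed lookup returns `None`, and both 'id' and 'name' are consulted during the search.

```python
def parse_hash_name_reference(text, elements, element_type):
    """Return the first matching element, or None.

    text         -- the reference string, e.g. "#shapes"
    elements     -- list of dicts in tree order, each with "type" and
                    optional "id" / "name" keys (a stand-in for a real DOM)
    element_type -- the element type being looked for, e.g. "map"
    """
    index = text.find("#")
    # No '#' at all, or '#' is the last character: nothing to look up.
    if index == -1 or index == len(text) - 1:
        return None
    target = text[index + 1:]
    for element in elements:
        if element.get("type") != element_type:
            continue
        if target in (element.get("id"), element.get("name")):
            return element
    return None
```

For example, given `[{"type": "map", "name": "shapes"}]`, the reference `"#shapes"` returns that element, while `"shapes"` and `"#"` both return `None`.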

Media Queries

Media queries are used to define rules that apply to a web document based on the user's environment, such as screen size, resolution, or preferences. A valid media query list follows the specifications outlined in the Media Queries standard. A string matches the environment of the user if it is the empty string, a string containing only ASCII whitespace, or a media query list that matches the user's environment according to Media Queries.

Unique Internal Values

A unique internal value is a value used for internal bookkeeping that is serializable, comparable by value, and never exposed to script. Serializable means it can be converted into a format that can be saved or transmitted, and comparable by value means two such values can be checked for equality. Because these values are never exposed to JavaScript, they exist purely for internal use within the browser or HTML processing engine.
