HTML Common Microsyntaxes: Dates and Times

In web development, the precise representation of dates and times is a fundamental necessity. It affects a wide range of applications, from scheduling and event management to data storage and retrieval. The HTML standard, maintained by the Web Hypertext Application Technology Working Group (WHATWG), provides a detailed set of microsyntaxes for handling dates and times. This article walks through the common infrastructure for dealing with dates and times, following the rules and guidelines outlined in the WHATWG specification.

Gregorian Calendar and Leap Years

The WHATWG specification adopts the proleptic Gregorian calendar, that is, the modern Gregorian calendar extrapolated backward to year 1. It serves as the wire format for date and time representations in HTML. The specification itself notes that this choice is arbitrary and reflects the cultural biases of those involved in the decision.

The specification accounts for leap years in the Gregorian calendar, which include years divisible by 400 or years divisible by 4 but not by 100. This ensures that date calculations are in line with the standards we use today.
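
The leap-year rule above can be written down directly. The following is a minimal sketch; the function names `is_leap_year` and `days_in_month` are illustrative choices, not names taken from the specification.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 400, or divisible by 4 but not by 100."""
    return year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)


def days_in_month(year: int, month: int) -> int:
    """Number of days in a month of a proleptic-Gregorian year."""
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31
```

For example, 2000 is a leap year under this rule while 1900 is not, so February has 29 days in 2000 and 28 days in 1900.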

Parsing Date and Time

The microsyntaxes in the HTML standard are intended to be subsets of the corresponding ISO 8601 formats, but the standard defines parsing rules in much more detail than ISO 8601 does. Implementers are encouraged to evaluate date parsing libraries carefully before using them, because ISO 8601 libraries may not parse dates and times in exactly the same manner.

Months

A month consists of a specific proleptic-Gregorian date with no time-zone information, containing only the year and month. To be a valid month string, it must adhere to the following format:

  • Four or more ASCII digits representing the year (year > 0).
  • A hyphen-minus character ('-').
  • Two ASCII digits representing the month, where 1 ≤ month ≤ 12.

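Parsing a month component involves collecting ASCII digits and checking them against these criteria: the year must have at least four digits and be greater than zero, and the month must have exactly two digits and fall between 1 and 12. A successfully parsed month string yields a year and a month.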
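
As an illustration, the month format can be checked with a short regular-expression sketch. The name `parse_month_string` is hypothetical, and the sketch validates a whole string rather than following the specification's step-by-step parsing algorithm.

```python
import re
from typing import Optional, Tuple

# [0-9] is used instead of \d so that only ASCII digits are accepted.
_MONTH_RE = re.compile(r"([0-9]{4,})-([0-9]{2})")


def parse_month_string(text: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) for a valid month string, otherwise None."""
    match = _MONTH_RE.fullmatch(text)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    if year < 1 or not 1 <= month <= 12:
        return None
    return year, month
```

With this sketch, "2024-02" is accepted, while "0000-01" (year zero), "2024-13" (month out of range), and "24-02" (too few year digits) are rejected.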

Dates

A date includes the year, month, and day in the proleptic-Gregorian calendar. It is a valid date string if it follows this format:

  • A valid month string representing the year and month.
  • A hyphen-minus character ('-').
  • Two ASCII digits representing the day, where 1 ≤ day ≤ maxday (calculated based on the month and year).

Parsing a date component involves first parsing the month and then verifying the day against the month's maximum days. A valid date string represents a full date.
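
The day check depends on both the month and the year. Below is a minimal sketch, assuming the hypothetical names `max_day` and `parse_date_string`; it does not use Python's `datetime` module because that module is limited to years 1 through 9999, whereas the format allows more than four year digits.

```python
import re
from typing import Optional, Tuple

_DATE_RE = re.compile(r"([0-9]{4,})-([0-9]{2})-([0-9]{2})")


def max_day(year: int, month: int) -> int:
    """Last valid day of the month in the proleptic-Gregorian calendar."""
    if month == 2:
        leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def parse_date_string(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (year, month, day) for a valid date string, otherwise None."""
    match = _DATE_RE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    if year < 1 or not 1 <= month <= 12:
        return None
    if not 1 <= day <= max_day(year, month):
        return None
    return year, month, day
```

Under these rules "2024-02-29" is valid because 2024 is a leap year, while "2023-02-29" is not.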

Yearless Dates

Yearless dates represent a month and day within that month without an associated year. These strings follow this format:

  • Optionally, two hyphen-minus characters ('-').
  • Two ASCII digits representing the month (1 ≤ month ≤ 12).
  • A hyphen-minus character ('-').
  • Two ASCII digits representing the day, where 1 ≤ day ≤ maxday (maxday depends on an arbitrary leap year).

The rules for parsing yearless date components involve checking the format of the string and ensuring that the month and day fall within the specified ranges.
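
A sketch of the yearless-date check follows. Because the maximum day is taken from a leap year, February 29 is always accepted. The name `parse_yearless_date_string` is an illustrative assumption.

```python
import re
from typing import Optional, Tuple

# Optional leading "--", then MM-DD, ASCII digits only.
_YEARLESS_RE = re.compile(r"(?:--)?([0-9]{2})-([0-9]{2})")

# Day limits for a leap year, so February allows 29 days.
_MAX_DAY = {1: 31, 2: 29, 3: 31, 4: 30, 5: 31, 6: 30,
            7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}


def parse_yearless_date_string(text: str) -> Optional[Tuple[int, int]]:
    """Return (month, day) for a valid yearless date string, otherwise None."""
    match = _YEARLESS_RE.fullmatch(text)
    if match is None:
        return None
    month, day = int(match.group(1)), int(match.group(2))
    if month not in _MAX_DAY or not 1 <= day <= _MAX_DAY[month]:
        return None
    return month, day
```

Both "02-29" and "--12-25" pass this check; a single leading hyphen, as in "-12-25", does not.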

Times

A time represents the hour, minute, and second of a specific time, with an optional fraction of a second. To be a valid time string, it must have the following components:

  • Two ASCII digits representing the hour (0 ≤ hour ≤ 23).
  • A colon character (':').
  • Two ASCII digits representing the minute (0 ≤ minute ≤ 59).
  • If the second is nonzero (or optionally if it is zero), a colon character (':') followed by two ASCII digits representing the integer part of the second (0 ≤ s ≤ 59).
  • If the second is not an integer, a full stop character ('.') and one to three ASCII digits representing the fractional part of the second.

Leap seconds cannot be represented in this format: the second component is never 60 or more.
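
The time format can be sketched as follows. This is a validity check for whole strings under the rules for valid time strings (seconds optional, at most three fractional digits), with the hypothetical name `parse_time_string`; the fraction is returned as integer milliseconds to avoid floating-point rounding.

```python
import re
from typing import Optional, Tuple

_TIME_RE = re.compile(
    r"([0-9]{2}):([0-9]{2})"             # HH:MM
    r"(?::([0-9]{2})(?:\.([0-9]{1,3}))?)?"  # optional :SS and .fff
)


def parse_time_string(text: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (hour, minute, second, millisecond), or None if invalid."""
    match = _TIME_RE.fullmatch(text)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    second = int(match.group(3) or 0)
    # Pad the fraction to three digits: ".5" means 500 milliseconds.
    millisecond = int((match.group(4) or "").ljust(3, "0"))
    if hour > 23 or minute > 59 or second > 59:
        return None
    return hour, minute, second, millisecond
```

Here "14:30" and "14:30:05.5" are accepted, while "23:59:60" is rejected because a leap second has no representation.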

Local Dates and Times

A local date and time combine a proleptic-Gregorian date with a time, all without a time zone. A valid local date and time string consists of these components in order:

  • A valid date string representing the date.
  • Either a Latin capital letter 'T' or a space character.
  • A valid time string representing the time.

The representation can be normalized by always using 'T' as the separator and writing the time in its shortest form, for example omitting the seconds component when it is zero.
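
Normalization can be sketched as a string rewrite. The function name `normalize_local_date_time` is an assumption, and the sketch only checks the overall shape of the input; it does not validate the ranges of the individual fields.

```python
import re

_LOCAL_RE = re.compile(
    r"([0-9]{4,}-[0-9]{2}-[0-9]{2})[T ]"
    r"([0-9]{2}:[0-9]{2})(?::([0-9]{2})(?:\.([0-9]{1,3}))?)?"
)


def normalize_local_date_time(text: str) -> str:
    """Rewrite a local date and time with 'T' and the shortest time form."""
    match = _LOCAL_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a local date and time string: {text!r}")
    date, hours_minutes, second, fraction = match.groups()
    second = second or "00"
    fraction = (fraction or "").rstrip("0")  # drop trailing zeros
    time = hours_minutes
    if fraction:
        time += f":{second}.{fraction}"
    elif second != "00":
        time += f":{second}"
    return f"{date}T{time}"
```

For instance, "2024-03-10 09:30:00" becomes "2024-03-10T09:30", and "2024-03-10T09:30:15.500" becomes "2024-03-10T09:30:15.5".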

Time Zones

Time zones play a critical role in representing time accurately, especially in a global context. The HTML Standard defines the format for valid time-zone offset strings. A time-zone offset is a signed number of hours and minutes, and it can be represented in two ways:

  • Using the character 'Z' (U+005A LATIN CAPITAL LETTER Z) to signify UTC.
  • Using a combination of a plus sign ('+') or a hyphen-minus ('-'), followed by two digits for hours and an optional colon (':') before two digits for minutes.

This format accommodates time-zone offsets ranging from -23:59 to +23:59. However, it's important to note that actual time zones typically fall within the range of -12:00 to +14:00, with minutes usually being 00, 30, or 45. The HTML Standard acknowledges the unpredictability of time zones due to political decisions.

Parsing a time-zone offset string involves the following steps:

  • Parse the time-zone offset component to obtain hours and minutes.
  • Fail if any characters remain in the input string after the offset.

The rules are strict to ensure accurate parsing and handling of time zones, which are crucial for determining the correct time for various locations worldwide.
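
A sketch of the offset rules follows, returning the offset as a signed number of minutes from UTC. The name `parse_time_zone_offset` and the choice of minutes as the return unit are illustrative assumptions.

```python
import re
from typing import Optional

# Sign, two hour digits, optional colon, two minute digits.
_OFFSET_RE = re.compile(r"([+-])([0-9]{2}):?([0-9]{2})")


def parse_time_zone_offset(text: str) -> Optional[int]:
    """Return the offset in minutes from UTC, or None if invalid."""
    if text == "Z":
        return 0
    match = _OFFSET_RE.fullmatch(text)
    if match is None:
        return None
    hours, minutes = int(match.group(2)), int(match.group(3))
    if hours > 23 or minutes > 59:
        return None
    total = hours * 60 + minutes
    return -total if match.group(1) == "-" else total
```

With this sketch, "+05:30" yields 330 and "-0800" yields -480, while "+24:00" and a lowercase "z" are rejected.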

Global Dates and Times

A global date and time representation consists of a specific proleptic-Gregorian date and a time with a time-zone offset. The HTML Standard defines the format for valid global date and time strings, ensuring that date and time components are correctly structured.

A valid global date and time string must contain the following components in order:

  • A valid date string representing the date.
  • Either a 'T' character or a single space character.
  • A valid time string representing the time.
  • A valid time-zone offset string representing the time-zone offset.

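The HTML Standard requires times in dates before the formation of Coordinated Universal Time (UTC) in the mid-twentieth century to be expressed and interpreted in terms of UT1 (contemporary Earth solar time at 0° longitude) rather than UTC. Times that predate time zones are expressed as UT1 times with explicit offsets that approximate the difference between the local time and the time observed at Greenwich, London. In all cases the date itself is written in the proleptic-Gregorian calendar.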

The rules for parsing a global date and time string are as follows:

  • Parse the date component to obtain year, month, and day.
  • Verify that the next character is either 'T' or a space.
  • Parse the time component to obtain hour, minute, and second.
  • Parse the time-zone offset component to obtain hours and minutes.
  • Ensure that there are no extra characters in the input string.

The guidelines for handling global dates and times are crucial for accurate representation and interpretation of date and time information on the web.
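
The parsing steps listed above can be sketched with Python's standard `datetime` module, converting the result to UTC. This is a simplified illustration under stated assumptions: the name `parse_global_date_time` is hypothetical, and because `datetime` only supports years 1 through 9999, strings with larger years are rejected here even though the format allows them.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_GLOBAL_RE = re.compile(
    r"([0-9]{4,})-([0-9]{2})-([0-9]{2})[T ]"
    r"([0-9]{2}):([0-9]{2})(?::([0-9]{2})(?:\.([0-9]{1,3}))?)?"
    r"(?:(Z)|([+-])([0-9]{2}):?([0-9]{2}))"
)


def parse_global_date_time(text: str) -> Optional[datetime]:
    """Return the moment as a UTC datetime, or None if the string is invalid."""
    match = _GLOBAL_RE.fullmatch(text)
    if match is None:
        return None
    year, month, day, hour, minute = (int(g) for g in match.groups()[:5])
    second = int(match.group(6) or 0)
    microsecond = int((match.group(7) or "").ljust(3, "0")) * 1000
    if match.group(8):  # "Z" means UTC
        offset = 0
    else:
        tz_hours, tz_minutes = int(match.group(10)), int(match.group(11))
        if tz_hours > 23 or tz_minutes > 59:
            return None
        offset = tz_hours * 60 + tz_minutes
        if match.group(9) == "-":
            offset = -offset
    try:
        # datetime() rejects out-of-range fields such as day 30 in February.
        local = datetime(year, month, day, hour, minute, second, microsecond,
                         tzinfo=timezone(timedelta(minutes=offset)))
        return local.astimezone(timezone.utc)
    except (ValueError, OverflowError):
        return None
```

For example, "2024-03-10T09:30:00+05:30" denotes 04:00 UTC on the same day, and a string without an offset, such as "2024-03-10T09:30", is not a global date and time.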

Weeks

The HTML Standard introduces a representation for weeks, consisting of a week-year number and a week number within that year. This representation is useful for various applications, including scheduling and event planning. It defines the format for valid week strings:

  • Four or more ASCII digits representing the week-year (year > 0).
  • A hyphen-minus character ('-').
  • A Latin capital letter 'W'.
  • Two ASCII digits representing the week number, where 1 ≤ week ≤ maxweek (52 or 53, depending on the week-year).

The rules for parsing a week string are as follows:

  • Parse the year from the input.
  • Verify the presence of the hyphen-minus and 'W' characters.
  • Parse the week number and ensure it is within a valid range for the given year.

The week representation allows for precise scheduling and tracking of events based on a week-based calendar system, closely aligned with ISO 8601 standards.
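
The valid range for the week number depends on the week-year: a week-year has 53 weeks if January 1 falls on a Thursday, or if it is a leap year and January 1 falls on a Wednesday; otherwise it has 52. The sketch below applies that rule, using Gauss's formula for the weekday of January 1 so that it is not limited to the year range of Python's `datetime`. The names `weeks_in_week_year` and `parse_week_string` are illustrative assumptions.

```python
import re
from typing import Optional, Tuple

_WEEK_RE = re.compile(r"([0-9]{4,})-W([0-9]{2})")


def weeks_in_week_year(year: int) -> int:
    """Return 53 for long week-years, otherwise 52."""
    prev = year - 1
    # Gauss's formula: weekday of January 1, with 0 = Sunday ... 6 = Saturday.
    jan1 = (1 + 5 * (prev % 4) + 4 * (prev % 100) + 6 * (prev % 400)) % 7
    leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
    if jan1 == 4 or (leap and jan1 == 3):  # Thursday, or leap year + Wednesday
        return 53
    return 52


def parse_week_string(text: str) -> Optional[Tuple[int, int]]:
    """Return (week_year, week) for a valid week string, otherwise None."""
    match = _WEEK_RE.fullmatch(text)
    if match is None:
        return None
    year, week = int(match.group(1)), int(match.group(2))
    if year < 1 or not 1 <= week <= weeks_in_week_year(year):
        return None
    return year, week
```

Under this rule "2020-W53" is valid because 2020 is a leap year that began on a Wednesday, whereas "2024-W53" is not.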

Durations

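Durations in the HTML Standard consist of a specific number of seconds, which gives a precise and consistent representation of time intervals. Because months and years do not have a fixed length in seconds, they cannot appear in a duration. The format for valid duration strings is defined in two ways:

  • The 'P' character followed by an optional days component (digits and 'D') and an optional 'T' section holding hours ('H'), minutes ('M'), and seconds ('S') components, in that order, with at least one component present. Only the seconds component may carry a fraction, of up to three digits.
  • One or more duration time components, each a number followed by a unit letter for weeks ('w'), days ('d'), hours ('h'), minutes ('m'), or seconds ('s'), in any order, optionally separated by ASCII whitespace, with each unit used at most once.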

The rules for parsing a duration string are comprehensive, ensuring that durations are accurately represented in terms of seconds. The duration format is based on ISO 8601, aligning it with international standards for time and date representations.
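
Both duration forms reduce to a number of seconds, which the following sketch computes. It is a simplified validity check rather than the specification's parsing algorithm, and the name `parse_duration` is an assumption; `Decimal` is used so that fractional seconds are not subject to floating-point rounding.

```python
import re
from decimal import Decimal
from typing import Optional

# ISO 8601-style form: P[nD][T[nH][nM][n[.fff]S]]
_ISO_RE = re.compile(
    r"P(?:([0-9]+)D)?"
    r"(?:T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]{1,3})?)S)?)?"
)
# Component form: number, unit letter, optional ASCII whitespace around each.
_COMPONENT_RE = re.compile(
    r"[ \t\n\f\r]*([0-9]+(?:\.[0-9]{1,3})?)[ \t\n\f\r]*([wdhmsWDHMS])[ \t\n\f\r]*"
)
_SCALE = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> Optional[Decimal]:
    """Return the duration in seconds, or None if the string is invalid."""
    match = _ISO_RE.fullmatch(text)
    if match is not None:
        # Reject a bare "P"/"PT" and a dangling "T" with nothing after it.
        if not any(match.groups()) or text.endswith("T"):
            return None
        days, hours, minutes, seconds = (Decimal(g or "0") for g in match.groups())
        return days * 86400 + hours * 3600 + minutes * 60 + seconds
    total, seen, pos = Decimal(0), set(), 0
    while pos < len(text):
        component = _COMPONENT_RE.match(text, pos)
        if component is None:
            return None
        number, unit = component.group(1), component.group(2).lower()
        # Each unit may appear once; only seconds may have a fraction.
        if unit in seen or ("." in number and unit != "s"):
            return None
        seen.add(unit)
        total += Decimal(number) * _SCALE[unit]
        pos = component.end()
    return total if seen else None
```

Both "PT1H30M" and "1h 30m" describe the same duration of 5400 seconds.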

Vaguer Moments in Time

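Vaguer moments in time cover strings that may be either a date alone or a date together with a time and a time-zone offset. The HTML Standard defines a valid date string with optional time as a string that is either a valid date string or a valid global date and time string. The matching parsing algorithm for date or time strings is more permissive and yields a date, a time, or a global date and time.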

The rules for parsing date or time strings involve the following steps:

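  • Attempting to parse a date component (year, month, and day); if this fails, the string is treated as a time only.
  • If a date was found, checking for a 'T' character or a space, which signals that a time follows; without it, the result is a date only.
  • Parsing a time component, including hour, minute, and second, when a time is expected.
  • Parsing a time-zone offset component, which is required when both a date and a time are present.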

The HTML Standard's approach to vaguer moments in time provides flexibility while maintaining consistency and accuracy in representing dates and times in web content.
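
The three possible outcomes of parsing a date or time string can be illustrated with a small classifier. This sketch only inspects the shape of the string and does not validate field ranges; the name `classify_date_or_time` and its string return values are assumptions made for the example.

```python
import re
from typing import Optional

_DATE = r"[0-9]{4,}-[0-9]{2}-[0-9]{2}"
_TIME = r"[0-9]{2}:[0-9]{2}(?::[0-9]{2}(?:\.[0-9]{1,3})?)?"
_OFFSET = r"(?:Z|[+-][0-9]{2}:?[0-9]{2})"


def classify_date_or_time(text: str) -> Optional[str]:
    """Return which kind of value the string has the shape of, or None."""
    if re.fullmatch(_DATE, text):
        return "date"
    if re.fullmatch(_TIME, text):
        return "time"
    # A date followed by a time must also carry a time-zone offset.
    if re.fullmatch(_DATE + "[T ]" + _TIME + _OFFSET, text):
        return "global date and time"
    return None
```

Note that a date and time without an offset, such as "2024-03-10T09:30", matches none of the three shapes.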
