HTML, or HyperText Markup Language, is the backbone of the World Wide Web. It is a fundamental markup language used to create and structure web content.
The Basics of HTML
A basic HTML document consists of various elements and text. It is primarily a set of tags that define the structure and content of a web page. Here's a simple example of an HTML document:
HTML
<!DOCTYPE html>
<html lang="en">
 <head>
  <title>Sample page</title>
 </head>
 <body>
  <h1>Sample page</h1>
  <p>This is a <a href="demo.html">simple</a> sample.</p>
  <!-- this is a comment -->
 </body>
</html>
In this example:
- `<!DOCTYPE html>` is the document type declaration; it tells browsers to treat the page as modern HTML and render it in standards mode.
- `<html>` is the root element that encloses the entire web page.
- `<head>` contains metadata and information about the page.
- `<title>` sets the title of the page displayed in the browser tab.
- `<body>` contains the visible content of the page.
- `<h1>` is a heading element, and `<p>` is a paragraph element.
- `<a>` is an anchor element used to create hyperlinks.
- `<!-- this is a comment -->` is an HTML comment.
HTML documents are structured as a tree of elements and text. Each element is marked with a start tag (e.g., `<body>`) and an end tag (e.g., `</body>`). Certain start and end tags can be omitted in some cases because other tags imply them, and void elements such as `<br>`, `<img>`, and `<input>` never have an end tag.
Nested Tags
HTML tags must be correctly nested without overlapping. For example:
HTML
<p>This is <em>very <strong>wrong</em>!</strong></p>
<p>This <em>is <strong>correct</strong>.</em></p>
In the first paragraph, `</em>` appears while the inner `<strong>` is still open, so the two elements overlap. In the second paragraph, `<strong>` is closed before `<em>`, so the elements are properly nested.
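The nesting rule can be checked mechanically with a stack of open elements. The following is a minimal sketch using Python's standard `html.parser` module; the `NestingChecker` class and `nesting_errors` function are hypothetical names introduced here for illustration. It deliberately ignores optional end tags (for example, a `<p>` without `</p>`), so it may report false positives on real pages, and it is not how a browser recovers from such markup.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records end tags that do not close the innermost open element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.errors = []         # (end tag seen, innermost open tag or None)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. the end-tag half of <br/>; nothing was pushed for it
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
        else:
            innermost = self.open_elements[-1] if self.open_elements else None
            self.errors.append((tag, innermost))


def nesting_errors(markup):
    """Return a list of (end_tag, innermost_open_tag) mismatches found in markup."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


overlapping = "<p>This is <em>very <strong>wrong</em>!</strong></p>"
nested = "<p>This <em>is <strong>correct</strong>.</em></p>"
print(nesting_errors(overlapping))
print(nesting_errors(nested))
```

An empty list means every end tag closed the innermost open element; any entry points at a pair of elements that overlap.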
Attributes in HTML
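Elements can have attributes that control their behavior and appearance. Attributes are placed inside the start tag and consist of a name and a value, separated by an equals sign (=). The value can be left unquoted if it contains no ASCII whitespace and none of the characters `"`, `'`, `=`, `<`, `>`, or a backtick; otherwise it must be enclosed in single or double quotes. If the value is the empty string, the value and the equals sign can be omitted altogether, as with the first `disabled` attribute below. Here are some examples: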
HTML
<input name="address" disabled>
<input name="address" disabled="">
<input name="address" maxlength=200>
<input name="address" maxlength='200'>
<input name="address" maxlength="200">
HTML user agents, such as web browsers, parse these attributes to determine how elements should be displayed and function.
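To see that the quoting styles above are equivalent once parsed, the sketch below feeds them to Python's standard `html.parser`. The `AttributeCollector` class and `parse_attributes` function are hypothetical helpers, not part of any browser API. One difference worth noting: `html.parser` reports a bare attribute such as `disabled` with the value `None`, whereas in a browser's DOM both `disabled` and `disabled=""` yield the empty string.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Hypothetical helper: stores the attributes of every start tag as a dict."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when no "=" was written
        self.seen.append(dict(attrs))


def parse_attributes(markup):
    """Return one attribute dict per start tag found in markup."""
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


# Unquoted, single-quoted, and double-quoted values parse to the same string.
quoted_forms = parse_attributes(
    '<input name="address" maxlength=200>'
    "<input name=\"address\" maxlength='200'>"
    '<input name="address" maxlength="200">'
)

# The bare form and the explicit empty string both mark the attribute as present.
boolean_forms = parse_attributes(
    '<input name="address" disabled>'
    '<input name="address" disabled="">'
)

print(quoted_forms)
print(boolean_forms)
```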
The Document Object Model (DOM)
HTML documents are parsed by web browsers and converted into a Document Object Model (DOM) tree. The DOM is an in-memory representation of the web page, consisting of various nodes, including DocumentType, Element, Text, Comment, and ProcessingInstruction nodes.
For example, the following HTML snippet:
HTML
<!DOCTYPE html>
<html lang="en">
 <head>
  <title>Sample page</title>
 </head>
 <body>
  <h1>Sample page</h1>
  <p>This is a <a href="demo.html">simple</a> sample.</p>
  <!-- this is a comment -->
 </body>
</html>
would be converted into the following DOM tree (in Text nodes, ⏎ represents a line break and ␣ a space):
- DocumentType: html
- html lang="en"
  - head
    - #text: ⏎␣␣
    - title
      - #text: Sample page
    - #text: ⏎␣
  - #text: ⏎␣
  - body
    - #text: ⏎␣␣
    - h1
      - #text: Sample page
    - #text: ⏎␣␣
    - p
      - #text: This is a
      - a href="demo.html"
        - #text: simple
      - #text: sample.
    - #text: ⏎␣␣
    - #comment: this is a comment
    - #text: ⏎␣⏎
The document element (the root) of this tree is the `html` element, which always occupies that position in HTML documents. Scripts, typically written in JavaScript, can manipulate the DOM to change the content and behavior of the web page.
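In a browser, DOM manipulation is done from JavaScript. As a rough stand-in that runs outside a browser, the sketch below uses Python's standard `xml.dom.minidom`, which exposes the same style of DOM interface (`documentElement`, `childNodes`, `getElementsByTagName`, `setAttribute`). This works only because the sample happens to be well-formed XML; `minidom` is an XML parser, not an HTML parser, and it does not apply HTML's parsing rules, so treat it as an illustration of the tree model rather than of browser behavior.

```python
from xml.dom import minidom

# The sample page, indented with one space per level and including a comment.
SAMPLE = """<!DOCTYPE html>
<html lang="en">
 <head>
  <title>Sample page</title>
 </head>
 <body>
  <h1>Sample page</h1>
  <p>This is a <a href="demo.html">simple</a> sample.</p>
  <!-- this is a comment -->
 </body>
</html>"""

doc = minidom.parseString(SAMPLE)

# The document element is the root <html> element.
root = doc.documentElement

# Whitespace between tags shows up as #text nodes among body's children.
body = doc.getElementsByTagName("body")[0]
body_child_names = [child.nodeName for child in body.childNodes]

# Manipulate the tree: retarget the link and replace its text.
link = doc.getElementsByTagName("a")[0]
link.setAttribute("href", "other.html")
link.firstChild.data = "modified"

print(root.tagName, root.getAttribute("lang"))
print(body_child_names)
print(link.toxml())
```

The list of child node names makes the whitespace-only Text nodes visible: they sit between the `h1`, `p`, and comment nodes exactly as the line breaks and indentation appear in the source.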
Styling HTML with CSS
HTML documents describe the structure and meaning of web content rather than its presentation; presentation is handled by Cascading Style Sheets (CSS). CSS lets authors define the appearance, layout, and design of HTML elements. For example:
HTML
<!DOCTYPE html>
<html lang="en">
 <head>
  <title>Sample styled page</title>
  <style>
   body { background: navy; color: yellow; }
  </style>
 </head>
 <body>
  <h1>Sample styled page</h1>
  <p>This page is just a demo.</p>
 </body>
</html>
In this example, CSS is used to set the background color to navy and the text color to yellow.
Writing Secure Applications with HTML
When using HTML to create interactive websites, it's essential to consider security to prevent vulnerabilities that could compromise the integrity of the site or its users. Here are some common security concerns, followed by two general authoring practices:
- Not Validating User Input: Failing to validate and properly escape user-generated content can lead to security vulnerabilities, such as cross-site scripting (XSS) or SQL injection attacks. Always validate and escape data before displaying it on your web page.
- Cross-Site Scripting (XSS): XSS attacks occur when malicious scripts are injected into web pages and executed by unsuspecting users. To prevent XSS, validate and sanitize user input, and avoid allowing users to embed scripts in your content.
- SQL Injection: SQL injection attacks occur when an attacker manipulates database queries through user input. Ensure that you use prepared statements or parameterized queries to prevent SQL injection vulnerabilities.
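- Cross-Site Request Forgery (CSRF): CSRF attacks trick a user's browser into submitting unintended requests to a site where the user is already authenticated. To prevent them, include a hard-to-guess, user-specific token in each form and verify it on submission, or check the `Origin` header on every state-changing request.
- Clickjacking: Clickjacking tricks a user into interacting with a page unintentionally, typically by loading the victim site in a small or invisible iframe on a hostile page. To prevent it, sites that do not expect to be framed should enable their interface only after detecting that they are not in a frame (for example, by comparing `window` to `window.top`), or forbid framing with the `X-Frame-Options` header or the `frame-ancestors` directive of Content Security Policy.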
- Common Pitfalls with Scripting in HTML: Scripts in HTML have "run-to-completion" semantics: the browser generally runs a script to the end before doing anything else, such as firing further events or continuing to parse the document. Parsing, however, happens incrementally and can pause at any point to let scripts run, so an event (for example, the `load` event of an image) may fire before a later script attaches a handler for it. To avoid missing events, either use event handler content attributes (such as `onload`) or, when creating elements dynamically, add the event handlers in the same script that creates the element.
- Using Validators and Conformance Checkers: To catch common mistakes in your HTML code, use conformance checkers (validators). These tools can help you identify and fix errors in your HTML documents. The WHATWG maintains a list of such tools that you can use to validate your HTML code.
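Three of the defenses in the list above (escaping output, parameterized queries, and per-user CSRF tokens) live on the server, so they can be sketched with Python's standard library. Everything named here is an assumption made for illustration: the function names, the `users` table, and the in-memory SQLite database. A real application would normally rely on the equivalents built into its web framework and template engine rather than hand-rolled helpers.

```python
import hmac
import secrets
import sqlite3
from html import escape


def render_comment(user_text):
    """Escape user text before placing it in HTML, so markup in it is displayed, not executed."""
    return f"<p>{escape(user_text, quote=True)}</p>"


def find_user(conn, name):
    """Look up a user with a parameterized query; `name` is passed as data, never spliced into SQL."""
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def new_csrf_token():
    """Create an unpredictable token to store in the session and embed in a hidden form field."""
    return secrets.token_urlsafe(32)


def csrf_token_matches(session_token, submitted_token):
    """Compare the stored and submitted tokens in constant time."""
    return hmac.compare_digest(session_token, submitted_token)


# Demo data: an in-memory database with a hypothetical `users` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

print(render_comment("<script>alert(1)</script>"))
print(find_user(conn, "alice"))
print(find_user(conn, "' OR '1'='1"))  # classic injection payload, treated as a plain string

token = new_csrf_token()
print(csrf_token_matches(token, token), csrf_token_matches(token, "forged"))
```

Escaping depends on where the data ends up: `html.escape` is appropriate for text and quoted attribute values, but not for data placed inside a script, a style sheet, or a URL, each of which needs its own encoding.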