HTML

Resources

Current specification

WHATWG

w3schools

HTML on MDN

what is html?

Hyper Text Markup Language

Hypertext, because it is used to link multiple web pages

Markup language, because it is used to describe the structure of a web page

Other markup languages:

  • YAML
  • Markdown
  • LaTeX

Influenced by SGML, and NOT XML

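Last versioned W3C release: HTML 5.2 (2017); HTML is now maintained as a living standard

HTML5 was driven independently by the WHATWG and later adopted by the W3C; since 2019 the WHATWG Living Standard is the single authoritative HTML specification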

MIME Type: text/html

The DOM (Document Object Model) is the in-memory object representation of the structure of an HTML document

differences from XML

not all tags must be closed (e.g. link tag)

some tags will auto close depending on which elements are nested

this is called tag omission

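HTML element and attribute names are not case sensitive (will be important later for custom elements); attribute values can still be case sensitive

allows attribute minimization/shortening (<option selected> is valid and does not need a value)

The HTML parser supports foreign content, for example SVG and MathML. Inside foreign content it applies XML-like rules (self-closing syntax such as <circle /> takes effect), but it stays an error-tolerant HTML parser and does not switch to a real XML parser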
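The differences listed above can be seen in one small document. This is a minimal sketch, not an example taken from the slides; the title and list contents are made up for illustration.

```html
<!DOCTYPE html>
<HTML LANG="en">
<title>Lenient HTML</title>
<!-- head and body tags are omitted; the parser creates both elements anyway -->
<UL>
  <li>first item   <!-- no </li>: the next <li> closes this one -->
  <li>second item
</UL>
<select>
  <!-- minimized boolean attribute, no value needed -->
  <option selected>preselected option</option>
</select>
<!-- foreign content: the self-closing slash on circle is honored here -->
<svg width="20" height="20"><circle cx="10" cy="10" r="8" /></svg>
```

An XML parser would reject this document; an HTML parser builds a complete tree from it.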

further reading

https://wiki.whatwg.org/wiki/HTML_vs._XHTML

<!DOCTYPE>

Tells the browser which document mode to use: with a DOCTYPE the page is rendered in standards mode, without one in quirks mode

An XML declaration such as <?xml version="1.0" encoding="UTF-8"?> may precede the DOCTYPE in XHTML documents, but whether a document is parsed as XML is decided by its MIME type (application/xhtml+xml), not by this declaration

<!DOCTYPE html> is usually the only one which should be used, all others are legacy and should not be used anymore

Important for IE11 document mode selection: https://docs.microsoft.com/en-us/internet-explorer/ie11-deploy-guide/img-ie11-docmode-lg

General Structure


          

          

DOCTYPE declaration

html root node

html head section

charset to use, will be pre-fetched by the parser

meta information, can be used to describe your content for search engine optimization (SEO)

Title of the web page, will be displayed in the browser's tab

Reference to an external stylesheet, notice that the tag is not closed

embedded CSS stylesheet declaration

embedded JavaScript declaration

HTML body, the visible content of the html document starts here

Comment node

HTML div tag node

text content node


            

HTML is hierarchically structured in a tree with the elements either as nodes or leaves all originating from the root html node
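The annotated parts of the general structure can be put together as follows. This is a minimal sketch: the file name styles.css, the description text and the script body are placeholders, not values from the original slides.

```html
<!DOCTYPE html>                          <!-- DOCTYPE declaration -->
<html lang="en">                         <!-- html root node -->
  <head>                                 <!-- html head section -->
    <meta charset="utf-8">               <!-- charset to use -->
    <meta name="description" content="A short description of the page">
    <title>Page title</title>            <!-- shown in the browser's tab -->
    <link rel="stylesheet" href="styles.css">  <!-- external stylesheet, tag not closed -->
    <style>
      /* embedded CSS */
      body { font-family: sans-serif; }
    </style>
    <script>
      // embedded JavaScript
      console.log("page is loading");
    </script>
  </head>
  <body>                                 <!-- visible content starts here -->
    <!-- a comment node -->
    <div>text content</div>              <!-- div element with a text node -->
  </body>
</html>
```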

elements

Elements carry semantics and should be used according to the meaning of their tag

Anatomy of an element

<div id="test">text</div> <img id="test"/>

Element with children: <div id="test">text</div>

Element without children: <img id="test"/>

start tag: <div id="test">

end tag: </div>

tag name: div, img

attribute: id="test"

attribute value: "test"

content/children: text

keep in mind, not all tags support child nodes

Auto Closing and Tag omission

Some elements (for example p) can't contain block level elements, so the html parser will auto close them before parsing the next block level element

this example...

...after reading the standard...

Tag omission
The start tag is required. The end tag may be omitted if the <p> element is immediately followed by an <address>, <article>, <aside>, <blockquote>, <div>, <dl>, <fieldset>, <footer>, <form>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <header>, <hr>, <menu>, <nav>, <ol>, <pre>, <section>, <table>, <ul> or another <p> element, or if there is no more content in the parent element and the parent element is not an <a> element.

...will be changed to this structure by the HTML parser
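The original markup for this example is not preserved in the text, so the following is a sketch of the behavior under the assumption that a div is nested inside a p. The first part is the source as written, the second part is the structure the parser builds from it.

```html
<!-- source as written -->
<p>
  paragraph text
  <div>block content</div>
</p>

<!-- structure built by the HTML parser -->
<p>
  paragraph text
</p>
<div>block content</div>
<p></p>
```

The div start tag closes the open p, and the now unmatched </p> end tag makes the parser create an additional empty p element.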

Some tags will be automatically closed depending on the elements that follow

If you're using an HTML sanitizer for preventing XSS attacks you should check if the sanitizer works according to the specification: https://research.securitum.com/mutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass/

Some elements are void (they never have an end tag), for others the end tag can be omitted

both br and img are void elements: they have no content and no end tag, and the trailing slash in <br /> is optional

There are different content categories for tags

Structural elements

html, head, body

Metadata elements and Instructions

meta, link, title, etc...

Flow content elements

div, input, span, p, h1, etc...

Structuring content elements

main, nav, aside, etc...

Media/embedded content elements

img, object, video, audio, iframe, etc... (script and style are classified as metadata content in the specification)

Foreign content tags

svg, math (MathML)

custom elements

all elements that have a - in the tag-name are treated as custom elements
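A minimal sketch of a custom element; the tag name my-element and the class MyElement are hypothetical names chosen for illustration.

```html
<my-element>content</my-element>

<script>
  // Hypothetical element class: makes its content bold once it is attached.
  class MyElement extends HTMLElement {
    connectedCallback() {
      this.style.fontWeight = "bold";
    }
  }
  // The name must contain a hyphen, otherwise define() throws.
  customElements.define("my-element", MyElement);
</script>
```

Until a definition is registered, the browser still parses the tag and treats it as an unstyled, generic element.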


          

A more complete list of available content categories can be found here

Identifying elements in the DOM

<div id="test">text</div>

elements can be identified via the id attribute

<label for="id"> and <a href="#id"> use the id for reference

the id must be unique within an html document

<div id="test">text</div>

elements with an id attribute can be accessed by name in JavaScript on the global window object

there are more ways which will be explained when we get to CSS selectors
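A short sketch of both lookups, reusing the id test from the example above. The explicit lookup is generally the safer choice, because named access on window can be shadowed by any global variable with the same name.

```html
<div id="test">text</div>

<script>
  // Explicit lookup by id.
  const byId = document.getElementById("test");

  // Named access on the global window object returns the same element.
  console.log(window.test === byId);
</script>
```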

Attributes

attributes are used to configure an element and can change its behavior and appearance
<div attr="string value"></div>

attributes can only contain string values

most built-in attributes map to a specific element property in the DOM API

unfortunately the mapped property does not always have the same name as the attribute: the for attribute maps to the htmlFor property, and class maps to className
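The attribute and property names can be compared directly in a small sketch; the ids name and name-label and the class field are made up for this illustration.

```html
<label id="name-label" for="name" class="field">Name</label>
<input id="name" type="text">

<script>
  const label = document.getElementById("name-label");

  console.log(label.getAttribute("for"));   // attribute name: for
  console.log(label.htmlFor);               // property name: htmlFor
  console.log(label.getAttribute("class")); // attribute name: class
  console.log(label.className);             // property name: className
</script>
```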

<input type="checkbox" checked disabled/>

the values can also be omitted for boolean attributes, where the presence of the attribute alone means true
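For such attributes only the presence counts, which the following sketch illustrates with four checkboxes that all start out checked.

```html
<input type="checkbox" checked>
<input type="checkbox" checked="">
<input type="checkbox" checked="checked">
<!-- still checked: the value "false" does not switch the attribute off -->
<input type="checkbox" checked="false">
```

To get an unchecked box the attribute has to be left out entirely.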

There are some attributes which will be handled in a special way

<div style="color: #0f0"></div>

style lets you inline css for a specific element

this should be used sparingly, always try to use classes and style via css selectors

data attributes

<div data-...="content">text</div>

data- attributes can be used to add custom attributes to elements without clashing with existing attributes
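A minimal sketch of reading data- attributes from script; the attribute names data-user-id and data-role are invented for this example.

```html
<div id="user" data-user-id="42" data-role="admin">text</div>

<script>
  const el = document.getElementById("user");

  // dataset exposes data-* attributes; hyphenated names become camelCase.
  console.log(el.dataset.userId); // "42" (always a string)
  console.log(el.dataset.role);   // "admin"
</script>
```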

later with custom elements you can also define your own attributes

Layouts

all html elements have a predefined layout behavior associated

block elements

takes up all available horizontal space given by parent element

the next element will be put below the block level element

inline elements

       

inline elements are vertically aligned to the baseline

inline elements can not have a height or width set, unless they are defined as inline-block elements

all whitespace characters between inline elements are collapsed to a single white space character

one two

is rendered as one two


              one
              two
            

is also rendered as one two
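The markup behind these layout examples is not preserved in the text. As a sketch, assuming div for block elements and span for inline elements, the behavior looks like this:

```html
<!-- block elements: each one starts on a new line and takes the full width -->
<div>block one</div>
<div>block two</div>

<!-- inline elements: a single space between them is rendered as "one two" -->
<span>one</span> <span>two</span>

<!-- line breaks and indentation collapse too, so this is also "one two" -->
<span>
  one
  two
</span>
```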

table


          

a table consists of multiple elements

a thead element (can be omitted) which indicates the table header

a tbody element which contains the table body (can be omitted)

a tfoot element (can be omitted)

table rows (tr)

table header cells (th)

table cells (td)

table cells can also span rows and columns (rowspan and colspan attributes)
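A small table using all of the parts listed above; the cell contents are placeholders.

```html
<table>
  <thead>
    <tr><th>Name</th><th>Role</th><th>City</th></tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="2">spans two rows</td>
      <td>Captain</td>
      <td>Atlantis</td>
    </tr>
    <tr>
      <!-- the first column is already taken by the rowspan cell above -->
      <td colspan="2">spans two columns</td>
    </tr>
  </tbody>
  <tfoot>
    <tr><td colspan="3">footer spanning all columns</td></tr>
  </tfoot>
</table>
```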

There are more layout possibilities available with CSS, for example grid layout, which offers more options for table-like layouts

Text Formatting

Text formatting can be done via special inline tags


Text is rendered like this
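The tags used on the original slide are not preserved, so the following is only a sketch of common inline formatting elements:

```html
<p>
  <strong>important</strong>, <em>emphasized</em>, <b>bold</b>, <i>italic</i>,
  <u>underlined</u>, <s>struck through</s>, <mark>highlighted</mark>,
  <small>small print</small>, <code>code</code>, H<sub>2</sub>O, x<sup>2</sup>
</p>
```

Where meaning matters, strong and em are usually preferable to b and i, because they carry semantics and not only a visual style.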

ruby element

Text formatting especially for East Asian languages

with ruby code: 漢 and 字 with the readings kan and ji rendered as small annotations above them

without ruby code: 漢 (kan) 字 (ji)
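A sketch of the markup for the example above. The rp elements hold the fallback parentheses, which are only shown by browsers without ruby support.

```html
<ruby>
  漢 <rp>(</rp><rt>kan</rt><rp>)</rp>
  字 <rp>(</rp><rt>ji</rt><rp>)</rp>
</ruby>
```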

bdo element

Helpful if you mix languages written in different directions

hello hello

hello hello
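A minimal sketch of the bdo element, assuming the slide compared normal and overridden text direction. With dir="rtl" the characters of the second word are laid out from right to left.

```html
<!-- rendered as: hello olleh -->
<p>hello <bdo dir="rtl">hello</bdo></p>
```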

Links And The Anchor Element

the reason why it's called HyperText Markup Language

Links are defined with the a tag

a stands for "anchor"

with links you can navigate to content within a web page or reference to external content

the content to which the link points is defined in the href attribute of the a element

href stands for "hypertext reference"

navigating within document

to navigate within a document to an element it needs to be referenceable by the id attribute

define a unique id on an element

now you can link to this element by referencing #<value of id>. The browser automatically scrolls to the referenced element once the link is clicked
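The two steps can be sketched as follows; the id details is a made-up example value.

```html
<!-- the link references the id with a leading # -->
<a href="#details">Jump to details</a>

<!-- further down in the same document: the link target -->
<h2 id="details">Details</h2>
```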

link to external content

it is also possible to link to content either on the same host or somewhere else on the internet

You can link to content on another server; depending on the target attribute, the browser opens it in a new window or tab, or loads it in the current one

You can link to content which is relative to the current open web page location

in this case, if your current location is https://example.com/info/info.html the link will point to https://example.com/info/about.html

You can link to content which is relative to the current host

in this case, if your current location is https://example.com/info/info.html the link will point to https://example.com/about.html
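The three kinds of link targets described above, sketched with about.html from the text and example.org as a stand-in for another server:

```html
<!-- another server, opened in a new window or tab -->
<a href="https://example.org/" target="_blank" rel="noopener">external page</a>

<!-- relative to the current page:
     https://example.com/info/info.html -> https://example.com/info/about.html -->
<a href="about.html">about (same folder)</a>

<!-- relative to the current host:
     https://example.com/info/info.html -> https://example.com/about.html -->
<a href="/about.html">about (host root)</a>
```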

Embedding Content

same as with links you can embed content hosted somewhere on the internet or relative to your current location

You can embed an image from another server, in this case the browser will load the image and display it

the alt attribute should always be set for accessibility reasons, it describes the image

You can display images which are relative to the current open web page location

in this case, if your current location is https://example.com/info/info.html the image will be loaded from https://example.com/info/image.png

You can display images which are relative to the current host

in this case, if your current location is https://example.com/info/info.html the image will be loaded from https://example.com/assets/image.png
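The same three addressing schemes for images, sketched with the paths from the text; example.org and the alt texts are placeholders.

```html
<!-- image hosted on another server -->
<img src="https://example.org/image.png" alt="Description of the image">

<!-- relative to the current page: resolves to /info/image.png -->
<img src="image.png" alt="Description of the image">

<!-- relative to the current host: resolves to /assets/image.png -->
<img src="/assets/image.png" alt="Description of the image">
```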

Responsive images

If the available viewport is larger than 800px use the large images, for high-DPI monitors use large@2x.jpg

If the available viewport is larger than 400px use the medium images, for high-DPI monitors use medium@2x.jpg

otherwise use the small images, for high-DPI monitors use small@2x.jpg

as a fallback or if srcsets are not supported use the fallback.jpg image
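One way to express these rules is the picture element. This is a sketch that assumes min-width media queries (which also match at exactly 800px and 400px) and file names following the pattern from the text; the original markup is not preserved.

```html
<picture>
  <!-- the first source whose media query matches wins -->
  <source media="(min-width: 800px)" srcset="large.jpg 1x, large@2x.jpg 2x">
  <source media="(min-width: 400px)" srcset="medium.jpg 1x, medium@2x.jpg 2x">
  <source srcset="small.jpg 1x, small@2x.jpg 2x">
  <!-- fallback, also used by browsers without picture support -->
  <img src="fallback.jpg" alt="Description of the image">
</picture>
```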

other content

You can embed video content, also with several source elements for different formats

You can embed audio content, also with several source elements for different formats

You can embed svg


            
          

With iframes it is possible to embed other content within your web site, the embedded content is isolated from your web page

There is a proposal which aims to take iframes to the next level: HTML Portals
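A sketch of the embedding options mentioned in this section. All file names and the example.org URL are placeholders.

```html
<!-- video: the browser picks the first source it can play -->
<video controls width="320">
  <source src="movie.webm" type="video/webm">
  <source src="movie.mp4" type="video/mp4">
  Your browser does not support the video element.
</video>

<!-- audio works the same way -->
<audio controls>
  <source src="sound.ogg" type="audio/ogg">
  <source src="sound.mp3" type="audio/mpeg">
</audio>

<!-- inline svg is parsed as foreign content -->
<svg width="100" height="100" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="40" fill="teal" />
</svg>

<!-- iframe: an isolated, nested browsing context -->
<iframe src="https://example.org/" title="Embedded page" width="400" height="300"></iframe>
```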

Forms

form controls

there are plenty to choose from

HTML supports a lot of form handling out of the box without the need of javascript

form element

the form element acts as a wrapper around form controls, supports form validation and can execute a specified action

form controls outside of a form element can be bound to it with the form attribute, which references the form's id

the autocomplete attribute lets the browser offer previously entered values for form elements

the form is able to perform the action specified in the attribute action

<input type="submit"> within a form can trigger validation and if everything is valid the associated action

<input type="reset"> within a form can reset the form values

with the type attribute of the button element you can also specify what the button triggers

you can set the URL which is called after a submit with the action attribute; method GET will send the form values as query parameters after the URL, POST will send them in the request body (URL encoded by default)
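A sketch combining the form features described so far; the URL /search, the id search and the field names are hypothetical.

```html
<form id="search" action="/search" method="get" autocomplete="on">
  <input type="text" name="q">

  <!-- validates the form and, if valid, performs the action -->
  <input type="submit" value="Search">

  <!-- resets all controls to their initial values -->
  <input type="reset" value="Reset">

  <!-- button elements choose their behavior with the type attribute -->
  <button type="submit">Search too</button>
  <button type="button">Triggers nothing by itself</button>
</form>

<!-- bound to the form above through the form attribute -->
<input type="text" name="lang" form="search">
```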

radio button controls within a form are matched by name, if multiple controls with the same name exist the form value will be the value of the currently selected one

labels can be associated with form controls via the form control's id

form controls can be marked as required, thus preventing the form from submission if the controls have no valid value

some form controls can validate against a pattern or a specific value range

an empty string is treated as a matching pattern unless the required attribute is set
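The validation features above can be sketched in one form; the field names, the pattern and the value range are made up for illustration.

```html
<form action="/submit" method="post">
  <!-- label bound to the control through its id -->
  <label for="username">Username</label>
  <!-- required plus pattern: 3 to 12 lowercase letters -->
  <input id="username" name="username" type="text" required pattern="[a-z]{3,12}">

  <label for="age">Age</label>
  <!-- value range; an empty value is still accepted because it is not required -->
  <input id="age" name="age" type="number" min="18" max="99">

  <!-- radio buttons grouped by name; the selected value is submitted -->
  <label><input type="radio" name="editor" value="vim" required> vim</label>
  <label><input type="radio" name="editor" value="emacs"> emacs</label>

  <button type="submit">Send</button>
</form>
```

The browser blocks submission and shows a message at the first invalid control, with no JavaScript involved.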


Encodings

There are different methods to encode the data you are sending with a form

The encoding can be set with the enctype attribute

application/x-www-form-urlencoded is the default encoding

This is the only encoding available for GET requests


            

this form will be URL encoded and the values of the form fields will be sent as URL query parameter name=value pairs


            GET /submit?firstname=hagbard&lastname=celine
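The form behind this request is not preserved in the text. A sketch that would produce the request line shown above, assuming two text fields named firstname and lastname:

```html
<form action="/submit" method="get">
  <input type="text" name="firstname" value="hagbard">
  <input type="text" name="lastname" value="celine">
  <!-- the button has no name, so it adds nothing to the query string -->
  <button type="submit">Send</button>
</form>
```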
          

multipart/form-data should be used if you have more complex forms or values that can't be URL encoded like files


            

this encoding type will encode the form data in the request body


            POST /submit
            Content-Type: multipart/form-data; boundary=boundary

            --boundary
            Content-Disposition: form-data; name="username"

            hagbard
            --boundary
            ...
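A sketch of a form that sends such a request. The username field comes from the request above; the file field named avatar is a hypothetical addition to show why this encoding is needed.

```html
<form action="/submit" method="post" enctype="multipart/form-data">
  <input type="text" name="username" value="hagbard">
  <!-- file contents cannot be URL encoded, so each field becomes its own body part -->
  <input type="file" name="avatar">
  <button type="submit">Upload</button>
</form>
```

The browser generates the boundary string itself, so it does not appear anywhere in the markup.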
          

text/plain is new in HTML5 but rarely used

can be ignored for now

Payloads using the text/plain format are intended to be human readable. They are not reliably interpretable by computer, as the format is ambiguous (for example, there is no way to distinguish a literal newline in a value from the newline at the end of the value).[1]
you can create your own form-associated custom elements: https://web.dev/more-capable-form-controls/

more on forms here

aria

Accessible Rich Internet Applications

Developers should prefer using the correct semantic HTML element over using ARIA. For instance, native elements have built-in keyboard accessibility, roles and states. However, if you choose to use ARIA, you are responsible for mimicking (the equivalent) browser behaviour in script.

- MDN

important if you implement your own widgets and want to keep your web application accessible

ARIA information is already set for native elements

for custom widgets and custom elements you have to assign the appropriate ARIA roles and attributes yourself
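The difference can be sketched with a checkbox. The custom variant is a simplified illustration (the id subscribe is made up), not a complete accessible widget.

```html
<!-- native element: role, state and keyboard handling are built in -->
<label><input type="checkbox"> Subscribe</label>

<!-- custom widget: role, state, focusability and key handling are up to you -->
<div id="subscribe" role="checkbox" aria-checked="false" tabindex="0">Subscribe</div>

<script>
  const box = document.getElementById("subscribe");

  // Flip the state that assistive technology reads.
  function toggle() {
    const checked = box.getAttribute("aria-checked") === "true";
    box.setAttribute("aria-checked", String(!checked));
  }

  box.addEventListener("click", toggle);
  box.addEventListener("keydown", (event) => {
    if (event.key === " ") {
      event.preventDefault(); // keep the space key from scrolling the page
      toggle();
    }
  });
</script>
```

The extra script exists only to mimic what the native checkbox already does, which is why the native element is usually the better choice.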

More information on this MDN Article