HyperText Markup Language
Hypertext, because it is used to link multiple web pages
Markup language, because it is used to describe the structure of a web page
Other markup languages:
Influenced by SGML, and NOT XML
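Latest numbered release: HTML 5.2 (HTML is now maintained as a living standard)
HTML5 was driven independently by the WHATWG and later adopted by the W3C; since 2019 the WHATWG HTML Living Standard is the single authoritative version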
MIME Type: text/html
The DOM (Document Object Model) is the in-memory data representation of the HTML document's structure
not all tags need to be closed (e.g. the link tag)
some tags will auto close depending on which elements are nested
this is called tag omission
HTML tag and attribute names are not case sensitive (this will be important later for custom elements)
allows attribute minimization/shortening: <option selected> is valid, and the attribute does not need a value
The HTML parser supports foreign content, for example SVG and MathML, but it should be written as well-formed XML, because the HTML parser handles foreign content with XML-like rules (e.g. self-closing tags such as <path/> are honored)
The DOCTYPE tells the HTML parser how to parse the document and tells the browser which document mode to use
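If the document should be parsed as XML/XHTML, an <?xml version="1.0" encoding="UTF-8"?> declaration should be placed before the DOCTYPE (browsers only use the XML parser if the document is served with an XML MIME type such as application/xhtml+xml)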
<!DOCTYPE html> is usually the only DOCTYPE that should be used; all others are legacy and should not be used anymore
Important for IE11 document mode selection: https://docs.microsoft.com/en-us/internet-explorer/ie11-deploy-guide/img-ie11-docmode-lg
DOCTYPE declaration
html root node
html head section
charset to use; it should appear within the first 1024 bytes so the parser can detect it early
meta information, can be used to describe your content for search engine optimization (SEO)
Title of the web page, displayed in the browser's tab
Reference to an external stylesheet, notice that the tag is not closed
embedded CSS stylesheet declaration
embedded JavaScript declaration
HTML body, the visible content of the html document starts here
Comment node
HTML div
tag node
text content node
HTML is hierarchically structured in a tree with the elements either as nodes or leaves all originating from the root html node
Elements carry semantics and should be used according to the meaning of their tag
<div id="test">text</div>
<img id="test"/>
Element with children: <div id="test">text</div>
Element without children: <img id="test"/>
start tag: <div id="test">
end tag: </div>
tag name: div, img
attribute: id
attribute value: test
content/children: text
keep in mind that not all tags support child nodes; some block-level elements can't be nested, and the HTML parser will auto-close them before parsing the next block-level element
this example...
...after reading the standard...
Tag omission
The start tag is required. The end tag may be omitted if the <p> element is immediately followed by an <address>, <article>, <aside>, <blockquote>, <div>, <dl>, <fieldset>, <footer>, <form>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <header>, <hr>, <menu>, <nav>, <ol>, <pre>, <section>, <table>, <ul> or another <p> element, or if there is no more content in the parent element and the parent element is not an <a> element.
...will be changed to this structure by the HTML parser
Some tags will be automatically closed depending on the elements that follow
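The <p> rule quoted above can be sketched as a simple lookup: for example, the parser turns <p>one<div>two</div> into <p>one</p><div>two</div>. This is a simplified sketch, not the parser algorithm: it only models the "immediately followed by" list from the quote (newer versions of the spec list more elements, e.g. details and main), and the function names closesOpenParagraph and nestAfterParagraph are hypothetical.

```javascript
// Start tags that implicitly close an open <p> (list taken from the quote above).
const CLOSES_P = new Set([
  "address", "article", "aside", "blockquote", "div", "dl", "fieldset",
  "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6", "header", "hr",
  "menu", "nav", "ol", "pre", "section", "table", "ul", "p",
]);

// Returns true if an open <p> is closed before `nextTag` starts.
// Tag names are case-insensitive, so they are lowercased first.
function closesOpenParagraph(nextTag) {
  return CLOSES_P.has(nextTag.toLowerCase());
}

// Hypothetical demo: the structure the parser builds for <p>one<X>two</X>
function nestAfterParagraph(nextTag) {
  return closesOpenParagraph(nextTag)
    ? `<p>one</p><${nextTag}>two</${nextTag}>` // <p> closed first
    : `<p>one<${nextTag}>two</${nextTag}></p>`; // stays inside <p>
}
```

Inline elements such as span do not close the paragraph, which is why only block-level elements appear in the list.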
If you're using an HTML sanitizer to prevent XSS attacks, you should check that the sanitizer works according to the specification: https://research.securitum.com/mutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass/
Some tags are self-closing or their closing tag can be omitted
both br and img are void elements: they cannot have content and must not have an end tag (<br> and <br/> are both valid)
html, head, body
meta, link, title, etc...
div, input, span, p, h1, etc...
main, nav, aside, etc...
img, object, video, script, style
svg, mathml
all elements that have a - in the tag name are treated as custom elements
content
A more complete list of available content categories can be found here
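The custom element naming rule mentioned above has a few more conditions besides the hyphen: the name must start with a lowercase ASCII letter, must not contain uppercase letters and must not be one of a few reserved names. The sketch below assumes ASCII-only names (the spec also allows many non-ASCII characters), and the function name isValidCustomElementName is hypothetical.

```javascript
// Names that contain a hyphen but are reserved by SVG/MathML.
const RESERVED_NAMES = new Set([
  "annotation-xml", "color-profile", "font-face", "font-face-src",
  "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
]);

// Simplified, ASCII-only check for a valid custom element name.
function isValidCustomElementName(name) {
  // lowercase letter first, only lowercase/digits/._- after it, at least one "-"
  const pattern = /^[a-z][a-z0-9._-]*-[a-z0-9._-]*$/;
  return pattern.test(name) && !RESERVED_NAMES.has(name);
}
```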
<div id="test">text</div>
elements can be identified via the id attribute
<label for="id"> and <a href="#id"> use the id as a reference
the id must be unique within an html document
<div id="test">text</div>
elements with an id attribute can be accessed in JavaScript by their id as a property of the global window object
there are more ways which will be explained when we get to CSS queries
<div attr="string value"></div>
attributes can only contain string values
most built-in attributes map to (are reflected by) a specific element property in the DOM API
unfortunately the mapped property does not always have the same name as the attribute: the for attribute maps to the htmlFor property, and the class attribute maps to className
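A few well-known cases where the property name differs can be collected in a lookup table. This is an illustrative sketch, not an exhaustive list; the function name propertyNameFor is hypothetical, and the fallback (same name as the attribute) is an assumption that holds for attributes like id or title but not for all (for example, the value attribute of an input is reflected by defaultValue, not value).

```javascript
// Attribute names whose DOM property has a different name (not exhaustive).
const ATTRIBUTE_TO_PROPERTY = new Map([
  ["for", "htmlFor"],
  ["class", "className"],
  ["readonly", "readOnly"],
  ["maxlength", "maxLength"],
  ["tabindex", "tabIndex"],
]);

function propertyNameFor(attributeName) {
  // attribute names are case-insensitive in HTML
  const name = attributeName.toLowerCase();
  // assumption: attributes not in the table use the same property name
  return ATTRIBUTE_TO_PROPERTY.get(name) ?? name;
}
```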
<input type="checkbox" checked disabled/>
for boolean attributes the value can be omitted: the presence of the attribute means true, its absence means false (even checked="false" counts as checked)
<div style="color: #0f0"></div>
the style attribute lets you inline CSS for a specific element
this should be used sparingly; always try to use classes and style elements via CSS selectors
<div data-...="content">text</div>
data- attributes can be used to add custom attributes to elements without clashing with existing attributes
later with custom elements you can also define your own attributes
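In JavaScript, data- attributes show up on the element's dataset property, with the remaining name converted to camelCase (data-user-id becomes dataset.userId). A minimal sketch of that name conversion, assuming the attribute name is already lowercase as the HTML parser produces it; the function name datasetKey is hypothetical.

```javascript
// Converts a data-* attribute name to the matching dataset key.
function datasetKey(attributeName) {
  if (!attributeName.startsWith("data-")) return null; // not a data attribute
  // drop the "data-" prefix; "-" followed by a lowercase letter becomes the uppercase letter
  return attributeName
    .slice("data-".length)
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}
```

A hyphen followed by a digit is kept as-is, so data-item-2 maps to dataset["item-2"].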
all HTML elements have a predefined layout behavior associated with them
block elements take up all available horizontal space given by the parent element
the next element will be put below the block element
inline elements are vertically aligned to the baseline
inline elements cannot have a height or width set, unless they are defined as inline-block elements
all whitespace characters between inline elements are collapsed to a single space character
one     two (separated by several spaces) is rendered as one two
one and two written on separate lines are also rendered as one two
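The collapsing rule above can be modeled with a single regular expression. This is a simplified sketch of the default CSS behavior (white-space: normal); it ignores details such as removing spaces at the start and end of a line, and the function name collapseWhitespace is hypothetical.

```javascript
// Every run of spaces, tabs and line breaks is rendered as one space.
function collapseWhitespace(text) {
  return text.replace(/[ \t\r\n]+/g, " ");
}
```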
a table consists of multiple elements
a thead element (can be omitted) which indicates the table header
a tbody element which contains the table body (can be omitted)
a tfoot element (can be omitted)
table rows (tr)
table header cells (th)
table cells (td)
table cells can also span rows and columns
CSS offers more layout possibilities; for example, CSS grid layout provides more layout options than tables
Text formatting can be done via special inline tags
Text is rendered like this
ruby element: text annotation, especially for East Asian languages (e.g. to show the pronunciation of characters)
with ruby code: 漢 字
without ruby code: 漢 (kan) 字 (ji)
bdo element: helpful if you mix languages written in different directions
hello hello
hello hello
the reason why it's called HyperText Markup Language
Links are defined with the a tag (a stands for "anchor")
with links you can navigate to content within a web page or reference external content
the content to which the link points is defined in the href attribute of the a element; href stands for "hypertext reference"
to navigate to an element within a document, the element needs to be referenceable by its id attribute
define a unique id on the element
now you can link to this element with href="#<value of id>". The browser automatically scrolls to the referenced element once the link is clicked
it is also possible to link to content either on the same host or somewhere else on the internet
You can link to content on another server; in this case the browser will open a new window or load the web page in the same window, depending on the value of the target attribute
You can link to content which is relative to the currently open web page location
in this case, if your current location is https://example.com/info/info.html
the link will point to https://example.com/info/about.html
You can link to content which is relative to the current host
in this case, if your current location is https://example.com/info/info.html
the link will point to https://example.com/about.html
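The browser resolves an href against the URL of the current page, and the WHATWG URL API (available in browsers and Node.js) performs the same resolution. The sketch assumes the hrefs about.html, /about.html, #contact and /assets/image.png, since the original examples are not shown; the same rules apply to the src of images.

```javascript
// URL of the currently open page
const base = "https://example.com/info/info.html";

// relative to the current page location
const pageRelative = new URL("about.html", base).href;
// relative to the current host (leading slash)
const hostRelative = new URL("/about.html", base).href;
// link to an element with id="contact" in the same document
const fragment = new URL("#contact", base).href;
// image src relative to the current host
const image = new URL("/assets/image.png", base).href;
```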
As with links, you can embed content hosted somewhere on the internet or relative to your current location
You can display images hosted on another server; in this case the browser will load the image and display it
the alt attribute should always be set for accessibility reasons; it describes the image
You can display images which are relative to the currently open web page location
in this case, if your current location is https://example.com/info/info.html
the link will point to https://example.com/info/image.png
You can display images which are relative to the current host
in this case, if your current location is https://example.com/info/info.html
the link will point to https://example.com/assets/image.png
If the available viewport is larger than 800px, use the large image; for high-DPI monitors use large@2x.jpg
If the available viewport is larger than 400px, use the medium image; for high-DPI monitors use medium@2x.jpg
otherwise use the small image; for high-DPI monitors use small@2x.jpg
as a fallback, or if srcset is not supported, use the fallback.jpg image
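The selection rules above can be written out as a small function to make the decision order explicit. This is a hypothetical model, not how browsers implement srcset: the file names large.jpg, medium.jpg and small.jpg are assumed from the text, "high-DPI" is assumed to mean a device pixel ratio of 2 or more, and real browsers may pick a different candidate (e.g. an image that is already cached).

```javascript
// Hypothetical model of the image selection rules described above.
function pickImage(viewportWidth, devicePixelRatio, srcsetSupported = true) {
  if (!srcsetSupported) return "fallback.jpg";
  // assumption: "high-DPI" means a device pixel ratio of at least 2
  const suffix = devicePixelRatio >= 2 ? "@2x.jpg" : ".jpg";
  if (viewportWidth > 800) return "large" + suffix;
  if (viewportWidth > 400) return "medium" + suffix;
  return "small" + suffix;
}
```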
You can embed video content, also with srcsets
You can embed audio content, also with srcsets
You can embed svg
With iframes it is possible to embed other content within your web site; the embedded content runs in its own browsing context, isolated from your web page (cross-origin content cannot be accessed by your scripts)
There is a new proposal, HTML Portals, which aims to take iframes to the next level
there are plenty to choose from
HTML supports a lot of form handling out of the box without the need of javascript
form element
the form element acts as a wrapper around form controls, supports form validation and can execute a specified action
form controls can also be bound to a specific form with the form attribute, which references the form element's id
the autocomplete attribute tells the browser to fill out form elements with previously entered values
the form is able to perform the action specified in its action attribute
<input type="submit"> within a form can trigger validation and, if everything is valid, the associated action
<input type="reset"> within a form can reset the form values
with the type attribute of the button element you can also specify what the button triggers (submit, reset or button)
you can set the URL which is called after a submit with the action attribute; method GET will send the form values as query parameters appended to the URL, POST will send them in the request body, encoded according to the enctype attribute
radio button controls within a form are matched by name; if multiple controls with the same name exist, the form value will be the value of the currently selected one
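A minimal sketch of how the submitted value of a radio group is determined. The controls are modeled as plain objects with a hypothetical shape (type, name, value, checked), not as DOM elements, and the function name radioGroupValue is hypothetical.

```javascript
// Returns the value submitted for the radio group `name`.
function radioGroupValue(controls, name) {
  const checked = controls.find(
    (c) => c.type === "radio" && c.name === name && c.checked
  );
  // a radio group with no checked control contributes nothing to the form data
  return checked ? checked.value : null;
}
```

Usage: with two radios named size, values s and m, where only m is checked, radioGroupValue(controls, "size") returns "m".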
labels can be associated with form controls via the form control's id (using the label's for attribute)
form controls can be marked as required, which prevents the form from being submitted if the controls have no valid value
some form controls can validate against a pattern or a specific value range
an empty value is treated as matching the pattern unless the required attribute is set
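The interplay of pattern and required can be sketched as follows. Browsers anchor the pattern to the whole value, which the sketch mirrors with ^(?:...)$; it uses the "u" regex flag, while current browsers use the newer "v" flag, which is stricter about some characters. The function name passesConstraints is hypothetical.

```javascript
// Sketch of the pattern/required checks for a text input.
function passesConstraints(value, { pattern, required = false } = {}) {
  if (value === "") return !required; // empty: only "required" can fail
  if (pattern === undefined) return true;
  // the pattern must match the whole value, not just a part of it
  return new RegExp(`^(?:${pattern})$`, "u").test(value);
}
```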
There are different methods to encode the data you are sending with a form
The encoding can be set with the enctype attribute
application/x-www-form-urlencoded is the default encoding
This is the default and the only encoding used for GET requests
this form will be URL encoded and the values of the form fields will be sent as URL query parameter name=value pairs
GET /submit?firstname=hagbard&lastname=celine
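URLSearchParams serializes its entries as application/x-www-form-urlencoded, so it can be used to build the query string of a GET form submission. The sketch reuses the field names from the example; the extra field q is only there to show how spaces and reserved characters are encoded.

```javascript
// Build the query string a GET form submission would produce.
const params = new URLSearchParams({ firstname: "hagbard", lastname: "celine" });
const url = `/submit?${params}`; // params.toString() is called implicitly

// spaces become "+" and reserved characters such as "&" are percent-encoded
const encoded = new URLSearchParams({ q: "a b&c" }).toString();
```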
multipart/form-data should be used if you have more complex forms or values that can't be URL encoded, like files
this encoding type will encode the form data in the request body
POST /submit
Content-Type: multipart/form-data; boundary=boundary

--boundary
Content-Disposition: form-data; name="username"

hagbard
--boundary
...
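A minimal sketch of how such a multipart/form-data body is assembled for text fields: each part starts with "--" plus the boundary, followed by its headers, an empty line and the value, and the body ends with "--" plus the boundary plus "--". The function name multipartBody is hypothetical; it does not handle files or escape field names, and real clients choose a random boundary that does not occur in any value.

```javascript
// Builds a multipart/form-data body for text fields only (no files).
function multipartBody(fields, boundary) {
  const CRLF = "\r\n";
  let body = "";
  for (const [name, value] of Object.entries(fields)) {
    body += `--${boundary}${CRLF}`; // part delimiter
    body += `Content-Disposition: form-data; name="${name}"${CRLF}${CRLF}`; // headers, empty line
    body += `${value}${CRLF}`; // field value
  }
  return body + `--${boundary}--${CRLF}`; // closing delimiter
}
```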
text/plain is new in HTML5 but not widely used, and can be ignored for now
Payloads using the text/plain format are intended to be human readable. They are not reliably interpretable by computer, as the format is ambiguous (for example, there is no way to distinguish a literal newline in a value from the newline at the end of the value).[1]
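The ambiguity described in the quote can be reproduced with a sketch of the text/plain encoding, which writes one name=value line per field. The function name textPlainBody and the field names are hypothetical; the point is that a value containing a line break produces the same payload as two separate fields.

```javascript
// Sketch of the text/plain form encoding: one "name=value" line per field.
function textPlainBody(entries) {
  return entries.map(([name, value]) => `${name}=${value}\r\n`).join("");
}

// one field whose value contains a line break...
const a = textPlainBody([["comment", "hi\r\nuser=admin"]]);
// ...and two separate fields produce identical payloads
const b = textPlainBody([["comment", "hi"], ["user", "admin"]]);
```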
more on forms here
Developers should prefer using the correct semantic HTML element over using ARIA. For instance, native elements have built-in keyboard accessibility, roles and states. However, if you choose to use ARIA, you are responsible for mimicking (the equivalent) browser behaviour in script.
- MDN
important if you implement your own widgets and want to keep your web application accessible
ARIA information is already set for native elements
for custom widgets and custom elements you have to assign the appropriate ARIA attributes yourself
More information on this MDN Article