communication protocols

Interesting Resources

Wizard Zines by Julia Evans

RFC2616 - HTTP/1.1 (since superseded by RFC 9110 and RFC 9112)

RFC6455 - WebSockets

HTTP

The Hypertext Transfer Protocol (HTTP) was introduced along with the World Wide Web, which Tim Berners-Lee proposed in 1989

HTTP is a request-response protocol in the client-server computing model. The browser acts as a client and connects to a web server by hostname or IP address

HTTP resources are identified and located on the network by Uniform Resource Locators (URLs), using the Uniform Resource Identifier (URI) schemes http and https

HTTP 1.1

Get Request

HTTP Method, path and protocol version

Header: the host where the above resource is requested
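The two annotated parts of a GET request can be sketched as raw text. This is a minimal sketch: the function name `build_get_request`, the host `example.com` and the path `/index.html` are assumptions chosen for illustration.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # HTTP method, path and protocol version
        f"Host: {host}",         # header: the host the resource is requested from
        "",                      # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("example.com", "/index.html")
print(request.decode("ascii"))
```

A GET request carries no content, so the request ends directly after the empty line.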

Response

Response status and status message

MIME type of the response data

Length of the response in bytes

Meta information about the content

Content separator (an empty line, that is a single CRLF \r\n after the last header)

Content
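To make the annotated response parts concrete, here is a hedged sketch that splits a raw response into status, headers and content. The sample response, its header values and the helper `parse_response` are made up for illustration; real clients handle many more cases (chunked encoding, repeated headers and so on).

```python
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"                                # status and status message
    b"Content-Type: text/html; charset=utf-8\r\n"         # MIME type of the data
    b"Content-Length: 13\r\n"                             # length in bytes
    b"Last-Modified: Mon, 01 Jan 2024 00:00:00 GMT\r\n"   # meta information
    b"\r\n"                                               # content separator
    b"<p>Hello!</p>"                                      # content
)


def parse_response(raw: bytes) -> tuple[int, str, dict[str, str], bytes]:
    """Split a simple HTTP/1.1 response into (status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


status, reason, headers, body = parse_response(RAW_RESPONSE)
print(status, reason, headers["content-type"], len(body))
```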

Request with Content

HTTP Method, path and protocol version

Header: the host where the above resource is requested

Header: The MIME type of the content

Header: Length of the content

Separator for content (an empty line, that is a single CRLF \r\n after the last header)

Content
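A request with content adds the two content headers and the body after the empty line. The sketch below assumes a JSON payload and a hypothetical `/todos` path on `example.com`; neither is taken from the original slides.

```python
import json


def build_post_request(host: str, path: str, payload: dict) -> bytes:
    """Build a minimal HTTP/1.1 POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",           # HTTP method, path and protocol version
        f"Host: {host}",                   # host the resource is requested from
        "Content-Type: application/json",  # MIME type of the content
        f"Content-Length: {len(body)}",    # length of the content in bytes
        "",                                # empty line separates headers and content
        "",
    ]).encode("ascii")
    return head + body


post_request = build_post_request("example.com", "/todos", {"title": "write slides"})
print(post_request.decode("utf-8"))
```

Content-Length counts bytes, not characters, which is why the body is encoded before its length is measured.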

Request Methods

HTTP has several "Methods" or "verbs" to express the intent of the request

GET
Retrieve data of the specified resource
HEAD
Same as GET, but the server sends only the headers without the data
POST
Submit data to a resource, often causing a state change on the server
PUT
Create or replace the resource at the given path with new content
DELETE
Delete a specific resource on the server
OPTIONS
Describe the communication options for the resource (used as a preflight request for CORS)

Status Codes

200

The status code 200 indicates a successful request-response

3XX

The 3XX status code range indicates the requested resource was moved or not changed

Information about the new location is sent in the Location response header and is usually handled by the browser

  • 301 Moved Permanently
  • 302 Found, temporary redirect
  • 304 Not Modified

4XX

The 4XX status code range indicates that something was wrong with the client's request

  • 400 Bad Request
  • 403 Forbidden
  • 404 Not Found
  • 418 I'm a teapot

5XX

The 5XX status code range indicates that something went wrong on the server

  • 500 Internal Server Error
  • 503 Service Unavailable
  • 504 Gateway Timeout
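The status code ranges above can be derived from the first digit of the code. The following sketch uses `http.HTTPStatus` from the Python standard library for the status messages; the helper `status_class` and its class names are assumptions for illustration.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map a status code to its range, based on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (200, 301, 304, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```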

HTTP 2/SPDY

Same methods and headers as HTTP/1.1, but over a multiplexed connection

  • Multiple requests/responses over a single TCP connection
  • Binary protocol with header compression for smaller transmission size
  • The server can push additional content without a specific request, for example sending the CSS along with the HTML page

Server push has since been disabled by default in major browsers such as Chrome because it rarely improved performance

HTTPS

HTTPS encrypts the HTTP traffic with TLS to prevent man-in-the-middle attacks and should be mandatory

HTTP 3/QUIC

Next iteration of the HTTP protocol. It is based on QUIC, which runs over UDP and has TLS built in, so no extra TLS handshake is necessary for encrypted connections

Server Sent Events

Server-sent events are, in contrast to web sockets, a one-way connection to send messages from a server to a client. They consist of a persistent TCP connection on which the server sends a text/event-stream response
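A text/event-stream response is a sequence of text blocks, each terminated by an empty line. The sketch below formats a single event with the `id`, `event` and `data` fields defined for event streams; the helper name `format_sse` and the sample values are assumptions.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Format one message for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as several data fields.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # An empty line marks the end of the event.
    return "\n".join(lines) + "\n\n"


print(format_sse("hello"), end="")
print(format_sse("line one\nline two", event="update", event_id="7"), end="")
```

A server would write such blocks to the open response whenever it has something new to send.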

More on server sent events

Server-sent events are interesting if your server does not support web sockets but you want to use a push-based protocol

Server-sent events can also be simulated with client-side long polling. Long polling can be implemented by keeping a request open until new data is available and opening the next request as soon as the response is processed

For this to work, the server must send empty responses so that an open request does not run into a timeout
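The long polling loop can be illustrated without a network. In this sketch a `queue.Queue` stands in for the server's pending messages and `long_poll` stands in for a request handler; both names, and the in-process setup as a whole, are assumptions and not a real HTTP implementation.

```python
import queue


def long_poll(inbox: queue.Queue, timeout: float) -> list:
    """Stand-in for a server handler: hold the request until data arrives."""
    try:
        return [inbox.get(timeout=timeout)]
    except queue.Empty:
        return []  # answer with an empty response instead of timing out


def poll_loop(inbox: queue.Queue, rounds: int, timeout: float = 0.05) -> list:
    """Stand-in for the client: reopen a request as soon as one is answered."""
    received = []
    for _ in range(rounds):  # each iteration represents one request
        received.extend(long_poll(inbox, timeout))
    return received


inbox = queue.Queue()
inbox.put("first")
inbox.put("second")
print(poll_loop(inbox, rounds=3))
```

The third round finds no data, so the handler answers with an empty response and the client would simply poll again.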

Cookies

With cookies it is possible to share state between HTTP requests. A cookie is usually set by the server with the Set-Cookie response header (or by client-side script) and is then sent back by the client in the Cookie header of every matching request
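The round trip of a cookie can be sketched with `http.cookies.SimpleCookie` from the Python standard library. The cookie names `session` and `theme` and their values are made up for illustration.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
response_cookie = SimpleCookie()
response_cookie["session"] = "abc123"
response_cookie["session"]["path"] = "/"
response_cookie["session"]["httponly"] = True
set_cookie_value = response_cookie["session"].OutputString()
print("Set-Cookie:", set_cookie_value)

# Server side, next request: parse the Cookie header sent by the client.
request_cookie = SimpleCookie()
request_cookie.load("session=abc123; theme=dark")
print(request_cookie["session"].value, request_cookie["theme"].value)
```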

CSP

Content Security Policy

The Content-Security-Policy header gives you control over which scripts are allowed to be executed and which media is allowed to be loaded

What is an origin

https://subdomain.example.com:80/index.html

An origin is specified by the protocol, the host domain and the port. All responses from another protocol, host, subdomain or port are not from the same origin

  • Default all policies to 'self' (same origin)
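The origin comparison can be sketched with `urllib.parse`. The helpers `origin` and `same_origin` are hypothetical names, and the sketch only covers the http and https schemes with their default ports.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """Return the (protocol, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


print(origin("https://subdomain.example.com:80/index.html"))
print(same_origin("https://example.com/a", "https://example.com:443/b"))
print(same_origin("https://example.com/", "https://api.example.com/"))
```

The path plays no role in the comparison; only protocol, host and port do.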
  • Allow images to be loaded from everywhere
  • Restrict media to only be loaded from media1.com and media2.com
  • Executable scripts are only loaded from userscripts.example.com
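The four rules listed above could be expressed as one header value. The following is a sketch under the assumption that the rules map to the standard directives `default-src`, `img-src`, `media-src` and `script-src`; the helper `build_csp` is a hypothetical name.

```python
policy = {
    "default-src": ["'self'"],                   # default all policies to same origin
    "img-src": ["*"],                            # images from everywhere
    "media-src": ["media1.com", "media2.com"],   # media only from these hosts
    "script-src": ["userscripts.example.com"],   # scripts only from this host
}


def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives into a Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


csp_value = build_csp(policy)
print("Content-Security-Policy:", csp_value)
```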

A sample from twitter.com

Mapping of content security policy mappings

CORS (Cross-Origin Resource Sharing) is a mechanism that lets a server tell browsers which other origins are allowed to load its content

Important: this is not a security feature for your server, other HTTP client implementations are still able to get the content

With the Access-Control-Allow-Origin header your server can indicate from which origins a fetch is allowed. This is important if your page is hosted on a different host than your API endpoints

static.example.com
The static content for your application
api.example.com
The API endpoints for your applications
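For the static.example.com and api.example.com setup, the API server could decide per request which header to send. This is a minimal sketch: the allow list, the https scheme and the helper `cors_headers` are assumptions for illustration.

```python
from typing import Optional

# Origins that are allowed to fetch from api.example.com (assumed for this sketch).
ALLOWED_ORIGINS = {"https://static.example.com"}


def cors_headers(request_origin: Optional[str]) -> dict[str, str]:
    """Return the CORS response headers for the Origin header of a request."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Vary": "Origin",  # the response differs per requesting origin
        }
    return {}  # no header: the browser blocks the cross-origin read


print(cors_headers("https://static.example.com"))
print(cors_headers("https://evil.example.org"))
```

The server still answers requests from other origins; it is the browser that refuses to hand the response to the page.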

A good explanation for CORS, CSRF, etc can be found in this HTTP 203 Episode

Web Sockets

Web sockets open a persistent bidirectional communication channel to pass messages between a client and a server

Web sockets are opened by an HTTP request, using the URI scheme ws or wss

Request to open a web socket connection

Request path

Host where the request was sent

Upgrade header indicates the client wants to upgrade the connection to a web socket connection

Web socket specific information

Server Response

Server upgraded the connection and is switching protocols

Upgrade information, switching to websocket protocol

Websocket specific information and confirmation for the key sent in the request

Unlike HTTP requests, web socket connections are not restricted by the same-origin policy. You have to manually verify the origin of the request

In browsers you're not able to add additional headers, for example for authentication, so you have to rely on cookies or a custom token-based protocol
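The confirmation for the key is computed as described in RFC 6455: the server appends a fixed GUID to the Sec-WebSocket-Key of the request, hashes the result with SHA-1 and returns it Base64 encoded. The sketch below shows that step; the function names are assumptions, and the sample key is the one used as an example in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(client_key: str) -> str:
    """Build the server response that switches protocols."""
    return "\r\n".join([
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {accept_key(client_key)}",
        "",
        "",
    ])


print(handshake_response("dGhlIHNhbXBsZSBub25jZQ=="))
```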

The web socket protocol supports two different modes for data

text
UTF-8 encoded strings
binary
binary data

Because web sockets are a push-based communication protocol without the possibility to apply backpressure, your server can overwhelm the client if the client is too slow to process the pushed data

The WebSocketStream API could change that in the future

Securing the connection

In their plain form, all discussed protocols with the exception of HTTP/3 transfer their content unencrypted and are prone to man-in-the-middle attacks

Because of the layered architecture of internet communication protocols, all web protocols can be transferred encrypted with TLS by changing the protocol in the URI, provided the server supports it

All servers should by now support at least TLS 1.2. Older protocol versions (SSL, TLS 1.0 and TLS 1.1) should not be used anymore

http (unencrypted) <-> https (TLS encrypted)

ws (unencrypted) <-> wss (TLS encrypted)

Encrypted communication must be supported by both server and client and can be combined with authentication
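Both points, the scheme mapping and the minimum TLS version, can be sketched with the Python standard library. The helper `secure_scheme` is a hypothetical name; the TLS part uses `ssl.create_default_context` and only configures a client context, it does not open a connection.

```python
import ssl
from urllib.parse import urlsplit

SECURE_SCHEMES = {"http": "https", "ws": "wss"}


def secure_scheme(url: str) -> str:
    """Switch a URL to the TLS encrypted variant of its scheme."""
    parts = urlsplit(url)
    scheme = SECURE_SCHEMES.get(parts.scheme, parts.scheme)
    return parts._replace(scheme=scheme).geturl()


# Client context that refuses anything older than TLS 1.2.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(secure_scheme("http://example.com/index.html"))
print(secure_scheme("ws://example.com/chat"))
```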

Authentication

There are different techniques to authenticate against a web service or maintain a session each with different tradeoffs

Cookies

Storing authentication information in the Cookie header

+

  • Works out of the box in every browser
  • Works with web sockets
  • Works with image tags
  • Works with OAuth

-

  • Can be deactivated by the client
  • Can be complicated with other HTTP clients

HTTP Authentication

Set with the Authorization header

+

  • Works out of the box in every browser
  • Works with web sockets
  • Different Authorization Schemes (basic, digest, etc...)

-

  • With the basic scheme, user and password are only Base64 encoded, which is effectively plain text
  • Authentication schemes other than basic are difficult to use with other HTTP clients
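The basic scheme shows why the credentials count as plain text: the header value is only Base64 encoded and can be decoded by anyone who sees it. A minimal sketch, with hypothetical helper names and the sample credentials from RFC 7617:

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an Authorization header for the basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover user and password, which needs no secret at all."""
    _scheme, _, token = header_value.partition(" ")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header_value = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header_value)
print(decode_basic_auth(header_value))
```

This is why HTTP authentication should only be used over a TLS encrypted connection.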

Token Based

Set with a custom header in the request

+

  • Good for SPAs or other clients

-

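  • In browsers it works only from script code such as SPAs, not for plain links, forms or image tags
  • Does not work with web sockets in browsers, because custom headers cannot be set there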

Client Side Certificates

Authentication can be done during the TLS handshake by providing client side certificates

+

  • Really good security

-

  • Complicated to distribute the certificates

REST, GraphQL

Besides serving static content, HTTP is often used to expose APIs for services. There are two main patterns to serve APIs

REST

Representational state transfer

REST defines a pattern to name your URL paths and makes semantically correct use of HTTP request methods

Get a list with all todos

Get a specific todo

Add a new todo (add operation)

Update a specific todo (update operation)

Delete a todo

Parameters vs Body content

You can transfer data via HTTP in two different ways, either as HTTP query parameters or in the request body as content

Important: GET, OPTIONS and DELETE usually do not support content bodies

Binary data can only be transferred in the request body*

*except when encoding the data with base64
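The Base64 exception can be sketched as follows. The parameter name `data` and both helper names are assumptions; the URL-safe Base64 alphabet is used here so the value needs less escaping in a query string.

```python
import base64
from urllib.parse import parse_qs, urlencode


def encode_binary_param(name: str, data: bytes) -> str:
    """Encode binary data as a query parameter using URL-safe Base64."""
    token = base64.urlsafe_b64encode(data).decode("ascii")
    return urlencode({name: token})


def decode_binary_param(query: str, name: str) -> bytes:
    """Read the parameter back and decode it to the original bytes."""
    token = parse_qs(query)[name][0]
    return base64.urlsafe_b64decode(token)


query = encode_binary_param("data", bytes([0, 1, 2, 255]))
print(query)
print(decode_binary_param(query, "data"))
```

Base64 grows the data by about a third, so this is only practical for small payloads.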

To design "good" REST APIs, make use of the appropriate HTTP verbs and encode the information you want to transport accordingly

Example for a GET request

GET /profile/234234234?format=json

GET
Verb, in this case get a specific resource
/profile
API path
/234234234
explicit resource id
?format=json
additional (optional) query parameters
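The parts of the example request can also be taken apart in code. This sketch uses `urllib.parse` and assumes that the last path segment is the resource id; the helper `split_request` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def split_request(line: str) -> dict:
    """Split 'VERB /path/id?query' into its parts."""
    verb, target = line.split(" ", 1)
    url = urlsplit(target)
    # Assumption: the last path segment is the explicit resource id.
    *api_path, resource_id = url.path.strip("/").split("/")
    return {
        "verb": verb,
        "path": "/" + "/".join(api_path),
        "id": resource_id,
        "query": parse_qs(url.query),
    }


print(split_request("GET /profile/234234234?format=json"))
```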

Example for a PUT request

PUT /avatar

Now you have several possibilities to encode the appropriate user

The standalone path, which is not a sub path of the user, also indicates that the avatar service has its own lifecycle and is possibly a standalone microservice

PUT /avatar
Cookie: ...user=234234234...

Since we're usually logged in, we have a context where we can extract the specific user

PUT /avatar?user=234234234
Cookie: ...user=admin...

Additionally, we can allow overriding the user with a query parameter

How you encode the paths in your API is up to you, but there are some best practices which should be followed

Keep your REST API consistent and use the same principles for your complete API

Use verbs and paths accordingly, and try to encode primary ids in the path

Follow the idempotency rules for the specific verbs: repeating a GET, PUT or DELETE request should leave the server in the same state, while POST gives no such guarantee

A REST API defines a fixed set of API paths with a fixed structure of the objects you can request, so clients often receive more data than they need or have to make several requests. To mitigate this issue, GraphQL was developed by Facebook. It can also be used in conjunction with REST

GraphQL

GraphQL works over HTTP, usually with a single endpoint, and encodes the intent in a custom query language in the request body. It is like SQL for web APIs

query a specific todo

query a specific todo and return only the due date

add a new todo and return the id

gRPC

Rarely used directly in browsers, but used heavily for service to service communication

gRPC is a service to service protocol that uses HTTP/2 for transport and ProtoBuf for payload encoding

WebRTC

WebRTC enables fast peer to peer connection between multiple clients

WebRTC can be used to transfer video, audio or arbitrary data between multiple clients with the help of a discovery server