Module 0309: Logging in and session maintenance in PHP (server-side)

Tak Auyeung, Ph.D.

November 9, 2017

1 About this module

2 What is a session?

From the user’s perspective, a session is a continuous sequence of related interactions. For example, when shopping, the following actions are considered logically connected, and continuity is expected (without having to sign in again and again):

HTTP runs on top of TCP (Transmission Control Protocol), which is a connection-based protocol. This, in turn, means an HTTP connection involves handshaking and reliable, in-order delivery of content. This may seem to imply that HTTP itself can easily maintain a session.

It is important to note that one HTTP request only requests and replies with a single file. Separate requests are used to get an HTML file (generated or static), a PNG file, an animated GIF, etc. This means that when we view a so-called page, it is potentially the result of more than 100 HTTP requests!

When a user clicks a button or a link, that triggers the loading of another “page”, which potentially means more than 100 additional requests.

The worst part is that HTTP itself considers each and every request independent of the previous one. HTTP has no concept of logging in, and it has no way to tell that requests are related because they come from the same user. Also, although HTTP can carry credentials with an individual request, it has no built-in way to log in once and stay logged in across requests.

Great! Someone else has to pick up the tab.

3 HTTP cookies

A “cookie” is a small chunk of data that has an identifier and is stored on the client (browser) side. A cookie has the following attributes:

The way cookies are utilized is as follows:

Indeed, cookies provide the mechanism to support the illusion of sessions!
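To make the round trip concrete, here is a minimal PHP sketch of the two headers involved: the server's Set-Cookie reply header and the Cookie request header the browser sends back. The helpers build_set_cookie and parse_cookie_header are hypothetical, and the attributes chosen (Max-Age, Path, HttpOnly) are only an example; in real scripts PHP does this work for you through setcookie(), $_COOKIE, and session_start().

```php
<?php
// Hypothetical helper: builds the text of a Set-Cookie reply header.
// In real scripts, setcookie() or session_start() produce this header.
function build_set_cookie(string $name, string $value, int $maxAge): string
{
    // Percent-encode the value so characters such as ';' cannot break the header.
    return $name . '=' . rawurlencode($value) . '; Max-Age=' . $maxAge . '; Path=/; HttpOnly';
}

// Hypothetical helper: splits the Cookie request header that the browser
// sends back into name => value pairs. In real scripts, PHP fills $_COOKIE.
function parse_cookie_header(string $header): array
{
    $cookies = [];
    foreach (explode(';', $header) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue; // skip empty or malformed pieces
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[$name] = rawurldecode($value);
    }
    return $cookies;
}

// Reply to request 1: the server asks the browser to remember a cookie.
echo 'Set-Cookie: ', build_set_cookie('cart_id', 'abc 123', 3600), "\n";
// Set-Cookie: cart_id=abc%20123; Max-Age=3600; Path=/; HttpOnly

// Request 2: the browser returns every matching cookie in one header.
$cookies = parse_cookie_header('cart_id=abc%20123; theme=dark');
echo $cookies['cart_id'], "\n"; // abc 123
```

The important point is that the browser, not HTTP, remembers the value and volunteers it on each later request; the server only has to recognize it.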

4 A PHP Session

PHP handles the low-level mechanisms to transmit, parse, and match cookies so that PHP code does not need to perform these rather tedious tasks. Note that, by default, PHP stores session data in files on the server (often in a temporary directory), so sessions typically survive HTTP server restarts, but not necessarily OS restarts. This means that as long as the operating system supporting the server keeps running, sessions are maintained, subject to PHP's session expiry settings (such as session.gc_maxlifetime).

Individual PHP scripts can choose to use alternative ways to store session related information so that sessions can survive across OS reboots. However, this is an advanced topic and is well beyond the scope of CISW410.

4.1 Starting a session

The call to session_start must be placed at the top of a PHP script (file) before any output is emitted, whether HTML code, echo output, or even a blank line before the opening <?php tag. This is because it affects the header portion of an HTTP reply, which we normally do not see, and headers can no longer be changed once output has been sent.

session_start actually performs two tasks. If an underlying session is not present, it generates the necessary reply header to request the browser to store a session ID cookie (named PHPSESSID by default). If an underlying session is present because the browser sends back the session cookie from a previous reply, then PHP reloads all the session related content into the superglobal variable (associative array) $_SESSION.
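A minimal sketch of both rules, assuming the script is requested directly. The headers_sent() check is an illustrative addition, not something session_start requires: it stops the script with a message instead of letting session_start() fail with a warning when output has already begun.

```php
<?php
// Nothing may come before the opening tag above: no HTML, no echo,
// not even a blank line. session_start() may need to add a Set-Cookie
// header, and headers can no longer change once output has been sent.
if (headers_sent($file, $line)) {
    exit("Too late to start a session: output began at $file:$line\n");
}
session_start();

// Brand-new session: $_SESSION is an empty array.
// Browser returned a valid session cookie: the saved values are back.
echo 'Session ID: ', session_id(), "\n";
echo 'Variables restored: ', count($_SESSION), "\n";
```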

4.2 Implicit end of session

When a script terminates, all session related data is automatically stored. There is no need to call session_write_close.

4.3 Session variables

The following code stores a value into a session variable:

 
<?php
  session_start();                    // start (or resume) the session first
  $_SESSION['somevar'] = 'somevalue'; // saved automatically when the script ends
?>

The lifespan of a session variable begins after a call to session_start and the first assignment with the session variable on the left hand side of the assignment operator. Throughout the execution of a script, the session variable can be used as if it were any other variable.

At the end of a script that begins with session_start(), a session variable “hibernates”. This means the value of the session variable is stored in a persistent way. Depending on how PHP is set up, this persistence survives a restart of the HTTP server itself, and potentially even a reboot of the entire system.

When a script is started by an HTTP request that carries a pass-back cookie with the same value as a previous session, the call to session_start reloads the corresponding session variables into the associative array. This is the cool part of session variables: it permits values to be preserved across invocations of scripts (not even necessarily the same script!) as long as a match with a pass-back session cookie value is successful.
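As a small illustration, assuming a hypothetical page counter.php, the sketch below counts how many times the page has been loaded within one session. The key name 'visits' is arbitrary. Any other script on the same site that calls session_start() and receives the same session cookie sees the same $_SESSION['visits'] value.

```php
<?php
session_start(); // first statement, before any output

// First request of a session: 'visits' is not set yet, so start from 0.
// Later requests: session_start() has reloaded the count saved last time.
$_SESSION['visits'] = ($_SESSION['visits'] ?? 0) + 1;

echo 'You have loaded this page ', $_SESSION['visits'], " time(s) in this session.\n";
// No explicit save: the new count is stored when the script ends.
```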

5 Authentication

A PHP session via the use of session variables is a convenient way to use variables across invocations of scripts. It can be used, for instance, to enable the use of shopping carts. However, session variables do not automatically enable authentication.

Authentication is the establishment of the identity of a user, often done to make sure no one can assume the identity of someone else. This is often done with a user name (which is not a secret) combined with a password (which is a secret). Increasingly common, however, is 2-factor authentication, where a code is sent to the authenticating user via a channel different from the client-server connection being authenticated. For example, a text message may be sent to the cell phone number of the authenticating person. That code then needs to be entered as a part of the authentication process.
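A minimal sketch of the server-side part of such a code, assuming the issued code and its creation time are kept on the server (for example in $_SESSION) and delivered to the user's phone by some other means that is not shown. The function names and the 5-minute lifetime are hypothetical choices.

```php
<?php
// Hypothetical helper: a random 6-digit code, e.g. "042917".
function make_one_time_code(): string
{
    // random_int() uses a cryptographically secure source;
    // str_pad keeps leading zeros.
    return str_pad((string) random_int(0, 999999), 6, '0', STR_PAD_LEFT);
}

// Hypothetical helper: is the typed code correct and still fresh?
function check_one_time_code(
    string $issued, int $issuedAt, string $typed, int $now, int $lifetime = 300
): bool {
    if ($now - $issuedAt > $lifetime) {
        return false; // older than the lifetime (5 minutes by default): expired
    }
    return hash_equals($issued, $typed); // comparison that does not leak timing
}

$code = make_one_time_code();
// A real site would keep $code and time() on the server (e.g. in $_SESSION)
// and send $code to the user's phone; that part is not shown.
var_dump(check_one_time_code($code, 1000, $code, 1060));       // bool(true)
var_dump(check_one_time_code($code, 1000, $code, 1400));       // bool(false)
var_dump(check_one_time_code('123456', 1000, '654321', 1010)); // bool(false)
```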

Also, single sign-on is used more and more often. For example, many web sites now use Google Sign-In. This method offloads the actual sign-in mechanism to Google, while the third-party web application receives the result in the form of a universal user ID.

To learn more, check out Google OAuth. Be warned, however, that Google's PHP libraries for OAuth make full use of object-oriented programming, and as a result are not covered in this class.

Short of using Google OAuth or a similar method from other vendors, a web site can always maintain its own user authentication system. This approach has advantages and disadvantages.

The main advantage is being independent from any external resources.

There are many disadvantages. The first one is the liability of identity theft. When a web site maintains its own identity authentication system, it also automatically assumes the liability of maintaining the privacy and security of those identities. Identity theft is a big business, and it is fairly difficult for a small web site to have the resources to address all potential vulnerabilities.

The second disadvantage is from the perspective of users. Unless a user is willing to use the same password on all web sites, it is a hassle to keep track of which password goes with which identity on which web site.

The following sections assume that a web site chooses to maintain its own user identities.

5.1 User and password storing

To keep track of users, a database table should be used. There are a few minimum fields in this table:

Important: do not store a password in plain text in a database table! Why, some people may ask? If a database is compromised, it is compromised! What is the difference between storing a password encrypted and storing it in plain text?

If passwords are stored in plain text, a hacker immediately has a way to get into every account of the hacked site. Worse, given that many people use the same password on different sites, a hacker can now try the same username/email and password on other web sites.

Okay, so we don’t store passwords in plain text, then what do we store?

The convention in the past was to store the MD5 hash of the original password. MD5 is a well-known algorithm that computes a “checksum” of a fixed bit width (128 bits) from any text. The idea is that there is no quick way to reverse an MD5 hash back to the string that generated it.

Since MD5 was invented, computers have gotten much faster, and storage has gotten much cheaper. As a result, there are now rainbow tables published online that look up a matching string given an MD5 hash. This means any password stored as an MD5 hash is not much better than one stored in plain text.
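The weakness is easy to demonstrate. The sketch below hashes a password with PHP's md5() and then recovers it with a small precomputed lookup table of common passwords. A real rainbow table is far larger and uses a space-saving chain technique, so this plain lookup only illustrates the idea.

```php
<?php
// What a site might once have stored for the password "password".
$stored = md5('password');
echo $stored, "\n"; // 5f4dcc3b5aa765d61d8327deb882cf99 (always 32 hex digits)

// MD5 is fast and unsalted, so an attacker can hash common passwords once
// and reuse the resulting table against every leaked database.
$common = ['123456', 'password', 'qwerty', 'letmein'];
$table = [];
foreach ($common as $guess) {
    $table[md5($guess)] = $guess;
}

// "Reversing" the stored hash is now a single array lookup.
echo $table[$stored] ?? 'not found', "\n"; // password
```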

SHA-2 is considered a stronger hash function, in that it is more difficult to find a string that produces a particular SHA-2 hash. However, it is still not considered unhackable. In fact, no hash function can be considered unhackable, because the algorithm is known and there is no secret key involved: an attacker can simply hash likely passwords and compare the results.

Whether keyed encryption offers additional protection over a stronger hash function depends on how a site is hacked. If a site is hacked by SQL injection (a very common way to attack a web site), then keyed encryption offers an additional layer of protection, because the key is not stored in the database. On the other hand, if a site is hacked to the point that a hacker has access to the PHP files, then encryption offers no additional protection, because the encryption key will be available to the hacker as well.

That is, unless the encryption uses a key pair. A key-pair encryption method for encoding passwords is more secure than hash functions or symmetric encryption. By “throwing away” the private key and using the public key only for one-way encryption, key-pair encryption offers the best protection for storing passwords.

Unfortunately, MySQL does not support any key-pair encryption method in its library. PHP, however, does have functions (in its OpenSSL extension) to perform key-pair encryption and decryption.

To get an RSA key pair, you can use the openssl command line interface. The important point is that you can throw away the private key of the key pair because we do not want to decipher!
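As a hedged sketch, assuming PHP's openssl extension is available, the key pair can also be generated from PHP with openssl_pkey_new() instead of the command line, keeping only the public key. One consequence worth knowing: with the default PKCS#1 v1.5 padding of openssl_public_encrypt(), encryption mixes in random bytes, so encrypting the same password twice gives different results. A check therefore cannot simply re-encrypt the typed password and compare it byte for byte with the stored value. The sample password is made up.

```php
<?php
// Generate a 2048-bit RSA key pair in PHP (instead of the openssl CLI).
$pair = openssl_pkey_new([
    'private_key_bits' => 2048,
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
]);
if ($pair === false) {
    exit('Key generation failed: ' . openssl_error_string() . "\n");
}

// Keep only the public key (PEM text). The private key is never written
// anywhere, and dropping the only reference to it "throws it away".
$publicPem = openssl_pkey_get_details($pair)['key'];
unset($pair);

// Encrypt the same password twice with the public key.
openssl_public_encrypt('secret123', $first, $publicPem);
openssl_public_encrypt('secret123', $second, $publicPem);

echo strlen($first), "\n"; // 256 bytes for a 2048-bit key
// Default padding adds random bytes, so the two results differ.
echo ($first === $second) ? "same\n" : "different\n"; // different
```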

The following describes how to store a password:

The following describes how to check a password:
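For comparison with the key-pair method, here is a sketch of storing and checking a password with a different technique that PHP provides out of the box: password_hash() and password_verify(), which apply a salted, deliberately slow hash. This is not the key-pair procedure; the sample passwords are made up.

```php
<?php
// Storing: hash the password once, e.g. when the account is created.
// password_hash() picks a random salt and embeds it (and the cost) in the result.
$stored = password_hash('correct horse', PASSWORD_DEFAULT);

// Checking: compare what the user typed with the stored value.
// password_verify() reads the salt and cost back out of $stored.
var_dump(password_verify('correct horse', $stored)); // bool(true)
var_dump(password_verify('wrong guess', $stored));   // bool(false)

// Same password, new salt: the stored strings differ, which defeats lookup tables.
var_dump($stored === password_hash('correct horse', PASSWORD_DEFAULT)); // bool(false)
```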

5.2 Post authentication tracking

After successful authentication, a script may use a SESSION variable to track the authentication status. For example, a SESSION variable can be used to track the ID of the user. It is important to provide a way to sign off so that the corresponding SESSION variable is unset.
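A minimal sketch of post-authentication tracking, assuming the password has already been checked. The helper names and the 'user_id' key are hypothetical. Calling session_regenerate_id(true) at sign-in is a common extra precaution (it gives the signed-in user a fresh session ID) rather than something the text above requires.

```php
<?php
session_start();

// Hypothetical helper: call only after the password check has succeeded.
function mark_signed_in(int $userId): void
{
    // Switch to a fresh session ID at sign-in, so an ID that someone else
    // may have planted or seen before sign-in is no longer useful.
    session_regenerate_id(true);
    $_SESSION['user_id'] = $userId;
}

// Hypothetical helper: the signed-in user's ID, or null if nobody is signed in.
function current_user_id(): ?int
{
    return isset($_SESSION['user_id']) ? (int) $_SESSION['user_id'] : null;
}

// Hypothetical helper: the sign-off action unsets the tracking variable.
function sign_off(): void
{
    unset($_SESSION['user_id']);
}

mark_signed_in(42);
echo current_user_id() ?? 'nobody', "\n"; // 42
sign_off();
echo current_user_id() ?? 'nobody', "\n"; // nobody
```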

6 How much to track using SESSION variables?

The question is how much data is tied to a particular browsing session, and how much needs to persist beyond it. In other words, when a user signs in from a different computer, using a different browser, on another IP address, how much data needs to be retained?

If the answer is “need not persist past browser session”, then SESSION variables can be an answer. On the other hand, if the answer is “need to persist to another browser session”, then SESSION variables should not be used.

Generally speaking, authentication tracking should be the only item tracked by SESSION variables. Anything else should be tracked explicitly using database tables specific to the web application. For example, items in a shopping cart should persist across browser sessions.
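A hedged sketch of keeping a shopping cart in a database table rather than in $_SESSION. The cart_items table and the helper functions are hypothetical, and an in-memory SQLite database (through PDO) stands in for the site's real database so the example is self-contained. Only the user ID would come from the session.

```php
<?php
// In-memory SQLite stands in for the site's real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cart_items (
    user_id    INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL
)');

// Hypothetical helper: add an item to a user's cart.
function add_to_cart(PDO $db, int $userId, int $productId, int $quantity): void
{
    $stmt = $db->prepare(
        'INSERT INTO cart_items (user_id, product_id, quantity) VALUES (?, ?, ?)'
    );
    $stmt->execute([$userId, $productId, $quantity]);
}

// Hypothetical helper: the cart as product_id => total quantity.
function cart_for_user(PDO $db, int $userId): array
{
    $stmt = $db->prepare(
        'SELECT product_id, SUM(quantity) FROM cart_items
         WHERE user_id = ? GROUP BY product_id ORDER BY product_id'
    );
    $stmt->execute([$userId]);
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}

// In a real script: $userId = $_SESSION['user_id'];
$userId = 7;
add_to_cart($db, $userId, 101, 2);
add_to_cart($db, $userId, 101, 1);
add_to_cart($db, 8, 205, 4); // another user's cart is unaffected
print_r(cart_for_user($db, $userId));
```

Because the cart lives in the database, the same user signing in from another browser sees the same items.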

7 User permission and access control

After a user signs in, the processing of further requests should not assume the proper use of the user ID or the permissions thereof. Enforcing what an authenticated user can and cannot do is strictly the responsibility of a script, not of the PHP implementation of SESSION variables.

This means that for every operation, user access control must be considered. This typically involves two steps. When database tables are designed, care must be taken to associate user-related items to user IDs. When a database is accessed, the ID of the authenticated user must be used to match the foreign key of user ID.
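A minimal sketch of the second step, matching the foreign key against the authenticated user's ID in every query. The orders table, its sample rows, and find_order are hypothetical, and an in-memory SQLite database stands in for the real one. The order ID comes from the request and cannot be trusted, while the user ID comes from the session.

```php
<?php
// Hypothetical table: every order row carries the owner's user ID.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    total   REAL NOT NULL
)');
$db->exec('INSERT INTO orders (id, user_id, total) VALUES (1, 7, 19.99), (2, 8, 5.00)');

// Hypothetical helper: look up an order, but only if it belongs to $userId.
// In a real script: $orderId from $_GET/$_POST, $userId from $_SESSION['user_id'].
function find_order(PDO $db, int $orderId, int $userId): ?array
{
    $stmt = $db->prepare('SELECT id, total FROM orders WHERE id = ? AND user_id = ?');
    $stmt->execute([$orderId, $userId]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// User 7 asks for order 2, which belongs to user 8: nothing comes back.
var_dump(find_order($db, 2, 7)); // NULL
```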

There is no universal way to handle or structure user access control as much of that depends on the application. Nonetheless, it is crucial to evaluate each and every query to make sure an authenticated user does not have access to too much data.

Because the values of SESSION variables, unlike GET and POST variables, are stored and maintained only on the server side (the browser holds only the session ID cookie), a user cannot change them directly, which makes them considerably more secure and less likely to be tampered with. This means that some values stored in hidden fields of a form should instead be stored as SESSION variables. This includes the ID of a user and any data that is needed to perform a chain of actions.
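A sketch of replacing a hidden form field with a SESSION variable in a two-step action, assuming two hypothetical scripts (shown in one file here for brevity): the first records which order the user chose, and the second acts on it. The key name pending_cancel_order_id is arbitrary.

```php
<?php
session_start();

// Step 1 (hypothetical confirm_cancel.php): the user has chosen order 1.
// Instead of writing <input type="hidden" name="order_id" value="1"> into the
// form, where the browser could send back a different value, keep it here.
$_SESSION['pending_cancel_order_id'] = 1;

// Step 2 (hypothetical do_cancel.php): read the value from the session,
// never from $_POST, then clear it so the step cannot simply be repeated.
$orderId = $_SESSION['pending_cancel_order_id'] ?? null;
unset($_SESSION['pending_cancel_order_id']);

if ($orderId === null) {
    exit("Nothing to cancel.\n");
}
echo "Cancelling order $orderId\n";
```

Step 2 should still apply the user access check from section 7 before acting on the order.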