Home » What technical details should a programmer of a web application consider before making the site public?

What technical details should a programmer of a web application consider before making the site public?

Solutons:


The idea here is that most of us should already know most of what is on this list. But there just might be one or two items you haven’t really looked into before, don’t fully understand, or maybe never even heard of.

Interface and User Experience

  • Be aware that browsers implement standards inconsistently and make sure your site works reasonably well across all major browsers. At a minimum test against a recent Gecko engine (Firefox), a WebKit engine (Safari and some mobile browsers), Chrome, your supported IE browsers (take advantage of the Application Compatibility VPC Images), and Opera. Also consider how browsers render your site in different operating systems.
  • Consider how people might use the site other than from the major browsers: cell phones, screen readers and search engines, for example. — Some accessibility info: WAI and Section508, Mobile development: MobiForge.
  • Staging: How to deploy updates without affecting your users. Have one or more test or staging environments available to implement changes to architecture, code or sweeping content and ensure that they can be deployed in a controlled way without breaking anything. Have an automated way of then deploying approved changes to the live site. This is most effectively implemented in conjunction with the use of a version control system (git, Subversion, etc.) and an automated build mechanism (Ant, NAnt, etc.).
  • Don’t display unfriendly errors directly to the user.
  • Don’t put users’ email addresses in plain text as they will get spammed to death.
  • Add the attribute rel="nofollow" to user-generated links to avoid spam.
  • Build well-considered limits into your site – This also belongs under Security.
  • Learn how to do progressive enhancement.
  • Redirect after a POST if that POST was successful, to prevent a refresh from submitting again.
  • Don’t forget to take accessibility into account. It’s always a good idea and in certain circumstances it’s a legal requirement. WAI-ARIA and WCAG 2 are good resources in this area.
  • Read Don’t Make Me Think.

Security

  • It’s a lot to digest but the OWASP development guide covers Web Site security from top to bottom.
  • Know about Injection especially SQL injection and how to prevent it.
  • Never trust user input, nor anything else that comes in the request (which includes cookies and hidden form field values!).
  • Hash passwords using salt and use different salts for your rows to prevent rainbow attacks. Use a slow hashing algorithm, such as bcrypt (time tested) or scrypt (even stronger, but newer) (1, 2), for storing passwords. (How To Safely Store A Password). The NIST also approves of PBKDF2 to hash passwords”, and it’s FIPS approved in .NET (more info here). Avoid using MD5 or SHA family directly.
  • Don’t try to come up with your own fancy authentication system. It’s such an easy thing to get wrong in subtle and untestable ways and you wouldn’t even know it until after you’re hacked.
  • Know the rules for processing credit cards. (See this question as well)
  • Use SSL/TLS/HTTPS for any sites where sensitive data is entered (like credentials, Personally Identifiable Information, credit card info). Let’s Encrypt is a free certificate authority which can help.
  • Prevent session hijacking.
  • Avoid cross site scripting (XSS).
  • Avoid cross site request forgeries (CSRF).
  • Avoid Clickjacking.
  • Keep your system(s) up to date with the latest patches.
  • Make sure your database connection information is secured.
  • Keep yourself informed about the latest attack techniques and vulnerabilities affecting your platform.
  • Read The Google Browser Security Handbook.
  • Read The Web Application Hacker’s Handbook.
  • Consider The principle of least privilege. Try to run your app server as non-root. (tomcat example)
  • Put rel="noopener noreferrer" on all user-provided links with target="_blank" to prevent JavaScript on the destination page from redirecting your page to somewhere else, such as a fake login page. More Info
  • Consider using a strict Content Security Policy.

Performance

  • Implement caching if necessary, understand and use HTTP caching properly as well as HTML5 Manifest.
  • Optimize images – don’t use a 20 KB image for a repeating background.
  • Compress content for speed, see brotli, gzip/deflate (deflate is better).
  • Combine/concatenate multiple stylesheets or multiple script files to reduce the number of browser connections and improve gzip ability to compress duplications between files.
  • Take a look at the Yahoo Exceptional Performance site, lots of great guidelines, including improving front-end performance and their YSlow tool (requires Firefox, Safari, Chrome or Opera). Also, Google page speed (use with browser extension) is another tool for performance profiling, and it optimizes your images too.
  • Use CSS Image Sprites for small related images like toolbars (see the “minimize HTTP requests” point)
  • Use SVG image sprites for small related images like toolbars. SVG coloring is bit tricky. You can read about it here.
  • Busy web sites should consider splitting components across domains. Specifically…
  • Static content (i.e. images, CSS, JavaScript, and generally content that doesn’t need access to cookies) should go in a separate domain that does not use cookies, because all cookies for a domain and its subdomains are sent with every request to the domain and its subdomains. One good option here is to use a Content Delivery Network (CDN), but consider the case where that CDN may fail by including alternative CDNs, or local copies that can be served instead.
  • Minimize the total number of HTTP requests required for a browser to render the page.
  • Choose a template engine and render/pre-compile it using task-runners like gulp or grunt
  • Make sure there’s a favicon.ico file in the root of the site, i.e. /favicon.ico. Browsers will automatically request it, even if the icon isn’t mentioned in the HTML at all. If you don’t have a /favicon.ico, this will result in a lot of 404s, draining your server’s bandwidth.

SEO (Search Engine Optimization)

  • Use “search engine friendly” URLs, i.e. use example.com/pages/45-article-title instead of example.com/index.php?page=45
  • When using # for dynamic content change the # to #! and then on the server $_REQUEST["_escaped_fragment_"] is what googlebot uses instead of #!. In other words, ./#!page=1 becomes ./?_escaped_fragments_=page=1. Also, for users that may be using FF.b4 or Chromium, history.pushState({"foo":"bar"}, "About", "./?page=1"); Is a great command. So even though the address bar has changed the page does not reload. This allows you to use ? instead of #! to keep dynamic content and also tell the server when you email the link that we are after this page, and the AJAX does not need to make another extra request.
  • Don’t use links that say “click here”. You’re wasting an SEO opportunity and it makes things harder for people with screen readers.
  • Have an XML sitemap, preferably in the default location /sitemap.xml.
  • Use <link rel="canonical" ... /> when you have multiple URLs that point to the same content, this issue can also be addressed from Google Webmaster Tools.
  • Use Google Webmaster Tools and Bing Webmaster Tools.
  • Install Google Analytics right at the start (or an open source analysis tool like Matomo).
  • Know how robots.txt and search engine spiders work.
  • Redirect requests (using 301 Moved Permanently) asking for www.example.com to example.com (or the other way round) to prevent splitting the google ranking between both sites.
  • Know that there can be badly-behaved spiders out there.
  • If you have non-text content look into Google’s sitemap extensions for video etc. There is some good information about this in Tim Farley’s answer.

Technology

  • Understand HTTP and things like GET, POST, sessions, cookies, and what it means to be “stateless”.
  • Write your XHTML/HTML and CSS according to the W3C specifications and make sure they validate. The goal here is to avoid browser quirks modes and as a bonus make it much easier to work with non-traditional browsers like screen readers and mobile devices.
  • Understand how JavaScript is processed in the browser.
  • Understand how JavaScript, style sheets, and other resources used by your page are loaded and consider their impact on perceived performance. It is now widely regarded as appropriate to move scripts to the bottom of your pages with exceptions typically being things like analytics apps or HTML5 shims.
  • Understand how the JavaScript sandbox works, especially if you intend to use iframes.
  • Be aware that JavaScript can and will be disabled, and that AJAX is therefore an extension, not a baseline. Even if most users leave it on now, remember that NoScript is becoming more popular. Even though modern crawling bots support indexing JavaScript-generated content, consider using server-side rendering for other crawling bots or users who have disabled JavaScript.
  • Learn the difference between 301 and 302 redirects (this is also an SEO issue).
  • Learn as much as you possibly can about your deployment platform.
  • Consider using a Reset Style Sheet or normalize.css.
  • Consider JavaScript frameworks (such as jQuery, MooTools, Prototype, Dojo or YUI 3), which will hide a lot of the browser differences when using JavaScript for DOM manipulation.
  • Taking perceived performance and JS frameworks together, consider using a service such as the Google Libraries API to load frameworks so that a browser can use a copy of the framework it has already cached rather than downloading a duplicate copy from your site.
  • Don’t reinvent the wheel. Before doing ANYTHING search for a component or example on how to do it. There is a 99% chance that someone has done it and released an OSS version of the code.
  • On the flipside of that, don’t start with 20 libraries before you’ve even decided what your needs are. Particularly on the client-side web where it’s almost always ultimately more important to keep things lightweight, fast, and flexible.

Bug fixing

  • Understand you’ll spend 20% of your time coding and 80% of it maintaining, so code accordingly.
  • Set up a good error reporting solution.
  • Have a system for people to contact you with suggestions and criticisms.
  • Document how the application works for future support staff and people performing maintenance.
  • Make frequent backups! (And make sure those backups are functional) Have a restore strategy, not just a backup strategy.
  • Use a version control system to store your files, such as Subversion, Mercurial or Git.
  • Don’t forget to do your Acceptance Testing. Frameworks like Selenium can help. Especially if you fully automate your testing, perhaps by using a Continuous Integration tool, such as Jenkins.
  • Make sure you have sufficient logging in place using frameworks such as log4j, log4net or log4r. If something goes wrong on your live site, you’ll need a way of finding out what.
  • When logging make sure you capture both handled exceptions, and unhandled exceptions. Report/analyse the log output, as it’ll show you where the key issues are in your site.

Other

  • Implement both server-side and client-side monitoring and analytics (one should be proactive rather than reactive).
  • Use services like UserVoice and Intercom (or any other similar tools) to constantly keep in touch with your users.
  • Follow Vincent Driessen’s Git branching model

Lots of stuff omitted not necessarily because they’re not useful answers, but because they’re either too detailed, out of scope, or go a bit too far for someone looking to get an overview of the things they should know. Please feel free to edit this as well, I probably missed some stuff or made some mistakes.

Related Solutions

Explaining computational complexity theory

Hoooo, doctoral comp flashback. Okay, here goes. We start with the idea of a decision problem, a problem for which an algorithm can always answer "yes" or "no." We also need the idea of two models of computer (Turing machine, really): deterministic and...

Building a multi-level menu for umbraco

First off, no need pass the a parent parameter around. The context will transport this information. Here is the XSL stylesheet that should solve your problem: <!-- update this variable on how deep your menu should be --> <xsl:variable...

How to generate a random string?

My favorite way to do it is by using /dev/urandom together with tr to delete unwanted characters. For instance, to get only digits and letters: tr -dc A-Za-z0-9 </dev/urandom | head -c 13 ; echo '' Alternatively, to include more characters from the OWASP...

How to copy a file from a remote server to a local machine?

The syntax for scp is: If you are on the computer from which you want to send file to a remote computer: scp /file/to/send username@remote:/where/to/put Here the remote can be a FQDN or an IP address. On the other hand if you are on the computer wanting to...

What is the difference between curl and wget?

The main differences are: wget's major strong side compared to curl is its ability to download recursively. wget is command line only. There's no lib or anything, but curl's features are powered by libcurl. curl supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP,...

Using ‘sed’ to find and replace [duplicate]

sed is the stream editor, in that you can use | (pipe) to send standard streams (STDIN and STDOUT specifically) through sed and alter them programmatically on the fly, making it a handy tool in the Unix philosophy tradition; but can edit files directly, too,...

How do I loop through only directories in bash?

You can specify a slash at the end to match only directories: for d in */ ; do echo "$d" done If you want to exclude symlinks, use a test to continue the loop if the current entry is a link. You need to remove the trailing slash from the name in order for -L to...

How to clear journalctl

The self maintenance method is to vacuum the logs by size or time. Retain only the past two days: journalctl --vacuum-time=2d Retain only the past 500 MB: journalctl --vacuum-size=500M man journalctl for more information. You don't typically clear the journal...

How can I run a command which will survive terminal close?

One of the following 2 should work: $ nohup redshift & or $ redshift & $ disown See the following for a bit more information on how this works: man nohup help disown Difference between nohup, disown and & (be sure to read the comments too) If your...

Get exit status of process that’s piped to another

bash and zsh have an array variable that holds the exit status of each element (command) of the last pipeline executed by the shell. If you are using bash, the array is called PIPESTATUS (case matters!) and the array indicies start at zero: $ false | true $...

Execute vs Read bit. How do directory permissions in Linux work?

When applying permissions to directories on Linux, the permission bits have different meanings than on regular files. The read bit (r) allows the affected user to list the files within the directory The write bit (w) allows the affected user to create, rename,...

What are the pros and cons of Vim and Emacs? [closed]

I use both, although if I had to choose one, I know which one I would pick. Still, I'll try to make an objective comparison on a few issues. Available everywhere? If you're a professional system administrator who works with Unix systems, or a power user on...

How do I use pushd and popd commands?

pushd, popd, and dirs are shell builtins which allow you manipulate the directory stack. This can be used to change directories but return to the directory from which you came. For example start up with the following directories: $ pwd /home/saml/somedir $ ls...

How to forward X over SSH to run graphics applications remotely?

X11 forwarding needs to be enabled on both the client side and the server side. On the client side, the -X (capital X) option to ssh enables X11 forwarding, and you can make this the default (for all connections or for a specific connection) with ForwardX11 yes...

What does “LC_ALL=C” do?

LC_ALL is the environment variable that overrides all the other localisation settings (except $LANGUAGE under some circumstances). Different aspects of localisations (like the thousand separator or decimal point character, character set, sorting order, month,...

What is a bind mount?

What is a bind mount? A bind mount is an alternate view of a directory tree. Classically, mounting creates a view of a storage device as a directory tree. A bind mount instead takes an existing directory tree and replicates it under a different point. The...

Turn off buffering in pipe

Another way to skin this cat is to use the stdbuf program, which is part of the GNU Coreutils (FreeBSD also has its own one). stdbuf -i0 -o0 -e0 command This turns off buffering completely for input, output and error. For some applications, line buffering may...

Can less retain colored output?

Use: git diff --color=always | less -r --color=always is there to tell git to output color codes even if the output is a pipe (not a tty). And -r is there to tell less to interpret those color codes and other escape sequences. Use -R for ANSI color codes only....