Web Authoring FAQ: Web Publishing

This document answers questions asked frequently by web authors. While its focus is on HTML-related questions, this FAQ also answers some questions related to CSS, HTTP, JavaScript, server configuration, etc.

This document is maintained by Darin McGrew <darin@htmlhelp.com> of the Web Design Group, and is posted regularly to the newsgroup comp.infosystems.www.authoring.html. It was last updated on April 26, 2007.

Section 4: Web Publishing

Where can I put my newly created Web pages?
How can I get my own domain name?
How can I block my hosting service's advertisements?
Where can I announce my site?
Is there a way to get indexed better by the search engines?
How do I prevent my site from being indexed by search engines?
How do I redirect someone to another page?
How do I password protect my web site?
How do I stop my page from being cached?
How can I disable the browser's right-click options? How can I protect my source, images, etc. from being copied?
How do I hide my URL?
How do I detect what browser is being used?
How do I get my visitors' email addresses?
Why is my custom 404 Not Found message not displayed?

4.1. Where can I put my newly created Web pages?

Many ISPs offer web space to their customers. Disk space and bandwidth will be limited, and there may be other restrictions; for example, many do not allow commercial use of this space.

There are companies and individuals offering "free" web space. Most are supported by advertising displayed on authors' web pages. There are often restrictions, as with hosting provided by ISPs to their customers.

There are also many web space providers (also known as presence providers) who will sell you space on their servers. Prices range from as little as US$1 per month to US$100 per month or more, depending on your needs. Non-virtual web space is typically the cheapest, offering a URL like http://www.webhost.example/yourname/. For a little more, plus the cost of registering a domain name, you can get virtual web space, which gives you a URL like http://www.yourname.example/.

If you have a permanent connection to the Internet, perhaps via a leased line from your ISP, then you could install an httpd (a web server daemon) and operate your own web server. Web servers are available for almost all platforms.

If you just wish to share information with other local users, or with people on a LAN or WAN, you could simply place your HTML files on the LAN for everyone to access. Alternatively, if your LAN supports TCP/IP, you could install a web server on your computer.

4.2. How can I get my own domain name?

The Internet Corporation for Assigned Names and Numbers (ICANN) maintains a list of accredited registrars. Any of the companies on this list can register a domain name for you.

4.3. How can I block my hosting service's advertisements?

Check the Terms of Service (TOS) agreement for your hosting service. It almost certainly prohibits interfering with the advertisements they add to your web pages. If you use some trick to block their advertisements on your own, then your hosting service may delete your account for violating its TOS.

However, there may be other options. Some hosting services will remove the advertisements if you pay a small monthly fee. Others will remove their default pop-up advertisements if you add static banners yourself.

4.4. Where can I announce my site?

comp.infosystems.www.announce -- a moderated newsgroup specifically geared toward this subject. You need to obtain its FAQ list before posting to it.
http://www.submit-it.com/ lets you submit site information to 10 different major index sites for free. If you wish to pay you may submit your site to more than 400 sites.
How To Announce Your New Web Site FAQ

4.5. Is there a way to get indexed better by the search engines?

There is no single technique, but a number of factors can help.

Search engines index the textual content of your site, so use a meaningful <TITLE>, use meaningful headings (<H1>, <H2>, and so on), and provide meaningful ALT text for images.
Many search engines ignore frames, so avoid them, and be sure to provide useful NOFRAMES content if you do use them.
Most search engines ignore image maps, forms, and JavaScript, so make sure that navigating your site doesn't depend on them. Provide normal links for site navigation.
Avoid using META refresh, because many search engines penalize sites that use it (META refresh has been used to trick search engines).
The indexing programs of some search engines (including AltaVista and Infoseek) will also take into account <META NAME="keywords" CONTENT="..."> tags that appear in the <HEAD> part of your documents. However, META keywords have been used to trick search engines, so many will ignore your keywords list if you repeat a given keyword too often. At this writing, "too often" means "more than 7 times" to some popular engines, but that may change in the future as indexing programs are changed to defend against trickery.
If you include a <META NAME="description" CONTENT="..."> tag in the <HEAD> part of your documents, then some search engines will use the content of this tag as your site's description when displaying search results. This won't affect your ranking in searches, but it can help search engine users understand what your site offers when a search does find your site.

The CONTENT attribute of the META keywords and description tags may contain up to 1022 characters, but no markup other than entities.
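
As a rough self-check of the points above, the following Python sketch reads a page's HEAD and reports a missing TITLE, an over-long META keywords or description value, and keywords that are repeated too often. The limits (1022 characters, 7 repetitions) are taken from the text above; the names HeadInfo and check_head are made up for this illustration, and the repetition test only counts identical comma-separated entries, which is a simplifying assumption rather than what any particular search engine does.

```python
from collections import Counter
from html.parser import HTMLParser

MAX_CONTENT_LENGTH = 1022   # limit quoted in the FAQ text
MAX_KEYWORD_REPEATS = 7     # "too often" threshold quoted in the FAQ text


class HeadInfo(HTMLParser):
    """Collects the TITLE text and the named META tags of a document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_head(html):
    """Return a list of human-readable problems (empty if none were found)."""
    info = HeadInfo()
    info.feed(html)
    problems = []
    if not info.title.strip():
        problems.append("missing <TITLE>")
    for name in ("keywords", "description"):
        if len(info.meta.get(name, "")) > MAX_CONTENT_LENGTH:
            problems.append(f"META {name} is longer than {MAX_CONTENT_LENGTH} characters")
    # Assumption: keywords are comma-separated and compared case-insensitively.
    words = [w.strip().lower() for w in info.meta.get("keywords", "").split(",")]
    for word, count in Counter(w for w in words if w).items():
        if count > MAX_KEYWORD_REPEATS:
            problems.append(f"keyword {word!r} repeated {count} times")
    return problems
```

Calling check_head() on the text of a page returns an empty list when none of these particular problems is found. It says nothing about how a given search engine will actually rank the page.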

You might want to preview your site with a text-only browser like Lynx, to get an idea of how your site appears to search engines.

Finally, note that some search engines ignore sites hosted by well-known free hosting services. Other search engines index only a certain number of documents per server, so while early customers of free hosting services may be indexed, later customers may be ignored.

4.6. How do I prevent my site from being indexed by search engines?

The Robots Exclusion Protocol allows Web site administrators to specify parts of their sites that robots should not visit by providing a /robots.txt document.

The Robots META tag allows HTML authors to specify whether robots should index a document, and whether robots should harvest additional URLs from a document. The Robots META tag requires no server administration.
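
To see how a well-behaved robot would interpret a /robots.txt document, you can feed one to the parser in Python's standard library. This is only a sketch: the robots.txt content, the robot name "ExampleBot", and the URLs are hypothetical, and robots are not obliged to honor the file at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical /robots.txt: keep all robots out of /private/, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# For a single document, the Robots META tag plays a similar role.
ROBOTS_META = '<meta name="robots" content="noindex,nofollow">'


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# A compliant robot should skip the first URL and may fetch the second.
print(parser.can_fetch("ExampleBot", "http://www.yourname.example/private/notes.html"))
print(parser.can_fetch("ExampleBot", "http://www.yourname.example/index.html"))
```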

4.7. How do I redirect someone to another page?

The most reliable way is to configure the server to send out a redirection instruction when the old URL is requested. Then the browser will automatically get the new URL. This is the fastest and most efficient way, and is the only way described here that can convince indexing robots to phase out the old URL. For configuration details consult your server admin or documentation (with NCSA or Apache servers, use a Redirect statement in .htaccess).
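
To illustrate what a "redirection instruction" looks like on the wire, here is a minimal Python WSGI sketch that answers requests for an old path with a 301 status and a Location header. It is not Apache configuration and not a recommendation to write your own server; the MOVED table, the paths, and the call() helper are invented for this example.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical mapping of old paths to their new URLs.
MOVED = {"/old.html": "http://www.yourname.example/new.html"}


def app(environ, start_response):
    """Send a permanent redirect for moved paths, 404 for everything else."""
    target = MOVED.get(environ.get("PATH_INFO", ""))
    if target is not None:
        # The short body is for clients that do not follow the redirect.
        body = f"Document moved to {target}\n".encode("utf-8")
        start_response("301 Moved Permanently", [
            ("Location", target),
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    body = b"Not Found\n"
    start_response("404 Not Found", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(path):
    """Invoke the app once without a network; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

The browser (or indexing robot) reads the Location header and requests the new URL on its own, which is why this approach can phase out the old URL.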

If you can't set up a redirect, there are other possibilities. These are inferior because they tell the search engines that there's still a page at the old location, not that the page has moved to a new location. But if it's impossible for you to configure redirection at your server, here are two alternatives:

Put up a small page with text like "This page has moved to http://new.url/ -- please adjust your bookmarks."
Use a META Refresh tag. Note that it won't work in all browsers and can break the "back" button. For example:
<meta http-equiv="Refresh" content="[x]; URL=[newURL]">
This loads [newURL] after [x] seconds. The tag should go in the HEAD of the document. If you do this, also include a short text saying "Document moved to [newURL]" for browsers that do not follow the refresh.

4.8. How do I password protect my web site?

Password protection is done through HTTP authentication. The configuration details vary from server to server, so you should read the authentication section of your server documentation. Contact your server administrator if you need help with this.
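
Normally the server handles authentication for you, but the following Python sketch shows roughly what happens under the hood with the HTTP "Basic" scheme: the browser sends an Authorization header, and the server either accepts it or answers 401 with a WWW-Authenticate challenge. The realm name, the USERS table, and the function names are hypothetical; a real server would store hashed passwords rather than plain text.

```python
import base64
import hmac

REALM = "Members"             # hypothetical realm name shown in the login prompt
USERS = {"alice": "s3cret"}   # hypothetical; real servers keep hashed passwords


def check_basic_auth(header_value):
    """Return the username if the Authorization header is valid, else None."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    expected = USERS.get(username)
    # compare_digest avoids leaking information through comparison timing.
    if expected is not None and hmac.compare_digest(password.encode(), expected.encode()):
        return username
    return None


def challenge():
    """Status line and header that make the browser prompt for a password."""
    return "401 Unauthorized", [("WWW-Authenticate", f'Basic realm="{REALM}"')]
```

Note that Basic credentials are only base64-encoded, not encrypted, so anyone who can observe the request can decode them.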

JavaScript password scripts provide only a facade of security. At a fundamental level, they work in one of two ways. Some scripts convert the password into a URL, which keeps the document secret by limiting the number of people who know its URL. Some scripts check the password and then go to a specific URL, which protects the document only from those who don't view the JavaScript source to get the URL of the document. Neither mechanism is really secure.

4.9. How do I stop my page from being cached?

Browsers cache web documents; they store local copies of documents to speed up repeated references to documents that haven't changed. Also, many browsers are configured to use public proxy caches, which serve many users (e.g., all customers of an ISP, or all employees behind a corporate firewall). To effectively control how your documents are cached you must configure your server to send appropriate HTTP headers.

The Expires header is understood by virtually all caches. The cached document will be retrieved again automatically once it has expired. The Expires header must contain an HTTP date, which must be Greenwich Mean Time (GMT), not local time.

HTTP 1.1 introduced the Cache-Control header, which provides more flexibility for telling caches how to handle the document.

The configuration details vary from server to server, so check your server documentation.
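
If your pages are generated by a script rather than served as static files, the script can emit these headers itself. The Python sketch below builds an Expires value in the required GMT format together with a matching Cache-Control header; the function names and the choice of directives are assumptions for illustration, not the only reasonable settings.

```python
import time
from email.utils import formatdate


def expires_header(seconds_from_now, now=None):
    """Return an HTTP date (always GMT) that many seconds after 'now'."""
    now = time.time() if now is None else now
    return formatdate(now + seconds_from_now, usegmt=True)


def cache_headers(max_age_seconds, now=None):
    """Headers asking caches to keep a document for at most max_age_seconds."""
    if max_age_seconds <= 0:
        # no-store asks caches not to keep a copy at all; the Expires value
        # is the current time, so older caches treat the copy as expired.
        return [
            ("Expires", expires_header(0, now)),
            ("Cache-Control", "no-store"),
        ]
    return [
        ("Expires", expires_header(max_age_seconds, now)),
        ("Cache-Control", f"max-age={max_age_seconds}"),
    ]
```

For example, cache_headers(3600) allows caches to reuse the document for up to an hour, while cache_headers(0) asks them not to store it.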

The Pragma header is generally ineffective because its meaning is not standardized and few caches honor it. Using <META HTTP-EQUIV=...> elements in HTML documents is also generally ineffective; some browsers may honor such markup, but other caches ignore it completely.

4.10. How can I disable the browser's right-click options? How can I protect my source, images, etc. from being copied?

HTML cannot interfere with the browser's right-click options. JavaScript sometimes can; however:

These scripts annoy visitors who lose ready access to their browsers' normal context-menu functions (e.g., "Open in new window" or "Bookmark link"). These scripts can also interfere with features like mouse gestures.
Nothing (including these scripts) can prevent anyone from copying your source or images. The browser cannot display your document without the source and images, so your web server must send them to the browser. Even without the various "save" functions in today's browsers, someone can retrieve your source or images from the browser's cache, request them from the server with some other tool, or use a screen-capture tool to copy the images.
These scripts do nothing when JavaScript is disabled or unavailable, when JavaScript access to right-click events is disabled, on systems without mice, or on systems with single-button mice.

4.11. How do I hide my URL?

You can't. URLs are fundamental to navigation on the WWW. The URL is necessary for the browser to be able to retrieve your document. It is impossible to hide the URL of a resource from the browser.

With that said, it is possible to somewhat obscure URLs via a misfeature of frames.

4.12. How do I detect what browser is being used?

Many browsers identify themselves when they request a document. A CGI script will have this information available in the HTTP_USER_AGENT environment variable, and it can use that to send out a version of the document which is optimized for that browser.

Keep in mind not all browsers identify themselves correctly. For example, Microsoft Internet Explorer identifies itself as Netscape Navigator, and many other browsers identify themselves as Microsoft Internet Explorer.

And of course, if a cache proxy keeps a version intended for one browser, someone with another browser may get that version, rather than the version intended for the other browser.

For these reasons and others, it is not a good idea to play the browser guessing game.
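
For the curious, here is a small Python CGI sketch that reads HTTP_USER_AGENT and makes a naive guess. It is included mainly to show how easily the guess goes wrong, not as a recommended technique; the guess_browser function and its categories are invented for this example.

```python
import os


def guess_browser(user_agent):
    """Naive User-Agent guess: returns 'msie', 'mozilla-like', or 'unknown'."""
    ua = (user_agent or "").lower()
    if "msie" in ua:
        return "msie"
    if ua.startswith("mozilla/"):
        return "mozilla-like"
    return "unknown"


if __name__ == "__main__":
    # A CGI script receives the header in this environment variable.
    ua = os.environ.get("HTTP_USER_AGENT", "")
    print("Content-Type: text/plain")
    # Vary tells caches that the response depends on the User-Agent header.
    print("Vary: User-Agent")
    print()
    print(f"User-Agent: {ua or '(not sent)'}")
    print(f"Guess: {guess_browser(ua)}")
```

A string such as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50" (an illustrative example of a browser identifying itself as Internet Explorer) is classified as "msie" by this function, which is exactly the kind of misidentification described above.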

4.13. How do I get my visitors' email addresses?

You can't. Although each request for a document is usually logged with the name or address of the remote host, the actual username is almost never logged. This is mostly for performance reasons: logging it would require the server to use the ident protocol to see who is on the other end, which takes time. And if a cache proxy makes the request, you don't get anything sensible.

But just stop to think for a minute... would you really want every single site you visit to know your email address? Imagine the loads of automated thank you's you would be receiving. If you visited 20 sites, you would get at least 20 emails that day, plus no doubt they would send you invitations to return later. It would be a nightmare as well as an invasion of privacy!

In Netscape 2.0, it was possible to automatically submit a form with a mailto as action, using JavaScript. This would send email to the document's owner, with the address the visitor configured in the From line. Of course, that can be "mickey.mouse@disney.com". This was fixed by Netscape 2.01.

The most reliable way is to put up a form asking visitors to fill in their email address. To increase the chances that visitors will actually do it, offer them something useful in return.

4.14. Why is my custom 404 Not Found message not displayed?

If no browser displays your custom 404 Not Found messages, then your server probably is not configured properly.

If only Internet Explorer ignores your custom 404 Not Found messages, then you've been caught by its "friendly" HTTP error messages. When a special HTTP response (e.g., a 404 Not Found response) is shorter than 512 bytes, Internet Explorer substitutes its own message for the one delivered by the server. As a user of Internet Explorer, you can disable this feature in the "Advanced" options panel. As a web author, your only recourse is to make your 404 Not Found message longer.
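
A quick way to check an error page against this threshold is to measure it in bytes. The Python sketch below does that and, if needed, pads the page with an HTML comment; the 512-byte figure comes from the text above, while the function names and the padding approach are assumptions for illustration (adding real, helpful content to the page is the better fix).

```python
IE_FRIENDLY_THRESHOLD = 512  # bytes, as described in the text above


def is_long_enough(body):
    """True if the error page is at least as long as the threshold."""
    return len(body) >= IE_FRIENDLY_THRESHOLD


def pad_html(body):
    """Append an HTML comment so the page reaches the threshold."""
    shortfall = IE_FRIENDLY_THRESHOLD - len(body)
    if shortfall <= 0:
        return body
    overhead = len(b"<!--  -->\n")  # bytes used by the comment delimiters
    filler = b"x" * max(shortfall - overhead, 0)
    return body + b"<!-- " + filler + b" -->\n"
```

Remember to measure the page as it is actually sent by the server (in bytes, after encoding), not the character count shown in an editor.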
