If you want to become a web developer, you need to understand the basics. In this article we focus on the primary markup language you absolutely need to master, HTML.
HTML stands for HyperText Markup Language. It’s a special language that web browsers understand. It’s basically a bunch of instructions about the structure of the document.
Hypertext is probably the most important part of the HTML acronym because it means that you can link one document to another. Hypertext is text that contains links to other texts.
This is the foundation stone of the web, which at the beginning wasn’t much more than a few pages interconnected with links. Today, the web is no longer just about text. Much of it is hypermedia, including video, photos and music. But if you strip away all the bells and whistles, you will still find this fundamental functionality of mutual linking between documents at the core of any modern web application, including Twitter, Gmail and even the project we are about to build.
Markup means that HTML surrounds regular text with special code which tells the browser how to display the content of the document. It tells the browser what the structure of the document is.
This special code consists of so-called tags. A tag is a special annotation, but don’t worry, it is human-readable and makes perfect sense even to beginners.
For example, <title> is a tag that tells the browser that anything wrapped inside this tag is, well, the title.
<title>This is the title of the book</title>
This way, not only the browser, but even you, as a web developer, can perfectly understand what the meaning of the text inside the <title> tag is.
The next example is the <p> tag, which is short for paragraph. This tag tells the web browser that its content should be displayed as a paragraph, which basically means that it starts on a new line. We will discuss the anatomy of tags in detail later, but you might have already noticed that the <title> and <p> tags have counterparts at the end which contain a forward slash: </title> and </p>. In order to wrap content, most tags have two parts: the opening part at the beginning of the content to be wrapped, and the closing part at the end. But there are tags with just the opening part as well. More on that later.
Language means that HTML has a special syntax, a set of rules you need to follow to create correct content. Just like in other languages, where subjects and verbs have a special place in the sentence, HTML tags have a special structure you need to use. For instance, when you need to nest tags, you must place the closing part of each tag correctly.
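To make the nesting rule concrete, here is a small sketch. The <em> tag (emphasis) is not discussed in this article and is used here only for illustration; any tag that wraps content would do.

```html
<!-- Correct: <em> is opened inside <p>, so it is closed before </p> -->
<p>This is <em>properly nested</em> text.</p>

<!-- Incorrect: the closing parts are in the wrong order -->
<p>This is <em>badly nested text.</p></em>
```

The rule of thumb is that the tag opened last is the tag closed first.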
Anatomy of an HTML tag
A tag is always made up of two building blocks: the element name and angle brackets.
Together they create a tag. Most of the tags surround some content. Such tags have two parts.
There is the opening part, which is present in every tag, and there is a closing part, which some tags are missing. The opening part can have so-called attributes with values that further specify the behavior of the tag. The closing part has no such thing, only the forward-slash followed by the name of the tag.
The opening part of the tag tells the browser where the content that should be treated in some special way begins. The closing part of the tag tells the browser where that content ends.
Only those tags that wrap some content need to have both the opening and the closing parts, but some tags don’t wrap content. Such tags lack the closing part. This is the case of images for example, as you will see later.
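The parts described above can be sketched as follows. The file names contact.html and logo.png are hypothetical and serve only as attribute values for the example.

```html
<!-- Opening part with an attribute (href) and its value,
     then the wrapped content, then the closing part -->
<a href="contact.html">Contact us</a>

<!-- A tag that wraps no content: opening part only, no closing part -->
<img src="logo.png" alt="Company logo">
```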
When writing tags, you need to be very careful about spacing. No space is allowed between the opening bracket of the opening part of the tag and the name of the element. Similarly, no space is allowed between the opening bracket of the closing part of the tag and forward slash.
However, there must be a space between the name of the element and the name of the attribute. All other spaces are allowed but ignored by the browser.
It doesn’t matter if you use single or double quotes to wrap the value of the attribute. In HTML 5, you can even omit the quotes when the value contains no spaces or other special characters, but it’s good practice to use them, because the code is more readable with quotes.
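The spacing and quoting rules might be illustrated like this. The file name index.html is just a sample value.

```html
<!-- Valid: no space after "<" or "</", a space before the attribute -->
<a href="index.html">Home</a>

<!-- Also valid: single quotes, or no quotes for a simple value -->
<a href='index.html'>Home</a>
<a href=index.html>Home</a>

<!-- Invalid: with a space after the opening bracket the browser
     does not recognise a tag and shows the code as plain text -->
< a href="index.html">Home</a>
```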
You might still find something like this in code: <br />. This is a so-called self-closing tag and it’s a relic of the XHTML standard. Browsers still accept it and display it correctly. In HTML 5, the trailing slash on such tags is permitted but has no effect, so you can simply leave it out.
Basic HTML document structure
Before we dive into the description of the most popular HTML elements, I will show you the basic structure of an HTML document, which every HTML page should follow in order to be standard-compliant.
Open your favorite code editor, create a new file, name it index.html and save it on the Desktop. Every HTML page should start with the declaration of the type of the document:
<!doctype html>
It doesn’t matter if it’s lowercase or uppercase, just make sure there is no space between the first bracket, the exclamation mark and the doctype.
If you omit this declaration, the web browser will still display the page, but it will treat it as something not following the HTML 5 standard. For such non-compliant content, browsers use the so-called quirks mode, where the layout can look strange, styles might be applied differently than you would expect, and so on. So, there’s a good reason to use this declaration.
Next is the <html> tag, followed by its closing counterpart at the very end of the document: </html>.
Inside the <html> tag are the <head> tag and the <body> tag, both with their closing counterparts.
The <head> tag usually contains information needed to render the page properly, like character encoding, while the <body> tag contains the actual content of the page.
The last tag you need to add to have a standard-compliant HTML document is the <title> tag which belongs inside the <head> tag and contains the title of the page.
Your code should look like this now:
<!doctype html>
<html>
    <head>
        <title>My First Web Page</title>
    </head>
    <body>
    </body>
</html>
Go ahead and check on https://validator.w3.org/#validate_by_input if your code is valid. Just copy it to the clipboard, open the website and paste the code to the form.
Hit the big Check button and you should see the green message that no errors or warnings were found. With the latest standard, you might get the warning that you should add the lang attribute, which specifies the language of the document, to the <html> tag. In such case, change your code like this:
<!doctype html>
<html lang="en">
    <head>
        <title>My First Web Page</title>
    </head>
    <body>
    </body>
</html>
Check the code again and everything should be fine.
When you take a closer look at your code, you will see that tags are nested. The <title> tag is inside the <head> tag, which is inside the <html> tag.
This nesting is very frequent in HTML and you need to be careful to place the closing part of each tag properly.
If you get this wrong, the browser will probably cope with it, but your code won’t be clean and standard-compliant. This can get you into a lot of trouble once you have plenty of code in your file, because it will get harder and harder to read and understand.
Some code editors help you with the proper nesting by allowing you to fold and expand the code based on the nesting.
Both Atom and Brackets editors have small arrows next to the opening part of the HTML tag if such tag contains a nested tag. Click this arrow to fold the nested code, click it again to see the code expand.
When you look again at our code, you can see that there’s a space at the beginning of each new line. Sometimes there’s more space, sometimes less.
You might have noticed that the amount of space corresponds to the level of nesting of the tag. The more deeply the tag is nested, the more space is added.
This is called indentation. The space is made with the TAB key, and indentation makes the code much easier to read.
It’s a good idea to specify the character set so the browser knows how to display specific characters. This is done with the <meta> tag which belongs inside the <head> tag.
You probably can’t go wrong with the Unicode character set, encoded as UTF-8. Open the index.html file you’ve saved on the Desktop and insert this tag inside the <head> tag: <meta charset="utf-8">
Two things to notice here:
- The <meta> tag has no closing part
- We specified the charset attribute and assigned it a value utf-8
The <body> tag is the storage for all the content of the page. Go ahead and type something special between the opening part and the closing part of the <body> tag.
Your final code should look like this now:
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>My First Web Page</title>
    </head>
    <body>
        This is my first web page! :-)
    </body>
</html>
Save the changes, locate the index.html file on Desktop and double click on it. It should be automatically opened in your default web browser and you should see the basic text on a blank page. Great job!
Remember that the browser reads the code like we humans do, from the top to the bottom, and renders it the same way.
This might not seem like anything special, but it will be very important later, when we talk about linking other content to the page and how the position of this content alters the look and behavior of the page.
Sometimes, you need a portion of your code to be ignored by the browser. You can select the code to be ignored and comment it out.
To do that, you insert <!-- at the beginning of the code to be ignored and --> at the end.
In modern code editors, like Atom, Brackets or VS Code, such code will be greyed out, giving you the hint that this code won’t be interpreted by the browser.
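As a short sketch, an HTML comment starts with <!-- and ends with -->. Everything between these two markers is skipped by the browser. The paragraph texts below are made up for the example.

```html
<body>
    <p>This paragraph is displayed.</p>
    <!-- <p>This paragraph is commented out and will not be displayed.</p> -->
    <!-- A comment can also hold a plain note for yourself or other developers -->
</body>
```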
Prior to HTML 5, there were two types of elements, block-level elements and inline elements. They differ in how the browser lays them out by default and in which other elements they can contain.
With HTML 5, though, things got more complicated, because now we have not two, but seven main content categories.
However, understanding those two old categories is still very practical, because they work very well with existing CSS rules.
By default, block-level elements are rendered by the browser so that they always start on a new line. You can change this behavior with CSS rules, but we will get to that later.
Block-level elements can contain inline elements or other block-level elements within them. They are roughly equivalent to the new HTML 5 category called Flow Content.
The most generic block-level element is the <div> element. The name div is short for division.
Inline elements are by default rendered on the same line. Again, you can change this behavior with CSS rules, and I will show you later how to do it.
Inline elements can contain only other inline elements, not block-level elements. They roughly correspond to the new HTML 5 category called Phrasing Content.
The most generic inline element is the <span> element.
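A minimal sketch of the two generic elements side by side, with made-up text content:

```html
<div>This division starts on a new line and takes the full width.</div>
<div>
    This second division also starts on a new line, while
    <span>this span</span> stays on the same line as the text around it.
</div>
```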
Let’s look at an example to see the difference between block-level elements and inline elements. We will use two basic elements, <p> and <a>. The <p> element represents a paragraph and is suitable for text which you want to format as a paragraph of an article.
The <a> element represents the anchor and it is the element for linking one document to another document or to create a link to the specific section of the same document. We will talk about these elements and many more in detail, but for now, this should suffice. It happens that the <p> element is by default defined as a block-level element while the <a> element is by default defined as an inline element.
Let’s look at this code:
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Block-Level vs Inline Elements</title>
    </head>
    <body>
        <p>*** PARAGRAPH 1: This is the first paragraph ***</p>
        <p>*** PARAGRAPH 2: This is the second paragraph ***</p>
        <a href="#">LINK 1: This is the first link</a>
        <p>
            *** PARAGRAPH 3: This is the third paragraph ***
            <a href="#">LINK 2: This is the second link</a>
        </p>
    </body>
</html>
Basic HTML elements
A lot of HTML elements have a name implying some meaning. The <title> tag is a good example, because it immediately implies that the content of this element is the title of the page. Some elements use abbreviations, like <p> for paragraph or <br> for break.
There are six headings available, from <h1> to <h6>.
The <h1> is the most important heading and the <h6> is the least important.
How the heading element looks by default depends on the specific browser and its default styling of HTML elements.
Even though there are some disputes about how much heading elements influence a website’s position in search engine rankings (SEO, search engine optimization), it’s good practice to use at least one <h1> element, which should describe the content of the page.
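Here is a brief sketch of headings in use. The heading texts are invented for illustration. The levels form an outline, with the <h1> describing the whole page.

```html
<h1>My First Web Page</h1>
<h2>About HTML</h2>
<h3>Anatomy of a tag</h3>
<h3>Document structure</h3>
<h2>About CSS</h2>
```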
Let’s look at some new tags introduced in HTML 5 which make it easier to understand the structure of the document.
The <header> tag typically includes, for example, a company logo, a slogan, and maybe even a navigation bar, which has its own tag.
The <nav> tag usually contains links to other pages located either on the same website or links to some external resources.
The <section> tag specifies a section of the document which is distinguished from other parts. It usually contains articles.
The <article> tag is for articles which are usually made of paragraphs. Even articles can have sections within them, there’s no rule against it.
The <aside> tag can contain information about some related content, for example in case of a blog post, related posts could be displayed there.
The <footer> tag usually contains copyright information, links to social media, contact information and so on.
Now, since all these semantic tags are block-level elements, it would be perfectly legit to use just the <div> tags instead, but you can imagine that semantic tags help to make the content much easier to understand.
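Put together, the semantic tags described above might form a page body like the following sketch. All texts and the file names index.html and blog.html are hypothetical.

```html
<body>
    <header>
        <h1>Company Name</h1>
        <nav>
            <a href="index.html">Home</a>
            <a href="blog.html">Blog</a>
        </nav>
    </header>
    <section>
        <article>
            <h2>First post</h2>
            <p>Text of the first post.</p>
        </article>
    </section>
    <aside>
        <p>Related posts</p>
    </aside>
    <footer>
        <p>Copyright Company Name</p>
    </footer>
</body>
```

Replacing every one of these tags with <div> would render much the same by default, but the structure would be far harder to read.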
There are two basic kinds of lists you can create in HTML: unordered lists, denoted by the <ul> tag, and ordered lists, denoted by the <ol> tag. Both have list items, denoted by the <li> tag.
An unordered list uses bullet points or another of the available graphic markers in front of list items, while an ordered list uses numbers or letters, depending on the style. The defaults are bullets for the unordered list and numbers for the ordered list.
Every item inside the <ul> or <ol> tags is wrapped in the <li> tag. Plain text is not allowed directly inside the <ul> or <ol> tags.
In order to create a nested list, you need to create yet another list inside the list item where you want your list to be nested.
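A nested list could look like the sketch below, with illustrative item texts. Note that the inner <ol> sits inside an <li> of the outer list, before that item's closing part.

```html
<ul>
    <li>Fruit
        <ol>
            <li>Apples</li>
            <li>Pears</li>
        </ol>
    </li>
    <li>Vegetables</li>
</ul>
```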
Unordered lists have a bullet as a list item marker set by default. This can be changed by CSS styling. Ordered lists however have an attribute called type which can have five different values.
In order to change the list item marker for the ordered list, you add the type attribute with the value of your choice next to the element name in the opening part of the tag:
<ol type="A">
By default, the list items are numbered with decimal numbers (type="1"), but you can also number them with uppercase letters <ol type="A">, lowercase letters <ol type="a">, uppercase Roman numerals <ol type="I"> and lowercase Roman numerals <ol type="i">.
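As a small sketch with made-up items, the following list is marked with lowercase Roman numerals instead of the default decimal numbers:

```html
<ol type="i">
    <li>Introduction</li>
    <li>Main part</li>
    <li>Conclusion</li>
</ol>
```

The items are rendered with the markers i., ii. and iii.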
Sometimes, you need to tell the browser that it should not interpret specific characters as HTML code but rather render them as a simple text. In other words, to escape them.
In order to allow the browser to make a distinction between HTML code and a plain text, we use the so-called character entities.
There are three characters that should be escaped in all situations to prevent any rendering issues.
These characters are:
- less than (<),
- greater than (>),
- ampersand (&).
To escape those characters, you will use entities. Instead of <, you will use &lt; where lt stands for less than; instead of >, you will use &gt; where gt stands for greater than; and instead of &, you will use &amp; where amp stands for ampersand.
Entities will also help you with characters which are not present on a regular keyboard, like the copyright sign ©, which is written as &copy;.
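The sketch below shows the entities &lt;, &gt;, &amp; and &copy; in use. The texts are illustrative.

```html
<!-- Displayed as: A paragraph is written as <p>text</p>. -->
<p>A paragraph is written as &lt;p&gt;text&lt;/p&gt;.</p>

<!-- Displayed as: Fish & Chips -->
<p>Fish &amp; Chips</p>

<!-- Displayed as: © Example Company -->
<p>&copy; Example Company</p>
```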
There are two types of links, internal links, and external links. Both use the <a> tag with the href attribute which stands for hypertext reference.
The value of the href attribute specifies what the browser will show you once you click the link. It can load content from a specific location, typically an image, a media file or another page, regardless of whether it is stored on your local machine or on the Internet. Or it can take you to a specific place on the very same page.
The text or the image of the link that you can click belongs between the opening and the closing parts of the <a> tag.
It’s quite interesting that in HTML 5 the <a> tag, which is an inline element by default, can also wrap block-level elements. This allows us to use the <div> tag inside the <a> tag for creating a clickable region, like for example a company logo, instead of just a clickable piece of inline text.
This behavior wasn’t available prior to HTML 5 though, so in order to achieve such functionality, you would have to use all sorts of workarounds.
Consider an example with five links. The first two links demonstrate the difference between the inline element and the block-level element. Notice how the description of the second link is nested inside the <div> tag, which allows the link to take the whole width of the page.
The destination of the third link is set to a different directory.
The destination of the fourth link is set to a specific section within the same page. This section is called the footer and it is located at the very bottom of the page.
The destination of the fifth link is set to the external website.
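The five links described above might look roughly like this. This is a sketch only: the file names about.html and images/logo.png and all link texts are assumptions made for illustration.

```html
<!-- 1: a plain inline link -->
<a href="about.html">About us</a>

<!-- 2: the description is nested inside a <div>,
     so the link takes the whole width of the page -->
<a href="about.html"><div>About us</div></a>

<!-- 3: the destination is in a different directory -->
<a href="images/logo.png">Our logo</a>

<!-- 4: the destination is a section of the same page -->
<a href="#footer">Go to the footer</a>

<!-- 5: the destination is an external website -->
<a href="http://www.zavrel.net">Visit zavrel.net</a>

<footer id="footer">This is the footer.</footer>
```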
Generally, spaces should be avoided in URLs. A space is not a valid URL character, so it always has to be encoded (as %20), which makes addresses harder to read and easier to get wrong. It’s easier and good practice to use hyphens or underscores instead of spaces in the names of files and directories.
When you link the content within the same page, section identifiers allow you to jump from one region of the page to another one. This is typically very useful if you want to go from the footer of the page to the top, especially on rather long pages.
Also, the section identifier can be part of a URL. In such a case, when you open the URL with the section identifier in a new window, the browser will not only load the page, but it will also navigate to that specific section automatically.
Section identifiers are special values of the href attribute, and they consist of two parts: the hash (#) sign followed by the value of the id attribute of the element the browser should navigate to. Older pages use the name attribute of the <a> tag for the same purpose, but that is obsolete in HTML 5.
So, if you have the <h1 id="here">, the link to that element would look like this: <a href="#here">.
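A typical use is a link back to the top of the page from the footer, sketched below. The id value top and the texts are chosen for this example.

```html
<h1 id="top">My First Web Page</h1>
<p>A long page of content goes here.</p>
<footer>
    <!-- "#top" matches the id attribute of the heading above -->
    <a href="#top">Back to top</a>
</footer>
```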
When linking to external resources, don’t forget to add the protocol (http:// or https://) at the beginning of the link. It’s not enough to write just “www.zavrel.net” because the browser will think that this is the name of a file in your directory and will display an error because it won’t be able to find this file.
Also, with external links, it’s helpful to use the target attribute with the _blank value, which will open the link in a new browser window or tab. This is a good strategy if you don’t want users to leave your website altogether, but rather show them something else while keeping your website loaded in the original window so they can easily return to it.
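The two points above can be sketched like this, using the address mentioned in the text:

```html
<!-- Wrong: without the protocol, the browser looks for a local file
     named "www.zavrel.net" -->
<a href="www.zavrel.net">Visit zavrel.net</a>

<!-- Right: full address, opened in a new window or tab -->
<a href="http://www.zavrel.net" target="_blank">Visit zavrel.net</a>
```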
To insert an image in HTML code, you need to use the <img> tag where the img stands for image.
The <img> tag has the src attribute, and the value of this attribute is the URL of the image file. It can be a local file saved in the same or a different directory, but it can also be a file available on the Internet. In such a case, you need to write the whole URL including the http:// or https:// prefix. The easiest way is to display the image in the browser, copy the whole URL from the address bar and paste it into your code.
It’s also a good idea to add the alt attribute with a description of the image. This helps visually impaired people understand what’s in the image, and the browser displays this text when the image cannot be loaded.
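A sketch of both cases follows. The file paths, the www.example.com address and the descriptions are placeholders, not real resources.

```html
<!-- A local file in a different directory -->
<img src="images/logo.png" alt="Company logo">

<!-- A file on the Internet, with the whole URL -->
<img src="https://www.example.com/images/sunset.jpg" alt="Sunset over the sea">
```

Like <meta>, the <img> tag has no closing part.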
Tables are useful when you need to display structured data. Each table must start with the <table> tag. Each table row is defined with the <tr> tag where tr stands for table row. A table cell is defined with the <td> tag where td stands for table data.
The table can also have header cells, which are defined with the <th> tag, where th stands for table header. If you want to add a caption to a table, put it inside the <caption></caption> tag.
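A small table using all of these tags might look like the following sketch, with invented data:

```html
<table>
    <caption>Opening hours</caption>
    <tr>
        <th>Day</th>
        <th>Hours</th>
    </tr>
    <tr>
        <td>Monday</td>
        <td>9:00 to 17:00</td>
    </tr>
    <tr>
        <td>Saturday</td>
        <td>10:00 to 14:00</td>
    </tr>
</table>
```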
Before HTML 5 and CSS 3 were introduced, it was a common practice among web developers to use tables for visual layouts of the pages. Sometimes, it also helped them cope with inconsistencies between different browsers.
Today, it is not recommended. Especially when working with responsive design, it’s a bad idea to use tables as a layout wireframe.
A form is a tool for collecting the input information from the user of the website. Each form must be wrapped within the <form> element.
Inside the form, there can be many <input> elements. Their role is determined by the type attribute.
Related data in the form can be grouped with the <fieldset> element. To define a caption for a specific fieldset, you use the <legend> element.
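The form elements above might be combined as in this sketch. The action and method attributes, the <label> tag and the name attributes are not covered in this article and are included here as assumptions, as is the /subscribe address.

```html
<form action="/subscribe" method="post">
    <fieldset>
        <legend>Contact details</legend>
        <label for="name">Name</label>
        <input type="text" id="name" name="name">
        <label for="email">Email</label>
        <input type="email" id="email" name="email">
    </fieldset>
    <input type="submit" value="Send">
</form>
```

Here the type attribute turns the same <input> element into a text field, an email field and a submit button.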