HTML to JSON Converter

Mar 16, 2019

HTML to JSON Converter converts an HTML document to JSON by extracting the rows from HTML tables and converting them to JSON. The HTML is parsed, and data types are detected automatically and converted to the appropriate format in the JSON output. Finally, the JSON output is formatted and indented for easy viewing.




HTML is the language of the web. It is what web pages (even this one) are built from. In the early days, HTML was static, with some JavaScript added for dynamic behavior and effects. Then, with the advent of server-side programming languages such as Perl, PHP, and ASP, HTML came to be generated dynamically on the server. Now there is a trend toward serving HTML as a static resource again, with JavaScript and JSON (from REST web services) making it dynamic.

JavaScript Object Notation (JSON), pronounced "Jason", is the most common data interchange format on the web. Douglas Crockford first released the JSON specification in the early 2000s. It is a simple format that is easier to comprehend than XML. It is also usually smaller because it has no closing tags. A wide variety of programming languages can parse JSON and serialize their data structures to it. You can paste JSON text into JavaScript source and use it without modification.

Settings Explained
  • 1. Indent

    This setting governs whether the output is indented. Indented output is easier to read, while non-indented output is compact. The smaller size suits transmission over the network, which is why JSON is often minified by removing non-essential whitespace.

    Indentation On

    {
      "name": "John Doe",
      "age": 69
    }
    Indentation Off

    {"name":"John Doe","age":69}
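
The two output styles above can be reproduced with any JSON serializer. The following is a minimal sketch using Python's standard `json` module; it illustrates the idea and is not the converter's own code. The variable names are chosen for the example.

```python
import json

person = {"name": "John Doe", "age": 69}

# Indentation on: two spaces per nesting level, one property per line.
indented = json.dumps(person, indent=2)

# Indentation off: drop the spaces json.dumps normally puts after "," and ":".
compact = json.dumps(person, separators=(",", ":"))

print(indented)
print(compact)  # {"name":"John Doe","age":69}
```

Both strings parse back to the same data, so the choice only affects readability and size.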
  • 2. Unescape Json

    If selected and the input appears to be HTML wrapped in a JSON string, the input is unescaped before processing.
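
One way such unescaping could work is sketched below. How the tool actually decides that the input "appears to be" a JSON string is not documented here, so the check (text starts and ends with a double quote and decodes to a string) and the function name `unescape_if_json_string` are assumptions made for illustration.

```python
import json


def unescape_if_json_string(text: str) -> str:
    """Return the decoded value if text is a JSON string literal, else text unchanged."""
    stripped = text.strip()
    if len(stripped) >= 2 and stripped.startswith('"') and stripped.endswith('"'):
        try:
            decoded = json.loads(stripped)
        except json.JSONDecodeError:
            return text  # looked like a JSON string but was not valid JSON
        if isinstance(decoded, str):
            return decoded
    return text


# HTML as it might appear inside a JSON payload: quotes and slashes escaped.
wrapped = '"<p class=\\"note\\">Hi<\\/p>\\n"'
print(unescape_if_json_string(wrapped))
```

Plain HTML that is not wrapped in quotes passes through untouched.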

  • 3. Mode

    The conversion mode to use. Available options: Generic, Table, and JSON-LD.

    Generic

    In this mode, all HTML nodes are converted into JSON objects and properties. The settings that apply are Attribute Prefix and Text Property Name.

    Input
    <html>
    <body>
    <table style="width: 100%">
        <tr>
            <th>Firstname</th>
            <th>Lastname</th>
            <th>Age</th>
        </tr>
        <tr>
            <td>Jill</td>
            <td>Smith</td>
            <td>50</td>
        </tr>
        <tr>
            <td>Eve</td>
            <td>Jackson</td>
            <td>94</td>
        </tr>
    </table>
    </body>
    </html>
    Output
    {
      "html": {
        "body": {
          "table": {
            "@style": "width: 100%",
            "tr": [
              {
                "th": [
                  "Firstname",
                  "Lastname",
                  "Age"
                ]
              },
              {
                "td": [
                  "Jill",
                  "Smith",
                  "50"
                ]
              },
              {
                "td": [
                  "Eve",
                  "Jackson",
                  "94"
                ]
              }
            ]
          }
        }
      }
    }
    Table

    In this mode HTML <TABLE> nodes are converted into JSON objects & properties. Each <TR> is converted into a JSON object. The cells from the header row become JSON property names while the cells from other rows become the values of the JSON properties.

    Input
    <html>
    <body>
    <table style="width: 100%">
        <tr>
            <th>Firstname</th>
            <th>Lastname</th>
            <th>Age</th>
        </tr>
        <tr>
            <td>Jill</td>
            <td>Smith</td>
            <td>50</td>
        </tr>
        <tr>
            <td>Eve</td>
            <td>Jackson</td>
            <td>94</td>
        </tr>
    </table>
    </body>
    </html>
    Output
    [
      {
        "Firstname": "Jill",
        "Lastname": "Smith",
        "Age": 50
      },
      {
        "Firstname": "Eve",
        "Lastname": "Jackson",
        "Age": 94
      }
    ]
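
The Table conversion described above can be approximated with Python's standard `html.parser` module. This is a sketch under stated assumptions, not the tool's implementation: the class `TableRows`, the functions `detect_type` and `table_to_records`, and the exact type-detection rules (integers, decimals, and true/false) are hypothetical choices for the example.

```python
import re
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr>, noting which rows contain <th> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of (is_header, [cell text, ...])
        self._cells = None      # cells of the row being read, or None
        self._text = None       # text chunks of the cell being read, or None
        self._is_header = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._is_header = [], False
        elif tag in ("th", "td") and self._cells is not None:
            self._text = []
            self._is_header = self._is_header or tag == "th"

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            self.rows.append((self._is_header, self._cells))
            self._cells, self._text = None, None


def detect_type(value):
    """Assumed rules: integers, decimals and true/false are converted, the rest stay strings."""
    if re.fullmatch(r"-?[0-9]+", value):
        return int(value)
    if re.fullmatch(r"-?[0-9]+\.[0-9]+", value):
        return float(value)
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    return value


def table_to_records(markup):
    """Use the first header row as property names and every later row as one object."""
    parser = TableRows()
    parser.feed(markup)
    parser.close()
    header, records = None, []
    for is_header, cells in parser.rows:
        if is_header and header is None:
            header = cells
        elif header is not None:
            records.append({k: detect_type(v) for k, v in zip(header, cells)})
    return records


sample = """<table style="width: 100%">
<tr><th>Firstname</th><th>Lastname</th><th>Age</th></tr>
<tr><td>Jill</td><td>Smith</td><td>50</td></tr>
<tr><td>Eve</td><td>Jackson</td><td>94</td></tr>
</table>"""
records = table_to_records(sample)
print(records)
```

Note that the age values come out as numbers rather than strings, which matches the difference between the Table output and the Generic output shown earlier.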
    JSON-LD

    In this mode, all JSON-LD is extracted from the HTML and output as JSON. Each JSON-LD item becomes an array item in the final output.

    Input
    <html>
    <body>
        <div class="row">
            <script type="application/ld+json">
                {
                "@context": "http://schema.org/",
                "@type": "Person",
                "name": "Jane Doe",
                "jobTitle": "Professor",
                "telephone": "(425) 123-4567",
                "url": "http://www.janedoe.com"
                }
            </script>
        </div>
        <div class="row">
            <script type="application/ld+json">
                {
                "@context": "http://schema.org/",
                "@type": "Person",
                "name": "John Doe",
                "jobTitle": "Dancer",
                "telephone": "(425) 123-4568",
                "url": "http://www.johndoe.com"
                }
            </script>
        </div>
    </body>
    </html>
    Output
    [
      {
        "@context": "http://schema.org/",
        "@type": "Person",
        "name": "Jane Doe",
        "jobTitle": "Professor",
        "telephone": "(425) 123-4567",
        "url": "http://www.janedoe.com"
      },
      {
        "@context": "http://schema.org/",
        "@type": "Person",
        "name": "John Doe",
        "jobTitle": "Dancer",
        "telephone": "(425) 123-4568",
        "url": "http://www.johndoe.com"
      }
    ]
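
Extracting JSON-LD amounts to finding every `<script type="application/ld+json">` element and parsing its body. A minimal sketch with Python's standard library follows; the class name `JsonLdExtractor` and the function `extract_json_ld` are hypothetical, and the sketch simply raises an error on malformed JSON, which may differ from what the tool does.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Parse the body of each <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.items = []       # one parsed object per JSON-LD script
        self._buffer = None   # text chunks of the script being read, or None

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            # json.loads raises json.JSONDecodeError if the script body is malformed.
            self.items.append(json.loads("".join(self._buffer)))
            self._buffer = None


def extract_json_ld(markup):
    extractor = JsonLdExtractor()
    extractor.feed(markup)
    extractor.close()
    return extractor.items


page = """<html><body>
<div class="row"><script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Person", "name": "Jane Doe"}
</script></div>
<div class="row"><script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Person", "name": "John Doe"}
</script></div>
</body></html>"""
people = extract_json_ld(page)
print(json.dumps(people, indent=2))
```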
  • 4. Attribute Prefix

    The prefix to use for properties corresponding to HTML attributes. Leave blank to use no prefix.

    Input

    <html>
    <body>
    <table style="width: 100%">
        <tr>
            <th>Firstname</th>
            <th>Lastname</th>
            <th>Age</th>
        </tr>
        <tr>
            <td>Jill</td>
            <td>Smith</td>
            <td>50</td>
        </tr>
        <tr>
            <td>Eve</td>
            <td>Jackson</td>
            <td>94</td>
        </tr>
    </table>
    </body>
    </html>
    Attribute Prefix: @

    {
      "html": {
        "body": {
          "table": {
            "@style": "width: 100%",
            "tr": [
              {
                "th": [
                  "Firstname",
                  "Lastname",
                  "Age"
                ]
              },
              {
                "td": [
                  "Jill",
                  "Smith",
                  "50"
                ]
              },
              {
                "td": [
                  "Eve",
                  "Jackson",
                  "94"
                ]
              }
            ]
          }
        }
      }
    }
    Attribute Prefix: Empty

    {
      "html": {
        "body": {
          "table": {
            "style": "width: 100%",
            "tr": [
              {
                "th": [
                  "Firstname",
                  "Lastname",
                  "Age"
                ]
              },
              {
                "td": [
                  "Jill",
                  "Smith",
                  "50"
                ]
              },
              {
                "td": [
                  "Eve",
                  "Jackson",
                  "94"
                ]
              }
            ]
          }
        }
      }
    }
  • 5. Text Property Name

    The name of the property that holds the value of HTML text nodes.

    Input

    <html>
    <body>
    <div>
        <p>
            Pre Header
            <h1>Title</h1>
            Post Header
        </p>
    </div>
    </body>
    </html>
    Text Property Name: #text

    {
      "html": {
        "body": {
          "div": {
            "p": {
              "h1": "Title",
              "#text": [
                "Pre Header",
                "Post Header"
              ]
            }
          }
        }
      }
    }
    Text Property Name: text

    {
      "html": {
        "body": {
          "div": {
            "p": {
              "h1": "Title",
              "text": [
                "Pre Header",
                "Post Header"
              ]
            }
          }
        }
      }
    }
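
The Generic mode and its two settings, Attribute Prefix and Text Property Name, can be sketched as a small tree walk. The code below uses Python's standard `html.parser`; the names `TreeBuilder`, `to_json`, and `html_to_json`, the short list of void tags, and the rule that an element holding only text collapses to a string are assumptions inferred from the sample outputs above, not the tool's actual code.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag


class TreeBuilder(HTMLParser):
    """Build a tree of (tag, attrs, children); children are nodes or text strings."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", {}, [])
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, dict(attrs), [])
        self._stack[-1][2].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self._stack[-1][2].append(data.strip())


def add(target, key, value):
    """Store value under key, turning repeated keys into a list."""
    if key not in target:
        target[key] = value
    elif isinstance(target[key], list):
        target[key].append(value)
    else:
        target[key] = [target[key], value]


def to_json(node, attr_prefix="@", text_name="#text"):
    tag, attrs, children = node
    # An element with no attributes and a single text child collapses to a string.
    if not attrs and len(children) == 1 and isinstance(children[0], str):
        return children[0]
    result = {attr_prefix + name: value for name, value in attrs.items()}
    for child in children:
        if isinstance(child, str):
            add(result, text_name, child)
        else:
            add(result, child[0], to_json(child, attr_prefix, text_name))
    return result


def html_to_json(markup, attr_prefix="@", text_name="#text"):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return to_json(builder.root, attr_prefix, text_name)


table = ('<table style="width: 100%"><tr><th>Firstname</th><th>Age</th></tr>'
         '<tr><td>Jill</td><td>50</td></tr></table>')
mixed = "<div><p>Pre Header<h1>Title</h1>Post Header</p></div>"
print(html_to_json(table, attr_prefix=""))
print(html_to_json(mixed, text_name="text"))
```

Unlike the Table mode, this walk does no type detection, so cell values such as "50" stay strings. Property order in the printed result may differ from the samples above, since this sketch adds keys in document order.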
Comments 3

Alphonse

  • 2 years ago

When you convert this HTML, you lose information in the JSON and cannot get the links in the right place...

And is the source code available for download?

<html>
<body>
<h2>Size Attributes</h2>
<p class="toto">Images in HTML have a set of size attributes,<a href="xxeee">sqfsqdddd</a> which specifies the width and height of the image:<a href="xx">sqfsqd</a> sqfdss</p>
<img src="img_girl.jpg" width="500" height="600">
</body>
</html>

Sai Chand

  • one year ago

Hi,

Can I get the HTML to JSON Converter code to download?

Sahan Ravindu

  • 2 months ago

Can I get the HTML to JSON Converter as an API service or the code?

History
Mar 16, 2019
Handle HTML encoded values

May 26, 2018
Automatic input detection
May 26, 2018
Support for JSON-LD
May 15, 2018
Generic HTML Conversion Support
Sep 27, 2017
Tool Launched