Convert HTML Document to JSON

Mar 16, 2019

HTML to JSON Converter converts an HTML document to JSON by extracting the rows from HTML tables and converting them to JSON format. The HTML is parsed, and data types are automatically detected and converted to the appropriate format in the JSON output. Finally, the JSON output is formatted and indented for easy viewing.



Background Information

HTML is the markup language of the web. It is what web pages (even this one) are written in. In the early days, HTML pages were static, with some JavaScript added for dynamic behavior and effects. Then, with the advent of server-side programming languages such as Perl, PHP, and ASP, HTML was generated dynamically on the server. Now there is a trend where HTML is again served as a static resource, with JSON (from REST web services) and JavaScript making it dynamic.

JavaScript Object Notation (JSON), pronounced "Jason", is the de facto standard for data interchange on the web these days. It is a simple format that is easier to read than XML. It is also usually smaller than XML because it has no closing tags. Working with JSON from JavaScript is seamless, since JSON syntax is based on JavaScript object literals. The JSON format was first specified by Douglas Crockford in the early 2000s.

Settings Explained
  • 1. Indent

    This setting governs whether or not the output is indented. Indented output is easier for humans to read. Non-indented output, on the other hand, is more compact and smaller in size (best for transmission). For this reason JSON is often minified, which shrinks the output by removing non-essential whitespace.

    Indentation On

    {
      "name": "John Doe",
      "age": 69
    }
    Indentation Off

    {"name":"John Doe","age":69}
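
The two output styles above can be reproduced with Python's standard `json` module. This is only a sketch of the idea, not the converter's own code; the variable names are illustrative.

```python
import json

record = {"name": "John Doe", "age": 69}

# Indentation on: two-space indent, one property per line.
indented = json.dumps(record, indent=2)

# Indentation off: drop the spaces json.dumps normally puts after "," and ":".
compact = json.dumps(record, separators=(",", ":"))

print(indented)
print(compact)
```

Both strings parse back to the same object; only the whitespace differs.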
  • 2. Unescape Json

    If selected, and the input appears to be HTML wrapped in a JSON string, the input is unescaped before processing.
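
One way such an unescape step could work is sketched below in Python. The function name `unescape_if_json_string` and the detection rule (input starts and ends with a double quote and decodes to a string) are assumptions made for illustration; the tool's actual heuristic is not documented here.

```python
import json


def unescape_if_json_string(raw: str) -> str:
    """Return the decoded text if raw is a JSON string literal, else raw unchanged."""
    candidate = raw.strip()
    if candidate.startswith('"') and candidate.endswith('"'):
        try:
            decoded = json.loads(candidate)
        except json.JSONDecodeError:
            return raw  # looked like a JSON string but was not valid
        if isinstance(decoded, str):
            return decoded
    return raw


# HTML wrapped in a JSON string: quotes and slashes are backslash-escaped.
wrapped = '"<p class=\\"note\\">Hello<\\/p>"'
print(unescape_if_json_string(wrapped))
```

Plain HTML passes through untouched, so the step is safe to apply to any input.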

  • 3. Mode

    The conversion mode to use. Available options: Generic, Table, and JSON-LD.

    Generic

    In this mode, all HTML nodes are converted into JSON objects and properties. The options available in this mode are Attribute Prefix and Text Property Name.

    Input
    <html>
    <body>
    <table style="width: 100%">
        <tr>
            <th>Firstname</th>
            <th>Lastname</th>
            <th>Age</th>
        </tr>
        <tr>
            <td>Jill</td>
            <td>Smith</td>
            <td>50</td>
        </tr>
        <tr>
            <td>Eve</td>
            <td>Jackson</td>
            <td>94</td>
        </tr>
    </table>
    </body>
    </html>
    Output
    {
      "html": {
        "body": {
          "table": {
            "@style": "width: 100%",
            "tr": [
              {
                "th": [
                  "Firstname",
                  "Lastname",
                  "Age"
                ]
              },
              {
                "td": [
                  "Jill",
                  "Smith",
                  "50"
                ]
              },
              {
                "td": [
                  "Eve",
                  "Jackson",
                  "94"
                ]
              }
            ]
          }
        }
      }
    }
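
A mapping like the one above can be approximated with Python's standard `html.parser`. The sketch below is not the tool's implementation: the names `GenericConverter`, `add_child`, and `html_to_generic` are hypothetical, well-formed nesting is assumed, and empty elements are emitted as empty strings by choice. The `attr_prefix` and `text_key` parameters stand in for the Attribute Prefix and Text Property Name settings.

```python
import json
from html.parser import HTMLParser

# Elements that never have an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def add_child(node, key, value):
    """Store value under key; a repeated key turns the entry into a list."""
    if key not in node:
        node[key] = value
    elif isinstance(node[key], list):
        node[key].append(value)
    else:
        node[key] = [node[key], value]


class GenericConverter(HTMLParser):
    def __init__(self, attr_prefix="@", text_key="#text"):
        super().__init__()
        self.attr_prefix = attr_prefix
        self.text_key = text_key
        self.root = {}
        self.stack = [("", self.root)]  # (tag name, dict under construction)

    def handle_starttag(self, tag, attrs):
        node = {self.attr_prefix + name: (value or "") for name, value in attrs}
        if tag in VOID_TAGS:
            add_child(self.stack[-1][1], tag, node or "")
        else:
            self.stack.append((tag, node))

    def handle_endtag(self, tag):
        # Assumes well-formed nesting; unmatched end tags are ignored.
        if len(self.stack) < 2 or self.stack[-1][0] != tag:
            return
        _, node = self.stack.pop()
        text = node.get(self.text_key)
        if not node:
            value = ""    # empty element (a choice made for this sketch)
        elif len(node) == 1 and isinstance(text, str):
            value = text  # text-only element collapses to a plain string
        else:
            value = node
        add_child(self.stack[-1][1], tag, value)

    def handle_data(self, data):
        text = data.strip()
        if text:
            add_child(self.stack[-1][1], self.text_key, text)


def html_to_generic(html, attr_prefix="@", text_key="#text"):
    parser = GenericConverter(attr_prefix, text_key)
    parser.feed(html)
    parser.close()
    return parser.root


sample = ('<table style="width: 100%"><tr><th>Firstname</th><th>Age</th></tr>'
          '<tr><td>Jill</td><td>50</td></tr></table>')
print(json.dumps(html_to_generic(sample), indent=2))
```

Children are attached to their parent only when the end tag arrives, which is what lets a text-only element collapse to a string and repeated siblings merge into an array.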
    Table

    In this mode, HTML <table> nodes are converted into JSON objects and properties. Each <tr> after the header row is converted into a JSON object. The cells of the header row become the JSON property names, while the cells of the other rows become the property values.

    Input
    <html>
    <body>
    <table style="width: 100%">
        <tr>
            <th>Firstname</th>
            <th>Lastname</th>
            <th>Age</th>
        </tr>
        <tr>
            <td>Jill</td>
            <td>Smith</td>
            <td>50</td>
        </tr>
        <tr>
            <td>Eve</td>
            <td>Jackson</td>
            <td>94</td>
        </tr>
    </table>
    </body>
    </html>
    Output
    [
      {
        "Firstname": "Jill",
        "Lastname": "Smith",
        "Age": 50
      },
      {
        "Firstname": "Eve",
        "Lastname": "Jackson",
        "Age": 94
      }
    ]
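
The header-to-property mapping, together with the automatic number detection visible in the "Age" values, could be sketched in Python as follows. The names `TableRows`, `detect_type`, and `table_to_records` are hypothetical, and the sketch assumes a single table whose first row is the header. Detecting only integers and decimals is also an assumption; the tool may recognize more types.

```python
import json
import re
from html.parser import HTMLParser

INT_RE = re.compile(r"-?(0|[1-9]\d*)")
FLOAT_RE = re.compile(r"-?(0|[1-9]\d*)\.\d+")


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


def detect_type(text):
    """Return an int or float for numeric-looking cells, else the stripped string."""
    value = text.strip()
    if INT_RE.fullmatch(value):
        return int(value)
    if FLOAT_RE.fullmatch(value):
        return float(value)
    return value  # e.g. "Smith", or "007" (leading zeros are kept as text)


def table_to_records(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header = [cell.strip() for cell in parser.rows[0]]
    return [dict(zip(header, map(detect_type, row))) for row in parser.rows[1:]]


sample_html = """
<table style="width: 100%">
    <tr><th>Firstname</th><th>Lastname</th><th>Age</th></tr>
    <tr><td>Jill</td><td>Smith</td><td>50</td></tr>
    <tr><td>Eve</td><td>Jackson</td><td>94</td></tr>
</table>
"""
print(json.dumps(table_to_records(sample_html), indent=2))
```

The number patterns follow the JSON number grammar, so values such as "007" stay strings instead of silently losing their leading zeros.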
    JSON-LD

    In this mode, all JSON-LD is extracted from the HTML and output as JSON. Each JSON-LD item becomes an array item in the final output.

    Input
    <html>
    <body>
        <div class="row">
            <script type="application/ld+json">
                {
                "@context": "http://schema.org/",
                "@type": "Person",
                "name": "Jane Doe",
                "jobTitle": "Professor",
                "telephone": "(425) 123-4567",
                "url": "http://www.janedoe.com"
                }
            </script>
        </div>
        <div class="row">
            <script type="application/ld+json">
                {
                "@context": "http://schema.org/",
                "@type": "Person",
                "name": "John Doe",
                "jobTitle": "Dancer",
                "telephone": "(425) 123-4568",
                "url": "http://www.johndoe.com"
                }
            </script>
        </div>
    </body>
    </html>
    Output
    [
      {
        "@context": "http://schema.org/",
        "@type": "Person",
        "name": "Jane Doe",
        "jobTitle": "Professor",
        "telephone": "(425) 123-4567",
        "url": "http://www.janedoe.com"
      },
      {
        "@context": "http://schema.org/",
        "@type": "Person",
        "name": "John Doe",
        "jobTitle": "Dancer",
        "telephone": "(425) 123-4568",
        "url": "http://www.johndoe.com"
      }
    ]
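
The extraction step can be sketched in Python by collecting the body of every `<script type="application/ld+json">` element and parsing it as JSON. The names `JsonLdExtractor` and `extract_json_ld` are hypothetical, and silently skipping malformed blocks is a choice made for this sketch rather than documented behavior of the tool.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            text = "".join(self._buffer)
            self._buffer = None
            try:
                self.items.append(json.loads(text))
            except json.JSONDecodeError:
                pass  # skip malformed blocks (assumption of this sketch)


def extract_json_ld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


page = """
<div class="row">
    <script type="application/ld+json">
        {"@type": "Person", "name": "Jane Doe"}
    </script>
</div>
<div class="row">
    <script type="application/ld+json">
        {"@type": "Person", "name": "John Doe"}
    </script>
    <script>var ignored = true;</script>
</div>
"""
print(json.dumps(extract_json_ld(page), indent=2))
```

Ordinary scripts are ignored because only the `application/ld+json` type opens the buffer.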
  • 4. Attribute Prefix

    The prefix to use for properties that correspond to HTML attributes. Leave it blank to use no prefix.

    Input

    <html>
    <body>
    <table style="width: 100%">
        <tr>
            <th>Firstname</th>
            <th>Lastname</th>
            <th>Age</th>
        </tr>
        <tr>
            <td>Jill</td>
            <td>Smith</td>
            <td>50</td>
        </tr>
        <tr>
            <td>Eve</td>
            <td>Jackson</td>
            <td>94</td>
        </tr>
    </table>
    </body>
    </html>
    Attribute Prefix: @

    {
      "html": {
        "body": {
          "table": {
            "@style": "width: 100%",
            "tr": [
              {
                "th": [
                  "Firstname",
                  "Lastname",
                  "Age"
                ]
              },
              {
                "td": [
                  "Jill",
                  "Smith",
                  "50"
                ]
              },
              {
                "td": [
                  "Eve",
                  "Jackson",
                  "94"
                ]
              }
            ]
          }
        }
      }
    }
    Attribute Prefix: Empty

    {
      "html": {
        "body": {
          "table": {
            "style": "width: 100%",
            "tr": [
              {
                "th": [
                  "Firstname",
                  "Lastname",
                  "Age"
                ]
              },
              {
                "td": [
                  "Jill",
                  "Smith",
                  "50"
                ]
              },
              {
                "td": [
                  "Eve",
                  "Jackson",
                  "94"
                ]
              }
            ]
          }
        }
      }
    }
  • 5. Text Property Name

    The name of the property that holds the value of HTML text nodes. When an element contains several text nodes, their values are collected into an array.

    Input

    <html>
    <body>
    <div>
        <p>
            Pre Header
            <h1>Title</h1>
            Post Header
        </p>
    </div>
    </body>
    </html>
    Text Property Name: #text

    {
      "html": {
        "body": {
          "div": {
            "p": {
              "h1": "Title",
              "#text": [
                "Pre Header",
                "Post Header"
              ]
            }
          }
        }
      }
    }
    Text Property Name: text

    {
      "html": {
        "body": {
          "div": {
            "p": {
              "h1": "Title",
              "text": [
                "Pre Header",
                "Post Header"
              ]
            }
          }
        }
      }
    }
 
Comments

Alphonse

  • one month ago

When you convert this HTML,
you lose information in the JSON, and you are unable to get the links in the right place ...

Also, is the source code available for download?

<html>
<body>
<h2>Size Attributes</h2>
<p class="toto">Images in HTML have a set of size attributes,<a href="xxeee">sqfsqdddd</a> which specifies the width and height of the image:<a href="xx">sqfsqd</a> sqfdss</p>
<img src="img_girl.jpg" width="500" height="600">
</body>
</html>

History
Mar 16, 2019
Handle HTML-encoded values
May 26, 2018
Automatic input detection
May 26, 2018
Support for JSON-LD
May 15, 2018
Generic HTML Conversion Support
Sep 27, 2017
Tool Launched