Convert HTML Document to JSON


HTML to JSON Converter converts an HTML document to JSON, for example by extracting the rows from HTML tables and converting them to JSON objects. The JSON output is formatted and indented for easy viewing. Data types are detected automatically and converted to the appropriate type in the JSON output.





HTML is the markup language of the web; it is what web pages, including this one, are written in. In the early days, HTML was static, with some JavaScript added for dynamic behavior and effects. Later, HTML was generated dynamically on the server with server-side programming languages such as Perl, PHP, and ASP. The current trend is to serve HTML as a static resource again, with JavaScript making it dynamic using JSON from REST web services.

JSON (pronounced "Jason") is the de facto standard for data interchange on the web. It is a simple format that is easier to read than XML, and it is usually smaller than the equivalent XML because it has no closing tags. Working with JSON from JavaScript is seamless.

Indent
This setting controls whether the output is indented. Indented output is easier for humans to read. Non-indented output is more compact and therefore better for transmission, which is why JSON is often minified by removing non-essential whitespace.
Indentation On
{
  "name": "John Doe",
  "age": 69
}
Indentation Off
{"name":"John Doe","age":69}
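The two output styles can be reproduced outside the tool. The following is a minimal sketch using Python's standard `json` module; it illustrates the difference between indented and minified output and is not the converter's own code.

```python
import json

record = {"name": "John Doe", "age": 69}

# Indentation on: two-space indent, one property per line.
indented = json.dumps(record, indent=2)

# Indentation off: compact separators drop the whitespace after ',' and ':'.
compact = json.dumps(record, separators=(",", ":"))

print(indented)
print(compact)
```

Both strings parse back to the same object; only the whitespace differs.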
Unescape JSON
If selected and the input appears to be HTML wrapped in a JSON string, the input is unescaped before processing.
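As an illustration of what unescaping means here, the sketch below decodes input that is a JSON string literal and leaves everything else untouched. The detection rule (input starts and ends with a double quote) and the function name `unescape_if_json_string` are assumptions made for this example; the tool's actual heuristic is not documented.

```python
import json

def unescape_if_json_string(raw: str) -> str:
    """Return the decoded HTML if raw is a JSON string literal, else raw unchanged."""
    text = raw.strip()
    # Assumed heuristic: a JSON string literal is wrapped in double quotes.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        try:
            decoded = json.loads(text)
        except json.JSONDecodeError:
            return raw  # looked like a string but was not valid JSON
        if isinstance(decoded, str):
            return decoded
    return raw

# HTML as it might appear inside a JSON payload, with escaped quotes.
wrapped = '"<table style=\\"width: 100%\\"><tr><td>Jill</td></tr></table>"'
print(unescape_if_json_string(wrapped))
```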
Mode
The conversion mode to use. Available options: Generic, Table, and JSON-LD.
Generic
In this mode, all HTML nodes are converted into JSON objects and properties. The available options are Attribute Prefix and Text Property Name.
Input
<html>
<body>
<table style="width: 100%">
    <tr>
        <th>Firstname</th>
        <th>Lastname</th>
        <th>Age</th>
    </tr>
    <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>50</td>
    </tr>
    <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>94</td>
    </tr>
</table>
</body>
</html>
Output
{
  "html": {
    "body": {
      "table": {
        "@style": "width: 100%",
        "tr": [
          {
            "th": [
              "Firstname",
              "Lastname",
              "Age"
            ]
          },
          {
            "td": [
              "Jill",
              "Smith",
              "50"
            ]
          },
          {
            "td": [
              "Eve",
              "Jackson",
              "94"
            ]
          }
        ]
      }
    }
  }
}
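One way to picture the Generic mapping is a small recursive-style converter. The sketch below uses Python's standard `html.parser` and is only an approximation of the behavior shown above, not the tool's implementation. The names `GenericConverter` and `html_to_generic_json` are hypothetical, and the sketch assumes every element has an explicit closing tag (void elements such as an unclosed `<br>` are not handled).

```python
import json
from html.parser import HTMLParser

class GenericConverter(HTMLParser):
    """Builds nested dicts from HTML: attributes, child elements, and text."""

    def __init__(self, attr_prefix="@", text_key="#text"):
        super().__init__()
        self.attr_prefix = attr_prefix
        self.text_key = text_key
        self.root = {}
        self.stack = [("", self.root)]  # (tag name, node being filled)

    @staticmethod
    def _add(node, key, value):
        # A repeated key turns the property into an array of values.
        if key not in node:
            node[key] = value
        elif isinstance(node[key], list):
            node[key].append(value)
        else:
            node[key] = [node[key], value]

    def handle_starttag(self, tag, attrs):
        node = {self.attr_prefix + name: value for name, value in attrs}
        self.stack.append((tag, node))

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace between tags
            self._add(self.stack[-1][1], self.text_key, text)

    def handle_endtag(self, tag):
        if len(self.stack) < 2 or self.stack[-1][0] != tag:
            return  # ignore stray closing tags
        tag, node = self.stack.pop()
        # An element holding a single text node collapses to a plain string.
        if list(node) == [self.text_key] and isinstance(node[self.text_key], str):
            node = node[self.text_key]
        self._add(self.stack[-1][1], tag, node)

def html_to_generic_json(html, attr_prefix="@", text_key="#text"):
    parser = GenericConverter(attr_prefix, text_key)
    parser.feed(html)
    parser.close()
    return parser.root

sample = (
    '<html><body><table style="width: 100%">'
    "<tr><th>Firstname</th><th>Age</th></tr>"
    "<tr><td>Jill</td><td>50</td></tr>"
    "</table></body></html>"
)
print(json.dumps(html_to_generic_json(sample), indent=2))
```

Passing `attr_prefix=""` or a different `text_key` changes only the property names, which mirrors the Attribute Prefix and Text Property Name settings. Note that cell values stay strings in this mode, as in the output above.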
Table
In this mode, HTML <table> nodes are converted into JSON objects and properties. Each <tr> after the header row is converted into a JSON object. The cells of the header row become the JSON property names, while the cells of the other rows become the property values. Cell values are type-detected, so 50 is output as a number rather than a string.
Input
<html>
<body>
<table style="width: 100%">
    <tr>
        <th>Firstname</th>
        <th>Lastname</th>
        <th>Age</th>
    </tr>
    <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>50</td>
    </tr>
    <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>94</td>
    </tr>
</table>
</body>
</html>
Output
[
  {
    "Firstname": "Jill",
    "Lastname": "Smith",
    "Age": 50
  },
  {
    "Firstname": "Eve",
    "Lastname": "Jackson",
    "Age": 94
  }
]
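The Table mapping can be sketched in a few lines of Python with the standard `html.parser`. This is an illustrative approximation under stated assumptions, not the tool's code: it expects a single table whose first row is the header, ignores `colspan`/`rowspan` and nested tables, and detects only integers and simple decimals. The names `TableRows`, `detect_type`, and `table_to_records` are hypothetical.

```python
import json
import re
from html.parser import HTMLParser

INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+\.\d+")

def detect_type(text):
    """Convert a cell's text to int or float when it looks numeric; else keep it."""
    if INT_RE.fullmatch(text):
        return int(text)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    return text

class TableRows(HTMLParser):
    """Collects the text of every <th>/<td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the current <th>/<td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def table_to_records(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows  # assumption: the first row holds the names
    return [dict(zip(header, map(detect_type, row))) for row in body]

sample = (
    "<table><tr><th>Firstname</th><th>Age</th></tr>"
    "<tr><td>Jill</td><td>50</td></tr>"
    "<tr><td>Eve</td><td>94</td></tr></table>"
)
print(json.dumps(table_to_records(sample), indent=2))
```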
JSON-LD
In this mode, all JSON-LD is extracted from the HTML and output as JSON. Each JSON-LD item becomes an array item in the final output.
Input
<html>
<body>
    <div class="row">
        <script type="application/ld+json">
            {
            "@context": "http://schema.org/",
            "@type": "Person",
            "name": "Jane Doe",
            "jobTitle": "Professor",
            "telephone": "(425) 123-4567",
            "url": "http://www.janedoe.com"
            }
        </script>
    </div>
    <div class="row">
        <script type="application/ld+json">
            {
            "@context": "http://schema.org/",
            "@type": "Person",
            "name": "John Doe",
            "jobTitle": "Dancer",
            "telephone": "(425) 123-4568",
            "url": "http://www.johndoe.com"
            }
        </script>
    </div>
</body>
</html>
Output
[
  {
    "@context": "http://schema.org/",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "url": "http://www.janedoe.com"
  },
  {
    "@context": "http://schema.org/",
    "@type": "Person",
    "name": "John Doe",
    "jobTitle": "Dancer",
    "telephone": "(425) 123-4568",
    "url": "http://www.johndoe.com"
  }
]
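Extraction of this kind amounts to collecting the contents of every `<script type="application/ld+json">` element and parsing each as JSON. A minimal Python sketch follows, using the standard `html.parser`; the names `JsonLdExtractor` and `extract_json_ld` are hypothetical, and skipping blocks that fail to parse is an assumption rather than documented behavior of the tool.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects and parses the body of each JSON-LD <script> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # text of the JSON-LD script being read, if any

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # assumption: silently skip blocks that are not valid JSON
            self._buffer = None

def extract_json_ld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items

page = (
    "<html><body>"
    '<script type="application/ld+json">{"@type": "Person", "name": "Jane Doe"}</script>'
    "<script>var x = 1;</script>"
    '<script type="application/ld+json">{"@type": "Person", "name": "John Doe"}</script>'
    "</body></html>"
)
print(json.dumps(extract_json_ld(page), indent=2))
```

Ordinary scripts are ignored because their `type` does not match, so only the JSON-LD items end up in the output array.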
Attribute Prefix
The prefix to use for properties that correspond to HTML attributes. Leave it blank to use no prefix.
Input
<html>
<body>
<table style="width: 100%">
    <tr>
        <th>Firstname</th>
        <th>Lastname</th>
        <th>Age</th>
    </tr>
    <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>50</td>
    </tr>
    <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>94</td>
    </tr>
</table>
</body>
</html>
Attribute Prefix: @
{
  "html": {
    "body": {
      "table": {
        "@style": "width: 100%",
        "tr": [
          {
            "th": [
              "Firstname",
              "Lastname",
              "Age"
            ]
          },
          {
            "td": [
              "Jill",
              "Smith",
              "50"
            ]
          },
          {
            "td": [
              "Eve",
              "Jackson",
              "94"
            ]
          }
        ]
      }
    }
  }
}
Attribute Prefix: Empty
{
  "html": {
    "body": {
      "table": {
        "style": "width: 100%",
        "tr": [
          {
            "th": [
              "Firstname",
              "Lastname",
              "Age"
            ]
          },
          {
            "td": [
              "Jill",
              "Smith",
              "50"
            ]
          },
          {
            "td": [
              "Eve",
              "Jackson",
              "94"
            ]
          }
        ]
      }
    }
  }
}
Text Property Name
The name of the property that holds the value of HTML text nodes. When an element contains several text nodes, their values are collected into an array.
Input
<html>
<body>
<div>
    <p>
        Pre Header
        <h1>Title</h1>
        Post Header
    </p>
</div>
</body>
</html>
Text Property Name: #text
{
  "html": {
    "body": {
      "div": {
        "p": {
          "h1": "Title",
          "#text": [
            "Pre Header",
            "Post Header"
          ]
        }
      }
    }
  }
}
Text Property Name: text
{
  "html": {
    "body": {
      "div": {
        "p": {
          "h1": "Title",
          "text": [
            "Pre Header",
            "Post Header"
          ]
        }
      }
    }
  }
}