HTML is the markup language of the web. It is what web pages (including this one) are written in. In the early days, HTML pages were static, with some JavaScript added for dynamic behavior and effects. Then, with the advent of server-side programming languages such as Perl, PHP, and ASP, HTML was generated dynamically on the server. Now there is a trend toward serving HTML as a static resource again, with JavaScript and JSON (from REST web services) making the page dynamic.
JavaScript Object Notation (JSON), pronounced like "Jason", is one of the most common data interchange formats on the web. Douglas Crockford first released the JSON specification in the early 2000s. It is a simple format that is easier to read than XML. It is also typically smaller because it has no closing tags. A wide variety of programming languages can parse JSON and serialize their data structures to it. You can also paste JSON text into JavaScript code and use it as a literal without modification.
Settings Explained
1. Indent
This setting governs whether the output is indented. Indented output is easier to read. Non-indented output is more compact, and the smaller size is better for transmission over a network, which is why JSON is often minified by removing non-essential whitespace.
Indentation On
{
  "name": "John Doe",
  "age": 69
}
Indentation Off
{"name":"John Doe","age":69}
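The same trade-off can be reproduced with Python's standard `json` module. This is only a sketch of the idea, not the converter's own code; the two-space indent is an assumption.

```python
import json

person = {"name": "John Doe", "age": 69}

# Indented output: easier to read, but larger.
pretty = json.dumps(person, indent=2)

# Compact output: drop the space json.dumps normally puts after "," and ":".
compact = json.dumps(person, separators=(",", ":"))

print(pretty)
print(compact)
print(len(pretty), "characters indented vs", len(compact), "compact")
```

Both strings parse back to the same data; only the whitespace differs.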
2. Unescape JSON
If this option is selected and the input appears to be HTML wrapped in a JSON string, the input is unescaped before processing.
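As a rough illustration of what unescaping means here, the sketch below decodes the input only when it parses as a JSON string literal. The function name `unescape_if_json_string` and the detection rule (input starts and ends with a double quote) are assumptions for illustration; the tool's actual detection logic is not documented on this page.

```python
import json

def unescape_if_json_string(raw: str) -> str:
    """Return the decoded text if raw is a JSON string literal, else raw unchanged."""
    text = raw.strip()
    # Assumed heuristic: a JSON string literal starts and ends with a double quote.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        try:
            decoded = json.loads(text)
        except json.JSONDecodeError:
            return raw
        if isinstance(decoded, str):
            return decoded
    return raw

# HTML that was embedded in a JSON document, with escaped quotes and slashes.
wrapped = r'"<p class=\"note\">Hi<\/p>"'
print(unescape_if_json_string(wrapped))
print(unescape_if_json_string("<p>already plain</p>"))
```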
3. Mode
The conversion mode to use. Available options: Generic, Table, and JSON-LD.
Generic
In this mode, all HTML nodes are converted into JSON objects and properties. The options that apply to this mode are Attribute Prefix and Text Property Name.
Input
<html>
  <body>
    <table style="width: 100%">
      <tr>
        <th>Firstname</th>
        <th>Lastname</th>
        <th>Age</th>
      </tr>
      <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>50</td>
      </tr>
      <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>94</td>
      </tr>
    </table>
  </body>
</html>
Output
{
  "html": {
    "body": {
      "table": {
        "@style": "width: 100%",
        "tr": [
          { "th": ["Firstname", "Lastname", "Age"] },
          { "td": ["Jill", "Smith", "50"] },
          { "td": ["Eve", "Jackson", "94"] }
        ]
      }
    }
  }
}
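The mapping in this example can be approximated in a few lines of Python with the standard `html.parser` module. The rules below are inferred from the sample input and output only: attributes become prefixed properties, repeated child tags collapse into arrays, and an element holding nothing but text becomes a string. The names `html_to_json`, `attr_prefix`, and `text_name` are hypothetical, and the real converter may treat edge cases differently.

```python
import json
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class _Node:
    def __init__(self, tag, attrs):
        self.tag, self.attrs = tag, attrs
        self.children, self.texts = [], []

class _TreeBuilder(HTMLParser):
    """Builds a simple element tree; whitespace-only text is ignored."""
    def __init__(self):
        super().__init__()
        self.root = _Node("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = _Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].texts.append(text)

def _add(obj, key, value):
    # A key seen more than once is collapsed into a list of values.
    if key not in obj:
        obj[key] = value
    elif isinstance(obj[key], list):
        obj[key].append(value)
    else:
        obj[key] = [obj[key], value]

def _to_json(node, attr_prefix, text_name):
    if not node.attrs and not node.children:
        return " ".join(node.texts)  # text-only element becomes a string
    obj = {attr_prefix + name: value for name, value in node.attrs}
    for child in node.children:
        _add(obj, child.tag, _to_json(child, attr_prefix, text_name))
    if node.texts:
        obj[text_name] = node.texts[0] if len(node.texts) == 1 else node.texts
    return obj

def html_to_json(html, attr_prefix="@", text_name="#text"):
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    return _to_json(builder.root, attr_prefix, text_name)

sample = '<table style="width: 100%"><tr><th>Firstname</th><th>Age</th></tr></table>'
print(json.dumps(html_to_json(sample), indent=2))
```

Passing `attr_prefix=""` or `text_name="text"` changes the property names in the same way as the Attribute Prefix and Text Property Name settings.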
Table
In this mode, HTML <table> nodes are converted into JSON objects and properties. Each <tr> is converted into a JSON object. The cells of the header row become the JSON property names, and the cells of the other rows become the values of those properties.
Input
<html>
  <body>
    <table style="width: 100%">
      <tr>
        <th>Firstname</th>
        <th>Lastname</th>
        <th>Age</th>
      </tr>
      <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>50</td>
      </tr>
      <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>94</td>
      </tr>
    </table>
  </body>
</html>
Output
[
  { "Firstname": "Jill", "Lastname": "Smith", "Age": 50 },
  { "Firstname": "Eve", "Lastname": "Jackson", "Age": 94 }
]
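A minimal Python sketch of this header-row-to-property-names idea follows. It assumes a single, non-nested table whose first row is the header, and it assumes that integer-looking cells are emitted as numbers because the sample output shows `"Age": 50`. The name `table_to_records` is hypothetical, and this is not the tool's actual implementation.

```python
import json
import re
from html.parser import HTMLParser

class _TableRows(HTMLParser):
    """Collects the cell text of every <tr> as a list of strings."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

def _coerce(text):
    # Assumption: cells that look like integers become JSON numbers.
    return int(text) if re.fullmatch(r"-?[0-9]+", text) else text

def table_to_records(html):
    parser = _TableRows()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return []
    header, *body = rows  # first row supplies the property names
    return [dict(zip(header, map(_coerce, row))) for row in body]

sample = ("<table><tr><th>Firstname</th><th>Age</th></tr>"
          "<tr><td>Jill</td><td>50</td></tr>"
          "<tr><td>Eve</td><td>94</td></tr></table>")
print(json.dumps(table_to_records(sample), indent=2))
```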
JSON-LD
In this mode, all JSON-LD is extracted from the HTML and output as JSON. Each JSON-LD item becomes an array item in the final output.
Input
<html>
  <body>
    <div class="row">
      <script type="application/ld+json">
        {
          "@context": "http://schema.org/",
          "@type": "Person",
          "name": "Jane Doe",
          "jobTitle": "Professor",
          "telephone": "(425) 123-4567",
          "url": "http://www.janedoe.com"
        }
      </script>
    </div>
    <div class="row">
      <script type="application/ld+json">
        {
          "@context": "http://schema.org/",
          "@type": "Person",
          "name": "John Doe",
          "jobTitle": "Dancer",
          "telephone": "(425) 123-4568",
          "url": "http://www.johndoe.com"
        }
      </script>
    </div>
  </body>
</html>
Output
[
  {
    "@context": "http://schema.org/",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "url": "http://www.janedoe.com"
  },
  {
    "@context": "http://schema.org/",
    "@type": "Person",
    "name": "John Doe",
    "jobTitle": "Dancer",
    "telephone": "(425) 123-4568",
    "url": "http://www.johndoe.com"
  }
]
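The extraction step can be sketched in Python by collecting the body of every `<script type="application/ld+json">` element and parsing it as JSON. The name `extract_json_ld` is hypothetical, and silently skipping malformed blocks is an assumption; the page does not say how the tool handles them.

```python
import json
from html.parser import HTMLParser

class _JsonLdScripts(HTMLParser):
    """Parses the contents of each JSON-LD script element into self.items."""
    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # text chunks of the current JSON-LD script, or None

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            raw = "".join(self._buffer)
            self._buffer = None
            try:
                self.items.append(json.loads(raw))
            except json.JSONDecodeError:
                pass  # assumption: malformed blocks are skipped

def extract_json_ld(html):
    parser = _JsonLdScripts()
    parser.feed(html)
    parser.close()
    return parser.items

sample = """
<div><script type="application/ld+json">
  {"@context": "http://schema.org/", "@type": "Person", "name": "Jane Doe"}
</script></div>
<script>var ignored = 1;</script>
<div><script type="application/ld+json">
  {"@context": "http://schema.org/", "@type": "Person", "name": "John Doe"}
</script></div>
"""
print(json.dumps(extract_json_ld(sample), indent=2))
```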
4. Attribute Prefix
The prefix to use for properties that correspond to HTML attributes. Leave it blank to use no prefix.
Input
<html>
  <body>
    <table style="width: 100%">
      <tr>
        <th>Firstname</th>
        <th>Lastname</th>
        <th>Age</th>
      </tr>
      <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>50</td>
      </tr>
      <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>94</td>
      </tr>
    </table>
  </body>
</html>
Attribute Prefix: @
{
  "html": {
    "body": {
      "table": {
        "@style": "width: 100%",
        "tr": [
          { "th": ["Firstname", "Lastname", "Age"] },
          { "td": ["Jill", "Smith", "50"] },
          { "td": ["Eve", "Jackson", "94"] }
        ]
      }
    }
  }
}
Attribute Prefix: Empty
{
  "html": {
    "body": {
      "table": {
        "style": "width: 100%",
        "tr": [
          { "th": ["Firstname", "Lastname", "Age"] },
          { "td": ["Jill", "Smith", "50"] },
          { "td": ["Eve", "Jackson", "94"] }
        ]
      }
    }
  }
}
5. Text Property Name
The name of the property that holds the value of HTML text nodes.
Input
<html>
  <body>
    <div>
      <p>
        Pre Header
        <h1>Title</h1>
        Post Header
      </p>
    </div>
  </body>
</html>
Text Property Name: #text
{
  "html": {
    "body": {
      "div": {
        "p": {
          "h1": "Title",
          "#text": ["Pre Header", "Post Header"]
        }
      }
    }
  }
}
Text Property Name: text
{
  "html": {
    "body": {
      "div": {
        "p": {
          "h1": "Title",
          "text": ["Pre Header", "Post Header"]
        }
      }
    }
  }
}
History
- Mar 16, 2019: Handle HTML encoded values
- May 26, 2018: Automatic input detection
- May 26, 2018: Support for JSON-LD
- May 15, 2018: Generic HTML Conversion Support
- Sep 27, 2017: Tool Launched
Comments 3
Alphonse
When you convert this HTML,
you lose information in the JSON and you are unable to get the links in the right place ...
Also, is the source code available for download?
<html>
<body>
<h2>Size Attributes</h2>
<p class="toto">Images in HTML have a set of size attributes,<a href="xxeee">sqfsqdddd</a> which specifies the width and height of the image:<a href="xx">sqfsqd</a> sqfdss</p>
<img src="img_girl.jpg" width="500" height="600">
</body>
</html>
Sai Chand
Hi,
Can I get the code for the HTML to JSON Converter to download?
Sahan Ravindu
Can I get the HTML to JSON Converter as an API service, or get the code?