Convert HTML document to JSON by extracting HTML tables from it
JavaScript Object Notation (JSON), pronounced like the name "Jason", is the most common data interchange format on the web. Douglas Crockford first released the JSON specification in the early 2000s. It is a simple format that is easier to read than XML, and it is usually smaller because it has no closing tags. A wide variety of programming languages can parse JSON and serialize their data structures to it. Because JSON syntax is based on JavaScript object literals, you can paste JSON text into JavaScript code and use it without modification.
HTML is the language of the internet. It is what web pages are written in (even this one). In the early days, HTML pages were static, with some JavaScript added for dynamic behavior and effects. Then, with the advent of server-side programming languages such as Perl, PHP, and ASP, HTML was generated dynamically on the server. Now the trend is to serve HTML as a static resource again, with JavaScript and JSON (from REST web services) making it dynamic.
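To make the idea of extracting HTML tables into JSON concrete, here is a minimal sketch using only the Python standard library. It is not the tool's actual implementation. The names `TableExtractor` and `html_tables_to_json` are hypothetical, and the sketch assumes well-formed, non-nested tables whose first row holds the column headers (in TH or TD cells).

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every <table> as a list of rows, each row a list of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []    # one list of rows per <table> seen so far
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def html_tables_to_json(html, indent=2):
    """Returns a JSON array with one array of row objects per table.

    Assumption: the first row of each table supplies the property names.
    """
    parser = TableExtractor()
    parser.feed(html)
    result = []
    for rows in parser.tables:
        if not rows:
            result.append([])
            continue
        header, body = rows[0], rows[1:]
        result.append([dict(zip(header, row)) for row in body])
    return json.dumps(result, indent=indent)
```

Calling `html_tables_to_json(page_source)` on a page with one two-column table would yield a JSON array containing one array of objects, each keyed by the header cells.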
Settings Explained
1. Indentation
This setting governs how the output is indented. Indentation style varies with your text editor settings or with the server-side scripting language or framework that generates the HTML markup. You can choose between the following indentation levels:
- 2 Spaces
- 3 Spaces
- 4 Spaces
- 1 Tab
- None
2 Spaces
<html>
  <body>
    <div>The name's Bond, James Bond.</div>
    <marquee>Shaken, not stirred</marquee>
  </body>
</html>
3 Spaces
<html>
   <body>
      <div>The name's Bond, James Bond.</div>
      <marquee>Shaken, not stirred</marquee>
   </body>
</html>
4 Spaces
<html>
    <body>
        <div>The name's Bond, James Bond.</div>
        <marquee>Shaken, not stirred</marquee>
    </body>
</html>
1 Tab
<html>
	<body>
		<div>The name's Bond, James Bond.</div>
		<marquee>Shaken, not stirred</marquee>
	</body>
</html>
None
<html> <body> <div>The name's Bond, James Bond.</div> <marquee>Shaken, not stirred</marquee> </body> </html>
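For JSON output, the same five choices could plausibly map onto a serializer's indent argument. The sketch below uses Python's `json.dumps`, which accepts an integer, a string, or `None` for `indent`. The `INDENT_OPTIONS` table and the `render` helper are hypothetical names for illustration, not part of the tool.

```python
import json

# Hypothetical mapping from the setting labels to json.dumps indent values.
INDENT_OPTIONS = {
    "2 Spaces": 2,
    "3 Spaces": 3,
    "4 Spaces": 4,
    "1 Tab": "\t",
    "None": None,  # json.dumps then emits everything on a single line
}


def render(data, setting):
    """Serializes data to JSON using the indentation named by the setting."""
    return json.dumps(data, indent=INDENT_OPTIONS[setting])


if __name__ == "__main__":
    sample = {"quote": "Shaken, not stirred", "agents": ["Bond"]}
    for label in INDENT_OPTIONS:
        print(label)
        print(render(sample, label))
```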
2. Path Delimiter
This setting governs how column names are formed for complex JSON structures that contain nested objects. You can choose one of four delimiters, which is used to join the nested property names into the output CSV column name.
Underscore Path Delimiter
name|department|address_city|address_state
John Doe|Engineering|Atlanta|Georgia
Jane Doe|Billing|Hayward|California
Double Underscore Path Delimiter
name|department|address__city|address__state
John Doe|Engineering|Atlanta|Georgia
Jane Doe|Billing|Hayward|California
Slash Path Delimiter
name|department|address/city|address/state
John Doe|Engineering|Atlanta|Georgia
Jane Doe|Billing|Hayward|California
Dot Path Delimiter
name|department|address.city|address.state
John Doe|Engineering|Atlanta|Georgia
Jane Doe|Billing|Hayward|California
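The column names in these examples can be produced by flattening each nested object and joining the property names along the path with the chosen delimiter. The following is a minimal sketch of that idea; the `flatten` function is a hypothetical helper, and it assumes nested values are plain objects (arrays are left as they are).

```python
def flatten(obj, delimiter="_", prefix=""):
    """Flattens nested dicts into one dict keyed by delimiter-joined paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path built so far as the new prefix.
            flat.update(flatten(value, delimiter, path))
        else:
            flat[path] = value
    return flat


if __name__ == "__main__":
    record = {
        "name": "John Doe",
        "department": "Engineering",
        "address": {"city": "Atlanta", "state": "Georgia"},
    }
    for delimiter in ("_", "__", "/", "."):
        print("|".join(flatten(record, delimiter)))
```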
3. Cleanse Boolean Values
If selected, Boolean values in sentence case or upper case, such as True, False, TRUE, and FALSE, are transformed to lowercase before conversion to CSV. According to the JSON specification, only lowercase true and false are valid Boolean literals, so cleansing these values makes the JSON valid.
Use this if your input JSON has Boolean values in sentence case or upper case. Be aware, however, that it also transforms these words inside JSON strings (i.e., inside double quotes).
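A simple text substitution is one way such cleansing could work, and it would explain why words inside strings are affected too. The sketch below is an assumption about the approach, not the tool's confirmed implementation; `cleanse_booleans` is a hypothetical name.

```python
import json
import re

# Matches the four non-standard spellings as whole words, anywhere in the text.
_BOOLEAN = re.compile(r"\b(True|TRUE|False|FALSE)\b")


def cleanse_booleans(text):
    """Lowercases True/False/TRUE/FALSE everywhere, including inside strings."""
    return _BOOLEAN.sub(lambda match: match.group(1).lower(), text)


if __name__ == "__main__":
    raw = '{"active": True, "note": "TRUE story"}'
    cleaned = cleanse_booleans(raw)
    print(cleaned)              # the string value is rewritten as well
    print(json.loads(cleaned))  # parses now that the literal is lowercase
```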
4. Handle Multiple Jsons
If selected, multiple JSON documents in the input are handled. Each valid JSON document must fit entirely on one line.
Handle Multiple Jsons On
The following input works
{"name": "Robin Hood","department": "","manager": "","salary": 200}
{"name": "Arsene Wenger","department": "Bar","manager": "Friar Tuck","salary": 50}
{"name": "Friar Tuck","department": "Foo","manager": "Robin Hood","salary": 100}
Handle Multiple Jsons Off
The following input does not work
{"name": "Robin Hood","department": "","manager": "","salary": 200}
{"name": "Arsene Wenger","department": "Bar","manager": "Friar Tuck","salary": 50}
{"name": "Friar Tuck","department": "Foo","manager": "Robin Hood","salary": 100}
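The difference between the two modes can be sketched as parsing the input line by line versus parsing it as a single document. This is an illustration under that assumption; `parse_multiple` and `parse_single` are hypothetical names.

```python
import json


def parse_multiple(text):
    """Setting on: treats every non-blank line as its own JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_single(text):
    """Setting off: the whole input must be exactly one JSON document."""
    return json.loads(text)


if __name__ == "__main__":
    text = (
        '{"name": "Robin Hood","department": "","manager": "","salary": 200}\n'
        '{"name": "Friar Tuck","department": "Foo","manager": "Robin Hood","salary": 100}'
    )
    print(len(parse_multiple(text)))
    try:
        parse_single(text)
    except json.JSONDecodeError as error:
        # A second document after the first is reported as extra data.
        print("single-document parse failed:", error.msg)
```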
History
- Mar 16, 2019
- Support for TH header tags
- Mar 15, 2019
- Tool Launched