
THE OVERVIEW
Derivedata is a robust, hub-powered platform that continuously monitors different kinds of websites (HTML, RSS, AJAX, Angular, React, etc.) and delivers the information it collects as structured data through an API.
The platform has a simple dashboard through which system administrators receive email alerts and view statistics rendered as reports and charts.
The objective was to design a page monitoring and extraction system that monitors websites and scrapes data from them to provide useful information.
The platform should deliver information instantly, as soon as it is published on the websites.
The system should notify administrators by email when a configured threshold is exceeded.
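The threshold check behind the email alerts can be pictured with a small sketch. It assumes a hypothetical per-site counter of failed fetches as the monitored metric and produces one alert message per site over the limit. The metric, the field names and the Alert shape are all assumptions, and actually sending the email is left out.

```typescript
// Hypothetical per-site statistic; the overview does not say which metric is thresholded.
interface SiteStats {
  site: string;
  failedFetches: number;
}

// Hypothetical alert payload that would be handed to an email sender.
interface Alert {
  to: string;
  subject: string;
  body: string;
}

// Builds one alert for every site whose counter exceeds the configured threshold.
function checkThresholds(stats: SiteStats[], threshold: number, adminEmail: string): Alert[] {
  return stats
    .filter((s) => s.failedFetches > threshold)
    .map((s) => ({
      to: adminEmail,
      subject: "Threshold exceeded for " + s.site,
      body: s.failedFetches + " failed fetches (threshold: " + threshold + ")",
    }));
}
```

Keeping the check as a pure function separates the decision from the delivery, so the same logic could also feed the dashboard statistics.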
Our page monitoring and extraction solution relies on RabbitMQ and Redis queues to identify new and updated content on the websites. It then exposes this content through a structured API, with metadata such as the initial, current and previous revisions.
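One way to picture the revision metadata is the sketch below. It compares each freshly scraped page with the stored copy, classifies it as new, updated or unchanged, shifts the revisions accordingly and enqueues anything that changed. The in-memory store and the changeQueue array are assumptions standing in for the real persistence layer and the RabbitMQ/Redis queues, and the field names are illustrative only.

```typescript
// Revision metadata with the three fields named in the overview.
interface Revision {
  content: string;
  fetchedAt: string; // ISO-8601 timestamp of the scrape
}

interface TrackedItem {
  url: string;
  initial: Revision;         // first version ever seen; never overwritten
  current: Revision;         // latest version
  previous: Revision | null; // version before the latest, if any
}

type ChangeKind = "new" | "updated" | "unchanged";

// In-memory stand-ins (assumptions): `store` for persisted items and
// `changeQueue` for the queue that downstream workers would consume.
const store: { [url: string]: TrackedItem } = {};
const changeQueue: TrackedItem[] = [];

// Compares a scraped page with the stored copy, updates the revisions
// and enqueues the item if it is new or has changed.
function recordFetch(url: string, content: string, fetchedAt: string): ChangeKind {
  const revision: Revision = { content, fetchedAt };
  const existing = store[url];
  if (existing === undefined) {
    const item: TrackedItem = { url, initial: revision, current: revision, previous: null };
    store[url] = item;
    changeQueue.push(item);
    return "new";
  }
  if (existing.current.content === content) {
    return "unchanged"; // nothing to publish
  }
  // The current revision becomes the previous one; the initial revision stays put.
  existing.previous = existing.current;
  existing.current = revision;
  changeQueue.push(existing);
  return "updated";
}
```

Only new or changed items reach the queue, so consumers that update the API never process unchanged pages.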
As soon as a website is added to our Node.js platform with the required configuration, the extractor system's crawlers (called spiders) carefully crawl it to identify all potential links on the site and store them in Elasticsearch.
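The link-discovery step of a spider can be sketched as follows. The function resolves every anchor href on a page against the page URL, keeps only links on the same site, drops #fragment variants and removes duplicates. The resulting list is what would then be indexed in Elasticsearch; no Elasticsearch client is shown, and the regex-based parsing is a simplification assumed here to keep the sketch dependency-free.

```typescript
// Extracts candidate links from a page's HTML and keeps those on the same site.
// A regex keeps the sketch dependency-free; a real crawler would use an HTML parser.
function discoverLinks(pageUrl: string, html: string): string[] {
  const base = new URL(pageUrl);
  const hrefPattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  const seen: { [url: string]: boolean } = {};
  const links: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    let resolved: URL;
    try {
      resolved = new URL(match[1], base); // resolves relative hrefs such as "/news/1"
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (resolved.origin !== base.origin) continue; // stay on the monitored site
    resolved.hash = ""; // "#section" variants point at the same page
    const href = resolved.toString();
    if (!seen[href]) {
      seen[href] = true;
      links.push(href);
    }
  }
  return links;
}
```

Links such as mailto: have an opaque origin, so the same-origin check filters them out along with external sites.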
At the configured intervals, the extractor system checks the website for new or updated content, scrapes it and updates the API, which can be embedded into any platform, web or mobile.
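Interval-based checking reduces to deciding which sites are due at a given moment. In the sketch below, the SiteConfig fields, the scheduler tick and the scrape callback are hypothetical names; the overview only states that the intervals are configurable.

```typescript
// Hypothetical per-site configuration.
interface SiteConfig {
  url: string;
  intervalMs: number;
  lastCheckedAt: number | null; // epoch milliseconds, null if never checked
}

// Returns the sites whose configured interval has elapsed at time `now`.
function dueSites(sites: SiteConfig[], now: number): SiteConfig[] {
  return sites.filter(
    (s) => s.lastCheckedAt === null || now - s.lastCheckedAt >= s.intervalMs
  );
}

// Every tickMs, hands the due sites to `scrape` (a stand-in for the extractor
// system) and records when each was checked. Returns a function that stops it.
function startScheduler(
  sites: SiteConfig[],
  tickMs: number,
  scrape: (site: SiteConfig) => void
): () => void {
  const timer = setInterval(() => {
    const now = Date.now();
    for (const site of dueSites(sites, now)) {
      site.lastCheckedAt = now;
      scrape(site);
    }
  }, tickMs);
  return () => clearInterval(timer);
}
```

Keeping dueSites separate from the timer makes the scheduling rule testable without waiting on real time.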
Unit 6, Hounslow Business Park,
Alice Way, Hounslow,
Middlesex
TW3 3UD
302 Cocoa Studios, The Biscuit Factory,
Tower Bridge, London, SE16 4DG
IOD 118 Pall Mall
+44 (0) 20 70 60 5304