Jul 11, 2022 by Roman Landenband
This is fine, unless you are trying to track that page for changes. In that case, every visit may trigger a false positive, because sampling the page even a fraction of a second later than on the previous visit can capture different content.
As it happens, mutating content is the root cause of the top complaints from our users, who want to be notified only when content actually changes. So we set out to research a more reliable way to track changes.
What's needed to capture a modern web page? An overview
To conserve resources, modern pages often defer loading some of their parts until those parts are brought into the browser viewport. Therefore, "capturing" a page is not strictly the same as "loading" it.
Roughly, in order to "see" the fully loaded web page, we need to handle:
- waiting for the body "onload" event
- image loading (responsive and scroll-discovered images), if we want a visual capture of the page
- dynamic content loading
- lazy content loading (activated only when inside the viewport)
This mimics a user visiting a page, waiting for it to load, scrolling through it, and so on.
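The capture steps above can be sketched roughly as follows. This is a minimal illustration, not the POC from this post: it waits for the window "load" event, then scrolls one viewport at a time so lazy and scroll-discovered content gets a chance to load. The function names and the `settleMs` delay are hypothetical; only the pure `scrollPositions` helper runs outside a browser.

```javascript
// Pure helper: vertical scroll offsets that cover the whole page, one viewport apart.
function scrollPositions(pageHeight, viewportHeight) {
  if (viewportHeight <= 0) throw new RangeError("viewportHeight must be positive");
  const steps = Math.max(1, Math.ceil(pageHeight / viewportHeight));
  return Array.from({ length: steps }, (_, i) => i * viewportHeight);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Browser-only: resolves once the window "load" event has fired
// (immediately if the document has already finished loading).
function waitForLoad() {
  return new Promise((resolve) => {
    if (document.readyState === "complete") resolve();
    else window.addEventListener("load", () => resolve(), { once: true });
  });
}

// Browser-only: settleMs is a hypothetical per-step delay that gives
// lazy/dynamic content time to arrive after each scroll.
async function scrollThroughPage(settleMs = 500) {
  await waitForLoad();
  const positions = scrollPositions(document.documentElement.scrollHeight, window.innerHeight);
  for (const y of positions) {
    window.scrollTo(0, y);
    await sleep(settleMs);
  }
  window.scrollTo(0, 0); // return to the top before taking the capture
}
```

In a dev console, `await scrollThroughPage();` runs the whole cycle. Note that the scroll offsets are computed once; on pages that keep growing as lazy content arrives (infinite scroll), you would need to re-read `scrollHeight` and repeat.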
Detecting site changes: what are they?
Different people (graphic designers, analysts, product managers, marketers, etc.) may need to track different kinds of changes. Here we focus on one specific type: content being added, removed, or modified.
Ideally, every time we visit a page, we want to know whether and how the content has changed since the previous visit, and possibly how it has changed over time.
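To make "added, removed, or modified" concrete, here is a small sketch that compares two visits. It assumes a hypothetical snapshot format: a `Map` from a stable block identifier (for example a CSS selector path) to that block's text. The real feature may represent content quite differently.

```javascript
// Classify content changes between two snapshots (Map: block path -> text).
function diffSnapshots(previous, current) {
  const added = [];
  const removed = [];
  const modified = [];
  for (const [path, text] of current) {
    if (!previous.has(path)) added.push(path); // block is new on this visit
    else if (previous.get(path) !== text) modified.push(path); // same block, different text
  }
  for (const path of previous.keys()) {
    if (!current.has(path)) removed.push(path); // block disappeared since last visit
  }
  return { added, removed, modified };
}

const before = new Map([["#price", "$10"], ["#title", "Shoes"], ["#promo", "Sale"]]);
const after = new Map([["#price", "$12"], ["#title", "Shoes"], ["#reviews", "4.5 stars"]]);
// expected: added = ["#reviews"], removed = ["#promo"], modified = ["#price"]
console.log(diffSnapshots(before, after));
```

Storing each visit's result alongside its timestamp would be one way to see how content changed over time.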
Outlining our approach
- wait for the page to load
- observe mutations
- keep track of all the places where content has been mutating
- remove mutating content for the sake of reliable change detection
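The four steps above can be approximated with the standard `MutationObserver` API. This is a hedged sketch, not the POC referred to in this post. It assumes snapshots are `Map`s of block path to text, and that `pathOf` is a hypothetical caller-supplied function that maps an element to the same path used in the snapshot. `durationMs` (the observation window) is likewise an assumption.

```javascript
// Browser-only: collect the paths of elements whose content mutates
// during an observation window of durationMs milliseconds.
function observeMutatingPaths(pathOf, durationMs = 5000) {
  return new Promise((resolve) => {
    const mutating = new Set();
    const observer = new MutationObserver((records) => {
      for (const record of records) {
        // Text-node (characterData) changes are attributed to their parent element;
        // for childList changes the target is already the parent element.
        const target = record.target;
        const el = target.nodeType === Node.ELEMENT_NODE ? target : target.parentElement;
        if (el) mutating.add(pathOf(el));
      }
    });
    observer.observe(document.body, { childList: true, characterData: true, subtree: true });
    setTimeout(() => {
      observer.disconnect();
      resolve(mutating);
    }, durationMs);
  });
}

// Pure: keep only blocks that were never seen mutating, so a changing clock or
// rotating banner cannot trigger a false positive. Passing the union of mutating
// paths from several visits keeps track of them over time.
function stripMutating(snapshot, mutatingPaths) {
  const stable = new Map();
  for (const [path, text] of snapshot) {
    if (!mutatingPaths.has(path)) stable.set(path, text);
  }
  return stable;
}

const snapshot = new Map([["#clock", "12:00:01"], ["#title", "Shoes"]]);
console.log(stripMutating(snapshot, new Set(["#clock"]))); // keeps only "#title"
```

The length of the observation window is a trade-off: content that mutates only occasionally (say, once a minute) will not be caught in a few seconds, which is one reason to accumulate mutating locations across visits rather than rely on a single observation.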
The POC code, with comments, is below. Copy and paste it into the dev console of your browser of choice to see the full cycle.
You can start using the new feature in the page tracking options.