
More reliable page change tracking by ignoring mutations

Jul 11, 2022 by Roman Landenband

To stand out and sell more "ACME Widgets™", modern webpages are sprinkled with JavaScript that makes things appear, disappear or transform (or mutate, as I will loosely call all of these).

This is fine... unless you are trying to track that page for changes. Sampling the page a fraction of a second earlier or later than on the previous visit can then trigger a false positive.

As it happens, mutating content is the root cause of the top complaints from our users, who want to be notified only when the content has actually changed. So we set out to research more reliable change tracking.

What's needed to capture a modern web page? An overview

Modern pages try to be conservative with the resources needed to load them, so some parts will not load until they are brought into the browser viewport. "Capturing" a page therefore does not strictly equal "loading" it.

Roughly, in order for us to "see" the fully loaded web page, we need to handle:

  1. the body "onload" event
  2. image loading (responsive + scroll-discovered), if we want a visual capture of the page
  3. dynamic content loading
  4. lazy content loading (activated only inside the viewport)

This mimics a web user visiting a page, waiting for it to load, scrolling through, etc.
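
Lazy content loading is the least obvious item in the list. A minimal sketch, assuming lazy content is triggered by scrolling one viewport at a time: `scrollStops`, `scrollThrough` and the 500 ms pause are illustrative names and values, not part of our tracker.

```javascript
// Compute the vertical offsets to visit so that every part of the page
// passes through the viewport once. Hypothetical helper for illustration.
function scrollStops(pageHeight, viewportHeight) {
  if (viewportHeight <= 0) {
    throw new RangeError("viewportHeight must be positive");
  }
  // the furthest the page can actually be scrolled
  const maxOffset = Math.max(0, pageHeight - viewportHeight);
  const stops = [];
  for (let y = 0; y < maxOffset; y += viewportHeight) {
    stops.push(y);
  }
  // always finish at the bottom of the page
  stops.push(maxOffset);
  return stops;
}

// Browser-only part: scroll to each stop and pause so lazy loaders can react.
// The pause length is an assumption; real pages may need more or less.
async function scrollThrough(win, pauseMs = 500) {
  const pageHeight = win.document.documentElement.scrollHeight;
  for (const y of scrollStops(pageHeight, win.innerHeight)) {
    win.scrollTo(0, y);
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
}
```

In a dev console this could be run as `await scrollThrough(window)`. The page height is read once up front, so a page that keeps growing while you scroll (infinite scroll) would need the height re-read after each stop.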

Detecting site changes: what are they?

While different people may need to track different changes (graphic designers, analysts, product managers, marketers, etc.), we will focus on one specific type of change here: content that is added, removed or modified.

Ideally, then, every time we visit a page we want to know if and how the content has changed since the last visit, and possibly how it has changed over time.

Alas, because content is manipulated via JavaScript (for the purpose of creating an appealing webpage to sell more Widgets™), a naïve diff between two snapshots is unreliable. All the mutating parts need to be eliminated to reduce false positives.
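
To see why, here is a toy illustration with made-up data: two visits a second apart where only a clock widget differs. The selectors, the texts and the `changedSelectors` helper are assumptions for the example, not our production format.

```javascript
// Each snapshot maps a CSS selector to the text found there (made-up data).
const firstVisit = {
  "#headline": "ACME Widgets on sale",
  "#price": "$10",
  "#clock": "12:00:01",
};
const secondVisit = {
  "#headline": "ACME Widgets on sale",
  "#price": "$10",
  "#clock": "12:00:02",
};

// Return the selectors whose text differs between two snapshots,
// skipping any selector listed in "ignored".
function changedSelectors(before, after, ignored = new Set()) {
  const all = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...all]
    .filter((selector) => !ignored.has(selector))
    .filter((selector) => before[selector] !== after[selector])
    .sort();
}

// naive diff: the clock alone makes the page look changed
console.log(changedSelectors(firstVisit, secondVisit));
// same diff with the mutating selector ignored
console.log(changedSelectors(firstVisit, secondVisit, new Set(["#clock"])));
```

The first call reports `#clock` even though nothing a user cares about has changed. The second call reports nothing, which is the result we want for these two visits.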

Outlining our approach

  1. wait for the page to load
  2. observe mutations
  3. keep track of all the places where content has been mutating
  4. remove mutating content for the sake of reliable change detection

POC code with comments below. Copy and paste it into the dev console of your browser of choice to see the full cycle.

"use strict";
/*jshint esversion: 8 */
/*jshint browser: true */

const myScript = document.createElement('script');
myScript.type = "module";

// language=JavaScript
myScript.innerHTML = `
  // we use the excellent "finder" library by "antonmedv"
  // https://github.com/antonmedv/finder
  // to resolve CSS selectors from DOM elements
  import {finder} from 'https://medv.io/finder/finder.js'

  // list of DOM selectors where mutations have been observed
  const mutationSelectors = new Set();
  // assign to window object to retrieve via debug console or another script
  window._mutationSelectors = mutationSelectors;

  // once maxIdleTicks consecutive idle checks have been counted, we are done
  const maxIdleTicks = 3;
  // how often (in milliseconds) to check whether new mutating selectors were found
  const intervalMS = 2000;

  // MutationObserver: record a selector for every element seen mutating
  const observer = new MutationObserver((mutationList) => {
    mutationList.forEach((mutation) => {
      try {
        // for text changes the target is a text node, so use its parent element
        const element = mutation.type === 'characterData' ? mutation.target.parentElement : mutation.target;
        // we would never want the body element to end up in the set,
        // since removing it later would remove the whole page
        if (element && element !== document.body) {
          // we use "finder" to get the best DOM selector for the mutating element
          mutationSelectors.add(finder(element));
        }
      } catch (e) {
        console.log("err", mutation, e);
      }
    });
  });

  // we only care about content changes (characterData) and added/removed content (childList/subtree)
  const observerOptions = {
    childList: true,
    attributes: false,
    subtree: true,
    characterData: true
  }

  const initialize = () => {
    observer.observe(document.body, observerOptions);
    // the size of mutating selectors when checked last
    let lastSeenSize = 0;
    // counting towards maxIdleTicks
    let idleTickCount = 0;

    const ref = window.setInterval(() => {
      if (window._mutationSelectors.size !== lastSeenSize) {
        // new mutating selectors were found: start counting idle ticks again
        lastSeenSize = window._mutationSelectors.size;
        idleTickCount = 0;
      } else {
        idleTickCount++;
        // once maxIdleTicks consecutive idle ticks are reached, we are done
        if (idleTickCount >= maxIdleTicks) {
          clearInterval(ref);
          window.wrapUp();
        }
      }
    }, intervalMS);
  }

  // we normally are interested in changes that happen once the body is loaded
  if (document.readyState === "complete") {
    initialize()
  } else {
    window.addEventListener("load", initialize);
  }

  window.wrapUp = () => {
    // stop observing, otherwise our own removals below are recorded as mutations
    observer.disconnect();
    [...window._mutationSelectors].forEach((msItem) => {
      document.querySelectorAll(msItem).forEach((el) => {
        // hold on to element's parent
        const parentEl = el.parentElement;
        // remove element from DOM
        el.remove();
        // remove parent from DOM if it has no child elements left (but never the body itself)
        if (parentEl && parentEl !== document.body && parentEl.childElementCount === 0) {
          parentEl.remove();
        }
      });
    });
    console.log("all done, dump body text", document.body.innerText);
  };

`;

document.head.appendChild(myScript);
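
The stop condition is the easiest part to get subtly wrong, so here it is in isolation as a sketch that runs without a browser. `createIdleDetector` is a hypothetical helper, not part of the POC. It follows the intent stated in the comments above: report done after `maxIdleTicks` consecutive checks with an unchanged selector count, and start over whenever the count moves.

```javascript
// Hypothetical helper: isolates the "are we idle yet?" decision.
function createIdleDetector(maxIdleTicks) {
  // the selector count seen at the previous check
  let lastSeenSize = 0;
  // consecutive checks without a change
  let idleTickCount = 0;

  // call once per interval with the current number of recorded selectors;
  // returns true when the page is considered settled
  return function tick(currentSize) {
    if (currentSize !== lastSeenSize) {
      lastSeenSize = currentSize;
      idleTickCount = 0;
      return false;
    }
    idleTickCount++;
    return idleTickCount >= maxIdleTicks;
  };
}

// simulated selector counts at seven consecutive checks
const tick = createIdleDetector(3);
const results = [0, 2, 2, 5, 5, 5, 5].map((size) => tick(size));
console.log(results);
```

With these simulated counts only the last check reports done: the idle tick at the first check and the one after the count reached 2 are both discarded when the count changes again.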

CueTap users?

You can start using the new feature in the page tracking options.