WebOrion® is glad to introduce a new feature in the existing Integrity Analytics engine: Smart Image Hash (SIH). SIH reduces false alerts about image changes by comparing how images look rather than comparing their raw file contents.
Images can make websites look more attractive and have been widely adopted ever since the inclusion of the image tag (<img>) in the HTML 2.0 specification back in 1995. Image files are usually much larger than text content, so methods for delivering images over the internet are constantly being improved to avoid network congestion. Image compression and Content Delivery Networks (CDNs) are two common ways to speed up image delivery.
Some Background On Image Compression
File compression tools such as ZIP or RAR are commonly used to reduce file sizes. Similarly, images can be compressed to reduce their file sizes and so improve their loading and transmission times. Both lossy (e.g., JPEG) and lossless (e.g., PNG, BMP, TIFF) techniques are widely used. On the web, lossy compression such as JPEG is common because it produces much smaller files, and at typical quality settings the human eye usually cannot tell the compressed image from the original. For example, can you tell the difference between the two images above, in Figure 1? The left image is 374 KB while the right one is 274 KB. The visual difference is imperceptible, but the smaller file takes less time to transmit and load, and the saving can be significant when many images are loaded on the same webpage.
Image Compression In Content Delivery Networks (CDNs)
The experience of site visitors is of utmost importance, and slow loading times can harm it. This is where CDNs come in. A CDN works like a cache to improve response and loading times. It can deliver web content faster and more reliably because it serves that content from many nodes across the world. In addition, most CDNs offer an image compression feature specifically to improve the response and loading times of images. Although image compression is useful, it can create problems for agentless web monitoring solutions that hash image files. Hashing can detect any change to an image because every distinct file has its own hash. For the same reason, a compressed image and its original have different hash values, which can trigger many false alerts.
Why Smart Image Hash (SIH)?
Though the compressed and original images look the same to the human eye, they are actually different files and produce different file hashes (e.g., MD5 or SHA). Previously, without SIH, if a website enabled image compression, an alert would be triggered on polling because the image files, and therefore their hashes, had changed from the baseline. Such an alert is a false alert, because the image changed due to compression rather than defacement.
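To make the problem concrete, here is a minimal Python sketch of how sensitive a file hash is. The byte strings are hypothetical stand-ins for an original image file and a re-encoded copy, and SHA-256 is used purely as an example of a file hash; this is not WebOrion's code.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for an original image file and a re-encoded copy.
original = bytes(range(256)) * 4
recompressed = bytearray(original)
recompressed[100] ^= 0x01  # flip a single bit in one byte

baseline = file_hash(original)
latest = file_hash(bytes(recompressed))

# Any byte-level change produces a different digest, so a monitor that
# compares file hashes sees a "changed" image even if it looks the same.
changed = baseline != latest
print("changed:", changed)
```

A real recompressed image differs from its original in far more than one bit, so a file hash comparison will flag it every time.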
Our new SIH feature tackles this issue by taking advantage of proven image hash algorithms. File hash algorithms such as MD5 are sensitive to every bit in a file, so they generate different hashes for compressed and original images. Image hash algorithms, by contrast, output similar hashes for compressed and original images and quite different hashes for unlike images. We have tested various image hashing algorithms (including p-hash, d-hash, a-hash, etc.) and found a combination that produces the best results for checking web images, whether they are compressed or defaced.
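As an illustration of how an image hash behaves, below is a rough pure-Python sketch of d-hash (difference hash), one of the algorithms named above. It shrinks a grayscale image to a 9x8 grid and records, for each pixel, whether it is brighter than its right-hand neighbour. The synthetic images, the noise pattern, and the helper names (shrink, dhash, hamming) are assumptions made for this example. The combination SIH actually uses is not described here, and a real implementation would first decode the image with an imaging library.

```python
WIDTH, HEIGHT = 72, 64  # size of the synthetic test images (assumed)


def shrink(pixels, width, height):
    """Downsample a grayscale grid to width x height by averaging blocks."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0, y1 = y * src_h // height, (y + 1) * src_h // height
        row = []
        for x in range(width):
            x0, x1 = x * src_w // width, (x + 1) * src_w // width
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pair of cells."""
    small = shrink(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def make_original():
    """Synthetic image: bands of gradients that alternate in direction."""
    rows = []
    for y in range(HEIGHT):
        if (y // 8) % 2 == 0:
            rows.append([2 * x + y for x in range(WIDTH)])
        else:
            rows.append([2 * (WIDTH - 1 - x) + y for x in range(WIDTH)])
    return rows


original = make_original()

# Stand-in for compression artefacts: shift each pixel by -1, 0, or +1.
noisy = [
    [max(0, min(255, value + (x + y) % 3 - 1)) for x, value in enumerate(row)]
    for y, row in enumerate(original)
]

# Stand-in for a visibly different image: the original mirrored left to right.
mirrored = [list(reversed(row)) for row in original]

print("original vs noisy:   ", hamming(dhash(original), dhash(noisy)))
print("original vs mirrored:", hamming(dhash(original), dhash(mirrored)))
```

In this toy setup the slightly perturbed image keeps the same d-hash as the original, while the mirrored image lands far away in Hamming distance. That is the property that lets a monitor tell recompression apart from a real change.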
How It Works, In A Nutshell
When SIH is enabled, our WebOrion engines compare the baseline image with the latest image and return a score indicating how similar the two are. The score ranges from 0 to 100, and the lower the score, the more similar the two images are. A score of 0 means that the two images are identical, while a score of 100 means that they are completely different. In our testing of SIH, we have found that a score of 10 or below typically means the two images are essentially the same (even if compression is enabled). We therefore use 10 as the default threshold, which can be adjusted to suit different websites if required.
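The exact scoring formula is not given here, but the idea can be sketched in Python under one plausible assumption: the score is the Hamming distance between two 64-bit image hashes, scaled to the 0 to 100 range. The function names, the example hash values, and the rule of alerting only above the threshold are all illustrative, not WebOrion's implementation.

```python
DEFAULT_THRESHOLD = 10  # default value recommended in the text


def similarity_score(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Scale the Hamming distance of two hashes to 0-100 (assumed formula)."""
    distance = bin(hash_a ^ hash_b).count("1")
    return 100.0 * distance / bits


def should_alert(score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Alert only when the images differ by more than the threshold."""
    return score > threshold


# Hypothetical 64-bit image hashes.
baseline = 0xF0F0F0F0F0F0F0F0
recompressed = baseline ^ 0b111     # 3 of 64 bits differ
defaced = baseline ^ 0xFFFFFFFF     # 32 of 64 bits differ

for name, latest in [("recompressed", recompressed), ("defaced", defaced)]:
    score = similarity_score(baseline, latest)
    print(name, score, "ALERT" if should_alert(score) else "ok")
```

Under this assumption, the recompressed image scores below the default threshold and raises no alert, while the defaced image scores well above it. Raising the threshold tolerates heavier compression but increases the chance of missing a subtle change.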
Conclusion
SIH is most useful where monitored websites use CDNs whose image compression would otherwise trigger many false alerts. It allows these images to be monitored with compression enabled, without triggering false and unnecessary alerts. This feature is currently included for free for our customers under our Content and Integrity Analytics engines. We look forward to you trying out this feature and giving us feedback on SIH!