Take care of your selectors!

Recently I showed two scenarios in which selectors can be used to cache several different views of a single page. This often lets you avoid HTTP parameters, which reduces the load on your machines and speeds up your website.

Let’s assume that you have the already mentioned handle /etc/medialibrary/trafficjam.html and that your templates can display the image in 3 different sizes: “preview”, “big” and “original”. So what happens if somebody requests the URL “/etc/medialibrary/trafficjam.tiny.html”?
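To make the terminology concrete, here is a minimal sketch of how such a URL splits into page name, selectors and extension. The class and method names are made up for illustration, and the logic is a simplification: it assumes the page name itself contains no dots, whereas the real resolution in CQ also takes the existing content into account.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of CQ: splits selectors out of a request path.
final class SelectorSplit {

    /**
     * Returns the selectors of a request path, i.e. the dot-separated parts
     * between the page name and the extension in the last path segment.
     * Simplification: assumes the page name itself contains no dots.
     */
    static List<String> selectors(String path) {
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        String[] parts = lastSegment.split("\\.");
        if (parts.length <= 2) {
            // "trafficjam" or "trafficjam.html": only a name and an extension.
            return Collections.emptyList();
        }
        // Drop the name (first part) and the extension (last part).
        return Arrays.asList(parts).subList(1, parts.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(selectors("/etc/medialibrary/trafficjam.html"));
        System.out.println(selectors("/etc/medialibrary/trafficjam.tiny.html"));
    }
}
```

With this reading, “trafficjam.html” carries no selector, while “trafficjam.tiny.html” carries the single selector “tiny”, which none of the three supported sizes matches.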

I checked some CQ-based websites and tested this behaviour by just adding a dummy selector. In most cases you get a properly rendered page that looks the same as it does without the selector. So most templates (and template developers) ignore selectors that the specific template isn’t expected to handle. So that is good, isn’t it?

Well, in combination with the dispatcher cache it isn’t good, because the dispatcher caches everything that CQ returns with an HTTP status code of 200. So just adding a “foo” selector places another copy of the page in the dispatcher cache. The same happens with a “foo1” selector and so on. In the end the disk is full; the dispatcher cannot write any more files to disk and forwards every request to your CQ.

So, how can you avoid this problem? As said, the dispatcher caches a response only when it receives an HTTP status code of 200. So you need to add some code to your templates that always checks the selectors. If a template doesn’t support any selectors and is called without one, fine. If it is called with a selector, don’t return a status code of 200, but either a 301 (permanent redirect) to the same page without any selectors or just a plain 404 (“not found”). Because calling this page with selectors isn’t a valid action and should never happen, such a status code is OK. The same applies when a template supports a limited set of selectors (“preview”, “big” and “original” in the example above): put them on a whitelist, and if the given selector doesn’t match, return a 301 or 404.
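The check itself can be quite small. The following is only a sketch of the decision logic, not code from a real template: the class `SelectorWhitelist` and its method are hypothetical names, and it picks the 404 variant (answering with a redirect instead would work the same way). In a Sling-based CQ version the selectors would typically come from `RequestPathInfo.getSelectors()`, which returns a `String[]`; here they are simply passed in so the example runs on its own.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: decides which status code a template should answer with.
final class SelectorWhitelist {
    static final int OK = 200;
    static final int NOT_FOUND = 404;

    private final Set<String> allowed;

    // Pass no arguments for a template that supports no selectors at all.
    SelectorWhitelist(String... allowedSelectors) {
        this.allowed = new HashSet<>(Arrays.asList(allowedSelectors));
    }

    /**
     * Returns 200 only for a request without selectors or with exactly one
     * whitelisted selector; everything else gets a 404 and is therefore
     * never written to the dispatcher cache.
     */
    int statusFor(String[] requestSelectors) {
        if (requestSelectors.length == 0) {
            return OK;
        }
        if (requestSelectors.length == 1 && allowed.contains(requestSelectors[0])) {
            return OK;
        }
        return NOT_FOUND;
    }

    public static void main(String[] args) {
        SelectorWhitelist imagePage = new SelectorWhitelist("preview", "big", "original");
        System.out.println(imagePage.statusFor(new String[] {"big"}));
        System.out.println(imagePage.statusFor(new String[] {"tiny"}));
    }
}
```

Accepting at most one selector is a deliberate choice in this sketch: if any combination of whitelisted selectors were allowed, requests like “trafficjam.big.big.big.html” could still fill the cache with an unlimited number of copies.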

This way you don’t pollute your cache and still have the flexibility to use selectors. I think that this outweighs the cost of adjusting your templates.

5 thoughts on “Take care of your selectors!”

  1. The same thing you have to do for the extensions. 😉
    But why should it happen that a wrong selector or extension will be called?
    A wrong implementation?
    Users that call the wrong pages?

    1. Yes, it also applies to extensions.

      Why would anyone ever try to log in with the root account on a foreign computer that he isn’t allowed to use? Why would some people run denial-of-service attacks against websites? If there’s a possibility to misuse a feature, people will sooner or later misuse it, usually without the knowledge of the owner …

  2. I accept the DoS example, but the password?!

    Only a few people know about the caching of the system. So I would have to know that the system creates a new file on disk if I call the page with other selectors. –> That’s the reason why I ask: who would do that?

    But you’re right, this is a problem. In my opinion, though, there should be disk monitoring that checks the disk usage, and if the free disk space drops below a configured amount, an automatic cleanup (deleting the dispatcher cache) should happen.

    You can’t ever react to every selector/extension in this world;
    you will always miss some.

    And you can’t guarantee that every techie knows about the whitelist. Sometimes a problem needs a fast reaction, and then you will forget it.

    And implementing a whitelist costs a lot of money, and as shown above you can’t catch everything.

    A monitoring & cleaning solution costs perhaps a tool or 1 day of configuration, and that’s it.

    1. Hi stejan

      You’re right, in reality it’s easier to remove parts of the cache; but if someone is misusing your system, you need to make sure that you remove only the files that are not requested by “regular” users. Otherwise the cache-hit ratio of your dispatcher goes down quite dramatically and your CQ will probably be overloaded. Then you need to ask yourself whether the dispatcher is useful at all.

      So your objective is to keep garbage out of the dispatcher cache, so that you don’t need to clear your dispatcher cache every once in a while.

      Regarding whitelisting: Yes, it may be a pain in the ass to keep your whitelists up to date. Maybe you don’t know every selector, because they are created dynamically. But at some point you have to start. Security isn’t that easy 🙂
