
sling:alias property stops working in some scenarios when using AEM 6.3 and ways to resolve it

Created:2017-09-27
Updated:2017-10-23

Case

This is probably an AEM 6.3-specific issue, and hopefully Adobe will fix it soon. Since upgrading to AEM 6.3, we have been losing sling:alias mappings, which affects most of our sites and pages.

Our site is global, and the non-English websites rely heavily on AEM's sling:alias page property to support local-language URLs, while the page names stay the same as in the English version for easier tracking and comparison. Say we have a page named "how-it-works" that exists in 3 language versions; the content layout in author would look something like this:

  • /content/our-company/gb/en/how-it-works.html
  • /content/our-company/mx/es/how-it-works.html (with sling:alias value "cómo-funciona" for Spanish)
  • /content/our-company/tr/tr/how-it-works.html (with sling:alias value "nasıl-çalışır" for Turkish)

These 3 pages are visible to Google as follows:

  • https://www.example.com/en-gb/how-it-works.html
  • https://www.example.com/es-mx/cómo-funciona.html (by browser optimized URL display)
  • https://www.example.com/tr-tr/nasıl-çalışır.html (by browser optimized URL display)

All 3 pages keep the same page name, "how-it-works", which lets us compare them across markets even though their URLs differ.

After the upgrade to AEM 6.3, authors noticed that sometimes, after editing content and activating pages in author, some of the URLs relying on sling:alias became inaccessible and returned 404. I checked the publishers and confirmed that the sling:alias properties did exist there. The problem can be worked around temporarily by changing sling:alias to something else and then changing it back so that the cache gets refreshed. But that is not a real option for us, given how heavily we rely on the sling:alias property.

The thing is, authors couldn't find any pattern to reproduce the issue: sometimes it happens, sometimes it doesn't.

Why

We thought it was random, permission related, or something wrong on our side, so I spent a lot of time checking the usual settings. After several hours I confirmed that nothing was wrong in our configuration or our code, and started to suspect the robustness of the sling:alias mechanism itself.

I then got lucky: I found I could reproduce the issue by importing different packages (one from QA and one from live) into my local publisher. Following this single clue, and after hours of fun with decompiled Java code, I finally found the root cause.

In this version of Sling there is, by default, a caching mechanism for sling:alias, so alias configurations are normally read from the cache instead of from the JCR nodes. This also means the cache needs to be refreshed by an event listener each time JCR nodes are updated or deleted.

In the class “org.apache.sling.resourceresolver.impl.mapping.MapEntries”, the method “removeResource” handles node removal events. As shown below, “null” is passed to “removeAlias” when “contentPath” (the cache key) is “/content/our-company/mx/es” and the node actually removed is, say, “/content/our-company/mx/es/jcr:content/a/b/c/d/e”. Because “null” is passed, “removeAlias” removes the whole cache key from the cache dictionary, so the alias cache for everything under /content/our-company/mx/es is cleared. Later activations of child pages bring their own aliases back, but the list stays incomplete until all child pages have been activated.

remove-alias-null.png
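
To make the effect concrete, here is a minimal sketch of an alias cache keyed by parent content path. It is a simplified model written for this post, not the real MapEntries code: the class name AliasCacheSketch, its methods, and the second alias ("contacto" for a page named "contact") are all hypothetical. It only illustrates how a "null" resource name can wipe every alias stored under one key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of an alias cache. NOT the actual Sling implementation.
class AliasCacheSketch {
    // parent content path -> (alias -> real child page name)
    private final Map<String, Map<String, String>> aliasMap = new HashMap<>();

    void addAlias(String contentPath, String alias, String pageName) {
        aliasMap.computeIfAbsent(contentPath, k -> new HashMap<>()).put(alias, pageName);
    }

    // Returns the real page name for an alias, or null if the alias is not cached.
    String resolve(String contentPath, String alias) {
        Map<String, String> aliases = aliasMap.get(contentPath);
        return aliases == null ? null : aliases.get(alias);
    }

    // Mirrors the behaviour described above: a null page name drops the whole key.
    boolean removeAlias(String contentPath, String pageName) {
        Map<String, String> aliases = aliasMap.get(contentPath);
        if (aliases == null) {
            return false;
        }
        if (pageName == null) {
            // Every alias under contentPath is lost, not just one.
            aliasMap.remove(contentPath);
            return true;
        }
        boolean removed = aliases.values().removeIf(pageName::equals);
        if (aliases.isEmpty()) {
            aliasMap.remove(contentPath);
        }
        return removed;
    }

    public static void main(String[] args) {
        String parent = "/content/our-company/mx/es";
        AliasCacheSketch cache = new AliasCacheSketch();
        cache.addAlias(parent, "c\u00f3mo-funciona", "how-it-works"); // "cómo-funciona"
        cache.addAlias(parent, "contacto", "contact");                // hypothetical second page

        // A node deep under jcr:content is deleted; the listener passes null as the name.
        cache.removeAlias(parent, null);

        System.out.println(cache.resolve(parent, "c\u00f3mo-funciona"));
        System.out.println(cache.resolve(parent, "contacto"));
    }
}
```

In this model both lookups fail after the removal, which matches the 404s seen on alias URLs until each child page is activated again and re-registers its alias.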

How to solve it

In the code I found a switch that controls whether this caching feature is turned on. It is in the OSGi configuration "Apache Sling Resource Resolver Factory".

The issue was gone after I changed this configuration. The change is unlikely to cause performance issues for us, since we have another 2 layers of caching on top of the publishers.

optimize-alias-resolution.png
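
For anyone who prefers to deploy the setting as a config file rather than through the web console, a sketch could look like the line below. The PID and property name here are my assumption of how the "Optimize alias resolution" checkbox is stored (file name org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl.config), so please verify both against the configuration shown in your own OSGi console before relying on it.

```
resource.resolver.optimize.alias.resolution=B"false"
```

Setting the value back to true re-enables the alias cache once a fixed Sling version is in place.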

Follow up

I take this issue seriously, since returning 404 for some URLs really hurts clients like us. I was surprised that 6.3 has been out for several months with this issue in place the whole time. Do other users not use aliases?

I was about to send a pull request to the Sling project when I found that the issue had already been reported and was fixed in a commit (July 24th this year) with the commit comment:
SLING-7018: Fix a bug that removed to many aliases in certain cases when a resource got removed.

Glad to know it has been noticed and fixed in the latest version of Sling. For AEM, I'd wait until a stable release patch is out.
