Thursday 19 July 2007

Performance issues resolution in Ax Part I

This is a subject on which a lot has been written by any number of people so I will concentrate on some small parts of the issues relating to this, and this is not meant as anything else but a starting point.

Mostly when you have problems in Ax, as in most other ERP systems :-), it is related to some sort of contention that is concurrent access to key central information, or alternatively to large quantities of data being used/created.

In most of the instances I have seen the quickest fix on SQL is always just adding more memory, not that I am a big advocate of throwing money at the problem but well that is normally the cheapest solution these days.

In SQL 2000 and 2005, the main issue is centered around one fact, lock escalation, basically your server can if it deems to be running out of memory decide to start locking not each row but a page (unit of storage used to contain multiple rows in the database) and even the whole table. When this happens you tend to get a situation where everyone is waiting for each other.

Picture Picadilly circus bus stop on a busy afternoon where you have lots of busses all coming in to pick up / drop off passengers and having to wait for each other and thereby causing havoc for other traffic, that is the best physical representation.

In Ax say the CustTrans table is concerned or InventTrans then as one process is using it another 5 different ones want to use it too, they cannot as the first is blocking and they use other tables which they then block for others thus making a traffic jam in the database.

SQL decides to do this lock escalation when it has used more than a certain percentage of the available memory that it has to manage locks, thus if you give it more memory to play with it is less likely to do lock escalation and therefore to create the traffic jam.

In SQL 2005 some new functionality has been added concerning Index Lock escalation, however this is hard to add in Ax as information on the indexes, as these are created by the syncronization engine. You would have to add the information by hand on each index :-( after creation and then redo this everytime Ax runs a syncronization.

Additionally as you can see in this KB where means of reducing the lock escalation are discussed it is possible to switch of using the startup flag -T1211 however the resulting error condition of it running out of memory is not pretty.

One option that seems intriguing is the option of making an incompatible higher level block

BEGIN TRAN
SELECT * FROM mytable (UPDLOCK, HOLDLOCK) WHERE 1=0
WAIT FOR DELAY '1:00:00'
COMMIT TRAN

in Ax the tables classically affected are a series of well known ones, the problem with the above in Ax (as well as most other ERP's) is that one hides others etc. If f.ex. you resolve your issues on CustTrans then perhaps CustTransOpen or CustSettlement becomes the problem etc etc so this is often not a solution that we can use to resolve our escalation issues, also unless we do the locking for the whole business day we do not handle all the cases :-(.

Changing our queries is the next possibility mentioned to create smaller transactions.

Well, that is something that is possible in a limited fashion, you will have to clarify functionally with the business whether they can accept the consequences, f.ex. not make sales orders with moe than xx lines in them. Not do recap invoicing, etc etc.

What you can do is use the system to help you detect where you have problems and also use the system to dissect exactly the problem is being originated from.

A number of tools are availeable to you, the main one which I recommend all Ax sysadmins to get familiar with is the SQL tab under the User Options of each User.

This provides an invaluable help in order to get information about what parts of your system are not performing as expected.

What I suggest all sysadmins to do is to as a minimum activate the SQL tracing set a threshold of perhaps 2000 (2 secs it is in milliseconds) and to store long queries and deadlocks in the Table (database).

This will store in the database (accessible under Administration/Inquiries/Database/SQL Trace log the detail of every time a query took more than 2 seconds to execute, including the complete call stack of the process when this happened. You can even jump directly to the offending code and view / edit it.

With a little reflection it is always possible to optimise such queries and to thus reduce your performance issues, it is just a question of thinking about it clearly.

In V4 a few impediments have been placed on your way to having this level of information you have to enable Client Tracing on the AOS instance otherwise the table will be empty as the process will not be called.

Also in V3 KR1+ you have an extended toolkit that you can install, after installing the KR which you should have acces to through customersource or partnersource this allows you to control in detail what is being tracked and also allows you to log blocking behaviour, any process waiting more than 3 seconds on another will automatically log the fact that it is being blocked and thus you can see the effect that the "picadilly peak hour jam" has had on your system.

The good thing about logs is you can be factual about the increase / resolutions that you apply to your system.

Unfortunately MS have not seen fit to extend the above tracing extension into V4 probably because it was a separate development path. I have I think now managed to migrate most of it onto a V4 SP1 based system, the only issue I have is with the blocking monitor, I need to work a little on the SQL side to find out what I need to do exactly in order for it to operate.

Once it is complete I will post it here for you.

Take Care
Sven

2 comments:

Simon Buxton said...

Good

Simon Buxton said...

Good post.

With 64bit SQL and Windows memory issues will eventually dissapear (as long as the customer actually puts a decent amount of memory in the box), leaving disk contention and intellegient choice of indexes.