Last year, at a customer running a rather complex Service-oriented architecture, we observed resource demands were increasing, both in terms of server load and personnel hours required for day to day maintenance.
The performance issues were primarily
- services with rather big (50k) requests/responses,
- services with a high volume of very small requests/responses,
- limited number of VMs in cluster (very limited SAN capacity),
- busy-hour traffic profile,
- growing traffic,
- test environment not representative to production
While maintenance / operations mostly struggled with
- invalid requests and responses,
- low log readability;
- a lot of redundant log data (within same Breadcrumb / correlation id),
- deep XML structures,
- XML as text and CDATA (inner XML – yikes),
- various sensitive data exposed in logs,
- weekly remote services hiccups / downtime.
As ‘buy more servers’ was not an option, the silver bullet would be some kind of platform-wide upgrade which would ease strain on the system.
To alleviate some of the logging issues, we had previously created a custom log interceptor, which could both filter and import ‘inner XML’ so that the resulting document was pretty-printed as one, easy-to-read document. It was a DOM-based approach, and while it performed reasonably well, certainly there was a lot of overhead. Multi-threading/statelessness meant that for each request/response, various factories were initialized, and lots of object creation would take place during processing.
And while the logging might seem like a minor factor in the big enterprisy puzzle, relying heavily on external partners means many components do nothing more than proxy external calls; roughly the XML was processed twice, in the SOAP component and in our logging interceptor. Logging overhead meant requests/responses consuming more CPU, increasing the the response time and thus holding on to resources (memory, pooled connections etc) for longer.
Could we do better? Yes. Taking one step back and seeing the big picture, we note that all this XML parsing and serialization for logging purposes is taking place within an existing SOAP stack. A stack to which XML parsing and serialization are key features, and which it does very well – named CXF. Relatively free of risk, we can rely on the existing stack for handling special (error) cases, and thus build a simplified processor optimized for the most commons cases – specific to our logging requirements. Adding to the motivation is also my interest for the JVM runtime, performance testing and indeed stable and predictable systems.
I am happy to announce that two new projects have recently been published on Greenbird’s Github pages – xml-formatter-core and xml-formatter-components. The latter include modules for CXF, Mule ESB and JAX-RS and corresponding examples. The most important features:
- High-performance reformatting of XML
- Advanced filtering options
- Max text and/or CDATA node sizes
- Reformatting of XML within text and/or CDATA nodes (n-dimensional)
- Anonymize of element and/or attribute contents
- Removal of sub-trees
For more concrete details, see these simple examples of reformatted output.
While some of the features are unique in themselves, how about performance? The following metrics were set up:
- none – no logging
- raw – logging without pretty-printing
- cxf-transformer – logging with CXF implementation in Mule ESB 3.2 to 3.6 – Transformer-based implementation
- cxf-stax – logging with the latest CXF implementation – StAX-based implementation
- xf-plain – logging with the new module but without any additional features (CXF drop-in replacement)
- xf-filter – logging with the new module, with a simple SOAP-Header SAML anonymize filter (masks username and password)
Below are some results from the open source load testing tool Gatling. Targeted is a simple SOAP proxy service implemented on Mule ESB community edition version 3.6 with a lightweight Tomcat 8 backend, over 60 seconds with back to back requests, all on a single machine. Note the Req/s column.
1k payloads – 150 users:
For such a small payload, most of the time spent seems to be on initialization and logging itself, rather than pretty-printing. The cxf-transformer is heavily affected.
2k payloads – 150 users:
We’re starting to see the xf-plain pulling ahead, while cxf-transformer is still last.
6k payloads – 125 users:
Still the same, and xf-filter and cxf-stax have approximately the same number of requests per second.
21k payloads – 50 users:
For a larger payloads like this, initialization cost becomes less important than processing capacity. The cxf-transformer gains somewhat on the field. xf-plain has the most requests per second.
Comparing cxf-transformer, cxf-stax and xf-plain, we see that throughput is improved with the new module, and also response times are more consistent. The xf-filter has approximately the same performance as cxf-stax. cxf-transformer is the slowest.
Obviously you should do performance testing on your own stack, and over a longer time span – there were some transients and dips in performance during testing which were not inherently intuitive, although some noise was expected by running on a single machine.
The bottom line: Performance can be improved, for some scenarios by more than 10%, by a rather trivial update to the later Mule ESB versions, if you’re logging pretty-printed SOAP traffic. The new module adds filtering capabilities without taking serious performance hits.