Thursday, July 25, 2013

Oracle SOA Blackbelt training June 2013 Berlin

The Oracle SOA Blackbelt training in Berlin in June 2013 gave me some valuable new insights into various topics related to Oracle SOA Suite. Below are some examples of things I found interesting enough to share. Most are not taken literally from the slides but written down in my own words. Some examples have been expanded with additional resources I've found. This is not a complete list, as the training covers quite a lot of material. I've focused on topics which can be implemented or considered relatively easily. The training also covered a lot of background which is harder to summarize and turn into concrete practices or suggestions. The topics are varied and not written down in any particular order.

SOA Best practices

The presentation on SOA best practices contained a lot of good suggestions. These are a few of them.

Problem: Overuse of dehydration causes a lot of overhead. Examples: synchronous non-idempotent services, multiple mid-process receives, dehydrate/wait activities in processes.
Recommendations: Avoid chattiness. Design services to be idempotent. If possible, avoid asynchronous services (callbacks cause thread/transaction overhead).
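
To make the idempotency recommendation concrete, here is a minimal sketch in Python (not SOA Suite code). The service name, the request id and the in-memory store are all my own assumptions for illustration. The idea is that a repeated call with the same request id returns the stored result instead of repeating the side effect.

```python
class PaymentService:
    """Hypothetical service; the name and the request id key are illustrative only."""

    def __init__(self):
        self._results = {}   # request_id -> result of the first successful call
        self.executions = 0  # counts how often the side effect really ran

    def process(self, request_id, amount):
        # A repeated request with the same id returns the stored result
        # instead of executing the side effect a second time.
        if request_id in self._results:
            return self._results[request_id]
        self.executions += 1
        result = {"request_id": request_id, "status": "BOOKED", "amount": amount}
        self._results[request_id] = result
        return result


service = PaymentService()
first = service.process("req-1", 100)
retry = service.process("req-1", 100)  # for example a retry after a timeout
```

Because a retry is harmless, the caller does not need extra state to find out whether the first attempt succeeded.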

Problem: Using FlowN where N is unconstrained can cause resource problems and lack of control.
Recommendation: Do not base N in FlowN on the data. Design the process using the driver/worker pattern (the driver hands small chunks to the worker and the worker processes them). This can for example be implemented by using queues for decoupling/performance.
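
The driver/worker pattern can be sketched as follows in Python (an illustration of the pattern, not of how it would be built in BPEL). The worker count, chunk size and the doubling "work" are assumptions I made up. The point is that the number of parallel workers is fixed and does not depend on the size of the input.

```python
import queue
import threading

WORKER_COUNT = 3  # fixed; does not depend on the amount of data
CHUNK_SIZE = 4    # size of the small chunks handed to the workers


def driver(items, work_queue):
    # The driver splits the input into small chunks and puts them on the queue.
    for start in range(0, len(items), CHUNK_SIZE):
        work_queue.put(items[start:start + CHUNK_SIZE])
    # One sentinel per worker signals that there is no more work.
    for _ in range(WORKER_COUNT):
        work_queue.put(None)


def worker(work_queue, results, lock):
    while True:
        chunk = work_queue.get()
        if chunk is None:
            break
        processed = [item * 2 for item in chunk]  # placeholder for the real work
        with lock:
            results.extend(processed)


def run(items):
    # The bounded queue decouples driver and workers and limits memory use.
    work_queue = queue.Queue(maxsize=WORKER_COUNT * 2)
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(work_queue, results, lock))
        for _ in range(WORKER_COUNT)
    ]
    for thread in workers:
        thread.start()
    driver(items, work_queue)
    for thread in workers:
        thread.join()
    return results
```

Calling run(list(range(10))) processes ten items with three workers; the same three workers would handle ten thousand items.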

Problem: Asynchronous services cause overhead. This can become a problem if large numbers of asynchronous processes are waiting for a response: every callback needs a new thread/transaction and has to be matched against a correlation table, which takes longer when there are a lot of open processes.
Recommendation: Design processes to be synchronous as much as possible. Avoid nesting asynchronous processes. Also avoid synchronous processes calling asynchronous processes.

Problem: A single BPEL process does batch processing of a large number of messages. This takes a lot of memory and causes a lot of overhead for storing audit information.
Recommendations: Put the work to be done in a separate BPEL process and optimize this process. Design for worst-case scenarios. Implement retry mechanisms in fault policies. Implement your own scheduling mechanism to spread the load. If no message-level processing is needed, ODI might be an option.

Problem: Scope variables are dehydrated, and when the variables become large this causes overhead.
Recommendations: Use local variables whenever possible. Assign only portions of the message to scope variables.

Problem: BPEL is meant for service orchestration. It's not a procedural programming language.
Recommendations: Use declarative constructs instead of elaborate custom constructions. Use the skip condition instead of if statements. Use assertions before and after invokes. Use pick activities to time responses.

Problem: Identifying BPEL processes can be difficult due to the lack of business content in the EM views.
Recommendation: Set the composite title to a business value; it is possible to search for this name. Business transaction keys and sensors (both have to be custom implemented) can be used to identify a flow instead of only the ECID, since the ECID cannot always be traced back to a business context.

Diagnostics

I usually look in the log files and in the Enterprise Manager if something goes wrong. There are however several other options:
- Creating dumps. The following provides a nice overview: http://docs.oracle.com/cd/E25178_01/admin.1111/e10226/soacompapp_diag.htm#BABJFIFG
- Collecting information from the MBean browser: oracle.as.soainfra.bpm; Server/bpel:CubeDispatcher ReadXMLDispatcherTrace and oracle.as.soainfra.bpel; Server/BPELEngine:SyncProcessStats and AsyncProcessStats

Database growth (BPEL/BPM)

A concern for managing SOA Suite installations is the growth in database size. It is essential to think about a cleaning strategy. What I learned during the training is that with some programming practices the amount of information saved can also be reduced.

What causes growth:
- creating process instances
- updates to a message payload (workflow)
- asynchronous operations
- process scopes, task assignments, looping back to tasks
- audit (entry and exit of scope and model elements)

Thus it can be more efficient to build larger processes instead of multiple smaller ones. Scoping and the use of a lot of model elements also cause the database size to increase. For example, it could be more efficient in terms of database space (untested) to create a single assign activity with a lot of actions in it instead of several smaller assign activities.

Authorization, authentication and policies

Oracle Platform Security Services provides an abstraction layer on top of authorization/authentication/role/group providers. Oracle BPM uses users/groups, but also application roles. The users/groups can be stored by LDAP providers. If there are multiple LDAP providers, they can be virtualized by Oracle Virtual Directory (OVD). When OVD is not available at a customer, libOVD can be used as a limited, lightweight alternative. See for example http://fusionsecurity.blogspot.nl/2012/06/libovd-when-and-how.html. Application roles are stored in a policy store, which can also be LDAP based. See for example http://docs.oracle.com/cd/E12839_01/core.1111/e10043/cfgauthr.htm. Another option is a database-based policy store. See for example http://redstack.wordpress.com/2011/10/29/soa11g-database-as-a-policy-store/. Identities can be queried via the browser, for example at http://localhost:7001/integration/services/IdentityService/identity.

Transactions

The Blackbelt training contained a presentation on the BPEL Engine internals. This provided some additional points to pay attention to when developing. There are 4 'types' of BPEL processes. These 4 types can be categorized by 2 properties: synchronous or asynchronous, and durable or transient. The different types behave differently with respect to transactions and threads. This has consequences for exception handling/propagation. The following should be avoided: a transient asynchronous process and a durable synchronous process. Transaction semantics can also have consequences for performance. See for example http://javaoraclesoa.blogspot.nl/2013/06/oracle-soa-11g-bpel-transaction.html
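
The four types are simply the combinations of the two properties. The small Python sketch below only enumerates them and flags the two combinations mentioned above as ones to avoid. The function and field names are my own and not part of any Oracle API.

```python
from itertools import product

# The two combinations the text above says should be avoided.
DISCOURAGED = {("asynchronous", "transient"), ("synchronous", "durable")}


def process_types():
    """Enumerate the 2 x 2 combinations of interaction style and persistence."""
    types = []
    for interaction, persistence in product(
        ("synchronous", "asynchronous"), ("transient", "durable")
    ):
        types.append(
            {
                "interaction": interaction,
                "persistence": persistence,
                "discouraged": (interaction, persistence) in DISCOURAGED,
            }
        )
    return types


for process_type in process_types():
    label = "avoid" if process_type["discouraged"] else "ok"
    print(process_type["interaction"], process_type["persistence"], label)
```

This leaves the synchronous transient process and the asynchronous durable process as the two combinations that are not flagged.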

The Event Delivery Network

Events can be published with different settings: guaranteed delivery and once-and-only-once (OAOO). With the guaranteed delivery setting, local (non-XA) transactions are used, the EDN_EVENT_QUEUE is used and duplicate messages are possible (in case of catastrophic failures). The dequeue transaction is committed when all subscribers have received the message. Retries are not possible if one subscriber fails to pick up the message. The once-and-only-once setting works differently. A second queue is used: EDN_OAOO_QUEUE. Each subscriber picks up the message in its own transaction. An XA connection is used with global transactions and the dequeue action can be retried.

The EDN can be debugged in several ways. When the EDN is AQ based (which is the default), the EDN servlet can be used: http://<host_name>:<port_number>/soa-infra/events/edn-db-log. This servlet uses the EDN_LOG_MESSAGES table in the SOAINFRA schema. The following loggers are related: oracle.integration.platform.blocks.event, oracle.integration.platform.blocks.event.saq, oracle.integration.platform.blocks.event.jms. Their log level can be tuned in the Enterprise Manager. The delivery of messages can be paused by setting the 'Paused' property of oracle.as.soainfra.config/EDNConfig:edn in the System MBean browser.
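
The difference in retry behaviour between the two delivery settings can be illustrated with a toy model in Python. This is my own simplification and not how the EDN is implemented. The subscriber names, the flaky handler and the max_attempts parameter are all assumptions.

```python
def make_flaky(failures):
    """Return a hypothetical subscriber that fails the first `failures` calls."""
    state = {"left": failures}

    def handler(event):
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("subscriber temporarily unavailable")

    return handler


def deliver_guaranteed(event, subscribers):
    # Toy model: one dequeue for all subscribers and no retry, so a
    # subscriber that fails to pick up the event misses it.
    delivered = []
    for name, handler in subscribers.items():
        try:
            handler(event)
            delivered.append(name)
        except RuntimeError:
            pass
    return delivered


def deliver_oaoo(event, subscribers, max_attempts=3):
    # Toy model: each subscriber dequeues in its own transaction, so a
    # failure only affects that subscriber and the dequeue is retried.
    delivered = []
    for name, handler in subscribers.items():
        for _ in range(max_attempts):
            try:
                handler(event)
                delivered.append(name)
                break
            except RuntimeError:
                continue
    return delivered


guaranteed = deliver_guaranteed("OrderCreated", {"a": lambda e: None, "b": make_flaky(1)})
oaoo = deliver_oaoo("OrderCreated", {"a": lambda e: None, "b": make_flaky(1)})
```

In this model the flaky subscriber misses the event with guaranteed delivery but receives it on the second attempt with OAOO.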

Local Invocation Optimization

If certain criteria are met, the SOAP/HTTP layer can be skipped when calling a service. The criteria are: the processes have to be on the same server, and the client/server policies must allow it (this is a property in the policy file). The same-server requirement has implications for the use of load balancers in cluster configurations. Check the following part of the documentation for more details on how the 'same server check' is performed: http://docs.oracle.com/cd/E28271_01/admin.1111/e10226/soainfra_config.htm. You should also check out https://forums.oracle.com/thread/2302988 for more information on how to make sure local optimization is used in clustering/load balancing setups. A recommendation is to avoid too many small processes, since this increases complexity (in order to achieve local optimization) and overhead.

BPEL fault handling best practices

The following best practices were mentioned in the training (I've rephrased them for brevity):
- always have a catch-all block (selectionFailure for example cannot be caught by using a fault policy)
- use named exceptions for business faults
- when using fault policies, always have a default action
- rethrow faults from fault policies in order to catch them in a BPEL process (when no fault handling action has been defined in the policy)
- notify the source system something has gone wrong
- think about how enabling automatic recovery can affect transactions and business functionality. Automatic recovery can also be disabled; see http://www.albinsblog.com/2011/10/oracle-soa-suite-11g-disabling-auto.html#.Ue_sxm3JWVA
- for asynchronous processes, after a mid-process receive, check the response (which can contain business faults) and terminate the process on error after sending a message to a notification service

MDS

An efficient way to use the MDS is to use a local file-based repository during development and the database-based MDS at runtime. MDS configuration is stored in adf-config.xml files. MDS files can be referred to by a path: <Store_Root>/<Partition>/<Namespace>/<Resource>. When an MDS object changes, all dependent resources need to be recompiled. The MDS can be used to avoid server startup issues due to dependencies. See https://blogs.oracle.com/aia/entry/aia_11g_best_practices_for_dec.
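
As a small illustration of the path layout, the Python helper below joins the four parts in the order given above. The helper name and the example values (store root, partition, namespace and file name) are made up for illustration and will differ per installation.

```python
def mds_path(store_root, partition, namespace, resource):
    """Join the parts as <Store_Root>/<Partition>/<Namespace>/<Resource>."""
    parts = [store_root.rstrip("/")]
    # Strip stray slashes so the parts can be supplied with or without them.
    parts += [part.strip("/") for part in (partition, namespace, resource)]
    return "/".join(parts)


# Hypothetical values for a local file-based store during development.
example = mds_path("/opt/mds", "soa-infra", "apps/common/xsd", "Customer.xsd")
print(example)
```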

Mediator

The Mediator is the only product which has an out-of-the-box resequencer to provide ordering of messages. See for example http://docs.oracle.com/cd/E17904_01/integration.1111/e10224/med_resequencer.htm for details. It is currently also the only component supporting Schematron-based validations (see for example http://beatechnologies.wordpress.com/2011/04/06/using-schematron-in-oracle-soa-suite-11g-for-validating-xml/). The Mediator has a heartbeat infrastructure: if one instance of a Mediator in a clustered environment fails (for whatever reason), this is detected and the other node in the cluster will process the message. Information on the message 'lease' is stored in the table MEDIATOR_CONTAINERID_LEASE. A message is locked when processing starts and released afterwards. The heartbeat framework can be configured. See http://docs.oracle.com/cd/E14571_01/integration.1111/e10226/med_config.htm#BABEDHBJ. Sequential routing rules are executed in a single transaction and thread. Parallel routing rules use 3 types of threads: inbound threads, locker threads and worker threads. Each worker thread uses its own transaction. Parallel routing rules can be debugged by looking at the MEDIATOR_DEFERRED_MESSAGE table. One row is one message in a parallel routing rule.

Oracle Service Bus

Local transport can be used for internal service chaining (for example when calling reusable components). Local transport services cannot be invoked from outside the service bus and are not published to a UDDI. See for example http://docs.oracle.com/cd/E23943_01/dev.1111/e15866/local.htm. The split/join pattern which can be implemented in the OSB uses a BEA BPEL implementation and works in-memory (not persistent). The service bus is minimalistic: by default, most features are turned off and the focus is on performance/high throughput. Oracle BPEL, in contrast, is more maximalistic, and if you want high performance there you should turn things off.

Cube engine internals

Work items for the same instance are not allowed to execute concurrently. This implies that parallel execution in, for example, for-each/while loops is 'simulated' and not truly parallel. This knowledge helps in understanding the behaviour of the nonBlockingInvoke setting. See http://docs.oracle.com/cd/E23943_01/core.1111/e10108/bpel.htm#ASPER99890. The nonBlockingInvoke setting creates new threads for invocations, but the invocations are still executed in sequence. In practice, this leads to performance degradation.
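
The effect of "one work item per instance at a time" can be shown with a toy engine in Python. This is my own illustration and not the actual cube engine. The class, the per-instance lock and the sleep that stands in for an invocation are all assumptions. Several threads are started for the same instance, yet they never run at the same time.

```python
import threading
import time
from collections import defaultdict


class ToyEngine:
    """Hypothetical engine that serializes work items per instance id."""

    def __init__(self):
        self._guard = threading.Lock()                # protects the bookkeeping
        self._locks = defaultdict(threading.Lock)     # one lock per instance
        self._active = defaultdict(int)               # work items running now
        self.max_concurrent = defaultdict(int)        # highest value observed

    def execute(self, instance_id, work):
        with self._guard:
            instance_lock = self._locks[instance_id]
        # Only one work item per instance passes this lock at a time.
        with instance_lock:
            with self._guard:
                self._active[instance_id] += 1
                self.max_concurrent[instance_id] = max(
                    self.max_concurrent[instance_id], self._active[instance_id]
                )
            try:
                work()
            finally:
                with self._guard:
                    self._active[instance_id] -= 1


def run_branches(engine, instance_id, branch_count):
    # Each 'parallel' branch gets its own thread, like nonBlockingInvoke does.
    threads = [
        threading.Thread(
            target=engine.execute, args=(instance_id, lambda: time.sleep(0.01))
        )
        for _ in range(branch_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


engine = ToyEngine()
run_branches(engine, "instance-1", 5)
print(engine.max_concurrent["instance-1"])
```

The extra threads add scheduling cost without adding parallelism, which fits the performance degradation mentioned above.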