SharpRobotica.com – Sharp ideas for the software side of robotics

Karto Mapping Goes OSS and now Integrates with ROS!

The past year has been very exciting watching the momentum that’s been building behind Willow Garage’s Robot Operating System (ROS).  While ROS is going in a very good direction, its introduction to the market arguably made it just “that much harder” for developers to select which platform to use for the software backbone of their mobile robotic platform.  For example, when one looks for software options for the implementation of SLAM, there is a plethora of possibilities including numerous proprietary and open-source options.  While nodes have been specifically developed for ROS to provide SLAM capabilities, it’s terrific news that Karto mapping, by SRI International, has gone open-source and, through work with Willow Garage, has been integrated with ROS’ navigation stack.

In addition to being integrated with ROS, going open-source opens the door for this solid SLAM component to become fully integrated with other robotics software backbones such as Orca, Carmen, and Urbi Forge, to name a few.

The original announcement may be found at http://www.ros.org/news/2010/04/karto-mapping-now-open-source-and-on-coderosorg.html.

Enjoy!
Billy McCafferty

Checklist for Developing Message-Based Systems

Even when developing the most basic CRUD application, we ask ourselves a number of questions – whether we realize it or not – during the initial phases of development concerning the architecture and construction of the project.  Where will the data be persisted?  What mechanism will be used to communicate with the database?  How will data from the database be transformed into business objects?  Should separated interfaces and dependency injection be employed to maintain a clean separation of concerns between application logic and data access objects?  What UI components will be leveraged to speed development of the UI layer?  When developing message-based systems (see Message-Based Systems for Maintainable, Asynchronous Development for an introduction) it’s immensely helpful to formalize such questions about how the systems will be designed and developed.  Creating a checklist of such items to decide upon, and formalizing answers for the application context, helps to:

  • Standardize how components (applications participating in the message-based system) will communicate with the messaging middleware;
  • Provide architectural guidance throughout project development concerning how components will leverage message end-points;
  • Define the data model that will be used for sending commands and exchanging data among components;
  • Assist new developers on the team with getting up to speed with maintaining and extending the message-based system; and
  • Support consistency (and predictable maintainability) amongst the components of the system.

This post provides an addendum, if you will (I’m sure you will), to Message-Based Systems for Maintainable, Asynchronous Development, complementing that article with a checklist of questions and architectural topics that should be discussed, decided upon, and/or spiked before beginning development of your budding message-based system.  Accordingly, this does not describe a methodology for developing message-based systems, but simply a checklist of topics which should be taken into consideration before development begins.  Many of the checklist items below serve as good starting points for team discussion and for the development of architectural spikes for demonstrating implementation details.

Messaging Middleware

  • What messaging middleware solution will form the backbone of the message-based system?
  • How are channels addressed in the selected middleware?  (E.g., as a string or as a port number?)
  • How does the nomenclature of the middleware match up to standard elements of message-based systems and design patterns?  E.g., in Robot Operating System (ROS), a channel is called a “topic.”  Likewise, is there middleware-specific nomenclature for elements such as component, publisher, subscriber, point-to-point channel, publish/subscribe channel, message, command message, request-reply message, router, message end-point, and other messaging design patterns?
  • What mechanism is available for peeking at messages going over a point-to-point channel?
  • What utilities are available for viewing active channels, publishers, and subscribers?
  • What other tools does the middleware provide for debugging and observing the messaging system?  Will custom utilities need to be developed to augment development and debugging capabilities?

Integration with Messaging Middleware

  • How will message end-points publish messages?
  • How will message end-points subscribe to a channel for messages?
  • How will message end-points poll a channel?
  • What technique will be used to decouple the component (participating application) from the message end-point when sending a message?  E.g., dependency injection, separated interface, or other?
  • What technique will be used to decouple the component from the message end-point when receiving a message or polling a channel?  E.g., callback, separated interface, or other?
  • Should exceptions raised by the messaging middleware be wrapped in component specific exceptions for looser coupling?  If so, what will be the standard exceptions?  Additionally, how will such exceptions be logged and tracked for debugging; e.g., email alerts, logging, exception message channel?
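
One way to approach the exception question above is to translate middleware errors at the message end-point, so that components only ever see their own exception types.  The sketch below is a minimal illustration in Python; `MessagingError`, `ChannelUnavailableError`, and `GuardedSender` are hypothetical names, and the built-in `ConnectionError` and `TimeoutError` merely stand in for whatever exceptions a real middleware client raises.

```python
import logging

logger = logging.getLogger("messaging")


class MessagingError(Exception):
    """Base class for the errors that components are allowed to see."""


class ChannelUnavailableError(MessagingError):
    """Raised when a message could not be delivered to a channel."""


class GuardedSender:
    """Wraps a middleware send function and translates its exceptions."""

    def __init__(self, channel_name, raw_send):
        self._channel_name = channel_name
        self._raw_send = raw_send  # assumed: a callable supplied by the middleware client

    def send(self, message):
        try:
            self._raw_send(message)
        except (ConnectionError, TimeoutError) as exc:
            # Log in one place so every component reports failures the same way.
            logger.error("send failed on %s: %s", self._channel_name, exc)
            raise ChannelUnavailableError(self._channel_name) from exc
```

Because the original exception is chained with `from exc`, the middleware detail is still available for debugging without leaking into component code.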

Channels and Routing

  • What channels will be created and what data type should each channel carry?
  • Will a hierarchical naming strategy be leveraged for naming and organizing channels?  If so, what are the naming conventions for the project?
  • Has an invalid message channel been set up with a logger to capture and record invalid messages?
  • How should a predictive router be implemented, when needed?
  • How should a reactive router (e.g., a message filter) be implemented, when needed?

Message Elements

  • What data elements may the header of a message contain and how should each be specified?  E.g., return address, message expiration, timestamp, origin, format indicator, etc.
  • How should a message identifier and correlation identifier be generated and included in the header?
  • What is the messaging data model supported and/or enforced by the messaging middleware?  E.g., ROS supports http://www.ros.org/wiki/msg.
  • What canonical data model will be used to imbue message content with domain specific semantic meaning?  E.g., JAUS.
  • How will message translators and/or message mappers be leveraged for converting between each component’s domain model and the canonical data model?
  • How will a command, document, event and request-reply message be composed?

Other Topics

  • How will components maintain information that is associated with a correlation identifier?
  • Are there existing components which need to interact with the messaging middleware which cannot be modified?  If so, how will each communicate with the messaging middleware?
  • How will each component participate with the message based system; e.g., as a polling consumer, as an event-driven consumer, as strictly an event publisher, as a combination of these?
  • Should exceptions raised by a component participate in a global exception tracking and debugging mechanism?

As stated previously, this checklist is not intended to provide a methodology for developing message-based systems, but should serve as a good basis to make sure “T’s are crossed and I’s are dotted” when deciding upon the major architectural aspects and development techniques that will be leveraged throughout the development of your (team’s) message-based system.

Enjoy!
Billy McCafferty

Message-Based Systems for Maintainable, Asynchronous Development

Preface (you know it’s good if there’s a preface)

In Architectural Paradigms of Robotic Control, a number of architectures were reviewed including deliberative, reactive, and hybrid architectures.  Each of these exhibit a clean separation of concerns with layering and encapsulation of defined behaviors.  When implemented, the various capabilities, such as planners and mobility controllers, are encapsulated into discrete components for better reusability and maintainability.  A pivotal aspect not discussed in the previous article is how the various system layers and components communicate with each other, such as reporting sensor feedback and sending commands to actuator controllers.  Effectively resolving this communication challenge is not only important to robotic systems but to many other industries and domains for the successful integration of disparate applications.

To give credit where credit is due, this article pulls quite heavily from the patterns, taxonomy, and best practices presented in Enterprise Integration Patterns (Hohpe, 2003).  This well organized book is chock full of hard learned lessons and solid guidelines for developing maintainable message-based systems.  This article should not be seen as an adequate replacement for that book (it’s more like cliff notes with a spackling of robotics bias); indeed, Enterprise Integration Patterns should have a prominent place on your bookshelf if you’re developing message-based systems – so read this post and then browse http://www.eaipatterns.com/ while waiting for your copy to arrive to delve deeper.

A Need for Message-Based Systems

A few industries in particular, such as finance, healthcare and robotics, are demanding integration of an intimidating number of separate technologies that may be spread across computers, networks, and/or built upon a variety of technological platforms.  Not only is this integration tricky, it can come with a significant cost to performance and maintainability if not implemented correctly.  Accordingly, a solution is needed which facilitates loosely coupled integration while accommodating the performance demands of the task at hand.  Taking a message-oriented approach to inter-application communications is one such way to accommodate these demands in a maintainable manner without sacrificing performance.  This article gives an introduction to developing message-based systems using messaging middleware, describes a taxonomy for discussing messaging topics and patterns, and includes a number of best practices.

Before delving further, it’s important to clarify a few terms that will be used frequently:

  • Messaging Middleware (aka, a message bus):  a 3rd party application which provides messaging infrastructure and capabilities (e.g., MSMQ, MS Concurrency and Coordination Runtime (CCR), Robot Operating System (ROS)),
  • Component:  a stand-alone application or piece of executable code which communicates with the messaging middleware,
  • Message-based system:  the entirety of the system including all integrated components and the messaging middleware.

As stated, messaging provides one means of facilitating inter-component communications.  But as with any design approach, the project requirements must be carefully considered to determine if messaging is the appropriate mechanism for integration.  While messaging is robust and facilitates integration, it also adds complexity and indirection.  So before deciding to use messaging as the means of integration, consider all component integration options including (Hohpe, 2003):

  • File Transfer:  wherein a component produces files of shared data which other components consume,
  • Shared Database:  each component stores and retrieves data from a common database,
  • Remote Procedure Invocation:  each component exposes specific procedures to be invoked remotely for exposing behavior and exchanging data,
  • Messaging:  each component connects to a common messaging system, using messages to invoke behavior and exchange data.

Determining which integration approach is most suitable to your project’s needs is beyond the scope of this article, which focuses specifically on messaging.  In turn, we’ll review important elements of developing a message-based system including:  message channels, messages, message routers, and message endpoints.

Message Channels

When a component sends information to another component in a message-based system, it adds the information to a message channel.  The receiving component then retrieves the information from the message channel.  Different channels are created for each kind of data to be carried; having a separate channel for each datatype better enables receiving components to know what kind of data will be retrieved from a given channel.  Each channel is addressable so that messages can be sent to it and retrieved from it.  How a channel is addressed varies depending on the messaging middleware being leveraged, but it’s usually a port number or a unique string identifier.  As a good practice for keeping channels organized, if string identifiers are available, a hierarchical naming convention may be employed to label channels by type and name; e.g., a channel carrying laser scans might be called “Perception/LaserScans.”

There are two basic kinds of message channels:

  • Point-to-Point Channels (aka – client/server style):  routes messages for components to talk directly with other components; e.g., a remote procedure call to another component.  A message over a point-to-point channel only has a single receiver; so while the sender may not necessarily know who the receiver is, the sender can rest assured that the message will only be received by one receiver – it’s a FIFO queue.
  • Publish-Subscribe Channels (aka – broadcast):  routes messages for components to publish data and an arbitrary number of components to subscribe to that data.  A copy of the message is generated for each subscriber on the channel.

In addition to channels intended to carry information among components, it is a good practice to set up an invalid message channel to which malformed or unreadable messages may be forwarded for logging and to assist with debugging.
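
To make the difference between the two channel kinds concrete, here is a minimal in-process sketch in Python.  It is not how any particular middleware implements channels; the class names and the “Mobility/DriveCommands” channel name are assumptions made for illustration, and a real middleware would carry the messages across processes or machines.

```python
import copy
import queue


class PointToPointChannel:
    """Each message is delivered to exactly one receiver, in FIFO order."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, message):
        self._queue.put(message)

    def receive(self):
        # Removing the message guarantees that no other receiver will see it.
        return self._queue.get_nowait()


class PublishSubscribeChannel:
    """Every subscriber receives its own copy of each published message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(copy.deepcopy(message))


# Channels addressed by hierarchical string identifiers.
channels = {
    "Perception/LaserScans": PublishSubscribeChannel(),
    "Mobility/DriveCommands": PointToPointChannel(),
}
```

Laser scans are published to anyone who cares to listen, while a drive command should be acted upon by a single receiver, hence the choice of channel kind for each.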

Messages

With a function call, a simple parameter or object reference may be passed and retrieved by the invoked method, sharing the same memory space.  But when passing data between two processes with separate memory spaces, the data must be packaged into a “message” adhering to an agreed upon format which the receiver will be able to disassemble and understand.  The sender of the message passes the message via a message channel.  The receiver retrieves the message from the message channel and transforms the message into internal data structures appropriate for the task at hand.

A message is made up of two parts:

  • Header:  describes the data being transmitted and details concerning the message itself; e.g., origin, timestamp information, message expiration (if content is time-sensitive), message identifier, correlation identifier, return address, etc., and
  • Body:  the data content that the receiver is looking to use.
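
The header and body described above might be modeled as follows.  This is only a sketch in Python; the field names and the use of a random UUID for the message identifier are assumptions, and your middleware will likely dictate its own header layout.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Header:
    origin: str
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)
    expires_at: Optional[float] = None      # only set for time-sensitive content
    correlation_id: Optional[str] = None    # set on replies
    return_address: Optional[str] = None    # channel name to reply on


@dataclass
class Message:
    header: Header
    body: Any

    def is_expired(self, now=None):
        """True when the message carries an expiration that has passed."""
        now = time.time() if now is None else now
        return self.header.expires_at is not None and now >= self.header.expires_at
```

A receiver can call `is_expired()` before acting on a message so that stale sensor readings or commands are discarded rather than processed late.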

When sending a message, the sender intends for the message to be used, or responded to, in a particular way.  The intention of the message may be described as being one of the following:

  • Command Message:  invokes a procedure in another application,
  • Document Message:  passes a set of data to another application,
  • Event Message:  notifies another application of a change in state, and
  • Request-Reply:  requests a reply from another application.

Event messages deserve a bit more discussion.  In its simplest form, an event message would simply be informational, letting subscribers know that an event has occurred; e.g., a new laser scan is available.  If subscribers would like details concerning the event, they would send a request-reply to the sender of the event to provide further details; e.g., the laser scan details.  Alternatively, the event could be sent as a document message, informing subscribers that an event has occurred along with the details of that event; e.g., a new laser scan is available with laser scan details included.  The size of the event details and the frequency with which the event occurs should be considered when deciding between publishing simple event messages and document messages as events.

Request-reply messages could also use a bit more describing.  A request-reply is usually implemented as two point-to-point channels.  The first channel delivers the request as a command message, while the second carries the reply back to the requestor as a document message.  To keep the replying component more loosely coupled and reusable, the requestor should include a return address indicating the channel that the replier should use to publish the reply.  After receiving the reply, a challenge for the requestor is to then correlate the reply to the original request.  If the requestor is sending a number of requests in succession, it will likely be difficult to keep clear (if it matters) which request a reply is associated with.  To resolve this, every request may include a unique message identifier that the replier would then include as a correlation identifier.  (A message could have both a message Id and a correlation Id.)  The requestor uses the correlation identifier to “jog its memory” concerning which request the response is for.  Frequently, though, a request-reply is in the context of a particular domain object, such as a terrain map or a bank transaction, and the correlation Id doesn’t include such information.  To assist, the requestor can maintain a mapping (e.g., hashtable) between message Ids and relevant domain object Ids which are related to the original request.  When the reply is received, the mapping may be used to load the appropriate domain objects and take further action, accordingly.
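
The request-reply flow above, including the return address, the correlation identifier, and the requestor’s mapping back to a domain object, can be sketched as follows.  This is an illustrative Python sketch only; the channel names, the `plan_path` command, and the canned reply body are all hypothetical, and plain in-process queues stand in for the two point-to-point channels.

```python
import itertools
import queue

channels = {
    "Planner/Requests": queue.Queue(),  # carries command messages
    "Planner/Replies": queue.Queue(),   # carries document messages
}

_next_id = itertools.count(1)
pending = {}  # message id -> domain object id (the requestor's "memory")


def send_request(terrain_map_id):
    message_id = next(_next_id)
    pending[message_id] = terrain_map_id
    channels["Planner/Requests"].put({
        "message_id": message_id,
        "return_address": "Planner/Replies",
        "body": {"command": "plan_path"},
    })
    return message_id


def serve_one_request():
    """The replier: it only knows the return address found in the request."""
    request = channels["Planner/Requests"].get_nowait()
    channels[request["return_address"]].put({
        "message_id": next(_next_id),
        "correlation_id": request["message_id"],
        "body": {"path": ["A", "B"]},
    })


def receive_reply():
    reply = channels["Planner/Replies"].get_nowait()
    # The correlation id leads back to the domain object the request was about.
    terrain_map_id = pending.pop(reply["correlation_id"])
    return terrain_map_id, reply["body"]
```

Note that the reply carries its own message identifier in addition to the correlation identifier, and that the replier never hard-codes the reply channel.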

Obviously, it is important that the senders and receivers of a message system agree upon the format that messages will take for clear interoperability, better reusability of components, and extensibility of the system.  Consequently, a canonical data model should be well defined that all applications will adhere to.  The canonical data model does not dictate how each application’s domain model must be structured, only how each application must format data within messages.  Message translators are developed to convert the sending application’s domain model into the canonical data model before sending a message; receivers of messages then use their own message translators to translate the message into their own domain.  This mechanism allows applications built on completely different technologies (e.g., C#, Lisp, and C++) to communicate with each other and exchange data.  Many off-the-shelf messaging systems define their own canonical data model which must be adhered to.  For example, the Robot Operating System (ROS), which we’ll be looking at in more detail in subsequent posts, defines its canonical model at http://www.ros.org/wiki/msg.  But the canonical data model need not be limited to defining the types of primitives available and how to include them in messages.

Domain-specific canonical data models may augment message formatting rules, adding semantic meaning to the data within a message.  For example, the Joint Architecture For Unmanned Systems (JAUS) is a set of message guidelines for the domain of unmanned systems, such as autonomous vehicles.  The JAUS guidelines provide domain specific rules for communicating data, such as propulsion and braking commands, sensor events, pose and location information, etc.  To demonstrate, JAUS message types include (Siciliano, 2008):

  • Command:  initiate mode changes or actions,
  • Query:  used to solicit information from a component,
  • Inform:  response to a query,
  • Event set up:  passes parameters to set up an event, and
  • Event notification:  sent when the event happens.
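
Returning to the message translators described earlier in this section, the following Python sketch shows a pair of translators converting between a component’s domain object and a canonical format.  The canonical format here is invented purely for illustration (it is neither JAUS nor the ROS message format), and the `Pose` class and its field names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Pose:
    """A hypothetical domain object owned by one component."""
    x_m: float
    y_m: float
    heading_deg: float


def pose_to_canonical(pose):
    """Sender-side translator: domain object -> canonical message body."""
    return json.dumps({
        "type": "Pose2D",
        "x": pose.x_m,
        "y": pose.y_m,
        "theta_deg": pose.heading_deg,
    })


def pose_from_canonical(payload):
    """Receiver-side translator: canonical message body -> domain object."""
    data = json.loads(payload)
    if data.get("type") != "Pose2D":
        raise ValueError("unexpected canonical type: %r" % data.get("type"))
    return Pose(x_m=data["x"], y_m=data["y"], heading_deg=data["theta_deg"])
```

A receiver written in another language would implement its own `pose_from_canonical` equivalent against the same canonical format, without knowing anything about the sender’s `Pose` class.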

A challenge in dealing with canonical data models is how to handle changes to the model.  In order to support backwards compatibility of existing components when the canonical data model changes, new message channels could be created to carry the messages adhering to the model; e.g., “Perception/LaserScans_V1” and “Perception/LaserScans_V2.”  Alternatively, the existing channels could continue to be leveraged to carry messages adhering to different versions of the canonical data model.  To do so, the message, within its header, would include a format indicator, such as a version number or format document (e.g., DTD) reference.  But if a sender knows that receivers of a particular message are mixed in what format is being used, a component would need to send two messages, one for each version of the canonical data model.  Certainly, this is an important consideration when deciding which components should (or even can) be upgraded to newer formats, and in what order.
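
As a sketch of the format indicator approach, a receiver can select a parser based on a version number in the header.  The two laser scan formats below (centimeters in version 1, meters in version 2) and the `format_version` header field are assumptions made up for this Python example.

```python
def parse_v1(body):
    # Hypothetical version 1: ranges reported in centimeters.
    return [r / 100.0 for r in body["ranges_cm"]]


def parse_v2(body):
    # Hypothetical version 2: ranges reported in meters.
    return list(body["ranges_m"])


PARSERS = {1: parse_v1, 2: parse_v2}


def read_laser_scan(message):
    """Returns the ranges in meters regardless of the sender's format version."""
    version = message["header"]["format_version"]
    if version not in PARSERS:
        raise ValueError("unsupported format version: %r" % (version,))
    return PARSERS[version](message["body"])
```

A message with an unsupported version is a natural candidate for the invalid message channel rather than a reason to crash the receiver.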

Message Routers

While the heavy lifting of the message routing is handled by the messaging middleware itself, there are times when it is useful to augment the middleware with custom message routers to support unique scenarios.

Suppose the destination of a message may change based on the number of messages that have already passed over a channel.  In this scenario, the sender of a message may not know how many messages have been passed over a channel since other senders may have been publishing messages on the same channel.  Consequently, a message router may subscribe to the channel to determine where each message should be forwarded to, based on the described business rules.  Once the destination is determined, the router would then place the message on a subsequent channel to be delivered to the appropriate destination.  This intermediary routing is described as predictive routing as the message router is aware of every possible destination and the rules for routing, accordingly.  If the routing is based on content within the message itself, such as threshold values, then the custom router is known as a content-based router.

A drawback to using message routers is that if the routing rules change frequently, the message router will need to be modified just as often.  To help remedy this, if the rules are expected to change frequently, configurable routing rules (e.g., via XML) could be employed to enable easier management of routing rules.
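
A content-based router with configurable rules might look something like the following Python sketch.  The rules are kept as plain data so that they could be loaded from a configuration file rather than hard-coded; the rule layout, the `distance_m` field, and the channel names are hypothetical.

```python
import queue

channels = {
    "Perception/Obstacles/Near": queue.Queue(),
    "Perception/Obstacles/Far": queue.Queue(),
}

# The first matching rule wins; anything unmatched goes to the default channel.
ROUTING_RULES = [
    {"field": "distance_m", "below": 1.0, "channel": "Perception/Obstacles/Near"},
]
DEFAULT_CHANNEL = "Perception/Obstacles/Far"


def choose_channel(body, rules=ROUTING_RULES, default=DEFAULT_CHANNEL):
    for rule in rules:
        if body[rule["field"]] < rule["below"]:
            return rule["channel"]
    return default


def route(message):
    """Places the message on the channel selected by its content."""
    destination = choose_channel(message["body"])
    channels[destination].put(message)
    return destination
```

Changing a threshold or adding a destination then becomes a matter of editing the rule data, leaving `route` itself untouched.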

Let’s now consider another scenario wherein it’s left up to the subscribers to determine which messages they’re interested in; i.e., subscribers will be responsible for filtering out the messages they’re uninterested in.  In this reactive routing scenario, each subscriber would provide a respective message filter which is similar to a message router, but simply forwards, or does not forward, a message onto a subsequent channel that the destination subscriber is listening to.  Frequently, message filters decide to forward, or not forward, based on content in the message itself; e.g., only forwarding orders that have a coupon included.  While being similar to a message router in basic functionality, a message filter only has one possible channel to forward the message onto.
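
A message filter is simple enough to sketch in a few lines of Python.  The class below is illustrative only; it uses the coupon example from above, with a hypothetical message layout and an in-process queue standing in for the subscriber’s channel.

```python
import queue


class MessageFilter:
    """Forwards a message to its single output channel, or drops it."""

    def __init__(self, predicate, output_channel):
        self._predicate = predicate
        self._output = output_channel

    def handle(self, message):
        if self._predicate(message):
            self._output.put(message)
            return True
        return False


# The subscriber listens on this channel and only sees orders with coupons.
orders_with_coupons = queue.Queue()
coupon_filter = MessageFilter(
    predicate=lambda message: bool(message["body"].get("coupon")),
    output_channel=orders_with_coupons,
)
```

Unlike the router, the filter has no table of destinations; its only decision is whether to forward.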

Deciding between predictive and reactive routing must take into account a number of considerations.  Is the message content sensitive?  Do you need to minimize network traffic?  Do you need to be able to add and remove subscribers easily?  Is the predictive router becoming a bottleneck of message dissemination?  For further guidance on selecting among routing options, see (Hohpe, 2003), pp. 241-242.

Message Endpoints

Each messaging middleware option (e.g., CCR, ROS) has unique requirements for communicating with it.  Each has its own API, its own means of addressing channels, and its own rules for packaging messages.  Ideally, the components of the system should not be aware of the specifics of communicating with the messaging middleware.  Furthermore, while unlikely to occur, the middleware should be able to be replaced with another, requiring little, if any, changes to the components.  Accordingly, the components must be loosely coupled to the messaging middleware.  Message endpoints provide the bridge between the domain of each component and the API of the messaging middleware.

Message endpoints are similar in nature to repositories when communicating with a database.  Repositories encapsulate the code required to communicate with a database to store and retrieve data while being able to convert information from the database into domain objects.  If the database changes, or if the mechanism for database communication changes (e.g., ADO.NET to NHibernate), then, ideally, only the repositories are affected.  The rest of the application knows little about database communications outside of the repository interfaces.  Likewise, components of a message-based system should not be aware of messaging details outside of the message endpoint interfaces, which provide the means to send and receive data.  The message endpoint accepts a command or data, converts it into a message, and publishes it onto the correct channel.  Additionally, the message endpoint receives messages from a channel, converts the content into the domain of the component, and passes the domain objects to the component for further action.  Internally, the message endpoint implements a message mapper to convert between the component domain objects and the canonical data model.

While posting to a channel is rather straight forward, a message endpoint may receive a message by acting as a:

  • Polling Consumer:  wherein the receiver looks at a channel on a regular basis for new messages and/or as soon as it completes the processing of a previous message.  This frees the receiver from having to deal with messages as soon as they arrive on the channel in favor of dealing with messages when it’s ready and willing.  A consideration to keep in mind is that messages may queue up on a channel while waiting to be retrieved by the polling consumer.  Additionally, a polling consumer may take up threads and resources while polling a channel, even if the channel is empty.
  • Event-Driven Consumer:  wherein the message is given to the receiver as soon as it arrives on the channel.  The benefits to this include avoiding messages queuing up while being able to process messages asynchronously.  But the receiver is no longer in control of the timing in which it processes messages and must handle messages as soon as they arrive on a channel.
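
The two styles of receiving can be contrasted with a small Python sketch.  Both classes are hypothetical and run in-process; with real middleware, the polling and the callback registration would go through the middleware’s own API.

```python
import queue


class PollingConsumer:
    """Pulls a message from the channel only when the receiver asks for one."""

    def __init__(self, channel, handler):
        self._channel = channel
        self._handler = handler

    def poll_once(self):
        try:
            message = self._channel.get_nowait()
        except queue.Empty:
            return False  # nothing was waiting, so this poll was wasted work
        self._handler(message)
        return True


class EventDrivenChannel:
    """Pushes each message to the registered handlers as soon as it is published."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, message):
        for handler in self._handlers:
            handler(message)
```

With the polling consumer, messages sit in the queue until `poll_once` is called; with the event-driven channel, the handler runs at publish time whether or not the receiver is ready.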

Many messaging middleware solutions include support for transactions that the message endpoints may leverage, making the endpoints transactional clients.  To illustrate the need for this, suppose the receiver of a request-reply command crashes just moments after the command message is consumed and removed from the channel.  When it recovers, the command message is lost and the sender will never receive a reply.  Using a transaction, the command message is not removed from the channel until the response is completed and sent.  Committing the transaction removes the command message from the channel and adds the reply document message to the reply channel.
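
The idea behind a transactional client can be approximated with the Python sketch below, in which the request is only removed from its channel after the reply has been produced and sent.  This is a simplification under stated assumptions: `TransactionalChannel` and `handle_request` are hypothetical, and real middleware would make the removal of the request and the addition of the reply a single atomic commit, which this in-process version does not.

```python
from collections import deque


class TransactionalChannel:
    """A FIFO channel whose head can be read without being removed."""

    def __init__(self):
        self._messages = deque()

    def __len__(self):
        return len(self._messages)

    def send(self, message):
        self._messages.append(message)

    def peek(self):
        return self._messages[0]

    def commit_receive(self):
        return self._messages.popleft()


def handle_request(requests, replies, handler):
    request = requests.peek()
    reply = handler(request)    # if this raises, the request stays on the channel
    replies.send(reply)
    requests.commit_receive()   # "commit": only now is the request removed
```

If the handler crashes partway through, the request is still at the head of the channel when the receiver recovers, so the sender is not left waiting forever.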

There are a few recommendations which should be considered when developing message endpoints.  In accordance with the Single Responsibility Principle (SRP), message endpoints should be able to receive messages or send messages, but not both in the same message endpoint.  Furthermore, a message endpoint should only communicate with one message channel.  If a component needs to send a message on separate channels, it would leverage multiple message endpoints to do so.  If your components are developed in line with Domain-Driven Design (DDD), it would be each component’s application services layer which would communicate with the message endpoints, preferably via their interfaces instead of concrete instances.  This facilitates swapping out the message endpoints with mock objects for unit testing.  Leveraging separated interfaces and dependency injection helps enable this approach.  While these guidelines introduce more objects and indirection, they are proven practices for increasing maintainability of the component and reusability of the message endpoints.
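
These recommendations can be sketched in Python as a send-only endpoint behind a separated interface, injected into an application service.  All of the names below are hypothetical, an in-process queue stands in for the middleware, and the rule for discarding non-positive ranges is invented just to give the service some logic to test.

```python
import queue
from abc import ABC, abstractmethod


class LaserScanSender(ABC):
    """Separated interface: the component depends only on this."""

    @abstractmethod
    def send(self, ranges_m):
        ...


class QueueLaserScanSender(LaserScanSender):
    """Send-only endpoint bound to a single channel."""

    def __init__(self, channel):
        self._channel = channel

    def send(self, ranges_m):
        # Message mapping: domain data -> message with header and body.
        self._channel.put({
            "header": {"origin": "laser_driver"},
            "body": {"ranges_m": list(ranges_m)},
        })


class FakeLaserScanSender(LaserScanSender):
    """Test double that records what would have been sent."""

    def __init__(self):
        self.sent = []

    def send(self, ranges_m):
        self.sent.append(list(ranges_m))


class ScanService:
    """Application service; the endpoint is injected through the constructor."""

    def __init__(self, sender):
        self._sender = sender

    def report(self, raw_ranges_m):
        valid = [r for r in raw_ranges_m if r > 0]
        self._sender.send(valid)
```

In a unit test, `ScanService(FakeLaserScanSender())` exercises the service logic without any messaging middleware running.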

Performance of Message Based Systems

One would likely be quick to assume that developing a message-based system has a huge cost to performance.  Domain objects are converted to messages, messages are passed over channels, routers and filters intercept and forward messages, and messages are converted back into domain objects…this sounds like a heck of a lot going on.  But because each component executes in its own thread or process, they need not wait for other components to complete their job before being able to move on to handling another message or task.

In the figure above (adapted from Hohpe, 2003, pg. 73), note that a sequential process requires that each message life cycle is completed in full before moving on to the next message.  But in an asynchronous, message-based system, each component can move onto a subsequent message just as soon as it has completed its part in the last one.  This greatly compensates for the extra overhead imparted by the messaging infrastructure.

With that said, there are some component responsibilities which take a long time to complete and may impede the speed by which messages are processed.  For example, imagine a component which takes images from a web cam and extracts information such as human figures or road signs.  This is likely a time consuming process and would impact the turn around time in which the component could process subsequent messages.  To alleviate this bottleneck, multiple instances of the same component may subscribe to the same point-to-point channel.  Recall that a point-to-point channel ensures that each message only has a single receiver.  The instances of the component then become competing consumers of each message.  So if one instance of the component is still processing an earlier message, another instance can grab the next message that arrives for concurrent processing.  Power in numbers!
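
Competing consumers can be demonstrated with Python’s standard `queue` and `threading` modules, as sketched below.  The squaring of each message is a stand-in for the slow image processing described above, and the worker names and the `None` stop signal are conventions chosen for this example.

```python
import queue
import threading


def worker(name, channel, results):
    """One instance of the component; it competes for messages on the channel."""
    while True:
        message = channel.get()
        if message is None:  # stop signal, one is sent per worker
            break
        results.append((name, message * message))  # stand-in for slow work


def process_all(messages, worker_count=3):
    channel = queue.Queue()  # acts as the point-to-point channel
    results = []
    threads = [
        threading.Thread(target=worker, args=("worker-%d" % i, channel, results))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for message in messages:
        channel.put(message)
    for _ in threads:
        channel.put(None)
    for thread in threads:
        thread.join()
    return results
```

Each message is taken by exactly one worker, but which worker handles which message, and the order of the results, will vary from run to run.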

Monitoring and Debugging

Due to the widely asynchronous nature of message-based systems, attention must be given to monitoring and debugging techniques to observe system behavior and iron out problems.

Logging message content is an invaluable measure towards getting a clear look at what communications are taking place.  To facilitate this, logging could be added directly into sending and receiving components, but logging message content should not be a concern of components; accordingly, using a monitoring utility to log such information is a cleaner separation of concerns.  Certainly a benefit of publish-subscribe channels is that a monitoring component may subscribe to all messages and log the information to a file or console window.  While it’s just as valuable to monitor messages on point-to-point channels, if a monitor were to consume a message over a point-to-point channel, the message would be noted as consumed and would no longer be available to the intended receiver.  Because such monitoring capability is so helpful in developing and debugging, many messaging middleware options include a “peek” option which allows a monitoring utility to review message contents on a point-to-point channel without actually consuming the message.  This capability should be taken into consideration when comparing messaging middleware alternatives.

In addition to monitoring message content sent over channels, it’s also helpful to monitor active subscriptions to various channels to accurately determine which components are receiving which messages.  This capability is typically built into messaging middleware solutions and is very useful during development.

It’s possible that even if a message is routed appropriately, the receiving component may not know what to do with the message due to an incorrectly formatted header or body content.  Invalid messages such as this should be forwarded to an invalid message channel which an error logging utility would monitor and log, accordingly.  An invalid message channel is set up just like any other channel but with the intention of exposing such messages for debugging purposes.
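
A receiver-side check that diverts bad messages to an invalid message channel might look like the following Python sketch.  The required header fields and the function names are assumptions for illustration; what counts as “invalid” will depend on your canonical data model.

```python
import queue

REQUIRED_HEADER_FIELDS = ("message_id", "timestamp")  # assumed minimum header

invalid_message_channel = queue.Queue()


def validation_error(message):
    """Returns a description of the problem, or None if the message looks valid."""
    header = message.get("header")
    if not isinstance(header, dict):
        return "missing header"
    missing = [name for name in REQUIRED_HEADER_FIELDS if name not in header]
    if missing:
        return "missing header fields: " + ", ".join(missing)
    if "body" not in message:
        return "missing body"
    return None


def deliver(message, handler):
    """Hands valid messages to the component and diverts the rest."""
    problem = validation_error(message)
    if problem is not None:
        invalid_message_channel.put({"reason": problem, "message": message})
        return False
    handler(message)
    return True
```

An error logging utility would then consume `invalid_message_channel` and record both the reason and the offending message.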

Closing Thoughts

Developing asynchronous, message-based systems requires a paradigm shift from the more traditional, synchronously executed applications that most people are familiar with.  It is not appropriate for every application domain and should be seen as one additional architectural option to consider when developing applications.  But in some domains, such as robotics or in the integration of disparate applications, this approach to development is absolutely pivotal in providing responsive behavior without sacrificing maintainability of the overall system.  Indeed, by splitting responsibilities into discrete components, loosely coupled to each other via messaging middleware, immensely complex problems can be broken down into understandable chunks while being flexible enough to accommodate changes to the underlying middleware or introduction of new components.

In the next couple of posts, we’ll look at a checklist for developing message-based systems followed with examples in CCR and ROS.

Enjoy!

Billy McCafferty


References

Hohpe, G., Woolf, B.  2003.  Enterprise Integration Patterns:  Designing, Building, and Deploying Messaging Solutions.

Siciliano, B., Khatib, O.  2008.  Springer Handbook of Robotics.

Architectural Paradigms of Robotic Control

Introduction

While “architecture” is likely one of the least definable terms in software development, it is unavoidably a topic which has one of the greatest impacts on the extensibility and maintainability of an application.  Indeed, application re-writes are frequently caused by the architectural decisions that were made early in the project, or the lack thereof.  Certainly one of the difficulties in instituting proper architectural practices in a project lies in the fact that architectural decisions must be made carefully at many different levels of project development.  Obviously, the architectural decisions at each level should be made in context of the project requirements in order to facilitate a proper balance of speed of development, scalability of development, and maintainability of the application in the future.

To illustrate, consider a basic eCommerce site vs. a corporate financial management application which integrates with various third party tools.  There are two aspects of architecture which must be carefully considered.  The first is deciding which project contexts will require architectural consideration.  The second is defining the architectural approach to meet the needs of the given project context, and implementing any necessary prefactoring, accordingly.

For a basic eCommerce site, the project contexts requiring architectural consideration would likely include:  appropriate separation between presentation and control, separation between control and the domain, separation between the domain and data access, and appropriate integration with a payment gateway.  Perhaps the active record pattern might be chosen as the data access mechanism.  If inversion of control seems overkill for such a small application, perhaps direct communications to the payment gateway via the domain objects might be chosen as the means of integration.  Accordingly, judicious prefactoring would suggest that an architectural spike be created demonstrating appropriate use of the active record pattern with a selected tool along with an example of how and where integration with the payment gateway would take place.

For the more complex and demanding needs of the financial management application, architecture considerations, in addition to those taken into account for the basic eCommerce site, might include:  what integration patterns might be leveraged to facilitate client integration, messaging patterns that might facilitate server side integration with third party financial calculators, and where inversion of control is appropriate.  Perhaps RESTful services would be included to provide integration support for clients requiring such capabilities.  Messaging via a publish/subscribe mechanism using a composed message processor might be selected as the ideal means for coordinating data with third party vendors on the server side.  Again, proper prefactoring, from an architectural perspective, would include developing architectural spikes along with the foundational pieces of such an application to provide appropriate guidance for developers assisting with the project.

Developing software for robotics is no different, in this respect, than developing any other application.  One must decide where the key architectural decisions need to be made and then provide adequate decisions and guidance to facilitate development in accordance with those decisions.  As stated, these decisions must be made in consideration of the project needs.  This post focuses on one project context, the architectural context of determining the overall approach to robotic motor control.  We’ll delve into three major architectural approaches to motor control along with highlighting a few specific implementations.

Deliberative vs. Reactive Robotics

Certainly the best intelligent systems to study and emulate are those found in nature.  From the humble ant to the exalted human, evolution has honed a variety of strategies for dealing with the physical world.  Accordingly, if we are to create intelligent systems, a good place to start is by emulating the behaviors and responses to stimuli demonstrated by living creatures.

In the early days of robotics, it was presumed that the most effective approach to emulating intelligence was by taking in detailed measurements provided by sensors, creating an internal representation of the world (e.g., grid-based maps), making plans based on that internal representation, and sending pre-planned commands to actuators (devices which convert energy into motion) for moving and interacting with the world.  Inevitably, this approach presumes that the internal representation of the world is highly accurate, that the sensor readings are precise, and that there is enough time to carry out a plan before the world changes.  In other words, this approach is highly deliberative.  Obviously, the world is highly dynamic, sensor readings are sometimes erroneous, and time is sometimes of the essence when interacting with the world.  E.g., you probably wouldn’t want to spend much time creating an internal representation of the world if there’s a Mack truck speeding towards you.

At the other extreme from deliberative is a reactive approach to interacting with the world.  Taking a purely reactive approach does away with plans and internal representations of the world altogether; instead, a reactive approach focuses on using the world as its own model and reacts, accordingly.  Instead of plans, reactive approaches often rely upon finite state machines to modify behavior as the world changes.  Rodney Brooks turned robotics thinking upside down when he introduced this paradigm shift in the 80s.  The robots he and his team produced were much faster in their reaction times, when compared to deliberative robots, and exhibited surprisingly complex behavior, appearing quite intelligent in many scenarios.  But as with any extreme, a purely reactive approach to the world had its own drawbacks; its difficulty in managing complex scenarios which demand careful planning is one such example.

The figure above illustrates some of the differences between deliberative and reactive systems.  Adapted from (Arkin, 1998).

While there are certainly pros and cons to either approach, some scenarios are appropriate to one or the other, while still others may require more of a hybrid approach.  Let’s now take a look at each approach in more detail.

Deliberative/Hierarchical Style

One of the first successful architectural implementations on a mobile robotic platform was the well-known robot Shakey, created at the Stanford Research Institute (SRI) in the 1960s.  Of particular note was the robot’s architecture which was made up of three predominant layers:  a sensing layer, a planning layer, and an execution layer.  Accordingly, as information would be made available by sensors, Shakey would make plans on how to react to the perceived world and send those plans on to the execution layer for low level control of actuators.  This began the architectural paradigm known as sense-plan-act (SPA).  As the SPA approach matured, additional layers were introduced in a hierarchical style, all of which typically had access to a global world model.  The upper layers in the hierarchy would use the world model to make plans reaching into the future.  After plan development completed at the highest levels, the plans would be broken down into smaller commands and passed to lower levels for execution.  Lower layers would then further decompose the commands into more actionable tasks which would ultimately be passed on to actuators at the lowest level.  In a hierarchical approach, the sensing layers participate in keeping the internal representation of the world updated.  Furthermore, to better accommodate dynamic changes to the world environment, lower planning layers may suggest changes to plans based on recent changes to the world model.

The diagram at right roughly illustrates this hierarchical architectural approach.  As should be noted, the sensing layers update the world model while a hierarchy of planning and execution layers formulate plans and break those plans down into actionable tasks.  Adapted from (Kortenkamp, 1998).
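To give a feel for the sense-plan-act cycle, here is a deliberately tiny sketch in which the "world" is a row of numbered cells.  Everything in it (the WorldModel fields, the function names, the forward/reverse commands) is hypothetical and chosen only to show the flow from sensing, to a plan made against the world model, to decomposition into low level commands.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorldModel:
    """Global internal representation shared by the layers."""
    robot_cell: int = 0
    goal_cell: int = 0


def sense(model: WorldModel, robot_reading: int, goal_reading: int) -> None:
    """Sensing layer: keep the world model up to date."""
    model.robot_cell = robot_reading
    model.goal_cell = goal_reading


def plan(model: WorldModel) -> List[int]:
    """Upper planning layer: list the cells to visit on the way to the goal."""
    step = 1 if model.goal_cell >= model.robot_cell else -1
    return list(range(model.robot_cell + step, model.goal_cell + step, step))


def decompose(start_cell: int, waypoints: List[int]) -> List[str]:
    """Lower layer: break the plan down into actuator-level commands."""
    commands = []
    current = start_cell
    for cell in waypoints:
        commands.append("forward" if cell > current else "reverse")
        current = cell
    return commands


def act(model: WorldModel, commands: List[str]) -> None:
    """Execution layer: carry out each command (here, by moving one cell)."""
    for command in commands:
        model.robot_cell += 1 if command == "forward" else -1


model = WorldModel()
sense(model, robot_reading=2, goal_reading=5)
waypoints = plan(model)
commands = decompose(model.robot_cell, waypoints)
act(model, commands)
```

Note that act blindly trusts a plan that was made against a snapshot of the world; if the goal moved while the commands were executing, nothing in this loop would notice until the next sense step, which is exactly the weakness discussed above.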

4D/RCS

The current pinnacle of hierarchical architectures may be found in the reference architecture known as 4D/RCS (4 Dimensional / Real-time Control System).  4D/RCS is the latest version of the RCS reference model architecture for intelligent systems that has been evolving for decades.  With 4D/RCS, six (more or less) layers are defined for creating and decomposing plans into low level action:

  • Unit/Vehicle/Mission Layer:  this layer decomposes the overall plan into actions for individual modules.  Plans are regenerated every 2 hours to account for changes to the world model.
  • Module Layer:  converts actions into tasks to be performed on specific objects and schedules the tasks, accordingly.  Plans are regenerated every 10 minutes.
  • Task Layer:  converts actions on a single object into sequences of moves to carry out the action.  Plans are regenerated every minute.
  • Elemental Layer:  creates plans for each move, avoiding obstacles and collisions.  Plans are regenerated every 5 seconds.
  • Primitive Layer:  determines motion primitives (speed, trajectory) to facilitate smooth movement.  Plans are regenerated every 0.5 seconds.
  • Servo Layer:  sends servo control commands (velocity, force) to actuators.  Plans are regenerated every 0.05 seconds.
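The replanning periods listed above differ by roughly an order of magnitude from one layer to the next.  The short sketch below works through that arithmetic and shows which layers would replan at a given moment.  The fixed, clock-driven schedule is an assumption made for illustration; the names REPLAN_PERIOD_MS, layers_due, and replans_per_parent_cycle are hypothetical and not part of 4D/RCS itself.

```python
from typing import Dict, List

# Replanning periods from the list above, in milliseconds to keep the math exact.
REPLAN_PERIOD_MS: Dict[str, int] = {
    "Unit/Vehicle/Mission": 2 * 60 * 60 * 1000,  # every 2 hours
    "Module": 10 * 60 * 1000,                    # every 10 minutes
    "Task": 60 * 1000,                           # every minute
    "Elemental": 5 * 1000,                       # every 5 seconds
    "Primitive": 500,                            # every 0.5 seconds
    "Servo": 50,                                 # every 0.05 seconds
}


def layers_due(elapsed_ms: int) -> List[str]:
    """Layers whose plans would be regenerated at this instant on a fixed schedule."""
    return [layer for layer, period in REPLAN_PERIOD_MS.items()
            if elapsed_ms % period == 0]


def replans_per_parent_cycle() -> Dict[str, int]:
    """How many times each layer replans during one plan of the layer above it."""
    names = list(REPLAN_PERIOD_MS)
    return {child: REPLAN_PERIOD_MS[parent] // REPLAN_PERIOD_MS[child]
            for parent, child in zip(names, names[1:])}


due_at_five_seconds = layers_due(5_000)
ratios = replans_per_parent_cycle()
```

Each layer regenerates its plan ten to twelve times within a single plan of the layer above it, which is what lets the lower layers adapt to changes that the slower, upper layers have not yet accounted for.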

At the heart of each and every layer are one or more 4D/RCS nodes which contain a behavior generation process which accepts commands from the next higher level and issues subgoals to the behavior generation process at lower levels.  Furthermore, each node reports its task execution status up the chain for consideration into further planning.  While plans are being solidified and disseminated, a sensory processing process in each node receives sensory input from lower levels, updates the world model, and collates the sensory data into larger units which it passes on to the next higher level; e.g., points get converted to lines which get converted to surfaces which get converted to objects with each successive rise through the 4D/RCS layers.  A visual summary of a 4D/RCS node quickly shows just how complex such a system can become.  But sometimes the demands of a task require a commensurate amount of sophistication in the architectural approach.  A comprehensive review of the 4D/RCS approach is discussed in (Albus, 2006).

The primary advantage to deliberative, hierarchical approaches such as 4D/RCS is that competent plans for managing complex scenarios can be generated and broken down into smaller chunks for lower levels to execute.  But an obvious disadvantage to this comes in the form of much added complexity to the overall system while penalizing the speed at which the system can react to a changing environment.

Reactive Style

In the 1980s, Rodney Brooks, in an effort to overcome some of the limitations of sense-plan-act that Shakey and other such robots exhibited, introduced the concept of reactive control.  “Simply put, reactive control is a technique for tightly coupling perception and action, typically in the context of motor behaviors, to produce timely robotic response in dynamic and unstructured worlds.”  (Arkin, 1998).  In other words, by eliminating the reliance on maintaining an internal world model and avoiding large amounts of time generating plans, simple responses can be executed in reaction to specific stimuli, thus exhibiting behavior similar to that of living organisms.  If there was any doubt in what Brooks was trying to imply, he boldly titled one of his works, Planning is Just a Way of Avoiding Figuring Out What to do Next.  This paradigm shift away from planning turned sense-plan-act into a simpler sense-act.

Accordingly, in a sense-act paradigm, the primary focus is in carefully defining behaviors and the environment stimulus which should invoke those behaviors.  This focus is deeply rooted in the idea of behaviorism; from a behaviorism perspective, behavior is defined in terms of stimulus and response by observing what an organism does.  This approach of developing robotics around such behaviors, unsurprisingly enough, is commonly referred to as Behavior Based Robotics.  By leveraging simple state machines, which we’ll examine below, to define which behaviors are active for a given stimulus, the overall architectural complexity is greatly reduced while giving rise to responsive, seemingly intelligent behaviors.  (The question of whether or not the robot is truly intelligent and/or self-aware is a debate that I’ll leave to others such as Searle and Dennett.)  While the reactive approach took the spotlight for a number of years, and is still very appropriate in some cases, its limitations in managing complex scenarios and performing sophisticated planning indicated that this approach was not a panacea for all situations.

The diagram at right clarifies that reactive systems remove planning and deal with sensory input within the context of each behavior.  I.e., “Behavior 1” has no knowledge of the feedback coming from “Sensor 3.”  Adapted from (Kortenkamp, 1998).

A Design Methodology for Reactive Systems

Due to the simplicity of reactive systems, the corresponding design methodology for developing such systems is rather straightforward (Kortenkamp, 1998):

  1. Choose the tasks to be performed.  E.g., find kitchen.
  2. Break down the tasks in terms of specific motor-behaviors necessary to accomplish each task.  E.g., find wall, follow wall, recognize kitchen.
  3. For each action, find a condition under which the action should be taken.  E.g., wall found.
  4. For each condition, find a set of perceptual measurements that are sufficient to determine the truth of the condition.  E.g., long straight line connected to obstacle.
  5. For each measurement, design a custom sensory process to compute it.  E.g., bumper sensor activated, edge length over threshold found.

When implemented, we find that the defined behaviors can be realized as states within a finite state machine and that found conditions act as the mechanism for changing from one behavior state to another.  What’s missing is an arbitration mechanism to determine which behavior wins out in light of competing conditions.  Let’s briefly look at implementation approaches to reactive systems along with how such arbitration is achieved.
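As a small sketch of this idea, the "find kitchen" example can be written as a transition table in which behaviors are states and conditions trigger the changes between them.  The "wall found" condition comes from the example above; "wall lost" and "kitchen seen" are hypothetical conditions added to round out the machine.

```python
from typing import Dict, List, Tuple

# (current behavior, observed condition) -> next behavior
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("find wall", "wall found"): "follow wall",
    ("follow wall", "wall lost"): "find wall",             # hypothetical condition
    ("follow wall", "kitchen seen"): "recognize kitchen",  # hypothetical condition
}


def next_behavior(current: str, condition: str) -> str:
    """Stay in the current behavior unless the condition triggers a transition."""
    return TRANSITIONS.get((current, condition), current)


def run(conditions: List[str], start: str = "find wall") -> List[str]:
    """Feed a stream of perceived conditions through the state machine."""
    behavior = start
    trace = [behavior]
    for condition in conditions:
        behavior = next_behavior(behavior, condition)
        trace.append(behavior)
    return trace


trace = run(["nothing", "wall found", "wall lost", "wall found", "kitchen seen"])
```

Because only one behavior is active at a time here, there is never any competition to resolve; as soon as several behaviors may respond to the same conditions at once, some form of arbitration is required.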

Subsumption & Motor Schemas

A key concern with reactive, behavior based robotics is determining which behavior should take precedence if conditions exist to activate two or more behaviors.  An arbitration mechanism needs to be introduced to resolve such competing situations.  Brooks dealt with this issue by proposing an architecture known as subsumption.  Simply enough, a subsumption architecture still uses a finite state machine to codify behaviors and transitions, but introduces behavioral layers which can provide input to other layers and override behaviors of lower layers.  The advantage is in facilitating more sophisticated behaviors as a sum of “lesser” behaviors.  It’s better understood by reviewing the following example from (Arkin, 1998):

In the example, there are three behavioral layers:  Avoid-Objects, Explore, and Back-Out-of-Tight-Situations.  The Avoid-Objects Layer’s responsibility is to avoid objects by moving away from any perceived obstacles.  The Explore Layer’s responsibility is to cover large areas of space in the absence of obstacles.  The highest level assists the robot in getting out of tight spaces if the lower layers are unable to do so.  Each discrete behavior can invoke transitions to other behaviors and/or provide input or advice to subsequent behaviors based on perceived conditions.  The “trick” of subsumption is that higher levels can suppress commands between lower level behaviors; consequently, higher layers are able to handle more complex scenarios by manipulating lower level behaviors in addition to their own.  To illustrate suppression, note, in the above example, that the “Reverse” behavior in the Back-Out-of-Tight-Situations Layer suppresses any commands that the “Forward” behavior is sending to the actuators.  By doing so, complex behaviors may emerge using a number of basic behaviors and a relatively simple architectural approach.
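The following sketch approximates suppression with simple priority arbitration:  layers are consulted from highest to lowest and the first one with something to say wins.  This is a simplification rather than Brooks’ actual wiring of suppressor nodes between modules, and the percept keys ("obstacle_ahead", "stuck") and command strings are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Percepts = Dict[str, bool]
Layer = Callable[[Percepts], Optional[str]]


def avoid_objects(percepts: Percepts) -> Optional[str]:
    """Lowest layer: move away from any perceived obstacle."""
    return "turn away" if percepts.get("obstacle_ahead") else None


def explore(percepts: Percepts) -> Optional[str]:
    """Middle layer: cover ground in the absence of obstacles."""
    return None if percepts.get("obstacle_ahead") else "forward"


def back_out_of_tight_situations(percepts: Percepts) -> Optional[str]:
    """Highest layer: reverse when the lower layers have gotten the robot stuck."""
    return "reverse" if percepts.get("stuck") else None


# Ordered from lowest to highest layer.
LAYERS: List[Layer] = [avoid_objects, explore, back_out_of_tight_situations]


def arbitrate(percepts: Percepts) -> str:
    """The highest layer producing a command suppresses everything below it."""
    for layer in reversed(LAYERS):
        command = layer(percepts)
        if command is not None:
            return command
    return "stop"


open_space = arbitrate({"obstacle_ahead": False, "stuck": False})
blocked = arbitrate({"obstacle_ahead": True, "stuck": False})
wedged = arbitrate({"obstacle_ahead": True, "stuck": True})
```

Even in this toy version, the ordering of LAYERS is doing all of the work, which hints at why deciding the hierarchy becomes difficult as more layers are added.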

A major challenge in using a subsumption architecture is deciding the appropriate hierarchy of behavioral layers.  In more complex scenarios, it quickly gets sticky deciding which layer should have the right to suppress which other layers.  (Otherwise known as spaghetti-prone.)  Additionally, since the layers have bi-directional dependencies amongst each other, in order to provide input and suppression, changes to layers can have large impacts on other layers, often resulting in shotgun surgery with any change.  Ronald Arkin introduced a subsequent architecture to address these, and other concerns, with an approach known as Motor-Schema based control which does away with arbitrating competing behaviors.  Each active behavior in a Motor-Schema based control system calculates a vector to be carried out by an actuator (e.g., wheels or arm).  Using vector addition, a final vector for each actuator is computed and sent for execution.  With this approach, the output of behaviors is combined instead of arbitrated.  This avoids the need to determine suppression hierarchies and makes a more extensible application, within the limits of behavior based robotics.
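Here is a minimal sketch of combining behaviors by vector addition.  The two behaviors, their gains, and the linear fall-off of the repulsive vector are illustrative assumptions and not Arkin’s exact formulation.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float]


def move_to_goal(robot: Vector, goal: Vector, gain: float = 1.0) -> Vector:
    """Attractive vector of fixed magnitude pointing from the robot to the goal."""
    dx, dy = goal[0] - robot[0], goal[1] - robot[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return (0.0, 0.0)
    return (gain * dx / distance, gain * dy / distance)


def avoid_obstacle(robot: Vector, obstacle: Vector, radius: float = 4.0) -> Vector:
    """Repulsive vector that grows linearly as the obstacle gets closer."""
    dx, dy = robot[0] - obstacle[0], robot[1] - obstacle[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0 or distance >= radius:
        return (0.0, 0.0)
    magnitude = (radius - distance) / radius
    return (magnitude * dx / distance, magnitude * dy / distance)


def combine(vectors: List[Vector]) -> Vector:
    """Sum the outputs of all active behaviors into one command vector."""
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))


robot = (0.0, 0.0)
command = combine([
    move_to_goal(robot, goal=(10.0, 0.0)),
    avoid_obstacle(robot, obstacle=(0.0, 2.0)),
])
```

With the goal straight ahead and an obstacle off to one side, the resulting vector still heads toward the goal while veering away from the obstacle; adding a new behavior means adding one more vector to the list rather than reworking a suppression hierarchy.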

Hybrid Style

Most modern architectural approaches to robotics control attempt to combine the planning capabilities of deliberative systems with the responsiveness of reactive systems.  Appropriately enough, this is referred to as a hybrid style.  While the deliberative approach takes a sense-plan-act perspective and the reactive approach follows with sense-act, a hybrid approach typically takes the form of plan, sense-act.  So while the sense-act layers carry out behaviors, the deliberative planning layer can observe the progress of reactive behaviors and suggest direction based on reasoning, planning and problem-solving.  The diagram at right crudely demonstrates this combination of the two approaches.  Adapted from (Kortenkamp, 1998).

Three-Layer Architecture (3T)

James Firby’s thesis proposing Reactive Action Packages (RAPs) (Firby, 1989) provided a solid approach for integrating deliberative planning with reactive behaviors in the form of a three layer architecture.  A multitude of subsequent architectures have emulated a similar approach, coming to be known as 3T architecture.  From Erann Gat’s essay Three Layer Architecture (Kortenkamp, 1998):

Three-layer architectures organize algorithms according to whether they contain no state, contain state reflecting memories about the past, or contain state reflecting predictions about the future.  Stateless sensor-based algorithms inhabit the control component.  Algorithms that contain memory about the past inhabit the sequencer.  Algorithms that make predictions about the future inhabit the deliberator. … In 3T the components are called the skill layer, the sequencing layer, and the planning [deliberative] layer.

As implied by being a hybrid approach, this 3T architecture is not mutually exclusive with behavior driven robotics.  Indeed, the skill layer itself is made up of unique behaviors which resemble those of reactive systems.  The sequencing layer then uses the world model to determine when a change in behavior is required.  The planning layer can then monitor the lower layers and perform more deliberative, time consuming processes such as path planning.  The primary variation amongst 3T implementations lies in the decision between whether the planning layer “runs the show” or if the sequencing layer takes command and invokes the planning layer only when needed.  Atlantis is one such example that leaves primary control to the sequencing layer to invoke the planning layer.  Other examples of 3T implementations include Bonasso’s Integrating Reaction Plans and Layered Competences through Synchronous Control and Stanford’s Junior, which took 2nd place in DARPA’s Urban Challenge.
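To tie the three layers together, here is a toy sketch in which the sequencing layer is in command and calls on the planning layer only when it has run out of steps.  It is not modeled on Atlantis or any other real implementation; the one-dimensional "world," the class names, and the skill names are all hypothetical.

```python
from typing import Callable, Dict, List

# Skill layer: stateless behaviors mapping the current position to a new one.
SKILLS: Dict[str, Callable[[int], int]] = {
    "forward": lambda position: position + 1,
    "reverse": lambda position: position - 1,
}


class Planner:
    """Planning layer: reasons about the future to produce a sequence of skills."""

    def plan(self, position: int, goal: int) -> List[str]:
        skill = "forward" if goal > position else "reverse"
        return [skill] * abs(goal - position)


class Sequencer:
    """Sequencing layer: remembers what has been done and what remains."""

    def __init__(self, planner: Planner) -> None:
        self.planner = planner
        self.pending: List[str] = []
        self.plans_requested = 0

    def step(self, position: int, goal: int) -> int:
        if position == goal:
            return position
        if not self.pending:
            # Invoke the (slow) planner only when there is nothing left to do.
            self.pending = self.planner.plan(position, goal)
            self.plans_requested += 1
        skill = self.pending.pop(0)
        return SKILLS[skill](position)


sequencer = Sequencer(Planner())
position, goal = 0, 3
steps = 0
while position != goal:
    position = sequencer.step(position, goal)
    steps += 1
```

The skills hold no state, the sequencer holds memory about the past (its pending list), and the planner looks ahead; swapping who calls whom is what distinguishes a planner-led implementation from a sequencer-led one.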

Wrapping Up

The intention of this article has been to provide an introductory overview of paradigms of robotic control including deliberative/hierarchical systems, reactive systems, and hybrid systems.  While each approach is appropriate in select contexts, a hybrid architecture is very adaptable for accommodating a large variety of robotic control scenarios with sufficient planning and reactive capabilities.  The interested reader is encouraged to dig further into the references and links provided to learn more about these various approaches and design methodologies for implementation.

In the next post, we will examine messaging strategies to facilitate the communications amongst the layers and components of robotic systems.

Billy McCafferty


References

Albus, J., Madhavan, R., Messina, E.  2006.  Intelligent Vehicle Systems: A 4D/RCS Approach.

Arkin, R.  1998.  Behavior Based Robotics.

Firby, J.  1989.  Adaptive Execution in Complex Dynamic Worlds.

Kortenkamp, D., Bonasso, R., Murphy, R.  1998.  Artificial Intelligence and Mobile Robots.

© 2011-2014 Codai, Inc. All Rights Reserved