Considerations for sustainable Platform Success


Who is the main stakeholder, customer role or user profile of a platform? When you consider this question deeply, you will find that the main customer personas to focus on are software engineers, architects, developers, builders – the people who build systems, solutions and products on top of your platform. Here we will explore how to make them happy, along with some other success factors.

Low operational qualities as major risk for failure

The operational and developmental qualities of the platform features are crucial to their acceptance in existing and new solutions. However, some general points deserve additional mention because of their system-wide impact.

Typically, the platform will be expected to support the maximum of all solution requirements concerning these qualities: it is fast enough, scales enough, is stable enough and is secure enough to satisfy all solution demands. Realistically, not all requirements will be met, especially not in the first versions of the platform – due to the simple fact that optimization in multiple directions is either hardly possible or associated with increased cost and time-to-market. So you need to manage the solutions’ demands and make transparent which qualities the platform will be able to achieve at which point in time – without over-promising!

So, what are these operational qualities after all? Out of the long list of software system qualities, I believe the following are most crucial for platform success:

  • Security: With ever-rising security threats we, as software architects, need to make sure that we understand the risk profiles and implement a plan of measures that mitigates the risk while balancing security level and cost. A single security incident can put a platform business out of business due to the loss of trust and customer confidence. The damage from loss of trust can be considerably higher than the direct damage of e.g. compromised data.
  • Scalability: A system needs to scale along with the business. It should be able to scale to the required size and performance while avoiding overprovisioning (which leads to unnecessary cost) and underprovisioning (which leads to a bad customer experience). Not being able to scale with a growing business can kill the business.
  • Robustness: Designing for failure and embracing the fact that things go wrong in production leads to a resilient design that can handle the failure of system components at runtime and so preserves a positive customer experience.
  • Usability: The most secure and most robust system has no value when it is unusable from the end user’s point of view. This includes availability of the system and acceptable response times, but also proper user workflows, efficiency and ergonomics of the application. And, most obviously, a feature set that provides actual value.
  • Cost efficiency: In the end the software systems we design are built to enable a business, and this business may be at risk when operational cost is too high. So a good balance between all the other qualities and cost is important.
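
To make the robustness point above a bit more concrete, here is a minimal Python sketch of one common design-for-failure tactic: retrying a transient failure with exponential backoff. It is an illustration only. The names `TransientError` and `call_with_retry` and the default values are assumptions made for this example, not part of any particular platform.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            # Delay doubles with every failed attempt: 0.1 s, 0.2 s, 0.4 s, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that the behavior can be tested without real waiting. In a real system you would also decide which errors count as transient and cap the total waiting time.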

In addition, a new requirement will arise that typically is not named explicitly until it becomes obvious. The term “developer habitability” has evolved in computer science over the last years and names it quite well. It means that the developers of solutions built on top of the platform, we call them builders, should be supported in implementing and operating a concrete system both efficiently and effectively. The platform must be attractive to use; people should like to work with it on a daily basis.

Low developmental qualities as major risk for failure – Focus on developer habitability

There are many aspects to consider here in order to increase the usability for the SW engineers, architects, developers and testers that want to create concrete solutions and products based on the platform.

The acceptance of the platform offering in the developer community will be a major success factor. If solution builders do not accept the platform because it is not usable, the platform is useless. It must be understood that this kind of usability requires other measures besides the “common usability” for end users. Developers look deeply into the APIs; they will identify flaws quickly and have a sense for instabilities and the maturity of a service. And typically developers do not like external dependencies much, so if you want them to use your platform at all, and like it, you need to design it for this group of people.

Let’s discuss some details:

  • Architect and Developer level documentation

Architects need to understand the concepts that the platform uses to be able to build larger solutions and products on top of it. What are the design tactics for the major operational qualities, how does integration work, how is testing supported? That is concept-level documentation. Then, you need to provide developer guidelines concerning the programming model and the technology used, so that simple trials as well as production-level implementations can be done quickly and without extensive searching in forums and search engines. API references must always be up to date with the live APIs.

  • Provisioning of How-Tos and cookbooks for a quick start and a motivating introduction

When developers and architects are confronted with a new technology, they spend some time evaluating and testing it to make a judgment. This time is limited to maybe a few hours. If they do not succeed in building a first minimal setup in that time, they may go away and look for something else. That means the platform documentation should include templates, examples and quick-start guides, and the platform must be stable and functional.

  • Provisioning of working examples in code and reference architectures for recurring challenges

To enable even faster adoption, you need to build working examples for standard use cases, with hints on how to adapt them to the actual needs. Reference architectures are a part of that.

  • Provisioning of tools for efficient development and deployment like intellisense support, error analysis tools, configuration wizards, consoles, command line interfaces, IDE plugins, tool chain support and similar.

Developers use IDEs to write code, run tests, analyze errors, debug and interact with 3rd party platforms. The more accessible your platform is from the developer tools, the easier the adoption will be. You may also consider providing specific tools for diagnosis, tracing and operational monitoring to ease the life-cycle management of the solutions that build on your platform.

  • Focus on essence and simplicity in the platform concepts and architecture in order to avoid steep learning curves

Even very large platforms can be crafted around easy-to-understand concepts and key strategies. Today, people expect a high level of value from a platform that is still easy to use and does not require months of training to get started. Finding the sweet spot here can boost your adoption rate.

  • Focus on a stable and usable API with proper documentation

Good API design and version management is a key skill in SW engineering today. The challenge in a nutshell is this: You want to improve and enhance the platform, and at the same time you do not want to force customers and solution builders into unwanted efforts because you introduced breaking changes in your APIs. People who depend on the platform hate being forced to spend effort on adjusting code just because the platform has changed. So, you need a strategy to provide APIs that are compile-time (technology) and runtime (behavior) compatible over years. You need a deprecation strategy that includes customer notifications. You need to make sure that changes in the platform code do not affect solution implementations, by building test frameworks and using e.g. API versioning with support for old versions. This is a tricky area that is not easy to manage.
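
As a rough illustration of API versioning with support for old versions plus a deprecation notice, consider the following Python sketch. Everything in it is hypothetical: the "get device" operation, the `VERSIONS` registry, the `meta` fields and the sunset date are assumptions made for the example, not a description of a real platform API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ApiVersion:
    handler: Callable[[dict], dict]
    sunset: Optional[date] = None  # None means "not deprecated"


# Hypothetical "get device" operation, offered in two versions.
def get_device_v1(request: dict) -> dict:
    return {"id": request["id"], "name": "pump-1"}


def get_device_v2(request: dict) -> dict:
    # v2 adds a field; v1 stays untouched so existing callers keep working.
    return {"id": request["id"], "name": "pump-1", "firmware": "2.4.0"}


VERSIONS: Dict[str, ApiVersion] = {
    "v1": ApiVersion(get_device_v1, sunset=date(2030, 1, 1)),
    "v2": ApiVersion(get_device_v2),
}


def dispatch(version: str, request: dict) -> dict:
    """Route a request to the requested API version and flag deprecations."""
    if version not in VERSIONS:
        raise ValueError(
            f"unsupported API version {version!r}; supported: {sorted(VERSIONS)}"
        )
    entry = VERSIONS[version]
    meta = {"deprecated": entry.sunset is not None}
    if entry.sunset is not None:
        # Tell callers in-band when this version is planned to be retired.
        meta["sunset"] = entry.sunset.isoformat()
    return {"body": entry.handler(request), "meta": meta}
```

The point of the design is that old and new versions are served side by side, and the deprecation notice travels with every response from the old version, so solution builders learn about the planned retirement long before anything breaks.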

  • Focus on a durable runtime stability, resilience and uptime

Imagine you want to try out a platform and it is not stable while you try it. Imagine you have built a product based on a platform and then, during your customer presentation, the system does not perform because the platform is down. What happens? You kick it out of your dependency list. A platform is designed to host many solutions, and so the impact of instability and downtime is multiplied. Builders want to make sure that the parts they depend on can be trusted to be available.

  • Consistent concepts (one solution per problem) for error handling, configuration and diagnosis

It is a pain for developers when they deal with different services and features within a platform and are confronted with different and inconsistent concepts. Error handling and diagnosis data are prominent examples. In general, the operationally relevant concepts in particular should be the same across the platform. Otherwise, developers will quickly turn away.

  • Focus on expressiveness to diagnose, e.g. when generating error messages

“unknown error occurred, please contact your admin”. If you see that message in the platform when you are with a customer and your product just crashed, you will go crazy. The platform will have problems and errors will come, so make sure that developers and builders get as much information as possible that enables them to either solve the problem directly and quickly or provide qualified and reproducible input to the platform team to help quick and sustainable fixes.
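
A minimal Python sketch of what a more expressive error could carry is shown below. The class `PlatformError` and its fields (error code, originating service, hint, context, correlation ID) are assumptions chosen for illustration. The idea is that all services of a platform share one such error shape.

```python
import uuid
from typing import Dict, Optional


class PlatformError(Exception):
    """Hypothetical error type shared by all services of a platform."""

    def __init__(
        self,
        code: str,
        service: str,
        message: str,
        hint: str,
        context: Optional[Dict[str, str]] = None,
        correlation_id: Optional[str] = None,
    ) -> None:
        super().__init__(message)
        self.code = code                # stable, documented identifier
        self.service = service          # which platform service failed
        self.message = message          # what happened
        self.hint = hint                # what the builder can do about it
        self.context = dict(context or {})
        # Lets the platform team find the matching log entries.
        self.correlation_id = correlation_id or uuid.uuid4().hex

    def to_dict(self) -> dict:
        """Machine-readable form for logs, API responses or support tickets."""
        return {
            "code": self.code,
            "service": self.service,
            "message": self.message,
            "hint": self.hint,
            "context": self.context,
            "correlation_id": self.correlation_id,
        }

    def __str__(self) -> str:
        details = ", ".join(f"{k}={v}" for k, v in sorted(self.context.items()))
        return (
            f"[{self.code}] {self.service}: {self.message} ({details}). "
            f"Hint: {self.hint} Correlation ID: {self.correlation_id}"
        )
```

Compared with “unknown error occurred”, an error like this tells the builder what failed, where, and what to try next, and it gives the platform team a reproducible starting point when a ticket comes in.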

  • Foster a community

When more and more people use your platform, you should consider investing effort in fostering a community. This helps to scale service requests and can enable builders to help each other. It also provides a good channel for structured feedback from a growing user base. Platform engineers should be there to provide high-quality answers to questions quickly, so that developers can maintain a good pace in building a solution.

As a general rule, it is worth investing in platform features that enable solution builders to work without the help of the platform team. If this is not the case, your platform team will be tied up helping users, and this will not scale well. So everything that allows self-service and automated or tool-based support will help you to scale the platform adoption.

Re-Platforming into the Cloud – Focus on operational benefits through managed services

Literally every software system is based on some infrastructure on which it relies and which it uses to build higher value services. Like with the term “platform”, the term “infrastructure” is not sharply defined. The infrastructure that is used in a concrete system can also be provided as platforms – so these two terms may overlap a bit. Other terminology for infrastructure is “middleware” or “generic tools”, even “hardware” or “operating system”.

With “infrastructure” I refer to all parts of a system that are building the foundation for the execution of our business logic. Here are some examples:

  • The hardware platform for compute, storage and networking and its interfaces like interrupts and I/O ports
  • The Virtualization and Isolation layers
  • The operating system
  • A runtime container
  • A service locator
  • A framework for parallel code execution (multi-core)
  • The persistence framework and a database
  • The inter-process communication system (e.g. messaging, RPC, RMI, HTTP…)
  • A business rule engine
  • A web-server with servlet container
  • The .NET Framework
  • The Java 7 SE JDK
  • A scripting engine and interpreter
  • A browser like Internet Explorer
  • A web-app engine
  • A cloud provider’s compute, network and storage services like Amazon’s EC2, VPCs, EBS, S3
  • Distributed databases like Amazon’s RDS and DynamoDB
  • A virtualization stack along with management tools like VMware
  • The network with routers and switches
  • A data analytics and data science system
  • Tools for authentication, authorization and encryption
  • … the list is long …

There is much more to mention. Infrastructure will always include aspects from the physical level, like hardware and its abstraction, as well as quite high-level enterprise aspects like a service oriented infrastructure, messaging or cloud stacks. It contains basic services that all together build the environment in which a concrete business feature will be deployed and operated. There are many technical decisions on the way that all lead to the final setup.

When now a concrete existing solution shall be migrated to make use of a platform and its features, it is important to understand how much of the infrastructure needs to be harmonized in order to allow a migration.

A simple example:

Solution A is using an Oracle Database via a JDBC driver from a native Java application. The Java application is standalone, so it does not need a special runtime container. Its UI is also written in Java (Swing).

The platform was also written in Java, but makes use of a cloud-based distributed container service, e.g. Amazon Elastic Container Service, that handles cluster functionality and scaling. The platform uses the Java Persistence API (JPA) in order to talk to a MySQL database, hosted as a managed service in Amazon RDS. In order to make use of the enhanced scalability and availability that comes along with the new platform, solution A shall be migrated to make use of the platform features and its provided infrastructure.

This example shows that solution A and the platform made different decisions concerning their infrastructure during their development. In order to allow a solution A+ that integrates into the platform, the infrastructure needs to be harmonized – in this case this means a major re-design and a new architecture for the existing solution A.

Harmonizing the infrastructure may lead to substantial changes in existing solutions. However, such a harmonization may also lead to substantial improvements. The major benefits in a harmonization of used infrastructure are:

  • Reduced complexity through reduced technological diversity
  • Reduced need to hold special knowledge for the different tools and frameworks
  • Reduced need for updating and maintaining infrastructure
  • Reduced cost for support and licenses for infrastructure components, chance to negotiate higher volume license agreements or use managed services
  • Easier exchange of people and code between solutions and products
  • Increased level of understanding among solutions and platform teams
  • Chance for leveraging innovations from new technology

However, as outlined above, there might be a specific reason why a concrete product or solution picked a certain infrastructure for its design. Operational qualities are vital to the success of a solution and a migration to another (platform) infrastructure might not be possible because of quality demands. So, again, this needs to be carefully checked.

The concrete effort for a harmonization of infrastructure may be very high, depending on how well the infrastructure was encapsulated and how dependent the business logic is on the infrastructure APIs at code level. Not rarely, such a decision can mean rewriting the business features because of the impact of the technology decision. How well dependencies on external technology are encapsulated is a quality aspect in SW engineering; depending on how it is done, the effort for a change will look very different.
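
What such encapsulation can look like is sketched below in Python, using the standard library's `sqlite3` module as a stand-in for a real database. The names `OrderStore`, `InMemoryOrderStore`, `SqliteOrderStore` and `apply_discount` are hypothetical. The business logic depends only on a small interface, so swapping the persistence technology means writing a new adapter rather than rewriting the business feature.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class OrderStore(Protocol):
    """Hypothetical port: the only persistence API the business logic sees."""

    def save(self, order_id: str, amount_cents: int) -> None: ...
    def load(self, order_id: str) -> Optional[int]: ...


class InMemoryOrderStore:
    """Adapter 1: plain dictionary, e.g. for tests."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def save(self, order_id: str, amount_cents: int) -> None:
        self._rows[order_id] = amount_cents

    def load(self, order_id: str) -> Optional[int]:
        return self._rows.get(order_id)


class SqliteOrderStore:
    """Adapter 2: relational database, all SQL stays inside this class."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL)"
        )

    def save(self, order_id: str, amount_cents: int) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO orders (id, amount_cents) VALUES (?, ?)",
            (order_id, amount_cents),
        )
        self._conn.commit()

    def load(self, order_id: str) -> Optional[int]:
        row = self._conn.execute(
            "SELECT amount_cents FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return None if row is None else row[0]


def apply_discount(store: OrderStore, order_id: str, percent: int) -> int:
    """Business logic: knows nothing about SQL, drivers or hosting."""
    amount = store.load(order_id)
    if amount is None:
        raise KeyError(order_id)
    discounted = amount * (100 - percent) // 100
    store.save(order_id, discounted)
    return discounted
```

If the business logic had instead issued SQL statements directly, every change of database or driver would reach into the feature code, which is exactly the situation that makes a harmonization expensive.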

Let’s look at two more prominent examples from different domains, both of which require substantial work for a migration:

1) A SW system for an embedded device, like a microcontroller-based automation device, was designed for a single-CPU machine. The SW was written on the assumption that the infrastructure is a single-core CPU and that everything runs sequentially. Now, the hardware is upgraded to a 4-CPU board and comes along with a new HW abstraction layer infrastructure. In order to make use of the new hardware and its API, the SW system at hand needs to be migrated to the new infrastructure. This actually means that new interfaces must be used and the SW must be made fit for concurrent and parallel execution. This does not only imply changes in the code logic but also means adapting to a completely new programming model and making use of parallel concepts. For instance, the infrastructure may demand the use of Intel’s multi-core APIs or e.g. the C++11 language features for multi-threading. Our business logic is now not compatible, and a migration to the new HW platform is a substantial change that needs a largely new implementation.

2) A distributed web application was written for installation on dedicated web servers with dedicated databases. It uses a local open source database instance and was built with an early version of the Microsoft .NET Framework. Now, with increasing customer counts, the system needs to grow further. On the other hand, new web applications were built with AWS cloud services and now make use of the scalability and the business model that this cloud offering provides. For these newer web applications, a platform was created which eases the use of AWS and already provides some tools for the target market. In order to make use of this new platform in the existing, old, distributed web application, the web application must be made ready to run on an AWS infrastructure. This introduces new conceptual aspects like messaging, persistence and identity management (provided by AWS APIs) that now need to be adopted by the existing solution. This will also lead to code changes in the existing solution in order to adapt to the new infrastructure. The actual required effort can be very low, and there are strategies to do the re-platforming in a low-risk way. See below for more about this topic.

Of course, there are also examples of easier harmonization tasks that will not have such a substantial impact. Nevertheless, for a realistic calculation of the risks and costs involved, it is really important to understand how much infrastructure will have to change for existing solutions if they are to be based on a new (or existing) platform. And please note: Not a single new feature will be realized with all the effort – this alone poses a real challenge in motivating sponsors for the investment, because you may need to prove to them that it is better to invest in re-platforming rather than in new functionality.

If the infrastructure does not have to change much, the chances for a successful migration will be much higher due to less technical risk and increased economic benefit.

Because of this, an organization may also think of other options that favor a less integrated and more loosely coupled combination of existing solutions and new solutions based on a new platform. Then, we turn the approach from a platform migration task into an application integration task – which may be a more economical alternative.

Mastering Platform-as-a-Service operations, externally and internally

Building and operating your platform does not necessarily mean that you run a platform as a service model. But it makes a lot of sense. Let’s find out why.

Platform approaches are not easy to lead to success, and one of the main reasons is on the commercial side. When you first release your platform, the feature set is small and so the price to use it must be relatively low as well. And the price needs to be considerably lower than the effort a team would spend to build and run the features on their own. You need to attract users to get feedback and establish a customer base. On the other hand, you need to grow your customer base quickly to compensate for all the cost of platform development and operations; exponential user onboarding is an often dreamed-of goal. If you run a specialized platform, e.g. in a PLE environment, you may have a limited set of users. In this case, the price needs to go down while the feature set grows, and you need to prove that the cost is less than developing and operating the features individually in each product.

Platforms are typically a low-margin, high-volume business when focusing on generic functionality and a high-margin, low-volume business when focusing on highly specialized functionality. In any case, you need to know what your cost situation is, and the usage of the platform must have a price. This is pretty clear when you externalize your platform as a business. It is not so clear when it is an internal platform approach. Specifically in this case, it is of paramount importance that you treat internal users like any external customers. There should be no difference, except that you may only have internal customers, which eases a few elements like contracting and billing. Why should you do this? To avoid internal politics that deal with cross-X platform funding and conflicting priorities in the different organizational units.

Here is a short list of best practices that you might want to consider, when developing the platform as a service concept for your platform:

  • Separate the platform functionality into distinct smaller services – one service does one thing well
  • For each of the services define the API, the feature set and a price model
  • Base the prices on usage, based on metrics that are specific for the service. Not all services will have the same cost drivers, so think about where the customer value is and where the cost drivers are and build metrics that reflect that
  • Metrics are collected during runtime for each user of the platform
  • For that you need an account and tenancy concept on the commercial side. Billing is based on consumption per service and per platform account. The technical requirements must be implemented into each of the platform services
  • Even if you only serve internal customers and the bill and the margin might be zero dollars, you still need to be transparent about cost generation per account
  • Run roadmaps and releases per service, not for the entire platform in one block
  • Build the system in a way that users can decide to use some of your services, there should be no obligation to use all of it
  • Prioritize new developments purely on user feedback
  • Be transparent to your entire user base about what the current development priorities are
  • Expect that there will be a continuous ask for more functionality and that you will never be able to implement all of that in a given time frame
  • Manage the complete operation of the services of your platform. You need to provide SLAs to your users on what they can expect
  • Provide transparent metrics on operational KPIs like service uptime at all times
  • Treat all your users’ systems as production systems, even if they use them as test / dev systems

The key element is to run and operate the platform as a set of managed services that all have a clearly defined scope and a price model that is based on consumption. Based on user feedback you will learn if the platform will pay off. Only when the platform costs (development, maintenance, infrastructure, operations, management, …) are covered or exceeded by actual usage * price will the platform sustain. This requires the measurement of the relevant KPIs and clear platform boundaries. Operational reviews are an essential process to check the development of the KPIs and allow proper adjustment.
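
The consumption-based model and the “usage * price covers cost” condition can be sketched in a few lines of Python. The metric names, the prices (in cents) and the classes below are purely hypothetical. A real implementation would need persistent metering, tenancy and a proper billing system.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical price list in cents per unit of a service-specific metric.
PRICE_CENTS_PER_UNIT: Dict[str, int] = {
    "storage.gb_month": 5,
    "messaging.thousand_messages": 40,
}


class UsageMeter:
    """Accumulates consumption per (account, metric) at runtime."""

    def __init__(self) -> None:
        self._usage: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, account: str, metric: str, quantity: int) -> None:
        if metric not in PRICE_CENTS_PER_UNIT:
            raise KeyError(f"no price model for metric {metric!r}")
        self._usage[(account, metric)] += quantity

    def invoice(self, account: str) -> Dict[str, int]:
        """Billable amount in cents per metric for one account."""
        return {
            metric: quantity * PRICE_CENTS_PER_UNIT[metric]
            for (acc, metric), quantity in self._usage.items()
            if acc == account
        }

    def total_revenue(self) -> int:
        """Sum of usage * price over all accounts and metrics, in cents."""
        return sum(
            quantity * PRICE_CENTS_PER_UNIT[metric]
            for (_, metric), quantity in self._usage.items()
        )


def is_sustainable(revenue_cents: int, platform_cost_cents: int) -> bool:
    """The platform sustains only if usage * price covers its total cost."""
    return revenue_cents >= platform_cost_cents
```

For a purely internal platform the prices might be set so that the margin is zero, but the same per-account numbers still provide the cost transparency discussed in the list above.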

Organizational and social challenges around internal platform re-use

When we talk about extracting a platform from existing solutions or motivating new solutions to use one platform, this always somehow implies that we want to change the existing solutions and plans as well. They shall use the platform features once they are available and so stop using their own implementations and concepts. They shall use the platform’s programming model, its infrastructure and maybe also its data model.

What does the platform have to offer? Hopefully a technological innovation, more features and more quality than any individual solution could provide. The question about the justification and the need for such an effort will be raised on different levels along the way. Actually, the change which is introduced here does not only affect technical things like architecture and source code. It affects the people. It affects them in the way they need to think, in the way they are connected with “their” solution and in the way they are convinced that all this is a good idea anyway.

The people involved, also called stakeholders, need to be convinced about the benefits of breaking up existing – and potentially successful – solutions to build something larger and even more successful. The problem here is that the larger goal of a global optimization is often not seen in the local area where a solution was born. There is a natural resistance against the change, especially when the solution owner (e.g. a department) is successful even without any platform. So, questions may come up:

  • Why should we abandon our working solution?
  • Why should we wait for the platform to be completed?
  • Who pays for the extra efforts, we do not see any benefit for our business?
  • Why is the new platform less performing than our “old” solution?
  • 6 months of work and not a single new feature, I will not pay for this!
  • We have been successful with this solution for years and now we should change it? Why?
  • Who ensures that this other platform team does even know what we need?

It all starts with the business cases and incentives. If the departments which own the existing solutions have to sacrifice any portion of their revenue for the sake of “some global optimization”, they will ask for the return on investment. So a business case must be developed with the business owners of the existing solutions that convinces everyone about the economic justification for the project – even if individual people will have to pay more in the first place. If this is not given, there will be no motivation for the solution owners to make any change at all.

This done, the technical aspects will become important. The platform will only be accepted by the solution developers if it provides the correct features in sufficient quality along with acceptable developer habitability (see above). If solution architects or developers do not get support in using the platform or they see increased efforts in adopting it, they will jeopardize the business case with their effort estimations for the migration. And they will find many flaws and issues in the platform if they don’t like the idea that somebody else will provide critical components for their solution over which they do not have full control.

It is not only that the solution architects and developers have to use the new platform APIs to reuse its functionality. It is also the case that they have to give up some of their sovereign rights. Up to now they were fully in control of the code that builds their solution, and now they have external dependencies. Typically, people don’t like this and so they will search for reasons not to do it. It is a human trait which can only be overcome by motivation through conviction. They need to understand and believe in the good of the approach in the first place. That requires intense communication from the very beginning and really good reasoning. All stakeholders need to be involved and should be part of the decision process in order to secure buy-in. If people are just confronted with orders from their superiors – which in the worst case are not even explained – they will form resistance.

External dependencies were mentioned here as a loss of power over one’s own solution. This is also true in another dimension: roadmaps and project planning. A solution that is based on a platform requires a stable release of that platform before the solution can be released. At this point, the technical dependencies turn into business-relevant dependencies when it is about time-to-market and release cycles. The solutions may face external (customer-driven) forces that push for bug fixes, patches and updates. As long as the problems can be fixed on the solution side, there will be no new problem. But platform issues will also be found, which will then pass the customer pressure of every solution down to the platform team. They have to deal with the conflicting priorities and have to service what they provide. The solution teams will experience reduced freedom and autonomy in their development, which does not only have business impact but will also influence the motivation of the team. This should be considered.

The platform team therefore has to come up with a cross-solution, cross-customer roadmap and must be able to negotiate priorities among the solution projects (if known at all, see above, not all platform teams know their users). A strong leadership, transparent decisions and communication skills are vital for success.

In summary, platform ownership means making tough decisions. Communication is crucial when dealing with contradicting requirements, even in situations of high pressure from different solution projects and customers. The organizational settings and responsibilities need to be clearly defined so that all stakeholders know at all times who can make the final call if tough decisions must be made. Decentralization in the platform team, e.g. by building service teams that each have a clear but rather small scope, may help to some extent; these teams can work on local priorities and so avoid decision paralysis.

Summary

Software platforms are software systems, and so all the best practices in software engineering apply as well. In platform development, however, you need to specifically consider the operational qualities of the system at runtime, ensuring platform stability while scaling to thousands of clients that use your platform. And you need to specifically consider who your customer is, the developers that build solutions on top of your platform, and serve them well. These two areas come with complex and expensive challenges. If you master them, you might be on a great path to build a sustainable and successful platform system.

We also discussed the need to run your platform as a service, with a business model, measured with KPIs and delivered as a managed asset to your users, no matter if your users are internal (the same organization) or external. It is crucial for the platform team to be able to prioritize work based on customer value and have a clear understanding of the cost and price of platform services. With this, the platform approach is measurable and every (potential) user of the platform can decide to use or not to use it based on facts. This is essential to make the platform a commercial success.

If you are considering running a platform that should consolidate existing solutions or products, factor in the social challenge.

