The purpose of this document is to provide DataStax Enterprise software implementation approaches, as well as guidance for non-technical team members who lead and support those solutions. The intended audience of this document is: Organizational Leaders, Program/Project Managers, Architects, Team Leaders (Management), and other team members who lead and manage people and their projects. That is, this document is geared to the people who must answer the question: who does what? Even though this is not a technical document, engineering and technology-focused individuals may find it beneficial for project execution.
The document is broken into the following sections:
- Introduction
- High-level Project Approach
- Risk Management
- Suggested Skills
- Execution Preparation Matrix
- Conclusion
Today’s large, established, and traditional enterprises are experiencing consumer-driven pressure to transform the way in which the enterprise interacts with its customers. These businesses are turning to DataStax Enterprise as the technology that enables the business process transformation. Most of these companies have moved from more traditional business practices to Internet Enterprise business practices. The industry standard to date includes business transactions that are dictated by well-defined, strictly-controlled customer interactions. The trend is shifting away from business practices which control how a customer buys, communicates, and receives services to business practices that promote immediate responses to customer demands.
Like any business transformation initiative, the method of execution is almost as vital to the transformation's success as the technology used to enable it. The DataStax Enterprise platform is proven technology that can offer the right responses to the modern customer’s demands. This guide seeks to provide a proven approach to implementing this technology, resulting in the desired business transformation for large enterprises.
This document provides a non-technical, comprehensive approach for the implementation of DataStax Enterprise to efficiently and effectively achieve the enterprise business transformation. The sections of this document have been deliberately selected to help enterprise leaders understand key project success factors and, importantly, the key risk items that must be carefully managed to avoid undesired setbacks during implementation.
The high-level approach for implementing a system on the DataStax Enterprise platform is similar to the approach for developing any distributed, customer-facing, revenue-generating (i.e. mission-critical) application. For example, planning, communication, and execution are all required for the success of mission-critical applications, and they are equally required for implementing systems on DataStax Enterprise.
The following graphic highlights the unique/required steps in a high-level approach for implementing a solution on DataStax Enterprise.
Note: A few key project lifecycle phases, namely Discovery, Planning, and Production Deployment, are deliberately omitted from this diagram because they contain no DataStax-specific items. Further items not depicted in this graphic, such as application and functional/security requirements, are assumed to be included in the development approach.
This diagram depicts a methodology-agnostic approach to project implementation. These phases could be included as major milestones within a Waterfall, Agile, Kanban, or other project management methodology.
The following list provides detail and context for the high-level approach diagram.
- Requirements Phase
  - DataStax Milestones:
    - Data Model Requirements
    - Security and Encryption Requirements
    - Service Level Agreements
    - Operational Requirements (Monitor and Manage)
    - Search Requirements (Optional – Only for DataStax Search)
    - Analytics Requirements (Optional – Only for DataStax Analytics)
The pervasive sentiment in the Apache Cassandra community, as well as in the DataStax Enterprise community, is that one of the keys to success is "getting the data model right". Producing a scalable data model requires specific, well-defined data model requirements.
For next generation, transformation, or upgrade projects, a great starting point for data model requirements is to enable query-level logging in the existing database. Then sort the query logs by frequency of occurrence, starting with the most frequently executed queries. These queries will provide most, if not all, of the requirements needed to produce the data model for DataStax.

For new application/functionality requirements, treat the requirements phase of the project as you would any API requirements effort. That is, define specific Create, Read, Update, and Delete (CRUD) requirements with a special focus on the Read requirements. Specific requirements for the WHERE or BY clause of read operations are required for successful data model design.
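One possible way to make a Read requirement concrete is to write down, for each access pattern, the exact query the application will run. The following minimal sketch uses the DataStax Java driver; the contact point, the retail keyspace, the orders_by_customer table, and the column names are hypothetical assumptions for illustration only, not a prescribed design.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ReadRequirementSketch {
    public static void main(String[] args) {
        // Connect to a (hypothetical) cluster node and keyspace.
        Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
        Session session = cluster.connect("retail");

        // Requirement R1 (example): "Fetch the 20 most recent orders for a customer."
        // The WHERE clause names the lookup key (customer_id), which is exactly
        // the level of detail the data model design needs.
        PreparedStatement recentOrders = session.prepare(
            "SELECT order_id, total FROM orders_by_customer "
          + "WHERE customer_id = ? LIMIT 20");

        ResultSet rs = session.execute(recentOrders.bind("customer-42"));
        for (Row row : rs) {
            System.out.println(row.getString("order_id") + " " + row.getDecimal("total"));
        }
        cluster.close();
    }
}
```

Capturing each Read requirement at this level of specificity (key columns, ordering, result limit) makes the later table design largely mechanical.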
DataStax Security and Encryption requirements encapsulate the following areas:
- Authentication Requirements (i.e. Kerberos, Password, SSL, LDAP, etc.)
- Authorization Requirements (i.e. access to Schema, Table, or other database components)
- As DataStax Enterprise is a distributed system, encryption requirements should be defined at 2 distinct levels (note, compression design choices will occur at this level as well):
  - Client Application to DataStax (the Cluster)
  - Node-to-Node (Inter-Cluster)
Defining Service Level Agreements (SLAs) for each CRUD operation (in terms of latency measured in milliseconds), as well as for system uptime, is highly recommended to guide the design and delivery of the solution. An absence of SLAs is a project management failure and carries a high probability of increased project duration and decreased product quality.
Chances are that you are working to build a mission-critical application that will function at a very large scale, serving millions or more customer requests per day. Defining the requirements for the operational monitoring and management of the system is highly recommended during this phase of the project. There is a large risk that post-production system issues will go undetected, or will require considerably more time and effort to resolve, if clear operational requirements are absent from the onset of system implementation.
If the project’s scope includes DataStax Search components, then, similarly to data model requirements, search requirements are required at this stage to provide enough clarity to develop the DataStax Search views (SOLR cores) that will enable search functionality. The requirements should be clear enough to determine the fields that will be searched on and returned in the results, and to delineate how search will be conducted, i.e. multiple search fields or a single search field, the use of faceted results vs. ranked list results, etc.
If the project’s scope includes DataStax Analytics components, then Analytics requirements should be captured at this time. Analytics requirements should incorporate the statistical algorithms, required data sources, data movement/modifications, security/access, and other analytical requirements at a clear enough level to enable a thorough design.
- Design Phase
  - DataStax Milestones:
    - Data Model Design
    - Data Access Object Design
    - Data Movement Design
    - Operational Design (Management and Monitoring)
    - Search Design (Optional)
    - Analytics Design (Optional)
The Data Model design should include the following components in a clear format that all team members can understand. The following link provides in-depth reference material for data modeling in DataStax: http://www.datastax.com/resources/data-modeling.
- Keyspace Design (Replication Strategy, Name)
- Table Design (Table Names, Partition Keys, Clustering Columns (if applicable), and physical table properties as necessary (i.e. encryption, bloom filter settings, etc.))
- Any relationships between tables. Note that database joins are not supported within DataStax Enterprise. However, relationships between tables are still important, especially for the application developers.
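As a rough illustration of the level of detail a Data Model design should reach, the sketch below creates a keyspace and table through the DataStax Java driver. The keyspace name, replication settings, table name, and columns are assumptions carried over from the earlier hypothetical example; the real values come from the project's own requirements.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class DataModelSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
        Session session = cluster.connect();

        // Keyspace design: name plus a replication strategy per Data Center
        // (replication factors here are placeholders).
        session.execute(
            "CREATE KEYSPACE IF NOT EXISTS retail WITH replication = "
          + "{'class': 'NetworkTopologyStrategy', 'dc_east': 3, 'dc_west': 3}");

        // Table design: partition key (customer_id) chosen from the read
        // requirements, clustering column (order_ts) ordered DESC so the most
        // recent orders are returned first.
        session.execute(
            "CREATE TABLE IF NOT EXISTS retail.orders_by_customer ("
          + "  customer_id text,"
          + "  order_ts timestamp,"
          + "  order_id text,"
          + "  total decimal,"
          + "  PRIMARY KEY ((customer_id), order_ts)"
          + ") WITH CLUSTERING ORDER BY (order_ts DESC)");

        cluster.close();
    }
}
```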
Applications built on DataStax Enterprise are more successful when they leverage simple Data Access Objects to encapsulate and abstract data manipulation logic. This runs counter to the current trend in application development, where projects leverage frameworks such as Hibernate, LINQ, JPA, and other ORMs to encapsulate, abstract, and represent database components as application objects. Designing the Data Access Object up front, as much as possible, will help the application development team as they build out higher-level functionality.
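The following sketch shows what a deliberately simple Data Access Object might look like with the DataStax Java driver. It reuses the hypothetical orders_by_customer table from the sketches above; the class and method names are illustrative assumptions, not a prescribed design.

```java
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Simple DAO that hides all CQL behind plain methods. */
public class OrderDao {
    private final Session session;
    private final PreparedStatement insertOrder;
    private final PreparedStatement recentOrders;

    public OrderDao(Session session) {
        this.session = session;
        // Statements are prepared once and reused for every call.
        this.insertOrder = session.prepare(
            "INSERT INTO retail.orders_by_customer "
          + "(customer_id, order_ts, order_id, total) VALUES (?, ?, ?, ?)");
        this.recentOrders = session.prepare(
            "SELECT order_id FROM retail.orders_by_customer "
          + "WHERE customer_id = ? LIMIT ?");
    }

    public void save(String customerId, Date orderTs, String orderId, BigDecimal total) {
        session.execute(insertOrder.bind(customerId, orderTs, orderId, total));
    }

    public List<String> findRecentOrderIds(String customerId, int limit) {
        List<String> ids = new ArrayList<String>();
        for (Row row : session.execute(recentOrders.bind(customerId, limit))) {
            ids.add(row.getString("order_id"));
        }
        return ids;
    }
}
```

Keeping the DAO this thin makes it easy for application developers to build higher-level functionality without needing to reason about CQL or the cluster topology.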
Data Movement design includes items such as batch and real-time data integration between systems, ETL, Change Data Capture, data pipelines, etc. Capturing data transformation logic clearly is essential to the success of data integration initiatives. Items such as data types, transformation logic, error handling, look-ups, and data normalization should be clearly documented as part of Data Movement design.
Operational Design includes topics such as the tooling and techniques used to deploy new nodes, configure and upgrade nodes in the cluster, perform backup and restore operations, monitor the cluster, use OpsCenter, run repairs, alert, handle disaster management processes, etc. Several organizations leverage a "playbook" approach to Operational Design.

It is recommended to incorporate items such as searchable terms, returned terms, tokenizers, filters, multi-document search terms, etc. in the Search Design for each searchable view (SOLR core) that will be included in the application. Please see here for more information on the items available for design with DataStax Search - http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/srch/srchTOC.html.
When working with DataStax Analytics, it is important to first determine which Analytics components will be leveraged in the solution. Once that decision has been made, then specific, functionally aligned design items should be produced, such as Hive table structures, Map Reduce workflows, etc.
- Implementation Phase
  - DataStax Milestones:
    - Infrastructure
    - Deployment and Configuration Management Mechanism
    - Software Components (Data Model and Application)
    - Unit Testing of Components
This phase of the approach is typical for any type of software project. This is where "things" are actually built and implemented. Building out infrastructure and software components does not require any special DataStax-centric highlights.
Deployment and Configuration Management Mechanisms are going to be key to managing a distributed system. It is recommended that all operational items be automated, as much as feasible, to optimize the process of deploying and/or configuring nodes in the cluster. Tools like OpsCenter, Docker, Vagrant, Chef, Puppet, etc. can be leveraged to help quickly deliver the operational components necessary to manage the full software solution.
Unit Testing of functionality becomes a bit more complex with distributed systems compared to single-node systems. Specific defects, such as race conditions, are only observed "at scale". Because of this, it is recommended that unit testing be executed over a small cluster that contains more than a single node. Tools such as ccm can be used by developers to automate the process of quickly launching test clusters as part of a unit test, as sketched below.
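One possible way to automate this is to shell out to ccm from the test harness to stand up and tear down a small local cluster around a test run. The sketch below assumes ccm and a matching Cassandra version are already installed on the development machine; the cluster name and version number are illustrative only.

```java
import java.io.IOException;

/** Starts and stops a small local ccm cluster around a test run. */
public class CcmTestCluster {

    private static void run(String... command) throws IOException, InterruptedException {
        // Inherit stdout/stderr so ccm output is visible in the test log.
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IllegalStateException("ccm command failed with exit code " + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        // Create and start a 3-node local cluster (version is illustrative).
        run("ccm", "create", "unit_test_cluster", "-v", "2.0.14", "-n", "3", "-s");
        try {
            // ... run unit tests against the local nodes (127.0.0.1-3 by default) ...
        } finally {
            // Stop and remove the cluster so repeated runs start clean.
            run("ccm", "remove", "unit_test_cluster");
        }
    }
}
```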
- Pre-Production Testing Phase
  - DataStax Milestones:
    - Defect tracking items (JIRA, Log of Issues, etc.)
    - Operational readiness checklist completed
This is perhaps the most critical phase of this approach. This phase enables the project team to identify actual issues prior to going to production. As stated in the unit testing section, specific defects will not be observed until the software solution is functioning "at scale" under normal and extraordinary conditions for a period of time. These steps are deliberately provided in the approach to enable the identification of "at scale" problems preemptively.
This phase should encompass a two-week period where, at minimum, one of the weeks is dedicated to running the application at production scale. Only observations should be made during this period of the project; note that it may take several iterations of configuration, code change, and refactoring to enable the application to execute for a full week. The one-week recommendation ensures there are enough data points to conclude that the application and infrastructure are adequate to handle a production workload. Apache Cassandra needs to be stressed for this amount of time to determine whether read performance degrades, due to compaction design items, or remains acceptable.
Here is a list of items that should be included in an Operational Readiness Checklist for DataStax Enterprise:
- Replace a downed node and a dead seed node
- Configure and execute repair (ensure repair completes within GC_Grace_Period)
- Add a node to a cluster
- Replace a downed Data Center
- Add a Data Center to the cluster
- Decommission a node
- Restore a backup
- At a Cluster Level and Per Node Level, report on errors, throughput, latency, resource saturation, bottlenecks, compactions, flushes, and health
- Scale and Enhancements Phase

This phase highlights the normal, operational mode of an application built on DataStax Enterprise. The need to scale is a predictable eventuality that can be addressed by adding nodes to expand the system's capacity. Scaling with DataStax Enterprise is as simple as that.
As mentioned above, this approach is methodology-agnostic. The stages in the approach can be executed as single, individual phases in a Waterfall approach or by iterating over each phase in small, horizontal slices of functionality that include a facet of each phase. Please note that Pre-Production testing should be executed as a single phase including all planned functionality for Production deployment.
There is an additional approach that shows how small, agile teams can go from Prototype (PoC) to Production without much refactoring. Here is a link to the referenced approach: http://www.slideshare.net/planetcassandra/jake-luiciani-poc-to-production-c
The linked presentation is intended for technical audiences. It provides some good details on data modeling as well as Pre-Production testing. The main takeaway is that, if the PoC is well constructed, then you can move directly into the Pre-Production testing phase of this approach, skipping the requirements through implementation phases. This highlights the scaling advantage of Apache Cassandra and DataStax Enterprise.
What would a technology project be without risk? That’s a trick question, because without risk, there are no rewards. This is especially true for the types of transformational applications that are being built on top of the DataStax Enterprise platform. The huge scale that DataStax Enterprise enables, at millions of transactions per second and tens of petabytes of live data, transforms small risks into large issues if the initial risks are not identified and managed.
Some key areas involving project risk management for DataStax Enterprise and Apache Cassandra are addressed below. This section does not provide an exhaustive list of risk management items for large, distributed applications; only DataStax-centric items are covered.
Risk Item | Description | Impact Severity | Mitigation Effort | Potential Impact | Mitigation Technique
--- | --- | --- | --- | --- | ---
Shared Storage | Using a shared storage disk system to store data within Apache Cassandra/DataStax Enterprise. Shared Storage could be NAS, SAN, Amazon EBS, etc. See here for information on the risk of Shared Storage. | Critical | Large | Revenue impacting due to | 
Relational Model Port | The team wants to "port" or move an existing relational data model into DataStax without redesigning the model for Apache Cassandra. This may appear to save time on a project by skipping "steps", but will cost more time and resources over the full duration of the project. | Critical | Large | Revenue impacting due to | 
Lack of "At Scale" Testing | Placing items into Production without Production-like load testing over many days. Testing for too short a period (less than 5 days) is the equivalent of not testing, due to the manner in which DataStax manages data files. | Critical | Large | Revenue impacting due to | 
Slow Network Connections | The network used to connect nodes to other nodes or to client applications is not fast enough or large enough to handle the amount of network traffic that will be placed onto it by the full application. This involves DataStax Enterprise and the client application stack. | Critical | Large | Revenue impacting due to | 
Lack of Operational Readiness | DataStax Enterprise is built on the premise of operational simplicity, but it is always a good idea to ensure the operations team is prepared to manage the system in production prior to deploying the system. | Critical | Medium | Revenue impacting due to | 
Lack of Security | No security is included in the system, i.e. no authentication, authorization, or encryption. | Critical | Medium | Data breach, revenue, profit, etc. impacting due to | 
Lack of Training | DataStax Enterprise leverages new technologies to perform operations that engineers have been executing for years, but at a scale that engineers have previously not been able to accomplish. Like any new, transformational technology, proper knowledge is required across the full project team to be successful. | Critical | Medium | Revenue impacting due to | 
Incorrectly Sized Machines | DataStax node machines are sized either too small (specifically CPU and Memory) or too large (specifically disk) compared to specifications found here or here. | Critical | Medium | End User experience impacting | 
Incorrectly Sized Cluster | The total amount of processing (CPU, RAM) or disk space is not adequate to handle both anticipated "normal" load and exceptional "spike" load. | High | Medium | Revenue impacting due to | 
Too Many Tables | A data model is created that will contain more than 500 tables within a single DataStax cluster. | High | Medium | Revenue impacting due to; Operational capabilities impacted due to | 
Large Data Values | Storing data values that are larger than 10 MB per column or 100 MB per row (a Cassandra partition) is not a good design for DataStax. | High | Medium | End User experience impacting due to: | 
Cross Cluster Operations | Any client operation that performs "Cross Cluster" operations, such as reads or writes using the QUORUM consistency level, includes all Data Centers in the operation's read/write path. | High | Small | End User experience impacting due to: | 
Heavy Use of Secondary Indexes | Data models that rely on more than two secondary indexes (an arbitrary, heuristic threshold) to satisfy query requirements are overusing secondary indexes. This risk item indicates an issue with the data model design. | Medium | Medium | Revenue impacting due to | 
Lack of Requirements | Lack of clear requirements, particularly those needed to guide the data model design, or constantly changing data model requirements. | Medium | Small | Extended implementation duration due to: | 
Active-Passive Architecture | A stand-by Data Center is included in the Cluster topology design. A stand-by Data Center is one that is included in the system infrastructure but won't be actively used by any client applications. | Low | Small | Increased Project Cost due to | 
The required skillset for the development effort depends on the type of application being built. This section discusses specific skills that will enable successful deployments and application builds with DataStax Enterprise. These skills could be supplied by one or several team members/roles. Individual team members who possess all of the listed skills are very valuable assets for implementation, as they are able to work across all technologies included in DataStax Enterprise.
For DataStax Enterprise implementations that will leverage Apache Cassandra only, i.e. no Analytics or Search components, the following skills and roles are recommended.
Skill | Description | Impact to Project
--- | --- | ---
Linux Experience | Team members who possess a deep understanding of the Linux operating system, and several years of administration experience with it, are required for DataStax Enterprise implementations. Specifically, the Linux skillset requires "know-how" for system diagnosis/monitoring, network troubleshooting, software installation, disk/partition configuration, and OS administration. Deep expertise is very beneficial for troubleshooting purposes. Apache Cassandra is tightly integrated with the Linux system and relies on the Linux OS for items such as disk management, cache management, etc. | The project will benefit from this skill during the following tasks:
Java Experience | Even for teams that choose a different technology for the application, having someone on the team who is knowledgeable about Java in general, and the JVM specifically, will benefit the team. Apache Cassandra, which powers DataStax Enterprise, is a Java application. Though standard and recommended Java configurations are included with Cassandra, having a team member who can tune Java and the JVM will benefit the performance of Apache Cassandra and DataStax. | The project will benefit from this skill during the following tasks:
Distributed Systems Development Experience | DataStax Enterprise is a distributed system. Therefore, it has some unique nuances which pose design and development challenges for application developers. Team members who have worked with distributed systems will benefit the team. | The project will benefit from this skill during the following tasks:
Automated Configuration and Deployment Experience | It is common for deployments of DataStax Enterprise to leverage tens to hundreds, if not thousands, of nodes. Having a team member who can automate the deployment and configuration of these nodes is very beneficial to the project. | The project will benefit from this skill during the following tasks:
Physical Data Modeling Experience | DataStax Enterprise relies on the data modeling guidelines of Apache Cassandra. These guidelines align more with dimensional modeling than with the third normal form (3NF) data modeling found in most online applications. Having a team member who has experience in both dimensional and 3NF physical data modeling will be beneficial for the team. | The project will benefit from this skill during the following tasks:
For DataStax Enterprise implementations that will leverage DataStax Analytics components, the following skillsets are recommended.
Skill | Description | Impact to Project
--- | --- | ---
Data Analysis (Analytics) Experience | Regardless of the tooling decisions for DataStax Enterprise, having a team member who is competent in Analytics will be an asset to the team. This skill enables a consultative/guidance role for the project team and can help the team choose the appropriate algorithms, pipeline techniques, and visualization techniques for Analytics. | The project will benefit from this skill during the following tasks:
Hadoop Experience | If Hadoop, or one of its components, will be leveraged for the DataStax Analytics tool, then someone with experience in the appropriate Hadoop tools will help, augmenting the team and providing experience with these tools. | The project will benefit from this skill during phases of the project.
Spark Experience | If Spark will be leveraged for the DataStax Analytics tool, then someone with Spark experience will be beneficial to the team. This skillset implies some experience with Scala as well. | The project will benefit from this skill during phases of the project.
For DataStax implementations that will leverage DataStax Search components, the following skillsets are recommended, in addition to what has been presented for Apache Cassandra.
Skill | Description | Impact to Project
--- | --- | ---
SOLR Experience | Apache SOLR is the underlying technology used by DataStax to provide search functionality. This tool is very powerful, but has a lot of options. Having a skilled SOLR team member will be beneficial to the project. | The project will benefit from this skill during phases of the project.
This matrix provides project leaders with a way to quantify their team's execution capabilities and preparedness.
The matrix summarizes and quantifies the items highlighted throughout this document. This quantified method will help Project Managers determine whether the application and team are ready for Production. A total score of less than 60 means that the application and team are not ready for Production. Note that the weighting scales differ per topic. Specifically, the Pre-Production Testing phase is weighted very heavily to emphasize its importance.
To use this matrix, simply place a check, or other mark, in the box that applies for each topic item. Then, once all items have been marked, sum the scores and compare the total to the Production threshold of 60. Please contact DataStax for assistance if any of the topic items are deficient.
This matrix can also be used to pinpoint issues within the team or application.
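For illustration only, the following minimal sketch sums a hypothetical set of scores from the matrix and compares the total to the threshold of 60; the individual values are made up and carry no recommendation.

```java
/** Minimal, illustrative readiness-score calculation for the matrix below. */
public class ReadinessScore {
    public static void main(String[] args) {
        // Hypothetical scores: approach items (Requirements, Design,
        // Implementation, Pre-Production Testing), risk items, and skill items.
        int[] approachScores = {5, 4, 3, 4, 5, 4, 3, 4, 4, 5, 4, 4, 3, 4, 5, 10, 5, 5, 5};
        int[] riskScores = {1, 1, 1, 1, 1, 1, 1, 1, -4, 1, 1, 1, 1, 1, 1};
        int[] skillScores = {4, 3, 3, 4, 3};

        int total = 0;
        for (int s : approachScores) total += s;
        for (int s : riskScores) total += s;
        for (int s : skillScores) total += s;

        System.out.println("Total readiness score: " + total);
        System.out.println(total >= 60
            ? "At or above the Production threshold of 60"
            : "Below the Production threshold of 60");
    }
}
```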
Approach Total

Requirements | Incomplete (0) | Mostly Incomplete (2) | Some Parts Complete (3) | Mostly Complete (4) | Complete (5)
--- | --- | --- | --- | --- | ---
Data Model Requirements | | | | | 
Security and Encryption Requirements | | | | | 
Service Level Agreements | | | | | 
Operational Monitoring and Management | | | | | 

Design | Incomplete (0) | Mostly Incomplete (2) | Some Parts Complete (3) | Mostly Complete (4) | Complete (5)
--- | --- | --- | --- | --- | ---
Data Model Design | | | | | 
Data Access Object Design | | | | | 
Data Movement Design | | | | | 
Operational Design (Management and Monitoring) | | | | | 
Search Design (Optional) | | | | | 
Analytics Design (Optional) | | | | | 

Implementation | Incomplete (0) | Mostly Incomplete (2) | Some Parts Complete (3) | Mostly Complete (4) | Complete (5)
--- | --- | --- | --- | --- | ---
Infrastructure | | | | | 
Database Components | | | | | 
Application Components | | | | | 
Deploy and Configuration Mechanisms | | | | | 
Unit Testing Components | | | | | 

Pre-Production Testing | Incomplete (-10) | Mostly Incomplete (-5) | Some Parts Complete (1) | Mostly Complete (5) | Complete (10)
--- | --- | --- | --- | --- | ---
Executed for 2 Weeks | | | | | 
Issue Tracking and Resolution | | | | | 
Operational Checklist | | | | | 
Deploy and Configuration Mechanisms | | | | | 

Risk Total

Critical Risk Severity | Non Existent (1) | Exists (-5)
--- | --- | ---
Shared Storage | | 
Relational Model Port | | 
Lack of "At Scale" Testing | | 
Slow Network Connections | | 
Lack of Operational Readiness | | 
Lack of Security | | 
Lack of Training | | 

High Risk Severity | Non Existent (1) | Exists (-4)
--- | --- | ---
Incorrectly Sized Machines | | 
Incorrectly Sized Cluster | | 
Too Many Tables | | 
Large Data Values | | 
Cross Cluster Operations | | 

Medium Risk Severity | Non Existent (1) | Exists (-3)
--- | --- | ---
Heavy Use of Secondary Indexes | | 
Lack of Requirements | | 

Low Risk Severity | Non Existent (1) | Exists (-2)
--- | --- | ---
Active-Passive Architecture | | 

Skillset Total

Level of Expertise | Beginner (1) | Novice (2) | Competent (3) | Advanced (4) | Expert (5)
--- | --- | --- | --- | --- | ---
Linux Experience | | | | | 
Java Experience | | | | | 
Distributed Systems Development Experience | | | | | 
Automated Configuration and Deployment Experience | | | | | 
Physical Data Modeling Experience | | | | | 
Data Analysis (Analytics) Experience (Optional) | | | | | 
Hadoop Experience (Optional) | | | | | 
Spark Experience (Optional) | | | | | 
SOLR Experience (Optional) | | | | | 