This is the first post in a series of blog posts on ‘Must Have Key Business Analytics and Insights’.
Road to building a good big data architecture
In this post I will walk you through the key requirements and how to define and build target-audience (individual business entity or unit) driven business analytics; subsequent posts will go deeper into each target audience.
In today’s technology world, every company should have business insights and analytics, which directly help one or all of the following aspects and drive the business to the next level:
- user acquisition
- user experience
- user retention
- user engagement
- revenue growth
- growth hacking
As part of our day-to-day implementation of big data analytics platforms through scalein consulting for various companies, we normally get pretty mixed requirements in the first cut, except in a few cases where the business owners know exactly what data points they need.
In many cases we have to understand the business logic and the immediate and long-term goals, then walk through the common data metrics and key performance indicators (KPIs), explaining through a series of Q&A how each applies to the business as a whole and to its individual entities (like product, marketing, operations, etc.).
Define the requirements
In general, the required analytics are defined by immediate business needs, by exploring what data segments can drive business insights, or by a combination of both.
That being said, the requirements will loosely fall into the following approaches:
- top down approach, where requirements define what needs to be exposed through the data. A few example requirements:
- how many users are signing up day-to-day, and the average user sign-up rate
- how often users are coming to the platform, and what the average duration and spend are
- identify and bucket users based on their lead source, activity, spend, and demographics, so that the marketing team can use these data points for effective new-user-acquisition campaigns
- which features will be used by what percentage of users, so the product team can focus accordingly
- predict user behavior within the platform (experience, spend, drop-outs, etc.), so business executives can make key decisions
- bottom up approach: identify and expose as many key data points as possible from the currently available data, so that the business can take advantage of them. For example:
- user usage, spend, and login patterns (both good and bad)
- bucketing (new) users by demographic and profile data (gender, age, interests, income)
- identify users who would be a good target for promotional emails (users who have not logged in for a while, users with low spend, a new feature that may interest some users, etc.)
- combination of bottom up and top down, where both the requirements and the available data points drive what needs to be built and exposed to business users. This is the most common approach with many of our customers.
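As a minimal sketch of the first top-down metric above (daily sign-up counts and the average sign-up rate), the computation can be done over a list of sign-up events; the event data below is hypothetical, standing in for whatever the platform actually logs:

```python
# Daily sign-up counts and average sign-up rate from (user_id, date)
# sign-up events. The events here are made-up sample data.
from collections import Counter
from datetime import date

signup_events = [
    ("u1", date(2015, 3, 1)),
    ("u2", date(2015, 3, 1)),
    ("u3", date(2015, 3, 2)),
    ("u4", date(2015, 3, 3)),
    ("u5", date(2015, 3, 3)),
    ("u6", date(2015, 3, 3)),
]

# Bucket sign-ups by day.
signups_per_day = Counter(day for _, day in signup_events)

# Average sign-up rate over the observed days.
avg_signup_rate = sum(signups_per_day.values()) / len(signups_per_day)

for day, count in sorted(signups_per_day.items()):
    print(day, count)
print("average sign-ups/day:", avg_signup_rate)
```

In practice the same aggregation would run as a SQL `GROUP BY` over the sign-up table rather than in application code, but the shape of the metric is the same.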
Target audience, prioritization
Without knowing the target audience and/or what the end goal is, it is very hard to implement any analytics; one should avoid building an analytics platform without knowing the end goal or the business requirements.
Once the requirements are defined, the next logical step is to segment them by individual target audience within the business for data consumption, and to refine how each audience will be able to consume the data:
- Marketing focused
- Executive focused
- Product centric
- Business user focused (employees: engineering, operations, etc.)
Data consumption, presentation
Once the requirements are segmented by target audience and prioritized (who needs what first: the marketing team, the product team, or the executives?), we have to understand how to present the data to these business users in the most usable form:
- SQL as an interface + R integration for data analysts (MySQL, Vertica, Redshift, …), so that they can sample, identify patterns, segment, and build predictions
- Simple fixed reporting (email or dashboards with histograms and trending)
- Reporting with slice-and-dice capabilities (tools like Tableau, Pentaho, Platfora, Domo, …)
- XLS/CSV dumps
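To illustrate the "SQL as an interface" consumption mode, here is a minimal sketch using SQLite as a stand-in for MySQL/Vertica/Redshift; the table, columns, and rows are hypothetical, but the same segmentation query could run against any SQL-speaking store:

```python
# Segment users into spend buckets via plain SQL -- the kind of slice
# an analyst would sample and refine interactively. SQLite stands in
# for the real analytical store; all data here is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, spend REAL, logins INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("u1", 120.0, 14), ("u2", 0.0, 1), ("u3", 45.5, 6)],
)

rows = conn.execute(
    """
    SELECT CASE WHEN spend >= 100 THEN 'high'
                WHEN spend > 0   THEN 'mid'
                ELSE 'none' END AS bucket,
           COUNT(*) AS users
    FROM users
    GROUP BY bucket
    ORDER BY bucket
    """
).fetchall()
print(rows)
```

The fixed-reporting and XLS/CSV options are essentially canned versions of queries like this one, scheduled and delivered instead of run ad hoc.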
Finally to the architecture
Once we have an in-depth understanding of what is expected, we can propose and implement a high-performance, highly scalable back-end data architecture, which includes:
- identifying the various data sources that have to be ingested first, and how to extract from them (scripting or a programming language)
- transformation, aggregation, and loading (ETL tools like Kettle or Talend; simple Hive/Pig Latin/SQL can also work, as can custom scripting)
- which analytical data store to use (Hadoop for everything, or Vertica/Greenplum/Netezza/Teradata/Redshift as the data store).
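The three steps above can be sketched end to end as a toy extract-transform-load pass; the CSV contents, aggregation, and table name below are all hypothetical, with plain Python standing in for Hive/Pig/ETL tools and SQLite standing in for the analytical store:

```python
# Minimal ETL sketch: extract raw events from a CSV export, aggregate
# spend per user, and load the result into the analytical data store.
# All data and names here are illustrative.
import csv
import io
import sqlite3

# Extract: a raw event export, as if pulled from a source system.
raw = io.StringIO("user_id,amount\nu1,10.0\nu2,5.5\nu1,2.5\n")
events = list(csv.DictReader(raw))

# Transform: aggregate total spend per user.
totals = {}
for e in events:
    totals[e["user_id"]] = totals.get(e["user_id"], 0.0) + float(e["amount"])

# Load: write the aggregate into the analytical store.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE user_spend (user_id TEXT, total REAL)")
store.executemany("INSERT INTO user_spend VALUES (?, ?)", totals.items())

print(store.execute(
    "SELECT user_id, total FROM user_spend ORDER BY user_id"
).fetchall())
```

At production scale the same three stages would be distributed (e.g. extraction scripts feeding Hive or an ETL tool, loading into one of the columnar stores listed above), but the extract/transform/load boundaries stay the same.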
In the upcoming posts, I will go in-depth on each target audience (product, executive, marketing, and business user): what kind of data needs to be collected, sourced, and exposed to each of these segments, and how those data points help them do their job day to day.