Here is the typical “Big” data architecture, which covers most of the components involved in the data pipeline. We run more or less the same architecture in production in a number of places, with some components varying because data sources and data consumption differ from company to company.
“Big” Data Architecture
Any data architecture loosely consists of four major logical components:
1. Data Source
The true source of data, coming from heterogeneous systems. This is typically your data stores (SQL or NoSQL), which provide structured data, plus any other data arriving through APIs or other means (semi-structured or unstructured).
- Data from SQL and NoSQL stores (MySQL, Oracle, PostgreSQL, MongoDB, etc. – mostly structured)
- Semi-structured or unstructured data (CRM, marketing, campaign, spend, revenue, leads, etc.)
- Web logs or other log files (weblogs, user clicks, user visits, activity, etc.)
2. Data Transformation
Transformation of data from one form to another, either as part of ETL (Extract, Transform, and Load) or through import/export tools and scripts. It is mainly used to load all sources of data into the data processing pipeline.
Log management tools can also be considered part of ETL: they generate useful events from log files and either present them in dashboards with an alerting system in place, or load them directly into the data processing stores.
- ETL, ELTL tools (bash/Python/Perl/Java scripts, Business Objects, SSIS, Kettle, etc.)
- Sqoop (data source to Hadoop data transformation tool, JDBC compatible)
- Import/export tools (SQL/NoSQL vendor-specific tools)
- Log management tools (Splunk, Syslog, custom log filter scripts, Flume, Scribe, Loggly, etc.)
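The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV text, the column names, and the `revenue` table are all hypothetical, and an in-memory SQLite database stands in for whatever processing store the pipeline actually loads into.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system; a real pipeline would
# read a file, call an API, or query a database instead.
RAW_EXPORT = """user_id,country,revenue
1, us ,10.50
2,IN,3.25
3,us,
"""

def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize country codes, cast types, and drop rows with no revenue."""
    cleaned = []
    for row in rows:
        revenue = row["revenue"].strip()
        if not revenue:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["user_id"]), row["country"].strip().upper(), float(revenue))
        )
    return cleaned

def load(conn, records):
    """Write the transformed records into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue "
        "(user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(conn, text=RAW_EXPORT):
    """Run extract -> transform -> load and return what landed in the table."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT user_id, country, amount FROM revenue ORDER BY user_id"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two complete rows and skips the one with no revenue. Production tools add scheduling, incremental loads, and error handling on top of this basic shape.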
3. Data Processing or Data Integration
This layer becomes yet another vast source of data by combining structured and unstructured data in one place, loaded either in real time or incrementally. It exists mainly for data processing (data warehousing or analytics) and generates usable data (materialized or aggregated) that the data consumption components can consume.
- Hadoop and its ecosystem (Hadoop/HDFS, MapReduce, HBase, Hive, Impala, Pig, etc.) – uses HDFS as native storage
- Data warehouse and analytics solutions (MySQL, SQL Server, Vertica, Greenplum, Aster Data, Exadata, SAP HANA, IBM Netezza, IBM PureData, Teradata, etc.) – use vendor-specific storage and can optionally use HDFS, though with degraded performance
- In-memory analytics (SAS, Kognitio, Druid, etc.). This is an emerging market that is trying to take advantage of reading directly from HDFS. We will see a lot more in-memory analytics in the coming days.
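To make the idea of combining structured and semi-structured data into an aggregate concrete, here is a minimal map/reduce-style sketch in plain Python. It is not Hadoop code: the `USERS` and `CLICKS` inputs and all function names are hypothetical, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict

# Hypothetical inputs: structured rows from a data store and
# semi-structured click events parsed from web logs.
USERS = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "IN"},
]
CLICKS = [
    {"user_id": 1, "page": "/home"},
    {"user_id": 1, "page": "/pricing"},
    {"user_id": 2, "page": "/home"},
    {"user_id": 9, "page": "/home"},  # no matching user record
]

def map_phase(clicks, users):
    """Emit (country, 1) per click, joining each event to its user record."""
    country_by_user = {u["user_id"]: u["country"] for u in users}
    for click in clicks:
        yield country_by_user.get(click["user_id"], "UNKNOWN"), 1

def reduce_phase(pairs):
    """Sum the emitted counts per key to build the aggregate."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def clicks_per_country(clicks=CLICKS, users=USERS):
    """Aggregate that a consumption layer could read directly."""
    return reduce_phase(map_phase(clicks, users))
```

The result of `clicks_per_country()` is the kind of aggregated data that would be stored as a table or materialized view for the consumption layer, rather than recomputed on every request.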
4. Data Consumption
Components that either consume the data or expose it in usable form to end users or to other layers, internally (ad hoc) or externally (using APIs).
- Reporting (custom dashboards, MicroStrategy, Pentaho, Business Objects, Cognos, Hyperion, Tableau, etc.)
- Search or discovery (Solr, Elasticsearch, TIBCO Spotfire, Datameer, etc.)
- Data science, mining, and analysis (mainly internal data analysis to predict or estimate overall performance, and to drive recommendations using a set of algorithms, user-defined MapReduce jobs, or ad-hoc queries)
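As a rough illustration of exposing processed data externally, the sketch below turns an aggregate into the JSON body an API endpoint might return. The `REVENUE_BY_COUNTRY` data, the report name, and both function names are assumptions made for the example; a real deployment would sit behind a web framework and query the processing store.

```python
import json

# Hypothetical aggregate produced by the processing layer.
REVENUE_BY_COUNTRY = {"US": 1250.0, "IN": 430.5, "DE": 310.0}

def top_countries(aggregate, limit=2):
    """Return the top `limit` countries by revenue, highest first."""
    ranked = sorted(aggregate.items(), key=lambda kv: kv[1], reverse=True)
    return [{"country": c, "revenue": r} for c, r in ranked[:limit]]

def api_response(aggregate, limit=2):
    """Serialize the report as a JSON string for an external consumer."""
    return json.dumps(
        {"report": "top_countries", "rows": top_countries(aggregate, limit)}
    )
```

The same `top_countries` function could feed an internal dashboard directly, while `api_response` covers the external case.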
Apart from the four logical components, monitoring plays a crucial role: it detects failures anywhere in the data pipeline and tracks threshold changes to identify bottlenecks in performance, scalability, and overall throughput.
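A threshold check of this kind might look like the following sketch. The `StageMetrics` fields and the default thresholds (1000 rows per second, 5% dropped rows) are illustrative assumptions, not values taken from any particular monitoring tool.

```python
from dataclasses import dataclass

@dataclass
class StageMetrics:
    """Hypothetical per-run metrics reported by one pipeline stage."""
    name: str
    rows_in: int
    rows_out: int
    seconds: float
    failed: bool = False

def check_stage(m, min_throughput=1000.0, max_drop_ratio=0.05):
    """Return a list of alert messages for one stage (empty if healthy)."""
    alerts = []
    if m.failed:
        alerts.append(f"{m.name}: stage failed")
        return alerts
    # Throughput bottleneck: rows written per second of wall-clock time.
    throughput = m.rows_out / m.seconds if m.seconds > 0 else 0.0
    if throughput < min_throughput:
        alerts.append(
            f"{m.name}: throughput {throughput:.0f} rows/s below {min_throughput:.0f}"
        )
    # Data loss: share of input rows that never reached the output.
    dropped = (m.rows_in - m.rows_out) / m.rows_in if m.rows_in else 0.0
    if dropped > max_drop_ratio:
        alerts.append(f"{m.name}: dropped {dropped:.1%} of rows")
    return alerts
```

Running `check_stage` after every stage and forwarding any non-empty result to an alerting channel gives a basic view of failures and bottlenecks across the pipeline.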



Where is the support for unstructured data? The architecture is missing support for natural language processing and machine learning approaches that detect patterns in human language. Not everything fits into a table. ETL tools can’t accurately parse language into tabular data.
Thanks, Olin. NLP and text mining are indeed missing (especially how to extract the data); I will probably cover them as a separate topic in the coming days, since the subject has its own significance.
Thanks for such a useful article. The database is always my favourite subject, and you explained things well.