CNIT131 Internet Basics &
Beginning HTML
Week 15 – Big Data
http://fog.ccsf.edu/~hyip
What is Big Data?
• Big data is an all-encompassing term for any collection of data sets so
large and complex that it becomes difficult to process using on-hand
data management tools or traditional data processing applications.
(From Wikipedia)
• Big Data refers to vast amounts of multi-structured data that have
typically been cost-prohibitive to store and analyze. (My view)
• NOTE: Big data refers only to digital data, not to the paper files
stored in the basement at FBI headquarters or the piles of magnetic
tapes in our data center.
Big Data – a Brief History
Types of Big Data
In the simplest terms, Big Data can be broken down into two basic
types, structured and unstructured data, with semi-structured data
falling between the two.
• Structured – data that follows a predefined data model
  • Spreadsheets, Oracle relational databases
• Unstructured – data that has no predefined data model or is not
organized in a predefined manner
  • Video, audio, images, metadata, etc.
• Semi-structured – structured data embedded with some unstructured
data
  • Email, text messaging
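The email example can be made concrete with a short sketch. It uses Python's standard `email` module to split one message into a structured part (header fields with fixed names) and an unstructured part (free text). The sample message, the addresses, and the helper name `split_email` are made up for illustration.

```python
from email import message_from_string

# Hypothetical sample message: headers are structured, the body is free text.
RAW_EMAIL = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Order 1234\n"
    "\n"
    "Hi Bob, the package arrived late and the box was damaged.\n"
)


def split_email(raw_text):
    """Return (structured_fields, unstructured_body) for a plain-text email."""
    msg = message_from_string(raw_text)
    # Header fields have known names, so they fit a fixed schema.
    structured = {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
    }
    # The body has no predefined model; it is just text to be analyzed later.
    unstructured = msg.get_payload().strip()
    return structured, unstructured


if __name__ == "__main__":
    fields, body = split_email(RAW_EMAIL)
    print(fields)
    print(body)
```

The header dictionary could be loaded into a table as is, while the body would need text analysis before it could answer a question such as "was the customer unhappy?".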
A Big Data Platform must:
• Analyze a variety of different types of information
This information may look unrelated by itself, but when paired with other
information it can reveal the causes of various events that the business
can take advantage of.
• Analyze information in motion
Various data types will be streaming and will contain a large number of
"bursts". Ad hoc analysis must be run on the streaming data to search
for relevant events.
• Cover extremely large volumes of data
Because of the proliferation of devices in the network, the ways they are used,
and customer interactions on smartphones and the web, a cost-efficient
process for analyzing petabytes of information is required.
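As a rough illustration of analyzing information in motion, the sketch below scans a stream of event timestamps once and flags the moments when a "burst" occurs. The function name `detect_bursts`, the 10-second window, and the threshold of 5 events are assumptions chosen for the example, not values from the slides.

```python
from collections import deque


def detect_bursts(timestamps, window_seconds=10.0, threshold=5):
    """Yield each timestamp at which the trailing window holds
    at least `threshold` events. Timestamps must be non-decreasing."""
    window = deque()
    for ts in timestamps:
        window.append(ts)
        # Drop events that have fallen out of the trailing window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            yield ts


if __name__ == "__main__":
    # Made-up stream: a quiet start, then five events within five seconds.
    stream = [0, 1, 2, 30, 31, 32, 33, 34, 60]
    print(list(detect_bursts(stream)))
```

Each event is examined once and only the current window is kept in memory, which is the property that matters when the full stream is too large to store before analysis.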
A Big Data Platform must: (2)
• Cover varying types of data sources
Data can be streaming, batch, structured, unstructured, or
semi-structured, depending on the information type, where it comes
from, and its primary use. A Big Data platform must be able to
accommodate all of these types of data on a very large scale.
• Support analytics
A Big Data platform must provide mechanisms for ad hoc queries,
data discovery, and experimentation on large data sets, so that
events and data types can be correlated effectively to produce an
understanding of the data that is useful and addresses business
needs.
Five Characteristics of Big Data
• Big Data is defined by five characteristics:
  • Volume: Data created by and moving through today's services may describe tens of millions of customers, hundreds
of millions of devices, and billions of transactions or statistical records. Such scale requires careful engineering:
even the number of CPU instructions, operating system events, and network messages spent per data item must be
conserved. Parallel processing is a powerful tool for coping with scale. MapReduce computing
frameworks like Hadoop and storage systems like HBase and Cassandra provide low-cost, practical system
foundations. Analysis also requires efficient algorithms, because "data in flight" may be observable only once, so
conventional storage-based approaches may not work. Large volumes of data may require a mix of "move the data
to the processing" and "move the processing to the data" architectural styles.
  • Velocity: Timeliness is often critical to the value of Big Data. For example, online customers may expect promotions
(coupons) received on a mobile device to reflect their current location, or they may expect recommendations to
reflect their most recent purchases or the media they most recently accessed. The business value of some data decays rapidly.
Because raw data is often delivered in streams, or in small batches in near real time, the requirement to deliver rapid
results can be demanding and does not mesh well with conventional data warehouse technology.
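The MapReduce idea mentioned under Volume can be sketched in a few lines. This is a single-process toy, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real framework would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big servers"]))
```

Because each `map_phase` call looks at only one record and each `reduce_phase` call looks at only one key, the work can be split across servers without the pieces needing to talk to one another until the shuffle.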
Five Characteristics of Big Data (2)
  • Variety: Big Data often means integrating and processing multiple types of data. We can consider most data sources as structured,
semi-structured, or unstructured. Structured data refers to records with fixed fields and types. Unstructured data includes text,
speech, and other multimedia. Semi-structured data may be a mixture of the two, such as web documents, or sparse records with
many variants, such as personal medical records with well-defined but complex types.
  • Veracity: Data sources (even in the same domain) vary widely in quality, with significant differences in the coverage,
accuracy, and timeliness of the data provided. Per IBM's Big Data website, one in three business leaders do not trust the information they
use to make decisions. Establishing trust in big data presents a huge challenge as the variety and number of sources grow.
  • Variability: Beyond the immediate implications of having many types of data, the variety of data may also be reflected in the
frequency with which new data sources and types are introduced.
NOTE: Big Data generally includes data sets whose sizes are beyond the ability of commonly used software tools to capture, manage, and process
the data within a tolerable elapsed time. Big data sizes are a constantly moving target, from hundreds of terabytes to many petabytes of data
in a single data set. Because of this difficulty, new tool sets have arisen to make sense of these large quantities of data. Big data is
difficult to work with using relational databases and desktop statistics and visualization packages, requiring instead "massively parallel software
running on tens, hundreds, or even thousands of servers".
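A back-of-the-envelope calculation shows why data sets of this size push work onto many servers. The sketch below assumes a read rate of 100 MB/s per server and perfect scaling across servers. Both are illustrative assumptions, not figures from the slides, and real systems lose some of the speedup to coordination overhead.

```python
PETABYTE = 10**15  # bytes, decimal definition

# Assumed sequential read rate per server; illustrative only.
ASSUMED_BYTES_PER_SECOND = 100 * 10**6


def scan_time_seconds(dataset_bytes, bytes_per_second_per_server, servers=1):
    """Time to read a data set once, assuming the work divides evenly."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    return dataset_bytes / (bytes_per_second_per_server * servers)


if __name__ == "__main__":
    for servers in (1, 10, 100, 1000):
        seconds = scan_time_seconds(PETABYTE, ASSUMED_BYTES_PER_SECOND, servers)
        print(f"{servers:>5} servers: {seconds / 3600:10.1f} hours")
```

Under these assumptions a single server needs about 10^7 seconds (roughly 116 days) to read one petabyte once, while 1,000 servers need about 10^4 seconds (under three hours).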
References
• Discovering the Internet: Complete, Jennifer Campbell, Course
Technology, Cengage Learning, 5th Edition, 2015, ISBN 978-1-285-84540-1.
• Basics of Web Design: HTML5 & CSS3, Second Edition, Terry Felke-Morris,
Pearson, ISBN 978-0-13-312891-8.
• A Very Short History of Big Data.
