Mining Big Data: Current
State of work and Challenges
Group members:
Misbah Rashid
Mariam Rashid
About Journal
• The paper was published in 2015 in the International Journal of
Advanced Networking and Applications (IJANA).
• The paper was written by Kaushika Pal and Dr. Jatinderkumar R.
Saini.
Brief Overview
• Introduction to big data
• Big Data Mining
• Big data mining importance
Introduction To Big Data
• Huge amounts of data are generated and collected from various sources such as
sensors and devices, in different formats, from connected or independent
applications.
• This data has to be processed, investigated, stored and understood. Considering
internet data, the web pages indexed by Google numbered one million in 1998, one
billion in 2000 and one trillion in 2008.
• Examples come from social media: Facebook, Twitter, Google Plus, YouTube,
LinkedIn.
• Each of these sites receives a huge volume of data on a daily basis.
• Smartphones are now highly connected to the internet; they use and store data on
the web, thus increasing web volume. Twitter processes around 400 million tweets
each day.
• Smartphones are the real producers of big data, and it is up to us how we
utilize that data to change our lives.
• Data created via smartphones can be put to good use. Smartphone
usage patterns helped researchers in Africa determine where malaria
outbreaks were occurring and where the affected people went [10].
This information can be used to determine where best to distribute
medicines. This is the power of big data analysis,
which has a positive impact on humanity.
Big Data Mining
• Big data mining refers to the collective data mining or extraction
techniques that are performed on large volumes of data.
• We need new tools and new algorithms to deal with this huge amount of
data. When working with Big Data, 7 V's have to be considered for Big Data
management:
• Volume: Every industry is flooded with data, which can be extremely
valuable if it can be used to retrieve important information.
• Variety: 90% of the data generated is amorphous, coming in all shapes and
forms. The data comes from geo-spatial sources, tweets, and photos and videos
uploaded to social networking sites, which can be analysed for content.
• Velocity: Velocity refers to the increasing speed at which this data is
created, and the increasing speed at which the data can be processed,
stored and analysed.
• Value: The probable value of Big Data is huge.
• Variability: Variability refers to data whose meaning is constantly changing.
There are changes in the structure of data and how users want to interpret
that data.
• Veracity: Big Data veracity refers to the noise and abnormality in data. In
scoping out your big data strategy, you need to keep your data clean
and put processes in place to keep 'dirty data' from accumulating in your systems.
• Visibility: Data from different sources should be visible to the technology
stack making up Big Data. Certain crucial data are available but
not visible to Big Data.
Literature Review
• Mining heterogeneous information networks is a new and promising
research frontier in Big Data mining. It treats interconnected data of many
different types, including relational database data, as
heterogeneous information networks.
• Mining Big Data in Real Time discusses the challenges in structured pattern
classification. Classification methods mostly deal with vector data. To
apply them to graph pattern classification, graphs can be converted into vectors of
attributes. Each attribute indicates the presence or absence of a
sub pattern. An attribute is created for every frequent sub pattern, and the
number of such sub patterns can be very large.
• Data Mining with Big Data draws attention to the challenges of
mining big data at three levels: data, model, and system.
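The graph-to-vector conversion described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not from the paper: each graph is a set of labelled edges, a "sub pattern" is any set of one or two edges, and the function names and `min_support` parameter are hypothetical. Real frequent-subgraph mining also has to handle graph isomorphism, which this sketch ignores.

```python
from collections import Counter
from itertools import combinations


def edge_subpatterns(graph, max_size=2):
    """Enumerate sub patterns as frozensets of up to max_size labelled edges."""
    edges = sorted(graph)
    patterns = set()
    for size in range(1, max_size + 1):
        for combo in combinations(edges, size):
            patterns.add(frozenset(combo))
    return patterns


def frequent_subpatterns(graphs, min_support):
    """Keep the sub patterns that occur in at least min_support graphs."""
    counts = Counter()
    for graph in graphs:
        counts.update(edge_subpatterns(graph))
    frequent = [p for p, c in counts.items() if c >= min_support]
    # Sort for a stable attribute order: smaller patterns first.
    return sorted(frequent, key=lambda p: (len(p), sorted(p)))


def to_vector(graph, patterns):
    """One 0/1 attribute per frequent sub pattern: present or absent."""
    present = edge_subpatterns(graph)
    return [1 if p in present else 0 for p in patterns]


# Toy data set (assumed): three small graphs given as edge sets.
graphs = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "D")},
    {("A", "B"), ("B", "C"), ("C", "D")},
]
patterns = frequent_subpatterns(graphs, min_support=2)
for g in graphs:
    print(to_vector(g, patterns))
```

Even in this toy case the number of candidate sub patterns grows combinatorially with `max_size`, which is the scaling problem the slide points to.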
Application Of Big Data Mining
• Business: expands customer intelligence, improves
operational efficiency, and enables customer personalization. To understand
customer requirements deeply, one needs strong personal connections
and, where possible, customized services, which will drive more
sales.
• Managing market demand: capturing external
market and retailer data in real time to sense, evaluate, and
respond to demand indicators faster than ever before.
• Fraud detection: By analysing abnormal patterns from
various data sources, fraud can be detected in financial
transactions, health insurance, etc.
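As a rough illustration of detecting an "abnormal pattern", the sketch below flags transaction amounts that lie far from the median, using a modified z-score based on the median absolute deviation. The technique, the function name `flag_abnormal`, the threshold of 3.5 and the sample amounts are all assumptions chosen for the example; the paper does not prescribe a specific method.

```python
from statistics import median


def flag_abnormal(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread at all, so nothing can be called abnormal
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical card transactions; one amount is far outside the usual range.
transactions = [20, 25, 22, 19, 24, 21, 950, 23]
print(flag_abnormal(transactions))
```

A median-based score is used here instead of a mean-based one because a single extreme value inflates the mean and standard deviation and can hide itself.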
Challenges
• Variety and Heterogeneity: Different sources generate Big Data leading to great variety
or heterogeneity of big data. Heterogeneity in big data deals with structured, semi-
structured, and even entirely unstructured data concurrently. The challenge is to unveil
or extract the hidden knowledge in such data sets.
• Scalability: The extraordinary volume requires high scalability of data management
and mining tools. However, most algorithms currently used in data mining do not scale
well when applied to very large data sets, because they were initially developed
and tested on smaller data sets. We now have such large data sets that these algorithms
are no longer efficient enough for mining and analysis.
• Velocity/Speed: The capability to access and mine big data quickly is essential.
A mining task must be finished within a definite period of time; otherwise, the
results become less valuable or even worthless. The design of
new and more efficient indexing schemes is much desired, but remains one of the
greatest challenges to the research community.
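One common response to the scalability and speed challenges is to use single-pass algorithms that never hold the full data set in memory. The sketch below, which is an illustration and not a method taken from the paper, uses Welford's online algorithm to keep a running mean and variance over a stream; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Running mean and sample variance in one pass with constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new value into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for an unbounded stream
    stats.update(value)
print(stats.mean, stats.variance)
```

Each value is processed once and then discarded, so memory use stays constant however long the stream runs.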
Challenges
• Privacy Crisis: Data privacy has always been an issue. The concern has become
extremely serious with big data mining, which often requires personal information in
order to produce relevant and accurate results, such as location-based and personalized
services. Also, with huge volumes of big data such as social media containing an
incredible amount of highly interrelated personal information, each bit of information
can be mined out. Every transaction in our daily lives is being pushed online
and leaves a trace there: we communicate with friends via email, instant message, blog,
and Facebook; we shop and pay our bills online; credit card companies hold our
confidential identity information. As time goes by, your personal information will be
scattered here and there. Anyone could easily gain the privilege of using powerful
tools to extract your confidential information.
• Garbage Mining: As the volume of data increases day by day, the amount of
irrelevant and unnecessary data is also increasing. Garbage mining is the task of
extracting this hidden data and cleaning it out of the important data. It is not easy,
as it is difficult to extract hidden data from the bulk of data and then clean it.
Garbage mining remains one of the greatest challenges.
Appreciation
• In this paper, the authors have fully explained the insights about the
mining of big data, including the main concerns and main challenges
for the future.
• The most positive aspect of this article is its clarity in the statement of
the research problem.
• The authors selected 14 relevant sources published between
2012 and 2014. Ten of these references were primary sources.
The authors did a reasonable job of highlighting the previous research on
topics related to their own and even provided comparisons of
literature when possible.
Critique
• The statement of the problem was implied in the abstract section of
the article, but the specific problem is not addressed until the
authors have described the usefulness of mining big data later in the
article.
• The authors have not clearly explained the applications of mining big
data in medicine, healthcare and engineering.
• The authors have discussed big data in terms of mobile phones. The
scope of big data is far wider than what the authors have discussed.
Future work
• Techniques will be developed to overcome the challenges faced
in mining big data.
• Social media and Big Data can be used to understand public opinion
trends.
Thank You
Big data Mining

More Related Content

What's hot

Importance of Data Analytics
 Importance of Data Analytics Importance of Data Analytics
Importance of Data Analytics
Product School
 
Big data
Big dataBig data
Big data
Big dataBig data
Big data
Big dataBig data
Big data
Sakshi Chawla
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science
suresh sood
 
Intro big data analytics
Intro big data analyticsIntro big data analytics
Intro big data analytics
Hagar Alaa el-din
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
Aswadmehar
 
Big Data
Big DataBig Data
Big data
Big dataBig data
Big data
ArtiSolanki5
 
Introduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemIntroduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 System
Petr Novotný
 
Big data.
Big data.Big data.
Big data.
MeganShaw38
 
Big data
Big dataBig data
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in Businesses
T.S. Lim
 
Overview of Big data(ppt)
Overview of Big data(ppt)Overview of Big data(ppt)
Overview of Big data(ppt)
Shatavisha Roy Chowdhury
 
Understanding big data
Understanding big dataUnderstanding big data
Understanding big data
Praneet Samaiya
 
Business intelligence architectures.pdf
Business intelligence architectures.pdfBusiness intelligence architectures.pdf
Business intelligence architectures.pdf
Anand572211
 
Presentation Big Data
Presentation Big DataPresentation Big Data
Presentation Big Data
René Kuipers
 
Big data
Big dataBig data
Big data
Pooja Shah
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
Dr Anjan Krishnamurthy
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
Tiago Knoch
 

What's hot (20)

Importance of Data Analytics
 Importance of Data Analytics Importance of Data Analytics
Importance of Data Analytics
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science
 
Intro big data analytics
Intro big data analyticsIntro big data analytics
Intro big data analytics
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Big Data
Big DataBig Data
Big Data
 
Big data
Big dataBig data
Big data
 
Introduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemIntroduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 System
 
Big data.
Big data.Big data.
Big data.
 
Big data
Big dataBig data
Big data
 
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in Businesses
 
Overview of Big data(ppt)
Overview of Big data(ppt)Overview of Big data(ppt)
Overview of Big data(ppt)
 
Understanding big data
Understanding big dataUnderstanding big data
Understanding big data
 
Business intelligence architectures.pdf
Business intelligence architectures.pdfBusiness intelligence architectures.pdf
Business intelligence architectures.pdf
 
Presentation Big Data
Presentation Big DataPresentation Big Data
Presentation Big Data
 
Big data
Big dataBig data
Big data
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 

Similar to Big data Mining

Unit-I- Introduction- Traits of Big Data-Final.pptx
Unit-I- Introduction- Traits of Big Data-Final.pptxUnit-I- Introduction- Traits of Big Data-Final.pptx
Unit-I- Introduction- Traits of Big Data-Final.pptx
subhashchandra197
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
infinix8
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applications
Padma Metta
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Umair Shafique
 
20211011112936_PPT01-Introduction to Big Data.pptx
20211011112936_PPT01-Introduction to Big Data.pptx20211011112936_PPT01-Introduction to Big Data.pptx
20211011112936_PPT01-Introduction to Big Data.pptx
SyauqiAsyhabira1
 
Big Data Challenges and solutions.pptx
 Big Data Challenges and solutions.pptx Big Data Challenges and solutions.pptx
Big Data Challenges and solutions.pptx
jawaria11
 
Big Data World
Big Data WorldBig Data World
Big Data World
Hossein Zahed
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A Survey
Editor IJCATR
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
Prashant Kumar Jadia
 
Big data
Big dataBig data
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptx
Samiksha880257
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
ssuser0413ec
 
NCCT.pptx
NCCT.pptxNCCT.pptx
BigData.pptx
BigData.pptxBigData.pptx
BigData.pptx
vidhi171881
 
PresentationBig Data111111111111111.pptx
PresentationBig Data111111111111111.pptxPresentationBig Data111111111111111.pptx
PresentationBig Data111111111111111.pptx
harshadbhaitalpada49
 
Big_Data.pptx
Big_Data.pptxBig_Data.pptx
Big_Data.pptx
mohamedibrahim946387
 
Big Data for Development
Big Data for DevelopmentBig Data for Development
Big Data for Development
Joud Khattab
 
Group 2 Handling and Processing of big data.pptx
Group 2 Handling and Processing of big data.pptxGroup 2 Handling and Processing of big data.pptx
Group 2 Handling and Processing of big data.pptx
salutiontechnology
 
Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data
DATAVERSITY
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data
Data Blueprint
 

Similar to Big data Mining (20)

Unit-I- Introduction- Traits of Big Data-Final.pptx
Unit-I- Introduction- Traits of Big Data-Final.pptxUnit-I- Introduction- Traits of Big Data-Final.pptx
Unit-I- Introduction- Traits of Big Data-Final.pptx
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applications
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
20211011112936_PPT01-Introduction to Big Data.pptx
20211011112936_PPT01-Introduction to Big Data.pptx20211011112936_PPT01-Introduction to Big Data.pptx
20211011112936_PPT01-Introduction to Big Data.pptx
 
Big Data Challenges and solutions.pptx
 Big Data Challenges and solutions.pptx Big Data Challenges and solutions.pptx
Big Data Challenges and solutions.pptx
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A Survey
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Big data
Big dataBig data
Big data
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptx
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
 
NCCT.pptx
NCCT.pptxNCCT.pptx
NCCT.pptx
 
BigData.pptx
BigData.pptxBigData.pptx
BigData.pptx
 
PresentationBig Data111111111111111.pptx
PresentationBig Data111111111111111.pptxPresentationBig Data111111111111111.pptx
PresentationBig Data111111111111111.pptx
 
Big_Data.pptx
Big_Data.pptxBig_Data.pptx
Big_Data.pptx
 
Big Data for Development
Big Data for DevelopmentBig Data for Development
Big Data for Development
 
Group 2 Handling and Processing of big data.pptx
Group 2 Handling and Processing of big data.pptxGroup 2 Handling and Processing of big data.pptx
Group 2 Handling and Processing of big data.pptx
 
Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data
 

More from MariamKhan120

Artificial Intelligence I What is AI? I Introduction to Artificial Intelligence
Artificial Intelligence I What is AI? I Introduction to Artificial Intelligence Artificial Intelligence I What is AI? I Introduction to Artificial Intelligence
Artificial Intelligence I What is AI? I Introduction to Artificial Intelligence
MariamKhan120
 
Data Mining
Data MiningData Mining
Data Mining
MariamKhan120
 
E-learning
E-learningE-learning
E-learning
MariamKhan120
 
Porte's Five Forces Model
Porte's Five Forces ModelPorte's Five Forces Model
Porte's Five Forces Model
MariamKhan120
 
Ernst & Young- Knowledge Management
 Ernst & Young- Knowledge Management Ernst & Young- Knowledge Management
Ernst & Young- Knowledge Management
MariamKhan120
 
Scorpio Technique
Scorpio TechniqueScorpio Technique
Scorpio Technique
MariamKhan120
 
Waste Management Using IOT
Waste Management Using IOTWaste Management Using IOT
Waste Management Using IOT
MariamKhan120
 
Microsoft Company
Microsoft CompanyMicrosoft Company
Microsoft Company
MariamKhan120
 
Incremental model
Incremental modelIncremental model
Incremental model
MariamKhan120
 
Spiral Model
Spiral  ModelSpiral  Model
Spiral Model
MariamKhan120
 
RAD Model
RAD ModelRAD Model
RAD Model
MariamKhan120
 
Agile Model
Agile ModelAgile Model
Agile Model
MariamKhan120
 
Six Sigma and Quality Management System
Six Sigma and  Quality Management SystemSix Sigma and  Quality Management System
Six Sigma and Quality Management System
MariamKhan120
 
Capability Maturity Model Integration (CMMI)
Capability Maturity Model Integration (CMMI)Capability Maturity Model Integration (CMMI)
Capability Maturity Model Integration (CMMI)
MariamKhan120
 
White Box Testing
White Box Testing White Box Testing
White Box Testing
MariamKhan120
 
Blood Bank Management System
Blood Bank Management SystemBlood Bank Management System
Blood Bank Management System
MariamKhan120
 
Black Box Testing
Black Box TestingBlack Box Testing
Black Box Testing
MariamKhan120
 
School management system
School management systemSchool management system
School management system
MariamKhan120
 
Motorola Marketing Startegies
Motorola Marketing StartegiesMotorola Marketing Startegies
Motorola Marketing Startegies
MariamKhan120
 
Software development life cycle (sdlc)
Software development life cycle (sdlc)Software development life cycle (sdlc)
Software development life cycle (sdlc)
MariamKhan120
 

More from MariamKhan120 (20)

Artificial Intelligence I What is AI? I Introduction to Artificial Intelligence
Artificial Intelligence I What is AI? I Introduction to Artificial Intelligence Artificial Intelligence I What is AI? I Introduction to Artificial Intelligence
Artificial Intelligence I What is AI? I Introduction to Artificial Intelligence
 
Data Mining
Data MiningData Mining
Data Mining
 
E-learning
E-learningE-learning
E-learning
 
Porte's Five Forces Model
Porte's Five Forces ModelPorte's Five Forces Model
Porte's Five Forces Model
 
Ernst & Young- Knowledge Management
 Ernst & Young- Knowledge Management Ernst & Young- Knowledge Management
Ernst & Young- Knowledge Management
 
Scorpio Technique
Scorpio TechniqueScorpio Technique
Scorpio Technique
 
Waste Management Using IOT
Waste Management Using IOTWaste Management Using IOT
Waste Management Using IOT
 
Microsoft Company
Microsoft CompanyMicrosoft Company
Microsoft Company
 
Incremental model
Incremental modelIncremental model
Incremental model
 
Spiral Model
Spiral  ModelSpiral  Model
Spiral Model
 
RAD Model
RAD ModelRAD Model
RAD Model
 
Agile Model
Agile ModelAgile Model
Agile Model
 
Six Sigma and Quality Management System
Six Sigma and  Quality Management SystemSix Sigma and  Quality Management System
Six Sigma and Quality Management System
 
Capability Maturity Model Integration (CMMI)
Capability Maturity Model Integration (CMMI)Capability Maturity Model Integration (CMMI)
Capability Maturity Model Integration (CMMI)
 
White Box Testing
White Box Testing White Box Testing
White Box Testing
 
Blood Bank Management System
Blood Bank Management SystemBlood Bank Management System
Blood Bank Management System
 
Black Box Testing
Black Box TestingBlack Box Testing
Black Box Testing
 
School management system
School management systemSchool management system
School management system
 
Motorola Marketing Startegies
Motorola Marketing StartegiesMotorola Marketing Startegies
Motorola Marketing Startegies
 
Software development life cycle (sdlc)
Software development life cycle (sdlc)Software development life cycle (sdlc)
Software development life cycle (sdlc)
 

Recently uploaded

ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
leebarnesutopia
 
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
ScyllaDB
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
manji sharman06
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes
 
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreElasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
ScyllaDB
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
UiPathCommunity
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
ScyllaDB
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
ThousandEyes
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
ScyllaDB
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
Neeraj Kumar Singh
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
Ortus Solutions, Corp
 
An All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS MarketAn All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS Market
ScyllaDB
 
From NCSA to the National Research Platform
From NCSA to the National Research PlatformFrom NCSA to the National Research Platform
From NCSA to the National Research Platform
Larry Smarr
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc
 
Guidelines for Effective Data Visualization
Guidelines for Effective Data VisualizationGuidelines for Effective Data Visualization
Guidelines for Effective Data Visualization
UmmeSalmaM1
 

Recently uploaded (20)

ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
 
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024
 
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreElasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
 
An All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS MarketAn All-Around Benchmark of the DBaaS Market
An All-Around Benchmark of the DBaaS Market
 
From NCSA to the National Research Platform
From NCSA to the National Research PlatformFrom NCSA to the National Research Platform
From NCSA to the National Research Platform
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
 
Guidelines for Effective Data Visualization
Guidelines for Effective Data VisualizationGuidelines for Effective Data Visualization
Guidelines for Effective Data Visualization
 

Big data Mining

  • 1. Mining Big Data: Current State of work and Challenges Group members: Misbah Rashid Mariam Rashid
  • 2. About Journal • The journal is published in the year 2015 in (IJANA) International Journal of Advanced Networking and Applications • The journal was published by Kaushika Pal and Dr. Jatinderkumar R. Saini.
  • 3. Brief Overview • Introduction to big data • Big Data Mining • Big data mining importance
  • 4. Introduction To Big Data • Huge amounts of data are generated and collected from various sources such as sensors and devices, in different formats and from connected or independent applications. • This data has to be processed, investigated, stored and understood. Taking internet data as an example, the number of web pages indexed by Google was one million in 1998, one billion in 2000 and one trillion in 2008. • Further examples come from social media: Facebook, Twitter, GooglePlus, YouTube, LinkedIn. • Each of these sites receives a huge volume of data on a daily basis. • Smartphones are now highly connected to the internet; they use and store data on the web and thus increase web volume. Twitter processes around 400 million tweets each day. • Smartphones are the real producers of big data, and it is up to us how we utilize that data to change our lives.
  • 5. • Data created via smartphones can be put to good use. Smartphone usage patterns helped researchers in Africa determine where malaria outbreaks were occurring and where the affected people went [10]. This information can be used to decide where medicines should be distributed so that they reach people more efficiently. This is the power of big data analysis, which has a positive impact on humanity.
  • 6. Big Data Mining • Big data mining refers collectively to the data mining and extraction techniques that are performed on large volumes of data. • We need new tools and new algorithms to deal with this huge amount of data. When working with Big Data, 7 V's have to be considered for Big Data management. • Volume: every industry is flooded with data, which can be extremely valuable if it can be used to retrieve important information. • Variety: 90% of the data generated is amorphous, coming in all shapes and forms. It is generated from geo-spatial sources, tweets, and photos and videos uploaded to social networking sites, all of which can be analysed for content.
  • 7. • Velocity: Velocity refers to the increasing speed at which this data is created, and the increasing speed at which the data can be processed, stored and analysed. • Value: The potential value of Big Data is huge. • Variability: Variability refers to data whose meaning is constantly changing. There are changes in the structure of data and in how users want to interpret that data. • Veracity: Big Data veracity refers to the noise and abnormality in data. When scoping out a big data strategy, you need to keep your data clean and put processes in place to keep 'dirty data' from accumulating in your systems. • Visibility: Data from different sources should be visible to the technology stack that makes up Big Data. Certain crucial data is available but not visible to Big Data.
  • 8. Literature Review • Mining heterogeneous information networks is a new and promising research frontier in Big Data mining. It treats interconnected data of many different types, including relational database data, as heterogeneous information networks. • Mining Big Data in Real Time discusses the challenges in structured pattern classification. Classification methods mostly deal with vector data. To apply them to graph pattern classification, the graphs can be converted into vectors of attributes. Each attribute indicates the presence or absence of a sub-pattern. An attribute is created for every frequent sub-pattern, and the number of such sub-patterns can be very large. • Data Mining with Big Data draws attention to the challenges of mining big data at three levels: data, model, and system.
  • 9. Application Of Big Data Mining • Business: expands customer intelligence, improves operational efficiency, and enables customer personalization. To understand customer requirements deeply, one needs strong personal connections and, where possible, customized services, which will drive more sales. • Managing demand in the market: external market and retailer data is captured in real time to sense, evaluate, and respond to demand indicators faster than ever before. • Fraud detection: By analysing abnormal patterns across various data sources, fraud can be detected in financial transactions, health insurance, etc.
  • 10. Challenges • Variety and Heterogeneity: Big Data is generated by different sources, which leads to great variety, or heterogeneity, in the data. Heterogeneity in big data means dealing with structured, semi-structured, and even entirely unstructured data concurrently. The challenge is to unveil or extract the hidden knowledge in such data sets. • Scalability: The extraordinary volume requires high scalability of data management and mining tools. However, most algorithms currently used in data mining do not scale well when applied to very large data sets, because they were initially developed and tested on smaller data sets. Data sets are now so large that these algorithms are no longer efficient enough for mining and analysis. • Velocity/Speed: The capability to access and mine big data quickly is essential. A mining task must be finished within a definite period of time; otherwise, the processing/mining results become less valuable or even worthless. The design of new and more efficient indexing schemes is much desired, but remains one of the greatest challenges to the research community.
  • 11. Challenges • Privacy Crisis: Data privacy has always been an issue. The concern has become extremely serious with big data mining, which often requires personal information in order to produce relevant and accurate results, as in location-based and personalized services. Also, because huge volumes of big data such as social media contain an incredible amount of highly interrelated personal information, each bit of information can be mined out. Every transaction in our daily life is being pushed online and leaves a trace there: we communicate with friends via email, instant messaging, blogs, and Facebook; we shop and pay our bills online; credit card companies hold our confidential identity information. As time goes on, your personal information becomes scattered across many places. Anyone could easily gain access to powerful tools to extract your confidential information. • Garbage Mining: As the volume of data increases day by day, the amount of irrelevant and unnecessary data also increases. Garbage mining is the task of finding this hidden irrelevant data and cleaning it out of the important data. It is not easy, because it is difficult to extract hidden data from the bulk of the data and then clean it. Garbage mining remains one of the greatest challenges.
  • 12. Appreciation • In this article, the authors have fully explained the insights about the mining of big data, including the main concerns and main challenges for the future. • The most positive aspect of this article is the clarity with which it states the research problem. • The authors selected 14 relevant sources published between 2012 and 2014. Ten of these references were primary sources. The authors did a reasonable job of highlighting previous research on topics related to their own and even provided comparisons of the literature where possible.
  • 13. Critique • The statement of the problem is implied in the abstract of the article, but the specific problem is not addressed until after the authors have described the usefulness of mining big data later in the article. • The authors have not clearly explained the applications of mining big data in medicine, healthcare and engineering. • The authors have discussed big data in terms of mobile phones. The scope of big data is far wider than what the authors have discussed.
  • 14. Future work • Techniques will be developed to overcome the challenges faced in mining big data. • Social media and Big Data can be used to understand trends in public opinion.
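
The literature review slide describes converting graph patterns into attribute vectors in which each attribute marks the presence or absence of a frequent sub-pattern. A minimal sketch of that idea follows. It assumes each graph is given as a set of labelled edges and treats a sub-pattern as a small set of edges, which is a simplification of real subgraph mining; the names `frequent_subpatterns`, `to_vector`, `min_support` and `max_size` are hypothetical and do not come from the reviewed papers.

```python
from collections import Counter
from itertools import combinations


def frequent_subpatterns(graphs, min_support, max_size=2):
    """Return edge sets (as sorted tuples) that occur in at least min_support graphs.

    Simplifying assumption: a sub-pattern is any combination of up to
    max_size edges, matched by exact edge labels rather than by isomorphism.
    """
    counts = Counter()
    for edges in graphs:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(edges), size):
                counts[combo] += 1
    return sorted(p for p, c in counts.items() if c >= min_support)


def to_vector(edges, patterns):
    """One binary attribute per sub-pattern: 1 if present in the graph, else 0."""
    return [1 if set(p) <= edges else 0 for p in patterns]


# Three toy graphs, each a set of (source, target) edges.
graphs = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("C", "D")},
]

patterns = frequent_subpatterns(graphs, min_support=2)
vectors = [to_vector(g, patterns) for g in graphs]

for g, v in zip(graphs, vectors):
    print(sorted(g), "->", v)
```

Even in this toy case three graphs yield five attributes, and the count grows combinatorially with `max_size`, which illustrates why the slide warns that the number of sub-patterns can be very large.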

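The applications slide mentions detecting fraud by analysing abnormal patterns in transaction data. The slides do not name a method, so the sketch below substitutes one simple and common choice, the modified z-score based on the median and the median absolute deviation (MAD). The function name `flag_outliers`, the threshold of 3.5 and the sample amounts are illustrative assumptions, not values from the article.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return indexes of amounts whose modified z-score exceeds the threshold.

    Modified z-score: 0.6745 * |x - median| / MAD. This is an assumed
    stand-in for the unspecified "abnormal pattern" analysis in the slides.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        # At least half the values equal the median: nothing to scale by.
        return []
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical card transactions for one account.
amounts = [42.0, 39.5, 45.0, 41.2, 40.8, 980.0, 43.1]
suspicious = flag_outliers(amounts)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single very large transaction would otherwise inflate the spread and hide itself. A production system would combine many signals from several data sources, as the slide suggests, rather than a single amount column.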