Social Informatics Data Grid
Cyberinfrastructure for Collaborative Research in the Neural, Social and Behavioral Sciences

Bennett I. Bertenthal
Indiana University
bbertent@indiana.edu
Infrastructure for Social and Behavioral Sciences
Goal:
  Compare, measure, and search for patterns in structured,
  semi-structured, and heterogeneous data sets.


Challenge:
  Integrate information over time, place, and types of data


Needs:
  (1) Data interface (shared datasets & databases)
  (2) Service interface (shared tools for analysis)
  (3) Intellectual interface (shared problems & theories)
Primary Objectives

• Develop prototype of core facility for collecting multiple
  measures of time-synchronized data

• Develop integrated tools for storage, retrieval,
  annotation, and analyses of multiple data sets at
  different time scales

• Develop scripts for parallelizing code to run on grid
  clusters
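
The third objective can be illustrated with a rough sketch that fans one analysis function out over many files. It uses Python's standard `concurrent.futures` on a single machine and only stands in for submission to a grid cluster. The function `analyze_file` and the file names are hypothetical and not part of SIDGrid.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_file(name: str) -> tuple[str, int]:
    # Hypothetical per-file analysis; here it only measures the name length.
    return name, len(name)


def run_parallel(names: list[str], workers: int = 4) -> dict[str, int]:
    # Each file is independent, so the calls can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_file, names))


if __name__ == "__main__":
    print(run_parallel(["session1.wav", "session2.wav", "session3.mov"]))
```

The same shape (one independent task per data file) is what makes this kind of workload easy to spread across cluster nodes.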
What is SIDGrid?
Social Informatics Data Grid
• A general purpose architecture for streaming data applications
  (e.g., video, audio, time series)
• Built on well-established database, multimedia, web, and grid
  services standards
• Time alignment in distributed heterogeneous datasets
   – Software and hardware based
   – Integrated with existing laboratory time stamping and
     registration techniques
• Scalable
   – Number of datasets
   – Types of data
   – Multiple end user applications
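
A minimal sketch of software-based time alignment, assuming each stream carries timestamps in milliseconds on its own clock and the offset between clocks is already known. The function names and sample data are hypothetical; this is not the SIDGrid implementation.

```python
from bisect import bisect_left


def to_common_clock(samples, offset_ms):
    """Shift (timestamp_ms, value) pairs by a known clock offset."""
    return [(t + offset_ms, v) for t, v in samples]


def nearest_value(samples, t):
    """Return the value whose timestamp is closest to t.

    Assumes samples is non-empty and sorted by timestamp.
    """
    times = [s[0] for s in samples]
    i = bisect_left(times, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = samples[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s[0] - t))[1]


if __name__ == "__main__":
    video = [(0, "frame0"), (40, "frame1"), (80, "frame2")]
    audio = to_common_clock([(10010, "onset"), (10050, "offset")], -10000)
    for t, label in audio:
        print(label, "->", nearest_value(video, t))
```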
[Diagram: Server and Client]
Client Side
• Leveraging efforts for annotation and analysis of multimodal data
   – Familiarity and interoperability
        • ELAN (Max Planck Institute for Psycholinguistics, The Netherlands)
        • TalkBank (Carnegie Mellon University, US)
        • Digital Replay System (University of Nottingham, UK)
   – XML, Java
   – Cross-platform interoperability
• Adding SIDGrid functionality to ELAN
   – Minimally intrusive
        • Avoid complicated co-development with the ELAN team
   – Browsing SIDGrid data
   – Additional data types
   – Upload / download to SIDGrid server
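
ELAN stores annotations as XML (.eaf files), which is one reason an XML-based approach interoperates easily. The sketch below reads tiers and time-aligned annotations from a heavily trimmed EAF-style fragment using Python's standard library. The sample document and the function name `read_annotations` are illustrative assumptions; real .eaf files contain more elements and may include time slots without a `TIME_VALUE`.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written fragment in the style of an ELAN .eaf file.
EAF_SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="speech">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_annotations(eaf_xml: str) -> list[tuple[str, int, int, str]]:
    """Return (tier, start_ms, end_ms, text) for each aligned annotation."""
    root = ET.fromstring(eaf_xml)
    # Map time slot ids to millisecond values (assumes every slot has one).
    slots = {s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE"))
             for s in root.iter("TIME_SLOT")}
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((tier.get("TIER_ID"),
                         slots[ann.get("TIME_SLOT_REF1")],
                         slots[ann.get("TIME_SLOT_REF2")],
                         ann.findtext("ANNOTATION_VALUE", default="")))
    return rows


if __name__ == "__main__":
    print(read_annotations(EAF_SAMPLE))
```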
66 GB 5 mov 2 wav …
368 GB 23 mov 6 wav …
 5 GB 1 mov 0 wav …
21 GB 3 mov 12 wav …
 4 GB 9 mov 1 wav …
 4 GB 4 mov 1 wav …
 1 GB 0 mov 2 wav …
945 GB 1 mov 66 wav …
 8 GB 3 mov 0 wav …
20 GB 13 mov 2 wav …
Server Side
.mov  .wav  .eaf      GB
  10     0     0      45
   4    30     0      20
   2     2     1       3
  12   100     9     200
   1     1     1       1
   6     2     0      12
 400     0     1    1001
   0   666     1     312
   0     0    13     0.1
   0     0     0     0.0
  18     4     0      66
Search and Query (4,000 projects)
• Data Files
  – Names
  – Keywords
  – Attributes (keyword-value)
  – Date
  – Type (Elan, Chat)
• Contents of Files
  – Metadata
  – Tier
  – Annotations
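
The file-level criteria above can be pictured as a filter over records that carry a name, a type, keywords, and keyword-value attributes. The following is a minimal in-memory sketch; the `DataFile` record and the `search` function are hypothetical and do not reflect the actual SIDGrid query service.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    name: str
    file_type: str  # e.g. "Elan" or "Chat"
    keywords: set[str] = field(default_factory=set)
    attributes: dict[str, str] = field(default_factory=dict)


def search(files, keyword=None, file_type=None, **attrs):
    """Return the files that satisfy every criterion that was supplied."""
    hits = []
    for f in files:
        if keyword is not None and keyword not in f.keywords:
            continue
        if file_type is not None and f.file_type != file_type:
            continue
        # Every requested attribute must be present with the given value.
        if any(f.attributes.get(k) != v for k, v in attrs.items()):
            continue
        hits.append(f)
    return hits


if __name__ == "__main__":
    catalog = [
        DataFile("s01.eaf", "Elan", {"gesture"}, {"language": "en"}),
        DataFile("s02.cha", "Chat", {"gesture", "speech"}, {"language": "nl"}),
    ]
    print([f.name for f in search(catalog, keyword="gesture", language="nl")])
```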
Server Side
• Web services
  – Query
  – Data download / upload

• Portal interface
   – Security
   – Data and metadata browsing
   – Preview
   – Tags, attributes
   – Projects
   – Groups
   – Search
   – Data transformation using grid resources
Science Gateway
What Is The TeraGrid? (circa 2006)

• 75 Teraflops (trillion calculations per second)
   – = 12,500 times faster than all 6 billion humans on earth each
     doing one calculation per second
• 16 supercomputers: 9 different types, multiple sizes
• World’s fastest network
• 30 Gigabits per second to large sites
   – = 20-30 times major university connections
   – = 30,000 times my home broadband
   – = 1 full-length feature film per second
• Globus Toolkit and other middleware providing single login,
  application management, data movement, web services

[Map: sites and network hubs labeled LA, Starlight, Atlanta, SDSC, TACC, NCSA, PU, IU, PSC, ORNL, ANL]
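
The comparisons on the TeraGrid slide can be checked with a few lines of arithmetic. In this sketch only the 75 teraflops, 6 billion people, 30 gigabits per second, and the 30,000 ratio come from the slide; the home broadband speed and the film size are values implied by those ratios, not figures stated in the talk.

```python
TERAGRID_FLOPS = 75e12   # 75 trillion calculations per second (from the slide)
HUMANS = 6e9             # 6 billion people, one calculation per second each
LINK_BPS = 30e9          # 30 gigabits per second to large sites

# Ratio of TeraGrid throughput to all of humanity calculating by hand.
speedup = TERAGRID_FLOPS / HUMANS

# Implied home connection if the link is 30,000 times faster (derived value).
implied_home_bps = LINK_BPS / 30_000

# Bytes moved in one second, i.e. the implied size of "one feature film".
implied_film_bytes = LINK_BPS / 8

if __name__ == "__main__":
    print(f"speedup over 6 billion humans: {speedup:,.0f}x")
    print(f"implied home broadband: {implied_home_bps / 1e6:.0f} Mbit/s")
    print(f"implied film size: {implied_film_bytes / 1e9:.2f} GB")
```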
Scripts for Running Jobs on Grid

• MATLAB (high-level language and interactive environment for performing
   computationally intensive tasks)

• R (software environment for statistical computing and graphics)
• Praat (software for acoustic analysis)
• FreeSurfer (automated tools for reconstruction of the brain’s cortical
   surface from structural MRI data)

• AFNI (programs for processing, analyzing, and displaying fMRI data)
• SUMA (adds cortical surface based functional imaging analysis to the
   AFNI suite of programs)
Advantages of Grid Computing

• Vastly expanded computing and storage
• Reduced effort as needs scale up
• Improved resource utilization; lower costs
• Facilities and models for collaboration
• Sharing of tools, data, procedures, and protocols
• Recording, assessment and reuse of complex
  tasks
Lessons Learned

• Fast prototyping vs production quality software
   – After one year of development, no product available for user
     feedback
   – Optimal design vs practical design
• Public vs private website
   – Need for dissemination
   – Need for security and protection of user groups and data
• Tools for diverse user groups with varying degrees of
  technical expertise
   – Non-intuitive interface with minimal user support
       • Importance of user manuals, technical support, and FAQs
• Multiple levels of privacy and confidentiality dictated by
  type of data and informed consent
If you build it, will they come?

• Dissemination of SIDGrid
   – Website and movie
   – Invited workshops at UofC and IU
   – Pre-conference workshops
• Start-up is time consuming
   – Scale of most projects conducted by social scientists does not
     justify time to learn web services and tools
   – Added value for larger, collaborative projects requires shift in
     goals and organization of research
• Resistance to data sharing
   – Original proposal required that all data stored on SIDGrid
     servers would be publicly available
Objections to Data Sharing

• It’s my data!
• Protection of confidentiality and anonymity
• Need to first establish standards for coding and analysis
• Reporting of misleading and confusing findings
• Raw data but not coded data should be shared
   – Annotation and coding are very time consuming and should not
     become available to others
• If availability of web and software tools were contingent
  on sharing data, most users would opt out
Questions

More Related Content

What's hot

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Haluan Irsad
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
Nguyen Cao
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
Yukti Kaura
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2
Imviplav
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
Mk Kim
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
nabati
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Amir Shaikh
 
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dataconomy Media
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
17aroumougamh
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core concepts
Maryan Faryna
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
Avkash Chauhan
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
Arvind Kalyan
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
Roman Nikitchenko
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
C. Scyphers
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
RightScale
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
Vishwajeet Jadeja
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
Apache Apex
 
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Ashok Royal
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
mark madsen
 

What's hot (20)

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core concepts
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 

Viewers also liked

Hoffman nsf presentation hoffman-25-aug11.ppt
Hoffman nsf presentation hoffman-25-aug11.pptHoffman nsf presentation hoffman-25-aug11.ppt
Hoffman nsf presentation hoffman-25-aug11.ppt
Jesse Lingeman
 
Aslin.discussion
Aslin.discussionAslin.discussion
Aslin.discussion
Jesse Lingeman
 
Galloway
GallowayGalloway
Galloway
Jesse Lingeman
 
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDASupporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Jesse Lingeman
 
Flyer I Maintain
Flyer I MaintainFlyer I Maintain
Flyer I Maintainkikinelson
 
Photo album 2011
Photo album 2011Photo album 2011
Photo album 2011
Carlos Rivero
 
Children's Multicultural Library Learning Event
Children's Multicultural Library Learning EventChildren's Multicultural Library Learning Event
Children's Multicultural Library Learning Event
LaurieRogers
 
Its About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Its About Time: Analyzing Temporal MicroLevel Behavioral PatternsIts About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Its About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Jesse Lingeman
 

Viewers also liked (8)

Hoffman nsf presentation hoffman-25-aug11.ppt
Hoffman nsf presentation hoffman-25-aug11.pptHoffman nsf presentation hoffman-25-aug11.ppt
Hoffman nsf presentation hoffman-25-aug11.ppt
 
Aslin.discussion
Aslin.discussionAslin.discussion
Aslin.discussion
 
Galloway
GallowayGalloway
Galloway
 
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDASupporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
 
Flyer I Maintain
Flyer I MaintainFlyer I Maintain
Flyer I Maintain
 
Photo album 2011
Photo album 2011Photo album 2011
Photo album 2011
 
Children's Multicultural Library Learning Event
Children's Multicultural Library Learning EventChildren's Multicultural Library Learning Event
Children's Multicultural Library Learning Event
 
Its About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Its About Time: Analyzing Temporal MicroLevel Behavioral PatternsIts About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Its About Time: Analyzing Temporal MicroLevel Behavioral Patterns
 

Similar to Bertenthal

Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
Nagarjuna D.N
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
Dr. Anita Goel
 
Big data
Big dataBig data
Big data
roysonli
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011
marpierc
 
Internet of Things
Internet of ThingsInternet of Things
Internet of Things
Aniekan Akpaffiong
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
Jayant Mukherjee
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
IdontKnow66967
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
Amr Kamel Deklel
 
Lecture1
Lecture1Lecture1
Lecture1
Manish Singh
 
Data Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtcData Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtc
DataTactics
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
Khazret Sapenov
 
Network Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingNetwork Engineering for High Speed Data Sharing
Network Engineering for High Speed Data Sharing
Globus
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC Supercomputer
Rebekah Rodriguez
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
6535ANURAGANURAG
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
Srinath Perera
 
Big Data
Big Data Big Data
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
SoftServe
 
Adoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchAdoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific Research
Yehia El-khatib
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - Presentation
Éric Dusablon
 

Similar to Bertenthal (20)

Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
 
Big data
Big dataBig data
Big data
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011
 
Internet of Things
Internet of ThingsInternet of Things
Internet of Things
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtcData Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtc
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
Network Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingNetwork Engineering for High Speed Data Sharing
Network Engineering for High Speed Data Sharing
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC Supercomputer
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Big Data
Big Data Big Data
Big Data
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Adoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchAdoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific Research
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - Presentation
 

More from Jesse Lingeman

Messinger.openshapa.091511
Messinger.openshapa.091511Messinger.openshapa.091511
Messinger.openshapa.091511
Jesse Lingeman
 
Mac whinney macw
Mac whinney macwMac whinney macw
Mac whinney macw
Jesse Lingeman
 
Gray 110916 ns-fwkshp
Gray 110916 ns-fwkshpGray 110916 ns-fwkshp
Gray 110916 ns-fwkshp
Jesse Lingeman
 
Davis kean.open shapa
Davis kean.open shapaDavis kean.open shapa
Davis kean.open shapa
Jesse Lingeman
 
Borner links
Borner linksBorner links
Borner links
Jesse Lingeman
 
Altman links
Altman linksAltman links
Altman links
Jesse Lingeman
 
Alibali mult data streams a
Alibali mult data streams aAlibali mult data streams a
Alibali mult data streams aJesse Lingeman
 

More from Jesse Lingeman (9)

Messinger.openshapa.091511
Messinger.openshapa.091511Messinger.openshapa.091511
Messinger.openshapa.091511
 
Mac whinney macw
Mac whinney macwMac whinney macw
Mac whinney macw
 
Gray 110916 ns-fwkshp
Gray 110916 ns-fwkshpGray 110916 ns-fwkshp
Gray 110916 ns-fwkshp
 
Davis kean.open shapa
Davis kean.open shapaDavis kean.open shapa
Davis kean.open shapa
 
Borner links
Borner linksBorner links
Borner links
 
Altman links
Altman linksAltman links
Altman links
 
Alibali mult data streams a
Alibali mult data streams aAlibali mult data streams a
Alibali mult data streams a
 
Test1
Test1Test1
Test1
 
Test2
Test2Test2
Test2
 

Recently uploaded

Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
AlexanderRichford
 
Chapter 1 - Fundamentals of Testing V4.0
Chapter 1 - Fundamentals of Testing V4.0Chapter 1 - Fundamentals of Testing V4.0
Chapter 1 - Fundamentals of Testing V4.0
Neeraj Kumar Singh
 
ScyllaDB Topology on Raft: An Inside Look
ScyllaDB Topology on Raft: An Inside LookScyllaDB Topology on Raft: An Inside Look
ScyllaDB Topology on Raft: An Inside Look
ScyllaDB
 
How to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
How to Optimize Call Monitoring: Automate QA and Elevate Customer ExperienceHow to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
How to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
Aggregage
 
From NCSA to the National Research Platform
From NCSA to the National Research PlatformFrom NCSA to the National Research Platform
From NCSA to the National Research Platform
Larry Smarr
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
Cynthia Thomas
 
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value MigrationThe Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
ScyllaDB
 
Building a Semantic Layer of your Data Platform
Building a Semantic Layer of your Data PlatformBuilding a Semantic Layer of your Data Platform
Building a Semantic Layer of your Data Platform
Enterprise Knowledge
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
ScyllaDB
 
The "Zen" of Python Exemplars - OTel Community Day
The "Zen" of Python Exemplars - OTel Community DayThe "Zen" of Python Exemplars - OTel Community Day
The "Zen" of Python Exemplars - OTel Community Day
Paige Cruz
 
Chapter 6 - Test Tools Considerations V4.0
Chapter 6 - Test Tools Considerations V4.0Chapter 6 - Test Tools Considerations V4.0
Chapter 6 - Test Tools Considerations V4.0
Neeraj Kumar Singh
 
Ubuntu Server CLI cheat sheet 2024 v6.pdf
Ubuntu Server CLI cheat sheet 2024 v6.pdfUbuntu Server CLI cheat sheet 2024 v6.pdf
Ubuntu Server CLI cheat sheet 2024 v6.pdf
TechOnDemandSolution
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
UiPathCommunity
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc
 
Supplier Sourcing Presentation - Gay De La Cruz.pdf
Supplier Sourcing Presentation - Gay De La Cruz.pdfSupplier Sourcing Presentation - Gay De La Cruz.pdf
Supplier Sourcing Presentation - Gay De La Cruz.pdf
gaydlc2513
 
Cyber Recovery Wargame
Cyber Recovery WargameCyber Recovery Wargame
Cyber Recovery Wargame
Databarracks
 
Automation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI AutomationAutomation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI Automation
UiPathCommunity
 
EverHost AI Review: Empowering Websites with Limitless Possibilities through ...
EverHost AI Review: Empowering Websites with Limitless Possibilities through ...EverHost AI Review: Empowering Websites with Limitless Possibilities through ...
EverHost AI Review: Empowering Websites with Limitless Possibilities through ...
SOFTTECHHUB
 
An Introduction to All Data Enterprise Integration
An Introduction to All Data Enterprise IntegrationAn Introduction to All Data Enterprise Integration
An Introduction to All Data Enterprise Integration
Safe Software
 

Recently uploaded (20)

Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 
Chapter 1 - Fundamentals of Testing V4.0
Chapter 1 - Fundamentals of Testing V4.0Chapter 1 - Fundamentals of Testing V4.0
Chapter 1 - Fundamentals of Testing V4.0
 
ScyllaDB Topology on Raft: An Inside Look
ScyllaDB Topology on Raft: An Inside LookScyllaDB Topology on Raft: An Inside Look
ScyllaDB Topology on Raft: An Inside Look
 
How to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
How to Optimize Call Monitoring: Automate QA and Elevate Customer ExperienceHow to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
How to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
 
From NCSA to the National Research Platform
From NCSA to the National Research PlatformFrom NCSA to the National Research Platform
From NCSA to the National Research Platform
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
 
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value MigrationThe Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
 
Building a Semantic Layer of your Data Platform
Building a Semantic Layer of your Data PlatformBuilding a Semantic Layer of your Data Platform

Bertenthal

  • 1. Social Informatics Data Grid Cyberinfrastructure for Collaborative Research in the Neural, Social and Behavioral Sciences Bennett I. Bertenthal Indiana University bbertent@indiana.edu
  • 2. Infrastructure for Social and Behavioral Sciences Goal: Compare, measure and search for patterns in structured, semi-structured, and heterogeneous data sets. Challenge: Integrate information over time, place, and types of data Needs: (1) Data interface (shared datasets & databases) (2) Service interface (shared tools for analysis) (3) Intellectual interface (shared problems & theories)
  • 3. Primary Objectives • Develop prototype of core facility for collecting multiple measures of time-synchronized data • Develop integrated tools for storage, retrieval, annotation, and analyses of multiple data sets at different time scales • Develop scripts for parallelizing code to run on grid clusters
  • 6. Social Informatics Data Grid • A general-purpose architecture for streaming data applications (e.g., video, audio, time series) • Built on well-established database, multimedia, web, and grid services standards • Time alignment in distributed heterogeneous datasets – Software and hardware based – Integrated with existing laboratory time stamping and registration techniques • Scalable – Number of datasets – Types of data – Multiple end user applications
  • 9. Client Side • Leveraging efforts for annotation and analysis of multimodal data – Familiarity and Interoperability • Elan (Max Planck Institute for Psycholinguistics, The Netherlands) • Talkbank (Carnegie Mellon University, US) • Digital Replay System (Nottingham University, UK) – XML, Java – Cross platform interoperability • Adding SIDGrid functionality to Elan – Minimally intrusive • Avoid complicated co-development w/ELAN team – Browsing SIDGrid data – Additional data types – Upload / Download to SIDGrid server
  • 11. 66 GB, 5 mov, 2 wav …; 368 GB, 23 mov, 6 wav …; 5 GB, 1 mov, 0 wav …; 21 GB, 3 mov, 12 wav …; 4 GB, 9 mov, 1 wav …; 4 GB, 4 mov, 1 wav …; 1 GB, 0 mov, 2 wav …; 945 GB, 1 mov, 66 wav …; 8 GB, 3 mov, 0 wav …; 20 GB, 13 mov, 2 wav …
  • 13. (table from slide)
        .mov   .wav   .eaf   GB
        10     0      0      45
        4      30     0      20
        2      2      1      3
        12     100    9      200
        1      1      1      1
        6      2      0      12
        400    0      1      1001
        0      666    1      312
        0      0      13     0.1
        0      0      0      0.0
        18     4      0      66
  • 16. Search and Query (4,000 projects) • Data Files – Names – Keywords – Attributes (keyword-value) – Date – Type (Elan, Chat) • Contents of Files – Metadata – Tier – Annotations
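The file-level search criteria listed on this slide (names, keywords, keyword-value attributes, date, type) can be pictured as a conjunction of optional filters. The following is only a minimal sketch under stated assumptions: the `DataFile` record, its field names, the `search` function, and the two catalog entries are all hypothetical and made up for illustration; none of them comes from the SIDGrid software itself.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataFile:
    """Hypothetical record for one stored file; field names are illustrative."""
    name: str
    file_type: str                                   # e.g. "Elan" or "Chat"
    created: date
    keywords: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)   # keyword-value pairs


def search(files, name=None, keyword=None, attribute=None, since=None, file_type=None):
    """Return the files that match every criterion that is not None."""
    hits = []
    for f in files:
        if name is not None and name.lower() not in f.name.lower():
            continue
        if keyword is not None and keyword not in f.keywords:
            continue
        if attribute is not None:
            key, value = attribute                   # e.g. ("language", "en")
            if f.attributes.get(key) != value:
                continue
        if since is not None and f.created < since:
            continue
        if file_type is not None and f.file_type != file_type:
            continue
        hits.append(f)
    return hits


# Made-up catalog entries, used only to exercise the filters.
catalog = [
    DataFile("session01.eaf", "Elan", date(2007, 3, 1), {"gesture"}, {"language": "en"}),
    DataFile("session02.cha", "Chat", date(2007, 5, 9), {"speech"}, {"language": "en"}),
]
print([f.name for f in search(catalog, keyword="gesture", file_type="Elan")])
```

Searching the contents of files (metadata, tiers, annotations) would need a parser for each file format and is left out of this sketch.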
  • 17. Server Side • Web services – Query – Data download / upload • Portal interface – Security – Data and metadata browsing – Preview – Tags, attributes – Projects – Groups – Search – Data transformation using grid resources
  • 21. What Is The TeraGrid? (circa 2006) • 16 supercomputers (9 different types, multiple sizes) delivering 75 teraflops (trillion calculations per second) = 12,500 times faster than all 6 billion humans on earth each doing one calculation per second • World’s fastest network: 30 gigabits per second to large sites = 20-30 times major university connections = 30,000 times my home broadband = 1 full-length feature film per second • Globus Toolkit and other middleware providing single login, application management, data movement, web services • Sites (map labels): LA, Starlight, Atlanta, SDSC, TACC, NCSA, PU, IU, PSC, ORNL, ANL
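The comparisons on this slide can be checked with a few lines of arithmetic. This is a rough sketch, assuming decimal units (1 gigabit = 1,000 megabits) and 8 bits per byte; the variable names are illustrative only.

```python
# Compute side: 75 teraflops versus 6 billion people doing one calculation per second.
TERAFLOPS = 75
calcs_per_second = TERAFLOPS * 10**12
humans = 6 * 10**9
speedup = calcs_per_second // humans          # the slide's "12,500" figure

# Network side: 30 gigabits per second to large sites.
NETWORK_GBPS = 30
implied_home_mbps = NETWORK_GBPS * 1000 / 30_000   # home broadband implied by "30,000 times"
gigabytes_per_second = NETWORK_GBPS / 8             # data volume moved each second

print(speedup, implied_home_mbps, gigabytes_per_second)   # prints: 12500 1.0 3.75
```

Under these assumptions the slide's figures imply a home connection of about 1 megabit per second and a feature film of a few gigabytes.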
  • 24. Scripts for Running Jobs on Grid • Matlab (high-level language and interactive environment for performing computationally intensive tasks) • R (software environment for statistical computing and graphics) • Praat (software for acoustic analysis) • FreeSurfer (automated tools for reconstruction of the brain’s cortical surface from structural MRI data) • AFNI (programs for processing, analyzing, and displaying FMRI data) • SUMA (adds cortical surface based functional imaging analysis to the AFNI suite of programs)
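One common way to run tools like these on a cluster is to split a batch of input files into independent chunks, one per job. The sketch below shows only that partitioning step; the `partition` helper and the file names are hypothetical, and how SIDGrid actually submits work to the grid is not described in the slides.

```python
def partition(items, n_jobs):
    """Split items into at most n_jobs contiguous chunks of near-equal size."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(len(items), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)   # the first `extra` chunks get one more item
        if size == 0:
            break                               # fewer items than jobs: stop early
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Made-up input names; each chunk would be handed to one independent job.
recordings = [f"recording_{i:02d}.wav" for i in range(10)]
for job_id, chunk in enumerate(partition(recordings, 3)):
    print(job_id, chunk)
```

This only works when each file can be analyzed on its own, which is the usual case for per-recording acoustic or per-subject imaging analyses.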
  • 25. Advantages of Grid Computing • Vastly expanded computing and storage • Reduced effort as needs scale up • Improved resource utilization; lower costs • Facilities and models for collaboration • Sharing of tools, data, and procedures and protocols • Recording, assessment and reuse of complex tasks
  • 27. Lessons Learned • Fast prototyping vs production quality software – After one year of development, no product available for user feedback – Optimal design vs practical design • Public vs private website – Need for dissemination – Need for security and protection of user groups and data • Tools for diverse user groups with varying degrees of technical expertise – Non-intuitive interface with minimal user support • Importance of user manuals, technical support, and FAQs • Multiple levels of privacy and confidentiality dictated by type of data and informed consent
  • 28. If you build it, will they come? • Dissemination of SIDGrid – Website and movie – Invited workshops at UofC and IU – Pre-conference workshops • Start-up is time consuming – Scale of most projects conducted by social scientists does not justify time to learn web services and tools – Added value for larger, collaborative projects requires shift in goals and organization of research • Resistance to data sharing – Original proposal required that all data stored on SIDGrid servers would be publicly available
  • 29. Objections to Data Sharing • It’s my data! • Protection of confidentiality and anonymity • Need to first establish standards for coding and analysis • Reporting of misleading and confusing findings • Raw data but not coded data should be shared – Annotation and coding are very time consuming and should not become available to others • If availability of web and software tools were contingent on sharing data, most users would opt out