Big Data Now
Article: Big Data Now. By: feliperenz • 6/8/2014 • 9,376 words (38 pages)
O’Reilly Media, Inc.
Big Data Now: 2012 Edition
ISBN: 978-1-449-35671-2
Big Data Now: 2012 Edition
by O’Reilly Media, Inc.
Copyright © 2012 O’Reilly Media. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA
95472.
O’Reilly books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (http://my.safaribooksonline.com). For
more information, contact our corporate/institutional sales department: (800)
998-9938 or corporate@oreilly.com.
Cover Designer: Karen Montgomery
Interior Designer: David Futato
October 2012: First Edition
Revision History for the First Edition:
2012-10-24 First release
See http://oreilly.com/catalog/errata.csp?isbn=9781449356712 for release details.
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered
trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products
are claimed as trademarks. Where those designations appear in this book, and
O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed
in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher
and authors assume no responsibility for errors or omissions, or for damages resulting
from the use of the information contained herein.
Table of Contents
1. Introduction
2. Getting Up to Speed with Big Data
What Is Big Data?
What Does Big Data Look Like?
In Practice
What Is Apache Hadoop?
The Core of Hadoop: MapReduce
Hadoop’s Lower Levels: HDFS and MapReduce
Improving Programmability: Pig and Hive
Improving Data Access: HBase, Sqoop, and Flume
Coordination and Workflow: Zookeeper and Oozie
Management and Deployment: Ambari and Whirr
Machine Learning: Mahout
Using Hadoop
Why Big Data Is Big: The Digital Nervous System
From Exoskeleton to Nervous System
Charting the Transition
Coming, Ready or Not
3. Big Data Tools, Techniques, and Strategies
Designing Great Data Products
Objective-based Data Products
The Model Assembly Line: A Case Study of Optimal Decisions Group
Drivetrain Approach to Recommender Systems
Optimizing Lifetime Customer Value
Best Practices from Physical Data Products
The Future for Data Products
What It Takes to Build Great Machine Learning Products
Progress in Machine Learning
Interesting Problems Are Never Off the Shelf
Defining the Problem
4. The Application of Big Data
Stories over Spreadsheets
A Thought on Dashboards
Full Interview
Mining the Astronomical Literature
Interview with Robert Simpson: Behind the Project and What Lies Ahead
Science between the Cracks
The Dark Side of Data
The Digital Publishing Landscape
Privacy by Design
5. What to Watch for in Big Data
Big Data Is Our Generation’s Civil Rights Issue, and We Don’t Know It
Three
...