97 Things Every Data Engineer Should Know

Tobias Macey
Table of Contents
Preface . . . xiii
1. A (Book) Case for Eventual Consistency . . . 1
   Denise Koessler Gosnell, PhD
2. A/B and How to Be . . . 3
   Sonia Mehta
3. About the Storage Layer . . . 5
   Julien Le Dem
4. Analytics as the Secret Glue for Microservice Architectures . . . 7
   Elias Nema
5. Automate Your Infrastructure . . . 9
   Christiano Anderson
6. Automate Your Pipeline Tests . . . 11
   Tom White
7. Be Intentional About the Batching Model in Your Data Pipelines . . . 13
   Raghotham Murthy
8. Beware of Silver-Bullet Syndrome . . . 17
   Thomas Nield
9. Building a Career as a Data Engineer . . . 19
   Vijay Kiran
10. Business Dashboards for Data Pipelines . . . 21
   Valliappa (Lak) Lakshmanan
11. Caution: Data Science Projects Can Turn into the Emperor’s New Clothes . . . 23
   Shweta Katre
12. Change Data Capture . . . 26
   Raghotham Murthy
13. Column Names as Contracts . . . 28
   Emily Riederer
14. Consensual, Privacy-Aware Data Collection . . . 30
   Katharine Jarmul
15. Cultivate Good Working Relationships with Data Consumers . . . 32
   Ido Shlomo
16. Data Engineering != Spark . . . 34
   Jesse Anderson
17. Data Engineering for Autonomy and Rapid Innovation . . . 36
   Jeff Magnusson
18. Data Engineering from a Data Scientist’s Perspective . . . 38
   Bill Franks
19. Data Pipeline Design Patterns for Reusability and Extensibility . . . 40
   Mukul Sood
20. Data Quality for Data Engineers . . . 42
   Katharine Jarmul
21. Data Security for Data Engineers . . . 44
   Katharine Jarmul
22. Data Validation Is More Than Summary Statistics . . . 46
   Emily Riederer
23. Data Warehouses Are the Past, Present, and Future . . . 48
   James Densmore
24. Defining and Managing Messages in Log-Centric Architectures . . . 50
   Boris Lublinsky
25. Demystify the Source and Illuminate the Data Pipeline . . . 52
   Meghan Kwartler
26. Develop Communities, Not Just Code . . . 54
   Emily Riederer
27. Effective Data Engineering in the Cloud World . . . 56
   Dipti Borkar
28. Embrace the Data Lake Architecture . . . 58
   Vinoth Chandar
29. Embracing Data Silos . . . 61
   Bin Fan and Amelia Wong
30. Engineering Reproducible Data Science Projects . . . 63
   Dr. Tianhui Michael Li
31. Five Best Practices for Stable Data Processing . . . 65
   Christian Lauer
32. Focus on Maintainability and Break Up Those ETL Tasks . . . 67
   Chris Moradi
33. Friends Don’t Let Friends Do Dual-Writes . . . 69
   Gunnar Morling
34. Fundamental Knowledge . . . 71
   Pedro Marcelino
35. Getting the “Structured” Back into SQL . . . 73
   Elias Nema
36. Give Data Products a Frontend with Latent Documentation . . . 76
   Emily Riederer
37. How Data Pipelines Evolve . . . 78
   Chris Heinzmann
38. How to Build Your Data Platform like a Product . . . 80
   Barr Moses and Atul Gupte
39. How to Prevent a Data Mutiny . . . 83
   Sean Knapp
40. Know the Value per Byte of Your Data . . . 85
   Dhruba Borthakur
41. Know Your Latencies . . . 87
   Dhruba Borthakur
42. Learn to Use a NoSQL Database, but Not like an RDBMS . . . 89
   Kirk Kirkconnell
43. Let the Robots Enforce the Rules . . . 91
   Anthony Burdi
44. Listen to Your Users—but Not Too Much . . . 93
   Amanda Tomlinson
45. Low-Cost Sensors and the Quality of Data . . . 95
   Dr. Shivanand Prabhoolall Guness
46. Maintain Your Mechanical Sympathy . . . 97
   Tobias Macey
47. Metadata ≥ Data . . . 99
   Jonathan Seidman
48. Metadata Services as a Core Component of the Data Platform . . . 101
   Lohit VijayaRenu
49. Mind the Gap: Your Data Lake Provides No ACID Guarantees . . . 103
   Einat Orr
50. Modern Metadata for the Modern Data Stack . . . 105
   Prukalpa Sankar
51. Most Data Problems Are Not Big Data Problems . . . 107
   Thomas Nield
52. Moving from Software Engineering to Data Engineering . . . 109
   John Salinas
53. Observability for Data Engineers . . . 111
   Barr Moses
54. Perfect Is the Enemy of Good . . . 114
   Bob Haffner
55. Pipe Dreams . . . 116
   Scott Haines
56. Preventing the Data Lake Abyss . . . 118
   Scott Haines
57. Prioritizing User Experience in Messaging Systems . . . 120
   Jowanza Joseph
58. Privacy Is Your Problem . . . 122
   Stephen Bailey, PhD
59. QA and All Its Sexiness . . . 124
   Sonia Mehta
60. Seven Things Data Engineers Need to Watch Out for in ML Projects . . . 126
   Dr. Sandeep Uttamchandani
61. Six Dimensions for Picking an Analytical Data Warehouse . . . 128
   Gleb Mezhanskiy
62. Small Files in a Big Data World . . . 131
   Adi Polak
63. Streaming Is Different from Batch . . . 134
   Dean Wampler, PhD
64. Tardy Data . . . 136
   Ariel Shaqed
65. Tech Should Take a Back Seat for Data Project Success . . . 138
   Andrew Stevenson
66. Ten Must-Ask Questions for Data-Engineering Projects . . . 140
   Haidar Hadi
67. The Data Pipeline Is Not About Speed . . . 143
   Rustem Feyzkhanov
68. The Dos and Don’ts of Data Engineering . . . 145
   Christopher Bergh
69. The End of ETL as We Know It . . . 148
   Paul Singman
70. The Haiku Approach to Writing Software . . . 151
   Mitch Seymour
71. The Hidden Cost of Data Input/Output . . . 153
   Lohit VijayaRenu
72. The Holy War Between Proprietary and Open Source Is a Lie . . . 155
   Paige Roberts
73. The Implications of the CAP Theorem . . . 157
   Paul Doran
74. The Importance of Data Lineage . . . 159
   Julien Le Dem
75. The Many Meanings of Missingness . . . 161
   Emily Riederer
76. The Six Words That Will Destroy Your Career . . . 163
   Bartosz Mikulski
77. The Three Invaluable Benefits of Open Source for Testing Data Quality . . . 165
   Tom Baeyens
78. The Three Rs of Data Engineering . . . 167
   Tobias Macey
79. The Two Types of Data Engineering and Data Engineers . . . 169
   Jesse Anderson
80. The Yin and Yang of Big Data Scalability . . . 171
   Paul Brebner
81. Threading and Concurrency in Data Processing . . . 173
   Matthew Housley, PhD
82. Three Important Distributed Programming Concepts . . . 175
   Adi Polak
83. Time (Semantics) Won’t Wait . . . 177
   Marta Paes Moreira and Fabian Hueske
84. Tools Don’t Matter, Patterns and Practices Do . . . 179
   Bas Geerdink
85. Total Opportunity Cost of Ownership . . . 181
   Joe Reis
86. Understanding the Ways Different Data Domains Solve Problems . . . 183
   Matthew Seal
87. What Is a Data Engineer? Clue: We’re Data Science Enablers . . . 185
   Lewis Gavin
88. What Is a Data Mesh, and How Not to Mesh It Up . . . 187
   Barr Moses and Lior Gavish
89. What Is Big Data? . . . 189
   Ami Levin
90. What to Do When You Don’t Get Any Credit . . . 191
   Jesse Anderson
91. When Our Data Science Team Didn’t Produce Value . . . 193
   Joel Nantais
92. When to Avoid the Naive Approach . . . 195
   Nimrod Parasol
93. When to Be Cautious About Sharing Data . . . 197
   Thomas Nield
94. When to Talk and When to Listen . . . 199
   Steven Finkelstein
95. Why Data Science Teams Need Generalists, Not Specialists . . . 201
   Eric Colson
96. With Great Data Comes Great Responsibility . . . 203
   Lohit VijayaRenu
97. Your Data Tests Failed! Now What? . . . 205
   Sam Bail, PhD
Contributors . . . 207
Index