97 Things Every Data Engineer Should Know

Author: Tobias Macey

Table of Contents
Preface

  1. A (Book) Case for Eventual Consistency
    Denise Koessler Gosnell, PhD
  2. A/B and How to Be
    Sonia Mehta
  3. About the Storage Layer
    Julien Le Dem
  4. Analytics as the Secret Glue for Microservice Architectures
    Elias Nema
  5. Automate Your Infrastructure
    Christiano Anderson
  6. Automate Your Pipeline Tests
    Tom White
  7. Be Intentional About the Batching Model in Your Data Pipelines
    Raghotham Murthy
  8. Beware of Silver-Bullet Syndrome
    Thomas Nield
  9. Building a Career as a Data Engineer
    Vijay Kiran
  10. Business Dashboards for Data Pipelines
    Valliappa (Lak) Lakshmanan
  11. Caution: Data Science Projects Can Turn into the Emperor’s New Clothes
    Shweta Katre
  12. Change Data Capture
    Raghotham Murthy
  13. Column Names as Contracts
    Emily Riederer
  14. Consensual, Privacy-Aware Data Collection
    Katharine Jarmul
  15. Cultivate Good Working Relationships with Data Consumers
    Ido Shlomo
  16. Data Engineering != Spark
    Jesse Anderson
  17. Data Engineering for Autonomy and Rapid Innovation
    Jeff Magnusson
  18. Data Engineering from a Data Scientist’s Perspective
    Bill Franks
  19. Data Pipeline Design Patterns for Reusability and Extensibility
    Mukul Sood
  20. Data Quality for Data Engineers
    Katharine Jarmul
  21. Data Security for Data Engineers
    Katharine Jarmul
  22. Data Validation Is More Than Summary Statistics
    Emily Riederer
  23. Data Warehouses Are the Past, Present, and Future
    James Densmore
  24. Defining and Managing Messages in Log-Centric Architectures
    Boris Lublinsky
  25. Demystify the Source and Illuminate the Data Pipeline
    Meghan Kwartler
  26. Develop Communities, Not Just Code
    Emily Riederer
  27. Effective Data Engineering in the Cloud World
    Dipti Borkar
  28. Embrace the Data Lake Architecture
    Vinoth Chandar
  29. Embracing Data Silos
    Bin Fan and Amelia Wong
  30. Engineering Reproducible Data Science Projects
    Dr. Tianhui Michael Li
  31. Five Best Practices for Stable Data Processing
    Christian Lauer
  32. Focus on Maintainability and Break Up Those ETL Tasks
    Chris Moradi
  33. Friends Don’t Let Friends Do Dual-Writes
    Gunnar Morling
  34. Fundamental Knowledge
    Pedro Marcelino
  35. Getting the “Structured” Back into SQL
    Elias Nema
  36. Give Data Products a Frontend with Latent Documentation
    Emily Riederer
  37. How Data Pipelines Evolve
    Chris Heinzmann
  38. How to Build Your Data Platform like a Product
    Barr Moses and Atul Gupte
  39. How to Prevent a Data Mutiny
    Sean Knapp
  40. Know the Value per Byte of Your Data
    Dhruba Borthakur
  41. Know Your Latencies
    Dhruba Borthakur
  42. Learn to Use a NoSQL Database, but Not like an RDBMS
    Kirk Kirkconnell
  43. Let the Robots Enforce the Rules
    Anthony Burdi
  44. Listen to Your Users—but Not Too Much
    Amanda Tomlinson
  45. Low-Cost Sensors and the Quality of Data
    Dr. Shivanand Prabhoolall Guness
  46. Maintain Your Mechanical Sympathy
    Tobias Macey
  47. Metadata ≥ Data
    Jonathan Seidman
  48. Metadata Services as a Core Component of the Data Platform
    Lohit VijayaRenu
  49. Mind the Gap: Your Data Lake Provides No ACID Guarantees
    Einat Orr
  50. Modern Metadata for the Modern Data Stack
    Prukalpa Sankar
  51. Most Data Problems Are Not Big Data Problems
    Thomas Nield
  52. Moving from Software Engineering to Data Engineering
    John Salinas
  53. Observability for Data Engineers
    Barr Moses
  54. Perfect Is the Enemy of Good
    Bob Haffner
  55. Pipe Dreams
    Scott Haines
  56. Preventing the Data Lake Abyss
    Scott Haines
  57. Prioritizing User Experience in Messaging Systems
    Jowanza Joseph
  58. Privacy Is Your Problem
    Stephen Bailey, PhD
  59. QA and All Its Sexiness
    Sonia Mehta
  60. Seven Things Data Engineers Need to Watch Out for in ML Projects
    Dr. Sandeep Uttamchandani
  61. Six Dimensions for Picking an Analytical Data Warehouse
    Gleb Mezhanskiy
  62. Small Files in a Big Data World
    Adi Polak
  63. Streaming Is Different from Batch
    Dean Wampler, PhD
  64. Tardy Data
    Ariel Shaqed
  65. Tech Should Take a Back Seat for Data Project Success
    Andrew Stevenson
  66. Ten Must-Ask Questions for Data-Engineering Projects
    Haidar Hadi
  67. The Data Pipeline Is Not About Speed
    Rustem Feyzkhanov
  68. The Dos and Don’ts of Data Engineering
    Christopher Bergh
  69. The End of ETL as We Know It
    Paul Singman
  70. The Haiku Approach to Writing Software
    Mitch Seymour
  71. The Hidden Cost of Data Input/Output
    Lohit VijayaRenu
  72. The Holy War Between Proprietary and Open Source Is a Lie
    Paige Roberts
  73. The Implications of the CAP Theorem
    Paul Doran
  74. The Importance of Data Lineage
    Julien Le Dem
  75. The Many Meanings of Missingness
    Emily Riederer
  76. The Six Words That Will Destroy Your Career
    Bartosz Mikulski
  77. The Three Invaluable Benefits of Open Source for Testing Data Quality
    Tom Baeyens
  78. The Three Rs of Data Engineering
    Tobias Macey
  79. The Two Types of Data Engineering and Data Engineers
    Jesse Anderson
  80. The Yin and Yang of Big Data Scalability
    Paul Brebner
  81. Threading and Concurrency in Data Processing
    Matthew Housley, PhD
  82. Three Important Distributed Programming Concepts
    Adi Polak
  83. Time (Semantics) Won’t Wait
    Marta Paes Moreira and Fabian Hueske
  84. Tools Don’t Matter, Patterns and Practices Do
    Bas Geerdink
  85. Total Opportunity Cost of Ownership
    Joe Reis
  86. Understanding the Ways Different Data Domains Solve Problems
    Matthew Seal
  87. What Is a Data Engineer? Clue: We’re Data Science Enablers
    Lewis Gavin
  88. What Is a Data Mesh, and How Not to Mesh It Up
    Barr Moses and Lior Gavish
  89. What Is Big Data?
    Ami Levin
  90. What to Do When You Don’t Get Any Credit
    Jesse Anderson
  91. When Our Data Science Team Didn’t Produce Value
    Joel Nantais
  92. When to Avoid the Naive Approach
    Nimrod Parasol
  93. When to Be Cautious About Sharing Data
    Thomas Nield
  94. When to Talk and When to Listen
    Steven Finkelstein
  95. Why Data Science Teams Need Generalists, Not Specialists
    Eric Colson
  96. With Great Data Comes Great Responsibility
    Lohit VijayaRenu
  97. Your Data Tests Failed! Now What?
    Sam Bail, PhD

Contributors
Index
