Big commercial websites breathe data: they generate it in large volumes and at high speed, and they depend on feedback derived from that same data to keep improving. In this book, we present our journey towards a generic solution for gathering and using all the data produced by our web application and its associated systems, whether technical or business data.
We show which technologies we evaluated and which tools emerged in the end as the (currently) best solution for us. By sharing our ideas, our process, the drawbacks we ran into and the solutions we found, we provide a guide to building your own data infrastructure. Further, we explore ways to access and harness the data using the map/reduce approach, in order to prepare you for the most challenging part of all: gaining relevant knowledge you did not have before.