{"id":2099,"date":"2023-12-21T10:40:56","date_gmt":"2023-12-21T10:40:56","guid":{"rendered":"https:\/\/i-spark.nl\/?p=2099"},"modified":"2026-01-02T11:36:20","modified_gmt":"2026-01-02T11:36:20","slug":"the-migration-from-aws-with-redshift-to-databricks-for-e-commerce-company-wwl","status":"publish","type":"post","link":"https:\/\/i-spark.nl\/en\/blog\/the-migration-from-aws-with-redshift-to-databricks-for-e-commerce-company-wwl\/","title":{"rendered":"The migration from AWS with Redshift to Databricks for e-commerce company WWL"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">The i-spark team is actively engaged with World Wide Lighting (WWL), continuing their multi-year collaboration as a vital part of the data team using i-spark\u2019s <a href=\"https:\/\/i-spark.nl\/en\/products\/data-team-as-a-service\/\">Data Team-as-a-Service solution<\/a>. About a year ago, the e-commerce lighting company WWL reached a point where it had to choose between investing in its current data platform or switching to a modern cloud solution that could scale with the growth of the business. WWL aimed to enhance the reliability, stability, and speed of their data processes, including clickstream and ERP data from their 24 diverse international webshops, while maintaining acceptable costs. Both this consultancy role and the later execution role were a perfect fit for i-spark.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Their existing data platform was based on an Amazon Redshift data warehouse within AWS. However, they experienced challenges in reliability and resiliency due to the high demand on and usage of the platform. Investing in the current platform would inevitably run into the same challenges sooner or later, ultimately leading to potentially large investments with an uncertain outcome. 
Therefore, the decision to migrate to a modern data platform was made.<\/span><\/p>\n<h2>Tool Selection<\/h2>\n<p><span style=\"font-weight: 400;\">I-spark explored multiple cloud solutions, and <strong><a href=\"https:\/\/i-spark.nl\/en\/data-technologies\/databricks\/\">Databricks<\/a><\/strong> turned out to be the best fit for WWL\u2019s requirements, as it was the most comprehensive platform. <\/span><span style=\"font-weight: 400;\">Databricks also had the advantage of being accessible to a broader group of users, such as analysts, thanks to features like the SQL editor. <\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">The many processes that ran on the AWS platform, built on services such as Lambda, Glue, and ECS, were perfect candidates for migration to Databricks. Additionally, Databricks offered more solutions for Machine Learning, which was also one of WWL&#8217;s future aspirations. Setting up a business case as a Proof of Concept confirmed it: Databricks was indeed a viable solution for enhancing the reliability, stability, and speed of their data processes.<\/span><\/p>\n<h2>Solution Architecture and Roadmap<\/h2>\n<p><span style=\"font-weight: 400;\">A fair share of time was invested in this phase, because well begun is half done.<\/span><b><br \/>\n<\/b><span style=\"font-weight: 400;\">In the <a href=\"https:\/\/i-spark.nl\/en\/expertise\/data-architecture\/\"><strong>architecture<\/strong><\/a>, we laid the foundation of the future platform based on the learnings from the previous data platform. Therefore, we gathered both functional and non-functional requirements from the business and translated these into a comprehensive plan for the architecture, including a design for the flow and storage of the large volumes of data. This resulted in a design based on the MACH architecture principles, using a medallion structure for the storage of data. 
MACH stands for Microservices, API-first, Cloud-native, and Headless: the platform is built around microservices (specific jobs for specific tasks), uses APIs where possible, runs cloud-native, and is headless, meaning that other tools, such as reporting or marketing automation, have (curated) access to the data in and the compute power of Databricks.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As the next step, we determined the best approach for the delivery of the project, allocated hours amongst the team, and divided the project into multiple phases.\u00a0<\/span><span style=\"font-weight: 400;\">The result was a project roadmap on which the realization of the architecture was mapped onto a timeline using this phased approach.<\/span><\/p>\n<blockquote><p><strong>&#8220;i-spark&#8217;s approach and good preparation ensured that the delivery of the final solution was done in under two days&#8221;<\/strong><\/p><\/blockquote>\n<h2>Development<\/h2>\n<p><span style=\"font-weight: 400;\">The roadmap guided the initial development. It began with migrating the processes that handled the most data or were the most complex in terms of processing and setup. Starting with a small team of Data Engineers for the first steps, this was gradually expanded into a multidisciplinary team of <strong><a href=\"https:\/\/i-spark.nl\/en\/expertise\/data-engineering\/\">Data Engineers<\/a><\/strong>, Analytics Engineers and <strong><a href=\"https:\/\/i-spark.nl\/en\/expertise\/data-science-data-analysis\/\">Data Analysts<\/a><\/strong> as more and more data became available. The Data Engineers were assigned processes concerning data sources they were already familiar with, and worked in parallel on multiple flows to make this part of the project as efficient as possible. Analytics Engineers and Data Analysts worked in a more waterfall-based fashion due to strong dependencies in the data. 
The project was delivered through a &#8216;Big Bang&#8217; release, following a few weeks of operating the new platform alongside the existing one. This parallel run was conducted to validate data and address initial issues.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This approach and good preparation ensured that the delivery of the final solution was done in under two days. Any minor errors were immediately resolved, as the major ones had been prevented.<\/span><\/p>\n<h2>Data History<\/h2>\n<p><span style=\"font-weight: 400;\">A challenge during the engineering process was migrating some of the existing data from Redshift to Databricks. We did this by exporting the historical data as files and later importing them back into Databricks. This imported history was then merged with the new data. Both historical and newly collected data could be retrieved in the same way in the analysis tool, Looker.<\/span><\/p>\n<h2>Looker<\/h2>\n<p><span style=\"font-weight: 400;\">Since the source of data for<strong> <a href=\"https:\/\/i-spark.nl\/en\/technologies\/looker\/\">Looker<\/a> <\/strong>was changed from Redshift to Databricks, a new project within Looker was necessary. Looker is not able to connect to multiple data sources in one project. Therefore, in Looker, all existing explores were first rebuilt and converted to a new project by our Data Analysts. We also took the opportunity to move logic from Looker to dbt, to make future development and data management easier and more resilient. 
During the migration, we encountered a few challenges:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Databricks uses a different SQL dialect than Redshift, necessitating adjustments in specific SQL functions.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">There was no separate development or test environment running a duplicate instance of Looker, which meant we had to work in the production environment while taking all necessary precautions to keep it running smoothly.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Since Looker was unable to manage more than one data source, certain processes that already ran in Databricks temporarily transferred their data back to Redshift. This was necessary to maintain the existing production reporting.<\/span><\/p>\n<blockquote><p><span style=\"font-weight: 400;\"><strong>&#8220;All of these optimizations and solutions on performance and costs led to comprehensive insight into how Databricks works \u2018under the hood\u2019 for many different Spark and SQL workloads and processes, in combination with the infrastructure it runs on&#8221;<\/strong><\/span><\/p><\/blockquote>\n<h2>Cost Optimization<\/h2>\n<p><span style=\"font-weight: 400;\">The really challenging part of this migration journey began after its initial completion. During development, we focused strongly on the first three requirements &#8211;\u00a0enhancing the reliability, stability, and speed of the data processes &#8211; while planning to address the fourth requirement, maintaining acceptable costs, after delivery. 
If we had to redo this, we would focus on this requirement earlier in the process.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This eventually required optimizing compute resources and storage in Databricks after delivery, to find the optimal balance between costs and consumption. We spent a lot of time and effort finding that optimum, refactoring newly developed processes and infrastructure. The processing of large data volumes requires a lot of memory, and fast processing demands considerable computing power, all of which impacts costs. Finding the right balance was challenging but really educational for us. Many possible solutions were explored, tested, and implemented, resulting in cost optimization in various ways, such as:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Smart VACUUM of old data in the data lake to lower storage costs;<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Intelligent, dynamic up- and downscaling of SQL Warehouse configurations beyond the default auto-scaling offered by Databricks for SQL Warehouses;<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Finding the ideal AWS EC2 instances for certain specific workloads as Job Compute Clusters in Databricks Workflows;<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Using Compute Cluster Policies to limit the creation of Compute Clusters to pre-defined EC2 instances;<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Moving certain workloads from Looker and Databricks Jobs to dbt to run more efficiently.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">All of these optimizations and solutions on performance and costs led to comprehensive insight into how Databricks works \u2018under the hood\u2019 for 
many different Spark and SQL workloads and processes, in combination with the infrastructure it runs on. In future <strong>migrations<\/strong>, cost and performance optimization will be requirements number one and two.<\/span><\/p>\n<blockquote><p><strong>&#8220;We adeptly transitioned WWL from a large-scale AWS-based data platform to an enterprise-grade Databricks Data Hub on AWS infrastructure&#8221;<\/strong><\/p><\/blockquote>\n<h2>Experienced yet Embracing a Learning Curve<\/h2>\n<p><span style=\"font-weight: 400;\">In collaboration with <a href=\"https:\/\/i-spark.nl\/en\/data-technologies\/databricks\/\"><strong>Databricks<\/strong><\/a> and <strong><a href=\"https:\/\/i-spark.nl\/en\/success-stories\/wwl-x-i-spark\/\">WWL<\/a>,<\/strong> this project has further elevated i-spark&#8217;s expertise and knowledge, establishing it as a pivotal migration milestone. We adeptly transitioned WWL from a large-scale AWS-based data platform to an enterprise-grade Databricks Data Hub on AWS infrastructure. During the migration, we faced numerous challenges, yet our effective and swift responses ensured a seamless transition. We are proud of the successful migration of WWL to a future-proof platform that fulfils all their requirements; it reinforces our commitment to excellence and innovation in data solutions, and we look forward to continuing our collaboration with WWL in the years to come.<\/span><\/p>\n<h2>Are you ready for a Strategic Migration or Cost Optimization Partner?<\/h2>\n<p><span style=\"font-weight: 400;\">Unlock the full potential of your data solutions with i-spark, your premier partner for <strong><a href=\"https:\/\/i-spark.nl\/en\/products\/analytics-stack\/data-platform-migration\/\">migration<\/a><\/strong> and cost optimization. Our journey with Databricks has equipped us with insights and experience, positioning us uniquely to guide your migration journey. 
Specializing in cost-effective strategies, we&#8217;re not just optimizing your current Databricks implementation; we&#8217;re revolutionizing it. Choose i-spark for your migration needs and witness a transformation in your data solutions, powered by years of experience and the expertise of our Data Team-as-a-Service.<\/span><\/p>\n<p>Feel free to <a href=\"https:\/\/i-spark.nl\/en\/contact-us\/\" target=\"_blank\" rel=\"noopener\"><strong>contact us<\/strong><\/a> for more information about your migration.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The i-spark team is actively engaged with World Wide Lighting (WWL), continuing their multi-year collaboration as a vital part of the data team using i-spark\u2019s Data Team-as-a-Service solution. About a year ago, the e-commerce lighting company WWL reached a point where it had to choose between investing in its current data platform or switching to 
[&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":7553,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[8],"tags":[92,87,93],"class_list":["post-2099","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-databricks-2","tag-migration","tag-redshift-2"],"acf":[],"_links":{"self":[{"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/posts\/2099","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/comments?post=2099"}],"version-history":[{"count":16,"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/posts\/2099\/revisions"}],"predecessor-version":[{"id":10191,"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/posts\/2099\/revisions\/10191"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/media\/7553"}],"wp:attachment":[{"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/media?parent=2099"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/categories?post=2099"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/i-spark.nl\/en\/wp-json\/wp\/v2\/tags?post=2099"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}