Elevating Data Strategies: The Salesforce-Databricks Data Lakehouse Blueprint

First, Data Warehouses and Data Lakes, now Data Lakehouses? What’s next, Data Beach Resorts? Data Timeshares? Soon you will be taking your data on a vacation twice a year.

We’ll spare you the paltry puns, but you should pay attention to this.

In the noise of Dreamforce, Salesforce announced a Strategic Partnership with Databricks. This partnership will allow customers to access data across both Salesforce’s Data Cloud Lakehouse and Databricks’ Lakehouse as if it were housed in a single location, with no ETL necessary. And let’s not forget that Salesforce has also released a beta connection to Databricks in CRM Analytics and Salesforce Data Pipelines.

You might wonder, what is a Data Lakehouse? By one count, keyword searches for Data Lakehouses have increased by 4,550% in the past five years. Let’s break down the hype:

How does a Data Lakehouse compare to Data Warehouses and Data Lakes?

Data Warehouse

A data management system that is designed to enable business intelligence, typically performing queries and analysis on large amounts of historical data. The data is typically governed and cataloged so it can be digested for analysis.

Data Lake

A pool of raw data, stored in its most primal form. There is no requirement to structure the data first, which makes this a flexible data storage solution.

Data Lakehouse

Synthesizes the golden features of data warehouse and data lake technology. It aims to provide the affordable and scalable storage of data lakes with the strong data management and ACID (atomicity, consistency, isolation, durability) transaction capabilities of data warehouses.
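To make the ACID idea concrete, here is a minimal sketch of atomicity and consistency. It uses Python's built-in sqlite3 module purely as a stand-in, since it runs anywhere; it is not how a lakehouse is implemented. The inventory table, the SKUs, and the move_stock helper are all hypothetical.

```python
import sqlite3

# In-memory database as a stand-in for any ACID-compliant store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
)
conn.execute("INSERT INTO inventory VALUES ('A', 5), ('B', 0)")
conn.commit()


def stock(conn):
    """Return the current quantities as a dict, e.g. {'A': 5, 'B': 0}."""
    return dict(conn.execute("SELECT sku, qty FROM inventory"))


def move_stock(conn, src, dst, n):
    """Move n units from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE inventory SET qty = qty + ? WHERE sku = ?", (n, dst)
            )
            # This statement violates the CHECK constraint when src runs short,
            # which undoes the update to dst as well.
            conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE sku = ?", (n, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

If the second update fails, the first one is rolled back too, so readers never see stock that was added to one SKU without being removed from the other.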

Data Lakes quickly grow unmanageable due to the volume and variety of the data. Without proper governance, they often evolve into Data Swamps. Their lack of structure makes them difficult to integrate with other systems. Data Warehouses often fail to scale with large data volumes, driving up organizations’ spend on storage, hardware, processing power and maintenance. These limitations are frustrating for users of business intelligence solutions, and they are far more costly for AI and Machine Learning models.

Now enter stage left, the Data Lakehouse. With the best features of both Data Lakes and Data Warehouses, the Data Lakehouse is poised to lead enterprises into the new age of Artificial Intelligence.

For organizations contemplating their future AI strategy, evaluating the volume of data the enterprise produces is crucial. When data volume grows exponentially, so does the amount of noisy information that distracts us from core insights and obstructs effective decision making. So, we need tools that help make sense of this data, bubbling up insight to our teams and engaging customers in real time.

The Use Cases

Consumer Goods / Retail

Consider your typical retail or consumer goods organization focused on optimizing a customer’s propensity to buy. The purpose of a propensity-to-buy model is to predict when a customer is predisposed to make a purchase and to engage the customer at that moment. Most propensity-to-buy models leverage sales and loyalty information to identify trends in consumption behavior. While useful, these models do not incorporate a complete picture of the customer. Propensity-to-buy models send cues to downstream systems to raise engagement through channels such as SMS and Email, and it is important that the customer receives that outreach in real time. Batch processes that wait until overnight will miss opportunities to engage a customer. Maybe I have never bought a pumpkin spice latte, but it is September (well, maybe now August). I am within 100 yards of a Starbucks, so tell me about that coupon for a free PSL. Life is short!

Organizations that use both Salesforce’s Data Cloud and Databricks effectively will have a complete view of the customer and an efficient way to traverse large swaths of data over long periods of time. These organizations can worry less about inundating customers with inopportune messages and instead galvanize their customer base with timely, relevant promotions through preferred channels. Databricks churns through vast data volumes to calculate a propensity score. That score is then sent to Salesforce’s Data Cloud for segmentation, and from there it is either pushed to a Marketing Cloud Data Extension for Email or SMS outreach in Journey Builder or surfaced to the customer via streaming insights.
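The score, segment, and outreach flow above can be sketched in a few lines of plain Python. This is an illustration only: it calls no Databricks or Salesforce API, and the Customer fields, the weights, and the segment thresholds are all made-up assumptions standing in for a trained model and real segmentation rules.

```python
from dataclasses import dataclass
from math import exp
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    days_since_last_purchase: int
    loyalty_member: bool
    near_store: bool        # hypothetical real-time location signal
    preferred_channel: str  # "sms" or "email"


def propensity_score(c: Customer) -> float:
    """Toy logistic score between 0 and 1; the weights are illustrative."""
    z = -1.0
    z += 1.5 if c.near_store else 0.0
    z += 0.8 if c.loyalty_member else 0.0
    z -= 0.02 * c.days_since_last_purchase
    return 1.0 / (1.0 + exp(-z))


def segment(score: float) -> str:
    """Bucket a score into a hypothetical outreach segment."""
    if score >= 0.7:
        return "engage_now"
    if score >= 0.4:
        return "nurture"
    return "hold"


def route(c: Customer) -> dict:
    """Score, segment, and pick a channel only for customers worth messaging now."""
    score = propensity_score(c)
    seg = segment(score)
    channel: Optional[str] = c.preferred_channel if seg == "engage_now" else None
    return {"customer_id": c.customer_id, "segment": seg, "channel": channel}
```

In this sketch only the "engage_now" segment receives a channel, which mirrors the goal of sending fewer inopportune messages. In practice the scoring step would run where the data volume lives, and the segmentation and channel choice would sit closer to the marketing tools.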


Envision the automobile manufacturer sifting through large data volumes to optimize inventory, using a mix of material reuse and repurposing strategies and leveraging Databricks to help scale appropriately with demand. Imagine connecting this directly to your CRM Analytics instance to create a principal inventory dashboard, embedded in Salesforce, that surfaces key insights to your operations and sales teams. Then even your partner community of resellers can retrieve this level of insight through their own dashboard.


You are in Sales Ops or GTM Strategy at the emblazoned SaaS company, a novelty of the 21st century, focused on reducing churn and accelerating account growth. Leads pour in through online forms and engage with whitepaper downloads. Databricks stores and maintains usage data for your company’s product. As you scale, the data volume seems unruly and chaotic. Then your CEO and CRO ask for a customer health score based on usage data to help predict churn and renewal rates, so your organization builds a model to farm key insight from your massive pool of usage data. You want to surface this information to the Sales teams (who live in Salesforce) and enrich it with qualitative sales notes for sentiment analysis and with sales follow-up frequency. So, you leverage Data Pipelines to combine the usage score from Databricks, the sentiment analysis from sales, and the follow-up frequency. The output is a slick dashboard on your Sales Team’s home page and a model built in Einstein Discovery, deployed to the Opportunity page to help predict win rate.
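As a rough illustration of how those three signals might be blended into a single health score, here is a small Python sketch. The weights, the 0 to 100 scale, and the target follow-up cadence are assumptions chosen for the example, not values from Salesforce, Databricks, or any real model.

```python
def clamp01(x: float) -> float:
    """Limit a value to the range 0..1."""
    return max(0.0, min(1.0, x))


def health_score(
    usage_score: float,          # 0..1, e.g. produced by the usage model
    sentiment: float,            # -1..1, e.g. derived from sales notes
    followups_per_month: float,  # raw count of sales follow-ups
    weights: tuple = (0.6, 0.25, 0.15),  # hypothetical weights, sum to 1
    target_followups: float = 4.0,       # hypothetical "healthy" cadence
) -> float:
    """Blend usage, sentiment, and follow-up frequency into a 0..100 score."""
    w_usage, w_sentiment, w_followup = weights
    sentiment01 = clamp01((sentiment + 1.0) / 2.0)  # rescale -1..1 to 0..1
    followup01 = clamp01(followups_per_month / target_followups)
    blended = (
        w_usage * clamp01(usage_score)
        + w_sentiment * sentiment01
        + w_followup * followup01
    )
    return round(100.0 * blended, 1)
```

A weighted average like this is easy to explain on a dashboard. A trained model, such as one built in Einstein Discovery, would learn the relationship from historical renewals instead of relying on hand-picked weights.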

As organizations navigate increasingly complex landscapes of Data Warehouses and Data Lakes, Data Lakehouses represent a powerful solution that combines the strengths of both existing technologies. Embrace Data Lakehouses and your data not only stays organized but becomes the driving force behind your organization’s success.

To step into the AI-driven, automated future, your data needs to be ready. For expert solutions and support, reach out to Rosetree Solutions, your trusted 5-star AppExchange partner.

Zach Matek, Principal Consultant

Zach is a former Sales professional turned Salesforce consultant. He spent time using Salesforce in Sales Operations before he began solutioning on the platform. Zach has worked with organizations of all sizes and across a wide variety of industries, improving business processes and scaling systems via a human-centered design approach.