Large Data Volumes in Salesforce


Salesforce scales smoothly to large data volumes (LDV), but growth brings performance challenges: slower queries, searches, and other operations. Addressing them calls for optimization techniques such as data modeling, indexing, partitioning, archiving, and efficient search strategies.

Introduction:

In today’s data-driven landscape, businesses leverage Salesforce to manage their CRM data. Salesforce lets customers scale their applications easily from small to very large amounts of data. This scaling usually happens automatically, but as data sets grow, the time required for certain operations grows with them, and organizations find themselves facing the challenges posed by Large Data Volumes (LDV). As the amount of data within Salesforce grows, organizations must adopt strategic approaches to maintain system performance, ensure scalability, and deliver a seamless user experience. In this blog post, we’ll explore key considerations and best practices for handling large data volumes in Salesforce.

Understanding the Challenge:

Large Data Volumes (LDV) in Salesforce can lead to performance issues: slower query responses, slower search results, slower loading of list views and reports, slower sandbox refreshes, and more. Addressing these challenges requires a holistic approach that incorporates data model optimization, strategic use of Salesforce features, and the techniques described below.

  • Optimize Data Model: Design objects, relationships, and fields with your query patterns in mind so that large tables stay selective and joins stay cheap.
  • Leverage Skinny Tables: Salesforce can create skinny tables that contain frequently used fields and avoid the joins between standard and custom field tables, which can improve the performance of certain read-only operations. Skinny tables are requested through Salesforce Customer Support.
  • Leverage Indexes (see the selective-query sketch after this list):
    • Salesforce creates standard indexes on the following fields:
      • RecordTypeId
      • Division
      • CreatedDate
      • LastModifiedDate
      • Name
      • Email
      • Foreign Key Relationships (Lookup and Master-Detail)
      • Salesforce Record Id
      • External Id and Unique Fields
    • Salesforce also supports custom indexes on custom fields, except for:
      • Multi-Select Picklist
      • Text Area (Long)
      • Text Area (Rich)
      • Encrypted Text Fields
  • Partitioning Data for Scalability: Divide data in a logical manner. For example, a deployment with many customer records might create divisions called US, EMEA, and APAC to separate the customers into smaller groups that are likely to have few interrelationships.
  • Archiving Strategies: Move stale records out of frequently queried objects, for example into Big Objects or an external store, so that active data sets stay small.
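
To see why indexed filters matter, here is a minimal Apex/SOQL sketch of selective querying. CreatedDate is standard-indexed; External_Key__c is a hypothetical custom External ID field, which Salesforce indexes automatically.

```apex
// Selective queries filter on indexed fields so the query optimizer can use an
// index instead of scanning the entire table.
List<Account> recent = [
    SELECT Id, Name
    FROM Account
    WHERE CreatedDate = LAST_N_DAYS:30    // standard-indexed field
    LIMIT 200
];

List<Account> byKey = [
    SELECT Id, Name
    FROM Account
    WHERE External_Key__c = 'ERP-000123'  // hypothetical indexed External ID field
    LIMIT 1
];
```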

Techniques for Optimizing Performance:

  • Using Mashups: With this approach, the data resides in a different application and is made available to Salesforce as and when needed. Salesforce refers to such an arrangement as a mashup because it provides a quick, loosely coupled integration of the two applications (see the callout sketch after this list).
  • Defer Sharing Calculation: It is sometimes beneficial to use the defer sharing calculation feature, which allows the processing of sharing rules to be postponed until after new users, rules, and other content have been loaded. A Salesforce admin can defer sharing rule calculation during large data loads that would otherwise trigger complex recalculations, then resume the calculation during off-peak hours.
  • Using SOQL and SOSL (both are compared in a sketch after this list):
    • Use SOQL when:
      • You know in which objects or fields the data resides.
      • You want to retrieve data from a single object or from multiple objects that are related to one another.
      • You want to count the number of records.
      • You want to retrieve data based on certain conditions.
      • You want to perform aggregate functions such as count, sum, min, max, avg.
    • Use SOSL when:
      • You don’t know in which object or field the data resides, and you want to find it in the most efficient way possible.
      • You want to retrieve data from multiple objects, with or without filter conditions, efficiently.
  • Deleting Data: Salesforce stores data deleted by users in the Recycle Bin for 15 days; this is called a soft delete. The data isn’t actually removed but is flagged for deletion and kept in the database. Records are hard deleted after 15 days, when the Recycle Bin’s size limit is reached, or when the Recycle Bin is emptied using the UI, the API, or Apex. In addition, Bulk API and Bulk API 2.0 support a hard delete option, which allows records to bypass the Recycle Bin and become immediately available for deletion. We recommend the Bulk API 2.0 hard delete function for deleting large data volumes.
  • Search: When large volumes of data are added or changed, the search system must index that information before it becomes available to all users, and this indexing can take a long time.
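
As a rough illustration of the mashup approach above, this Apex sketch fetches order history from a hypothetical external billing endpoint on demand instead of storing it in Salesforce. The URL and response shape are assumptions, and a real org would need a Named Credential or Remote Site Setting to permit the callout.

```apex
// Mashup-style integration sketch: the data stays in the external system and is
// fetched only when a user needs it. The endpoint is hypothetical.
public with sharing class BillingMashupService {
    public static String fetchOrderHistory(String accountExternalId) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint('https://billing.example.com/orders?account='
                + EncodingUtil.urlEncode(accountExternalId, 'UTF-8'));
        req.setMethod('GET');
        req.setTimeout(10000); // fail fast rather than blocking the page
        HttpResponse res = new Http().send(req);
        // Return the raw payload; a Lightning component or Visualforce page
        // would parse and render it.
        return res.getBody();
    }
}
```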
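
And here is a minimal sketch contrasting the two query languages on standard objects; the search term and email address are placeholders.

```apex
// SOQL: you know the object and the filter criteria, and you can aggregate.
List<Contact> contacts = [
    SELECT Id, Name, Email
    FROM Contact
    WHERE Email = 'jane.doe@example.com'  // placeholder address
];
Integer createdThisYear = [SELECT COUNT() FROM Contact WHERE CreatedDate = THIS_YEAR];

// SOSL: you don't know which object holds the text, so search several at once.
List<List<SObject>> results = [
    FIND 'Jane Doe'
    IN NAME FIELDS
    RETURNING Account(Id, Name), Contact(Id, Name), Lead(Id, Name)
];
List<Account> accounts = (List<Account>) results[0];
List<Contact> matched = (List<Contact>) results[1];
```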

Best Practices for Achieving Good Performance with Large Data Volumes (LDV):

  • Reporting:
    • Reduce the number of records by applying filter criteria.
    • Reduce the number of joins by minimizing the number of objects and relationships used to generate the report.
    • Reduce the amount of data returned by including only relevant fields in the report.
  • Loading Data from the API:
    • Use the Bulk API to load data when the record count exceeds 2,000.
    • Perform delta loads (only new and changed records) whenever possible; see the delta-query sketch after this list.
    • Avoid unnecessary overhead by authenticating once per load, not once per record.
    • Avoid computation by using a Public Read/Write sharing model during the initial load to skip sharing calculations.
    • Defer sharing calculations until the load completes.
  • Extracting Data from the API:
    • When a query can return more than one million results, consider using the query capability of Bulk API 2.0, which might be more suitable.
  • SOQL and SOSL:
    • Avoid filtering on formula fields, which are computed in real time.
    • When a SOQL query with multiple WHERE filters can’t use an index, decompose it into simpler queries that allow indexed searches.
    • Choose between SOSL and SOQL appropriately for the use case.
    • Build SOSL and SOQL queries efficiently.
    • Avoid timeouts on large queries.
  • Deleting Data:
    • When deleting large volumes of data (one million or more records), use the hard delete option of the Bulk API or Bulk API 2.0.
    • When deleting records that have many children, delete the children first (see the deletion sketch after this list).
  • General:
    • Avoid having any user own more than 10,000 records.
    • Use a data-tiering strategy that spreads data across multiple objects, and brings in data on demand from another object or external store.
    • Distribute child records so that no parent has more than 10,000 child records.
    • When creating copies of production sandboxes, exclude field history if it isn’t required.
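
For the delta-load tip above, a minimal Apex sketch that queries only records changed since the last successful run. The hardcoded timestamp stands in for state a real integration would persist, for example in a custom setting.

```apex
// Delta extraction: pull only records changed since the last run instead of
// re-reading the whole table. SystemModstamp is a standard-indexed field.
Datetime lastRun = Datetime.newInstanceGmt(2024, 1, 1); // placeholder for persisted state
List<Account> changed = [
    SELECT Id, Name, SystemModstamp
    FROM Account
    WHERE SystemModstamp > :lastRun
    ORDER BY SystemModstamp
    LIMIT 10000
];
```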
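
And for the deletion tips, a sketch of the children-first pattern in Apex. Database.emptyRecycleBin() is the Apex counterpart of the hard delete option; for one million or more records, a Bulk API 2.0 hard delete job remains the recommended path. The account name is a placeholder.

```apex
// Delete children before parents, then bypass the Recycle Bin so rows don't
// linger as soft-deleted data for 15 days.
List<Contact> children = [
    SELECT Id FROM Contact WHERE Account.Name = 'Stale Corp' LIMIT 10000
];
delete children;                     // soft delete: rows move to the Recycle Bin
Database.emptyRecycleBin(children);  // hard delete: space is reclaimed immediately

List<Account> parents = [SELECT Id FROM Account WHERE Name = 'Stale Corp'];
delete parents;
Database.emptyRecycleBin(parents);
```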
