Blog | September 03, 2025
Taming Modern Data Challenges: Structured Data
In our recent posts, we’ve explored the evolving challenges of linked documents, from terminology and preservation to best practices and the Cimplifi Microsoft 365 approach. We also reviewed key case law shaping how courts expect linked documents to be handled, offering critical insights for discovery strategy.
As with linked documents, discovery of structured data for litigation and investigations isn’t new. However, the proliferation of enterprise-wide database solutions and the storage of unstructured data formats in structured container files have redefined the challenges and best practices associated with structured data. In this post, we will discuss how the treatment of structured data has evolved in discovery, the types of cases where structured data may be important, and the skills and best practices needed to conduct structured data discovery effectively.
The Evolution of Structured Data in Discovery
As electronic discovery became a formal discipline, the industry’s initial attention centered primarily on emails and office documents; however, structured data such as HR records, customer databases, and financial ledgers quietly grew in significance over time. In the early days of eDiscovery, collecting this data often resulted in massive, unwieldy spreadsheets exported from relational databases. These exports frequently lost key metadata, lacked relational integrity, and failed to reflect the dynamic nature of the systems from which they originated.
As technology and legal teams have matured, so have the methods for handling structured data. The focus has shifted from “pull everything and review in Excel” to more strategic extraction methods, such as targeting specific fields, timeframes, and relational connections. Litigators and discovery experts now often work with database administrators to tailor exports or use middleware that can preserve data context. The result is more defensible collections, which matter as courts increasingly scrutinize the adequacy of structured data productions, and leaner review sets for more efficient and cost-effective discovery of that data.
The Rise of Enterprise Platforms
The increasing adoption of enterprise-wide solutions like Salesforce, Oracle, SAP, and Workday has redefined how structured data enters the legal arena. These platforms aren’t just databases; they are dynamic, cloud-based ecosystems with interconnected modules, extensive metadata, and user-specific views. They also log activity across users, making them treasure troves of potential evidence, as well as potential landmines of privileged or confidential information.
Take Salesforce as an example. A single customer interaction may touch multiple objects: account records, cases, notes, emails, and support tickets. Extracting a full picture from Salesforce means understanding its object model, relational architecture, and native export limitations. Requesting “all data about a client” isn’t enough anymore; instead, you must be specific about objects, fields, filters, and timeframe. The cloud nature of these systems also introduces additional complications around access rights, audit logs, and change history.
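To make “be specific about objects, fields, filters, and timeframe” concrete, here is a minimal Python sketch that assembles a targeted SOQL (Salesforce Object Query Language) query for the standard Case object. It assumes the chosen object has an AccountId field, as Case does; the helper name `build_targeted_query` and the placeholder Account Id are hypothetical. The sketch only builds the query text; actually running it would go through whatever Salesforce API client or export tool the team has already vetted.

```python
import re
from datetime import datetime, timezone

# Accept only plain API names (letters, digits, underscores, e.g. "Custom_Field__c"),
# optionally dotted for relationship fields, so free text cannot alter the query.
_API_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)*")


def soql_datetime(dt: datetime) -> str:
    """Format a timezone-aware datetime as an unquoted SOQL datetime literal in UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def soql_string(value: str) -> str:
    """Quote a SOQL string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_targeted_query(sobject, fields, account_id, start, end):
    """Hypothetical helper: named fields only, one account, CreatedDate in [start, end).

    Assumes the object has an AccountId field, as the standard Case object does.
    """
    for name in [sobject, *fields]:
        if not _API_NAME.fullmatch(name):
            raise ValueError(f"not a valid API name: {name!r}")
    return (
        f"SELECT {', '.join(fields)} FROM {sobject} "
        f"WHERE AccountId = {soql_string(account_id)} "
        f"AND CreatedDate >= {soql_datetime(start)} "
        f"AND CreatedDate < {soql_datetime(end)} "
        "ORDER BY CreatedDate"
    )


query = build_targeted_query(
    "Case",
    ["Id", "CaseNumber", "Subject", "Status", "CreatedDate"],
    "001XXXXXXXXXXXXXXX",  # placeholder Account Id, not a real record
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(query)
```

Writing the field list and date window into the query itself also documents the scope of the collection, which is easier to defend than an unfiltered object export.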
Text Messages and Chat Logs
While structured data traditionally refers to rows and columns, many unstructured communications – like text messages, chats, and other collaboration content – are stored in structured or semi-structured file formats. Examples:
- iMessages are stored in SQLite databases on iPhones.
- Slack exports generate JSON files with user IDs, timestamps, and message content.
- Microsoft Teams logs are captured from within Microsoft 365 in a tabular format.
These data sets are a combination of structured and unstructured, requiring hybrid strategies to manage collection. While the content may look unstructured, its technical format can be parsed, filtered, and queried with precision – if you know how to manage it. Understanding the schema of these platforms has become key for effective and defensible discovery.
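As an illustration of parsing these formats with precision, the Python sketch below normalizes timestamps from two sources into a single UTC timeline. It assumes that a Slack export channel file is a JSON array of message objects with `type`, `user`, `text`, and `ts` keys, where `ts` is a Unix-seconds string, and that iMessage `date` values (commonly reported as nanoseconds since 2001-01-01 UTC on recent iOS versions, and seconds on older ones) have already been pulled from the device database as (handle, date, text) rows. The function names and sample values are hypothetical, and both formats should be verified against the actual export or OS version in a given matter.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed reference point for Apple timestamps (verify for the OS version at issue).
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Slack system events treated as noise here; a real review may keep more subtypes.
SKIPPED_SLACK_SUBTYPES = {"channel_join", "channel_leave"}


def slack_ts_to_utc(ts: str) -> datetime:
    """Convert a Slack-style 'ts' string such as '1700000000.000200' to UTC."""
    seconds, _, fraction = ts.partition(".")
    micros = int(fraction.ljust(6, "0")[:6])
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc) + timedelta(microseconds=micros)


def imessage_date_to_utc(value: int) -> datetime:
    """Convert an iMessage-style 'date' value to UTC.

    Heuristic (assumption): values above 1e11 are treated as nanoseconds,
    smaller ones as seconds, both counted from 2001-01-01 UTC.
    """
    if value > 10**11:
        return APPLE_EPOCH + timedelta(microseconds=value // 1000)
    return APPLE_EPOCH + timedelta(seconds=value)


def merged_timeline(slack_json, imessage_rows, start, end):
    """Return (utc_time, source, sender, text) tuples in [start, end), sorted by time."""
    events = []
    for msg in json.loads(slack_json):
        if msg.get("type") == "message" and msg.get("subtype") not in SKIPPED_SLACK_SUBTYPES:
            events.append((slack_ts_to_utc(msg["ts"]), "slack", msg.get("user"), msg.get("text", "")))
    for handle, date_value, text in imessage_rows:
        events.append((imessage_date_to_utc(date_value), "imessage", handle, text))
    in_window = [e for e in events if start <= e[0] < end]
    return sorted(in_window, key=lambda e: e[0])


# Hypothetical sample data.
slack_export = json.dumps([
    {"type": "message", "user": "U0001", "text": "Can we move the launch?", "ts": "1700000000.000200"},
    {"type": "message", "subtype": "channel_join", "user": "U0002",
     "text": "<@U0002> has joined the channel", "ts": "1700000100.000300"},
])
imessage_rows = [("+15555550123", 721_692_850 * 10**9, "Call me before the meeting")]

timeline = merged_timeline(
    slack_export, imessage_rows,
    datetime(2023, 11, 14, tzinfo=timezone.utc),
    datetime(2023, 11, 15, tzinfo=timezone.utc),
)
for when, source, sender, text in timeline:
    print(when.isoformat(), source, sender, text)
```

Normalizing to UTC before merging is the key design choice: the two sources count from different epochs, so sorting their raw values would interleave messages incorrectly.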
Types of Cases Likely to Involve Structured Data
Structured data discovery isn’t necessary in every case, but it is becoming more common in litigation and investigations, and the evidence from structured data sources is often critical to the case. Types of cases likely to involve structured data include:
- Employment litigation: Time records, payroll data, performance reviews, and attendance logs can help support or refute claims around discrimination, wage disputes, or retaliation.
- Antitrust investigations: Pricing tables, bid records, customer segmentation, and sales pipeline data that can impact antitrust cases are often kept in CRM or ERP systems.
- Product liability and recalls: Manufacturing records, testing logs, and defect tracking databases can determine when and how issues were identified and what was done about them.
- Healthcare litigation: Patient records, clinical trial data, and prescription logs key to healthcare claims often come from EHR (electronic health record) systems and related clinical databases.
- Financial services: Loan records, transaction histories, and compliance logs that show financial dealings often live in custom or regulated database environments.
The common thread across these use cases is the need for precision, context, and data integrity. Especially in enterprise environments, structured data often holds the “single source of truth” from an evidentiary standpoint.
Skills Needed for Structured Data Discovery
The skills required to navigate structured data successfully differ markedly from those used for discovery of unstructured data in traditional eDiscovery workflows. Needed skills include:
- Relational database literacy: It’s important to understand concepts like table relationships, primary and foreign keys, and how data joins impact analysis.
- Query language familiarity: Knowledge of SQL, or at least the ability to read query outputs, is key to executing and validating data extractions. Familiarity with SELECT statements, JOIN and GROUP BY clauses, subqueries, and more is a requirement for many who work with structured data sources.
- Platform-specific expertise: While not always a requirement at the beginning of a case, understanding how Salesforce, SAP, or Microsoft Dynamics structure their records can save time and increase accuracy when collecting data from these platforms.
- Metadata interpretation: The ability to recognize the importance of timestamps, field changes, user actions, and audit logs can uncover hidden timelines or issues of spoliation.
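The relational and SQL concepts above can be seen together in a small, self-contained Python example using the standard-library `sqlite3` module. The timekeeping schema, table names, and sample rows are hypothetical, chosen to echo an employment wage dispute: the JOIN follows a foreign key back to its primary key, GROUP BY and HAVING total the hours per employee per week, and a subquery narrows the results to one department.

```python
import sqlite3

# Hypothetical, simplified timekeeping schema: each time entry references an
# employee through a foreign key (employee_id) pointing at a primary key.
SCHEMA = """
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    department  TEXT NOT NULL
);
CREATE TABLE time_entries (
    entry_id    INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employees(employee_id),
    work_week   TEXT NOT NULL,  -- ISO week label, e.g. '2024-W10'
    hours       REAL NOT NULL
);
"""

OVERTIME_QUERY = """
SELECT e.name, t.work_week, SUM(t.hours) AS total_hours
FROM time_entries AS t
JOIN employees AS e ON e.employee_id = t.employee_id    -- follow the foreign key
WHERE t.employee_id IN (                                -- subquery: one department
    SELECT employee_id FROM employees WHERE department = ?
)
GROUP BY e.employee_id, e.name, t.work_week             -- one row per employee-week
HAVING SUM(t.hours) > 40                                -- keep overtime weeks only
ORDER BY e.name, t.work_week
"""


def weekly_overtime(conn: sqlite3.Connection, department: str):
    """Return (name, week, total_hours) for employee-weeks over 40 hours."""
    return conn.execute(OVERTIME_QUERY, (department,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "Employee 1", "Warehouse"),
    (2, "Employee 2", "Warehouse"),
    (3, "Employee 3", "Finance"),
])
conn.executemany(
    "INSERT INTO time_entries (employee_id, work_week, hours) VALUES (?, ?, ?)",
    [
        (1, "2024-W10", 24.0), (1, "2024-W10", 22.5),  # two entries, same week
        (2, "2024-W10", 38.0),
        (3, "2024-W10", 45.0),                         # different department
    ],
)
for row in weekly_overtime(conn, "Warehouse"):
    print(row)
```

Note that no single row in `time_entries` exceeds 40 hours; the overtime only appears once the entries are grouped, which is why understanding how joins and aggregation shape a result matters when validating an extraction.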
Skill requirements like these are why structured data discovery often depends on a close partnership with system owners, IT administrators, and/or outside providers who understand database and SQL fundamentals as well as the intricacies of the platform(s) involved. This level of technical expertise is “table stakes” for handling structured data today.
Best Practices for Discovery of Structured Data
Discovery of structured data requires not only unique skills but also unique workflows and tools. Here are some best practices for conducting efficient and defensible discovery of structured data:
- Plan beforehand: Start with a data map to identify what systems are in use, what types of data are stored, and who owns them. That’s key to ensuring that legal hold notices reach the right custodians and systems.
- Engage early with IT and data custodians: Collaboration ensures that requests are technically feasible, narrowly tailored, and defensible. Define the scope as early as possible to avoid blind extractions that will drive up costs.
- Use targeted queries: Rather than pulling entire databases, work with technical teams to write focused queries that capture only relevant records, fields, and timeframes. Targeted collection is just as valuable for structured data as it is for unstructured data.
- Preserve relational integrity: When exporting data, maintain connections between related tables or records. Flattening everything into a spreadsheet can often destroy important context.
- Use review tools that support structured data formats: Many modern platforms can ingest CSVs, JSON, or XML files and provide field-based filtering or timeline views. Consider using analytics in those platforms to narrow review sets intelligently.
- Watch for hidden PII and sensitive fields: Structured data often contains sensitive content (SSNs, health info, financials). Plan redactions or privilege reviews accordingly.
- Test your assumptions: Validate your extractions through sampling, metrics, and consistency checks. If the data is incomplete or misaligned, address it early; unresolved gaps can cause discovery delays and potentially lead to sanctions.
- Document extraction and transformation methods: Maintain a clear audit trail describing how the data was collected, filtered, transformed, and validated. Courts increasingly expect this transparency – failure to provide it could lead to sanctions.
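Several of these practices (targeted queries, count-based consistency checks, relational integrity checks, and an audit trail) can be combined in one extraction step. The Python sketch below uses an in-memory SQLite database with a hypothetical `transactions`/`accounts` schema: it selects only the needed columns within a date window, reconciles the exported row count against an independent COUNT(*), counts exported rows whose foreign key has no matching account, and records a SHA-256 hash of the output alongside the filter used. Real systems will differ; this illustrates the workflow rather than a defensibility standard.

```python
import csv
import hashlib
import io
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical column list: only the fields the request actually needs.
EXPORT_COLUMNS = ["txn_id", "account_id", "txn_date", "amount"]
DATE_FILTER = "txn_date >= ? AND txn_date < ?"  # ISO-8601 date strings compare correctly as text


def targeted_export(conn, start, end):
    """Export selected columns for [start, end); return (csv_bytes, audit_record)."""
    params = (start, end)
    rows = conn.execute(
        f"SELECT {', '.join(EXPORT_COLUMNS)} FROM transactions "
        f"WHERE {DATE_FILTER} ORDER BY txn_id",
        params,
    ).fetchall()

    # Consistency check: an independent COUNT(*) with the same filter must agree.
    (expected,) = conn.execute(
        f"SELECT COUNT(*) FROM transactions WHERE {DATE_FILTER}", params
    ).fetchone()
    if expected != len(rows):
        raise RuntimeError(f"row count mismatch: {expected} in source, {len(rows)} exported")

    # Relational integrity check: exported rows whose account_id has no parent record.
    (orphans,) = conn.execute(
        "SELECT COUNT(*) FROM transactions AS t "
        "LEFT JOIN accounts AS a ON a.account_id = t.account_id "
        "WHERE t.txn_date >= ? AND t.txn_date < ? AND a.account_id IS NULL",
        params,
    ).fetchone()

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(EXPORT_COLUMNS)
    writer.writerows(rows)
    data = buffer.getvalue().encode("utf-8")

    # Audit-trail entry: filter, fields, row count, integrity result, and output hash.
    audit = {
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "source_table": "transactions",
        "filter": {"txn_date_from": start, "txn_date_before": end},
        "columns": EXPORT_COLUMNS,
        "row_count": len(rows),
        "orphaned_rows": orphans,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    return data, audit


# Hypothetical sample data; internal_notes is deliberately left out of the export.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id TEXT PRIMARY KEY, holder TEXT);
CREATE TABLE transactions (
    txn_id INTEGER PRIMARY KEY, account_id TEXT, txn_date TEXT,
    amount REAL, internal_notes TEXT
);
INSERT INTO accounts VALUES ('A-1', 'Holder 1');
INSERT INTO transactions VALUES
    (1, 'A-1', '2023-12-31', 10.0, 'n/a'),
    (2, 'A-1', '2024-01-15', 25.5, 'n/a'),
    (3, 'A-9', '2024-02-01', 99.0, 'n/a');
""")
data, audit = targeted_export(conn, "2024-01-01", "2024-07-01")
print(data.decode("utf-8"))
print(json.dumps(audit, indent=2))
```

Against a live system, the export and the COUNT(*) should read from the same snapshot or transaction; otherwise records added between the two reads could trigger a false mismatch.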
Conclusion
Structured data is no longer the exception in eDiscovery – it’s rapidly becoming the rule, or at least a recurring reality. As enterprises continue to invest in cloud platforms, workflow automation, and data-rich ecosystems, legal and eDiscovery professionals must be prepared to incorporate structured data into their workflows to tame this modern data challenge. By understanding how structured data is stored, when it could be relevant, and how to extract and analyze it effectively, legal teams can turn potential complexity into a strategic advantage.
In our next post in the series, we will explore the impact of emojis and their evolution from playful symbols to critical elements of professional communication, presenting unique challenges in eDiscovery due to their impact on tone, intent, and legal interpretation across platforms like email, text, Slack, and Teams.