P2P

winter23

Peer to Peer: ILTA's Quarterly Magazine

Issue link: https://epubs.iltanet.org/i/1515316


data users cannot transfer personal data outside Japan without obtaining informed consent from the individual, or without establishing a personal information protection system with the receiving party in accordance with the APPI. Some countries have even more restrictive laws that govern confidential business information and state secrets. In China, for example, parties attempting discovery should expect wide application of these laws to commercial documents such as meeting minutes, financial statements, and production forecasts.

To address these requirements, many global eDiscovery providers have recently invested heavily in onshore hosting capabilities within APAC jurisdictions. This investment makes sense, as concerns regarding data sovereignty are not expected to subside despite a clear, recent consensus among APAC nations that greater consistency in regional data protection frameworks is sorely needed. As a result, for the foreseeable future, the landscape of data privacy laws and regulations is expected to remain complex as each country continues to act independently in passing new legislation that addresses current domestic needs.

CJK Language Expertise

Most document review tools nowadays can support complex languages such as Chinese, Japanese, and Korean (CJK), but it remains essential to gain a clear understanding of the associated processes, such as indexing and searching, when managing cross-border matters expected to contain CJK-language content. Indexing is a process that inventories the total content of a file and builds a searchable index, a digital table that works, conceptually, like the index in a book. Search indexes are tools designed to facilitate and expedite the retrieval of information. However, before searches can occur, document content must be tokenized into searchable elements.
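To make the indexing step concrete, the following is a minimal sketch of an inverted index in Python, not a description of how any particular review tool works. The function names (`whitespace_tokenize`, `build_index`, `search`), the document IDs, and the sample texts are all hypothetical, and the tokenizer assumes words are separated by spaces.

```python
from collections import defaultdict


def whitespace_tokenize(text):
    """Split text into lowercase tokens on whitespace (assumes a space-delimited language)."""
    return text.lower().split()


def build_index(documents, tokenize=whitespace_tokenize):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, term, tokenize=whitespace_tokenize):
    """Return the IDs of documents that contain every token of the search term."""
    tokens = tokenize(term)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, for illustration only.
documents = {
    "DOC-001": "Quarterly production forecast for the Osaka plant",
    "DOC-002": "Meeting minutes on the production schedule",
}
index = build_index(documents)
print(sorted(search(index, "production")))           # ['DOC-001', 'DOC-002']
print(sorted(search(index, "production forecast")))  # ['DOC-001']
```

The tokenizer is passed in as a parameter because the index itself does not care how text is split; only the tokenizer needs to change when the language does.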
Asian languages are fundamentally more difficult to index than Latin or European languages because their words are not consistently separated by spaces. Chinese and Japanese use no spaces between words; Korean uses some spaces, but not between every word, and usage varies from writer to writer. Early efforts at CJK tokenization often indexed every character in the dataset, which resulted in an overabundance of search hits that were less responsive to the intended search. More effective methods for producing indexable units in Asian languages are N-gram and tokenization, although these approaches yield varying degrees of search precision. In a future post, we will share FRONTEO's technology and preferred approach in this regard.

Given that technology is constantly evolving and improving, we recommend gaining a good understanding of the handling, indexing, and searching processes that your eDiscovery provider will employ, specifically for CJK languages, and inquiring about analytics and AI-based tools for CJK languages that may help your legal teams gather additional insights about your data.

Analytics & Technology Assisted Solutions Built for CJK Language Nuances

Because the construction of words in CJK languages differs fundamentally from that in Latin and other European languages, advanced technologies developed for cross-border cases should include algorithms that use N-gram or tokenization methodologies to identify relevant material. There are many good TAR and analytics solutions available on the market that can help increase efficiency and reduce associated discovery costs. However, not all solutions are designed upon initial release to deliver new
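The contrast drawn on this page between indexing every character and indexing N-grams can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not FRONTEO's approach or any vendor's implementation: the helper names (`char_ngrams`, `build_index`, `search`) and the two Japanese sample texts are hypothetical. The query 会社 ("company") and the word 社会 ("society") share the same two characters in reverse order, which is the kind of overlap that inflates hit counts under per-character indexing.

```python
from collections import defaultdict


def char_ngrams(text, n):
    """Return overlapping character n-grams, ignoring whitespace.

    Text no longer than n characters is returned as a single unit.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) <= n:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_index(documents, n):
    """Map each character n-gram to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, query, n):
    """Return the IDs of documents that contain every n-gram of the query."""
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical sample documents, for illustration only.
documents = {
    "DOC-A": "会社の方針",  # "company policy"
    "DOC-B": "社会の問題",  # "social problems"
}

unigram_index = build_index(documents, n=1)
bigram_index = build_index(documents, n=2)

# Per-character indexing matches both documents, because each contains 会 and 社.
print(sorted(search(unigram_index, "会社", n=1)))  # ['DOC-A', 'DOC-B']
# Bigram indexing matches only the document where the two characters are adjacent and in order.
print(sorted(search(bigram_index, "会社", n=2)))   # ['DOC-A']
```

Bigrams still produce some false hits whenever a query happens to appear inside a longer, unrelated word. Word-level tokenization, which typically relies on dictionaries or morphological analysis, is not shown here.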
