Posts By Maya Anderson
Abstract: Regulated industries, such as Healthcare and Finance, are starting to move parts of their data and workloads to the public cloud. However, they are still reluctant to trust the public cloud with their most sensitive records, and hence leave them in their premises, leveraging the hybrid cloud architecture. We address the security and performance challenges of big data analytics using a hybrid cloud in a real-life use case from a hospital. In this use case, the hospital collects sensitive patient data and wants to run analytics on it in order to lower antibiotics resistance, a significant challenge in healthcare. We show that it is possible to run large-scale analytics on data that is securely stored in the public cloud encrypted using Apache Parquet Modular Encryption (PME), without significant performance losses even if the secret encryption keys are stored on-premises. PME is a standard mechanism for data encryption and key management, not specific to any public cloud, and therefore helps prevent vendor lock-in. It also provides privacy and integrity guarantees, and enables granular access control to the data. We also present an innovation in PME for lowering the performance hit incurred by calls to the Key Management Service. Our solution therefore enables protecting large amounts of sensitive data in hybrid clouds and still allows to efficiently gain valuable insights from it.
IBM’s AI Privacy Toolkit is a set of open-source tools designed to help organizations build more trustworthy AI solutions. It bundles together several practical tools and techniques related to the privacy and compliance of AI models. The toolkit is designed to be used by model developers (like data scientists) as part of their existing ML […]
Parquet Modular Encryption Milestones IBM Research initiated and led joint work with the Apache Parquet community to address critical issues in securing the confidentiality and integrity of sensitive data, without degrading the performance of analytic systems [1], [2]. Apache Parquet is the industry-leading standard for the formatting, storage and efficient processing of big data. Parquet […]