In Australia, the Peter MacCallum Cancer Centre and the John Holland Group, an infrastructure and construction firm, have turned to cloud data and AI platform Databricks to solve significant data fragmentation problems that were hindering their ability to draw insights from business data.
Speaking at Databricks’ Data + AI World Tour in Sydney, Australia, last month, tech leaders at both organisations reported facing challenges such as siloed data, competing business areas, data integration issues, and legacy systems, which prompted them to seek a cloud data solution.
Peter MacCallum Cancer Centre consolidates data to use AI
Peter Mac’s legacy data infrastructure limited its ability to effectively leverage big data and AI across its extensive clinical and research operations. The legacy technology also jeopardized its mission to improve the lives of people with cancer, including the use of AI to improve clinical decision making and accelerate biological insights and drug discovery.
Problems with data infrastructure
During the conference, Jason Li, head of the bioinformatics core facility in Peter Mac’s cancer research division, said that:
- Peter Mac was dealing with various siloed data and legacy systems.
- The complexity and volume of both clinical and research data across the cancer centre’s operations posed challenges in areas such as data storage and data analytics.
- Ethical, privacy, and safety concerns were all key factors for the governance of Peter Mac’s data and the deployment of any future AI use cases.
- Integration between clinical and research departments complicated the data governance challenge because each had different data requirements.
Li said Peter Mac selected Databricks to help it harmonise data across the centre and support advanced analytics, including AI, while meeting data security and privacy requirements in health care.
Expanding into new AI use cases
Peter Mac first tested the AI potential of the Databricks platform with an AI transformation pilot project:
- The centre created an end-to-end AI lifecycle, which involved applying deep learning to the analysis of gigapixel whole-slide images to quantify a new biomarker for breast cancer prognosis.
- Databricks supported the AI lifecycle, from initial data ingestion to model deployment and monitoring, which Li said made the project time- and cost-efficient.
- The results of the project could have “great promise” for enhancing breast cancer prognosis.
Li said speed across the project was a big advantage: “We estimate that with Databricks, we have sped up the development process by fivefold, and reduced communication overheads across stakeholders by tenfold, allowing us to bring innovations to the market earlier to benefit patients.”
AI strategy now includes future projects
AI has grown into a larger part of Peter Mac’s strategy. Databricks is supporting the cancer centre in three additional use cases: genomics, radiation oncology, and cancer imaging. Additionally, Peter Mac is:
- Extending the AI program to include mainstream bioinformatics, which includes population genetics projects that involve large sample sizes and large amounts of genomic data.
- Applying advances in large language models (LLMs) and retrieval-augmented generation (RAG) to extract knowledge from clinical and radiology reports.
- Planning to apply LLMs to genomics and transcriptomics research (the study of RNA, or the transcriptome) so that it remains competitive in cancer research.
John Holland aims to unify data across construction operations
Meanwhile, John Holland managed 80 large-scale infrastructure projects worth AUD $13.2 billion in 2023. However, Travis Rousell, the company’s head of data and analytics, said its legacy data warehouse environment was fragmented and difficult to integrate.
“We’ve got all the typical problems everybody’s had historically with data warehouses and data problems,” Rousell said. “Our legacy data warehouse environment was built incrementally over 20 years. It’s slowly evolved and developed out, and we’ve created this really swampy set of data silos.”
Rousell added: “We could build BI [Business Intelligence] and reports on the front of those, but joining that data together to be able to create insights into the flow of activities and behaviors that are occurring so that we can drive change across our business has been a really difficult process for us.”
A unified data platform to deliver useful insights
John Holland set out to create a unified data platform to unlock data for business value. The effort is part of a broader digital transformation push, through which the group aims to drive innovation and competitive advantage in its industry using modern data and digital practices.
The organisation has sought to:
- Provide a unified and integrated view of data across the business.
- Manage governance of data across separately managed projects.
- Focus on data engineering rather than platform engineering.
Cost savings come from better data management
John Holland has so far delivered several core business processes to Databricks’ data lake, including project management, project operations, project controls, safety, and fleet analytics.
As a result of using Databricks, Rousell said that John Holland had:
- Reduced platform infrastructure costs by 46% on like-for-like workflows compared with legacy environments.
- Reduced data engineering development effort and time by 30% by building out new data products and models.
- Migrated over 600 users to data products provisioned through the Databricks data lakehouse.
IT becoming an enabler for John Holland’s business
Rousell said that Databricks ensures IT and technology do not hold the business back.
“I think the biggest thing for me that we’re achieving by doing this is we’re creating this data culture of ‘yes’ within John Holland,” Rousell explained. “Historically, the difficulty in provisioning new and innovative products has meant we’ve had to stand up large slow projects and underdeliver for the business.
“Now, if the business has an idea, we can say yes; we can deploy them a data workspace that gives them access to all the capability and tooling they’ll need, and they can go and build that at the speed.”