ICAT Project

ICAT Collaboration Meeting - 3rd October 2024

Attendance

Attendees:

  • Louise Davies
  • Kirsty Syder
  • Marjolaine Bodin
  • Kevin Phipps
  • Alan Kyffin
  • Patrick Austin
  • Viktor Bozhinov
  • Allan Pinto
  • Alex de Maria
  • Andy Gotz

Agenda

NOBUGS

Microfrontends

  • Louise gave a presentation about DataGateway.
  • ESRF presented their architecture
  • What is SciCat doing around visualisation?
    • Louise: it's on their roadmap (from SciCatCon)

Metadata

  • Rolf has worked to expand the DataCite schema (re: Instruments)
  • Standardising: DataCite is a shared "format" for metadata
  • CryoEM exported to mmCIF
  • Ontology for metadata

Search API

  • Interesting direction: using LLMs for search via search-api
  • Moves from a decentralised solution to a centralised one
    • Less responsibility on facilities but more on the "central" part
    • Who funds keeping the central part maintained?

Patrick has some notes he'll upload to the ICAT website later

Site Updates

ESRF

Nothing new, working on DRAC (Data Repository for Advancing open sCience)

  • Idea is to package everything together and make it as configurable as possible
    • via Docker Compose etc.

Louise: similar to SciCat's scicatlive package

New portal is replacing old portal in one week's time

Patrick: Alan, how does our containerisation work?

Alan: For EPAC, we're looking at the data ingestion pipeline. In the next couple of months, we want the OpenSearch indexing work deployed. We also want to move the ICAT server to Quarkus (effectively an alternative to Payara, so less admin overhead), though that may not be doable in time. For pre-prod we will use Docker Compose, covering only the server and authn; the rest are separate components.

What about the ingestion? You have nothing...

  • Ingestion is separate and would be handled by the Facility.
  • Would need to be synchronised. We want a generic way to ingest Datasets, which are linked to proposals; it is similar for our facilities.
  • Generic would be good, but for STFC it's the one thing we can hand off, so we're probably not going to develop it ourselves.
  • Isn't Python ICAT generic? Yes.
  • Define an XML schema (XSD), which defines what's allowed in your XML.

Alex:

  • Message broker (ActiveMQ) to handle jobs such as DOI minting and processed data, i.e. more than just ingestion

Sirius: are you going to develop your own ingestor?

Allan:

  • Under discussion; working on getting data from the API into ICAT, and on Bluesky
  • Discussion for generic API for ingestion, e.g. user portal

Andy: want ActiveMQ, not Bluesky.

Allan: using Kafka, need to look at the pros/cons.

Alex: we want a black box that is useful for all facilities. We don't care about the specific technology as long as it works.

Allan: sounds good.

DLS

Last month, users reported performance problems browsing datafiles/datasets, so we worked on a workaround for the slow queries. The basic problem is the size of the DLS ICAT: authorisation slows things down.

Alex: what counts as slow? 10 seconds, 1 minute?

Kevin: we time out after a minute, but the problem is that the DB is still processing whilst the user sends more queries. The cause is the Rules, and Alan is thinking about ideas to improve them. For now there is a workaround in datagateway-api: we check whether the user can see the parent entity, and then use a read-all account to do the actual query.

Alex: is it only a Datafiles problem?

Kevin: we're also having problems at the Datasets level (maybe 50 million). That may just be slow Datafile queries causing general DB slowness. We put the fix in place on Tuesday, and queries have stopped timing out.
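The two-step workaround Kevin describes can be sketched roughly as below. This is a minimal illustration, not the datagateway-api code: the callbacks, session names and toy data are all hypothetical stand-ins for real ICAT queries. It assumes that a user who can see a Dataset may see all of its Datafiles, which is what makes skipping the per-Datafile Rule checks safe.

```python
from typing import Callable, Dict, List, Set

# Hypothetical callbacks standing in for ICAT queries; these are not the
# datagateway-api or python-icat interfaces.
CanSee = Callable[[str, int], bool]          # (session, dataset_id) -> visible?
ListFiles = Callable[[str, int], List[str]]  # (session, dataset_id) -> datafile names


class AccessDenied(Exception):
    """Raised when the user cannot see the parent Dataset."""


def datafiles_for_dataset(can_see: CanSee, list_files: ListFiles,
                          user_session: str, reader_session: str,
                          dataset_id: int) -> List[str]:
    # Step 1: one cheap, Rule-checked lookup of the parent entity,
    # made with the user's own session.
    if not can_see(user_session, dataset_id):
        raise AccessDenied(f"Dataset {dataset_id} is not visible to this user")
    # Step 2: the expensive child query runs with a read-all account, so
    # Rules are not evaluated for every Datafile.
    return list_files(reader_session, dataset_id)


# Toy in-memory backend, for illustration only.
VISIBLE: Dict[str, Set[int]] = {"alice": {1}, "reader": {1, 2}}
FILES: Dict[int, List[str]] = {1: ["a.nxs", "b.nxs"], 2: ["c.nxs"]}


def toy_can_see(session: str, dataset_id: int) -> bool:
    return dataset_id in VISIBLE.get(session, set())


def toy_list_files(session: str, dataset_id: int) -> List[str]:
    # Returns files only for datasets the session may see; the toy
    # "reader" account can see every dataset.
    return FILES[dataset_id] if toy_can_see(session, dataset_id) else []


if __name__ == "__main__":
    print(datafiles_for_dataset(toy_can_see, toy_list_files, "alice", "reader", 1))
```

The trade-off is that correctness now rests on the parent check: the read-all account bypasses Rules entirely, so any case where a Datafile should be hidden from someone who can see its Dataset would not be caught by this sketch.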

Kevin: Alex, you were asking about DB sizes. We haven't had a chance to put our stats in yet, but we're interested in the research.

Alex: yes, our DB admins were concerned about CPU load. It's only rare, but we're logging the performance of all queries in ICAT+ so we can monitor it.

Alan: long-term, take searches off the DB and onto Elasticsearch instead.

Patrick: the challenge is that Rules are dynamically evaluated, e.g. removing an investigationUser revokes their access immediately.

Andy: treat it like a time series: after a set amount of time, assert the data is static and move it to another, non-modifiable place.

Alan: a sharded index could work for that.

Patrick: e.g. old data can have static rules.

Patrick: hard to have a truly generic solution

Sirius

ICAT is ready for production; IT are going to run some tests. In about a month ICAT may be opened to the community.

There is a minor issue displaying HDF files in DataPortal. We may reach out to ESRF for help if necessary, and are trying to understand h5web and h5grove by studying the documentation.

Andy: ESRF can meet you on Zoom.

Component Updates

icat.server

A message has been put out to the icat mailing list about new versions of ICAT 5 and ICAT 6. If you created a Rule via the entitymanager endpoint, the new Rule was used to authorise the user, instead of only the existing Rules. This meant any user could create a Rule giving themselves full access, and would then have full access.

This was a failure of the logic; Alan has now split it into separate transactions.

AOB