ICAT Collaboration Meeting - 3rd October 2024
Attendance
Attendees:
- Louise Davies
- Kirsty Syder
- Marjolaine Bodin
- Kevin Phipps
- Alan Kyffin
- Patrick Austin
- Viktor Bozhinov
- Allan Pinto
- Alex de Maria
- Andy Gotz
Agenda
NOBUGS
Microfrontends
- Louise gave a presentation about DataGateway.
- ESRF presented about their architecture
- What is SciCAT doing around visualisation?
- Louise: it's on their roadmap (from SciCatCon)
Metadata
- Rolf has worked to expand the DataCite schema (re: Instruments)
- Standardising: DataCite is a shared "format" for metadata
- CyroEM exported to mmCIF
- Ontology for metadata
Search API
- Interesting direction of LLM for search via search-api
- Changes from a decentralised solution to a centralised one
- Less responsibility from facilities but more on the "central" part
- Funding for keeping central part maintained?
Patrick has some notes he'll upload to the ICAT website later
Site Updates
ESRF
Nothing new, working on DRAC (Data Repository for Advancing open sCience)
- idea to package things all together, make it configurable as possible
- via Docker compose etc.
Louise: similar to SciCat's scicatlive package
New portal is replacing old portal in one week's time
Patrick: Alan how does our containerisation work? Alan: For EPAC, looking at data ingestion piepline. In next couple of months, want the work for indexing in Opensearch deployed. Also move ICAT sever to Quarkus (effictively alterntive to Payara so less admin overhead). May not be do-able in time. For pre-prod will use Docker compose. Only server and authn. Rest is separate components.
What about the ingestion? You have nothing...
- Ingestion is separate and would be handled by the Facility.
- Would need to be synchronised. Want a generic way to ingest the Dataset. They are linked to proposals. For our Facilities it is similar.
- Generic would be good. But for STFC it's the one thing we can hand off so probably not going to develop it ourselves.
- Isnt Python ICAT genreic? Yes.
- Define XML schema (xlsd) - defines what's allowed in your XML
Alex:
- message broker/ActiveMQ - do stuff like DOI minting, processed data etc. more jobs than just ingestion
Sirius - are you going to develop your own ingestor? Allan:
- Under discussion, working on getting data from API onto ICAT, bluesky
- Discussion for generic API for ingestion, e.g. user portal
Andy: want ActiveMQ not bluesky Allan: using Kafka, need to look at pros/cons
Alex: we want a black box, need to be useful for all facilities. Don't care about the specific technology as long as it works... Allan: sounds good
DLS
Last month, users reported some performance problems browsing datafiles/datasets. Worked on a workaround fix for slow queries. Basic problem is size of DLS ICAT, authorisation slows things down. Alex: what is slow? 10s, 1 minute? Kevin: We time out after a minute, but problem is DB still processing whilst user sends more queries Problem is the Rules, Alan thinking about ideas to improve. for now workaround in datagateway-api - we check if user can see parent entity, and then uses a read all account to do the actual query Alex: only Datafiles problem? Kevin: also having problems at Datasets level, which is maybe 50 million. May just be caused by slow Datafile queries causing DB slowness. Put the fix in place on Tuesday, and queries have stopped timing out
Kevin: Alex you were asking about DB sizes, haven't had a chance to put our stats in but we're interested in the research Alex: yes, our DB admins were concerned about CPU load. Only rare, but we're logging all queries performance in ICAT+ so we can monitor
Alan: long-term, take searches off of the DB and onto ElasticSearch instead. Patrick: challenge is Rules are dynamically evaluated, e.g. removing an investigationUser revokes their access immediately.
Andy: time series, after a set amount of time assert data is static and move it to another, non-modifiable place. Alan: Sharded index could work for that Patrick: old data can have static rules e.g.
Patrick: hard to have a truly generic solution
Sirius
ICAT ready for production, IT going to run some tests. In about a month ICAT may be able to be open to community
Minor issue of displaying HDF files in DataPortal. May reach out to ESRF for help if necessary. Trying to understand h5view & h5grove - studying the documentation. Andy: ESRF can meet you on Zoom
Component Updates
icat.server
Has put a message out to icat mailing list, new versions of ICAT 5 and ICAT 6. If you create a Rule via entitymanager endpoint, the new Rule is used to authorise the user instead of using only old Rules. Basically meant any user can make a Rule giving them full access, then they have full access.
Failure of the logic, Alan has now separated the logic into separate transactions.