ICAT Collaboration Meeting - 27th November 2025
Attendance
Attendees:
- Rolf Krahl
- Andy Gotz
- Louise Davies
- Kevin Phipps
- Malik Almohammad
- Patrick Austin
- Alex de Maria
- Alan Kyffin
- Marjolaine Bodin (joined 20 mins late)
Agenda
Site Updates
SESAME
No updates, everything going fine
HZB
Not much to report, still moving ICAT servers to new VMs. Not struggling with anything ICAT related, Docker related
AG: cybersecurity problems resolved?
RK: resolved?
AG: can you get ICAT back online?
RK: stricter rules now, probably the cause of the issues I'm having. Still waiting for ID management before ICAT. Not allowed non-2FA login.
AG: keycloak?
RK: yes, going to set it up and once it is then can bring ICAT back up. Non-login related ICAT things are externally accessible so e.g. DOI landing pages, OAI-PMH
ESRF
Working on integration of MX on ICAT. Just notified ISPyB users that it will be deprecated in March 2026 and that they will have to use ICAT/DataPortal.
AG: also started refactoring ICAT+?
AdM: not official, created over 10 years ago, we have been focused on new features, would like to update dependencies/versions, want to split into micro-services.
AG: would be then able to run a single microservice independently?
AdM: yes, that's the idea. Can run e.g. only sample tracking or logbooks. Only an idea at the moment though!
AG: Maybe Marjolaine can present at the F2F?
KP: was about to suggest that
ISIS
LD: Seen the minutes from last time. They've updated to the latest ICAT server. Not sure whether they were running the WAR or the proper version before; now on the proper version. Otherwise working on internal stuff: certificates and ingest. That's all that's ICAT related.
AG: Where is Santosh based?
LD: STFC, based in ISIS. But soon they may end up being the same department as us. Not the same group. I go to their meetings so can give updates. Maybe the future will hold something different.
DLS
Working on DOI implementation. Hoping to have something in January for them to test.
AG: contacted by Steve Collins, discussions of future of data archiving. Only 80PB archived? seems low considering age of data
KP: seems right, but curve is exponential. Data goes back to 2008. ISIS goes back to 1987.
AG: ILL can go back 50 years. Most users no longer alive.
LD: What do you do if the PI dies for open data? Who has the rights?
AdM: Keycloak? affect ICAT?
KP: currently going to do SSO with Microsoft Login soon. Going to have meeting about ICAT and how authz is done.
AG: what Alex did with keycloak applicable?
AdM: has OpenID plugin?
RK: Have OpenID Connect. It sits between single sign-on and the service: takes an OIDC access token and delivers the session. Protocol is more advanced. Need redirects between the two. In principle it depends on how the system looks. Could be fairly easy to deploy, at least if behind an Apache reverse proxy, as there is a plugin which does it all for you. Then you get to the application only when the login has already succeeded.
LD: Alan will probably talk about our support for OIDC which is needed for Diamond and CLF.
AK: In DataGateway. All the authn gets done on the SciGatewayAuth component. It's fairly straightforward. Not a lot of code to do it. Can share. Have an authenticator that just gets told to log in the specified user and you get a session id back. That's how we did it.
LD: Idea is it only accepts requests from the machine that the component is running on.
AK: Has a token it passes along with this to verify.
RK: Shouldn't be too difficult. Will have OIDC access token on the way. Could, if you wanted to, use the OIDC authenticator. Don't even need the username/password, just the token.
AG: Does this support single sign on?
LD: Yes. Need it for ORCID login for the DOI work. Will also use the STFC single sign on. Implemented to support multiple authenticators. Tested with corporate Microsoft, ORCID and keycloak. Should work with any.
RK: OK that would be more complex for the existing oidc plugin. The idea there was you consolidate this into the keycloak, then keycloak goes to the external providers.
(unattributed): Main motivation was DG doesn't use ICAT session ids, but JWTs. Already need one middleman. Makes sense to do other things in that component.
AK: Could copy the bits that do the back and forth then pass to the existing icat.oidc authenticator if you only want one provider.
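Note on the token-based login RK describes: a minimal sketch of what exchanging an OIDC access token for an ICAT session could look like from a client. It assumes the ICAT REST session endpoint with a `json` form field, an authenticator registered under the mnemonic `oidc`, and a credential key named `token`; the mnemonic and key depend on the server configuration and are assumptions here, as is the example URL. This is not the SciGatewayAuth code mentioned above.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_payload(access_token: str, plugin: str = "oidc") -> dict:
    """Build the login document for an ICAT session request.

    ICAT credentials are sent as a list of single-key objects.
    Assumption: plugin mnemonic "oidc" and credential key "token".
    """
    return {"plugin": plugin, "credentials": [{"token": access_token}]}


def build_login_form(access_token: str, plugin: str = "oidc") -> bytes:
    """URL-encode the login document into the 'json' form field."""
    document = json.dumps(build_payload(access_token, plugin))
    return urlencode({"json": document}).encode("ascii")


def login(icat_url: str, access_token: str) -> str:
    """POST the token to <icat_url>/session and return the session id.

    icat_url is a hypothetical base such as "https://icat.example.org/icat".
    """
    request = urllib.request.Request(
        icat_url.rstrip("/") + "/session",
        data=build_login_form(access_token),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["sessionId"]
```

No username or password appears anywhere in the request, which matches RK's point that only the token is needed.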
Component Updates
icat.server 6.2
Still pending release. Alan will do it soon.
ids
AG: created IDS issue: https://github.com/icatproject/ids.server/issues/172 Content negotiation ranges are not compatible with the desired download tool. Was downloading 99% of 10GB files on not so fast links. At the very end, it fails due to the message. Still have the issue that you need 100% of the file.
RK: This is always the case with the zip. Directory is at the very end.
AG: tar.gz files instead?
RK: That would be another issue instead.
AG: Does that need more space?
RK: Doubt if there's a huge difference. You can compress in both cases. Usually a big binary file, compression won't help if it's already compressed. Difficulty with tar is whether there is Java support. Needs to be generated on the fly. Need something I can write the files into... I don't know if this exists. This is why this needs its own issue.
AG: Had this request for a while (year+). Protein crystallography. They would be a big user.
RK: Yes it's possible in principle, need a component for it.
AG: There's something in Apache for this.
RK: Can't promise a timeline.
AG: Request is 12 months old, but keeps being refreshed.
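Note on the zip vs tar point: a small Python illustration of why a download that is cut off near the end is unusable as a zip but partly usable as a tar. A zip reader needs the central directory at the end of the file, while a tar stream can be read member by member from the start. The file names and the 90% cut are made up for the example; this says nothing about Java library support, which is the open question RK raises.

```python
import io
import tarfile
import zipfile

PAYLOAD = bytes(range(256)) * 400  # 102,400 bytes of deterministic data


def make_zip() -> bytes:
    """Build an uncompressed zip holding two members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a.dat", PAYLOAD)
        zf.writestr("b.dat", PAYLOAD)
    return buf.getvalue()


def make_tar() -> bytes:
    """Build the same content as a tar, written in streaming mode ("w|")."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w|") as tar:
        for name in ("a.dat", "b.dat"):
            info = tarfile.TarInfo(name)
            info.size = len(PAYLOAD)
            tar.addfile(info, io.BytesIO(PAYLOAD))
    return buf.getvalue()


def truncate(data: bytes, fraction: float = 0.9) -> bytes:
    """Simulate a download that stops before the end."""
    return data[: int(len(data) * fraction)]


def zip_opens(data: bytes) -> bool:
    """Return True if zipfile can locate the central directory."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


def first_tar_member(data: bytes) -> tuple[str, bytes]:
    """Read the first member sequentially from a (possibly truncated) tar."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|") as tar:
        member = tar.next()
        return member.name, tar.extractfile(member).read()
```

With these helpers, `zip_opens(truncate(make_zip()))` is false, whereas `first_tar_member(truncate(make_tar()))` still returns the complete first file.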
icat.lucene
4.0.0 released. Improved handling of file paths (forward slashes and dots handled properly)
LD: I'll create an issue because ISIS uses Windows, so backslashes as path separators; probably needs a config option
PA: it's on my list, no guarantees on speed but an issue would be good
AOB
Direct Data Access from DOIs (D3A)
RK: SIG for Direct Data Access from DOIs. If you have a pid, you want machine actionable download based purely on the pid. Setting up a protocol on how to use it. Two problems. 1: tell the client the download URL from the DOI. They suggest metalink. Then need content negotiation to return this instead of HTML for DOI landing pages. Someone at ESRF is working on this.
AG: Hired someone to implement D3A on top of what we have. On top of our landing page. Negotiate with IDS. Also working on the archiving system. Still has to meet and discuss with Alex and Marjolaine. He doesn't know. He's been talking to Paul, but not how to implement at ESRF.
RK: other thing is how to download things from tape
AG: that's next phase, restore from tape automatically
RK: not yet covered by the D3A group, trivial solution already in IDS. If you request a preparedId from IDS whose data is not online, it returns 503. If you come back later then the same request succeeds.
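Note on the 503 behaviour: a minimal client-side sketch of "come back later", assuming a caller-supplied function that repeats the same IDS request and returns the HTTP status code. `fetch_status`, the attempt count and the delay are hypothetical names and values chosen for illustration, not part of IDS or D3A.

```python
import time
from typing import Callable


def wait_until_online(
    fetch_status: Callable[[], int],
    attempts: int = 5,
    delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Repeat the same request until it stops answering 503.

    fetch_status is a hypothetical callable that issues the download
    request for a preparedId and returns the HTTP status code.
    Returns True once the status is 200, False if every attempt got 503.
    """
    for attempt in range(attempts):
        status = fetch_status()
        if status == 200:
            return True
        if status != 503:
            # Anything else is a real error, not "data still on tape".
            raise RuntimeError(f"unexpected HTTP status {status}")
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

The `sleep` parameter is injectable so the loop can be exercised without waiting.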
AG: Need to protect against robots?
RK: If you want to implement, you want to have machine actionable download. Of course a robot can use that. There is nothing you could put into the protocol to prevent, as that would mean it isn't machine actionable. Maybe rate limit it serverside to prevent robots from flooding the system.
AG: Maybe we could get a presentation on D3A at the F2F as well.
RK: Meeting yesterday. In the aftermath, might be interesting to have another component that delivers well defined links to datasets or datafiles. Could deterministically generate from ICAT entity names. Could do the content negotiation for metalink and provide links for individual files. Link to a prepared id from IDS.
AG: Need to know how it would work with the landing page. Then you go to the other components. It's a nice discussion point. Then if we have CLI tools...
RK: They have client tools
AG: This is what Paul told me. Aria2
RK: Built on a Python framework fsspec
AG: Looks like a filesystem.
RK: Try to stress it would work for any kind of pid.
AG: PURL.
RK: Handles
At the end of the meeting, Tycho Canter Cremers from D3A group joined the meeting. He'll be invited to a future collab meeting or the F2F. AG suggests he should be added to the mailing list potentially.
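Note on the content negotiation discussed above: a minimal sketch of how a landing page service might decide between HTML and a metalink document from the request's Accept header. It assumes the Metalink media type `application/metalink4+xml` from RFC 5854; the D3A protocol may specify something different, and the function names are made up for this example. The Accept parsing is deliberately simplified.

```python
METALINK = "application/metalink4+xml"  # media type from RFC 5854
HTML = "text/html"


def parse_accept(header: str) -> dict[str, float]:
    """Parse an Accept header into {media type: q-value} (simplified)."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    return prefs


def choose_representation(accept_header: str) -> str:
    """Serve metalink only when the client explicitly asks for it.

    Wildcards such as */* count towards HTML, so browsers and
    unspecific clients keep getting the human-readable landing page.
    """
    prefs = parse_accept(accept_header)
    metalink_q = prefs.get(METALINK, 0.0)
    html_q = max(
        prefs.get(HTML, 0.0),
        prefs.get("text/*", 0.0),
        prefs.get("*/*", 0.0),
    )
    if metalink_q > 0 and metalink_q >= html_q:
        return METALINK
    return HTML
```

A download client would send `Accept: application/metalink4+xml` when resolving the DOI; a browser's default Accept header falls through to the HTML landing page.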
OpenAlex
AG: anyone using it? A free alternative to Crossref: covers a large body of scientific publications and analyses them.
F2F update
Kevin presented some slides.
List of confirmed attendees growing, just a few more need to confirm.
AG: LNLS might need VC?
KP: rooms should have VC capabilities
Kevin thanks everyone for confirming room reservations and passing along dietary requirements.
Sent Visa supporting letters.
Google form re: RAL facility tours, please fill in by end of the week.
Kevin went over the proposed schedule.
RK: facility updates, probably better to be interesting facility specific projects rather than standard "running this version", e.g. SEPIA for HZB
AG: good to present things others could re-use
KP: should we just have X talk "slots" for interesting things rather than reserve a slot for each facility for their updates. Concerned if we ask for volunteers then not many volunteers!
RK: probably going to be more presentations than ideas?
AG: if we run out of ideas then interesting to hear from newer facilities e.g. ALBA
KP: Rodrigo said he wanted to present their work with k8s
AG: could also have a remote presentation for those who can't present in person.
AG: probably need to structure "future" discussions and specific topics
KP: hard to plan in advance, things will come up during the day
AG: any sustainability issues, EOL issues?
RK: pragmatic approach - won't have December meeting? Next one would be 29th Jan. Say to community to submit proposals for presentations ahead of next collab meeting. Can then discuss in Jan collab meeting the proposals.
AG: SciCAT update? new PaNOSC search (AI powered).
LD: new search is centralised despite previously being adamant that search needed to be decentralised
RK: how is data collected from facilities?
AG: remote presentation, or send the head developer
AG: reduced rate taxis? think I remember that from last visit?
PA: STFC drivers? unsure if that is applicable to external people
KP: I'll investigate