ICAT Project

ICAT Collaboration Meeting - 30th April 2026

Attendance

Attendees:

  • Louise Davies
  • Silvia Sottini
  • Marjolaine Bodin
  • Kevin Phipps
  • Patrick Austin
  • Alan Kyffin
  • Santhosh Anandarama
  • Andy Götz

Agenda

Site Updates

ESRF

New domain portal for musical instruments, before end of June.

Also working with external company for UX of Human Organ Atlas.

Now sending an email 1 month before the end of embargo.

Creation of new shipments in ISPyB ends at the end of the month.

 

KP: new domain portal like HOA?

MB: yes, we have HOA, paleo, and now a new one

KP: working on user interface?

MB: yes, for HOA, but hopefully apply findings to all

 

AG: Want to link cryoEM to EMPIAR databank (equivalent to Protein DataBank). Are Diamond doing this?

SS: no, good to know for our cryo people, name?

AG: EMPIAR

SS: linking proposal to publications

AG: going through highlights, trying to encourage citing

SS: manual? not monitoring?

AG: No, can get better answers manually

SS: our teams want to set up monitoring of when people open their data, get a list of those who don't, and ask them why not

 

AG: PIDINST?

MB: not mentioned, meeting next week with DESY

SS: Diamond thinking about it soon

AG: thinking of a schema collectively

SS: ISIS has already started this, want to copy what they have already done. Supported by STFC open science people (Elizabeth Newbold, Katie Yates). Got in contact and they have shared what they have done so far. Would recommend getting in contact with them in the first instance.

AG: are they related to ICAT?

KP: not directly, related to open science group, drafting data policy. Good person to speak to about these topics

ISIS

Some production data issues from the cycle - fixed now.

Updated icat to 6.2.0, VMs migrated to new virtualisation software.

Upgrading scheduling client

Updates to DOI minting to create Instrument <-> Data DOI association

Need to investigate slowness in OAI-PMH connection.

DLS

Minting DOIs, first stage in progress (raw data only). Been pushing legal team review, working with them to create a document that needs to be reviewed higher up. Hopefully in the next couple of weeks it can go to the first committee review. Have started development on code to call the API from Diamond. Steve & I are testing the code and planning the roll out (which users to roll it out to first).

 

AG: Will data be made open automatically or only on request?

SS: only on request, PI has the ability to make it open via the UI. Can open early. We don't open it automatically.

AG: Same infrastructure as ISIS?

SS: No.

LD: Developed new for Diamond. A lot of ISIS code is getting out of date. Might try and backport/migrate ISIS' DOIs to the API for Diamond. ISIS use Study because it predates the DataPublication entity.

AG: Requirements are PI selects what data to put into the DOI?

SS: Session DOI after the visit. Only the raw data would become open. There is also another tool where data can be selected from different visits, and a user-defined DOI can then be created with that selection.

LD: "Opening data" refers to the session DOIs.

AG: the useful ones are the user-defined DOIs, session DOIs are a "fallback". Restore data automatically?

SS: restore data

LD: Allow open data users to restore data, but they have to log in with an ORCID. User can restore an entire DataPublication.collection. Doesn't happen automatically.

PA: discussion around storing data online/caching data, maybe this is a specific use case. ORCiD users allowed to download, and we have ability to prioritise Diamond user office users over ORCiD.

AG: ESRF high profile data on disk, tape restore is broken at the moment!

KP: one of the requirements was to not have completely "anonymous" download

LD: still allow anonymous viewing of metadata, landing pages

PA: leaves open the possibility of blocking/tracking problematic ORCiDs

KP: ORCiD can help with tracking

AG: most users use Globus for downloads, so Globus IDs useful here as well

SS: Diamond has Globus as well.
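
A minimal sketch of the access rules discussed above: anonymous users can view metadata only, a login is needed to restore data, and user office accounts are served before ORCiD logins. This is an illustration only, not the Diamond implementation. The class name `RestoreQueue`, the `auth` labels and the priority values are all assumptions made for the example.

```python
import heapq
from itertools import count

# Hypothetical priority values (assumption): lower is served first.
PRIORITY = {"user_office": 0, "orcid": 1}


class RestoreQueue:
    """Toy queue of restore requests, ordered by how the user logged in."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps FIFO order within a priority

    def request(self, user: str, auth: str, publication: str) -> None:
        if auth not in PRIORITY:
            # Anonymous users may view metadata and landing pages only.
            raise PermissionError("login required to restore data")
        entry = (PRIORITY[auth], next(self._order), user, publication)
        heapq.heappush(self._heap, entry)

    def next_request(self):
        """Return (user, publication) for the highest-priority request."""
        _, _, user, publication = heapq.heappop(self._heap)
        return user, publication


queue = RestoreQueue()
queue.request("orcid-user", "orcid", "publication-A")
queue.request("facility-user", "user_office", "publication-B")
print(queue.next_request())  # the user office request comes out first
```

Keeping the login type on every request would also leave room for the blocking or tracking of problematic accounts mentioned above.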

Component Updates

icat.server

Issue fixed a couple of weeks ago: if you try to log out of a session which has expired but has not been deleted yet, it errors and the transaction fails to roll back. DLS DOI was causing this error.

 

LD: why log out and log back in instead of refreshing session?

PA: python-icat login method logs out and logs back in
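
To illustrate the "expired but not yet deleted" state behind this issue, here is a toy model in which logging out of such a session is handled as a normal case rather than an error. It is a sketch under stated assumptions, not the icat.server code or the actual fix: `SessionStore` and its methods are hypothetical, and the lifetime is simplified to a fixed period from login rather than a period of inactivity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class SessionStore:
    """Toy in-memory session table (hypothetical, not icat.server)."""

    lifetime: timedelta = timedelta(hours=2)  # assumed session lifetime
    _expiry: dict = field(default_factory=dict)  # session id -> expiry time

    def login(self, session_id: str, now: datetime) -> None:
        self._expiry[session_id] = now + self.lifetime

    def is_expired(self, session_id: str, now: datetime) -> bool:
        return self._expiry[session_id] <= now

    def logout(self, session_id: str, now: datetime) -> str:
        """Remove the session row and report what state it was in."""
        if session_id not in self._expiry:
            return "unknown"
        # An expired row that the cleanup job has not removed yet is simply
        # deleted here instead of raising partway through the operation.
        status = "expired" if self.is_expired(session_id, now) else "active"
        del self._expiry[session_id]
        return status


store = SessionStore()
t0 = datetime(2026, 4, 30, 9, 0)
store.login("abc", t0)
print(store.logout("abc", t0 + timedelta(hours=3)))  # prints "expired"
print(store.logout("abc", t0 + timedelta(hours=3)))  # prints "unknown"
```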

 

AG: discussion: could anonymous access be sessionless? if you have lots of anonymous users could overload the DB theoretically

LD: cart requires tracking the differences between anonymous users (previously it didn't)

AG: performance with 10,000s of users?

LD: session table surely not going to get too large? sessions get cleared out regularly

KP: sessions expire by default after 2 hours of inactivity and get deleted after 1

PA: theoretically a malicious user could repeatedly log in and create lots of sessions

AG: performance issue was related to creating sessions, not necessarily size of table

AOB

NOBUGS

AG: anyone going?

KP: unsure, not discussed

SS: we want to go, Elliot and Ian?

MB: I'm going to talk about data portal

AI

AG: Anthropic wanted all our data on semiconductors. However, tape is broken so we couldn't provide it!

SS: Anthropic?

AG: they want to train Claude on metadata, we wouldn't have been able to provide it all. Will get back to them after tape restore fixed. We need to know what they will offer us. Will need a massive tape restore. Don't know if even AI can use the data usefully. Scientists are using Claude to analyse their data in a directory.

PA: specific stuff is powerful, e.g. generate a config file so I can run this analysis on this data etc.

AG: example, pointed to directory, it found the files, techniques, it found the way to analyse it.

SS: didn't know they were already that good!

KP: Paul Quinn is having similar requests here. ISIS data is open and on disk

AG: 30 PB of data, most of it is available. Helpful to identify missing metadata, extract it to ICAT.

PA: can do that without AI surely?!

SS: No sample info, how is AI useful?

PA: guess

AG: data from web, other sources. Sometimes wrong, sometimes plausible.

Some discussion on how facilities' data policies are handling AI. This is a work-in-progress discussion point at the facilities.