ICAT Project

ICAT Collaboration Meeting - 31st July 2025

Attendance

Attendees:

  • Rolf Krahl
  • Louise Davies
  • Malik Almohammad
  • Patrick Austin
  • Santhosh Anandarama
  • Alan Kyffin
  • Mojeeb Rahman Sedeqi
  • Allan Pinto

Agenda

Site Updates

SESAME

Had an issue downloading 20TB of data - will have to start using Globus. Cybersecurity concerns. Would be keen to collaborate.

Solution for people from other countries timing out when downloading data: use the ownCloud service (SESAME cloud), with a shared folder for experiment data. IDS sometimes works, but sometimes fails during zipping.

RK: Don't know if anyone else has tried to do this (connect to ownCloud). Question to the room?

MA: Working for us, but it's a temporary solution. Would look to keep going with the tech other light sources use (e.g. Globus).

RK: no practical experience w/ Globus. On our todo list. DLS have issues w/ large downloads?

LD: Frontend limit of 10TB. Not really to do with HTTP latency; it is enforced on all downloads (including Globus). Diamond also have Globus. Probably best to speak with Kevin.

PA: Yes, probably Kevin. I did speak to Brian about Globus for open data. You run a Globus server and authenticate; once logged in, it extracts the username, which is mapped to an area on the filesystem that you can read (at least this is how DLS has it set up). Unknown how much effort goes into setting this up. Have an IDS that sends data to a certain disk, and point Globus to that location on disk.

LD: ESRF have Globus I think?

MA: I think it's in ICAT+

HZB

Still working on implementing the sample database, SEPIA. Introducing Mojeeb, who is developing SEPIA.

ISIS

BAU. Restructuring message queues. Added Instrument PIDs

DLS

Kirsty has left DLS. Continued work on testing download full visits. Open data work. Improving metadata - file formats (DatafileFormat), handle file paths better in Lucene search, samples. All ideas coming from Diamond 2 upgrade.

RK: simple thing that looks at files at ingest and searches for appropriate DatafileFormat. Flexible configuration. Might be of interest?

PA: Yeah, even just as a working example would probably be useful. We'd need to back-populate as well.

RK: my script needs the datafile on disk.

PA: Some beamline has over 1000 different file extensions, curious if this is same scale as you?

RK: not that much, but configurable list, otherwise has an "other" format as a default
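As a rough illustration of the idea discussed above (a configurable list of formats with an "other" default), a minimal sketch might look like the following. This is not RK's script: the mapping, the format names and the `guess_format` helper are all hypothetical, and this version only looks at the file name, whereas RK's script needs the datafile on disk.

```python
from pathlib import PurePosixPath

# Hypothetical configuration: file extension -> DatafileFormat name.
# A real facility would load this from a config file and use its own names.
FORMAT_BY_EXTENSION = {
    ".nxs": "NeXus",
    ".h5": "HDF5",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".txt": "Text",
}

# Fallback used when no configured extension matches.
DEFAULT_FORMAT = "other"


def guess_format(path, config=FORMAT_BY_EXTENSION, default=DEFAULT_FORMAT):
    """Return the configured format name for a file path, or the default."""
    suffix = PurePosixPath(path).suffix.lower()
    return config.get(suffix, default)
```

With over 1000 distinct extensions on a beamline, the same lookup would still work, but most of the effort would go into curating the configuration rather than the code.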

PA: we need to think about it

RK: my script is not publicly accessible at the moment, currently part of a big blob of ingest stuff. I might try to cut useful stuff from this blob in the near future

PA: happy to just get plain text python through email

LNLS

Apologies for not attending last couple of meetings, but had no updates. New updates:

  1. New beamlines migrated to new software that needs new features in our data (acquisition?) process.
  2. Migrating to new data portal
  3. Ingestion - improve infrastructure to have automatic data pipelines. Try to have data ingested with as little manual intervention as possible.

RK: Bluesky, we use this for NeXus

Component Updates

icat.server

security issue (v6.1.1)

A WHERE clause can be used to extract data from tables that have not been opened up. Only a concern if users can send JPQL themselves so don't think this is a concern for other facilities that have ICAT outside a firewall. The fix moves the session table outside the persistence context, which means JPQL accessing the session table can no longer be written.

RK: noticed this years ago with Steve - was written off as not fixable

PA: yes, similar issue a bit ago and conclusion is best to put ICAT behind firewall if possible. But session table access is worse.

AK: shouldn't break legitimate code. Will likely make a patch release (6.1.1), which will likely also include fixes for a few minor bugs. Can also backport to previous versions if needed.

RK: might be useful to backport to v5 - we're using v5 still.

6.2

RK: Most are compatible schema changes and have been discussed/approved in previous meetings. New one that hasn't been discussed is the "acknowledgment" attribute for FundingReference. Just additive. Could we decide this today?

LD: looks fine to me, an optional "description"-like field that is either inoffensive or useful, so can't imagine any objections.

PA: multiple things share the same acknowledgement, is this fine? Could then not say specifically what the acknowledgement is targeting

RK: yes it's a limitation

PA: still probably worth going ahead for first attempt

RK: all other PRs are ready, just don't have the Oracle migration script. I don't have an Oracle DB to test against, so can someone volunteer?

AK: I can write & test that script, should be very similar to the MySQL version

RK: propose to go ahead with 6.1.1 release, then can attempt a v6.2 snapshot

7 schema changes

RK: unknown timeline, but hopefully still this year

API changes

DLS expressing interest in better ways to find data, so we may start exploring whether there are ways to improve this. Would obviously raise major changes with the collaboration.

RK: sounds like a discussion for the F2F!

python-icat

Will need a corresponding release for the icat.server 6.2 schema changes. Not required just to use it: schema changes are picked up automatically for most features, but import and dump need changes.

ids.server

No activity

AOB

SEPIA

SEPIA is a sample database; it has its own database backend but is tightly integrated with ICAT. Ensures a 1-to-1 mapping between samples in its own DB and the ICAT DB. Could be seen as an extension to the ICAT Sample table.

Searching in SEPIA? icat.lucene? Use it in some way, or at least profit from experience.

PA: there is a sample index, but not an endpoint for samples. Can be used as a subquery, e.g. find investigations with sample name x. But an endpoint could be added. A clustered solution, i.e. Elasticsearch/OpenSearch (or even Solr), is probably better. Hard to turn a relational model into an inverted index. If you only want to search Sample and SampleType, it would be easy & efficient to flatten.
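To make the "flatten" suggestion concrete, here is a minimal sketch of turning a sample and its type into one flat document suitable for a search index. The classes are simplified stand-ins rather than the ICAT schema, and the field names in the output and the `flatten_sample` helper are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampleType:
    # Simplified stand-in for the related type record.
    name: str
    molecular_formula: str


@dataclass
class Sample:
    # Simplified stand-in for a sample with a many-to-one link to its type.
    id: int
    name: str
    sample_type: SampleType


def flatten_sample(sample):
    """Copy the related type's fields onto the sample to get one flat document."""
    return {
        "id": sample.id,
        "sample.name": sample.name,
        "type.name": sample.sample_type.name,
        "type.molecularFormula": sample.sample_type.molecular_formula,
    }
```

Because each sample has exactly one type here, flattening loses nothing; one-to-many relations are where the relational model becomes hard to map onto an inverted index.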

RK: permissions translate across icat.

PA: with an inverted index, you would do the search and then iterate through each result to remove unauthorised items. Considerations change based on the size of the sample table. Obviously with a 5 billion row Datafile table there are different concerns! If you want authz in a single call, you can put it in the query, e.g. put usernames in the index and check against them.

RK: that means duplication of rules in the index.

PA: yes, trade off between performance and redundancy. Authz makes it messy as ICAT rule model doesn't lend itself well to open queries.
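The two authorisation approaches PA describes can be sketched with a toy in-memory inverted index. This is only an illustration under stated assumptions: it is not how icat.lucene works, every function name is hypothetical, and a per-document set of usernames is a stand-in for the far richer ICAT rule model.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased token in a document's name to the ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in doc["name"].lower().split():
            index[token].add(doc_id)
    return index


def build_user_index(docs):
    """Map each username to the ids that user may read (authz copied into the index)."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for user in doc["users"]:
            index[user].add(doc_id)
    return index


def search_then_filter(index, docs, term, user):
    """Approach 1: search first, then check each hit and drop unauthorised ones."""
    hits = index.get(term.lower(), set())
    return sorted(d for d in hits if user in docs[d]["users"])


def search_with_authz(index, user_index, term, user):
    """Approach 2: resolve term and user in the index and intersect in one call."""
    return sorted(index.get(term.lower(), set()) & user_index.get(user, set()))
```

The second approach avoids the per-result check, at the cost of keeping a copy of the permissions in the index that must be updated whenever access changes, which is the redundancy trade-off noted above.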

PA: happy to continue to chat about this later/offline

website

LD: I'll wait to publish these minutes until after official 6.1.1 release as it discusses security issues.

PA: most facilities should be on the patch already but yeah it's fine to be cautious

RK: re: website, before I couldn't see recent minutes?

LD: they are there, I normally publish them shortly after the meetings

RK: they're there now, odd. Also some info is out of date on the website

LD: yes, I welcome PRs for this - most changes would just be editing markdown and submitting a PR - instructions in README. Website deployment is automated, so changes are deployed once merged.

F2F

MA: any news?

PA & LD: it's with Antony Wilson. Haven't heard about it recently, last we heard it was still in the process of planning/getting approval. Will remind him and let everyone know if there's updates.