ICAT Collaboration Meeting - 22nd May 2025
Attendance
Attendees:
- Rolf Krahl
- Louise Davies
- Kirsty Syder
- Marjolaine Bodin
- Kevin Phipps
- Malik Almohammad
- Patrick Austin
- Alex de Maria
- Santhosh Anandarama
- Viktor Bozhinov
Agenda
Site Updates
SESAME
ICAT portal running without issue.
Cybersecurity concerns, facing new attacks. Everything safe at the moment.
Published new service (DOI minting).
Updated the DB schema re: beamlines, to track whether data was generated by SESAME machines.
KP: We're using DatasetTypes to identify what comes directly from the beamlines, e.g. a type like "raw". Data from users, such as processed data, gets labelled with a different type.
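As an illustration of the convention Kevin describes, a minimal python-icat sketch for pulling back only the beamline-written datasets of an investigation; the type name "raw" and the function name are illustrative, not confirmed facility values:

```python
# Minimal sketch, assuming a logged-in python-icat Client and a DatasetType
# naming convention like the one described above ("raw" for data written
# directly by the beamline); names used here are illustrative only.
def beamline_datasets(client, investigation_name):
    """Return only the datasets that came directly from the beamline."""
    return client.search(
        "SELECT ds FROM Dataset ds WHERE ds.investigation.name = '%s' "
        "AND ds.type.name = 'raw'" % investigation_name)
```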
MA: One of our colleagues wants us to make a DOI for his data. After talking with Salman(?), we created a "beamline" in the datahub/ICAT, set up like a usual proposal.
AdM: What do you mean when you say data coming from the machine?
MA: Some comes from the Archive. Some comes from the machine control group (and other control groups). Everything regarding the DOI must come from the data portal.
RK: So you want to make a publication of all the data that comes from one particular beamline?
MA: Yes, but not from a beamline - from the entire machine, e.g. collecting data from controller teams.
RK: Not sure I understand. Is it for a given snapshot in time, or do you want a covering publication for now and the future?
MA: Just a snapshot, upon request.
RK: So the user makes a selection of one or more datasets and has them included in a DataPublication?
MA: Yes, but the data is not coming from the beamline but from the SESAME machines: diagnostics, controls etc. Other, non-experimental data. It's not categorised - we're not saying it's HDF5, just that it's the data.
RK: I don't see a major problem with that. You need to have the data in ICAT - I don't see a reason to treat it any differently; it can be ingested in principle. You need a workflow for the user to select the data to publish, which I presume the DataPortal has.
MA: Not tried yet, but no concerns.
RK: But then you need to put it all in a DataCollection, plus a DataPublication object to collect the bibliographic data.
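A minimal sketch of that workflow in python-icat, assuming a logged-in client, an existing Facility object and an already-selected list of Dataset objects; the pid and title values are placeholders, and any further bibliographic fields are omitted:

```python
# Minimal sketch, assuming a logged-in python-icat Client, an existing
# Facility object and a list of already-selected Dataset objects.  The
# pid and title values are placeholders, not real identifiers.
def publish_datasets(client, facility, datasets, pid, title):
    """Wrap selected datasets in a DataCollection and attach a DataPublication."""
    collection = client.new("DataCollection")
    collection.create()
    for ds in datasets:
        client.new("DataCollectionDataset",
                   dataCollection=collection, dataset=ds).create()
    publication = client.new("DataPublication", facility=facility,
                             content=collection, pid=pid, title=title)
    publication.create()
    return publication
```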
AdM: Years ago we had a similar use case: we wanted to make data public that isn't really from a beamline, so we called the beamline "PUBLISHER". The dataset is, for example, a PDF of the data policy. It's a use case for non-beamline, non-experimental data.
MA: We need to upgrade our data portal and use the new frontend that you are using at ESRF.
AdM: You should have this in your version.
HZB
Busy implementing our sample database, expect that will keep us busy for the time being.
Preparing a proposal for creating a controlled vocabulary for sample types. A huge variety of samples come into facilities and there is currently no good way of describing them. We can create sample types, but it would be better to standardise them via a controlled vocabulary across facilities. We have started an initiative to form a working group in the Research Data Alliance and are currently preparing a proposal for funding. The deadline is next week - Rolf is currently fully focused on this.
PA: Would the name of the sample type be a PID? Or would there be a schema change for SampleType?
RK: Already proposed! We'll come round to schema changes in AOB, but we have a proposal that covers a few changes to SampleType, including adding a PID.
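For context, a rough sketch of what the proposed change might look like from python-icat once a pid field lands on SampleType; the field name "pid" and the vocabulary URL are assumptions based on this discussion, not a released schema:

```python
# Rough sketch only: "pid" on SampleType is a *proposed* schema change and
# does not exist in current icat.server releases; the vocabulary URL in the
# usage example is a made-up placeholder.
def create_sample_type(client, facility, name, pid):
    st = client.new("SampleType", facility=facility, name=name,
                    molecularFormula="unknown",
                    pid=pid)    # pid is the proposed attribute; current
                                # python-icat / icat.server would reject it
    st.create()
    return st

# e.g. create_sample_type(client, fac, "single crystal",
#                         "https://vocab.example.org/sampletype/single-crystal")
```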
ESRF
Main milestone soon - migrating iSpyB to ICAT by Sept.
Working on reporting - we have been asked to provide statistics, e.g. given a date range, give the number of samples, or the cost of experiments to clients. Has anyone done anything similar?
RK: Within the Helmholtz association, they are counting data publications and making those counts relevant for the evaluation of the centres. There is discussion over what can be counted, the level of granularity, etc. It's a difficult discussion; they are starting with a testing phase. They are already doing the counting, but it is not being evaluated yet. The testing phase is to last a few years. After that, they want to turn it on and it will be a figure in your evaluation. Obviously it is different for software and datasets. There is the idea that they will have a list of data repositories to be considered; every entry in such a repository that can be linked to a certain facility and meets some threshold criteria will be counted. But it's a big discussion, in particular since we are a bit different from the other Helmholtz centres: we are a user facility, but most of the others just do their own research. That makes a big difference. Fairness of counting is difficult.
AdM: Ours is just about the cost: measure how much beamtime has been spent (take start to end date and sum it), or count by sample. Sample is tricky - they don't want all the Samples in ICAT, only the ones that have been processed. For a given proposal, with several Investigations, give me the statistics. It's a heavy database query.
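A minimal sketch of the beamtime part of such a report with python-icat, assuming beamtime can be approximated as Investigation endDate minus startDate (an assumption, not a confirmed ESRF recipe):

```python
# Minimal sketch, assuming a logged-in python-icat Client and that beamtime
# can be approximated as Investigation.endDate - Investigation.startDate.
def beamtime_hours(client, start, end):
    """Sum beamtime (in hours) of investigations starting in [start, end).

    start and end should be timezone-aware datetimes; the date filtering is
    done on the client side here just to keep the sketch simple."""
    seconds = 0.0
    for inv in client.searchChunked("SELECT inv FROM Investigation inv"):
        if inv.startDate and inv.endDate and start <= inv.startDate < end:
            seconds += (inv.endDate - inv.startDate).total_seconds()
    return seconds / 3600.0
```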
LD: Is this relevant to our dashboard work?
KP: I don't think it meets this use case. Our stats come from our facility-specific DownloadApi component. We want to see what has been downloaded over a period of time, by whom, and how long it took. We're working on that. The Download API doesn't know anything other than the entity id, so we cross-reference with ICAT (sketched below).
LD: DownloadAPI is the old Topcat backend (renamed internally).
KP: It manages carts, plus the times that a download was submitted and the recall completed.
LD: It's a separate database as it's not experimental metadata. It's just "working" metadata.
AdM: DG uses its own backend, but also the old Topcat backend for the shopping cart?
LD: Yes - ICAT doesn't/shouldn't do the cart.
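The cross-referencing Kevin mentions might look roughly like the following; the shape of the download records (dicts carrying an investigation id) is invented purely for illustration and is not the real DownloadApi data model:

```python
# Hypothetical sketch only: the Download API side is represented here as a
# list of dicts carrying an ICAT Investigation id, which is an assumption
# for illustration, not the real DownloadApi schema.
def annotate_downloads(client, downloads):
    """Attach human-readable investigation names to download records."""
    ids = {d["investigation_id"] for d in downloads}
    if not ids:
        return downloads
    invs = client.search(
        "SELECT inv FROM Investigation inv WHERE inv.id IN (%s)"
        % ", ".join(str(i) for i in sorted(ids)))
    names = {inv.id: inv.name for inv in invs}
    for d in downloads:
        d["investigation_name"] = names.get(d["investigation_id"])
    return downloads
```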
Helping EMBL Hamburg install ICAT (MX beamlines/crystallography). Setting up a test system; if they like it they might adopt the ICAT ecosystem.
RK: EMBL? Part of DESY?
AdM: Currently separate organisations.
ISIS
Nothing major; the staging database has been migrated. Everything has now been migrated to Payara 6. Investigating multi-threading.
LD: The data policy team asked for the DOI landing pages to have the licence displayed.
DLS
Big update: this week we released, to a limited set of users, the ability to restore a whole visit at once or to provide a list of files to be restored. Going to see how the load is before rolling out further.
Component Updates
icat.server
Following 6.1.0 & icat.lucene 3, there was an issue with load on DLS prod: lucene updates were getting backed up. So there have been a few changes to both icat.server & icat.lucene to help with this. These are still snapshots, so we need to release some patches soon. Some non-functional changes to improve performance.
RK: On schema changes, we have a student at HZB working on those. One that hasn't been discussed in a meeting yet: adding an internalId field to DataPublication.
LD: I don't think I object in principle, because it's an optional field.
PA: Maybe it could be done via a RelatedItem of type Identical, but that's perhaps overly complex.
RK: The internal id would be strictly in ICAT, not in DataCite. We need to be able to provide a preview landing page, but the normal URL would be obvious to guess. So we need a random UUID, and then need to match an incoming request to the DataPublication it refers to (see the sketch after this discussion).
PA: We had some discussions internally.
LD: Can just use ICAT permissions.
PA: But what about reviewers?
LD: Should be careful about web crawlers if it's fully public.
RK: So no one objects?
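A minimal sketch of the idea, assuming the proposed internalId field (not in any current icat.server release) and a python-icat client; the preview landing page would only need to resolve the token back to its DataPublication:

```python
# Minimal sketch only: "internalId" on DataPublication is the *proposed*
# field discussed above and is not in any current icat.server release.
import uuid

def assign_internal_id(publication):
    """Give an unpublished DataPublication an unguessable preview token."""
    publication.internalId = str(uuid.uuid4())   # proposed attribute
    publication.update()
    return publication.internalId

def find_by_internal_id(client, token):
    """Resolve an incoming preview-URL token to its DataPublication, if any."""
    results = client.search(
        "SELECT dp FROM DataPublication dp WHERE dp.internalId = '%s'" % token)
    return results[0] if results else None
```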
RK: Other schema changes are going into ICAT v7, but there are a few schema changes that are fully backwards compatible, so maybe we should make an ICAT 6.2.
LD: In favour of that. ISIS is interested in instrument PIDs, so we would support this - we'd get the compatible changes quicker.
RK: The test suite is a chicken-and-egg problem. I suggest collecting the changes for 6.2 and doing a snapshot; then we can do a snapshot of the client usable for the tests.
LD & PA: Yeah, it's pretty standard, using snapshots to do half of one to resolve the other half (of client/server).
RK: Philip is on leave at the moment; this will be progressed shortly once he's back.
PA: Let us know if you need reviewers.
ids.server & related
Nothing to report
python-icat
Nothing to report
AOB
ICAT F2F
Any thoughts for this year?
No plans for anything yet. Homework for next meeting - investigate options for F2F
Next meeting
Rolf can't attend next meeting on 26th June