ICAT Collaboration Meeting
Attendance
Attendees:
- Andy Gotz
- Patrick Austin
- Kevin Phipps
- Alan Kyffin
- Viktor Bozhinov
- Rolf Krahl
- Chris Prosser
- Louise Davies
- Malik
- Santhosh
Apologies:
Tabled/General topics
Sample tracking
Some discussion before the start of the meeting on sample metadata MX at ESRF are motivated, they capture space group etc. RK: Heterogeneous communities, some have some systemic work in place already, some communities have nothing. AG: Human organ atlas want a lot of metadata. Have to provide services afterwards to motivate.
Schema changes
RK: need another round of discussion on this topic. Want a proposal on how to proceed. https://github.com/icatproject/icat.server/issues?q=is%3Aissue+is%3Aopen+label%3Aschema
- ICAT5 introduced changes but these were the feasible ones. There is one more which is important: relation between Sample and Investigation. Should be many to many rather than many to 1. Same sample could be handled more than once.
- Propose a dedicated meeting for this. Already quite some meetings soonish. Maybe in April. Around Easter.
- If there are too many proposals at the meeting, could group them semantically. Plan at least one follow up meeting to discuss changes in detail.
- AG: idea is to group as many changes as possible?
- RK: we have many proposals in the issues list, don't know if they are all still "of interest". Need to shortlist the ones that are still relevant. Anyone who wants to support one of them should be presenting it at the first meeting:
- Challenges of it (e.g. incompatibility with other changes)
- Decide yes/no/discuss in more detail
- RK: we have many proposals in the issues list, don't know if they are all still "of interest". Need to shortlist the ones that are still relevant. Anyone who wants to support one of them should be presenting it at the first meeting:
- KP: there is one in particular that AdM raised that would be breaking (Investigators being associated to multiple things)
- RK: not listed, but have in mind another smallish isue.
- KP: last time we had dedicated meetings to discuss a range of schema changes
OSCARS
AG: Calls open for this soon. Distributing euros, open call. Also looking for reviewers. Organisers realise that you may want to do both, they will not restrict reviewers from facilities. They will just handle this.
- RK: have at least 2 proposals
- DDI-CDI for PaN data
- Classification of sample types
NOBUGS
https://indico.esrf.fr/event/114/
- 23rd - 27th of September
ESRF running it this year. AG has spoken with SciCat, thinking of a dedicated day for discussion on how they are solving differently.
- Interest from HZB, STFC (pending approval/funding)
Registration deadline?
- Abstracts close end of April
- Would this be for... satelite programme is more free. Main one is accepting now.
- Which should we direct to?
- AG: Hard to tell until we now attendence. Safer to submit to plenary, then some can be moved to satelite meeting. Better to know how many there are rather than to guess.
- RK: presentation (1hour) on ingest workflow at HZB.
- Registration opens in May and runs until August
Site Updates
ESRF: Andy
ICAT is working well everyday. New portal mainly for MX but also all of the UI. Tricky to get anyone who wants to or has time to test it. Probably not until after SUmmer. More requests for processed data. AdM suggests having to icat.servers, one for write and one for read. Having a lot of read doesn't slow down the write and vice versa.
- KP: yes the DLS ICAT runs like this.
- AG: Can see the impact of one on the other and want to avoid it.
Going through a lot of iterations for the new UI, trying to get it right.
New data policy: https://www.esrf.fr/datapolicy
HZB: Rolf
Not too much to report, plan to setup a SampleDB within the coming year. No fixed date.
ISIS: Louise
Asked ISIS to attend this, will raise again for next time. They're doing some stuff, mainly with ingest. Not totally sure the details. KP: Will Taylor forwarded meeting invite to someone else in ISIS. Santosh is on the call, joined in December - not fully up to speed yet but will try to follow along.
DLS: Chris
Recently working toward code changes for the data aggregation software to allow ingestion from multiple file systems quickly. Previously had to NFS mount to a different daya centre which slowed the ingest. Now allow multiple clients to ingest at speed. Finished that and are in process of deployment planning. Trying to plan way into releasing new version of DG. Constraints on resources making take a while. Run of the mill stuff, migrationg hosts away from SL7 to Rocky 8/9. Busy for next few months with this. AG: ESRF also have problems deploying new versions of DataPortal, but it's difficult when it's not highest priority/hard to get people ot throroughly test new version KP: Data size/count script is being run for DLS, but needed to split by year due to data volume. But it is working.
- AG: but will be automatic on new ingest? Having these is useful.
- KP: yes. Will not need to calculate it on request, numbers will be populated already. Potentially could limit visit downloads based on this.
- Stopping ingest while doing this for fear of ingestion getting in the way of the backpopulation and getting the wrong value.
- Rolf: only risk is incorrect counts, you could review datasets that got ingests later on
SESAME - Malik
Not big changes. Trying to find a compoany to do security checks for the network. ALready have data in their ICAT. That's all for now.
- AG: Recovery is also a concern for them.
- Have a daily backup snapshot (incremental). Have different weekly snapshots in a different place.
Component Updates
ids.server
2.1.0 release - didn't send a release email, might do that soon. Paul Miller complained that IDS provided single files - FTS was choking because IDS was not providing content length Rolf: I'm aware peoplehave their own IDS servers, but IDS server change can be model for other IDSs
AdM/ESRF found a problem with reverse proxy which was limiting transfer size. Rolf also remembers an unrelated issue but that turned out to be clientside. Louise confirmed that ESRF issue was in their setup, since ISIS IDS could transfer larger files
Python ICAT
Shortly before 1.3 release, just need to polish documentation. Finalises ingest workflow.
AOB
Checksums
IDS upload would write a ??32 checksum. AG thought RK might have done something with this recently?
- RK: In every Investigation have a Dataset which contains checksums (sha256) for every file, against their file names, as seen in the ingest directory. User could download this and use it to check their files are right.
- AG: a company based in Cambridge were not happy with zip, wanted a tar.gz as it is still usable if transfer is incomplete (you get some of the files)
- RK: Would like a more sophisticated container for downloads. But it's tricky. Want to include metadata as well alongside the data itself. For that IDS would need to look up the metadata.
Other meetings
logbook meeting
Patrick asked about this Andy - they're going to have an internal meeting. Rolf waiting for this before continuing?
ESRF new data policy
https://www.esrf.fr/datapolicy v2, in force since 1st January 2024
Next Collaboration Meeting
28th of March is scheduled but this is the Thursday before Easter - some people may be away.