ICAT Collaboration Meeting - 27th February 2025
Attendance
Attendees:
- Rolf Krahl
- Louise Davies
- Kirsty Syder
- Marjolaine Bodin
- Malik Almohammad
- Patrick Austin
- Alex de Maria
- Allan Pinto
- Kevin Phipps
- Santosh Anandarama
Agenda
Site Updates
SESAME
Signed agreement with DataCite to start minting DOIs. Development with help from Alex - DOIs minted by users. Issues with missing parameters etc. in the database, working through with Alex's help. Otherwise, everything is working. So far, so good.
HZB
Not much to report.
Need to replace the hardware ICAT is running on. Some changes under the hood, but should have no effect on ICAT as ICAT is dockerised.
ESRF
Not many things.
Working on different techniques.
Web security, security training for React + NodeJS. Using software to analyse security of DataPortal & ICAT+ - only low priority issues so good. Running the scan on live code. Some dependencies to update to address vulnerabilities.
Helping EMBL Hamburg to set up ICAT. Some beamlines, e.g. MX. Might want to move from ISPyB to ICAT in the future.
Rolf: ISPyB is also on our schedule, will get back to you in the future. Have to talk to MX people. Alex: someone from HZB (James Stewart?) came to ESRF to install ISPyB some months/years(?) ago
ISIS
Backup server migration from CentOS 7 to Rocky 9 complete.
Xpress proposals migrated from the old approach (direct read from the DB) to the new proposals system.
Payara migration - some items still running on Payara 4 - need to move to Payara 6.
DLS
Signed DataCite contract - been in the works for a decade!
Updates for DataGateway - search improvements. First update in a while! In future (maybe April) - restricted downloads of whole visits.
LNLS/Sirius
March, visit from data management about stack of software, show them how to use ICAT & show benefits.
Report/presentation coming soon. Present ICAT, DataPortal & data policy. Can't discuss open data here as there are more people involved. Removed from data policy for now, as more discussion is needed on data policy (e.g. data storage). So data policy split into steps, with open data as second step.
Also talking to other facilities about data management. Showing them the ICAT and stack that we have in prod now. See if the project fits their needs. Sirius is the main facility, but others have basically the same workflow and are smaller. See if ICAT meets their needs and increase the number of active facilities using that model for data management infrastructure.
Implemented a new form in DataPortal when people use (?): had a specification from the beamline that the user needs to put in the portal. Experimental parameters for high throughput. New way of getting parameters from the users so the beamline can use this to run the experiment. Would like to know if ESRF teams have time to meet and see if the implementation is OK. Would like feedback. Tried to implement in a way that optimises code re-use and interoperability. Don't want a new fork which is not compatible with best practice of the upstream. Have other beamlines with other requirements for custom forms.
Alex: Sure, every beamline has an experimental and processing plan. These set parameters such as energy or resolution. These are downloaded at the beamline and can be used to perform the experiment. If you want, link the repo where the changes are and we can have a look. Happy to meet.
Allan: Code is in our GitLab, only internal but IT team is working on that. Can meet and we just show you the code.
Rolf: using current version of DataPortal?
Allan: previous one (?)
Rolf: been talks about collaborating on components like e.g. logbooks
Allan: py-ispyb - crystallography groups are asking about this project/ICAT
Alex: py-ispyb - developed by ESRF, but now abandoned as we've moved to ICAT
Allan: Also interest in the Logbook functionality as well, mostly from engineering groups.
Component Updates
icat.server & icat.lucene
New versions next week - icat.server 6.1.0, icat.lucene 3.0.0, icat.utils 4.17?
Rolf: icat.server 6.1.0 and icat.lucene 3 required?
Patrick: icat.lucene 3 needs icat.server 6.1.0. icat.server 6.1 that is using icat.lucene requires icat.lucene 3 - no changes required if not using icat.lucene. Will also require re-indexing.
Kevin: pre-creating new Diamond Lucene index. Has taken over 2 weeks to re-populate the Lucene index (mostly 5 billion datafiles).
Alex: Want to test. We're using ICAT 5 with old Payara. Will it require Payara 6?
STFC: yes
Alex: Changes in DB between 5 and 6?
STFC: no
Alex: can we ignore datafiles index?
Patrick: yes, you can specify the individual entities you want to index
Alex: oh yes will also have to migrate other components e.g. authenticators
Louise: Alan created some help: https://github.com/ral-facilities/icat-payara6
ids
No update
Python ICAT
No update
datagateway-download-api (what was the topcat Java api)
Will be changed as part of the download visit changes, but nothing deeper in the stack will be affected under the current plan.
AOB
Users' large downloads
Alex: large downloads, Globus is complicated and HTTP(S) breaks when over 5(?) GB
Kevin: Users usually don't complain to us. Hard to know if they have succeeded when you see them in the logs. Have heard that ~TB downloads have completed. These are possible on site for sure. Wouldn't be surprised if many fail. Don't have confirmation. Only see them being requested, but these continue.
Louise: In parallel, many DLS users restore to the DLS disk cache. Alternative to HTTP or Globus. Possible that people move it to their personal hardware once it's on DLS disk?
Kevin: Each of these methods runs on a separate IDS, with the non-HTTP methods using pollcat to link and permission files once they've been restored to the IDS. Hoping to extend this to support object storage as a destination. Pollcat is a small script: it queries the IDS for completion of the restore and then does permissions, moves, copies etc. ~100 lines of Python per plugin. We do (can?) not tell if people actually get to the last byte of their TB downloads.
Louise: Sometimes can observe people retrying a different method for the same data. Or they don't retry.
Rolf: Setting up centralised online storage. Experiments drop their data for ingest, but could also restore to online storage on user request. Plan to have this soon, but it's a work in progress. In the long term we will definitely have some file transfer service in order to do this.
Patrick: we're developing a Python API that will mediate FTS transfers with ICAT. Moves data e.g. from tape to disk or other locations.
Rolf: have HZB cloud services - FTS is the preferred means of transfer between HZB cloud services. Not utmost priority but on the TODO list.
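For reference, the polling pattern Kevin describes for pollcat (check whether a restore has completed, then run a per-destination action) could look roughly like the sketch below. This is not pollcat's actual code: the function name, the `is_restored` check and the `on_restored` action are all hypothetical placeholders, and the real IDS calls are deliberately left out.

```python
import time
from typing import Callable, Iterable


def poll_until_restored(
    request_ids: Iterable[str],
    is_restored: Callable[[str], bool],   # hypothetical: asks the IDS if the restore finished
    on_restored: Callable[[str], None],   # hypothetical plugin action: link, chmod, move, copy
    interval_seconds: float = 60.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll pending requests and run the plugin action once each is restored.

    Returns the request IDs that are still pending after max_polls rounds.
    """
    pending = list(request_ids)
    for _ in range(max_polls):
        if not pending:
            break
        still_pending = []
        for request_id in pending:
            if is_restored(request_id):
                on_restored(request_id)
            else:
                still_pending.append(request_id)
        pending = still_pending
        if pending:
            sleep(interval_seconds)  # wait before asking again
    return pending
```

Note that a loop like this only knows the restore finished and the follow-up action ran. It cannot tell whether the user later read the data to the last byte, which matches the limitation raised above.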
Instrument PIDs
Louise: ICAT metadata changes - off the back of Rolf's Open Science Cafe, ISIS mainly but also DLS PIDs for instruments. Talking about the practicalities. Some data will be in ICAT, some will be hardcoded. Will need to ask ISIS for dates. Need to store start date and end date. We might suggest storing them on the instrument: when the instrument was created and deprecated. Might be interest in this.
Rolf: If you want to add these, that would be rather lightweight.
Louise: Yes, we thought two optional fields would be minor and has precedent. But the facilities might want to store the dates elsewhere, so that we don't duplicate the information. Shouldn't be stored in documents.
Rolf: For Instrument DOI you would need a landing page, you would want more information on that page than you can store in ICAT?
Louise: Idea is to re-use existing instrument pages on the DLS/ISIS own websites. But they don't have start/end dates on display. One idea was for tombstone pages. If deprecated, would the website team want to maintain the page? If we don't create current pages, maybe ICAT creates the tombstone page. But e.g. start/end date not in ICAT so would not be able to generate. Still pending meeting(s) with ISIS.
Rolf: Can't see any reason to object. Would be compatible with the schema. We also have foreseen changes for ICAT 7. Need to find a way to implement those.
Louise: We could do start/end date as a minor version? Since they are just nullable fields, they are backwards compatible.
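To illustrate why two nullable fields are backwards compatible, here is a minimal sketch. It is not the ICAT schema: the class `InstrumentRecord`, the helper `instrument_from_dict` and the key names `startDate` / `endDate` are assumptions made up for this example. Records written before the change simply leave the new fields empty.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class InstrumentRecord:
    """Illustrative stand-in for an instrument entry (not the real ICAT entity)."""
    name: str
    pid: Optional[str] = None
    start_date: Optional[date] = None  # hypothetical new optional field
    end_date: Optional[date] = None    # hypothetical new optional field

    def is_deprecated(self, today: date) -> bool:
        # Without an end date the instrument is treated as still current.
        return self.end_date is not None and self.end_date <= today


def _parse_date(value: Optional[str]) -> Optional[date]:
    return date.fromisoformat(value) if value else None


def instrument_from_dict(data: dict) -> InstrumentRecord:
    """Build a record; older payloads without the date keys still load."""
    return InstrumentRecord(
        name=data["name"],
        pid=data.get("pid"),
        start_date=_parse_date(data.get("startDate")),
        end_date=_parse_date(data.get("endDate")),
    )
```

A payload with only a name loads with both dates unset, while a payload that carries an end date in the past is what a tombstone page generator would pick up.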
Next meeting
27th March. Day after a DAPHNE meeting?