Meeting 125 – 28th March 2019
Attendees:
Alex de Maria (AM), Brian Ritchie (BR), Stuart Pullinger (SP), Rolf Krahl (RK), Chris Prosser (CP), Silvie Da Graca Ramos (SR), Louise Davies (LD), George Christian (GC), Andy, Maxime
Component Updates
ICAT server (SP):
- Released `icat-server` and `icat-client` 4.10.0
- Schema changes:
- Added pid and enddate to Study
- Rolf’s schema changes
- Need to run the schema upgrade scripts when upgrading
- Only additions to the schema, so only minor changes needed
- Issues:
- Steve used MireDot to generate REST API docs
- Licence has expired – awaiting a new licence
- Proprietary software, but free for open source
- Will generate docs for previous releases once this has happened
- Waffle.io (used for the kanban board) is closing
- Will move issues across to GitHub Projects
IDS (RK):
- Snapshot releases for `ids-server`, `ids-plugin`, `ids-plugin-file`
- Main change: support for file system locking (see the sketch below)
- If the plugin decides to implement it, this allows concurrent access to storage
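For illustration only, a minimal sketch of the kind of file system locking a storage plugin might implement, using `java.nio.channels.FileLock`; the class name and path are hypothetical and this is not the actual `ids-plugin` API.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FileLockSketch {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/tmp/example.dat"); // hypothetical file in main storage
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Block until an exclusive lock on the whole file is obtained, so that
            // other lock-aware processes accessing the same storage wait their turn.
            try (FileLock lock = channel.lock()) {
                // ... read or write the file while holding the lock ...
            } // lock released here
        }
    }
}
```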
Python-ICAT (RK):
- Released `python-icat` 0.15.0
- Supports icat-server schema changes
- Fixes some compatibility issues
- Last version to support Python 2.6
Q: Andy: Python 2.6 support dropped – will 2.7 still be supported?
RK: When python-icat v1 is released, support for Python 2 will likely be dropped then, following discussions on the mailing list
SP: the scripts in icat-server are Python 2 only, so work is needed to convert them to Python 3
Should we convert them to support both, or just support Python 3?
RHEL 6 only has Python 2 by default, so it is sensible to make them support both
RK: shouldn’t be too hard to support both for small scripts
SP: aim to support both
Topcat (BR):
- Released topcat 2.4.3
- contains schema changes from icat-server 4.10.0
- Minor change from last snapshots: remove spinners for cells that are blank
- Component updates in the Vagrant/Travis build
- Travis still fails sporadically; some attempts at a fix have been made, but no definitive fix yet
Site Updates
DLS (CP):
- 2 mechanisms for user recall: standard HTTPS and Globus/PollCAT – lots of user recalls on both
- HTTPS: 2 user recalls had issues to do with tape and the IDS timed out; the script normally restarts the recall, but the scripts didn’t have enough storage to retrieve the files – resolved by restarting the IDS
- Globus: millions of files requested/several TB of data swamped the IDS and made us reconsider the PollCAT design
- Everything else seems fine
SR:
- Changing ingest – putting it into pre-production soon
Q: Andy – how many times is the data copied when it is moved in the Globus system?
SR – Diamond uses datafile-level and not dataset-level storage
SP – the table in TopCAT is scanned directly for downloads marked as Globus, ignoring HTTPS ones. Files are copied from tape to StorageD, then copied to IDS main storage, then copied into a separate Globus download area
SR – the Google Groups mailing list issue about copying data might seem similar, but we believe the Diamond issue is separate
Andy – tape storage managers want to reduce copies to only 1 copy rather than 3
RK – the IDS has an intermediate step for dataset-level storage, where the archive copy is copied to cache and then extracted to main; datafile-level storage does not have this step. This extra step has been an issue since 2016, but it is difficult to remove. Would like to refactor some code in the IDS before tackling this issue, so no promises on when a fix will happen. You will still have 2 copies – 1 in archive and 1 in main – and this is hard to avoid.
Q: AM – can we restore data automatically into main area?
RK – no
Andy – “restore to original place” – aka main storage
RK – the IDS doesn’t know about tape storage; it is built around archive and main, so data must be requested from tape first. A client request comes in, the IDS triggers a restore, reads the ZIP from archive and restores to main.
SP – could the storage system unzip to main behind the IDS’s back? The IDS checks if the data is on disk first, before it goes to archive, and if it’s available it can use it. What if the ids-plugin isn’t responsible for restoring, and a separate system does the unzipping?
RK – that would need to talk directly to the tape storage, and then every backend would need to be coded for. There is no direct communication between ids-server and the storage backend: the IDS talks to the main storage plugin and then the archive storage plugin. There is no direct way to avoid this.
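To make the flow RK describes concrete, a highly simplified sketch with hypothetical interface and method names (not the real `ids.plugin` interfaces): the server only ever talks to a main storage plugin and an archive storage plugin, and has no direct path to the tape system.

```java
import java.io.InputStream;

// Hypothetical, simplified plugin interfaces for illustration only.
interface MainStoragePlugin {
    boolean exists(String location);              // is the file already on disk in main storage?
    void put(String location, InputStream data);  // write restored data into main storage
}

interface ArchiveStoragePlugin {
    // Hands back the archived ZIP for a dataset; the server then unpacks it
    // and calls MainStoragePlugin.put() for each datafile.
    InputStream get(String datasetLocation);
}

class RestoreFlowSketch {
    // On a client request the server checks main storage first, and only then
    // triggers a restore from archive; nothing bypasses the two plugins.
    static void restore(MainStoragePlugin main, ArchiveStoragePlugin archive,
                        String datasetLocation, String datafileLocation) throws Exception {
        if (main.exists(datafileLocation)) {
            return; // already online, nothing to do
        }
        try (InputStream zip = archive.get(datasetLocation)) {
            // ... unpack the ZIP entries and call main.put(...) for each datafile ...
        }
    }
}
```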
HZB (RK):
- Nothing to report
AOB
Diamond recall bug
SP – Diamond’s problem with restarting a recall, where the IDS needs to be restarted, seems similar to a bug (issue #87) where the IDS locked files when recalling, but if the thread crashed the files would remain locked until the IDS was restarted. Check that Diamond has a version of the IDS which includes this bugfix, and upgrade if it doesn’t.
RK – fixed in IDS 1.9.1
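The general pattern behind that kind of fix, shown as an illustrative sketch rather than the actual IDS code: release locks in a finally block so a crashing worker thread cannot leave files locked until a restart.

```java
import java.util.concurrent.locks.ReentrantLock;

class RecallWorkerSketch {
    // Stands in for whatever per-file locking the server uses internally.
    private final ReentrantLock fileLock = new ReentrantLock();

    void recall() {
        fileLock.lock();
        try {
            // ... copy the file back from archive to main storage ...
            // If this throws, the finally block below still runs.
        } finally {
            fileLock.unlock(); // without this, a crashed thread leaves the file locked until restart
        }
    }
}
```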
Plugin Logging
SP – regarding the Diamond StorageD plugin: most ids-plugins don’t have much logging. Does RK’s student have a solution for getting logging into plugins?
RK – You can use logging in the plugins, using log4j, with one caveat: you need to provide a logfile config that shadows the logfile config from ids-server. Inconvenient because the logging for ids-server is then configured in the plugin, but it works.
SP – where is this fix recorded?
RK – issue #76, some documentation in this issue
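As a minimal sketch of what RK describes, assuming the log4j 1.x API; the class name and config path are hypothetical, and issue #76 has the actual details of the required configuration.

```java
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class StoragePluginLoggingSketch {
    // Plugin-local logger.
    private static final Logger logger = Logger.getLogger(StoragePluginLoggingSketch.class);

    static {
        // Caveat from RK: the plugin has to provide its own log4j configuration,
        // which ends up shadowing the ids-server logging config. Path is illustrative.
        PropertyConfigurator.configure("/opt/ids/plugin-log4j.properties");
    }

    public void archive(String location) {
        logger.info("Archiving " + location);
        // ... plugin work ...
    }
}
```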
SP – the next version of the IDS may require recompilation of the plugins, so it might be a good idea to look at logging now to save multiple recompilations
RK – away for the next 2 weeks, but after that may provide information on how to modify plugins: delete should not throw an exception when the file to delete does not exist, and there is an additional lock call. Suggest adding these things to the archive storage plugins, otherwise a dummy method will need to be added
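A sketch of the delete behaviour RK mentions, for illustration only and using `java.nio` rather than the real plugin interface: deleting a file that is already gone is treated as success, not an error.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DeleteSketch {
    // Delete must not throw when the file to delete does not exist.
    static void delete(String location) throws IOException {
        Path path = Paths.get("/archive").resolve(location); // hypothetical archive root
        Files.deleteIfExists(path); // returns false, rather than throwing, if the file is absent
    }
}
```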
Moving instead of copying in Diamond
CP – we have been manually deleting data from the IDS; the response from RK was that this isn’t a good idea. We want to move data out of the IDS cache
SP – for Globus, IDS files are restored from archive to main as in normal IDS operation, and these files are then copied to a separate Globus area. Can we just move them instead of copying? Problem: with 2 requests for the same file, the IDS thinks the file is there for the second request, but it has already been moved by the first
RK – to clarify: you want to move files from main storage to the transfer area? This is a problem, as the IDS doesn’t know the file is not there. The IDS might restore things twice. Can you not make a hard link?
SR – is it possible to change permissions in the main storage area so users can access it?
SP – Globus has its own user, so don’t know how this will work with permissions, but a hard link might work
RK – hard link will only work if both areas are on same file system
SR – the IDS has functionality to remove files when we want; it is a different process to perform
CP – user1/user2 causing a double recall from archive is fine; this is a rare case and is offset by the increase in efficiency from moving
RK – should be fine, deleting files from datafile-level storage is fine
CP – so move works, but is less efficient when multiple users request the same file
SR – a hard link might be useful for the multiple-users case
RK – you can make hard links from the file to the users’ area; when the IDS removes the file from main storage this will also remove the hard links. Still have to check that permissions are separate between hard links and not shared.
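A sketch of the hard-link idea using `java.nio`; the paths are hypothetical, and as RK notes it only works if both areas are on the same file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class HardLinkSketch {
    // Link the restored file into the Globus transfer area instead of copying it.
    static void exportForGlobus(String location) throws IOException {
        Path inMain = Paths.get("/ids/main").resolve(location);             // hypothetical main storage root
        Path inTransfer = Paths.get("/globus/transfer").resolve(location);  // hypothetical transfer area
        Files.createDirectories(inTransfer.getParent());
        Files.createLink(inTransfer, inMain); // new directory entry pointing at the same data blocks
    }
}
```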
Dataset level issue
SP: RK is proposing a change: when a zip is requested from archive and files are missing from the zip or there are unexpected files, exceptions are thrown. Concern that this is not suitable for ESRF, where it may be better to continue working and log errors instead. Is there a compromise – e.g. a config option?
AM: Understand RK’s POV, but this is the best we can do. Sometimes people rename files and it’s impossible to find the files. So we’d like to make a best effort to restore what we can.
RK: you should make it consistent
AM: you are imposing this constraint
RK: the IDS fundamentally relies on the data in ICAT being correct. If the data is not correct you don’t get reliable behaviour. If the files in the ZIP don’t match, then the data is not correct. The IDS ignores files with a null location.
AM: we like that the location has a record of where it was
RK: location is controlled by the IDS; if it’s not, then the IDS is unstable
SP: is it true that to accommodate RK’s change you need to scan all files to check that they match what the IDS expects? This is a big job
AM: this isn’t too much of a worry, the worry is removing location that is useful to the user
RK: location isn’t supposed to be useful for the user; it is meant to be an administrative field for the IDS to use
AM: would prefer not to delete location – our IDS is working well
RK: you will run into strange errors: restores will get stuck with an error. The IDS relies on location; if it’s not correct then the IDS is not correct
SP: could the IDS be written so that if a file is missing the error is thrown, and in the catch block a config option is checked to skip this file, otherwise the exception is rethrown?
RK: you could do this, but it doesn’t remove the problem that the IDS assumes the location is correct and the file exists
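A sketch of SP’s suggestion, with a hypothetical config flag and method names rather than actual IDS code.

```java
import java.io.FileNotFoundException;
import java.util.List;

class RestoreWithSkipSketch {
    // Hypothetical flag that would come from the IDS or plugin configuration.
    private final boolean skipMissingFiles;

    RestoreWithSkipSketch(boolean skipMissingFiles) {
        this.skipMissingFiles = skipMissingFiles;
    }

    void restore(List<String> locations) throws FileNotFoundException {
        for (String location : locations) {
            try {
                restoreOne(location); // assumed to throw if the file is missing from the ZIP
            } catch (FileNotFoundException e) {
                if (skipMissingFiles) {
                    // Log and carry on with the remaining files (the ESRF preference).
                    continue;
                }
                throw e; // strict behaviour: fail the whole restore
            }
        }
    }

    private void restoreOne(String location) throws FileNotFoundException {
        // ... read the entry from the archived ZIP and write it to main storage ...
    }
}
```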
SP: ICAT/Diamond doesn’t have this problem because of datafile-level storage
RK: the IDS will still fail if the file doesn’t exist
SP: But if it fails for 1 file it will continue to the next file, unlike dataset-level storage.
AM: it might be possible to use our own field/parameter to record the previous location; then location can be nulled
SP: look at how ingest works and look for inconsistencies; look for skips