Commit Graph

23 Commits

Author SHA1 Message Date
Winfried Plappert a2a1309fd9 prune: make small pack size configureable for `prune` all changes together
cmd_prune.go: added option `--repack-smaller-than`
prune.go: added field `SmallPackBytes` to `PruneOptions`, including checking and processing
prune_test.go: added test `TestPruneSmall`
doc/060_forget.rst: added description of enhancement
changelog/unreleased/issue-5109: description of enhancement
2025-02-18 16:54:44 +00:00
Gilbert Gilb's 536ebefff4
feat(backends/s3): add warmup support before repacks and restores (#5173)
* feat(backends/s3): add warmup support before repacks and restores

This commit introduces basic support for transitioning pack files stored
in cold storage to hot storage on S3 and S3-compatible providers.

To prevent unexpected behavior for existing users, the feature is gated
behind new flags:

- `s3.enable-restore`: opt-in flag (defaults to false)
- `s3.restore-days`: number of days for the restored objects to remain
  in hot storage (defaults to `7`)
- `s3.restore-timeout`: maximum time to wait for a single restoration
  (default to `1 day`)
- `s3.restore-tier`: retrieval tier at which the restore will be
  processed. (default to `Standard`)

As restoration times can be lengthy, this implementation preemptively
restores selected packs to prevent incessant restore-delays during
downloads. This is slightly sub-optimal as we could process packs
out-of-order (as soon as they're transitioned), but this would really
add too much complexity for a marginal gain in speed.

To maintain simplicity and prevent resources exhautions with lots of
packs, no new concurrency mechanisms or goroutines were added. This just
hooks gracefully into the existing routines.

**Limitations:**

- Tests against the backend were not written due to the lack of cold
  storage class support in MinIO. Testing was done manually on
  Scaleway's S3-compatible object storage. If necessary, we could
  explore testing with LocalStack or mocks, though this requires further
  discussion.
- Currently, this feature only warms up before restores and repacks
  (prune/copy), as those are the two main use-cases I came across.
  Support for other commands may be added in future iterations, as long
  as affected packs can be calculated in advance.
- The feature is gated behind a new alpha `s3-restore` feature flag to
  make it explicit that the feature is still wet behind the ears.
- There is no explicit user notification for ongoing pack restorations.
  While I think it is not necessary because of the opt-in flag, showing
  some notice may improve usability (but would probably require major
  refactoring in the progress bar which I didn't want to start). Another
  possibility would be to add a flag to send restores requests and fail
  early.

See https://github.com/restic/restic/issues/3202

* ui: warn user when files are warming up from cold storage

* refactor: remove the PacksWarmer struct

It's easier to handle multiple handles in the backend directly, and it
may open the door to reducing the number of requests made to the backend
in the future.
2025-02-01 18:26:27 +00:00
Michael Eischer 9331461a13 prune: correctly account for duplicates in max-unused check
The size comparison for `--max-unused` only accounted for unused but not
for duplicate data. For repositories with a large amount of duplicates
this can result in a situation where no data gets pruned even though
the amount of unused data is much higher than specified.
2025-01-19 17:47:49 +01:00
Michael Eischer 99e105eeb6 repository: restrict SaveUnpacked and RemoveUnpacked
Those methods now only allow modifying snapshots. Internal data types
used by the repository are now read-only. The repository-internal code
can bypass the restrictions by wrapping the repository in an
`internalRepository` type.

The restriction itself is implemented by using a new datatype
WriteableFileType in the SaveUnpacked and RemoveUnpacked methods. This
statically ensures that code cannot bypass the access restrictions.

The test changes are somewhat noisy as some of them modify repository
internals and therefore require some way to bypass the access
restrictions. This works by capturing an `internalRepository` or
`Backend` when creating the Repository using a test helper function.
2025-01-13 22:39:57 +01:00
Viktor Szépe ac00229386 Fix typos 2024-07-03 20:02:06 +02:00
Michael Eischer 5e0ea8fcfa pack: move to repository package 2024-05-25 13:13:03 +02:00
Michael Eischer 50ec408302 index: move to repository package 2024-05-25 13:13:03 +02:00
Michael Eischer 3c7b7efdc9 repository: remove prune plan parts once they are no longer necessary 2024-05-24 22:18:14 +02:00
Michael Eischer 77873f5a9d repository: let prune control data structure of usedBlobs set 2024-05-24 22:18:14 +02:00
Michael Eischer 2033c02b09 index: replace CountedBlobSet with AssociatedSet 2024-05-24 22:18:14 +02:00
Michael Eischer 93098e9265 prune: hide implementation details of counted blob set 2024-05-24 21:42:56 +02:00
Michael Eischer 027cc64737 repository: fix prune heuristic to allow resuming interrupted runs
Pack files created by interrupted prune runs, appear to consist only of
duplicate blobs on the next run. This caused the previous heuristic to
ignore those pack files. Now, a duplicate blob in a specific pack file
is also selected if that pack file only contains duplicate blobs. This
allows prune to select the already rewritten pack files.
2024-05-24 21:33:17 +02:00
Michael Eischer 5f7b48e65f index: replace Save() method with Rewrite and SaveFallback
Rewrite implements a streaming rewrite of the index that excludes the
given packs. For this it loads all index files from the repository and
only modifies those that require changes. This will reduce the index
churn when running prune. Rewrite does not require the in-memory index
and thus can drop it to significantly reduce the memory usage.

However, `prune --unsafe-recovery` cannot use this strategy and requires
a separate method to save the whole in-memory index. This is now handled
using SaveFallback.
2024-05-24 21:33:17 +02:00
Michael Eischer 8f1e70cd9b repository: remove clearIndex and packSize from public interface 2024-05-24 21:33:17 +02:00
Michael Eischer 4df887406f repository: inline MasterIndex interface into Repository interface 2024-05-24 21:33:17 +02:00
Michael Eischer 223aa22cb0 replace some uses of restic.Repository with finegrained interfaces 2024-05-18 21:42:51 +02:00
Michael Eischer 6328b7e1f5 replace "too small" with "too short" in error messages 2024-05-18 19:59:26 +02:00
Michael Eischer 940a3159b5 let index.Each() and pack.Size() return error on canceled context
This forces a caller to actually check that the function did complete.
2024-04-22 22:39:32 +02:00
Michael Eischer 31624aeffd Improve command shutdown on context cancellation 2024-04-22 22:31:38 +02:00
Michael Eischer defd7ae729 prune/repair index: reset in-memory index after command
The current in-memory index becomes stale after prune or repair index
have run. Thus, just drop the in-memory index altogether once these
commands have finished.
2024-04-14 13:46:24 +02:00
Michael Eischer d8622c86eb prune: clean up internal interface 2024-04-14 13:45:15 +02:00
Michael Eischer 85e4021619 prune: move additional option checks to repository 2024-04-14 13:44:58 +02:00
Michael Eischer fc3b548625 prune: move logic into repository package 2024-04-10 21:30:52 +02:00