* feat(backends/s3): add warmup support before repacks and restores
This commit introduces basic support for transitioning pack files stored
in cold storage to hot storage on S3 and S3-compatible providers.
To prevent unexpected behavior for existing users, the feature is gated
behind new flags:
- `s3.enable-restore`: opt-in flag (defaults to false)
- `s3.restore-days`: number of days for the restored objects to remain
in hot storage (defaults to `7`)
- `s3.restore-timeout`: maximum time to wait for a single restoration
(default to `1 day`)
- `s3.restore-tier`: retrieval tier at which the restore will be
processed. (default to `Standard`)
As restoration times can be lengthy, this implementation preemptively
restores selected packs to prevent incessant restore-delays during
downloads. This is slightly sub-optimal as we could process packs
out-of-order (as soon as they're transitioned), but this would really
add too much complexity for a marginal gain in speed.
To maintain simplicity and prevent resources exhautions with lots of
packs, no new concurrency mechanisms or goroutines were added. This just
hooks gracefully into the existing routines.
**Limitations:**
- Tests against the backend were not written due to the lack of cold
storage class support in MinIO. Testing was done manually on
Scaleway's S3-compatible object storage. If necessary, we could
explore testing with LocalStack or mocks, though this requires further
discussion.
- Currently, this feature only warms up before restores and repacks
(prune/copy), as those are the two main use-cases I came across.
Support for other commands may be added in future iterations, as long
as affected packs can be calculated in advance.
- The feature is gated behind a new alpha `s3-restore` feature flag to
make it explicit that the feature is still wet behind the ears.
- There is no explicit user notification for ongoing pack restorations.
While I think it is not necessary because of the opt-in flag, showing
some notice may improve usability (but would probably require major
refactoring in the progress bar which I didn't want to start). Another
possibility would be to add a flag to send restores requests and fail
early.
See https://github.com/restic/restic/issues/3202
* ui: warn user when files are warming up from cold storage
* refactor: remove the PacksWarmer struct
It's easier to handle multiple handles in the backend directly, and it
may open the door to reducing the number of requests made to the backend
in the future.
The HTTP client can only retry HTTP2 requests after receiving a GOAWAY
response if it can rewind the body. As we use a custom data type,
explicitly provide an implementation of `GetBody`.
If client.Do returns an error, then there's no body that has to be
closed. For requests for which we are not interested in the response
body, immediately drain and close the body to make sure it isn't
forgotten later on.
This change in particular adds the missing `Close()` call for the
`List()` command.
rclone returns a "not found" error if an internal error occurs while
listing a folder. Ignoring this error lets restic erroneously think that
there are no files, which can cause `prune` to wipe the whole
repository.
This removes code that is only used within a backend implementation from
the backend package. The latter now only contains code that also has
external users.
When transferring a repository from S3 to, for example, a local disk
then all empty folders will be missing.
When saving files, the missing intermediate folders are created
automatically. Therefore, missing directories can be ignored by the
`List()` operation.
This unified construction removes most backend-specific code from
global.go. The backend registry will also enable integration tests to
use custom backends if necessary.
In order to change the backend initialization in `global.go` to be able
to generically call cfg.ApplyEnvironment() for supported backends, the
`interface{}` returned by `ParseConfig` must contain a pointer to the
configuration.
An alternative would be to use reflection to convert the type from
`interface{}(Config)` to `interface{}(*Config)` (from value to pointer
type). However, this would just complicate the type mess further.
The SemaphoreBackend now uniformly enforces the limit of concurrent
backend operations. In addition, it unifies the parameter validation.
The List() methods no longer uses a semaphore. Restic already never runs
multiple list operations in parallel.
By managing the semaphore in a wrapper backend, the sections that hold a
semaphore token grow slightly. However, the main bottleneck is IO, so
this shouldn't make much of a difference.
The key insight that enables the SemaphoreBackend is that all of the
complex semaphore handling in `openReader()` still happens within the
original call to `Load()`. Thus, getting and releasing the semaphore
tokens can be refactored to happen directly in `Load()`. This eliminates
the need for wrapping the reader in `openReader()` to release the token.
The Test method was only used in exactly one place, namely when trying
to create a new repository it was used to check whether a config file
already exists.
Use a combination of Stat() and IsNotExist() instead.
The ioutil functions are deprecated since Go 1.17 and only wrap another
library function. Thus directly call the underlying function.
This commit only mechanically replaces the function calls.
This package is no longer needed, since we can use the stdlib's
http.NewRequestWithContext.
backend/rclone already did, but it needed a different error check due to
a difference between net/http and ctxhttp.
Also, store the http.Client by value in the REST backend (changed to a
pointer when ctxhttp was introduced) and use errors.WithStack instead
of errors.Wrap where the message was no longer accurate. Errors from
http.NewRequestWithContext will start with "net/http" or "net/url", so
they're easy to identify.
... called backend/sema. I resisted the temptation to call the main
type sema.Phore. Also, semaphores are now passed by value to skip a
level of indirection when using them.
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:
* check for io.EOF with a straight ==. That value should not be wrapped,
and the chunker (whose error is checked in the cases changed) does not
wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
alreadyLockedError to match the stdlib convention that error type
names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
The missing eof with http2 when a response included a content-length
header but no data, has been fixed in golang 1.17.3/1.16.10. Therefore
just drop the canary test and schedule it for removal once go 1.18 is
required as minimum version by restic.
The rest config normally uses prepareURL to sanitize URLs and ensures
that the URL ends with a slash. However, the test used an URL without a
trailing slash, which after the rest server changes causes test
failures.
This enables the backends to request the calculation of a
backend-specific hash. For the currently supported backends this will
always be MD5. The hash calculation happens as early as possible, for
pack files this is during assembly of the pack file. That way the hash
would even capture corruptions of the temporary pack file on disk.
Add comment that the check is based on the stdlib HTTP2 client. Refactor
the checks into a function. Return an error if the value in the
Content-Length header cannot be parsed.