diff options
Diffstat (limited to 'web-client-with-cache.org')
-rw-r--r-- | web-client-with-cache.org | 542 |
1 files changed, 542 insertions, 0 deletions
diff --git a/web-client-with-cache.org b/web-client-with-cache.org new file mode 100644 index 0000000..17d3f4f --- /dev/null +++ b/web-client-with-cache.org @@ -0,0 +1,542 @@ +#+title: Web Client with Cache {{{version}}} +#+subtitle: for version {{{version}}}, {{{updated}}} +#+author: Vivien Kraus +#+email: vivien@planete-kraus.eu +#+options: ':t toc:t author:t email:t +#+language: en + +#+macro: version @@texinfo:@value{VERSION}@@ +#+macro: updated @@texinfo:@value{UPDATED}@@ +#+texinfo_filename: web-client-with-cache.info +#+texinfo_header: @syncodeindex pg cp fn +#+texinfo_dir_category: The Algorithmic Language Scheme +#+texinfo_dir_title: Web Client with Cache: (web-client-with-cache) +#+texinfo_dir_desc: Using the cache +#+texinfo_printed_title: Web Client with Cache + +This manual is for the Guile Web Client with Cache (version +{{{version}}}, {{{updated}}}). + +* Copying + :PROPERTIES: + :COPYING: t + :END: +This manual is for the Guile Web Client with Cache. + +Copyright \copy 2021 Vivien Kraus. + +#+begin_quote +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with no +Invariant Sections, with no Front-Cover Texts, and with no Back-Cover +Texts. A copy of the license is included in the section entitled "GNU +Free Documentation License". + +The program present in this document is free software: you can +redistribute it and/or modify it under the terms of the GNU General +Public License as published by the Free Software Foundation, either +version 3 of the License, or (at your option) any later version. + +This program is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program. If not, see [[https://www.gnu.org/licenses/]]. +#+end_quote + +* Using the cache +#+cindex: using the cache + +The API is defined in the =(web client with-cache)= module. It +consists of one function: + +#+begin_src emacs-lisp :noweb yes :exports results :results drawer + (format " + ,#+begin_export texinfo + @defun with-cache [#:http-get] [#:current-time] [#:box] + <<with-cache-docstring>> + @end defun + ,#+end_export") +#+end_src + +* Covoiturage +The goal here is to wrap an expensive long-running pure function so +that calling it multiple times with the same arguments until it has +not returned reuses the return value. The name is not pretty, if you +find a better one please tell me. + +#+name: covoiturage-docstring +#+caption: Docstring of the covoiturage function +#+begin_src texinfo + Wrap @var{f} so that simultaneous calls use the same return value. By + simultaneous, we mean parallel calls with the same value for the first + argument (compared with @code{equal?}) during the long-running + evaluation of @var{f}. + + @var{f} takes at least one argument, the key. @var{f} may throw or + return. It is executed in a Future (from @code{(ice-9 futures)}, so + side effects should be synchronized somehow. +#+end_src + +Internally, we use an atomic box. The atomic box contains an alist +from key values to promises to futures. Futures are cool, but once +they are constructed they start running immediately. + +However, if the lockless synchronization failed because of a +concurrent update (for instance, inserting a new query), we should not +start f at all before having updated the atomic box. Otherwise, for +http-get, an unwanted connection to the server will still be +attempted. + +To work around that, we use a promise to a future, so the +future starts when the promise is forced -- thus, either by our +thread, after we have synchronized the box ourselves; or by another +thread, because it has already been added to the box. + +Once the function has returned, we notify all the other waiting +threads and remove everything from the box. Function +[[fn-covoiturage-remove]] does the removal. + +#+name: fn-covoiturage-remove +#+caption: Remove a completed entry from a box +#+begin_src scheme :eval no + (define (remove box key) + (let* ((old (atomic-box-ref box)) + (new (filter (lambda (record) + (not (equal? (car record) key))) + old)) + (discarded + (atomic-box-compare-and-swap! box old new))) + (unless (eq? discarded old) + (remove box key)))) +#+end_src + +The application of function /f/ may either return (multiple values), +or throw. We want to forward both cases to all the callers, so we need +to intercept the throws. Function [[fn-covoiturage-wrap-f]] does that: +call f, and cons either ='ok= or ='error= and the return or throw +values. + +#+name: fn-covoiturage-wrap-f +#+caption: Wrap a thunk so that we know whether it threw or returned +#+begin_src scheme :eval no + (define (wrap-f f) + (catch #t + (lambda () + (call-with-values f + (lambda args + (cons 'ok args)))) + (lambda error + (cons 'error error)))) +#+end_src + +Symmetrically, when we want to report the value, we need to apply +either =throw= or =values=. This is the job of [[fn-covoiturage-report]]. + +#+name: fn-covoiturage-report +#+caption: Report the computed values +#+begin_src scheme :eval no + (define (report result) + (case (car result) + ((ok) (apply values (cdr result))) + ((error) (apply throw (cdr result))))) +#+end_src + +There are now two cases when we query a key: either there is an entry +in the cache, in which case we just remove that, or we need to create +a new future. Function [[fn-covoiturage-create]] will create the job (it +will not start until it has been added to the cache). We note that if +the return value should be cached for longer, the caching should +happen before /f/ has finished executing. + +#+name: fn-covoiturage-create +#+caption: Create a new job (don't start it yet) +#+begin_src scheme :eval no + (define (create box key f args) + (delay + (make-future + (lambda () + (let ((ret (wrap-f (lambda () (apply f key args))))) + (remove box key) + ret))))) +#+end_src + +The query function works atomically: it first reads the cache, use it +if it is present, and otherwise add an entry with a job and start the +job. If it can't do that atomically, it retries. That's function +[[fn-covoiturage-do-query]]. + +#+name: fn-covoiturage-do-query +#+caption: Do query the cache +#+begin_src scheme :eval no + (define (do-query box f key args) + (let ((old (atomic-box-ref box))) + (if (assoc-ref old key) + (report (touch (force (assoc-ref old key)))) + (let* ((job (create box key f args)) + (new (acons key job old))) + (let ((discarded + (atomic-box-compare-and-swap! box old new))) + (if (eq? discarded old) + ;; The box is updated, so the future can be + ;; forced and f starts running + (report (touch (force job))) + ;; Concurrent update, retry. + (do-query box f key args))))))) +#+end_src + +Finally, we wrap everything in a public function that will manage the +state (the atomic box). The final function is [[fn-covoiturage]]. + +#+name: fn-covoiturage +#+caption: The final function definition +#+begin_src scheme :eval no :noweb no-export + (define-public (covoiturage f) + "\ + <<covoiturage-docstring>>" + (let ((box (make-atomic-box '()))) + <<fn-covoiturage-remove>> + <<fn-covoiturage-wrap-f>> + <<fn-covoiturage-report>> + <<fn-covoiturage-create>> + <<fn-covoiturage-do-query>> + (lambda (key . args) + (do-query box f key args)))) +#+end_src + +* Web caching +While we still use an atomic box for this cache, it is more complex +than covoiturage. We need to fix HTTP headers, parse the responses, +manipulate time. + +** Filter =#:header= keyword argument + +Let's start with the headers. We need a function to parse keyword +arguments, extract the non-ignored headers and the non-header keyword +arguments. The function [[fn-filter-keyword]] extracts the headers, and +[[fn-filter-headers]] removes the unwanted ones. + +#+name: fn-filter-keyword +#+caption: Filter a keyword from a keyword list +#+begin_src scheme :eval no + (define (filter-keyword kw args) + (define (search args-kept value args) + (if (null? args) + (values value (reverse args-kept)) + (let ((next-keyword (car args)) + (next-value (cadr args)) + (rest (cddr args))) + (if (eq? next-keyword kw) + (search args-kept (or value next-value) rest) + (search + ;; args-kept is in reverse order! + (cons* next-value next-keyword args-kept) + value + rest))))) + (search '() #f args)) +#+end_src + +#+name: fn-filter-headers +#+caption: Remove headers for which we have the responsibility +#+begin_src scheme :eval no + (define (filter-headers headers) + (filter (lambda (header) + (case (car header) + ((if-none-match if-modified-since) #f) + (else #t))) + headers)) +#+end_src + +** Time management + +Now, it's time for time management. We need different auxiliary +functions: +1. Add seconds to a date, which is polymorphic enough to allow SRFI-19 + dates and SRFI-19 times; +2. Compare two dates or times. + +*** Adding seconds to a date +Function [[fn-date-add]] adds a number of seconds to a time or a date, and +returns a date. + +#+name: fn-date-add +#+caption: Add seconds to a date +#+begin_src scheme :eval no + (define (date-add date seconds) + (let* ((duration (make-time time-duration 0 seconds)) + (time (if (time? date) + date + (date->time-utc date))) + (result (add-duration time duration))) + (time-utc->date result))) +#+end_src + +*** Compare two dates/times +This is the job of functon [[fn-date-geqp]]. If the arguments are dates, +they are first compared to times. + +#+name: fn-date-geqp +#+caption: Compare two dates +#+begin_src scheme :eval no + (define (date>=? past future) + (when (date? past) + (set! past (date->time-utc past))) + (when (date? future) + (set! future (date->time-utc future))) + (time>=? past future)) +#+end_src + +** Fix the responses +There are different things we need to do to responses: +1. Add a date to every response that does not have one; +2. Estimate an expiration date for a response, based on either the + Expires header or its Date and Cache-Control information; +3. Merge the headers for two responses. + +*** Ensure that a response has a date +If the server did not set a date, we set it from the client +clock. This is the job of [[fn-response-with-date]]. + +#+name: fn-response-with-date +#+caption: Ensure that the response has a date. Use the client clock if needed. +#+begin_src scheme :eval no + (define (response-with-date response current-time) + ;; current-time is only used if response does not have a date. + (let ((date (response-date response (time-utc->date current-time))) + (other-headers + (filter (lambda (header) + (not (eq? (car header) 'date))) + (response-headers response)))) + (build-response + #:version (response-version response) + #:code (response-code response) + #:reason-phrase (response-reason-phrase response) + #:headers (acons 'date date other-headers) + #:port (response-port response)))) +#+end_src + +*** Estimate the expiration date from the response +The expiration date can be: +- the max-age value of the cache control after the response date, if + the cache control is present and has a max-age; +- the value of the Expires header; +- =#f= if the others failed. + +This is what [[fn-estimate-response-expires]] does. + +#+name: fn-estimate-response-expires +#+caption: Estimate the expiration date of a response +#+begin_src scheme :eval no + (define (estimate-response-expires response) + (let ((cache-control (response-cache-control response '())) + (expires (response-expires response)) + (date (response-date response))) + (let ((max-age (assoc-ref cache-control 'max-age))) + (cond (max-age + (date-add date max-age)) + (expires expires) + (else #f))))) +#+end_src + +*** Merge the headers of two responses +The difficult part is that this will happen every time a response is +refreshed from a 304 Not Modified response. So if we just merge the +alists, the header alist will grow arithmetically. The trick is to +use a hash table to remove duplicate header names. This explains +function [[fn-merge-headers]]. + +#+name: fn-merge-headers +#+caption: Merge the headers for two responses +#+begin_src scheme :eval no + (define (merge-headers most-important less-important) + (let ((h+ (response-headers most-important)) + (h- (response-headers less-important))) + (let* ((alist (append h+ h-)) + (table (alist->hash-table alist))) + (hash-table->alist table)))) +#+end_src + +** Wrap =http-fetch= so as to store the response +Now we can start using the =covoiturage= function. The inner function +will call =http-get=, but between the moment we have the response and +response body and the moment we return them, we add them to a web +cache. + +The function [[fn-with-cache-add-to-box]] atomically adds the response and +response-body to the cache, while ensuring there is a Date header. + +#+name: fn-with-cache-add-to-box +#+caption: Add a response and its body to the cache +#+begin_src scheme :eval no + (define (with-cache-add-to-box box key response body current-time) + (set! response (response-with-date response current-time)) + (let* ((old (atomic-box-ref box)) + (new (acons key (cons response body) + (filter + (lambda (cell) + (not (equal? key (car cell)))) + old))) + (discarded (atomic-box-compare-and-swap! box old new))) + (if (eq? discarded old) + (values response body) + (with-cache-add-to-box box key response body current-time)))) +#+end_src + +The function [[fn-with-cache-smart-http-get]] takes more argument than +regular =http-get=. It takes a base response and base response body, +which are the previously cached versions of the response. Then, it +proceeds with =http-get=. If the result is a 304 Not Modified +response, then it updates the cached response with the headers from +the new response and returns the previous response. Also, it updates +the cache just before returning. + +#+name: fn-with-cache-smart-http-get +#+caption: =http-get=, but compare with the previous cached responses +#+begin_src scheme :eval no + (define (with-cache-smart-http-get base-response base-response-body + box current-time http-get key . args) + (receive (response response-body) + (apply http-get key args) + (set! response (response-with-date response current-time)) + (when (and base-response (eq? (response-code response) 304)) + (set! response + (build-response + #:version (response-version base-response) + #:code (response-code base-response) + #:reason-phrase (response-reason-phrase base-response) + #:headers (merge-headers response base-response) + #:port (response-port response))) + (set! response-body base-response-body)) + (with-cache-add-to-box box key response response-body current-time))) +#+end_src + +We can now wrap =with-cache-smart-http-get= within covoiturage, and +focus on the main function. The main function takes the box, +current-time and smart http-get function within covoiturage, looks up +the response in the cache, and returns it as is if it is still +valid. Otherwise, the smart http-get is called. This is the job of +function [[fn-with-cache-query]]. + +#+name: fn-with-cache-query +#+caption: The main function for the cache +#+begin_src scheme :eval no + (define (with-cache-query box current-time smart-http-get http-get key . args) + (let* ((old (atomic-box-ref box)) + (resp (assoc-ref old key))) + (receive (request-headers other-request-args) + (filter-keyword #:headers args) + (unless request-headers + (set! request-headers '())) + (set! request-headers (filter-headers request-headers)) + (if resp + (let ((response (car resp)) + (response-body (cdr resp))) + (let ((expires (estimate-response-expires response))) + (if (and expires (date>=? expires current-time)) + ;; The response is cached and not expired + (values response response-body) + ;; The response is cached but expired + (let ((etag (response-etag response)) + (last-modified (response-last-modified response))) + (when etag + (set! request-headers + (acons 'if-none-match (list etag) request-headers))) + (when last-modified + (set! request-headers + (acons 'if-modified-since last-modified request-headers))) + (apply smart-http-get + response response-body box current-time http-get key + #:headers request-headers + other-request-args))))) + ;; The response is not cached + (apply smart-http-get + #f #f box current-time http-get key + #:headers request-headers + other-request-args))))) +#+end_src + +** Putting everything together +The final function is [[fn-with-cache]]. It takes its docstring from +[[with-cache-docstring]]. + +#+name: with-cache-docstring +#+caption: Docstring of the http-get with cache +#+begin_src texinfo + Call @var{http-get} with a cache. @var{http-get} is a procedure that + takes an URI, and optionally some headers and other arguments, and + either fails or returns two values: a response and a response body. + + The following headers are set by the caching function and are thus + ignored from the additional arguments: + + @itemize + @item @code{If-None-Match} + @item @code{If-Modified-Since} + @end itemize + + For cache validation without connecting to the server, + @var{current-time} is needed. It should be a SRFI-19 time or date. By + default, use the client system clock. + + The returned function takes the same arguments as @var{http-get} from + @code{(web client)}, and also returns a response and a response body, + however some headers are filtered, and responses with 304 will be + converted to 200 with the help of the cache, if there is some. + + The optional argument @code{#:box} can be used to use a specific + atomic box instead of one created just for the occasion. +#+end_src + +#+name: fn-with-cache +#+caption: HTTP GET with cache +#+begin_src scheme :eval no :noweb no-export :tangle web/client/with-cache.scm :mkdirp t + (define-module (web client with-cache) + #:use-module (web client) + #:use-module (web response) + #:use-module (srfi srfi-1) + #:use-module (srfi srfi-19) + #:use-module (srfi srfi-69) + #:use-module (ice-9 atomic) + #:use-module (ice-9 receive) + #:use-module (ice-9 optargs) + #:use-module (ice-9 threads) + #:use-module (ice-9 futures)) + + <<fn-covoiturage>> + + (define*-public (with-cache + #:key + (http-get http-get) + (current-time current-time) + (box #f)) + "\ + <<with-cache-docstring>>" + (let ((box (or box (make-atomic-box '())))) + <<fn-filter-keyword>> + <<fn-filter-headers>> + <<fn-date-add>> + <<fn-date-geqp>> + <<fn-response-with-date>> + <<fn-estimate-response-expires>> + <<fn-merge-headers>> + <<fn-with-cache-add-to-box>> + <<fn-with-cache-smart-http-get>> + (define smart-http-get + (covoiturage with-cache-smart-http-get)) + <<fn-with-cache-query>> + (lambda (uri . args) + (apply with-cache-query box (current-time) smart-http-get http-get uri args)))) +#+end_src + +* GNU Free Documentation License + :PROPERTIES: + :APPENDIX: t + :END: +#+texinfo: @include fdl.texi +* Index + :PROPERTIES: + :INDEX: cp + :END: |