#+title: Web Client with Cache {{{version}}} #+subtitle: for version {{{version}}}, {{{updated}}} #+author: Vivien Kraus #+email: vivien@planete-kraus.eu #+options: ':t toc:t author:t email:t #+language: en #+macro: version @@texinfo:@value{VERSION}@@ #+macro: updated @@texinfo:@value{UPDATED}@@ #+texinfo_filename: web-client-with-cache.info #+texinfo_header: @syncodeindex pg cp fn #+texinfo_dir_category: The Algorithmic Language Scheme #+texinfo_dir_title: Web Client with Cache: (web-client-with-cache) #+texinfo_dir_desc: Using the cache #+texinfo_printed_title: Web Client with Cache This manual is for the Guile Web Client with Cache (version {{{version}}}, {{{updated}}}). * Copying :PROPERTIES: :COPYING: t :END: This manual is for the Guile Web Client with Cache. Copyright \copy 2021 Vivien Kraus. #+begin_quote Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". The program present in this document is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see [[https://www.gnu.org/licenses/]]. #+end_quote * Using the cache #+cindex: using the cache The API is defined in the =(web client with-cache)= module. It consists of one function: #+begin_src emacs-lisp :noweb yes :exports results :results drawer (format " ,#+begin_export texinfo @defun with-cache [#:http-get] [#:current-time] [#:box] <> @end defun ,#+end_export") #+end_src * Covoiturage The goal here is to wrap an expensive long-running pure function so that calling it multiple times with the same arguments until it has not returned reuses the return value. The name is not pretty, if you find a better one please tell me. #+name: covoiturage-docstring #+caption: Docstring of the covoiturage function #+begin_src texinfo Wrap @var{f} so that simultaneous calls use the same return value. By simultaneous, we mean parallel calls with the same value for the first argument (compared with @code{equal?}) during the long-running evaluation of @var{f}. @var{f} takes at least one argument, the key. @var{f} may throw or return. It is executed in a Future (from @code{(ice-9 futures)}, so side effects should be synchronized somehow. #+end_src Internally, we use an atomic box. The atomic box contains an alist from key values to promises to futures. Futures are cool, but once they are constructed they start running immediately. However, if the lockless synchronization failed because of a concurrent update (for instance, inserting a new query), we should not start f at all before having updated the atomic box. Otherwise, for http-get, an unwanted connection to the server will still be attempted. To work around that, we use a promise to a future, so the future starts when the promise is forced -- thus, either by our thread, after we have synchronized the box ourselves; or by another thread, because it has already been added to the box. Once the function has returned, we notify all the other waiting threads and remove everything from the box. Function [[fn-covoiturage-remove]] does the removal. #+name: fn-covoiturage-remove #+caption: Remove a completed entry from a box #+begin_src scheme :eval no (define (remove box key) (let* ((old (atomic-box-ref box)) (new (filter (lambda (record) (not (equal? (car record) key))) old)) (discarded (atomic-box-compare-and-swap! box old new))) (unless (eq? discarded old) (remove box key)))) #+end_src The application of function /f/ may either return (multiple values), or throw. We want to forward both cases to all the callers, so we need to intercept the throws. Function [[fn-covoiturage-wrap-f]] does that: call f, and cons either ='ok= or ='error= and the return or throw values. #+name: fn-covoiturage-wrap-f #+caption: Wrap a thunk so that we know whether it threw or returned #+begin_src scheme :eval no (define (wrap-f f) (catch #t (lambda () (call-with-values f (lambda args (cons 'ok args)))) (lambda error (cons 'error error)))) #+end_src Symmetrically, when we want to report the value, we need to apply either =throw= or =values=. This is the job of [[fn-covoiturage-report]]. #+name: fn-covoiturage-report #+caption: Report the computed values #+begin_src scheme :eval no (define (report result) (case (car result) ((ok) (apply values (cdr result))) ((error) (apply throw (cdr result))))) #+end_src There are now two cases when we query a key: either there is an entry in the cache, in which case we just remove that, or we need to create a new future. Function [[fn-covoiturage-create]] will create the job (it will not start until it has been added to the cache). We note that if the return value should be cached for longer, the caching should happen before /f/ has finished executing. #+name: fn-covoiturage-create #+caption: Create a new job (don't start it yet) #+begin_src scheme :eval no (define (create box key f args) (delay (make-future (lambda () (let ((ret (wrap-f (lambda () (apply f key args))))) (remove box key) ret))))) #+end_src The query function works atomically: it first reads the cache, use it if it is present, and otherwise add an entry with a job and start the job. If it can't do that atomically, it retries. That's function [[fn-covoiturage-do-query]]. #+name: fn-covoiturage-do-query #+caption: Do query the cache #+begin_src scheme :eval no (define (do-query box f key args) (let ((old (atomic-box-ref box))) (if (assoc-ref old key) (report (touch (force (assoc-ref old key)))) (let* ((job (create box key f args)) (new (acons key job old))) (let ((discarded (atomic-box-compare-and-swap! box old new))) (if (eq? discarded old) ;; The box is updated, so the future can be ;; forced and f starts running (report (touch (force job))) ;; Concurrent update, retry. (do-query box f key args))))))) #+end_src Finally, we wrap everything in a public function that will manage the state (the atomic box). The final function is [[fn-covoiturage]]. #+name: fn-covoiturage #+caption: The final function definition #+begin_src scheme :eval no :noweb no-export (define-public (covoiturage f) "\ <>" (let ((box (make-atomic-box '()))) <> <> <> <> <> (lambda (key . args) (do-query box f key args)))) #+end_src * Web caching While we still use an atomic box for this cache, it is more complex than covoiturage. We need to fix HTTP headers, parse the responses, manipulate time. ** Filter =#:header= keyword argument Let's start with the headers. We need a function to parse keyword arguments, extract the non-ignored headers and the non-header keyword arguments. The function [[fn-filter-keyword]] extracts the headers, and [[fn-filter-headers]] removes the unwanted ones. #+name: fn-filter-keyword #+caption: Filter a keyword from a keyword list #+begin_src scheme :eval no (define (filter-keyword kw args) (define (search args-kept value args) (if (null? args) (values value (reverse args-kept)) (let ((next-keyword (car args)) (next-value (cadr args)) (rest (cddr args))) (if (eq? next-keyword kw) (search args-kept (or value next-value) rest) (search ;; args-kept is in reverse order! (cons* next-value next-keyword args-kept) value rest))))) (search '() #f args)) #+end_src #+name: fn-filter-headers #+caption: Remove headers for which we have the responsibility #+begin_src scheme :eval no (define (filter-headers headers) (filter (lambda (header) (case (car header) ((if-none-match if-modified-since) #f) (else #t))) headers)) #+end_src ** Time management Now, it's time for time management. We need different auxiliary functions: 1. Add seconds to a date, which is polymorphic enough to allow SRFI-19 dates and SRFI-19 times; 2. Compare two dates or times. *** Adding seconds to a date Function [[fn-date-add]] adds a number of seconds to a time or a date, and returns a date. #+name: fn-date-add #+caption: Add seconds to a date #+begin_src scheme :eval no (define (date-add date seconds) (let* ((duration (make-time time-duration 0 seconds)) (time (if (time? date) date (date->time-utc date))) (result (add-duration time duration))) (time-utc->date result))) #+end_src *** Compare two dates/times This is the job of functon [[fn-date-geqp]]. If the arguments are dates, they are first compared to times. #+name: fn-date-geqp #+caption: Compare two dates #+begin_src scheme :eval no (define (date>=? past future) (when (date? past) (set! past (date->time-utc past))) (when (date? future) (set! future (date->time-utc future))) (time>=? past future)) #+end_src ** Fix the responses There are different things we need to do to responses: 1. Add a date to every response that does not have one; 2. Estimate an expiration date for a response, based on either the Expires header or its Date and Cache-Control information; 3. Merge the headers for two responses. *** Ensure that a response has a date If the server did not set a date, we set it from the client clock. This is the job of [[fn-response-with-date]]. #+name: fn-response-with-date #+caption: Ensure that the response has a date. Use the client clock if needed. #+begin_src scheme :eval no (define (response-with-date response current-time) ;; current-time is only used if response does not have a date. (let ((date (response-date response (time-utc->date current-time))) (other-headers (filter (lambda (header) (not (eq? (car header) 'date))) (response-headers response)))) (build-response #:version (response-version response) #:code (response-code response) #:reason-phrase (response-reason-phrase response) #:headers (acons 'date date other-headers) #:port (response-port response)))) #+end_src *** Estimate the expiration date from the response The expiration date can be: - the max-age value of the cache control after the response date, if the cache control is present and has a max-age; - the value of the Expires header; - =#f= if the others failed. This is what [[fn-estimate-response-expires]] does. #+name: fn-estimate-response-expires #+caption: Estimate the expiration date of a response #+begin_src scheme :eval no (define (estimate-response-expires response) (let ((cache-control (response-cache-control response '())) (expires (response-expires response)) (date (response-date response))) (let ((max-age (assoc-ref cache-control 'max-age))) (cond (max-age (date-add date max-age)) (expires expires) (else #f))))) #+end_src *** Merge the headers of two responses The difficult part is that this will happen every time a response is refreshed from a 304 Not Modified response. So if we just merge the alists, the header alist will grow arithmetically. The trick is to use a hash table to remove duplicate header names. This explains function [[fn-merge-headers]]. #+name: fn-merge-headers #+caption: Merge the headers for two responses #+begin_src scheme :eval no (define (merge-headers most-important less-important) (let ((h+ (response-headers most-important)) (h- (response-headers less-important))) (let* ((alist (append h+ h-)) (table (alist->hash-table alist))) (hash-table->alist table)))) #+end_src ** Wrap =http-fetch= so as to store the response Now we can start using the =covoiturage= function. The inner function will call =http-get=, but between the moment we have the response and response body and the moment we return them, we add them to a web cache. The function [[fn-with-cache-add-to-box]] atomically adds the response and response-body to the cache, while ensuring there is a Date header. #+name: fn-with-cache-add-to-box #+caption: Add a response and its body to the cache #+begin_src scheme :eval no (define (with-cache-add-to-box box key response body current-time) (set! response (response-with-date response current-time)) (let* ((old (atomic-box-ref box)) (new (acons key (cons response body) (filter (lambda (cell) (not (equal? key (car cell)))) old))) (discarded (atomic-box-compare-and-swap! box old new))) (if (eq? discarded old) (values response body) (with-cache-add-to-box box key response body current-time)))) #+end_src The function [[fn-with-cache-smart-http-get]] takes more argument than regular =http-get=. It takes a base response and base response body, which are the previously cached versions of the response. Then, it proceeds with =http-get=. If the result is a 304 Not Modified response, then it updates the cached response with the headers from the new response and returns the previous response. Also, it updates the cache just before returning. #+name: fn-with-cache-smart-http-get #+caption: =http-get=, but compare with the previous cached responses #+begin_src scheme :eval no (define (with-cache-smart-http-get base-response base-response-body box current-time http-get key . args) (receive (response response-body) (apply http-get key args) (set! response (response-with-date response current-time)) (when (and base-response (eq? (response-code response) 304)) (set! response (build-response #:version (response-version base-response) #:code (response-code base-response) #:reason-phrase (response-reason-phrase base-response) #:headers (merge-headers response base-response) #:port (response-port response))) (set! response-body base-response-body)) (with-cache-add-to-box box key response response-body current-time))) #+end_src We can now wrap =with-cache-smart-http-get= within covoiturage, and focus on the main function. The main function takes the box, current-time and smart http-get function within covoiturage, looks up the response in the cache, and returns it as is if it is still valid. Otherwise, the smart http-get is called. This is the job of function [[fn-with-cache-query]]. #+name: fn-with-cache-query #+caption: The main function for the cache #+begin_src scheme :eval no (define (with-cache-query box current-time smart-http-get http-get key . args) (let* ((old (atomic-box-ref box)) (resp (assoc-ref old key))) (receive (request-headers other-request-args) (filter-keyword #:headers args) (unless request-headers (set! request-headers '())) (set! request-headers (filter-headers request-headers)) (if resp (let ((response (car resp)) (response-body (cdr resp))) (let ((expires (estimate-response-expires response))) (if (and expires (date>=? expires current-time)) ;; The response is cached and not expired (values response response-body) ;; The response is cached but expired (let ((etag (response-etag response)) (last-modified (response-last-modified response))) (when etag (set! request-headers (acons 'if-none-match (list etag) request-headers))) (when last-modified (set! request-headers (acons 'if-modified-since last-modified request-headers))) (apply smart-http-get response response-body box current-time http-get key #:headers request-headers other-request-args))))) ;; The response is not cached (apply smart-http-get #f #f box current-time http-get key #:headers request-headers other-request-args))))) #+end_src ** Putting everything together The final function is [[fn-with-cache]]. It takes its docstring from [[with-cache-docstring]]. #+name: with-cache-docstring #+caption: Docstring of the http-get with cache #+begin_src texinfo Call @var{http-get} with a cache. @var{http-get} is a procedure that takes an URI, and optionally some headers and other arguments, and either fails or returns two values: a response and a response body. The following headers are set by the caching function and are thus ignored from the additional arguments: @itemize @item @code{If-None-Match} @item @code{If-Modified-Since} @end itemize For cache validation without connecting to the server, @var{current-time} is needed. It should be a SRFI-19 time or date. By default, use the client system clock. The returned function takes the same arguments as @var{http-get} from @code{(web client)}, and also returns a response and a response body, however some headers are filtered, and responses with 304 will be converted to 200 with the help of the cache, if there is some. The optional argument @code{#:box} can be used to use a specific atomic box instead of one created just for the occasion. #+end_src #+name: fn-with-cache #+caption: HTTP GET with cache #+begin_src scheme :eval no :noweb no-export :tangle web/client/with-cache.scm :mkdirp t (define-module (web client with-cache) #:use-module (web client) #:use-module (web response) #:use-module (srfi srfi-1) #:use-module (srfi srfi-19) #:use-module (srfi srfi-69) #:use-module (ice-9 atomic) #:use-module (ice-9 receive) #:use-module (ice-9 optargs) #:use-module (ice-9 threads) #:use-module (ice-9 futures)) <> (define*-public (with-cache #:key (http-get http-get) (current-time current-time) (box #f)) "\ <>" (let ((box (or box (make-atomic-box '())))) <> <> <> <> <> <> <> <> <> (define smart-http-get (covoiturage with-cache-smart-http-get)) <> (lambda (uri . args) (apply with-cache-query box (current-time) smart-http-get http-get uri args)))) #+end_src * GNU Free Documentation License :PROPERTIES: :APPENDIX: t :END: #+texinfo: @include fdl.texi * Index :PROPERTIES: :INDEX: cp :END: