Browse Source

First prototype.

Daniil Baturin 3 years ago
parent
commit
f17e457c3f
13 changed files with 362 additions and 0 deletions
  1. 102 0
      README.md
  2. 0 0
      dune
  3. 2 0
      dune-project
  4. 1 0
      sample-site/.gitignore
  5. 9 0
      sample-site/site/about.inc
  6. 7 0
      sample-site/site/index.inc
  7. 10 0
      sample-site/soupault.conf
  8. 7 0
      sample-site/templates/main.html
  9. 40 0
      src/config.ml
  10. 30 0
      src/defaults.ml
  11. 4 0
      src/dune
  12. 139 0
      src/soupault.ml
  13. 11 0
      src/utils.ml

+ 102 - 0
README.md

@@ -0,0 +1,102 @@
+soupault
+========
+
+Soupault is a static website generator based on HTML rewriting rather than template
+processing. It is based on the [lambdasoup](http://aantron.github.io/lambdasoup/) library and named after
+the French dadaist and surrealist poet Philippe Soupault.
+
+In a startup pitch style, it's a website generator for Markdown haters.
+
+# Motivation
+
+There are lots of static website generators around already, and most of them are variations on
+"take a Markdown file, convert to HTML, and feed to a template processor".
+
+In practice, limitations of Markdown make people innvent ad hoc extensions or just mix Markdown with HTML,
+which defeats the purpose of Markdown. The need to store metadata such as page titles leads to Frankenstein
+formats that mix YAML headers with Markdown. That solution can work, but can there be any alternatives?
+
+Any webmaster will learn to write HTML eventually, most of the inconvenience of having to close tags etc.
+can be offset with a good editor, and as many client-side scripts and microformats showed, HTML elements
+and attributes can store metadata just fine.
+
+HTML rewriting can also do things that are hard or impossible to do with templates, such as deleting something
+from a page if needed rather than just adding new content to a template.
+
+If anything, a Markdown/RST/whatever preprocessor can be plugged into the pipeline.
+
+# Design goals
+
+* Do not use any special syntax other than HTML in templates and page files
+* Make it easy to create arbitrarily nested website structure
+* Provide built-in functionality for common tasks
+
+Right now soupault is a prototype and does not provide all that yet.
+
+# Usage
+
+## Directory structure
+
+Website content is stored in a directory referred to as `site_dir`. By default it's `site/`.
+Every subdirectory is a section. Every subdirectory of a subdirectory is a subsection and so on.
+
+Soupault's behaviour and settings are controlled by a config file names `soupault.conf`.
+
+Templates are stored in a directory named `templates/` by convention. They can be stored anywhere,
+but with default configuration soupault will look for `templates/main.html` file to use as a default template.
+
+You can fine a real example in the `sample-site` directory here:
+
+```
+sample-site/
+├── site
+│   ├── about.inc
+│   └── index.inc
+├── templates
+│   └── main.html
+└── soupault.conf
+```
+
+At build time, soupault will produce something like this from it:
+
+```
+build/
+├── about
+│   └── index.html
+└── index.html
+
+```
+
+## Creating page templates
+
+Complete pages are created for page skeletons, that is, HTML pages with all container tags left empty.
+Because they are processed by HTML rewriting rather than fed to a template processor, I avoid calling them templates.
+Since there's no template processor, there's no special syntax for elements to be replaced.
+The system simply inserts new HTML content into elements with certain selectors.
+
+This is what the minimum viable page skeleton looks like:
+
+```
+<html>
+  <head> </head>
+  <body>
+  </body>
+</html>
+```
+
+By default, soupault will insert page content into the `<body>` element. You can override it using the
+`content_selector` option in the config. You can use any valid CSS3 selector.
+
+For example, this will make soupault insert the content into the element with `id="content"`:
+```
+[settings]
+  content_selector = "#content"
+```
+
+# TODO
+
+* Widget dispatch mechanism
+* Built-in widgets (include, exec, title, breadcrumbs, TOC...)
+* Per-section and per-page settings overrides
+* Generated page caching
+* Page preprocessors

+ 0 - 0
dune


+ 2 - 0
dune-project

@@ -0,0 +1,2 @@
+(lang dune 1.9)
+(name soupault)

+ 1 - 0
sample-site/.gitignore

@@ -0,0 +1 @@
+build/

+ 9 - 0
sample-site/site/about.inc

@@ -0,0 +1,9 @@
+<h1>About me</h1>
+<p>
+I have graduated from the Miskatonic Institute of Technology where I majored in computer science
+and minored in applied mythology. My thesis was &ldquo;Zipf's law and the Necronomicon&rdquo;
+</p>
+<p>
+I'm now working on obscenity-tolerant data transmission protocols at the Cosmodemonic Telegraph Company.
+</p>
+

+ 7 - 0
sample-site/site/index.inc

@@ -0,0 +1,7 @@
+<h1>Welcome to my homepage!</h1>
+<p>
+I'm J. Random Hacker. I like hacking various things.
+</p>
+<p>
+On this page you can find various information about me.
+</p>

+ 10 - 0
sample-site/soupault.conf

@@ -0,0 +1,10 @@
+[settings]
+  # Where generated files go
+  build_dir = "build"
+
+  # Where page files are stored
+  site_dir = "site"
+
+  # Where in the template the page content is inserted
+  content_selector = "#content"
+  

+ 7 - 0
sample-site/templates/main.html

@@ -0,0 +1,7 @@
+<html>
+  <head />
+  <body>
+    <div id="content">
+    </div>
+  </body>
+</html>

+ 40 - 0
src/config.ml

@@ -0,0 +1,40 @@
+open Defaults
+open Utils
+
+(* List all keys of a TOML table
+   This is used to retrieve a list of widgets to call
+ *)
+let list_config_keys table =
+  TomlTypes.Table.fold (fun k _ ks -> (TomlTypes.Table.Key.to_string k) :: ks ) table []
+
+(* Read and parse a TOML file *)
+let read_config path =   
+  try
+    let open Toml.Parser in
+    let conf = from_filename path |> unsafe in
+    Ok conf
+  with
+  | Sys_error err -> Error (Printf.sprintf "Could not read config file %s" err)
+  | Toml.Parser.Error (err, _) -> Error (Printf.sprintf "Could not parse config file %s: %s" path err)
+
+(* Update global settings with values from the config, if there are any *)
+let update_settings settings config =
+  let st = TomlLenses.(get config (key "settings" |-- table)) in
+  match st with
+  | None ->
+     let () = Logs.warn @@ fun m -> m "Could not find the [settings] table in the config, using defaults" in
+     settings
+  | Some st ->
+    let verbose = TomlLenses.(get st (key "verbose" |-- bool)) |> default settings.verbose in
+    let strict = TomlLenses.(get st (key "strict" |-- bool)) |> default settings.strict in
+    let site_dir = TomlLenses.(get st (key "site_dir" |-- string)) |> default settings.site_dir in
+    let build_dir = TomlLenses.(get st (key "build_dir" |-- string)) |> default settings.build_dir in
+    let content_selector = TomlLenses.(get st (key "content_selector" |-- string)) |> default settings.content_selector in
+    {settings with
+       verbose = verbose;
+       strict = strict;
+       site_dir = site_dir;
+       build_dir = build_dir;
+       content_selector = content_selector
+     }
+

+ 30 - 0
src/defaults.ml

@@ -0,0 +1,30 @@
+type settings = {
+  verbose : bool;
+  strict : bool;
+  doctype : string;
+  build_dir : string;
+  site_dir : string;
+  index_page : string;
+  index_file : string;
+  default_template : string;
+  content_selector : string
+}
+
+type env = {
+  template : string;
+  nav_path : string list
+}
+
+let config_file = "soupault.conf"
+
+let default_settings = {
+  verbose = false;
+  strict = true;
+  doctype = "<!DOCTYPE html>";
+  build_dir = "build";
+  site_dir = "site";
+  index_page = "index";
+  index_file = "index.html";
+  default_template = "templates/main.html";
+  content_selector = "body"
+}

+ 4 - 0
src/dune

@@ -0,0 +1,4 @@
+(executable
+ (name soupault)
+ (libraries lambdasoup toml fileutils logs logs.fmt)
+ (preprocess (pps ppx_monadic)))

+ 139 - 0
src/soupault.ml

@@ -0,0 +1,139 @@
+open Defaults
+
+module FU = FileUtil
+module FP = FilePath
+
+let (+/) left right =
+    FP.concat left right
+
+(* Yet another error monad *)
+let bind r f =
+  match r with
+    Ok r -> f r
+  | Error _ as err -> err
+
+let return x = Ok x
+
+(* Logging setup *)
+
+
+let setup_logging verbose =
+  let level = if verbose then Logs.Info else Logs.Warning in
+  Logs.set_level (Some level);
+  Logs.set_reporter (Logs_fmt.reporter ())
+
+(* Filesystem stuff *)
+let list_dirs path =
+    FU.ls path |> FU.filter FU.Is_dir
+
+let list_page_files path =
+    FU.ls path |> FU.filter (FU.Is_file)
+
+let make_build_dir build_dir =
+  let () = Logs.info @@ fun m -> m "Build directory \"%s\" does not exist, creating" build_dir in
+  try
+    let () = FU.mkdir build_dir in Ok ()
+  with FileUtil.MkdirError e -> Error e
+
+
+(* Create a directory for the page if necessary.
+   If the page is the index page of its section, no directory is necessary.
+   Otherwise, "site/foo.html" becomes "build/foo/index.html" to provide
+   a clean URL.
+ *)
+let make_page_dir settings target_dir page_name =
+  if page_name = settings.index_page then Ok target_dir
+  else
+    let target_dir = target_dir +/ page_name in
+    try
+      FU.mkdir ~parent:true target_dir; Ok target_dir
+    with FileUtil.MkdirError e -> Error e
+
+let load_html file =
+  try Ok (Soup.read_file file |> Soup.parse)
+  with Sys_error e -> Error e
+
+let save_html soup file =
+  try Ok (Soup.pretty_print soup |> Soup.write_file file)
+  with Sys_error e -> Error e
+
+(* Feels wrong to mix the two, this probably should be split into separate functions
+   We need a way to exit early if a template is unusable (that is, has no elements
+   matching desired selector).
+ *)
+let get_template file selector =
+  try
+    let template = Soup.read_file file in
+    let html = Soup.parse template in
+    let element = Soup.select_one selector html in
+    begin
+      match element with
+      | Some _ -> Ok template
+      | None -> Error (Printf.sprintf "Template %s has no element matching selector \"%s\"" file selector)
+    end
+  with Sys_error e -> Error e
+
+let include_content selector html page_file =
+  let content = load_html page_file in
+  match content with
+  | Ok c ->
+    let element = Soup.select_one selector html in
+    begin
+      match element with
+      | Some element -> Ok (Soup.append_child element c)
+      | None -> Error (Printf.sprintf "Failed to insert page content: no element matches selector \"%s\"" selector)
+    end
+  | Error _ as e -> e
+
+let process_page settings env target_dir page_file =
+  let page_name = FP.basename page_file |> FP.chop_extension in
+  let%m target_dir = make_page_dir settings target_dir page_name in
+  let%m target_file = Ok (target_dir +/ settings.index_file) in
+  let () = Logs.info @@ fun m -> m "Processing page %s" page_file in
+  let html = Soup.parse env.template in
+  let%m () = include_content settings.content_selector html page_file in
+  let%m () = save_html html target_file in
+  Ok page_file
+
+(* Monad escape... for now *)
+let _process_page settings env target_dir page_file =
+    let res = process_page settings env target_dir page_file in
+    match res with
+      Ok _ -> ()
+    | Error e -> Logs.warn @@ fun m -> m "Error processing page %s: %s" page_file e
+
+(* Process the source directory recursively
+   
+ *)
+let rec process_dir settings env base_src_dir base_dst_dir dirname =
+    let src_path = base_src_dir +/ dirname in
+    let dst_path = base_dst_dir +/ dirname in
+    let () = Logs.info @@ fun m -> m "Entering directory %s" src_path in
+    let nav_path = if dirname <> "" then dirname :: env.nav_path else env.nav_path in
+    let env = {env with nav_path = nav_path} in
+    let pages = list_page_files src_path in
+    let dirs = List.map (FP.basename) (list_dirs src_path) in
+    let () = List.iter (_process_page settings env dst_path) pages;
+             ignore @@ List.iter (process_dir settings env src_path dst_path) dirs
+    in ()
+
+let initialize () =
+  let settings = Defaults.default_settings in
+  let%m config = Config.read_config Defaults.config_file in
+  let settings = Config.update_settings settings config in
+  let%m default_template = get_template settings.default_template settings.content_selector in
+  let default_env = {template=default_template; nav_path=[]} in
+  Ok (config, settings, default_env)
+  
+let main () =
+  let%m _, settings, default_env = initialize () in
+  let () = setup_logging settings.verbose in
+  let%m () = make_build_dir settings.build_dir in
+  let%m () = Ok (process_dir settings default_env settings.site_dir settings.build_dir "") in
+  return ()
+
+let () =
+  let res = main () in
+  match res with
+  | Ok _ -> exit 0
+  | Error e -> Printf.printf "Error: %s\n" e; exit 1

+ 11 - 0
src/utils.ml

@@ -0,0 +1,11 @@
+(* Exception-safe list tail, assuming a tail of an empty list is an empty list *)
+let safe_tail xs =
+  match xs with
+    [] -> []
+  | _ :: xs' -> xs'
+
+(* Linking batteries just for this would be an overkill... *)
+let default default_value opt =
+  match opt with
+    None -> default_value
+  | Some value -> value