OCaml functional web scraping library https://yannham.github.io/mechaml

Latest commit 726130315a by Yann Hamdaoui: "Doc improvement + switch to gh-pages branch for hosting" (8 months ago)

Mechaml

Description

Mechaml is a functional web scraping library that allows you to:

  • Fetch web content
  • Analyze, fill and submit HTML forms
  • Handle cookies, headers and redirections

Mechaml is built on top of existing libraries that provide the low-level features: Cohttp for HTTP handling, Lwt for asynchronous I/O, and Lambdasoup for HTML parsing. Mechaml provides an interface that ties these together and adds a few features of its own.
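Handling redirections essentially means following each redirect until a final page is reached, with a bound on the number of hops so that a redirect loop cannot run forever. The following is a minimal, synchronous sketch of that idea only; the `response` type, the `follow` function and the `fetch` callback are hypothetical and are not part of Mechaml, Cohttp or Lwt.

```ocaml
(* Hypothetical sketch of redirect handling, NOT Mechaml's implementation.
   A response is either a final page or a redirect to another URL. *)
type response =
  | Ok_page of string   (* final page body *)
  | Redirect of string  (* target URL, e.g. from a 301/302 response *)

(* Follow redirects starting from [url], giving up after [max_hops]
   redirects to avoid infinite loops. [fetch] performs one request. *)
let rec follow ~max_hops (fetch : string -> response) (url : string) :
    (string, string) result =
  match fetch url with
  | Ok_page body -> Ok body
  | Redirect next ->
    if max_hops <= 0 then Error ("too many redirects at " ^ url)
    else follow ~max_hops:(max_hops - 1) fetch next
```

Passing `fetch` as a parameter keeps the loop independent of any HTTP client, which makes the hop limit easy to test against a fake server.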

Overview

The library is divided into three main modules:

  • Agent: user-agent features. Performs requests and retrieves content, headers, status codes, etc.
  • Cookiejar: cookie handling
  • Page: HTML parsing and form handling
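To give an idea of what cookie handling involves, here is a minimal, hypothetical cookie jar: it records the name=value pair of each Set-Cookie header and rebuilds the Cookie header to send back. This is not Mechaml's Cookiejar API; the names `jar`, `add_set_cookie` and `cookie_header` are illustrative only, and attributes such as Path, Domain and Expires are deliberately ignored.

```ocaml
(* Hypothetical toy cookie jar, NOT Mechaml's Cookiejar API.
   Cookies are stored as name -> value pairs; attributes such as
   Path, Domain or Expires are ignored for simplicity. *)
module StringMap = Map.Make (String)

type jar = string StringMap.t

let empty : jar = StringMap.empty

(* Keep only the "name=value" part of a Set-Cookie header value (the
   text before the first ';') and store it, replacing any older value. *)
let add_set_cookie (header : string) (jar : jar) : jar =
  let pair =
    match String.index_opt header ';' with
    | Some i -> String.sub header 0 i
    | None -> header
  in
  match String.index_opt pair '=' with
  | Some i ->
    let name = String.trim (String.sub pair 0 i) in
    let value =
      String.trim (String.sub pair (i + 1) (String.length pair - i - 1))
    in
    StringMap.add name value jar
  | None -> jar (* malformed: no '=' sign, ignore it *)

(* Build the value of the Cookie request header (names in sorted order). *)
let cookie_header (jar : jar) : string =
  StringMap.bindings jar
  |> List.map (fun (name, value) -> name ^ "=" ^ value)
  |> String.concat "; "
```

For instance, after `add_set_cookie "session=abc123; Path=/; HttpOnly"` and `add_set_cookie "lang=en"`, `cookie_header` yields `"lang=en; session=abc123"`.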

The Format module provides helpers to manage formatted form values such as dates, colors, etc. For more details, see the documentation (https://yannham.github.io/mechaml).
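Formatted inputs matter because HTML expects specific textual encodings: an `<input type="date">` value is written as YYYY-MM-DD and an `<input type="color">` value as a lowercase `#rrggbb` hex string. The helpers below are a hypothetical sketch of that kind of serialization; `date_to_string` and `color_to_string` are not Mechaml's Format API.

```ocaml
(* Hypothetical helpers, NOT Mechaml's Format API: they only show the
   kind of serialization such helpers perform for HTML form inputs. *)

(* Serialize a date for an <input type="date"> field as "YYYY-MM-DD".
   Coarse range check only: per-month day counts are not validated. *)
let date_to_string ~year ~month ~day =
  if month < 1 || month > 12 || day < 1 || day > 31 then
    invalid_arg "date_to_string: month or day out of range"
  else Printf.sprintf "%04d-%02d-%02d" year month day

(* Serialize an RGB triple for an <input type="color"> field as "#rrggbb". *)
let color_to_string ~r ~g ~b =
  let check c =
    if c < 0 || c > 255 then
      invalid_arg "color_to_string: component out of range"
  in
  check r; check g; check b;
  Printf.sprintf "#%02x%02x%02x" r g b
```

For example, `date_to_string ~year:2019 ~month:3 ~day:7` gives `"2019-03-07"` and `color_to_string ~r:255 ~g:128 ~b:0` gives `"#ff8000"`.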

Installation

From opam

opam install mechaml

From source

Mechaml uses the dune build system, which can be installed through opam. Then, just run

dune build

to build the library.

Use dune build @doc to generate the documentation, dune runtest to build and execute tests, and dune build examples/XXX.exe to compile example XXX.

Usage

Here is a code sample that fetches a web page, fills in a login form and submits it, written in the monadic style:

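open Mechaml
module M = Agent.Monad
open M.Infix

(* Unwrap an optional value, failing with [msg] if it is missing *)
let require msg = function
  | Some a -> a
  | None -> failwith msg

let action_login =
  (* Fetch the page and extract the parsed HTML from the response *)
  Agent.get "http://www.somewebsite.com"
  >|= Agent.HttpResponse.page
  (* Select the login form with a CSS selector and fill in its fields *)
  >|= (function page ->
    page
    |> Page.form_with "[name=login]"
    |> require "Can't find the login form!"
    |> Page.Form.set "username" "mynick"
    |> Page.Form.set "password" "@xlz43")
  (* Submit the filled-in form *)
  >>= Agent.submit

(* Run the action, starting from a freshly initialized agent *)
let _ =
  M.run (Agent.init ()) action_login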
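In the monadic style, each step receives the current agent state and hands an updated one to the next step, which is why the example above chains actions with `>|=` and `>>=` and only supplies the initial agent to `M.run`. The sketch below illustrates that idea with a plain, synchronous state monad. It is a simplification under our own assumptions (Mechaml's `Agent.Monad` also has to deal with Lwt's asynchronous I/O), and the `agent` record, `get` and `run` here are hypothetical stand-ins, not Mechaml's API.

```ocaml
(* Hypothetical synchronous state monad, NOT Mechaml's Agent.Monad:
   every action takes the current agent state and returns an updated
   state together with a result. *)
type agent = { visited : string list } (* toy state: URLs fetched so far *)

type 'a m = agent -> agent * 'a

let return (x : 'a) : 'a m = fun agent -> (agent, x)

(* Run [action], then feed its result and the new state to [f]. *)
let bind (action : 'a m) (f : 'a -> 'b m) : 'b m =
 fun agent ->
  let agent', x = action agent in
  f x agent'

(* Apply a pure function to the result of [action]. *)
let map (action : 'a m) (f : 'a -> 'b) : 'b m =
  bind action (fun x -> return (f x))

module Infix = struct
  let ( >>= ) = bind
  let ( >|= ) = map
end

(* Run an action from an initial state; returns final state and result. *)
let run (agent : agent) (action : 'a m) : agent * 'a = action agent

(* Fake "get": records the URL in the state and returns dummy content. *)
let get (url : string) : string m =
 fun agent -> ({ visited = url :: agent.visited }, "<html>" ^ url ^ "</html>")

open Infix

(* Two fetches in sequence; the state flows from the first to the second. *)
let two_pages =
  get "http://a.example" >>= fun first ->
  get "http://b.example" >|= fun second -> (first, second)
```

Running `run { visited = [] } two_pages` returns a final state whose `visited` list is `["http://b.example"; "http://a.example"]`, showing that the second request saw the state produced by the first without any explicit plumbing.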

More examples are available in the examples folder.

License

GNU LGPL v3