{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5642ace5-fcab-4ca8-a96c-dcd7b3d51eb2",
   "metadata": {},
   "source": [
    "# Advanced\n",
    "\n",
    "This page details the more advanced `HyperOptimizer` options. \n",
    "\n",
    "First it might be useful to sketch the underlying process for performing a\n",
    "contraction path search:\n",
    "\n",
    "1. A driver (such as `'greedy'` or `'kahypar'` ) along with a set of heuristic \n",
    "parameters (the 'hyper-parameters') are sampled from an 'optimizer'.\n",
    "2. These are used to build a single \n",
    "[`ContractionTree`](cotengra.core.ContractionTree)\n",
    "3. Optionally, some modification of this tree is then performed - either \n",
    "slicing, subtree reconfiguration, or some combination such as dynamic slicing\n",
    "4. A score is then generated for the tree, for example the contraction cost\n",
    "5. This score, the driver and hyper-parameters are fed back to the optimizer to \n",
    "update its search space\n",
    "6. The process is repeated, optionally in parallel, for a set number of repeats\n",
    "or time, with the best scoring tree being returned finally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "62de9546",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cotengra as ctg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db15e9b3",
   "metadata": {},
   "source": [
    "## Drivers\n",
    "\n",
    "The following are valid initial methods to `HyperOptimizer`(`methods=[...])`, \n",
    "which will tune each one and ultimately select the best performer:\n",
    "\n",
    "* `'kahypar'` *(default)*\n",
    "* `'greedy'` *(default)*\n",
    "* `'labels'` (*default if `kahypar` not installed* ) - a pure python implementation of hypergraph community detection\n",
    "* [igraph](https://igraph.org/python/) **partition** based\n",
    "  * `'spinglass'`\n",
    "  * `'labelprop'`\n",
    "* [igraph](https://igraph.org/python/) **dendrogram** based (yield entire community structure tree at once)\n",
    "  * `'betweenness'`\n",
    "  * `'walktrap'`\n",
    "* **linegraph tree decomposition** based:\n",
    "  * `'quickbb'`\n",
    "  * `'flowcutter'`\n",
    "\n",
    "You can also register your own methods to add to the mix with \n",
    "[`register_hyper_function`](cotengra.hyper.register_hyper_function)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e249b17",
   "metadata": {},
   "source": [
    "## Slicing and Subtree Reconfiguration\n",
    "\n",
    "There are three types of 'post-processing' of trees that the hyper optimizer\n",
    "can perform. For details of what each of these does, or how to perform them \n",
    "outside of the hyper optimization loop, see the 'Tree Surgery' sections:\n",
    "\n",
    "- [Index Slicing](trees.ipynb#index-slicing)\n",
    "- [Subtree Reconfiguration](trees.ipynb#subtree-reconfiguration)\n",
    "- [Dynamic Slicing](trees.ipynb#dynamic-slicing)\n",
    "\n",
    "\n",
    "### **Basic slicing** - ``slicing_opts``\n",
    "\n",
    "This removes indices from the contraction greedily one by one until the \n",
    "specified options are met, without changing the tree structure at all. This is cheap\n",
    "to perform but can incur significant 'slicing overhead' for complicated \n",
    "contractions. To turn it on supply a `dict` to `slicing_opts`, which will be \n",
    "supplied as kwargs to \n",
    "[`ContractionTree.slice`](cotengra.core.ContractionTree.slice)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c01e30da",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt = ctg.HyperOptimizer(slicing_opts={'target_size': 2**28})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0d0418d",
   "metadata": {},
   "source": [
    "Or if we want paths with at least 1024 independent contractions we could use:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4169e2b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt = ctg.HyperOptimizer(slicing_opts={'target_slices': 2**10})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5554e226",
   "metadata": {},
   "source": [
    "Note that you need to work with the `ContractionTree` object rather than a raw\n",
    "path (i.e. call `opt.search` or `quimb.tensor.TensorNetwork.contraction_tree`) \n",
    "if you want to access the sliced indices or perform the sliced contraction \n",
    "(although a tree reconstructed from the raw path will be 'sliceable' in the \n",
    "sense that the indices will be easily refound)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43fb72ac",
   "metadata": {},
   "source": [
    "### **Dynamic slicing** - ``slicing_reconf_opts``\n",
    "\n",
    "This removes indices one by one, but also performs subtree reconfiguration \n",
    "between removals in order to modify the tree to take account of the slicing.\n",
    "Slower than basic slicing but can be much higher quality, especially when \n",
    "slicing many indices of a complicated contraction. This is turned on by \n",
    "supplying a `dict` to `slicing_reconf_opts`, which will be supplied as kwargs to \n",
    "[`ContractionTree.slice_and_reconfigure`](cotengra.core.ContractionTree.slice_and_reconfigure).\n",
    "If both basic and dynamic slicing are specified then the basic slicing is \n",
    "performed first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b8c530c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt = ctg.HyperOptimizer(slicing_reconf_opts={'target_size': 2**28})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad800f22",
   "metadata": {},
   "source": [
    "### **Subtree reconfiguration** - ``reconf_opts``\n",
    "\n",
    "Finally, one can perform subtree reconfiguration alone - ranging over subtrees\n",
    "of the main tree optimally reconfiguring them locally to lower the target \n",
    "score globally. This is turned on by supplying a `dict` to `reconf_opts`, which\n",
    "will be supplied as kwargs to \n",
    "[`ContractionTree.subtree_reconfigure`](cotengra.core.ContractionTree.subtree_reconfigure).\n",
    "The 'reconfing' will be performed after any slicing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "16aac7e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt = ctg.HyperOptimizer(reconf_opts={})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53358f24",
   "metadata": {},
   "source": [
    "Note you just supply an empty dict if you want to use the default options. Some\n",
    "parameters, such as the score function to minimize (see next section), will be\n",
    "set for you to match the global objective if not specified. You can supply\n",
    "`reconf_opts={'forested' :True, ...}`  to use forested subtree reconfiguration.\n",
    "\n",
    "\n",
    "````{note}\n",
    "Note all three of these can be applied successively:\n",
    "\n",
    "```python\n",
    "opt = ctg.HyperOptimizer(\n",
    "    slicing_opts={'target_size': 2**40},         # first do basic slicing\n",
    "    slicing_reconf_opts={'target_size': 2**28},  # then advanced slicing with reconfiguring\n",
    "    reconf_opts={'subtree_size': 14},            # then finally just higher quality reconfiguring\n",
    ")\n",
    "```\n",
    "\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d18ad3e",
   "metadata": {},
   "source": [
    "## **Simulated Annealing** - ``simulated_annealing_opts``\n",
    "\n",
    "`cotengra` has an implementation of simulated annealing along the lines of\n",
    "[arXiv:2108.05665](https://arxiv.org/abs/2108.05665), the core function of which is\n",
    "[`simulated_anneal_tree`](cotengra.pathfinders.path_simulated_annealing.simulated_anneal_tree).\n",
    "\n",
    "You can turn this on by supplying a `dict` to `simulated_annealing_opts`, which\n",
    "will be supplied as kwargs to the above function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a01a27bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt = ctg.HyperOptimizer(simulated_annealing_opts={})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d88ed1d",
   "metadata": {},
   "source": [
    "There is support for slicing by supplying the `target_size` in this dict.\n",
    "The simulated anneal is performed *first-thing* after generating the initial \n",
    "tree, and can still be combined with any of the other processing steps above."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26715165",
   "metadata": {},
   "source": [
    "## Objective\n",
    "\n",
    "What the optimizer (and subtree reconfiguration) considers the 'best' tree or\n",
    "objective function is set with the `minimize` kwarg.\n",
    "\n",
    "* `minimize='flops'` - minimize the total number of scalar operations, \n",
    "* `minimize='size'` - minimize the size of the largest intermediate tensor,\n",
    "\n",
    "```{hint}\n",
    "Generally to control the memory usage of a contraction it is better to use \n",
    "slicing rather than targetting the size directly.\n",
    "```\n",
    "\n",
    "* `minimize='write'` - minimize the total amount of 'memory written', i.e. the sum of sizes of all intermediate tensors. This targets memory speed, but is best weighted with the flops cost as well - see below. However this score is also relevant for automatic differentiating a contraction, where naively all intermediates must be kept.\n",
    "* `minimize='combo'` - minimize the sum of $FLOPS + \\alpha \\times WRITE$ using the default $\\alpha=64$\n",
    "* `minimize=f'combo-{alpha}'` - use a custom $\\alpha$ for the combo cost\n",
    "\n",
    "For real world contractions, where both clock speed \n",
    "and memory speed are important, using:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f2e2b87d",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt = ctg.HyperOptimizer(minimize='combo')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7eae003a",
   "metadata": {},
   "source": [
    "should provide decent baseline performance for typical CPUs and GPUs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87cada32",
   "metadata": {},
   "source": [
    "## Optimization Library\n",
    "\n",
    "Optimizing the hyper parameters of the drivers which generate trees is done by \n",
    "one of many backend 'ask-and-tell' optimization libraries. The cost of asking\n",
    "for a new sample point and telling the resulting score should be much smaller \n",
    "than actually generating the trees themselves. However, for Gaussian process\n",
    "bayesian optimization, for example, the cost can become considerable once a few\n",
    "hundred trials have been fitted. As such, you may want to use a different \n",
    "library or algorithm. The options are:\n",
    "\n",
    "* [optuna](https://github.com/optuna/optuna) - **Tree of Parzen Estimators** used by default\n",
    "* [baytune](https://github.com/HDI-Project/BTB) - *Bayesian Tuning and Bandits* - **Gaussian Processes** used by default\n",
    "* [chocolate](https://chocolate.readthedocs.io/en/latest/) - the **CMAES** optimization algorithm is used by default (`sampler='QuasiRandom'` also useful)\n",
    "* [skopt](https://scikit-optimize.github.io/stable/) - random forest as well as Gaussian process regressors (high quality but slow)\n",
    "* [nevergrad](https://facebookresearch.github.io/nevergrad/) - various population and evolutionary algorithms (v fast & suitable for highly parallel path findings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c245c67b",
   "metadata": {},
   "source": [
    "These are specified like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d0cbf52c",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt = ctg.HyperOptimizer(optlib='nevergrad')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4694e2ad",
   "metadata": {},
   "source": [
    "To select which algorithm the optimizer uses supply the `sampler=` kwarg, other\n",
    "options to instantiate the optimizer are also passed in as kwargs.\n",
    "\n",
    "If you don't want to use optimization (either because no library is installed\n",
    "or for the least overhead), you can supply:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f46ea569",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt = ctg.HyperOptimizer(optlib='random')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "415ebc27",
   "metadata": {},
   "source": [
    "which will just randomly sample the space of hyper parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e275f56",
   "metadata": {},
   "source": [
    "### Optimization Space\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fad99d2",
   "metadata": {},
   "source": [
    "You can view (and modify) the actual hyper parameter space of each driver with\n",
    "the following call:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "4783dffd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'random_strength': {'type': 'FLOAT_EXP', 'min': 0.001, 'max': 1.0},\n",
       " 'temperature': {'type': 'FLOAT_EXP', 'min': 0.001, 'max': 1.0},\n",
       " 'costmod': {'type': 'FLOAT', 'min': 0.0, 'max': 50.0}}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ctg.get_hyper_space()['greedy']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed4f7864",
   "metadata": {},
   "source": [
    "## Parallelization\n",
    "\n",
    "`cotengra` can fairly naturally parallelize over the trials of the hyper \n",
    "optimization, and does so by default `HyperOptimizer(parallel='auto')`.\n",
    "The basic `parallel` options are:\n",
    "\n",
    "- `parallel=False` - no parallelization\n",
    "- `parallel=True` - alias for using default parallelization and number of workers\n",
    "- `parallel=X:int` - alias for using default parallelization but specify number of workers\n",
    "- `parallel=pool` - supply your own `concurrent.futures.ProcessPoolExecutor` compatible pool\n",
    "\n",
    "For non-distributed pools the default number of workers is computed, in order of preference, as:\n",
    "\n",
    "1. `COTENGRA_NUM_WORKERS` environment variable\n",
    "2. `OMP_NUM_THREADS` environment variable\n",
    "3. `os.cpu_count()`\n",
    "\n",
    "### `parallel='concurrent.futures'` \n",
    "\n",
    "The default parallelization is the python [standard library](https://docs.python.org/3/library/concurrent.futures.html).\n",
    "A cached `ProcessPoolExecutor` is used.\n",
    "    \n",
    "### `parallel='loky'` \n",
    "\n",
    "Use the [`loky`](https://loky.readthedocs.io/) (also fallback to \n",
    "`joblib.externals.loky`) reusable pool. \n",
    "\n",
    "### `parallel='dask'` \n",
    "\n",
    "Use the [`dask`](https://dask.org/) distributed scheduler. If a `dask.distributed.Client`\n",
    "already exists then it will be used, otherwise a new one will be created. You can\n",
    "also supply a `dask.distributed.Client` object directly.\n",
    "\n",
    "### `parallel='ray'` \n",
    "\n",
    "Use the [`ray`](https://ray.readthedocs.io/) distributed scheduler, with \n",
    "potentially better performance than `dask`. A `ray` runtime will be created if\n",
    "not already running.\n",
    "\n",
    "\n",
    "### Others such as `mpi4py`\n",
    "\n",
    "Other libraries that conform to the `concurrent.futures` interface can be used,\n",
    "for example the [`mpi4py`](https://mpi4py.readthedocs.io/en/stable/) library.\n",
    "\n",
    "```{warning}\n",
    "There are two caveats to parallelization to consider:\n",
    "\n",
    "1. Since trials have to be pre dispatched, a delay is introduced between \n",
    "   sampling and reporting trials to the underlying optimizer\n",
    "2. If the underlying optimizer cannot suggest trials quickly enough, or the pre\n",
    "   dispatch is not large enough, workers can become idle.\n",
    "\n",
    "Consider using `optlib='nevergrad'` or `optlib='random'` if you are optimizing\n",
    "with many tens or more workers to avoid being bottlenecked by the default Bayesian \n",
    "optimization for example.\n",
    "```\n",
    "\n",
    "### `parallel='threads'` \n",
    "\n",
    "You can specify a `ThreadPoolExecutor` for parallelization, but this is only\n",
    "useful tasks where the underlying driver releases the GIL, e.g. \n",
    "[`RandomGreedyOptimizer`](cotengra.pathfinders.path_basic.RandomGreedyOptimizer),\n",
    "with `accel=True`, where threads are used by default."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78ac17b8",
   "metadata": {},
   "source": [
    "## Termination\n",
    "\n",
    "The [`HyperOptimizer`](cotengra.core.HyperOptimizer) can be terminated in \n",
    "several ways.\n",
    "\n",
    "- `max_repeats=X:int` - terminate after this many trees have been generated. \n",
    "  The default of 128 is reasonable for simple to moderatetely complex \n",
    "  contractions using a Bayesian optimizer.\n",
    "- `max_time=T` - terminate after `T` seconds have elapsed\n",
    "- `max_time=f'rate:{ops_per_second}'` - terminate once the time taken searching \n",
    "  is greater than if the best contraction found so far was performed at the \n",
    "  rate `ops_per_second`. This can be used to avoid spending a long time on \n",
    "  cheap contractions. If the contraction will be performed many times, divide \n",
    "  the  rate by the commensurate factor. For example, if we have a CPU with very\n",
    "  approximately a billion OPS/s but we expect to perform the contraction a \n",
    "  hundred times we could use `max_time='rate:1e7'`.\n",
    "- `max_time='equil:{X:int}'` - terminate if no better contraction has been \n",
    "  found in the last `X` trials. This can be used to avoid spending a long time\n",
    "  on a contraction that is already converged.\n",
    "\n",
    "\n",
    "Note that the [`HyperOptimizer`](cotengra.core.HyperOptimizer) is stateful; \n",
    "each time it is supplied to the same contraction it will reset the above \n",
    "conditions and continue.\n",
    "\n",
    "```{hint}\n",
    "The [`ReusableHyperOptimizer`](cotengra.core.ReusableHyperOptimizer) on the \n",
    "other hand only runs once.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e47a6e6f",
   "metadata": {},
   "source": [
    "## Large graphs - `RandomGreedyOptimizer`\n",
    "\n",
    "The `HyperOptimizer` can handle medium sizes of graph (up to 1000s of tensors), but the hyper-optimization and explicit `ContractionTree` construction of every trial can become slow for very large contractions. An alternative more suited to very lightweight path finding that can handle 10,000s of tensors is the [`RandomGreedyOptimizer`](cotengra.pathfinders.path_basic.RandomGreedyOptimizer). This only samples random greedy paths whilst simultaneously calculating each flops score, thus avoiding the heavier machinery of the hyper optimizer. Moreover, the core function [`optimize_random_greedy_track_flops`](cotengra.pathfinders.path_basic.optimize_random_greedy_track_flops) has an accelerated rust counterpart in [cotengrust](https://github.com/jcmgray/cotengrust), that is both faster and allows threaded parallelization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7d2fe2ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "inputs, output, _, size_dict = ctg.utils.lattice_equation([100, 100])\n",
    "\n",
    "opt = ctg.RandomGreedyOptimizer(\n",
    "    max_repeats=32,\n",
    "    temperature=(0.01, 0.1),  # sample a range\n",
    "    seed=42,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39ecfb61",
   "metadata": {},
   "source": [
    "It has the same call methods as all `cotengra` optimizers:\n",
    "\n",
    "- `path = opt(inputs, output, size_dict)`: return the best contraction path\n",
    "- `ssa_path = opt.ssa_path(inputs, output, size_dict)`: return the best contraction path in SSA format\n",
    "- `tree = opt.search(inputs, output, size_dict)`: return the best `ContractionTree`\n",
    "\n",
    "The marginally cheapest of these is the SSA format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "06d097cd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 3.21 s, sys: 58 ms, total: 3.26 s\n",
      "Wall time: 613 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "ssa_path = opt.ssa_path(inputs, output, size_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0040ef5",
   "metadata": {},
   "source": [
    "Once the optimizer has been run, the corresponding flops (log10) can be retrieved with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "395c2b80",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "65.02261352539062"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "opt.best_flops"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}