Computational Psychiatry: A Systems Biology Approach to the Epigenetics of Mental Disorders 1st ed.

12. Mathematical Appendix

Rodrick Wallace

New York State Psychiatric Institute, New York, NY, USA

12.1 Groupoids

Following Weinstein (1996) closely, a groupoid, G, is defined by a base set A upon which some mapping (a morphism) can be defined. Note that not all possible pairs of states (a_j, a_k) in the base set A can be connected by such a morphism. Those pairs that can be connected define the groupoid elements: a morphism g = (a_j, a_k) having the natural inverse g^{-1} = (a_k, a_j). Given such a pairing, it is possible to define “natural” end-point maps α(g) = a_j, β(g) = a_k from the set of morphisms G into A, and a formally associative product in the groupoid, g_1 g_2, provided α(g_1 g_2) = α(g_1), β(g_1 g_2) = β(g_2), and β(g_1) = α(g_2). Then the product is defined, and associative, (g_1 g_2) g_3 = g_1 (g_2 g_3).

In addition, there are natural left and right identity elements λ_g, ρ_g such that λ_g g = g = g ρ_g (Weinstein 1996).

An orbit of the groupoid G over A is an equivalence class for the relation a_j ∼_G a_k if and only if there is a groupoid element g with α(g) = a_j and β(g) = a_k. Following Cannas DaSilva and Weinstein (1999), we note that a groupoid is called transitive if it has just one orbit. The transitive groupoids are the building blocks of groupoids in that there is a natural decomposition of the base space of a general groupoid into orbits. Over each orbit there is a transitive groupoid, and the disjoint union of these transitive groupoids is the original groupoid. Conversely, the disjoint union of groupoids is itself a groupoid.

The isotropy group of a ∈ A consists of those g in G with α(g) = a = β(g). These groups prove fundamental to classifying groupoids.

If G is any groupoid over A, the map (α, β): G → A × A is a morphism from G to the pair groupoid of A. The image of (α, β) is the orbit equivalence relation ∼_G, and the functional kernel is the union of the isotropy groups. If f: X → Y is a function, then the kernel of f, ker(f) = {(x_1, x_2) ∈ X × X : f(x_1) = f(x_2)}, defines an equivalence relation.

Groupoids may have additional structure. As Weinstein (1996) explains, a groupoid G is a topological groupoid over a base space X if G and X are topological spaces and α, β and multiplication are continuous maps. A criticism sometimes applied to groupoid theory is that their classification up to isomorphism is nothing other than the classification of equivalence relations via the orbit equivalence relation and groups via the isotropy groups. The imposition of a compatible topological structure produces a nontrivial interaction between the two structures. In Sect. 2.7 we introduced a metric structure on manifolds of related information sources, producing such interaction.

In essence, a groupoid is a category in which all morphisms have an inverse, here defined in terms of connection to a base point by a meaningful path of an information source dual to a cognitive process.

As Weinstein (1996) points out, the morphism (α, β) suggests another way of looking at groupoids. A groupoid over A identifies not only which elements of A are equivalent to one another (isomorphic), but it also parameterizes the different ways (isomorphisms) in which two elements can be equivalent, i.e., in our context, all possible information sources dual to some cognitive process. Given the information-theoretic characterization of cognition presented above, this produces a full modular cognitive network in a highly natural manner.

Brown (1987) describes the fundamental structure as follows:

A groupoid should be thought of as a group with many objects, or with many identities…A groupoid with one object is essentially just a group. So the notion of groupoid is an extension of that of groups. It gives an additional convenience, flexibility and range of applications…

EXAMPLE 1. A disjoint union [of groups] G = ∪_λ G_λ, λ ∈ Λ, is a groupoid: the product ab is defined if and only if a, b belong to the same G_λ, and ab is then just the product in the group G_λ. There is an identity 1_λ for each λ ∈ Λ. The maps α, β coincide and map G_λ to λ, λ ∈ Λ.

EXAMPLE 2. An equivalence relation R on [a set] X becomes a groupoid with α, β: R → X the two projections, and product (x, y)(y, z) = (x, z) whenever (x, y), (y, z) ∈ R. There is an identity, namely (x, x), for each x ∈ X.
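Brown's Example 2, together with the end-point maps, partial product, inverse, and orbits defined earlier in this section, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the three-element base set are assumptions introduced here, not part of the original text.

```python
from itertools import product

def pair_groupoid(classes):
    """Morphisms (x, y) of the equivalence-relation groupoid: x, y in the same class."""
    return {(x, y) for cls in classes for x, y in product(cls, repeat=2)}

def alpha(g):
    return g[0]  # source end-point map

def beta(g):
    return g[1]  # target end-point map

def compose(g1, g2):
    """Product g1 g2, defined only when beta(g1) == alpha(g2)."""
    if beta(g1) != alpha(g2):
        return None  # the product is undefined for this pair
    return (alpha(g1), beta(g2))

def inverse(g):
    return (beta(g), alpha(g))

def orbits(base, morphisms):
    """Partition the base set into orbits: a ~ b iff some morphism joins them."""
    seen, result = set(), []
    for a in sorted(base):
        if a not in seen:
            orbit = frozenset(beta(g) for g in morphisms if alpha(g) == a)
            seen |= orbit
            result.append(orbit)
    return result

classes = [{"a1", "a2"}, {"a3"}]   # hypothetical partition of the base set A
A = set().union(*classes)
G = pair_groupoid(classes)
```

Here `compose(("a1", "a2"), ("a2", "a1"))` returns the identity `("a1", "a1")`, while `compose(("a1", "a2"), ("a3", "a3"))` is undefined because the end points do not match. The groupoid has two orbits, so it is not transitive.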

Weinstein (1996) makes the following fundamental point:

Almost every interesting equivalence relation on a space B arises in a natural way as the orbit equivalence relation of some groupoid G over B. Instead of dealing directly with the orbit space B/G as an object in the category S_map of sets and mappings, one should consider instead the groupoid G itself as an object in the category G_htp of groupoids and homotopy classes of morphisms.

The groupoid approach has become quite popular in the study of networks of coupled dynamical systems which can be defined by differential equation models (e.g., Golubitsky and Stewart 2006).

12.2 The Tuning Theorem

Messages from an information source, seen as symbols x_j from some alphabet, each having probabilities P_j associated with a random variable X, are “encoded” into the language of a “transmission channel,” a random variable Y with symbols y_k, having probabilities P_k, possibly with error. Someone receiving the symbol y_k then retranslates it (without error) into some x_k, which may or may not be the same as the x_j that was sent.

More formally, the message sent along the channel is characterized by a random variable X having the distribution

 $$\displaystyle{P(X = x_{j}) = P_{j},j = 1,\ldots,M.}$$

The channel through which the message is sent is characterized by a second random variable Y having the distribution

 $$\displaystyle{P(Y = y_{k}) = P_{k},k = 1,\ldots,L.}$$

Let the joint probability distribution of X and Y be defined as

 $$\displaystyle{P(X = x_{j},Y = y_{k}) = P(x_{j},y_{k}) = P_{j,k}}$$

and the conditional probability of Y given X as

 $$\displaystyle{P(Y = y_{k}\vert X = x_{j}) = P(y_{k}\vert x_{j}).}$$

Then the Shannon uncertainty of X and Y independently and the joint uncertainty of X and Y together are defined, respectively, as

 $$\displaystyle\begin{array}{rcl} H(X) = -\sum _{j=1}^{M}P_{ j}\log (P_{j})& & \\ H(Y ) = -\sum _{k=1}^{L}P_{ k}\log (P_{k})& & \\ H(X,Y ) = -\sum _{j=1}^{M}\sum _{ k=1}^{L}P_{ j,k}\log (P_{j,k})& &{}\end{array}$$

(12.1)

The conditional uncertainty of Y given X is defined as

 $$\displaystyle{ H(Y \vert X) = -\sum _{j=1}^{M}\sum _{ k=1}^{L}P_{ j,k}\log [P(y_{k}\vert x_{j})] }$$

(12.2)

For any two stochastic variates X and Y, H(Y ) ≥ H(Y | X), as knowledge of X generally gives some knowledge of Y. Equality occurs only in the case of stochastic independence.

Since P(x_j, y_k) = P(x_j)P(y_k | x_j), we have H(Y | X) = H(X, Y) − H(X) and, by the symmetric factorization P(x_j, y_k) = P(y_k)P(x_j | y_k), H(X | Y) = H(X, Y) − H(Y).

The information transmitted by translating the variable X into the channel transmission variable Y —possibly with error—and then retranslating without error the transmitted Y back into X is defined as

 $$\displaystyle{ I(X\vert Y ) \equiv H(X) - H(X\vert Y ) = H(X) + H(Y ) - H(X,Y ) }$$

(12.3)

See Cover and Thomas (2006) for details. The essential point is that if there is no uncertainty in X given the channel Y, then there is no loss of information through transmission. In general this will not be true, and herein lies the essence of the theory.
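As a minimal sketch (not from the original text), the uncertainties of Eqs. (12.1) and (12.3) can be computed directly from a joint probability table. The function names and the two example tables are illustrative assumptions, and natural logarithms are used.

```python
import math

def entropy(probs):
    """Shannon uncertainty -sum p log p, skipping zero-probability terms."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X|Y) = H(X) + H(Y) - H(X,Y) for a table joint[j][k] = P(x_j, y_k)."""
    px = [sum(row) for row in joint]            # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    hxy = entropy(p for row in joint for p in row)
    return entropy(px) + entropy(py) - hxy

# Hypothetical joint distributions: a noisy binary channel and an independent pair.
joint_noisy = [[0.4, 0.1],
               [0.1, 0.4]]
joint_indep = [[0.25, 0.25],
               [0.25, 0.25]]
```

For the independent table the transmitted information vanishes, consistent with the statement that H(Y) = H(Y | X) only under stochastic independence. For the noisy table it is strictly positive but below H(X) = log 2.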

Given a fixed vocabulary for the transmitted variable X, and a fixed vocabulary and probability distribution for the channel Y, we may vary the probability distribution of X in such a way as to maximize the information sent. The capacity of the channel is defined as

 $$\displaystyle{ C \equiv \max _{P(X)}I(X\vert Y ) }$$

(12.4)

subject to the subsidiary condition that ∑ P(X) = 1.

The critical trick of the Shannon Coding Theorem for sending a message with arbitrarily small error along the channel Y at any rate R < C is to encode it in longer and longer “typical” sequences of the variable X; that is, those sequences whose distribution of symbols approximates the probability distribution P(X) above which maximizes C.

If S(n) is the number of such “typical” sequences of length n, then

 $$\displaystyle{\log [S(n)] \approx nH(X)}$$

where H(X) is the uncertainty of the stochastic variable defined above. Some consideration shows that S(n) is much less than the total number of possible messages of length n. Thus, as n → ∞, only a vanishingly small fraction of all possible messages is meaningful in this sense. This observation, after some considerable development, is what allows the Coding Theorem to work so well. In sum, the prescription is to encode messages in typical sequences, which are sent at very nearly the capacity of the channel. As the encoded messages become longer and longer, their maximum possible rate of transmission without error approaches channel capacity as a limit. Again, the standard references provide details.

This approach can be, in a sense, inverted to give a “tuning theorem” variant of the coding theorem.

Telephone lines, optical wave guides, and the tenuous plasma through which a planetary probe transmits data to earth may all be viewed in traditional information-theoretic terms as a noisy channel around which we must structure a message so as to attain an optimal error-free transmission rate.

Telephone lines, wave guides, and interplanetary plasmas are, relatively speaking, fixed on the timescale of most messages, as are most sociogeographic networks. Indeed, the capacity of a channel is defined by varying the probability distribution of the “message” process X so as to maximize I(X | Y ).

Suppose there is some message X so critical that its probability distribution must remain fixed. The trick is to fix the distribution P(X) but modify the channel, i.e., tune it, so as to maximize I(X | Y). The dual channel capacity C* can be defined as

 $$\displaystyle{ C^{{\ast}}\equiv \max _{ P(Y ),P(Y \vert X)}I(X\vert Y ) }$$

(12.5)

But

 $$\displaystyle{C^{{\ast}} =\max _{ P(Y ),P(Y \vert X)}I(Y \vert X)}$$

since

 $$\displaystyle{I(X\vert Y ) = H(X) + H(Y ) - H(X,Y ) = I(Y \vert X).}$$

Thus, in a purely formal mathematical sense, the message transmits the channel, and there will indeed be, according to the Coding Theorem, a channel distribution P(Y) which maximizes C*.

One may do better than this, however, by modifying the channel matrix P(Y | X). Since

 $$\displaystyle{P(y_{j}) =\sum _{ i=1}^{M}P(x_{ i})P(y_{j}\vert x_{i}),}$$

P(Y ) is entirely defined by the channel matrix P(Y | X) for fixed P(X) and

 $$\displaystyle{C^{{\ast}} =\max _{ P(Y ),P(Y \vert X)}I(Y \vert X) =\max _{P(Y \vert X)}I(Y \vert X).}$$

Calculating C* requires maximizing the complicated expression

 $$\displaystyle{I(X\vert Y ) = H(X) + H(Y ) - H(X,Y )}$$

which contains products of terms and their logs, subject to constraints that the sums of probabilities are 1 and each probability is itself between 0 and 1. Maximization is done by varying the channel matrix terms P(y_j | x_i) within the constraints. This is a difficult problem in nonlinear optimization. However, for the special case M = L, C* may be found by inspection.

If M = L, then choose

 $$\displaystyle{P(y_{j}\vert x_{i}) =\delta _{j,i}}$$

where δ_{j,i} is 1 if i = j and 0 otherwise. For this special case

 $$\displaystyle{C^{{\ast}}\equiv H(X)}$$

with P(y_k) = P(x_k) for all k. Information is thus transmitted without error when the channel becomes “typical” with respect to the fixed message distribution P(X).

If M < L, matters reduce to this case, but for L < M information must be lost, leading to Rate Distortion limitations.
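The M = L case can be checked numerically. The sketch below, which is illustrative rather than part of the original argument, fixes a hypothetical message distribution P(X) and compares the identity channel matrix with an arbitrary noisy one. All names and numbers are assumptions.

```python
import math

def entropy(probs):
    """Shannon uncertainty -sum p log p, skipping zero-probability terms."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def information(px, channel):
    """I(X|Y) for fixed P(X) and channel matrix channel[i][j] = P(y_j | x_i)."""
    joint = [[px[i] * channel[i][j] for j in range(len(channel[i]))]
             for i in range(len(px))]
    py = [sum(col) for col in zip(*joint)]      # P(y_j) = sum_i P(x_i) P(y_j | x_i)
    hxy = entropy(p for row in joint for p in row)
    return entropy(px) + entropy(py) - hxy

px = [0.5, 0.3, 0.2]   # hypothetical fixed message distribution
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
noisy = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
```

With the identity matrix, `information(px, identity)` equals `entropy(px)`, the value C* = H(X) found by inspection. The noisy matrix transmits strictly less.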

Thus modifying the channel may be a far more efficient means of ensuring transmission of an important message than encoding that message in a “natural” language which maximizes the rate of transmission of information on a fixed channel.

We have examined the two limits in which either the distributions of P(Y ) or of P(X) are kept fixed. The first provides the usual Shannon Coding Theorem, and the second a tuning theorem variant, i.e., a tunable, retina-like, Rate Distortion Manifold, in the sense of Glazebrook and Wallace (2009).

As described above, this result is essentially similar to Shannon’s (1959) observation that evaluating the Rate Distortion Function corresponds to finding a channel that is just right for the source and allowed distortion level.

12.3 Metabolic Constraints

Let Q(κ M) ≥ 0, Q(0) = 0 represent a monotonic increasing function of the intensity measure of available metabolic free energy M, and C be the maximum channel capacity available to the cognitive biological processes of interest. One would expect

 $$\displaystyle{ \hat{H} = \frac{\int _{0}^{C}H\exp [-H/Q]dH} {\int _{0}^{C}\exp [-H/Q]dH} = \frac{Q[\exp (C/Q) - 1] - C} {\exp (C/Q) - 1} }$$

(12.6)

κ is an inverse energy intensity scaling constant that may be quite small indeed, a consequence of the typically massive entropic translation losses between the metabolic free energy consumed by the physical processes that instantiate information and any actual measure of that information.

Near M = 0, expand Q as a Taylor series, with a first term Q ≈ κ M.

This expression tops out quite rapidly with increases in either C or Q, producing energy—and channel capacity—limited results

 $$\displaystyle{ \hat{H} = Q(\kappa M),C/2 }$$

(12.7)
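The two limits in Eq. (12.7) can be checked numerically against Eq. (12.6). The following sketch is illustrative: the function names and test values are assumptions, and the closed form is rewritten as Q − C/(exp(C/Q) − 1), which is algebraically the same as the right-hand side of Eq. (12.6).

```python
import math

def mean_rate(C, Q):
    """Closed form of Eq. (12.6), written as Q - C / (exp(C/Q) - 1)."""
    return Q - C / math.expm1(C / Q)

def mean_rate_numeric(C, Q, steps=200_000):
    """Midpoint-rule evaluation of the ratio of integrals in Eq. (12.6)."""
    h = C / steps
    num = den = 0.0
    for i in range(steps):
        H = (i + 0.5) * h
        w = math.exp(-H / Q)      # Boltzmann-like weight exp(-H/Q)
        num += H * w
        den += w
    return num / den
```

For large C at fixed Q, `mean_rate` approaches Q, the energy-limited case. For large Q at fixed C it approaches C/2, the capacity-limited case.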

Then, expanding Q near zero, the two limiting relations imply

 $$\displaystyle\begin{array}{rcl} Q(\kappa M_{X,Y })& <& Q(\kappa M_{X}) + Q(\kappa M_{Y }) \rightarrow M_{X,Y } < M_{X} + M_{Y }\,, \\ C_{X,Y }& <& C_{X} + C_{Y } {}\end{array}$$

(12.8)

The channel capacity constraint can be parsed further for a noisy Gaussian channel. Then (Cover and Thomas 2006)

 $$\displaystyle{ C = 1/2\log [1 + \mathcal{P}/\sigma ^{2}] \approx 1/2\mathcal{P}/\sigma ^{2} }$$

(12.9)

for small  $$\mathcal{P}/\sigma ^{2}$$ , where  $$\mathcal{P}$$ is the “power constraint” such that  $$E(X^{2}) < \mathcal{P}$$ and σ^2 is the noise variance. Assume that information sources X and Y act on the same scale, so that noise variances are the same and quite large. Then, taking  $$\mathcal{P} = Q(\kappa M)$$ , so that channel power is determined by available metabolic free energy, the capacity inequality of Eq. (12.8) becomes

 $$\displaystyle{Q(\kappa M_{X,Y }) < Q(\kappa M_{X}) + Q(\kappa M_{Y }).}$$

Both limiting inequalities are, then, free energy relations leading to a kind of “reaction canalization” in which a set of lower level cognitive modules consumes less metabolic free energy if information crosstalk among them is permitted than under conditions of individual signal isolation.

12.4 Morse Theory

Morse theory examines relations between analytic behavior of a function—the location and character of its critical points—and the underlying topology of the manifold on which the function is defined. We are interested in a number of such functions, for example, a “free energy” constructed from information source uncertainties on a parameter space and “second order” iterations involving parameter manifolds determining critical behavior. These can be reformulated from a Morse theory perspective. Here we follow closely Pettini (2007).

The essential idea of Morse theory is to examine an n-dimensional manifold M as decomposed into level sets of some function f: M → R where R is the set of real numbers. The a-level set of f is defined as

 $$\displaystyle{f^{-1}(a) =\{ x \in M: f(x) = a\},}$$

the set of all points in M with f(x) = a. If M is compact, then the whole manifold can be decomposed into such slices in a canonical fashion between two limits, defined by the minimum and maximum of f on M. Let the part of M below a be defined as

 $$\displaystyle{M_{a} = f^{-1}(-\infty,a] =\{ x \in M: f(x) \leq a\}.}$$

These sets describe the whole manifold as a varies between the minimum and maximum of f.

Morse Functions are defined as a particular set of smooth functions f: M → R as follows. Suppose a function f has a critical point x_c, so that the derivative df(x_c) = 0, with critical value f(x_c). Then f is a Morse Function if its critical points are nondegenerate in the sense that the Hessian matrix of second derivatives at x_c, whose elements, in terms of local coordinates, are

 $$\displaystyle{H_{i,j} = \partial ^{2}f/\partial x^{i}\partial x^{j},}$$

has rank n, which means that it has only nonzero eigenvalues, so that there are no lines or surfaces of critical points and, ultimately, critical points are isolated.

The index of the critical point is the number of negative eigenvalues of H at x_c.

A level set f^{-1}(a) of f is called a critical level if a is a critical value of f, that is, if there is at least one critical point x_c ∈ f^{-1}(a).
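The nondegeneracy condition and the index can be illustrated on the plane. The sketch below is an assumption-laden example, not drawn from Pettini (2007): it estimates the Hessian by central differences at a point already known to be critical, and uses the closed-form eigenvalues of a symmetric 2 × 2 matrix. The function names and the four test functions are hypothetical.

```python
import math

def hessian_2d(f, x, y, h=1e-4):
    """Central-difference Hessian entries (f_xx, f_xy, f_yy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def morse_index(f, x, y, tol=1e-6):
    """Number of negative Hessian eigenvalues; None if the point is degenerate."""
    fxx, fxy, fyy = hessian_2d(f, x, y)
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    eigenvalues = (mean - radius, mean + radius)
    if any(abs(ev) < tol for ev in eigenvalues):
        return None  # a zero eigenvalue: Hessian rank < n, so not nondegenerate
    return sum(ev < 0 for ev in eigenvalues)

bowl = lambda x, y: x**2 + y**2             # minimum at the origin
saddle = lambda x, y: x**2 - y**2           # saddle at the origin
cap = lambda x, y: -(x**2) - y**2           # maximum at the origin
monkey = lambda x, y: x**3 - 3 * x * y**2   # degenerate critical point at the origin
```

At the origin the minimum, saddle, and maximum have indices 0, 1, and 2 respectively. The “monkey saddle” has a vanishing Hessian there, so `morse_index` returns `None` and the function is not a Morse Function.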

Again following Pettini (2007), the essential results of Morse theory are:

[The seven enumerated results are missing from this copy of the text; see Pettini (2007).]

Again, Pettini (2007) contains both mathematical details and further references. See, for example, Matsumoto (2002).

12.5 An RDT Proof of the DRT

The Rate Distortion Theorem of information theory asks how much a signal can be compressed and have average distortion, according to an appropriate measure, less than some predetermined limit D > 0. The result is an expression for the minimum necessary channel capacity, R, as a function of D. See Cover and Thomas (2006) for details. Different channels have different expressions. For the Gaussian channel under the squared distortion measure,

 $$\displaystyle\begin{array}{rcl} R(D)& =& \frac{1} {2}\log \Big[ \frac{\sigma ^{2}} {D}\Big]\,\,D <\sigma ^{2} \\ R(D)& =& 0\,\,D \geq \sigma ^{2}{}\end{array}$$

(12.10)

where σ^2 is the variance of channel noise having zero mean.

Our concern is how a control signal u_t is expressed in the system response x_{t+1}. We suppose it possible to deterministically retranslate an observed sequence of system outputs x_1, x_2, x_3, … into a sequence of possible control signals  $$\hat{u}_{0},\hat{u}_{1},\ldots$$ and to compare that sequence with the original control sequence u_0, u_1, …, with the difference between them having a particular value under the chosen distortion measure, and hence an observed average distortion.

The correspondence expansion is as follows.

Feynman (2000), following ideas of Bennett, identifies information as a form of free energy. Thus R(D), the minimum channel capacity necessary for average distortion D, is also a free energy measure, and we may define an entropy S as

 $$\displaystyle{ S \equiv R(D) - DdR/dD }$$

(12.11)

For a Gaussian channel under the squared distortion measure,

 $$\displaystyle{ S = 1/2\log [\sigma ^{2}/D] + 1/2 }$$

(12.12)

Other channels will have different expressions.

The simplest dynamics of such a system are given by a nonequilibrium Onsager equation in the gradient of S (de Groot and Mazur 1984) so that

 $$\displaystyle{ dD/dt = -\mu dS/dD = \frac{\mu } {2D} }$$

(12.13)

By inspection,

 $$\displaystyle{ D(t) = \sqrt{\mu t} }$$

(12.14)

which is the classic outcome of the diffusion equation. For the “natural” channel having R(D) ∝ 1∕D, D(t) ∝ the cube root of t.
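As a worked check of Eqs. (12.13) and (12.14), under the added assumption D(0) = 0 (not stated explicitly in the text), separating variables gives

 $$\displaystyle{ 2D\,dD =\mu \,dt\;\Rightarrow \; D^{2} =\mu t\;\Rightarrow \; D(t) = \sqrt{\mu t}. }$$

For the “natural” channel, write R(D) = a∕D with a hypothetical proportionality constant a > 0. Equation (12.11) then gives S = a∕D + a∕D = 2a∕D, so that

 $$\displaystyle{ dD/dt = -\mu dS/dD = \frac{2a\mu } {D^{2}} \;\Rightarrow \; D^{3} = 6a\mu t, }$$

which is the cube-root growth quoted above.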

This correspondence reduction allows an expansion to more complicated systems, in particular, to the control system of Fig. 4.1.

Let  $$\mathcal{H}$$ be the rate at which control information is fed into an inherently unstable control system, in the presence of a further source of control system noise β, in addition to the channel noise defined by σ^2. The simplest generalization of Eq. (12.13), for a Gaussian channel, is the stochastic differential equation

 $$\displaystyle{ dD_{t} =\Big [ \frac{\mu } {2D_{t}} - M(\mathcal{H})\Big]dt +\beta D_{t}dW_{t} }$$

(12.15)

where dW_t represents white noise and  $$M(\mathcal{H}) \geq 0$$ is a monotonically increasing function.

This equation has the nonequilibrium steady state expectation

 $$\displaystyle{ D_{nss} = \frac{\mu } {2M(\mathcal{H})} }$$

(12.16)

measuring the average distortion between what the control system wants and what it gets. In a sense, this is a kind of converse to the famous radar equation which states that a returned signal will be proportional to the inverse fourth power of the distance between the transmitter and the target. But there is a deeper result, leading to the DRT.

Applying the Ito chain rule to Eq. (12.15) (Protter 1990; Khashminskii 2012), it is possible to calculate the expected variance in the distortion as E(D_t^2) − (E(D_t))^2. But application of the Ito rule to D_t^2 shows that no real number solution for its expectation is possible unless the discriminant of the resulting quadratic equation is ≥ 0, so that a necessary condition for stability is

 $$\displaystyle\begin{array}{rcl} M(\mathcal{H})& \geq & \beta \sqrt{\mu } \\ \mathcal{H}& \geq & M^{-1}(\beta \sqrt{\mu }){}\end{array}$$

(12.17)

where the second expression follows from the monotonicity of M.
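The step from Eq. (12.15) to Eq. (12.17) can be sketched as follows. This is a heuristic reading, not a rigorous derivation: it assumes that, at steady state, the expectations entering the drift may be written in terms of a single value D. Applying the Ito chain rule to D_t^2 gives

 $$\displaystyle{ d(D_{t}^{2}) =\Big [2D_{t}\Big( \frac{\mu } {2D_{t}} - M(\mathcal{H})\Big) +\beta ^{2}D_{t}^{2}\Big]dt + 2\beta D_{t}^{2}dW_{t} =\big [\mu -2M(\mathcal{H})D_{t} +\beta ^{2}D_{t}^{2}\big]dt + 2\beta D_{t}^{2}dW_{t}. }$$

Taking expectations removes the dW_t term, and setting the remaining drift to zero yields the quadratic

 $$\displaystyle{ \beta ^{2}D^{2} - 2M(\mathcal{H})D +\mu = 0,\quad D = \frac{M(\mathcal{H}) \pm \sqrt{M(\mathcal{H})^{2 } -\beta ^{2 } \mu }} {\beta ^{2}}, }$$

whose roots are real only if  $$M(\mathcal{H})^{2} \geq \beta ^{2}\mu$$ , which is the first line of Eq. (12.17).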

As a consequence of the correspondence reduction leading to Eq. (12.15), we have generalized the DRT of Eq. (4.2). Different “control channels,” with different forms of R(D), will give different detailed expressions for the rate of generation of “topological information” by an inherently unstable system.

12.6 An Information Black–Scholes Model

We look at  $$\mathcal{H}(\rho )$$ as the control information rate “cost” of stability at the integrated environmental insult ρ. To determine the mathematical form of  $$\mathcal{H}(\rho )$$ under conditions of volatility, i.e., variability proportional to a signal, we must first model the variability of ρ, most simply taken as

 $$\displaystyle{ d\rho _{t} = g(t,\rho _{t})dt + b\rho _{t}dW_{t} }$$

(12.18)

Here, dW_t is white noise and, counterintuitively, the function g(t, ρ) will fall out of the calculation on the assumption of certain regularities.

 $$\mathcal{H}(\rho _{t},t)$$ is the minimum needed incoming rate of control information under the Data Rate Theorem. Expand  $$\mathcal{H}$$ in ρ using the Ito chain rule (Protter 1990):

 $$\displaystyle\begin{array}{rcl} d\mathcal{H}_{t}& =& \left [\partial \mathcal{H}/\partial t + g(t,\rho _{t})\partial \mathcal{H}/\partial \rho + \frac{1} {2}b^{2}\rho _{ t}^{2}\partial ^{2}\mathcal{H}/\partial \rho ^{2}\right ]dt \\ & & +[b\rho _{t}\partial \mathcal{H}/\partial \rho ]dW_{t} {}\end{array}$$

(12.19)

It is now possible to define a Legendre transform, L, of the rate  $$\mathcal{H}$$ , by convention having the form

 $$\displaystyle{ L = -\mathcal{H} +\rho \partial \mathcal{H}/\partial \rho }$$

(12.20)

 $$\mathcal{H}$$ is an information index, a free energy measure in the sense of Feynman (2000), so that L is a classic entropy measure.

We make an approximation, replacing dX with ΔX and applying Eq. (12.19), so that

 $$\displaystyle{ \varDelta L = \left (-\partial \mathcal{H}/\partial t -\frac{1} {2}b^{2}\rho ^{2}\partial ^{2}\mathcal{H}/\partial \rho ^{2}\right )\varDelta t }$$

(12.21)

According to the classical Black–Scholes model (Black and Scholes 1973), the terms in g and dW_t “cancel out,” and white noise has been subsumed into the Ito correction factor, a regularity assumption making this an exactly solvable but highly approximate model.
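The cancellation can be made explicit under the usual Black–Scholes convention, assumed here, that  $$\partial \mathcal{H}/\partial \rho$$ is held fixed over the increment, so that  $$\varDelta L = -\varDelta \mathcal{H} + (\partial \mathcal{H}/\partial \rho )\varDelta \rho$$ . Substituting Eqs. (12.18) and (12.19),

 $$\displaystyle\begin{array}{rcl} \varDelta L& =& -\left [\partial \mathcal{H}/\partial t + g\,\partial \mathcal{H}/\partial \rho + \frac{1} {2}b^{2}\rho ^{2}\partial ^{2}\mathcal{H}/\partial \rho ^{2}\right ]\varDelta t - b\rho \,(\partial \mathcal{H}/\partial \rho )\varDelta W \\ & & +(\partial \mathcal{H}/\partial \rho )\left [g\,\varDelta t + b\rho \,\varDelta W\right ] \\ & =& \left (-\partial \mathcal{H}/\partial t -\frac{1} {2}b^{2}\rho ^{2}\partial ^{2}\mathcal{H}/\partial \rho ^{2}\right )\varDelta t.\end{array}$$

The terms in g and in ΔW appear once with each sign and drop out, leaving only the time derivative and the Ito correction term.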

The conventional Black–Scholes calculation takes ΔL∕Δt ∝ L. At nonequilibrium steady state, by contrast, we can assume  $$\varDelta L/\varDelta t = \partial \mathcal{H}/\partial t = 0$$ , giving

 $$\displaystyle{ -\frac{1} {2}b^{2}\rho ^{2}\partial ^{2}\mathcal{H}/\partial \rho ^{2} = 0 }$$

(12.22)

so that

 $$\displaystyle{ \mathcal{H} =\kappa _{1}\rho +\kappa _{2} }$$

(12.23)

The κ_i will be nonnegative constants.

12.7 Estimating the Quadratic Variation from Data

The so-called white noise has quadratic variation ∝ t. The “colored” noise relation can be estimated from the observed periodogram using the methods of Dzhaparidze and Spreij (1994).

For a stochastic process X_t and a finite stopping time T and each real number λ, the periodogram of X evaluated at T is defined as

 $$\displaystyle{ I_{T}(X;\lambda ) \equiv \vert \int _{0}^{T}\exp [i\lambda t]dX_{ t}\vert ^{2} }$$

(12.24)

Take ε as a real random variable that has a density ω symmetric around zero and consider, for any positive real number L, the quantity

 $$\displaystyle{ E_{\epsilon }[I_{T}(X;L\epsilon )] =\int _{ -\infty }^{+\infty }I_{ T}(X;Ls)\omega (s)ds }$$

(12.25)

Dzhaparidze and Spreij (1994) show that, for L → ∞,

 $$\displaystyle{ E_{\epsilon }[I_{T}(X;L\epsilon )] \rightarrow [X_{T},X_{T}] }$$

(12.26)

Thus, the quadratic variation can be statistically estimated from observational time series data, as is routinely done in financial engineering, from which, in fact, this analysis is taken.
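A discrete-time sketch of Eqs. (12.24)–(12.26) follows. It rests on several assumptions that are not in the original: the stochastic integral is replaced by a sum over observed increments ΔX_k at times t_k, the density ω is taken to be standard normal, and the test data are simulated Brownian increments. Under those assumptions the average over ε can be done in closed form, since E[exp(iLε(t_j − t_k))] = exp(−L²(t_j − t_k)²∕2).

```python
import math
import random

def averaged_periodogram(times, increments, L):
    """E_eps[I_T(X; L*eps)] for eps ~ N(0, 1), evaluated on discrete increments.

    Expanding |sum_k exp(i*lam*t_k) dX_k|^2 and averaging over lam = L*eps
    gives a double sum weighted by the normal characteristic function.
    """
    total = 0.0
    for tj, dxj in zip(times, increments):
        for tk, dxk in zip(times, increments):
            total += dxj * dxk * math.exp(-0.5 * (L * (tj - tk)) ** 2)
    return total

def realized_quadratic_variation(increments):
    """Sum of squared increments, the discrete analogue of the quadratic variation."""
    return sum(dx * dx for dx in increments)

# Simulated Brownian increments on [0, 1] (illustrative data only).
rng = random.Random(0)
n = 200
dt = 1.0 / n
times = [(k + 1) * dt for k in range(n)]
increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
```

As L grows, the off-diagonal weights vanish and `averaged_periodogram` converges to `realized_quadratic_variation`, the discrete counterpart of Eq. (12.26). At L = 0 it reduces instead to the squared total displacement.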

12.8 A Metabolic Black–Scholes Model

Suppose metabolic free energy to be available at a rate M, and let R(D) be a general RDF for a process ultimately fueled by M. How are M and R related under conditions of volatility? Let

 $$\displaystyle{ dR_{t} = f(t,R_{t})dt + bR_{t}dW_{t} }$$

(12.27)

Let M(R_t, t) be the incoming rate of metabolic free energy, and expand using the Ito chain rule

 $$\displaystyle\begin{array}{rcl} dM_{t}& =& \left [\partial M/\partial t + f(t,R_{t})\partial M/\partial R + \frac{1} {2}b^{2}R_{ t}^{2}\partial ^{2}M/\partial R^{2}\right ]dt \\ & & +[bR_{t}\partial M/\partial R]dW_{t} {}\end{array}$$

(12.28)

We define a quantity L as a Legendre transform of the rate M, by convention having the form

 $$\displaystyle{ L = -M + R\partial M/\partial R }$$

(12.29)

Again, heuristically, replacing dX with ΔX in these expressions and applying Eq. (12.28) gives

 $$\displaystyle{ \varDelta L =\Big (-\partial M/\partial t -\frac{1} {2}b^{2}R^{2}\partial ^{2}M/\partial R^{2}\Big)\varDelta t }$$

(12.30)

Again, the terms in f and dW_t cancel out, and the effects of noise are subsumed into the Ito correction factor, powerful regularity assumptions that make this an exactly solvable approximate model.

The conventional Black–Scholes calculation takes ΔL∕Δt ∝ L. Here, at nonequilibrium steady state, we assume ΔL∕Δt = C ≥ 0, ∂M∕∂t = 0, so that

 $$\displaystyle{ -\frac{1} {2}b^{2}R^{2}\partial ^{2}M/\partial R^{2} = C }$$

(12.31)

with solution

 $$\displaystyle{ M = \frac{2C} {b^{2}} \log [R] +\kappa _{1}R +\kappa _{2} }$$

(12.32)
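As a check, Eq. (12.32) follows from Eq. (12.31) by two integrations in R, with κ_1 and κ_2 the constants of integration:

 $$\displaystyle{ \partial ^{2}M/\partial R^{2} = -\frac{2C} {b^{2}R^{2}}\;\Rightarrow \;\partial M/\partial R = \frac{2C} {b^{2}R} +\kappa _{1}\;\Rightarrow \; M = \frac{2C} {b^{2}} \log [R] +\kappa _{1}R +\kappa _{2}. }$$

Setting C = 0 recovers a linear form analogous to Eq. (12.23).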

References

Black, F., and M. Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81: 637–654.

Brown, R. 1987. From groups to groupoids: A brief survey. Bulletin of the London Mathematical Society 19: 113–134.

Cannas DaSilva, A., and A. Weinstein. 1999. Geometric Models for Noncommutative Algebras. Providence, RI: American Mathematical Society.

Cover, T., and J. Thomas. 2006. Elements of Information Theory, 2nd ed. New York: Wiley.

de Groot, S., and P. Mazur. 1984. Nonequilibrium Thermodynamics. New York: Dover.

Dzhaparidze, K., and P. Spreij. 1994. Spectral characterization of the optional quadratic variation. Stochastic Processes and Their Applications 54: 165–174.

Feynman, R. 2000. Lectures on Computation. Boulder: Westview Press.

Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive information. Informatica 33: 309–346.

Golubitsky, M., and I. Stewart. 2006. Nonlinear dynamics of networks: The groupoid formalism. Bulletin of the American Mathematical Society 43: 305–364.

Khashminskii, R. 2012. Stochastic Stability of Differential Equations, 2nd ed. New York: Springer.

Matsumoto, Y. 2002. An Introduction to Morse Theory. Providence, RI: American Mathematical Society.

Pettini, M. 2007. Geometry and Topology in Hamiltonian Dynamics and Statistical Mechanics. New York: Springer.

Protter, P. 1990. Stochastic Integration and Differential Equations. New York: Springer.

Shannon, C. 1959. Coding theorems for a discrete source with a fidelity criterion. Institute of Radio Engineers International Convention Record, vol. 7, 142–163.

Weinstein, A. 1996. Groupoids: Unifying internal and external symmetry. Notices of the American Mathematical Society 43: 744–752.


