Formula Parser Equip Vars Proposal



Scope of the project

This discusses the design of a replacement formula/value calculation system within PCGen and its limited use for Equipment Variables.

The formula system is embedded within the "core" of PCGen to do mathematical calculations. It is used internally as well as exposed in a limited fashion to the data developers through tokens. Specifically, both data defined variables and BONUS values depend on the formula calculations performed by this subsystem.

This project discusses the architecture and design of that system with the long-term intent of using it as the formula system for all of PCGen. The immediate project is a proposal to implement Equipment Variables using the system in order to enable testing on a lower-risk basis and learn about overall integration challenges with PCGen. This document therefore balances the full set of requirements while only worrying about implementation of those necessary to do equipment variables.

Note that the changes proposed only cover the formula system. While many mentions are made of the BONUS system, it will remain unchanged at this time. It is discussed in some detail because it heavily relies on the formula system so we can use the requirements of the BONUS system to guide the design of the formula system.

A Note on examples and document scope

Examples included here are based on a hypothetical syntax meant to demonstrate the concepts. This is a code / architecture project scope and is not intended as a data proposal to finalize LST syntax. No guarantee is made that the provided syntax is compatible with current LST files (and thus whether it is even usable in a formal LST syntax proposal).



Term

For purposes of this document, a term is a "built in" value that can be used in a formula. These are a subset of the items currently documented in the "Pre-defined Variables" or "System-Defined Variables" section of the docs (depending on how you get to that section).

Examples include (but are certainly not limited to) BAB, BASECR, CASTERLEVEL, CL, SR, and TL.

Note that some terms can possess context. For example, CASTERLEVEL is valid only in SPELL objects, as it requires the "context" of the castable spell (which implicitly includes the class level, DC, # of times usable, etc... it is more than the Spell object defined in the LST file)

Some built-in variables look to the user as if they are functions, and are often treated by users as functions even though they are terms. These include square brackets in the pre-defined variable name (e.g. COUNT[SKILLS]). Due to the presence of the brackets they are NOT terms for purposes of this document. See Bracket Functions below.


Variable

For purposes of this document, a variable is a data-defined value. Using today's data, this means it was defined by the DEFINE: token being encountered in the data. To keep formulas unambiguous, there is a limitation on variable names: they must not conflict with a pre-defined term.

Global Variable

Global variables are variables that exist across the entire set of data. A global variable can be defined in one object and used in another. This is currently the case for all variables in PCGen, as they are all created with the DEFINE: token.

Local Variable

Local Variables are variables that exist in only a portion of the data. They possess context. They can only be used within that context. For example, a local variable defined on a piece of Equipment could only be used within that piece of equipment. Since Equipment can "own" Equipment Modifiers, EqMods could also modify or evaluate the local variable. However, an attempt to interpret the value of the local variable in a Spell (for example) should produce an error, since there is no known context of Equipment within a spell.

Given that they have (per-item) context to a specific instance (such as "Longsword +1" and "Shortbow +2"), local variables have independent values on each instance (so MyPlusValue on the Longsword could be 1, and MyPlusValue on the Shortbow could be 2).

Like global variables, local variables are on a per-character basis (so local variables are per-character, per-item). A Longsword on one character has a value N and a Longsword on a second character has a value M. N may or may not be equal to M (there is no linkage - it is based on whether the two Longswords have identical EqMods).
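
As a rough illustration of "per-character, per-item", the sketch below keeps one store per character and keys each value by the owning item instance. LocalVariableStore and its methods are hypothetical names invented for this example (they are not existing PCGen classes), and the zero default simply mirrors the "numbers start at zero" convention discussed elsewhere in this document.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch: one store exists per character. Values are keyed by
// the owning item *instance* first, then by variable name.
final class LocalVariableStore {
    private final Map<Object, Map<String, Integer>> valuesByOwner =
            new IdentityHashMap<>();

    // Sets the value of a local variable on one specific item instance.
    void set(Object owner, String name, int value) {
        valuesByOwner.computeIfAbsent(owner, k -> new HashMap<>()).put(name, value);
    }

    // Returns the value for this owner, or 0 (the assumed numeric default)
    // if nothing has been set on that instance.
    int get(Object owner, String name) {
        Map<String, Integer> vars = valuesByOwner.get(owner);
        if (vars == null) {
            return 0;
        }
        return vars.getOrDefault(name, 0);
    }
}
```

With two stores (one per character), setting MyPlusValue on one character's Longsword has no effect on the other character's Longsword, and the Longsword and Shortbow within one character hold independent values.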

A Note on Possible usage

While this proposal is specific to Equipment, for completeness of thought and to assist in understanding possible future uses, the most obvious places where local variables make sense are:

  • Equipment: since EqMods can be attached
  • Spells: Since they could be enhanced by things like MetaMagic Feats by the time they become castable spells (eventually forming what the code team calls a CharacterSpell)
  • Classes: Since there are ClassLevels that can be "attached" (Note that this has an inherent "problem" with variables that are based on class level; thus it is unlikely that local variables will make their way to classes in any tactical timeframe.)

Paren Function

A paren function is a function that uses parentheses () to contain the arguments to the function. An example of this is var("CL=Fighter").

"var" is the function, and "CL=Fighter" is the (one) argument to the function.

Bracket Function

A bracket function is a "built in" value that can be used in a formula. These are a subset of the items currently documented in the "Pre-defined Variables" or "System-Defined Variables" section of the docs (depending on how you get to that section).

Examples include (but are certainly not limited to) COUNT[SKILLS] and COUNT[STATS].

While both use the same infrastructure in PCGen currently, these can be distinguished from terms in that they contain square brackets.


Format

The Format of a variable is the Java "Class" it contains. Today, all variables are numbers, but in the future we will support Boolean, String, and many other formats.

Existing Implementation

The existing implementation is comprised of the following:

Formula Parsing

Formula parsing is performed by JEP (the Java Expression Parser), a 3rd party library.

Major characteristics of this system:

  • Formulas have functions (delimited by parentheses)
  • We emulate functions that are delimited by brackets (they are treated as terms)
  • Formulas have both built-in terms as well as user defined variables (DEFINE: token)
  • All user defined variables are global in scope
  • All user defined variables are provided a starting value on definition
  • User defined variables are assumed to be zero if no DEFINE is ever encountered
  • User defined variables may be defined in more than one location
  • Some terms may be local in scope (e.g. spells have unique terms)
  • Diagnostic tools in the UI allow presentation of the current value of a variable (or really any formula)
  • Formulas are NOT parsed for significant levels of validity at data load. They are given basic checks (to ensure parentheses match, for example), but a full validity check is not possible.

Bonus Processing

BONUS processing is performed by our BONUS management system (specifically BonusManager)

Major characteristics of this system:

  • BONUSes have specific values calculated by the formula system
  • BONUSes have certain stacking rules based on their type and other flags (.STACK)
  • BONUSes allow override of values (.REPLACE)
  • BONUSes are used to modify variables (BONUS:VAR|...)
  • BONUSes can be conditional (and the condition can be a variable or other item), making BONUS updates highly self-dependent [this is currently done in a loop to ensure BONUS values stabilize]
  • The system does not manage loops/conflicts well, in that lack of stabilization has to be terminated based on a number of tries.

Reason for development


The licensing for JEP changed and we have targeted it for replacement. When we originally integrated JEP, it was licensed under a dual commercial/GPL license. We received (and still operate under) a special exception to use JEP with our code. This has the side effect of limiting the use of our code outside of our distribution. Subsequent to our initial use of JEP, the GPL option was dropped, and JEP is now a commercial product. This means we no longer have updates available to us, and we are using a stagnant library.


We want to improve the performance of formula calculation. Today, each time a formula is processed, we re-parse the formula (which redoes all of the validation and other checks). This is CPU intensive for complex formulas, and should be something we can cache and re-use. We therefore want a system where we can parse the formula early in the process and store the parsed formula, using that "binary" version for evaluation. (See functional requirements for more on the binary format)

We want to improve performance around BONUSes. The major performance bottleneck we now have is around the loop of "Variables can be modified by BONUSes, which can be conditional upon Prerequisites, which can use variables". This loop is currently "lookback" in that things are calculated as necessary and a large resolution loop is required to ensure the system reaches stability. We want to shortcut this when possible to reduce looping and overall calculation of values that do not change.

Avoiding Ambiguity

We want to avoid some situations of ambiguity. The current system for variables takes a "largest wins" argument when multiple definitions are encountered. This leads to potential confusion (and potentially debate over whether such an implicit decision should be allowed). It has also led to the adoption of a "data standard" that all values should be defined at zero and modifications provided as a BONUS:VAR|... This redesign looks to eliminate the confusion over multiple conflicting definitions by providing a global definition characteristic and preventing otherwise identical variable definitions from having different starting values.

Reducing Complexity

We want to reduce the amount of confusion and complexity around BONUSes. Currently the conditions of stacking, replacement, and overall calculation of final values are constrained and limit flexibility of data designers. We require multiple BONUSes to calculate a single value (and even then it is not feature complete). Simple rule changes can add huge complexity to data and that shouldn't be necessary.

Reduce d20 Linkage

We want to reduce the tie to d20. Many of our systems make heavy d20-related assumptions (reference our internal terms here, for the most part) and we want to reduce those over time (put more power in the hands of the data and reduce the assumptions in the code).


Given that the intent is to eventually replace JEP, a primary design consideration is to minimize the amount of radical change required to existing, well-formed formulas. (This is true even if the token using the formula is changed from BONUS to MODIFY as part of a deprecation). This saves learning by users of the formula system.

This requirement guides and constrains much of the design to result in a syntax that is very similar to any general equation parser (which also happens to be similar to JEP) so overall the design here is not terribly unique to PCGen. In fact, any tutorial for building an equation parser covers most of the basic design of the parser itself [the parser definition being what is in the .jjt file]. The primary exception is - perhaps - how functions are defined and parsed.


Well-formed

Well-formed in the case of the larger project of replacing JEP matches the definition often used in LST token development, meaning if things are leveraging a bug to get a correct answer, or have some outlying issue that allows them to work when they shouldn't (such as unbalanced parentheses), then the system makes no effort to allow the not-well-formed formula to work. This includes using some obscure features of JEP that we do not intend to duplicate (I don't believe we actually do this, but we need to recognize that JEP has some rather advanced capability that we do not intend to duplicate). The intent is for 95%+ (likely 99%+ given our experience with LST tokens) of formulas to initially work without modification. To be clear: The requirement when this is swapped in as the primary formula parser is NOT that LST will work unmodified: We know from experience that there are errors in the data and we must be able to allow those to fail at LST load, even though they do not today.

Work without modification

Work without modification in the case of the larger project of replacing JEP does not mean "work without deprecated content". It also does not mean "be converted without data team intervention". It is highly likely that NOTHING will work without explicit movement to a "new" token that recognizes the new formula system.

It means the formula will parse and produce the correct answer in the version in which the formula swap is made. It may report as deprecated either because it uses deprecated features (bracket functions being one example) or it may report as deprecated because the new equation parser cannot parse it and it has to fall back to JEP or the older pre-JEP equation parser.

This second situation probably deserves an example. Because we do not have the ability to strictly monitor formulas when they load to determine if they are JEP-legal, it was recently (6.2) possible to have a formula of the form: FooMAXBar

This is the equivalent of max(Foo,Bar)

This formula would never be parsed by the New Equation Parser, and would immediately be reported as deprecated (it will also fail in practice in PCGen 6.3). It would also report as nonconvertible by any converter and require data team intervention. Requiring such cases to be automatically converted basically renders any equation parser impossible to implement, due to the quirks and complexity of both JEP and the pre-JEP formula system.

Given the intended swap from BONUS to MODIFY and considerations around mixing of formula systems, it is expected that little, if anything will be automatically converted. In the case we do find something that can be converted, the general rules for conversion will be:

  • If the item is correctly parsed by the New Equation Parser, it should be automatically convertible. There may be exceptions that we have not yet identified (as that project is not being fully scoped at this time)
  • If the item has to fall back to JEP or the pre-JEP formula system, it will not be automatically convertible.

Required Functions

Major characteristics of this system that are part of the existing implementation:

  • Formulas allow user defined variables
  • User defined variables may be defined in more than one location
  • User defined variables are assumed to be zero if no DEFINE is ever encountered (an alternative is to produce an error - up to the data team to decide)

Major characteristics of this system that are small modifications of the existing implementation:

  • Formulas have functions (delimited by parenthesis or brackets) [new: bracket functions are now "native" (more below on why)]
  • Support global and local variables [new: support for local variables]

Major characteristics of this system that are major modifications of the existing implementation:

  • Starting value for a variable is based on the variable format (e.g. zero for numbers), so multiple defines do not "compete" [new: Define does not have a definition per DEFINE: token, but rather once per game mode for a variable format (numbers being a variable format). This is consistent with and thus formalizes the data best practice of defining variables to zero]
  • Terms are eliminated. They are replaced by new functions (in most cases - some may be handled as user variables)
  • BONUS:VAR is eliminated and replaced with Modifiers (see below) [and in general ALL BONUS tokens will likely disappear]

Major characteristics of this system that are new capabilities:

  • Diagnostic tools in the UI should provide the ability to expose the entire calculation, from default value through all modifications (and source of those modifications)
    • It is important to recognize that the data practice of indirection (e.g. one object awards a Template called FACE_20 that sets the FACE:) is STRONGLY discouraged in the new system. That indirection makes it A LOT harder to figure out what modified a variable - something that the new system can make VERY clear, but that the data can hide and make difficult.
  • Formulas are validated at data load to ensure good syntax, and that any functions that are used exist (and are valid) and variables are nominally defined somewhere in the data.
  • Dependencies are tracked and any circular logic will be identified as such during calculation (it is expected that this is not possible to catch at LST load due to formulas on impossible data combinations leading to false positives)
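
To make the dependency-tracking requirement a little more concrete, here is a minimal sketch of detecting circular logic with a depth-first walk over "variable X uses variable Y" edges. DependencyGraph and its method names are hypothetical (nothing here is existing PCGen code), and a real implementation would need to track dependencies per character during calculation rather than in one static graph.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: records which variables a variable's formula uses,
// and reports whether following those edges ever loops back on itself.
final class DependencyGraph {
    private final Map<String, List<String>> dependsOn = new HashMap<>();

    // Records that the calculation of 'variable' reads 'usedVariable'.
    void addDependency(String variable, String usedVariable) {
        dependsOn.computeIfAbsent(variable, k -> new ArrayList<>()).add(usedVariable);
    }

    // Returns true if a cycle is reachable starting from 'start'.
    boolean hasCycleFrom(String start) {
        return visit(start, new HashSet<String>(), new HashSet<String>());
    }

    private boolean visit(String node, Set<String> inProgress, Set<String> done) {
        if (inProgress.contains(node)) {
            return true; // we came back to a variable we are still resolving
        }
        if (done.contains(node)) {
            return false; // already fully explored, no cycle through here
        }
        inProgress.add(node);
        List<String> used = dependsOn.getOrDefault(node, Collections.<String>emptyList());
        for (String next : used) {
            if (visit(next, inProgress, done)) {
                return true;
            }
        }
        inProgress.remove(node);
        done.add(node);
        return false;
    }
}
```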


Modifiers are effectively an early framework of the replacement for BONUSes, and will have the following characteristics:

  • Specific values calculated by the formula system
  • No implicit stacking rules. All stacking is explicit by the data owner
  • No implicit replacement rules. All replacement is explicit by the data owner
  • Dependency will be explicit and calculation will be carefully managed. A calculation loop should only be necessary in as much as it modifies values used in Prerequisites. (Prerequisites will not be possible in the initial implementation)

Future requirements

For reference:

  • Modifiers can be conditional (and the condition can be a variable or other item)

Note on Intent with respect to bracket functions

Today, bracket functions are effectively built in variables (of rather high complexity).

These should be retired for a few reasons:

  • The current system of two things that look like functions (one with parens, one with square brackets) is enormously confusing and often misinterpreted, especially since both use "count"
  • The argument style (doesn't use quotes) is inconsistent with parenthesis-based functions, adding yet more confusion due to inconsistent syntax
  • The square bracket items often require pattern matching for the current term system to recognize them as a term, and we want the parsing of strings (variables and function names) to be deterministic rather than a pattern match
  • We are retiring terms anyway, so these need a form of replacement

At the same time, the presence of the square brackets can allow us to temporarily treat them as "first class functions" rather than a complex term. This:

  • Helps break up the code into smaller, more isolated pieces that are specific to one argument rather than dealing with the entire function
  • Helps sunset the pattern matching behavior (at least for the parser framework itself - the function may have to do some complex process to work out the arguments)

The intent of bracket functions is to allow them to be easily implemented for backwards compatibility but not to continue to develop new functions as bracket functions. The intent would be to use paren functions for all new formulas (bracket functions "start their life" as deprecated).

Other Functional Requirements

Formula parsing to a Tree

In a previous code team meeting, we discussed the "binary" nature of the formulas that we would use. An existing "formula compiler" (from a separate project) was provided that compiled formulas into bytecode (using ASM). This was deemed "more than necessary" (and potentially confusing), so it was decided that the "binary" implementation would simply be the "tree" of objects returned by the parse. We could then visit the tree to perform the calculation. (I am - unfortunately - unable to find the code team meeting in which this discussion occurred)

Note that no (current) judgement is made over whether the formulas are parsed at LST load and permanently stored in their parsed state, or parsed at LST load, discarded, and then parsed and cached on first use in a PC. The former is clearly more memory intensive and may be unreasonable. The threshold is set here at a maximum of 5MB for formulas when the RSRD for Players is loaded. If that threshold is exceeded, then load, discard and cache-on-first-use will be the required implementation.

Re-use identical formulas

Have a formula factory that can detect situations where a formula like 1+INT is used multiple places in the LST. These should be reduced via a cache to a single Formula object (since a Formula is immutable) to save memory and hopefully reduce LST load time.
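
A formula factory of this kind could look roughly like the sketch below. ParsedFormula and FormulaFactory are hypothetical stand-ins (the real parsed form would be the tree produced by the parser, not a wrapper around the text), and the cache here is keyed on the exact formula text, which is an assumption: "1+INT" and "1 + INT" would be cached separately unless the text were normalized first.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an immutable parsed formula.
final class ParsedFormula {
    private final String source;

    ParsedFormula(String source) {
        // A real implementation would parse 'source' into a tree here.
        this.source = source;
    }

    String getSource() {
        return source;
    }
}

// Hypothetical sketch: returns one shared ParsedFormula per distinct text.
final class FormulaFactory {
    private final Map<String, ParsedFormula> cache = new ConcurrentHashMap<>();

    // Parses the text only the first time it is seen; later requests for
    // the same text get the same (immutable, therefore shareable) object.
    ParsedFormula getFormula(String text) {
        return cache.computeIfAbsent(text, ParsedFormula::new);
    }

    // Number of distinct formulas currently cached.
    int size() {
        return cache.size();
    }
}
```

Sharing is only safe because the formula is immutable, which is one reason the single-pass, context-as-parameter resolution described later matters.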

Formulas implemented as Plugins

Make formula functions into plugins, so the Formula system can be extended without modifying the core.

Retire Terms

For reasons explained below (in Discussion), there are no built-in terms. Everything will be a variable. There may need to be a system for defining game-mode-wide Variables to partially support replacement of terms. There will also need to be additional functions provided to complete term replacement.

Integer-aware Arithmetic

Bonus points if we can have the system do "integer aware" arithmetic, meaning 2+3=5 (not 5.0 or 4.999999999 or 5.000000001 as we might have today)
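
One possible shape for this, shown for addition only: stay in Integer while both operands are Integers and fall back to double otherwise. IntegerAwareMath is a hypothetical helper name, and throwing on integer overflow (the behavior of Math.addExact) is a choice made for this sketch, not something this proposal specifies.

```java
// Hypothetical sketch of "integer aware" addition.
final class IntegerAwareMath {
    private IntegerAwareMath() {
        // static helper, not instantiable
    }

    // Returns an Integer when both inputs are Integers, otherwise a Double.
    static Number add(Number a, Number b) {
        if (a instanceof Integer && b instanceof Integer) {
            // Math.addExact throws ArithmeticException on int overflow
            return Math.addExact(a.intValue(), b.intValue());
        }
        return a.doubleValue() + b.doubleValue();
    }
}
```

Under this sketch, adding the Integers 2 and 3 yields the Integer 5, while adding 2 and 0.5 yields the Double 2.5.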

Prerequisite Support

A Prerequisite for "equipment variables" needs to be provided, so something like: PREEQVARxx:a,b ... where xx is "LT", "LTEQ", etc. ... and a, b are the two values.

Single-pass resolution

Currently, the use of JEP requires a multi-pass resolution. The formula is parsed, then it is processed/queried in order to determine any variables or terms it contains, and then those values are loaded into the formula and the formula is resolved. (see VariableProcessor.processJepFormula() )

This (JEP) method of resolution comes with some challenges and risks.

The parsed version of a JEP formula (a PJEP) ends up with a lot of complexity. It is a controller of sorts, rather than just a framework for a formula. The proposed architecture here keeps the formula to the minimum amount of knowledge necessary for the formula, and has an external system provide any necessary context for resolution.

Also, the parsed version of a JEP formula is stateful. The variable/term values are loaded into it, meaning a contract is placed on the programmer to "clean that up" so those values do not pollute other, later calculations. (This is generally not an issue based on how it is otherwise used, but may not be initially clear to an uninformed reader)

Having a single-pass resolution by a visitor to the tree, with the context passed in as a parameter, has a number of functional advantages:

  • This effectively makes a formula "immutable" in the sense that the context is passed in during resolution and thus can be evaluated in multiple threads without causing issues. Note this is a necessary consideration if we want to reuse formulas that are in the data multiple times, as the UI demonstrably triggers evaluation of items from multiple threads. The visitors are also "immutable" (at least in the limited sense that their fields are all private final), and thus reusable without being concerned about thread safety.
  • This keeps the formula as light-weight as possible (The tree structure is still a bit expensive, but better than a PJEP)
  • The context can be set based on what is passed in, meaning evaluation locality is driven by the caller (knowing the locality) rather than forcing the formula to evaluate some string to figure out where it is
  • The entire concept of a PJEP pool goes away (in exchange we effectively have a context, but those are reusable so we don't ever have the lock/free necessity of PJEP)
  • We actually prohibit any funky hoop-jumping. By only providing the context in the sense of a variable library, we can accurately handle things like spell-localized items (e.g. today's CASTERLEVEL term) without having a temporarily set global item (which - by the way - also isn't thread safe)
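
The single-pass idea can be sketched with a tiny hand-written tree and visitor. Everything below is illustrative: the node and visitor types are hypothetical (the sandbox's actual node classes are generated by jjtree and are not reproduced here), and a plain Map stands in for the context / variable library.

```java
import java.util.Map;

// Hypothetical node and visitor types; the context is passed to every call
// rather than being stored in the formula.
interface FormulaNode {
    <R> R accept(FormulaVisitor<R> visitor, Map<String, Integer> context);
}

interface FormulaVisitor<R> {
    R visitNumber(NumberNode node, Map<String, Integer> context);
    R visitVariable(VariableNode node, Map<String, Integer> context);
    R visitAdd(AddNode node, Map<String, Integer> context);
}

final class NumberNode implements FormulaNode {
    final int value;
    NumberNode(int value) { this.value = value; }
    public <R> R accept(FormulaVisitor<R> visitor, Map<String, Integer> context) {
        return visitor.visitNumber(this, context);
    }
}

final class VariableNode implements FormulaNode {
    final String name;
    VariableNode(String name) { this.name = name; }
    public <R> R accept(FormulaVisitor<R> visitor, Map<String, Integer> context) {
        return visitor.visitVariable(this, context);
    }
}

final class AddNode implements FormulaNode {
    final FormulaNode left;
    final FormulaNode right;
    AddNode(FormulaNode left, FormulaNode right) {
        this.left = left;
        this.right = right;
    }
    public <R> R accept(FormulaVisitor<R> visitor, Map<String, Integer> context) {
        return visitor.visitAdd(this, context);
    }
}

// Stateless evaluator: no fields, so one instance can be shared by threads.
final class EvaluateVisitor implements FormulaVisitor<Integer> {
    public Integer visitNumber(NumberNode node, Map<String, Integer> context) {
        return node.value;
    }
    public Integer visitVariable(VariableNode node, Map<String, Integer> context) {
        Integer value = context.get(node.name);
        if (value == null) {
            throw new IllegalArgumentException(
                "Variable not available in this context: " + node.name);
        }
        return value;
    }
    public Integer visitAdd(AddNode node, Map<String, Integer> context) {
        return node.left.accept(this, context) + node.right.accept(this, context);
    }
}
```

The same tree for 1+INT can then be evaluated against two different contexts (two characters, say) without the tree or the visitor holding any per-evaluation state.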

Boolean-aware calculation

After discussion, it has been determined that Boolean and Numeric values will be calculated in their own domain. It will NOT be assumed that TRUE is 1 and FALSE is 0 as some other formula systems assume. A Boolean value here will be a Boolean and only usable where a Boolean is legal.

Therefore, the common operations that are Boolean operations such as AND (&&) and OR (||) will only operate on Boolean values. If a Function requires a Boolean value (such as the first argument to an IF function), then it must validate the semantics of the subformula. (It should validate the others to ensure they are some form of Number)
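
As a sketch of what that semantic validation might look like for an IF function, assuming each subformula can report the format (Java Class) it produces: the check below insists on a Boolean first argument and Number formats for the other two. IfSemantics and validateIf are hypothetical names, and returning Number.class as the result format is a simplification for this example.

```java
// Hypothetical sketch: validates the formats of the arguments to IF.
final class IfSemantics {
    private IfSemantics() {
        // static helper, not instantiable
    }

    // Returns the format the IF call would produce, or throws if the
    // argument formats are not legal. No TRUE==1 / FALSE==0 coercion.
    static Class<?> validateIf(Class<?>... argFormats) {
        if (argFormats.length != 3) {
            throw new IllegalArgumentException("IF requires exactly 3 arguments");
        }
        if (!Boolean.class.equals(argFormats[0])) {
            throw new IllegalArgumentException(
                "First argument of IF must be Boolean, was "
                    + argFormats[0].getSimpleName());
        }
        for (int i = 1; i < 3; i++) {
            if (!Number.class.isAssignableFrom(argFormats[i])) {
                throw new IllegalArgumentException(
                    "Argument " + (i + 1) + " of IF must be a Number, was "
                        + argFormats[i].getSimpleName());
            }
        }
        return Number.class;
    }
}
```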

Interaction with existing (JEP) Formula System

There is NO expectation that formulas can be shared across the two systems while we have two parsers. You cannot use a global variable in a formula (JEPFormula) in an equipment variable formula (NEPFormula), or vice versa. The namespaces and calculations are completely separate.

To break this assumption invites a whole ton of complexity that basically breaks any ability to do equipment variables in a simple way that does not impose itself on the Variable->Bonus->Prerequisite->Variable loop.

Note: There may need to be a limited ability to import NEPFormula variables into JEPFormulas during a future BONUS transition. This would be doable with a function (e.g. nepvar("blah")). Due to the ugly nature of what is legal in JEP formulas (and the fact that they cannot be validated during LST load), JEP variables will not be usable in NEPFormulas.

Requirements For discussion

I would propose that we break the Formula Parser off into a separate sub-project. None of it needs to be PCGen-specific (as demonstrated in the current sandbox), and the jjtree/javacc calls required to build .java files make for a more complex build cycle that it would be nice to hide from the main trunk. (The challenge is that you have to do a build, then actually select the project and cause it to refresh in order for it to correctly compile the java files to the current version, so compiling the formula system into a separate JAR file is something I would appreciate.) This may have the effect of also breaking a reasonable portion of pcgen.base.* into a separate (and different) sub-project, since those classes are shared dependencies of the core and the formula parser. Since these items are reasonably stable (most have had no more than cosmetic changes in years), the separation and then addition of 2 JARs to our distribution should not be an unreasonable burden on the main build of PCGen.


Why no built in terms?

Basically they add complexity. With built-in terms, when a string is encountered in a formula, we have to establish whether it is a built-in term or whether it is a variable. The built-in term would be a plug-in in Java that would then call back into the core in order to produce the answer. Use "BASECR" as an example.

As a term: a) Formula Visitors all need to distinguish "is it a term" or "is it a variable" - meaning a code check (if statement) and a "term library" has to be added to the FormulaManager. b) If a term is encountered, it has to jump into that term (subroutine call to external plugin - and the external plugin is something we had to process at boot) c) The term then calls back into the core to get the answer (so the term had to be passed the PC) d) During LST load, the term system must also be checked to ensure that a variable name and a term name do not overlap.

Now imagine we have a variable: a) Formula visitors assume all text is a variable b) Variables must be able to be defined game-mode-wide (since we still may want the user to be able to type "RACECR" rather than "racecr()" - though that is certainly open for debate) c) We need to implement a function that can get the base challenge rating (this effectively does "b" and "c" from the term case but just calls it a "function" instead)

So the net effect of banning terms is that we simplified the check of variables at LST load and simplified the formula visitors in exchange for - possibly - some more variables and having to define those game-mode-wide. This is actually a good trade, as it decouples us from d20, and makes our calculations (and variables) explicit to the game mode. We also can remove certain variables from game modes that do not need to worry about those items, cleaning up the data and better allowing errors in the data to be caught... so it's effectively a win-win trade to have no built-in terms.

There is one situation where terms could be seen as an advantage: They cannot be modified. But even in this case, I am challenged to find a use case where that advantage is clear. If such a use case is encountered, then adding the ability to "lock" a variable in the game mode (e.g. LOCKVAR:BaseCR) would be possible - this really is a rather trivial change to the VariableIDFactory. Today when a variable scope is asserted, it returns true (that's ok) or false (you've asked for a definition that conflicts with what I've already been told). If we want to have locked variables (things that are "final" and not modifiable in user data) then we end up with wanting to have 3 responses: Legal, Illegal, and Locked. That is a minor change to the VariableIDFactory and is easily supported if such a use case is identified.
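
To illustrate the three-response idea, here is a small sketch. It is NOT the actual VariableIDFactory (whose API is not shown in this document); VariableRegistry, ScopeAssertion, and the use of plain Strings for scopes are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// The three possible answers when data asserts a variable's scope.
enum ScopeAssertion { LEGAL, ILLEGAL, LOCKED }

// Hypothetical sketch of a registry that supports locked variables.
final class VariableRegistry {
    private final Map<String, String> scopeByName = new HashMap<>();
    private final Set<String> locked = new HashSet<>();

    // Game mode marks a variable as "final" (e.g. via a LOCKVAR-style token).
    void lock(String name) {
        locked.add(name);
    }

    // Data asserts that 'name' exists in 'scope'.
    ScopeAssertion assertScope(String name, String scope) {
        if (locked.contains(name)) {
            return ScopeAssertion.LOCKED;   // user data may not touch it
        }
        String existing = scopeByName.putIfAbsent(name, scope);
        if (existing == null || existing.equals(scope)) {
            return ScopeAssertion.LEGAL;    // new, or consistent with before
        }
        return ScopeAssertion.ILLEGAL;      // conflicts with earlier assertion
    }
}
```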

Calculating PC variables

The formula system described above only parses the Formula into a tree. We need another system to take that tree, understand dependencies, and properly calculate values based on those dependencies. We call this subsystem the "Solver" subsystem.

This can solve for all characteristics of a PC. This could be a variable used by the data team or an internal item like the calculation of "Hands" (what could be referred to as a "global characteristic" of a PC).

For any variable or given PC Characteristic, it can be solved through knowledge of:

  • An initial value
  • A set of modifiers that allow modification of that value

Modification may include add, subtract, multiply, set, or some more complex operation

The initial design is built around equipment variables, but allows for non-numeric values to be solved.
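
A minimal sketch of "initial value plus a set of modifiers" follows. The Solver class here is a hypothetical illustration, not the proposed Solver subsystem itself: it is generic so that non-numeric formats fit, it ignores dependencies between variables entirely, and it applies modifiers in the order they were added, which is an assumption (this document does not define an ordering rule).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: solves one value from a default and its modifiers.
final class Solver<T> {
    private final T initialValue;
    private final List<UnaryOperator<T>> modifiers = new ArrayList<>();

    Solver(T initialValue) {
        this.initialValue = initialValue;
    }

    // Registers a modification (add, multiply, set, or something more complex).
    void addModifier(UnaryOperator<T> modifier) {
        modifiers.add(modifier);
    }

    // Starts from the initial value and applies each modifier in turn.
    T solve() {
        T value = initialValue;
        for (UnaryOperator<T> modifier : modifiers) {
            value = modifier.apply(value);
        }
        return value;
    }
}
```

For example, a numeric variable starting at 0 with a single "add 5" modifier solves to 5; a String-format variable could use the same mechanism with a "set" modifier that ignores the incoming value.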


BONUS Dependencies

We are heavily dependent upon the formula system for appropriate calculation of values, and significant changes to this system are challenging (arguably "high risk")

The dependency calculation on BONUSes is currently based on the entire String of the BONUS, and thus performs some very complicated dependency analysis due to that use of the full String. This makes a transition to a new formula system (that clearly understands dependencies) a prerequisite for conversion of the BONUS system to a more manageable system. This limits our first-pass scope to just the variable system.

Variable Scope/Context

Variables have a context, but it is NOT like an Ability CATEGORY in acting like a name space.... things can't be reused in a related context. They CAN be reused in an unrelated context.
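
The "related vs. unrelated" rule can be sketched as a check against a scope hierarchy. ScopeChecker is a hypothetical name, scopes are plain Strings for brevity, and the SPELL scope in the usage note is purely illustrative (only GLOBAL and EQUIPMENT appear in this document).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a variable name may be reused in unrelated scopes,
// but not in a parent or child of a scope that already defines it.
final class ScopeChecker {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, Set<String>> scopesByVariable = new HashMap<>();

    // Declares a scope; 'parent' is null for the top-level scope.
    void declareScope(String scope, String parent) {
        parentOf.put(scope, parent);
    }

    // Returns true if the definition is legal, false if it conflicts with a
    // definition of the same name in a related (ancestor/descendant) scope.
    boolean define(String scope, String variableName) {
        Set<String> scopes =
            scopesByVariable.computeIfAbsent(variableName, k -> new HashSet<>());
        for (String existing : scopes) {
            boolean related =
                isAncestor(existing, scope) || isAncestor(scope, existing);
            if (!existing.equals(scope) && related) {
                return false;
            }
        }
        scopes.add(scope);
        return true;
    }

    // True if 'ancestor' is reached by walking up the parents of 'scope'.
    private boolean isAncestor(String ancestor, String scope) {
        for (String s = parentOf.get(scope); s != null; s = parentOf.get(s)) {
            if (s.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }
}
```

With GLOBAL as the parent of both EQUIPMENT and a hypothetical SPELL scope, defining SomeVar in EQUIPMENT and then in GLOBAL is rejected, while defining it in EQUIPMENT and then in SPELL is accepted.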

Assume this appears in the Variable Definition File:


Then the following would all produce errors:

GLOBAL:INTEGER|SomeVar (illegal because GLOBAL is a parent scope of EQUIPMENT)

The following is legal on other pieces of Equipment (because the context is Equipment.class, not a specific instance of Equipment)


For clarity: The reason that a "related" (parent or child) scope cannot use the same variable name is that we want variable access to be easy. So we want an EqMod to be able to modify a global variable without having to "think about" it being global. Thus, this is perfectly legal on an EqMod (or just about anywhere else for that matter - the exceptions being things like ArmorProf/WeaponProf and the other items that have gone through the tag limitation discussion)



This proposal adds a new capability (Equipment Variables) and thus has no compatibility issues. Future projects to replace JEP or redo the BONUS system will encounter a number of compatibility issues, including, but probably not limited to:

  • Handling .STACK, .REPLACE, TYPE= on BONUSes
  • Handling conversion of terms to variables (either local or game-mode-wide)
  • Handling the use of Output Tokens (to be removed - they are basically impossible to validate anyway)
  • Conditional Modifiers (since BONUS can take PRExxx)

However, since it is expected that few, if any, items will be automatically converted, the actual details of these challenges will be part of individual deprecation and handled at that time.

Existing Sandbox

There is a proposed implementation of this subsystem, located in 3 pieces:

The Base Libraries are located in

The Formula subsystem is located in

A functioning version of PCGen demonstrating the system is (sometimes) located in ... look for a branch that starts with NEWTAG-239

Use Cases

Global Variables

Inspire Duration

Controls the heroics duration for a Bard ("Bardic Inspire Heroics")


DEFINE:InspireDurationBase|0 (in an Ability)
DEFINE:InspireHeroicsDuration|InspireDurationBase (in an Ability)
BONUS:VAR|InspireDurationBase|5 (on Bard class level 1)


NAMESPACEDEF:NUMBER|VAR (in Data Control file)
GLOBAL:VAR|InspireDurationBase (in Variable Def file)
GLOBAL:VAR|InspireHeroicsDuration (in Variable Def file)
MODIFY:InspireHeroicsDuration|SOLVE|InspireDurationBase (in an Ability)
MODIFY:InspireDurationBase|ADD|5 (on Bard class level 1)

The first DEFINE: is converted to a Variable Definition

The second DEFINE: is converted to a Variable Definition and a MODIFY: (since it was defining to a non-zero value)

The BONUS: is converted to a MODIFY

Local Variables

Catching Bad Use of a Variable

GLOBAL:VAR|Foo (in an Ability)
LOCAL:EQUIPMENT|VAR|Bar (in Equipment)
MODIFY:MyVar|SOLVE|value()+Foo+Bar (in a Skill)

Will fail at load because Bar is a local variable on Equipment.

Fantasy Craft Essence/Charm

Rule: "Whether found, seized, crafted, or purchased, every magic item possesses 1 Essence and/or 1 Charm (but no more)."

Today: Not possible (needs Equipment Vars)


LOCAL:EQUIPMENT|VAR|AllowedCharms (in Variable Definition file)
LOCAL:EQUIPMENT|VAR|PossessedCharms (in Variable Definition file)
MODIFY:AllowedCharms|ADD|1 (on EqMod that makes an item magical)
MODIFY:PossessedCharms|ADD|1 (on any Charm EqMod)
PREEQVARLT:PossessedCharms,AllowedCharms (on any Charm EqMod)

Note this provides the flexibility to allow the charm limit to be 5 for artifacts.

Note also that this specific implementation (by using EQUIPMENT not EQUIPMENT.PART) is putting charms on the entire piece of equipment, not just on a Head/Part of the Equipment...

Barbarian Illiteracy


ABILITY:Special Ability|VIRTUAL|Illiteracy|PREVAREQ:TL,IlliteracyLVL

New System:

GLOBAL:VAR|IlliteracyLVL (in Variable Definition file)
ABILITY:Special Ability|VIRTUAL|Illiteracy|PRENEPVAREQ:totallevel(),IlliteracyLVL

Note the special case of this() being allowed in the function classlevel.... two effects: "this" is a "reserved" function name, just like "value", and it safely allows cloning, et al to occur since it is resolved at run time to determine the owning object.

Note the use of this() may have to be limited to certain situations - there are objects that do not properly "trace themselves" through today's formula system and we will need to address how that works in the new system to ensure that the tracing always exists (although the requirement for a scope may resolve that issue entirely)

Proper Order of Operations

Order of operations and variable scope definition

If we have a formula on Longsword that uses:


Let's assume we are building a dependency tree for MyVar. As a reader, we can see that this formula is dependent on Foo and Bar.

We force global definition of variables to resolve other ambiguity. Therefore, we know, for example, the "Variable ID" of Foo is "Global:Foo". However, if we are in a piece of equipment, we must know if Bar is "Global:Bar" or "Equipment('Longsword'):Bar". We do this by loading the Variable Definition file before any data.

The alternative (attempting to use some replacement for DEFINE) produces a number of uncomfortable situations that may only be detectable at runtime. This is asserted to be undesirable (we want to catch issues at LST load), so we force a pre-definition of all variables.

The creation of the appropriate VariableID (in this case "Longsword:Bar") is done by the VariableIDFactory - the "sole place" to get VariableIDs created.

PC Characteristic


An example of this is "Hands" on a PlayerCharacter (simple Integer)

Today the base # of hands is set by the race, and potentially modified by Templates.

The existing process we use to calculate Hands has a significant issue: There is a race condition. In the case of two templates that modify "Hands" on a PC, it is the template that is "applied last" which wins. However, "last" is relative, since the contents of a PC can be constantly re-interpreted internally by PCGen. So there are some (admittedly rather obscure) corner cases where the calculation would not be correct.

We need to have a way to eliminate the race condition.


HANDS:2 (on Race)
HANDS:4 (on Template1)
HANDS:6 (on Template2)

The PC will have 4 or 6 hands, depending on the order PCGen sees the templates (this is not [well] guaranteed)

New System:

GLOBAL:VAR|Hands (in Game Mode)
MODIFY:Hands|SET|2 (in Race)
MODIFY:Hands|SET|4|PRIORITY=x (in Template1)
MODIFY:Hands|SET|6|PRIORITY=y (in Template2)

The PC will have 4 or 6 hands, depending on the values of x and y. If x == y then the output is undefined, as in the existing case. Otherwise, the "higher" priority "wins".

Note also that we have freed up templates to do addition of hands rather than a set, so the new system is MUCH more flexible for the data team.
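The effect of PRIORITY on the Hands example can be sketched in Java. This is a minimal illustration, not the proposed implementation: the class HandsPriorityDemo, its Mod type, the solve method, and the concrete priorities (0, 10, 20, 30) are all hypothetical and chosen only to show that the result no longer depends on the order in which objects reach the PC.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only; these names are not PCGen classes.
final class HandsPriorityDemo {

    /** One MODIFY entry: a user priority plus the operation applied to the running value. */
    static final class Mod {
        final int priority;
        final IntUnaryOperator op;

        Mod(int priority, IntUnaryOperator op) {
            this.priority = priority;
            this.op = op;
        }
    }

    /** Applies modifiers in ascending priority, so the highest priority is applied last and "wins". */
    static int solve(int initial, List<Mod> mods) {
        List<Mod> ordered = new ArrayList<>(mods);
        ordered.sort(Comparator.comparingInt((Mod m) -> m.priority));
        int value = initial;
        for (Mod m : ordered) {
            value = m.op.applyAsInt(value);
        }
        return value;
    }

    public static void main(String[] args) {
        Mod race = new Mod(0, v -> 2);           // MODIFY:Hands|SET|2 (priority 0 assumed)
        Mod template1 = new Mod(10, v -> 4);     // MODIFY:Hands|SET|4|PRIORITY=10
        Mod template2 = new Mod(20, v -> 6);     // MODIFY:Hands|SET|6|PRIORITY=20
        Mod extraPair = new Mod(30, v -> v + 2); // a template that adds hands instead of setting them

        // Same answer regardless of the order the objects were seen.
        System.out.println(solve(0, Arrays.asList(template2, race, template1)));
        System.out.println(solve(0, Arrays.asList(race, template1, template2)));
        // A Race plus an additive template.
        System.out.println(solve(0, Arrays.asList(extraPair, race)));
    }
}
```

Equal priorities are left to the sort's tie handling here, which mirrors the "undefined if x == y" behavior described above.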


For items that require multiple modifications, we add more code and infrastructure because of more and more complex mathematical requirements. This lack of flexibility puts pressure on the code team rather than allowing the data team to simply specify the calculation it desires. Movement is a good example here where we have multiple bonuses that add to movement, multiply, add after the multiply, etc. This "tit-for-tat" escalation that requires new token creation should be eliminated.



Theoretically, this will produce (20+10)*2+5 = 65 as the movement.


This also produces 65 as the movement, and is enormously flexible about being able to do additional calculations on movement without having to get more and more BONUS objects defined in order to get the answer correct. All that needs to be correctly specified is the Priority of the modifications to ensure they occur in the correct order (and other data can insert new items between existing calculations should anything like that be needed - no code required)

General use of Priority

Capping a value

In the case of enforcing a CAP, I actually think it's a bad idea for code to be involved. We see too many sources "break" limitations implied by game systems, and the code wants nothing to do with a "tit-for-tat" lock and unlock game. The whole point of providing a PRIORITY= setting to the data team is to avoid that escalation. As this develops into the full formula system and not just for EqVars, you can expect locking (as stats are done today) to disappear, with that responsibility transferred entirely to the data. The data should do all such enforcement.

For example, it can easily (and in my opinion much more clearly) be solved with an equation based "lock" (a "soft lock" as it might be called):


or preferably:


This way, the data sets a "soft limit" at 20, that the data can override if it REALLY needs to by setting a priority over 1000000

This implies that as part of this proposal, it might be advisable for the data team to set a "limit number" in the data (such as a million) that signifies a base rule limitation and at that point folks would know that any priority they have to set over that number is signifying that they are bending the base rules.... and any that don't should be kept under that value. That way, the numbers can communicate something to the data developer, just like the standards for KEY can communicate things if you know the syntax.
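A "soft lock" of this kind can be sketched in Java as follows. This is an assumption-laden illustration: SoftCapDemo, Mod, solve, and RULE_LIMIT_PRIORITY are hypothetical names, the cap is modeled as a "minimum of the current value and 20" step sitting at priority 1000000, and the starting score and bonuses are invented numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch; none of these names are PCGen classes.
final class SoftCapDemo {

    /** Assumed data-team convention: priorities above this value bend the base rules. */
    static final int RULE_LIMIT_PRIORITY = 1_000_000;

    static final class Mod {
        final int priority;
        final IntUnaryOperator op;

        Mod(int priority, IntUnaryOperator op) {
            this.priority = priority;
            this.op = op;
        }
    }

    /** Applies modifiers from lowest to highest priority. */
    static int solve(List<Mod> mods) {
        List<Mod> ordered = new ArrayList<>(mods);
        ordered.sort(Comparator.comparingInt((Mod m) -> m.priority));
        int value = 0;
        for (Mod m : ordered) {
            value = m.op.applyAsInt(value);
        }
        return value;
    }

    public static void main(String[] args) {
        Mod base = new Mod(0, v -> 18);                                // starting score
        Mod bonus = new Mod(100, v -> v + 5);                          // ordinary bonus, below the limit
        Mod cap = new Mod(RULE_LIMIT_PRIORITY, v -> Math.min(v, 20));  // the soft limit at 20
        Mod ruleBender = new Mod(RULE_LIMIT_PRIORITY + 1, v -> v + 2); // deliberately set above the limit

        System.out.println(solve(Arrays.asList(bonus, cap, base)));             // capped
        System.out.println(solve(Arrays.asList(ruleBender, bonus, cap, base))); // cap applied, then exceeded
    }
}
```

No code-level lock is involved: the cap is just another modification, and data that sets a priority above the limit number is visibly signalling that it bends the base rule.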

Major Components

Overall Formula

To be stored in pcgen.base.formula

Parser syntax is written in jjtree (.jjt) syntax, compiled to a javacc (.jj) file and then to java. Parser syntax, parser files modified from the default, and dynamic files are stored in pcgen.base.formula.parse

Principles of Design:

  • Typical mathematical calculations (+-*/%^, logical operations, etc)
  • Allow variables and functions
  • Functions can be parenthesis functions, e.g. count(...) or bracket functions, e.g. COUNT[...]. The two are NOT interchangeable.
  • Parse the formula into a tree, allow the tree to be walked in order to perform calculations
  • Allow the formula tree to be "reconstructed" into the string representation so the data converter can do formula modification (e.g. a function rename). Note that this reconstruction follows the same principles as those of LST tokens: We DO NOT guarantee spacing or other syntactically-ignored items. For example: 1+ INT may be output as 1+INT (deleting the space). Perfect reconstruction adds complexity to the parser that has (as far as I can tell) no useful value.

Maintain the formula as a tree - minimize post-processing after parse. Little value in post-processing (the "Expensive" part is building the tree).

Visitor pattern already well recognized, so we can just have visitors that perform different functions and that should be reasonable for a developer to follow.
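As a rough sketch of how visitors can walk a formula tree, the following Java example builds a tiny hand-made tree for 1+INT and runs two visitors over it: one that evaluates and one that reconstructs the string. The Node and Visitor interfaces here are hypothetical stand-ins, not the classes generated by JJTree, and the reconstruction deliberately emits no whitespace (and, in this simplified form, no parentheses).

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of a visitable formula tree; not the JJTree-generated classes.
final class FormulaTreeDemo {

    interface Visitor<R> {
        R visitNumber(int value);
        R visitVariable(String name);
        R visitBinary(char operator, Node left, Node right);
    }

    interface Node {
        <R> R accept(Visitor<R> visitor);
    }

    static Node number(int value) {
        return new Node() {
            @Override public <R> R accept(Visitor<R> v) { return v.visitNumber(value); }
        };
    }

    static Node variable(String name) {
        return new Node() {
            @Override public <R> R accept(Visitor<R> v) { return v.visitVariable(name); }
        };
    }

    static Node binary(char operator, Node left, Node right) {
        return new Node() {
            @Override public <R> R accept(Visitor<R> v) { return v.visitBinary(operator, left, right); }
        };
    }

    /** Walks the tree to compute a value, resolving variables from a map. */
    static final class EvaluateVisitor implements Visitor<Integer> {
        private final Map<String, Integer> variables;

        EvaluateVisitor(Map<String, Integer> variables) { this.variables = variables; }

        @Override public Integer visitNumber(int value) { return value; }

        @Override public Integer visitVariable(String name) {
            Integer value = variables.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Undefined variable: " + name);
            }
            return value;
        }

        @Override public Integer visitBinary(char operator, Node left, Node right) {
            int l = left.accept(this);
            int r = right.accept(this);
            switch (operator) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                case '/': return l / r;
                default: throw new IllegalArgumentException("Unknown operator: " + operator);
            }
        }
    }

    /** Rebuilds the formula text; spacing from the original input is not preserved. */
    static final class ReconstructVisitor implements Visitor<String> {
        @Override public String visitNumber(int value) { return Integer.toString(value); }
        @Override public String visitVariable(String name) { return name; }
        @Override public String visitBinary(char operator, Node left, Node right) {
            return left.accept(this) + operator + right.accept(this);
        }
    }

    public static void main(String[] args) {
        Node tree = binary('+', number(1), variable("INT")); // as if parsed from "1+ INT"
        System.out.println(tree.accept(new ReconstructVisitor()));
        System.out.println(tree.accept(new EvaluateVisitor(Collections.singletonMap("INT", 3))));
    }
}
```

The tree itself is never modified; each new capability (static detection, dependency capture, and so on) would be one more Visitor implementation.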


Performs Two Functions:

  • Identifies whether the formula is valid
  • Identifies what format the formula returns

Principles of design for validity:

  • Let parse errors do their thing (We'll have to consider what we do as far as bad parser diagnostics)
  • A formula should be able to return a FormulaSemantics object to identify (with specificity) any issues with the formula
  • If there is more than one issue, only one issue needs to be returned (fast fail is acceptable)
  • Want to be able to be clear to a user why something is not a valid formula (e.g. a variable that is never defined)
  • Things to detect: Bad structure (internal errors), invalid # of formula arguments, function not found, variable not found


To be stored in pcgen.base.formula.variable

Principles of Design/use:

  • There are NO built in terms - we will have "global" variables that can be defined at the game mode level (not attached to an object)
  • Variable names must start with a letter, and may have numbers, periods, underscores
  • Variable names may have a single equal sign ("=") for backwards compatibility with things like CL=x. This should NOT be part of a data standard and may be deprecated after we make a full conversion to the new equation parser
  • A Variable has both a name and a scope (there can be local variables)
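The naming rules above could be checked with something like the following Java sketch. VariableNameCheck and isLegalName are hypothetical names. Two points are assumptions not settled by the rules as written: "letter" is taken to mean an ASCII letter, and the single permitted equal sign is allowed anywhere after the first character.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the variable-name rules; not a PCGen class.
final class VariableNameCheck {

    // Starts with a letter; then letters, digits, periods, underscores, or '='.
    private static final Pattern LEGAL = Pattern.compile("[A-Za-z][A-Za-z0-9._=]*");

    /** Returns true if the name follows the rules, including at most one equal sign. */
    static boolean isLegalName(String name) {
        if (name == null || !LEGAL.matcher(name).matches()) {
            return false;
        }
        // indexOf and lastIndexOf agree when there is zero or one '='.
        return name.indexOf('=') == name.lastIndexOf('=');
    }

    public static void main(String[] args) {
        System.out.println(isLegalName("InspireDurationBase")); // plain name
        System.out.println(isLegalName("CL=Bard"));             // backwards-compatible form
        System.out.println(isLegalName("A=B=C"));               // two equal signs
        System.out.println(isLegalName("1Foo"));                // starts with a digit
    }
}
```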

Variable Scope

Also stored in pcgen.base.formula.variable due to the relationship between variables and the Scope in which they are used.

Principles of Design:

  • Support local variables (e.g. a variable solely calculated within a piece of equipment)
  • Detect when a variable is properly used in scope (see use cases)
  • A Variable name may be defined in more than one scope AS LONG AS the scopes are disjoint. If "Foo" is used as an Equipment variable, it CAN be used as a Spell variable (for example) but NOT as a Global Variable (because use of "Foo" inside of Equipment would then be ambiguous as to whether it was referring to the Global Variable or the Local one). [This assumes we support multiple scopes of local variables at some point] (see use cases). Enforcement of this is done by "ScopeLibrary".
  • Each variable scope needs to understand / contain its parent scope so that the scope tree can be "walked" during variable resolution
  • Scope should not be null (there should be a "global scope" object) [This is mainly for "laziness" of not wanting to have null checks all over the place - just enforce up front the != null characteristic]. This is enforced by "ScopeLibrary".
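The scope principles above can be illustrated with a small Java sketch. ScopeRulesDemo is a hypothetical stand-in for ScopeLibrary, and its method names and signatures are assumptions. It models only two behaviors: rejecting a variable name that is already used in a related (parent or child) scope, and walking from an active scope up through its parents during resolution. Treating a repeated definition in the same scope as harmless is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the scope rules; not the actual ScopeLibrary.
final class ScopeRulesDemo {

    /** A scope definition that knows its parent; only the global scope has a null parent here. */
    static final class ScopeDef {
        final String name;
        final ScopeDef parent;

        ScopeDef(String name, ScopeDef parent) {
            this.name = name;
            this.parent = parent;
        }

        /** True if this scope is the other scope or one of its descendants. */
        boolean isSameOrDescendantOf(ScopeDef other) {
            for (ScopeDef s = this; s != null; s = s.parent) {
                if (s == other) {
                    return true;
                }
            }
            return false;
        }
    }

    private final Map<String, List<ScopeDef>> definitions = new HashMap<>();

    /** Records the variable and returns true, unless a related scope already uses the name. */
    boolean assertVariableScope(String varName, ScopeDef scope) {
        List<ScopeDef> used = definitions.computeIfAbsent(varName, k -> new ArrayList<>());
        for (ScopeDef existing : used) {
            if (existing == scope) {
                return true; // assumption: repeating a definition in the same scope is harmless
            }
            if (existing.isSameOrDescendantOf(scope) || scope.isSameOrDescendantOf(existing)) {
                return false; // parent/child conflict would make the name ambiguous
            }
        }
        used.add(scope);
        return true;
    }

    /** Walks from the active scope up its parents; returns null if the variable is not visible. */
    ScopeDef resolve(String varName, ScopeDef active) {
        List<ScopeDef> used = definitions.getOrDefault(varName, Collections.emptyList());
        for (ScopeDef s = active; s != null; s = s.parent) {
            if (used.contains(s)) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ScopeDef global = new ScopeDef("Global", null);
        ScopeDef equipment = new ScopeDef("Equipment", global);
        ScopeDef spell = new ScopeDef("Spell", global);
        ScopeRulesDemo lib = new ScopeRulesDemo();

        System.out.println(lib.assertVariableScope("Foo", equipment)); // first use
        System.out.println(lib.assertVariableScope("Foo", spell));     // disjoint scope
        System.out.println(lib.assertVariableScope("Foo", global));    // parent of both
        System.out.println(lib.assertVariableScope("Bar", global));
        System.out.println(lib.resolve("Bar", equipment).name);        // found via the parent
        System.out.println(lib.resolve("Foo", global));                // local var not visible globally
    }
}
```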


There are a few design characteristics / decisions that should be noted:


It's probably appropriate to walk through the various items and their names:


This sets the default value for a "Variable Format". Variable Formats will likely be implemented as plugins (and it will translate "NUMBER" as representing a variable of Number.class). This is required in order to use that "Variable Format" in the first argument to NAMESPACEDEF:


This Defines a Namespace / "Variable Type" called "VAR". Variable Type "VAR" is a "NUMBER". "NUMBER" is a Variable Format and therefore any "VAR" will default to the value provided in the Game Mode Misc Info file for "NUMBER"


This defines a variable with a "GLOBAL" scope, of Variable Type "VAR", and the Variable Name is "GlobalVariable". A PC will have a single, global, shared value for "GlobalVariable" (This variable has the default of Variable Type "VAR" which was acquired from Variable Format "NUMBER"). All of this information (GLOBAL, VAR, GlobalVariable) is stored in a ScopeTypeDefinition.


This defines a Local variable that will operate in the "EQUIPMENT" scope. It is also of Variable Type "VAR", and the Variable Name is EqVarOne. Each piece of Equipment will have a single value for "EqVarOne" that (barring explicit action by the data with capabilities not defined here) will have a value independent of other Equipment. (It also has the default of Variable Type "VAR" which was acquired from Variable Format "NUMBER"). All of this information (LOCAL:EQUIPMENT, VAR, EqVarOne) is stored in a ScopeTypeDefinition.

Legal scopes are implemented as plugins, although the usefulness of that is limited as they still need to be connected back to LST file loaders at this time.

When an object is actually initialized, we end up with a "variable scope": a combination of an actual instance and the var name (e.g. "Longsword:Foo"). This is done when an item is instantiated. In the case of Equipment, this means (a) when it is added to the PC or (b) when it is opened in the customizer.

ScopeTypeDefinition design reasoning

One possibility would have been to force the ScopeTypeDefinition to be a hard class, such as Equipment.class and have that implicitly compared within the ScopeLibrary. This makes a pretty broad set of assumptions in how a ScopeTypeDefinition is related to another ScopeTypeDefinition (is it just the class hierarchy, should interfaces be ignored, etc.) It also makes an awkward situation for interim classes where the scope may not be sensible in a given use. There is also the problem that we know we have situations where the scope of a sub-class should match that of the class (think SubClass.class). Thus, coding this as a Class was seen as too restrictive. If local variables are allowed on classes, then we need to allow those to work on a SubClass line as well. So a strict compare does not work, and if the rule is not that obvious, then it's better to externalize the check.

Therefore, we have a ScopeTypeDefinition object. Since scope definitions are defined by their parent scope, this allows us to subclass items yet still have them in a specific scope, since a given implementation of ScopeTypeDefinition can understand precisely what the parent scope is. All of the construction and enforcement is done by "ScopeLibrary".

ScopeLibrary design reasoning

Why is ScopeLibrary necessary? - Why a factory?

Specifically, we want to control where VariableIDs are manufactured to ensure we are always manufacturing VariableIDs that are compatible with the given scope for a variable.

Since that needs to be enforced against where a variable is legal, we need the VariableID construction to be closely associated with the ScopeTypeDefinition object, so we contain construction inside the factory and the factory also holds the ScopeTypeDefinition objects, so a VariableID cannot be constructed if it does not meet the ScopeTypeDefinition.


Located in pcgen.base.formula.function (as well as - in the future - plugins)

Principles of design:

  • Want to be able to have functions pluggable (common interface, ability to pull name from the instance)
  • Bracket functions support one and only one argument.
  • Paren functions support zero or more arguments (based on the function). A function may support a variable number of arguments... it is up to that function to declare if it is valid [or not] for the given arguments.


  • Bracket functions are designed for backwards compatibility only and are discouraged. Use should fall outside of a data standard
  • Bracket functions may be deprecated in the future, subject (of course) to replacement by other means
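One way pluggable functions and load-time argument checks might fit together is sketched below in Java. Everything here is hypothetical: the FormulaFunction interface, the Semantics result type (a simplified stand-in for the FormulaSemantics idea), the FunctionLibrary registry, and the demo functions floor and max with their argument counts. The check is fast-fail, returning only the first problem it finds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; interface and class names are assumptions, not the PCGen API.
final class FunctionCheckDemo {

    /** Common interface so functions can be plugged in; the name is pulled from the instance. */
    interface FormulaFunction {
        String getFunctionName();
        boolean acceptsArgumentCount(int count);
    }

    /** Result of a validity check: either valid, or the first problem found. */
    static final class Semantics {
        final boolean valid;
        final String report;

        private Semantics(boolean valid, String report) {
            this.valid = valid;
            this.report = report;
        }

        static Semantics ok() { return new Semantics(true, ""); }
        static Semantics problem(String report) { return new Semantics(false, report); }
    }

    static final class FunctionLibrary {
        private final Map<String, FormulaFunction> byName = new HashMap<>();

        void register(FormulaFunction fn) {
            byName.put(fn.getFunctionName(), fn);
        }

        /** Each function declares for itself whether the argument count is acceptable. */
        Semantics check(String name, int argCount) {
            FormulaFunction fn = byName.get(name);
            if (fn == null) {
                return Semantics.problem("Function not found: " + name);
            }
            if (!fn.acceptsArgumentCount(argCount)) {
                return Semantics.problem("Function " + name + " does not accept " + argCount + " argument(s)");
            }
            return Semantics.ok();
        }
    }

    /** Convenience factory for demo functions with a minimum and maximum argument count. */
    static FormulaFunction function(String name, int minArgs, int maxArgs) {
        return new FormulaFunction() {
            @Override public String getFunctionName() { return name; }
            @Override public boolean acceptsArgumentCount(int count) {
                return count >= minArgs && count <= maxArgs;
            }
        };
    }

    static FunctionLibrary demoLibrary() {
        FunctionLibrary library = new FunctionLibrary();
        library.register(function("floor", 1, 1));                 // exactly one argument
        library.register(function("max", 2, Integer.MAX_VALUE));   // variable number of arguments
        return library;
    }

    public static void main(String[] args) {
        FunctionLibrary library = demoLibrary();
        System.out.println(library.check("max", 3).valid);
        System.out.println(library.check("floor", 2).report);
        System.out.println(library.check("ceil", 1).report);
    }
}
```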

Formula requirements that "fall onto" functions:

  • Want to be able to tell if a formula is static (can cache the answer)
  • Want to be able to tell if a formula is valid at LST load (find invalid variables/functions, find incorrect or invalid function arguments)
  • Want to be able to get a list of variables from a formula so we can establish dependencies
  • Want to be able to have the Formula identify its semantics at LST load (is it returning a Boolean or Number, etc.)
    • Ensure proper validation of local/global variables and catching scope problems at LST load (see use cases)
    • Help avoid recalculations of unnecessary things (if desired)

Formula Manager

Located in pcgen.base.formula.manager

  • Design Pattern is composite
  • Exists to simplify those things that require context to be resolved (legal functions, variables (which pulls in scope))
  • Done also to "cache" the visitors (since each visitor needs to know some of the contents in the FormulaManager, they can be lazily instantiated but then effectively cached as long as that FormulaManager is reused - especially valuable for things like the global context which in the future we can create once for the PC and never have to recreate...)


Located in pcgen.base.formula.visitor

We have a series of visitors that can perform various functions on the parse tree. This includes, but is not limited to:

  • Reconstructing the string (so formulas can be deeply processed by a converter)
  • Evaluating the Formula
  • Determining if a Formula is static
  • Determining if a Formula is valid
  • Dumping the formula to standard out
  • Capturing all dependencies of a Formula.


Located in pcgen.base.modifier and pcgen.base.modifier.number

  • Can change the value of the characteristic/variable
  • Needs two sets of priority:
    • Inherent Priority: Order of operations akin to mathematical order of operations. Multiplication before addition, for example.
    • User Priority: Allows the user to set the priority (effectively looked at before the inherent priority)
  • Allow for arbitrary ordering of calculations (and thus resolving race conditions) without additional code intervention
  • Can use the existing value, so the following are equivalent in function:



Why have ADD

If ADD with an integer x and SET with value()+x are equivalent in function, why have ADD at all?

These are NOT AT ALL equivalent in memory use or speed/performance.

The ADD (Since it's adding an integer) occupies approximately 40 bytes of memory and is an extremely rapid calculation: If it took 200 nanoseconds I'd be surprised. This is because ADD is using a number. If it was a formula, even 1+2, it would load the formula parser. This special case on constants allows the modification system to modify a value without loading a much larger infrastructure, and this is valuable as a large majority of our calculations are simple modifications of values.

The SET shown above loads the formula system, and thus may occupy 500 (or more) bytes of memory (>10x), and it could take 100 times as long to process as the ADD (it also takes longer during LST load).

Thus: When possible the data standards should be written to use numeric modifiers (think "static" modifiers ... that only use numbers) rather than a formula

The intent (for Number format objects anyway) is to have:

  • ADD
  • SET
  • MIN
  • MAX

Why value is a function not a variable

Why does a formula use value() [a function] rather than something like %VALUE (a variable)?

This has to do with how the FormulaManager works.

Adding a Variable means teaching the ScopeFactory that a new local variable is allowed. Allowing a "super local" variable (a.k.a. "temporary" variable) would mean allowing it in an arbitrary scope. So we immediately take on the burden of allowing a variable in more than one scope. (Alternately we would enable it in one scope, use it, and then delete that availability - that level of churn seems unnecessary). Even with it enabled in more than one scope, we have to teach the "variable cache" the value, and then immediately clear the value so it is not accidentally used elsewhere. We also have to build a new FormulaManager (since the FormulaManager passes itself to any functions and we have to have it pass the "correct" one - so a decorator on the FormulaManager will not work). The result of using a variable is a lot of load, use, destroy behavior (and in ways that is _definitely_ not thread safe because the "variable cache" is shared and polluted with the "local" value).

Using a function allows us to create a derivative FormulaManager that does not alter the variable storage. So the ScopeFactory, VariableScope, and VariableStore all remain unchanged. We simply decorate the existing function library with a new function library that recognizes "value()" as a function, and pass that new FormulaManager into the Formula to be resolved. It still results in a few "temporary" objects (a FormulaManager, the decorator on the FunctionLibrary, and an EvaluateVisitor), but since it does not alter shared resources (e.g. the VariableCache), it is thread safe in addition to being significantly less effort to implement.
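The decoration described above might look roughly like this in Java. This is a sketch under assumptions: Fn, FunctionLibrary, SharedLibrary, and withValue are hypothetical names, and the FormulaManager itself is left out so that only the function-library decoration is shown. The point is that the shared library is never modified, so nothing needs to be cleaned up afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these are not the pcgen.base.formula classes.
final class ValueFunctionDemo {

    /** A function taking already-evaluated arguments. */
    interface Fn {
        int call(int... args);
    }

    interface FunctionLibrary {
        Fn lookup(String name);
    }

    /** The long-lived library shared by every calculation. */
    static final class SharedLibrary implements FunctionLibrary {
        private final Map<String, Fn> functions = new HashMap<>();

        void add(String name, Fn fn) {
            functions.put(name, fn);
        }

        @Override public Fn lookup(String name) {
            return functions.get(name);
        }
    }

    /** Decorator: answers "value" itself and defers everything else to the shared library. */
    static FunctionLibrary withValue(FunctionLibrary base, int currentValue) {
        return name -> {
            if ("value".equals(name)) {
                Fn valueFn = args -> currentValue;
                return valueFn;
            }
            return base.lookup(name);
        };
    }

    public static void main(String[] args) {
        SharedLibrary shared = new SharedLibrary();
        shared.add("max", a -> Math.max(a[0], a[1]));

        // A temporary library for one modification, as for a SET of value()%2 on a current value of 7.
        FunctionLibrary forThisSolve = withValue(shared, 7);
        System.out.println(forThisSolve.lookup("value").call() % 2);
        System.out.println(forThisSolve.lookup("max").call(3, 9));

        // The shared library was not altered.
        System.out.println(shared.lookup("value") == null);
    }
}
```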

Why Modifier takes a FormulaManager as a parameter

For formula resolution of a SET with something like "value()%2"

Initial value of variables

  • All variable types require an initial value that is not externally dependent (e.g. must be a "set" modifier)
  • Initial Value is set by the type of the variable. For example, all "Number" variables could be defaulted to zero. These defaults are loaded into the SolverFactory by the game mode.


Located in pcgen.base.solver

  • Built by the SolverFactory based on the type of variable (Factory provides the initial value)
  • Perform the calculation from an initial value through all the modifiers provided for that variable
  • Relies on a "variable store" that stores the results of other calculations, so things like hands*5 can be used to calculate fingers.
  • Can be diagnosed, so the UI can display:
    • The initial value
    • Each modification that took place and the source of the Modifier
    • The final value

Solver Manager

Located in pcgen.base.solver

  • Is aware of the full set of Solvers for a particular system
  • Maps variables to solvers
  • Tracks dependencies between variables/solvers, can calculate solvers as required
  • Can be "push" or "pull" on solving variables, current implementation is "aggressive" (meaning it will recalculate as soon as a dependency is updated) but not "topologically sorted" (meaning it can do more calculations than strictly necessary)


Steps taken for an "aggressive solver" that is not "topologically sorted"

Assume the following was added to the PC (in this order) and that all variables are defined.

a) MODIFY:Fingers|SET|5
b) MODIFY:Hands|SET|Fingers/5
c) MODIFY:Fingers|ADD|5
d) MODIFY:Feet|SET|Toes/5
e) MODIFY:Appendages|SET|Fingers+Toes+Hands+Feet
f) MODIFY:Toes|ADD|10

The following would take place:

Fingers is set to 5 (overriding the default value of zero)
Hands is identified as being dependent on Fingers.
Hands is solved to get 1.
Fingers is set to 5 (set) + 5 (add) = 10
Hands (since it is dependent) is recalculated to produce 2.
Feet is identified as being dependent on Toes.
Feet is set to zero (toes is zero since it has no modifier)
Appendages is identified as being dependent on Fingers, Toes, Hands, Feet
Appendages is calculated to be 12 (Feet is zero, and Toes is zero since it has no modifier)
Toes is calculated to be 10 (start at 0, add 10)
Feet and Appendages both need to be recalculated.  Note there are two options here:
(g1) Perform a topological sort on the dependencies to realize you should recalculate  Feet then Appendages
(g2) Recalculate in a random order, risking that the calculation order will be:
* Appendages (because Toes changed) set to 22
* Feet set to 2
* Appendages (because Feet changed) set to 24
Note: The current implementation does not do a topological sort, asserting that the sort is more expensive than miscalculation.  This can easily be tested once this is integrated into PCGen.
Toes is set to 10.  The set at Priority 1000 "resets" the value vs. the 0+10 that was originally calculated for Toes (effectively done at priority = 0).
No other updates because the value for Toes did not change.
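Steps a) through f) of this walkthrough can be simulated with a small Java sketch of an aggressive solver that does no topological sort. It is not the pcgen.base.solver implementation: AggressiveSolverDemo, Calc, Mod, and the helper methods are hypothetical, formulas are written as lambdas with hand-declared dependencies instead of being parsed, and the rule that SET is applied before ADD is an assumed inherent ordering. The counter shows how many recalculations were actually performed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the pcgen.base.solver implementation.
final class AggressiveSolverDemo {

    /** The calculation a modifier performs on the running value; it may read other variables. */
    interface Calc {
        int apply(int current, AggressiveSolverDemo manager);
    }

    static final class Mod {
        final int order;      // assumption: SET (0) is applied before ADD (1)
        final Calc calc;
        final String[] reads; // variables this modifier depends on

        Mod(int order, Calc calc, String... reads) {
            this.order = order;
            this.calc = calc;
            this.reads = reads;
        }
    }

    static Mod set(Calc calc, String... reads) { return new Mod(0, calc, reads); }
    static Mod add(int amount) { return new Mod(1, (v, m) -> v + amount); }

    private final Map<String, List<Mod>> modifiers = new HashMap<>();
    private final Map<String, Set<String>> dependents = new HashMap<>();
    private final Map<String, Integer> store = new HashMap<>();
    int solveCount = 0;

    /** Variables with no stored result report the default value of zero. */
    int get(String name) {
        return store.getOrDefault(name, 0);
    }

    void addModifier(String name, Mod mod) {
        modifiers.computeIfAbsent(name, k -> new ArrayList<>()).add(mod);
        for (String read : mod.reads) {
            dependents.computeIfAbsent(read, k -> new LinkedHashSet<>()).add(name);
        }
        solve(name);
    }

    /** Recalculates one variable and, if it changed, immediately recalculates its dependents. */
    private void solve(String name) {
        solveCount++;
        List<Mod> ordered = new ArrayList<>(modifiers.getOrDefault(name, new ArrayList<>()));
        ordered.sort(Comparator.comparingInt((Mod m) -> m.order));
        int value = 0;
        for (Mod m : ordered) {
            value = m.calc.apply(value, this);
        }
        int before = get(name);
        store.put(name, value);
        if (before != value) {
            for (String dependent : new ArrayList<>(dependents.getOrDefault(name, new LinkedHashSet<>()))) {
                solve(dependent);
            }
        }
    }

    static AggressiveSolverDemo runWalkthrough() {
        AggressiveSolverDemo m = new AggressiveSolverDemo();
        m.addModifier("Fingers", set((v, s) -> 5));                                     // a
        m.addModifier("Hands", set((v, s) -> s.get("Fingers") / 5, "Fingers"));         // b
        m.addModifier("Fingers", add(5));                                               // c
        m.addModifier("Feet", set((v, s) -> s.get("Toes") / 5, "Toes"));                // d
        m.addModifier("Appendages", set((v, s) -> s.get("Fingers") + s.get("Toes")
                + s.get("Hands") + s.get("Feet"), "Fingers", "Toes", "Hands", "Feet")); // e
        m.addModifier("Toes", add(10));                                                 // f
        return m;
    }

    public static void main(String[] args) {
        AggressiveSolverDemo m = runWalkthrough();
        System.out.println("Fingers=" + m.get("Fingers") + " Hands=" + m.get("Hands")
                + " Toes=" + m.get("Toes") + " Feet=" + m.get("Feet")
                + " Appendages=" + m.get("Appendages"));
        System.out.println("Recalculations performed: " + m.solveCount);
    }
}
```

Because dependents are recalculated in whatever order they were registered, Appendages is solved more than once after Toes changes. That extra work is the cost being weighed against a topological sort.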

Variable Lifecycle Walkthrough

Prior to Load

Prior to any Load, there is a minimum of definition that has to be performed to set up the global scope. First we need to define the global scope, then we need to instantiate it. This would occur as:

globalScopeDef = scopeLib.getScopeDefinition(scopeLib.getMasterScopeDefinition(), "Global Variables");
globalScope = scopeLib.instantiateScope(scopeLib.getMasterScope(), globalScopeDef);

During Load - Administrative

During the Load of data, we need to take on the administrative task of ensuring that the "Active" scope is properly identified. The precise details are not defined here, but for purposes of local variables in Equipment/EqMods, we would need something like:

eqScopeDef = scopeLib.getScopeDefinition(globalScopeDef, "Equipment Variables");

For each Equipment line we encounter, we would then do something like:

e.putObject(ObjectKey.VARIABLE_SCOPE, scopeLib.instantiateScope(globalScope, eqScopeDef));

Why into the Equipment? Because we can't reliably know the key (due to the KEY: token), so we can't reliably store it externally from the Equipment.

Encountering a Definition

The first thing that will occur is encountering a definition:


Inside of the code that processes definitions (in reality probably within VariableContext) we take certain action:

valid = scopeLib.assertVariableScope("Legs", globalScopeDef);

This would check to see if "Legs" had already been used in a scope that was incompatible with the Global Scope (return of true indicates the usage is safe).

It could also be local:

valid = scopeLib.assertVariableScope("MaxModsAllowed", localScopeDef);

Note that the management of the "localScopeDef" is not much different than other "semi-global" information we already store on objects as we parse through data (e.g. sourceURI), so this is not terribly problematic to manage.

Encountering usage

This is usage in a formula such as:


This will validate the usage is legal:

//Parse formula into tree
semantics = formulaManager.isValid(tree);

This can be checked for whether it is valid, as well as ensure that it returns a Number (which is what is required in this situation). Note that implicitly during this operation, the variables were checked with the ScopeLibrary:

scopeLib.isLegalVariableID("Legs", activeScopeDef);
scopeLib.isLegalVariableID("Arms", activeScopeDef);

If this were used in Equipment, then activeScopeDef would have been eqScopeDef... but since that will check for local items and then defer to its parent (globalScopeDef), these would both be identified as legal.

As a note, this is the scope definition since we are at LST load and therefore can't ensure that we know a good activeScope... so we validate against the definition. While we could potentially pass in a "partially built" scope, it is MUCH safer to simply check validity against the definition rather than against a scope (since treating the scope as active may set incorrect expectations).


When it's time to actually solve a formula, it is sent to the FormulaManager:


This will implicitly check with scopeLib to get the appropriate VariableID objects:

scopeLib.getVariableID(activeScope, "Legs");

Note that we are now using the Scope of the object, so perhaps eq.getObject(ObjectKey.FORMULA_SCOPE) ... we can do this at runtime since we will have a concrete and fully formed scope. Note also that the scopeLib is smart enough to detect local vs global vars (each Scope has a parent and will check the parent to ensure that the appropriate VariableID is returned). In this case, it would be something like "Global:Legs".

Key Items for the Data Team to consider

There are a number of things for the data team to consider that are summarized here. Some of these are related to syntax to ensure the data team is considering items in preparation for the PROPOSAL discussion, not to attempt to answer them at this time.

Locking Variables

Is there any use case where we need to "lock" variables, where we *really* *really* wouldn't want to let data (data outside the game mode) modify something?

Priority Standards

From the capping value use case:

It might be advisable for the data team to set a "limit number" in the data (such as a million) that signifies a base rule limitation and at that point folks would know that any priority they have to set over that number is signifying that they are bending the base rules.... and any that don't should be kept under that value. That way, the numbers can communicate something to the data developer, just like the standards for KEY can communicate things if you know the syntax.