Grammar
The work on the Excel grammar here is an extension of my HonoursProject and is mainly targeted at the structure of the Excel sheets (cells, formula references, etc). It is currently a work in progress and corrections/additions are welcomed (see http://www.mcs.vuw.ac.nz/~elvis/db/contacts.html).
BNF Example
To parse a formula string into a List of tokens in [RPN] order something like the following would be used. (Inspired by "Lets Build a Compiler", by Jack Crenshaw and extracted from source code of [POI>HSSF].)
[BNF] for the formula expression is : <expression> ::= <term> [<addop> <term>]* <term> ::= <factor> [ <mulop> <factor> ]* <factor> ::= <number> | (<expression>) | <cellRef> | <function> <function> ::= <functionName> ([expression [, expression]*])
Formula
Formula ::= "=" <Expression> ArrayFormula? ::= "=" "{" <Expression> "}" An array formula is a formula that performs multiple calculations on one or more sets of values, and then returns either a single result or multiple results. Array formulas are enclosed between braces { }, and are entered by pressing CTRL+SHIFT+ENTER. Expression ::= <Primitive> | <Cell> | <Function> | ... Formulas can refer to cells or ranges of cells, or to names or labels that represent cells or ranges
ReferencePrefix? ::= (<Workbook> | <Sheet> | ε ) <Reference> ε represents the empty string/null
Reference ::= <Cell> | <AreaReference?> AreaReference? ::= <Range> | <Vector> | <Intersection> | <Union>
Workbook ::= "["workbook_path/filename.xls"]" <Sheet> Sheet ::= <Sheetname>"!" Sheetname ::= <The name of each worksheet.> e.g. Sheet1, Sheet2, Sheet3, ...
3DSheetReference? ::= <Sheetname><UnionOperator?><Sheet><Cell>
E.g. =COUNTA(Sheet1:Sheet3!A1)
Cell ::= ["$"]<Column>["$"]<Row> The $ symbol denotes an absolute reference to that dimension/axis. Row ::= x | x ε Int, 0 < x ≤ 65536 Row numbers are in [1..65536]
Column ::= [A-Z] | [A-H][A-Z] | I[A-Z] Here [A-Z] is interupted like a regular expression i.e. Any character between 'A' and 'Z'. Columns use letters (A through IV, for a total of 256 columns) for labels.
Absolute/Relative? references
Absolute Column, Absolute Row, $A$1, is an absolute cell reference
Relative Column, Absolute Row, A$1, is a mixed cell reference
Absolute Column, Relative Row, $A1, is a mixed cell reference
Relative Column, Relative Row, A1, is a relative cell reference (default style)
Pressing the F4 key in Excel, while typing the cell address, will cycle it through the four variations.
R1C1 references
[Ref]. To toggle between A1 and R1C1 style referencing Tools > Options > General tab > R1C1 reference style.
In R1C1 style, Microsoft Excel indicates the location of a cell with an "R" followed by a row number and a "C" followed by a column number. For example, the absolute cell reference R1C1 is equivalent to the absolute reference $A$1 in A1 reference style. If the active cell is A1, the relative cell reference R[1]C[1] refers to the cell one row down and one column to the right, or B2.
The following are examples of references in R1C1 style.
Reference Meaning
R[-2]C A relative reference to the cell two rows up and in the same column R[2]C[2] A relative reference to the cell two rows down and two columns to the right R2C2 An absolute reference to the cell in the second row and in the second column R[-1] A relative reference to the entire row above the active cell R An absolute reference to the current row
AreaReference? operators
Range ::= <Cell><RangeOperator?><Cell>
Vector ::= <ColumnRange?> | <RowRange?> It is possible to refer to a single column(B:B) or row(5:5) range (known as a vector) in Excel.
ColumnRange? ::= <Column><RangeOperator?><Column> The first column reference should be ≤ to the second column reference.
RowRange? ::= <Row><RangeOperator?><Row> The first row reference should be ≤ to the second row reference.
Intersection ::= <AreaReference?><IntersectionOperator?><AreaReference?> Two ranges seperated by a single space.
Union ::= <AreaReference?><UnionOperator?><AreaReference?> Reference operators combine ranges of cells for calculations.
Operators
(Based on [WSFF] and help files from [EXCEL])
Operators specify the type of calculation that you want to perform on the elements of a formula. Microsoft Excel includes four different types of calculation operators: arithmetic, comparison, text, and reference.
Operator ::= <ComparrisonOperator?> | <ArithmaticOperator?> | <ConcatinationOperator?>
ComparrisonOperator? ::= ">=" | "<=" | "≥" | "≤" | "<>" | "=" Comparison operators compare two values and then produce the logical value TRUE or FALSE. (greater than sign) Greater than. E.g. A1>B1 (less than sign) Less than. E.g. A1<B (greater than or equal to sign) Greater than or equal to. E.g. A1>=B1 (less than or equal to sign) Less than or equal to. E.g. A1<=B1 (not equal to sign) Not equal to. E.g. A1<>B1 (equal sign) Equal to. E.g. A1=B1
ArithmaticOperator? ::= <ExpressionOperator?> | <TermOperator?> | <ExponentiationOperator?> | <PercentageOperator?> Arithmetic operators perform basic mathematical operations such as addition, subtraction, or multiplication; combine numbers; and produce numeric results.
ExpressionOperator? ::= "+" | "-" Arithmetic operators perform basic mathematical operations such as addition, subtraction, or multiplication; combine numbers; and produce numeric results. + (plus sign) Addition. E.g. 3+3 – (minus sign) Subtraction. E.g. 3–1
TermOperator? ::= "*" | "/" * (asterisk) Multiplication. E.g. 3*3 / (forward slash) Division. E.g. 3/3
NegationOperator? ::= "-" – (minus sign) Negation. E.g. –1
ExponentiationOperator? ::= "^" ^ (caret) Exponentiation. E.g. 3^2
PercentageOperator? ::= "%" % (percent sign) Percent. E.g. 20%
ConcatinationOperator? ::= "&" (ampersand) Connects, or concatenates, two values to produce one continuous text value. E.g. "North" & "wind" produces "Northwind"
ReferenceOperator? ::= <RangeOperator?> | <UnionOperator?> | <IntersectionOperator?> Reference operators combine ranges of cells for calculations.
RangeOperator? ::= ":" (colon) produces one reference to all the cells between two references, including the two references. E.g. B5:B15
UnionOperator? ::= "," (comma) combines multiple references into one reference. E.g. SUM(B5:B15,D5:D15)
IntersectionOperator? ::= " " (single space) produces one reference to cells common to two references. E.g. SUM(B5:B15 A7:D7) In this example, cell B7 is common to both ranges.
Operator Precedence
Based on [Excel Tutorial - SC 2000 - Basic Math] ([local]) and [Microsoft support].
When you combine several operators into a single formula, Microsoft Excel performs the operations in the following order:
- : Range RangeOperator?
- space Intersection IntersectionOperator?
- , Union UnionOperator?
- - Negation NegationOperator?
- % Percentage PercentageOperator?
- ^ Exponentiation ExponentiationOperator?
- * Multiplication or / Division ExpressionOperator?
- + Addition or - Subtraction TermOperator?
- & Text Operator ConcatinationOperator?
- = < > <= >= <> Comparison Operators ComparrisonOperator?
In Excel 97 Help: operators, evaluation order in formulas "The order in which Microsoft Excel performs operations in formulas" topic: operators, precedence...
Note: (From http://www.faqs.org/faqs/spreadsheets/faq/)
10.13 Why is =-1^2 positive and =0-1^2 negative?
nonstandard operator precedence -- the `unary minus' has a high precedence, as normal, but the `exponentiation operator' has a higher one, which is NOT normal.
Excel treats -1^2 as (-1)^2, while most languages would as -(1^2). (you can ALWAYS use parentheses to force either interpretation.)
Note that Excel formulas and VBA formulas disagree on the order.
Primitives
Primitive ::= Number | Boolean | String | Date
Functions
To get list of functions in catagory. Find in help for "about function".
Cut and paste list of functions to Word and do replace using ^wworksheet function^p with |
Then do replace using ^t with nothing
Function ::= (Function_LOGICAL | Function_FINANCIAL | Function_STATISTICAL | Function_MATHANDTRIG | Function_DATABASE | Function_DATEANDTIME | Function_ENGINEERING | Function_TEXT | Function_INFORMATION | Function_LOOKUPANDREFERENCE)
Function_LOOKUPANDREFERENCE ::= ("ADDRESS" | "AREAS" | "CHOOSE" | "COLUMN" | "COLUMNS" | "HLOOKUP" | "HYPERLINK" | "INDEX" | "INDIRECT" | "LOOKUP" | "MATCH" | "OFFSET" | "ROW" | "ROWS" | "TRANSPOSE" | "VLOOKUP")
Function_INFORMATION ::= ("CELL" | "COUNTBLANK" | "ERROR.TYPE" | "INFO" | "ISBLANK" | "ISERR" | "ISERROR" | "ISEVEN" | "ISLOGICAL" | "ISNA" | "ISNONTEXT" | "ISNUMBER" | "ISODD" | "ISREF" | "ISTEXT" | "N" | "NA" | "TYPE")
Function_FINANCIAL ::= ("ACCRINT" | "ACCRINTM" | "AMORDEGRC" | "AMORLINC" | "COUPDAYBS" | "COUPDAYS" | "COUPDAYSNC" | "COUPNCD" | "COUPNUM" | "COUPPCD" | "CUMIPMT" | "CUMPRINC" | "DB" | "DDB" | "DISC" | "DOLLARDE" | "DOLLARFR" | "DURATION" | "EFFECT" | "FV" | "FVSCHEDULE" | "INTRATE" | "IPMT" | "IRR" | "MDURATION" | "MIRR" | "NOMINAL" | "NPER" | "NPV" | "ODDFPRICE" | "ODDFYIELD" | "ODDLPRICE" | "ODDLYIELD" | "PMT" | "PPMT" | "PRICE" | "PRICEDISC" | "PRICEMAT" | "PV" | "RATE" | "RECEIVED" | "SLN" | "SYD" | "TBILLEQ" | "TBILLPRICE" | "TBILLYIELD" | "VDB" | "XIRR" | "XNPV" | "YIELD" | "YIELDDISC" | "YIELDMAT")
Function_ENGINEERING ::= ("BESSELI" | "BESSELJ" | "BESSELK" | "BESSELY" | "BIN2DEC" | "BIN2HEX" | "BIN2OCT" | "COMPLEX" | "CONVERT" | "DEC2BIN" | "DEC2HEX" | "DEC2OCT" | "DELTA" | "ERF" | "ERFC" | "GESTEP" | "HEX2BIN" | "HEX2DEC" | "HEX2OCT" | "IMABS" | "IMAGINARY" | "IMARGUMENT" | "IMCONJUGATE" | "IMCOS" | "IMDIV" | "IMEXP" | "IMLN" | "IMLOG10" | "IMLOG2" | "IMPOWER" | "IMPRODUCT" | "IMREAL" | "IMSIN" | "IMSQRT" | "IMSUB" | "IMSUM" | "OCT2BIN" | "OCT2DEC" | "OCT2HEX")
Function_DATEANDTIME ::= ("DATE" | "DATEVALUE" | "DAY" | "DAYS360" | "EDATE" | "EOMONTH" | "HOUR" | "MINUTE" | "MONTH" | "NETWORKDAYS" | "NOW" | "SECOND" | "TIME" | "TIMEVALUE" | "TODAY" | "WEEKDAY" | "WORKDAY" | "YEAR" | "YEARFRAC")
Function_DATABASE ::= ("DAVERAGE" | "DCOUNT" | "DCOUNTA" | "DGET" | "DMAX" | "DMIN" | "DPRODUCT" | "DSTDEV" | "DSTDEVP" | "DSUM" | "DVAR" | "DVARP" | "GETPIVOTDATA")
Function_MATHANDTRIG ::= ("ABS" | "ACOS" | "ACOSH" | "ASIN" | "ASINH" | "ATAN" | "ATAN2" | "ATANH" | "CEILING" | "COMBIN" | "COS" | "COSH" | "COUNTIF" | "DEGREES" | "EVEN" | "EXP" | "FACT" | "FACTDOUBLE" | "FLOOR" | "GCD" | "INT" | "LCM" | "LN" | "LOG" | "LOG10" | "MDETERM" | "MINVERSE" | "MMULT" | "MOD" | "MROUND" | "MULTINOMIAL" | "ODD" | "PI" | "POWER" | "PRODUCT" | "QUOTIENT" | "RADIANS" | "RAND" | "RANDBETWEEN" | "ROMAN" | "ROUND" | "ROUNDDOWN" | "ROUNDUP" | "SERIESSUM" | "SIGN" | "SIN" | "SINH" | "SQRT" | "SQRTPI" | "SUBTOTAL" | "SUMIF"+")" | "("+"SUM" | "SUMPRODUCT" | "SUMSQ" | "SUMX2MY2" | "SUMX2PY2" | "SUMXMY2" | "TAN" | "TANH" | "TRUN")
Function_STATISTICAL ::= ("AVEDEV" | "AVERAGE" | "AVERAGEA" | "BETADIST" | "BETAINV" | "BINOMDIST" | "CHIDIST" | "CHIINV" | "CHITEST" | "CONFIDENCE" | "CORREL" | "COUNT" | "COUNTA" | "COVAR" | "CRITBINOM" | "DEVSQ" | "EXPONDIST" | "FDIST" | "FINV" | "FISHER" | "FISHERINV" | "FORECAST" | "FREQUENCY" | "FTEST" | "GAMMADIST" | "GAMMAINV" | "GAMMALN" | "GEOMEAN" | "GROWTH" | "HARMEAN" | "HYPGEOMDIST" | "INTERCEPT" | "KURT" | "LARGE" | "LINEST" | "LOGEST" | "LOGINV" | "LOGNORMDIST" | "MAX" | "MAXA" | "MEDIAN" | "MIN" | "MINA" | "MODE" | "NEGBINOMDIST" | "NORMDIST" | "NORMINV" | "NORMSDIST" | "NORMSINV" | "PEARSON" | "PERCENTILE" | "PERCENTRANK" | "PERMUT" | "POISSON" | "PROB" | "QUARTILE" | "RANK" | "RSQ" | "SKEW" | "SLOPE" | "SMALL" | "STANDARDIZE" | "STDEV" | "STDEVA" | "STDEVP" | "STDEVPA" | "STEYX" | "TDIST" | "TINV" | "TREND" | "TRIMMEAN" | "TTEST" | "VAR" | "VARA" | "VARP" | "VARPA" | "WEIBULL" | "ZTEST")
Function_TEXT ::= ("CHAR" | "CLEAN" | "CODE" | "CONCATENATE" | "DOLLAR" | "EXACT" | "FIND" | "FIXED" | "LEFT" | "LEN" | "LOWER" | "MID" | "PROPER" | "REPLACE" | "REPT" | "RIGHT" | "SEARCH" | "SUBSTITUTE" | "T" | "TEXT" | "TRIM" | "UPPER" | "VALUE")
Function_LOGICAL ::= ("IF" | "AND" | "OR" | "NOT" | "TRUE" | "FALSE");
[Markus Clermont's] Grammar
Spreadsheet ::= Row, {col, Row}, col Row ::=(Formula|Number|string), {;,Row} Number ::=['-'], digit, {digit}, [',', digit, {digit}] Formula ::= '=', FormulaExpression? FormulaExpression? ::= Expression, {('+'|'-'|'*'|'/'|'^'), Expression} CellAddress? ::= 'R',('$' AbsCord?)|RelCord?, 'C',('$' AbsCord?)|RelCord? RelCord? ::= ['-'],digit,{digit} AbsCord? ::= digit, {digit} LogOp ::= '>' | '<' | '=' | '!=' | '>=' | '<=' LogicalExpression? ::= FormulaExpression? [FormulaExpression] LogArgList? ::= LogicalExpression? ';' LogicalExpression? FunctionCall? ::= ('IF(',LogicalExpression?, ';', FormulaExpression?, [';',FormulaExpression?],')' | (GroupFkt?'(',RArgList?,')') | ('NOT(', LogicalExpression?, ')') | (Function, '(', ArgList? ')') | (LogFunc?'(', LogArgList? ')') | ('SVerweis(', FormulaExpression?, ';', CellAddress?, [':', CellAddress?], ';', FormulaExpression?, ';' FormulaExpression?, ')') GroupFkt? ::= 'SUM'|'MAX'|'MIN'|... Function ::= ... LogFunc? ::='Or'|'And'|... ArgList? ::=FormulaExpression?, {';', FormulaExpression?} RArgList? ::= RArgument, [';' RArgument] RArgument ::= RangeReference? | FormulaExpression? RangeRefernece? ::= CellAddress?, ':', CellAddress?
Example Formulas
- =1
- =1+1
- =A1
- =$B$2
- =SUM(B5:B15)
- =SUM(B5:B15,D5:D15)
- =SUM(B5:B15 A7:D7)
- =SUM(sheet1!$A$1:$B$2)
- =[data.xls]sheet1!$A$1
- =SUM((A:A 1:1))
- =SUM((A:A,1:1))
- =SUM((A:A A1:B1))
- The Function SUM, with a Intersection between a Column and a Range as the argument.
- =SUM((D9:D11,(E9:E11,F9:F11)))
- =IF(P5=1.0,"NA",IF(P5=2.0,"A",IF(P5=3.0,"B",IF(P5=4.0,"C",IF(P5=5.0,"D",IF(P5=6.0,"E",IF(P5=7.0,"F",IF(P5=8.0,"G"))))))))
- From an actual spreadsheet.
- ={SUM(B2:D2*B3:D3)}
- This is an ArrayFormula?
Quotes
"Although the multidensional spreadsheet has attracted a lot of interest since it empowers the end user to analyze business data, this has not replaced traditional analysis by means of a managed query environment." - [COMP442:ODWOT]
Help Stuff from XP
You can use Auto Fill Options to choose options for how to fill the selection. For example, you can choose to Fill Formatting Only or Fill Without Formatting.
If a cell contains a formula that breaks one of the rules, a triangle appears in the top-left corner of the cell. You can prevent these indicators from being displayed. Clicking in these cells will display the trace error icon.
Error detection rules
Formula omits cells in region
The formula may not include a correct reference. If a formula refers to a range of cells, and you add cells to the bottom or right of that range, the references may no longer be correct. The formula does not always automatically update its reference to include the new cells. This rule compares the reference in a formula against adjacent cells. If the adjacent cells contain more numbers (are not blank cells), then the problem is noted.
For example, the formula =SUM(A2:A4) would be noted with this rule, because A6, A7, and A8 are adjacent, and contain data.
A 1| Invoice 2| 15,000 3| 9,000 4| 8,000 5| 20,000 6| 5,000 7| 22,500 8| =SUM(A2:A4)
Errors, Checkers, Auditors
[FishBrain::SpreadsheetAuditing]
(From http://www.faqs.org/faqs/spreadsheets/faq/)
Auditor tools and books
The Excel Auditor [XLAUD] Spreadsheet Detective [SSD]
Professional's Guide to Robust Spreadsheets http://www.manning.com/Richardson/ http://www.squ.edu.om/agr/OnlineCourses/Excel/Formul2.html
Error value Meaning #DIV/0! The formula or function is trying a division by 0. Which is mathematically undetermined. Check for values = 0. One of the references in the formula may be to a blank or empty cell. In this case, change the reference or enter a value other than zero. #N/A Refers to a value that is not available, or non existing ( a blank cell for instance) #NAME? Uses a name that EXCEL does not regognize. #NULL! Specifies an invalid intersection of two areas. #NUM! Uses a number incorrectly for instant in the argument of a function. It is also displayed if the numbers are too big or too small to be understood by EXCEL. #REF! Refers to a cell that is not valid (non existing cell for instance). #VALUE! Uses an incorrect argument or operant. ##### Produces a result too long to be displayed in the cell. This is not really an error: it indicates the the cell needs to be wider or of a different format.
Notes and stuff
Google discussion group: http://groups.google.com/groups?group=comp.apps.spreadsheets
"The shortcut key to cell names is Ctrl + F3." [OOPS] To make Excel calculate a circular reference, set iteration to on. Use Tools|Options|Calculation|Iteration. Also set the Maximum Iterations number to 50 or more.
Extract from http://groups.google.com Subject: Formula- Grammar Newsgroup: comp.apps.spreadsheets Harlan Grove (hrlngrv@aol.com) Subject: Re: Formula- Grammar Newsgroups: comp.apps.spreadsheets View this article only Date: 2001-09-06 11:44:32 PST Markus CLERMONT <mark@isys.uni-klu.ac.at> wrote... >I'm doing currently some research in debugging excel-spreadsheets. >As we are currently developping a tool for detecting errors in >spreadsheets, we have to parse Excel-Formulas. Therefore I'd >need a formula-grammar for excel-formulas. Meaning a BNF-style grammar? I don't think there is one. For the most part it's very basic: 5 infix arithmetic operators (+, -, *, /, ^), 6 infix comparison operators (=, <>, <, <=, >, >=), locale-varying argument separator and decimal 'point' (usually , or ; and . or ,) taken from Windows' Regional Settings (that alone would make a BNF grammar rather a tricky thing to write), numeric constants, string constants, error constants (all begin with #), function calls as function-name(argument-list), and then there's the harder stuff. * range references, including collections, intersections and 3D blocks * array constants * whitespace in limited instances * no more than 7 levels of formula nesting * no argument list longer than 29 arguments * anything not interpreted as one of the items above that matches the regular expression [_A-Za-z][_0-9A-Za-z]* but not 'R' or 'C' would be interpreted as defined names Not sure why you need this. Excel won't allow users to enter syntactically invalid formulas, so you'll never find any cell formulas in Excel workbooks for which you'd need a grammar to check them. If the tool you're developing would locate argument type errors in function calls within cell formulas, that's problematic. In may instances Excel can use numeric and text values interchangeably, like awk or perl. So likely all you really need is a means of identifying function calls within formulas. The syntax isn't materially different from C. Function names match the regular expression [A-Za-z][_0-9A-Za-z]* (no restrictions, but several built-in functions), function names are immediately followed by left parenthesis, then there's the argument list, and finally the closing right parenthesis. So the regular expression [A-Za-z][_0-9A-Za-z]*[(] (assuming you're not inside a double-quoted string) matches the beginning of any function call.
UML Spreadsheet Strucutre
Links
Consider using a parser generator [[JavaCC]] or [JLex] to automatically create code directly from the definition.
[SMSSE] [WFE] [MRX] [UYCS] [ECS] [FD] [DSGPASL] [WSSF] [FishBrain :: SpreadsheetMetrics]
Last updated 26-Jul-2006, Contact Page Author.