## Wednesday 3 May 2017

### LEX YACC program to check / recognize valid Arithmetic Expression

In this post, we will see LEX and YACC programs to check or recognize valid arithmetic expression. This expression may have operations like Addition(+), Subtraction(-), Multiplication(*), Division(/) or Modulus(%). Expression may contain balanced round brackets.
Explanation of the program, is given at the end of this post.
Go through the following programs:

Lex Program: (sample.l)

%{
#include<stdio.h>
#include "y.tab.h"
%}

%%
[a-zA-Z]+ return VARIABLE;
[0-9]+ return NUMBER;
[\t] ;
[\n] return 0;
. return yytext[0];
%%
int yywrap()
{
return 1;
}

Yacc program: (sample.y)

%{
#include<stdio.h>
%}
%token NUMBER
%token VARIABLE

%left '+' '-'
%left '*' '/' '%'
%left '(' ')'

%%

S: VARIABLE'='E {
printf("\nEntered arithmetic expression is Valid\n\n");
return 0;
}
E:E'+'E
|E'-'E
|E'*'E
|E'/'E
|E'%'E
|'('E')'
| NUMBER
| VARIABLE
;

%%

void main()

printf("\nEnter Any Arithmetic Expression which can have operations Addition, Subtraction, Multiplication, Divison, Modulus and Round brackets:\n");
yyparse();
}

void yyerror()
{
printf("\nEntered arithmetic expression is Invalid\n\n");

}

How To Run:
>>>yacc -d sample.y
>>>lex sample.l
>>>gcc lex.yy.c y.tab.c
>>>./a.out

Output:

For Valid Expressions

For Invalid Expressions

Explanation:
LEX
LEX program consists of three sections "Definitions""Regular Expressions and action for each regular expression" and "Subroutines".

Definition section consists of C language code which involves header file inclusion, global variables declaration/definition etc. C language code can be mentioned in between the symbols %{ and %}

LEX requires regular expressions or patterns to identify token of lexemes. Examples of token are identifier, header file, constants etc. while Lexemes are the actual words used in your input program for e.g. printf, scanf, stdio.h etc. These regular expressions and action for them are mentioned in second section. When we call yylex() function, it starts the process of pattern matching. Lex keep the matched string into the address pointed by pointer yytext. Matched string's length is kept in yyleng while value of token is kept in variable yylval. In above program NUMBER and VARIABLE are tokens.

Third section consists of subroutines/functions. Lex call yywrap() function after input is over. It should return 1 when work is done or should return 0 when more processing is required. yylex() function actual starts the process of pattern matching. In above program, we have not called yylex() since yyparse() in yacc program automatically calls yylex(). Thats why there in no need to call yylex() separately. If you are writing standalone Lex program, then you have to call yylex() in main() function in Lex program.

YACC
YACC program also consists of three sections, "Definitions""Context Free Grammar and action for each production""Subroutines/Functions"

In first section, we can mention C language code which may consist of header files inclusion, global variables/ Constants definition/declaration. C language code can be mentioned in between the symbols %{ and %}.  Also we can define tokens in the first section. In above program, NUMBER is the token. We can define the associativity of the operations (i.e. left associativity or right associativity). In above yacc program, we have specified left associativity for all operators. Priorities among the operators can also be specified. It is in the increasing order "from top to bottom". For e.g. in our above yacc program, round brackets '(',')' has the higher priority than '*', '/', '%' which has higher priority than '+', '-'. Operators in the same statement have the same priority. For e.g. in our above program all of the '*', '/', '%' have the same priority.

In second section, we mention the grammar productions and the action for each production.

Third section consists of the subroutines. We have to call yyparse() to initiate the parsing process. yyerror() function is called when all productions in the grammar in second section doesn't match to the input statement.