Statements and State

All my life, my heart has yearned for a thing I cannot name.

André Breton, Mad Love

The interpreter we have so far feels less like programming a real language and more like punching buttons on a calculator. “Programming” to me means building up a system out of smaller pieces. We can’t do that yet because we have no way to bind a name to some data or function. We can’t compose software without a way to refer to the pieces.

To support bindings, our interpreter needs internal state. When you define a variable at the beginning of the program and use it at the end, the interpreter has to hold on to the value of that variable in the meantime. So in this chapter, we will give our interpreter a brain that can not just process, but remember.

State and statements go hand in hand. Since statements, by definition, don’t evaluate to a value, they need to do something else to be useful. That something is called a side effect. It could mean producing user-visible output or modifying some state in the interpreter that can be detected later. The latter makes them a great fit for defining variables or other named entities.

You could make a language that treats variable declarations as expressions that both create a binding and produce a value. The only language I know that does that is Tcl. Scheme seems like a contender, but note that after a let expression is evaluated, the variable it bound is forgotten. The define syntax is not an expression.

In this chapter, we’ll do all of that. We’ll define statements that produce output (print) and create state (var). We’ll add expressions to access and assign to variables. Finally, we’ll add blocks and local scope. That’s a lot to stuff into one chapter, but we’ll chew through it all one bite at a time.

1. Statements

We start by extending Lox’s grammar with statements. They aren’t very different from expressions. We start with the two simplest kinds:

  1. An expression statement lets you place an expression where a statement is expected. They exist to evaluate expressions that have side effects. You may not notice them, but you use them all the time in C, Java, and other languages. Any time you see a function or method call followed by a ;, you’re looking at an expression statement.

  2. A print statement evaluates an expression and displays the result to the user. I admit it’s weird to bake printing right into the language instead of making it a library function. Doing so is a concession to the fact that we’re building this interpreter one chapter at a time and want to be able to play with it before it’s all done. To make print a library function, we’d have to wait until we had all of the machinery for defining and calling functions before we could witness any side effects.

Pascal is an outlier. It distinguishes between procedures and functions. Functions return values, but procedures cannot. There is a statement form for calling a procedure, but functions can only be called where an expression is expected. There are no expression statements in Pascal.

I will note with only a modicum of defensiveness that BASIC and Python have dedicated print statements and they are real languages. Granted, Python did remove their print statement in 3.0 . . .

New syntax means new grammar rules. In this chapter, we finally gain the ability to parse an entire Lox script. Since Lox is an imperative, dynamically typed language, the “top level” of a script is simply a list of statements. The new rules are:

program      ——> statement* EOF ;

statement    ——> exprStmt
             | printStmt ;

exprStmt     ——> expression ";" ;
printStmt    ——> "print" expression ";" ;

The first rule is now program, which is the starting point for the grammar and represents a complete Lox script or REPL entry. A program is a list of statements followed by the special “end of file” token. The mandatory end token ensures the parser consumes the entire input and doesn’t silently ignore erroneous unconsumed tokens at the end of a script.

Right now, statement only has two cases for the two kinds of statements we’ve described. We’ll fill in more later in this chapter and in the following ones. The next step is turning this grammar into something we can store in memory: syntax trees.

1.1 Statement syntax trees

There is no place in the grammar where both an expression and a statement are allowed. The operands of, say, + are always expressions, never statements. The body of a while loop is always a statement.

Since the two syntaxes are disjoint, we don’t need a single base class that they all inherit from. Splitting expressions and statements into separate class hierarchies enables the Java compiler to help us find dumb mistakes like passing a statement to a Java method that expects an expression.

That means a new base class for statements. As our elders did before us, we will use the cryptic name “Stmt”. With great foresight, I have designed our little AST metaprogramming script in anticipation of this. That’s why we passed in “Expr” as a parameter to defineAst(). Now we add another call to define Stmt and its subclasses.

Not really foresight: I wrote all the code for the book before I sliced it into chapters.


// tool/GenerateAst.java, in main()

      "Unary    : Token operator, Expr right"
    ));

    defineAst(outputDir, "Stmt", Arrays.asList(
      "Expression : Expr expression",
      "Print      : Expr expression"
    ));
  }
  

The generated code for the new nodes is in Appendix II: Expression statement, Print statement.

Run the AST generator script and behold the resulting “Stmt.java” file with the syntax tree classes we need for expression and print statements. Don’t forget to add the file to your IDE project or makefile or whatever.
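To make the split hierarchy concrete, here is a hand-written sketch of roughly what the generated Stmt classes look like. It is not the generated file (Appendix II has that): the single-case Expr stand-in and the Describer visitor are hypothetical simplifications so the block compiles on its own. Because Stmt and Expr share no base class, passing a Stmt.Print to a method that expects an Expr would be rejected by javac.

```java
import java.util.List;

// Entry point first, so the file can be launched directly with `java`.
class StmtSketch {
  public static void main(String[] args) {
    List<Stmt> program = List.of(
        new Stmt.Print(new Expr.Literal("one")),
        new Stmt.Expression(new Expr.Literal(2.0)));
    Describer describer = new Describer();
    for (Stmt statement : program) {
      System.out.println(statement.accept(describer));
    }
  }
}

// Minimal stand-in for the generated Expr hierarchy: literals only.
abstract class Expr {
  static class Literal extends Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
  }
}

// Separate base class for statements, with its own visitor interface.
abstract class Stmt {
  interface Visitor<R> {
    R visitExpressionStmt(Expression stmt);
    R visitPrintStmt(Print stmt);
  }

  abstract <R> R accept(Visitor<R> visitor);

  static class Expression extends Stmt {
    final Expr expression;
    Expression(Expr expression) { this.expression = expression; }

    @Override
    <R> R accept(Visitor<R> visitor) { return visitor.visitExpressionStmt(this); }
  }

  static class Print extends Stmt {
    final Expr expression;
    Print(Expr expression) { this.expression = expression; }

    @Override
    <R> R accept(Visitor<R> visitor) { return visitor.visitPrintStmt(this); }
  }
}

// Hypothetical throwaway visitor that describes each statement as a string.
class Describer implements Stmt.Visitor<String> {
  @Override
  public String visitExpressionStmt(Stmt.Expression stmt) {
    return "expr(" + ((Expr.Literal) stmt.expression).value + ")";
  }

  @Override
  public String visitPrintStmt(Stmt.Print stmt) {
    return "print(" + ((Expr.Literal) stmt.expression).value + ")";
  }
}
```

Running it prints print(one) and then expr(2.0); the dispatch through accept() is the same double-dispatch the Expr visitor already uses.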

1.2 Parsing statements

The parser’s parse() method that parses and returns a single expression was a temporary hack to get the last chapter up and running. Now that our grammar has the correct starting rule, program, we can turn parse() into the real deal.


// lox/Parser.java, method parse(), replace 7 lines

  List<Stmt> parse() {
    List<Stmt> statements = new ArrayList<>();
    while (!isAtEnd()) {
      statements.add(statement());
    }

    return statements; 
  }

What about the code we had in here for catching ParseError exceptions? We’ll put better parse error handling in place soon when we add support for additional statement types.

This parses a series of statements, as many as it can find until it hits the end of the input. This is a pretty direct translation of the program rule into recursive descent style. We must also chant a minor prayer to the Java verbosity gods since we are using ArrayList now.


// lox/Parser.java

package com.craftinginterpreters.lox;

import java.util.ArrayList;
import java.util.List;

A program is a list of statements, and we parse one of those statements using this method:


// lox/Parser.java, add after expression()

  private Stmt statement() {
    if (match(PRINT)) return printStatement();

    return expressionStatement();
  }
  

A little bare bones, but we’ll fill it in with more statement types later. We determine which specific statement rule is matched by looking at the current token. A print token means it’s obviously a print statement.

If the next token doesn’t look like any known kind of statement, we assume it must be an expression statement. That’s the typical final fallthrough case when parsing a statement, since it’s hard to proactively recognize an expression from its first token.

Each statement kind gets its own method. First print:


// lox/Parser.java, add after statement()

  private Stmt printStatement() {
    Expr value = expression();
    consume(SEMICOLON, "Expect ';' after value.");
    return new Stmt.Print(value);
  }

Since we already matched and consumed the print token itself, we don’t need to do that here. We parse the subsequent expression, consume the terminating semicolon, and emit the syntax tree.

If we didn’t match a print statement, we must have one of these:


// lox/Parser.java, add after printStatement()

  private Stmt expressionStatement() {
    Expr expr = expression();
    consume(SEMICOLON, "Expect ';' after expression.");
    return new Stmt.Expression(expr);
  }


Similar to the previous method, we parse an expression followed by a semicolon. We wrap that Expr in a Stmt of the right type and return it.
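The parsing methods above can be exercised in isolation with a toy version of the parser. This sketch assumes tokens are plain strings ending in "EOF", replaces the real expression grammar with NUMBER ( "+" NUMBER )*, reports errors with IllegalStateException instead of ParseError, and builds strings instead of Stmt nodes; the class name ToyStatementParser is made up.

```java
import java.util.ArrayList;
import java.util.List;

class ToyStatementParser {
  private final List<String> tokens;
  private int current = 0;

  ToyStatementParser(List<String> tokens) {
    this.tokens = tokens;
  }

  // program -> statement* EOF ;
  List<String> parse() {
    List<String> statements = new ArrayList<>();
    while (!isAtEnd()) {
      statements.add(statement());
    }
    return statements;
  }

  // statement -> exprStmt | printStmt ;  (expression statement is the fallthrough)
  private String statement() {
    if (match("print")) return printStatement();
    return expressionStatement();
  }

  // printStmt -> "print" expression ";" ;  match() already consumed "print"
  private String printStatement() {
    String value = expression();
    consume(";", "Expect ';' after value.");
    return "Print(" + value + ")";
  }

  // exprStmt -> expression ";" ;
  private String expressionStatement() {
    String expr = expression();
    consume(";", "Expect ';' after expression.");
    return "Expression(" + expr + ")";
  }

  // Stand-in expression rule for this sketch: NUMBER ( "+" NUMBER )*
  private String expression() {
    String left = number();
    while (match("+")) {
      left = left + " + " + number();
    }
    return left;
  }

  private String number() {
    String token = peek();
    if (!token.matches("[0-9]+")) {
      throw new IllegalStateException("Expect expression.");
    }
    current++;
    return token;
  }

  private boolean match(String expected) {
    if (isAtEnd() || !peek().equals(expected)) return false;
    current++;
    return true;
  }

  private void consume(String expected, String message) {
    if (!match(expected)) throw new IllegalStateException(message);
  }

  private String peek() {
    return tokens.get(current);
  }

  private boolean isAtEnd() {
    return peek().equals("EOF");
  }

  public static void main(String[] args) {
    List<String> tokens = List.of("print", "1", "+", "2", ";", "3", ";", "EOF");
    System.out.println(new ToyStatementParser(tokens).parse());
  }
}
```

For the tokens of `print 1 + 2; 3;` it returns [Print(1 + 2), Expression(3)], while `print 1` followed directly by EOF fails with "Expect ';' after value.", just as printStatement() would.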

1.3 Executing statements

We’re running through the previous couple of chapters in microcosm, working our way through the front end. Our parser can now produce statement syntax trees, so the next and final step is to interpret them. As in expressions, we use the Visitor pattern, but we have a new visitor interface, Stmt.Visitor, to implement since statements have their own base class.

We add that to the list of interfaces Interpreter implements.


// lox/Interpreter.java, replace 1 line

class Interpreter implements Expr.Visitor<Object>,
                             Stmt.Visitor<Void> {
  void interpret(Expr expression) { 

Java doesn’t let you use lowercase “void” as a generic type argument for obscure reasons having to do with type erasure and the stack. Instead, there is a separate “Void” type specifically for this use. Sort of a “boxed void”, like “Integer” is for “int”.

Unlike expressions, statements produce no values, so the return type of the visit methods is Void, not Object. We have two statement types, and we need a visit method for each. The easiest is expression statements.


// lox/Interpreter.java, add after evaluate()

  @Override
  public Void visitExpressionStmt(Stmt.Expression stmt) {
    evaluate(stmt.expression);
    return null;
  }

We evaluate the inner expression using our existing evaluate() method and discard the value. Then we return null. Java requires that to satisfy the special capitalized Void return type. Weird, but what can you do?

Appropriately enough, we discard the value returned by evaluate() by placing that call inside a Java expression statement.

The print statement’s visit method isn’t much different.


// lox/Interpreter.java, add after visitExpressionStmt()

  @Override
  public Void visitPrintStmt(Stmt.Print stmt) {
    Object value = evaluate(stmt.expression);
    System.out.println(stringify(value));
    return null;
  }


Before discarding the expression’s value, we convert it to a string using the stringify() method we introduced in the last chapter and then dump it to stdout.

Our interpreter is able to visit statements now, but we have some work to do to feed them to it. First, modify the old interpret() method in the Interpreter class to accept a list of statements (in other words, a program).


// lox/Interpreter.java, method interpret(), replace 8 lines

  void interpret(List<Stmt> statements) {
    try {
      for (Stmt statement : statements) {
        execute(statement);
      }
    } catch (RuntimeError error) {
      Lox.runtimeError(error);
    }
  }
  

This replaces the old code which took a single expression. The new code relies on this tiny helper method:


// lox/Interpreter.java, add after evaluate()

  private void execute(Stmt stmt) {
    stmt.accept(this);
  }


That’s the statement analogue to the evaluate() method we have for expressions. Since we’re working with lists now, we need to let Java know.


// lox/Interpreter.java

package com.craftinginterpreters.lox;

import java.util.List;

class Interpreter implements Expr.Visitor<Object>,


The main Lox class is still trying to parse a single expression and pass it to the interpreter. We fix the parsing line like so:


// lox/Lox.java, in run(), replace 1 line

    Parser parser = new Parser(tokens);
    List<Stmt> statements = parser.parse();

    // Stop if there was a syntax error.


And then replace the call to the interpreter with this:


// lox/Lox.java, in run(), replace 1 line

    if (hadError) return;

    interpreter.interpret(statements);
  }
  

Basically just plumbing the new syntax through. OK, fire up the interpreter and give it a try. At this point, it’s worth sketching out a little Lox program in a text file to run as a script. Something like:


print "one";
print true;
print 2 + 1;

It almost looks like a real program! Note that the REPL, too, now requires you to enter a full statement instead of a simple expression. Don’t forget your semicolons.
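Putting the interpreter pieces together, the following self-contained sketch runs the same three-statement script. To stay short it assumes a reduced AST (literals plus a numeric Add node standing in for Expr.Binary), writes output to a StringBuilder instead of System.out so the result can be checked, and omits the RuntimeError handling that interpret() has. The class InterpreterSketch and the Add node are hypothetical.

```java
import java.util.List;

// Runs the sample script: print "one"; print true; print 2 + 1;
class InterpreterSketch {
  public static void main(String[] args) {
    StringBuilder out = new StringBuilder();
    new Interpreter(out).interpret(List.of(
        new Stmt.Print(new Expr.Literal("one")),
        new Stmt.Print(new Expr.Literal(true)),
        new Stmt.Print(new Expr.Add(new Expr.Literal(2.0), new Expr.Literal(1.0)))));
    System.out.print(out);
  }
}

// Reduced expression AST: literals plus a numeric "+" (no type checks).
abstract class Expr {
  interface Visitor<R> {
    R visitLiteralExpr(Literal expr);
    R visitAddExpr(Add expr);
  }
  abstract <R> R accept(Visitor<R> visitor);

  static class Literal extends Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
    @Override <R> R accept(Visitor<R> visitor) { return visitor.visitLiteralExpr(this); }
  }

  static class Add extends Expr {
    final Expr left;
    final Expr right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override <R> R accept(Visitor<R> visitor) { return visitor.visitAddExpr(this); }
  }
}

// Statement AST with its own base class and visitor interface.
abstract class Stmt {
  interface Visitor<R> {
    R visitExpressionStmt(Expression stmt);
    R visitPrintStmt(Print stmt);
  }
  abstract <R> R accept(Visitor<R> visitor);

  static class Expression extends Stmt {
    final Expr expression;
    Expression(Expr expression) { this.expression = expression; }
    @Override <R> R accept(Visitor<R> visitor) { return visitor.visitExpressionStmt(this); }
  }

  static class Print extends Stmt {
    final Expr expression;
    Print(Expr expression) { this.expression = expression; }
    @Override <R> R accept(Visitor<R> visitor) { return visitor.visitPrintStmt(this); }
  }
}

class Interpreter implements Expr.Visitor<Object>, Stmt.Visitor<Void> {
  private final StringBuilder out; // assumption: collect output instead of System.out

  Interpreter(StringBuilder out) { this.out = out; }

  void interpret(List<Stmt> statements) {
    for (Stmt statement : statements) {
      execute(statement);
    }
  }

  private void execute(Stmt stmt) { stmt.accept(this); }

  private Object evaluate(Expr expr) { return expr.accept(this); }

  @Override
  public Object visitLiteralExpr(Expr.Literal expr) { return expr.value; }

  @Override
  public Object visitAddExpr(Expr.Add expr) {
    return (double) evaluate(expr.left) + (double) evaluate(expr.right);
  }

  @Override
  public Void visitExpressionStmt(Stmt.Expression stmt) {
    evaluate(stmt.expression); // evaluated for side effects; the value is discarded
    return null;               // null is the only value of type Void
  }

  @Override
  public Void visitPrintStmt(Stmt.Print stmt) {
    out.append(stringify(evaluate(stmt.expression))).append('\n');
    return null;
  }

  // Modeled on the earlier stringify(): null prints as nil, integral doubles drop ".0".
  private String stringify(Object value) {
    if (value == null) return "nil";
    String text = value.toString();
    if (value instanceof Double && text.endsWith(".0")) {
      text = text.substring(0, text.length() - 2);
    }
    return text;
  }
}
```

The output is one, true, and 3 on separate lines, matching what the real interpreter prints for the script above.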

2. Global Variables

Now that we have statements, we can start working on state. Before we get into all of the complexity of lexical scoping, we’ll start off with the easiest kind of variables: globals. We need two new constructs.

  1. A variable declaration statement brings a new variable into the world.

    var beverage = "espresso";
    

    This creates a new binding that associates a name (here “beverage”) with a value (here, the string "espresso").

  2. Once that’s done, a variable expression accesses that binding. When the identifier “beverage” is used as an expression, it looks up the value bound to that name and returns it.

    
    print beverage; // "espresso".
    
    

Later, we’ll add assignment and block scope, but that’s enough to get moving.

Global state gets a bad rap. Sure, lots of global state (especially mutable state) makes it hard to maintain large programs. It’s good software engineering to minimize how much you use.

But when you’re slapping together a simple programming language or, heck, even learning your first language, the flat simplicity of global variables helps. My first language was BASIC and, though I outgrew it eventually, it was nice that I didn’t have to wrap my head around scoping rules before I could make a computer do fun stuff.

2.1 Variable syntax

As before, we’ll work through the implementation from front to back, starting with the syntax. Variable declarations are statements, but they are different from other statements, and we’re going to split the statement grammar in two to handle them. That’s because the grammar restricts where some kinds of statements are allowed.

The clauses in control flow statements (think the then and else branches of an if statement or the body of a while) are each a single statement. But that statement is not allowed to be one that declares a name. This is OK:

if (monday) print "Ugh, already?";

But this is not:

if (monday) var beverage = "espresso";

We could allow the latter, but it’s confusing. What is the scope of that beverage variable? Does it persist after the if statement? If so, what is its value on days other than Monday? Does the variable exist at all on those days?

Code like this is weird, so C, Java, and friends all disallow it. It’s as if there are two levels of “precedence” for statements. Some places where a statement is allowed, like inside a block or at the top level, allow any kind of statement, including declarations. Others allow only the “higher” precedence statements that don’t declare names.

In this analogy, block statements work sort of like parentheses do for expressions. A block is itself in the “higher” precedence level and can be used anywhere, like in the clauses of an if statement. But the statements it contains can be lower precedence. You’re allowed to declare variables and other names inside the block. The curlies let you escape back into the full statement grammar from a place where only some statements are allowed.

To accommodate the distinction, we add another rule for kinds of statements that declare names:


program       ——> declaration* EOF ;

declaration   ——> varDecl
              |   statement ;

statement     ——> exprStmt
              |   printStmt ;

Declaration statements go under the new declaration rule. Right now, it’s only variables, but later it will include functions and classes. Any place where a declaration is allowed also allows non-declaring statements, so the declaration rule falls through to statement. Obviously, you can declare stuff at the top level of a script, so program routes to the new rule.

The rule for declaring a variable looks like:

varDecl       ——> "var" IDENTIFIER ( "=" expression )? ";" ;

Like most statements, it starts with a leading keyword. In this case, var. Then an identifier token for the name of the variable being declared, followed by an optional initializer expression. Finally, we put a bow on it with the semicolon.

To access a variable, we define a new kind of primary expression.


primary       ——> "true" | "false" | "nil"
              |   NUMBER | STRING
              |   "(" expression ")"
              |   IDENTIFIER ;

That IDENTIFIER clause matches a single identifier token, which is understood to be the name of the variable being accessed.

These new grammar rules get their corresponding syntax trees. Over in the AST generator, we add a new statement node for a variable declaration.


// tool/GenerateAst.java, in main(), add “,” to previous line

      "Expression : Expr expression",
      "Print      : Expr expression",
      "Var        : Token name, Expr initializer"
    ));

It stores the name token so we know what it’s declaring, along with the initializer expression. (If there isn’t an initializer, that field is null.)

Then we add an expression node for accessing a variable.


// tool/GenerateAst.java, in main(), add “,” to previous line

      "Literal  : Object value",
      "Unary    : Token operator, Expr right",
      "Variable : Token name"
    ));

It’s simply a wrapper around the token for the variable name. That’s it. As always, don’t forget to run the AST generator script so that you get updated “Expr.java” and “Stmt.java” files.

2.2 Parsing variables

Before we parse variable statements, we need to shift around some code to make room for the new declaration rule in the grammar. The top level of a program is now a list of declarations, so the entrypoint method to the parser changes.


// lox/Parser.java, in parse(), replace 1 line

  List<Stmt> parse() {
    List<Stmt> statements = new ArrayList<>();
    while (!isAtEnd()) {
      statements.add(declaration());
    }

    return statements; 
  }
  

That calls this new method:


// lox/Parser.java, add after expression()

  private Stmt declaration() {
    try {
      if (match(VAR)) return varDeclaration();

      return statement();
    } catch (ParseError error) {
      synchronize();
      return null;
    }
  }
  

Hey, do you remember way back in that earlier chapter when we put the infrastructure in place to do error recovery? We are finally ready to hook that up.

This declaration() method is the method we call repeatedly when parsing a series of statements in a block or a script, so it’s the right place to synchronize when the parser goes into panic mode. The whole body of this method is wrapped in a try block to catch the exception thrown when the parser begins error recovery. This gets it back to trying to parse the beginning of the next statement or declaration.

The real parsing happens inside the try block. First, it looks to see if we’re at a variable declaration by looking for the leading var keyword. If not, it falls through to the existing statement() method that parses print and expression statements.

Remember how statement() tries to parse an expression statement if no other statement matches? And expression() reports a syntax error if it can’t parse an expression at the current token? That chain of calls ensures we report an error if a valid declaration or statement isn’t parsed.
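The recovery loop described above can be sketched in isolation. In this toy (hypothetical class ToyRecoveringParser, string tokens, and a grammar whose only statement is `print NUMBER ;`), declaration() catches the parse error, calls a synchronize() modeled on the panic-mode idea from the earlier chapter, and returns null, so one bad statement does not hide the statements and errors after it. The keyword list in synchronize() is an assumption about which tokens begin a statement in full Lox.

```java
import java.util.ArrayList;
import java.util.List;

class ToyRecoveringParser {
  // Stand-in for the parser's ParseError: an unchecked exception used to unwind.
  private static class ParseError extends RuntimeException {}

  private final List<String> tokens;
  private int current = 0;
  final List<String> errors = new ArrayList<>();

  ToyRecoveringParser(List<String> tokens) {
    this.tokens = tokens;
  }

  List<String> parse() {
    List<String> statements = new ArrayList<>();
    while (!isAtEnd()) {
      statements.add(declaration());
    }
    return statements;
  }

  // Same shape as declaration(): on a syntax error, resynchronize and yield null.
  private String declaration() {
    try {
      return printStatement();
    } catch (ParseError error) {
      synchronize();
      return null;
    }
  }

  // The only statement in this toy grammar: "print" NUMBER ";"
  private String printStatement() {
    consume("print", "Expect statement.");
    if (!peek().matches("[0-9]+")) throw error("Expect expression.");
    String value = advance();
    consume(";", "Expect ';' after value.");
    return "Print(" + value + ")";
  }

  // Panic mode: discard tokens until just after a ';' or just before a
  // keyword that can begin a new statement.
  private void synchronize() {
    advance();
    while (!isAtEnd()) {
      if (previous().equals(";")) return;
      switch (peek()) {
        case "class": case "fun": case "var": case "for":
        case "if": case "while": case "print": case "return":
          return;
        default:
          break;
      }
      advance();
    }
  }

  private ParseError error(String message) {
    errors.add(message + " (at '" + peek() + "')");
    return new ParseError();
  }

  private void consume(String expected, String message) {
    if (!peek().equals(expected)) throw error(message);
    advance();
  }

  private String advance() {
    if (!isAtEnd()) current++;
    return previous();
  }

  private String previous() { return tokens.get(current - 1); }
  private String peek() { return tokens.get(current); }
  private boolean isAtEnd() { return peek().equals("EOF"); }

  public static void main(String[] args) {
    ToyRecoveringParser parser = new ToyRecoveringParser(List.of(
        "print", "1", ";", "print", "+", ";", "print", "2", ";", "EOF"));
    System.out.println(parser.parse());  // [Print(1), null, Print(2)]
    System.out.println(parser.errors);   // [Expect expression. (at '+')]
  }
}
```

Because a failed declaration comes back as null, nothing downstream should run a list that may contain holes; in the real pipeline the `if (hadError) return;` check in Lox.run() plays that role.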

When the parser matches a var token, it branches to:



// lox/Parser.java, add after printStatement()

  private Stmt varDeclaration() {
    Token name = consume(IDENTIFIER, "Expect variable name.");

    Expr initializer = null;
    if (match(EQUAL)) {
      initializer = expression();
    }

    consume(SEMICOLON, "Expect ';' after variable declaration.");
    return new Stmt.Var(name, initializer);
  }
  

As always, the recursive descent code follows the grammar rule. The parser has already matched the var token, so next it requires and consumes an identifier token for the variable name.

Then, if it sees an = token, it knows there is an initializer expression and parses it. Otherwise, it leaves the initializer null. Finally, it consumes the required semicolon at the end of the statement. All this gets wrapped in a Stmt.Var syntax tree node and we’re groovy.

Parsing a variable expression is even easier. In primary(), we look for an identifier token.


// lox/Parser.java, in primary()

      return new Expr.Literal(previous().literal);
    }

    if (match(IDENTIFIER)) {
      return new Expr.Variable(previous());
    }

    if (match(LEFT_PAREN)) {

That gives us a working front end for declaring and using variables. All that’s left is to feed it into the interpreter. Before we get to that, we need to talk about where variables live in memory.
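The variable front end can be sketched the same way as before. This toy parser (hypothetical name ToyDeclarationParser; string tokens, strings instead of AST nodes, IllegalStateException instead of ParseError) follows the declaration, varDecl, and statement rules and treats a bare IDENTIFIER as a variable expression, as primary() now does. A missing initializer shows up as null, matching the null initializer field of Stmt.Var.

```java
import java.util.ArrayList;
import java.util.List;

class ToyDeclarationParser {
  private final List<String> tokens;
  private int current = 0;

  ToyDeclarationParser(List<String> tokens) {
    this.tokens = tokens;
  }

  // program -> declaration* EOF ;
  List<String> parse() {
    List<String> statements = new ArrayList<>();
    while (!isAtEnd()) {
      statements.add(declaration());
    }
    return statements;
  }

  // declaration -> varDecl | statement ;
  private String declaration() {
    if (match("var")) return varDeclaration();
    return statement();
  }

  // varDecl -> "var" IDENTIFIER ( "=" expression )? ";" ;  ("var" already consumed)
  private String varDeclaration() {
    String name = consumeIdentifier("Expect variable name.");
    String initializer = null; // stays null when there is no "= expression" part
    if (match("=")) {
      initializer = expression();
    }
    consume(";", "Expect ';' after variable declaration.");
    return "Var(" + name + ", " + initializer + ")";
  }

  // statement -> exprStmt | printStmt ;  note that "var" is not accepted here
  private String statement() {
    if (match("print")) {
      String value = expression();
      consume(";", "Expect ';' after value.");
      return "Print(" + value + ")";
    }
    String expr = expression();
    consume(";", "Expect ';' after expression.");
    return "Expression(" + expr + ")";
  }

  // Toy primary: NUMBER | IDENTIFIER ; an identifier becomes a variable expression
  private String expression() {
    String token = peek();
    if (token.matches("[0-9]+")) {
      current++;
      return token;
    }
    if (isIdentifier(token)) {
      current++;
      return "Variable(" + token + ")";
    }
    throw new IllegalStateException("Expect expression.");
  }

  private String consumeIdentifier(String message) {
    if (!isIdentifier(peek())) throw new IllegalStateException(message);
    return tokens.get(current++);
  }

  private static boolean isIdentifier(String token) {
    return token.matches("[A-Za-z_][A-Za-z_0-9]*")
        && !List.of("var", "print", "EOF").contains(token);
  }

  private boolean match(String expected) {
    if (isAtEnd() || !peek().equals(expected)) return false;
    current++;
    return true;
  }

  private void consume(String expected, String message) {
    if (!match(expected)) throw new IllegalStateException(message);
  }

  private String peek() { return tokens.get(current); }
  private boolean isAtEnd() { return peek().equals("EOF"); }

  public static void main(String[] args) {
    List<String> tokens = List.of(
        "var", "a", "=", "1", ";", "var", "b", ";", "print", "a", ";", "EOF");
    System.out.println(new ToyDeclarationParser(tokens).parse());
  }
}
```

For `var a = 1; var b; print a;` it produces [Var(a, 1), Var(b, null), Print(Variable(a))].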

3. Environments

The bindings that associate variables to values need to be stored somewhere. Ever since the Lisp folks invented parentheses, this data structure has been called an environment.

I like to imagine the environment literally, as a sylvan wonderland where variables and values frolic.

You can think of it like a map where the keys are variable names and the values are the variable’s, uh, values. In fact, that’s how we’ll implement it in Java. We could stuff that map and the code to manage it right into Interpreter, but since it forms a nicely delineated concept, we’ll pull it out into its own class.

Java calls them maps or hashmaps. Other languages call them hash tables, dictionaries (Python and C#), hashes (Ruby and Perl), tables (Lua), or associative arrays (PHP). Way back when, they were known as scatter tables.

Start a new file and add:


// lox/Environment.java, create new file

package com.craftinginterpreters.lox;

import java.util.HashMap;
import java.util.Map;

class Environment {
  private final Map<String, Object> values = new HashMap<>();
}

There’s a Java Map in there to store the bindings. It uses bare strings for the keys, not tokens. A token represents a unit of code at a specific place in the source text, but when it comes to looking up variables, all identifier tokens with the same name should refer to the same variable (ignoring scope for now). Using the raw string ensures all of those tokens refer to the same map key.
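A short sketch of why the map is keyed by the lexeme string rather than by the Token. It assumes a cut-down Token class that does not override equals() or hashCode() (this sketch assumes the book’s Token behaves the same way), so two tokens that spell the same name miss each other in a token-keyed map but meet in a string-keyed one. StringKeyDemo is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

class StringKeyDemo {
  // Simplified stand-in for Token (type and literal omitted). Without an
  // equals()/hashCode() override, two tokens are equal only if they are the same object.
  static final class Token {
    final String lexeme;
    final int line;

    Token(String lexeme, int line) {
      this.lexeme = lexeme;
      this.line = line;
    }
  }

  public static void main(String[] args) {
    Token declared = new Token("a", 1); // the `a` in `var a = 1;`
    Token used = new Token("a", 3);     // the `a` in `print a;`, a different token

    Map<Token, Object> byToken = new HashMap<>();
    byToken.put(declared, 1.0);
    System.out.println(byToken.get(used));       // null: identity-based lookup misses

    Map<String, Object> byName = new HashMap<>();
    byName.put(declared.lexeme, 1.0);
    System.out.println(byName.get(used.lexeme)); // 1.0: equal strings hit the same key
  }
}
```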

There are two operations we need to support. First, a variable definition binds a new name to a value.


// lox/Environment.java, in class Environment

  void define(String name, Object value) {
    values.put(name, value);
  }
  

Not exactly brain surgery, but we have made one interesting semantic choice. When we add the key to the map, we don’t check to see if it’s already present. That means that this program works:


var a = "before";
print a; // "before".
var a = "after";
print a; // "after".

A variable statement doesn’t just define a new variable, it can also be used to redefine an existing variable. We could choose to make this an error instead. The user may not intend to redefine an existing variable. (If they did mean to, they probably would have used assignment, not var.) Making redefinition an error would help them find that bug.

However, doing so interacts poorly with the REPL. In the middle of a REPL session, it’s nice to not have to mentally track which variables you’ve already defined. We could allow redefinition in the REPL but not in scripts, but then users would have to learn two sets of rules, and code copied and pasted from one form to the other might not work.
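The practical difference between the two policies comes down to one existence check. In the sketch below, define() behaves like Environment.define() and silently overwrites, while defineStrict() is a hypothetical version of the rejected alternative that treats redefinition as an error. It uses containsKey() rather than get() != null, because a variable whose value is nil is stored as null but is still defined. RedefinitionDemo and lookup() are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

class RedefinitionDemo {
  private final Map<String, Object> values = new HashMap<>();

  // Lox's choice: like Environment.define(), no existence check, so put() overwrites.
  void define(String name, Object value) {
    values.put(name, value);
  }

  // Hypothetical stricter alternative: refuse to redefine an existing name.
  void defineStrict(String name, Object value) {
    if (values.containsKey(name)) {
      throw new IllegalStateException("Variable '" + name + "' is already defined.");
    }
    values.put(name, value);
  }

  Object lookup(String name) {
    return values.get(name);
  }

  public static void main(String[] args) {
    RedefinitionDemo env = new RedefinitionDemo();
    env.define("a", "before");
    System.out.println(env.lookup("a")); // before
    env.define("a", "after");
    System.out.println(env.lookup("a")); // after

    try {
      env.defineStrict("a", "again");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Variable 'a' is already defined.
    }
  }
}
```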

My rule about variables and scoping is, “When in doubt, do what Scheme does”. The Scheme folks have probably spent more time thinking about variable scope than we ever will (one of the main goals of Scheme was to introduce lexical scoping to the world), so it’s hard to go wrong if you follow in their footsteps.

Scheme allows redefining variables at the top level.

So, to keep the two modes consistent, we’ll allow it, at least for global variables. Once a variable exists, we need a way to look it up.


// lox/Environment.java, in class Environment

class Environment {
  private final Map<String, Object> values = new HashMap<>();

  Object get(Token name) {
    if (values.containsKey(name.lexeme)) {
      return values.get(name.lexeme);
    }

    throw new RuntimeError(name,
        "Undefined variable '" + name.lexeme + "'.");
  }

  void define(String name, Object value) {
  

This is a little more semantically interesting. If the variable is found, it simply returns the value bound to it. But what if it’s not? Again, we have a choice:

  • Make it a syntax error.

  • Make it a runtime error.

  • Allow it and return some default value like nil.
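Environment.get() as written above takes the second option. A runnable sketch follows, with simplified stand-ins for Token (lexeme and line only) and RuntimeError that are assumptions rather than the book’s full classes; EnvironmentDemo is a made-up name. Note that get() tests containsKey() instead of comparing the looked-up value to null, because a variable bound to nil maps to null yet is still defined.

```java
import java.util.HashMap;
import java.util.Map;

class EnvironmentDemo {
  public static void main(String[] args) {
    Environment environment = new Environment();
    environment.define("beverage", "espresso");
    System.out.println(environment.get(new Token("beverage", 1))); // espresso

    try {
      environment.get(new Token("coffee", 2));
    } catch (RuntimeError error) {
      // Reported only when the lookup actually runs, not when the code is parsed.
      System.out.println("[line " + error.token.line + "] " + error.getMessage());
    }
  }
}

// Simplified stand-in for Token: only the lexeme and line are kept.
class Token {
  final String lexeme;
  final int line;

  Token(String lexeme, int line) {
    this.lexeme = lexeme;
    this.line = line;
  }
}

// Simplified stand-in for the interpreter's RuntimeError.
class RuntimeError extends RuntimeException {
  final Token token;

  RuntimeError(Token token, String message) {
    super(message);
    this.token = token;
  }
}

class Environment {
  private final Map<String, Object> values = new HashMap<>();

  Object get(Token name) {
    // containsKey() separates "defined as nil" (null value) from "never defined".
    if (values.containsKey(name.lexeme)) {
      return values.get(name.lexeme);
    }
    throw new RuntimeError(name, "Undefined variable '" + name.lexeme + "'.");
  }

  void define(String name, Object value) {
    values.put(name, value);
  }
}
```

Running it prints espresso followed by [line 2] Undefined variable 'coffee'.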

Lox is pretty lax, but the last option is a little too permissive to me. Making it a syntax error—a compile-time error—seems like a smart choice. Using an undefined variable is a bug, and the sooner you detect the mistake, the better.

The problem is that using a variable isn’t the same as referring to it. You can refer to a variable in a chunk of code without immediately evaluating it if that chunk of code is wrapped inside a function. If we make it a static error to mention a variable before it’s been declared, it becomes much harder to define recursive functions.

We could accommodate single recursion—a function that calls itself—by declaring the function’s own name before we examine its body. But that doesn’t help with mutually recursive procedures that call each other. Consider:


fun isOdd(n) {
  if (n == 0) return false;
  return isEven(n - 1);
}

fun isEven(n) {
  if (n == 0) return true;
  return isOdd(n - 1);
}


The isEven() function isn’t defined by the time we are looking at the body of isOdd() where it’s called. If we swap the order of the two functions, then isOdd() isn’t defined when we’re looking at isEven()’s body.

Granted, this is probably not the most efficient way to tell if a number is even or odd (not to mention the bad things that happen if you pass a non-integer or negative number to them). Bear with me.

Some statically typed languages like Java and C# solve this by specifying that the top level of a program isn’t a sequence of imperative statements. Instead, a program is a set of declarations which all come into being simultaneously. The implementation declares all of the names before looking at the bodies of any of the functions.

Older languages like C and Pascal don’t work like this. Instead, they force you to add explicit forward declarations to declare a name before it’s fully defined. That was a concession to the limited computing power at the time. They wanted to be able to compile a source file in one single pass through the text, so those compilers couldn’t gather up all of the declarations first before processing function bodies.

Since making it a static error makes recursive declarations too difficult, we’ll defer the error to runtime. It’s OK to refer to a variable before it’s defined as long as you don’t evaluate the reference. That lets the program for even and odd numbers work, but you’d get a runtime error in:


print a;
var a = "too late!";

As with type errors in the expression evaluation code, we report a runtime error by throwing an exception. The exception contains the variable’s token so we can tell the user where in their code they messed up.
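
The following Java sketch shows why deferring the check to runtime is enough. It is not jlox's code. The environment is keyed by plain `String` names instead of `Token`s, `UndefinedVariable` is a hypothetical stand-in for `RuntimeError`, and the two Lox functions are modeled as Java lambdas. Each body looks up its callee by name only when it runs. So defining `isOdd` before `isEven` works, while reading `a` before it is defined still fails.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical stand-in for jlox's RuntimeError.
class UndefinedVariable extends RuntimeException {
  UndefinedVariable(String name) {
    super("Undefined variable '" + name + "'.");
  }
}

// A single global scope keyed by plain String names (jlox uses Token).
class LateBindingDemo {
  static final Map<String, Object> values = new HashMap<>();

  static void define(String name, Object value) {
    values.put(name, value);
  }

  static Object get(String name) {
    if (values.containsKey(name)) return values.get(name);
    throw new UndefinedVariable(name); // detected only when evaluated
  }

  // isOdd is defined first and its body mentions isEven. That is fine:
  // the name is looked up only when the body actually runs.
  static void defineParity() {
    define("isOdd", (IntPredicate) n ->
        n != 0 && ((IntPredicate) get("isEven")).test(n - 1));
    define("isEven", (IntPredicate) n ->
        n == 0 || ((IntPredicate) get("isOdd")).test(n - 1));
  }

  static boolean call(String function, int argument) {
    return ((IntPredicate) get(function)).test(argument);
  }

  public static void main(String[] args) {
    defineParity();
    System.out.println(call("isOdd", 7));  // true
    System.out.println(call("isEven", 7)); // false
    try {
      get("a"); // like `print a;` before `var a = "too late!";`
    } catch (UndefinedVariable error) {
      System.out.println(error.getMessage());
    }
  }
}
```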

3.1 Interpreting global variables

The Interpreter class gets an instance of the new Environment class.


// lox/Interpreter.java, in class Interpreter

class Interpreter implements Expr.Visitor<Object>,
                             Stmt.Visitor<Void> {
  private Environment environment = new Environment();

  void interpret(List<Stmt> statements) {


We store it as a field directly in Interpreter so that the variables stay in memory as long as the interpreter is still running.

We have two new syntax trees, so that’s two new visit methods. The first is for declaration statements.


// lox/Interpreter.java, add after visitPrintStmt()

  @Override
  public Void visitVarStmt(Stmt.Var stmt) {
    Object value = null;
    if (stmt.initializer != null) {
      value = evaluate(stmt.initializer);
    }

    environment.define(stmt.name.lexeme, value);
    return null;
  }

If the variable has an initializer, we evaluate it. If not, we have another choice to make. We could have made this a syntax error in the parser by requiring an initializer. Most languages don’t, though, so it feels a little harsh to do so in Lox.

We could make it a runtime error. We’d let you define an uninitialized variable, but if you accessed it before assigning to it, a runtime error would occur. It’s not a bad idea, but most dynamically typed languages don’t do that. Instead, we’ll keep it simple and say that Lox sets a variable to nil if it isn’t explicitly initialized.


var a;
print a; // "nil".

Thus, if there isn’t an initializer, we set the value to null, which is the Java representation of Lox’s nil value. Then we tell the environment to bind the variable to that value.

Next, we evaluate a variable expression.


// lox/Interpreter.java, add after visitUnaryExpr()

  @Override
  public Object visitVariableExpr(Expr.Variable expr) {
    return environment.get(expr.name);
  }
  

This simply forwards to the environment which does the heavy lifting to make sure the variable is defined. With that, we’ve got rudimentary variables working. Try this out:


var a = 1;
var b = 2;
print a + b;

We can’t reuse code yet, but we can start to build up programs that reuse data.
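
Here is a rough, self-contained picture of how these pieces fit together: a toy tree-walking interpreter in Java. The node classes, `demo()`, and `stringify()` are simplified assumptions, not jlox's generated classes. All numbers are `Double`s, and `print` collects into a list instead of writing to standard output. It runs the three-line program above, plus a `var` without an initializer to show the `nil` default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy tree-walker: `var`, `print`, variable reads, and `+` on numbers.
class TinyInterpreter {
  interface Expr {}

  static class Literal implements Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
  }

  static class Variable implements Expr {
    final String name;
    Variable(String name) { this.name = name; }
  }

  static class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
  }

  interface Stmt {}

  static class Var implements Stmt {
    final String name;
    final Expr initializer; // null when there is no `= ...`
    Var(String name, Expr initializer) {
      this.name = name;
      this.initializer = initializer;
    }
  }

  static class Print implements Stmt {
    final Expr expression;
    Print(Expr expression) { this.expression = expression; }
  }

  private final Map<String, Object> globals = new HashMap<>();
  private final List<String> output = new ArrayList<>();

  Object evaluate(Expr expr) {
    if (expr instanceof Literal) return ((Literal) expr).value;
    if (expr instanceof Variable) {
      String name = ((Variable) expr).name;
      if (!globals.containsKey(name)) {
        throw new RuntimeException("Undefined variable '" + name + "'.");
      }
      return globals.get(name);
    }
    Add add = (Add) expr;
    return (Double) evaluate(add.left) + (Double) evaluate(add.right);
  }

  void execute(Stmt stmt) {
    if (stmt instanceof Var) {
      Var declaration = (Var) stmt;
      // No initializer: the variable is bound to nil (Java null).
      Object value = declaration.initializer == null ? null : evaluate(declaration.initializer);
      globals.put(declaration.name, value);
    } else {
      output.add(stringify(evaluate(((Print) stmt).expression)));
    }
  }

  // Mimics Lox output: nil for null, no trailing ".0" on whole numbers.
  static String stringify(Object value) {
    if (value == null) return "nil";
    String text = value.toString();
    if (value instanceof Double && text.endsWith(".0")) {
      text = text.substring(0, text.length() - 2);
    }
    return text;
  }

  // var a = 1; var b = 2; print a + b; var c; print c;
  static List<String> demo() {
    TinyInterpreter interpreter = new TinyInterpreter();
    List<Stmt> program = Arrays.<Stmt>asList(
        new Var("a", new Literal(1.0)),
        new Var("b", new Literal(2.0)),
        new Print(new Add(new Variable("a"), new Variable("b"))),
        new Var("c", null),
        new Print(new Variable("c")));
    for (Stmt stmt : program) interpreter.execute(stmt);
    return interpreter.output;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [3, nil]
  }
}
```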

4 Assignment

It’s possible to create a language that has variables but does not let you reassign—or mutate—them. Haskell is one example. SML supports only mutable references and arrays—variables cannot be reassigned. Rust steers you away from mutation by requiring a mut modifier to enable assignment.

Mutating a variable is a side effect and, as the name suggests, some language folks think side effects are dirty or inelegant. Code should be pure math that produces values—crystalline, unchanging ones—like an act of divine creation. Not some grubby automaton that beats blobs of data into shape, one imperative grunt at a time.

I find it delightful that the same group of people who pride themselves on dispassionate logic are also the ones who can’t resist emotionally loaded terms for their work: “pure”, “side effect”, “lazy”, “persistent”, “first-class”, “higher-order”.

Lox is not so austere. Lox is an imperative language, and mutation comes with the territory. Adding support for assignment doesn’t require much work. Global variables already support redefinition, so most of the machinery is there now. Mainly, we’re missing an explicit assignment notation.

4.1 Assignment syntax

That little = syntax is more complex than it might seem. Like most C-derived languages, assignment is an expression and not a statement. As in C, it is the lowest precedence expression form. That means the rule slots between expression and equality (the next lowest precedence expression).

In some other languages, like Pascal, Python, and Go, assignment is a statement.


expression     → assignment ;
assignment     → IDENTIFIER "=" assignment
               | equality ;

This says an assignment is either an identifier followed by an = and an expression for the value, or an equality (and thus any other) expression. Later, assignment will get more complex when we add property setters on objects, like:


instance.field = "value";

The easy part is adding the new syntax tree node.


// tool/GenerateAst.java, in main()

    defineAst(outputDir, "Expr", Arrays.asList(
      "Assign   : Token name, Expr value",
      "Binary   : Expr left, Token operator, Expr right",

It has a token for the variable being assigned to, and an expression for the new value. After you run the AstGenerator to get the new Expr.Assign class, swap out the body of the parser’s existing expression() method to match the updated rule.


// lox/Parser.java, in expression(), replace 1 line

  private Expr expression() {
    return assignment();
  }


Here is where it gets tricky. A single token lookahead recursive descent parser can’t see far enough to tell that it’s parsing an assignment until after it has gone through the left-hand side and stumbled onto the =. You might wonder why it even needs to. After all, we don’t know we’re parsing a + expression until after we’ve finished parsing the left operand.

The difference is that the left-hand side of an assignment isn’t an expression that evaluates to a value. It’s a sort of pseudo-expression that evaluates to a “thing” you can assign to. Consider:

var a = "before";
a = "value";


On the second line, we don’t evaluate a (which would return the string “before”). We figure out what variable a refers to so we know where to store the right-hand side expression’s value. The classic terms for these two constructs are l-value and r-value. All of the expressions that we’ve seen so far that produce values are r-values. An l-value “evaluates” to a storage location that you can assign into.

In fact, the names come from assignment expressions: l-values appear on the left side of the = in an assignment, and r-values on the right.

We want the syntax tree to reflect that an l-value isn’t evaluated like a normal expression. That’s why the Expr.Assign node has a Token for the left-hand side, not an Expr. The problem is that the parser doesn’t know it’s parsing an l-value until it hits the =. In a complex l-value, that may occur many tokens later.


makeList().head.next = node;

Since the receiver of a field assignment can be any expression, and expressions can be as long as you want to make them, it may take an unbounded number of tokens of lookahead to find the =.

We have only a single token of lookahead, so what do we do? We use a little trick, and it looks like this:


// lox/Parser.java, add after expressionStatement()

  private Expr assignment() {
    Expr expr = equality();

    if (match(EQUAL)) {
      Token equals = previous();
      Expr value = assignment();

      if (expr instanceof Expr.Variable) {
        Token name = ((Expr.Variable)expr).name;
        return new Expr.Assign(name, value);
      }

      error(equals, "Invalid assignment target."); 
    }

    return expr;
  }
  

Most of the code for parsing an assignment expression looks similar to that of the other binary operators like +. We parse the left-hand side, which can be any expression of higher precedence. If we find an =, we parse the right-hand side and then wrap it all up in an assignment expression tree node.

We report an error if the left-hand side isn’t a valid assignment target, but we don’t throw it because the parser isn’t in a confused state where we need to go into panic mode and synchronize.

One slight difference from binary operators is that we don’t loop to build up a sequence of the same operator. Since assignment is right-associative, we instead recursively call assignment() to parse the right-hand side.

The trick is that right before we create the assignment expression node, we look at the left-hand side expression and figure out what kind of assignment target it is. We convert the r-value expression node into an l-value representation.

This conversion works because it turns out that every valid assignment target happens to also be valid syntax as a normal expression. Consider a complex field assignment like:


newPoint(x + 2, 0).y = 3;

The left-hand side of that assignment could also work as a valid expression.

newPoint(x + 2, 0).y;

The first example sets the field, the second gets it.

This means we can parse the left-hand side as if it were an expression and then after the fact produce a syntax tree that turns it into an assignment target. If the left-hand side expression isn’t a valid assignment target, we fail with a syntax error. That ensures we report an error on code like this:


a + b = c;

Right now, the only valid target is a simple variable expression, but we’ll add fields later. The end result of this trick is an assignment expression tree node that knows what it is assigning to and has an expression subtree for the value being assigned. All with only a single token of lookahead and no backtracking.

You can still use this trick even if there are assignment targets that are not valid expressions. Define a cover grammar, a looser grammar that accepts all of the valid expression and assignment target syntaxes. When you hit an =, report an error if the left-hand side isn’t within the valid assignment target grammar. Conversely, if you don’t hit an =, report an error if the left-hand side isn’t a valid expression.

Way back in the parsing chapter, I said we represent parenthesized expressions in the syntax tree because we’ll need them later. This is why. We need to be able to distinguish these cases:

a = 3; // ok

(a) = 3; // error
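
The trick, including the parenthesized case, can be tried out on a tiny standalone parser. This Java sketch is only an illustration and rests on several simplifying assumptions:

- Tokens are pre-split on spaces.
- Identifiers are lowercase words and numbers are digit strings.
- `+` is the only binary operator.
- Errors are thrown as a hypothetical `ParseError`, whereas jlox reports "Invalid assignment target." without throwing.

Trees print in a Lisp-like prefix form, so right-associativity and the rejected targets are visible.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical error type; jlox reports this kind of error instead of throwing.
class ParseError extends RuntimeException {
  ParseError(String message) { super(message); }
}

class AssignParser {
  interface Expr {}

  static class Variable implements Expr {
    final String name;
    Variable(String name) { this.name = name; }
    @Override public String toString() { return name; }
  }
  static class Literal implements Expr {
    final String text;
    Literal(String text) { this.text = text; }
    @Override public String toString() { return text; }
  }
  static class Grouping implements Expr {
    final Expr inner;
    Grouping(Expr inner) { this.inner = inner; }
    @Override public String toString() { return "(group " + inner + ")"; }
  }
  static class Binary implements Expr {
    final Expr left, right;
    Binary(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(+ " + left + " " + right + ")"; }
  }
  static class Assign implements Expr {
    final String name; // the l-value: just the target's name, not an Expr
    final Expr value;
    Assign(String name, Expr value) { this.name = name; this.value = value; }
    @Override public String toString() { return "(= " + name + " " + value + ")"; }
  }

  private final List<String> tokens;
  private int current = 0;

  private AssignParser(List<String> tokens) { this.tokens = tokens; }

  // Tokens must be separated by spaces, e.g. "( a ) = 3".
  static String parse(String source) {
    AssignParser parser = new AssignParser(Arrays.asList(source.trim().split("\\s+")));
    Expr expr = parser.assignment();
    if (!parser.isAtEnd()) throw new ParseError("Unexpected token after expression.");
    return expr.toString();
  }

  private Expr assignment() {
    Expr expr = addition(); // parse the left side as an ordinary r-value
    if (match("=")) {
      Expr value = assignment(); // recurse: assignment is right-associative
      if (expr instanceof Variable) {
        // Reinterpret the r-value node as an assignment target.
        return new Assign(((Variable) expr).name, value);
      }
      throw new ParseError("Invalid assignment target.");
    }
    return expr;
  }

  private Expr addition() {
    Expr expr = primary();
    while (match("+")) expr = new Binary(expr, primary());
    return expr;
  }

  private Expr primary() {
    if (match("(")) {
      Expr inner = assignment();
      if (!match(")")) throw new ParseError("Expect ')' after expression.");
      return new Grouping(inner); // kept in the tree, so `(a) = 3` is rejected
    }
    String token = isAtEnd() ? "" : tokens.get(current);
    if (token.matches("[a-z]+")) { current++; return new Variable(token); }
    if (token.matches("[0-9]+")) { current++; return new Literal(token); }
    throw new ParseError("Expect expression.");
  }

  private boolean match(String expected) {
    if (isAtEnd() || !tokens.get(current).equals(expected)) return false;
    current++;
    return true;
  }

  private boolean isAtEnd() { return current >= tokens.size(); }

  public static void main(String[] args) {
    System.out.println(parse("a = b = 1")); // (= a (= b 1))
  }
}
```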

4.2 Assignment semantics

We have a new syntax tree node, so our interpreter gets a new visit method.


// lox/Interpreter.java, add after visitVarStmt()

  @Override
  public Object visitAssignExpr(Expr.Assign expr) {
    Object value = evaluate(expr.value);
    environment.assign(expr.name, value);
    return value;
  }
  

For obvious reasons, it’s similar to variable declaration. It evaluates the right-hand side to get the value, then stores it in the named variable. Instead of using define() on Environment, it calls this new method:


// lox/Environment.java, add after get()

  void assign(Token name, Object value) {
    if (values.containsKey(name.lexeme)) {
      values.put(name.lexeme, value);
      return;
    }

    throw new RuntimeError(name,
        "Undefined variable '" + name.lexeme + "'.");
  }
  

The key difference between assignment and definition is that assignment is not allowed to create a new variable. In terms of our implementation, that means it’s a runtime error if the key doesn’t already exist in the environment’s variable map.

The last thing the visit() method does is return the assigned value. That’s because assignment is an expression that can be nested inside other expressions, like so:


var a = 1;
print a = 2; // "2".
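
A stripped-down Java sketch of these semantics follows. It assumes `String` names and a hypothetical `UndefinedVariable` exception in place of jlox's `Token` and `RuntimeError`. `define()` may create or overwrite a binding, and `assign()` only updates one that already exists. `evaluateAssign()` is a made-up name mirroring `visitAssignExpr()`; it hands the stored value back so the assignment can nest inside `print`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for jlox's RuntimeError.
class UndefinedVariable extends RuntimeException {
  UndefinedVariable(String name) {
    super("Undefined variable '" + name + "'.");
  }
}

class AssignDemo {
  private final Map<String, Object> values = new HashMap<>();

  // `var`: creates a binding, or overwrites one at the top level.
  void define(String name, Object value) { values.put(name, value); }

  // `=`: may only update a binding that already exists.
  void assign(String name, Object value) {
    if (!values.containsKey(name)) throw new UndefinedVariable(name);
    values.put(name, value);
  }

  Object get(String name) {
    if (!values.containsKey(name)) throw new UndefinedVariable(name);
    return values.get(name);
  }

  // Mirrors visitAssignExpr(): store the value, then return it to the caller.
  Object evaluateAssign(String name, Object value) {
    assign(name, value);
    return value;
  }

  public static void main(String[] args) {
    AssignDemo env = new AssignDemo();
    env.define("a", 1.0);                             // var a = 1;
    System.out.println(env.evaluateAssign("a", 2.0)); // print a = 2; (Java prints 2.0)
    try {
      env.evaluateAssign("b", 3.0);                   // b = 3; with no `var b`
    } catch (UndefinedVariable error) {
      System.out.println(error.getMessage());         // Undefined variable 'b'.
    }
  }
}
```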

Our interpreter can now create, read, and modify variables. It’s about as sophisticated as early BASICs. Global variables are simple, but writing a large program when any two chunks of code can accidentally step on each other’s state is no fun. We want local variables, which means it’s time for scope.

Unlike Python and Ruby, Lox doesn’t do implicit variable declaration.

Maybe a little better than that. Unlike some old BASICs, Lox can handle variable names longer than two characters.

5 Scope

A scope defines a region where a name maps to a certain entity. Multiple scopes enable the same name to refer to different things in different contexts. In my house, “Bob” usually refers to me. But maybe in your town you know a different Bob. Same name, but different dudes based on where you say it.

Lexical scope (or the less commonly heard static scope) is a specific style of scoping where the text of the program itself shows where a scope begins and ends. In Lox, as in most modern languages, variables are lexically scoped. When you see an expression that uses some variable, you can figure out which variable declaration it refers to just by statically reading the code.

For example:


{
  var a = "first";
  print a; // "first".
}

{
  var a = "second";
  print a; // "second".
}

Here, we have two blocks with a variable a declared in each of them. You and I can tell just from looking at the code that the use of a in the first print statement refers to the first a, and the second one refers to the second.

“Lexical” comes from the Greek “lexikos” which means “related to words”. When we use it in programming languages, it usually means a thing you can figure out from source code itself without having to execute anything.

Lexical scope came onto the scene with ALGOL. Earlier languages were often dynamically scoped. Computer scientists back then believed dynamic scope was faster to execute. Today, thanks to early Scheme hackers, we know that isn’t true. If anything, it’s the opposite.

Dynamic scope for variables lives on in some corners. Emacs Lisp defaults to dynamic scope for variables. The binding macro in Clojure provides it. The widely disliked with statement in JavaScript turns properties on an object into dynamically scoped variables.

This is in contrast to dynamic scope where you don’t know what a name refers to until you execute the code. Lox doesn’t have dynamically scoped variables, but methods and fields on objects are dynamically scoped.

class Saxophone {
  play() {
    print "Careless Whisper";
  }
}

class GolfClub {
  play() {
    print "Fore!";
  }
}

fun playIt(thing) {
  thing.play();
}

When playIt() calls thing.play(), we don’t know if we’re about to hear “Careless Whisper” or “Fore!” It depends on whether you pass a Saxophone or a GolfClub to the function, and we don’t know that until runtime.

Scope and environments are close cousins. The former is the theoretical concept, and the latter is the machinery that implements it. As our interpreter works its way through code, syntax tree nodes that affect scope will change the environment. In a C-ish syntax like Lox’s, scope is controlled by curly-braced blocks. (That’s why we call it block scope.)


{
  var a = "in block";
}
print a; // Error! No more "a".

The beginning of a block introduces a new local scope, and that scope ends when execution passes the closing }. Any variables declared inside the block disappear.

5.1 Nesting and shadowing

A first cut at implementing block scope might work like this:

  • As we visit each statement inside the block, keep track of any variables declared.

  • After the last statement is executed, tell the environment to delete all of those variables.

That would work for the previous example. But remember, one motivation for local scope is encapsulation—a block of code in one corner of the program shouldn’t interfere with some other block. Check this out:


// How loud?
var volume = 11;

// Silence.
volume = 0;

// Calculate size of 3x4x5 cuboid.
{
  var volume = 3 * 4 * 5;
  print volume;
}

Look at the block where we calculate the volume of the cuboid using a local declaration of volume. After the block exits, the interpreter will delete the global volume variable. That ain’t right. When we exit the block, we should remove any variables declared inside the block, but if there is a variable with the same name declared outside of the block, that’s a different variable. It shouldn’t get touched.

When a local variable has the same name as a variable in an enclosing scope, it shadows the outer one. Code inside the block can’t see it any more—it is hidden in the “shadow” cast by the inner one—but it’s still there.

When we enter a new block scope, we need to preserve variables defined in outer scopes so they are still around when we exit the inner block. We do that by defining a fresh environment for each block containing only the variables defined in that scope. When we exit the block, we discard its environment and restore the previous one.

We also need to handle enclosing variables that are not shadowed.


var global = "outside";
{
  var local = "inside";
  print global + local;
}

Here, global lives in the outer global environment and local is defined inside the block’s environment. In that print statement, both of those variables are in scope. In order to find them, the interpreter must search not only the current innermost environment, but also any enclosing ones.

We implement this by chaining the environments together. Each environment has a reference to the environment of the immediately enclosing scope. When we look up a variable, we walk that chain from innermost out until we find the variable. Starting at the inner scope is how we make local variables shadow outer ones.
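
Here is a minimal Java sketch of that chain, using `String` names and a hypothetical `UndefinedVariable` exception instead of `Token` and `RuntimeError`. It replays the cuboid and `global`/`local` examples:

- The block gets its own environment whose `enclosing` field points at the globals.
- The inner `volume` shadows the outer one.
- `global` is still found by walking outward.
- Dropping the block's environment leaves the global `volume` untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for jlox's RuntimeError.
class UndefinedVariable extends RuntimeException {
  UndefinedVariable(String name) {
    super("Undefined variable '" + name + "'.");
  }
}

class ScopedEnv {
  final ScopedEnv enclosing; // null for the global scope
  private final Map<String, Object> values = new HashMap<>();

  ScopedEnv() { this.enclosing = null; }

  ScopedEnv(ScopedEnv enclosing) { this.enclosing = enclosing; }

  // New variables always go into the innermost (current) scope.
  void define(String name, Object value) { values.put(name, value); }

  Object get(String name) {
    if (values.containsKey(name)) return values.get(name);
    if (enclosing != null) return enclosing.get(name); // walk outward
    throw new UndefinedVariable(name);
  }

  void assign(String name, Object value) {
    if (values.containsKey(name)) {
      values.put(name, value);
      return;
    }
    if (enclosing != null) {
      enclosing.assign(name, value);
      return;
    }
    throw new UndefinedVariable(name);
  }

  public static void main(String[] args) {
    ScopedEnv globals = new ScopedEnv();
    globals.define("volume", 11.0);            // var volume = 11;
    globals.assign("volume", 0.0);             // volume = 0;
    globals.define("global", "outside");       // var global = "outside";

    ScopedEnv block = new ScopedEnv(globals);  // entering `{`
    block.define("volume", 3.0 * 4.0 * 5.0);   // shadows the global volume
    block.define("local", "inside");
    System.out.println(block.get("volume"));   // 60.0
    System.out.println("" + block.get("global") + block.get("local")); // outsideinside
    // Leaving `}`: simply stop using `block`. The global was never touched.
    System.out.println(globals.get("volume")); // 0.0
  }
}
```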

While the interpreter is running, the environments form a linear list of objects, but consider the full set of environments created during the entire execution. An outer scope may have multiple blocks nested within it, and each will point to the outer one, giving a tree-like structure, though only one path through the tree exists at a time.

The boring name for this is a parent-pointer tree, but I much prefer the evocative cactus stack.

Before we add block syntax to the grammar, we’ll beef up our Environment class with support for this nesting. First, we give each environment a reference to its enclosing one.


// lox/Environment.java, in class Environment

class Environment {
  final Environment enclosing;
  private final Map<String, Object> values = new HashMap<>();
  

This field needs to be initialized, so we add a couple of constructors.


// lox/Environment.java, in class Environment

  Environment() {
    enclosing = null;
  }

  Environment(Environment enclosing) {
    this.enclosing = enclosing;
  }
  

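The no-argument constructor is for the global scope’s environment, which ends the chain. The other constructor creates a new local scope nested inside the given outer one.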

We don’t have to touch the define() method—a new variable is always declared in the current innermost scope. But variable lookup and assignment work with existing variables and they need to walk the chain to find them. First, lookup:


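// lox/Environment.java, in get()

  Object get(Token name) {
    if (values.containsKey(name.lexeme)) {
      return values.get(name.lexeme);
    }

    // Not in this scope: try the enclosing one, recursively.
    if (enclosing != null) return enclosing.get(name);

    throw new RuntimeError(name,
        "Undefined variable '" + name.lexeme + "'.");
  }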

If the variable isn’t found in this environment, we simply try the enclosing one. That in turn does the same thing recursively, so this will ultimately walk the entire chain. If we reach an environment with no enclosing one and still don’t find the variable, then we give up and report an error as before.

Assignment works the same way.


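// lox/Environment.java, in assign()

  void assign(Token name, Object value) {
    if (values.containsKey(name.lexeme)) {
      values.put(name.lexeme, value);
      return;
    }

    // Not in this scope: assign in the enclosing one, recursively.
    if (enclosing != null) {
      enclosing.assign(name, value);
      return;
    }

    throw new RuntimeError(name,
        "Undefined variable '" + name.lexeme + "'.");
  }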

It’s likely faster to iteratively walk the chain, but I think the recursive solution is prettier. We’ll do something much faster in clox.

Again, if the variable isn’t in this environment, it checks the outer one, recursively.
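
For comparison with the sidenote above, this Java sketch walks the same chain with a loop instead of recursion. It keeps the simplified `String`-keyed environment and the hypothetical `UndefinedVariable` exception. Lookup and assignment behave the same as the recursive versions. The difference is that the loop does not add a Java stack frame per scope, which the deep-nesting demo exercises. Whether it is faster in practice is not measured here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for jlox's RuntimeError.
class UndefinedVariable extends RuntimeException {
  UndefinedVariable(String name) {
    super("Undefined variable '" + name + "'.");
  }
}

// Same chain of scopes, walked with a loop instead of recursion.
class LoopEnv {
  final LoopEnv enclosing;
  final Map<String, Object> values = new HashMap<>();

  LoopEnv(LoopEnv enclosing) { this.enclosing = enclosing; }

  void define(String name, Object value) { values.put(name, value); }

  Object get(String name) {
    // Innermost scope first, then outward along the enclosing links.
    for (LoopEnv env = this; env != null; env = env.enclosing) {
      if (env.values.containsKey(name)) return env.values.get(name);
    }
    throw new UndefinedVariable(name);
  }

  void assign(String name, Object value) {
    for (LoopEnv env = this; env != null; env = env.enclosing) {
      if (env.values.containsKey(name)) {
        env.values.put(name, value);
        return;
      }
    }
    throw new UndefinedVariable(name);
  }

  public static void main(String[] args) {
    LoopEnv globals = new LoopEnv(null);
    globals.define("x", 1.0);
    LoopEnv scope = globals;
    // Very deep nesting: the loop uses no extra Java stack per level.
    for (int i = 0; i < 100_000; i++) scope = new LoopEnv(scope);
    scope.assign("x", 2.0);
    System.out.println(globals.get("x")); // 2.0
  }
}
```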

5.2 Block syntax and semantics

Now that Environments nest, we’re ready to add blocks to the language. Behold the grammar:


statement      → exprStmt
               | printStmt
               | block ;

block          → "{" declaration* "}" ;

A block is a (possibly empty) series of statements or declarations surrounded by curly braces. A block is itself a statement and can appear anywhere a statement is allowed. The syntax tree node looks like this:


// tool/GenerateAst.java, in main()

    defineAst(outputDir, "Stmt", Arrays.asList(
      "Block      : List<Stmt> statements",
      "Expression : Expr expression",
	  

It contains the list of statements that are inside the block. Parsing is straightforward. Like other statements, we detect the beginning of a block by its leading token—in this case the {. In the statement() method, we add:


// lox/Parser.java, in statement()

    if (match(PRINT)) return printStatement();
    if (match(LEFT_BRACE)) return new Stmt.Block(block());

    return expressionStatement();
	

All the real work happens here:


// lox/Parser.java, add after expressionStatement()


  private List<Stmt> block() {
    List<Stmt> statements = new ArrayList<>();

    while (!check(RIGHT_BRACE) && !isAtEnd()) {
      statements.add(declaration());
    }

    consume(RIGHT_BRACE, "Expect '}' after block.");
    return statements;
  }
  

We create an empty list and then parse statements and add them to the list until we reach the end of the block, marked by the closing }. Note that the loop also has an explicit check for isAtEnd(). We have to be careful to avoid infinite loops, even when parsing invalid code. If the user forgets a closing }, the parser needs to not get stuck.
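
The loop guard can be checked in isolation. This Java sketch is hypothetical in several ways:

- It scans a list of string tokens ending in an `EOF` sentinel.
- It treats `{` as the start of a nested block and any other token as a complete one-token statement.
- It throws a made-up `ParseError` where jlox's `consume()` reports the error.

Like jlox's `advance()`, it never steps past `EOF`. Without the `isAtEnd()` test, a missing `}` would leave the loop spinning; with it, the parse stops and reports the problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical error type standing in for jlox's consume() error report.
class ParseError extends RuntimeException {
  ParseError(String message) { super(message); }
}

class BlockScanner {
  private final List<String> tokens;
  private int current = 0;

  BlockScanner(List<String> source) {
    tokens = new ArrayList<>(source);
    tokens.add("EOF"); // sentinel, like jlox's EOF token
  }

  // Assumes the opening "{" was already consumed, as in jlox's statement().
  List<Object> block() {
    List<Object> statements = new ArrayList<>();
    while (!check("}") && !isAtEnd()) {
      statements.add(declaration());
    }
    if (!check("}")) throw new ParseError("Expect '}' after block.");
    advance(); // consume "}"
    return statements;
  }

  // Stand-in for declaration(): "{" starts a nested block, and any other
  // token counts as a complete one-token statement.
  private Object declaration() {
    String token = advance();
    return token.equals("{") ? block() : token;
  }

  private String advance() {
    if (!isAtEnd()) current++; // never steps past EOF
    return tokens.get(current - 1);
  }

  private boolean check(String text) { return tokens.get(current).equals(text); }

  private boolean isAtEnd() { return tokens.get(current).equals("EOF"); }

  static List<Object> parse(String... source) {
    return new BlockScanner(Arrays.asList(source)).block();
  }

  public static void main(String[] args) {
    System.out.println(parse("a", "{", "b", "}", "}")); // [a, [b]]
    try {
      parse("a", "b"); // missing closing brace
    } catch (ParseError error) {
      System.out.println(error.getMessage());
    }
  }
}
```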

That’s it for syntax. For semantics, we add another visit method to Interpreter.


// lox/Interpreter.java, add after execute()

  @Override
  public Void visitBlockStmt(Stmt.Block stmt) {
    executeBlock(stmt.statements, new Environment(environment));
    return null;
  }
  

Having block() return the raw list of statements and leaving it to statement() to wrap the list in a Stmt.Block looks a little odd. I did it that way because we’ll reuse block() later for parsing function bodies and we don’t want that body wrapped in a Stmt.Block.

To execute a block, we create a new environment for the block’s scope and pass it off to this other method:


// lox/Interpreter.java, add after execute()

  void executeBlock(List<Stmt> statements,
                    Environment environment) {
    Environment previous = this.environment;
    try {
      this.environment = environment;

      for (Stmt statement : statements) {
        execute(statement);
      }
    } finally {
      this.environment = previous;
    }
  }

This new method executes a list of statements in the context of a given environment. Up until now, the environment field in Interpreter always pointed to the same environment—the global one. Now, that field represents the current environment. That’s the environment that corresponds to the innermost scope containing the code to be executed.

To execute code within a given scope, this method updates the interpreter’s environment field, visits all of the statements, and then restores the previous value. As is always good practice in Java, it restores the previous environment using a finally clause. That way it gets restored even if an exception is thrown.
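
The save-and-restore pattern can be exercised on its own. In this Java sketch, statements are `Runnable`s and the environment is a plain label; both are simplifications of jlox's `Stmt` and `Environment`. The second statement throws. Afterward the field still holds the previous environment, because the `finally` clause always runs.

```java
import java.util.Arrays;
import java.util.List;

// Statements are modeled as Runnables and environments as plain labels.
class BlockRunner {
  Object environment = "globals"; // stand-in for the current Environment

  void executeBlock(List<Runnable> statements, Object environment) {
    Object previous = this.environment;
    try {
      this.environment = environment;
      for (Runnable statement : statements) {
        statement.run();
      }
    } finally {
      // Runs whether the loop finishes normally or a statement throws.
      this.environment = previous;
    }
  }

  public static void main(String[] args) {
    BlockRunner interpreter = new BlockRunner();
    try {
      interpreter.executeBlock(Arrays.<Runnable>asList(
          () -> System.out.println("inside: " + interpreter.environment),
          () -> { throw new IllegalStateException("runtime error in block"); }),
          "block");
    } catch (IllegalStateException error) {
      System.out.println("caught: " + error.getMessage());
    }
    System.out.println("after: " + interpreter.environment); // after: globals
  }
}
```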

Manually changing and restoring a mutable environment field feels inelegant. Another classic approach is to explicitly pass the environment as a parameter to each visit method. To “change” the environment, you pass a different one as you recurse down the tree. You don’t have to restore the old one, since the new one lives on the Java stack and is implicitly discarded when the interpreter returns from the block’s visit method.

I considered that for jlox, but it’s kind of tedious and verbose adding an environment parameter to every single visit method. To keep the book a little simpler, I went with the mutable field.
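
For contrast, here is roughly what the environment-as-parameter alternative might look like. This is a hypothetical Java sketch, not code from jlox. `execute()` takes the environment as an argument, and a block simply recurses with a new child `Env`, so there is no field to save or restore. The node types are minimal assumptions: a `Var` holds an already-evaluated value, and `Print` names a single variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Environment-as-parameter style: no mutable "current environment" field.
class ParamEnvDemo {
  static class Env {
    final Env enclosing;
    final Map<String, Object> values = new HashMap<>();
    Env(Env enclosing) { this.enclosing = enclosing; }

    Object get(String name) {
      for (Env env = this; env != null; env = env.enclosing) {
        if (env.values.containsKey(name)) return env.values.get(name);
      }
      throw new RuntimeException("Undefined variable '" + name + "'.");
    }
  }

  interface Stmt {}

  static class Var implements Stmt {
    final String name;
    final Object value; // already-evaluated initializer, for brevity
    Var(String name, Object value) { this.name = name; this.value = value; }
  }

  static class Print implements Stmt {
    final String name; // prints a single variable
    Print(String name) { this.name = name; }
  }

  static class Block implements Stmt {
    final List<Stmt> statements;
    Block(Stmt... statements) { this.statements = Arrays.asList(statements); }
  }

  final List<Object> output = new ArrayList<>();

  void execute(Stmt stmt, Env env) {
    if (stmt instanceof Var) {
      Var declaration = (Var) stmt;
      env.values.put(declaration.name, declaration.value);
    } else if (stmt instanceof Print) {
      output.add(env.get(((Print) stmt).name));
    } else {
      // A block recurses with a fresh child environment. The caller's
      // `env` is never modified, so there is nothing to restore afterward.
      Env inner = new Env(env);
      for (Stmt statement : ((Block) stmt).statements) execute(statement, inner);
    }
  }

  // var a = "outer"; { var a = "inner"; print a; } print a;
  static List<Object> demo() {
    ParamEnvDemo interpreter = new ParamEnvDemo();
    Env globals = new Env(null);
    List<Stmt> program = Arrays.<Stmt>asList(
        new Var("a", "outer"),
        new Block(new Var("a", "inner"), new Print("a")),
        new Print("a"));
    for (Stmt stmt : program) interpreter.execute(stmt, globals);
    return interpreter.output;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [inner, outer]
  }
}
```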

我考虑在jlox中这样实现,但是在每个单独的访问方法中添加一个environment参数有些冗余乏味,为了让本书更加简洁,我使用了可变字段。
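For comparison, the parameter-passing alternative might look roughly like the following sketch. Everything here is hypothetical and heavily simplified: the ParamEnvDemo class, its tiny Var/Print/Block statement classes, and an execute(Stmt, Environment) method standing in for jlox's visitor methods. Executing a block just creates a new Environment and passes it down; there is no field to restore afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-interpreter for the alternative design: the environment
// is passed to every execute() call instead of living in a mutable field.
// Stmt, Var, Print, and Block are simplified stand-ins for jlox's AST.
class ParamEnvDemo {
  static class Environment {
    final Environment enclosing;
    final Map<String, Object> values = new HashMap<>();

    Environment(Environment enclosing) {
      this.enclosing = enclosing;
    }

    Object get(String name) {
      if (values.containsKey(name)) return values.get(name);
      if (enclosing != null) return enclosing.get(name);
      throw new RuntimeException("Undefined variable '" + name + "'.");
    }
  }

  interface Stmt {}

  static class Var implements Stmt { // var name = value;
    final String name;
    final Object value;
    Var(String name, Object value) { this.name = name; this.value = value; }
  }

  static class Print implements Stmt { // print name;
    final String name;
    Print(String name) { this.name = name; }
  }

  static class Block implements Stmt { // { statements }
    final List<Stmt> statements;
    Block(List<Stmt> statements) { this.statements = statements; }
  }

  final List<String> output = new ArrayList<>();

  // Nothing to save or restore: the block's environment is referenced only
  // from this call's frame and becomes unreachable once the call returns.
  void execute(Stmt stmt, Environment env) {
    if (stmt instanceof Var) {
      Var decl = (Var) stmt;
      env.values.put(decl.name, decl.value);
    } else if (stmt instanceof Print) {
      output.add(String.valueOf(env.get(((Print) stmt).name)));
    } else if (stmt instanceof Block) {
      Environment inner = new Environment(env);
      for (Stmt statement : ((Block) stmt).statements) {
        execute(statement, inner);
      }
    }
  }

  // var a = "global a"; { var a = "inner a"; print a; } print a;
  static List<String> demo() {
    List<Stmt> program = List.of(
        new Var("a", "global a"),
        new Block(List.of(new Var("a", "inner a"), new Print("a"))),
        new Print("a"));
    ParamEnvDemo interpreter = new ParamEnvDemo();
    Environment globals = new Environment(null);
    for (Stmt stmt : program) {
      interpreter.execute(stmt, globals);
    }
    return interpreter.output;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [inner a, global a]
  }
}
```

Even in this toy, every statement type has to accept and forward env, which hints at the verbosity described above once there are dozens of visit methods.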

Surprisingly, that's all we need to do in order to fully support local variables, nesting, and shadowing. Go ahead and try this out:


var a = "global a";
var b = "global b";
var c = "global c";
{
  var a = "outer a";
  var b = "outer b";
  {
    var a = "inner a";
    print a;
    print b;
    print c;
  }
  print a;
  print b;
  print c;
}
print a;
print b;
print c;
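The lookup that makes this work is the walk outward through enclosing environments. The sketch below uses a simplified, hypothetical stand-in for the chapter's Environment class (only define() and get()) and hand-translates the Lox program above into calls on three nested environments. Assuming lookups behave as described in this chapter, it produces the same nine values the program prints: inner a, outer b, global c, then outer a, outer b, global c, then global a, global b, global c.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the chapter's Environment: each scope has its own
// map plus a link to the enclosing scope, and get() walks outward.
class ShadowingDemo {
  static class Environment {
    final Environment enclosing;
    final Map<String, Object> values = new HashMap<>();

    Environment(Environment enclosing) {
      this.enclosing = enclosing;
    }

    void define(String name, Object value) {
      values.put(name, value);
    }

    Object get(String name) {
      if (values.containsKey(name)) return values.get(name);
      if (enclosing != null) return enclosing.get(name);
      throw new RuntimeException("Undefined variable '" + name + "'.");
    }
  }

  // Records what `print a; print b; print c;` would output in `scope`.
  static void printAll(Environment scope, List<Object> printed) {
    for (String name : List.of("a", "b", "c")) {
      printed.add(scope.get(name));
    }
  }

  // Hand translation of the Lox program above.
  static List<Object> run() {
    List<Object> printed = new ArrayList<>();
    Environment global = new Environment(null);
    global.define("a", "global a");
    global.define("b", "global b");
    global.define("c", "global c");

    Environment outer = new Environment(global); // first {
    outer.define("a", "outer a");
    outer.define("b", "outer b");

    Environment inner = new Environment(outer); // nested {
    inner.define("a", "inner a");

    printAll(inner, printed);  // inside the nested block
    printAll(outer, printed);  // after the nested block closes
    printAll(global, printed); // after the outer block closes
    return printed;
  }

  public static void main(String[] args) {
    run().forEach(System.out::println);
  }
}
```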

Our little interpreter can remember things now. We are inching closer to something resembling a full-featured programming language.

6. CHALLENGES

  1. The REPL no longer supports entering a single expression and automatically printing its result value. That's a drag. Add support to the REPL to let users type in both statements and expressions. If they enter a statement, execute it. If they enter an expression, evaluate it and display the result value.

  2. Maybe you want Lox to be a little more explicit about variable initialization. Instead of implicitly initializing variables to nil, make it a runtime error to access a variable that has not been initialized or assigned to, as in:

    // No initializers.
    var a;
    var b;

    a = "assigned";
    print a; // OK, was assigned first.

    print b; // Error!

  3. What does the following program do?

    
    var a = 1;
    {
      var a = a + 2;
      print a;
    }
    

    What did you expect it to do? Is it what you think it should do? What does analogous code in other languages you are familiar with do? What do you think users will expect this to do?
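For challenge 2, one possible approach (a sketch, not the book's answer) is to store a private sentinel object meaning "declared but not yet assigned" and have lookups reject it. A separate sentinel is needed because nil is represented as Java null, so `var a = nil;` must remain a legitimate, initialized variable. The UninitializedDemo class, its declare() method, and the error message wording are hypothetical, and the environment is reduced to a single scope for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// One possible approach to challenge 2, reduced to a single scope.
// UNINITIALIZED, declare(), and RuntimeError are hypothetical names.
class UninitializedDemo {
  // Distinct from null, because Lox's nil is represented as Java null.
  static final Object UNINITIALIZED = new Object();

  static class RuntimeError extends RuntimeException {
    RuntimeError(String message) {
      super(message);
    }
  }

  static class Environment {
    private final Map<String, Object> values = new HashMap<>();

    // var a;          (no initializer)
    void declare(String name) {
      values.put(name, UNINITIALIZED);
    }

    // var a = value;  (initializer present, possibly nil/null)
    void define(String name, Object value) {
      values.put(name, value);
    }

    // a = value;
    void assign(String name, Object value) {
      if (!values.containsKey(name)) {
        throw new RuntimeError("Undefined variable '" + name + "'.");
      }
      values.put(name, value);
    }

    Object get(String name) {
      if (!values.containsKey(name)) {
        throw new RuntimeError("Undefined variable '" + name + "'.");
      }
      Object value = values.get(name);
      if (value == UNINITIALIZED) {
        throw new RuntimeError("Variable '" + name + "' has not been initialized.");
      }
      return value;
    }
  }

  // Mirrors the Lox example in challenge 2; returns the error from "print b;".
  static String demo() {
    Environment env = new Environment();
    env.declare("a");
    env.declare("b");
    env.assign("a", "assigned");
    System.out.println(env.get("a")); // OK, was assigned first: prints assigned
    try {
      env.get("b");
      return null; // not reached: b was never assigned
    } catch (RuntimeError error) {
      return error.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints Variable 'b' has not been initialized.
  }
}
```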

7. DESIGN NOTE: IMPLICIT VARIABLE DECLARATION

Lox has distinct syntax for declaring a new variable and assigning to an existing one. Some languages collapse those to only assignment syntax: assigning to a non-existent variable automatically brings it into being. This is called implicit variable declaration and exists in Python, Ruby, and CoffeeScript, among others. JavaScript has an explicit syntax to declare variables, but can also create new variables on assignment. Visual Basic has an option to enable or disable implicit variables.

When the same syntax can assign or create a variable, each language must decide what happens when it isn't clear which behavior the user intends. In particular, each language must choose how implicit declaration interacts with shadowing, and which scope an implicitly declared variable goes into.

  • In Python, assignment always creates a variable in the current function's scope, even if there is a variable with the same name declared outside of the function.

  • Ruby avoids some ambiguity by having different naming rules for local and global variables. However, blocks in Ruby (which are more like closures than like "blocks" in C) have their own scope, so it still has the problem. Assignment in Ruby assigns to an existing variable outside of the current block if there is one with the same name. Otherwise, it creates a new variable in the current block's scope.

  • CoffeeScript, which takes after Ruby in many ways, is similar. It explicitly disallows shadowing by saying that assignment always assigns to a variable in an outer scope if there is one, all the way up to the outermost global scope. Otherwise, it creates the variable in the current function scope.

  • In JavaScript (outside strict mode), assignment modifies an existing variable in any enclosing scope, if found. If not, it implicitly creates a new variable in the global scope.
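These rules differ mainly in which scope an assignment writes to. The following Java sketch is a deliberately simplified model, not an accurate account of any of these languages: a Scope chain with a global scope and one nested function scope, and a hypothetical Policy enum with one rule per bullet. It ignores Python's global/nonlocal statements, Ruby's local/global naming rules, and the difference between block and function scope.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the assignment rules above, not the languages' real semantics.
// Scope, Policy, and whereDoesAssignmentGo() are hypothetical names.
class ImplicitDeclDemo {
  static class Scope {
    final Scope enclosing; // null for the global scope
    final Map<String, Object> vars = new HashMap<>();

    Scope(Scope enclosing) {
      this.enclosing = enclosing;
    }

    // The nearest scope, starting here, that already has the name.
    Scope find(String name) {
      for (Scope s = this; s != null; s = s.enclosing) {
        if (s.vars.containsKey(name)) return s;
      }
      return null;
    }

    Scope global() {
      Scope s = this;
      while (s.enclosing != null) s = s.enclosing;
      return s;
    }
  }

  enum Policy { PYTHON, RUBY_COFFEESCRIPT, JAVASCRIPT_SLOPPY }

  // Performs `name = value` in `current` and returns the scope written to.
  static Scope assign(Policy policy, Scope current, String name, Object value) {
    Scope existing = current.find(name);
    Scope target;
    switch (policy) {
      case PYTHON:
        // Always the current function's scope (global/nonlocal ignored).
        target = current;
        break;
      case RUBY_COFFEESCRIPT:
        // An existing variable in an outer scope wins; otherwise local.
        target = existing != null ? existing : current;
        break;
      default:
        // JAVASCRIPT_SLOPPY: an existing variable wins; otherwise global.
        target = existing != null ? existing : current.global();
        break;
    }
    target.vars.put(name, value);
    return target;
  }

  // Assigns inside a function scope nested in a global scope that already
  // declares x, and reports where the write landed.
  static String whereDoesAssignmentGo(Policy policy, String name) {
    Scope global = new Scope(null);
    global.vars.put("x", "outer x");
    Scope function = new Scope(global);
    Scope target = assign(policy, function, name, "new value");
    return target == function ? "current" : "global";
  }

  public static void main(String[] args) {
    for (Policy policy : Policy.values()) {
      System.out.println(policy + ": x -> " + whereDoesAssignmentGo(policy, "x")
          + ", y -> " + whereDoesAssignmentGo(policy, "y"));
    }
  }
}
```

In this model, assigning the already-declared x lands in the current scope only under PYTHON; the other two policies write to the existing global. For the undeclared y, only JAVASCRIPT_SLOPPY creates it in the global scope.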

The main advantage to implicit declaration is simplicity. There's less syntax and no "declaration" concept to learn. Users can just start assigning stuff and the language figures it out.

Older, statically typed languages like C benefit from explicit declaration because they give the user a place to tell the compiler what type each variable has and how much storage to allocate for it. In a dynamically typed, garbage-collected language, that isn't really necessary, so you can get away with making declarations implicit. It feels a little more "scripty", more "you know what I mean".

But is that a good idea? Implicit declaration has some problems.

  • A user may intend to assign to an existing variable, but may have misspelled it. The interpreter doesn't know that, so it goes ahead and silently creates some new variable, and the variable the user wanted to assign to still has its old value. This is particularly heinous in JavaScript, where a typo will create a global variable, which may in turn interfere with other code.

  • JS, Ruby, and CoffeeScript use the presence of an existing variable with the same name, even in an outer scope, to determine whether an assignment creates a new variable or assigns to an existing one. That means adding a new variable in a surrounding scope can change the meaning of existing code. What was once a local variable may silently turn into an assignment to that new outer variable.

  • In Python, you may want to assign to some variable outside of the current function instead of creating a new variable in the current one, but you can't.
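The first problem is easy to reproduce in a toy model. In this hypothetical sketch, sloppyAssign() stands for implicit declaration and strictAssign() for Lox's rule that assigning to an undeclared name is a runtime error; a single Map plays the role of the global scope. The misspelled countr silently becomes a new variable under the first rule and is rejected under the second.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one Map stands in for the global scope.
class TypoDemo {
  // Implicit declaration: assigning to an unknown name silently creates it.
  static void sloppyAssign(Map<String, Object> scope, String name, Object value) {
    scope.put(name, value);
  }

  // Explicit declaration, as in Lox: the name must already be declared.
  static void strictAssign(Map<String, Object> scope, String name, Object value) {
    if (!scope.containsKey(name)) {
      throw new RuntimeException("Undefined variable '" + name + "'.");
    }
    scope.put(name, value);
  }

  public static void main(String[] args) {
    Map<String, Object> implicit = new HashMap<>();
    implicit.put("counter", 0.0);
    sloppyAssign(implicit, "countr", 1.0); // typo goes unnoticed
    System.out.println(implicit.get("counter")); // prints 0.0
    System.out.println(implicit.containsKey("countr")); // prints true

    Map<String, Object> explicit = new HashMap<>();
    explicit.put("counter", 0.0);
    try {
      strictAssign(explicit, "countr", 1.0);
    } catch (RuntimeException error) {
      System.out.println(error.getMessage()); // prints Undefined variable 'countr'.
    }
  }
}
```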

Over time, the languages I know with implicit variable declaration ended up adding more features and complexity to deal with these problems.

  • Implicit declaration of global variables in JavaScript is universally considered a mistake today. "Strict mode" disables it: assigning to an undeclared variable throws a ReferenceError instead of creating a global.

  • Python added a global statement to let you explicitly assign to a global variable from within a function. Later, as functional programming and nested functions became more popular, they added a similar nonlocal statement to assign to variables in enclosing functions.

  • Ruby extended its block syntax to allow declaring certain variables to be explicitly local to the block even if the same name exists in an outer scope.

Given those, I think the simplicity argument is mostly lost. There is an argument that implicit declaration is the right default, but I personally find that less compelling.

My opinion is that implicit declaration made sense in years past when most scripting languages were heavily imperative and code was pretty flat. As programmers have gotten more comfortable with deep nesting, functional programming, and closures, it's become much more common to want access to variables in outer scopes. That makes it more likely that users will run into the tricky cases where it's not clear whether they intend their assignment to create a new variable or reuse a surrounding one.

So I prefer explicitly declaring variables, which is why Lox requires it.
