'r0', etc. become names of registers, even perhaps in normal Trylon. Target-specific register names too, I suppose. But any sends to them are intercepted by the compiler. One must know what one is doing: values will be translated between the two worlds unsafely.
Automatically generates a class wrapping a BytePtr, with accessor methods for the fields. The structure definition must be visible in C, probably through the preamble. How to deal with "struct mg_request_info" vs. a typedef'd anonymous struct? How to deal with read-only vs. read-write fields? Writing in general -- when to copy strings?
Turns out the new parser doesn't implement a "uses" statement; the old one didn't either, apparently! What does a "uses" mean currently? It must only be used to get "Standard" into "Main".
Maybe it really is better to have subclasses also be in the namespace:
But then we're tempted to "use" Collection just to get those names. The solution gets a little subtle:
But we need "Empty" in the outer scope so someone who *doesn't* "use" Collection can say "Collection Collection"... oy...
What was my point?
Anyway, here's the CFunction library demonstrating the alternative:
For a while I've been trying to think about whether adding C-style function call syntax, probably doing C-ABI function calls, would work. And I think we could get it to work with other ABIs (Java, Python) too. Have it be syntactic sugar:
Nullary call:
[Hey, "Collection Empty" is finally a real use for the prototype objects.]
C calls would have args unboxed at runtime. That may require the use of an FFI lib. Different classes of C calls handle return values:
Also want a facility for declarative argument checking:
Or even:
...except we can't declare a shared variable to be initialized to a non-constant value.
Verb-like objects for more shell-scripting style:
Actually, that's rather noun-like. And with real verbs like "ls", we run into a problem at the leaves. Maybe.
But the first will do nothing, it is just a reference.
Finally using the curly braces and the semicolon:
Or can we use the comma? I think so:
Yeah, that's easy enough to parse, at least currently when ',' has the least binding power. And it has the advantage of not adding any new punctuation.
Implementation can be optimized by sorting the fields alphabetically, and keeping a global dictionary with (orderable) tuples of the names as keys, so there's only one class of { key, value } objects, no matter how many times such objects are created.
Multiple value returns seem to be the main use case for this, and they are rare, so this isn't a high priority.
How does this fit into the namespace system? "HTTP some.guy.com TinyHTTPServer"? Use that syntax instead?
No, I think this would be better:
Either that, or the first one, and have the package named with the full URL in Main. Yes, that. The only odd thing about that is the internal colon in the unary selector -- which is easy to define as correct: only words *ending* in a colon are keywords.
Also allow Git (eventually maybe Hg, Svn, etc.). This would grab the 'master' branch. Pull on every compile? Probably not.
Need a place to put these; specified as "remote-packages" in build-settings. Default to "~/trylon-remote-packages". It'd be nice to have a "~/.trylon-build-settings" file too for per-user global settings. So the example would be stored as "~/trylon-remote-packages/http/some.guy.com/trylon-packages/TinyHTTPServer/". "main" is grabbed by default, anything else is downloaded on demand. Try "wget" first, then "curl -O".
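The URL-to-directory mapping could look something like this. It is only a sketch: the function name is invented, and the example URL is my reconstruction from the stored path given above, not something the note spells out.

```python
from urllib.parse import urlsplit

def remote_package_dir(url, root="~/trylon-remote-packages"):
    """Map a package URL to its local cache directory (hypothetical helper)."""
    parts = urlsplit(url)
    # Layout is root / scheme / host / path components, as in the example above.
    pieces = [parts.scheme, parts.netloc]
    pieces += [piece for piece in parts.path.split("/") if piece]
    return "/".join([root] + pieces) + "/"

# Assumed URL, inferred from the stored path in the note.
example = remote_package_dir("http://some.guy.com/trylon-packages/TinyHTTPServer")
```

The "~" is left unexpanded here; a real build would presumably expand it and read the root from "remote-packages" in build-settings.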
Someday, perhaps "http:" should handle "tar.gz", ".tgz", etc. as well.
Replace "c-fn" with this:
What to do when the language isn't targeted? Currently it's an error, so "iff"s must be used. But the above shows the possibilities of simply not defining the function if the language isn't targeted.
Another possibility is foreign blocks (with a "two-level" syntax akin to "switch"):
I guess I prefer the simplicity of the first alternative, and don't really find it too repetitive.
Still trying to unify methods and objects. Any object can be considered the activation frame of a function call. Necessitates going to a purer prototypism (var declarations create fields, not shared fields).
Umm, there seems to be some ambiguity about what "this" is. Or maybe not; maybe it always sorta points up a level lexically. So when 'create: other' is executing, 'this' would be a String, passed in like an arg. When 'String' is evaluating, 'this' would be... [String itself, I suppose.]
Anyway, another far-out idea: unify object references and BytePtrs. All objects respond to BytePtr's methods; any 'does-not-understand:' on those selectors is trapped. That includes if the object's classref is random bits.
A file like this (named "foo"):
Should respond to shell commands like:
Parse binops or not? Think about field storage (and creation) later. (".foo.values" file in same dir?)
These could be turned into commands, paving the way for user-defined control structures.
(Adding "string message". Also add "Collection message".)
But while "break" and "continue" work nicely unchanged, other control structures ("if", "while") can't be done that way. So we're back to the ol' "first word" thing. Except now maybe it's the first word of the expression, not just the statement.
I've needed a name for expressions starting with the verb/keyword. You could think of them as having an implied "you":
Which would make them "imperative commands", or just "commands".
Some funny character to start them?
I prefer the latter.
This suggests a form of "reader" like Scheme has. "¶" is obviously a single word that introduces a reader, but maybe we want some kind of generic reader syntax:
or
Maybe?
-----
Infer types of locals that are only assigned to once. But only if something can be gained: that doesn't optimize BytePtrs or Ints (which often increment themselves).
Rename the "prepare-to-emit" phase to "resolve".
Rename "the-compiler" to "compiler".
Double keywords:
Perhaps not... but it would maybe enhance inline conditionals:
-----
New stream protocol:
Hmm, I really only want send. It means send a message -- most often that message takes the form of a line.
-----
Double indents ("indentation violations"):
parses as a line with a subordinate block ("outer block"). The outer block contains two lines. The first is a null line with its own subordinate block ("inner block"), the second is the "scrantone" line.
That doesn't jibe well with the auto-line-extension idea. It could work, but it groups the extension with the body instead of the line. I think I really don't want auto-line-extension anyway (well, maybe after the first extension).
This could be extended to even more baroque null-line structures by simply counting tabs.
Call the subordinate block a "body" instead of a "block"? A line has the line itself (usually) and also its body.
-----
Underscore for line extension
Not really. If only it were a tiny ellipsis...
-----
Been thinking about trailing "=" instead of ":"/"put:" for setting:
Well, maybe just for the ":" case; still use "put:" 'cause "at:=" is a little weird... or is it?
Not bad, but I think I still prefer "put:". But "size=" wasn't as ugly as I thought.
"[]" becomes "at:":
I think I need a better name for what's now called a Function (ie. TrylidFunction). That is, the first word in an expression; the thing that's looked up in the context. Linguistically, it unifies two roles: naming the subject (receiver) of a message, or calling a function. In the compiler, it has another dual role: emit-call: vs. emit-function:. (Note lookup-function: vs. lookup-instance-function: in CompiledProto -- "instance" shouldn't be meaningful. "lookup-own-function:"? "lookup-selector:"? Should it also check the proto's directories?)
Don't forget to fix && || ! precedence.
Excessive (redundant) name-mangling may be one reason the Trylon-built Trylid compiler goes faster than the (Trylon-built) Trylon compiler. (But the emit phase isn't the only culprit.)
Keep a dictionary of all selectors, annotated with used/defined bits, to prevent link errors. Actually, it looks like we already have such a dict (minus the annotations) in the Trylon compiler, called "object-function-names", needed to build the dispatch table.
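A sketch of what the used/defined annotations might amount to, in Python. The class and method names are invented; this is not the layout of "object-function-names" in the actual compiler.

```python
USED, DEFINED = 1, 2    # annotation bits (hypothetical encoding)

class SelectorTable:
    """Every selector seen by the compiler, with used/defined bits."""
    def __init__(self):
        self.flags = {}

    def note_used(self, selector):
        self.flags[selector] = self.flags.get(selector, 0) | USED

    def note_defined(self, selector):
        self.flags[selector] = self.flags.get(selector, 0) | DEFINED

    def undefined_selectors(self):
        """Selectors that are sent somewhere but defined nowhere."""
        return sorted(name for name, bits in self.flags.items()
                      if (bits & USED) and not (bits & DEFINED))

table = SelectorTable()
table.note_defined("size")
table.note_used("size")
table.note_used("at:put:")
```

Anything returned by `undefined_selectors()` could be reported by the compiler, or given a stub, before the C linker ever sees it.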
"primitive" statement declares a variable holding a machine word. It only responds to a few methods, and the compiler can enforce that.
In the compiler, Send will have to be special-cased.
When porting the building-dispatch-table phase, rename DispatchRow to SelectorRow.
Still speak of "classes"; they are what your code defines. But each class automatically has a prototype, so it acts like a prototype-based system. So "MyClass" or "Standard Int" (as an expression) refers to a prototype.
The Trylon compiler needs to change to being prototype-based. Copy Trylid's CompiledProto to Trylon, adapt it to Trylon by cutting-and-pasting from CompiledClass and Package, and hook it into the Parser and Compiler. It is expected that existing Trylon code will not need modification (but should eventually have "class-fn" and "class-field" removed).
Make "nil" the only falsy value, as in Jolt & Trylid.
Proto names ('.proto-name') should be symbols, not strings.
Field is a macro, a special form, or perhaps even a function that creates a new field in the current object. No, it's still syntax because "field" has to get the name as an argument.
The "--" should indicate that the following *block* is a continuation of the line. Extra indentation can be used when both a continuation block and a subordinate block are needed:
"splice" instead of extend:
CamelCase for class names is the last vestige of hungarianism. Should it be eliminated?
When class extension is implemented, I can eliminate "primitive-fn". A function with no body is noted as such, and anyone can replace it. Primitives are a special case of extenders. Also, C-style separation of interface from implementation becomes possible (but not mandatory!).
I want the interpreter (to replace bash). Maybe that could be a fun mechanical project.
More tests of object-orientation:
The latter reaffirms the need to eliminate verbosity in basic control statements.
Rationale: I'd like to have a Math package that, when "used", would add new methods to Int and Float.
Register-based. A method can have as many registers as it wants, ie. they are locals (including temporaries and arguments). Operands (eg. for an "if" condition) can be a register, a field in 'this', what else? Literals? Or are literals only assignable to registers?
Arguments are registers called '.a1', '.a2', etc., and are shared by the caller and callee. Or perhaps the callee knows them only by their given names, which map to registers starting with number zero. The *last* registers in the frame are called '.a1', '.a2', etc., and are used to pass arguments downward.
If there are not enough bits in the bytecode for an operand, use words after the bytecode in UTF-8 format for easy efficient flex sizes.
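For reference, a sketch of the UTF-8-style flexible-size encoding for an overflow operand. It assumes the classic one-to-four-byte form (so at most 21 bits per operand); the function names are hypothetical.

```python
def encode_operand(n):
    """Encode a non-negative operand using the UTF-8 byte layout."""
    if n < 0:
        raise ValueError("operand must be non-negative")
    if n < 0x80:
        return bytes([n])
    if n < 0x800:
        return bytes([0xC0 | (n >> 6), 0x80 | (n & 0x3F)])
    if n < 0x10000:
        return bytes([0xE0 | (n >> 12), 0x80 | ((n >> 6) & 0x3F), 0x80 | (n & 0x3F)])
    if n < 0x200000:
        return bytes([0xF0 | (n >> 18), 0x80 | ((n >> 12) & 0x3F),
                      0x80 | ((n >> 6) & 0x3F), 0x80 | (n & 0x3F)])
    raise ValueError("operand needs more than 21 bits")

def decode_operand(data, pos=0):
    """Return (operand, position just past it)."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    if first >= 0xF0:
        n, extra = first & 0x07, 3
    elif first >= 0xE0:
        n, extra = first & 0x0F, 2
    else:
        n, extra = first & 0x1F, 1
    for byte in data[pos + 1 : pos + 1 + extra]:
        n = (n << 6) | (byte & 0x3F)    # each continuation byte adds 6 bits
    return n, pos + 1 + extra
```

Small operands cost one byte, and the lead byte alone tells the interpreter how many more bytes to read.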
How to specify classes, as we will need to do? They're just items in the literals. The literal section is the one that will need to be linked if it is stored outside a running image. It'll need to link symbols as well as classes.
Can mix with other languages, using indentation to determine language boundaries. Already it kind of does that, with separate languages at the package level and the method level.
Like the Unix filesystem: names starting with a dot exist, but are hidden in any UI listings. '.' == 'this'. '.class' instead of 'class'.
In theory, you could be explicit about 'this':
But that'd be stupid, except in a bytecode assembler.
First, make sure they can be used in keywords: eg. π. [Done.]
Have codegen use escapes for UTF-8 characters in strings.
Then, add certain binops to the grammar: ≠, ≤, ≥ (and @ while you're at it).
Finally, strings, including counting beginning and end quotes:
Arc may have the answer (http://www.paulgraham.com/arcll1.html)! Backslash introduces character constant:
An example, taken from Dia's XML and simplified:
Reunify characters and integers? Replace awkward backtick char constants with a special form of numeric constant?
"fields" statement should handle indents as continuation.
Also, extending the dict-constant syntax (see 2006.3.2), so it allows non-valued dict entries (default value? nil? name as symbol?), allows this:
(But not "fields = file lines"!)
More in that direction:
One is almost tempted to omit the '='s. Actually, just go back to the way it already is, to avoid ambiguity between "fields" and unary function defs. (Brilliant useless insight!: the presence of any function call (that is, any expression) is what differentiates a dict-constant from a function def.)
Trying to head for a syntax that includes only function calls and assignments. But "if" and "while" remain intractable.
Also note new auto-"new:" function taking a Tuple.
Multi-line quotes are indent-sensitive (and don't require a closing quote-mark).
BTW, field declarations above don't allow for types.
The "members" view can be unified with the "code" view by treating it as a program run by the compiler. Which it essentially is anyway.
Hmm, there still is a sort of radical context shift once you get into a function.
What about that method declaration syntax? It seems to imply a rather late detection of expressions vs. declarations.
Go back to Cleen(?)-style binop handling: "3 + 4" -> "3 plus: 4".
Need "+=" for "fields":
That implies Tuple '+' and Symbol "plus: object -> Tuple".
Auto-declaring a variable with a constant value automatically gives the variable that type. Eg.
Having initialization to a function call might be nice, but is probably hard to implement in the current compiler:
Note that this changes the definition of the language; it's not just an optimization. You're not allowed to store any other type of object in that variable. It can be overridden by explicitly typing to "Object":
Of course, *explicit* type declarations must first be made to work.
build-settings:
main:
Or something like that. But "new" isn't an instance function of a Class object (currently). Either metaclasses can be added (ugh, implementationally; cleaner semantically), or a "Class raw-instance" function can be added (easy but uglier).
Parentheses could be used for type declarations -- they should be parseable in declarations of locals, arguments, and fields -- thus freeing up [] as the index operator again.
Currently tuples have a precedence below keyword calls, but maybe they should be like any other binop.
Recent experiments show the speed inferiority of Trylon when implementing a Lexer. (C is even faster than Cleet. Trylon:Cleet:C == 1500:500:300.)
Putting type names into signatures could work. When calling a function with a primitive type as one (or more) of the arguments, check at compile time if there's a method with the typed signature. If not, box. The exact check means you can't subclass primitive types (or get boxing if you do), but that's okay, since we only need to optimize primitive types anyway.
Use Javascript-style objects, but without the indexing. I guess that makes them more like Python objects. But anyway, classes are singletons:
So class variables now become members of the singleton class object. And they assume greater importance as the compiler treats some of them specially.
(Could go even further with this: "Entry = class", "add-paragraph: = ", but it's probably better not to.)
Anonymous objects: { name = "Thing", callback = at:get: }
For a new language, try going back to traditional (non-Smalltalk-style) calls/arguments. Perhaps these can be unified with the anonymous objects somehow (and maybe that's where Javascript's indices come implicitly in).
But can it really be compiled well? If we only pass args positionally it's fine, but I don't know how to do the keyword arguments at compile time. How does CLOS do it?
Actually, do it positionally, but the arglist, being an object, can also have names for the slots. The function can be compiled to assert that any names present in the arglist match the names of its arguments. Oppositely, function calls can be optimized by not passing argument names. Scrambled arguments can be disallowed, and/or caught by assertions and descrambled.
We want to be able to return from the function from inside a lambda:
This is a non-local return (in the C code). This could be handled by an exception. The function has an implicit try block around the whole thing:
And add a ReturnFromLambdaStatement.
The problem is that there could be a "catch Object" in the method, which would catch this too.
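A sketch of the exception-based non-local return, in Python standing in for the generated C. All names are invented for illustration. The per-activation `frame` marker is my own addition so that nested calls don't steal each other's returns, and `careless_each` plays the part of a "catch Object".

```python
class ReturnFromLambda(Exception):
    """Carries a value from a lambda back to the function that created it."""
    def __init__(self, frame, value):
        super().__init__()
        self.frame = frame
        self.value = value

def each(items, block):
    for item in items:
        block(item)

def careless_each(items, block):
    for item in items:
        try:
            block(item)
        except Exception:        # stands in for "catch Object"
            pass

def first_negative(items, iterate=each):
    frame = object()             # unique marker for this activation
    def block(item):
        if item < 0:
            raise ReturnFromLambda(frame, item)   # "return item" inside the lambda
    try:                         # the implicit try around the whole function
        iterate(items, block)
    except ReturnFromLambda as ret:
        if ret.frame is not frame:
            raise                # belongs to some other activation
        return ret.value
    return None
```

Calling `first_negative` with `careless_each` shows the problem: the catch-all swallows the return, and the function falls off the end as if nothing had matched.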
On exhale, a class/package is exhaled as a directory if it contains other classes/packages, and as a file if not. (Unless, of course, the system kept track of how it was inhaled.)
Would like to syntactically distinguish class from instance functions. Really there are only two options: either class functions need a keyword, or method functions do.
The class should become more of a singleton object. (Hopefully without falling into metaclass hell.)
(See 2004.12.8 for implementation.) Maybe only allow them as the last argument:
Hmm, this suggests using a sort of "unary |" to introduce a lambda.
A block has "do", "do-with:", "do-with:and:", "do-with:and:and:".
Which raises the issue of a lambda's return value, here taken to be the value of the last evaluated expression.
So far I feel like Cleet is faster than Trylon; also, it doesn't handle functions with many arguments that well. Could Cleet be quickly given xlon blocks?
But would implementing that really be any quicker than implementing full type-checking in Trylon? And Trylon could use both systems, vtables when possible and an rddtable otherwise.
But first we need to know if Trylon really *is* slower than Cleet.
Python has this:
It seems like a nice feature to have. But it conflicts (somewhat) with Trylon's setters.
But I have, over time, thought a little about how it might be implemented. But I can't fully remember now... I guess it might have something to do with MethodContext's lookup. It can go to its parent if it doesn't know it immediately, but if the parent doesn't know it, it can declare it. No, actually, it's Block that would do it, and it looks perfectly feasible -- and even trivially easy -- from looking at the actual Trylon compiler source.
Only names containing a single (trailing) colon would trigger an auto-declaration (eg. "name:", but not "length" or "draw:at:with:").
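The trigger rule is simple enough to write down directly. A sketch, with invented function names:

```python
def triggers_auto_declaration(name):
    """True only for a setter-style name: exactly one colon, and it is trailing."""
    return len(name) > 1 and name.endswith(":") and name.count(":") == 1

def declared_variable(name):
    """The local that a triggering name would declare, e.g. "name:" -> "name"."""
    if not triggers_auto_declaration(name):
        raise ValueError("not an auto-declaring name: " + name)
    return name[:-1]
```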
If you want to really go crazy, this can probably be extended to fields and classes/packages:
This actually achieves the age-old dream of a single syntax for class/package fields in classes and packages -- but not for class/package functions. Also note that in the class-field case, the class is a different syntax (than a block), not just a different context.
Note too that any instance function can declare a field, not just a creator. The field will be nil like any other field until someone assigns to it. However, due to the mechanics of parsing, it's likely that the field will not be visible (without a "this" prefix) to methods that are lexically before (or earlier in parse-order, however that turns out) the method in which it is first auto-declared. Parsing creators first may be desirable, in case one is lexically later.
And getting really out of control, there is no "class-fn", only "fn". I think I discussed this before. But here's the implementation: the MethodContext always checks for "this" or function calls on "this" -- and sets a flag if one is ever used. If true, or if the function has the same name as (that is, overrides) a function in a superclass, the function (and the function it overrides) is an instance function. Gee, this is all exactly what I was thinking on 2005.3.22.
Anyway, while I'm at it, I'm starting to really long for the "setting = true" syntax in the build-settings files.
And while I'm making Trylon 2, get rid of "fn" (or rather, make it optional). "class" and "iff" remain as special "statements" in a class/package. How to do primitives? Even class vs. package is unified. There are only classes, but a class with no instances needn't appear in the method table.
Also, maybe make any "create-"-prefixed name be a constructor, so we can do this: "Point new-at-x: 10 y: 20".
Revisiting the implementation of auto-declaration of locals, it's not quite as trivially simple as I thought. That's because there can be many levels of Blocks, but only the lowest one can auto-declare. Probably add Context.lookup-function-autodeclaring(name: String). FunctionCall.prepare-to-emit() calls that; nothing else calls it.
How about making *all* symbols delimited by whitespace? (Once again, LISP was there first -- almost, anyway. I guess FORTH is one that really is like that.) Probably the worst problem is parens/brackets:
Yeah, forget it.
A certain amount of it can be indicated by the first word on a line.
A command line (or possibly an editor) would repeat the previous "prompt" (including the null prompt) as the new prompt. Indentation levels are "prompts"?
Make them block sensitive, so the continuation lines don't have to end with the continuator?
Probably best to pay attention to nested indents.
But that implies a layer between the lexer and the parser, doing a tokens->tokens transformation. But if we have such a layer, it can fix up blank lines too, placing them (properly) between blocks rather than at the end of innermost blocks. This layer could possibly be easily made part of the Lexer (but should it be?).
More syntax tests:
I think I like the square brackets best.
What would C be like with a Trylon (or Python) -style syntax? Very much like a fully-typed Trylon, but with C-style function calls, I think:
No reason it can't automatically move declarations to the top. And convert inter-hyphens to underscores, "nil" to "NULL", ...
Another example, this time as an instance function with automatic args and return type:
Nice, but what about conflicting syntax ("[]", "label:", "--" etc.)?
Since these are already hypertrophying toward a subset of Trylon, is there a way to use the existing parser to deal with them? By feeding the Parser the right Contexts, and by making the ParseNodes do interpretation as well as codegen?
Make declarations more function-style (in much the same way as "virtual" is functified now).
This could be added quickly to the current compiler.
But I've been thinking that I want to ablate any prefix for function definitions. Also, use ":=" to declare fields.
How about this: use "fields:" to declare instance fields, ":=" to declare class/package variables.
Hmm, try having no distinction, syntactically, between "class/package" functions and "instance" functions. Any function calling "this" is an instance function (and is detected as such by the compiler). This high-level concept corresponds to what's happening at the low level.
However, that requires the compiler to parse the whole function before knowing how other functions can call it (probably not really a problem for the current compiler; not necessarily an issue in a dynamic environment either). Detecting the use of "this" is a little tricky because it can be implicit. (Python doesn't have this problem; it kinda thinks it's C, not a real o-o language.) Probably "this" always shows up in the method's context; we then detect if it ever gets hit. Actually, it's the contents of "this" -- that is, the instance functions (as FunctionCallOnThis's) -- that need to have this detection. Actually, MethodContext.lookup-function() is the place to do it, and it'll be easy for that to know the method. Voilà!
But syntacticosemantically, it means that any function in a class can be applied either to an instance or to the class.
The problem: detection also depends on inheritance. And how to declare a virtual (not pure-virtual) that doesn't access "this"? Maybe: a "thisless" function goes into both the class and instance functions. (Maybe the instance one is just an adapter calling the class one.)
It'd be nice to have a specific "block comment".
Found while compiling List:
- Type declarations -- even inside a method -- can't refer to a subclass or subpackage that is declared later in the file.
- "== nil" doesn't work. I want nil to still be zero, since that's how objects are initialized. Do it like this: Install a new NilFunction as the definition of "nil" in setup-main(). For "==" and "!=" (parse-equality-expression()), use a new EqualityCall (EqualityObjectCall?). That will check for NilFunction as the argument; if it's not a NilFunction, its emit-code() will make an ObjectCall and use its emit-code(). Since there are no "===" and "!==" operators, we don't need to worry about convert-to-setter-call() and copy().
Omit the "class
Maybe eliminate the distinction between a Package and a Class? "method" and "field"/"fld" are for instance members, "function"/"fn" and "variable"/"var" for class/package members. Hmm, I think I really want "fn" for instance functions, the commonest case. Maybe try "method" and see how it works out.
Rename the "main" file to "contents"? A class is exhaled as a single file, unless it contains other classes, in which case it is exhaled as a directory. When inhaling from a directory, note whether a "contents" file is present, and don't exhale one if all the members are classes.
Could streamline the "draw: string x: x y: y" idiom:
Could even go all the way:
Eventually, we'll need #ifdefs in primitives. Export names in Main as "verbose__exists_", "Darwin__exists_", etc.?
This should work! Or it would in Cleet: Symbol.==(Symbol): Bool, Symbol.==(String): Bool, String.==(Symbol): Bool. In the new lang, do it all in string:
Not good enough! If != is not also fast, there's no point in Symbols. How about this: "Symbol ==:" is fast; it won't match a String. "String ==:" is slow, and will match a Symbol or a String. This means that '"foo" == string' won't work. But that's okay, it's an ugly idiom (and '"foo" == symbol' does work).
"Args list as tuple" and Python-style "binding function to object" are both things where the compiler should be able to easily determine that they're used for their common cases, and optimize that (passing args on stack (or in registers) instead of building a Tuple, regular dispatch/call instead of building a binding). The former is easiest; we *know* we built it only as a formality (view draw: string x: x y: y -> view.'draw:x:y:'(string, x, y)).
__ is space in C names, other escapes as done already. There is no conflict, think of it from the perspective of a reader: _ introduces an escape sequence. __ is space, _XX_ is one of the special characters or "-XX-" if not, any other _ is a hyphen. (___ (triple) could be used for hidden implementation stuff, but don't: all that stuff should use trailing hyphens, putting it out of the mangled namespace entirely. Also, it really can't be used, due to the possibility of adjoining escapes.)
A C function name will now be a mangling of its fully-specified name: "Standard Int +:" -> "Standard__Int___pl__co_".
Oh, but escapes make trailing underscores possible in mangled names... Can we get rid of them?: "Standard__Int___pl_co". Make escapes uppercase to minimize conflicts with hyphen-alpha-alpha-hyphen? "Standard__print_CO", "Standard__Int__PL_CO".
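A sketch of the mangling as first stated (lowercase escapes, trailing underscore kept), to check that the rules really produce the example. The escape table holds only the two codes that appear in these notes, "pl" and "co"; I'm not guessing at the rest, so any other special character is rejected.

```python
ESCAPES = {"+": "pl", ":": "co"}    # only the codes visible in the example

def mangle(name):
    """Mangle a fully-specified Trylon name into a C identifier."""
    out = []
    for ch in name:
        if ch == " ":
            out.append("__")                      # double underscore is a space
        elif ch == "-":
            out.append("_")                       # lone underscore is a hyphen
        elif ch in ESCAPES:
            out.append("_" + ESCAPES[ch] + "_")   # _XX_ is a special character
        elif ch.isalnum():
            out.append(ch)
        else:
            raise ValueError("no escape known for %r" % ch)
    return "".join(out)

mangled = mangle("Standard Int +:")
```

The triple underscore in the result is just a space followed by the start of an escape, which is the "adjoining escapes" case that rules out using "___" for anything else.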
Inhaling a package/class: read "main" first; include the comments and blank lines there. Members from the directory are then added at the end. Exhale so classes/packages are always both in "main" (just the empty declaration) and the classes/packages in files/directories. (Assuming there's an occasion to exhale.)
The occasion to exhale comes in a dynamic system.
Accept "sources" as a dir for Main. [Done.]
"iterator" is a function on Object, which responds with a SingleObjectIterator. So a function can iterate over all the elements of "arg", even if only a single object (not a collection) was passed as "arg". Useful for functions operating on files, among other things.
Words are primary; groups of characters surrounded by spaces. Try completely moving away from C-style lexing. The lexer feeds out a stream of words, some of which have special meaning. Binops, for instance.
However, commas and parentheses will still be separated from any adjoining words. '<' and '>' too? (Not periods, though; they're part of a word.)
Some types: 'name', '+', '-', '*', '/', '%', '<=', '=', '+=', 'selector', 'string', 'symbol', 'integer', 'float', 'file-path', 'regex'.
'file-path' examples: foo/bar, foo.c, /dev/foo. But is "/" a 'file-path' or a binop? It's a binop; anything looking for a 'file-path' already needs to accept 'name' as well, so it's no extra difficulty to accept '/'.
Is it called 'name' or 'word'?
The first arg is positional; its name is "arg". How does this fit in with command-line use? A typical command would have "arg" be a file, or a list of files. "--some-option=foo" becomes "some-option: foo". "--some-option" becomes "some-option: yes". ("yes" is a synonym for "true" in Standard.)
Examples: "ls: docs". "gcc: file.c o: file.o no-frame-pointer: true".
Well, okay, this is not quite Smalltalk style, since all args except the first have names shared by the caller and callee.
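The shell-option mapping might be sketched like this. The function name is invented, and giving every bare word the name "arg" (so several files become several "arg" entries) is my assumption about how "a list of files" would arrive.

```python
def keyword_args(argv):
    """Translate shell-style words into (name, value) pairs."""
    pairs = []
    for word in argv:
        if word.startswith("--"):
            name, equals, value = word[2:].partition("=")
            # "--opt=foo" becomes opt: foo; a bare "--opt" becomes opt: yes
            pairs.append((name, value if equals else "yes"))
        else:
            pairs.append(("arg", word))           # positional words are "arg"
    return pairs

pairs = keyword_args(["file.c", "--some-option=foo", "--verbose"])
```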
Can control structures be done this way? Combine with Ruby-style trailing block argument named "block".
Here's our first difficulty: multi-branch control structures. Maybe if the "else:" function can get itself attached to the preceding "if:" function somehow...
Also note that they must actually be macros. This is especially clear for the "while" statement:
Function names: "if:block:", "else:", "while:block:", "try:", "catch:block:", "continue", "throw:".
Let's look at returning a primitive type from a function call. Such a function can generate two functions: "foo" and "foo -> Standard Int". A caller that expects the Int result will call the latter; the former is a wrapper that boxes the object (the compile-time VlangeFunction system thing can take care of this).
Arguments can be handled similarly: "foo value" and "foo value[Int]"
How much type inference can we get if only function results are typed? Hey, then we could return "[]" to use in indexing.
Vital insight or blindingly obvious?: It is not the object but the *class* which runs operations on an object. The object is under the control of the class, which can have it change its representation in memory if it wishes.
All field references on an object from *outside* the object's class must go through a function call at runtime.
Bytecode/runtime recompilation on certain class changes. No, on function changes (maximum granularity). Arbitrary changes; unit of compilation granularity is the function. (Sometimes a class will recompile *all* of its functions. Perhaps sometimes a change in the *callers* can trigger a function to recompile. (What for?) Or a change in a called function can trigger recompilation of all callers.)
We want the bytecode to be expressible as objects, in the spirit of LISPish code manipulation. Is it a full-on compact representation of the source, including comments? Or a low-level "expressions and control structures only" representation? (Are they really that far apart? Can comments and blank lines be treated as *annotations* on the source?)
"for" loops are an important target for optimization. Ideally, we want the iterator object completely optimized out in some cases.
Functions are defined with argument names, but can be called without argument names. In that case, the caller passes arguments named "arg-1", "arg-2", etc. (one-based counting). The called method knows each argument by *both* names (declared and positional).
In the compiler, this means that the same function/method has two names in the dictionary (of its enclosing Context). In the runtime, ditto for any reflection data.
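A sketch of an argument binder in which each slot answers to both names. Python stands in for whatever the compiler or runtime would really do, and `bind_arguments` is a made-up name.

```python
def bind_arguments(declared_names, passed):
    """Bind passed arguments (a name -> value dict) to the declared names.

    Each slot may be supplied under its declared name or under its
    one-based positional name ("arg-1", "arg-2", ...), but not both.
    """
    bound = {}
    for index, declared in enumerate(declared_names, start=1):
        positional = "arg-%d" % index
        if declared in passed and positional in passed:
            raise TypeError("argument %r passed under both names" % declared)
        if declared in passed:
            bound[declared] = passed[declared]
        elif positional in passed:
            bound[declared] = passed[positional]
        else:
            raise TypeError("missing argument %r" % declared)
    return bound

by_position = bind_arguments(["string", "x", "y"],
                             {"arg-1": "hi", "arg-2": 10, "arg-3": 20})
by_name = bind_arguments(["string", "x", "y"],
                         {"string": "hi", "x": 10, "y": 20})
```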
How to reconcile Vlange's "func arg-1 arg-2" naming (compatible with Python and positional calling) with Smalltalk's "func:with-arg:" style? Again, by having two names. (Does this work for both *declaration* styles? I guess not, but at least we can have a unified language (maybe) in which a function must be called using the same calling convention in which it was declared.) Properties of canonical function names: They encode the number of arguments. They encode the calling style.
Consider:
This is really parsed as:
When you probably meant:
Available at both the namespace and method levels.
Start with allowing a name, which is tested for presence in the global namespace (or maybe any namespace in the current context). Possibly then allow context lookups: "Curses supports-line-drawing". The final step would be to have full-on expressions, but of course only compile-time constant expressions, and allowing absence of the given name, which is treated as a false value.
The parser can completely skip a block just by counting indents and unindents.
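Skipping by counting might look like the following Python sketch. It assumes the lexer has already turned indentation into "INDENT" and "DEDENT" tokens (Python-style); the token names and the skip_block function are hypothetical.

```python
def skip_block(tokens, pos):
    """Skip the block that starts at tokens[pos] (which must be "INDENT").

    Returns the index of the first token after the matching "DEDENT".
    Nothing inside the block is parsed; only the nesting depth is tracked.
    """
    if tokens[pos] != "INDENT":
        raise ValueError("expected a block start")
    depth = 0
    while pos < len(tokens):
        token = tokens[pos]
        if token == "INDENT":
            depth += 1
        elif token == "DEDENT":
            depth -= 1
            if depth == 0:
                return pos + 1
        pos += 1
    raise ValueError("unterminated block")


tokens = ["if", "INDENT", "a", "INDENT", "b", "DEDENT", "c", "DEDENT", "d"]
after = skip_block(tokens, 1)
```

Here `after` indexes the token "d", the first one following the skipped block.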
Want a table-driven approach for the first implementation. Because class hierarchies tend not to be too deep, it should be fine to list every class that implements a function in the list. The exception is functions on Object. In EE3, there are 300 classes! Cleet has 12 functions on Object, so that's burning 14K; also, dispatch becomes slow on functions defined in Object (including ==).
The solution is to have two dispatch functions. One throws a "function not defined" exception if it doesn't find it in the list, the other calls the version on Object.
[2004.12.11]
Hey, it can all be done with *one* dispatch, since that last null entry indicates easily whether it's on Object or not. Also, we want to send message-not-understood instead of throwing an exception.
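A Python sketch of the one-dispatch scheme, under one reading of the notes: each function has a list of (class, method) entries naming every class that responds to it, and the list ends with an entry whose class is null. Taking that last entry's method slot to hold the Object version (or nothing at all) is my assumption, as are all the names below.

```python
def message_not_understood(receiver, selector):
    # Stand-in for sending message-not-understood to the receiver.
    return ("message-not-understood", selector)


def dispatch(table, receiver, selector):
    """Linear scan of a per-function table ending in a (None, ...) entry."""
    receiver_class = type(receiver)
    for entry_class, method in table:
        if entry_class is receiver_class:
            return method(receiver)
        if entry_class is None:
            # Terminator: a method here means the function is on Object.
            if method is not None:
                return method(receiver)
            break
    return message_not_understood(receiver, selector)


class Point:
    pass


class Circle:
    pass


# "area" is not on Object, so its terminator carries no method.
area_table = [(Circle, lambda self: "Circle area"), (None, None)]
# "==" is on Object; only overriding classes need their own entries.
equals_table = [(Point, lambda self: "Point =="),
                (None, lambda self: "Object ==")]
```

Functions on Object then cost one terminator entry rather than one entry per class, which is where the 14K was going.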
(I actually worked out some or all of this in a dream! In the dream, it was already mid-2005.)
Hmm, this is sorta edging toward having a context bound up with a function, where that context could be an object or lambda-locals.
How would lambdas work, especially when not the last argument in a function call? Use {}?
Syntactically, it'll look like everything between the braces is still on the same line.
- As a Command-Line Language
I'm surprised I wasn't thinking of this before, as it's actually quite well-suited for it. But then it becomes important that arguments can be given in any order, and probably that there be optional arguments. The verbosity can be mitigated by a predictive command line.
Can a Woosh-ish directories-as-objects system be the way to bootstrap the whole language?
Filesystem access falls out pretty easily, perhaps with a unary '/' operator to access the root:
Can't quite use programs as commands directly, but a simple spec should suffice to allow it.
That would create the following functions:
Hmm, gets combinatorial fast. Really need optional args...
- Wrapping/continuations
Use "--" to indicate continuations?
- Type Declarations
Considering {} for type declarations rather than []:
- Characters
Avoid Cleet's difficulties with characters by making them a separate type (Char, not Int). Comparisons, etc., from Strings/Symbols are easy, but how to do assignment?
- Call Syntax
Instead of the above try this:
So all arguments have names.
Selector names of the above:
Maybe it's fine not to sort argument names. So foo:x:y: is distinct from foo:y:x:.
Should we call nullary functions with a trailing colon too, when we need to indicate their functionness? Eg. print:.
The conceptual clarity of this approach is counterbalanced by its verbosity.
- Object Creation
- Dispatch
Could work in reverse: selector is the main thing, then check the receiver type to choose which method. For monomorphic functions, receiver type can be verified or not.
- Tuples
Since {} are not used for blocks, we can use them for tuple-building. Roughly:
Or maybe, like Python, have a "," operator that makes tuples. It'd be lower precedence than keyword calls.
We'd like these to be a single block of memory:
- Comments
Just use "#"? We want to accept it anyway, and it's easier on smart editors if they only have to look for one character to start a comment.
- Wrapping
How to wrap function calls? We want to do this:
But how do we deal with expressions in "if" and "while" statements?
I suppose with enough lookahead it would work: name eol block-start selector etc.
- Declarations
Now we don't have to distinguish 'foo' from 'foo:'. Does this give us the ability to go back to colons for type declarations? Function arg declarations without colons have no type.
So a selector starting a line is a declaration. But that conflicts with function call wrapping in "if"/"while" statements.
- Syntax
- Containment
Classes can contain fields, functions, class fields, class functions, classes, superclass declarations, and maybe even packages. A class can "use" a package.
Packages can contain classes, fields, functions, and packages. Fields and functions act like class fields and class functions. A package can "use" another package.
The global level is just the package Main. Actually, no, other packages should see globals but not Main, and Standard should not be known as Main Standard. But globals will probably be implemented as a package; it contains all the same kinds of things. "nil", "true", "false", and "globals" can be defined in the globals.
So there are really functions and instance functions (same with fields). In a package, "fn" defines a function. In a class, "fn" defines an instance function, and "class-fn" defines a function.
- Return Values
*All* functions return something. If a return type is not declared, Object is assumed. All functions implicitly end in "return nil" (should compile down to a single instruction that clears a register).
- Primitives
- Use indentation like Python.
Honor the line! C/Pascal/Algol pretend the line doesn't exist (probably in reaction against Fortran's assumption that the line was on a punch card). We'll need to allow wrapping (can't quite rely on the editors yet); try to indicate that syntactically (including the use of indentation).
- Function call syntax:
Try to unify C/Fortran/math-style function calls with Smalltalk-style ones, where every argument has a keyword. So every argument has a keyword, but lists of unnamed arguments can also be passed. There is no default assignment of positional arguments, except when the function takes only one argument. Otherwise, names must be given.
draw-text: "Hello." x: 23 y: 27.
matrix[x, y, z] --same as-- matrix at x: x y: y z: z
No, see, that indicates we do want default positional arguments. Or do we?
matrix[x, y, z] --same as-- matrix at: (x, y, z)
I'm leaning toward some hybrid where the first argument has a name that the caller doesn't specify, but all the other arguments are given by name. Here are some tests, with and without type specifiers.
def draw-text: text x y
def draw-text: text: String x: Int y: Int
def draw-text: text: String, x: Int, y: Int
def draw-text: text, x, y
def draw-text: text [String], x [Int], y [Int]
In the declaration, if the arguments are typed, commas must be used. (Actually, the compiler can easily accept not having commas now that we're using [] for typing, but probably ought to complain.)
Name them like Smalltalk -- draw-text:x:y: -- but order of arguments is not meaningful.
I guess compilers will tend to sort the argument names to canonicalize the function name. To be nice about presenting them to humans after compilation, global knowledge of the different declared orders can be used. If only one such order is declared, show that. If more than one, choose one of the declared orders rather than showing the canonical order. Actually, the name of the first argument can even be presented this way.
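Sorting for the canonical name and un-sorting for display could go roughly like this in Python. The function names and the shape of the declared_orders table are invented for the sketch; the first (unnamed) argument is left out of the sorted part.

```python
def canonical_selector(name, arg_names):
    """Order-independent name: the keyword arguments are sorted."""
    return name + ":" + "".join(arg + ":" for arg in sorted(arg_names))


def display_selector(name, arg_names, declared_orders):
    """Prefer an order somebody actually declared over the sorted one.

    declared_orders maps a canonical selector to the list of argument
    orders seen in declarations (the "global knowledge").
    """
    key = canonical_selector(name, arg_names)
    orders = declared_orders.get(key, [])
    chosen = orders[0] if orders else sorted(arg_names)
    return name + ":" + "".join(arg + ":" for arg in chosen)


# Someone declared draw-text with y before x.
declared_orders = {"draw-text:x:y:": [["y", "x"]]}

same = canonical_selector("draw-text", ["y", "x"])
shown = display_selector("draw-text", ["x", "y"], declared_orders)
```

Callers writing x then y and callers writing y then x both resolve to the same canonical name, while humans are shown the order that was declared.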
Writing a "()" operator function:
More declarations:
- Namespaces
Get serious about namespaces. Not merely in the Cleet sense (of packages/modules), but with the idea that any given section of code exists in a namespace in which names can be looked up. Hmm, perhaps even some degree of runtime lookup would be useful, even if it means that globals must (usually) use a runtime lookup.
Can we fix Visitor this way somehow? Like, there's sort of a dual "this", and a name
Single names might be function calls, and get involved in setters too[1]. Their hierarchy is like:
[1] No they don't. It's a function call:
Is that right? Does setting fall out nicely? (The two functions are "foo" and "foo:".)
So there could almost be local functions (and why not) since variables -- now *all* variables -- are just special cases of function calls.
By the way, we're not too worried about a proliferation of names. So for example, the first argument to a function could be called "argument" as well as its given name.
Another stab at the hierarchy. These are probably installed on the fly as the compiler compiles:
Package refs:
- Optimization
Taking a different tack towards Cleet's goal of C-ish speed with Smalltalkish dynamism. For now, still start with a compiler that reads the entire program and generates C-ABI code. It will report where it cannot use optimizations. For function-call optimization, also make use of global knowledge. Eg. if there's only one function named "eat-my-shorts" in the whole program, anyone who calls "eat-my-shorts" must be calling that one, on that type.
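The global-knowledge trick, sketched in Python over a toy program description. The plan_calls function and its input shapes are hypothetical; a real compiler would walk its own tables. The "dynamic" entries are also the places where it would report that it could not optimize.

```python
from collections import defaultdict


def plan_calls(definitions, call_sites):
    """Decide, per called selector, between a direct call and a dispatch.

    definitions: iterable of (class_name, selector) pairs.
    call_sites: iterable of selectors that appear in calls.
    """
    implementers = defaultdict(list)
    for class_name, selector in definitions:
        implementers[selector].append(class_name)

    plan = {}
    for selector in call_sites:
        owners = implementers.get(selector, [])
        if len(owners) == 1:
            # Only one function has this name: call it directly.
            plan[selector] = ("direct", owners[0])
        else:
            # Several candidates (or none): fall back to runtime dispatch.
            plan[selector] = ("dynamic", None)
    return plan


definitions = [("Bart", "eat-my-shorts"), ("Circle", "draw"), ("Square", "draw")]
plan = plan_calls(definitions, ["eat-my-shorts", "draw", "missing"])
```

Only "eat-my-shorts" gets a direct call; "draw" has two implementers, so it stays dynamic.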
- Type-Optional
An object's type is not required (Object is implied), but can be given. There will be different levels of reporting: no type check, type check for optimization... ...lost...
Anyway, if a name's type is given, how hard do we try to ensure that the object bound to that name is that type?
I think we have to have a certain amount of leeway, especially for iteration. For instance:
It's unlikely we'll want anything like Cleet's Qualified Types, so we'll have to just trust that the object we get from "views iterator current" is really a View. In other words, a type declaration is a guarantee *by the programmer* of an object's type.
The Pascal-style syntax is becoming unwieldy. Instead, use an attribute-style syntax:
- Setters
Just working this out:
Hmm, this is not getting us general setters on any conceivable function.
For now, I don't think we care. It doesn't seem to be a truly useful idiom, as long as nullary and operator functions are covered. Wait: unary too:
- Command Line
Prompt is a "----" above. Maybe a "> " on the command line, with " " as the prompt for continuation lines (if any) so that they line up, or maybe use auto-indentation of some kind, depending on how sophisticated the command line is. Anyway, this divides the screen up into command/response sections.
- Some syntax
Or alternately:
- Expression wrapping
"rest-of-block" will get the remainder of the current block within the wrapped expression. It won't go out of the expression-statement.
- To Do
How can class fields/functions be unified with package fields/functions and global fields/functions?
Declaration wrapping.
Object creation. Point new: 10 y: 20. Or Point new x: 10 y: 20. Point init. But the latter's not so good when only one argument is used, and it's the same as a field:
The first one could be syntactic sugar for: