Have you ever felt embarrassed while reading code written in your own language?

Say you are a native Portuguese speaker, and one day you start working with a programming language that is implemented in Portuguese. Not only are the names of classes and objects - assuming the language is object-oriented - written in Portuguese, but so are the keywords of the language and the libraries you need to use. Now let’s assume that you’ve been working in an English-speaking environment - perhaps you’ve been contracting for American firms for a while - but you decide to work for a local company in Brazil, where you live. Suddenly, you have in front of you a codebase that is mostly written in Portuguese instead of English. As a native speaker, the words and their meaning are crystal clear to you - even if the specific syntax of the programming language might not be just yet.

After working on that codebase for a few days, however, you start to feel a bit uneasy. There is something strange and eerie in that programming language, something that makes you cringe. Compared to a programming language written in English, this one looks like a toy language. It doesn’t look like a serious language. You are embarrassed by it.

How can reading source code affect a person’s feelings in such a way?

What would it take for something like that to happen?

I have asked myself those questions many times during the past few months. They are the sort of questions that have started to emerge out of a series of interviews that I’m doing with Ruby developers for my current research on the history, language, and values of Ruby and its community. In one of these interviews, a Ruby dev who works as a QA engineer for a London start-up told me something pretty similar to the story I just described.

“I have my own experience because I speak Russian. And I know that we have at least one popular programming language in Russian which is called 1C and it’s terrible, and people keep laughing at it”.

The language in question is 1C Enterprise script, a major enterprise language in Russia.

“I don’t know how that feels for native English guys. I can only imagine. But in Russia, when you code in your natural language, you just start laughing”.

He told me keywords - such as those used in for loops - were especially funny, because they don’t look serious enough. And he added that developers sometimes have a hard time programming in that language.

“They cannot read it because their minds start blowing up because they read it as a natural language. But it’s not natural. It’s technical. So what the fuck?”

Well, what the fuck indeed. To him, these feelings had to do with the way in which programming is taught in schools. In his school, children learned basic programming skills. They learned basic commands to instruct the computer and were told to add comments to explain what the commands do. And the comments were all written in either Russian or Ukrainian. In this setting, he says, it’s only natural to use your own language to think about programming. But once you are out of school, once you’ve started to work in a professional manner, you don’t want to return to this stage in your life. That’s the main reason reading 1C code makes him feel “kind of funny”: it makes him feel like a little boy, a little boy still in school.

"Если ЭтаФорма.ТекущийЭлемент = ПолеВводаТекущийИндекс Возврат;"

Isn’t it amazing that reading source code can transport you back to your childhood? The reason why programming languages can have such power, I think, is that American English has come to dominate the world of computing to such an extent that it has become unthinkable to conceive of programming languages which are not based on English. Not just impractical, but unthinkable. I have only just started to look at this, but it seems to me that the reason why English has come to dominate the vast field of computing might be that a particular way of thinking about languages has found its way into computers via work that was done on compilers during the 1950s.

Think of a compiler as a translator which transforms code written in one language into a different one. Most commonly, it translates source code into machine code. A compiler has two main parts - the frontend and the backend. The frontend takes the source code through a series of phases - lexical analysis, syntax analysis, semantic analysis - reporting any errors it finds along the way, before generating an “intermediate representation”. This representation - sometimes another language such as C - can then be sent to the backend of the compiler, where it is optimized and translated into assembly and binary. Here is where it gets very interesting, because the grammatical analysis of source code happens within a part of the compiler called a parser, which typically works on the tokens produced by lexical analysis. A parser is the part of a compiler that transforms the original source code into a data structure - such as a syntax tree, for example.
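To make the frontend phases described above a bit more concrete, here is a minimal sketch in Python of lexical analysis followed by syntax analysis for a toy arithmetic language. Everything in it is an assumption made for illustration: the names `tokenize` and `parse`, the token pattern, and the nested-tuple tree format are invented, and the parser is a simple recursive-descent one rather than the LR technique discussed in this post.

```python
import re

# Hypothetical token pattern for a toy language: integers or single symbols.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(source):
    """Lexical analysis: turn a string of characters into a list of tokens."""
    tokens = []
    for number, symbol in TOKEN_RE.findall(source):
        if number:
            tokens.append(("NUM", int(number)))
        elif symbol.strip():  # ignore stray whitespace
            tokens.append(("OP", symbol))
    return tokens


def parse(tokens):
    """Syntax analysis: build a nested-tuple syntax tree from the tokens."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        kind, value = advance()
        if kind == "NUM":
            return ("num", value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


if __name__ == "__main__":
    print(parse(tokenize("1 + 2 * 3")))
```

The tree that comes out nests the multiplication inside the addition, which is the sort of structure a later phase would walk to generate an intermediate representation.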

The logic of parsers was established in the 1960s. It was based on theories about natural languages that were developed by Noam Chomsky in the mid-1950s. One famous parser from that time was called the LR parser, which reads its input from left to right (the “L”) and builds a rightmost derivation in reverse (the “R”). Donald Knuth created that parser, and he used the word “language” in that context to describe “a set of character strings which has been variously called a context free language”, “a Chomsky type 2 (or type 4) language”, or “a push-down automaton language”. And on the reason for choosing these “context-free” languages as the linguistic model to define the logic of the LR parser Knuth says:

“such languages have aroused interest because they serve as approximate models for natural languages and computer programming languages”.

Here, then, the internal logic that allows a computer to read and analyse a chunk of source code is based on a formal idea - a model, a theory - of the similarity between natural languages and computer languages. Note how Knuth says that he modelled his parser on the idea of context-free languages. This all comes from Chomsky, who tried to remove any reference to the context of speaking when describing languages. To Chomsky, language is about syntax. The way people speak, when they speak it, and in which context languages are spoken doesn’t matter to Chomsky at all. And Knuth took this idea of “context-free” languages and used it to create the grammatical logic that underpins his LR parser. The problem is that natural languages in real life are not context-free. Whenever we speak, there is always a context that allows us to speak. If you strip a sentence of its context, if you abstract its structure, you end up with a model of a language. And models are great, there is nothing wrong with them, but they are not the real deal - in a sense. For a language to feel alive, to feel real, to feel, well, serious, it can’t be just a model stripped of its context.
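For readers who have not met the formal term before: in the technical sense, a grammar is “context-free” when each of its rules rewrites a single symbol in the same way no matter which symbols surround it. The sketch below is only an illustration under that definition. The toy grammar, the `GRAMMAR` table and the `derive` helper are all made up for this example and do not come from Knuth’s or Chomsky’s work.

```python
# Toy grammar, invented for illustration. Each rule has exactly one
# nonterminal on its left-hand side, which is what makes it context-free.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["programmer"], ["compiler"]],
    "V": [["reads"], ["writes"]],
}


def derive(symbols, choices):
    """Leftmost derivation: repeatedly rewrite the first nonterminal.

    `choices` supplies the index of the alternative to use whenever a
    nonterminal has more than one possible expansion.
    """
    symbols = list(symbols)
    choices = iter(choices)
    steps = [" ".join(symbols)]
    while True:
        # Find the leftmost symbol that still has a rule in the grammar.
        idx = next((i for i, s in enumerate(symbols) if s in GRAMMAR), None)
        if idx is None:
            return steps  # only words are left, so the sentence is complete
        alternatives = GRAMMAR[symbols[idx]]
        pick = next(choices) if len(alternatives) > 1 else 0
        # The rewrite looks only at the symbol itself, never at its neighbours.
        symbols[idx:idx + 1] = alternatives[pick]
        steps.append(" ".join(symbols))


if __name__ == "__main__":
    for step in derive(["S"], [0, 0, 1]):
        print(step)
```

Notice that the rule for `N` fires the same way whether it appears before or after the verb. Nothing about who is speaking, or where, enters into it.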

Is that why Oleks felt that code in his own language was not serious enough? Perhaps that code didn’t feel real, perhaps it felt like a model. It lacked context. Did removing the context make 1C a bit of an embarrassment? That sounds like a good hypothesis to me, especially if we think about the dominance of English. But if that’s true, and given that most programming languages are based on the English language, how do native English-speaking programmers feel when they code? Does their language feel out-of-context as well? Do they potentially feel embarrassed by it?

I asked Oleks about that.

“I have no idea, but I guess they don’t feel any problem, any inconsistency in this because it’s natural for them. If they want to be a programmer, they start from this point. They start to separate the technical English from the natural English from the beginning and they do not see it is as joke. So it’s natural for them. I guess they would be really surprised. I mean, if they would have this kind of feeling like we have that we are not writing in our native language. […] I mean, I have that separation. So when they have a code they wrote in Ruby it’s not the same to which I use when explaining something [in my own language][…] So I don’t know how it’s happening with these English guys. So no idea.”

Oleks says he has two separate parts of his brain: one for technical stuff, which runs English (Mac OS X version); and another for the spoken word, running (FreeBSD) Russian. If these two get mixed up, something a bit funny happens. Has anything like that ever happened to you?



:: post written by Gui Heurich - @anthrolanguage