Saturday, March 28, 2009

World 2.0 - Dr. Dario de Judicibus

http://lindipendente.splinder.com/post/15354690/World+2.0

World 2.0

Preface

This preface is not a gratuitous introduction to my article, nor just a way to justify any possible consequence of my limited knowledge of the English language, but a fundamental component of this text, because it highlights one of the most critical problems of the web today: the English language as a prerequisite for being an effective part of the new Web 2.0 platform.


English is not my first language, and even though I am an Italian writer, I am not as fluent in English as I am in my own language. So, why did I write this article in English? Because today English is a sort of lingua franca of the web. Most people who use the Internet are able to read English, even if it is not their first language. If you write an article in English, a lot of people will be able to read it and, if the article is a good one, someone may decide to translate it into other languages too. But if you write it in another language, especially one that is not widely known in the world, such as Italian, there is little chance it will be translated into English even if it is an excellent text. Another advantage is that if you refer to other popular articles — a very common practice on the web — which are usually in English too, you can be sure that the citations will be exact and not misleading, whereas referencing translations can always introduce misunderstandings.

On the other hand, writing in a language different from your own has serious disadvantages. First of all, you cannot really compete with native writers in terms of either content or style. Your writing style will be poorer by definition, because you are not as confident in that language as you are in your own. Your text may be less readable or even boring to the native reader, and misunderstandings will be more likely. Languages differ not just in words and syntax: the communication style itself differs from language to language. A plain translation, even if perfect from a syntactical point of view, can keep the reader from finishing the article, even if the subject is interesting. From a content point of view, your text may come across as too simple, even childish, because of your limited knowledge of the other language's vocabulary. In any language there are many words which have mostly the same meaning — we often refer to them as synonyms — but which are not perfectly interchangeable. Using one term rather than another gives readers a different flavor, supports the communication, enriches your message.

Last but not least, if you want to write an article in another language, you must think in that language, and since it is not your own, writing the article will be more difficult and more tiring, impairing your ability to reach excellence, the final goal of any good writer. Every good article, in fact, whatever the subject, is a work of art, and no good writer is glad to publish something that he or she is not proud of.

In the end I decided that the advantages outweighed the drawbacks, so I wrote the article directly in English, asking an American friend of mine to review it and correct at least the most evident mistakes. Whatever is still wrong is my fault, of course. By the way, I have written all my blog posts in Italian up to now. This is the first one in English. In the future I will probably write other English articles whenever I judge that the content deserves worldwide visibility.

What is Web 2.0?

If you wish to know what Web 2.0 is, you may want to read Tim O'Reilly's article. It has been translated into various languages, including Italian. O'Reilly's article gives a very good overview of the major elements which characterize a Web 2.0 site. It does not really provide a definition, however, but rather a list of principles to consider when evaluating whether a site can be considered Web 2.0 or not. In fact, according to Tim and other web analysts, a good Web 2.0 site should:

  • provide services, not just packaged software, and ensure cost-effective scalability,
  • be based on unique and hard-to-recreate data sources that get richer as more people use them,
  • trust users as co-developers,
  • harness collective intelligence,
  • leverage the long tail through customer self-service,
  • be potentially deployable on any device,
  • provide users with lightweight user interfaces, development models, and business models.

This is a good practical approach, often used in physics and other sciences; it is called an operative definition, that is, a definition of a concept which explains how the concept may be observed rather than what it is. Colloquially, an operative definition tells you «how to know it when you see it».

However, the web is not something ruled by natural laws, but a continuously evolving environment. The operative definition of a physical phenomenon may last for years; it is a long-term definition, and it changes only when new tools are developed to measure new parameters which totally or partially substitute for the old ones. But what we call Web 2.0 is evolving so fast that in theory we should speak of Web 2.0.1, 2.0.2, 2.0.3, and so forth. That is, an operative definition of Web 2.0 risks being continuously updated, which limits its intrinsic value. This contradicts the principle reported by Tim, that the traditional software life cycle no longer applies to Web 2.0, which follows instead what we could call continuous «beta» development. So we need a more general definition which highlights the distinctive factors that characterize Web 2.0.

In addition, Tim's article is very focused on specific sites and companies. But not only is Web 2.0 based more and more on mashups: most sites which aggregate data from other sources are themselves becoming new sources of data, because they build value as a side effect of the ordinary use of their applications. So Web 2.0 is increasingly becoming a wide platform where prosumers are the new actors of the web, and companies such as Google or eBay are drivers and facilitators. The user is no longer external to the system, but an integral part of it. This is not a totally new concept; it is well known to ICT specialists as a characteristic of Knowledge Management Systems.

So, how should we build a new definition of Web 2.0 which is not limited to how the web is evolving today, but which establishes long-term principles for a continuously changing environment? In my personal opinion it should state:

  • what we are speaking of, that is, what is the object called «Web 2.0»,
  • what it is for, that is, the purpose or the reasons for its existence,
  • how it works, and which architectural design it is based on.

I developed several definitions over the last few months, but none was satisfying until I arrived at this one. Of course, I do not claim it is the definitive definition of Web 2.0; it is only my personal two-cent contribution toward understanding what I consider not just a technological evolution, but a global social event which is going to affect hundreds of millions of people around the world.

Web 2.0 is a knowledge-oriented environment
where human interactions generate content that is
published, managed, and used
through network applications
in a service-oriented architecture.

Let us go through this statement thoroughly. First of all, what is Web 2.0? A knowledge-oriented environment. Not a site, not a server or a bunch of servers, not a single community or a team. It is an environment. An environment is more than just a platform. A platform is the foundation of an environment, but an environment is an autopoietic ecosystem which involves many different actors at various levels who interact with each other.

But every ecosystem is based on rules. For example, natural ecosystems are based on the principles of survival and natural selection. The sustaining principle at the base of Web 2.0 is knowledge. Knowledge is more than just information, just as information is more than data. There are several definitions of knowledge. My favorite one is the following:

Knowledge is the correlation of data and pieces of information
with personal or group experiences and lessons learned,
which creates a new partial awareness.

In practice, there is no knowledge unless there is somebody who can use it. In contrast, information is just an ensemble of data associated with a specific context, which exists independently of anybody who may use it. This definition has an important implication: it is not possible to store knowledge as we do with information. You may often have heard of knowledge bases as databases where knowledge is stored. In my personal opinion, knowledge bases do not contain real knowledge, but pieces of pre-digested knowledge which become real knowledge only when they come into contact with a human being, that is, an intelligence. In fact, whatever you read may or may not become knowledge depending on who reads it. The same pieces of information given to several different people may lead them to different conclusions, or to none at all, depending on their skills and experience.

Several years ago I developed a definition of intelligence (an operative definition, by the way) which I consider particularly useful when we have to understand how people create, acquire, and use knowledge.

Intelligence is the ability to perform correlations
among various pieces of information and experiences.

In practice, given an ensemble of pieces of information and experiences, the faster you can correlate those pieces and the wider the resulting network you can manage, the more intelligent you are. I like this definition because it is not strictly tied to logic and to the rational side of our brain. I did not specify that correlations have to be logical links: any kind of correlation applies. So it can refer to the way an artist creates a painting, or a composer creates a melody. We can apply this definition to artistic intelligence as well as to scientific intelligence, to the right side of the brain as well as to the left.

You can now easily see the link between my definition of intelligence and the definition of knowledge I mentioned earlier. To efficiently and effectively create, acquire, manage, and use knowledge, you must be really clever. It seems a trivial statement, but it is simply a direct consequence of both definitions, which are not trivial at all.

And this takes us to the second line of my Web 2.0 definition: the Web 2.0 environment is where human interactions generate content that can become knowledge when put in contact with people. It is not simply a transfer of knowledge. The resulting knowledge is not necessarily the union of the know-how that generated that content; since it depends on the peculiar characteristics of the recipient, it may be something new. This is the strength of Web 2.0: whatever idea you put in the cauldron may be picked up and generate a new idea, or be changed in such a way that you may hardly recognize your original one.

Pieces of knowledge? Content nodes of a large, growing network? Skill? Experience? Let us go back to my definition of intelligence: the ability to create a network of correlations between nodes representing pieces of information and experiences. Can you see the parallel with Web 2.0? The concept of collective intelligence arises naturally. The faster and wider we can create valuable links between pieces of information on the web, the higher the collective intelligence of the network.

Now, let us consider the following definition:

Knowledge management is a discipline whose goal is to ensure that
the right information is available to the right person
just in time to make the best possible decision.

We could speak for hours about the meaning of this definition, especially as far as the term «right» is concerned, but I would like to draw your attention to the last few words: to make the best possible decision. This is why we must know: to decide. Every decision requires enough knowledge to make it. The more knowledge, the more reliable the decision. This applies to Web 2.0 too. That is why in my definition I speak of publishing, managing, and using the content generated by human interactions. Wikipedia would have no meaning if nobody used it, nor would Google's PageRank or eBay's feedback be of any value if they were not useful for deciding whether a site's content or a seller is reliable.

So we now know what Web 2.0 is and what it is for: a knowledge-oriented environment to share content generated by human interactions. But how does it work? Is that important for its definition? According to Tim, Web 2.0 must be hardware- and software-independent, so it would seem it is not. However, we can describe the architectural principles of a system without linking the system to a specific implementation. Web 2.0 is based on network applications. This is a fact. Even Web 1.0 is based on network applications, but unlike those of Web 2.0, they are not service-oriented. Tim correctly highlights the importance of services as the foundation of a Web 2.0 approach. Of course we can deploy services using many different architectures, but since services are the core of the new Internet, the choice of the standards used to implement such an architecture has consequences. That is why, in my opinion, a real Web 2.0 environment must be based on a Service-Oriented Architecture.
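
To make this concrete, here is a minimal sketch, in Python, of what consuming such a service might look like. The endpoint, the payload, and the function name are hypothetical illustrations, not a reference to any real API; the point is that the client receives structured data, not a packaged application.

import json
from urllib.request import urlopen

# A minimal sketch of a service-oriented exchange. The URL and the payload
# are hypothetical: any real Web 2.0 service defines its own contract.
def fetch_seller_feedback(seller_id: str) -> dict:
    url = f"https://api.example.com/v1/sellers/{seller_id}/feedback"
    with urlopen(url) as response:
        return json.load(response)

# The structured data returned by the service can then feed a different
# application, written by a different author, running on a different
# device: the service, not the software package, is the product.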

Now the definition is complete: we said what, why, and also how.

The Language Issue

There is, however, still one consideration left. Let us go back to the preface of this article and to my decision to write it in English. Web 2.0 is about sharing pieces of information and knowledge, but how can one share anything if one cannot communicate? From a technical point of view this is not a problem: web standards are the success factor of a global network made of heterogeneous hardware and software. But what about natural language? We do not only communicate through a language: a language is the representation of a culture, a mindset, a lifestyle.

For example, there is at the moment no way to add most non-English books on Shelfari, or on many other analogous Web 2.0 applications, because most of them rely on the Amazon database. However, Amazon lists only a few non-English titles, and in several countries it is not available at all. There is no Amazon.it, for example, or Amazon.es. So, if a book has an ISBN beginning with 88 (the Italian group code), for example, you will not be able to add it to your shelves, not even manually.

Of course, large companies such as eBay and Google are available in many countries and languages, since it is in their interest to be as global as possible. However, even in those cases, non-English readers are penalized. For example, most Google Translate services translate from and to English. If you wish to translate from Italian to French or from German to Spanish, you do not have a reliable service. Automatic translations are still very raw, but they are in any case a useful tool if you do not know a language at all. If you cannot read Chinese at all, for example, a bad translation is better than no translation. Most Google translation services are good enough to understand at least the subject of a text. The translation from German to English and vice versa is quite reliable, but if you try to translate from German to Italian through English, the result is really a mess.

The only Web 2.0 service which is really global is probably Wikipedia, with over seven million articles in more than 200 languages, and still growing! The various blogospheres are islands in an ocean, with very few connections to each other. There are only a few bloggers who write in several languages, and fewer still who write all their articles in more than one language. And while the French, German, Spanish, Italian, and many other blogospheres link to English articles, the reverse is really rare. Most English-speaking bloggers do not bother to link to foreign articles even if they can read them, because they assume that most of their readers cannot. Most non-English bloggers, vice versa, assume that their readers can read some English. So, there is an evident asymmetry.

So, more and more, the English-based Web 2.0 is going to ignore the rest of the web. Language does not affect only blogs or books, but also songs, music, lyrics, movies, and every other aspect of social life. This is not peculiar to the web, however. It is a fact that a significant piece of the book marketplace in non-English countries is based on the translation of English-language authors: not just the best sellers, but a great many authors of high and medium quality. However, it is extremely hard for a non-English author to be published in the USA or the UK, for example, unless he or she is extremely popular. The same goes for singers. How many Italian or French singers are known in the USA? In most cases Americans still sing the French or Italian songs of fifty years ago; they know nothing of modern singers and modern songs. But American and British singers are well known all over the world, and it is not a matter of quality. English is becoming a killer of many world languages and, as a dramatic consequence, of many world cultures. On the web, especially Web 2.0, this is simply more evident.

However, Web 2.0 could make a difference and dramatically change this trend. How?

The Global Dictionary

Automatic translation of languages is a serious issue. What one can say with a single word in one language may require a longer phrase in another. Certain terms simply have no translation at all, since the concept they refer to is typical of a specific culture and has no counterpart in other ones. Idioms, jargon, and specialized expressions make automatic translation a mess. Word-by-word translation is often useless, but even more sophisticated algorithms, which analyze groups of words and make assumptions about the context, may fail. As I said before, just changing a single word in a phrase by using a synonym can give that phrase a different flavor, sometimes a different meaning: serious, ironic, playful. Languages change continuously, and people create new meanings and variants every day, just by speaking or writing. In theory, each of us speaks a different, personal language, or gives slightly different meanings to the same term or statement.

From a practical point of view, every pair of languages requires its own dictionary and a complicated set of rules to take into consideration every possible idiomatic expression. In theory, this is necessary even for reverse translation: translating from Italian to Spanish, for instance, may require different translation rules and data than translating from Spanish to Italian. It is a huge effort even when languages are similar.
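
To put a number on this effort: with N languages, covering every ordered pair directly requires N × (N − 1) dictionaries, while pivoting through language-independent concepts would require only one mapping per language. A quick sketch of the arithmetic, assuming nothing more than the count of languages:

# Pairwise dictionaries grow quadratically with the number of languages,
# while per-language concept mappings grow only linearly (roughly 7,000
# living languages exist in the world, per common estimates).
for n in (10, 100, 7000):
    print(f"{n} languages: {n * (n - 1)} pairwise dictionaries vs {n} concept mappings")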

However there is a different approach that could take advantage of Web 2.0: the Global Dictionary.

The idea is to catalog all possible concepts. A physical object like a «house» or a «box» is a concept, but so is an adjective, like «to be red» or «to be big», an adverb, like «periodically» or «never», or a verb, like «to sing» or «to shake». Note that the same concept can be expressed by different words or combinations of words, and that the same term can be used to express different concepts, either as a single word or in conjunction with other terms. So, a census of all possible concepts is a tremendously huge effort, but this is exactly the right kind of work for the Web 2.0 approach.
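
As a rough illustration of what a single entry in such a census might look like, here is a minimal sketch; the field names and the UID scheme are hypothetical, invented for this example only.

from dataclasses import dataclass, field

# A minimal sketch of one Global Dictionary entry: a concept, not a word,
# with definitions and expressing words per language (hypothetical scheme).
@dataclass
class Concept:
    uid: str                                                     # universal identifier
    definitions: dict[str, str] = field(default_factory=dict)    # language -> definition
    lemmas: dict[str, list[str]] = field(default_factory=dict)   # language -> words expressing it

box = Concept(
    uid="GD:0042",
    definitions={"en": "a container with flat sides", "it": "un contenitore a facce piane"},
    lemmas={"en": ["box"], "it": ["scatola"]},
)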

Every time we catalog a concept, we have to define it, and provide a definition in as many languages as possible. You can think of the result as the union of all the monolingual dictionaries published in the world, with one difference: you do not define words, but concepts. The Global Dictionary represents a public data source that can be used by web services to translate any page from any language into any other language. In the initial phase, the translations will probably be just a little better than current automatic translations, with the one advantage of allowing translations between pairs of languages which are not currently supported by existing services, for example from Cherokee to Swahili. But in the long term, we could take advantage of services that allow us to write Global Dictionary enabled text, that is, text where words are tagged in such a way as to identify a precise concept. For example, «casa» in Italian is used for both «house» (the building) and «home» (the familiar setting), but «home» itself has various meanings in English too. Each of these meanings has to be cataloged in the Global Dictionary and a Universal Identifier (UID) associated with it. When I write a text using a GD-enabled application, the software will make some assumptions about which concepts I am using, according to the context tag I have provided at the beginning, and when uncertain it will propose to the writer a list of choices from which to select the right one.
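
Continuing the «casa» example, here is a minimal sketch of how a translation service might use GD-enabled text. The UIDs and the toy lexicon are hypothetical; a real service would of course query the shared Global Dictionary.

# language -> {concept UID -> preferred word}; a toy lexicon for two
# hypothetical UIDs: GD:0001 («house», the building) and GD:0002 («home»).
LEXICON = {
    "en": {"GD:0001": "house", "GD:0002": "home"},
    "it": {"GD:0001": "casa", "GD:0002": "casa"},
}

# GD-enabled text: each token carries the UID of the concept the writer
# chose, so the translator never has to guess the sense.
tagged_text = [("casa", "GD:0002")]  # the writer meant the familiar setting

def translate(tagged, target_lang):
    lex = LEXICON.get(target_lang, {})
    # Fall back to the original word when a concept is missing.
    return " ".join(lex.get(uid, word) for word, uid in tagged)

print(translate(tagged_text, "en"))  # prints «home», not the ambiguous «house»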

Of course it will take more time to write an article, but the advantage is that you know it can be automatically translated into any language, so that anybody in the world will be able to read it. The impact on search engine methods will also be significant: from word-based indexes to concept-based ones, from a syntactic approach to a semantic one.
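
A concept-based index might look like the following minimal sketch (again with hypothetical UIDs): pages are indexed by the concepts their tagged words express, so a single query can match documents in any language.

from collections import defaultdict

# concept UID -> set of page URLs, instead of word -> pages
index = defaultdict(set)

def index_page(url, tagged_tokens):
    for _word, uid in tagged_tokens:
        index[uid].add(url)

index_page("https://example.it/post", [("casa", "GD:0002")])
index_page("https://example.com/post", [("home", "GD:0002")])

# A search for the concept «home» finds both the Italian and English pages.
print(index["GD:0002"])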

No single company can afford the effort of creating such a Global Dictionary and of developing the applications to generate and translate GD-enabled text, but Web 2.0 can make it real. The web will change: all blogospheres will become a single big continent, no longer islands; all pages will be available to everybody; and we will have a single big Wikipedia to which Wikipedians from all countries will be able to contribute, dramatically increasing its value. No more articles of differing quality in the different Wikipedias, but the best possible article in every language of the world. A new world: World 2.0.
