Computational linguistics courses. Where in Russia do they teach computational linguistics? About the course of lectures "Computational Linguistics"

16.11.2019

The School of Philology at the Higher School of Economics (HSE) is launching a new master's program in computational linguistics. It is aimed at applicants with a basic education in the humanities or in mathematics, and at anyone interested in solving problems in one of the most promising branches of science. The head of the program, Anastasia Bonch-Osmolovskaya, told Theory and Practice what computational linguistics is, why robots will not replace humans, and what will be taught in the HSE master's program in computational linguistics.

This program is almost the only one of its kind in Russia. Where did you study yourself?

I studied at Moscow State University, at the Department of Theoretical and Applied Linguistics of the Faculty of Philology. I did not get there right away: at first I entered the Russian department, but then I became seriously interested in linguistics, and I was drawn by the atmosphere that remains at the department to this day. The most important thing there is the good contact between teachers and students and their mutual interest.

When I had children and had to earn a living, I went into commercial linguistics. In 2005 it was not very clear what this field actually was. I worked in several linguistic companies. I started at a small company behind the Public.ru website, a media library, where I began to work with linguistic technologies. Then I worked for a year at Rosnanotech, where I had the idea of building an analytical portal on which the data would be structured automatically. After that I headed the linguistic department at Avicomp, which is already serious production work in computational linguistics and semantic technologies. At the same time, I taught a course in computational linguistics at Moscow State University and tried to make it more modern.

Two resources for a linguist. The first is a site created by linguists for scientific and applied research on the Russian language. It is a model of the Russian language, presented through a huge array of texts of different genres and periods. The texts carry linguistic markup, which can be used to obtain information about the frequency of particular linguistic phenomena. The second is WordNet, a huge lexical database of English. The main idea of WordNet is to connect not words but their meanings into one big network. WordNet can be downloaded and used in your own projects.

What does computational linguistics do?

It is a highly interdisciplinary field. The most important thing here is to understand what is happening in the electronic world and who can help you do specific things.

We are surrounded by a very large amount of digital information, and there are many business projects whose success depends on processing it. These projects can relate to marketing, politics, economics, or anything else. It is very important to be able to handle this information effectively: what matters is not only the speed of processing, but also how easily you can filter out the noise, get the data you need, and assemble a complete picture from it.

Previously, computational linguistics was associated with some global ideas: for example, people thought that machine translation would replace human translation and that robots would work instead of people. Now that looks like a utopia, and machine translation is used in search engines for quick searches in an unfamiliar language. In other words, linguistics today rarely deals with abstract tasks; mostly it deals with small components that can be built into a large product and make money.

One of the big tasks of modern linguistics is the semantic web, where search works not just by matching words but by meaning, and all sites are marked up semantically in some way. This can be useful, for example, for police or medical reports, which are written every day. Analyzing the internal connections in them yields a lot of necessary information, and reading and tallying it all by hand takes far too long.

In a nutshell: we have a thousand texts, we need to sort them into piles, represent each text as a structure, and get a table that we can actually work with. This is called unstructured information processing. On the other hand, computational linguistics also deals, for example, with generating artificial texts. There is a company that came up with a mechanism for generating texts on topics that are boring for a person to write about: changes in property prices, weather forecasts, football match reports. Ordering such texts from a person is much more expensive, and the computer-generated texts on these topics are written in coherent human language.
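The "sort into piles, turn each text into a structure, get a table" idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three topics, the keyword lists, the sample texts, and the function names `classify` and `to_row` are hypothetical, and a real system would normally learn its categories from data rather than rely on hand-written keywords.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists for three "piles" (illustrative only).
TOPIC_KEYWORDS = {
    "weather": {"rain", "snow", "forecast", "temperature"},
    "football": {"match", "goal", "league", "scored"},
    "property": {"apartment", "rent", "price", "mortgage"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text):
    """Return the topic whose keywords occur most often, or 'other'."""
    tokens = tokenize(text)
    scores = {
        topic: sum(token in keywords for token in tokens)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best_topic = max(scores, key=scores.get)
    return best_topic if scores[best_topic] > 0 else "other"


def to_row(doc_id, text):
    """Represent one text as a flat record, i.e. one row of the table."""
    return {
        "id": doc_id,
        "topic": classify(text),
        "n_tokens": len(tokenize(text)),
        "numbers": [int(n) for n in re.findall(r"\d+", text)],
    }


# Made-up sample texts standing in for the "thousand texts".
texts = [
    "The forecast promises rain and a temperature of 12 degrees.",
    "Spartak scored 2 goals in the last match of the league.",
    "The average apartment price rose by 3 percent.",
]

# The table: one structured row per text.
table = [to_row(i, text) for i, text in enumerate(texts)]

# The piles: document ids grouped by assigned topic.
piles = defaultdict(list)
for row in table:
    piles[row["topic"]].append(row["id"])

for row in table:
    print(row)
```

Once every text is a row with the same fields, ordinary tools for tabular data (sorting, filtering, counting) apply, which is the point of structuring the information in the first place.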

In Russia, Yandex is actively engaged in developing unstructured information search, and Kaspersky Lab hires research groups that study machine learning. Is anyone on the market trying to come up with something new in computational linguistics?

**Books on Computational Linguistics:**

Daniel Jurafsky, Speech and Language Processing

Christopher Manning, Prabhakar Raghavan, Heinrich Schütze, Introduction to Information Retrieval

Yakov Testelets, Introduction to General Syntax

Most linguistic developments are the property of large companies, and almost nothing can be found in the public domain. This hinders the development of the industry: we have no free linguistic market and no off-the-shelf solutions.

Moreover, there is a lack of complete information resources. There is a project called the National Corpus of the Russian Language. It is one of the best national corpora in the world; it is developing rapidly and opens up incredible opportunities for scientific and applied research. The difference is about the same as in biology before and after DNA research.

But many resources do not exist for Russian. For instance, there is no analogue of a wonderful English-language resource such as FrameNet, a conceptual network in which all possible connections of a particular word with other words are formally represented. Take the word "fly": who can fly, where to, which preposition the word is used with, which words it combines with, and so on. This resource helps connect language with real life, that is, to trace how a particular word behaves at the level of morphology and syntax. It is very useful.

Avicomp is currently developing a plug-in that searches for related articles. If you are interested in some article, you can quickly see the history of the story: when the topic arose, what was written about it, and when interest in the problem peaked. For example, starting from an article on events in Syria, the plug-in will let you see very quickly how events there unfolded over the past year.

How will the learning process in the master's program be structured?

Education at HSE is organized into separate modules, just like in Western universities. Students will be divided into small teams, mini-startups - that is, at the end we should get several finished projects. We want to get real products, which we will then open to people and leave in the public domain.

In addition to the direct supervisors of students' projects, we want to find curators from among their potential employers, from Yandex, for example, who will join in and give the students advice.

I hope that people from various fields will come to the master's program: programmers, linguists, sociologists, marketers. We will have several bridging courses in linguistics, mathematics and programming. Then there will be two serious courses in linguistics, tied to the most current linguistic theories, because we want our graduates to be able to read and understand modern linguistic articles. The same goes for mathematics: we will have a course called "Mathematical Foundations of Computational Linguistics", which will present the areas of mathematics on which modern computational linguistics is based.

To enroll in the master's program, you need to pass an entrance examination in language and go through a portfolio competition.

In addition to the main courses, there will be a line of elective subjects. We have planned several tracks. Two of them are focused on deeper study of individual topics, such as machine translation and corpus linguistics. One, by contrast, covers adjacent fields such as social networks, machine learning, or Digital Humanities, a course that we hope will be taught in English.

The cultural and educational center "Arkhe" invites you to the course of lectures by Alexander Chedovich Pipersky "Computational Linguistics".

Topic of the first lecture: "The main tasks of computational linguistics and approaches to their solution".

Machine translation, spell checking, text classification, speech recognition and much more: all of these are tasks of computational linguistics. They can be approached in different ways: either by trying to imitate how a person works with language, or by hoping that big data will handle everything. But natural language is not easy to process automatically, and there are many difficulties along the way. The problems include homonymy (when the same word denotes different things), synonymy (when, conversely, the same thing is denoted by different words) and other properties of human languages that we do not even notice in everyday life.
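A minimal sketch of why homonymy and synonymy get in the way of simple string matching. The tiny sense inventory, the sense labels and the function names below are assumptions made up for illustration; they are not taken from the lecture or from any real lexical database.

```python
# Toy sense inventory: each word maps to a set of sense labels.
# Words and labels are hypothetical and chosen only for illustration.
WORD_TO_SENSES = {
    "bank": {"financial_institution", "river_edge"},  # one word, two things
    "bat": {"flying_mammal", "sports_club"},          # one word, two things
    "sofa": {"upholstered_seat"},
    "couch": {"upholstered_seat"},                    # same thing as "sofa"
}


def senses(word):
    """Return the set of senses recorded for a word (empty if unknown)."""
    return WORD_TO_SENSES.get(word.lower(), set())


def is_homonym(word):
    """Homonymy: the same word denotes more than one thing."""
    return len(senses(word)) > 1


def are_synonyms(first, second):
    """Synonymy: two different words share at least one sense."""
    return first.lower() != second.lower() and bool(senses(first) & senses(second))


def expand_query(word):
    """All known words sharing a sense with `word`, for search by meaning."""
    target = senses(word)
    return sorted(w for w, s in WORD_TO_SENSES.items() if s & target)


print(is_homonym("bank"))
print(are_synonyms("sofa", "couch"))
print(expand_query("sofa"))
```

A search that only compares strings would miss "couch" when asked for "sofa", and would mix river banks with financial ones. Working at the level of senses addresses both, at the cost of having to build and maintain the sense inventory.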

About the lecturer:
Alexander Chedovich Pipersky, Candidate of Sciences in Philology, Associate Professor at the Institute of Linguistics of the Russian State Humanitarian University, Research Fellow at the School of Philology at the National Research University Higher School of Economics, author of the book Designing Languages (Alpina Non-Fiction, 2017).

About the course of lectures "Computational Linguistics":

Computational linguistics is one of the most dynamically developing areas at the intersection of theory and practice. We come across the achievements of computational linguistics every day: machine translation, Internet search, voice assistants, and much more. Behind each such product is serious work by linguists and programmers. During the course, we will talk about the history of computational linguistics and its most popular methods, and see how they help solve important practical tasks, for example, checking spelling or classifying news by topic.

"The opening of the department at MIPT allows us to help not only 'our own' students. Our goal is to offer the best Computer Science teaching in Russia at FIVT."
Svetlana Luzgina, corporate communications service.

Head of Department: Vladimir Pavlovich Selegey, director of linguistic research at ABBYY

The Department of Computational Linguistics of the FIVT was founded in 2011 by the Russian company ABBYY, one of the leading developers of software in the field of artificial intelligence, in particular document recognition and natural language processing. The department trains specialists who can work effectively on the development of innovative computer language technologies, in particular the ABBYY Compreno technologies for syntactic and semantic text analysis.

In the last decade, computational linguistics has been developing actively all over the world. This is due to the growing influence of the Internet and the advent of a large number of new technical devices with natural language interfaces. Technologies such as multilingual information retrieval, machine translation, knowledge extraction and speech recognition are developing especially rapidly. In Russia, computational linguistics has so far received insufficient attention in the education system. As a result, the Russian language is underrepresented in international research in computational linguistics.

The Computational Linguistics specialization at MIPT builds on the deep technical education that Phystech provides. Classes at the base department are held at the ABBYY office, where company employees teach courses on automatic language processing, general and computer lexicography, and corpus linguistics, as well as the core Computer Science disciplines in the field of software development.

One of the tasks of the department is the active involvement of students in scientific life. It is important not only to be aware of current global trends in computational linguistics, but also to be part of the global process. The students of the department take an active part in the development of ABBYY Compreno technology and in a joint research project with the Russian State Humanitarian University to create the General Internet Corpus of the Russian Language (GIKRYA) based on the resources of the Russian-language Internet.

Admission to the department is competitive, both for undergraduates and for the first year of the master's program. Bachelor's graduates of all faculties of the Moscow Institute of Physics and Technology, as well as of other higher educational institutions, may apply. Enrollment is based on the results of solving logical and algorithmic problems and an interview with the leadership of the department.

If you want to interview for the department or ask a question, write to [email protected]. See you at ABBYY!

Computational linguistics is one of the most dynamically developing areas at the intersection of theory and practice. We come across the achievements of computational linguistics every day: machine translation, Internet search, voice assistants, and much more. Behind each such product is serious work by linguists and programmers. During the course, we will talk about the history of computational linguistics and its most popular methods, and see how they help solve important practical problems, such as checking spelling or classifying news by topic.

Sample course plan:

1. Introduction: main tasks of computational linguistics and approaches to their solution.
2. Processing strings with regular expressions.
3. Linguistic corpora and their quantitative analysis.
4. Morphological and syntactic markup. Language models.
5. Spell checking.
6. Machine translation.
7. Classification of texts.
8. Natural language generation. General principles for evaluating the quality of computational linguistics applications.
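Several items in the plan (regular expressions, quantitative corpus analysis, spell checking) can be combined in one small sketch. This is not the lecturer's material: the toy corpus, the function names and the choice of Levenshtein distance with a frequency tie-break are assumptions made for illustration, and practical spell checkers use far larger dictionaries and richer error models.

```python
import re
from collections import Counter

# A made-up toy "corpus"; a real one would contain millions of words.
CORPUS = (
    "the cat sat on the mat . the dog sat on the log . "
    "a cat and a dog met on the mat"
)


def tokenize(text):
    """Split text into lowercase alphabetic tokens with a regular expression."""
    return re.findall(r"[a-z]+", text.lower())


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


# Quantitative analysis of the corpus: how often each word occurs.
FREQUENCIES = Counter(tokenize(CORPUS))


def correct(word, max_distance=2):
    """Return the closest known word, preferring more frequent candidates."""
    word = word.lower()
    if word in FREQUENCIES:
        return word
    candidates = [
        (edit_distance(word, known), -count, known)
        for known, count in FREQUENCIES.items()
    ]
    distance, _, best = min(candidates)
    # Leave the word unchanged when nothing in the dictionary is close enough.
    return best if distance <= max_distance else word


print(FREQUENCIES.most_common(3))
print(correct("cet"))
print(correct("matt"))
```

In this toy corpus "cet" is one edit away from both "cat" and "met"; the frequency tie-break picks "cat" because it occurs more often. That is a very simple stand-in for the language models mentioned in item 4 of the plan.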

Lecturer: Pipersky Alexander Chedovich, Candidate of Sciences in Philology, Associate Professor at the Institute of Linguistics of the Russian State Humanitarian University, Research Fellow at the School of Philology, National Research University Higher School of Economics.

The cost of one lecture is 500 rubles. Discounts: students (50%), schoolchildren (70%)
Subscription for the course (8 lectures): 3500 rubles (students: 2000 rubles; schoolchildren: 1200 rubles).

At the end of the course, a certificate is issued (only for students attending subscription classes).

We will also organize a live broadcast of each lecture.
The cost of the broadcast of one lecture is 200 rubles.
You can pay via TimePad:

You can ask questions or arrange to pay for the broadcast in another way by writing to: [email protected]

Lectures will be held at the following address: Moscow, metro station Sportivnaya, Malaya Pirogovskaya st., 29/7 (IFTIS MPGU)

You can get more detailed information on the course by phone: 8-495-088-92-81
You can register for the course by mail.