Introduction
The inscriptions of the Indus valley civilization (2500-1500 B.C.) represent the earliest forms of writing in India and Pakistan. Even though we know something about the material culture of the Indus people, we do not know what language they used for communication. Claims of successful decipherment have been made, even though no bilingual inscriptions have been found against which such claims could be verified.1 One school of thought proposes that the language of the Indus people was an early form of Indo-Aryan. S. R. Rao2 has followed up this hypothesis and has produced readings of the inscriptions. Another school of thought treats the language as a form of ancient Dravidian, and Iravatham Mahadevan,3 who follows this hypothesis, has produced a valuable concordance entitled The Indus Script.
Our work, carried out during the last three years, does not depend upon either the hypothesis of the Indo-Aryan nature of the language or the hypothesis of its proto-Dravidian nature. Our interest is in making use of the tables provided by Mahadevan in The Indus Script for drawing general conclusions about the inscriptions. We used computer methods to classify the signs of the Indus script into classes or clusters based on their positional characteristics.4 We also studied the frequently occurring inscriptions5 and classified them in relation to the metropolitan centres in which they were found. In this paper we look at the problem of segmenting inscriptions on the basis of the positional characteristics of the signs tabulated in Mahadevan's Concordance.
We wish to take up for our study unusually long inscriptions which have a minimum length of ten signs in the Concordance. We examine whether each one of these long inscriptions forms a single text or is made up of two or three shorter texts. One may use different criteria for segmenting a given inscription but we make use of only one criterion in this paper and that criterion is based on the positional characteristics of the signs occurring in that inscription. Each sign may occur at the initial, medial or the final positions of inscriptions or may occur alone. Using the Concordance it is possible to calculate the percentages of occurrence of a sign at different positions. The corresponding proportions may be taken to be the best available estimates of the probabilities of occurrence of the different signs in the different positions. Using these probabilities it is possible to decide whether a long inscription could be segmented or not, and if so, how. The mathematical details of this optimization method are given in the appendix.
First we take up single line inscriptions and work out the two most likely segmentations. In a majority of cases the original text remains the most likely string and the segmented version is only the second best. We also take up for study inscriptions appearing in more than one line, on the same side or on different sides of an object. In the Concordance the unit of textual analysis is a line of a text. Mahadevan has pointed out in his introduction to the Concordance that there is no way of knowing beforehand whether different lines of an inscription have continuity of sequence or whether they should be regarded as separate texts. We first assume that the different lines form a single sequence and then test whether the combined text can be segmented using the positional probabilities of the individual signs. In many cases the combined text gets segmented back into the original lines with which we started, thereby showing that each line can be treated as a separate text. In some cases, however, new segments are obtained. We also checked whether some lines can be read in the boustrophedon fashion even though the individual signs may have a normal orientation, but we did not find any new convincing cases. Lines which contain illegible signs, or signs of uncertain identity, are not used in this study. Each string of signs is read from right to left as given in the Concordance.
Segmentation of Single Line Texts
In Table 1 we present seven examples in which the best reading is obtained by segmenting the strings. The first text (No. 2436) is a string of 11 signs, some of which are repeated. The best reading is obtained by segmenting after the arrow sign, which occurs in the middle. The next best reading is obtained by keeping the line unsegmented as a single text. In the second text (No. 6125) the best reading is obtained by segmenting after the jar sign, and the next best reading is obtained by retaining the original inscription as a single text without segmentation. The third text (No. 9011) contains 10 signs. The best reading is obtained by reading the last four signs as a separate text, and the next best reading is to keep the original text without change. It is interesting to note that there exists a short inscription (No. 3251) of three signs which is similar to the short segment we obtained from the third text (No. 9011). The short segment has four signs, including the jar sign in the terminal position and the two-upraised-hands sign in the pre-terminal position. These two signs form a pair which most often occurs in the terminal position. The jar sign is also a predominantly terminal sign. The short inscription (No. 3251) differs from the short segment only in lacking the pre-terminal sign of two-upraised-hands. The occurrence of this short inscription (No. 3251) further strengthens the case for segmenting the third inscription (No. 9011).
In Table 2 we present a set of twenty-seven single line inscriptions whose best readings leave each text unsegmented. Each inscription gets segmented only in its second best reading. From the second inscription (No. 1087) we obtain two shorter texts after segmentation. The shorter of the two new texts contains four signs and ends with the jar sign. This segment of four signs also occurs three times as an independent text, which strengthens the case for segmenting the second inscription (No. 1087). The eighth inscription (No. 2446) gets segmented into two, and the shorter segment contains three signs. This short segment also occurs as an independent text (No. 2214).
The sign containing seven lines is often interpreted to denote the numeral 'seven'. The pair of signs which includes the seven lines is conjectured to denote the phrase "The Seven High Places".6 The Rig Vedic phrase sapta sindhavas and the Proto-Iranian hapta hindu could refer to the "Seven High Places", and these phrases could be compared with the Sumerian phrase bad imin, which means "The Seven Enclosed Places". Sumerian trading documents are said to refer to the region Bad Imin, and it is identified with some region away from Sumeria but possibly close to or identical with the Indus cities. Scholars who are on the lookout for geographical names in the Indus inscriptions search for sign pairs containing the numeral sign seven, and more than one pair has been identified as representing "The Seven High Places".
Multi-Line Inscriptions
In this section we take up for study multi-line inscriptions whose total length is ten or more signs as given in the Concordance. In Table 3 we have four inscriptions of two lines each. In each case the best reading is obtained by combining the two lines to get a single long inscription. The second best reading is also given. In other words, taking an inscription of two lines, we get the best reading by combining the two lines into one, and the second best reading by segmenting the combined text into two strings, each of which is different from the first and the second lines of the original inscription. The third inscription is made up of two lines of identical segments; the best reading is to combine them into a single string and the second best reading is to leave them as they are. The difference in probability between the first and the second readings is very small. The inscriptions are found on two sides of an object from Harappa. It is classified as a sealing, which has positive impressions in relief made by seals or moulds. Even though the texts are the same on each side, the figures accompanying them are quite different. Mahadevan describes the anthropomorphic forms on the two sides as "Man armed with a sickle-shaped weapon facing a seated woman with dishevelled hair and upraised arm" on one side and "Nude female figure upside down with thighs drawn apart and a crab (?) issuing from her womb; two tigers standing face to face rearing on their hind legs" on the other side.
In Table 4 we present ten two-line inscriptions for each of which the best reading is obtained by retaining the original form of two lines as two texts. The next best reading is obtained in most cases by combining the two lines into a single line. In one case (No. 1227) the second best reading is obtained by segmenting the inscription into three groups, but this reading does not seem to be reasonable. For instance, the man sign is combined with the comb sign to form a single segment, whereas this pair would more naturally occur with the jar sign at the end of a text. However, the middle segment of three signs, with the jar sign as the terminal sign, occurs as an independent text (No. 7027). In one case (No. 1012) a natural way of segmenting would be to divide the first line into two halves after the jar sign and to retain the second line as a separate segment, but this segmentation is not one of the two best readings obtained by us. In the eighth inscription (No. 7249), which is from Lothal, the first line also occurs as an independent text from the same site.
In Table 5 we present four multi-line inscriptions in which the first line itself gets segmented. The first and the third inscriptions have identical first lines, and their second lines have a common terminal sign. The best reading for the first inscription (No. 1321) is obtained by segmenting it into three texts, and the second best reading by segmenting it into four texts. The second inscription has three or four segments. The points at which the inscription gets segmented do not coincide with the end of any line of the original inscription.
In Table 6 we present an interesting inscription in five lines. The best reading is obtained by retaining the first line of the text as the first segment and combining the remaining lines to form the second segment. The calculation of probabilities shows that this reading is twice as likely as the original text. The next best reading divides the text into three segments, so that the first and second lines make up the first and second segments respectively and the other three lines combine into the third segment.
Conclusion
The tables of positional frequencies of each sign given in the Concordance take each line as the unit of analysis, although a line may not be an actual text of the Harappans and could contain more than one text. If reliable segmentations can be made, then the positional frequencies could be used for estimating the positional probabilities of the individual signs to a better level of accuracy. If the entire corpus is analysed and segmented using some objective criteria, then one could arrive at these better estimates of probabilities. These probabilities would be useful in the segmentation of texts and also for a cluster analysis of signs.
In this paper we have demonstrated a new method of segmentation using an optimization technique and we hope that the segmentations obtained by our method would be of some assistance to scholars who wish to take up the work of identifying new segments in the original inscriptions.
Calculations were made using small electronic calculators but if more texts are to be segmented the aid of a computer would become necessary.
Acknowledgement
We wish to thank our colleague Mr Dorai Pandian and a number of student volunteers who assisted us in computational work.
Notes
1 Arlene R. K. Zide, "A brief survey of work to date on the Indus Valley Script", Journal of Tamil Studies, Vol. 2, No. 1, May 1970, pp. 1-9.
2 S. R. Rao, "Deciphering the Indus Valley Script", Indian and Foreign Review, Vol. 17, No. 3, Nov 1979, pp. 15-30.
3 Iravatham Mahadevan, The Indus Script: Texts, Concordance and Tables, Memoirs of the Archaeological Survey of India, New Delhi, 1977.
4 Gift Siromoney and Abdul Huq, "Cluster Analysis of Indus signs: a computer approach", Proceedings of the Fifth International Tamil Conference at Madurai, 1981, pp. 2-15 to 2-23.
5 Gift Siromoney, "Classification of frequently occurring inscriptions of Indus civilization in relation to metropolitan cities", STAT-45/80 (mimeo); Paper presented at the Seventh Annual Congress of the Epigraphical Society of India held at Calcutta in January 1981.
6 John Mitchiner, Studies in the Indus Valley Inscriptions, Oxford and IBH Publishing Co., New Delhi, 1978.
7 K. V. Mital, Optimization methods, Wiley Eastern Ltd., New Delhi, 1976.
Appendix
Let DCBA be a given sequence of Harappan signs. The tables in The Indus Script show how frequently each one of these signs, "A", "B", "C" and "D", occurs at the initial, medial and terminal positions of the texts, as well as how frequently each one of them occurs alone. For instance, in 20 out of 100 cases "A" may occur alone, in 30 out of 100 cases it may occur in the medial position and in 50 out of 100 cases it may occur at the initial position. In other words, in 20% of the cases "A" occurs alone, in 30% of the cases it occurs as a medial sign and in 50% of the cases it occurs in the initial position. Given that the sign "A" has occurred in an inscription, we assume that the probability that it is a medial sign is 0.30 and the probability that it is an initial sign is 0.50.
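The conversion from positional counts to estimated probabilities can be sketched as follows. This is only an illustration: the function name is hypothetical, and the counts are the invented figures of the worked example for sign "A" (with a terminal count of zero implied by the other three), not values from the Concordance.

```python
def positional_probabilities(counts):
    """Turn the raw positional counts of one sign into proportions.

    `counts` maps each position ("alone", "initial", "medial",
    "terminal") to the number of times the sign occurs there.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sign has no recorded occurrences")
    return {position: n / total for position, n in counts.items()}


# Hypothetical counts for sign "A", following the worked example above.
counts_a = {"alone": 20, "initial": 50, "medial": 30, "terminal": 0}
probs_a = positional_probabilities(counts_a)
print(probs_a)
```

The four proportions for a sign always add up to one, so they can be read as the chances of finding that sign in each position.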
Each one of the signs would then have four possible positions, and the corresponding probabilities could be estimated from the tables. We wish to find out whether the given inscription DCBA should be treated as D+CBA or D+C+BA or DC+BA or DCB+A or DC+B+A or D+C+B+A and so on. What are the chances that the inscription should be read as D+CBA, or in other words segmented between D and C? We first find the probability that the sign D occurs alone and then multiply it by the appropriate probabilities for C, B and A. We assume that C would be a terminal sign, B a medial sign and A an initial sign. The product of these probabilities is calculated and compared with the products obtained for the other combinations. The best reading is obtained by choosing the segmentation which has the maximum value of the product of the probabilities. A string of length 4 can be segmented in 8 ways (counting the unsegmented reading) and a string of length n in $2^{n-1}$ ways. For a string of length 10 the number of ways of segmenting it is $2^9$, or 512. The calculations can be made by adapting a technique known as Dynamic Programming, perfected in an area known as Operations Research.7
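The search described above can be sketched in Python. This is a minimal illustration under stated assumptions and not the procedure actually followed for the tables: the probability table `PROBS` is invented for the example and is not taken from the Concordance, and all function names are hypothetical. Strings are written as in the text, so within a segment the rightmost sign is the initial sign and the leftmost is the terminal sign.

```python
from itertools import product
from math import prod

# Invented positional probabilities, for illustration only.
PROBS = {
    "A": {"alone": 0.20, "initial": 0.50, "medial": 0.30, "terminal": 0.00},
    "B": {"alone": 0.05, "initial": 0.15, "medial": 0.60, "terminal": 0.20},
    "C": {"alone": 0.05, "initial": 0.10, "medial": 0.35, "terminal": 0.50},
    "D": {"alone": 0.10, "initial": 0.05, "medial": 0.15, "terminal": 0.70},
}


def segment_score(segment, probs):
    """Product of positional probabilities for a single segment."""
    if len(segment) == 1:
        return probs[segment]["alone"]
    # Leftmost sign is terminal, rightmost is initial (right-to-left reading).
    score = probs[segment[0]]["terminal"] * probs[segment[-1]]["initial"]
    for sign in segment[1:-1]:
        score *= probs[sign]["medial"]
    return score


def segmentation_score(segments, probs):
    """Product of the scores of all segments of one reading."""
    return prod(segment_score(s, probs) for s in segments)


def all_segmentations(text):
    """Yield every way of cutting `text` into contiguous segments."""
    for cuts in product([False, True], repeat=len(text) - 1):
        segments, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                segments.append(text[start:i])
                start = i
        segments.append(text[start:])
        yield segments


def best_segmentation(text, probs):
    """Dynamic programme: best[i] is the best score for text[:i]."""
    best = [1.0] + [0.0] * len(text)
    back = [0] * (len(text) + 1)
    for end in range(1, len(text) + 1):
        for start in range(end):
            score = best[start] * segment_score(text[start:end], probs)
            if score > best[end]:
                best[end], back[end] = score, start
    # Walk the back-pointers to recover the segments.
    segments, end = [], len(text)
    while end > 0:
        segments.append(text[back[end]:end])
        end = back[end]
    return best[len(text)], segments[::-1]


print(len(list(all_segmentations("DCBA"))))  # 8 readings for 4 signs
print(best_segmentation("DCBA", PROBS))
```

Enumerating every reading is feasible for short strings, while the dynamic programme examines each possible segment only once, which matters as inscriptions grow longer. A second-best reading, as reported in the tables, could be obtained by sorting the enumerated readings by `segmentation_score`.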
In order to compare strings of length n, the nth root of the product of n probabilities is taken.
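Written out, this normalisation amounts to taking a geometric mean. The notation below is ours and is only a sketch: $p_i$ stands for the positional probability assigned to the $i$-th sign under a given reading, and $G$ for the resulting score.

$$G = \left( \prod_{i=1}^{n} p_i \right)^{1/n}$$

For example, with purely illustrative figures, a reading of 4 signs whose product is 0.0016 and a reading of 2 signs whose product is 0.04 both give $G = 0.2$, so the longer string is not penalised merely for containing more factors.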