Sanskrit & CNNs: Understanding Typography in AI with Midjourney’s ‘/describe’
As a foreign student in Nepal, I have a ‘real job’ outside my world of animation: my PhD in the rich world of Sanskrit, an ancient classical language of South Asia. It’s not a job that pays the bills, but it does give me the visa that lets me live in this beautiful country, and, self-indulgent as PhDs are, it lets me study at times with no aim in sight, just reading and seeing where a topic leads me. But lately, the worlds of my passion and my profession have collided, because my obsession with AI has intertwined them.
In case you didn’t know, AI and Sanskrit have a longer shared history than you might expect. The intersection of Artificial Intelligence (AI) and Sanskrit can be traced back to 1985, when NASA researcher Rick Briggs published a paper highlighting the potential of Sanskrit’s highly structured grammar for knowledge representation in AI systems. Since then, researchers have explored the computational aspects of Sanskrit for natural language processing tasks such as parsing and machine translation. With advances in AI and machine learning, there has been growing interest in analysing and understanding Sanskrit literature and in preserving and digitising its vast corpus of texts. Although research in this area remains limited compared to more widely spoken languages, Sanskrit’s unique properties continue to offer intriguing opportunities for AI and natural language processing researchers.
To explain, parsing and machine translation are two essential tasks in natural language processing (NLP), the field concerned with enabling computers to understand, interpret, and generate human languages.
Here are two definitions that I like:
Parsing is the process of analysing a sentence to determine its grammatical structure, breaking it into parts and identifying the relationships between them. It’s essential for various natural language processing tasks, helping computers understand sentence structure.
Machine translation is the task of automatically translating text or speech between languages using computational algorithms. The goal is to enable effective communication without human translators. Current state-of-the-art models have achieved impressive results, but challenges remain in handling language-specific nuances.
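To make the parsing definition a little more concrete, here is a deliberately tiny Python sketch that leans on the feature Briggs highlighted: Sanskrit marks grammatical roles with word endings rather than word order. The function name `label_roles`, the three endings it checks, and the example sentence rāmaḥ vanam gacchati (“Rāma goes to the forest”) are illustrative assumptions, not a real Sanskrit parser. Real Sanskrit morphology has many more stems, cases and sandhi changes, and the ending -am is ambiguous (it also marks neuter nominatives), which this toy ignores.

```python
# A toy, suffix-based role labeller for a tiny fragment of Sanskrit (IAST).
# Assumptions: words are given in pausa form (no sandhi), nouns are a-stem
# singulars, and verbs are present-tense third person singular.
ENDINGS = [
    ("ti", "verb"),     # e.g. gacchati, "goes"
    ("aḥ", "subject"),  # nominative singular, e.g. rāmaḥ
    ("am", "object"),   # accusative singular, here the destination, e.g. vanam
]


def label_roles(sentence: str) -> dict[str, str]:
    """Map each grammatical role to the word whose ending signals it."""
    roles: dict[str, str] = {}
    for word in sentence.split():
        for ending, role in ENDINGS:
            if word.endswith(ending):
                roles[role] = word
                break  # first matching ending wins
    return roles


if __name__ == "__main__":
    # Word order changes, but the labelled roles stay the same.
    print(label_roles("rāmaḥ vanam gacchati"))
    print(label_roles("vanam gacchati rāmaḥ"))
```

Because the roles come from the endings, shuffling the words leaves the result unchanged, which is exactly the property that made Sanskrit attractive for knowledge representation.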
Now, I dabble in NLP because it’s just another passion, but I’ve spent the past few years researching the Brahmi script and its use for Sanskrit. You can think of the Brahmi script as a form of typography, or more precisely a writing system, used to visually represent the sounds and structure of a language: in this case Sanskrit, Prakrit, Pali and other ancient Indian languages. So, while the Brahmi script does not predate the Sanskrit language itself, it does predate written Sanskrit literature. The Brahmi script was one of the earliest writing systems used to record Sanskrit, which had previously been preserved and transmitted orally. The Brahmi script is fascinating and, at its core, very elemental, which is what I like about it. It served as the foundation for several later scripts and writing systems in South and Southeast Asia, and it is characterised by a simplicity and systematic structure that made it adaptable and easy to learn.
As the Brahmi script evolved, it gave rise to numerous derivative scripts such as Gupta, Nagari (the precursor to Devanagari), Pallava, and many others. These scripts, in turn, became the foundation for writing systems used for various languages in India and Southeast Asia, such as Devanagari (used for Hindi, Sanskrit, and other languages), Bengali, Gurmukhi (used for Punjabi), and the scripts of South Indian languages like Telugu, Kannada, and Malayalam, among others.
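One modern trace of this shared ancestry is visible in Unicode itself. Many of the Brahmi-derived scripts are encoded in broadly parallel blocks modelled on the older Indian ISCII standard, so the same letter tends to sit at the same offset within each block. The short Python sketch below uses only the standard `unicodedata` module to look up the letter “AA” (the vowel ā, आ in Devanagari) in six of the scripts named above; the block start addresses come from the Unicode code charts, and the function name `letter_aa_positions` is just an illustrative choice.

```python
import unicodedata

# Start of each script's Unicode block, from the Unicode code charts.
BLOCK_START = {
    "DEVANAGARI": 0x0900,
    "BENGALI": 0x0980,
    "GURMUKHI": 0x0A00,
    "TELUGU": 0x0C00,
    "KANNADA": 0x0C80,
    "MALAYALAM": 0x0D00,
}


def letter_aa_positions() -> dict[str, tuple[str, int]]:
    """Return {script: (character, offset within its block)} for LETTER AA."""
    positions = {}
    for script, start in BLOCK_START.items():
        # lookup() raises KeyError if no character has this exact name.
        ch = unicodedata.lookup(f"{script} LETTER AA")
        positions[script] = (ch, ord(ch) - start)
    return positions


if __name__ == "__main__":
    for script, (ch, offset) in letter_aa_positions().items():
        print(f"{script:<10} {ch}  U+{ord(ch):04X}  offset 0x{offset:02X}")
```

Every script in this list reports the same offset, 0x06, a small reminder that these writing systems were designed, and later encoded, as one family.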
In summary, the Brahmi script can be considered elemental due to its foundational role in developing numerous writing systems across South and Southeast Asia and its systematic structure that efficiently represented the sounds of the languages it was used to write.
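That systematic structure is still visible in Devanagari, one of Brahmi’s descendants. It is an abugida: each consonant sign carries an inherent “a”, which a following vowel sign replaces and a virama (्) cancels. The minimal Python sketch below, with the hypothetical function `to_iast`, turns Devanagari into IAST romanisation using just those rules. It is an illustration under stated assumptions: it covers only the basic letters, anusvara and visarga, and ignores nukta forms, avagraha, numerals and Vedic accents.

```python
# A minimal Devanagari -> IAST transliterator (illustrative subset only).
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ṅ",
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "ñ",
    "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "v",
    "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
          "ऋ": "ṛ", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}
VOWEL_SIGNS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū",
               "ृ": "ṛ", "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
VIRAMA = "्"
MARKS = {"ं": "ṃ", "ः": "ḥ"}  # anusvara, visarga


def to_iast(text: str) -> str:
    out: list[str] = []
    pending_a = False  # True while the last consonant still has its inherent "a"

    def flush() -> None:
        nonlocal pending_a
        if pending_a:
            out.append("a")
            pending_a = False

    for ch in text:
        if ch in CONSONANTS:
            flush()  # the previous consonant keeps its inherent "a"
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])  # the vowel sign replaces the inherent "a"
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False  # virama cancels the vowel, forming a conjunct
        else:
            flush()
            out.append(VOWELS.get(ch) or MARKS.get(ch) or ch)
    flush()
    return "".join(out)


if __name__ == "__main__":
    for word in ["आ", "राम", "संस्कृत"]:
        print(word, "->", to_iast(word))
```

With these rules, आ becomes ā, राम becomes rāma and संस्कृत becomes saṃskṛta: a handful of regular rules covers a great deal of the script, which is the kind of structure an image-only model never sees.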
But I could talk about Brahmi all day. What I want to explore here is something I have come to realise through my studies and my free time with Midjourney: the complex structure of Sanskrit may hold the key to unlocking AI’s potential to understand and generate typography more effectively.
With only an enthusiast’s general knowledge of artificial intelligence (AI), particularly text-to-image generation for my animation work, I have experienced firsthand the limitations of AI in typography. The main reason, as I understand it, is that conventional AI approaches, such as convolutional neural networks (CNNs) and generative adversarial networks (GANs), predominantly treat typography as a series of images, focusing on visual patterns and structures. This image-based approach fails to capture the linguistic, structural, and contextual nuances of complex scripts like Sanskrit.
Explain it like I’m 5?
1. Convolutional Neural Networks (CNNs):
Imagine you have a big book of pictures. A CNN is like a smart robot that looks at all the pictures and learns to recognise different things in them, like animals, cars, or trees. It can then tell you what it sees in new pictures you show it.
2. Generative Adversarial Networks (GANs):
Now, think of a GAN as two robots playing a game together. One robot (the generator) tries to draw pictures that look real, while the other (the discriminator) tries to guess whether each picture is real or fake. They both get better as they play: the generator learns to draw more convincing pictures, and the discriminator gets better at telling real and fake pictures apart.
In summary, CNNs are like smart robots that can look at pictures and tell you what’s in them, while GANs are two robots playing a game to create and recognise real-looking pictures.
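For readers who want the robots’ game in one line, the original GAN formulation (Goodfellow et al., 2014) writes it as a minimax objective. The discriminator D tries to maximise its success at telling real samples x from generated samples G(z), while the generator G tries to minimise it; z is random noise fed to the generator, and p_data and p_z are the data and noise distributions.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

Nothing in this objective refers to letters, words or sounds: the discriminator only judges whether an image looks like the training images.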
Why do they fall short? Because they still treat the script (the typography) as just another part of the image, a pattern of pixels alongside everything else in the picture, rather than as language with its own rules.
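To see what “treating script as an image” means in practice, here is a minimal sketch of the core operation inside a CNN: sliding a small kernel over a grid of pixels. The 5×5 bitmap (a single vertical stroke), the hand-picked vertical-edge kernel and the function name `conv2d` are illustrative assumptions; real CNNs learn many kernels and stack them with non-linearities. The point is that the filter responds only to the stroke’s shape, so it produces the same numbers whether that stroke is a Latin l, a capital I, or the vertical bar on the right of आ.

```python
# Minimal 2D "convolution" as used in CNNs (technically cross-correlation,
# no padding, stride 1), written in plain Python for clarity.
Grid = list[list[int]]


def conv2d(image: Grid, kernel: Grid) -> Grid:
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + r][j + c] * kernel[r][c]
                for r in range(kh) for c in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 5x5 bitmap of one vertical stroke (1 = ink, 0 = paper).
STROKE = [[0, 0, 1, 0, 0] for _ in range(5)]

# A hand-picked vertical-edge detector (a trained CNN would learn its own).
VERTICAL_EDGE = [[-1, 0, 1] for _ in range(3)]

if __name__ == "__main__":
    for row in conv2d(STROKE, VERTICAL_EDGE):
        print(row)  # each row is [3, 0, -3]: ink appears, then disappears
```

The output describes edges, not characters; whatever “knowledge” of letters a model has must be built up from responses like these.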
A Test in Midjourney
To play with AI’s limitations with text, and to demonstrate how it recognises and reproduces Latin-script languages better than something as complex as Sanskrit, I used Midjourney’s /describe function on a cutout of the Sanskrit vowel आ (ā, a long “a”) and looked at how it interpreted the image.
I typed /describe, uploaded the image and got the following results; /describe always returns four descriptions.
1. A pink and white paper with the words b, in the style of maroon, Bikash Bhattacharjee, transparent/translucent medium, 35mm film, dye-transfer, unpolished authenticity, dark teal and maroon
2. A blue paper sticker that has the letter t3 printed on it, in the style of dark pink and maroon, art of Burma, found-object-centric, group f/64, booru, dye-transfer, Samikshavad
3. An artwork written on a piece of paper, in the style of dark pink and light maroon, Bengal school of art, 35mm lens, booru, tonist, feminine sticker art, 500–1000 ce
4. The letter t, in the style of dark pink and light maroon, art of Burma, dye-transfer, tagging and marking style, toraji, 35mm film, unpolished authenticity
Interpreting the Results
You can see how it looked at “आ” purely as an image and related it to Latin script: it found the letters “b” and “t” in it, as well as the pair “t3”. Culturally, it related “आ” to the art of Burma, the Bengal school of art, and Samikshavad, the first indigenous art movement in modern India, which started in North India in 1974. It also connected it to artists presumably in its training data: the famous late Indian painter Bikash Bhattacharjee, apparently via Artnet, and a more contemporary digital artist named Toraji, who seems to have come from ArtStation.
I then edited the prompts, replacing the misread letters with “आ”.
Here are my edited versions of prompts two and four, and the resulting images.
2. A blue paper sticker that has the letter आ printed on it, in the style of dark pink and maroon, art of Burma, found-object-centric, group f/64, booru, dye-transfer, Samikshavad
4. The vowel आ, in the style of dark pink and light maroon, art of Burma, dye-transfer, tagging and marking style, toraji, 35mm film, unpolished authenticity
The resulting images showed what looked like nonsensical writing. Or did they?
I wanted to double-check with Google Translate, so I transferred the pictures to my phone and had Google Lens read them to see whether any languages were hidden inside. Hindi and Romanian were the closest matches it could find.
But is there any deeper meaning here? I had to ask ChatGPT.
I want to know if there is any connection between the Sanskrit vowel आ and the Hindi words “vo jaa,” meaning “he go”, and the Romanian words “Ma Va”, meaning “I will”?
Its response:
While there might appear to be some similarity between the Sanskrit vowel आ, the Hindi words “vo jaa,” meaning “he go,” and the Romanian words “Ma Va”, meaning “I will,” these similarities are likely coincidental and do not indicate a direct linguistic connection between the three.
Sanskrit, Hindi, and Romanian belong to different branches of the Indo-European language family. Sanskrit is an ancient language belonging to the Indo-Aryan branch, while Hindi, a modern language, also belongs to the same Indo-Aryan branch. On the other hand, Romanian belongs to the Romance branch of the language family.
The Sanskrit vowel आ (pronounced “aa”) is a phonetic element used in many words across the language. The Hindi words “vo jaa” (वह जा) and the Romanian words “Ma Va” do not have any direct linguistic connection to the Sanskrit vowel आ.
The similarities you notice are likely due to the nature of the Indo-European language family, which shares some common linguistic roots and structures. However, the specific words and phrases you mentioned do not have any direct connection or shared etymology.
So, in summary, if an image recognition software misinterprets the Sanskrit vowel आ as the Hindi words “vo jaa” or the Romanian words “Ma Va,” it’s likely due to the limitations of the software and its training data rather than any direct linguistic connection between these phrases.
The Complex Structure of Sanskrit and its Implications for AI
With its intricate script and unique characteristics, Sanskrit presents many challenges for AI typography. Sanskrit is typically written today in Devanagari, a script with complexities such as conjunct characters, ligatures, contextual shaping, and diacritic marks. Furthermore, its phonetic structure means that each sign represents a sound rather than serving as a purely visual symbol. These aspects demand that AI systems not only recognise and reproduce visual patterns but also understand and incorporate linguistic information to generate accurate and readable typography.
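One way to see the gap between what an image model sees and what the text actually encodes is to list the Unicode code points behind a few Devanagari strings. The sketch below uses only Python’s standard `unicodedata` module; the helper name `code_points` and the three samples are illustrative choices. The conjunct क्ष (kṣa) is stored as three code points but usually drawn as a single ligature, and in कि (ki) the vowel sign is stored after the consonant yet rendered to its left: exactly the kind of contextual shaping that a pixel-only model never observes.

```python
import unicodedata


def code_points(text: str) -> list[tuple[str, str]]:
    """Return (U+XXXX, Unicode character name) for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    samples = {
        "आ": "independent vowel: one code point",
        "क्ष": "conjunct: consonant + virama + consonant, drawn as one ligature",
        "कि": "vowel sign I: stored after KA, drawn to its left",
    }
    for text, note in samples.items():
        print(text, "-", note)
        for cp, name in code_points(text):
            print("   ", cp, name)
```

A text-aware system can read this structure directly, whereas an image model has to infer it, if at all, from shapes on the page.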
If we take the same prompts and use a Latin letter, like A, instead, we might expect success, and when I tried the same experiment, I did in fact get great results.
But that’s just a single letter; it can’t produce whole words to the same effect. I tried the word “Aardvark” in the same two prompts and got these results. This time, Midjourney treated the word as the subject and assumed I wanted to see a picture of an aardvark. Maybe the fault lies in my prompt; I don’t know.
Realising the Limitations of AI in Typography
As you can see, my journey with AI-generated artwork has shown me the limitations of AI when it comes to understanding and generating typography.
Here are some other examples generated from the same image of “आ”, along with the resulting prompts; I edited some and left others unchanged.
Sanskrit as a Catalyst for AI Advancement in Typography
The challenges posed by Sanskrit’s complex structure offer a unique opportunity for AI to refine its understanding of typography. By developing AI systems capable of handling the intricacies of Sanskrit, we can gain valuable insights that can be applied to other scripts and languages. Here are some ways in which Sanskrit can help AI achieve better results in understanding typography:
Addressing script complexity: Developing AI systems that can effectively handle Sanskrit’s complexities, such as conjunct characters and ligatures, will improve their performance for other scripts with similar challenges.
Incorporating linguistic understanding: Learning to account for the phonetic nature of Sanskrit and other languages will enable AI to generate more accurate and readable typography.
Balancing aesthetics and functionality: Creating visually appealing typefaces that adhere to linguistic and typographic rules will enhance AI-generated typography across various languages and scripts.
Respecting cultural and historical aspects: Sanskrit’s rich cultural and historical significance can guide AI systems in maintaining a balance between technical proficiency and cultural sensitivity in typography generation.
Encouraging interdisciplinary collaboration: Working with Sanskrit typography necessitates collaboration between AI researchers, linguists, typographers, and historians, fostering the development of comprehensive and well-rounded AI systems.
In conclusion, my exploration of Sanskrit and AI has convinced me of the real potential of harnessing Sanskrit’s complexity to drive advances in AI-generated typography. By addressing the unique challenges of Sanskrit’s intricate structure, we can improve AI’s ability to generate accurate, functional, and visually appealing typography across many languages and scripts. Collaboration between AI researchers, linguists, typographers, and historians will be vital to building well-rounded AI systems that respect cultural and historical nuances. Ultimately, delving into the rich world of Sanskrit might lead to significant breakthroughs in AI’s capacity to comprehend and create typography, opening new doors for natural language processing and beyond.