Preethi P. is sitting on a stool next to a sewing machine in her one-room home on a quiet street in Agara, a small village surrounded by groundnut fields and rice paddies, three hours southwest of Bangalore. On a typical day, she would spend hours embroidering or mending clothes, earning less than $1 a day on average. But today she is reading a line of text into a phone app in her native Kannada. She pauses briefly, then reads another.
Preethi, who goes by one name as is customary in the area, is one of 70 workers that a startup named Karya hired in Agara and nearby villages to collect text, voice, and image data in India's vernacular languages. She belongs to a vast, largely invisible global workforce, spread across countries such as Kenya, the Philippines, and India, whose job is to gather and label the data that virtual assistants and AI chatbots need to produce relevant responses. Unlike many other data contractors, however, Preethi is paid fairly for her work, at least by local standards.
Preethi earned 4,500 rupees ($54) for three days of work with Karya, more than four times what the 22-year-old high school graduate typically makes in a month as a tailor. The money, she said, would cover an entire month's payment on a loan she had taken out to partly repair her home's crumbling mud walls, which she had painstakingly covered with vibrant saris. “My phone and the internet are all I need,” she added.
Karya was founded in 2021, before ChatGPT's rise, but this year's generative AI boom has only intensified the tech industry's ravenous appetite for data. Nasscom, the trade association for India's IT sector, projects that India alone will have about a million data annotation workers by 2030. Karya sets itself apart from other data vendors by paying its contractors, most of whom are women in rural areas, up to 20 times the current minimum wage. In return, it delivers higher-quality Indian-language data that IT companies are willing to pay a premium for.