How can the big data generated by churches be used to understand their diversity?
I have been interested in applying data science techniques to understanding the varieties of worship in the US for several years. As in my Genetic Tree of Aid post, I created a hierarchical map of the topics found on thousands of church, mosque, and temple websites across the religious spectrum:
The method is called agglomerative clustering. It takes the text from all the web pages on each site, starts with every site as its own cluster, and merges the two most similar sites so they sit next to each other. Then it repeats this again and again, merging the growing clusters of sites, until the groups start to represent broader topics. I’ve annotated the map with the categories of sites as I see them. As I expected, churches of the same denomination cluster together, but not every church or religion clusters nicely. That’s where the real insights lie.
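The clustering code behind the map isn’t shown here, so the following is only a minimal, pure-Python sketch of the bottom-up merging idea. It assumes each site is reduced to a bag of word counts, compared with cosine similarity, and merged using average linkage; the site names and texts are hypothetical toy data. A real run over thousands of sites would more likely use TF-IDF features with scikit-learn’s `AgglomerativeClustering` or SciPy’s `linkage`, since this version recomputes every pair on each pass.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts for the combined text of one site."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def agglomerate(sites):
    """Naive average-linkage agglomerative clustering.

    sites maps a site name to its text. Returns the merges in the order
    they happen, as (cluster_a, cluster_b, similarity); each cluster is a
    tuple of site names. Only practical for a handful of sites.
    """
    vectors = {name: bag_of_words(text) for name, text in sites.items()}
    clusters = [(name,) for name in sites]  # every site starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average similarity over every cross-cluster pair of sites.
                pairs = [cosine(vectors[a], vectors[b])
                         for a in clusters[i] for b in clusters[j]]
                score = sum(pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        # Drop the two merged clusters and add their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical toy "sites"; real input would be scraped page text.
toy_sites = {
    "first-baptist": "baptism sermon scripture salvation bible study",
    "calvary-baptist": "sermon scripture baptism youth bible",
    "st-marys": "mass parish diocese sacrament youth ministry",
    "st-josephs": "mass parish confession rosary",
}

for left, right, sim in agglomerate(toy_sites):
    print(left, "+", right, round(sim, 3))
```

On this toy input the two Baptist sites merge first, then the two Catholic ones, and the last merge joins the two groups. That bottom-up merge order is what a dendrogram is drawn from.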
Going from the rightmost column to the left, the first thing that jumps out is that Baptist churches dominate the searchable Internet. This is a statistically valid conclusion, because I plugged tons of search terms into the Google search API to fetch these websites. I combined the dozen most common denomination names with the 300 largest US cities, scraped all the pages that resulted, and saved those that had any resemblance to a church, mosque, temple, fellowship, or society. Baptist churches are the largest swath of the sites found, and they are also quite distinct in the language they use. Similar churches appear adjacent to each other on the spectrum (which I had to slice into pieces to fit on a screen). So Baptists are adjacent to other message-oriented Christian churches, but also to Baptist hospitals and schools. Catholic churches and dioceses are lumped in with youth ministries, but also with the Calvary Baptist Church. That’s surprising, but it shows that “Baptist” is a broader tent than I had believed.
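The search and filtering steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scraper actually used: the denomination and city lists are shortened hypothetical placeholders, the filter is a simple keyword match on the post’s own list of worship words, and the search API calls are deliberately left out rather than guessed at. At full scale, a dozen denominations times 300 cities gives 12 × 300 = 3,600 queries.

```python
from itertools import product

# Hypothetical, shortened inputs. The post used about a dozen
# denomination names and the 300 largest US cities.
DENOMINATIONS = ["Baptist", "Catholic", "Lutheran", "Unitarian Universalist"]
CITIES = ["New York, NY", "Los Angeles, CA", "Chicago, IL"]

# Words that mark a result as worship-related (the post's own list).
WORSHIP_WORDS = ("church", "mosque", "temple", "fellowship", "society")


def build_queries(denominations, cities):
    """One search string per (denomination, city) pair: the full cross product."""
    return [f"{d} {c}" for d, c in product(denominations, cities)]


def looks_like_worship_site(page_text):
    """Crude keyword filter: keep a page if it mentions any worship word.

    Substring matching is deliberately loose, so schools or hospitals that
    mention a church will also pass.
    """
    text = page_text.lower()
    return any(word in text for word in WORSHIP_WORDS)


queries = build_queries(DENOMINATIONS, CITIES)
print(len(queries))  # 4 denominations x 3 cities = 12
print(queries[0])    # Baptist New York, NY
```

A loose filter like this favors recall over precision, which is consistent with Christian schools and Baptist hospitals ending up on the map.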
Lutherans are also split into churches that focus on good works and activism and those that focus on preaching the gospel. If you based your assumptions on liturgical history, you would place them next to Catholics, but they are quite different in how they practice and how they preach.
Christian schools are a different color and appear distinct on this map, as they should. They shouldn’t be there at all; they appear because of flaws in my Google searching. They are indexed alongside churches and overlap with them enough that Google’s algorithms can’t filter them out either.
The leftmost column is a mystery. Most of the sites are relevant but disorganized. It is possible that the pages on these sites just don’t conform to a pattern the way church sites on the right do. I’ve put little labels to summarize topics around which smaller clumps of churches appear: God, Hope, Saint, Faith, and a lot more not shown. I omitted most of this column from the figure because it doesn’t reveal much, but you can fetch it from the repo.
Finally, you find the fringe of religion where I spend most of my time: Unitarian Universalists are next to Episcopalians. Mosques and Jewish temples get a small cluster too. That #UUs are barely represented (despite being one of the search terms I included) reflects their rather small presence in America. Most #UUs would never believe this because their churches are full of people, but statistically there are 3 trans folk for every #UU in America. Unitarian Universalists see themselves as activists who speak out on behalf of marginalized groups, like LGBTQIA+ folk, but they themselves are a smaller group.
I also searched for Quakers, Mennonites, and Anabaptists, and although I DO have sites for these groups, they did not converge into any cluster in the overall map that I could find. That is partly because they are even less common than UUs as a demographic group, but also because these churches and houses of worship put very little effort into being visible on the Internet. I’ve searched YouTube for their services and podcasts for mentions of them, and often come up with nothing. Quakers do not have formal preachers; everyone is a preacher when he or she is called to be. And so they don’t organize “content” the way typical churches do, and the map shows this omission.
Another surprising omission is the word “evangelical.” You have to know the culture to realize that many evangelical churches are listed in the phone book as “Charismatic,” “Pentecostal,” “Baptist,” or a hundred other names. Most of the 500 largest single churches in America (there’s a data set for these megachurches) are incorporated enterprises, not religious sects. They are brands, not faiths, as their chosen names show:
North Point, NewSpring, New Life, New Light, Life.Church, Living Word, Elevation, Gateway, Crossroads, Dream City, Greater Cornerstone, etc.
In the future, I hope to turn this map into a classifier that can be used to better hone the search and recording of religious life in the US. I would also love to look at the intersection of these churches with political parties and candidates at the local level, because I know that they have a real impact on who represents American citizens in Washington. Megachurches are an interesting power-consolidating force because, although Gallup reported in 2021 that only 47% of Americans belong to a church, synagogue, or mosque, nearly all political candidates ally with churches in elections.
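The classifier mentioned above is still future work, so the following is only one possible starting point: a minimal nearest-centroid sketch. It assumes the annotated clusters could serve as labeled training data and that sites are compared with bag-of-words cosine similarity; the labels and texts are hypothetical. A more serious version would likely add TF-IDF weighting and a minimum-similarity threshold so off-topic pages, like the Christian schools, could be rejected instead of forced into a label.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts for a page or site."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train_centroids(labeled_sites):
    """Sum word counts per label.

    Summed counts point in the same direction as the mean, and cosine
    similarity ignores vector length, so no division is needed.
    """
    centroids = {}
    for label, text in labeled_sites:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def classify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training data, standing in for annotated clusters.
training = [
    ("Baptist", "baptism sermon scripture bible study"),
    ("Baptist", "sermon baptism salvation bible"),
    ("Catholic", "mass parish diocese sacrament"),
    ("Catholic", "mass confession rosary parish"),
]
centroids = train_centroids(training)
print(classify("Sunday sermon and bible study", centroids))  # Baptist
```

As written, a page that shares no words with any centroid still gets a label (the first one, since every score ties at zero), which is why a rejection threshold would matter in practice.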
Audio for this post is also available as a podcast via anchor.fm.