with Joshua Wheaton
Robin Lee: Welcome back to Honors Spotlight. I’m your host, Robin Lee. And today we’re talking about something that hits close to home for a lot of people: the connection between social media and depression, and how artificial intelligence could help identify when someone is struggling. Even with today’s technological advancements, the idea still sounds futuristic, yet it reflects something we’ve known for generations. People leave clues about how they’re doing. They always have. The difference now is that more of those clues live online. This episode centers on an honors thesis written by 2025 computer science graduate Joshua Wheaton, titled “Using AI to Identify Depression on Social Media.” It’s a project that blends computer science, mental health, and the digital spaces where many people turn when they feel alone. Let’s get started. Joshua, tell me a little bit about yourself and your background, and then we’ll move on to your thesis project.
Joshua Wheaton: Yeah. My background is in computer science. Computer science has been something I’ve enjoyed for a very long time. I knew way before I was about to start college that it was something I enjoyed and wanted to study more.
What first pushed you towards studying depression detection, and was it something personal, academic, or just something you observed online?
Joshua Wheaton: So, going into starting my thesis, there were a few things I thought I wanted to pursue. One of the big ones was AI. I mean, it’s something I’d taken a lot of classes on. It obviously was getting popular at the time. I was kind of talking to different professors, trying to, like, get ideas. And one of them brought up this idea, and that really resonated with me, because mental health in general is something that has been near and dear to my heart for a very long time. So, when that idea was kind of brought up of looking for signs of depression using AI, I was like, That’s it. I want to do that.
Robin Lee: I know I am one of those students who would memorize something for what I was working on. And then once I was done with that assignment or test, I would just kind of dump the information. And it has been at least a year since you completed your thesis. So, the fact that you remember all of that is amazing to me, because it is super helpful for what we’re talking about today. So now we’re going to get into a little bit about why this is so important. You don’t need a study to know that depression has become a part of the daily reality for millions. But the data backs it up. Cases are rising, and getting help isn’t always easy. Some people don’t know where to start, some don’t feel safe speaking out, and some head to social media not for attention but for release. Reddit, for example, is full of people writing honestly because the ability to stay anonymous gives them room to breathe. I see those posts all the time when I scroll through to find something relevant for the Honors College here at MTSU, and that’s what made Reddit the backbone of Joshua’s research. The posts are raw, unpolished, and honest, exactly the kind of data an AI model can learn from.
When you first explored the Reddit dataset, what stood out to you about how people talk about depression in those online spaces?
Joshua Wheaton: Hmm. A little bit of a difficult question, but ultimately, I think the answer is just variety. Like, there’s no one way that people talk about it. You know, even with Reddit’s anonymity, it allows some people to be a bit more open. But even then, you still have just a wide variety in how people talk about it and how they speak about their own experiences.
Did you consider any other social media platforms, and how do you think that data might be different?
Joshua Wheaton: Ultimately, we went with Reddit just because of the ease of access, pretty much. My thesis advisor helped a lot with that. But yes, I do think there are some differences on different social media platforms. Facebook definitely skews older, I guess, with its user base. And I think there’s a lot more stigma among older people with not just depression, but mental health in general. Looking at it in a different way, Twitter/X, I think, would also be very different just because of how it’s generally shorter, as far as, you know, the text or tweets or whatever they’re called now. And that brevity, I think, would also make for a very different experience, probably a bit more difficult just because there’s less text to work with for each data point.
Robin Lee: So now we’re going to go into a little bit about how Joshua built his project. For this thesis, he trained three different machine learning models: logistic regression, LSTM, and BERT. In simple terms, one was basic and fast. One was designed to understand the longer strings of text, and the last one, BERT, was powerful enough to read context the way people actually speak. According to the data he shared in his thesis, all three models performed well, with BERT reaching nearly 98% accuracy. It’s impressive, sure, but the simplicity behind the results revealed something bigger. Sometimes identifying a depressed post was easy, because the dataset came from a subreddit where people talked openly about depression. But that’s not how people talk in the real world, and that gets to the heart of the limitations.
Did anything surprise you about how easy the data set seemed to be for the models? And did that change how you thought about real-world application?
Joshua Wheaton: Absolutely. Honestly, I didn’t realize it was the data set that was, quote unquote, too easy. At first, when I was kind of finishing up the training and the testing of the models and looking at all the results, I was surprised. It was, you know, not just all of them doing very well, but even the simplest model only being a few percentage points away from the performance of the most advanced model. So, I had thought that I had messed up somewhere. I was like, oh, I, you know, messed up. I mixed up the training and test data, or, you know, didn’t do enough randomization in the way that the data was presented to the models. And I was, you know, going back and forth with my professor, and she was giving me things to check. And I was going through, and I was like, No, it all looks good. The code all looks fine. And then I started unwrapping how the final trained models were working, looking especially at the simplest one. And it kind of turned out that with the simplest one, if the word depression was just in the text, it was pretty much automatically labeled as having depression. A little bit of a simplification, but it was at that point that I started digging deeper and realizing that, oh, the thing that I thought made this data set such a great avenue for training these models, having a space where people are willing to talk so openly about depression, was at the same time kind of the undoing of the purpose of these models. Instead of creating things that would be useful in looking for signs of depression on social media in general, they became just hyper-specialized in looking for it in spaces where people are already talking very openly about it.
Robin Lee: Yeah, and I think that’s important to go back and discover things like that, because so many people will post things on Facebook, for example, where you can tell they’re having a bad day or that something bad has just happened to them in their life. And so they’re just posting and venting, and it’s not something that they would ever say in a real-world situation, face-to-face or anything like that. But it’s out there. Even if they immediately take it down, you know, that’s what your models essentially would be trained to pick up eventually, without the word depression being listed in the post.
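The check Joshua describes, asking whether a single word is doing most of the work, can be sketched in a few lines. This is a minimal illustration, not his thesis code: the posts and labels below are invented, and `keyword_rule` and `accuracy` are hypothetical helper names. The idea is to score a one-word rule on labeled posts and then look at which posts it gets wrong.

```python
# Toy, made-up posts: (text, label), where 1 = depressed and 0 = not.
# These examples are invented for illustration; they are not thesis data.
POSTS = [
    ("my depression has been getting worse lately", 1),
    ("living with depression is exhausting", 1),
    ("i think my depression is back", 1),
    ("i'm fine, just tired all the time", 1),
    ("great game last night, what a finish", 0),
    ("looking for a good pasta recipe", 0),
    ("anyone else excited for the new season", 0),
    ("this movie left me in a deep depression, so bad", 0),
]


def keyword_rule(text, keyword="depression"):
    """Predict 1 if the keyword appears anywhere in the text, else 0."""
    return 1 if keyword in text.lower() else 0


def accuracy(posts, predict):
    """Fraction of posts where the prediction matches the label."""
    correct = sum(1 for text, label in posts if predict(text) == label)
    return correct / len(posts)


rule_accuracy = accuracy(POSTS, keyword_rule)

# Posts labeled 1 that never use the keyword: the rule misses these.
missed = [text for text, label in POSTS if label == 1 and keyword_rule(text) == 0]

print(f"keyword rule accuracy: {rule_accuracy:.2f}")
print("missed:", missed)
```

If a rule this crude scores close to a trained model on a data set, that suggests the data set is easy in the way Joshua found. The posts the rule misses, like the "I'm fine" example here, are the ones a real-world tool would most need to catch.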
Robin Lee: Naturally, anytime you mix technology with mental health, you walk a tightrope. You want to help, but you don’t want to mislabel someone. And that’s exactly what you were just talking about. You don’t want to invade someone’s privacy or replace actual professionals with algorithms. Joshua approached that with a healthy level of caution. He pointed out the dangers, the false positives, the false negatives, and the fact that social media isn’t a diagnosis. It’s a snapshot, sometimes an unreliable one.
If an AI tool misidentified someone, either by missing the signs or flagging them incorrectly, what concerns do you think matter the most?
Joshua Wheaton: Well, I think the most obvious concern is, you know, the false negative, you know, saying, oh, this person doesn’t have depression, or this person’s fine when really, they’re not. Because that leads to people not getting the help or the support that they need. And, you know, that’s not good. But on the other hand, false positives can also be not good because, let’s say, someone’s flagged as having depression when they don’t. That could mean the resources being put towards them aren’t available to someone who may need them. Or you could also, you know, be giving this person a sense of, oh, no, is there something wrong with me? When there’s not. So, I don’t know if I can really say that there’s a singular concern in that kind of realm that I think matters the most. I think they’re both important, just in different ways. Yeah. And a false positive, I think, happens a lot. Just like when people start googling their symptoms, or they look up something in ChatGPT to figure out what’s wrong with them and why their belly hurts. You know, there are a lot of different ways to get false positives, and the more AI becomes involved, I think, at least right now, the more likely it is to produce false positives, unfortunately.
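The two kinds of error discussed here can be counted separately, which is why a single accuracy figure can hide them. The sketch below is illustrative only: the labels and predictions are made up, and `confusion_counts` is a hypothetical helper, not something from the thesis.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = depressed)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1  # flagged, but actually fine
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # struggling, but missed
    return counts


# Made-up example: 4 people who are struggling, 6 who are not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

c = confusion_counts(y_true, y_pred)
accuracy = (c["tp"] + c["tn"]) / len(y_true)

# Recall: of the people who are struggling, how many were caught?
recall = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else 0.0

# Precision: of the people flagged, how many were actually struggling?
precision = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else 0.0

print(c, accuracy, recall, precision)
```

In this invented case the model is right 70% of the time overall, yet it misses half of the people who are struggling. Looking at false negatives and false positives separately makes that trade-off visible.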
Robin Lee: The core takeaway from your study, to me, was simple: AI can pick up on language signals faster and more consistently than people can, but it’s not perfect and shouldn’t stand alone. BERT performed best, but the logistic model also revealed something important. Words like depression and anxiety were the strongest predictors. That’s obvious. But it also raises a hard truth. Real depression doesn’t always announce itself. It doesn’t always use the word. And instead, I think people are more accustomed to phrases like ‘I’m fine’ or the ‘everything’s fine’ meme with a fire in the background. AI will need better data, deeper context, and more nuance before it can operate in real-world settings.
Given that real depression doesn’t always announce itself with obvious words, how could you redesign your data set to capture those subtler signals and better reflect real-world communication?
Joshua Wheaton: Yeah, with the power of hindsight. Hindsight is 20/20. I think the one thing that I wish I could go back and do differently is the actual data set itself. I think if I were doing it again, or, I don’t know, making suggestions for other people in the future, I would say spend so much more time on the data set. I think, in an ideal world, what that would look like is having your sample population and having all of them take depression screening tests given by psychologists or psychiatrists. That’s going to be one part of it, where you have not just depression or no depression, but also the degree, you know, mild, moderate, severe. Then, on the other part of that, instead of just taking single posts or anything like that, taking as much of their social media as possible across multiple platforms and across a longer amount of time. So, I think, with a data set like that, you’d have a much more accurate model that could actually be used in the real world.
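The data set Joshua describes has a different shape from a list of single labeled posts: one record per person, a graded label from a clinical screening, and many posts per person. Here is a rough sketch of what one such record might look like. Every name in it (`Severity`, `Post`, `Participant`, the platform strings) is a hypothetical choice for illustration, not something specified in the thesis.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Set


class Severity(Enum):
    """Graded label from a clinician-administered screening (assumed scale)."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


@dataclass
class Post:
    platform: str    # e.g. "reddit" or "x"; free-form in this sketch
    posted_on: date
    text: str


@dataclass
class Participant:
    participant_id: str
    severity: Severity
    posts: List[Post] = field(default_factory=list)

    def platforms(self) -> Set[str]:
        """Distinct platforms this participant's posts come from."""
        return {p.platform for p in self.posts}

    def span_days(self) -> int:
        """Days between the earliest and latest post (0 if no posts)."""
        if not self.posts:
            return 0
        dates = [p.posted_on for p in self.posts]
        return (max(dates) - min(dates)).days


# Invented example record.
person = Participant(
    participant_id="p001",
    severity=Severity.MODERATE,
    posts=[
        Post("reddit", date(2024, 1, 10), "long week, can't sleep again"),
        Post("x", date(2024, 3, 10), "i'm fine"),
    ],
)
print(person.platforms(), person.span_days())
```

Grouping posts under a person, instead of labeling each post on its own, is what would let a model look at patterns over time and across platforms, which is the point Joshua is making.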
Robin Lee: I definitely agree that there are a lot of different ways that this could continue to develop in the future. So, anybody out there looking for a thesis topic? I mean, you’ve already got a place to start. So one of the most forward-thinking aspects of your thesis was your call for more accurate data from real social media users who complete clinical depression screenings. That kind of data set would let AI read someone’s broader digital footprint, not just a single post taken out of context. And that’s how early detection becomes practical, not just theoretical.
Where do you think the future of AI and mental health support is heading? I know you already talked about that a bit. So, I’m going to add, do you envision tools that assist clinicians, or do you envision AI integrated into platforms people use every day?
Joshua Wheaton: I think it could absolutely head in multiple directions, right? So, kind of like the latter part of that, the integration of AI into everyday social media platforms. That’s obviously the approach I was going towards. I don’t know if I’d say inspirations, but one of my ideas for an application comes from Reddit specifically. Reddit currently has something called Reddit Cares, which is where if you see posts or comments that you find concerning, you think someone might be struggling, you can send them a Reddit Cares, which is like, oh, hey, you know, you’re worth it. Here’s a resource kind of thing. Unfortunately, at least from my perspective, it seems more used for bullying, in a way. Like, if people see takes or opinions they don’t agree with, they will kind of send that as, like, “oh, hey, that’s a, you know, mentally ill opinion you have there.” That’s super stupid. So my own, I suppose, idealized use of what I was working on would kind of replace that with something a little more automated and a bit less susceptible to misuse. Nefarious purposes, if you will. And I think having something like that, you know, functional and automatic, can help people who might not think to look for resources or might not have a ton of access to resources. It would be able to help them all across the internet.
Robin Lee: I think that’s an awesome idea, because I know a lot of times people may see something and they know that a person is feeling down, but they feel uncomfortable maybe reaching out to them personally. And so, something along the lines of what you’re describing, I think, would be a great way that they could still reach out and show their support when they may not know the words to say or know how to interact with somebody who’s going through something that they may not have been through themselves. And so yeah, that’s a very awesome point. Thank you. Joshua’s research shows that technology may help alert people when something’s wrong, but it’s not a substitute for community, family, or professional care. And that hasn’t changed, and it never will. However, if AI can prompt someone to seek help sooner or help platforms identify individuals who may be struggling, then it’s worth exploring. The blend of innovation and responsibility is where the future sits. Joshua, thank you for joining me and for helping to move this conversation forward. Counseling services are available with licensed mental health professionals at MTSU, as well as a variety of workshops and other resources, even a Zen Den to help you relax and recharge. For more information, visit counseling.mtsu.edu.