How-to · Focus environments

How to Set Up a Solo "Study With Me" Session (Without the YouTube Video)

The Korean gongbang trend works for a real psychological reason. Once you understand why millions of students study to a stranger's YouTube video, you can recreate the same effect on your own laptop, and it works better than the original.

May 21, 2026 · 10 min read · By Cozy Study

Somewhere in Seoul, a 27-year-old woman named Hani Kang sits at a desk and pages through textbooks for twelve hours straight. There is nothing on her desk except books and pens. There is no music, no commentary, no narration. There is, however, a camera. And on the other side of that camera, almost a hundred people are watching her study.

This is gongbang (공방), short for the Korean gongbu bangsong, meaning "study broadcast", and it's one of the strangest, fastest-growing genres on YouTube. By 2018 more than four thousand gongbang videos had been uploaded by Korean creators alone. During the pandemic, the genre exploded globally: students from France, the US, the UK, Brazil, and India started filming themselves studying for hours, in real time, with no editing. Their videos pulled millions of views from other students who were studying along.

The format is almost aggressively unentertaining. A desk. A person hunched over a book. The scratch of a pencil. The turn of a page. Sometimes rain. Sometimes a Pomodoro timer in the corner of the screen. Hours of this.

And yet people swear by it. Reviewers on the gongbang subreddits report studying twice as long with it as without. Korean medical students who routinely put in twelve-hour days credit it with the discipline to keep going. The comments under popular videos are full of strangers cheering on a girl in Seoul they will never meet.

What's going on here is interesting, and once you see the mechanism, you don't actually need the YouTube video to capture the benefit. You can build a better version of it on your own laptop in about three minutes. Here's how.

Why "study with me" videos actually work

The genre seems baffling until you decompose it into the psychological mechanisms it activates. There are at least five of them stacked on top of each other, and each one is a real, separately studied effect:

1. Implicit accountability

Watching someone else study creates a witness effect, a close cousin of what psychologists call the audience effect: the feeling, however imagined, that another person is observing your behaviour. The creator on the other side of the screen doesn't know you exist. But your brain doesn't care; it responds to the impression of being seen. This is the same mechanism that makes studying in a library more effective than studying in bed even though no librarian is grading your effort. Gongbang turns the social pressure of a library into something you can summon at 2 AM from your bedroom.

2. Mirroring behaviour

Humans are mimetic. When you watch someone perform a behaviour, especially a sustained, deliberate one, your motor and intention systems light up in sympathetic patterns. Watching a creator turn pages and write notes for ten minutes makes the action of turning pages and writing notes feel more accessible than it did at minute zero. This is the same reason it's easier to start a workout if a friend starts first. Studying alone is starting cold. Watching a study-with-me video is starting with a buddy already going.

3. External timing cues

The popular videos almost universally include a visible Pomodoro timer. Research on attention and breaks has consistently shown that external cues outperform internal ones: you can hold focus better if the question "should I stop now?" is delegated to a clock instead of debated by your prefrontal cortex every twelve minutes. A gongbang video bakes that timer into your peripheral vision.

4. Sustained ambient sound

The audio in these videos is usually constant: rain, fire crackling, pencil scratches, ASMR-textured silence. Sustained ambient sound has a real effect on what cognitive psychologists call the "irrelevant sound effect": it gives your auditory system something low-stakes to process, which paradoxically makes it harder for distracting sounds (a notification, a sibling, the air-con clicking on) to capture your attention. The brain has a vigilance system for novel sounds. Steady rain saturates it.

5. The framing of "this is study time"

Possibly the most powerful effect: opening the video is itself an act of commitment. You've signalled to yourself, with a deliberate decision, that the next two hours are study. That decision is the hardest part of the session. Once made, it has momentum.

Why a YouTube video is a clumsy way to get those benefits

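Once you see what gongbang is actually doing, the YouTube delivery is just one possible vehicle. And it's not the most efficient one: the video arrives wrapped in a sidebar, recommendations, and ads, all of which compete for the attention you opened it to protect.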

The good news is that none of the mechanisms above require a YouTube video specifically. They require: a sustained ambient scene, a timer, the sense of being inside a focus environment, and the deliberate framing of "this is study time." Any tool that bundles those four things will produce the same effect, often more reliably.

The six-step solo setup

What follows is the simplest reproducible version, refined over a lot of late-night sessions. It takes about three minutes to set up the first time and about ten seconds every time after.

  1. Pick a study time, not a study task.

    The session starts with deciding when, not what. Most failed sessions die at "I'll start once I figure out what to study." If you commit to a clock ("9 to 10:30 PM tonight, then again at 11"), the question of what to do at 9 PM gets answered by the constraint of having already started. A wrong task in a started session beats a perfect task in a never-started session.

  2. Define one concrete sub-task before pressing start.

    Write down, physically, on paper, the single specific thing you'll do in the first 25 minutes. Not "study math." "Do problems 1 through 5 in Chapter 3." Not "work on the essay." "Write the second paragraph of the introduction." The specificity is the work. Once the task is concrete, the session is almost on autopilot. If you can't make it concrete, the problem isn't focus; it's that you don't actually know what you're trying to accomplish, and no timer will fix that.

  3. Set the scene, and commit to one.

    This is where the YouTube video gets replaced. Open one ambient environment that fills your screen: a fullscreen aesthetic timer with a cinematic scene, a study-with-me video on a second monitor with the main screen on your work, or a single ambient image with rain audio in a separate tab. The specific tool matters less than the commitment to a single one. Choice paralysis at this step is itself a procrastination loop. Pick something, then don't pick again.

    The reason Cozy Study works as this layer is that the scene is the screen, the whole screen, once you press F for fullscreen. No sidebar, no recommendations, no ads. Tokyo in the rain or a closed bookstore at 3 AM, with matching audio, plus the timer baked in. The benefit is exactly the same as a gongbang video but without the distractions of being on YouTube.

  4. Stage your physical space.

    The screen is half. The room is the other half. Three things matter and the rest is optional:

    • Phone out of the room, or face-down with notifications silenced. One of the most-studied factors in focus research is the visible phone effect: the mere presence of a phone, even powered off, has been reported to measurably reduce cognitive performance. If the phone is in the room, the room is not a study room. Put it somewhere else.
    • One drink within reach. Water, tea, coffee, whatever. The point isn't the drink; it's that getting up for a drink mid-session is one of the most reliable ways to break a flow. Stage it so you don't have to leave.
    • Lamp on, overhead light off. Or whatever the equivalent is for your room. The goal is a clear visual delineation between your study zone and the rest of the space. Lit zones tell the brain "this is the activity area." Dimmed peripheries push the rest of the room into the background.

  5. Press start before opening anything else.

    The order matters. Press start on the timer first, then open the task. Not the other way around. The reason: starting the timer is a tiny commitment, and you're more likely to follow through on a session you've already started than one you're still deciding whether to start. The clock running creates a small irreversible event that biases you toward continuing.

    This is essentially the Zeigarnik effect in microcosm. Bluma Zeigarnik's classic research showed that people remember interrupted tasks better than completed ones; the running timer is an interruption-in-waiting, and your brain doesn't want to leave it dangling.

  6. Close the session deliberately.

    When the timer ends, stop. Don't "just finish this paragraph." Stand up. The break is part of the session: not waste, not lost time. Walk around. Drink water. Look at something more than four feet away (this helps your eyes more than you'd guess). When you sit back down, decide consciously: another session, or done for now? The deliberateness is the muscle you're training.

    Don't end on a phone scroll. The post-session habit is what carries to the next session. Ending into a five-minute Instagram check makes the next session harder to begin. Ending into a stretch, a drink, or just looking out the window makes the next session feel like a continuation rather than a restart.
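If you like to see the timing written out, the clock commitment from step 1 and the 25-minute block from step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Cozy Study or any other timer works: the function name plan_blocks, the 5-minute break length, and the example date are all made up for the sketch.

```python
from datetime import datetime, timedelta


def plan_blocks(start, end, focus_min=25, break_min=5):
    """Split a committed study window into focus blocks separated by breaks.

    Returns a list of (label, block_start, block_end) tuples. A focus block
    is only scheduled if it fits entirely inside the window.
    The 5-minute default break is an assumption, not taken from the article.
    """
    focus = timedelta(minutes=focus_min)
    rest = timedelta(minutes=break_min)
    blocks = []
    cursor = start
    while cursor + focus <= end:
        blocks.append(("focus", cursor, cursor + focus))
        cursor += focus
        # Only add a break if another full focus block fits after it.
        if cursor + rest + focus <= end:
            blocks.append(("break", cursor, cursor + rest))
            cursor += rest
        else:
            break
    return blocks


# Example: the "9 to 10:30 PM" window from step 1 (the date is arbitrary).
window_start = datetime(2026, 5, 21, 21, 0)
window_end = datetime(2026, 5, 21, 22, 30)
for label, begins, ends in plan_blocks(window_start, window_end):
    print(f"{begins:%H:%M}-{ends:%H:%M}  {label}")
```

The design choice worth noticing is that the plan never ends on a break: the last block is always focus time, so the deliberate close in step 6 is the thing that ends the session, not a timer quietly running out.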

What to use for the scene

The "ambient environment" step is where most beginners stall: too many options, none of them obviously right. A quick breakdown:

A study-with-me YouTube video still works if you can put it on a second monitor or a tablet. Strongly recommend not putting it on your primary screen; the YouTube UI is too tempting. Search "no music study with me 2 hours" or "lofi study with me real time" and pick whichever creator's room you find most calming. Channels that are popular for a reason: Merve, Cracker ASMR, The Strive Studies, iMia.

An aesthetic study timer like Cozy Study, Flocus, or ZenFocus covers all four ingredients in one tab: the scene, the timer, the ambient audio, and the framing. The trade-off is no "other person" element. Cozy Study's scenes are designed to compensate for this by being specific places at specific times: a Tokyo capsule hotel at 3:22 AM, a midnight diner, a Paris rooftop. The implication that someone else is also in that scene, also awake, also working, replicates the gongbang witness effect without needing an actual streamer.

A single looping background and a separate timer is the lowest-tech option. Open a fullscreen image (rainy window, fireplace, library shelf), put a Pomodoro timer in a separate tab, and play ambient audio on Spotify or YouTube Music. Works fine. Requires three tabs instead of one.

The hardest mistake to avoid here is layering too much. Three ambient sounds at once, two videos, music plus rain plus an audiobook: beyond a certain density, ambient stops being ambient and starts being noise. One scene, one sound layer, one timer. That's the floor and almost always the ceiling.

The four-hour test

A practical way to validate any setup is the four-hour test: can you sit down at the desk and study, with one short break per hour, for four hours straight? If yes, your setup is fine. If no, identify which of the six steps failed.

Most commonly, the failure point isn't the scene or the timer. It's step 2: the task wasn't concrete enough. Or step 4: the phone was in the room. Or step 5: you opened the task before pressing start, which gives you a hundred small reasons to put off pressing start.
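One way to make that post-mortem routine is to write it down as a checklist. The Python sketch below is a hypothetical helper, not part of any tool mentioned here: the names STEPS, CHECK_ORDER, and failed_steps are invented for illustration. Steps 2, 4, and 5 are checked first because they are the common failure points named above; the order of the remaining steps is an arbitrary assumption.

```python
# The six steps of the solo setup, keyed by step number.
STEPS = {
    1: "Pick a study time, not a study task",
    2: "Define one concrete sub-task before pressing start",
    3: "Set the scene and commit to one",
    4: "Stage your physical space",
    5: "Press start before opening anything else",
    6: "Close the session deliberately",
}

# Check the most commonly cited failure points first (steps 2, 4, 5).
# The order of the remaining steps is an assumption.
CHECK_ORDER = [2, 4, 5, 1, 3, 6]


def failed_steps(followed):
    """Return the steps that were not followed, likeliest culprits first.

    `followed` maps a step number to True/False; steps missing from the
    mapping count as not followed.
    """
    return [(n, STEPS[n]) for n in CHECK_ORDER if not followed.get(n, False)]


# Example review after a four-hour attempt that fell apart early.
review = {1: True, 2: False, 3: True, 4: True, 5: False, 6: True}
for number, name in failed_steps(review):
    print(f"Revisit step {number}: {name}")
```

Answering six yes/no questions honestly after a failed attempt is usually enough to point at the one step to fix before the next try.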

The setup is not magic. None of these steps cancel the underlying difficulty of doing hard cognitive work. What they do is reduce the friction of starting hard cognitive work, and keep your environment from actively undermining you once you're going. The actual studying is still on you.

If you only do one thing

Put the phone in another room. The single biggest gain across thousands of student-reported study sessions isn't the scene, the timer, or the sound; it's the absence of the phone. Everything else is decoration. The phone is the whole game.

Why the late-night scene specifically

You may have noticed that gongbang videos, the popular study-with-me creators, and most aesthetic study timers all converge on a similar aesthetic register: dim, warm-lit, late at night. There's a reason for this beyond fashion.

Late-night studying tends to be more focused for two reasons. First, the world is quieter: fewer notifications, fewer demands on your time, fewer choices about how to spend the next hour. Second, the visual environment is naturally simpler: you can see less, so there's less to process peripherally. The aesthetic-timer industry has effectively engineered a "perpetual 2 AM" mode that captures the focus quality of late-night studying without requiring you to actually stay up that late.

This is also why a Tokyo street at 02:14, a New York bodega at 2 AM, or an Antarctic research station at 4 AM are over-represented as study scenes. They're places that share the same low-stimulus, high-deliberation quality of your own late-night room. The scene doesn't have to be calm to be useful; it has to be quiet in the right way.

If you're a morning person who genuinely focuses better at 7 AM, the principle still holds, but the visual register flips: bright, cold-lit, simple. Pick the time of day where the friction is lowest, and the scene that matches it. The point isn't to romanticise the 2 AM grind. The point is to find the time when your brain is least busy fighting you and put a clean scene around it.

The very short version

If you wanted this whole post compressed into the smallest possible chunk: gongbang works because watching someone study activates accountability, mirroring, external timing, ambient sound, and the framing of "this is study time." You can get the same effect, usually a better one, by replacing the YouTube video with a fullscreen aesthetic timer, putting your phone in another room, deciding the one concrete sub-task you'll do first, and pressing start before you do anything else.

None of this is new. Gongbang exists because Korean students figured out a way to brute-force focus environments using the only tool they had: YouTube. The web caught up. The tools are better now. The principle is the same.

Try a session in Cozy Study

11 cinematic late-night scenes, ambient audio, and a built-in Pomodoro. Hit F for fullscreen and the scene takes over the screen.

Open Cozy Study →