In machine code, an INC(rement) opcode (or an ADD, for that matter) acting on a register (assuming your int is already loaded into that register) or a memory location is itself atomic, typically taking one clock cycle (or tick) on an ALU (which is part of your CPU). Depending on your architecture, multiple INCs (three is common) can even be retired within a single clock tick, although other bottlenecks in the pipeline may limit this. See these tables to get an idea of the basic latencies of various instructions.
But in your high-level world of compiled or interpreted instructions, you're doing much more than that: first you allocate some memory for your integer variable, then you fill it with an initial value, then you add one to that memory location using an ADD or INC instruction, either directly or indirectly (if you stored that value of one somewhere else earlier, it has to be loaded first), and finally you may wish to evaluate the result and act on the outcome, as in trancexx's example. All of those steps cost clock ticks: some are handled by ALUs, others by AGUs, floating-point maths is handled by your FPU, and so on. Many instructions require multiple CPU units to interact, and some single opcodes take over a hundred clock ticks to complete. Special scheduling units try to keep the pipelines optimally filled, but cache misses, faults, exceptions, and even suboptimal memory alignment of addresses and buffer sizes all degrade performance. Now imagine this in an environment where dozens or hundreds of threads are vying for processing time and memory, and the CPU is perpetually playing catch-up and divide-and-conquer at the same time.
So you're missing the point if you think purely in terms of your single INC in a single thread. Your CPU itself consists of numerous parts working on numerous jobs, constantly switching between them to keep as many tasks running in parallel as possible, as efficiently as possible. This means any thread can be suspended temporarily at any point (and not necessarily only when a full instruction has completed, if that instruction takes more than one clock tick). So if any information is shared between threads, or acted upon and evaluated by multiple threads, race conditions can occur unless the programmer explicitly adds safeguards to prevent them.