That's my problem with this. It tries to be a desktop display server protocol without unifying all desktop requirements. Sure, X11 is old and has unnecessary things that aren't relevant anymore. However, as someone who builds their own DE (e.g. around a tiling window manager), I see it as the end of this masterrace, unless everybody moves to wlroots. Flameshot, for example, is already dealing with this: it has at least 5 implementations for Linux alone, and only wlroots and X11 are standards.
Also, imo, having windows in windows is useful when you want to use your favourite terminal inside your favourite IDE. But as you said, DEs can simply implement it. Let's say wlroots implements this but others decide otherwise: on those, the app won't run.
Another example, one that affects my app personally, is the ability to query which monitor the pointer is on. Wayland doesn't care about providing this, so I don't care about supporting Wayland. And I'm sad about this, because X is slowly fading away, so new apps will not run on my desktop.
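To illustrate what that query amounts to on X11: the pointer position is available in root-window coordinates (XQueryPointer) and the monitor rectangles come from RandR or Xinerama, so the lookup is a plain rectangle test. A minimal Python sketch of just that test, where `Monitor`, `monitor_at` and the two-screen layout are names and values made up for illustration, not taken from any real app:

```python
from typing import NamedTuple, Optional, Sequence


class Monitor(NamedTuple):
    """One output rectangle in root-window coordinates (hypothetical type)."""
    name: str
    x: int
    y: int
    width: int
    height: int


def monitor_at(px: int, py: int, monitors: Sequence[Monitor]) -> Optional[Monitor]:
    """Return the first monitor whose rectangle contains (px, py), or None."""
    for m in monitors:
        # Left/top edges are inclusive, right/bottom edges are exclusive.
        if m.x <= px < m.x + m.width and m.y <= py < m.y + m.height:
            return m
    return None


# Assumed layout: two screens side by side. On a real X11 session these
# rectangles would come from RandR, and (px, py) from a pointer query.
MONITORS = [
    Monitor("left", 0, 0, 1920, 1080),
    Monitor("right", 1920, 0, 2560, 1440),
]

if __name__ == "__main__":
    print(monitor_at(2000, 500, MONITORS))
```

The rectangle test itself is trivial. What is missing for a regular client under core Wayland is the first argument: there is no global pointer position to ask for.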
Moreover, with X11 I could write my own hotkey daemon in my language of choice; now I would have to fork the compositor.
Am I seeing this wrong?
What do you suggest?