| Revision History | | |
|---|---|---|
| Revision 1.0 | 2017-04-27 | Initial version |
This document is targeted at rc-core developers and people interested in rc-core development. It describes the current situation, some of the issues with the current approach and proposes solutions to those issues. The plan is to update this document as development progresses and consensus is found and/or further issues are identified and addressed.
By far, the most common kind of remote control hardware (receivers and transmitters for receiving/generating the kind of signals used by handheld remote controls - like the kind you would normally expect to use to control e.g. your TV set) is based on InfraRed, IR, technology. This kind of hardware is often referred to as Consumer InfraRed, CIR, in order to distinguish it from other types of IR hardware, such as the wireless data transfer technology standardized by the Infrared Data Association, IrDA.
There are, however, other solutions based on e.g. radio-frequency
(RF), HDMI, etc. One such example
that is supported by the Linux kernel is the
ATI Remote Wonder which uses RF
rather than IR to transmit commands. For that
reason, this document will use the terminology “remote
control”, RC, rather than e.g.
IR. This is also the approach adopted in the kernel,
as evidenced by the
drivers/media/rc directory and
rc-core module in the kernel sources.
The granddaddy of RC drivers in the Linux ecosystem is the LIRC project. LIRC was developed out-of-kernel and contained drivers as well as software tools to receive/send IR signals. The focus was traditionally on hardware which measured and generated pulse-space timings from IR signals. Such hardware, which will be referred to as raw IR hardware, has the advantage that it will (in theory) work with more or less any kind of remote, no matter which specific IR protocol it uses - as long as there is an appropriate decoder written for that protocol. In LIRC, such decoders were implemented in a userspace daemon which received captured pulse-space values from the in-kernel hardware driver.
In addition to special-purpose IR hardware, many digital and analog TV/video capture cards included hardware to receive IR signals. These cards often come bundled with a remote and typically do not support IR transmission. Many of the cards also have on-board decoders for received IR signals, meaning that rather than delivering raw pulse-space timings, the cards only provide already decoded commands. The advantage is that no further decoding of the signal is necessary, but the hardware is typically limited to supporting one, or at most a few, IR protocols. Some hardware decoders do not even return the full decoded signal, but only parts of it (e.g. only the command bits and not the address bits), making it impossible to know the full signal. Hardware which does some kind of hardware decoding will be referred to as cooked in this article.
As cooked hardware was typically found on digital and analog video
cards, the IR support for this kind of hardware
developed in the linux kernel (via the linux-media/linux-tv projects)
independently from LIRC. As a result of discussions
on how to merge the LIRC tree into the kernel,
rc-core was created in order to merge the two
approaches into a consistent API. The legacy of the
two code-bases is, however, still very evident today if one looks at
the rc-core code (i.e. everything under
drivers/media/rc/) in detail (example: the
in-kernel keymaps were traditionally associated with cooked hardware
while the in-kernel decoders and lirc kernel
module are examples of code specifically for raw hardware).
The following picture provides an overview of how rc-core works today. Note that dashed lines denote optional parts.
As part of its initialization, a kernel hardware driver calls
rc_register_device() which returns a
struct rc_dev. This struct holds all the
relevant state of the rc device (similar to the role played by
struct input_dev in the input subsystem). All
further interaction with rc-core (via the
functions defined in include/media/rc-core.h) uses
the struct rc_dev to keep track of the
relevant device.
For each registered struct rc_dev,
rc-core maintains a keytable,
which is simply an array with scancode-keycode mappings. Scancodes are
protocol specific, but most IR protocols divide the
transmitted message into system and
command bits (though the terminology varies).
Roughly, the system bits define the device that is the intended
recipient (which is why the remote control for the stereo will,
hopefully, not have any effect on the TV and vice versa, even if they
use the same protocol).
For example, a message in the RC5 protocol has 5 system bits and 7 (originally 6) command bits, as well as one toggle bit (used to distinguish a long keypress from several short ones; it is not part of the scancode). The command bits are stored in the lowest 8 bits of the scancode and the system bits in the next 8 bits. Given a scancode of 0x0000031F, we can therefore deduce that the system was 0x03 and the command was 0x1F (assuming the scancode is an RC5 scancode).
Keycodes are defined in include/uapi/linux/input.h
as integer constants like KEY_A,
KEY_ESC, etc. The keytable is typically populated
at module load by the rc hardware driver setting the
map_name of the struct
rc_dev to the name of a keymap. Default keymaps
are implemented as loadable kernel modules (see
drivers/media/rc/keymaps/ for a long list) and
rc-core will make sure that the corresponding
keymap kernel module is loaded and used to populate the keytable.
For raw hardware, rc-core
will also load various decoding modules (per default all known modules
are loaded) which are used to decode the pulse-space timings produced
by this kind of hardware into the scancodes explained above.
rc-core also exposes a sysfs
API (see /sys/class/rc/rcX/)
which can be used to, inter alia, enable/disable
various protocols during runtime for raw as well as cooked hardware
(the former simply enabling/disabling the use of the software decoder
while the latter usually means that the underlying hardware is
reconfigured).
rc-core also maintains an input
device (struct input_dev) per
struct rc_dev. As rc messages are received
and, optionally, decoded, the keytable is consulted. If a matching
scancode-keycode entry is found, the event (e.g.
KEY_A is pressed) is reported to the input
subsystem. In any case, the scancode is reported to the input
subsystem (as an
EV_MSC/MSC_SCAN event),
allowing keymaps to be constructed in userspace for new remotes.
The input subsystem, in turn, maintains a chardev per input device
(e.g. /dev/input/event12) which allows interested
userspace applications to be notified of input events on the relevant
device(s) by open():ing the chardev and
read():ing struct
input_event events from it (this is how e.g. the
X server and libinput
library receive events). The input device that
rc-core creates looks just like a keyboard to a
userspace application, allowing it to be blissfully unaware of any rc
related details.
The input subsystem also contains functionality to allow userspace to
manipulate the keytable via ioctl():s on the event
chardev. The traditional ioctl():s
(EVIOCGETKEYCODE and
EVIOCSKEYCODE) only took a pair of integers
(scancode/keycode) as arguments. Since scancodes could be larger than
an integer, the EVIOCGKEYCODE_V2 and
EVIOCSKEYCODE_V2 ioctl():s,
which are able to take much larger scancodes, were introduced. The new
functions also allow scancode/keycode mappings to be looked up by index
rather than by scancode, which allows the whole keytable to be dumped
in a simple fashion.
Finally, the rc subsystem includes the optional
lirc module, which acts like a bridge between
rc-core and userspace applications which expect the
LIRC API. For implementation reasons, the
lirc module is implemented as a codec, meaning
that it will receive the raw pulse/space events for raw hardware. If
lirc is loaded, rc-core
maintains a struct lirc_codec per
struct rc_dev, which maintains the relevant
state.
The lirc module maintains a per-device
/dev/lircX chardev which a userspace application
can open() and then perform
read(), write(), and
ioctl() on in order to receive and send data in
the form of pulse-space timing integers and to control various
parameters of the hardware (such as carrier, duty cycle, timeout, etc).
The lirc API is IR specific.
As noted above, setting and getting keycodes in the input subsystem
used to be done via the EVIOC[GS]KEYCODE
ioctl() with an unsigned
int[2] parameter (one int for scancode and one for
the keycode).
The interface has since been extended to use the
EVIOC[GS]KEYCODE_V2 ioctl()
which uses an input_keymap_entry
structure:
```c
struct input_keymap_entry {
	__u8  flags;
	__u8  len;
	__u16 index;
	__u32 keycode;
	__u8  scancode[32];
};
```
Note that the scancode can be even bigger, thanks to the
len field.
This is what is currently used in rc-core, thereby
allowing arbitrary sized scancodes and other advantages (such as
index-based lookup of keytable entries).
The problem is that even though the kernel is aware of the protocol
which was used to generate any given scancode/keypress, it has no way
of communicating that information to userspace. While userspace can get
the scancode of an unknown remote control from the kernel (assuming the
kernel was able to decode it) via reading
EV_MSC/MSC_SCAN events from
the input device, there is no way to get the corresponding protocol.
Furthermore, there is no way for userspace to tell the kernel which protocol a given scancode belongs to, meaning that the kernel cannot record the protocol of the scancodes in its keytable.
Scancodes can, and will, overlap:
- an RC5 message to address 0x05, command 0x03 has scancode 0x00000503
- a JVC message to address 0x05, command 0x03 has scancode 0x00000503
It is only possible to distinguish (and parse) scancodes by knowing both the
scancode and the protocol.
The NEC protocol represents a special case of the problems
identified above. While the kernel correctly distinguishes
between protocols within the same family that are genuinely different
(e.g. Sony12 and Sony15,
which have different lengths), it also distinguishes between
NEC16, NEC24 and
NEC32 (which are not genuinely different).
The NEC protocol historically contained 8 bits of
address and 8 bits of command,
plus the inverse of the address and of the
command (which acts as a checksum and also makes the
transmission time constant). So, in reality, 32 bits were used to
transmit 16 bits.
The address space of the NEC protocol was quickly
used up and NEC24 was created by removing the
address redundancy (i.e. the 32 bits were redefined as
address_low, address_high,
command and command_inverted).
This was later extended to do the same trick with the
command bits, giving 16 bits of
address and 16 bits of
command in what the kernel calls
NEC32.
Distinguishing between the three variations in the kernel creates
ambiguity, not just between protocol scancodes (as shown above), but
also within the NEC protocol itself.
- a NEC16 message to address 0x05, command 0x03 has scancode 0x00000503
- a NEC24 message to address 0x0005, command 0x03 has scancode 0x00000503
These two messages look completely different when transmitted, but result in the same scancode, and userspace will never be able to tell them apart. Furthermore, the pointless in-kernel distinction between the scancodes means that the same code is repeated over and over again throughout different drivers.
```c
if (buf[14] == (u8) ~buf[15]) {
	if (buf[12] == (u8) ~buf[13]) {
		/* NEC */
		state->rc_keycode = RC_SCANCODE_NEC(buf[12],
						    buf[14]);
		proto = RC_TYPE_NEC;
	} else {
		/* NEC extended */
		state->rc_keycode = RC_SCANCODE_NECX(buf[12] << 8 |
						     buf[13],
						     buf[14]);
		proto = RC_TYPE_NECX;
	}
} else {
	/* 32 bit NEC */
	state->rc_keycode = RC_SCANCODE_NEC32(buf[12] << 24 |
					      buf[13] << 16 |
					      buf[14] << 8 |
					      buf[15]);
	proto = RC_TYPE_NEC32;
}
```
(the above example is taken from
drivers/media/usb/dvb-usb-v2/af9015.c).
The only solution is to make the protocol explicit, for userspace and
for the kernel. This requires the
EVIOC[GS]KEYCODE_V2 ioctl():s
to include the protocol. The proposed solution changes how the
input_keymap_entry struct is interpreted by
rc-core by casting it to a new
rc_keymap_entry struct:
```c
/* proposed */
struct rc_keymap_entry {
	__u8 flags;
	__u8 len;
	__u16 index;
	__u32 keycode;
	union {
		struct rc_scancode rc;
		__u8 raw[32];
	};
};

/* current, for comparison */
struct input_keymap_entry {
	__u8 flags;
	__u8 len;
	__u16 index;
	__u32 keycode;
	__u8 scancode[32];
};
```
The u64 scancode field is large
enough for all current protocols and because of the
len field, it would be possible to define a
larger scancode, should it be necessary for
some exotic protocol.
The proposed situation would thus look something like this (compare to the previous situation in the first figure):
The advantages with this change are:
- The protocol is made explicit for both userspace and the kernel.
- struct rc_map no longer hardcodes the protocol, meaning that keytables with mixed entries are possible.
- The kernel can use the protocol information to automatically enable/disable the right protocols in hardware or software, making the use of /sys/class/rc/rcX/protocols unnecessary in most cases.
The disadvantages are:
- For userspace applications which use the old style ioctl():s, we have no choice but to guess the protocol which was intended when a call to EVIOC[GS]KEYCODE[_V2] is made. That guess might be wrong, meaning we break userspace.
- The change means that the protocol number definitions become part of the userspace ABI and API. Any changes to the protocol list (save for additions) are therefore impossible once a change like this has been made.
The userspace breakage is unavoidable if we ever
want to escape the current protocol-less situation since, as noted
above, we have no choice but to guess when ioctl()
calls are made where the protocol is missing.
However, it should be noted that LIRC applications are unaffected, as are any “RC unaware” applications which only consume input events. The damage is limited to applications which manipulate the kernel keytables (and if those tools fail, then input consumers will of course be affected).
Fortunately, that means that it is essentially only ir-keytable which is affected and needs to be updated. Updating various tools as the kernel is updated is a job that distributions are good at.
There are two possible solutions for the NEC
protocol issue. One is to simply perform the above mentioned change to
the new ioctl() structs and export three different
NEC protocols to userspace. The alternative solution is to use the
opportunity provided by the changeover to the new structs to also move
over to always use NEC32 in the kernel and in the
kernel ⇄ userspace interface.
The advantage of the first solution is that it requires less change to
existing drivers and that some “cooked” hardware only
supports a subset of the NEC protocol types
(usually, the hardware only supports NEC16 and
only reports two bytes to the kernel driver). Separating the
NEC protocols in the API makes
it simpler to report that this is the case to userspace (not that there
is much userspace can do with that knowledge).
The major advantage of the second solution is that we can remove a lot
of pointless code from the kernel and simplify the kernel ⇄ userspace
communication at the same time, since the NEC32
representation allows an unambiguous definition of any scancode. At
present, the duplicated NEC
“decoding” logic in the kernel amounts to around 300 lines of code,
and each additional driver risks adding more.
Of course, userspace programs would still be free to represent
NEC scancodes in a suitable manner as 16, 24 and
32 bit variations. We're just talking about the kernelspace ⇄ userspace
API here.
To use an analogy, the current situation is as if the kernel reported the IPv6 loopback address “::1/128” as some kind of special protocol because it “knows” that it has a special meaning.