meniOS exposes a 64-bit system call interface for userland programs. This document summarises the calling convention, error handling contract, and the catalog of currently implemented syscalls so the toolchain and libc work can rely on a stable specification.
- Fast path: user mode executes the `syscall` instruction. The MSR-programmed entry stub (`src/kernel/lidt.s:sym=syscall_entry`) switches to the per-CPU kernel stack, materialises a `syscall_frame_t`, and tail-calls `syscall_dispatch`. The return path properly preserves full 64-bit return values.
- Compatibility: the legacy `int $0x80` gate remains in the IDT for debugging, but production binaries should use the wrappers in `include/menios/syscall_user.h`, which emit `syscall` directly.
- CPU mode: long mode (x86-64). All arguments and return values are 64-bit.
| Register | On entry | On return |
|---|---|---|
| `rax` | System call number (see `include/menios/syscall.h`). | Return value (non-negative success, negative `-errno` on failure). |
| `rdi` | Argument 0 | Preserved unless documented otherwise. |
| `rsi` | Argument 1 | Preserved. |
| `rdx` | Argument 2 | Preserved. |
| `r10` | Argument 3 | Preserved. |
| `r8` | Argument 4 | Preserved. |
| `r9` | Argument 5 | Preserved. |
| `rcx`, `r11` | Used internally by `syscall`; always destroyed. | |
| Stack pointer | Must remain 16-byte aligned. The kernel neither adjusts nor validates the user stack. | |
The inline helpers in include/menios/syscall_user.h already follow this
convention. For calls with four or more parameters (for example mmap) you must
place arguments manually in r10, r8, and r9, mirroring Linux.
The kernel returns non-negative values on success. Failures are reported as a
negative errno (-E...) stored in rax. Userland helpers typically convert
negative results into errno and return -1. No errno values are written by
the kernel directly into user memory.
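
The conversion described above can be sketched as follows. The helper name `syscall_ret` is hypothetical and not taken from the meniOS sources; it assumes the raw `rax` value has already been captured as a `long` and that the libc provides a standard `errno`.

```c
#include <errno.h>

/* Hypothetical helper (not from the meniOS tree): translate a raw
 * kernel return value into the libc convention of -1 plus errno. */
static long syscall_ret(long raw)
{
    if (raw < 0) {
        errno = (int)-raw;   /* kernel returned -E...; store the positive code */
        return -1;
    }
    return raw;              /* non-negative results pass through unchanged */
}
```

For example, `syscall_ret(-EBADF)` returns `-1` and leaves `errno` set to `EBADF`, while `syscall_ret(42)` returns `42` and does not touch `errno`.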
The errno namespace is defined in include/sys/errno.h and is intentionally
aligned with POSIX where possible.
- Paths copied from user space are limited to `SYSCALL_PATH_MAX` (256 bytes). They are resolved relative to the caller's `cwd` using `vfs_build_absolute_path`.
- `execve` copies up to 64 arguments and 64 environment variables and limits the total copied string data to 4096 bytes. Excess arguments or environment values yield `-E2BIG`.
- All buffers provided by userland are validated through `proc_user_buffer_accessible` before the kernel reads or writes them. If the check fails the kernel returns `-EFAULT`.
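
A caller that wants to avoid a failed `execve` could check the documented limits up front. The sketch below is illustrative only: the macro and function names are hypothetical, and it assumes that each string's terminating NUL counts towards the 4096-byte total, which the text above does not state.

```c
#include <stddef.h>
#include <string.h>

/* Limits quoted from the text above; the macro names are illustrative. */
#define EXEC_MAX_ARGS   64
#define EXEC_MAX_ENVS   64
#define EXEC_MAX_BYTES  4096

/* Count the entries of a NULL-terminated vector and add their string
 * sizes (including each NUL terminator, an assumption) to *bytes. */
static size_t count_vector(char *const vec[], size_t *bytes)
{
    size_t n = 0;
    if (vec == NULL)
        return 0;
    for (; vec[n] != NULL; n++)
        *bytes += strlen(vec[n]) + 1;
    return n;
}

/* Returns 1 if argv and envp fit within the documented limits,
 * 0 if any of the three limits is exceeded. */
static int exec_args_fit(char *const argv[], char *const envp[])
{
    size_t bytes = 0;
    size_t argc = count_vector(argv, &bytes);
    size_t envc = count_vector(envp, &bytes);
    return argc <= EXEC_MAX_ARGS && envc <= EXEC_MAX_ENVS &&
           bytes <= EXEC_MAX_BYTES;
}
```

The check is advisory: the kernel remains the authority, so a wrapper should still handle `-E2BIG` from the real call.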
System call numbers are listed in include/menios/syscall.h. They currently
mirror Linux values for the overlapping subset but that is not a compatibility
promise. Below is a summary of the calls that ship in meniOS v0.1.0.
- `SYS_READ` (0): `ssize_t read(int fd, void *buf, size_t count);` Reads from an open file descriptor. Returns the number of bytes read or `-EBADF`, `-EFAULT`, or storage-driver specific errors.
- `SYS_WRITE` (1): `ssize_t write(int fd, const void *buf, size_t count);` Writes to an open descriptor. Returns the number of bytes written or negative errno on failure.
- `SYS_OPEN` (2): `int open(const char *path, int flags /*mode unused*/);` Opens a file relative to the caller's `cwd`. `O_CLOEXEC` is honoured at file installation; the mode parameter is currently ignored. Returns a descriptor number.
- `SYS_CLOSE` (3): `int close(int fd);` Closes a descriptor.
- `SYS_LSEEK` (8): `off_t lseek(int fd, int64_t offset, int whence);` Adjusts the file position.
- `SYS_DUP` (32) / `SYS_DUP2` (33): `int dup(int oldfd);`, `int dup2(int oldfd, int newfd);` Duplicate file descriptors. `dup2` will overwrite the target descriptor.
- `SYS_PIPE` (22): `int pipe(int fds[2]);` Creates a unidirectional pipe. On success `fds[0]` is readable and `fds[1]` is writable.
- `SYS_FCNTL` (72): `int fcntl(int fd, int cmd, unsigned long arg);` Currently supports `F_GETFD` and `F_SETFD` for the `FD_CLOEXEC` flag.
- `SYS_IOCTL` (73): `int ioctl(int fd, unsigned long request, void *argp);` Forwards to the underlying driver. Only the device filesystem exposes handlers today; unsupported requests return `-ENOTTY` or driver-specific errors.
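
As a small illustration of how the descriptor calls above combine, the sketch below creates a pipe, marks both ends close-on-exec with `F_SETFD`, and pushes a short message through it. It goes through the libc wrappers rather than raw syscalls and assumes the usual POSIX headers (`<unistd.h>`, `<fcntl.h>`) are available; the function name `pipe_round_trip` is made up for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write msg into a fresh pipe and read it back into out (capacity cap).
 * Returns the number of bytes read, or -1 on any failure.
 * Intended for short messages that fit in the pipe buffer. */
static ssize_t pipe_round_trip(const char *msg, char *out, size_t cap)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    /* Mark both ends close-on-exec via the F_SETFD command. */
    if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) < 0 ||
        fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    ssize_t n = -1;
    size_t len = strlen(msg);
    if (write(fds[1], msg, len) == (ssize_t)len)   /* fds[1] is the write end */
        n = read(fds[0], out, cap);                /* fds[0] is the read end */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Setting `FD_CLOEXEC` keeps the pipe from leaking into a program started later with `execve`.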
- `SYS_MMAP` (9): `void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);` Currently limited to `MAP_ANONYMOUS | MAP_PRIVATE` mappings. File-backed mappings and `MAP_FIXED` are not yet supported. Memory is allocated in page-sized chunks and aligned to the userland mmap window.
- `SYS_MUNMAP` (11): `int munmap(void *addr, size_t length);` Unmaps a region previously returned by `mmap`.
- `SYS_SHMGET` (74) / `SYS_SHMAT` (75) / `SYS_SHMDT` (76) / `SYS_SHMCTL` (77): System V shared memory primitives. `SHM_REMAP` is rejected, and only `IPC_RMID` / `IPC_STAT` are implemented in `shmctl`.
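
Because mappings are handed out in page-sized chunks, a userland allocator may want to know the real footprint of a request before calling `mmap`. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and it assumes the page size is a power of two (the value would normally come from `getpagesize()`).

```c
#include <stddef.h>

/* Round len up to the next multiple of page_size.
 * Assumes page_size is a power of two (for example 4096). */
static size_t round_up_to_page(size_t len, size_t page_size)
{
    return (len + page_size - 1) & ~(page_size - 1);
}
```

With a 4096-byte page, a 1-byte request occupies 4096 bytes and a 4097-byte request occupies 8192 bytes.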
Note on heap allocation: POSIX brk()/sbrk() are provided by libc
(src/libc/brk.c) as a compatibility shim backed by mmap(MAP_ANONYMOUS).
There are no SYS_BRK or SYS_SBRK syscalls; the shim maintains a locked
arena entirely in userspace.
- `SYS_FORK` (57): `pid_t fork(void);` Parent receives the child's PID, the child sees zero, and failures return `-ENOMEM` or another negative errno.
- `SYS_EXECVE` (59): `int execve(const char *path, char *const argv[], char *const envp[]);` Replaces the current image. Path resolution honours the caller's `cwd`.
- `SYS_EXIT` (60): `void _exit(int status);` Terminates the calling process. Does not return.
- `SYS_WAITPID` (61): `pid_t waitpid(pid_t pid, int *status, int options);` Supports `WNOHANG`. On success writes a status code that follows traditional POSIX `waitpid` encoding.
- `SYS_PROC_KILL` (64): `int proc_kill(pid_t pid, int status);` Forcefully transitions the target process to zombie state and records the provided status code. Used by init's supervision loop.
- `SYS_PROC_LIST` (65): `ssize_t proc_list(char *buffer, size_t capacity);` Writes a newline-delimited snapshot of active processes beginning with the header `PID STATE NAME`. Requires enough capacity to include a trailing NUL.
- `SYS_YIELD` (24): `int sched_yield(void);` Marks the current thread ready and switches to the scheduler. Always returns zero after rescheduling.
- `SYS_SLEEP` (35): `int usleep(uint64_t usec);` Parks the caller for at least the requested microseconds (rounded up to the scheduler tick). Always returns zero and is not interruptible by signals.
- `SYS_NANOSLEEP` (83): `int nanosleep(const struct timespec *req, struct timespec *rem);` Suspends the caller for the requested duration with microsecond resolution. Returns zero on success. If interrupted by a signal, writes the remaining time to `rem` (when non-NULL) and returns `-EINTR`.
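
Since `nanosleep` can return early when a signal arrives, callers that need the full delay usually retry with the remaining time. The sketch below shows that loop against the libc-level wrapper, assuming the wrapper follows the usual convention of returning `-1` with `errno` set to `EINTR` (the raw syscall reports `-EINTR` in `rax`). The name `sleep_fully` is illustrative.

```c
#include <errno.h>
#include <time.h>

/* Sleep for the full duration in *req, resuming after signal interruptions.
 * Returns 0 on success or -1 with errno set for errors other than EINTR. */
static int sleep_fully(const struct timespec *req)
{
    struct timespec want = *req;
    struct timespec left;

    while (nanosleep(&want, &left) != 0) {
        if (errno != EINTR)
            return -1;
        want = left;    /* continue with the time that was still remaining */
    }
    return 0;
}
```

No such loop is needed around `usleep`, which the text above describes as not interruptible by signals.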
- `SYS_KILL` (66): `int kill(pid_t pid, int signo);` Delivers a signal to another process. The kernel validates the signal number and process state.
- `SYS_SIGACTION` (67): `int sigaction(int signo, const struct sigaction *act, struct sigaction *oldact);` Copies handler state between userland and the kernel. Handlers execute on the user stack.
- `SYS_SIGPROCMASK` (68): `int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);` Adjusts or queries the calling process's signal mask.
- `SYS_LISTDIR` (62): `ssize_t listdir(const char *path, char *buffer, size_t size);` Enumerates directory entries into a newline-delimited string. If `buffer` is `NULL` or `size` is zero the call returns the number of bytes that would have been written.
- `SYS_CHDIR` (78): `int chdir(const char *path);` Updates the process working directory.
- `SYS_GETCWD` (79): `char *getcwd(char *buf, size_t size);` Copies the current working directory into user memory. The return value is the provided buffer pointer on success.
- `SYS_STAT` (99): `int stat(const char *path, struct stat *buf);` Retrieves file metadata for the given path. Follows symbolic links. Returns zero on success or `-ENOENT` if the path does not exist, `-EFAULT` if `buf` is inaccessible, or `-ENOSYS` for pseudo-filesystems that don't yet expose metadata (devfs, procfs, pipes). FAT32 and tmpfs provide full metadata; additional filesystems will be updated incrementally.
- `SYS_LSTAT` (100): `int lstat(const char *path, struct stat *buf);` Like `SYS_STAT` but does not follow symbolic links. Since symbolic links are not yet implemented, this currently behaves identically to `SYS_STAT`.
- `SYS_FSTAT` (101): `int fstat(int fd, struct stat *buf);` Retrieves file metadata for an open file descriptor. Returns zero on success or `-EBADF` if the descriptor is invalid, `-EFAULT` if `buf` is inaccessible, or `-ENOSYS` if the underlying filesystem driver does not implement the `.stat` operation. The VFS caches metadata at open time when available.
- `SYS_CHMOD` (102): `int chmod(const char *path, mode_t mode);` Updates the permission bits associated with the path. For tmpfs this applies immediately; FAT32/devfs/procfs currently return `-ENOSYS`.
- `SYS_FCHMOD` (103): `int fchmod(int fd, mode_t mode);` Updates permissions on an open file descriptor. Behaves like `SYS_CHMOD` but operates on an already opened handle.
- `SYS_UTIME` (104): `int utime(const char *path, const struct utimbuf *times);` Sets the access and modification timestamps for the provided path. Passing `NULL` updates both to the current realtime clock. Supported on tmpfs; other filesystems currently respond `-ENOSYS`.
- `SYS_SHUTDOWN` (105): `int shutdown(void);` Initiates an orderly shutdown by flushing the block cache, dismantling VFS mounts, and issuing an ACPI S5 power-off request. Returns zero on success (the system should power down before returning) or `-EIO` if ACPI reports a failure.
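
The size-query behaviour of `listdir` suggests a two-call pattern: ask for the required size, allocate, then fetch. The sketch below takes the wrapper as a function pointer so it can be exercised off-target; `read_listing` and `fake_listdir` are hypothetical names, and `fake_listdir` is only a stand-in that mimics the documented contract. Whether the reported size includes a trailing NUL is not stated above, so the helper allocates one extra byte and terminates the string itself.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Signature of the listdir wrapper as documented above. */
typedef ssize_t (*listdir_fn)(const char *path, char *buffer, size_t size);

/* Query the required size first, then allocate and fetch the listing.
 * Returns a malloc'd, NUL-terminated string (caller frees) or NULL. */
static char *read_listing(listdir_fn listdir, const char *path)
{
    ssize_t need = listdir(path, NULL, 0);       /* size query */
    if (need < 0)
        return NULL;

    char *buf = malloc((size_t)need + 1);        /* +1 leaves room for a NUL */
    if (buf == NULL)
        return NULL;

    ssize_t got = listdir(path, buf, (size_t)need + 1);
    if (got < 0 || got > need) {                 /* error, or directory grew */
        free(buf);
        return NULL;
    }
    buf[got] = '\0';
    return buf;
}

/* Stand-in for the real wrapper, used only to exercise the helper. */
static ssize_t fake_listdir(const char *path, char *buffer, size_t size)
{
    static const char listing[] = "bin\netc\nhome\n";
    size_t len = sizeof listing - 1;
    (void)path;
    if (buffer == NULL || size == 0)
        return (ssize_t)len;
    if (size < len)
        len = size;
    memcpy(buffer, listing, len);
    return (ssize_t)len;
}
```

On meniOS the call would be `read_listing(listdir, "/")`; off-target, `read_listing(fake_listdir, "/")` returns the canned listing.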
- `SYS_STDIN_POLL` (63): `int stdin_poll(void);` Non-blocking read from the kernel's keyboard buffer. Returns the next byte (0–255) or `-EAGAIN` if no input is waiting. Used by `mosh`'s interactive loop.
- `SYS_TIME` (81): `time_t time(time_t *tloc);` Returns the current time as seconds since the Unix epoch (January 1, 1970). If `tloc` is non-NULL and accessible, the time is also stored at that address. Returns the time value or `-EFAULT` if `tloc` is invalid.
- `SYS_GETTIMEOFDAY` (82): `int gettimeofday(struct timeval *tv, struct timezone *tz);` Retrieves the current time with microsecond precision. The `tv` parameter receives seconds and microseconds since the Unix epoch. The `tz` parameter is ignored (for POSIX compatibility). Returns zero on success or `-EFAULT` if `tv` is NULL or inaccessible.
- `SYS_CLOCK_GETTIME` (84): `int clock_gettime(clockid_t clk_id, struct timespec *tp);` Supports `CLOCK_REALTIME` (wall clock with microsecond resolution and adjustable via `clock_settime`) and `CLOCK_MONOTONIC` (time since boot). Returns zero on success or `-EINVAL` for unsupported clocks.
- `SYS_CLOCK_SETTIME` (85): `int clock_settime(clockid_t clk_id, const struct timespec *tp);` Currently allows adjusting `CLOCK_REALTIME`. Other clocks return `-EINVAL`.
- `SYS_CLOCK_GETRES` (86): `int clock_getres(clockid_t clk_id, struct timespec *res);` Reports the kernel's nominal resolution (microsecond granularity) for `CLOCK_REALTIME` and `CLOCK_MONOTONIC`.
- `SYS_SETITIMER` (87): `int setitimer(int which, const struct itimerval *new_value, struct itimerval *old_value);` Currently supports `ITIMER_REAL` (delivers `SIGALRM`). Interval timers are maintained with microsecond granularity.
- `SYS_GETITIMER` (88): `int getitimer(int which, struct itimerval *curr_value);` Reports the pending expiration and interval for supported timers.
- `SYS_GETPAGESIZE` (80): `long getpagesize(void);` Returns the system page size (typically 4096 bytes). Used by the userland allocator and other memory management utilities.
Beyond shared memory the kernel does not yet expose sockets, message queues, or other IPC surfaces. Future milestones (issues #105–#107) will extend the ABI; for now userland should rely on pipes, signals, shared memory, and the supervision helpers documented above.
```c
#include <menios/syscall.h>
#include <sys/mman.h>   /* PROT_* and MAP_* flags; header location assumed */

/* Map one anonymous read/write page with a raw syscall. */
void *anon_page(void) {
  register long rax asm("rax") = SYS_MMAP;
  register long rdi asm("rdi") = 0;     // hint address
  register long rsi asm("rsi") = 4096;  // length
  register long rdx asm("rdx") = PROT_READ | PROT_WRITE;
  register long r10 asm("r10") = MAP_PRIVATE | MAP_ANONYMOUS;
  register long r8 asm("r8") = -1;      // fd (ignored)
  register long r9 asm("r9") = 0;       // offset
  asm volatile("syscall"
               : "+a"(rax)
               : "D"(rdi), "S"(rsi), "d"(rdx), "r"(r10), "r"(r8), "r"(r9)
               : "rcx", "r11", "memory");
  if (rax < 0) {
    // errno style handling here
    return (void *)-1;
  }
  return (void *)rax;
}
```

The `SYS_STAT`, `SYS_LSTAT`, and `SYS_FSTAT` syscalls were added to expose
filesystem metadata to userland. The implementation spans kernel, VFS, and
individual filesystem drivers:

Kernel infrastructure (`include/kernel/fs.h:24`, `src/kernel/fs/fat32.c:2721`):
- `struct fs_path_info`: uniform descriptor for file metadata (size, permissions, timestamps, file type)
- `fs_path_info_to_stat()`: helper to convert `fs_path_info` to POSIX `struct stat`

VFS integration (`include/kernel/vfs.h:21`, `src/kernel/fs/vfs.c:203`):
- `vfs_path_info()`: query function to retrieve metadata for a path
- `.stat` callback in `file_ops`: allows drivers to report metadata for open file descriptors
- VFS caches `fs_path_info` when opening files for efficient `fstat()` queries

Syscall handlers (`include/menios/syscall.h:40`, `src/kernel/syscall/syscall.c:933`):
- `SYS_STAT` / `SYS_LSTAT`: resolve absolute path, call `vfs_path_info()`, convert to `struct stat`, copy to userspace
- `SYS_FSTAT`: validate file descriptor, retrieve cached metadata or invoke driver's `.stat` callback, copy to userspace

Driver support:
- FAT32: full implementation; parses directory entries for size and read-only flag. Timestamps, DOS attributes (beyond read-only), and long name metadata are not yet parsed. Permissions are hard-coded (0644 for files, 0755 for directories).
- tmpfs, procfs, devfs, pipes, console devices: `.stat = NULL`; all stat queries return `-ENOSYS`. These drivers need to be extended to report metadata.

Libc wrappers (`src/libc/stat.c:20`, `src/libc/unistd.c:325`, `user/libc/realpath.c:1`):
- `stat()`, `lstat()`, `fstat()`: delegate to corresponding syscalls
- `access()`: uses `stat()` to check real mode bits for accessibility
- `realpath()`: canonicalizes paths and verifies existence with `stat()`
- `pathconf()`: partially implemented; only `_PC_PATH_MAX` supported (all other queries return `-ENOSYS`)

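
A short sketch of how userland might consume the libc `stat()` wrapper listed above. The helper name `is_directory` is hypothetical, and the example assumes the libc's `<sys/stat.h>` provides the standard `S_ISDIR` macro.

```c
#include <sys/stat.h>

/* Returns 1 if path names a directory, 0 if it names something else,
 * and -1 if stat() fails (errno then holds the reason, e.g. ENOENT
 * or ENOSYS on a filesystem without metadata support). */
static int is_directory(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

Callers should treat `-1` as "unknown" rather than "not a directory", since drivers that lack a `.stat` implementation report an error instead of metadata.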
Known limitations (tracked in issue #364):
- Pseudo-filesystems lack metadata support
- FAT32 metadata is skeletal (size + read-only only)
- File mutation syscalls (`chmod`, `utime`) remain stubs

- The ABI intentionally mimics Linux where practical, but only the calls listed above are guaranteed to exist. Expect differences in flag handling and error coverage.
- Implementation status: The `syscall` instruction is fully implemented with proper 64-bit return value handling (#221, #274 complete).
- Be conservative with path lengths and buffer sizes. Exceeding the documented limits results in `-ENAMETOOLONG`, `-EFAULT`, or `-ERANGE`.

This document will evolve as the GCC milestone lands and new system calls are added.