hubertf's NetBSD Blog
Send interesting links to hubert at feyrer dot de!
 
[20080908] source-changes catchup mid-July to early September 2008 (Updated)
Welcome to yet another catch-up of the NetBSD source-changes mailing list, this time from mid-July to early September 2008. Besides FFS having journaling now (yai! first in BSD-land, ever! :-), here's what's new and/or exciting:
  • In order to re-initialize x86 machines' video/VGA state after suspend and resume, some BIOS functions can be used. This needs to be done in real mode(?), which is a bit hard to do from an operating system kernel that runs in protected mode. To help with this, an x86 CPU emulator was added to NetBSD some time ago to run the VGA BIOS on ACPI resume. Now Joerg has added a sysctl that does just this, assuming your kernel has the VGA_POST option: set machdep.acpi_vbios_reset=2.

  • Inside the kernel, data sent/received through the network stack is stored in chains of mbufs. Until now, mbufs were also used to store socket options, i.e. data describing in more detail how sending/receiving is done. Iain 'plunky' Hibbert has now split this out into a separate struct sockopt. For more information, see sockopt(9).

  • Hans 'woodstock' Rosenfeld has added a new accelerated driver for SPX graphics boards found in some VAXstations, which replaces the old and broken lcspx driver. It is based on work by Blaz Antonic.

  • The simonb-wapbl branch was merged: ``Add Wasabi System's WAPBL (Write Ahead Physical Block Logging) journaling code. Originally written by Darrin B. Jewell while at Wasabi and updated to -current by Antti Kantee, Andy Doran, Greg Oster and Simon Burge.'' This makes NetBSD the second (see update below) BSD operating system that has a working file system with journaling (not counting LFS, which has had issues again and again). Mmm, no more fsck! :-) See my other posts for more on journaling / wapbl.

    Update: James Mansion wrote to me that NetBSD's not the first BSD to have journaling, and I think he's right: DragonflyBSD's HAMMER file system apparently offers similar functionality: ``HAMMER implement an instant-mount capability and will recover information on a cluster-by-cluster basis as it is being accessed.''

  • Accept filters were ported from FreeBSD by Coyote Point Systems, and integrated into NetBSD by Thor Lancelot Simon. What are accept filters? According to the accept_filter(9) manpage, they ``allow an application to request that the kernel pre-process incoming connections.'' Pre-defined filters are available with accf_data(9) and accf_http(9). The latter makes sure that the application's accept(2) call only sees the connection if there's a valid HTTP header, moving parts of the parsing from userland (httpd) to the kernel.

  • Work is underway for crossbuilds of modular X.org. This is done via src/external/mit/xorg, which needs xsrc/external/mit. The results will be installed in /usr/X11R7(!). (XXX Where can I find more about this?)

  • Gregory McGarry is working to get the tree compiled with PCC instead of GCC. This is still ongoing.

  • nvi was updated from version 1.79 to 1.81. The most important part of this update is that internationalization is now handled by default.

  • As part of a bigger master plan, new 3rd party software packages are now imported into src/external/${license}, which will replace src/dist, src/crypto/dist and src/gnu/dist in the long run. Packages will be moved only when they are upgraded; existing packages are not moved just for the sake of moving them.

  • Adam Hamsik is working on getting Logical Volume Management (LVM) going in NetBSD. He has adapted Linux's "device mapper" kernel interface as part of his Google Summer-of-Code project, and with the help of the (GPL'd) Linux tools, things are looking pretty good. More on this in a separate post. This work is currently happening on the haad-dm branch.

  • In the context of his work on UDF, Reinoud has added routines for speeding up directory handling by using hash tables. Lookup of files was O(n*n) and is now O(1), even for file creation. See my other blog posting for details and impressive numbers.

  • Perry Metzger is working to make binary builds identical. This is useful for binary diffs between releases/builds, e.g. when providing binary patches for updates and security fixes. Areas affected by this include C++ programs and various bootloaders (which until now had the builder, build date, etc. embedded in them).

  • EHCI (USB) now supports high-speed isochronous transfers. This was developed by Jeremy Morse as part of his Google Summer-of-Code "dvb" project this year; it is useful for fast transfer of data that comes in steady streams, e.g. from video cards.

  • fsck_ffs(8) now has options -x and -X (just like dump) that create a file system snapshot via fss(4) and then operate on the snapshot. This allows "fsck_ffs -n" to work on a snapshot of a file system that is mounted read/write, avoiding errors caused by ongoing file system activity. This can be made permanent for the daily script by setting run_fsck_flags="-X" in /etc/daily.conf. This was brought to you by our Xen hacker Manuel Bouyer. :-)
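The UDF directory-hashing item above can be illustrated with a toy cost model. This is not Reinoud's UDF code: the class names are made up for illustration, and the counters simply count name comparisons versus hash lookups. Without an index, every create has to scan the whole directory to make sure the name is not taken yet, so creating n files adds up to n(n-1)/2 comparisons; with a hash index, each create needs one lookup on average:

```python
class LinearDirectory:
    """Toy directory without an index: every lookup scans all entries."""

    def __init__(self):
        self.entries = []      # (name, inode) pairs in creation order
        self.comparisons = 0   # name comparisons done so far

    def lookup(self, name):
        for entry_name, inode in self.entries:
            self.comparisons += 1
            if entry_name == name:
                return inode
        return None

    def create(self, name, inode):
        # A create has to check that the name is not taken yet,
        # which means scanning the whole directory first.
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, inode))


class HashedDirectory:
    """Toy directory with a hash index (a Python dict) from name to inode."""

    def __init__(self):
        self.index = {}
        self.lookups = 0       # hash lookups done so far

    def lookup(self, name):
        self.lookups += 1
        return self.index.get(name)

    def create(self, name, inode):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.index[name] = inode


if __name__ == "__main__":
    n = 1000
    linear, hashed = LinearDirectory(), HashedDirectory()
    for i in range(n):
        linear.create(f"file{i}", i)
        hashed.create(f"file{i}", i)
    print("linear comparisons:", linear.comparisons)  # n*(n-1)/2 = 499500
    print("hash lookups:      ", hashed.lookups)      # n = 1000
```

Python's dict is itself a hash table, so HashedDirectory only demonstrates the cost model; a kernel implementation has to manage its own buckets and collisions.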
So much for this time. Many of the above projects are work-in-progress, and we can look forward to further news on them next time. Stay tuned!



[20080826] Journaling performance (Updated)
Matthias Scheler has an interesting blog posting about the effect of journaling on file system performance. The test he did was extracting NetBSD 4.0 sources.
  • With plain FFS, the extract took 15:19 minutes
  • With journaling, it took 3:24 minutes.
A clear winner. (There are no numbers with soft dependencies, though, but they can be expected to be comparable to the journaling number.)
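For scale, the two extract times above work out to roughly a 4.5x speedup:

```latex
\[
\frac{t_{\mathrm{FFS}}}{t_{\mathrm{log}}}
  = \frac{15 \cdot 60\,\mathrm{s} + 19\,\mathrm{s}}{3 \cdot 60\,\mathrm{s} + 24\,\mathrm{s}}
  = \frac{919\,\mathrm{s}}{204\,\mathrm{s}}
  \approx 4.5
\]
```

In other words, the journaled extract took about 22% of the time the plain FFS extract needed.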

Update: Matthias has posted updated numbers that include soft dependencies. The best bet so far is still WAPBL. It's a bit slower than async mounts, but as those are VERY unsafe, they're not recommended at all. Use WAPBL!



[20080823] Trying out journaling
After NetBSD got journaling integrated into FFS recently, I've built and installed -current, and had a look. In short: it works just as expected. In other words: Yai! :-) :-) :-)

The wapbl(4) manpage gives more details: to enable it, you need to run a kernel with "options WAPBL", which is available in NetBSD-current since the end of July 2008. Userland from a similar date is useful, as the mount(8) command needs to know about the new "log" option. With the proper system, it's pretty much a no-brainer:

  1. In /etc/fstab, enable logging for the file system(s) you want; in my case it's just /:
         /dev/wd0a       /       ffs     rw,log        1 1 

    This is actually the only thing that needs to be done. Everything else written here just explains things in a bit more detail.

  2. Note that journaling is not active on the file system(s) at this point, so pressing the reset button for testing will result in a file system check (fsck) - don't do it right now. :)

  3. Reboot the system. Nothing special will show up in the boot messages:
         ...
         audio2 at pad0: half duplex
         boot device: wd0
         root on wd0a dumps on wd0b
         root file system type: ffs
         Fri Aug 22 20:45:55 CEST 2008
         swapctl: adding /dev/wd0b as swap device at priority 0
         Starting file system checks:
         /dev/rwd0a: file system is clean; not checking
         Setting tty flags.
         ...  

  4. Let's recall what happens here: after probing the hardware and initializing device drivers (audio, ...), the kernel looks at the disk drives for a root partition (i.e. a disk with a BSD disklabel and an "a" partition containing a known file system). It will use the first root file system it finds, and mount it read-only.

    As the above output is from a multi-user boot (not a single-user boot), the kernel continues to run init(8), which in turn runs /etc/rc (which then runs all of /etc/rc.d/* etc.). The order of the first steps in the boot process can be determined with the rcorder(8) tool, just like /etc/rc does:

         $ cd /etc/rc.d/
         $ rcorder * | head
         wdogctl
         raidframe
         cgd
         ccd
         swap1
         fsck
         root
         ... 

    Of the above scripts, raidframe, cgd and ccd configure additional disk devices; wdogctl and swap1 are of minor interest here. The two interesting scripts are "fsck" and "root": "fsck" runs fsck(8), which in turn goes through the list of known file systems in /etc/fstab and checks for each file system whether it was unmounted cleanly the last time. If not, the file system will be checked, possibly repaired, and marked as clean. This is the much-hated, time-consuming process that prevents a fast reboot after a crash.

    After ensuring all file systems are in a consistent state, the "root" script mounts the root (/) file system read-write.

    Following that, all other scripts run, create temporary files, configure network devices, enable logins and whatnot. The important part here is the order: the kernel first mounts the root file system read-only, and only after the check is it made writable.

  5. As we have marked the root file system for journaling, the log (journal) is created when mounting the file system read-write. For NetBSD, the log has only meta-data, i.e. information on what changes were made to the file system's management data structures like directories, link counts, etc. No data blocks are journaled. This may not be 100% optimal from a user's point of view, but it ensures that the file system is in a consistent state with respect to meta-data.

  6. When the file system is mounted with journaling enabled, bad things are welcome (well, sort of :-) to happen, and the system will handle them gracefully: kernel panics, power failures, someone pressing the reset button - everything that disrupts system operation and gets the file system into an inconsistent state will be caught by replaying the journal on the next boot.

    Note that journaling will not help against user/admin errors, like accidentally removing a file!

  7. After the system went down in flames -- for research purposes and better predictability, let's assume we've pressed the reset button -- with the file system in an unclean state, this will be displayed on the next boot:
         ...
         audio2 at pad0: half duplex
         boot device: wd0
         root on wd0a dumps on wd0b
         /: replaying log to memory
         root file system type: ffs
         Fri Aug 22 20:49:55 CEST 2008
         swapctl: adding /dev/wd0b as swap device at priority 0
         Starting file system checks:
         /dev/rwd0a: file system is journaled; not checking
         /: replaying log to disk
         Setting tty flags.
         ...  
  8. After finding the root file system, the kernel first recognizes the journal and assumes that the system crashed. At this point the system doesn't know what state the disk is in, so it won't alter the disk by writing the changes from the log onto it. Instead, those changes are replayed to memory only. This leaves the disk as-is, but the in-memory view of the file system will be consistent.

    Running fsck then recognizes the file system as journaled and won't touch it, assuming that the log caught everything bad. Mounting the file system read-write in the next step then replays the changes in the journal onto the disk, putting it into a consistent state permanently. After that, the regular boot process proceeds as usual.

    Please note that the messages "/: replaying log to memory/disk" are printed by the kernel, as it's the kernel that runs all the file system code.

  9. When the system is up and running, the mount(8) command can be used to determine if logging is enabled or not:
         # mount
         /dev/wd0a on / type ffs (log, local) 
    The "log" here in the mount options indicates that journaling is enabled.
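The order that rcorder(8) prints in step 4 comes from the "# PROVIDE:" and "# REQUIRE:" comment lines at the top of each /etc/rc.d script, and putting the scripts in order is a topological sort. Below is a simplified sketch using Kahn's algorithm, which is not necessarily how rcorder itself is implemented. It ignores the "# BEFORE:" and "# KEYWORD:" lines that rcorder also understands, and the dependencies in EXAMPLE_SCRIPTS are made up for illustration rather than copied from the real scripts, so the result differs slightly from the real output shown above:

```python
from collections import deque


def parse_rc_header(text):
    """Return (provides, requires) from '# PROVIDE:' / '# REQUIRE:' lines.

    Simplified: '# BEFORE:' and '# KEYWORD:' lines are ignored here.
    """
    provides, requires = [], []
    for line in text.splitlines():
        if line.startswith("# PROVIDE:"):
            provides += line[len("# PROVIDE:"):].split()
        elif line.startswith("# REQUIRE:"):
            requires += line[len("# REQUIRE:"):].split()
    return provides, requires


def rc_order(scripts):
    """Order scripts so each one runs after everything it REQUIREs.

    `scripts` maps a script file name to its header text.
    """
    provider = {}   # provided name -> script file that provides it
    needs = {}      # script file -> list of required names
    for name, header in scripts.items():
        provides, requires = parse_rc_header(header)
        for p in provides:
            provider[p] = name
        needs[name] = requires

    indegree = {name: 0 for name in scripts}    # unmet requirements
    dependents = {name: [] for name in scripts}
    for name, requires in needs.items():
        for req in requires:
            dep = provider[req]   # this sketch raises KeyError if nobody provides it
            dependents[dep].append(name)
            indegree[name] += 1

    # Kahn's algorithm: repeatedly run a script whose requirements are met.
    ready = deque(n for n in scripts if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in dependents[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(scripts):
        raise ValueError("dependency cycle among rc.d scripts")
    return order


# Hypothetical headers, loosely modeled on the script names above.
EXAMPLE_SCRIPTS = {
    "wdogctl":   "# PROVIDE: wdogctl\n",
    "raidframe": "# PROVIDE: raidframe\n",
    "cgd":       "# PROVIDE: cgd\n# REQUIRE: raidframe\n",
    "ccd":       "# PROVIDE: ccd\n",
    "swap1":     "# PROVIDE: localswap\n# REQUIRE: cgd ccd\n",
    "fsck":      "# PROVIDE: fsck\n# REQUIRE: localswap\n",
    "root":      "# PROVIDE: root\n# REQUIRE: fsck\n",
}

if __name__ == "__main__":
    print("\n".join(rc_order(EXAMPLE_SCRIPTS)))
```

Note how swap1 provides a name ("localswap") that differs from its file name: the ordering is driven by the PROVIDE/REQUIRE names, not by the file names.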

First impressions of journaling are pretty good, and the journal needs no further maintenance. The fact that it's placed inside the file system by default and doesn't need extra space is very nice, too. People who want to keep the log after a partition for some reason can do so, and can also specify a maximum journal size.

The end-user impact of this is that lengthy file system checks are (hopefully :-) a thing of the past now!
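Steps 7 and 8 above also explain why recovery is fast: replay only has to walk the log, whose size is bounded, instead of checking every inode and directory like fsck does. The toy model below shows the two-phase replay (first to memory, then to disk) and how an incomplete transaction at the end of the log is skipped. It is a hypothetical sketch: the class, the record format and the commit marker are made up here and do not describe WAPBL's actual on-disk log format.

```python
class ToyJournaledFS:
    """Toy model of metadata journaling with a two-phase replay.

    Not WAPBL's format: blocks are dict entries, and the log is a list of
    ("write", block, data) records closed by a ("commit",) record.
    """

    def __init__(self, nblocks):
        self.disk = {blk: b"" for blk in range(nblocks)}  # blocks "in place"
        self.log = []        # journal records, oldest first
        self.overlay = {}    # in-memory view built by replay_to_memory()

    def log_transaction(self, writes):
        """Append a metadata transaction to the log and commit it."""
        for blk, data in writes:
            self.log.append(("write", blk, data))
        self.log.append(("commit",))

    def replay_to_memory(self):
        """Phase 1: collect committed changes in memory, leave the disk alone."""
        pending = {}
        for record in self.log:
            if record[0] == "write":
                _, blk, data = record
                pending[blk] = data
            else:                      # "commit": the transaction is complete
                self.overlay.update(pending)
                pending = {}
        # Writes after the last commit belong to an incomplete transaction
        # and are dropped.
        return dict(self.overlay)

    def read(self, blk):
        """Read a block through the in-memory view."""
        return self.overlay.get(blk, self.disk[blk])

    def replay_to_disk(self):
        """Phase 2: write the replayed changes in place and empty the log."""
        self.disk.update(self.overlay)
        self.overlay = {}
        self.log = []


if __name__ == "__main__":
    fs = ToyJournaledFS(4)
    fs.log_transaction([(1, b"dir entry"), (2, b"inode")])
    fs.log.append(("write", 3, b"half-done"))   # "crash" before this commit
    print(fs.replay_to_memory())   # {1: b'dir entry', 2: b'inode'}
    print(fs.disk[1])              # b'' (disk untouched so far)
    fs.replay_to_disk()
    print(fs.disk[1], fs.disk[3])  # b'dir entry' b''
```

The work done by both replay phases grows with the number of log records, not with the size of the file system.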



[20080816] Catching up, once more
After a few days of offline-experience, here's a short summary of what happened that I haven't seen mentioned widely:
  • NetBSD achieves permanent charity status: ``The Foundation has been a 501(c)(3) charity since 2004, but previously the status was given under an advanced ruling period, i.e. it was of limited time. The permanent charity status is also known as 170(b)(1)(A)(vi).

    Being a public charity is important to us, as it means that we are eligible to receive employer matching donations, as well as to enjoy the most beneficial tax treatment. ''

  • Metadata journaling support added to FFS: ``In case of a crash or unexpected power loss however, the journaled file system will not need a lengthy file system check at boot time, but instead the kernel will replay the log within seconds. This allows faster crash recovery, less overall downtime and higher availability.

    Converting an existing system to use the log feature is as easy as updating (both kernel and userland), making sure the kernel option WAPBL is selected (this is the default for GENERIC kernels now), adding a "log" option to /etc/fstab and rebooting. Note that WAPBL is not compatible with soft-dependencies, so please ensure that you first remove the "softdep" option if present. See the wapbl(4) manual page for more information. ''

    Kudos for this go to Wasabi Systems, Darrin B. Jewell, Simon Burge, Greg Oster, Antti Kantee, and Andrew Doran!

  • Uli 'rhaen' Habel wrote to me that he has written a blosxom plugin for GNATS: ``During my work for pkgsrc I started to write articles for my blog and I referred to several PRs from the NetBSD gnats system. However I just wanted to type the PR in the form of e.g. NetBSD PR pkg/39230 and would like to have my blog software to link to the webpage automatically''.

    Blosxom is the blogging software that Uli and I use, and you can learn more about his GNATS plugin, and download it, here. (I haven't gotten around to installing this plugin yet, which is why you don't see a link on the above quoted text :-).

  • Stefan Schumacher wrote to me that the German magazine Die Zeit has an article on operating systems showing screenshots of several operating systems, starting with C64 Basic V2, going over MS-DOS and Windows to more esoteric ones like Mac OS X, Solaris, and *cough* BSD. Check the screenshot of the latter one! ;)

  • Another one from Uli Habel: His (NetBSD|pkgsrc) blog is now syndicated on www.onetbsd.org.

  • Wilhelm Buehler pointed me at EuroBSDCon 2008: ``EuroBSDCon is the european technical conference for people working on and with 4.4BSD based operating systems and related projects. EuroBSDCon 2008 will take place in Strasbourg, France 18-19 October 2008 at University of Strasbourg.''

  • There's an article by Warren Webb titled "Free software encircles embedded design" at Electronic Design, Strategy, News (EDN). The article starts by illustrating open source software as a natural (and cheap, of course) alternative to commercial systems, and describes the benefits of the development model, the wealth of applications, and how they can be used in an embedded environment. It goes on to talk about licenses, tools, and alternatives to Linux, including NetBSD.

  • Those into funky gadgets may like the MoPods: ``As if a little charm pet wasn't reason enough for being, the MoPods are actually practical. When your mobile phone rings or receives a text within a metre of your MoPod then the little blighter will get in a tizz, spin round and round and a little light will flash wildly in reaction. The perfect visual warning if your phone is on silent or you are in a noisy bar.

    Whether hung on your bag, your clothes, your keys or your mobile, MoPods are a must-have, or as they say in Japan, a "hitsuyou".''

  • Back to our fine operating system: Iain Hibbert, who has written NetBSD's bluetooth stack, has worked on a PAN daemon for NetBSD. This allows personal area networking in various ways:
    NAP: Network Access Point, which is like an Ethernet bridge
    GN: Group ad-hoc Network, a NAP with no external network
    PANU: Personal Area Networking User, in both host mode (like GN, but with a single connection) and client mode (the device that connects to all the others)
    All this will come to an upcoming NetBSD release near you pretty soon (and to FreeBSD too, it seems, as they like it :-); see Iain's mail to tech-net.

May the source be with you!





Disclaimer: All opinion expressed here is purely my own. No responsibility is taken for anything.

Copyright (c) Hubert Feyrer