6

I have a Raspberry Pi that is acting as a sensor which I am sending out to various customers. The sensor is recording approximately 1 GB every 2-3 days, so I would like to have a way to remove old data.

The only way I have found involves using crontab to delete the files, which relies on the Pi knowing the correct time (and it gets that from the internet). The commands I have found are the following: run crontab -e, then input:

0 15 * * * find PATH -mindepth 1 -mtime +30 -delete

which would delete files more than 30 days old at 3 pm every day. The problem is that these sensors won't have access to the internet, so the time will reset when they are restarted. Is there an alternative to this that wouldn't require internet?

To give a little more information: the sensors will typically be in place for about 10 days before being turned off and back on a few days later. The reason I want the files to be deleted after some time, rather than immediately, is in case there is a problem. If they were deleted 30 days after creation, this would give the customer time to ship the sensor back to me so I could take a look at it.

I already have the folders sequentially numbered, so that route may be easiest for me. The folders are labelled 1, 2, 3 etc., with each folder holding the data from one startup/shutdown cycle, and every file inside the folders is a .csv file. Is there a way that I could write a cron command so that, when the Pi is close to running out of space, it would delete, say, folders 1-X to clear 10 GB of space or something like that? If it's easier, I could also tell it to delete folders 1-5 when it's running out of space.

I also have looked into the savelog option a little bit, and this might be what I am looking for although I am not sure how to use it. I would want to use it with folders of data since that would be easier. If it could only keep the last 5 folders at a time that would work since I expect each folder to be ~2-3 GB.

Greenonline
  • The time isn't _reset_. By default, Raspberry Pi OS [includes fake-hwclock](https://raspberrypi.stackexchange.com/a/1606/20014), which on boot sets the time to the last value saved before shutting down. This at least keeps the clock monotonic. – Ruslan Sep 11 '20 at 16:32
  • Maybe a better cron task would be to delete the *oldest* files to get the log folder down to, say, <= 1 GB? Kind of like log rotation. – trognanders Sep 11 '20 at 21:53
  • What's the naming convention for your files? – A C Sep 11 '20 at 22:59

5 Answers

14

You could add a battery-powered real-time clock (RTC) chip to each Pi, but there is an obvious cost to this.

One other way is to use a sequential number for each file and delete the older files based on the number sequence. Obviously, you would need to keep track of the numbers but it should be possible to work out the values if you know how many are generated per day.
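The sequence-number idea could be sketched roughly as below. This is only a minimal sketch, assuming the run folders are named with plain integers (1, 2, 3, ...) directly under one data directory; the function names `list_run_dirs` and `prune_numbered_dirs` are made up for illustration.

```shell
# List the all-digit folder names under DATA_DIR, lowest number (oldest) first.
list_run_dirs() {
    local data_dir="$1" path name
    for path in "$data_dir"/*/; do
        [ -d "$path" ] || continue
        name=${path%/}        # strip the trailing slash
        name=${name##*/}      # keep only the last path component
        case $name in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
        esac
        printf '%s\n' "$name"
    done | sort -n
}

# Delete all but the KEEP highest-numbered folders under DATA_DIR.
prune_numbered_dirs() {
    local data_dir="$1" keep="$2" dirs total excess name
    dirs=$(list_run_dirs "$data_dir")
    [ -n "$dirs" ] || return 0
    total=$(printf '%s\n' "$dirs" | wc -l)
    excess=$(( total - keep ))
    [ "$excess" -gt 0 ] || return 0
    # The first EXCESS names in numeric order are the oldest runs.
    printf '%s\n' "$dirs" | head -n "$excess" | while read -r name; do
        rm -rf -- "${data_dir:?}/$name"
    done
}
```

For example, `prune_numbered_dirs /home/pi/data 5` (a hypothetical path) would keep only the five newest runs. Nothing here looks at timestamps, so it does not matter what the clock says.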

  • https://thepihut.com/products/mini-rtc-module-for-raspberry-pi – Ari Fordsham Sep 11 '20 at 14:45
  • this! a clock doesn't really add up much in battery and cost, in fact every modern computer has a clock timer, and the battery lasts for years – clockw0rk Sep 13 '20 at 01:02
  • @clockw0rk As a one-off, no, but if you are using a Pi Zero W and making 10 or 15 (as I have), then another £50 on the project costs adds up, especially if it's only needed for one task. Just because 'every' computer you use or buy has one (and I can see 4 that do not from my chair), that may not fit all scenarios. Cost can be a driver, and it gives another thing that could go wrong and needs management. Plus, all we know is that data is being collected, so it's possible the pins for the linked device are being used for the sensors. Experience tells you not to add things when there are ways around to save £££ :-) –  Sep 13 '20 at 01:43
  • @Andyroo You are actually right, all of the GPIO pins are used in the sensor. Also I am planning on making 50+ of them which is driving me away from purchasing equipment for this :) – Cameron Greenwalt Sep 14 '20 at 14:40
14

Since you don't have Internet NTP, the file timestamps are unreliable: the clock is set back to the time stored in the Pi's "/etc/fake-hwclock.data" whenever the system is booted.

If the daily data file(s) have a standard name (e.g. /PATH/sensor1.log) or a common tag (e.g. ".data"), you can use savelog (or logrotate) to keep only the most recent (e.g. 30) copies:

savelog -l -n -c30 /PATH/sensor1.log

or

savelog -l -n -c30 /PATH/*.data
  • '-l' Don't compress
  • '-n' Do not rotate empty files
  • '-c' Save cycle versions

Then just add it to your user crontab, for example:

0 15 * * * savelog -l -n -c30 /PATH/sensor1.log
0 15 * * * savelog -l -n -c30 /PATH/*.data

You'll wind up with the current file plus consecutively numbered older copies. Once there are 30 of them, the oldest is deleted automatically on each rotation.

p.s. The commands 'man savelog' or 'savelog --help' are your friends.

dave58
7

On a system where the time resets on boot, you can't know what time it is when the system starts, and similarly you can't know how long the system was down. But you can count reboots, and you probably have a clock that works while the system is up.

Have a global counter ("run id"), stored in a file, a database, or wherever, that you increment by one every time your sampling software starts. Then keep a running counter while the software is running, and increment that by one for each sample. With the samples tagged with [run id, sample id] pairs, you'll be able to determine their real-time order, tell where the gaps from a shutdown are, and remove the oldest files. Alternatively, use the time elapsed since the software started instead of sample IDs.

You can also count the samples to determine how long the system was up on each run, which gives a lower bound on the elapsed wall-clock time. But I don't think that's necessary; you're probably fine with just removing the oldest samples (or whole runs) when the storage space starts running out.
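A rough sketch of the run-id counter in shell, assuming the counter lives in a small text file. The function names `next_run_id` and `sample_name`, and the file-name layout, are only illustrative and not something the sampling software already provides.

```shell
# Increment the persistent run counter in COUNTER_FILE and print the new value.
next_run_id() {
    local counter_file="$1" last=0 next
    if [ -s "$counter_file" ]; then
        read -r last < "$counter_file" || true
    fi
    case $last in
        ''|*[!0-9]*) last=0 ;;   # treat a missing or corrupt counter as 0
    esac
    next=$(( last + 1 ))
    # Write to a temporary file and rename it, so the counter file is
    # replaced in one step rather than being rewritten in place.
    printf '%s\n' "$next" > "$counter_file.tmp" &&
        mv "$counter_file.tmp" "$counter_file"
    printf '%s\n' "$next"
}

# Build a file name from a [run id, sample id] pair. Zero padding makes
# plain alphabetical order match the order in which samples were taken.
sample_name() {
    printf '%06d_%08d.csv\n' "$1" "$2"
}
```

At startup the sampling script could call something like `run_id=$(next_run_id /home/pi/run_id)` (a hypothetical path) and then name each file with `sample_name "$run_id" "$sample_id"`.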

ilkkachu
6

Raspbian includes fake-hwclock, which saves the clock to the SD card on shutdown and restores it on boot. However, if you restart the Pi by simply cutting the power, this is of little use: the Pi never shuts down cleanly, so it keeps restoring the same last-saved time.

Deanna Earley's solution is to add a line to /etc/crontab:

* * * * * root fake-hwclock save

This will save the clock every minute, allowing your other cron job to work properly even if the Pi's power is cut. (Provided that fake-hwclock hasn't been uninstalled, anyway.)

Note that the clock won't tick while the power is off, so it will gradually fall behind real time; you won't be able to guarantee that the files get wiped at 3 pm.

wizzwizz4
4

You could

  • name your files sequentially and delete the oldest when there’s not much free space left
  • use a larger SD card
  • get the time by other means, for example GPS, DCF77 or RDS
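The first option might look something like the sketch below. It assumes the run folders are named with plain integers directly under one data directory and that the free-space threshold is given in kilobytes; `free_kb`, `list_run_dirs` and `make_room` are made-up names, and the parsing of `df -P` assumes the filesystem name contains no spaces.

```shell
# Print the free space, in KB, on the filesystem that holds DIR.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# List the all-digit folder names under DATA_DIR, lowest number (oldest) first.
list_run_dirs() {
    local data_dir="$1" path name
    for path in "$data_dir"/*/; do
        [ -d "$path" ] || continue
        name=${path%/}
        name=${name##*/}
        case $name in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
        esac
        printf '%s\n' "$name"
    done | sort -n
}

# Delete the lowest-numbered folders until at least MIN_FREE_KB is free.
# The highest-numbered folder is never deleted, since it may still be in use.
# Returns 1 if the target could not be reached.
make_room() {
    local data_dir="$1" min_free_kb="$2" oldest newest
    while [ "$(free_kb "$data_dir")" -lt "$min_free_kb" ]; do
        oldest=$(list_run_dirs "$data_dir" | head -n 1)
        newest=$(list_run_dirs "$data_dir" | tail -n 1)
        if [ -z "$oldest" ] || [ "$oldest" = "$newest" ]; then
            return 1
        fi
        rm -rf -- "${data_dir:?}/$oldest"
    done
}
```

Saved in a script that ends with, say, `make_room /home/pi/data 10485760` (a hypothetical path, with 10 GB expressed in KB), this could be run from cron every few minutes. The interval only needs the clock to keep ticking, not to show the right time.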
Martin
  • Upvoting, because deleting the oldest files first should be the way to go. Who cares whether the files are 3 days or 3 months old? This way the data never grows beyond a fixed size. – clockw0rk Sep 13 '20 at 01:04
  • How would you go about deleting the oldest files in this case? My files are all sequentially numbered (at least the folders), so this may be the easiest solution. I'm just not sure if there's a cron task that would read the amount of space left or if there's another way to go about that. – Cameron Greenwalt Sep 14 '20 at 14:41
  • @CameronGreenwalt I’d suggest you create a new question, maybe at https://unix.stackexchange.com/ – Martin Sep 16 '20 at 21:44
  • @Martin I will do that, thanks for all the help! – Cameron Greenwalt Sep 17 '20 at 14:04