Occasionally when syncing my watch, a Umidigi Urun, the GPX file that Strava receives is malformed. A GPX file is really just an XML file, and in the malformed ones some of the track points are duplicated.

Looking at a sample, the first 10 lines show the time increasing in 5-second steps, but it then jumps back 30 seconds and repeats timestamps that were already recorded. This is sub-optimal.

<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:15Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:20Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:25Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:30Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:35Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:40Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:45Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:50Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:55Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:16:00Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:30Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:35Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:40Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:45Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:50Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:55Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:16:00Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:16:05Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:16:10Z</time>...
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:16:15Z</time>...
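A quick way to check whether a file has this problem is to list any timestamp that appears more than once. The snippet below is only a sketch: `repeated_times` is a made-up helper name, and it assumes every track point carries a `<time>` element.

```shell
#!/bin/bash
# Hypothetical helper: list <time> elements that occur more than once in a file.
# Assumes each track point has its own <time>...</time> element.
repeated_times() {
  grep -o '<time>[^<]*</time>' "$1" | sort | uniq -d
}

# Small stand-in for a mangled file, modelled on the sample above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:55Z</time></trkpt>
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:16:00Z</time></trkpt>
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:15:55Z</time></trkpt>
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:16:00Z</time></trkpt>
<trkpt lat="-38.82" lon="175.93"><time>2021-12-25T05:16:05Z</time></trkpt>
EOF

repeated_times "$sample"
rm -f "$sample"
```

On this stand-in it prints the two repeated timestamps, 05:15:55 and 05:16:00. No output means no timestamp is repeated.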

The duplicates can be removed by hand, cutting and pasting text as needed, but doing so is annoying.

Thankfully, we can use a shell script to put the GPX file back into logical order.

#!/bin/bash

#   - delete every new line
#   - delete all white space before '<'
#   - trim padding to singular space (not really needed)
#   - at every '<trkpt lat' insert new line 
#   - move closing tags of '</trkseg></trk></gpx>' to new line
#   - all 'trkpt' now on their own line, duplicates can be removed
#   - remove duplicates

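mangledgpx='downloaded-from-strava.gpx'

# print the header (assumes it is the first 8 lines of the file)
head -8 "$mangledgpx"

# rebuild one 'trkpt' per line, then drop exact duplicates (first copy wins)
cat "$mangledgpx" \
 | tr -d '\n' \
 | sed -e 's/ \+</</g' -e 's/  \+/ /g' -e 's/<trkpt/\n<trkpt/g' \
 | sed -e 's/<\/trkseg>/\n<\/trkseg>/' \
 | grep trkpt \
 | awk '!x[$0]++'

# print the closing tags (assumes they are the last 3 lines of the file)
tail -3 "$mangledgpx"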

The above code, available from my GitHub, reshapes the GPX file so that every ‘trkpt’ sits on its own line, removes the duplicate lines, and leaves the remaining ‘trkpt’ data points in the correct order.
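The step doing the real work is the final `awk '!x[$0]++'`. The array entry for a line is 0 the first time that line is seen, so the negated test is true and the line is printed; on every later sighting the counter is non-zero and the line is skipped. Unlike `sort -u`, this keeps the original order. A tiny illustration, with `dedupe_keep_order` as a made-up wrapper name:

```shell
#!/bin/bash
# Hypothetical wrapper around the same awk idiom used in the script.
# Prints each distinct input line once, in order of first appearance.
dedupe_keep_order() {
  awk '!seen[$0]++'
}

# Letters stand in for track points: b and c repeat, as in the mangled file.
printf '%s\n' a b c b c d | dedupe_keep_order
```

This prints a, b, c and d, one per line.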

To run it, use:

./fix-gpx.sh > new.gpx

Then delete the saved activity from Strava, and manually upload the new, fixed GPX file.
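Before deleting anything on Strava, it may be worth comparing how many track points went in and how many came out. This is just a sketch: `count_trkpts` is a made-up helper, and the file names are the ones used above.

```shell
#!/bin/bash
# Hypothetical helper: count opening '<trkpt' tags in a file.
count_trkpts() {
  grep -o '<trkpt' "$1" | wc -l | tr -d ' '
}

# File names follow the ones used above; adjust to suit.
old='downloaded-from-strava.gpx'
new='new.gpx'
if [ -f "$old" ] && [ -f "$new" ]; then
  echo "before: $(count_trkpts "$old")  after: $(count_trkpts "$new")"
fi
```

The second number should be smaller than the first by however many points were duplicated.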

And yes, the number of process calls this script uses could be reduced significantly, but as is, it works fine, is readable, and hasn’t taken a week of coding to get down to a single awk script.
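For the curious, here is a rough sketch of what a modest reduction might look like: the `cat` is replaced by a redirect, the two `sed` calls are merged, and the `grep` is folded into the `awk` pattern. It follows the steps listed in the script's comments and assumes GNU sed (for `\+` and for `\n` in the replacement); `fix_trkpts` is a made-up function name.

```shell
#!/bin/bash
# Sketch only: the same steps in three processes instead of six.
# Assumes GNU sed; 'fix_trkpts' is a hypothetical name.
fix_trkpts() {
  tr -d '\n' < "$1" \
   | sed -e 's/ \+</</g' -e 's/  \+/ /g' \
         -e 's/<trkpt/\n<trkpt/g' -e 's/<\/trkseg>/\n<\/trkseg>/' \
   | awk '/trkpt/ && !x[$0]++'
}

# Optional: pass a GPX file as the first argument to print the fixed version.
# The 8-line header and 3-line footer are the same assumption the script makes.
if [ -f "${1:-}" ]; then
  head -8 "$1"
  fix_trkpts "$1"
  tail -3 "$1"
fi
```

Whether this is any more readable than the original is debatable.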