Purpose

This vignette documents the workflow for creating the training dataset, which is extracted from the target dataset at selected labelled locations. Alternatively, if no target dataset is needed, for example when prototyping which spatial resolution to use for predictions, the workflow described for making the target dataset is equally valid for creating a training dataset on its own.

Libraries

library(RiverML)
library(magrittr)

Loading target data

Here we use the example of the South Fork Eel (SFE) river catchment (California, USA) and load the target streamlines for this region. Notice that with as.df = FALSE, get_target_points() now returns a SpatialPointsDataFrame in geographic (longitude/latitude, WGS84) coordinates.

region <- "SFE"
target_streamlines <- target_streamlines_SFE
target_points <- get_target_points(target_streamlines, as.df = FALSE)
#> Warning in proj4string(sldf): CRS object has comment, which is lost in output
target_points
#> class       : SpatialPointsDataFrame 
#> features    : 8022 
#> extent      : -124.0731, -123.4309, 39.60465, 40.37879  (xmin, xmax, ymin, ymax)
#> crs         : +proj=longlat +datum=WGS84 +no_defs 
#> variables   : 59
#> names       :   COMID_FID,         LENGTH, Rowid_,  COMID_FI_1,   COMID,      AREA,         SLOPE, CONFINEMEN,         RUSLE, CONFINEM_1,     RUSLEBINS, REGION, CSBIN, KMEANS_ID,     RTLBIN, ... 
#> min values  : 8283534_280, 0.886343917158,      0, 8283534_280, 8283534,    0.0036, 2.1034561e-08,          0,   0.148176916,   Confined, High Sediment,  North,    CH,         1, North_CH_1, ... 
#> max values  :  8290162_92,  200.000000006,  15660,   8290160_7, 8290162, 1785.0888, 1.57441710441,        994, 693.464281156, Unconfined,  Low Sediment,  North,    UL,         5, North_UL_5, ...

SFE_all_data_df is included in the package and contains the target data for the SFE region.

target_data_df <- SFE_all_data_df
dim(target_data_df)
#> [1] 8022  287
SFE target data
CatAreaSqKm WsAreaSqKm CHYD CCHEM CSED CCONN CTEMP CHABT ICI WHYD WCHEM WSED WCONN WTEMP WHABT IWI PctCarbResidCat PctNonCarbResidCat PctAlkIntruVolCat PctSilicicCat PctExtruVolCat PctColluvSedCat PctGlacTilClayCat PctGlacTilLoamCat PctGlacTilCrsCat PctGlacLakeCrsCat PctGlacLakeFineCat PctHydricCat PctEolCrsCat PctEolFineCat PctSalLakeCat PctAlluvCoastCat PctCoastCrsCat PctWaterCat PctCarbResidWs PctNonCarbResidWs PctAlkIntruVolWs PctSilicicWs PctExtruVolWs PctColluvSedWs PctGlacTilClayWs PctGlacTilLoamWs PctGlacTilCrsWs PctGlacLakeCrsWs PctGlacLakeFineWs PctHydricWs PctEolCrsWs PctEolFineWs PctSalLakeWs PctAlluvCoastWs PctCoastCrsWs PctWaterWs MineDensCat MineDensWs MineDensCatRp100 MineDensWsRp100 PctOw2011Cat PctIce2011Cat PctUrbOp2011Cat PctUrbLo2011Cat PctUrbMd2011Cat PctUrbHi2011Cat PctBl2011Cat PctDecid2011Cat PctConif2011Cat PctMxFst2011Cat PctShrb2011Cat PctGrs2011Cat PctHay2011Cat PctCrop2011Cat PctWdWet2011Cat PctHbWet2011Cat PctOw2011Ws PctIce2011Ws PctUrbOp2011Ws PctUrbLo2011Ws PctUrbMd2011Ws PctUrbHi2011Ws PctBl2011Ws PctDecid2011Ws PctConif2011Ws PctMxFst2011Ws PctShrb2011Ws PctGrs2011Ws PctHay2011Ws PctCrop2011Ws PctWdWet2011Ws PctHbWet2011Ws PctOw2011CatRp100 PctIce2011CatRp100 PctUrbOp2011CatRp100 PctUrbLo2011CatRp100 PctUrbMd2011CatRp100 PctUrbHi2011CatRp100 PctBl2011CatRp100 PctDecid2011CatRp100 PctConif2011CatRp100 PctMxFst2011CatRp100 PctShrb2011CatRp100 PctGrs2011CatRp100 PctHay2011CatRp100 PctCrop2011CatRp100 PctWdWet2011CatRp100 PctHbWet2011CatRp100 PctOw2011WsRp100 PctIce2011WsRp100 PctUrbOp2011WsRp100 PctUrbLo2011WsRp100 PctUrbMd2011WsRp100 PctUrbHi2011WsRp100 PctBl2011WsRp100 PctDecid2011WsRp100 PctConif2011WsRp100 PctMxFst2011WsRp100 PctShrb2011WsRp100 PctGrs2011WsRp100 PctHay2011WsRp100 PctCrop2011WsRp100 PctWdWet2011WsRp100 PctHbWet2011WsRp100 Precip8110Cat Tmax8110Cat Tmean8110Cat Tmin8110Cat Precip8110Ws Tmax8110Ws Tmean8110Ws Tmin8110Ws RunoffCat RunoffWs ClayCat SandCat ClayWs SandWs OmCat PermCat RckDepCat WtDepCat OmWs PermWs 
RckDepWs WtDepWs SLOPE CONFINEMEN RUSLE SO LDD aspect_max.rstr aspect_mean.rstr aspect_median.rstr aspect_min.rstr aspect_sd.rstr aspect_skew.rstr curvplan_max.rstr curvplan_mean.rstr curvplan_median.rstr curvplan_min.rstr curvplan_sd.rstr curvplan_skew.rstr curvprof_max.rstr curvprof_mean.rstr curvprof_median.rstr curvprof_min.rstr curvprof_sd.rstr curvprof_skew.rstr flowdir_max.rstr flowdir_mean.rstr flowdir_median.rstr flowdir_min.rstr flowdir_sd.rstr flowdir_skew.rstr layer_max.rstr layer_mean.rstr layer_median.rstr layer_min.rstr layer_sd.rstr layer_skew.rstr roughness_max.rstr roughness_mean.rstr roughness_median.rstr roughness_min.rstr roughness_sd.rstr roughness_skew.rstr slope_max.rstr slope_mean.rstr slope_median.rstr slope_min.rstr slope_sd.rstr slope_skew.rstr tpi_max.rstr tpi_mean.rstr tpi_median.rstr tpi_min.rstr tpi_sd.rstr tpi_skew.rstr tri_max.rstr tri_mean.rstr tri_median.rstr tri_min.rstr tri_sd.rstr tri_skew.rstr aspect_max.nrch aspect_mean.nrch aspect_median.nrch aspect_min.nrch aspect_sd.nrch aspect_skew.nrch curvplan_max.nrch curvplan_mean.nrch curvplan_median.nrch curvplan_min.nrch curvplan_sd.nrch curvplan_skew.nrch curvprof_max.nrch curvprof_mean.nrch curvprof_median.nrch curvprof_min.nrch curvprof_sd.nrch curvprof_skew.nrch flowdir_max.nrch flowdir_mean.nrch flowdir_median.nrch flowdir_min.nrch flowdir_sd.nrch flowdir_skew.nrch layer_max.nrch layer_mean.nrch layer_median.nrch layer_min.nrch layer_sd.nrch layer_skew.nrch roughness_max.nrch roughness_mean.nrch roughness_median.nrch roughness_min.nrch roughness_sd.nrch roughness_skew.nrch slope_max.nrch slope_mean.nrch slope_median.nrch slope_min.nrch slope_sd.nrch slope_skew.nrch tpi_max.nrch tpi_mean.nrch tpi_median.nrch tpi_min.nrch tpi_sd.nrch tpi_skew.nrch tri_max.nrch tri_mean.nrch tri_median.nrch tri_min.nrch tri_sd.nrch tri_skew.nrch H.640 H.960 H.1280 H.1600 H.1920 H.2240 H.2560 H.2880 H.3200 H.3840 H.4480 H.5120 H.5760 H.6400 H.7680 H.8960 H.10240 H.11520 H.12800 H.15360 H.17920 
H.20480 H.23040 H.25600 H.30720 H.35840 H.40960 H.46080 H.51200 H.61440 H.71680 H.81920
3.586 5.681 0.950 0.954 0.955 0.957 0.942 0.945 0.738 0.963 0.960 0.962 0.969 0.958 0.952 0.786 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 0 0.000 0 0.000 0.000 0 0 0.000 0 87.025 0.000 8.302 0.000 0 0 3.657 1.016 0.000 0.000 0.000 0.000 0 0 0.000 0 87.025 0.000 8.302 0.000 0 0 3.657 1.016 0.000 0 0 0 0 0 0.000 0 63.636 0 36.364 0.000 0 0 0 0.000 0.000 0 0 0 0 0 29.630 0 33.333 0 28.889 8.148 0 0 0 0.000 1210.073 13.646 7.371 1.092 1210.073 13.646 7.371 1.092 293 293.000 13.877 56.248 13.877 56.248 0.714 10.486 125.433 182.880 0.714 10.486 125.433 182.880 0.060 14 5.806 1 0.553 6.282 2.531 3.426 0.001 2.096 0.124 0.040 0.000 0.000 -0.047 0.006 -0.812 0.036 -0.001 -0.001 -0.046 0.008 -0.764 128 9.464 16 1 10.094 3.627 794.441 710.960 705.982 648.451 31.922 0.448 18.264 9.234 9.455 1.076 3.321 -0.133 0.764 0.384 0.389 0.026 0.133 -0.249 2.626 -0.068 -0.024 -1.847 0.447 -0.085 6.010 2.869 2.896 0.255 1.069 -0.032 6.270 1.566 0.346 0.013 2.202 1.165 0.003 -0.003 0.000 -0.036 0.009 -2.255 0.000 -0.014 -0.012 -0.037 0.011 -0.731 32 6.067 1 1 9.288 1.577 682.511 673.280 671.432 668.006 3.962 0.739 11.544 5.276 4.775 2.553 2.016 1.057 0.441 0.199 0.185 0.060 0.088 0.707 0.033 -0.685 -0.562 -1.846 0.568 -0.558 3.318 1.663 1.636 0.899 0.573 0.948 0.910 0.962 0.918 0.894 0.839 0.852 0.839 0.816 0.806 0.743 0.695 0.707 0.768 0.712 0.680 0.637 0.597 0.578 0.611 0.527 0.524 0.481 0.487 0.395 0.443 0.362 0.454 0.398 0.449 0.404 0.368 0.366
2.855 4.098 1.000 0.986 0.994 1.000 1.000 0.994 0.974 0.995 0.983 0.991 0.996 0.995 0.991 0.953 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 0 0.820 0 0.000 0.000 0 0 41.803 0 9.016 0.000 28.689 19.672 0 0 0.000 0.000 0.816 0.000 0.000 0.000 0 0 75.184 0 2.204 0.000 12.000 9.796 0 0 0.000 0.000 0.000 0 0 0 0 0 0.000 0 90.390 0 8.312 1.299 0 0 0 0.000 1.791 0 0 0 0 0 0.000 0 87.910 0 8.806 1.493 0 0 0 0.000 710.003 7.348 1.348 -4.660 742.730 6.808 0.896 -5.018 293 293.000 5.710 69.130 5.710 69.130 1.450 23.220 152.400 163.830 1.450 23.220 152.400 163.830 0.041 0 52.215 1 0.802 6.277 2.765 2.109 0.000 1.855 0.171 0.042 0.001 0.002 -0.063 0.009 -1.089 0.057 -0.002 0.000 -0.068 0.013 -1.126 128 10.273 16 1 14.654 4.839 248.074 177.737 175.381 120.033 30.005 0.263 32.795 11.026 9.515 0.692 6.060 1.200 1.333 0.483 0.419 0.007 0.269 1.002 2.672 -0.010 0.067 -3.748 0.716 -0.754 8.824 3.261 2.848 0.225 1.706 1.012 6.116 3.262 4.527 0.212 2.066 -0.435 0.010 -0.010 -0.004 -0.054 0.016 -1.461 -0.002 -0.027 -0.020 -0.068 0.017 -0.589 64 25.259 16 1 20.538 0.996 139.348 134.755 134.129 130.633 3.001 0.232 8.451 4.609 4.017 2.040 1.824 0.538 0.419 0.166 0.149 0.024 0.113 0.765 -0.285 -0.932 -0.752 -1.972 0.464 -0.583 2.513 1.371 1.287 0.460 0.578 0.332 0.858 0.792 0.706 0.793 0.695 0.644 0.772 0.753 0.674 0.653 0.656 0.640 0.709 0.589 0.650 0.692 0.504 0.613 0.560 0.601 0.537 0.459 0.416 0.508 0.390 0.485 0.394 0.385 0.449 0.368 0.368 0.474
3.994 4.843 0.994 0.981 0.990 0.996 0.995 0.993 0.949 0.993 0.980 0.986 0.993 0.991 0.987 0.932 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 0 6.091 0 0.000 0.000 0 0 75.127 0 0.000 0.000 13.706 5.076 0 0 0.000 0.000 0.711 1.665 0.000 0.000 0 0 83.764 0 0.000 0.000 3.629 10.232 0 0 0.000 0.000 0.000 0 0 0 0 0 2.899 0 0.000 0 66.667 30.435 0 0 0 0.000 0.000 0 0 0 0 0 35.836 0 0.000 0 31.367 32.797 0 0 0 0.000 834.400 6.771 1.236 -4.305 927.973 5.398 -0.443 -6.283 293 304.963 5.710 69.130 5.710 69.130 1.450 23.220 152.400 163.830 1.450 23.220 152.400 163.830 0.291 0 308.980 1 0.663 5.895 3.613 3.727 1.478 1.234 0.059 0.044 0.000 0.002 -0.081 0.014 -1.764 0.023 -0.001 0.000 -0.073 0.009 -2.209 16 9.325 16 1 7.366 -0.212 455.246 349.543 352.887 249.622 44.744 -0.162 27.273 13.526 13.358 2.836 4.193 0.315 1.146 0.595 0.598 0.064 0.194 -0.020 3.093 -0.039 0.086 -4.022 0.794 -1.289 8.454 4.047 3.982 0.974 1.242 0.283 4.652 3.545 3.625 2.564 0.550 0.213 -0.007 -0.036 -0.034 -0.065 0.020 -0.040 0.007 -0.008 -0.005 -0.048 0.011 -1.616 16 8.464 8 1 5.809 0.225 327.129 296.531 295.940 268.134 18.875 0.044 15.034 9.594 8.873 4.949 2.626 0.426 0.610 0.360 0.326 0.185 0.113 0.681 -0.113 -1.309 -1.120 -3.021 0.670 -0.840 4.886 3.090 3.088 1.826 0.860 0.470 0.922 0.877 0.843 0.814 0.866 0.835 0.815 0.793 0.786 0.787 0.764 0.700 0.706 0.724 0.633 0.677 0.631 0.617 0.620 0.584 0.553 0.616 0.497 0.545 0.390 0.485 0.472 0.385 0.366 0.368 0.368 0.366
3.856 746.815 0.997 0.978 0.978 0.993 0.990 0.965 0.904 0.984 0.971 0.976 0.975 0.970 0.969 0.855 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 0 0.000 0 0.000 0.000 0 0 0.000 0 0.000 0.000 100.000 0.000 0 0 0.000 0.000 0.000 0.000 0.000 0.000 0 0 0.000 0 0.000 0.000 100.000 0.000 0 0 0.000 0.000 0.000 0 0 0 0 0 0.000 0 31.642 0 62.164 4.925 0 0 0 1.269 0.000 0 0 0 0 0 0.000 0 31.642 0 62.164 4.925 0 0 0 1.269 280.990 20.029 13.628 7.228 280.990 20.029 13.628 7.228 106 106.000 4.960 76.570 4.960 76.570 0.240 43.470 85.440 182.880 0.240 43.470 85.440 182.880 0.001 42 289.641 5 0.873 5.905 2.734 1.629 0.747 1.553 0.192 0.013 0.000 0.000 -0.023 0.003 -2.323 0.044 -0.002 -0.002 -0.044 0.008 0.211 64 7.998 1 1 8.477 1.446 277.513 200.941 195.291 163.743 31.948 0.468 26.282 9.189 10.201 0.311 5.143 -0.249 1.234 0.461 0.507 0.000 0.271 -0.026 1.870 -0.048 -0.055 -1.838 0.352 -0.043 7.967 2.904 3.225 0.129 1.681 -0.091 1.779 1.389 1.355 1.149 0.177 0.474 0.002 0.000 0.000 -0.003 0.001 -0.169 0.002 -0.005 -0.007 -0.008 0.004 0.938 2 1.042 1 1 0.204 4.304 165.333 164.504 164.495 163.915 0.401 0.265 3.588 1.680 1.530 0.542 0.929 0.427 0.155 0.081 0.074 0.030 0.040 0.311 0.147 -0.124 -0.148 -0.308 0.114 0.562 1.012 0.521 0.476 0.195 0.256 0.316 0.919 0.925 0.835 0.768 0.880 0.795 0.790 0.776 0.755 0.734 0.688 0.711 0.640 0.535 0.435 0.493 0.498 0.551 0.465 0.470 0.367 0.334 0.417 0.395 0.466 0.362 0.394 0.513 0.449 0.404 0.368 0.474
4.895 4.895 1.000 0.977 0.988 0.991 0.988 0.984 0.931 1.000 0.977 0.986 0.990 0.988 0.981 0.926 0 0 0 98.517 0 0 0 0 0 0 0 0 0 0 0 0 0 1.483 0 0 0 98.517 0 0 0 0 0 0 0 0 0 0 0 0 0 1.483 0 0 0 0 0.000 0 3.377 1.188 0 0 0.000 0 6.361 3.404 71.235 14.168 0 0 0.000 0.268 0.000 0.000 3.377 1.188 0 0 0.000 0 6.361 3.404 71.235 14.168 0 0 0.000 0.268 0.000 0 0 0 0 0 1.711 0 22.368 0 46.184 25.789 0 0 0 3.947 0.000 0 0 0 0 0 1.711 0 22.368 0 46.184 25.789 0 0 0 3.947 417.362 20.607 13.517 6.421 417.362 20.607 13.517 6.421 293 293.000 14.204 56.572 14.204 56.572 0.626 7.282 86.530 181.455 0.626 7.282 86.530 181.455 0.102 9 6.452 1 0.623 6.282 3.501 3.964 0.001 2.163 -0.447 0.033 0.000 0.000 -0.060 0.008 -1.670 0.036 -0.001 0.000 -0.051 0.010 -0.978 128 12.922 16 1 11.387 4.437 428.525 378.313 378.780 330.742 21.190 0.048 17.571 6.649 6.103 1.645 2.655 1.065 0.707 0.279 0.259 0.008 0.112 0.905 2.459 -0.051 0.005 -3.086 0.555 -1.015 5.462 2.096 1.922 0.414 0.829 0.977 6.212 2.358 0.811 0.121 2.246 0.401 0.005 -0.011 -0.004 -0.060 0.016 -1.368 0.010 -0.018 -0.021 -0.033 0.012 0.955 64 15.938 16 1 15.020 0.972 358.031 348.184 350.932 340.357 5.211 -0.210 10.845 5.691 5.260 1.702 2.350 0.485 0.469 0.191 0.193 0.016 0.119 0.529 -0.069 -0.974 -0.907 -2.902 0.683 -1.210 4.109 1.669 1.581 0.414 0.892 0.842 0.847 0.917 0.865 0.828 0.839 0.799 0.754 0.752 0.767 0.762 0.713 0.713 0.636 0.657 0.591 0.573 0.580 0.649 0.600 0.546 0.538 0.500 0.448 0.468 0.581 0.438 0.397 0.385 0.366 0.464 0.812 0.474
0.860 18.095 0.988 0.972 0.978 0.986 0.981 0.974 0.884 0.998 0.980 0.988 0.986 0.983 0.982 0.919 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 100.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000 0 0 0 0 0.000 0 0.000 0.000 0 0 24.715 0 28.517 0.000 46.008 0.760 0 0 0.000 0.000 0.659 0.088 0.000 0.000 0 0 77.920 0 2.512 0.000 11.739 7.082 0 0 0.000 0.000 14.062 0 0 0 0 0 0.000 0 10.938 0 53.125 21.875 0 0 0 0.000 14.062 0 0 0 0 0 0.000 0 10.938 0 53.125 21.875 0 0 0 0.000 671.741 9.725 3.010 -3.713 943.536 6.541 0.480 -5.583 293 293.000 5.710 69.130 5.710 69.130 1.450 23.220 152.400 163.830 1.450 23.220 152.400 163.830 0.015 148 4.515 2 0.864 6.278 3.713 4.567 0.004 1.778 -0.309 0.024 0.001 0.000 -0.024 0.005 0.032 0.032 -0.002 0.000 -0.046 0.008 -1.426 128 11.021 16 1 13.912 4.940 254.109 186.173 180.822 169.393 17.459 1.544 18.335 4.515 3.628 0.155 3.231 1.072 0.790 0.200 0.158 0.001 0.148 1.107 2.067 -0.027 -0.012 -2.588 0.387 -0.757 5.383 1.364 1.088 0.024 0.983 1.067 5.489 4.736 4.846 4.023 0.427 -0.193 0.014 0.002 0.003 -0.015 0.006 -0.635 0.006 -0.016 -0.011 -0.046 0.018 -0.486 16 15.520 16 4 2.400 -4.416 176.251 171.978 171.122 170.226 1.647 1.140 13.452 5.784 6.345 0.770 4.373 0.339 0.697 0.254 0.216 0.030 0.204 0.507 0.192 -0.530 -0.161 -2.588 0.836 -1.269 4.591 1.629 1.462 0.223 1.306 0.569 0.727 0.889 0.848 0.751 0.749 0.717 0.705 0.666 0.799 0.749 0.628 0.606 0.601 0.608 0.537 0.461 0.421 0.394 0.577 0.458 0.388 0.459 0.535 0.377 0.390 0.438 0.394 0.385 0.449 0.368 0.812 0.474

Loading labelled locations

We now use get_input_data() to load and sort an input .csv file, then convert its contents to a SpatialPoints object with get_points_from_input_data().

input_dir <- system.file("extdata/input_data", package = "RiverML")
fname <- paste0(region,"_input.csv")
input_data <- get_input_data(file.path(input_dir, fname))
head(input_data)
#>   Name ward.grp      long      lat year
#> 1    3        5 -123.6740 39.68902 2017
#> 2   10        4 -123.5543 39.80925 2017
#> 3   17        6 -123.5150 39.73622 2017
#> 4   18        4 -124.0260 40.33604 2017
#> 5   24        4 -123.6221 39.64763 2017
#> 6   25        4 -123.8076 40.13879 2017
labelled_points <- get_points_from_input_data(input_data)
labelled_points
#> class       : SpatialPoints 
#> features    : 96 
#> extent      : -124.0401, -123.4844, 39.64582, 40.36147  (xmin, xmax, ymin, ymax)
#> crs         : +proj=longlat +datum=WGS84 +no_defs
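The loading step above can be approximated in base R to make the expected input layout concrete. This is only a sketch under stated assumptions, not the implementation of get_input_data() or get_points_from_input_data(): the toy CSV reuses three of the rows printed above, the sort key (Name) is a guess because the vignette does not say which column is used, and the names csv_text, toy_input, coords and toy_points are illustrative. The SpatialPoints object is only built if the sp package is installed.

```r
# Toy input with the same columns as head(input_data) above.
csv_text <- "Name,ward.grp,long,lat,year
10,4,-123.5543,39.80925,2017
3,5,-123.6740,39.68902,2017
17,6,-123.5150,39.73622,2017"
toy_input <- read.csv(text = csv_text)

# Assumption: rows are sorted by Name; the actual sort key used by
# get_input_data() is not documented in this vignette.
toy_input <- toy_input[order(toy_input$Name), ]
rownames(toy_input) <- NULL

# Coordinate matrix in (x, y) = (long, lat) order.
coords <- as.matrix(toy_input[, c("long", "lat")])

# Optional: wrap the coordinates in a SpatialPoints object (requires sp).
if (requireNamespace("sp", quietly = TRUE)) {
  toy_points <- sp::SpatialPoints(
    coords,
    proj4string = sp::CRS("+proj=longlat +datum=WGS84 +no_defs")
  )
}
```

Keeping the points in the same row order as the input table matters later on, because the group labels are taken from input_data by position.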

Snapping labelled locations to target locations

Using snap_points_to_points(), we extract, for each labelled point, the index of the closest target point, that is, the element of target_points at minimum distance from it.

snap <- snap_points_to_points(labelled_points, target_points)
length(snap)
#> [1] 96
head(snap)
#> [1] 6292 2887 3779 1792 7514 3769
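To illustrate what snapping by minimum distance means, here is a minimal base-R sketch of a nearest-neighbour lookup. It is not the code of snap_points_to_points(): the helper snap_nearest() is hypothetical, the coordinates are toy values, and the distance is plain squared Euclidean distance in degrees, whereas the package may use a different metric (for example great-circle distance).

```r
# Hypothetical helper: for each row of `from`, return the row index of the
# nearest row of `to`. Both arguments are two-column (long, lat) matrices.
# Assumption: squared Euclidean distance in degrees is a good enough proxy.
snap_nearest <- function(from, to) {
  vapply(seq_len(nrow(from)), function(i) {
    d2 <- (to[, 1] - from[i, 1])^2 + (to[, 2] - from[i, 2])^2
    which.min(d2)  # index of the closest target point
  }, integer(1))
}

# Toy target points (3) and labelled points (2).
to <- cbind(long = c(-123.60, -123.50, -124.00),
            lat  = c(39.70, 39.80, 40.30))
from <- cbind(long = c(-123.99, -123.61),
              lat  = c(40.31, 39.69))

toy_snap <- snap_nearest(from, to)
toy_snap
#> [1] 3 1
```

The result has one index per labelled point, in the order of the labelled points, which is why length(snap) above equals the number of labelled locations. Two labelled points can snap to the same target point; anyDuplicated(toy_snap) is a quick way to check for that.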

From the snap indices, we can retrieve the training data and the corresponding groups, then save both.

training_data_df <- target_data_df[snap, ]
groups <- input_data$ward.grp
# Saving (uncomment and set out_dir to an existing output directory):
# write.csv(training_data_df,
#   file = file.path(out_dir, paste0(region, "_data_df.csv")),
#   row.names = FALSE)
# write.csv(groups,
#   file = file.path(out_dir, paste0(region, "_groups.csv")),
#   row.names = FALSE)
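The subsetting and saving step can be tried end to end on toy data. The sketch below is illustrative only: toy_target, toy_snap and toy_groups are made-up stand-ins for target_data_df, snap and groups, and out_dir is set to tempdir() as an assumption because the vignette does not define it.

```r
# Toy stand-ins for target_data_df, snap and groups (values are made up).
toy_target <- data.frame(SLOPE = c(0.10, 0.20, 0.30, 0.40),
                         RUSLE = c(1, 2, 3, 4))
toy_snap   <- c(3L, 1L)
toy_groups <- c(5L, 4L)

# Row i of the training data belongs to labelled point i, so the training
# data and the group labels must have the same length and order.
toy_training <- toy_target[toy_snap, ]
stopifnot(nrow(toy_training) == length(toy_groups))

# Assumption: write to a temporary directory; replace with a real path.
out_dir <- tempdir()
region  <- "SFE"
data_file   <- file.path(out_dir, paste0(region, "_data_df.csv"))
groups_file <- file.path(out_dir, paste0(region, "_groups.csv"))

write.csv(toy_training, file = data_file, row.names = FALSE)
write.csv(data.frame(ward.grp = toy_groups), file = groups_file,
          row.names = FALSE)

# Read the training data back to confirm the row order was preserved.
reloaded <- read.csv(data_file)
```

Wrapping the groups in a one-column data frame gives the saved file an explicit column name (ward.grp) instead of the default x that write.csv() assigns to a bare vector.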