The growth in available geospatial data, together with the rise of machine learning methods, has lent itself to numerous spatiotemporal forecasting applications that address real-world problems such as deforestation, pollution, and food insecurity. Choosing the right performance evaluation procedure matters for generating accurate and trustworthy out-of-sample predictions. However, when spatiotemporal dependencies exist between observations in the training and testing data, the independence assumption on the testing set is violated. As a result, model performance estimated via cross-validation (CV) and out-of-sample (OOS) evaluation can be over-optimistic. In this study, we show how CV and OOS performance change when we adjust for different types of spatiotemporal correlation in both simulated data and real-world panel data. We also show how the performance evaluation process biases model selection toward overfitting models. Lastly, we propose and compare solutions, such as blocking and clustering, that improve performance evaluation procedures in both simulated and real-world data with spatiotemporal structure.
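To make the blocking idea concrete, the following is a minimal sketch of spatially blocked cross-validation using scikit-learn's `GroupKFold`. The grid size, model, and simulated data are illustrative assumptions, not the paper's actual experimental setup: points are assigned to coarse spatial cells, and whole cells are held out together so that test folds are spatially separated from the training data.

```python
# Hedged sketch of blocked CV for spatially correlated data.
# The data-generating process, block size, and model are assumptions
# chosen only to illustrate the mechanics of spatial blocking.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Simulate points on a 10x10 spatial domain with a smooth spatial signal.
n = 400
x_coord = rng.uniform(0, 10, n)
y_coord = rng.uniform(0, 10, n)
X = np.column_stack([x_coord, y_coord, rng.normal(size=n)])
y = np.sin(x_coord) + np.cos(y_coord) + 0.1 * rng.normal(size=n)

# Spatial blocking: map each point to a coarse 2x2 grid cell; GroupKFold
# then keeps each cell entirely in either the training or the test fold,
# so near-duplicate neighbors cannot leak across the split.
blocks = (x_coord // 2).astype(int) * 5 + (y_coord // 2).astype(int)

cv = GroupKFold(n_splits=5)
scores = []
for train_idx, test_idx in cv.split(X, y, groups=blocks):
    model = Ridge().fit(X[train_idx], y[train_idx])
    scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(f"blocked-CV mean MSE: {np.mean(scores):.3f}")
```

Compared with random K-fold CV, the blocked estimate is typically higher (less optimistic) on spatially autocorrelated data, because held-out points are no longer surrounded by highly correlated training neighbors.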